Common Workflow in ML
1. Define the Problem
Be specific
Identify the ML task
What is a machine learning task?
Supervised Learning
Unsupervised Learning
2. Build the Dataset
Four aspects of working with data
Data Collection
find and collect data relevant to the problem
Data Inspection
look for outliers (data points that are not normal)
missing or incomplete data
transform your data
Summary Statistics
tell you about the trend, scale, or shape of the data
Data Visualization
Imputation is a common term referring to the different statistical tools that can be used to estimate missing values in your dataset.
Outliers are data points that are significantly different from others in the same sample.
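As a sketch of these two ideas, here is a minimal Python example using made-up petal-length measurements (the values and the mean-imputation strategy are illustrative, not part of the course):

```python
from statistics import mean, median, stdev

# Hypothetical petal-length measurements; None marks a missing value
petal_lengths = [1.4, 1.3, None, 1.5, 4.7]

# Summary statistics on the observed values describe trend, scale, and shape.
# The 4.7 stands far from the rest: a candidate outlier worth inspecting.
observed = [x for x in petal_lengths if x is not None]
print("mean:", mean(observed), "median:", median(observed),
      "stdev:", round(stdev(observed), 2))

# Impute: fill the missing value with the mean of the observed values
imputed = [x if x is not None else mean(observed) for x in petal_lengths]
print(imputed)
```

Mean imputation is only one option; medians or model-based estimates are common alternatives when the data is skewed.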
3. Train the Model
Before beginning to train the model, we need to split the data:
the majority of the data is held out for training (generally 70-80% of the data)
and the remaining data is used during model evaluation
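The split described above can be sketched in a few lines of Python (the dataset and the 80/20 ratio here are illustrative):

```python
import random

# Hypothetical dataset of 100 examples
data = list(range(100))

# Shuffle first so the split is not biased by the original ordering
random.seed(0)
random.shuffle(data)

# Hold out 80% for training; the rest is reserved for evaluation
split = int(0.8 * len(data))
train, test = data[:split], data[split:]
```

In practice a utility such as scikit-learn's `train_test_split` does the same job and also supports stratified splits.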
The model training algorithm iteratively updates a model's parameters to minimize some loss function.
Let's define those two terms:
Model parameters: Model parameters are settings or configurations the training algorithm can update to change how the model behaves. Depending on the context, you’ll also hear other more specific terms used to describe model parameters such as weights and biases. Weights, which are values that change as the model learns, are more specific to neural networks.
Loss function: A loss function is used to codify the model’s distance from this goal. For example, if you were trying to predict a number of snow cone sales based on the day’s weather, you would care about making predictions that are as accurate as possible. So you might define a loss function to be “the average distance between your model’s predicted number of snow cone sales and the correct number.”
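The snow cone loss function described above ("the average distance between predicted and correct sales") can be written directly; the sales figures below are made up for illustration:

```python
# Hypothetical predicted vs. actual snow cone sales for three days
predicted = [50, 62, 45]
actual = [55, 60, 40]

# Loss: average absolute distance between prediction and truth
loss = sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
print(loss)  # (5 + 2 + 5) / 3 = 4.0
```

During training, the algorithm would nudge the model's parameters in whatever direction shrinks this number.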
4. Evaluate the Model
The metrics used for evaluation are likely to be very specific to the problem you have defined.
Using Model Accuracy
Model accuracy is a fairly common evaluation metric. Accuracy is the fraction of predictions a model gets right.
Here's an example:
Petal length to determine species
Imagine that you built a model to identify a flower as one of two common species based on measurable details like petal length. You want to know how often your model predicts the correct species. This would require you to look at your model's accuracy.
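Accuracy as the fraction of correct predictions can be sketched like this (the species labels are invented for the example):

```python
# Hypothetical model predictions vs. true species labels
predicted = ["setosa", "versicolor", "setosa", "versicolor", "setosa"]
actual    = ["setosa", "versicolor", "versicolor", "versicolor", "setosa"]

# Accuracy: fraction of predictions the model got right
correct = sum(p == a for p, a in zip(predicted, actual))
accuracy = correct / len(actual)
print(accuracy)  # 4 of 5 correct -> 0.8
```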
Using Log Loss
Log loss seeks to calculate how uncertain your model is about the predictions it is generating. In this context, uncertainty refers to how likely a model thinks the predictions being generated are to be correct.
For example, let's say you're trying to predict how likely a customer is to buy either a jacket or t-shirt.
Log loss could be used to understand your model's uncertainty about a given prediction. In a single instance, your model could predict with 5% certainty that a customer is going to buy a t-shirt. In another instance, your model could predict with 80% certainty that a customer is going to buy a t-shirt. Log loss enables you to measure how strongly the model believes that its prediction is accurate.
In both cases, the model predicts that a customer will buy a t-shirt, but the model's certainty about that prediction can change.
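A minimal sketch of binary log loss makes the point concrete. Assume label 1 means the customer actually bought the t-shirt; the probabilities below mirror the 5% and 80% certainties from the example (the function itself is a standard formulation, not specific to this course):

```python
import math

def log_loss(y_true, p_pred):
    """Average binary log loss; confident wrong predictions are penalized heavily."""
    eps = 1e-15  # clip probabilities away from 0 and 1 to keep log() finite
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Customer actually bought the t-shirt (label 1) in both instances
low_conf  = log_loss([1], [0.05])  # model gave t-shirt only 5% probability
high_conf = log_loss([1], [0.80])  # model gave t-shirt 80% probability
print(low_conf, high_conf)
```

The 5%-certainty prediction incurs a much larger loss than the 80% one, which is exactly the "measure of uncertainty" the metric provides.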
5. Use the Model