Common Workflow in ML

1. Define the Problem

  • Be specific

  • Identify the ML task

What is a Machine Learning Task?

  • Supervised Learning

  • Unsupervised Learning
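
To make the distinction concrete: in supervised learning the model learns from labeled examples, while in unsupervised learning it looks for structure in unlabeled data. The scikit-learn sketch below illustrates both; the toy data and the choice of LogisticRegression and KMeans are assumptions made only for this example.

```python
# Minimal sketch contrasting supervised and unsupervised learning (toy data, assumed models).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])  # one feature per example
y = np.array([0, 0, 0, 1, 1, 1])                             # labels, used only when supervised

# Supervised learning: the model learns a mapping from labeled examples (X, y).
clf = LogisticRegression().fit(X, y)
print(clf.predict([[2.5], [10.5]]))   # predicted labels for new points

# Unsupervised learning: the model looks for structure in X alone, with no labels.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)                     # cluster assignments discovered from the data
```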

2. Build the Dataset

Four aspects of working with data

  1. Data Collection

    • find and collect data relevant to the problem

  2. Data Inspection

    • look for outliers (data points that are not normal)

    • missing or incomplete data

    • transform your data

  3. Summary Statistics

    • describe the trend, scale, or shape of the data

  4. Data Visualization
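
The short pandas sketch below walks through the inspection, summary-statistics, and visualization steps on a toy table; the column names and values are assumptions used only for illustration.

```python
# Inspecting, summarizing, and visualizing a small dataset (column names and values are assumed).
import pandas as pd

df = pd.DataFrame({
    "temperature": [21.0, 25.5, 19.0, None, 35.0],   # one missing value
    "sales":       [120,  150,  90,   110,  900],    # 900 looks like a possible outlier
})

print(df.head())        # inspect the first few rows
print(df.isna().sum())  # count missing or incomplete values per column
print(df.describe())    # summary statistics: count, mean, std, min, max, quartiles

df.plot.scatter(x="temperature", y="sales")  # quick visualization to spot outliers
```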

Impute is a common term referring to the different statistical tools that can be used to estimate missing values in your dataset.

Outliers are data points that are significantly different from others in the same sample.
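
Putting those two definitions into code, the sketch below imputes a missing value and flags a possible outlier; the column, the mean-imputation strategy, and the 1.5-standard-deviation threshold are assumptions chosen for illustration.

```python
# Imputing a missing value and flagging an outlier (toy data; thresholds are assumed choices).
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"sales": [120.0, 150.0, None, 110.0, 900.0]})

# Impute: fill the missing value using a statistical tool (here, the column mean).
imputer = SimpleImputer(strategy="mean")
df["sales"] = imputer.fit_transform(df[["sales"]]).ravel()

# Outliers: points that are significantly different from the rest of the sample.
z_scores = (df["sales"] - df["sales"].mean()) / df["sales"].std()
print(df[z_scores.abs() > 1.5])  # here only the 900 row is flagged
```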

3. Train the Model

Before beginning to train the model, we need to split the data.

The majority of the data (generally 70-80%) will be used for training, and the remaining data will be reserved for model evaluation.
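
A minimal scikit-learn version of this split is sketched below; the 80/20 ratio and the toy arrays are assumptions used for illustration.

```python
# Splitting data into training and evaluation sets (toy arrays; 80/20 split is an assumed choice).
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # 10 examples, 2 features each
y = np.arange(10)                  # one target value per example

# Hold out 20% of the data for evaluation; the remaining 80% is used for training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```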

The model training algorithm iteratively updates a model's parameters to minimize some loss function.

Let's define those two terms:

  • Model parameters: Model parameters are settings or configurations the training algorithm can update to change how the model behaves. Depending on the context, you’ll also hear other more specific terms used to describe model parameters such as weights and biases. Weights, which are values that change as the model learns, are more specific to neural networks.

  • Loss function: A loss function is used to codify the model’s distance from its goal. For example, if you were trying to predict the number of snow cone sales based on the day’s weather, you would care about making predictions that are as accurate as possible. So you might define a loss function to be “the average distance between your model’s predicted number of snow cone sales and the correct number.” A short sketch combining these two ideas follows this list.
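
The sketch below puts the two terms together for the snow cone example: a tiny model with a weight and a bias, a loss function, and a training loop that iteratively updates the parameters to reduce that loss. The numbers, the learning rate, and the use of squared distance (easier to differentiate than plain distance) are assumptions made for illustration.

```python
# Toy training loop for the snow-cone example (all numbers and the learning rate are assumed).
import numpy as np

temps = np.array([20.0, 25.0, 30.0, 35.0])   # the day's weather (assumed feature)
sales = np.array([40.0, 55.0, 70.0, 85.0])   # snow cone sales (assumed targets)

# Model parameters: a weight and a bias the training algorithm can update.
w, b = 0.0, 0.0
learning_rate = 0.001

def loss(w, b):
    """Loss function: average squared distance between predicted and actual sales."""
    return np.mean((w * temps + b - sales) ** 2)

print("loss before training:", loss(w, b))

# Iteratively update the parameters to reduce the loss (plain gradient descent).
for step in range(50_000):
    error = w * temps + b - sales
    w -= learning_rate * np.mean(2 * error * temps)  # gradient of the loss w.r.t. w
    b -= learning_rate * np.mean(2 * error)          # gradient of the loss w.r.t. b

print("loss after training:", loss(w, b))
print("learned parameters:", w, b)
```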

4. Evaluate the Model

The metrics used for evaluation are likely to be very specific to the problem you have defined.

Using Model Accuracy

Model accuracy is a fairly common evaluation metric. Accuracy is the fraction of predictions a model gets right.

Here's an example:

Petal length to determine species

Imagine that you built a model to identify a flower as one of two common species based on measurable details like petal length. You want to know how often your model predicts the correct species. This would require you to look at your model's accuracy.
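
As a small sketch, accuracy can be computed by comparing the model's predicted species against the true species; the labels below are assumed toy values, not real model output.

```python
# Computing accuracy for the flower-species example (labels are assumed toy values).
from sklearn.metrics import accuracy_score

true_species      = ["species_a", "species_b", "species_a", "species_b", "species_a"]
predicted_species = ["species_a", "species_a", "species_a", "species_b", "species_a"]

# Accuracy is the fraction of predictions the model gets right.
print(accuracy_score(true_species, predicted_species))  # 4 correct out of 5 -> 0.8
```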

Using Log Loss

Log loss seeks to calculate how uncertain your model is about the predictions it is generating. In this context, uncertainty refers to how likely a model thinks the predictions being generated are to be correct.

For example, let's say you're trying to predict how likely a customer is to buy either a jacket or t-shirt.

Log loss could be used to understand your model's uncertainty about a given prediction. In a single instance, your model could predict with 5% certainty that a customer is going to buy a t-shirt. In another instance, your model could predict with 80% certainty that a customer is going to buy a t-shirt. Log loss enables you to measure how strongly the model believes that its prediction is accurate.

In both cases, the model estimates the probability that a customer will buy a t-shirt, but its certainty about that prediction differs.
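
The sketch below computes log loss with scikit-learn for the t-shirt scenario; the outcomes and predicted probabilities are assumed values that mirror the example above.

```python
# Computing log loss for the t-shirt example (outcomes and probabilities are assumed values).
from sklearn.metrics import log_loss

# True outcomes: 1 means the customer bought a t-shirt, 0 means they did not.
y_true = [1, 1, 0, 1]

# Model's predicted probability that each customer buys a t-shirt.
confident = [0.80, 0.90, 0.10, 0.85]  # high certainty, mostly correct
uncertain = [0.05, 0.55, 0.45, 0.60]  # low certainty, one prediction badly wrong

print(log_loss(y_true, confident))   # smaller value: confident and correct
print(log_loss(y_true, uncertain))   # larger value: uncertain or wrong predictions are penalized
```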

5. Use the Model
