BDA-602 - Machine Learning Engineering

Dr. Julien Pierret

Lecture 9

The Case Against Machine Learning

  • Minimum Viable Product (MVP)
    • Just enough features to deploy the product
    • Deploy to a subset of "nice" early adopters
      • Forgiving
      • Give feedback
  • Machine learning may be more than the minimum
    • Building models
      • Time
      • Data
  • Simple programmatic solutions are better
    • Come to life faster
    • Cheaper to produce
      • Don't need tons of data
      • Don't need compute

›

Avoidance is not Prevention

  • Start with a programming solution
  • Get everything working
    • Frontend
    • Backend
  • Move to a simple model
  • Other problems need to be solved first
    • Deployment?
    • Getting data?
    • Updating data?
    • Training the model?
  • Iterate,...
  • Iterate,...
  • Iterate,...

›

Planet Labs 🛰️


›

Story-time - Maintenance Chatbot 🤖

  • Please ask questions as I go along

›

What is Machine Learning?


›

Machine Learning - Optimization

  • ML algorithms are all solving optimization problems
  • Optimizing a loss function
    • Regression: Minimize the MSE
    • $$ Y_{i} = \beta_{0} + \beta_{1}X_{i} + e_{i} $$
      $$ \frac{1}{n}\sum_{i=1}^{n}{e_{i}^2} $$
      $$ \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^{2} $$
    • MSE most common loss function in statistics
  • Neural Nets: Minimize different functions
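A minimal sketch of the loss above: closed-form simple linear regression and its MSE, in plain Python. The data values here are made up for illustration.

```python
# Fit Y_i = beta_0 + beta_1 * X_i by minimizing (1/n) * sum(e_i^2).
def fit_ols(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    beta_1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
             / sum((xi - x_bar) ** 2 for xi in x)
    beta_0 = y_bar - beta_1 * x_bar
    return beta_0, beta_1

def mse(x, y, beta_0, beta_1):
    # (1/n) * sum of squared residuals e_i = y_i - (beta_0 + beta_1 * x_i)
    return sum((yi - (beta_0 + beta_1 * xi)) ** 2 for xi, yi in zip(x, y)) / len(x)

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # roughly y = 2x, made-up data
b0, b1 = fit_ols(x, y)
print(b0, b1, mse(x, y, b0, b1))
```

The closed form works here because the loss surface for regression is a simple bowl; neural nets have no such shortcut.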

›

Optimizations?

  • Idea is simple
    • Mountain range
    • Need to get to the lowest point
    • Some valleys go deeper than others
      • You can get stuck in the wrong valley
    • Regression is a $k + 1$ dimensional problem
      • Can solve for it directly
    • Neural Nets can have crazy high number of dimensions
      • Must iterate many times and travel down one step at a time

›

Newton's Method

  • Root finding method
    • Where equation = zero
  • Not the minimum
  • But if we do it on the derivative!
    • Solving for slope = 0
      • Could be a minimum or maximum
  • $$ x_{n+1} = x_{n} - \frac{f'(x_n)}{f''(x_n)} $$
  • Applied to the derivative, this update solves for a local minimum
    • Goal is to find the global minimum
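The update above can be sketched in a few lines. The example function $f(x) = x^4 - 3x^2 + 2$ is an arbitrary choice for illustration; its minima sit at $x = \pm\sqrt{3/2}$.

```python
# Newton's method applied to f'(x): iterate x_{n+1} = x_n - f'(x_n)/f''(x_n).
# Note: this stops wherever the slope is zero, which could also be a maximum.
def newton_minimize(f_prime, f_double_prime, x0, tol=1e-10, max_iter=100):
    x = x0
    for _ in range(max_iter):
        step = f_prime(x) / f_double_prime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

f_prime = lambda x: 4 * x**3 - 6 * x        # f'(x) for f(x) = x^4 - 3x^2 + 2
f_double_prime = lambda x: 12 * x**2 - 6    # f''(x)

x_min = newton_minimize(f_prime, f_double_prime, x0=1.0)
print(x_min)   # converges to sqrt(3/2) ~ 1.2247
```

Starting from a different $x_0$ (say $-1$) lands in the other valley, which is exactly the "stuck in the wrong valley" problem from the previous slide.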


›

Higher Dimensional Optimizations


›

Optimization Problems


›

Optimization Problems

  • Excel Solver
  • The solution isn't always a model
  • You may already have a function that explains the relationship
    • Need to maximize/minimize the output of this function
    • Subject to some constraints

›

Optimization Problems - Example

  • Assume we have a function that takes in 3 inputs
  • $$ f(x, y, z) $$
  • Subject to a bunch of constraints
  • $$ x < y, x+y > z $$
›

Optimization Problems - Brute Force

  • We could brute force it
    • $x$: [0, 10]
    • $y$: [-10, 10]
    • $z$: [10, 20]
    • We would skip ($x$, $y$, $z$) that are not within our constraints
    • Step size important!
  • Problems
    • $f(x,y,z)$ could be expensive: slow
    • Large step size: miss global max/min
    • Small step sizes: slow
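A brute-force sketch of the example above, using the stated ranges and constraints. The objective `f` and the step size of 1 are made-up stand-ins for a real (possibly expensive) function.

```python
# Brute force: sweep a grid over (x, y, z), skip points that violate
# the constraints x < y and x + y > z, keep the best objective seen.
import itertools

def f(x, y, z):
    # Toy objective to maximize; a real f could be slow to evaluate.
    return -(x - 2) ** 2 - (y - 5) ** 2 - (z - 4) ** 2

def frange(lo, hi, step):
    vals, v = [], lo
    while v <= hi:
        vals.append(v)
        v += step
    return vals

best, best_args = float("-inf"), None
for x, y, z in itertools.product(frange(0, 10, 1),
                                 frange(-10, 10, 1),
                                 frange(10, 20, 1)):
    if not (x < y and x + y > z):   # outside the feasible region
        continue
    val = f(x, y, z)
    if val > best:
        best, best_args = val, (x, y, z)

print(best_args, best)   # (4, 7, 10) with objective -44 at this step size
```

Halving the step size here multiplies the grid by roughly $2^3$, which is the "small step sizes: slow" problem in miniature.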

›

Optimization Problems - Libraries

  • Linear programming
  • Optimizations in general
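As one example of such a library, SciPy ships a linear-programming solver (PuLP and cvxpy are common alternatives). The objective and constraints below are invented for illustration.

```python
# Maximize x + 2y  subject to  x + y <= 4,  y <= 3,  x, y >= 0.
# linprog minimizes, so we negate the objective coefficients.
from scipy.optimize import linprog

res = linprog(c=[-1, -2],
              A_ub=[[1, 1], [0, 1]],
              b_ub=[4, 3],
              bounds=[(0, None), (0, None)])

print(res.x, -res.fun)   # optimum at x = 1, y = 3, objective value 7
```

Unlike brute force, the solver exploits the problem's structure, so it scales far beyond what a grid sweep can handle.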

›

Optimization Problems - Why?

  • These kinds of problems do come up
    • Net Present Value (NPV) calculations
      • Minimizing a loss
      • Maximizing a profit
  • Flexibility
    • Modeling not always the solution
    • It all depends on the problem you need to solve
  • ... now let's build some models!

›

Explainable Models


›

Linear Regression


›

Linear Regression


›

Logistic Regression


›

Logistic Regression


›

Bias-Variance Tradeoff


  • Bias error
    • Errors from assumptions in learning algorithm
    • High bias and we miss relationships between predictors and response
    • Underfitting
  • Variance
    • Sensitivity to small fluctuations in the training set
    • High variance and we model random noise
    • Overfitting
  • Need to find the right balance between them

›

Cross Validation


›

Cross Validation - K-fold

Cross Validation - Stratified


›

Cross Validation - Grid Search




›

Cross Validation - Custom


›

Explainable Models - Overfitting

  • Focus
    • Regression
    • Logistic Regression
  • Variable Selection is very important
    • Make or break your model

›

Explainable Models - Features

  • You can't use all the features
  • Garbage features can look like they improve in-sample performance
  • Let's look at some generated data
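A sketch of the effect, with arbitrary generated data: training $R^2$ can only go up as columns are added, even when the added columns are pure noise.

```python
# Adding junk columns never lowers in-sample R^2; the data and seed
# here are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2 * x + rng.normal(scale=0.5, size=n)   # true relationship uses only x

def r_squared(X, y):
    X = np.column_stack([np.ones(len(y)), X])   # add intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

r2_simple = r_squared(x.reshape(-1, 1), y)
junk = rng.normal(size=(n, 10))                 # 10 random garbage columns
r2_junk = r_squared(np.column_stack([x.reshape(-1, 1), junk]), y)

print(r2_simple, r2_junk)   # r2_junk >= r2_simple, despite the junk
```

This is why the slides that follow judge models on held-out data, not training fit.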

                

›

Explainable Models - Simple Model - Actual vs Predicted


›

Explainable Models - Simple Model - Summary Statistics


                

›

Explainable Models - Adding Junk

  • Picked the correct variables
  • What if we pick bad variables?
  • Add random garbage!

                

›

Explainable Models - Adding More Junk


                

›

Explainable Models - Adding Even More Junk


                
  • Rank Deficient

›

Explainable Models - $R^2$


›

Explainable Models - Overfitting - Predicted vs Actual


›

Explainable Models - Overfitting - MSE


›

Building Explainable Models - Overview

  • Regression Models
    • Feature Inspection
    • Baseline Model
    • Feature Engineering
      • Imagination
      • Brute Force
        • Variable Combinations
        • Fork Models?
    • Feature Selection
    • K-Fold Cross Validation

›

Building Explainable Models - Feature Inspection

  • Continuous
    • Plots
    • Rank Ordering
    • Normality assumptions
      • Boxcox
  • Categorical
    • Plots
    • Rank Ordering
    • Sub-categories predictive
      • t-test
      • ANOVA
  • Get a feel for the data
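For the "sub-categories predictive" check, a quick two-sample t-statistic is often enough as a first look. The sketch below computes Welch's t in plain Python; the response values for the two categories are made up.

```python
# Welch t-statistic: does a binary sub-category shift the mean response?
from statistics import mean, variance

def welch_t(a, b):
    na, nb = len(a), len(b)
    # variance() is the sample variance; denominator is the pooled std error
    return (mean(a) - mean(b)) / ((variance(a) / na + variance(b) / nb) ** 0.5)

group_a = [10.1, 9.8, 10.4, 10.0, 9.9]     # response where category == "A"
group_b = [12.0, 11.7, 12.3, 11.9, 12.1]   # response where category == "B"

t = welch_t(group_a, group_b)
print(t)   # |t| well above ~2 suggests the sub-category is predictive
```

For more than two sub-categories, the ANOVA mentioned above plays the same role.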

›

Building Explainable Models - Baseline Model


›

Building Explainable Models - Feature Engineering

  • Imagination
    • How to combine features
    • Ways to combine information
    • New data sources
    • ...
  • Unsupervised Learning
    • Try some
    • Inspect how they perform
  • Categoricals
    • Look-up tables
    • Sub-categories
      • Forking models
  • Brute Force
    • Inspect the plots, look for patterns
    • Come up with new features

›

Building Explainable Models - Feature Selection

  • Correlations
    • Continuous / Nominal / Ordinal
    • Reduce the number of features
  • Rankings
    • Ignore features with low rankings
      • Many ranking techniques
  • Stepwise regression
    • Forwards / Backwards
    • Use it to help get the numbers down a bit more
    • Not perfect, but may help

›

Building Explainable Models - K-Fold Cross Validation

  • Cross Validation Time
  • Imagine our final variables are: A, B, C, D


Choose # of Combinations Combinations Best Avg. MSE
${4 \choose 1}$ 4 {A}, {B}, {C}, {D} 5
${4 \choose 2}$ 6 {A, B}, {B, C}, {C, D}, {A, C}, {A, D}, {B, D} 4
${4 \choose 3}$ 4 {A, B, C}, {B, C, D}, {A, C, D}, {A, B, D} 3
${4 \choose 4}$ 1 {A, B, C, D} 4

  • For each ${n \choose k}$
    • Pick the variable combination that minimizes the average MSE across the folds
    • Not restricted to MSE
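The search above can be sketched directly: for each subset size, score every variable combination by K-fold cross-validated MSE and keep the best. The data, seed, and fold count are arbitrary stand-ins.

```python
# Exhaustive subset search scored by K-fold CV average MSE.
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, names = 60, ["A", "B", "C", "D"]
X = rng.normal(size=(n, 4))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.3, size=n)  # truth: A and C

def kfold_mse(cols, folds=5):
    cols = list(cols)
    idx = np.arange(n)
    fold_mses = []
    for f in range(folds):
        test = idx % folds == f                     # simple deterministic folds
        Xtr = np.column_stack([np.ones((~test).sum()), X[~test][:, cols]])
        Xte = np.column_stack([np.ones(test.sum()), X[test][:, cols]])
        beta, *_ = np.linalg.lstsq(Xtr, y[~test], rcond=None)
        fold_mses.append(np.mean((y[test] - Xte @ beta) ** 2))
    return np.mean(fold_mses)

for k in range(1, 5):
    best = min(itertools.combinations(range(4), k), key=kfold_mse)
    print(k, [names[i] for i in best], round(kfold_mse(best), 3))
```

With this scoring, the best pair should recover {A, C}; swapping in a different loss is just a change to `kfold_mse`.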

›

Building Explainable Models - Too many combinations

Choose # of Combinations
${20 \choose 1}$ 20
${20 \choose 2}$ 190
${20 \choose 3}$ 1,140
${20 \choose 4}$ 4,845
${20 \choose 5}$ 15,504
${20 \choose 6}$ 38,760
${20 \choose 7}$ 77,520
${20 \choose 8}$ 125,970
${20 \choose 9}$ 167,960
${20 \choose 10}$ 184,756
${20 \choose 11}$ 167,960
${20 \choose 12}$ 125,970
${20 \choose 13}$ 77,520
${20 \choose 14}$ 38,760
${20 \choose 15}$ 15,504
${20 \choose 16}$ 4,845
${20 \choose 17}$ 1,140
${20 \choose 18}$ 190
${20 \choose 19}$ 20
${20 \choose 20}$ 1
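The counts in the table come straight from the binomial coefficient, and summing them shows why exhaustive search stops scaling:

```python
# Number of variable combinations for n = 20 candidate features.
from math import comb

counts = [comb(20, k) for k in range(1, 21)]
print(max(counts))   # 184,756 models at k = 10 alone
print(sum(counts))   # 1,048,575 total combinations (2^20 - 1)
```

Over a million candidate models, each needing a full K-fold fit, motivates the shortcut on the next slides.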

Building Explainable Models - Noticing a pattern

  • y-axis: Best MSE among the combinations
  • x-axis: $k$ in ${n \choose k}$

Building Explainable Models - Polynomial Curve fitting

  • Run calculations on
    • $k = 1,2,3,4$
    • $k = n, n-1, n-2, n-3$
  • Fit a polynomial to the curve
    • Predict which $k$ is the minimum
    • Run that $k$
    • Repeat
  • If the predicted $k$ matches the calculated $k$
    • Check the neighboring $k$'s ($k-1$, $k+1$) to confirm
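A sketch of that shortcut: fit a polynomial to the (k, best MSE) points from the cheap runs at both ends of the range, then predict which $k$ minimizes the curve. The MSE values below are invented to mimic a typical U-shaped curve with $n = 20$.

```python
# Predict the minimizing k from a polynomial fit to the endpoints.
import numpy as np

k_run = np.array([1, 2, 3, 4, 17, 18, 19, 20])            # k's actually run
mse_run = np.array([9.0, 7.1, 5.8, 4.9, 5.1, 5.9, 7.0, 8.8])  # invented MSEs

coeffs = np.polyfit(k_run, mse_run, deg=2)                # quadratic in k
k_grid = np.arange(1, 21)
k_pred = int(k_grid[np.argmin(np.polyval(coeffs, k_grid))])
print(k_pred)   # run this k next; repeat until predicted == calculated
```

Each new calculated point gets added to `k_run`/`mse_run` and the fit is redone, so the prediction sharpens as the loop repeats.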

›

Building Explainable Models - Finalizing Model

  • Might be able to improve the model
    • Calculate the rankings on the model residuals against all the predictors not used
    • Plot everything
    • There may be a new variable that works well with the particular variable combination we have
  • Put floors and ceilings on all your continuous predictors
    • In production, a wildly small/large value won't disrupt the model
    • Cap at e.g. the 1%/99% or 5%/95% points of the training data
  • One final Step
    • Those predictors are the best set of variables
    • Train final model on the whole dataset with them (no hold-out)
    • This is the final model
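A sketch of the floor/ceiling step, assuming the caps are read off training-data percentiles (the 1%/99% choice and the toy column below are illustrative):

```python
# Learn floor/ceiling caps from training data, then clip production inputs.
def fit_caps(values, lo_pct=0.01, hi_pct=0.99):
    s = sorted(values)
    lo = s[int(lo_pct * (len(s) - 1))]
    hi = s[int(hi_pct * (len(s) - 1))]
    return lo, hi

def clip(value, lo, hi):
    # A wild production value gets pulled back inside the training range.
    return max(lo, min(hi, value))

train = list(range(100))            # toy training column: 0..99
lo, hi = fit_caps(train)
print(clip(1_000_000, lo, hi))      # capped at 98
print(clip(-50, lo, hi))            # capped at 0
```

The caps become part of the model artifact: they are learned once at training time and applied to every scoring request.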

›

In Summary

  • Don't be in a big rush to build models
  • Machine learning background
    • Everything is an optimization problem
    • Newton's Method
  • Optimization Problems
    • Predictive models not always the solution
  • Explainable Models
    • Regression based models
  • Cross Validation
    • Bias-Variance Tradeoff
    • K-Fold, Stratified, Grid Search
  • Overfitting
  • Building Explainable models
    • Feature Inspection, Engineering, Selection
    • Cross-Validation Combinatorics