BDA-602 - Machine Learning Engineering

Dr. Julien Pierret

Lecture 14


  • Tuesday, May 9th - Presentations
    • "Officially" only have 2 hours
    • ...
  • Saturday, May 13th - Just before midnight PST
    • Finals officially end Friday
      • I figured an extra Saturday can help you out
      • I will stay up until midnight to mass copy everyone's repos
    • Final PR due with all code
    • Wiki finished and available on your GitHub project page
    • Nothing is stopping you from making commits early!

Presentation Times

  • NULL of you are in the class
  • ~5 minutes per presentation (rapid mode!)
    • Tuesday, May 9th - Finals Day
      • All have to go in one day 😞

What's being graded

  • In order of importance
    • Code
    • GitHub Wiki Final Report
    • Presentation
  • Intentionally not giving the weights for each

What I want to see - Prediction and Data

  • Discuss what you wanted to predict. What's in the source dataset?
    • Did you have to construct this predictor?
    • Why do you want to predict this?
  • Data conversions
    • Discuss data conversions: Cleaned / Organized / ETL (Extract, Transform, and Load) Procedure
    • Mistakes in the data?
      • What was wrong?
      • How did you fix them?

What I want to see - Feature Engineering

  • Generate a ton of features to inspect (I'm not giving a number out intentionally!)
  • You've built tools to analyze features: use them!
    • Variable/Feature Importance
      • p-values & t-scores
      • Difference with mean of response
        • Plots
        • Rankings
      • Random Forest
    • Brute force variable combinations
      • Try to see if other variable combinations exist that you didn't think of
        • Plots
        • Rankings
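As a reminder of what a "difference with mean of response" ranking can look like, here is a minimal sketch using only the standard library. It assumes equal-width bins and skips empty bins when averaging; the function name, the toy `features` dictionary, and the bin count are all made up for illustration, so swap in your own tooling and data.

```python
from statistics import mean


def mean_of_response_diff(predictor, response, n_bins=10):
    """Return (unweighted, weighted) mean squared difference between
    each bin's response mean and the overall response mean."""
    lo, hi = min(predictor), max(predictor)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant predictor
    bins = [[] for _ in range(n_bins)]
    for x, y in zip(predictor, response):
        idx = min(int((x - lo) / width), n_bins - 1)  # clamp the max into the last bin
        bins[idx].append(y)

    overall = mean(response)
    filled = [b for b in bins if b]  # assumption: empty bins are ignored
    unweighted = sum((mean(b) - overall) ** 2 for b in filled) / len(filled)
    weighted = sum(
        len(b) / len(response) * (mean(b) - overall) ** 2 for b in filled
    )
    return unweighted, weighted


# Hypothetical toy data: "signal" tracks the response, "noise" does not.
features = {
    "signal": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "noise": [3, 8, 1, 6, 0, 9, 4, 7, 2, 5],
}
response = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Rank features by the weighted score, largest first.
ranking = sorted(
    features,
    key=lambda name: mean_of_response_diff(features[name], response, n_bins=2)[1],
    reverse=True,
)
print(ranking)  # ['signal', 'noise']
```

The same scoring idea extends to the brute-force combinations: bin two predictors at once and score the resulting grid of cells instead of a single row of bins.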

What I want to see - Feature Engineering

  • Show off the predictors you came up with
    • Were they predictive?
    • Why or why not? Discuss, even if the feature was a failure
    • Plots - Show me why they performed the way they did!
      • Continuous Response / Continuous Predictor
      • Continuous Response / Categorical Predictor
      • *Categorical Response / Continuous Predictor
      • *Categorical Response / Categorical Predictor

What I want to see - Model Building

  • Build a few models (not giving a number)
    • Modeling examples (just suggestions)
      • Random Forest = Easy
        • It's harder to not do it
        • You can get the variable importance out of it while you're at it
      • Logistic regression
        • I gave you a great formula for a combinatorics approach to building a regression model
      • Build whatever kind of model you want!
  • Missing data?
    • How did you handle it?
  • Anything interesting?
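On the missing data question, one common option (not the only one, and not necessarily the right one for your dataset) is median imputation plus a flag column, so the model can still see that the value was missing. A minimal sketch, assuming missing values are stored as `None`; the function name and the `ages` column are hypothetical.

```python
from statistics import median


def impute_median(values):
    """Replace None with the median of the observed values.

    Returns the filled column and a 0/1 flag column marking imputed rows.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    fill = median(observed)
    filled = [fill if v is None else v for v in values]
    was_missing = [int(v is None) for v in values]
    return filled, was_missing


ages = [31, None, 45, 28, None, 39]  # hypothetical column
filled, was_missing = impute_median(ages)
print(filled)       # [31, 35.0, 45, 28, 35.0, 39]
print(was_missing)  # [0, 1, 0, 0, 1, 0]
```

Whatever you choose, compute the fill value on the training rows only and reuse it on the test rows, and say in the report what you did and why.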

What I want to see - Model Evaluations

  • Compare the models you built against one another
    • We went over a ton of evaluation techniques
    • Don't forget to train/test split
  • Which one is the best one?
    • Why?
  • Show off the performance metrics on the best model
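The comparison loop can be as simple as the sketch below: split once, fit every model on the same training rows, score every model on the same held-out rows. The two "models" here (a majority-class baseline and a one-variable threshold rule) and the toy data are stand-ins made up for illustration, and accuracy is just one of the metrics you could plug in.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, rows):
    """Fraction of (x, y) rows where predict(x) == y."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


def fit_majority(train):
    """Baseline: always predict the most common training label."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def fit_threshold(train):
    """Pick the cut point on x that maximizes training accuracy."""
    candidates = sorted({x for x, _ in train})
    best = max(candidates, key=lambda c: accuracy(lambda x: int(x >= c), train))
    return lambda x: int(x >= best)


# Hypothetical toy data: label is 1 when x is at least 50.
rows = [(x, int(x >= 50)) for x in range(100)]
train, test = train_test_split(rows)

models = {"majority": fit_majority, "threshold": fit_threshold}
# Every model is fit on train only and scored on the same test rows.
scores = {name: accuracy(fit(train), test) for name, fit in models.items()}
best_model = max(scores, key=scores.get)
print(scores, best_model)
```

Always including a dumb baseline makes the "why is this one the best?" discussion much easier, because it shows how much of the score comes from the model rather than from class balance.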


  • docker-compose
    • Everyone should have a working docker-compose script
    • The goal is I should be able to run
      • docker-compose up
    • ... and everything should magically work
      • Data is processed
      • Results (plots/figures/reports...) generated
      • Everything you reference in your Wiki report should come from this output in some way or another
  • Make sure you TEST THIS OUT!
    • You must make sure this works
    • It's fine to have your code buddy test it out
    • You may be turning in your own code, but it's fine to collaborate with your code buddy
    • If code buddy can't get your script working - worry
      • Don't ask code buddy to test it 15 minutes before it's due
      • Get a head start on this
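One way to make the "test this out" step less painful for you and your code buddy is a tiny check that every artifact the Wiki references actually got produced. The sketch below is only an illustration: the function name and the file names are hypothetical, and the temporary directory stands in for whatever output folder your docker-compose setup writes to.

```python
import tempfile
from pathlib import Path


def missing_outputs(output_dir, expected):
    """Return the expected relative paths that are absent or empty."""
    root = Path(output_dir)
    missing = []
    for rel in expected:
        path = root / rel
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


# Demo against a throwaway directory; in practice, point this at the
# folder your containers write their results to.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "report.html").write_text("<html></html>")
    result = missing_outputs(tmp, ["report.html", "plots/roc.png"])

print(result)  # ['plots/roc.png']
```

Run it after a clean `docker-compose up` on a fresh clone of your repo; an empty list means everything the report needs was generated.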


  • You will be presenting slides for your presentation
    • Don't need to be fancy
    • Google Slides / PowerPoint / whatever you want
    • I don't need a copy of it
      • Your Wiki will have the same content (and hopefully more!)
  • I know presentation is just around the corner
    • Your written report (wiki) and code can reflect more work than you presented
      • I won't hold it against you
      • It'll probably make me happy
  • I would highly suggest putting links in your wiki to techniques you used
    • If you used a cool modeling technique I didn't cover
      • Put a hyperlink to a page describing it


  • The final report should be of sufficient length to cover your modeling process in enough depth that someone without access to your code could recreate your work
    • If you come up with a novel idea: Explain it in depth
    • Make sure the report has graphs backing you up
  • This isn't an English class
    • I couldn't care less how strong your command of the English language is
    • As long as I can understand what you're doing
    • I don't care about spelling errors
      • My slides are probably riddled with them

AMA - Ask Me Anything