BDA-602 - Machine Learning Engineering

Dr. Julien Pierret

Lecture 2

Python Libraries - Demo

  • Setup pip
  • Make a PR


In Summary

  • We will use pip-compile for dependency management
  • argparse will help us pass arguments to our programs
  • Pandas
    • Load small datasets
    • Prepare datasets for analysis
  • Numpy
    • Manipulate arrays
    • Build new features
  • scikit-learn
    • Build repeatable transformations
    • Train ML models
    • Build reusable pipelines
  • Plotly
    • Inspect candidate predictors
    • Visualize our data / results
  • We will avoid using Jupyter notebooks
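
As a small illustration of the argparse point above, here is a minimal sketch of a script that accepts command-line arguments. The flag names (--input, --n-estimators, --verbose) and the parse_args helper are hypothetical, chosen only for this example; they are not taken from the course material.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line arguments; argv=None makes argparse read sys.argv."""
    parser = argparse.ArgumentParser(description="Example ML script (illustrative only)")
    # Hypothetical flags, chosen only for this sketch
    parser.add_argument("--input", default="iris.data", help="Path to the input data file")
    parser.add_argument("--n-estimators", type=int, default=100, help="Number of trees to train")
    parser.add_argument("--verbose", action="store_true", help="Print extra progress output")
    return parser.parse_args(argv)


# Passing an explicit list lets us try the parser without a real command line
args = parse_args(["--input", "iris.data", "--n-estimators", "50"])
print(args.input, args.n_estimators, args.verbose)
```

In a real script you would call parse_args() with no arguments so that the values come from the command line. Note that argparse turns the dash in --n-estimators into an underscore, so the value is read as args.n_estimators.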

Homework - Tutorials 📓

Homework - References 📚

Homework - Cheatsheets

Homework - Assignment 1 - Due Sept 9

  • Classic Fisher's Iris dataset (source)
  • Load the above data into a Pandas DataFrame
  • Get some simple summary statistics (mean, min, max, quartiles) using numpy
  • Plot the different classes against one another (try 5 different plots)
    • Scatter, Violin, ...
    • Try to see visual differences between them
  • Analyze and build models - Use scikit-learn
    • Use the StandardScaler transformer
    • Fit a random forest classifier on the transformed data (try other classifiers too)
    • Wrap the steps into a pipeline
    • Don't worry about train/test split. (Our model will be cheating)
  • Special Notes
    • Code should just work. Make sure to lint it! No Jupyter Notebooks!
    • Work outside of your master branch and make a PR
    • Your buddy should review the PR; once it has been reviewed, send it to me
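
The summary-statistics and pipeline steps above could be sketched roughly as follows. This is one possible outline, not a reference solution. It assumes the copy of the Iris data bundled with scikit-learn (load_iris) rather than the source file linked in the assignment, and the step names "scaler" and "random_forest" and the random_state value are arbitrary choices made for this example. Plotting is left out.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: use scikit-learn's bundled Iris data instead of the assignment's source file
iris = load_iris(as_frame=True)
df = iris.frame  # pandas DataFrame: four feature columns plus a "target" column

X = df[iris.feature_names].to_numpy()
y = df["target"].to_numpy()

# Per-column summary statistics computed with numpy
summary = {
    "mean": np.mean(X, axis=0),
    "min": np.min(X, axis=0),
    "max": np.max(X, axis=0),
    "quartiles": np.percentile(X, [25, 50, 75], axis=0),  # one row per quartile
}
for name, values in summary.items():
    print(name, values)

# Scaling and the classifier wrapped into a single reusable pipeline
pipeline = Pipeline(
    [
        ("scaler", StandardScaler()),
        ("random_forest", RandomForestClassifier(random_state=1234)),
    ]
)
pipeline.fit(X, y)

# No train/test split here, so this score is optimistic (the model is "cheating")
predictions = pipeline.predict(X)
accuracy = pipeline.score(X, y)
print("Training accuracy:", accuracy)
```

Because the scaler lives inside the pipeline, swapping in another classifier only means replacing the second step.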