I will stay up until midnight to mass copy everyone's repos
Final PR due with all code
Wiki finished and available on your Github project page
Nothing is stopping you from making commits early!
Presentation Times
NULL of you are in the class
~5 minutes per presentation (rapid mode!)
Tuesday, May 9th - Finals Day
All have to go in one day 😞
What's being graded
In order of Importance
Code
Github Wiki Final Report
Presentation
Intentionally not giving the weights for each
What I want to see - Prediction and Data
Discuss what you wanted to predict. What's in the source dataset?
Did you have to construct this predictor?
Why do you want to predict this?
Data conversions
Discuss data conversions: Cleaned / Organized / ETL (Extract, Transform, and Load) procedure (sketch below)
Mistakes in the data?
What was wrong?
How did you fix them?
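A minimal cleaning sketch, purely illustrative: the file names and column names below are placeholders for whatever your source dataset actually contains, and pandas is just one way to document the ETL steps.

```python
import pandas as pd

# Extract: pull in the raw source data (hypothetical file and columns).
raw = pd.read_csv("data/raw_games.csv")

# Transform: keep what you need and fix the mistakes you found while inspecting.
df = raw[["game_id", "home_score", "away_score", "temperature"]].copy()
df["temperature"] = pd.to_numeric(df["temperature"], errors="coerce")  # bad strings -> NaN
df = df.drop_duplicates(subset="game_id")                              # repeated rows
df = df[df["home_score"] >= 0]                                         # impossible values

# Load: write the cleaned table where the rest of the pipeline reads it.
df.to_parquet("data/clean_games.parquet", index=False)
```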
What I want to see - Feature Engineering
Generate a ton of features to inspect (I'm not giving a number out intentionally!)
You've built tools to analyze features: use them! (see the sketch after this list)
Variable/Feature Importance
p-values & t-scores
Difference with mean of response
Plots
Rankings
Random Forest
Brute force variable combinations
Try to see if other variable combinations exist that you didn't think of
Plots
Rankings
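A minimal sketch of a few of the checks above, assuming (hypothetically) a cleaned table like the one in the earlier sketch, a binary 0/1 response column, and all-numeric candidate features; it computes a t-score/p-value and mean difference per feature, then adds a random forest importance ranking.

```python
import pandas as pd
from scipy import stats
from sklearn.ensemble import RandomForestClassifier

df = pd.read_parquet("data/clean_games.parquet")   # hypothetical cleaned table
response = "home_team_wins"                        # hypothetical binary response
features = [c for c in df.columns if c != response]

rows = []
for col in features:
    ones = df.loc[df[response] == 1, col]
    zeros = df.loc[df[response] == 0, col]
    t, p = stats.ttest_ind(ones, zeros, equal_var=False)
    rows.append({"feature": col, "t_score": t, "p_value": p,
                 "mean_diff": ones.mean() - zeros.mean()})
report = pd.DataFrame(rows)

# Random forest importance gives a second, independent ranking.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(df[features], df[response])
report["rf_importance"] = report["feature"].map(
    pd.Series(rf.feature_importances_, index=features))

print(report.sort_values("p_value"))
```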
What I want to see - Feature Engineering
Show off the predictors you came up with
Were they predictive?
Why or why not? Discuss, even if the feature was a failure
Plots - Show me why they performed the way they did! (sketch after this list)
Continuous Response / Continuous Predictor
Continuous Response / Categorical Predictor
*Categorical Response / Continuous Predictor
*Categorical Response / Categorical Predictor
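One possible plot for each pairing, as a sketch only: df and the column names are placeholders carried over from the earlier sketches, and seaborn/matplotlib is just one plotting option.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Continuous response / continuous predictor: scatter plot with a trend line.
sns.regplot(data=df, x="temperature", y="total_points", ax=axes[0, 0])

# Continuous response / categorical predictor: response distribution per category.
sns.violinplot(data=df, x="stadium_type", y="total_points", ax=axes[0, 1])

# Categorical response / continuous predictor: predictor distribution per outcome.
sns.violinplot(data=df, x="home_team_wins", y="temperature", ax=axes[1, 0])

# Categorical response / categorical predictor: heatmap of the crosstab counts.
sns.heatmap(pd.crosstab(df["home_team_wins"], df["stadium_type"]),
            annot=True, fmt="d", ax=axes[1, 1])

fig.tight_layout()
fig.savefig("output/predictor_plots.png")
```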
What I want to see - Model Building
Build a few models (not giving a number)
Modeling examples (just suggestions)
Random Forest = Easy
It's harder to not do it
You can get the variable importance out of it while you're at it
Logistic regression
I gave you a great formula for a combinatorics approach to building a regression model (sketch below)
Build whatever kind of model you want!
Missing data?
How did you handle it?
Anything interesting?
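A minimal modeling sketch under the same placeholder names (df, features, response) as the earlier sketches: median imputation for missing values (one simple choice, not a requirement), a random forest that yields variable importance as a side effect, and logistic regressions fit over brute-forced feature subsets. The subset loop is just one way to enumerate combinations, not necessarily the formula from class.

```python
from itertools import combinations
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Missing data: record what was missing, then impute (median is one simple choice).
print(df[features].isna().sum())
df[features] = df[features].fillna(df[features].median())

X, y = df[features], df[response]

# Random forest: easy to fit, and variable importance comes along for free.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(sorted(zip(rf.feature_importances_, features), reverse=True))

# Logistic regression over brute-forced feature subsets (one combinatorics option).
logit_models = {
    subset: LogisticRegression(max_iter=1000).fit(X[list(subset)], y)
    for subset in combinations(features, 3)
}
```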
What I want to see - Model Evaluations
Compare the models you built against one another (sketch after this list)
We went over a ton of evaluation techniques
Don't forget to train/test split
Which one is the best one?
Why?
Show off the performance metrics on the best model
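A minimal comparison sketch, reusing the placeholder X and y from the sketches above; the metrics shown (accuracy and ROC AUC) are just two of the techniques covered, not the required set.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

# Hold out a test set so the comparison isn't done on training data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

candidates = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "logistic": LogisticRegression(max_iter=1000),
}

results = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    results[name] = {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        "roc_auc": roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]),
    }
print(results)  # report the winner's metrics, and say why it won
```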
Code
docker-compose
Everyone should have a working docker-compose script (example below)
The goal is that I should be able to run
docker-compose up
... and everything should magically work
Data is processed
Results (plots/figures/reports...) generated
Everything you reference in your Wiki report should come from this output in some way or another
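A minimal docker-compose.yml sketch, if it helps to see the shape of it; the service name, build context, script name, and output path are all placeholders, not a required layout.

```yaml
# docker-compose.yml -- hypothetical single-service layout
services:
  pipeline:
    build: .                          # Dockerfile that installs your dependencies
    volumes:
      - ./output:/app/output          # plots/figures/reports land here on the host
    command: python run_pipeline.py   # cleans data, fits models, writes results
```

With something like this, running docker-compose up builds the image, runs the pipeline, and leaves everything your Wiki references in ./output.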
Make sure you TEST THIS OUT!
You must make sure this works
It's fine to have your code buddy test it out
You may be turning in your own code, but it's fine to collaborate with your code buddy
If your code buddy can't get your script working, worry
Don't ask your code buddy to test it 15 minutes before it's due
Get a head start on this
Extras
You will be presenting slides for your presentation
Don't need to be fancy
Google / Powerpoint / whatever you want
I don't need a copy of it
Your Wiki will have the same content (and hopefully more!)
I know the presentation is just around the corner
Your written report (wiki) and code can reflect more work than you presented
I won't hold it against you
It'll probably make me happy
I would highly suggest putting links in your wiki to techniques you used
If you used a cool modeling technique I didn't cover
Put a hyperlink to a page describing it
Extras
The final report should be of sufficient length to cover your modeling process in enough depth that someone without access to your code could recreate your work
If you come up with a novel idea: Explain it in depth
Make sure the report has graphs backing you up
This isn't an English class
I couldn't care less how strong your command of the English language is