BDA-602 - Machine Learning Engineering

Dr. Julien Pierret

Lecture 4

Databases - Getting the data

  • Even if you can get access to the data
  • ... it usually won't be organized in a way that is easy to model with
  • You need to learn how to query the data
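Querying from Python follows a connect, execute, fetch pattern. This minimal sketch assumes a hypothetical `patients` table in an in-memory SQLite database. Python's built-in `sqlite3` module stands in for whatever driver your real database needs.

```python
import sqlite3

# Hypothetical example table; in practice the data already lives in a database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO patients (id, name, age) VALUES (?, ?, ?)",
    [(1, "Ana", 34), (2, "Ben", 61), (3, "Cho", 47)],
)

# Ask only for the rows and columns you need.
# "?" placeholders keep values out of the SQL string itself.
min_age = 40
rows = conn.execute(
    "SELECT name, age FROM patients WHERE age >= ? ORDER BY age",
    (min_age,),
).fetchall()

print(rows)  # [('Cho', 47), ('Ben', 61)]
```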

SQL - Entity Relationship Diagram

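An ERD is typically implemented as tables linked by primary and foreign keys. This minimal sketch assumes two hypothetical entities, `customers` and `orders`, in a one-to-many relationship, and creates them in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Entity 1: each customer appears once (primary key).
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    )
""")

# Entity 2: each order belongs to exactly one customer (foreign key).
# Together these form a one-to-many relationship: one customer, many orders.
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
        amount      REAL NOT NULL
    )
""")

conn.execute("INSERT INTO customers VALUES (1, 'Ana')")
conn.execute("INSERT INTO orders VALUES (10, 1, 25.0)")

# An order pointing at a customer that does not exist violates the relationship.
try:
    conn.execute("INSERT INTO orders VALUES (11, 99, 5.0)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

print(fk_enforced)  # True
```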

SQL - Functions
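SQL has two kinds of functions. Scalar functions produce one value per row, and aggregate functions collapse many rows into one. This minimal sketch shows both, assuming a hypothetical `orders` table in SQLite. Function names and availability differ between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, city TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "san diego", 19.99), (2, "san diego", 5.25), (3, "austin", 42.00)],
)

# Scalar functions: one output value per input row.
scalar = conn.execute(
    "SELECT order_id, UPPER(city), ROUND(amount) FROM orders ORDER BY order_id"
).fetchall()

# Aggregate functions: many input rows collapse into one output row.
agg = conn.execute(
    "SELECT COUNT(*), MAX(amount), ROUND(AVG(amount), 2) FROM orders"
).fetchone()

print(scalar)  # [(1, 'SAN DIEGO', 20.0), (2, 'SAN DIEGO', 5.0), (3, 'AUSTIN', 42.0)]
print(agg)     # (3, 42.0, 22.41)
```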



SQL - Other JOINs

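  • There are other JOINs you need to learn
    • INNER JOIN
      • Same as a plain JOIN: keeps only rows that have a match in both tables
    • CROSS JOIN
      • Every row of one table paired with every row of the other (Cartesian product)
    • LEFT JOIN / LEFT OUTER JOIN
      • Contains all rows from the LEFT table
      • Where no matching row is found in the RIGHT table, its columns are filled with NULLs
    • RIGHT JOIN
      • Same as LEFT JOIN with the roles of the two tables reversed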
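The join types above can be compared on two small tables. This sketch assumes hypothetical `customers` and `orders` tables in SQLite. Customer "Cho" has no orders, which makes the INNER and LEFT results differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben"), (3, "Cho")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 25.0), (11, 1, 5.0), (12, 2, 40.0)])

# INNER JOIN: only customers with at least one matching order.
inner = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

# LEFT JOIN: every customer. Cho has no orders, so o.amount is NULL (None in Python).
left = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()

# CROSS JOIN: every customer paired with every order (3 x 3 = 9 rows), no ON clause.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM customers CROSS JOIN orders"
).fetchone()[0]

print(inner)        # [('Ana', 25.0), ('Ana', 5.0), ('Ben', 40.0)]
print(left)         # [('Ana', 25.0), ('Ana', 5.0), ('Ben', 40.0), ('Cho', None)]
print(cross_count)  # 9
```

`RIGHT JOIN` is left out of the sketch because SQLite only supports it from version 3.39.0 onward. `orders LEFT JOIN customers` returns the same rows as `customers RIGHT JOIN orders`.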

SQL - What Normally Happens

  • SQL as Data Bank
    • Use SQL queries to generate new predictors
    • Pull data from SQL into Python, perform operations, then store the results back in SQL
  • Model building
    • Gather all the needed variables from SQL
      • Predictors
      • Response(s)
      • Each row is one observation to train on
    • Combine it with data from other sources
      • Images
      • Unstructured data that will be processed
    • Do some data preparation
    • Build a model
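The workflow above can be sketched end to end. The sketch assumes hypothetical `customers` (with a `churned` response column) and `orders` tables in SQLite. First, a SQL query builds predictors. Next, Python computes one more predictor and writes it back. Finally, a query gathers one row per observation. Combining with images or other sources is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, churned INTEGER)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, 0), (2, 1), (3, 1)])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 5.0), (12, 2, 40.0)],
)

# 1. Use a SQL query to generate new predictors (one row per customer).
conn.execute("""
    CREATE TABLE customer_features AS
    SELECT c.customer_id,
           COUNT(o.order_id)            AS n_orders,
           COALESCE(SUM(o.amount), 0.0) AS total_spent
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
""")

# 2. Pull data into Python, compute a new predictor, and store it back in SQL.
rows = conn.execute(
    "SELECT customer_id, n_orders, total_spent FROM customer_features"
).fetchall()
updates = [
    (total / n if n else 0.0, cid)  # average order value; 0.0 when there are no orders
    for cid, n, total in rows
]
conn.execute("ALTER TABLE customer_features ADD COLUMN avg_order REAL")
conn.executemany("UPDATE customer_features SET avg_order = ? WHERE customer_id = ?", updates)
conn.commit()

# 3. Gather predictors and response: one row per observation to train on.
dataset = conn.execute("""
    SELECT f.n_orders, f.total_spent, f.avg_order, c.churned
    FROM customer_features AS f
    JOIN customers AS c ON c.customer_id = f.customer_id
    ORDER BY f.customer_id
""").fetchall()
X = [row[:3] for row in dataset]  # predictors
y = [row[3] for row in dataset]   # response

print(X)  # [(2, 30.0, 15.0), (1, 40.0, 40.0), (0, 0.0, 0.0)]
print(y)  # [0, 1, 1]
```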

In Summary

  • Data is King
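    • Get it wrong and everything else fails
  • ML models require structured data
  • As file formats go, Parquet is the best one for tabular data (columnar, compressed, stores column types)
  • SQL > NoSQL for structured data you intend to model with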
  • SQL
    • Learn how to query data from a SQL database
      • WHERE
      • GROUP BY
      • JOIN
      • CASE
      • functions
      • ...
    • Know how to get data from SQL into Python
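Several of the constructs above can be combined in a single query. This minimal sketch uses `WHERE`, `GROUP BY`, `CASE`, and aggregate functions, then brings the result into Python with column names attached. It assumes a hypothetical `orders` table in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, city TEXT, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "austin", 120.0, "shipped"),
        (2, "austin", 15.0, "shipped"),
        (3, "boston", 60.0, "shipped"),
        (4, "boston", 80.0, "cancelled"),
    ],
)

query = """
    SELECT city,
           COUNT(*)                                       AS n_orders,
           SUM(CASE WHEN amount >= 100 THEN 1 ELSE 0 END) AS n_large,
           AVG(amount)                                    AS avg_amount
    FROM orders
    WHERE status = 'shipped'  -- filter rows before grouping
    GROUP BY city             -- one output row per city
    ORDER BY city
"""
cur = conn.execute(query)
rows = cur.fetchall()

# Attach column names so each row becomes a dict in Python.
columns = [desc[0] for desc in cur.description]
records = [dict(zip(columns, row)) for row in rows]

print(rows)        # [('austin', 2, 1, 67.5), ('boston', 1, 0, 60.0)]
print(records[0])  # {'city': 'austin', 'n_orders': 2, 'n_large': 1, 'avg_amount': 67.5}
```

If pandas is installed, `pandas.read_sql(query, conn)` returns the same result as a DataFrame. The plain `sqlite3` version avoids the extra dependency.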

Homework - References 📚 and Tutorials 📓