BDA-602 - Machine Learning Engineering
Dr. Julien Pierret
Lecture 6
Spark - Parquet Files
In Summary
Hadoop: slow (intermediate results written to disk), hard to use
  MapReduce
Spark: fast (in-memory processing), easier to use
  Partitions
  Lazy execution
  Resilient Distributed Dataset (RDD)
  DataFrames
    SQL
    User Defined Functions (UDFs)
  Pipelines
    Transformers
    Estimators
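
As a refresher on the MapReduce, partition, and lazy-execution items above, here is a minimal pure-Python sketch of a word count. It is not Hadoop or Spark code: the function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample partitions are illustrative assumptions, and everything runs in a single process.

```python
from collections import defaultdict
from itertools import chain


def map_phase(partition):
    """Map: emit a (word, 1) pair for every word in one partition."""
    for line in partition:
        for word in line.lower().split():
            yield (word, 1)


def shuffle(pairs):
    """Shuffle: group all values by key, between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical input, already split into two partitions
partitions = [
    ["spark is fast", "hadoop is slow"],
    ["spark is easier to use"],
]

# The map step is built from generators, so it is lazy:
# no line is processed until shuffle() iterates over `mapped`.
mapped = chain.from_iterable(map_phase(p) for p in partitions)

word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

In PySpark the same idea is usually written as an RDD chain of `flatMap`, `map`, and `reduceByKey`, where the transformations stay lazy until an action such as `collect()` triggers the work.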