SDSU Big Data Hackathon

DATA > Models



Dr. Julien Pierret

Saturday, October 8, 2022

DATA > Models

  • Two main points
    • Don't limit yourself to a single dataset
      • Would you limit an artist's choice of colors?
    • Don't downplay feature engineering
      • Would you stop an artist from mixing colors?
  • A better model can raise accuracy by centimeters
  • Better predictors can raise accuracy by meters

Different Data

  • Bigger model lift comes from new features

[Figure panels: Color, RGB (3 channels) | Grayscale (1 channel)]
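To make the channel counts concrete, here is a minimal Python/NumPy sketch, assuming the grayscale image is produced by a weighted sum of the RGB channels using the ITU-R BT.601 luma weights (one common convention; the slide does not say how its grayscale image was made). Each pixel goes from three numbers to one, which is the information a model loses when it only sees the grayscale version.

```python
import numpy as np

# ITU-R BT.601 luma weights for (R, G, B); an assumed convention, BT.709 uses different ones
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])

def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Collapse an (H, W, 3) RGB image into an (H, W) single-channel image."""
    if rgb.ndim != 3 or rgb.shape[-1] != 3:
        raise ValueError("expected an array of shape (H, W, 3)")
    # Weighted sum over the channel axis: 3 values per pixel become 1
    return rgb.astype(float) @ LUMA_WEIGHTS

# Tiny 2x2 image: red, green, blue and white pixels
rgb = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)
gray = to_grayscale(rgb)
print(rgb.shape, gray.shape)  # (2, 2, 3) (2, 2)
```

Red, green and blue all survive only as different shades of gray, so anything that depended on hue is gone.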


Pansharpening

  • Panchromatic sharpening
    • A long-established technique
  • Imaging satellites carry multiple imaging sensors
    • Multispectral sensors
      • Separate light into different spectral bands (e.g., RGB)
      • Have high spectral resolution
    • Panchromatic sensors
      • See the whole spectrum in a single band (B&W)
      • Have high spatial resolution
  • A single sensor gives you high spatial or high spectral resolution imagery
    • But not both!
    • Pansharpening fuses the two to get both

Pansharpening (cont.)

[Figure: Venice, Italy. Panels: Multispectral | Panchromatic | Pansharpened]

  • Two different datasets, fused into one sharper color image!
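The slides do not say which pansharpening method produced the Venice image. As one illustration of fusing the two datasets, here is a minimal NumPy sketch of a Brovey-style ratio transform, assuming the multispectral and panchromatic images are already co-registered and the pan grid is an integer multiple of the multispectral grid. It normalises by the mean of the bands rather than their sum (so fused values stay on the pan scale) and uses nearest-neighbour upsampling for simplicity; real pipelines typically use better resampling and more sophisticated fusion. The function names are illustrative, not from any library.

```python
import numpy as np

def upsample_nearest(img: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of the two spatial axes: each pixel becomes factor x factor."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def brovey_pansharpen(ms: np.ndarray, pan: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Brovey-style fusion (mean-normalised variant).

    ms:  (h, w, bands) multispectral image (high spectral, low spatial resolution)
    pan: (H, W) panchromatic image with H = h * factor and W = w * factor
    """
    factor = pan.shape[0] // ms.shape[0]
    if pan.shape != (ms.shape[0] * factor, ms.shape[1] * factor):
        raise ValueError("pan must be an integer multiple of the multispectral size")
    # Bring every spectral band onto the sharp pan grid
    ms_up = upsample_nearest(ms, factor).astype(float)
    # Per-pixel intensity of the blurry multispectral image
    intensity = ms_up.mean(axis=-1, keepdims=True)
    # Keep the band ratios (color), but let brightness follow the pan detail
    return ms_up * (pan[..., None] / (intensity + eps))

# Toy example: one multispectral pixel (3 bands) under a 2x2 pan patch
ms = np.array([[[100.0, 50.0, 30.0]]])        # shape (1, 1, 3)
pan = np.array([[60.0, 120.0], [30.0, 0.0]])  # shape (2, 2)
fused = brovey_pansharpen(ms, pan)
print(fused.shape)  # (2, 2, 3)
```

Each output pixel keeps the color of its multispectral pixel while its brightness tracks the sharper panchromatic band: the fused bands average back to the pan value at every pixel.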

New Features

  • Feature engineering is more important than modeling!
    • A model is only as good as what you put into it
      • New features are new ways to look at the data
    • One of my favorite parts of the job
      • Let your imagination run free
      • When I interview candidates, I dig heavily into this

Combining Features

  • Loan-to-Value (LTV)
    • Loan / Appraised Value
    • Great indicator of risk
  • Value-to-AVM: appraised value vs. an Automated Valuation Model (AVM)
    • Appraised Value / AVM Value
    • High ratio: did the appraiser commit fraud?
  • Unsupervised learning - Looking for fraud
    • Set the number of clusters VERY high (k = 1000)
      • Fraud is a rare event
      • Most clusters will be garbage
      • A few will be full of fraud
    • Grouped the bad (fraud-heavy) clusters together
    • Membership in a bad cluster became an extra boolean feature fed into the final model
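The sketch below puts this slide into code: it computes the two ratio features, over-clusters them with k-means, and turns membership in a "bad" cluster into a boolean feature. Several details are assumptions, not the talk's: the variable names and the synthetic data are made up, k is scaled down from 1000 to 20 for 210 toy rows, and a cluster counts as bad when at least half its members are confirmed fraud cases (`min_rate`, a hypothetical rule; the slide only says bad clusters were grouped together). K-means is written out by hand with farthest-first seeding to keep the example dependency-free.

```python
import numpy as np

# Combined features (hypothetical inputs, one value per loan)
def loan_to_value(loan_amount, appraised_value):
    """LTV = loan / appraised value; higher means less equity, more risk."""
    return loan_amount / appraised_value

def value_to_avm(appraised_value, avm_value):
    """Appraised value / AVM estimate; well above 1 means the appraisal beats the model."""
    return appraised_value / avm_value

# Over-clustered k-means
def farthest_first_init(X, k):
    """Deterministic seeding: start at X[0], then keep taking the point farthest from all chosen."""
    idx = [0]
    d2 = ((X - X[0]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        nxt = int(d2.argmax())
        idx.append(nxt)
        d2 = np.minimum(d2, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[idx].astype(float)

def assign(X, centroids):
    """Index of the nearest centroid for every row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=50):
    """Lloyd's k-means; returns the fitted centroids."""
    centroids = farthest_first_init(X, k)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # empty clusters keep their previous centroid
                centroids[j] = members.mean(axis=0)
    return centroids

def bad_cluster_flags(labels, known_fraud, k, min_rate=0.5):
    """True for clusters where at least min_rate of members are known fraud (hypothetical rule)."""
    flags = np.zeros(k, dtype=bool)
    for j in range(k):
        members = labels == j
        if members.any():
            flags[j] = known_fraud[members].mean() >= min_rate
    return flags

# Toy data (made up): 200 ordinary loans plus 10 with appraisals about double the AVM
rng = np.random.default_rng(0)
n_ok, n_bad = 200, 10
avm = rng.uniform(200_000, 800_000, size=n_ok + n_bad)
ratio = np.r_[rng.normal(1.0, 0.05, n_ok), rng.normal(2.0, 0.02, n_bad)]
ltv = np.r_[rng.normal(0.80, 0.05, n_ok), rng.normal(0.95, 0.01, n_bad)]
appraised = avm * ratio
loan = appraised * ltv
known_fraud = np.r_[np.zeros(n_ok, dtype=bool), np.ones(n_bad, dtype=bool)]

# Engineered ratios first, then the cluster-based flag on top of them
X = np.column_stack([loan_to_value(loan, appraised), value_to_avm(appraised, avm)])
k = 20  # the talk used k = 1000 on real data; scaled down for this toy set
centroids = kmeans(X, k)
labels = assign(X, centroids)
in_bad_cluster = bad_cluster_flags(labels, known_fraud, k)[labels]

# Final model input: two ratios plus the boolean cluster flag
features = np.column_stack([X, in_bad_cluster.astype(float)])
print(features.shape)  # (210, 3)
```

At real scale (k = 1000 on many loans), a library implementation such as scikit-learn's `KMeans` or `MiniBatchKMeans` is the practical choice, and features on different scales should be standardised before clustering.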

Using them both together



[Figure panels: Without new features | With new features]


Thank you!