[1, 0, 0]
[0, 1, 0]
[0, 0, 1]
[1, 0, 1]
[0, 0]
[1, 0]
[0, 1]
Nationality | C1 | C2 | C3 |
---|---|---|---|
French | 0 | 0 | 1 |
Italian | 1 | 0 | 0 |
German | 0 | 1 | 0 |
Other | −1 | −1 | −1 |
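
The effect-coding table above can be reproduced in a few lines of Python. This is a minimal sketch rather than a library API: the helper name `effect_code` is hypothetical, and treating `Other` as the reference level (the row of −1s) is an assumption read off the table's layout.

```python
def effect_code(levels, reference):
    """Map each level to its effect-coded row.

    Non-reference levels get a 1 in their own column and 0 elsewhere;
    the reference level gets -1 in every column.
    """
    columns = [lvl for lvl in levels if lvl != reference]
    codes = {}
    for lvl in levels:
        if lvl == reference:
            codes[lvl] = [-1] * len(columns)
        else:
            codes[lvl] = [1 if lvl == col else 0 for col in columns]
    return codes


# Column order C1, C2, C3 follows the table: Italian, German, French.
codes = effect_code(["Italian", "German", "French", "Other"], reference="Other")
for level, row in codes.items():
    print(level, row)
```

Each column sums to zero across the four levels, which is what distinguishes effect coding from plain dummy coding.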
Nationality | C1 | C2 |
---|---|---|
French | +0.25 | +0.50 |
Italian | +0.25 | −0.50 |
German | −0.50 | 0 |
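
Contrast coefficients such as those in the table above are expected to sum to zero within each column, and two contrasts are orthogonal when their dot product is zero. The short check below is a sketch that verifies both properties for the tabulated values; the variable names are illustrative only.

```python
# Contrast coefficients copied from the table (columns C1 and C2).
contrasts = {
    "French": [0.25, 0.50],
    "Italian": [0.25, -0.50],
    "German": [-0.50, 0.0],
}

c1 = [row[0] for row in contrasts.values()]
c2 = [row[1] for row in contrasts.values()]

sum_c1 = sum(c1)                           # coefficients of C1 should sum to 0
sum_c2 = sum(c2)                           # coefficients of C2 should sum to 0
dot = sum(a * b for a, b in zip(c1, c2))   # 0 means C1 and C2 are orthogonal

print(sum_c1, sum_c2, dot)
```

Here C1 compares French and Italian (taken together) against German, while C2 compares French against Italian.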
Color | Binary 1 | Binary 2 | Binary 3 |
---|---|---|---|
| | 0 | 0 | 0 |
| | 0 | 0 | 1 |
| | 0 | 1 | 0 |
| | 0 | 1 | 1 |
| | 1 | 0 | 0 |
| | 1 | 0 | 1 |
| | 1 | 1 | 1 |
Color | Ranking |
---|---|
| | 1 |
| | 2 |
| | 3 |
| | 4 |
| | 5 |
| | 6 |
word2vec
Bonus - Proximity to an international airport
statsmodels
($i$) | LowerBin | UpperBin | BinCenters | BinCount | BinMeans ($\mu_{i}$) | PopulationMean ($\mu_{pop}$) | MeanSquaredDiff |
---|---|---|---|---|---|---|---|
0 | -3.563517 | -2.930781 | -3.247149 | 1 | -3.616800 | 0.014081 | 13.183300 |
1 | -2.930781 | -2.298045 | -2.614413 | 2 | -2.641231 | 0.014081 | 7.050685 |
2 | -2.298045 | -1.665308 | -1.981676 | 22 | -1.900161 | 0.014081 | 3.664322 |
3 | -1.665308 | -1.032572 | -1.348940 | 43 | -1.310229 | 0.014081 | 1.753797 |
4 | -1.032572 | -0.399836 | -0.716204 | 94 | -0.671962 | 0.014081 | 0.470656 |
5 | -0.399836 | 0.232900 | -0.083468 | 129 | -0.070665 | 0.014081 | 0.007182 |
6 | 0.232900 | 0.865636 | 0.549268 | 120 | 0.487407 | 0.014081 | 0.224037 |
7 | 0.865636 | 1.498372 | 1.182004 | 59 | 1.158775 | 0.014081 | 1.310324 |
8 | 1.498372 | 2.131108 | 1.814740 | 25 | 1.875793 | 0.014081 | 3.465971 |
9 | 2.131108 | 2.763844 | 2.447476 | 5 | 2.522430 | 0.014081 | 6.291814 |
| | | | | | | | 37.42208/10 = 3.74 |
NULL
s
($i$) | LowerBin | UpperBin | BinCenters | BinCount | BinMeans ($\mu_{i}$) | PopulationMean ($\mu_{pop}$) | MeanSquaredDiff | PopulationProportion ($w_{i}$) | MeanSquaredDiffWeighted |
---|---|---|---|---|---|---|---|---|---|
0 | -3.563517 | -2.930781 | -3.247149 | 1 | -3.616800 | 0.014081 | 13.183300 | 0.002 | 0.026367 |
1 | -2.930781 | -2.298045 | -2.614413 | 2 | -2.641231 | 0.014081 | 7.050685 | 0.004 | 0.028203 |
2 | -2.298045 | -1.665308 | -1.981676 | 22 | -1.900161 | 0.014081 | 3.664322 | 0.044 | 0.161230 |
3 | -1.665308 | -1.032572 | -1.348940 | 43 | -1.310229 | 0.014081 | 1.753797 | 0.086 | 0.150827 |
4 | -1.032572 | -0.399836 | -0.716204 | 94 | -0.671962 | 0.014081 | 0.470656 | 0.188 | 0.088483 |
5 | -0.399836 | 0.232900 | -0.083468 | 129 | -0.070665 | 0.014081 | 0.007182 | 0.258 | 0.001853 |
6 | 0.232900 | 0.865636 | 0.549268 | 120 | 0.487407 | 0.014081 | 0.224037 | 0.240 | 0.053769 |
7 | 0.865636 | 1.498372 | 1.182004 | 59 | 1.158775 | 0.014081 | 1.310324 | 0.118 | 0.154618 |
8 | 1.498372 | 2.131108 | 1.814740 | 25 | 1.875793 | 0.014081 | 3.465971 | 0.050 | 0.173299 |
9 | 2.131108 | 2.763844 | 2.447476 | 5 | 2.522430 | 0.014081 | 6.291814 | 0.010 | 0.062918 |
| | | | | | | | 37.42208/10 = 3.74 | 1.0 | 0.901566 |
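
The two summary figures in the table above (the unweighted mean of the squared differences and its population-weighted counterpart) can be recomputed from the `BinCount` and `BinMeans` columns. The sketch below copies those columns verbatim; the variable names are illustrative, and the weights are assumed to be each bin's count divided by the total count, which matches the `PopulationProportion` column.

```python
# Values copied from the BinCount and BinMeans columns of the table.
bin_counts = [1, 2, 22, 43, 94, 129, 120, 59, 25, 5]
bin_means = [-3.616800, -2.641231, -1.900161, -1.310229, -0.671962,
             -0.070665, 0.487407, 1.158775, 1.875793, 2.522430]
pop_mean = 0.014081

# Squared difference between each bin mean and the population mean.
squared_diffs = [(m - pop_mean) ** 2 for m in bin_means]

# Unweighted: every bin counts equally, regardless of how many rows it holds.
unweighted = sum(squared_diffs) / len(squared_diffs)

# Weighted: each bin contributes in proportion to its share of the population.
total = sum(bin_counts)
weights = [c / total for c in bin_counts]
weighted = sum(w * d for w, d in zip(weights, squared_diffs))

print(round(unweighted, 2), round(weighted, 2))
```

The unweighted figure (about 3.74) is dominated by the sparsely populated tail bins, whereas the weighted figure (about 0.90) discounts them according to how few observations they contain.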
sklearn.tree.DecisionTreeClassifier
********************************************************************************
Original Dataset
********************************************************************************
sepal_length sepal_width petal_length petal_width class
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa
.. ... ... ... ... ...
145 6.7 3.0 5.2 2.3 Iris-virginica
146 6.3 2.5 5.0 1.9 Iris-virginica
147 6.5 3.0 5.2 2.0 Iris-virginica
148 6.2 3.4 5.4 2.3 Iris-virginica
149 5.9 3.0 5.1 1.8 Iris-virginica
[150 rows x 5 columns]
| | criterion | max_depth | score |
---|---|---|---|
0 | gini | 1 | 0.666667 |
1 | gini | 2 | 0.933333 |
2 | gini | 3 | 0.960000 |
3 | gini | 4 | 0.966667 |
4 | gini | 5 | 0.960000 |
5 | gini | 6 | 0.960000 |
6 | entropy | 1 | 0.666667 |
7 | entropy | 2 | 0.933333 |
8 | entropy | 3 | 0.960000 |
9 | entropy | 4 | 0.953333 |
10 | entropy | 5 | 0.953333 |
11 | entropy | 6 | 0.953333 |
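
A table like the one above could be generated by looping over `criterion` and `max_depth` for `sklearn.tree.DecisionTreeClassifier`. The sketch below is one possible way to do it, not necessarily how the scores above were obtained: loading the data with `load_iris`, scoring by 5-fold cross-validation, and fixing `random_state=0` are all assumptions, so the printed scores may differ from the table.

```python
from itertools import product

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

rows = []
for criterion, max_depth in product(["gini", "entropy"], range(1, 7)):
    clf = DecisionTreeClassifier(
        criterion=criterion, max_depth=max_depth, random_state=0
    )
    # Mean accuracy over 5 stratified folds (an assumed evaluation scheme).
    score = cross_val_score(clf, X, y, cv=5).mean()
    rows.append({"criterion": criterion, "max_depth": max_depth, "score": score})

for i, row in enumerate(rows):
    print(i, row["criterion"], row["max_depth"], f"{row['score']:.6f}")
```

A depth-1 tree has only two leaves, so it can predict at most two of the three iris classes; that is why its accuracy cannot exceed roughly two thirds.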
sklearn.ensemble.BaggingClassifier
sklearn.ensemble.RandomForestClassifier.feature_importances_
sklearn.inspection.permutation_importance
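
The impurity-based `feature_importances_` attribute and `sklearn.inspection.permutation_importance` can be compared side by side. The following is a minimal sketch under stated assumptions: the iris data, the train/test split, `n_estimators=100`, `n_repeats=10`, and `random_state=0` are all illustrative choices rather than settings taken from the source.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, stratify=data.target, random_state=0
)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Impurity-based importances: computed from the training data during fitting.
impurity = forest.feature_importances_

# Permutation importances: drop in held-out accuracy when a feature is shuffled.
perm = permutation_importance(
    forest, X_test, y_test, n_repeats=10, random_state=0
)

for name, imp, p in zip(data.feature_names, impurity, perm.importances_mean):
    print(f"{name:20s} impurity={imp:.3f} permutation={p:.3f}")
```

Evaluating the permutation importances on held-out data is a deliberate choice here: it measures how much each feature contributes to generalization rather than to fitting the training set.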
hw_04
git checkout -b hw_04