Useful SciPy correlation functions: scipy.stats.pearsonr, scipy.stats.kendalltau, and scipy.stats.pointbiserialr.
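As a quick illustration (a sketch of mine, not from the original text): Pearson's r measures linear association, while Kendall's tau only requires a monotonic relationship, so the two can disagree sharply on nonlinear data.

import numpy as np
from scipy.stats import pearsonr, kendalltau

x = np.linspace(1, 10, 100)
y = np.exp(x)  # monotonic but strongly nonlinear

print(pearsonr(x, y)[0])    # noticeably below 1: the relationship is not linear
print(kendalltau(x, y)[0])  # exactly 1: the relationship is perfectly monotonic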
df["SeniorCitizenContinuous"] = [
random.uniform(0.0, 0.5) if v == 0 else random.uniform(0.5, 1.0)
for v in df["SeniorCitizen"].values
]
In words: where SeniorCitizen == 0, draw a random number from (0.0, 0.5); where SeniorCitizen == 1, draw one from (0.5, 1.0).
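A minimal sketch of putting pointbiserialr to work on these columns (my example, reusing the names from the snippet above): the point-biserial correlation of a 0/1 variable against a continuous one is mathematically the same as Pearson's r computed on the 0/1 coding.

from scipy.stats import pearsonr, pointbiserialr

r_pb, p_pb = pointbiserialr(df["SeniorCitizen"], df["SeniorCitizenContinuous"])
r_p, p_p = pearsonr(df["SeniorCitizen"], df["SeniorCitizenContinuous"])
print(r_pb, r_p)  # the two coefficients match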
Entropy can be computed with scipy.stats.entropy.
Given a discrete random variable $X$, with possible outcomes $x_{1},\ldots,x_{n}$, which occur with probability $\mathrm{P}(x_{1}),\ldots,\mathrm{P}(x_{n})$, the entropy of $X$ is formally defined as:

$\mathrm{H}(X) = -\sum_{i=1}^{n} \mathrm{P}(x_{i}) \log \mathrm{P}(x_{i})$
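A minimal sketch of computing it in practice (the counts here are made up): scipy.stats.entropy accepts a probability vector, and it also normalizes raw counts for you.

import numpy as np
from scipy.stats import entropy

counts = np.array([50, 30, 15, 5])  # e.g. value_counts() of a categorical column
print(entropy(counts))                         # raw counts are normalized internally
print(entropy(counts / counts.sum(), base=2))  # same distribution, in bits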
sklearn.decomposition.PCA reports the share of variance captured by each component in its explained_variance_ratio_ attribute.
explained_variance_ratio_: [0.39, 0.32, 0.19, 0.1 , 0. ]
cumulative sum:            [0.39, 0.72, 0.9 , 1. , 1. ]
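A minimal sketch of producing both vectors (the data here is random; the original feature matrix is not shown):

import numpy as np
from sklearn.decomposition import PCA

X = np.random.randn(200, 5)  # placeholder for the real feature matrix
pca = PCA().fit(X)
print(pca.explained_variance_ratio_.round(2))             # per-component share
print(np.cumsum(pca.explained_variance_ratio_).round(2))  # cumulative share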
With $n = 100$ variables there are $\binom{100}{2} = 4{,}950$ pairwise correlations to inspect; with $n = 1000$ variables, $\binom{1000}{2} = 499{,}500$.
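These counts are easy to verify:

import math

print(math.comb(100, 2))   # 4950
print(math.comb(1000, 2))  # 499500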
import numpy
from scipy.stats import pearsonr

n = 1000  # sample size; the original value is not shown

x_1 = numpy.random.randn(n)
x_2 = numpy.random.randn(n) + 4
y = 5 + x_1 + x_2 + numpy.random.randn(n) / 5  # y depends on both predictors

correlation, p = pearsonr(x=x_1, y=x_2)
print(correlation)
# 0.02282513458861322
😲 Even though x_1 and x_2 jointly drive y, they were drawn independently, so their mutual correlation is essentially zero.
PROC VARCLUS (Python: varclushi)
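A minimal sketch of variable clustering in Python, assuming the VarClusHi API shown in the varclushi README (df_numeric and the stopping parameters are my illustration, not values from the original):

from varclushi import VarClusHi

# df_numeric: a DataFrame containing only the numeric candidate features
vc = VarClusHi(df_numeric, maxeigval2=1, maxclus=None)
vc.varclus()       # run the divisive clustering
print(vc.info)     # one row per cluster: size, eigenvalues, variance explained
print(vc.rsquare)  # per variable: R^2 with own cluster vs. nearest cluster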
# A categorical predictor: Poisson draws mapped to letters A, B, C, ...
x_1 = numpy.random.poisson(lam=2, size=n)
x_1 = numpy.array([chr(x + 65) for x in x_1])

# Only the levels A, D and F shift the response, each by a different amount
x_1_a = numpy.array([10 if x == "A" else 0 for x in x_1])
x_1_d = numpy.array([20 if x == "D" else 0 for x in x_1])
x_1_f = numpy.array([5 if x == "F" else 0 for x in x_1])

x_2 = numpy.random.randn(n) + 4
y = 5 + x_1_a + x_1_d + x_1_f + x_2 + numpy.random.randn(n) / 3
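One way to confirm that the letter levels carry signal (my sketch, not part of the original): one-hot encode x_1 and fit a linear model. With the intercept folded into the dummies, each dummy coefficient estimates the baseline of 5 plus that level's effect.

import pandas as pd
from sklearn.linear_model import LinearRegression

X = pd.get_dummies(pd.Series(x_1), prefix="level")
X["x_2"] = x_2
model = LinearRegression(fit_intercept=False).fit(X, y)
print(dict(zip(X.columns, model.coef_.round(1))))
# roughly: level_A ~ 15, level_D ~ 25, level_F ~ 10, other levels ~ 5, x_2 ~ 1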
To normalize a skewed variable and check the result: scipy.stats.boxcox, scipy.stats.probplot, and scipy.stats.kstest.
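A minimal end-to-end sketch with made-up data: Box-Cox transform a skewed sample, then check normality with a KS test (standardizing first; estimating the mean and standard deviation from the same sample makes the test only approximate). scipy.stats.probplot would draw the corresponding Q-Q plot.

import numpy as np
from scipy import stats

skewed = np.random.exponential(scale=2, size=500)  # strictly positive, right-skewed

transformed, lam = stats.boxcox(skewed)  # Box-Cox requires positive input
print(f"fitted lambda: {lam:.3f}")

z = (transformed - transformed.mean()) / transformed.std()
print(stats.kstest(z, "norm"))  # a high p-value means no evidence against normality

# stats.probplot(transformed, dist="norm") returns Q-Q plot coordinates;
# pass plot=plt to draw the plot with matplotlib.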