πŸ‘©πŸ»β€πŸ’Ό

Statistics and Mathematics!



Descriptive Statistics

Describes how spread out the data are and what the central tendency of our data is. Pearson believed exact statistics could be found given a large enough number of samples.

Inferential Statistics

When the math allows us to make inferences about something we're not able to access, count, weigh, etc. Fisher believed that whatever we do, we cannot get to the real number. Instead, there are two concepts we can take care of to get better estimates.

Mode

πŸ’‘
Bimodal, multimodal: happens when we have two or more underlying groups put together!

Normal Distribution: on either side of the center we have roughly the same number of records, although most records are in the middle, and it's unimodal - it only has one peak.

Interquartile range - IQR

No extreme values, only the middle 50 percent.

If the median is closer to one end of the interquartile range, it means the values in that quartile are more similar and tightly packed.
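A minimal numpy sketch (the data values are made up) showing how the IQR ignores an extreme value:

```python
import numpy as np

data = np.array([1, 2, 4, 4, 5, 5, 6, 7, 8, 30])  # 30 is an outlier

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1  # spread of the middle 50%; barely affected by the outlier
print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
```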

πŸ’‘
Moments are a set of statistical parameters to measure a distribution. Four moments are commonly used:
  1. Mean
  2. Variance: describes the scale of the distribution
  3. Skewness
  4. Kurtosis: peakedness or flatness; describes the spread of the distribution in a scale-independent manner
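A quick sketch of all four moments with numpy and scipy.stats (the exponential sample is just an arbitrary skewed example):

```python
import numpy as np
from scipy import stats

x = np.random.default_rng(0).exponential(scale=2.0, size=10_000)

print("mean:    ", np.mean(x))          # 1st moment: location
print("variance:", np.var(x, ddof=1))   # 2nd moment: scale
print("skewness:", stats.skew(x))       # 3rd moment: asymmetry
print("kurtosis:", stats.kurtosis(x))   # 4th moment: tail weight (excess; 0 for a normal)
```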

Francis Galton:

  • Inventor of eugenics
  • Discoverer of fingerprints!
1850 portrait of Francis Galton

Probability

Empirical probability: something observed in actual data; not generalized.

Theoretical probability: truth that cannot be directly discovered


Mutually exclusive: two events cannot happen at the same time, so $P(A \cap B) = 0$. (This is different from independence, where one event happening does not change the probability of the other.)

  • London born - in love with Karl Marx - studied Politics first - found out about regression to the mean
  • He became Galton's (Darwin's cousin) protΓ©gΓ©!

πŸ’‘
PDF: If a random variable is continuous, then the probability can be calculated via the probability density function, or PDF for short. The shape of the probability density function across the domain of a random variable is referred to as the probability distribution, and common probability distributions have names, such as uniform, normal, exponential, and so on.

In statistics, if a data distribution is approximately normal, then about 68% of the data values lie within one standard deviation of the mean, about 95% within two standard deviations, and about 99.7% within three standard deviations.
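A small check of the 68-95-99.7 rule using scipy.stats.norm:

```python
from scipy.stats import norm

for k in (1, 2, 3):
    p = norm.cdf(k) - norm.cdf(-k)  # probability mass within k standard deviations
    print(f"within {k} sd: {p:.4f}")
# prints ~0.6827, ~0.9545, ~0.9973
```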

πŸ’‘
Confounder (confounding variable, extraneous determinant, etc.): some variable that influences both the dependent and the independent variable, leading to a spurious association.

Covariance

Standard deviation and Variance

Variance is also distorted by outliers, as it is calculated using our mean and is sensitive to very large or small values. When the variance is low in one variable among many with similar scales, we can understand that this variable does not have much impact on our label.

Standard deviation β†’ makes more sense: it's roughly the average amount we expect the records to differ from the average value.

The (uncorrected) sample variance is a little smaller than the population variance on average, which is why we divide by n βˆ’ 1 instead of n (Bessel's correction).
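A minimal numpy sketch contrasting the two estimates (the sample values are made up):

```python
import numpy as np

sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

biased   = np.var(sample)          # divides by n; underestimates on average
unbiased = np.var(sample, ddof=1)  # divides by n-1 (Bessel's correction)
print(biased, unbiased, np.std(sample, ddof=1))
```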

Correlation Coefficient (Pearson coefficient/r)

The covariance divided by the product of the two standard deviations, to bound the number between βˆ’1 and 1.
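A sketch of that definition with numpy on made-up data; np.corrcoef is only there as a cross-check:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

cov = np.cov(x, y)[0, 1]                           # sample covariance
r = cov / (np.std(x, ddof=1) * np.std(y, ddof=1))  # bounded in [-1, 1]
print(r, np.corrcoef(x, y)[0, 1])                  # the two should match
```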

When two variables are correlated:

Skewness

The third moment

Kurtosis

Fourth moment

How thick the tails are.

Bias types:

Logistic Regression

To understand the logistic regression formula, we consider a logistic model that comes from the log of odds.

Odds of something

πŸ’‘
Log of odds (logit): it's the log of the odds. Why do we take the log? To make everything symmetrical: log(6) = βˆ’log(1/6).
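A tiny sketch of odds, log-odds, and the symmetry (p = 6/7 is an arbitrary choice that gives odds of 6):

```python
import math

p = 6 / 7               # probability corresponding to odds of 6
odds = p / (1 - p)      # ~6
logit = math.log(odds)  # log-odds

print(odds, logit)
print(math.log(6), -math.log(1 / 6))  # symmetry: log(6) == -log(1/6)
```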

Regression

coefficient of determination, denoted $R^2$

The proportion of the variation in Y explained by the variation in X! A measure of how well observed outcomes are replicated by the model, based on the proportion of total variation of outcomes explained by the model.

$\overline{Y}$ is the mean of the observed data

$SSR = \sum (\hat{Y_i}-\overline{Y})^2$

$SST = \sum(Y_i - \overline{Y})^2$

The denominator in R-squared shows the variation in the data, as it is the deviation of each example from the mean of all examples. The numerator is the distance of our predictions from the true values: $R^2 = 1 - \frac{SSE}{SST}$, where $SSE = \sum (Y_i - \hat{Y_i})^2$.
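A hand-rolled R-squared on made-up predictions, just to mirror the formula above:

```python
import numpy as np

y     = np.array([3.0, 5.0, 7.0, 9.0, 11.0])   # observed values
y_hat = np.array([2.8, 5.3, 6.9, 9.2, 10.8])   # model predictions

sse = np.sum((y - y_hat) ** 2)       # unexplained variation
sst = np.sum((y - y.mean()) ** 2)    # total variation around the mean
r2 = 1 - sse / sst
print(r2)
```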

We can never know what $\beta_0$ or $\beta_1$ truly is, but we can estimate them. The theory is that they exist.

Degrees of freedom

Degrees of freedom are determined by the number of observations and also the number of variables. The more data we have, the more freedom we have to move our prediction line or plane around. But the more variables there are, the less freedom we have for defining the plane. So:

$df = n - k - 1$

n: # of examples

k: # of variables

βˆ’1: for the intercept

  • Criticized Pearson a lot!
  • Was a euginistic but not racist, I think.
  • Working on crops, he developed the ANOVA method.
  • He outlined hypotheses on natural selection and creatures population (e.g., the sexy son hypothesis!)
Ronald Fisher

Maximum likelihood:

How likely it is that we would see this data given a candidate mean/center; maximum likelihood picks the parameter value that makes the observed data most probable.
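A minimal grid-search sketch of the idea, assuming a normal model with the scale fixed at 1 (the data values are made up); the candidate mean with the highest log-likelihood coincides with the sample mean:

```python
import numpy as np
from scipy.stats import norm

data = np.array([4.8, 5.1, 5.3, 4.9, 5.4, 5.0])

candidates = np.linspace(4.0, 6.0, 201)
# log-likelihood of the data for each candidate mean
loglik = [norm.logpdf(data, loc=mu, scale=1.0).sum() for mu in candidates]

best = candidates[np.argmax(loglik)]
print(best, data.mean())  # the two agree (up to grid resolution)
```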

ANOVA

Adjusted R-squared

When we have a low number of examples and too many variables, our plane is defined by only those examples, so we end up with an over-fitted plane.

To prevent this from affecting our model, we adjust the R-squared.

The more variables, the lower the adjusted R-squared will be (for the same fit).

$\overline{R}^2 = 1 - \frac{SSE}{SST} \cdot \frac{n-1}{n-k-1}$
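The same adjustment in a few lines of Python (n, k, and r2 are hypothetical numbers):

```python
n, k = 5, 1    # number of examples and number of predictors
r2 = 0.995     # hypothetical R-squared, i.e., 1 - SSE/SST

adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(adj_r2)  # always <= r2; the gap grows as k grows
```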

Hypothesis for regression

We say there is no relationship between X and Y unless we find enough evidence to reject this null hypothesis. When the coefficient is significantly different from zero, it is significant.

Test statistics

Does our score fit our null hypothesis well?

Z-score (The critical value)

How many standard errors we are away from the mean of the whole population. We use it when we know the true population std, like when we vaccinate people and know what percentage got sick, so we can calculate $p(1-p)$ and then take a sample (e.g., 600 and 400). (When comparing a sample mean with the mean of the whole population, we are saying how many standard errors we are away from the population mean.)

$Z_{group} = \frac{\overline{X}_{group} - \mu}{\sigma / \sqrt{n}}$

We compare it with a critical value (the threshold by which we decide whether the result is significant, i.e., whether we can reject the null).
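A sketch of a one-sample z-test for a proportion, loosely following the 600/400 example above (reading it as 600 out of 1000; the null proportion 0.5 is an assumption):

```python
import numpy as np
from scipy.stats import norm

p0, n = 0.5, 1000         # hypothesized proportion and sample size
observed_p = 600 / 1000   # e.g., 600 "yes" vs. 400 "no"

se = np.sqrt(p0 * (1 - p0) / n)        # standard error under the null
z = (observed_p - p0) / se
p_value = 2 * (1 - norm.cdf(abs(z)))   # two-sided
print(z, p_value)
```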

T-student distribution

  • T-test is a form of z-test. with large datasets we get the approximation of Z-test.

The author of the original paper was interested in working with small samples from a larger population. The t-distribution is a small-sample-size approximation of the normal distribution.

$t = \frac{\overline{X}-\mu}{S/\sqrt{n}}$
The English statistician William Sealy Gosset, AKA "Student"

Effect size: when using the t-statistic and alpha, make sure you also take the sample size into account, as the std is divided by the square root of the sample size. This can inflate the t-score, so we may get a small p-value even when the effect itself is tiny!

Formula for groups:

$t = \frac{\text{difference} - 0}{\text{average variation}}$
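A two-sample sketch using scipy.stats.ttest_ind on made-up group measurements:

```python
import numpy as np
from scipy.stats import ttest_ind

group_a = np.array([5.1, 4.9, 5.6, 5.2, 4.8, 5.4])
group_b = np.array([5.9, 6.1, 5.7, 6.3, 5.8, 6.0])

# t is (difference of means) relative to the pooled variation
t, p = ttest_ind(group_a, group_b)
print(t, p)
```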

Chi squared

With this we compare what our model expects with the real sampled observations.

Relationship between two variables. For example, you could test the hypothesis that men and women are equally likely to vote "Democratic," "Republican," "Other" or "not at all."

It works on counts of categorical variables and tests how well the observed counts fit what we expect.


$\chi^2$ goodness of fit

Many categories, one variable

πŸ’‘
Check! Is the expected count E for every category more than 5?

Formula:

$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$
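A goodness-of-fit sketch with scipy.stats.chisquare, using 120 hypothetical die rolls against a fair-die expectation:

```python
import numpy as np
from scipy.stats import chisquare

observed = np.array([18, 22, 16, 24, 20, 20])  # counts from 120 rolls
expected = np.full(6, 120 / 6)                 # fair die: 20 per face

stat, p = chisquare(observed, expected)        # sums (O - E)^2 / E
print(stat, p)
```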

Test of Independence

Checking whether belonging to one category in your features actually affects the variable we're checking!

  1. Calculate the expected values and then compare them with the observed ones. How do we calculate an expected value? First, what fraction said no overall? Then multiply that percentage by the total number of Ravenclaws or Hufflepuffs to see, based on the overall distribution of yeses and nos, how many Ravenclaws or Hufflepuffs we would expect (see the sketch below).
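A sketch with scipy.stats.chi2_contingency on a made-up 2Γ—2 yes/no table (the house names only echo the example above); it also returns the expected counts computed exactly as described:

```python
import numpy as np
from scipy.stats import chi2_contingency

#                 yes  no
table = np.array([[30, 10],    # e.g., Ravenclaw
                  [20, 20]])   # e.g., Hufflepuff

stat, p, dof, expected = chi2_contingency(table)
print(expected)  # row total * overall column share
print(stat, p)
```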

Test of Homogeneity

Whether two samples come from the same population

Interaction variables

Logit models (Logistic models)

F-Statistic and test

Fisher wanted to see the difference between the weights of potatoes based on the fertilizer used!
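A one-way ANOVA sketch with scipy.stats.f_oneway; the potato weights are invented for illustration:

```python
import numpy as np
from scipy.stats import f_oneway

fert_a = np.array([210.0, 195.0, 205.0, 220.0])  # weights under fertilizer A
fert_b = np.array([230.0, 240.0, 225.0, 235.0])  # fertilizer B
fert_c = np.array([200.0, 215.0, 190.0, 205.0])  # fertilizer C

# F compares between-group variance to within-group variance
f, p = f_oneway(fert_a, fert_b, fert_c)
print(f, p)
```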

Types of data

Categorical and Quantitative

Quantitative

Categorical

  • He expounded the mathematics of extremes
  • Was a war war || survivor
  • Worked with Fisher
Gumbel

Experiments

Control group: no treatment, drug or change

Single blind study: the researcher knows about the change and purpose of study, but the volunteer does not

Matched pair experiments: on very similar people or groups.

Repeated Measures design: experiment with the same subject

In vitro experiments = test-tube experiments colloquially

Geometric distribution

$P^s (1-P)^{1-s}$ is the probability of a single Bernoulli trial with outcome $s \in \{0, 1\}$; the geometric distribution chains such trials, giving $P(X = k) = (1-P)^{k-1} P$ for the first success on trial $k$.
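A tiny cross-check of that PMF against scipy.stats.geom (p = 0.5 is arbitrary):

```python
from scipy.stats import geom

p = 0.5  # per-trial success probability
for k in (1, 2, 3):
    # library PMF vs. the formula (1-p)^(k-1) * p; the two columns match
    print(k, geom.pmf(k, p), (1 - p) ** (k - 1) * p)
```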

Birthday paradox: the probability of all birthdays being unique, given a number of people in one room, drops surprisingly fast.
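A small sketch computing that probability exactly (ignoring leap years):

```python
def p_all_unique(n: int) -> float:
    """Probability that n people all have different birthdays (365-day year)."""
    p = 1.0
    for i in range(n):
        p *= (365 - i) / 365
    return p

for n in (10, 23, 50):
    print(n, 1 - p_all_unique(n))  # chance of at least one shared birthday
# at n=23 the chance of a shared birthday already exceeds 50%
```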

Math revision

Norm: magnitude of a vector.

Euclidean norm: the L2 norm.

πŸ’‘
When to use L2? With large negative values. But when we want to calculate the norm of small non-negative elements β†’ we use L1.
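Both norms in a couple of numpy lines:

```python
import numpy as np

v = np.array([3.0, -4.0])

l2 = np.linalg.norm(v)         # Euclidean: sqrt(3^2 + 4^2) = 5
l1 = np.linalg.norm(v, ord=1)  # sum of absolute values: 7
print(l1, l2)
```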

NaΓ―ve BayesπŸ”—

The skewed distributions

Poisson distribution:

Trees

Singular value decomposition πŸ”—

The law of large numbers:

If you throw a fair, six-sided die many times, the average of the outcomes and the relative frequency of each outcome will become clear (will get close to their expected values) after all those throws.
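A quick simulation of the law with numpy (the seed and sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)  # fair six-sided die

for n in (10, 100, 10_000, 100_000):
    print(n, rolls[:n].mean())  # converges toward the expected value 3.5
```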

Confounding variables = extraneous variables