What is Statistics?
Statistics is the science of collecting, organising, and analysing data [cite: 1125, 1127, 1129]. It deals primarily with the tabulation and interpretation of numerical data [cite: 1141, 1143].
Sample Mean: the arithmetic average of the observed values [cite: 1442, 1443].
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$ [cite: 1322]
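For example, a minimal Python sketch of the sample mean (the data values are illustrative):

```python
# Minimal sketch: sample mean of a small dataset (values are illustrative).
data = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]

# x_bar = (sum of x_i) / n
x_bar = sum(data) / len(data)
print(x_bar)  # 18.0
```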
Sample Standard Deviation (with Bessel's correction, i.e. $n-1$ in the denominator):
$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}}$$ [cite: 1519]
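A matching sketch for the sample standard deviation, dividing by $n-1$ (same illustrative data; the standard library's `statistics.stdev` gives the same result):

```python
import math

# Minimal sketch: sample standard deviation with Bessel's correction (n - 1).
data = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]

n = len(data)
x_bar = sum(data) / n
# S = sqrt( sum((x - x_bar)^2) / (n - 1) )
s = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))
print(s)  # ~13.49; statistics.stdev(data) agrees
```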
Hypothesis testing is a form of statistical inference that uses data from a sample to draw conclusions about a population parameter [cite: 1910, 1912, 1914].
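As one concrete instance, a one-sample t-test sketch (assumes SciPy is available; the data and the null value of 20 are illustrative):

```python
from scipy import stats

# Minimal sketch: one-sample t-test.
# H0: the population mean is 20; H1: it is not.
sample = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]

t_stat, p_value = stats.ttest_1samp(sample, popmean=20)
print(t_stat, p_value)
# Reject H0 at the 5% level only if p_value < 0.05.
```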
Aim: to find the best-fit line with minimal error [cite: 42]. This is typically solved by Ordinary Least Squares (OLS) [cite: 16, 17, 19]; a fitting sketch follows the variable definitions below.
Equation of a straight line:
$$Y = a + bX + \epsilon$$ [cite: 330]
Where:
$Y$ = Dependent Variable [cite: 331]
$X$ = Independent Variable [cite: 332]
$a$ = Intercept [cite: 333]
$b$ = Slope [cite: 334, 335]
$\epsilon$ = Residual Error [cite: 336, 337]
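A minimal sketch of estimating $a$ and $b$ via the closed-form OLS formulas (data values are illustrative):

```python
# Minimal sketch: closed-form OLS estimates for simple linear regression.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2);  a = y_bar - b * x_bar
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
    / sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar
print(a, b)  # intercept and slope of the best-fit line
```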
The squared-error cost function averages the squared prediction errors over the training set; the factor $\frac{1}{2}$ is a convention that simplifies the gradient [cite: 100, 116]. We want to minimize $J(\theta_0, \theta_1)$ [cite: 85].
$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x_i) - y_i)^2$$ [cite: 115, 121]
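A minimal sketch of evaluating this cost for a candidate line $h_\theta(x) = \theta_0 + \theta_1 x$ (same illustrative data as above):

```python
# Minimal sketch: squared-error cost J(theta0, theta1).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def cost(theta0, theta1):
    m = len(x)
    # J = (1 / (2m)) * sum((h(x_i) - y_i)^2)
    return sum((theta0 + theta1 * xi - yi) ** 2
               for xi, yi in zip(x, y)) / (2 * m)

print(cost(0.0, 2.0))  # cost of the candidate line y = 2x
```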
Gradient Descent is the optimizer: repeat the following update until convergence, updating $\theta_0$ and $\theta_1$ simultaneously [cite: 122, 123, 188].
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$$ [cite: 191]
where $\alpha$ is the learning rate.
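A minimal sketch of batch gradient descent for the two parameters (the learning rate and iteration count are illustrative choices):

```python
# Minimal sketch: batch gradient descent for theta0 (intercept) and theta1 (slope).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

theta0, theta1 = 0.0, 0.0
alpha = 0.05  # learning rate (illustrative)
m = len(x)

for _ in range(2000):
    errors = [theta0 + theta1 * xi - yi for xi, yi in zip(x, y)]
    # Partial derivatives of J with respect to theta0 and theta1.
    grad0 = sum(errors) / m
    grad1 = sum(e * xi for e, xi in zip(errors, x)) / m
    # Simultaneous update of both parameters.
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(theta0, theta1)  # should approach the closed-form OLS solution
```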
The Gauss-Markov Theorem states that, under its assumptions (linear model, exogenous errors, homoscedasticity, no perfect multicollinearity), the least squares estimator has the smallest variance among all linear unbiased estimators [cite: 585, 586]. In this sense, OLS is the Best Linear Unbiased Estimator (BLUE) [cite: 565, 566, 605].
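A minimal simulation sketch of the unbiasedness half of that claim (the true model and noise level are illustrative assumptions):

```python
import random

# Minimal sketch: across many simulated samples, the average OLS slope
# estimate should be close to the true slope (here 2.0, an assumed value).
random.seed(0)
true_a, true_b = 1.0, 2.0
x = [float(i) for i in range(10)]

def ols_slope(xs, ys):
    n = len(xs)
    xb, yb = sum(xs) / n, sum(ys) / n
    return sum((xi - xb) * (yi - yb) for xi, yi in zip(xs, ys)) \
        / sum((xi - xb) ** 2 for xi in xs)

estimates = [
    ols_slope(x, [true_a + true_b * xi + random.gauss(0, 1) for xi in x])
    for _ in range(5000)
]
print(sum(estimates) / len(estimates))  # ~2.0
```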