Once we fit a linear regression model, the summary reports the residuals. While trying to understand residuals better, I stumbled on a few interesting properties that helped me, so I am drafting ...
- Residuals have mean zero (as long as the model includes an intercept). This means the residuals are balanced across the data points: the positive and negative residuals cancel each other out, and in a well-behaved fit they are just scattered around zero with no pattern.
So if I run the linear regression in R:
fit <- lm(relation ~ person, data = people)
then to check the theory, just take the simple mean of the residuals:
mean(fit$residuals)  # must give a value very close to zero
- There is no correlation between the residuals and the predictors. To check, the covariance should also come out very close to zero:
cov(fit$residuals, people$person)
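While googling I found an equation that was new to me:
- var(data) = var(estimate) + var(residuals), that is, the variance of the observed response equals the variance of the fitted values plus the variance of the residuals.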
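The three properties above can be checked together in a few lines. This is only a sketch: the `people` data frame below is simulated as a stand-in (the numbers are made up, only the column names `person` and `relation` come from the model above), so the aim is to show the checks, not real results.

```r
# Simulated stand-in for the `people` data (values are made up for illustration)
set.seed(42)
people <- data.frame(person = rnorm(100, mean = 50, sd = 10))
people$relation <- 3 + 0.5 * people$person + rnorm(100, sd = 2)

fit <- lm(relation ~ person, data = people)

# 1. Residuals average to zero (the model includes an intercept)
mean_resid <- mean(residuals(fit))

# 2. Residuals have zero covariance with the predictor
cov_resid <- cov(residuals(fit), people$person)

# 3. var(data) = var(estimate) + var(residuals)
var_total <- var(people$relation)
var_parts <- var(fitted(fit)) + var(residuals(fit))

print(c(mean_resid = mean_resid, cov_resid = cov_resid))
print(all.equal(var_total, var_parts))
```

The first two values will not be exactly zero because of floating-point rounding, which is why the last check uses `all.equal()` rather than `==`.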
The regression line is the line through the data that has the minimum (least) total squared 'error', where the error is the vertical distance between the actual observed value of the response and the prediction made by the line.
Squaring the distances ensures the data points above and below the line are treated the same.
The method of choosing the 'best regression line' (or fitting a line to the data) is known as ordinary least squares.
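To see the 'least squares' idea in action, here is a small sketch on simulated data. The names `x`, `y`, `sse` and `fit_xy` are made up for this example. It computes the slope and intercept from the standard closed-form formulas for a single predictor, compares them with what `lm()` returns, and shows that nudging the line away from that solution increases the sum of squared errors.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 1.5 * x + rnorm(50)

# Closed-form least-squares estimates for a single predictor
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# Sum of squared vertical distances for any candidate line a + b * x
sse <- function(a, b) sum((y - (a + b * x))^2)

fit_xy <- lm(y ~ x)
print(coef(fit_xy))          # coefficients from lm()
print(c(intercept, slope))   # coefficients computed by hand

# Moving the line away from the least-squares solution makes the error larger
ols_sse <- sse(intercept, slope)
shifted_sse <- sse(intercept + 0.1, slope)
tilted_sse <- sse(intercept, slope + 0.1)
print(c(ols = ols_sse, shifted = shifted_sse, tilted = tilted_sse))
```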