Thursday, October 16, 2014

Why I tried a Residual Plot on my dataset

Why Residual Plot


Well, I previously had a dataset of India's population, which was correct, but I altered it and made it worse. Now the population increases linearly with year and then suddenly shoots up to a peak, so ideally this data was never meant for linear regression. But, bound by habit, I ran a linear regression on it anyway and found this -



OK, above is the regression line I got, and it is terrible. Believe me, because I ran the predictor and got brilliantly bad results :(

For the years 1800, 2030 and 2040 I got:

        1         2         3
-11839.78 824736.40 861109.28

So does it mean there was no India on the map in 1800 :O ? A negative population, what, that's not possible, I messed it up ...
Well, I really did try to make the data work properly, but nothing helped.

So now I knew that I needed to transform my data somehow, so I searched the internet and found a term called Residual Plot.


Well, what, a new concept again? Why should I learn this....

A residual is the difference between the actual value of the dependent variable and the value the model predicts for it. Leaving all the mind-blowing keywords behind, I finally worked out that a residual plot is a way to check whether a model is a 'good fit' or not.
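To make the definition concrete, here is a minimal sketch in R. The data is synthetic and made up purely for illustration (it is not my population data), and the names `year`, `value` and `fit` are just placeholders.

```r
# Synthetic data, invented only for this illustration
set.seed(1)
year  <- 1:20
value <- 3 * year + rnorm(20, sd = 2)

# Ordinary linear regression of value on year
fit <- lm(value ~ year)

# residual = actual value - predicted value
manual_residuals <- value - fitted(fit)

# resid() gives the same numbers, so this comparison should be TRUE
all.equal(unname(manual_residuals), unname(resid(fit)))
```

So `resid(fit)` is just a shortcut for "actual minus predicted" on the data the model was fitted to.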

There are 2 very basic and easy things to remember about a residual plot-
 1. The residuals for a 'good' regression model are roughly normally distributed and random, scattered around zero.

 2. The residuals for a 'bad' regression model are non-Normal and have a distinct, non-random pattern.
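The two points above can be eyeballed with two plots: residuals against fitted values (to look for a pattern) and a normal Q-Q plot (to look for non-Normal residuals). This is only a sketch; `residual_checks` is a helper name I made up, and the data is synthetic and deliberately well behaved.

```r
# Hypothetical helper: draws two common residual checks for a fitted lm model
residual_checks <- function(fit) {
  res <- resid(fit)
  op <- par(mfrow = c(1, 2))   # two plots side by side
  on.exit(par(op))             # restore the plotting layout afterwards

  # 1. Residuals vs fitted values: a good fit shows no pattern around zero
  plot(fitted(fit), res,
       xlab = "Fitted values", ylab = "Residuals",
       main = "Residuals vs fitted")
  abline(h = 0, lty = 2)

  # 2. Normal Q-Q plot: roughly Normal residuals lie close to the line
  qqnorm(res)
  qqline(res)

  invisible(res)
}

# Synthetic data where a straight line really is the right model
set.seed(42)
x_good <- runif(100, -3, 3)
y_good <- 2 * x_good + rnorm(100, sd = 0.5)
res_good <- residual_checks(lm(y_good ~ x_good))
```

Base R also gives similar diagnostic plots directly with `plot(fit)` on an `lm` object, so the helper is a convenience, not a requirement.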




So from the above, we can see a sure-shot case of bad data and a bad model. I know for sure this model is bad because its residuals definitely show a pattern, a superb pattern of growing....


More, in case I need more -

Having come this far, I should round things off with an example where the data seems to fit the model well. Let's see how the residuals look -

The following sample data -

x <- runif(100, -3, 3)

y <- x + sin(x) + rnorm(100, sd = 0.2)
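The fitting step is not shown here, so the following is a sketch of what it presumably looks like: a plain `lm(y ~ x)` drawn over the scatter plot. The `set.seed` call and the names `fit` and `r_squared` are my additions so the sketch can be rerun.

```r
set.seed(123)  # added only so the sketch is reproducible
x <- runif(100, -3, 3)
y <- x + sin(x) + rnorm(100, sd = 0.2)

# Straight-line fit of y on x
fit <- lm(y ~ x)

# Scatter plot with the fitted line on top
plot(x, y, main = "Sample data with fitted line")
abline(fit, col = "red")

# R-squared of the straight-line fit
r_squared <- summary(fit)$r.squared
r_squared
```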


and I got -


Good one, isn't it? But let's not be in a hurry, let's see the Residual plot -



What? I can see a pattern now, a sine wave, ahh... So the model looks good but it is not. Don't just go by the scatter plot or the fitted line, there may be trouble inside. There is no harm in running the residual plot.
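A sketch of how this residual plot can be reproduced, assuming the model was a plain `lm(y ~ x)`. As an extra that goes a step beyond what I did above, it also refits with a `sin(x)` term (possible here only because we know how the sample data was generated) to show the leftover spread shrinking. The names `fit_linear` and `fit_sine` are made up for the example.

```r
set.seed(123)  # added only so the sketch is reproducible
x <- runif(100, -3, 3)
y <- x + sin(x) + rnorm(100, sd = 0.2)

# Straight-line model: looks fine on the scatter plot
fit_linear <- lm(y ~ x)

# Residuals against x: the leftover wave-like pattern shows up here
plot(x, resid(fit_linear),
     xlab = "x", ylab = "Residuals", main = "Residuals of y ~ x")
abline(h = 0, lty = 2)

# Refit with the sine term the straight line was missing
fit_sine <- lm(y ~ x + sin(x))
plot(x, resid(fit_sine),
     xlab = "x", ylab = "Residuals", main = "Residuals of y ~ x + sin(x)")
abline(h = 0, lty = 2)

# Residual standard error of each model (smaller means less left unexplained)
c(linear = sigma(fit_linear), with_sine = sigma(fit_sine))
```

In real data we usually don't know the missing term, so the residual plot is the hint that something like it is needed.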




