What Is Residual Standard Deviation?
Regression analysis is a method used in statistics to show a relationship between two different variables, and to describe how well you can predict the behavior of one variable from the behavior of another.
Residual standard deviation is also referred to as the standard deviation of points around a fitted line or the standard error of estimate.
- Residual standard deviation is the standard deviation of the residual values, that is, of the differences between observed and predicted values.
- The standard deviation of the residuals measures how much the data points spread around the regression line.
- The result is used to measure the error in the regression line’s predictions.
- The smaller the residual standard deviation is compared to the sample standard deviation, the more predictive, or useful, the model is.
Understanding Residual Standard Deviation
Residual standard deviation is a goodness-of-fit measure that can be used to analyze how well a set of data points fits the model. In a business setting, for example, after performing a regression analysis on multiple data points of costs over time, the residual standard deviation can give a business owner information on the difference between actual costs and projected costs, and an idea of how much projected costs could vary from the mean of the historical cost data.
Formula for Residual Standard Deviation
$$\text{Residual} = Y - Y_{est}$$

$$S_{res} = \sqrt{\frac{\sum (Y - Y_{est})^2}{n - 2}}$$

where:
$S_{res}$ = Residual standard deviation
$Y$ = Observed value
$Y_{est}$ = Estimated or projected value
$n$ = Number of data points in the population
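As a minimal sketch, the formula can be written as a small Python function. The function name `residual_standard_deviation` and its argument names are illustrative choices, not part of any library, and the input validation is an added assumption.

```python
import math


def residual_standard_deviation(observed, estimated):
    """Return sqrt(sum((Y - Y_est)^2) / (n - 2)) for paired values."""
    if len(observed) != len(estimated):
        raise ValueError("observed and estimated must have the same length")
    n = len(observed)
    if n <= 2:
        # The denominator n - 2 must be positive.
        raise ValueError("need more than two data points")
    # Numerator: sum of squared residuals (Y - Y_est)^2.
    sum_squared = sum((y - y_est) ** 2 for y, y_est in zip(observed, estimated))
    return math.sqrt(sum_squared / (n - 2))
```

The function takes the observed values and the values estimated by the fitted line as two sequences of equal length, in the same order.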
How to Calculate Residual Standard Deviation
To calculate the residual standard deviation, first calculate the difference between each actual value and the value predicted by the fitted line. This difference is known as the residual value or, simply, the residual: the distance between a known data point and the data point predicted by the model.
Then plug the residuals into the residual standard deviation formula and solve it.
Example of Residual Standard Deviation
Start by calculating residual values. For example, assuming you have a set of four observed values for an unnamed experiment, the table below shows y values observed and recorded for given values of x:
If the linear equation of the line fitted to the data in the model is given as yis = 1x + 2, where yis = predicted y value, the residual for each observation can be found.
The residual is equal to (y – yis), so for the first set, the actual y value is 1 and the predicted yis value given by the equation is yis = 1(1) + 2 = 3. The residual value is thus 1 – 3 = -2, a negative residual value.
For the second set of data points, where x is 2 and y is 4, the predicted y value can be calculated as 1(2) + 2 = 4.
In this case, the actual and predicted values are the same, so the residual value is zero. You would use the same process to arrive at the predicted y values for the remaining two data points.
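The two residuals worked out above can be reproduced with a short Python sketch. Only the two (x, y) pairs stated in the text are used; the helper name `predict` is an illustrative assumption.

```python
def predict(x):
    # Fitted line from the example: yis = 1x + 2
    return 1 * x + 2


# The two (x, y) pairs given explicitly in the text.
known_points = [(1, 1), (2, 4)]

# Residual = observed y minus predicted y.
residuals = [y - predict(x) for x, y in known_points]
print(residuals)  # [-2, 0]
```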
Once you’ve calculated the residuals for all points using the table or a graph, use the residual standard deviation formula.
Expanding the table above, you calculate the residual standard deviation:
Sum of each residual squared, or Σ(y – yis)²
The sum of the squared residuals is 6, which is the numerator of the residual standard deviation equation.
For the bottom portion or denominator of the residual standard deviation equation, n = the number of data points, which is 4 in this case. Calculate the denominator of the equation as:
- (Number of residuals – 2) = (4 – 2) = 2
Finally, divide the numerator by the denominator and calculate the square root of the result:
- Residual standard deviation: √(6/2) = √3 ≈ 1.732
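The arithmetic of this last step can be checked in Python. The sketch below takes the sum of squared residuals (6) and the number of data points (4) directly from the example; the variable names are illustrative.

```python
import math

sum_squared_residuals = 6  # numerator from the example
n = 4                      # number of data points

# Divide by (n - 2), then take the square root.
s_res = math.sqrt(sum_squared_residuals / (n - 2))
print(round(s_res, 3))  # 1.732
```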
The magnitude of a typical residual can give you a general sense of how close your estimates are. The smaller the residual standard deviation, the closer the fit of the estimate to the actual data. In effect, the smaller the residual standard deviation is compared to the sample standard deviation, the more predictive, or useful, the model is.
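The comparison with the sample standard deviation can be sketched as follows. The x and y values here are hypothetical, invented only for illustration, and the line is fitted by ordinary least squares, which is an assumption, since the text does not say how its line was obtained.

```python
import math
import statistics

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Ordinary least squares fit of y = slope * x + intercept.
x_mean = statistics.mean(x)
y_mean = statistics.mean(y)
slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sum(
    (xi - x_mean) ** 2 for xi in x
)
intercept = y_mean - slope * x_mean

# Residual standard deviation around the fitted line.
y_est = [slope * xi + intercept for xi in x]
s_res = math.sqrt(sum((yi - ye) ** 2 for yi, ye in zip(y, y_est)) / (len(y) - 2))

# Sample standard deviation of y, ignoring x entirely.
s_y = statistics.stdev(y)

print(f"residual SD = {s_res:.3f}, sample SD = {s_y:.3f}")
```

For this nearly linear hypothetical data set, the residual standard deviation comes out far smaller than the sample standard deviation of y, which is the pattern described above for a useful model.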
The residual standard deviation can be calculated after a regression analysis or an analysis of variance (ANOVA) has been performed. When determining a limit of quantitation (LoQ), the residual standard deviation may be used instead of the standard deviation.