Mathematical Concepts Used Here

The standardised residual, expressed in units of the residuals’ standard deviation, measures how far each observed value of the response variable deviates from the value predicted by the linear regression model. It is computed as:

standardized residual = residual / (sqrt(MSE) * sqrt(1 - hii))

where MSE is the mean squared error of the model, hii is the leverage of observation i, and residual is that observation’s residual. The leverage quantifies how much weight an observation carries in determining the model’s fitted values.
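As a rough sketch of the formula above, the standardized residuals can be computed by hand from the raw residuals, the MSE, and the leverages, and compared against R’s built-in rstandard() helper. The data here are simulated and the variable names are our own:

```r
# Sketch: compute standardized residuals manually from the formula
# residual / (sqrt(MSE) * sqrt(1 - hii)), using simulated data.
set.seed(42)
x <- rnorm(30)
y <- 2 * x + rnorm(30)
model <- lm(y ~ x)

res <- residuals(model)                 # raw residuals e_i
h   <- hatvalues(model)                 # leverages h_ii
mse <- sum(res^2) / df.residual(model)  # mean squared error

std_res_manual <- res / (sqrt(mse) * sqrt(1 - h))

# This should agree with R's built-in helper:
all.equal(unname(std_res_manual), unname(rstandard(model)))
```

The built-in rstandard() function (from base R’s stats package) applies exactly this internal standardization, so the manual computation is mainly useful for understanding where the numbers come from.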

A standardised residual plot helps verify the assumptions of linear regression. It charts the standardized residuals against the model’s fitted values. If the assumptions of linear regression hold, the standardised residuals should be scattered randomly around zero, and the plot should show no clear patterns or trends.

In R, you can produce a standardized residual plot by fitting a model with the ‘lm()’ function and then calling ‘plot()’ on it with the which argument set to 1.

Here,

In linear regression analysis, we attempt to model the relationship between a dependent variable y and one or more independent variables X. A linear regression model with a single independent variable has the following general equation:

y = β0 + β1X + ε

where X is the independent variable, β0 is the intercept, β1 is the slope coefficient, ε is the error term, and y is the dependent variable. The error term represents the unexplained variation in the dependent variable that the independent variable does not account for. Linear regression analysis aims to estimate the coefficients β0 and β1 that best fit the data, so that y can be predicted for a given value of X.

Fitting a linear regression model requires minimizing the sum of squared errors, which is the total of the squared differences between the observed and predicted values of y. This can be written mathematically as:

SSE = Σ(yi − ŷi)²

where yi is the observed value of y, ŷi is the predicted value of y, and Σ denotes the sum over all observations i. The coefficients β0 and β1 can be estimated by the least squares method, which finds the values of β0 and β1 that minimise the sum of squared errors.
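For the single-predictor case, the least squares estimates have a closed form: β1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and β0 = ȳ − β1·x̄. A small sketch, using simulated data, computes these by hand and checks them against ‘lm()’:

```r
# Sketch: closed-form least squares estimates for one predictor,
# compared with lm(). Data are simulated.
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)

beta1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0 <- mean(y) - beta1 * mean(x)

fit <- lm(y ~ x)
# The hand-computed coefficients should match lm()'s estimates:
all.equal(unname(coef(fit)), c(beta0, beta1))
```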

Once the linear regression model’s coefficients have been estimated, we can use them to predict the dependent variable y for a given value of X. The predicted value of y is written as:

ŷ = β0 + β1X

The residual is the difference between the observed value of y and the predicted value of y, and it can be written as follows:

ei = yi – ŷi
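The fitted values ŷi and residuals ei defined above can be extracted directly from a fitted model. A brief sketch with simulated data (variable names are our own):

```r
# Sketch: predicted values and residuals e_i = y_i - yhat_i,
# extracted from a fitted lm() model on simulated data.
set.seed(7)
x <- rnorm(40)
y <- 2 * x + rnorm(40)
fit <- lm(y ~ x)

y_hat <- fitted(fit)   # predicted values yhat_i
e     <- y - y_hat     # residuals e_i = y_i - yhat_i

# Matches R's built-in accessor:
all.equal(unname(e), unname(residuals(fit)))

# With an intercept in the model, OLS residuals sum to (numerically) zero
sum(e)
```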

The residuals represent the variation in the dependent variable that the independent variable does not account for. The linear regression model is considered valid if the residuals are normally distributed with a mean of zero and a constant variance. If the residuals show a pattern, however, such as nonlinearity or heteroscedasticity, the model may be unreliable, and further steps may be required to improve it.

The standardised residuals are produced by dividing the residuals by their estimated standard deviation. They are helpful in locating outliers or influential points that might affect the model. An observation whose standardised residual has an absolute value greater than 2 is commonly treated as a potential outlier and should be examined more closely to see whether it is influencing the model.
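The |standardized residual| > 2 rule of thumb described above can be applied directly with rstandard(). As a sketch, the simulated data below include one deliberately injected outlier, which the rule should flag:

```r
# Sketch: flag observations whose standardized residual exceeds 2
# in absolute value. Simulated data with one injected outlier.
set.seed(3)
x <- rnorm(50)
y <- 2 * x + rnorm(50)
y[10] <- y[10] + 8   # inject an outlier at observation 10

fit <- lm(y ~ x)
flagged <- which(abs(rstandard(fit)) > 2)
flagged              # observation 10 should be among these
```

Observations flagged this way are candidates for closer inspection, not automatic removal; whether they actually distort the fit depends on their leverage as well.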



Standardized Residual in R

In statistics, a residual is the difference between a dependent variable’s observed value and its predicted value. A “standardized residual” is a residual that has been scaled to have a mean of zero and a standard deviation of one. It is used in regression analysis to quantify how far a data point deviates from its predicted value and to spot potential outliers.

Concepts:

To compute a standardised residual, subtract the predicted value from the observed value, then divide the result by the standard error of the estimate. The standard error of the estimate measures how accurately the dependent variable is predicted from the independent variable.

Steps to be followed:

1. Load the required R packages, such as ‘car’ (Companion to Applied Regression) and ‘ggplot2’, which provide tools for working with residuals and plotting them.
2. Fit a regression model with R’s ‘lm()’ function.
3. Compute the standardised residuals with the ‘rstandard()’ function (part of base R’s stats package).
4. Visualise the standardised residuals with a scatterplot or a histogram to spot potential outliers.
5. Interpret the standardised residuals to understand the relationship between the dependent and independent variables.

Using the ‘plot()’ and ‘plot.lm()’ functions, you can draw a simple plot and a standardised residual plot, respectively, in R. Here is an illustration:

R

# Create some arbitrary data
x <- rnorm(50)
y <- 2*x + rnorm(50)

# Fit a linear regression model
model <- lm(y ~ x)

# Plot the simple plot
plot(x, y, main = "Simple Plot")

# Plot the standardized residual plot
plot(model, which = 1, main = "Standardized Residual Plot")

Output:

...

Simple histogram and standardized residual plot:

...

Output:

R

# Generate some random data
x <- rnorm(50)
y <- 2*x + rnorm(50)

# Fit a linear regression model
model <- lm(y ~ x)

# Plot the simple histogram
hist(y, main = "Simple Histogram")

# Plot the standardized residual plot
plot(model, which = 1, main = "Standardized Residual Plot")

Here are some scatter plot examples using R:

...

Output:

...

Output:

...

Output:

A straightforward scatter diagram with a regression line:...

Another example utilising a simulated dataset is as follows:

...

Output:

...
