How to Calculate the P-Value of an F-Statistic in R
An F-test is a statistical test whose test statistic, the F-statistic, follows an F distribution under the null hypothesis. This article shows how to compute the p-value of an F-statistic in the R programming language.
Finding the P-Value of an F-Statistic in R
R provides the pf() function, which returns cumulative probabilities of the F distribution. You can use it to find the p-value associated with an F-statistic. It has the following syntax:
Syntax: pf(F_statistic, df1, df2, lower.tail = FALSE)
Parameters:
- F_statistic: the observed value of the F-statistic.
- df1: the numerator degrees of freedom.
- df2: the denominator degrees of freedom.
- lower.tail = TRUE (the default): returns the lower-tail probability P[F ≤ F_statistic].
- lower.tail = FALSE: returns the upper-tail probability P[F > F_statistic], which is the p-value of the F-test.
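The two tails are complementary, so for any F value the lower-tail and upper-tail probabilities returned by pf() add up to 1. The following minimal sketch checks this using the values from the example that follows (F = 7 with 4 and 5 degrees of freedom). The variable names are illustrative only.

```r
# Upper-tail probability P[F > 7]: the p-value of the F-test
upper <- pf(7, df1 = 4, df2 = 5, lower.tail = FALSE)
# Lower-tail probability P[F <= 7] (lower.tail = TRUE is the default)
lower <- pf(7, df1 = 4, df2 = 5)
# The two tails cover the whole distribution, so they sum to 1
print(c(upper = upper, lower = lower, total = upper + lower))
```

Computing 1 - pf(7, 4, 5) gives the same value here. However, it can lose precision when the p-value is extremely small, which is why lower.tail = FALSE is usually preferred.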
Example:
Consider an example with the following values:
- fstat: 7
- df1: 4
- df2: 5
- lower.tail = FALSE
R
pf(7, 4, 5, lower.tail = FALSE)
Output:
Hence, the p-value associated with this F-statistic is approximately 0.0279. The F-test is also used to test the overall significance of a regression model.
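The p-value is the area under the F density to the right of the observed statistic. This means it can be cross-checked by numerically integrating the density function df() from 7 to infinity with base R's integrate(). This is only a sanity check; pf() remains the direct way to get the value.

```r
# Area under the F(4, 5) density to the right of the observed statistic
tail_area <- integrate(df, lower = 7, upper = Inf, df1 = 4, df2 = 5)$value
# Exact upper-tail probability from the cumulative distribution function
p_value <- pf(7, df1 = 4, df2 = 5, lower.tail = FALSE)
# Both are approximately 0.0279
print(c(integral = tail_area, pf = p_value))
```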
Computing the p-value from the F-statistic of a regression model
Suppose we have a dataset that records the total distance traveled, the total emissions generated, and the mileage obtained:
R
# Create a dataset
dataset <- data.frame(distance = c(112, 217, 92, 98, 104),
                      emission = c(4.5, 9.8, 12.1, 3.2, 7.6),
                      mileage = c(15, 12, 16, 19, 21))
# Display the dataset
dataset
Output:
Now we can fit a linear regression model to this data, using distance and emission as the predictor variables and mileage as the response variable. R provides the lm() function for fitting linear models. It has the following syntax:
Syntax: lm(formula, data)
Parameters:
- formula: the formula describing the linear model, for example mileage ~ distance + emission.
- data: the data frame that contains the variables used in the formula.
To print the summary of the linear model, we can use the summary() function. This function has the following syntax:
Syntax: summary(model)
Parameters: model: the fitted model object, such as the one returned by lm()
The complete source code is given below:
R
# Create a dataset
dataset <- data.frame(distance = c(112, 217, 92, 98, 104),
                      emission = c(4.5, 9.8, 12.1, 3.2, 7.6),
                      mileage = c(15, 12, 16, 19, 21))
# Fit a regression model
model <- lm(mileage ~ distance + emission, data = dataset)
# Display the output of the model
summary(model)
Output:
The F-statistic for the overall regression model is 1.321. It has 2 degrees of freedom in the numerator and 2 in the denominator, and its p-value is 0.4309.
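The overall F-statistic compares the variation explained by the predictors with the residual variation: F = (SSR / k) / (SSE / (n - k - 1)). Here k = 2 is the number of predictors and n = 5 is the number of observations, so both degrees of freedom equal 2. The following minimal sketch recomputes the statistic from the fitted model. It recreates the same dataset and model so that it runs on its own; the variable names sst, sse, ssr and f_stat are illustrative.

```r
# Recreate the data and the model so this block is self-contained
dataset <- data.frame(distance = c(112, 217, 92, 98, 104),
                      emission = c(4.5, 9.8, 12.1, 3.2, 7.6),
                      mileage = c(15, 12, 16, 19, 21))
model <- lm(mileage ~ distance + emission, data = dataset)

n <- nrow(dataset)  # number of observations (5)
k <- 2              # number of predictors (distance and emission)

sst <- sum((dataset$mileage - mean(dataset$mileage))^2)  # total sum of squares
sse <- sum(residuals(model)^2)                            # residual sum of squares
ssr <- sst - sse                                          # sum of squares explained by the model

df1 <- k            # numerator degrees of freedom
df2 <- n - k - 1    # denominator degrees of freedom

f_stat <- (ssr / df1) / (sse / df2)
print(f_stat)       # about 1.321, the value reported by summary(model)
```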
We can reproduce this p-value with pf() using the following code:
R
# Compute the p-value
pf(1.321, 2, 2, lower.tail = FALSE)
Output:
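The result, about 0.4308, closely matches the 0.4309 reported by summary(). The tiny difference comes from rounding the F-statistic to 1.321 before passing it to pf().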
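Copying the rounded F-statistic by hand can introduce a small rounding error. The following sketch avoids it by reading the unrounded statistic and its degrees of freedom directly from the fstatistic component of summary(). For lm fits, this component is a named vector with the elements value, numdf and dendf.

```r
# Recreate the data and the model so this block is self-contained
dataset <- data.frame(distance = c(112, 217, 92, 98, 104),
                      emission = c(4.5, 9.8, 12.1, 3.2, 7.6),
                      mileage = c(15, 12, 16, 19, 21))
model <- lm(mileage ~ distance + emission, data = dataset)

# Named numeric vector: value (F-statistic), numdf, dendf
f <- summary(model)$fstatistic

# Upper-tail probability computed from the unrounded F-statistic
p_value <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
print(p_value)
```

With 2 and 2 degrees of freedom, the F distribution has the closed-form upper tail 1 / (1 + F), so the value can also be checked by hand: 1 / (1 + 1.321) ≈ 0.4308.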