Prediction intervals
A prediction interval is an estimate of an interval in which a future observation will fall, with a certain probability, based on present or past samples.
Prediction intervals are useful for predicting, for a given x, the y value of the next experiment.
For instance, a 95% prediction interval is the range of y values, for a given x, that has a 95% probability of containing the y value of the next experiment.
Given a linear regression equation ŷ = mx + b and x0, a specific value of x, a c-prediction interval for y is
ŷ – E < y < ŷ + E
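where
E = tc · se · √(1 + 1/n + n(x0 – x̄)² / (nΣx² – (Σx)²))
Here tc is the critical value of the t-distribution with n – 2 degrees of freedom, se is the standard error of estimate, n is the number of data pairs, and x̄ is the mean of the x values.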
The point estimate is ŷ and the maximum error of estimate is E. The probability that the prediction interval contains y is c.
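As a rough sketch, the recipe above can be wrapped in a small Python function. The name prediction_interval and its arguments are illustrative and not part of the original notes. The sketch assumes the usual simple-regression formula E = tc · se · √(1 + 1/n + n(x0 – x̄)² / (nΣx² – (Σx)²)) with n – 2 degrees of freedom, and it expects the caller to look up tc in a t-table.

```python
import math

def prediction_interval(xs, ys, x0, t_c):
    """Return (y_hat, E, left, right) for a simple linear regression.

    Hypothetical helper: t_c must be supplied by the caller, taken from
    a t-table with n - 2 degrees of freedom.
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Least-squares slope m and intercept b of y_hat = m*x + b.
    denom = n * sum_x2 - sum_x ** 2
    m = (n * sum_xy - sum_x * sum_y) / denom
    b = sum_y / n - m * sum_x / n

    # Standard error of estimate: sqrt(unexplained variation / (n - 2)).
    unexplained = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(unexplained / (n - 2))

    # Point estimate and maximum error of estimate at x0.
    y_hat = m * x0 + b
    x_bar = sum_x / n
    E = t_c * s_e * math.sqrt(1 + 1 / n + n * (x0 - x_bar) ** 2 / denom)
    return y_hat, E, y_hat - E, y_hat + E
```

The interval widens as x0 moves away from x̄, because the (x0 – x̄)² term under the square root grows.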
Example. Construct a 95% prediction interval for the company sales when the advertising expenditures are $2100. What can you conclude?
Here’s the data.
| Advertising expenses (1000s of $), x | Company sales (1000s of $), y |
|---|---|
| 2.4 | 225 |
| 1.6 | 184 |
| 2.0 | 220 |
| 2.6 | 240 |
| 1.4 | 180 |
| 1.6 | 184 |
| 2.0 | 186 |
| 2.2 | 215 |
Here’s what we have figured out thus far…
r = 0.9129
m = 50.7287
b = 104.0608
x̄ = 1.975 (the mean of the x values)
ȳ = 204.25 (the mean of the y values)
The regression line is ŷ = mx + b = 50.7287x + 104.0608
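The summary values above can be reproduced from the table with a short Python sketch. The variable names are illustrative. Because the code carries full precision while the text rounds m before computing b, the results may differ from the listed figures in the fourth decimal place.

```python
import math

# Data from the table above (both columns in thousands of dollars).
x = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
y = [225, 184, 220, 240, 180, 184, 186, 215]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the means.
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)

m = ss_xy / ss_xx                     # slope
b = y_bar - m * x_bar                 # intercept
r = ss_xy / math.sqrt(ss_xx * ss_yy)  # correlation coefficient

print(f"x_bar = {x_bar:.3f}, y_bar = {y_bar:.2f}")
print(f"m = {m:.4f}, b = {b:.4f}, r = {r:.4f}")
```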
The total variation is 3813.5.
The explained variation is 3178.1502.
The unexplained variation is 635.3441.
r² = 0.8334
This means that 83.3395% of the variation of y can be explained by the relationship between x and y.
The remaining 16.6605% of the variation is unexplained and is due to other factors or to sampling error.
se = 10.2903
That means the standard error of estimate, i.e. the standard deviation of the observed company sales about the predicted value for a specific advertising expenditure, is about $10,290.30.
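As a check on the variation figures, here is a minimal Python sketch that recomputes them from the table. It assumes the rounded coefficients m = 50.7287 and b = 104.0608 quoted in the text, and the variable names are illustrative.

```python
import math

# Data from the table above (both columns in thousands of dollars).
x = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
y = [225, 184, 220, 240, 180, 184, 186, 215]

# Rounded regression coefficients as quoted in the text.
m, b = 50.7287, 104.0608

n = len(y)
y_bar = sum(y) / n
y_hat = [m * xi + b for xi in x]  # predicted sales for each x

total = sum((yi - y_bar) ** 2 for yi in y)                    # total variation
explained = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained variation
unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) # unexplained variation

r2 = explained / total                 # coefficient of determination
s_e = math.sqrt(unexplained / (n - 2)) # standard error of estimate

print(f"total = {total:.4f}")
print(f"explained = {explained:.4f}, unexplained = {unexplained:.4f}")
print(f"r^2 = {r2:.4f}, s_e = {s_e:.4f}")
```

Dividing by n – 2 rather than n reflects the two parameters (m and b) estimated from the data, which is also why the t-distribution used later has n – 2 degrees of freedom.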
That is already a lot of information.
Let’s continue and find a 95% prediction interval when the advertising expenditures are $2100.
We need some of the above information and these next few items.
c = 0.95
x0 = $2100; because x is measured in thousands of dollars, we use x0 = 2.1
n = 8
d.f. = n – 2 = 6
tc = 2.447. This comes from our t-distribution table, using the same critical values as for a confidence interval. Remember this is a t-distribution, so the degrees of freedom are very important.
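The table lookup can be mimicked in Python with a small dictionary. This is only a sketch: the dictionary holds the standard two-tailed critical values for c = 0.95 for 1 to 10 degrees of freedom, rounded to three decimals as in a printed t-table, and the names T_TABLE_95 and critical_t_95 are made up for illustration.

```python
# Two-tailed critical values t_c for c = 0.95, keyed by degrees of freedom.
# Only the first ten rows of a standard t-table are included here.
T_TABLE_95 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
}

def critical_t_95(n):
    """Critical value for a 95% interval in simple regression (d.f. = n - 2)."""
    df = n - 2
    if df not in T_TABLE_95:
        raise ValueError(f"d.f. = {df} is outside this small table")
    return T_TABLE_95[df]

t_c = critical_t_95(8)  # n = 8 data pairs, so d.f. = 6
print(t_c)
```

Note how quickly the critical value shrinks as the degrees of freedom increase; with few data pairs the interval is noticeably wider.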
Find the point estimate.
ŷ = 50.7287x + 104.0608
For x0 = 2.1, the point estimate ŷ = 50.7287(2.1) + 104.0608 = 210.5911
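Find E.
E = tc · se · √(1 + 1/n + n(x0 – x̄)² / (nΣx² – (Σx)²))
From the data, Σx = 15.8 and Σx² = 32.44, so
E = 2.447(10.2903) · √(1 + 1/8 + 8(2.1 – 1.975)² / (8(32.44) – (15.8)²)) ≈ 26.8576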
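The arithmetic for E can be checked with a few lines of Python. This sketch assumes the formula E = tc · se · √(1 + 1/n + n(x0 – x̄)² / (nΣx² – (Σx)²)) and plugs in the rounded values tc = 2.447 and se = 10.2903 from above. The variable names are illustrative.

```python
import math

# Values taken from the worked example above.
t_c = 2.447    # critical value for c = 0.95 with d.f. = 6
s_e = 10.2903  # standard error of estimate
x0 = 2.1       # advertising expenditure, in thousands of dollars

# x column of the data table (thousands of dollars).
x = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]

n = len(x)
sum_x = sum(x)
sum_x2 = sum(xi * xi for xi in x)
x_bar = sum_x / n

# Maximum error of estimate for a single new observation at x0.
E = t_c * s_e * math.sqrt(
    1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
)
print(f"E = {E:.4f}")
```

The leading 1 under the square root accounts for the scatter of an individual observation about the line, which is what makes a prediction interval wider than a confidence interval for the mean response.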
Find the interval.
Using ŷ = 210.5911 and E = 26.8576
ŷ – E < y < ŷ + E
210.5911 – 26.8576 < y < 210.5911 + 26.8576
Left endpoint 183.7335
Right endpoint 237.4487
Prediction Interval: (183.7335, 237.4487)
So, you can conclude with 95% confidence that when the advertising expenditures are $2100, the company sales will be between $183,733.50 and $237,448.70.
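A last small sketch turns the point estimate and E into the interval endpoints and converts them from thousands of dollars to dollars. It uses the rounded values ŷ = 210.5911 and E = 26.8576 from the example; the variable names are illustrative.

```python
# Rounded values from the worked example (thousands of dollars).
y_hat = 210.5911
E = 26.8576

# Endpoints of the prediction interval y_hat - E < y < y_hat + E.
left = y_hat - E
right = y_hat + E

# The y column is in thousands of dollars, so scale by 1000 for dollars.
left_dollars = left * 1000
right_dollars = right * 1000

print(f"Prediction interval: ({left:.4f}, {right:.4f})")
print(f"In dollars: ${left_dollars:,.2f} to ${right_dollars:,.2f}")
```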