7/28/2019 Regression Analysis & Confidence Interval
Regression analysis & Confidence Interval
Regression analysis
The standard mathematical way to present the relationship when Y is a linear function of X is:

$$Y = a + bX$$

In this relationship, Y is the dependent variable and X is the independent variable, while a represents the value at which the line intersects the Y axis on a graph (the intercept) and b is the slope coefficient of the line.
Regression analysis is a statistical method for evaluating the relationship between different variables (and, under further assumptions, causal effects) by fitting a function to a given dataset. It is a powerful and flexible tool that allows the effect of many factors on a specific indicator to be disentangled.
For example, GDP growth in a country depends, to differing extents, on the unemployment rate, the terms of trade, wage levels, consumption, etc. Regressions are often used to forecast or to estimate values that are unobservable.
In the graph below, you can see a scatterplot where each blue dot represents an observation i with the given values for X (horizontal axis) and Y (vertical axis).
The red line is fitted to the dataset using the Ordinary Least Squares (OLS) method. The OLS method estimates the parameters of the relationship (the intercept a and the slope coefficient b) so that the sum of the squared vertical deviations of the observations from the fitted line is minimized.
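For a single independent variable, the OLS minimization has a well-known closed-form solution: the slope is the sum of cross-deviations of X and Y divided by the sum of squared deviations of X, and the intercept makes the line pass through the point of means. The sketch below assumes that closed form; the function name `ols_fit` and the sample data are hypothetical, chosen only for illustration.

```python
def ols_fit(xs, ys):
    """Return (a, b) minimizing sum((y - (a + b*x))**2) over the data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations of X and Y, and sum of squared deviations of X.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X must vary to estimate a slope")
    b = sxy / sxx              # slope coefficient
    a = y_mean - b * x_mean    # intercept: the line passes through (x_mean, y_mean)
    return a, b


# Hypothetical observations scattered around a rising line.
xs = [10, 20, 30, 40, 50, 60]
ys = [18, 35, 41, 62, 70, 88]
a, b = ols_fit(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")
```

One property worth noting: when an intercept is included, the OLS residuals always sum to zero, so the fitted line is "centered" on the data.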
Because not all of the data points lie on the line, an error term must be added to the relationship equation to reflect the uncertainty in the estimation of unknown values. This error term is assumed to be an independent and identically distributed (i.i.d.) random variable, meaning that the error terms across the sample are independent of one another (and hence uncorrelated) and have the same statistical distribution, with an expected value of zero.
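In symbols, one standard way to write the relationship with the error term is shown below. The observation subscript i, the sample size n, and the error variance σ² are notation added here for illustration; the last line restates the OLS criterion described above, with hats marking the estimated parameters.

```latex
\begin{aligned}
Y_i &= a + b X_i + \varepsilon_i, \qquad i = 1, \dots, n, \\
&\varepsilon_1, \dots, \varepsilon_n \ \text{i.i.d.}, \qquad
\mathbb{E}[\varepsilon_i] = 0, \qquad
\operatorname{Var}(\varepsilon_i) = \sigma^2, \\
(\hat{a}, \hat{b}) &= \arg\min_{a,\,b} \sum_{i=1}^{n} \left( Y_i - a - b X_i \right)^2 .
\end{aligned}
```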
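[Figure: scatterplot of the observations (blue dots), with X on the horizontal axis and Y on the vertical axis, both running from 0 to 100, and the fitted OLS regression line in red]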
A common regression model in finance is the market model (the formula below), which states that the return on a financial asset depends on the return on the market and on the company's riskiness:

$$R_i = \alpha + \beta R_m + \varepsilon$$

Thus, the return on the asset, R_i, is the dependent variable and the return on the market portfolio, R_m, is the independent variable. The intercept is denoted by alpha (α), the slope coefficient by beta (β), and the error term by epsilon (ε). Beta measures the responsiveness of the stock to the market portfolio and is also considered to be a measure of risk. If the company is listed and its stock returns are observable, then the riskiness of the company relative to the market can be estimated by running a regression. Usually, the market return is proxied by the return on a market index.
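Estimating the market model is an ordinary single-variable OLS regression of the stock's returns on the index returns, so beta equals the covariance of the two return series divided by the variance of the market returns. The sketch below assumes simple periodic returns expressed as decimals and uses an index as the market proxy; the function name `market_model` and all return figures are hypothetical.

```python
def market_model(stock_returns, market_returns):
    """Estimate (alpha, beta) in R_stock = alpha + beta * R_market + error by OLS."""
    n = len(market_returns)
    if n != len(stock_returns) or n < 2:
        raise ValueError("need two equal-length return series with at least 2 periods")
    m_mean = sum(market_returns) / n
    s_mean = sum(stock_returns) / n
    # beta = Cov(R_stock, R_market) / Var(R_market); the 1/(n - 1) factors cancel.
    cov = sum((m - m_mean) * (s - s_mean) for m, s in zip(market_returns, stock_returns))
    var = sum((m - m_mean) ** 2 for m in market_returns)
    if var == 0:
        raise ValueError("market returns must vary to estimate beta")
    beta = cov / var
    alpha = s_mean - beta * m_mean  # intercept of the fitted line
    return alpha, beta


# Hypothetical monthly returns for a market index and a single stock.
market = [0.012, -0.020, 0.031, 0.005, -0.011, 0.024]
stock = [0.018, -0.035, 0.047, 0.004, -0.019, 0.040]
alpha, beta = market_model(stock, market)
print(f"alpha = {alpha:.4f}, beta = {beta:.2f}")
```

With these made-up numbers the estimated beta is above 1, meaning the stock has tended to move more than one-for-one with the index. In practice, a much longer return history would be used than the six periods shown here.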
Confidence interval
The confidence interval indicates the reliability of an estimate. It is the set of values reasonably consistent with the observed results. For example, a 99% confidence interval is produced by a procedure that, if the sampling were repeated many times, would yield intervals containing the true value of the estimated variable 99% of the time.
With less rigor, it is possible to say that the confidence interval represents the range of values within which we are 99% confident of finding the true value we are looking for. Put differently, the procedure fails to capture the true value only 1% of the time, which is why values lying outside a 99% interval are regarded as implausible given the observed data.
The confidence interval gives an indication of the uncertainty of the estimate. Any value within the confidence interval is not statistically significantly different from the observed result; the upper and lower bounds are, by definition, the limit values. If a hypothesized true value lies above the upper limit or below the lower limit, it is statistically significantly different from the estimate (at the 1% level for a 99% interval) and can be rejected.
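Applied to a regression, the interval is usually built around the estimated slope b as b plus or minus a critical value times the standard error of b. The sketch below assumes i.i.d. errors and, to stay within Python's standard library, uses the normal critical value from `statistics.NormalDist` in place of the Student t value; this is a large-sample approximation, and with few observations the t-based interval is wider and should be preferred. The function names `slope_confidence_interval` and `differs_significantly` and the data are hypothetical.

```python
import math
from statistics import NormalDist


def slope_confidence_interval(xs, ys, level=0.95):
    """Return (b, lower, upper) for the OLS slope, using a normal approximation."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    a = y_mean - b * x_mean
    # Residual variance estimate uses n - 2 degrees of freedom (a and b were fitted).
    ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt(ssr / (n - 2) / sxx)
    # Two-sided critical value: about 1.96 for 95% and 2.58 for 99%.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return b, b - z * se_b, b + z * se_b


def differs_significantly(value, lower, upper):
    """True if a hypothesized value lies outside the interval [lower, upper]."""
    return value < lower or value > upper


# Hypothetical data; test whether a slope of zero (no relationship) is plausible.
xs = [10, 20, 30, 40, 50, 60]
ys = [18, 35, 41, 62, 70, 88]
b, lo95, hi95 = slope_confidence_interval(xs, ys, 0.95)
_, lo99, hi99 = slope_confidence_interval(xs, ys, 0.99)
print(f"b = {b:.3f}; 95% CI [{lo95:.3f}, {hi95:.3f}]; 99% CI [{lo99:.3f}, {hi99:.3f}]")
print("zero slope rejected at the 5% level:", differs_significantly(0.0, lo95, hi95))
```

Note that the 99% interval is always wider than the 95% interval for the same data: demanding more confidence costs precision.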
The confidence level (or confidence coefficient) gives the long-run frequency with which intervals constructed in this way contain the sought-after parameter; the 95% level is the most commonly used.
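This frequency interpretation can be checked by simulation: draw many samples, build an interval from each, and count how often the intervals contain the true value. The sketch below assumes normally distributed data with a known standard deviation, so that the z-interval for the mean is exact; the function name `coverage_rate`, the true mean, the sample size, and the number of trials are arbitrary illustrative choices.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage_rate(true_mean=5.0, sigma=2.0, n=30, level=0.95, trials=10_000, seed=1):
    """Fraction of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / math.sqrt(n)  # known sigma: same width in every trial
    hits = 0
    for _ in range(trials):
        sample_mean = fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / trials


print(f"share of 95% intervals containing the true mean: {coverage_rate():.3f}")
print(f"share of 99% intervals containing the true mean: {coverage_rate(level=0.99):.3f}")
```

Each individual interval either contains the true mean or it does not; the confidence level describes how often the procedure succeeds across repetitions, which is why the printed shares come out close to 0.95 and 0.99.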