Regression Analysis & Confidence Interval


  • 7/28/2019 Regression Analysis & Confidence Interval


    Regression analysis

    The standard mathematical way to present the relationship when Y is a linear function of X is:

    $$Y = a + bX$$

    In this relationship, Y is the dependent variable and X is the independent variable, while a represents the value at which the line intersects the Y axis on a graph (the intercept) and b is the slope coefficient of the line.

    Regression analysis is a statistical method for evaluating the relationship between different variables (and, under additional assumptions, the causal effect of one variable on another) by fitting a function to a given dataset. It is a powerful and flexible tool that allows one to separate the effects of many factors on a specific indicator.

    For example, GDP growth in a country depends, to different extents, on the unemployment rate, terms of trade, wage levels, consumption, etc. Regressions are often used to forecast values or to estimate values that are unobservable.
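To illustrate how a regression can separate the effects of several factors, the sketch below fits a model with two explanatory variables by solving the OLS normal equations (X'X)b = X'y in plain Python. The factor names (`unemployment`, `terms_of_trade`), the data, and the helper functions `solve` and `ols_multiple` are hypothetical and chosen only for illustration; in practice a numerical library would normally do this work.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to improve numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_multiple(rows, y):
    """Return [a, b1, ..., bk] minimizing the squared residuals of y = a + b1*x1 + ... + bk*xk."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 column carries the intercept a
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: growth generated exactly as 1.0 - 0.5*unemployment + 0.3*terms_of_trade.
unemployment = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
terms_of_trade = [100.0, 98.0, 103.0, 101.0, 97.0, 105.0]
growth = [1.0 - 0.5 * u + 0.3 * t for u, t in zip(unemployment, terms_of_trade)]

coef = ols_multiple(list(zip(unemployment, terms_of_trade)), growth)
print([round(c, 6) for c in coef])  # intercept, unemployment effect, terms-of-trade effect
```

Because the hypothetical data contain no noise, the fit recovers the generating coefficients (up to floating-point error), each one measuring the effect of its own factor with the other held fixed.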

    In the graph below, you can see a scatterplot where each blue dot represents an observation i with the given values for X (horizontal axis) and Y (vertical axis).

    The red line is fitted to the dataset using the Ordinary Least Squares (OLS) method. The OLS method estimates the parameters of the relationship (the intercept a and the slope coefficient b) so that the sum of the squared vertical deviations of the observations from the fitted line (the squared residuals) is minimized for the dataset.
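With a single independent variable, the OLS estimates have a well-known closed form: b = Σ(x_i - x̄)(y_i - ȳ) / Σ(x_i - x̄)² and a = ȳ - b·x̄. The minimal Python sketch below computes them and uses the fitted line for a forecast; the function names `ols_fit` and `ssr` and the sample data are hypothetical.

```python
def ols_fit(x, y):
    """Return (a, b) for the line y = a + b*x minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx           # slope coefficient
    a = y_bar - b * x_bar   # intercept: the OLS line passes through (x_bar, y_bar)
    return a, b


def ssr(x, y, a, b):
    """Sum of squared vertical deviations of the observations from the line y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical observations.
x = [10, 20, 30, 40, 50]
y = [15, 24, 38, 45, 58]
a, b = ols_fit(x, y)
print(f"a = {a:.2f}, b = {b:.2f}")              # a = 3.90, b = 1.07
print(f"forecast at x = 60: {a + b * 60:.2f}")  # forecast at x = 60: 68.10
```

Shifting either parameter away from the OLS values increases `ssr`, which is exactly the minimization property described above.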

    Because not all of the data points lie on the line, an error term ε must be added to the relationship equation, so that for each observation i:

    $$Y_i = a + bX_i + \varepsilon_i$$

    The error term captures the variation in Y that the line does not explain, and hence the uncertainty in estimating unknown values. It is assumed to be an independent and identically distributed (i.i.d.) random variable, meaning that the error terms across the sample are independent of one another (and therefore uncorrelated) and follow the same statistical distribution, which is assumed to have an expected value of zero.
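To see the role of the error term, the sketch below simulates data from Y_i = a + bX_i + ε_i with i.i.d. normal errors of mean zero and then refits the line by OLS. The true values a = 10 and b = 0.8, the noise level, the sample size and the function names are hypothetical choices for illustration.

```python
import random


def simulate(a=10.0, b=0.8, sigma=5.0, n=500, seed=42):
    """Draw (x, y) pairs from y = a + b*x + e with i.i.d. normal errors e ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 100) for _ in range(n)]
    y = [a + b * xi + rng.gauss(0.0, sigma) for xi in x]
    return x, y


def ols_fit(x, y):
    """Closed-form OLS estimates (intercept, slope) for a single independent variable."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b * x_bar, b


x, y = simulate()
a_hat, b_hat = ols_fit(x, y)
residuals = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]
print(f"a_hat = {a_hat:.2f}, b_hat = {b_hat:.3f}")
print(f"mean residual = {sum(residuals) / len(residuals):.1e}")  # ~0 up to rounding
```

The estimates land close to, but not exactly on, the true parameters because of the random errors. The fitted residuals average to zero by construction whenever the model includes an intercept; note that the residuals are estimates of the unobservable errors, not the errors themselves.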


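    [Figure: scatterplot of the observations (blue dots) with the fitted OLS line (red); X on the horizontal axis and Y on the vertical axis, both running from 0 to 100.]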

    A common regression model in finance is the market model, which states that the return on a financial asset depends on the return on the market and on the company's riskiness:

    $$R_i = \alpha + \beta R_m + \varepsilon$$

    Thus, the return on the asset, $R_i$, is the dependent variable and the return on the market portfolio, $R_m$, is the independent variable. The intercept is denoted by alpha (α), the slope coefficient by beta (β) and the error term by epsilon (ε). Beta measures the responsiveness of the stock's return to the return on the market portfolio and is also considered to be a measure of (systematic) risk. If the company is listed and its stock returns are observable, then the riskiness of the company relative to the market can be estimated by running a regression. Usually, the market return is proxied by the return on a market index.
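Because the market model is a simple regression of R_i on R_m, its OLS estimates reduce to β = Cov(R_i, R_m) / Var(R_m) and α = mean(R_i) - β·mean(R_m). The sketch below applies these formulas; the function name `market_model` and the monthly return series are hypothetical, not real market data.

```python
def market_model(asset_returns, market_returns):
    """Estimate (alpha, beta) of R_i = alpha + beta * R_m + e by OLS."""
    n = len(market_returns)
    mean_i = sum(asset_returns) / n
    mean_m = sum(market_returns) / n
    cov_im = sum((ri - mean_i) * (rm - mean_m)
                 for ri, rm in zip(asset_returns, market_returns)) / (n - 1)
    var_m = sum((rm - mean_m) ** 2 for rm in market_returns) / (n - 1)
    beta = cov_im / var_m           # sensitivity of the asset's return to market moves
    alpha = mean_i - beta * mean_m  # return not explained by the market
    return alpha, beta


# Hypothetical monthly returns (as decimals) for a stock and for a market index.
market = [0.012, -0.020, 0.031, 0.005, -0.011, 0.024, -0.006, 0.017]
stock = [0.018, -0.031, 0.040, 0.004, -0.019, 0.035, -0.012, 0.020]
alpha, beta = market_model(stock, market)
print(f"alpha = {alpha:.4f}, beta = {beta:.2f}")
```

A beta above 1 indicates a stock whose returns tend to amplify market movements; a beta below 1 indicates one that tends to dampen them.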

    Confidence interval

    The confidence interval indicates the reliability of an estimate. It is the set of values reasonably consistent with the observed results. Strictly speaking, a 99% confidence interval is produced by a procedure that, over repeated samples, yields intervals containing the true value of the estimated parameter 99% of the time.

    With less rigor, it is common to say that the confidence interval represents the range of values within which we are 99% certain to find the true value we are looking for. On that reading, we can accept the estimate with only a 1% risk that the true value lies outside the interval.
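For a regression slope, a confidence interval is usually built as the estimate plus or minus a critical value times its standard error, b ± c·SE(b), where SE(b) = sqrt(s² / Σ(x_i - x̄)²) and s² = SSR / (n - 2). The sketch below assumes a large sample and uses the standard normal critical value from Python's `statistics.NormalDist` (about 2.576 for 99%); with small samples the exact interval uses Student's t distribution with n - 2 degrees of freedom, which gives a wider interval. The function name `slope_confidence_interval` and the simulated data are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def slope_confidence_interval(x, y, level=0.99):
    """Return (b, lower, upper): OLS slope and a large-sample normal-approximation CI."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s2 = ssr / (n - 2)                         # estimated error variance
    se_b = sqrt(s2 / sxx)                      # standard error of the slope
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    return b, b - z * se_b, b + z * se_b


# Hypothetical data: y = 10 + 0.8*x plus i.i.d. normal noise.
rng = random.Random(7)
x = [rng.uniform(0, 100) for _ in range(200)]
y = [10 + 0.8 * xi + rng.gauss(0, 5) for xi in x]

for level in (0.95, 0.99):
    b, lo, hi = slope_confidence_interval(x, y, level)
    print(f"{level:.0%} CI for b: [{lo:.3f}, {hi:.3f}] (estimate {b:.3f})")
```

The 99% interval comes out wider than the 95% one: demanding more confidence costs precision.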


    The confidence interval gives an indication of the uncertainty of the estimate. Any value within the confidence interval is not statistically significantly different from the observed result; the upper and lower bounds are, by definition, the limit values. If a hypothesized true value lies above the upper limit or below the lower limit, then it is statistically significantly different from the estimate, and the hypothesis that it is the true value can be rejected at the corresponding significance level (for example, 1% for a 99% interval).
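The link between the interval and statistical significance can be checked directly: under a normal approximation, a hypothesized value lies outside the two-sided interval at a given confidence level exactly when a two-sided test rejects it at the complementary significance level. The sketch below uses a hypothetical beta estimate of 1.3 with a hypothetical standard error of 0.1 and asks whether the true beta could plausibly be 1; all function names are illustrative.

```python
from statistics import NormalDist


def confidence_interval(estimate, std_error, level=0.95):
    """Two-sided normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


def two_sided_p_value(estimate, std_error, hypothesized):
    """p-value of a two-sided z-test of 'true value == hypothesized'."""
    z = abs(estimate - hypothesized) / std_error
    return 2 * (1 - NormalDist().cdf(z))


def significantly_different(estimate, std_error, hypothesized, level=0.95):
    """True if the hypothesized value falls outside the confidence interval."""
    lower, upper = confidence_interval(estimate, std_error, level)
    return hypothesized < lower or hypothesized > upper


beta_hat, se = 1.3, 0.1  # hypothetical estimate and standard error
print(confidence_interval(beta_hat, se))           # roughly (1.104, 1.496)
print(significantly_different(beta_hat, se, 1.0))  # True: 1.0 lies outside
print(significantly_different(beta_hat, se, 1.2))  # False: 1.2 lies inside
```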

    The confidence level (or confidence coefficient) gives the frequency with which intervals constructed in this way contain the sought-after parameter over repeated samples; the 95% level is the most commonly used.
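The frequency interpretation of the confidence level can be checked by simulation: draw many samples from a population whose true mean is known, build an interval from each, and count how often the intervals contain the true mean. The sketch below assumes normal data with a known standard deviation, so the z-interval x̄ ± z·σ/√n has exactly the nominal coverage; the function name `coverage` and all parameter values are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(level=0.95, true_mean=10.0, sigma=2.0, n=25, trials=2000, seed=1):
    """Fraction of simulated z-intervals that contain the known true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / sqrt(n)  # sigma is assumed known, so no estimation is needed
    hits = 0
    for _ in range(trials):
        sample_mean = fmean([rng.gauss(true_mean, sigma) for _ in range(n)])
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / trials


print(f"95% intervals: {coverage(0.95):.1%} contain the true mean")
print(f"99% intervals: {coverage(0.99):.1%} contain the true mean")
```

Any single interval either contains the true mean or does not; the confidence level describes the long-run hit rate of the procedure, which the simulated fractions approximate.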