Summary on various statistical models and measures

Lecture 1
Scientific learning: Type 1 vs Type 2
Type 1: association/description of facts
Type 2: causal relations: one variable influencing another
Ideal experiment: randomized controlled trial (RCT) -> control vs treatment groups. The difference between the sample means of the two groups is an estimate of the causal effect. However, an RCT is very expensive and difficult to execute.

Analysis based on data -> different kinds of data: observational versus experimental data.
Experimental data are designed to evaluate a treatment/policy etc.
Observational data are not designed specifically for estimating causal effects (Lectures 9/10).

Data has a time dimension:
- Time series: look at the same subject over the course of a period
- Cross sections: look at different subjects at the same point in time
If we combine the two we get a data set that consists of so-called panel data.
Ecological fallacy: erroneously drawing conclusions about individuals solely from observations at higher levels of aggregation.
Conceptualization: the process through which we specify what we mean when we use particular terms in research (difficult in the social sciences, e.g. specifying happiness).
Operationalization: the development of specific research procedures that will result in empirical observations representing those concepts in the real world (measuring a theoretical concept).
Grossman model: being healthy feels good and creates a motive for people to invest in health, as it enables them to enjoy leisure time better and work harder. Individuals who earn higher wages have a higher opportunity cost of time and consequently are more likely to invest in their health.
Operationalization: criteria for measurement quality
- Reliability: the quality of a measurement method that suggests the same data would be collected each time in repeated observations of the same phenomenon. Cronbach's alpha (0-1): higher values imply higher reliability. However, it says nothing about validity.
- Validity: a term that describes how accurately a measure reflects the concept it is intended to measure.

See the slides for the shooting-range example (illustrating reliability versus validity).

Lecture 2
Linear regression models are suitable for a continuous dependent variable (Y) related to any kind of independent variable (X).
Covariance: tells us whether X and Y tend to move in the same or in opposite directions.
Correlation: always between -1 and 1; it expresses the strength of the linear relationship between X and Y.
Linear regression tries to assess a cause-and-effect relationship and to quantify this relationship. It is bound by several assumptions. Linear model:
Y = beta0 + beta1*X + u

The error term u expresses the vertical distance between the point predicted by the model and the actual observation. The true betas are unknown, so we need to estimate them through a best-fit line. To do so we use a method called Ordinary Least Squares (OLS), which chooses the estimates beta0(hat) and beta1(hat) that minimize the sum of squared residuals:
minimize over b0, b1: sum over i of (Y(i) - b0 - b1*X(i))^2

The R^2 expresses how well the model fits the data. It tells us how much of the variance in Y is explained by the model.
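As an illustration of the mechanics, here is a minimal sketch in Python (assuming the numpy and statsmodels packages; the data are simulated, not from the course):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=200)                 # regressor X
u = rng.normal(size=200)                 # error term u
y = 2.0 + 0.5 * x + u                    # "true" model: beta0 = 2, beta1 = 0.5

X = sm.add_constant(x)                   # add intercept column
res = sm.OLS(y, X).fit()                 # OLS: minimizes the sum of squared residuals

print(res.params)                        # estimated beta0(hat), beta1(hat)
print(res.rsquared)                      # R^2: share of the variance in Y explained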

Assumptions of OLS:
- Zero conditional mean: the mean of the error term (u), given any value of X, is zero.
- Y and X are independently and identically distributed (i.i.d.) (basically, this means that we need a random sample).
- Large outliers are unlikely to occur.

The zero conditional mean assumption holds if:
1) X is random, or
2) we believe that X is uncorrelated with the other factors that influence Y (which basically implies that X is as good as random).

The i.i.d. assumption on Y and X holds when the sample is drawn using random sampling. It does not hold when we use dependent observations, such as a sample that consists of the same unit of observation over time (basically, it only works with cross-sectional data, not with time-series/panel data).
Large outliers are unlikely if X and Y have finite fourth moments.
The estimators beta0(hat) and beta1(hat) are computed from a random sample and therefore are random variables with a probability distribution themselves.
The intercept of the model (beta0) is the average Y when X is zero. However, it might not be useful to interpret the intercept when X = 0 is not a realistic data point.
There are also binary regressors, for which the only values of X are 0 or 1. The average Y when X = 0 is then beta0, and when X = 1 the average Y is beta0 + beta1.

Lecture 4
Because the betas are just estimates of the effect of one extra unit of X on Y, different samples will yield different estimates. If we could draw all possible random samples, on average we would find the true values of beta0 and beta1. However, this is impossible.
In general we test the hypothesis that beta1 equals some arbitrary value, and we reject this null hypothesis when our estimate is sufficiently far away from that value.
To test hypotheses we use the p-value: given that the null hypothesis is true, the p-value expresses the probability of drawing a sample that yields an estimate at least as far from the hypothesized value as the one we actually got.

To test H0: beta1 = beta1,0 we compute the t-statistic t = (beta1(hat) - beta1,0) / SE(beta1(hat)); for a two-sided test the p-value is 2*Phi(-|t|), where Phi is the standard normal cdf.
The same procedure applies to one-sided hypotheses concerning beta1: the t-statistic is the same, but the p-value is halved; it equals Phi(t) or 1 - Phi(t), depending on the direction of the alternative.
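A small sketch of the computation (Python with scipy; the numbers are illustrative, not from the slides):

from scipy.stats import norm

beta1_hat, se_beta1 = 0.42, 0.15      # hypothetical estimate and standard error
beta1_null = 0.0                      # value under H0

t = (beta1_hat - beta1_null) / se_beta1
p_two_sided = 2 * norm.cdf(-abs(t))   # two-sided p-value
p_one_sided = 1 - norm.cdf(t)         # one-sided p-value for Ha: beta1 > 0

print(t, p_two_sided, p_one_sided)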

A confidence interval contains the true value of beta in x% of the cases (x arbitrary, usually 90, 95 or 99). Alternative interpretation of a CI: the set of all values that cannot be rejected using a two-sided hypothesis test at the (100 - x)% significance level.

Homoskedasticity
Variance of the error term: even if it depends on X, the OLS estimators beta0(hat) and beta1(hat) remain unbiased and consistent. The heteroskedasticity-robust standard errors used are then still valid, and so are the hypothesis tests, p-values and confidence intervals.
Homoskedasticity means that the variance of the error term is constant for every X. If this holds, the formulas for the standard errors can be simplified and the OLS estimator is efficient among all linear unbiased estimators.

Lecture 5
Recap: linear regression.
How confident can we be that beta(hat) (the OLS estimator) is unbiased? E.g. if we measure the influence of height on income, maybe height is also correlated with intelligence, and intelligence is also correlated with income. If this is the case, the error term includes intelligence, and the model could be rewritten as:
Income = beta0 + beta1*Height + beta2*Intelligence + v

If we have access to data on the omitted independent variable, then we can simply include this variable in the model and consider the more complete model. In this case, the added omitted variable can also be called a control variable (see the sketch below).
Assumptions of the multiple regression model:
1. Zero conditional mean
2. X and Y i.i.d.
3. Large outliers are unlikely
4. No perfect multicollinearity (new assumption for multiple regression)
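A simulated sketch (Python, statsmodels; the variable names are made up for illustration) of how adding the omitted variable as a control changes the coefficient of interest:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1000
intelligence = rng.normal(size=n)
height = 0.5 * intelligence + rng.normal(size=n)     # height correlated with intelligence
income = 1.0 + 0.3 * height + 0.8 * intelligence + rng.normal(size=n)

# Short model: intelligence is omitted -> coefficient on height is biased upward
short = sm.OLS(income, sm.add_constant(height)).fit()

# Long model: intelligence included as a control variable
X = sm.add_constant(np.column_stack([height, intelligence]))
long = sm.OLS(income, X).fit()

print(short.params[1])   # biased estimate of the height effect
print(long.params[1])    # close to the true value 0.3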

The OLS estimators jointly follow a multivariate normal distribution (in large samples). Each beta_j(hat) is then normally distributed:
beta_j(hat) ~ N(beta_j, var(beta_j(hat)))

How well does the multiple regression model fit the data?
The R^2 again measures the share of the variance in Y explained by the model, but it never decreases when an extra regressor is added, even if that regressor is irrelevant. Therefore we use the adjusted R^2, which penalizes the number of regressors k:
adjusted R^2 = 1 - [(n - 1) / (n - k - 1)] * (SSR / TSS)
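A quick simulated check (Python, statsmodels) that R^2 rises when a pure-noise regressor is added, while the adjusted R^2 need not:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 100
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)
noise = rng.normal(size=n)               # regressor unrelated to y

m1 = sm.OLS(y, sm.add_constant(x)).fit()
m2 = sm.OLS(y, sm.add_constant(np.column_stack([x, noise]))).fit()

print(m1.rsquared, m2.rsquared)           # R^2 always (weakly) increases
print(m1.rsquared_adj, m2.rsquared_adj)   # adjusted R^2 applies a penalty and can decrease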

Things to keep in mind when measuring goodness-of-fit:
- If the dependent variable is measured in different units or defined differently across models, you cannot compare the measures of fit.
- Measures of fit only express how well the model explains the variance of Y (predicts Y); they are silent about whether the assumptions are violated or not.

Lecture 6
If a regressor is correlated with the error term, its estimator is biased. To test the significance of a single coefficient in a multiple regression model we again use the t-statistic:
t = beta_j(hat) / SE(beta_j(hat))   (for H0: beta_j = 0)

However, it is also possible to test multiple hypotheses at once, so-called joint hypotheses. These have a null hypothesis consisting of multiple restrictions (beta1 = 0, beta2 = 0, etc.) and an alternative hypothesis stating that at least one of the restrictions does not hold.
Please note: because we have two (or more) coefficients, their estimators follow a bivariate (multivariate) normal distribution, and we should use this joint distribution instead of the individual normal distributions of the separate coefficients. If we were to use separate tests and reject H0 whenever |t1| > 1.96 or |t2| > 1.96, the size of the test would exceed 5% (a true null hypothesis is rejected too often). Therefore we use an F-test.

So, for two restrictions the F-statistic combines the two t-statistics and their estimated correlation rho(hat):
F = (1/2) * (t1^2 + t2^2 - 2*rho(hat)*t1*t2) / (1 - rho(hat)^2)
If the t-statistics are uncorrelated (rho(hat) = 0), this simplifies to F = (t1^2 + t2^2) / 2, which is already a much simpler formula than the original one. The simplest F-test is the one with only a single restriction, in which case F = t^2.
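In practice the F-statistic comes from the estimation output; a sketch with statsmodels (simulated data; "x1" and "x2" are the default names statsmodels gives unnamed regressors):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.2 * x1 + 0.0 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
res = sm.OLS(y, X).fit()

# Joint test H0: beta1 = 0 and beta2 = 0 (the coefficients on x1 and x2)
print(res.f_test("x1 = 0, x2 = 0"))
print(res.fvalue, res.f_pvalue)   # F-test of overall regression (all slope coefficients zero)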

A special variation of the F-test is the F-test of overall regression, which tests the hypothesis that all coefficients (except the constant/intercept) are zero (= none of the regressors explain any variation in the dependent variable).
So, if we want to test a joint hypothesis we should not simply use the separate t-statistics. If we only have access to t-values, however, we can use the Bonferroni method, which uses special critical values to make sure the test has the right size and significance level.
We also adapt the first assumption of the multiple regression model: zero conditional mean becomes conditional mean independence, meaning that given the control variables, the error term has the same mean regardless of the variable of interest:
E[u | X1, X2] = E[u | X2]   (X1 = variable of interest, X2 = control variable)

Of course, this adaptation has several implications. We can now interpret the effect of income as causal, but not that of education, because the estimator of education can capture other factors related to education (time preferences etc.). This is called partial association. However, if the only variable of interest is income, this is not a problem: we use education as a control variable and keep it constant.
The effect of income still suffers from omitted variable bias if conditional mean independence does not hold (that is, if the correlation between income and the error term u is not 0). This occurs if we are still omitting other factors that influence smoking and are correlated with income. A possible solution is to add other control variables and check whether the effect is robust: if the estimate does not change (significantly), there is no sign of OVB.

Lecture 7
Excluding a variable from a multiple regression model (e.g. gender) biases the returns to education if males have higher labor incomes than females and years of education differ across genders. (The example in the slides shows that males have a higher labor income on average, but education does not differ across genders => no OVB.)

Non-linear regression functions
- The effect of a variable depends on the size of the variable (diminishing returns etc.)
- Transformation into natural logarithms
- Interaction effects
Why would we transform the dependent variable into natural logarithms?
=> Reason 1: large outliers (one of the assumptions of OLS) become even more unlikely after the transformation.
=> Reason 2: it converts changes in the variable into percentage changes, so the effect of a regressor on the dependent variable is not constant but proportional to the dependent variable.

Note: if we transform the dependent variable into natural logarithms, it is called a log-linear model; a one-unit increase in X is then associated with an (approximately) 100*beta1 percent change in Y. If we instead transform the independent variable, we speak of a linear-log model, and the interpretation is exactly the opposite: a one percent increase in the independent variable leads to an absolute change of beta1/100 in the dependent variable. (Do not compare the R^2 of models with different dependent variables.)
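A brief sketch (Python, statsmodels; simulated wage/education-style data with made-up names) of fitting and reading a log-linear model:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 1000
educ = rng.uniform(8, 18, size=n)                       # years of education
wage = np.exp(1.0 + 0.08 * educ + rng.normal(scale=0.3, size=n))

# Log-linear model: ln(wage) = beta0 + beta1*educ + u
res = sm.OLS(np.log(wage), sm.add_constant(educ)).fit()

beta1 = res.params[1]
print(beta1)                                            # about 0.08
print(100 * beta1, "% higher wage per extra year of education (approximately)")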

It is also possible to transform all variables into natural logarithms -> log-log model, in which beta1 is an elasticity: a one percent increase in X is associated with a beta1 percent change in Y. It is possible to compare the R^2 of a log-linear model to that of a log-log model (both have ln(Y) as dependent variable).

Interaction effects
Are the returns to education (for instance) similar across genders? Interactions can be formed between both continuous and binary variables.
OLS cannot be used if the model is not linear in the parameters, e.g. when one part of the model is e^beta2.

Lecture 8
Association versus causality: only causal effects should lead to policy prescriptions. How do we determine when a regression analysis provides an estimate of a causal effect?

Internal validity
A study is internally valid if its statistical inferences (conclusions) about causal effects are valid for the population and setting studied. Basically: can we trust the findings and conclusions of the research?
Threats (unbiased and consistent estimators are important):
- OVB
- Errors in variables
- Sample selection
- Simultaneous causality
- Functional form misspecification

1. OVB

The OLS estimator of the variable of interest should be unbiased and consistent -> conditional mean independence (1st assumption of OLS). This assumption is violated when there is an omitted variable that correlates with the variable of interest and has an influence on the dependent variable.

If the omitted variables or adequate controls are not available:
-> Use panel data (repeated observations on the same unit)
-> Instrumental variables regression
-> Research design: randomized controlled trial (RCT) / quasi-experimental approach

2. Errors in variables
Arises when the independent variable is measured imprecisely. If, for instance, among lower-educated people there is more variation in the reported years it took them to get their degree, the conditional mean independence assumption is violated (the measurement error depends on X) = non-random measurement error.
Classical (random) measurement error -> downward (attenuation) bias in the coefficient of the predictor:
beta1(hat) -> beta1 * var(X) / (var(X) + var(w))   in large samples, where w is the measurement error
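A small simulation (Python, numpy/statsmodels) of classical measurement error and the resulting attenuation bias:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 5000
x_true = rng.normal(scale=2.0, size=n)            # true regressor, var = 4
y = 1.0 + 0.5 * x_true + rng.normal(size=n)
x_obs = x_true + rng.normal(scale=2.0, size=n)    # observed with error, var(w) = 4

res = sm.OLS(y, sm.add_constant(x_obs)).fit()
print(res.params[1])          # around 0.25 = 0.5 * 4 / (4 + 4): attenuated toward zero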

Random measurement error in the dependent variable does not lead to bias, but it reduces precision; this requires no particular solution. In all other cases, we should either use instrumental variables or a mathematical correction for the measurement error.

3. Sample selection bias
If data are missing at random, or missing based on a regressor, this does not result in a biased model. However, if data are missing based on the dependent variable (conditional on the regressors), the missing data depend on the value of the error term, which results in sample selection bias (and violates the first assumption of OLS).

4. Simultaneous causality
This occurs when causality runs from the regressors to the dependent variable and vice versa; also known as reverse causality. If the effect of the dependent variable on the regressor is negative, the causal effect is underestimated; if that effect is positive, the causal effect is overestimated.

5. Functional form misspecification
Using the wrong mathematical form for the model leads to bias and inconsistency; it can be viewed as omitting the relevant (non-linear) terms from the regression. If the dependent variable is continuous, respecifying the functional form (e.g. adding non-linear terms or logs) is a possible solution.

6. Inconsistency of OLS standard errors
Heteroskedasticity: it is possible to work around this by using heteroskedasticity-robust (White) standard errors. The second OLS assumption, namely that the observations are identically and independently drawn, also comes into play here: independence across observations can be achieved by using random sampling.

Threats to external validity
A study is externally valid if its inferences can be generalized to other populations and settings.
- Differences in populations
- Differences in settings

Importance of these validities for forecasting
For causal models, both internal and external validity matter. For forecasting, only external validity is of importance, and in terms of time rather than population/settings.

Lecture 9: Tools to ensure or restore internal validity I

1. Sampling
Appropriate sampling: obtain a representative sample of the population under study with randomly selected (independent) observations. This provides a solution for threat 6 from Lecture 8.
Two steps in sampling: defining the population and drawing the sample, while making sure it is an appropriate sample.
Two main techniques: probability and non-probability sampling.
- Probability sampling means that all members of the population have an equal chance of ending up in the sample. This is the superior sampling design, but a register of the population is required.
- Non-probability sampling uses techniques not based on probability theory -> this may result in a sample that is not representative. This method has many disadvantages, but it is essential when no register of the population is available.
Sampling weights: ex ante you use a probability sampling framework, but you then find out that the sample is not representative -> each observation gets a weight to restore representativeness, usually based on observable characteristics. A high (low) sampling weight corresponds to a low (high) probability of being in the sample.

2. Panel data
Panel data consist of observations on the same n entities at two or more time periods T. Advantage: it allows us to remove the influence of omitted variables that do not change over time. The limitation is that time-invariant variables (variables that are constant over time within an entity) drop out, so the model cannot explain their influence.

3. Instrumental variables
If the regression of the natural logarithm of Y on an independent variable X suffers from OVB, how can instrumental variables help? The variation in X consists of two parts: variation that is correlated with the error term (endogenous variation), which leads to OVB, and variation that is independent of the error term (exogenous variation). If we isolate the exogenous variation, we can get rid of the OVB -> this is what an instrumental variable does.
Instrumental variable example: an increase in the compulsory schooling age. An instrument has to satisfy two conditions:
1. Instrument exogeneity: the instrument is uncorrelated with the error term (cannot be tested!).
2. Instrument relevance: the instrument (the school reform) is a good predictor of the endogenous variable (years of education) (testable!).
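Instrument relevance can be checked with a first-stage regression, and the fitted values feed the two-stage procedure described in the next paragraph. A simulated sketch (Python, statsmodels; "reform" and "educ" are hypothetical variable names):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 2000
reform = rng.integers(0, 2, size=n)                  # instrument: exposed to the school reform
ability = rng.normal(size=n)                         # unobserved factor causing OVB
educ = 10 + 1.5 * reform + 0.8 * ability + rng.normal(size=n)
log_wage = 0.5 + 0.10 * educ + 0.5 * ability + rng.normal(scale=0.2, size=n)

# First stage: regress the endogenous regressor on the instrument
first_stage = sm.OLS(educ, sm.add_constant(reform)).fit()
print(first_stage.tvalues[1], first_stage.fvalue)    # strong first stage -> instrument is relevant

# Second stage: regress the outcome on the fitted values from the first stage
educ_hat = first_stage.fittedvalues
second_stage = sm.OLS(log_wage, sm.add_constant(educ_hat)).fit()
print(second_stage.params[1])   # close to 0.10; note: these manual second-stage SEs are not the correct TSLS SEs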

Two-stage least squares (TSLS)
First stage: predict years of education using the instrument. Since the instrument is assumed to be independent of the error term, the predicted education is also independent of it.
Second stage: regress the dependent variable Y on predicted education.
If the instrument predicted X perfectly, the TSLS and OLS estimates would be equal (OLS itself is only justified when there is no OVB). TSLS estimates are only externally valid for the variation in education that is explained by the instrument.
If the instrument is good (relevant and exogenous), it can be used as a tool to address violations of the OLS assumptions (conditional mean zero / conditional mean independence).

Lecture 10: Tools to ensure or restore internal validity II
Experiment versus quasi-experiment: in an experiment the treatment is assigned randomly and on purpose; in a quasi-experiment it is "as if" random and not assigned on purpose.
Basic problem of causal inference: e.g. the influence of breast cancer screening on mortality. We cannot observe, within the same participant, both having a scan and not having a scan -> the individual effect cannot be measured, and we need an experiment to inform us about the average causal effect. In the model, add observables if the treatment is random conditional on those observables; e.g. if we randomly selected based on age, add age to the model.

Internal validity of experiments
Are subjects assigned to the control group similar to those assigned to the treatment group? -> Test whether subjects are similar in pretreatment characteristics ~ F-test (for joint equality of the pretreatment characteristics across groups).

Failure to follow the treatment protocol:
- Partial compliance: some people in the control group received the treatment and vice versa. =>
- If we have data on both random assignment and actual treatment received -> use random assignment as an instrumental variable for treatment. The exogeneity assumption is then satisfied by construction, and the relevance assumption is plausibly satisfied.
- If we have data on random assignment but not on actual treatment received -> estimate the intention-to-treat (ITT) effect.

Difference between IV (instrumental variable) and ITT (intention to treat):
- Both use random assignment.
- Different interpretation: the IV estimate beta(TSLS) is the mortality effect of actually receiving a mammogram; the ITT estimate is the mortality effect of being selected into the treatment group.

Other threats to internal validity:
- Attrition. Harmless: participants move out of the Netherlands -> conditional mean independence still holds, because the dropout is unrelated to the treatment. Harmful: those who develop late-stage breast cancer are excluded from the experiment -> conditional mean independence no longer holds, because the dropout is related to the treatment.
- Experimental effects (Hawthorne effect): motivational effects, e.g. more self-examinations when in the treatment group, placebo effects => solution: double blind (neither the participants nor the researchers know who receives the treatment and who does not).
- Small samples pose a threat to randomization.

Threats to external validity:
- Non-representative sample, program or policy.
- General equilibrium effects: the treatment causes awareness, which lowers the potential benefit of the treatment.

Quasi-experiments
In these experiments randomness is introduced by variations in individual circumstances that make it appear as if the treatment was randomly assigned. Goal: isolate the random variation to obtain the average treatment effect.
The subjects are sampled at random, which results in a representative sample; in this respect quasi-experiments are typically superior to experiments. The treatment assignment, however, is typically inferior to that of experiments (conditioning on controls is often required).
If the treatment is fully determined by the quasi-experimental variation and the control and treatment groups are identical pretreatment: OLS.
If the treatment is fully determined by the quasi-experimental variation but differences between the groups remain: DiD (difference-in-differences).
DiD: random treatment assignment, but there remain differences between the treatment and control groups => compare the changes in outcomes pre- and post-treatment, which removes the pretreatment differences in outcomes.
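A minimal DiD sketch (Python, statsmodels; simulated two-period data with hypothetical names), where the coefficient on the interaction treated*post is the difference-in-differences estimate:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 2000
treated = rng.integers(0, 2, size=n)          # treatment group indicator
post = rng.integers(0, 2, size=n)             # post-treatment period indicator
# Outcome: group difference (2.0), common time trend (1.0), treatment effect (0.5)
y = 1.0 + 2.0 * treated + 1.0 * post + 0.5 * treated * post + rng.normal(size=n)

X = sm.add_constant(np.column_stack([treated, post, treated * post]))
res = sm.OLS(y, X).fit()
print(res.params[3])          # DiD estimate of the treatment effect, close to 0.5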

If the treatment is fully determined by the quasi-experimental variation of crossing a threshold: sharp regression discontinuity design (SRD) -> treatment depends entirely on crossing the threshold.
If the treatment is only partially determined by the quasi-experimental variation:
- Instrumental variables: the treatment is non-random, but a determinant of the treatment is random.
- Fuzzy regression discontinuity design: same intuition as SRD, but crossing the threshold is not the only determinant of receiving the treatment.

Comparison of quasi-experiments to experiments
Compared to an experimental design, each threat to internal validity in a quasi-experimental design is:
1. Failure to randomize: bigger
2. Failure to follow the treatment protocol: equally important
3. Attrition: equally important
4. Experimental effects: smaller
A specific threat to internal validity that only matters for quasi-experiments is instrument validity: a random instrumental variable might not suffice for the exogeneity assumption (IV and error unrelated) to hold.

External validity: similar to experiments.

Lecture 11: Binary choice models
Binary variables have only 2 values; usually the value 1 denotes the "positive" outcome.
If the dependent variable can only take 2 values, the interpretation of the estimates (beta) changes: this linear probability model predicts the probability that Y = 1, and beta1 is the change in that probability for a one-unit change in X. Suppose, for example, that the model predicts a probability of 0.77 for a given set of characteristics.

It means that 77% of the people with these characteristics are expected to buy: if one individual has a 77% chance to buy, then 77% of the people with these characteristics will buy (law of large numbers).
However, the linear probability model can also yield probabilities outside the interval 0-100%, which is impossible since we are calculating probabilities => probit model.
The probit model estimates the probability indirectly via a z-score: the linear prediction is mapped through the standard normal distribution, Pr(Y = 1 | X) = Phi(beta0 + beta1*X), which forces the fitted probabilities to lie between 0 and 100%.

Note: the probit model predicts z-scores -> the parameters are hard to interpret, because it is hard to express the influence of z on the probability. Because of the shape of the normal distribution, the effect sizes differ for different values of z. Usually the marginal effect is therefore calculated at the mean value of the regressors.

Logit model
The logit model also estimates the probability indirectly, but not through a z-score: we use a logistic function. This function uses the concept of odds, the ratio p / (1 - p):
Pr(Y = 1 | X) = 1 / (1 + e^-(beta0 + beta1*X)),  or equivalently  ln(p / (1 - p)) = beta0 + beta1*X

For effect sizes, use the same method as probit.
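A sketch of both models in Python (statsmodels; simulated purchase data with made-up names), including marginal effects evaluated at the mean of the regressor:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 2000
income = rng.normal(size=n)
buy = (0.3 + 0.8 * income + rng.normal(size=n) > 0).astype(int)   # binary outcome

X = sm.add_constant(income)
probit_res = sm.Probit(buy, X).fit(disp=0)
logit_res = sm.Logit(buy, X).fit(disp=0)

# Marginal effect of income on Pr(buy = 1), evaluated at the mean
print(probit_res.get_margeff(at="mean").summary())
print(logit_res.get_margeff(at="mean").summary())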

Lecture 12: Time series analysis
Time series data are a series of observations on a quantity obtained at successive points in time, often with equal intervals between them.
Reasons to analyse these data:
- improvement over time
- effectiveness of treatment

More accurately:
- improvement over time -> properties of a time series
- effectiveness of treatment -> interaction between time series
- forecasting
Notation: X(t) is the value of X at time t. The error term at time t can also be a time series, in which case we cannot simply apply OLS without correcting for this.
If Y(t) is the value of Y at time t, then Y(t-1) is called the first lag. The first difference is Y(t) - Y(t-1). There is also the so-called log-difference, ln(Y(t)) - ln(Y(t-1)), which is approximately the growth rate from t-1 to t.
Annualized growth: if the monthly log-difference (growth) is denoted g, the annualized growth rate is approximately 12*g.

So to go from monthly growth to annualized growth: multiply the log-difference by 12. To go from quarterly growth to annualized growth: multiply by 4.
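A short numpy sketch of these transformations on a simulated monthly series:

import numpy as np

rng = np.random.default_rng(9)
# Simulated monthly index growing roughly 0.5% per month
y = 100 * np.exp(np.cumsum(0.005 + 0.01 * rng.normal(size=36)))

first_lag = y[:-1]                      # Y(t-1)
first_diff = y[1:] - y[:-1]             # Y(t) - Y(t-1)
log_diff = np.diff(np.log(y))           # ln(Y(t)) - ln(Y(t-1)), approx. monthly growth
annualized = 12 * log_diff              # annualized growth from monthly log-differences

print(log_diff[:3], annualized[:3])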

Autocorrelation:
- between -1 and 1
- autocorrelation typically decreases in strength (absolute value) as the lag length increases
- 5% critical value for a sample autocorrelation: approximately +/- 1.96 / sqrt(T)

Autocorrelated errors
Is the error term at time t correlated with the error term at time t-1?

Normally we don't have access to the actual error terms, only to the residuals from our estimated linear model.

We can analyse autocorrelation in two ways: visually or through formal tests.
The visual way looks at autocorrelograms (plots of the autocorrelations against the lag length); the partial autocorrelation is the same idea, but it removes the influence of the intermediate lags.
The formal test is the Durbin-Watson test, based on the residuals e(t):
DW = sum over t of (e(t) - e(t-1))^2 / sum over t of e(t)^2, which is approximately 2*(1 - rho(hat)); values near 2 indicate no first-order autocorrelation.

However, if there is significant autocorrelation between the error terms, we cannot use standard OLS inference, because it assumes the covariance between e(t) and e(t-1) is 0. The coefficients are still right, but the standard errors are wrong => solution: Newey-West heteroskedasticity and autocorrelation consistent (HAC) standard errors.
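A sketch of both the Durbin-Watson statistic and HAC standard errors in statsmodels (simulated data with autocorrelated errors):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(10)
T = 300
x = rng.normal(size=T)
e = np.zeros(T)
for t in range(1, T):                      # AR(1) errors: e(t) = 0.7*e(t-1) + noise
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 1.0 + 0.5 * x + e

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()
print(durbin_watson(ols.resid))            # well below 2 -> positive autocorrelation

hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(ols.bse, hac.bse)                    # same coefficients, different (HAC) standard errors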

Lecture 13: Dynamic models
Dynamic models allow the dependent variable (Y) to be autocorrelated; these models are called autoregressive (AR) models. In a first-order autoregressive model AR(1),
Y(t) = beta0 + beta1*Y(t-1) + e(t),
the dependent variable at time t depends on its value at time t-1. An assumption of the model is that the coefficient beta1 on Y(t-1) is between -1 and 1.
In the long run the AR(1) process converges to a constant expectation: setting E[Y(t)] = E[Y(t-1)] gives E[Y(t)] = beta0 / (1 - beta1).
Autocorrelation: in an AR(1) model the correlation between Y(t) and Y(t-1) is beta1, between Y(t) and Y(t-2) it is beta1^2, and between Y(t) and Y(t-n) it is beta1^n.
There also exist higher-order AR models; for instance, a second-order AR(2) model looks like this:
Y(t) = beta0 + beta1*Y(t-1) + beta2*Y(t-2) + e(t)

The estimation sample for an AR(1) model has n - 1 observations, because the first observation serves as a pre-sample; in general the pre-sample has to have the length of the longest lag.
To decide which AR(p) model to use, we look at so-called information criteria, which trade off fit against the number of parameters, for example:
AIC = -2*ln(L) + 2*k   and   SC/BIC = -2*ln(L) + k*ln(T)
(with L the likelihood, k the number of parameters and T the number of observations; scaling conventions differ across packages, and the Hannan-Quinn criterion HQC is similar).
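A sketch of lag-length selection (Python, statsmodels/numpy) using the SSR-based versions of the criteria from Stock & Watson; the exact scaling is an assumption, only the ranking across lag lengths matters:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
T = 500
y = np.zeros(T)
for t in range(2, T):                                    # simulate an AR(2) process
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.normal()

max_p = 4
for p in range(1, max_p + 1):
    # build the lag matrix; keep the estimation sample identical for every p
    Y = y[max_p:]
    X = sm.add_constant(np.column_stack([y[max_p - j:-j] for j in range(1, p + 1)]))
    res = sm.OLS(Y, X).fit()
    n = len(Y)
    aic = np.log(res.ssr / n) + (p + 1) * 2 / n
    bic = np.log(res.ssr / n) + (p + 1) * np.log(n) / n  # Schwarz criterion (SSR form)
    print(p, aic, bic)                                   # pick the p with the lowest value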

We choose the lag length that results in the lowest AIC/SC/HQC (note: the estimation sample should have the same length for all lag lengths compared). Normally the Schwarz criterion (SC/BIC) serves as the default.
We can also look at the influence of lagged independent (X) variables. This results in a finite distributed lag (DL) model:
Y(t) = beta0 + beta1*X(t) + beta2*X(t-1) + ... + e(t)

The coefficient on X(t-s) indicates the effect of a change in X(t-s) on Y(t): if that coefficient is 0.3, a one-unit increase in X(t-s) raises Y(t) by 0.3. This is called the distributed-lag weight or the s-period delay multiplier.
Suppose X permanently increases by 1: the effect on Y after s periods is the sum of the lag weights on X(t) through X(t-s).

So, in short (n-period effects):
- Delay multiplier: the effect of X(t-n) on Y(t) (the coefficient on X(t-n)).
- n-period interim multiplier: the combined effect of X(t-n) up to X(t) on Y(t) (the sum of those coefficients).
- Total multiplier: the sum of all lag weights.
Never include the intercept beta0 in these!

Now we combine the AR and DL models into the ARDL (autoregressive distributed lag) model, which contains lags of Y as well as (lags of) X:
Y(t) = beta0 + beta1*Y(t-1) + ... + betap*Y(t-p) + gamma0*X(t) + gamma1*X(t-1) + ... + gammaq*X(t-q) + e(t)

Lecture 14: Moving average models

In a moving average (MA) model, Y(t) is a function of the current and lagged error terms, e.g. MA(1): Y(t) = mu + e(t) + theta1*e(t-1). The main difference between an MA model and an AR model is that in an MA model the autocorrelation stops abruptly after q lags, while in an AR model it dies off gradually. It can still be difficult to distinguish between the two; always choose the specification with the lowest BIC/AIC.

We can also combine the two into an ARMA(p, q) model, where p is the number of AR lags and q the number of MA lags.
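A sketch of fitting an ARMA(1,1) with statsmodels' ARIMA class (an ARMA(p, q) is an ARIMA(p, 0, q)); the data are simulated:

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(12)
T = 500
e = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):                           # ARMA(1,1): AR part 0.6, MA part 0.4
    y[t] = 0.6 * y[t - 1] + e[t] + 0.4 * e[t - 1]

res = ARIMA(y, order=(1, 0, 1)).fit()           # order = (p, d, q), d = 0 for ARMA
print(res.params)                               # estimated by maximum likelihood
print(res.aic, res.bic)                         # compare specifications via AIC/BIC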

Maximum likelihood
The overall likelihood equals the product of the individual likelihoods. In the binary-outcome example: for every observation with a 0 we multiply in the probability of getting a 0, and for every observation with a 1 we multiply in the probability of getting a 1; the parameter values that maximize this product are the maximum likelihood estimates. (See the slides for an example.)

Forecasting
If we know the value of Y(T), then we can use the estimated model to forecast Y(T+1). However, the forecast will be imprecise due to the forecast error: the difference between the realization Y(T+1) and the forecast of Y(T+1) made at time T.

There is also the mean squared forecast error (MSFE), which should not be confused with the regression standard error, because there are additional sources of error (e.g. the estimated parameters differing from the true ones). Lowest MSFE -> best forecasting model.
Because we are never sure of forecasts, we never give exact point forecasts, only (e.g. 95%) forecast intervals: forecast +/- 1.96 * RMSFE. The width of the interval reflects the forecast uncertainty, measured by the root mean squared forecast error (RMSFE).

How do we know which forecasting model to use? Ideally we would choose the one with the lowest MSFE, but we only know the true MSFE after the future has been realized and compared with the predictions, by which time the forecast is no longer needed. The solution is to hold out part of the existing sample and test the out-of-sample performance on it: pseudo-out-of-sample forecasting. An extra advantage of this method is that it estimates the RMSFE properly, as it takes coefficient (estimation) uncertainty into account.

Granger causality
If X(t-1) explains (part of) Y(t), then X "Granger causes" Y. However, we have to be cautious with Granger causality in time-series analysis, because it is possible that both variables are driven by another time-varying variable C. Ideally we are looking for true causality, so we would need an RCT (randomized controlled trial).
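A pseudo-out-of-sample sketch (Python, statsmodels; an AR(1) estimated by OLS on an expanding window, forecasting one step ahead each time):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(13)
T = 400
y = np.zeros(T)
for t in range(1, T):
    y[t] = 2.0 + 0.7 * y[t - 1] + rng.normal()

holdout = 50                                    # last 50 observations kept for evaluation
errors = []
for s in range(T - holdout, T):
    # re-estimate the AR(1) using only data up to time s-1
    res = sm.OLS(y[1:s], sm.add_constant(y[:s - 1])).fit()
    forecast = res.params[0] + res.params[1] * y[s - 1]   # one-step-ahead forecast of y[s]
    errors.append(y[s] - forecast)

rmsfe = np.sqrt(np.mean(np.square(errors)))
print(rmsfe)                                    # pseudo-out-of-sample estimate of the RMSFE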

Lecture 15: Non-stationarity
We have so far assumed stationarity, which means there is a stable (long-run) distribution: a stable long-run expectation of Y and a stable long-run variance (in the long run Y converges to a certain value).
However, non-stationarity is also possible -> there is a trend.
In the AR(1) model we assumed the coefficient beta1 on Y(t-1) to be between -1 and 1. If beta1 happens to be equal to 1, Y(t) follows a random walk.

A random walk has a variance that increases in t, which means the series is not mean-reverting: as t increases, the probability of the series returning to its mean gets smaller.
Model without drift: Y(t) = Y(t-1) + e(t), so that Y(t) = Y(0) + e(1) + e(2) + ... + e(t).
This has two implications:
1) Because the expectation of the error terms is zero, E[Y(t)] = E[Y(t-1)] + 0 = Y(0): the expectation stays constant.
2) Because Y(t) is the sum of all errors up to t, more errors means more variance: Var(Y(t)) = t * var(e). In this case only the variance is non-stationary; the long-run expectation of Y stays at a stable value.
However, let's see what happens if we add a constant (drift) c:
Model with drift: Y(t) = c + Y(t-1) + e(t), so that Y(t) = Y(0) + c*t + e(1) + ... + e(t).
Again two implications:
1) Because the expectation of the error term is still zero, E[Y(t)] = Y(0) + c*t, which grows with t (= non-stationary in the mean as well).
2) The variance increases with the number of accumulated errors, in the same fashion as before.
Formal test for non-stationarity: the Dickey-Fuller unit root test -> tests whether the (sum of the) AR coefficient(s) equals 1.
H0: the time series is a random walk (unit root)
Ha: the time series is stationary

More extensively, the test is based on the regression Delta Y(t) = alpha + delta*Y(t-1) + e(t), with H0: delta = 0 (unit root) against Ha: delta < 0 (stationary); the t-statistic on delta is compared with special Dickey-Fuller critical values.

There are 3 different versions of the DF test, depending on what is included in the test regression:
1) No constant, no trend: Delta Y(t) = delta*Y(t-1) + e(t)
2) With a constant: Delta Y(t) = alpha + delta*Y(t-1) + e(t)
3) With a constant and a time trend: Delta Y(t) = alpha + lambda*t + delta*Y(t-1) + e(t)
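In statsmodels this choice corresponds to the regression argument of adfuller ("c" for a constant, "ct" for constant plus trend); a sketch on simulated series:

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(14)
random_walk = np.cumsum(rng.normal(size=500))          # unit root: should not reject H0
stationary = np.zeros(500)
for t in range(1, 500):                                # AR(1) with coefficient 0.5
    stationary[t] = 0.5 * stationary[t - 1] + rng.normal()

for series in (random_walk, stationary):
    stat, pvalue, *_ = adfuller(series, regression="c")   # "c": constant; "ct": constant + trend
    print(stat, pvalue)                                    # a small p-value -> reject the unit root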

Which version to use depends on the behaviour of the first differences (= Y(t) - Y(t-1)):
1) If they lie around 0 -> test 1
2) If they lie around a different (non-zero) mean -> test 2
3) If they are trending -> test 3
However, the DF test is very sensitive to autocorrelation in the residuals. To control for this, we include lagged first differences in the test regression:
Delta Y(t) = alpha + delta*Y(t-1) + gamma1*Delta Y(t-1) + ... + gammap*Delta Y(t-p) + e(t)

This is called the augmented Dickey-Fuller (ADF) test.
Solutions to non-stationarity:
1) Detrending: include a regressor X(t) = t as an independent variable to control for the trend part.
2) Differencing: take the first difference of the series (described earlier); this also removes other trend-like effects.

Breaks
A break occurs when the regression function/model changes over the course of the sample. We can improve the model by adding a dummy that is 1 after and 0 before the break (possibly interacted with the regressors). The formal test is the Chow break test, an F-test for the joint significance of these break (gamma) terms.
However, usually you do not know the exact breakpoint, so a Quandt likelihood ratio (QLR) test is more appropriate: the Chow test is computed for many candidate breakpoints, and the break is most likely to lie around the maximum F-value. The QLR test tests the null hypothesis that there is no breakpoint.

Lecture 16
We relax the assumption that the variance of the error term does not depend on t. A high variance today is typically strongly correlated with the size of recent error terms, so the variance itself is modeled as a time series (ARCH):
sigma^2(t) = alpha0 + alpha1*e(t-1)^2 + ... + alphap*e(t-p)^2
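A rough sketch of checking for such ARCH effects (the idea behind Engle's LM test: regress the squared residuals on their own lags); Python with numpy/statsmodels, simulated ARCH(1) data:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(15)
T = 1000
e = np.zeros(T)
sigma2 = np.ones(T)
for t in range(1, T):                              # ARCH(1): variance depends on the last squared shock
    sigma2[t] = 0.2 + 0.6 * e[t - 1] ** 2
    e[t] = np.sqrt(sigma2[t]) * rng.normal()

# Regress squared residuals on their first lag; a significant slope signals ARCH effects
e2 = e ** 2
res = sm.OLS(e2[1:], sm.add_constant(e2[:-1])).fit()
print(res.params[1], res.pvalues[1])               # estimate of alpha1 and its p-value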