Statistics 359a Regression Analysis


Statistics 359a

Regression Analysis


Necessary Background Knowledge - Statistics

• expectations of sums

• variances of sums

• distributions of sums of normal random variables

• t distribution – assumptions and use

• calculation of confidence intervals

• simple tests of hypotheses and p-values


Necessary Background Knowledge – Linear Algebra

• multiplication of conformable matrices

• transpose of a matrix

• determinant of a square matrix

• inverse of a square matrix

• eigenvalues of a square matrix

• quadratic forms


Origin of Least Squares

Introduction of the metric system and the length of a meter

• 1790 – French National Assembly commissions the French Academy of Sciences to design a simple decimal-based system of weights and measures

• 1791 – French Academy defines the meter to be 10^-7 (one ten-millionth) of the length of the meridian through Paris from the North Pole to the equator.


Adrien-Marie Legendre

• Legendre serves on the French commission in 1792 to determine the length of the meridian quadrant

• measurements of latitude made in 1795

• complex calculations made from the measurements in 1799

• Legendre proposes the method of least squares in 1805 to determine the length of a meter


Data

• old French units of measurement: 1 module = 2 toises

• old French to imperial English: 1 toise = 6.395 feet

• metric to imperial: 1 meter = 3.2808 feet


From Spherical Geometry

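$$L - L' = \frac{S}{28500} + \alpha\,\frac{S}{28500} + \beta\,\sin(L - L')\cos(L + L')$$

• L, L' = latitudes at the two ends of the arc

• S = arc length

• D = 28500/(1 + α) = length of one degree of an arc, in modules

• α is related to the length of the meridian quadrant (90D)

• β is related to the ellipticity of the earth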


Including measurement errors, the data and model reduce to:

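$$
\begin{aligned}
E_1 &= \phantom{-}0.003398 + (4.912)\,\alpha - (0.590)\,\beta \\
E_2 &= \phantom{-}0.000475 + (2.720)\,\alpha - (0.027)\,\beta \\
E_3 &= -0.002625 + (0.048)\,\alpha + (0.324)\,\beta \\
E_4 &= -0.001529 - (2.914)\,\alpha + (0.277)\,\beta \\
E_5 &= \phantom{-}0.000279 - (4.765)\,\alpha + (0.014)\,\beta
\end{aligned}
$$

where E_1, ..., E_5 are the measurement errors.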


Solution is:

D = 28497.78 modules

90D = 2564800.2 modules = length of the meridian quadrant

Therefore

1 meter = 0.256480 modules = 0.512960 toises = 3.280 feet

modern meter = 3.2808 feet
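The least squares step can be sketched in a few lines of Python. This is a minimal illustration, not Legendre's own procedure: it assumes each of the five equations has the form E_i = c_i + a_i α + b_i β and minimises the sum of squared errors through the 2x2 normal equations. The signs of the coefficients are an assumption, chosen so that each column sums to approximately zero; the function name `least_squares_2` is hypothetical.

```python
# Each row is (c_i, a_i, b_i) in E_i = c_i + a_i*alpha + b_i*beta.
# Magnitudes are from the slide; the signs are an assumption (see lead-in).
rows = [
    (0.003398, 4.912, -0.590),
    (0.000475, 2.720, -0.027),
    (-0.002625, 0.048, 0.324),
    (-0.001529, -2.914, 0.277),
    (0.000279, -4.765, 0.014),
]


def least_squares_2(rows):
    """Minimise sum(E_i**2) over (alpha, beta) via the normal equations."""
    saa = sum(a * a for _, a, _ in rows)
    sab = sum(a * b for _, a, b in rows)
    sbb = sum(b * b for _, _, b in rows)
    sac = sum(a * c for c, a, _ in rows)
    sbc = sum(b * c for c, _, b in rows)
    det = saa * sbb - sab * sab
    # Cramer's rule for: saa*alpha + sab*beta = -sac, sab*alpha + sbb*beta = -sbc
    alpha = (-sac * sbb + sab * sbc) / det
    beta = (-saa * sbc + sab * sac) / det
    return alpha, beta


alpha, beta = least_squares_2(rows)

D = 28500 / (1 + alpha)          # length of one degree, in modules
quadrant = 90 * D                # meridian quadrant, in modules
meter_modules = quadrant * 1e-7  # one ten-millionth of the quadrant
meter_toises = 2 * meter_modules # 1 module = 2 toises
meter_feet = 6.395 * meter_toises  # 1 toise = 6.395 feet

print(f"alpha = {alpha:.7f}, beta = {beta:.5f}")
print(f"D = {D:.2f} modules, 1 meter = {meter_feet:.3f} feet")
```

Under these assumptions the computed D agrees with the value on the slide to within rounding of the published coefficients.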


Origin of the Term “Regression”

• Francis Galton, 1886, ‘Regression towards mediocrity in hereditary stature.’ Journal of the Anthropological Institute, 15: 246 – 263

• See JSTOR under UWO library databases


Data on Heights of Children and Parents


‘Regression Line’


Theoretical Basis

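For X and Y bivariate normal with equal means μ, equal variances σ², and correlation ρ:

$$E(Y \mid X = x) = \mu + \rho\,(x - \mu)$$

$$E(Y \mid X = x) - x = (1 - \rho)(\mu - x)$$

For 0 < ρ < 1,

E(Y | X = x) < x for x > μ and

E(Y | X = x) > x for x < μ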
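A small numerical illustration of this result, assuming the standard bivariate normal conditional mean E(Y | X = x) = μ + ρ(x − μ) for equal means and equal variances. The function name and the values μ = 68 and ρ = 0.5 are hypothetical and are not taken from Galton's data.

```python
def conditional_mean(x, mu, rho):
    """E(Y | X = x) for bivariate normal X, Y with common mean mu,
    common variance, and correlation rho."""
    return mu + rho * (x - mu)


mu, rho = 68.0, 0.5  # hypothetical common mean (inches) and correlation

above = conditional_mean(72.0, mu, rho)  # x above the mean
below = conditional_mean(64.0, mu, rho)  # x below the mean

print(above)  # 70.0: below x = 72, but still above mu
print(below)  # 66.0: above x = 64, but still below mu
```

In both cases the conditional mean lies between x and μ, which is the 'regression' towards the mean.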


Example in Data Analysis Through Regression

• Relationship between the price of a violin bow and its attributes such as age, shape and ornamentation on the bow


Violin Bow Example

The following data on violin bows made by W.E. Hill and Sons of London, England, are taken from the website www.maestronet.com/pricehist.html. The data show the prices of bows sold at Sotheby’s auction house in the years 1994-97, along with several factors that may affect the price of a bow: the year of the sale (in case of price inflation or deflation); the year of manufacture (are antique bows more or less valuable?); the weight of the bow in grams (do buyers prefer heavier or lighter bows?); the shape of the bow (is there an aesthetic effect on the price?); the presence or absence of ornamental gold; the presence or absence of ornamental pearl; and whether the bow has a tortoiseshell frog or an ebony frog. Only bows for which the approximate year of manufacture is given are included in the data set. Prices from other auction houses and for other bow makers, as well as for violins, are available at the same site, but only Sotheby’s gives the year of manufacture. A Minitab file of the data is at O:\359\bows.mtb.

Shape: O = octagonal, R = round. Gold, Tortoiseshell Frog, Pearl: Y = present, N = absent.

Price    Year of  Year the Bow  Weight in          Gold         Tortoiseshell  Pearl
(US$)    Sale     was Made      Grams      Shape   Accessories  Frog           Accessories
1874     1997     1957          59.0       O       N            N              N
2436     1997     1935          62.0       R       N            N              N
7498     1997     1920          62.0       R       Y            Y              N
1142     1996     1945          59.5       O       N            N              Y
1935     1996     1890          57.5       R       N            N              N
1759     1996     1900          56.0       O       N            N              N
5278     1996     1950          57.0       O       Y            Y              Y
4905     1995     1920          58.0       R       Y            N              N
7994     1995     1920          60.0       O       Y            Y              Y
2543     1995     1926          62.5       R       N            N              Y
1769     1994     1935          61.0       R       N            N              N
1592     1994     1960          61.0       R       N            N              Y
3716     1994     1935          55.0       O       Y            Y              Y
2477     1994     1925          59.0       R       N            N              Y
2654     1994     1930          58.0       R       N            N              N
3362     1994     1935          58.0       R       N            Y              Y


Price and Date of Sale

• 1995 seems to be a more expensive year

• Is the effect confounded with some other attribute common to 1995?

[Figure: Violin Bows - Price and Sale Date. Price (1000 to 8000) plotted against Year Sold (1994 to 1997).]


Price and Year of Manufacture

• Is there anything special about 1920?

• Is there a quadratic trend in the data?

[Figure: Violin Bows - Price and Year of Manufacture. Price (1000 to 8000) plotted against Year Made (1890 to 1960).]


Price and Weight of the Bow

• Is there any trend with respect to the weight?

[Figure: Violin Bows - Price and Weight in Grams. Price (1000 to 8000) plotted against Weight (55 to 63).]


Octagonal vs. Round Bows

• No apparent trend

[Figure: Violin Bows - Price and Shape. Shape (1 = round, 0 = octagonal) plotted against Price (1000 to 8000).]


The Gold Standard?

• The presence of gold on a bow generally makes it more expensive

[Figure: Violin Bows - Price and Gold Accessories. Gold (1 = present, 0 = absent) plotted against Price (1000 to 8000).]
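The gold effect can be checked directly from the data table. The sketch below is only a simple comparison of group means, not a regression fit; the variable names are hypothetical, and the prices and gold indicators are copied in table order.

```python
# Prices (US$) and gold-accessory indicators for the 16 bows, in table order.
prices = [1874, 2436, 7498, 1142, 1935, 1759, 5278, 4905,
          7994, 2543, 1769, 1592, 3716, 2477, 2654, 3362]
gold = "NNYNNNYYYNNNYNNN"  # Y = gold present, N = absent

with_gold = [p for p, g in zip(prices, gold) if g == "Y"]
without_gold = [p for p, g in zip(prices, gold) if g == "N"]

mean_gold = sum(with_gold) / len(with_gold)
mean_plain = sum(without_gold) / len(without_gold)

print(f"{len(with_gold)} bows with gold, mean price ${mean_gold:.2f}")
print(f"{len(without_gold)} bows without gold, mean price ${mean_plain:.2f}")
```

The five bows with gold average $5878.20 against about $2140.27 for the eleven without. A raw difference like this does not separate gold from other attributes that tend to accompany it, such as a tortoiseshell frog.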


Tortoise Shell Frogs

• Some evidence of added expense for tortoise shell

[Figure: Violin Bows - Price and Tortoise Shell Frogs. Frog (1 = present, 0 = absent) plotted against Price (1000 to 8000).]


Price and Pearl Accessories

• No apparent effect

[Figure: Violin Bows - Price and Pearl Accessories. Pearl (1 = present, 0 = absent) plotted against Price (1000 to 8000).]


Prediction

• Can we use the model built with the current data to predict the future price of a bow?

• Example: some 1999 data from auctions

• 1920 bow, 60.5 g., round with gold and pearl accessories - $4098

• 1933 bow, 61 g., octagonal with pearl accessories only - $2421
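One way to explore the prediction question is sketched below. The slides do not specify a model, so this assumes a plain additive linear model fitted by ordinary least squares through the normal equations. Year of sale is deliberately left out, because a 1999 sale lies outside the 1994-97 range of the data. All function names are hypothetical, and the centring constants (1900 and 60 g) are arbitrary choices that only improve numerical conditioning.

```python
# (price, year made, weight in g, shape, gold, tortoiseshell, pearl) from the table.
bows = [
    (1874, 1957, 59.0, "O", "N", "N", "N"), (2436, 1935, 62.0, "R", "N", "N", "N"),
    (7498, 1920, 62.0, "R", "Y", "Y", "N"), (1142, 1945, 59.5, "O", "N", "N", "Y"),
    (1935, 1890, 57.5, "R", "N", "N", "N"), (1759, 1900, 56.0, "O", "N", "N", "N"),
    (5278, 1950, 57.0, "O", "Y", "Y", "Y"), (4905, 1920, 58.0, "R", "Y", "N", "N"),
    (7994, 1920, 60.0, "O", "Y", "Y", "Y"), (2543, 1926, 62.5, "R", "N", "N", "Y"),
    (1769, 1935, 61.0, "R", "N", "N", "N"), (1592, 1960, 61.0, "R", "N", "N", "Y"),
    (3716, 1935, 55.0, "O", "Y", "Y", "Y"), (2477, 1925, 59.0, "R", "N", "N", "Y"),
    (2654, 1930, 58.0, "R", "N", "N", "N"), (3362, 1935, 58.0, "R", "N", "Y", "Y"),
]


def features(year_made, weight, shape, gold, tortoise, pearl):
    """Design-matrix row: intercept, centred year and weight, 0/1 indicators."""
    return [1.0, year_made - 1900.0, weight - 60.0, float(shape == "R"),
            float(gold == "Y"), float(tortoise == "Y"), float(pearl == "Y")]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


X = [features(*bow[1:]) for bow in bows]
y = [float(bow[0]) for bow in bows]
p = len(X[0])
# Normal equations: (X'X) coef = X'y
xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
coef = solve(xtx, xty)


def predict(year_made, weight, shape, gold, tortoise, pearl):
    """Fitted price for a bow with the given attributes."""
    row = features(year_made, weight, shape, gold, tortoise, pearl)
    return sum(c * f for c, f in zip(coef, row))


fitted = [predict(*bow[1:]) for bow in bows]
pred_1920 = predict(1920, 60.5, "R", "Y", "N", "Y")  # sold for $4098 in 1999
pred_1933 = predict(1933, 61.0, "O", "N", "N", "Y")  # sold for $2421 in 1999
print(f"1920 bow: predicted ${pred_1920:.0f}, actual $4098")
print(f"1933 bow: predicted ${pred_1933:.0f}, actual $2421")
```

With seven coefficients estimated from sixteen bows, the predictions should be read as rough; comparing them with the two 1999 prices is the kind of check the slide asks about.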