51
Mendenhall : Introduction To Probability And Statistics Solutions to exercices [§ 1.3 #1.01] a) The student. b) The exam. c) The patient. d) The azalea plant. e) The car. [§ 1.3 #1.02] a) The time to assemble is quantitative. b) The number of students is quantitative. c) The rating of a politician is qualitative. d) The province or territory of residence is qualitative. [§ 1.3 #1.03] a) The population is a discrete variable. b) The weight is a continuous variable. c) The time is a continuous variable. d) The number of consumers is a discrete variable. [§ 1.3 #1.04] a) The number of boating acccidents is a discrete variable. b) The time is a continuous variable. c) The choice of colour is a qualitative variable. d) The number of brothers and sisters is a discrete variable. e) The yield in kilograms is a continuous variable. [§ 1.3 #1.05] a) The vehicule is the experimental unit. b) The variables are : the type (qualitative), the make (qualitative), carpool or not (qualitative), the one-way commute distance (quantitative - continuous) and the age of vehicule (quantitative - discrete). c) This is a multivariate data. [§ 1.3 #1.06] a) This is a population. b) The variable is the age at death. c) This is a quantitative variable. [§ 1.3 #1.07] The population is the voter’s opinions (for or against the candidate) at the time of the election for all persons voting during the election. It depends on time by the fact that voter’s opinions may change frequently during election time or prior to the election time. [§ 1.3 #1.08] a) The variable is the survival time. b) This is a quantitative continuous variable. 1

Mendenhal Solutions Stats

Embed Size (px)

DESCRIPTION

Stat book solution.

Citation preview

  • Mendenhall : Introduction To Probability And Statistics

    Solutions to exercices

    [ 1.3 #1.01] a) The student.b) The exam.

    c) The patient.

    d) The azalea plant.

    e) The car.

    [ 1.3 #1.02] a) The time to assemble is quantitative.b) The number of students is quantitative.

    c) The rating of a politician is qualitative.

    d) The province or territory of residence is qualitative.

    [ 1.3 #1.03] a) The population is a discrete variable.b) The weight is a continuous variable.

    c) The time is a continuous variable.

    d) The number of consumers is a discrete variable.

    [ 1.3 #1.04] a) The number of boating acccidents is a discrete variable.b) The time is a continuous variable.

    c) The choice of colour is a qualitative variable.

    d) The number of brothers and sisters is a discrete variable.

    e) The yield in kilograms is a continuous variable.

    [ 1.3 #1.05] a) The vehicule is the experimental unit.b) The variables are : the type (qualitative), the make (qualitative), carpool or not

    (qualitative), the one-way commute distance (quantitative - continuous) and theage of vehicule (quantitative - discrete).

    c) This is a multivariate data.

    [ 1.3 #1.06] a) This is a population.b) The variable is the age at death.

    c) This is a quantitative variable.

    [ 1.3 #1.07] The population is the voters opinions (for or against the candidate) at the time ofthe election for all persons voting during the election. It depends on time by thefact that voters opinions may change frequently during election time or prior to theelection time.

    [ 1.3 #1.08] a) The variable is the survival time.b) This is a quantitative continuous variable.

    1

  • c) The survival time for all patients having a particular type of cancer and havingundergone a particular type of radiotherapy.

    d) He could select a significative number of people from different ages and differenthealth conditions from this population to get a representative sample.

    e) Since this is only a sample, the result could be wrong. If we sample fromall patients having cancer and radiotherapy, some may still be living and theirsurvival time will not be measurable. Hence, we cannot sample directly from thepopulation of interest, but must arrive at some reasonable alternate populationfrom which to sample.

    [ 1.3 #1.09] a) The variable is the reading score. This is a quantitative discrete variable (sinceit is an interger score).

    b) The student is the experimental unit.

    c) The population is the reading score for all students who could possibly be taughtby this method.

    [ 1.3 #1.10] a) The experimental unit is the people.b) The variable is the category and is qualitative (A, B, C, and D).

    c)

    d)

    2

  • e) The shape of the bar chart changes, but it is not important.

    f) We have : 1450 = 0, 28 for B,2050 = 0, 4 for C, and

    550 = 0, 1 for D.

    g) There is : (5014)50 100 = 3650 100 = 72%.[ 1.3 #1.11] a) The experimental unit is the pair of jeans.

    b) The variable is province and it is qualitative (QC, ON, MB).

    c)

    d)

    3

  • e) 825 = 0, 32.f) Since we have 0, 32 in QC, 925 = 0, 36 in ON, and

    825 = 0, 32 in MB, then ON

    produce the most jeans.

    g) By looking at the bars, we can easily know that ON produces the most jeans.Also, we can conclude that since the bars and the sectors are almost equal insize, the three provinces produced roughly the same number of pairs of jeans.

    [ 1.3 #1.12] a) The opinion about the Islam-West clash for all Canadian adults.b) All the Canadians adults of 2007.

    c) No, we could add the other reason category.

    d) No, since peoples opinion can change greatly in one year, the sample taken in2007 could be not representative in 2008.

    [ 1.3 #1.13] a) No, we could add Morocco, Iran, etc.b) A bar chart is appropriate.

    c)

    4

  • d) Answers may vary.

    [ 1.3 #1.14] a) The variable is the educational attainment.b) This variable is qualitative.

    c) The numbers represent the percentage of the population type for each category.

    d)

    e)

    5

  • f) 6% of the Aboriginal population has a university degree. Since 17, 70% of thenon-Aboriginal population has a university degree, there is an important gapbetween these two population : we have almost 3 times more educated non-Aboriginal people than educated Aboriginal.

    [ 1.3 #1.15] a) Yes. The total percentage of education level in each bar graph is 100%.b) Yes. There is a significant increase (from 39% to 46%) in the post secondary

    education attainment over the years.

    c)

    [ 1.5 #1.16] Range: 1 0, 5 = 0, 5; Min. Class Width: 0,58 = 0, 0625; Conv. Class Width:0, 06Range: 100 0 = 100; Min. Class Width: 1006 = 16, 66667; Conv. Class Width:17

    6

  • Range: 1500 1200 = 300; Min. Class Width: 3009 = 33, 33333; Conv. ClassWidth: 33

    [ 1.5 #1.17] Convenient Starting Point: 0, 5; First Two Classes: 0, 5 to < 0, 56, 0, 56 to< 0, 62Convenient Starting Point: 0; First Two Classes: 0 to < 16, 16 to < 32Convenient Starting Point: 1200; First Two Classes: 1200 to < 1233, 1233 to1266

    [ 1.5 #1.18]1|682|12555788993|11455666777789994|0001223456789995|116676|12

    a) The stem and leaf display has a mound shaped distribution.

    b) From the stem and lead display, the smallest observation is 1.6.

    c) The eight and ninth largest observations are both 4.9.

    [ 1.5 #1.19] a) For 50 values, we should use 7 or 8 class intervals.b)

    7

  • c) There is 2+5+5+5+14+6+650 =4350 of the measurements that are less than 5.1.

    d) There is 14+6+6+2+5+250 =3550 =

    710 of the measurements that are larger than 3.6.

    e) The stem and leaf display has more peaked mound-shaped distribution than therelative frequency histogram because of the smaller number of groups.

    [ 1.5 #1.20] a)3|34555667799994|00223334458

    b)

    3|343|555667799994|0022333444|58

    Yes. With this method, its more easy to know where is the center of distributionand where are the peaks.

    [ 1.5 #1.21] a)

    b) There is 620 =310 = 0.3 measurements greater than 1.

    c) There is 9+520 =1420 =

    710 = 0.7 measurements less than 2.

    8

  • d) The probability is 620 =310 = 0.3.

    e) It is almost symmetric and there is no outliers.

    [ 1.5 #1.22] a)

    b) With a leaf unit of 0.1.c)

    0|000001|000000002|000000

    d) Yes they do. They have the same three bars and the same peaks.

    [ 1.5 #1.23] As seen on the line chart, the time to resolve the maze decrease a lot since day 3. Wecan assume that the rat has learned.

    9

  • [ 1.5 #1.24] a)

    b) The measurement increases during the first year and decreases during the nextthree years. It increases between year 5 and 6, decreases during one year, is

    10

  • stable for one year and finally decreases the next two years.

    [ 1.5 #1.25] a) We could use a frequency histogram :

    We could also use a stem and leaf plot with a leaf unit of 1.0 :

    5|576|1236|5787|27|568|248|66799|134

    b) The distribution is not mound-shaped, but is rather bimodal with two peakscentred around the scores 65 and 85.

    c) This might indicate that the students are divided into two groups : those whounderstand the material and do well on exams, annd those who do not have athorough command of the material.

    [ 1.5 #1.26] a)

    11

  • b) The shape is skewed left.

    c) 3650 =1825

    [ 1.5 #1.27] a) A bar chart or a pie chart would be appropriate, since we have qualitativevariables.

    b) The bar chart :

    12

  • c) The average annual income is growing with the education level.

    [ 1.5 #1.28] a)

    3|0001122233443|5556666677889994|0000111122334|556667885|005|5

    b)

    13

  • c) The graphs look almost the same, but the stem and leaf plot has the advantageto show all the measurements.

    d) 2750e) The probability is 4750 .

    [ 1.5 #1.29] a)

    14

  • b) The pareto chart is more effective, since it is more easy to compare the data.

    c) The number of passengers; also, we could consider the size of the airline. Ina big airline, it could be difficult to serve the passengers; if we have a lot of

    15

  • passengers, it could be more difficult to give a good service.

    [ 1.5 #1.30] a)

    0|223334440|5566667778888991|001111111222333441|66778888992|1232|583|113|64|4|55|2

    b) 2560 =512

    c) 0.2

    [ 1.5 #1.31] a)

    This is a skewed left distribution. We can consider 4, 4.5 and 5.2 as outliers.b) Maybe the supermarket was understaffed that day, or there may have been an

    unusuallu large number of customers in the store.

    16

  • c) The two graphs convey the same information. The stem and leaf plot allows usto actually recreate the actual data set, while the histogram does not.

    [ 1.5 #1.32] a)

    b) The leaf unit is 0.001 :

    26|8927|1127|56928|112

    c) No, but some measurements are identical.

    [ 1.5 #1.33] a) It would be roughly mound-shaped.

    17

  • b) The leaf unit is 1 :

    3|94|4|56667785|1245|5796|016|56697|04

    c) J. Clark, B. Mulroney, A. Meighen, K. Campbell and S. Harper. They are allconservative; maybe the trend of transferring the leadership to younger genera-tions in that party is faster.

    [ 1.5 #1.34] a)

    b) This is skewed to the right.

    c) Its not far enough to the main distribution to be considered as an outlier.

    [ 1.5 #1.35] a)

    18

  • b) 420 =15

    [ 1.5 #1.36] a) This is clearly skewed to the left. There is two outliers : 16.4 and 17.1.

    0|9991|2242|01243|44|055|06|487|13

    b) The stem and leaf plot is clearly better, because we can see every measurement.

    19

  • [ 1.5 #1.37] a) The variable is the number of contaminated Waste Sites in each province. Thisis a discrete variable.

    b)

    1|72|13|44|445|5676|7|88

    The distribution is skewed to the right with three outliers. The unusually largemeasurements are : 205, 235 and 300.

    b) Its very difficult to know. This is the largest provinces, but its also the caseof Nunavut, Yukon and Northwest Territories which have not much waste Sites.Some other variables like population size, numbers of industries, etc. can helpto explain the data.

    [ 1. Supp. #1.38] a) The ethnic origin is a qualitative variable.b) The score is a quantitative variable.

    c) The type of establishment is a qualitative variable.

    d) The mercury concentration is a quantitative variable.

    20

  • [ 1. Supp. #1.39] a) The distribution of non-secured loan sizes might be skewed (a few extremelylarge loans are possible).

    b) The distribution of secured loan sizes is not likely to contain unusually large orsmall values.

    c) Not likely to be skewed.

    d) Not likely to be skewed.

    e) If a package is dropped, it is likely that all the shells will be broken. Hence, afew large number of broken shells is possible. The distribution will be skewed.

    f) If an animal has one tick, he is likely to have more than one. There will besome 0s with uninfected rabbits, and then a larger number of large values.The distribution will not be symmetric.

    [ 1. Supp. #1.40] a) Discrete.b) Continuous.

    c) Discrete.

    d) Discrete.

    e) Continuous.

    [ 1. Supp. #1.41] a) Continuous.b) Continuous.

    c) Discrete.

    d) Discrete.

    e) Discrete.

    [ 1. Supp. #1.42] a) Discrete.b) Continuous.

    c) Continuous.

    d) Discrete.

    [ 1. Supp. #1.43]

    7|898|0179|01244566688

    10|17911|2

    The distribution is symmetric, mound-shaped.

    [ 1. Supp. #1.44] a) The graphical technique used is the bar chart.b) This could be significative, but we have to be careful. Some student may have

    good score for other reasons that have no link with this method : if they knowthat they are on study, they could work harder and get better grades.

    21

  • [ 1.Supp. #1.48] a)

    b) Since Prince Rupert is located west of the Coastal mountain, it is not unusualthat the average rainfall would be very high.

    c) The value x = 1154.66 (Vancouver) does not lie far from the centre of thedistribution. It would not be considered unusually rainy.

    [ 1. Supp. #1.58] a) There is no unusual features. The distribution is roughly mound shaped.

    22

  • b)

    4|395|6|3567|02348|3446789|125

    10|01211|067912|13|1

    With 1.0 as leaf unit.c) Since the data appears in Mercers site, this global investment consulting agency

    may have chosen the cities of its business priorities.

    [ 1. Supp. #1.67] a) Cest approximativement en forme de cloche.b) Il y a quelques donnees aberranets a` 38.2. Probablement que la personne dont

    la temperature etait de 38.22 avait une maladie.c) Oui, mais quelque peu a` droite.

    [ 2.2 #2.01] a)

    23

  • b) La moyenne : x = 0+5+1+1+35 = 2. La mediane : si on met les donnees en ordrecroissant, on obtient : 0, 1, 1, 3, 5; la position de la mediane est (n+1)2 =

    (5+1)2 = 3

    cest-a`-dire la position 3, donc m = 1. Le mode est 1.c) x 6= m, alors la distribution est asymetrique. Ici, m < x, alors cest asymetrique

    a` droite.

    [ 2.2 #2.02] a) x = 3+2+5+6+4+4+3+58 = 4b) Si on met les donnees en ordre croissant, on obtient : 2, 3, 3, 4, 4, 5, 5, 6; la posi-

    tion de la mediane est (n+1)2 =(8+1)

    2 = 4.5, cest-a`-dire entre les positions 4 et5, donc m = 4+42 = 4.

    c) x = m, alors la distribution est symetrique. Le graphique est clairementsymetrique :

    24

  • [ 2.2 #2.03] a) x = 3+5+4+6+10+5+6+9+2+810 = 5.8b) Si on met les donnees en ordre croissant, on obtient : 2, 3, 4, 5, 5, 6, 6, 8, 9, 10; la

    position de la mediane est (n+1)2 =(10+1)

    2 = 5.5, cest-a`-dire entre les positions5 et 6, donc m = 5+62 = 5.5.

    c) Les modes sont 5 et 6. Lensemble est bimodal.

    [ 2.2 #2.04] a) x = 825+883+947+988+1036+1044+1152+1197+1347+140410 = 825b) x = 847+833+1014+983+1126+1121+1157+1174+1396+137410 = 847c) Si le fait davoir une voiture est important, le prix moyen de lassurance est une

    mesure interessante a` considerer pour negocier lachat dune assurance. Cettemesure est egalement interessante pour mesurer une partie du cout de la vieet comparer les provinces entre elles. A` cet egard, dautres mesures moyennesauraient ete interessantes : le cout moyen de leducation post-secondaire, le coutdes habitations, le cout moyen du panier depicerie, etc.

    [ 2.2 #2.05] a)0|0000001|00000000000002|00003|00

    La feuille a pour unite 1.0 et la distribution est asymetrique a` droite.b) Le mode est 1.

    25

  • c) x = 1+1+0+1+3+0+0+1+1+1+2+2+2+1+0+1+1+3+0+1+1+0+2+1+125 = 1.08. On meten ordre les donnees : 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, , 2, 2, 2, 2, 3, 3;(n+1)

    2 =(25+1)

    2 = 13, la mediane est a` la position 13, alors m = 1. Le mode est1.

    d) La moyenne est lege`rement plus a` droite que la mediane, ce qui correspond aufait que la distribution est asymetrique a` droite.

    [ 2.2 #2.06] a)

    1|788882|000012342|663|233|4|4|95|25|6

    La distribution est asymetrique a` droite.

    b) x = 56+52+49+33+32+26+26+24+23+22+21.5+20.7+20.3+20.1+20+18.7+18.4+18.2+18+17.520 =26.82. Pour la mediane, la position est (n+1)2 =

    (20+1)2 = 10.5, cest-a`-dire entre

    la position 10 et 11, donc m = 21.5+222 = 21.75.

    26

  • c) Comme la moyenne est fortement influencee par les donnees aberrantes, la me-diane serait une mesure plus adequate du centre de la distribution.

    [ 2.2 #2.07] Comme le nombre denfants doit etre une variable discre`te, il est evident que 2.5represente le nombre moyen denfants par famille pour toutes les familles aux Etats-Unis durant les annees 30.

    [ 2.2 #2.08] a) La moyenne : x = 0.99+1.12+1.92+0.63+1.23+0.67+0.85+0.69+0.65+0.6+0.53+0.6+1.41+0.6614 =0.8964.

    b) Pour la mediane, on met en ordre les donnees : 0.53, 0.6, 0.6, ..., 1.12, 1.23, 1.41, 1.92;la position est (n+1)2 =

    (14+1)2 = 7.5, cest-a`-dire entre la position 7 et 8, donc

    m = 0.67+0.692 = 0.68.c) Yes, the distribution is skewed to the right, because x > m.

    [ 2.2 #2.09] The distribution of sports salaries will be skewed to the right, because of the veryhigh salaries of some sports figures. Hence, the median salary would be a bettermeasure of centre than the mean.

    [ 2.2 #2.10] a) La moyenne : x = 175+200+190+185+250+190+230+225+240+26510 = 215.b) Pour la mediane, on met en ordre les donnees : 175, 185, 190, 190, 200, 225, 230, 240, 250, 265;

    la position est (n+1)2 =(10+1)

    2 = 5.5, cest-a`-dire entre la position 5 et 6, doncm = 200+2252 = 212.5.

    c) Comme il ny a pas de donnees aberrantes, la moyenne demeure la mesure laplus fiable.

    [ 2.2 #2.11] a) La moyenne : x = 22+16+12+19+40+40+21+10+20+19+4+65+39+20+6+34+23+718 = 23.16667.Pour la medianne, on met en ordre croissant les donnees : 4, 6, 7, 10, 12, 16, ..., 40, 40, 65;la position est (n+1)2 =

    (18+1)2 = 9.5, cest-a`-dire entre la position 9 et 10, donc

    m = 20+202 = 20. Les modes sont 19, 20 et 40.b) x > m. La distribution est donc asymetrique a` droite.

    c) Oui, la distribution est asymetrique a` droite.

    27

  • [ 2.2 #2.12] a) x = 1200+800+1050+750+700+500+600+670+1200+65010 = 812.b) On met en ordre croissant les donnees : 500, 600, 650, 670, 700, 750, 800, 1050, 1200, 1200;

    la position est (n+1)2 =(10+1)

    2 = 5.5, cest-a`-dire entre la position 5 et 6, doncm = 700+7502 = 725.

    c) Ces donnees permettent de faire un achat plus eclaire. La qualite techniquede lappareil, la resolution de lecran, sa durabilite, etc., sont des donnees quiauraient pu egalement etre prises en consideration.

    [ 2.2 #2.13] a) x = 8+6+3+23+2+6+5+4+1+0+2+7+11+5+8+8+1017 = 6.411765.b) On met en ordre croissant les donnees : 0, 1, 2, 2, 3, 4, 5, 5, 6, 6, 7, 8, 8, 8, 10, 11, 23;

    la position est (n+1)2 =(17+1)

    2 = 9, donc m = 6. Le mode est 8.c) x > m (mais faiblement). La distribution est donc lege`rement asymetrique a`

    droite.

    d) Il y a une mesure aberrante, 23. Pour cette raison, la mediane est une valeurplus fiable, car elle est moins sensible aux valeurs aberrantes.

    [ 2.3 #2.14] a) La moyenne de lechantillon : x = 2+1+1+3+55 = 2.4.b) La variance de lechantillon :

    s2 =(xi x)2n 1

    = (2 2.4)2 + (1 2.4)2 + (1 2.4)2 + (3 2.4)2 + (5 2.4)2

    5 1= 2.8

    28

  • c) Lecart-type de lechantillon : s =s2 =

    2.8 = 1.67332.

    d) On prend la formule computationnelle :

    s2 =x2i (

    xi)2n

    n 1

    =(22 + 12 + 12 + 32 + 52) (2+1+1+3+5)25

    5 1=

    40 14454

    = 2.8

    Et s =s2 =

    2.8 = 1.67332. Les resultats sont identiques.

    [ 2.3 #2.15] a) Letendue : R = 4 1 = 3.b) La moyenne de lechantillon : x = 4+1+3+1+3+1+2+28 = 2.125.c) On prend la formule computationnelle pour trouver la variance :

    s2 =x2i (

    xi)2n

    n 1

    =(42 + 12 + 32 + 12 + 32 + 12 + 22 + 22) (4+1+3+1+3+1+2+2)28

    8 1=

    45 28987

    = 1.267857

    Et, pour lecart-type : s =s2 =

    1.267857 = 1.125992.

    d) On calcule la variance de lechantillon :

    s2 =(xi x)2n 1

    = (4 2.125)2 + (1 2.125)2 + (3 2.125)2 + ...+ (2 2.125)2 + (2 2.125)2

    8 1= 1.267857

    Lecart-type de lechantillon : s =s2 =

    1.267857 = 1.125992. Les valeurs

    sont identiques.

    [ 2.3 #2.16] a) Letendue : R = 6 1 = 5.b) La moyenne de lechantillon : x = 3+1+5+6+4+4+3+58 = 3.875.

    29

  • c) La variance de lechantillon :

    s2 =(xi x)2n 1

    = 2(3 3.875)2 + (1 3.875)2 + 2(5 3.875)2 + (6 3.875)2 + 2(4 3.875)2

    8 1= 2.410714

    Lecart-type de lechantillon : s =s2 =

    2.410714 = 1.552647.

    On peut aussi calculer la variance de lechantillon avec la formule computation-nelle :

    s2 =x2i (

    xi)2n

    n 1

    =(32 + 12 + 52 + 62 + 42 + 42 + 32 + 52) (3+1+5+6+4+4+3+5)28

    8 1=

    137 96187

    = 2.410714.

    c) 51.552647 = 3.220307. Alors letendue est approximativement 3 fois lecart-type.

    [ 2.3 #2.17] a) Letendue : R = 2.39 1.28 = 1.11.b) On calcule la variance :

    s2 =x2i (

    xi)2n

    n 1

    =(1.282 + 2.392 + 1.52 + 1.882 + 1.512) (1.28+2.39+1.5+1.88+1.51)25

    4

    =15.415 73.27365

    4= 0.19007

    Alors lecart-type est : s =s2 =

    0.19007 = 0.4359702

    c) 1.110.4359702 = 2.546046. Alors letendue est approximativement 2.5 fois lecart-type.

    [ 2.3 #2.18] a) Letendue : R = 312.4 165.12 = 147.28.b) La moyenne : x = 204.94+180+178.23+176.43+...+312.4+238.66+225.47+222.2312 = 227.2167.

    30

  • c) On calcule dabord la variance :

    s2 =x2i (

    xi)2n

    n 1=

    647847.1 74343481211

    = 2574.373

    Alors lecart-type est : s =s2 =

    2574.373 = 50.73828

    [ 2.5 #2.21] a) x = 50, s = 10. Linterval est defini en tant que x = ks; on a 50 k10 = 40 et50 + k10 = 60, alors k = 1, donc la proportion des donnees entre 40 et 60, seloncette re`gle, est de 68%.

    b) x = 50, s = 10. Linterval est defini en tant que x = ks; on a 50 k10 = 30 et50 + k10 = 70, alors k = 2, donc la proportion des donnees entre 30 et 70, seloncette re`gle, est de 95%.

    c) Comme 68% des donnees sont entre 40 and 60, alors 34% des donnees sont entre50 et 60. Comme 95% des donnees sont entre 30 et 70, alors 47.5% des donneessont entre 30 and 50. On conclut que 34% + 47.5% = 81.5% des donnees sontentre 30 et 60.

    d) Comme 5% des donnees sont a` lexterieur de la distribution entre 30 et 70, il ya 2.5% de la distribution qui se trouve entre 0 et 30. Il y a alors 81.5%+2.5% =84% de la distribution qui se trouve entre 0 et 60. On conclut que la probabiliteque la distribution se situe au-dela` de 60 est de 16%.

    [ 2.5 #2.35] a) Avec une feuille dunite 1.0 :

    0|91|162|3353|184|00165|12456|27|138|79|2

    On voit que la distribution est relativement en forme de cloche.

    b) La moyenne : x = 46+73+41+25+51+52+31+23+55+62+16+9+92+40+38+71+54+11+87+40+2321 =

    31

  • 44.7619. Pour obtenir lecart-type, on calcule dabord la variance :

    s2 =(xi x)2n 1

    = (46 44.7691)2 + (73 44.7619)2 + (41 44.7619)2 + ...+ (23 44.7619)

    21 1= 547.9905

    Lecart-type : s =s2 =

    547.9905 = 23.4092.

    c) Comme la distribution est relativement en forme de cloche, on peut utiliser lare`gle empirique : pour un interval defini en tant que x2s, cest-a`-dire allant de44.76192(23.4092) = 2.0565 a` 44.7619 + 2(23.4092) = 91.5803, on sait quaumoins 95% de la distribution sera situee dans cet interval. On peut verifier quily a bien 2021 = 0.952381 = 95% des donnees dans cet intervale (toutes les donneessauf 92).

    [ 2.5 #2.39] a) La moyenne : x = 0(10)+1(5)+2(3)+3(2)+4(1)+6(1)+9(1)+10(1)24 = 1.916667; autrementdit, on a x = 0+0+0+0+0+0+0+0+0+0+1+1+1+1+1+2+2+2+3+3+...+9+1024 = 1.916667.Pour obtenir lecart-type, on calcule dabord la variance :

    s2 =(xi x)2n 1

    = 10(0 1.916667]2 + 5(1 1.916667)2 + ...+ (9 1.9166667)2 + (10 1.9166667)2

    24 1= 7.818841

    Lecart-type : s =s2 =

    7.818841 = 2.796219.

    b) x s = [1.916667 2.796219, 1.916667 + 2.796219] = [0.879552, 4.712886], x2s = [1.9166672(2.796219), 1.916667+2(2.796219)] = [3.675771, 7.509105] etx3s = [1.9166673(2.796219), 1.916667+3(2.796219)] = [6.47199, 10.30532]

    c) 21/24 = 0.875 = 87.5% des donnees sont dans le premier interval (ce qui estconforme au theore`me de Tchebysheff qui stipule quil doit y en avoir au moins0%, et la re`gle empirique, qui en prevoit 68%) , 22/24 = 0.9166667 = 91.66667%dans le second (ce qui est conforme au theore`me de Tchebysheff qui stipulequil doit y en avoir au moins 75%, et la re`gle empirique, qui en prevoit 95%),et 24/24 = 1 = 100% des donnees dans le troisie`me (ce qui est conforme autheore`me de Tchebysheff qui stipule quil doit y en avoir au moins 89%, et lare`gle empirique, qui en prevoit 99, 7%).

    [ 2.7 #2.42] a) On met en ordre les donnees : 0, 1, 3, 4, 4, 5, 6, 6, 6, 7, 7, 8; la position de la me-diane est (n+1)2 =

    (12+1)2 = 6.5, cest-a`-dire entre la position 6 et 7, donc m =

    5+62 = 5.5. On a la position de Q1 =

    (n+1)4 = 3.25, alors Q1 = 3 +

    (43)4 = 3.25.

    On a la position de Q3 = 3(n+1)4 = 9.75, alors Q3 = 6+3(76)

    4 = 6.75. On a doncIQR = Q3 Q1 = 6.75 3.25 = 3.5 et les cinq nombres : Min = 0, Q1 = 3.25,m = 5.5, Q3 = 6.75 et Max = 8.

    32

  • b) La moyenne : x = 8+7+1+4+6+6+4+5+7+3+012 = 4.75. Pour obtenir lecart-type,on calcule dabord la variance :

    s2 =(xi x)2n 1

    = (0 4.75)2 + (1 4.75)2 + (3 4.75)2 + ...+ (8 4.75)

    12 1= 6.022727

    Lecart-type : s =s2 =

    x2i

    (

    xi)2

    nn1 =

    x2inx2n1 =

    6.022727 =

    2.454124.c) Avec une courbe normale en cloche, on se sert souvent de la cote z pour verifier

    si une donnee est aberrante (situee dans la zone qui, selon la re`gle empirique,est situee dans x 3s). z score = xxs . Pour la plus petite valeur, on azscore = 04.752.454124 = 1.935518. Pour la plus grande valeur, on a : zscore =84.752.454124 = 1.324301. Comme aucune valeur absolue de ces deux cotes z nexce`de2, ces valeurs ne sont pas exceptionnellement elevees ou faibles.

    [ 2.7 #2.43] On met en ordre les donnees : 0, 1, 5, 6, 7, 8, 9, 10, 12, 12, 13, 14, 16, 19, 19; la po-sition de la mediane est (15+1)2 = 8, cest-a`-dire a` la position 8, donc m = 10.On a la position de Q1 = (n+1)4 = 4, alors Q1 = 6. On a la position deQ3 = 3(n+1)4 = 12, alors Q3 = 14. On a IQR = Q3Q1 = 8 et les cinq nombres: Min = 0, Q1 = 6, m = 10, Q3 = 14 et Max = 19.

    [ 2.7 #2.45] On met en ordre les donnees : 2, 3, 4, 5, 6, 6, 6, 7, 8, 9, 9, 10, 22; la position de lamediane est (13+1)2 = 7, cest-a`-dire a` la position 7, donc m = 6. On a laposition de Q1 = (n+1)4 = 3.5, alors Q1 = 4 +

    (54)4 = 4.25. On a la position de

    Q3 = 3(n+1)4 = 10.5, alors Q3 = 9 +3(99)

    4 = 9. On a IQR = Q3 Q1 = 4.75et les cinq nombres : Min = 2, Q1 = 4.25, m = 6, Q3 = 9 et Max = 22. Laplus petite valeur adjacente est Q1 1.5(IQR) = 4.25 1.5(4.75) = 2.875 etla plus grande valeur adjacente est Q3 + 1.5(IQR) = 9 + 1.5(4.75) = 16.125. Onvoit clairement que 22 est une donnee aberrante.

    33

  • [ 2.7 #2.46] Il y a 69% des candidats qui ont fait moins bien que moi, tandis que 31% des candidatsont fait mieux.

    [ 2.7 #2.49] a) On met en ordre les donnees pour Mario Lemieux : 1, 6, 7, 17, 19, 28, 35, 44, 45, 50, 54, 69, 69, 70, 85;la position de la mediane est (15+1)2 = 8, donc m = 44. On a la position deQ1 = (n+1)4 = 4, alors Q1 = 17. On a la position de Q3 =

    3(n+1)4 = 12, alors

    Q3 = 69. On a donc les cinq nombres : Min = 1, Q1 = 17, m = 44, Q3 = 69,Max = 85.

    On met en ordre les donnees pour Brett Hull : 0, 1, 25, 29, 30, 32, 37, 39, 41, 42, 54, 57, 70, 72, 86;on a m = 39, Q1 = 29 et Q3 = 57. On a donc les cinq nombres : Min = 0,Q1 = 29, m = 39, Q3 = 57 et Max = 86.

    b) Il ny a pas de donnees aberrantes :

    34

  • c) La distribution semble assez symetrique pour Lemieux, tandis quelle est davan-tage asymetrique pour Hull. La distribution de Lemieux varie beaucoup plus,car elle a un IQR plus eleve, mais elle a aussi une mediane plus elevee pour lenombre de buts.

    [ 2.7 #2.50] On a m = 100. Il ny a pas de valeurs aberrantes a` gauche, mais deux a` droite (169et 208). La distribution est symetrique (car la mediane est au centre de IQR); si lamediane avait ete a` droite, on aurait eu une distribution asymetrique a` gauche.

    [ 2.7 #2.54] a) La position de Q1 = n+14 = 3.5, alors Q1 = 97+982 = 97.5. La position deQ3 = 3n+14 = 10.5, alors Q1 =

    106+1072 = 106.5

    b) IQR = Q3 Q1 = 106.5 97.5 = 9c) La cloture de gauche : Q1 1.5(IQR) = 97.5 1.5(9) = 84.d) La cloture de droite : Q3 + 1.5(IQR) = 106.5 + 1.5(9) = 120.e) On prend la mediane dont la position est n+12 = 7: m = 104.

    35

  • f) Oui, 126 est une donnee aberrante.g) On a z score = xxs . On trouve x = 105.53 et s = 8.42. On a donc zmin =93103.53

    8.42=1.25 et zmax =126103.53

    8.42 = 2.67. On trouve ainsi que la valeur 126 est unedonnee aberrante, car la valeur absolue de sa cote z est superieure a` 2.

    h) On a zhamilton = 98103.538.42 = 0.66. Il sagit dune valeur normale, car sa valeurabsolue est inferieure a` 2.

    i) La ville de Victoria serait le bon choix, car comme le prix de lessence y esteleve, on doit moins en consommer, ce qui nuit moins a` lenvironnement.

    [ 2.7 #2.55] a) x = 26.21429, s = 1.251373.b) x = 26.14286, s = 2.413333.c) Ils ont en general le meme nombre de raisins par bote, mais dans la marque

    Sunmaid, il y a certaines botes qui ont beaucoup plus de raisins que la moyenne.Cest ce quindique lecart-type.

    [ 2.7 #2.63] Pour n = 400, la moyenne est x = 600 et la variance est s2 = 4900. On a donc lecarttype s =

    s2 = 70. Comme la distribution est en forme de cloche, on peut presumer,

    selon la re`gle empirique, que 68% des donnees se trouvent dans lintervalle xs, cest-a`-dire (530, 670), donc 400(0.68) = 272 resultats. On sait aussi que 95% des donneesse trouvent dans lintervalle x 2s, cest-a`-dire (460, 740), donc 400(0.95) = 380resultats.

    [ 2.7 #2.64] a) x =

    xin = 6.85. s =

    s2 =

    x2i

    (

    xi)2

    nn1 = 1.008299

    b) zmax = xxs =8.56.851.008299 = 1.636419. Cette valeur est habituelle et nest pas

    aberrante (car sa valeur absolue est inferieure a` 2.)

    36

  • c) On cherche le mode : 7.d) On met en ordre les donnees : 5, 6, 6, 6.75, 7, 7, 7, 7.25, 8, 8.50. On trouve la

    position de la mediane : (n+1)2 = 5.5, alors m = 7. On trouve la position deQ1 = (n+1)4 = 2.75, alors Q1 = 6. On trouve la position de Q3 =

    3(n+1)4 = 8.75,

    alors Q3 = 7.25 + 0.25(87.25) = 7.4375. On a IQR = Q3Q1 = 7.43756 =1.4375. On trouve la cloture de gauche : Q1 1.5(1.4375) = 3.84375. Ontrouve la cloture de droite : Q3 + 1.5(1.4375) = 9.59375. Il ny a aucune donneeaberrante, comme le laissait predire la cote z. Dans ce cas, les clotures peuventetre les valeurs minimales et maximales :

    [ 2.7 #2.78] Brett Hull a une distribution assez symetrique; il y a une donnee aberrante, illustrantune saison avec un nombre de but exceptionnel. Les distributions de Mario Lemieuxet de Bobby Hull sont asymetriques a` gauche, tandis que celle de Wayne Gretzky estasymetrique a` droite. La variabilite est sensiblement la meme entre les deux Hull,celle de Gretzky et Lemieux sont pratiquement identique et superieur a` celle des Hull.Gretzky a la cloture de droite la plus eloignee, ce qui laisse penser quil a un nombrede buts tre`s eleve lors dune saison exceptionnelle. Lemieux est celui avec le IQR leplus eleve et la mediane la plus elevee; cest Brett Hull qui a la plus faible mediane(autour de 38) alors que les autres ont une mediane qui se situe approximativemententre 41 et 44.

    [ 3.2 #3.05] a) La population dinteret est lopinion des parents et des enfants a` propos du tempslibre dont dispose les enfants. Lechantillon, cest 198 opinions des parents et200 opinions des enfants a` propos du temps libre dont dispose les enfants.

    b) Il sagit de la variable qualitative opinion. Les donnees sont bivariees dans lamesure ou`, pour chaque personne, on lui demande sil est un adulte ou un enfant

    37

  • et sa reponse a` la question (juste assez, pas assez, trop, ne sait pas)

    c) Chaque entree dans le tableau represente le nombre de personnes qui entre danschaque categorie dopinions.

    d)

    38

  • e) On pourrait utiliser un diagramme comparatif a` barres ou un diagramme enpiles. Comme les donnees representent la frequence relative, il est plus represen-tatif dutiliser les diagrammes circulaires.

    [ 3.4 #3.11] a)

    b) Il y a une relation approximativement lineaire.

    c) r = sxysxsy . On trouve dabord la covariance : sxy =

    xy

    x

    y

    nn1 =

    126 193765 =

    1.7666. On trouve ensuite les ecarts-type : sx = 1.47196 et sy = 1.32916. On adonc r = 1.76661.471961.32916 = 0.9029526.

    d) On trouve dabord x = 3.166667 et y = 6.166667. Ensuite, on trouve b =r sysx = 0.9029526 1.329161.47196 = 0.815354. On trouve a = y bx = 6.166667 0.815354(3.166667) = 3.584712. Lequation est donc approximativement : y =3.58 + 0.815x. On voit quelle correspond aux donnees.

    39

  • [ 3.4 #3.15] a) On doit trouver a = y bx. y = 73.61167, x = 2.833333. On trouve b = r sysx .On a sy = 35.79161 et sx = 1.47196. On trouve la covariance : sxy = 51.62233et r = sxysxsy =

    51.6223335.791611.47196 = 0.9798517. Donc b = 0.9798517 35.791611.47196 =

    23.82569 et donc a = 73.61167 (23.82569 2.833333) = 6.10556. Lequation estapproximativement y = 6.106 + 23.826x.

    b) Lequation est assez fide`le aux informations :

    40

  • c) Pour x = 6, on a y = 6.106 + 23.826(6) = 149.06. Il est risque destimer unevaleur de y a` partir dune valeur de x qui se situe a` lexterieur de la zone decollecte des donnees.

    [ 3.4 #3.17] a)

    41

  • b) Il ny a pas vraiment de tendance prononcee, mais on remarque quil y a unecorrelation assez positive entre le pre-test et le post-test.

    c) r = sxysxsy . On a sxy = 78.07143, sx = 9.28644 et sy = 11.05613. On a doncr = 78.071439,2864411.05613 = 0.7603958. Cest une valeur positive, ce qui confirme notreinterpretation du graphique.

    [ 3.4 #3.19] a) La variable dependance est la taille de la television x, tandis que le prix de latelevision y est la variable dependante. En effet, on veut predire le prixa` partirde la taille.

    b) La relation est somme toute assez lineaire, quoiquelle est assez curviligne.

    42

  • [ 3.4 #3.20] a) r = sxysxsy . On a sxy = 1313.778, sx = 6.219146 et sy = 250.1022. On a doncr = 1313.7786.219146250.1022 = 0.8446014. Cest une valeur fortement positive. Ceciconfirme que la relation est lineaire.

    b) On a b = r sysx = 0.8446014250.10226.219146 = 33.96554. Pour trouver a = y bx, il faut

    y = 812 et x = 43.3, donc a = 812 33.96554(43.3) = 658.7079. On a donclequation : y = 658.71 + 33.97x.

    c) Avec y = 658.71 + 33.97(46), on trouve y = 903.91.d) Faire une extrapolation, cest-a`-dire utiliser des donnees qui vont au-dela` de

    letendue de la recherche, cela risque de fournir des resultats qui ne sont pasfiables.

    [ 4.3 #4.01] a) = {E1, E2, E3, E4, E5, E6}. On a : E1 = {Observer un 1}, E2 = {Observerun 2}, E3 = {Observer un 3}, E4 = {Observer un 4}, E5 = {Observer un 5} etE6 = {Observer un 6}.

    b) A = {E2}, B = {E2, E4, E6}, C = {E3, E4, E5, E6}, D = {E2}, E = {E2, E4, E6},F =

    c) P (Ei) = 16d) On a P (A) = P (D) = 16 , P (B) = P (E) = P (E2)+P (E4)+P (E6) =

    12 , P (C) =

    P (E3) + P (E4) + P (E5) + P (E6) = 23 , P (F ) = 0. On trouve :P (Ei) = 1.

    [ 4.3 #4.02] a) = {E1, E2, E3, E4, E5}. Ei P (Ei) = P (E1) + P (E2) + P (E3) + P (E4) +P (E5) = 1. Alors 3P (E5) = 0.3 et P (E5) = 0.1. Et P (E4) = 0.2

    b) P (A) = P (E1) + P (E2) + P (E3) = 0.15 + 0.4 + 0.2 = 0.75, P (B) = P (E2) +P (E3) = 0.15 + 0.4 = 0.55

    43

  • c) (A B) = {E1, E2, E3, E4}d) (A B) = {E3}

    [ 4.3 #4.03] = {E1, E2, ..., E10}. P (E1) = 0.45, P (E2) = 0.453 = 0.15. On aP (Ei) =

    0.6 + P (E3) + P (E4) + ...+ P (E10) = 1. Donc, P (E3) + P (E4) + ...+ P (E10) = 0.4.Comme ces derniers sont equiprobables, on a P (E3) = P (E4) = P (E5) = P (E6) =P (E7) = P (E8) = P (E9) = P (E10) = 0.48 = 0.05.

    [ 4.3 #4.04] = {E1, E2, E3, E4}. On a E1 = {HH} = 0.49, E2 = {HM}, E3 = {MH} = 0.21et E4 = {MM} = 0.09.

    a) Sachant queP (Ei) = 1, on a P (E2) = 1 0.49 0.21 0.09 = 0.21.

    b) A = {E1, E2, E3}. P (A) = {la probabilite que le joueur atteigne la cible aumoins une fois} = P (E1) + P (E2) + P (E3) = 0.49 + 0.21 + 0.21 = 0.91.

    [ 4.3 #4.05] a) = {E1, E2, ..., E24}. On a : E1 = {DNQ}, E2 = {DNL}, E3 = {DQN},E4 = {DQL}, E5 = {DLN}, E6 = {DLQ}, E7 = {NDQ}, E8 = {NDL},E9 = {NQL}, E10 = {NQD}, E11 = {NLQ}, E12 = {DLN}, E13 = {QDN},E14 = {QDL}, E15 = {QNL}, E16 = {QND}, E17 = {QLN}, E18 = {QLD},E19 = {LDN}, E20 = {LDQ}, E21 = {LNQ}, E22 = {LND}, E23 = {LQN},E24 = {LQD}.

    b) A = {E2, E4, E5, E6, E8, E9, E11, E12, E14, E15, E17, E18, E19, E20, E21, E22, E23, E24}.Et P (A) = 1824 =

    34 .

    c) Si B = {La somme des piges egale ou depasse 1.10 $}, on voit que cela nestpossible que sil y a un dollars dans la pige, donc P (B) = P (A) = 34 .

    [ 4.3 #4.07] = {E1, E2, ..., E20}. On a : E1 = {R1R2}, E2 = {R1R3}, E3 = {R1J1}, E4 ={R1J2}, E5 = {R2R1}, E6 = {R2R3}, E7 = {R2J1}, E8 = {R2J2}, E9 = {R3R1},E10 = {R3R2}, E11 = {R3J1}, E12 = {R3J2}, E13 = {J1R1}, E14 = {J1R2}, E15 ={J1R3}, E16 = {J1J2}, E17 = {J2R1}, E18 = {J2R2}, E19 = {J2R3}, E20 = {J2J1}.

    [ 4.3 #4.08] E21 = {R1R1}, E17 = {R2R2}, E18 = {R3R3}, E19 = {J1J1}, E20 = {J2J2}.[ 4.3 #4.09] = {E1, E2, E3, E4}. E1 = {BeUt}, E2 = {BeNUt}, E3 = {SuperfUt}, E4 =

    {SuperfNUt}.a) On a : P (E1) = 0.44, P (E2) = 0.14, P (E3) = 0.02 et P (E4) = 0.4. On

    definit A = {Ladulte a besoin de lunettes} = {E1, E2}. On a donc : P (A) =P (E1) + P (E2) = 0.44 + 0.14 = 0.58.

    b) On cherche P (E2) = 0.14.c) C = {Ladulte utilise des lunettes, quil en ait ou non besoin} = {E1, E3}. On

    a donc P (C) = P (E1) + P (E3) = 0.44 + 0.02 = 0.46.

    [ 4.3 #4.14] a) Il y a 4 3 2 1 = 24 evenements dans = {JBED, JBDE, JEBD, JEDB,JDEB, JDBE,BJED,BJDE,BDJE,BDEJ,BEJD,BEDJ,EJBD,EJDB,EBJD,EBDJ,EDBJ,EDJD,DJBE,DJEB,DBJE,DBEJ,DEBJ,DEJB}

    b) P (Ei) = 124 .c) A = {Dave gagne la course} = {DJBE, DJEB, DBJE, DBEJ, DEBJ, DEJB}.

    P (A) = 624 =14 .

    44

  • d) B = {Dave gagne et John est second} = {DJBE, DJEB}. P (B) = 224 = 112 .e) C = {Ed arrive dernier}. Or, P (C) = P (A) = 14 .

    [ 4.3 #4.15] = {E1, E2, E3, E4}. E1 = {AnOn}, E2 = {AmOn}, E3 = {AnOv}, E4 = {AmOv}.a) On a P (E1) = 140300 , P (E2) =

    6300 , P (E3) =

    3300 , P (E4) =

    151300 . A = {E1}. On a

    P (A) = 140300 = 0.467.b) B = {La mouche a des yeux vermillons} = {E3, E4} On a P (B) = P (E3) +

    P (E4) = 3+151300 =154300 = 0.513.

    c) C = {La mouche a des yeux vermillons ou des ailes miniatures} = {E2, E3, E4}.On a P (C) = P (E2) + P (E3) + P (E4) = 6+3+151300 =

    160300 = 0.533

    [ 4.6 #4.41] a) Pour P (A) = 0.3 et P (B) = 0.4 et A et B disjoints, on a : P (AB) = P (A|B) =0 et P (A B) = P (A) + P (B) = 0.3 + 0.4 = 0.7.

    b) Pour P (A) = 0.3 et P (B) = 0.4 et A et B independants, on a : P (A B) =P (A)P (B) = 0.3 0.4 = 0.12, P (A B) = P (A) + P (B) P (A B) = 0.3 +0.4 0.12 = 0.58

    c) Si P (A) = 0.1, P (B) = 0.5 et P (A|B) = 0.1 et que P (A) = P (A|B), cest queles evenements A et B sont independants. Dans ce cas, on trouve : P (AB) =P (A)P (B) = 0.1 0.5 = 0.05 et P (A B) = P (A) + P (B) P (A B) =0.1 + 0.5 0.05 = 0.55.

    d) Si P (A B) = 0, alors que P (A) = 0.2 et P (B) = 0.5, cest que les evenementsA et B sont disjoints. On trouve : P (A B) = P (A) + P (B) = 0.2 + 0.5 = 0.7et P (A|B) = 0.

    [ 4.6 #4.42] Chaque evenement a une probabilite equivalente de 1/5.a) On a : Ac = {E2, E4, E5}. P (Ac) = P (E1) + P (E2) + P (E3) = 35 = 0.6b) On a : (A B) = {E1}. P (A B) = P (E1) = 15c) On a : (B C) = {E4}. P (B C) = P (E4) = 15d) On a : (A B) = {E1, E2, E3, E4, E5}. P (A B) = P (E1) + P (E2) + P (E3) +

    P (E4) + P (E5) = 1e) On a : (B|C) = {E4}. On a P (B|C) = P (E4) = 12 (car il y a une chance sur

    deux que E4 apparaisse dans C).

    f) On a : (A|B) = {E1}. P (A|B) = P (E1) = 14 (car il y a une chance sur quatreque E1 apparaisse dans B).

    g) On a : (A B C) = {E1, E2, E3, E4, E5}. P (A B C) = P (E1) + P (E2) +P (E3) + P (E4) + P (E5) = 1

    h) On a : (A B)c = {E2, E3, E4, E5}. P ((A B)c) = P (E2) + P (E3) + P (E4) +P (E5) = 45

    [ 4.6 #4.43] a) P (Ac) = 1 P (A) = 1 0.4 = 0.6b) P ((A B)c) = 1 P (A B) = 1 15 = 45

    Les resultats sont les memes.

    [ 4.6 #4.44] a) P (A|B) = P (AB)P (B) =1545

    = 14

    45

  • b) P (B|C) = P (BC)P (C) =1525

    = 12Les resultats sont les memes.

    [ 4.6 #4.45] a) P (A B) = P (A) + P (B) P (A B) = 25 + 45 15 = 1b) P (A B) = P (A|B)P (B) = 14 45 = 15c) P (B C) = P (B|C)P (C) = 12 25 = 15

    Les resultats sont les memes.

    [ 4.6 #4.46] a) Ici, P (A|B) 6= P (A), car 14 6= 25 . Aussi P (AB) 6= P (A)P (B), car 15 6= 25 45 . Onconclut que A et B ne sont pas des evenements independants, mais dependants.

    b) Ils ne sont pas disjoints, car P (A B) 6= 0 et P (A|B) 6= 0.[ 4.6 #4.47] = {E1, E2, E3, E4, E5, E6} et E1 = {1}, E2 = {2}, E3 = {3}, E4 = {4}, E5 = {5}

    et E6 = {6}. On definit : A = {E1, E2, E3}, B = {E1, E2} et C = {E4, E5, E6}.Comme les evenements elementaires sont equiprobables, on trouve : P (A) = 12 ,P (B) = 13 et P (C) =

    12 .

    a) = Ei = 1.b) On a : (A|B) = {E1, E2} et P (A|B) = 1. Autre solution : P (A|B) = P (AB)P (B) =

    1313

    = 1

    c) P (B) = P (E1) + P (E2) = 13d) On a : (A B C) = , alors P (A B C) = 0e) On a : (A B) = {E1, E2} = 13 . Autre solution : (A B) = P (B)(A|B) = 13f) On a : (A C) = , alors P (A C) = 0g) On a : (B C) = , alors P (B C) = 0h) On a : (A C) = {E1, E2, E3, E4, E5, E6}, alors P (A C) = 1.i) On a : (B C) = {E1, E2, E4, E5, E6}, alors P (B C) = 56 .

    [ 4.6 #4.48] a) P (A|B) 6= P (A), car 1 6= 12 . Aussi P (A B) 6= P (A)P (B), car 13 6= 16 . Onconclut que A et B ne sont pas des evenements independants, mais dependants.Ils ne sont pas non plus disjoints, car (A B) 6= 0 et (A|B) 6= 0.

    b) On a P (A C) = 0. Ils sont disjoints et donc dependants.[ 4.6 #4.49] a) P (A B) = P (A)P (B) = 0.4 0.2 = 0.08

    b) P (A B) = P (A) + P (B) P (A)P (B) = 0.4 + 0.2 0.08 = 0.52[ 4.6 #4.50] a) P (A B) = 0

    b) P (A B) = P (A) + P (B) = 0.3 + 0.5 = 0.8

    [ 4.6 #4.51] a) P (B|A) = P (BA)P (A) = 0.120.4 = 0.3b) Non, car P (A B) 6= 0.c) P (B|A) = P (B), alors A et B sont des evenements independants. On peut

    egalement verifier que : P (A B) = P (A)P (B).

    46

  • [ 4.6 #4.52] a) P (A) = P (A B) + P (A Bc) = 0.34 + 0.15 = 0.49b) P (B) = P (A B) + P (B Ac) = 0.34 + 0.46 = 0.8c) P (A B) = 0.34d) P (A B) = P (A) + P (B) P (A B) = 0.49 + 0.8 0.34 = 0.95e) P (A|B) = P (AB)P (B) = 0.340.8 = 0.425f) P (B|A) = P (BA)P (A) = 0.340.49 = 0.6938776

    [ 4.6 #4.53] a) Non, car P (A B) 6= 0 et P (A|B) 6= 0.b) Non, car P (A) 6= P (A|B). On a aussi que P (A B) 6= P (A)P (B).

    [ 4.6 #4.54] On a A = {la personne utilise la drogue}, B = {le test reve`le que oui}. On aP (B|A) = P (Bc|Ac) = 0.98 et P (Bc|A) = (B|Ac) = 0.02.

    a) On a C = {Le non-utilisateur de drogue echoue a` se faire reperer aux deuxtests.}. On cherche : P (C) = P (B|Ac) P (B|Ac). Comme les tests sontindependants, on a P (C) = P (B|Ac) P (B|Ac) = P (B|Ac)P (B|Ac) = 0.02 0.02 = 0.0004.

    b) On a D = {Lutilisateur de drogue est repere a` au moins un des deux tests.}.On cherche : P (D) = (P (Bc|A) P (B|A)) (P (B|A) P (Bc|A)) (P (B|A) P (B|A)). Comme les tests sont independants et que ces trois evenementssont disjoints, on trouve : P (D) = (P (Bc|A)P (B|A)) + (P (B|A)P (Bc|A)) +(P (B|A)P (B|A)) = (0.02)(0.98) + (0.98)(0.02) + (0.98)(0.98) = 0.9996

    c) On a E = {Lutilisateur de drogue passe les deux tests sans se faire reperer.}.On cherche : P (E) = P (B|A) P (B|A). Comme les tests sont independants,on a P (B|A) P (B|A) = P (Bc|A)P (Bc|A) = 0.02 0.02 = 0.0004

    [ 4.6 #4.60] On a A = {Choisir Tim Horton}, B = {Choisir Starbucks} et C = {Acheter un cafesans cafeine}. On a P (A) = 0.7, P (B) = 0.3 et P (C) = 0.6.

    a) On cherche P (A C) = P (A)P (C) = 0.7 0.6 = 0.42b) Les evenements sont independants, car il est specifie que le choix de cafe ne

    depend pas de letablissement choisi.

    c) On a P (BC) = P (B)P (C) = 0.30.6 = 0.18. On cherche P (B|C) = P (BC)P (B) =0.180.6 = 0.3.

    d) On cherche P (P (A)P (C) = P (A)+P (C)P (AC) = 0.7+0.60.42 = 0.88[ 4.6 #4.61] On a A = {Le premier inspecteur decouvre la defaillance}, B = {Le deuxie`me in-

    specteur decouvre la defaillance}. On donne P (A) = 0.1, P (Bc|A) = 0.5. On trouveP (B|A) = 10.5. On cherche P (A) P (B|A). Comme ces deux evenements sontindependants, on a : P (A) P (B|A) = P (A)P (B|A) = 0.1 00.5 = 0.05

    [ 4.6 #4.62] A = {Fumeur}, B = {Mort du cancer}. On donne P (A) = 0.2, P (Ac) = 10.20.8,P (B) = 0.006. On a P (B|A) = 10P (B|Ac). On a P (B) = P (BA)(BAc). Donc,comme il sagit de deux evenements independants, on trouve P (B) = P (A)P (B|A)P (Ac)P (B|Ac) = P (A)P (B|A) + P (Ac)P (B|Ac) = 0.2P (B|A) + 0.8P (B|A)10 = 0.006.Donc P (B|A) = 0.0214.

    47

  • [ 4.6 #4.63] On a : A = {A detecte}, B = {B detecte}. On a P (A) = 0.95, P (B) = 0.98 etP (A B) = 0.94.

    a) On cherche P (A B) = P (A) + P (B) P (A B) = 0.95 + 0.98 0.94 = 0.99b) On cherche P (Ac Bc) = P ((A B)c) = 1 P ((A B)) = 0.01

    [ 4.6 #4.65] a) P (A) = 541029b) P (F ) = 5171029c) P (A F ) = 371029d) P (F |A) = 3754e) P (F |B) = 6499f) P (F |C) = 138241g) P (C|M) = 103512h) P (Bc) = 1 991029 = 9301029 = 310343

    [ 4.6 #4.69] On a P (S1) = 0.7, P (S2) = 0.3, P (A|S1) = 0.2 et P (A|S2) = 0.3.a) On cherche P (A) = P (S1)P (A|S1) + P (S2)P (A|S2) = 0.7 0.2 + 0.3 0.3 = 0.23b) On cherche P (S1|A) = P (S1)P (A|S1)P (A) = 0.70.20.23 = 0.609. On cherche P (S2|A) =

    P (S2)P (A|S2)P (A) =

    (0.3)(0.3)0.23 = 0.391.

    [ 4.6 #4.70] On a P (S1) = 0.2, P (S2) = 0.5, P (S3) = 0.3, P (A|S1) = 0.2, P (A|S2) = 0.1 etP (A|S3) = 0.3. On trouve P (A) = P (S1)P (A|S1)+P (S2)P (A|S2)+P (S3)P (A|S3) =(0.2)(0.2) + (0.5)(0.1) + (0.3)(0.3) = 0.18. On cherche P (S1|A) = P (S1)P (A|S1)P (A) =(0.2)(0.2)

    0.18 = 0.222. On cherche P (S2|A) = P (S2)P (A|S2)P (A) = (0.5)(0.1)0.18 = 0.278. Oncherche P (S3|A) = P (S3)P (A|S3)P (A) = (0.3)(0.3)0.18 = 0.5.

    [ 4.7 #4.71] On a P (A) = P (S1)P (A|S1) + P (S2)P (A|S2) = (0.6)(0.3) + (0.4)(0.5) = 0.38[ 4.7 #4.72] On definit V = {Crime violent}, N = {Crime non-violent} etE = {Crime enregistre}.

    On a P (V ) = 0.2, P (N) = 1P (V ) = 10.2 = 0.8, P (E|V ) = 0.9 et P (E|N) = 0.7.a) On cherche P (E) = P (V )P (E|V ) + P (N)P (E|N) = (0.2)(0.9) + (0.8)(0.7) =

    0.74b) On utilise la re`gle de Bayes. On cherche P (V |E) = P (V )P (E|V )P (E) = (0.2)(0.9)(0.74) =

    0.243. On cherche P (N |E) = P (N)P (E|N)P (E) = (0.8)(0.7)(0.74) = 0.757 On a aussiP (N |E) = 1 P (V |E) = 1 0.243 = 0.757

    c) Comme la proportion de crimes non-violents est plus elevee au depart (P (A)

    2) = 0.2 + 0.1 = 0.3e) P (x 3) = 0.1 + 0.3 + 0.3 + 0.2 = 1 0.1 = 0.9

    [ 4.8 #4.85] = xp(x) = (0.1)(0) + (1)(0.4) + (2)(0.4) + (3)(0.1) = 1.5[ 4.8 #4.86] On definit D : (la personne aime David Letterman), J : (la personne aime Jay Leno).

    On sait que P (J) = 0.52

    a) On a = (DDD,DJD,DJJ,DDJ,JDD,JDJ,JJD,JJJ). Si x =(le nombre de per-sonnes qui aiment Jay Leno), on a : x = 0, 1, 2, 3. P (0) = P (DDD) = (0.48)3 =0.1106. P (1) = P (DJD) + P (DDJ) + P (JDD) = 3(0.52)2(0.48) = 0.3594.P (2) = P (DJJ) + P (JDJ) + P (JJD) = 3(0.52)(0.48) = 0.3894. On a P (3) =P (JJJ) = (0.52)3 = 0.1406

    b)

    c) P (1) = 0.3594d) = xp(x) = (0)(0.1106) + (1)(0.3594) + (2)(0.3894) + (3)(0.1406) = 1.56 et

    2 = (x)2p(x) = (01.56)2(0.1106)+(11.56)2(3594)+(21.56)2(0.3894)+(3 1.56)2(0.1406) = 0.7488

    [ 4.8 #4.89] On a comme possibilites : E1 = {F1F2} E2 = {F1M1} E3 = {F1M2} E4 = {F1M3}E5 = {F2F1} E6 = {F2M1} E7 = {F2M2} E8 = {F2M3} E9 = {M1F1} E10 ={M1F2} E11 = {M1M2} E12 = {M1M3} E13 = {M2F1} E14 = {M2F2} E15 ={M2M1} E16 = {M2M3} E17 = {M3F1} E18 = {M3F2} E19 = {M3M1} E20 ={M3M2}

    a) Pour p(x), on a p(0) = P (E11, E12, E15, E16, E19, E20) = 620 =310 , p(1) =

    P (E2, ..., E10, E13, E14, E17, E18) = 1220 =35 et p(2) = P (E1, E5) =

    220 =

    110 .

    b)

    [ 4.8 #4.90][ 4.8 #4.95] On a x1 = D sil ny a aucun vol. On a x2 = (1000D) sil y a vol. On a p(x1) =

    0.99 et p(x2) = 0.01. On cherche =xp(x) = D(0.99)+(D50000)(0.01) = 1000.

    On trouve D 500 = 1000 et D = 1500.[ 4.8 #4.96][ 4.8 #4.98][ 4.8 #4.99][ 4.8 #4.102][ 4.8 #4.103]

    50

  • [ 4.8 #4.106][ 4.8 #4.107][ 4.8 #4.114][ 4.8 #4.121][ 4.8 #4.122][ 4.8 #4.130]

    51