camron-gibbs
RESEARCH & DATA ANALYSIS
SCIENTISTS COLLECT STATISTICAL DATA FROM EXPERIMENTS
STATISTICAL OR NUMERICAL DATA ALLOW FOR MORE ACCURATE ANALYSIS & EVALUATION OF THE RESULTS FROM EXPERIMENTS
STATISTICS DEAL WITH COLLECTING, ANALYZING,
AND INTERPRETING INFORMATION OR RESULTS
TYPES OF DATA:
QUANTITATIVE DATA – AMOUNTS, MEASUREMENTS OR NUMERICAL DATA
QUALITATIVE DATA – NON-NUMERICAL IN NATURE (CHARACTERISTICS – COLOR, SHAPE, ETC.)
TYPES OF DATA COLLECTED:
• POPULATION: 100% OF DATA ARE COLLECTED (CAN BE EXACT)
(GREEK LETTERS USED TO ABBREVIATE QUANTITIES)
• SAMPLE: A SMALLER ESTIMATE OR REPRESENTATION IS COLLECTED
(ENGLISH LETTERS STAND FOR QUANTITIES SURVEYED)
EXPERIMENTS ARE CONDUCTED USING
MULTIPLE REPLICATIONS
• REPLICATION ENSURES MORE RELIABLE / ACCURATE RESULTS
DATA ARE COLLECTED FROM MULTIPLE EXPERIMENTS AND AN AVERAGE IS DETERMINED FROM THOSE RESULTS
• THE LARGER THE SAMPLE SIZE, THE BETTER THE REPRESENTATION OF THE TRUE VALUE
MEASURES OF CENTRAL TENDENCY INCLUDE:
MEAN (AVERAGE) *
MEDIAN
MODE
* USUALLY THE BEST CHOICE FOR GETTING CENTRAL TENDENCY
THE AVERAGE FOR A SET OF DATA / NUMBERS IS ALSO CALLED THE MEAN
EXAMPLE:
10 + 12 + 8 + 11 + 12 = 53
53 ÷ 5 = 10.6 MEAN (AVG)
MEDIAN VALUE IS THE MIDDLE VALUE IN A SAMPLE
OF VALUES
THERE MUST BE THE SAME NUMBER ABOVE AND BELOW THE MEDIAN
SAMPLE: 6, 9, 10, 11, 12, 14, 18
MEDIAN VALUE = 11
MEAN / AVERAGE = 11.43
MODE IS THE VALUE THAT OCCURS MOST OFTEN IN A
SAMPLE
• SAMPLE: 10, 8, 11, 12, 14, 8, 11, 11
MODE = 11
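The three examples above can be checked with a short Python sketch. It uses the standard library's statistics module for the mean and median; the helper name modes is an assumption made for illustration, written so that ties are reported instead of hidden.

```python
from collections import Counter
from statistics import mean, median

def modes(values):
    """Return every value tied for the highest count, in sorted order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

# The three samples used on the slides above.
mean_sample = [10, 12, 8, 11, 12]
median_sample = [6, 9, 10, 11, 12, 14, 18]
mode_sample = [10, 8, 11, 12, 14, 8, 11, 11]

mean_value = mean(mean_sample)        # 53 / 5
median_value = median(median_sample)  # middle of the seven sorted values
mode_values = modes(mode_sample)      # most frequent value(s)

print(mean_value, median_value, mode_values)
```

Returning a list from modes is a design choice: a sample can have more than one mode, and a single return value would silently pick one of them.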
Which Example Below Is a More Accurate Mean Average?
MODE? MEAN? MEDIAN?
WHY IS IT MORE ACCURATE?
MEASURES OF VARIATION
RANGE = HIGHEST SCORE – LOWEST SCORE in the set of numbers
STANDARD DEVIATION * = $\sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
* BEST CHOICE - USES ALL NUMBERS IN THE LIST
RANGE OF A SET OF DATA = HIGHEST VALUE – LOWEST VALUE
EXAMPLE:
6, 7, 8, 11, 12, 14, 14, 15, 15, 16, 19, 20
20 – 6 = 14 (RANGE)
RANGE HAS LIMITED USE
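A small sketch of why the range has limited use: it depends only on the two extreme values. The first list is the example above; the second list, clustered, is a hypothetical set invented here to share the same extremes while bunching every other value at 13.

```python
# Example data from the slide above.
scores = [6, 7, 8, 11, 12, 14, 14, 15, 15, 16, 19, 20]
data_range = max(scores) - min(scores)  # 20 - 6

# Hypothetical second set: same extremes, everything else bunched at 13.
clustered = [6, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 20]
clustered_range = max(clustered) - min(clustered)

# Both sets report the same range even though their spread differs.
print(data_range, clustered_range)
```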
10% RULE:
Some researchers consider data to be valid and representative (or significant) when they fall within 10% of the mean.
EXAMPLE: 10% RULE
VALUE    DIFFERENCE FROM MEAN
10       0.6
12       1.4
8        2.6
11       0.4
12       1.4
SUM = 53 (10.6 MEAN)
10% of 10.6 (mean) is + / - 1.06,
or a range of 9.54 – 11.66
THE RANGE (9.54 – 11.66) REPRESENTS A VALID RANGE
FOR ACCEPTING THE DATA
EXAMPLE:
VALUES: 10, 12, 8, 11, 12 (SUM = 53)
NOTE: USING THE 10% RULE & THE RANGE (9.54 – 11.66), WHICH VALUES WOULD BE CONSIDERED OUT OF RANGE?
ANSWER: 8 & 12
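The 10% rule check above can be sketched in a few lines of Python. The variable names are illustrative assumptions; the data and the 10% tolerance come from the example.

```python
values = [10, 12, 8, 11, 12]

mean_value = sum(values) / len(values)  # 53 / 5 = 10.6
tolerance = 0.10 * mean_value           # 10% of the mean
low = mean_value - tolerance
high = mean_value + tolerance

# Keep the values that fall outside the accepted band, in their original order.
out_of_range = [v for v in values if not (low <= v <= high)]

print(round(low, 2), round(high, 2), out_of_range)
```

Both readings of 12 are flagged along with the 8, which matches the answer on the slide.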
STANDARD DEVIATION (SD) IS A MEASUREMENT OF THE VARIATION FROM THE MEAN
SD CONSIDERS THE NUMBERS THAT ARE OUT OF RANGE AND HOW FAR OUT OF RANGE THEY ARE
STANDARD DEVIATION REPRESENTS HOW CLOSELY DATA ARE CLUSTERED AROUND THE MEAN
STANDARD DEVIATION TERMS:
$\bar{x}$ = MEAN
$x$ = INDIVIDUAL SCORES IN THE SET
$\sum x$ = SUM OF ALL SCORES / VALUES
$n$ = TOTAL NUMBER OF SCORES OR VALUES IN THE SET
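As a minimal sketch, the terms above map directly onto code for the sample standard deviation (the version that divides by n - 1). The function name sample_sd is an assumption for illustration, and the data are the five values from the mean example earlier in the deck.

```python
import math
from statistics import stdev

def sample_sd(scores):
    """Sample standard deviation: sqrt( sum((x - mean)^2) / (n - 1) )."""
    n = len(scores)                                          # total number of scores
    x_bar = sum(scores) / n                                  # mean
    squared_deviations = [(x - x_bar) ** 2 for x in scores]  # (x - mean)^2 for each score
    return math.sqrt(sum(squared_deviations) / (n - 1))

scores = [10, 12, 8, 11, 12]
sd = sample_sd(scores)

# The standard library's stdev uses the same n - 1 formula, so the two should agree.
print(round(sd, 3), round(stdev(scores), 3))
```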
Calculating a Standard Deviation
Take a sample problem with the following values:
There are eight data points total, with a mean (or average) value of 5:
To calculate the standard deviation, compute the difference of each data point from the mean, then square the result:
Next divide the sum of these values by the number of values (this gives the population standard deviation; the sample formula divides by n - 1 instead), then take the square root to get the standard deviation:
The standard deviation of this example is 2.
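The original data values are not preserved in this text, so the sketch below uses a hypothetical set chosen only because it matches the description (eight points, mean 5, standard deviation 2). It follows the steps above and also shows how the result changes when dividing by n - 1.

```python
import math
from statistics import pstdev, stdev

# Hypothetical values (an assumption): eight points with a mean of 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = sum(data) / len(data)
squared_diffs = [(x - mean_value) ** 2 for x in data]

# Divide by the number of values, then take the square root (population SD).
population_sd = math.sqrt(sum(squared_diffs) / len(data))

# Dividing by n - 1 instead gives the slightly larger sample SD.
sample_sd_value = stdev(data)

print(population_sd, pstdev(data), round(sample_sd_value, 3))
```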
FINDING STANDARD DEVIATION CAN BE CONFUSING &
DIFFICULT IN SOME SITUATIONS
• PROCEDURES VARY DEPENDING ON THE PURPOSE & TYPE OF DATA RECORDED
• COMPUTER PROGRAMS & SCIENTIFIC CALCULATORS WILL MAKE
THE TASK EASIER
Use of Standard Deviation:
For normally distributed data, one standard deviation away from the mean in either direction covers around 68% of the population in this group. Two standard deviations away from the mean account for roughly 95% of the population, and three standard deviations account for about 99.7% of the population.
If the curve were flatter and more spread out, the standard deviation would be larger in order to account for 68 % of the population. So
standard deviation can tell you how spread out the examples in a set are from the mean.
This is useful if you are comparing results for different things (drugs, equipment, etc.). Standard deviation will tell you how diverse the test
scores are for each specific thing being measured.
NORMAL DISTRIBUTION OR A BELL CURVE (GAUSSIAN CURVE)
[FIGURE: BELL CURVE CENTERED ON THE MEAN. Each colored band has a width of one standard deviation.]
• SCORES ARE PLOTTED ON A GRAPH
• ALSO KNOWN AS: NORMAL DISTRIBUTION CURVE
NORMAL DISTRIBUTION OR GAUSSIAN CURVE SHOWS NORMAL FREQUENCY:
– 68.27% of the values are within 1 standard deviation of the mean.
– 95.45% of the values are within 2 standard deviations of the mean (common choice).
– 99.73% of the values are within 3 standard deviations of the mean.
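These percentages can be reproduced from the error function in Python's math module: for a normal distribution, the fraction of values within k standard deviations of the mean is erf(k / sqrt(2)). The function name fraction_within is an assumption for illustration.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Percent coverage for 1, 2 and 3 standard deviations, rounded to two decimals.
coverage = {k: round(100 * fraction_within(k), 2) for k in (1, 2, 3)}
print(coverage)  # {1: 68.27, 2: 95.45, 3: 99.73}
```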
STANDARD DEVIATION EXAMPLE: 2 SD FROM THE MEAN
[FIGURE: DATA VALUES MARKED RELATIVE TO THE MEAN; 53 ÷ 5 = 10.6 MEAN]
• MOST RESEARCHERS CONSIDER DATA WITHIN +/- 2 SD OF THE MEAN VALID / ACCEPTABLE
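A minimal sketch of the +/- 2 SD acceptance check, applied to the five values from the mean example (10, 12, 8, 11, 12). It assumes the sample standard deviation (n - 1); the variable names are illustrative.

```python
from statistics import mean, stdev

values = [10, 12, 8, 11, 12]

center = mean(values)  # 10.6
sd = stdev(values)     # sample standard deviation (n - 1)
lower = center - 2 * sd
upper = center + 2 * sd

# Values falling outside mean +/- 2 SD would be questioned.
outside = [v for v in values if not (lower <= v <= upper)]

print(round(lower, 2), round(upper, 2), outside)
```

For this data set the 2 SD band is wider than the 10% band (9.54 to 11.66), so the readings of 8 and 12 that the 10% rule rejects are accepted here.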
ACCURACY IS HOW CLOSE A RESULT IS TO THE TRUE VALUE
WHEREAS
PRECISION REFERS TO THE REPRODUCIBILITY OF RESULTS,
OR HOW CLOSE THE RESULTS ARE TO EACH OTHER
LABORATORY INSTRUMENTS MUST BE PRECISE AS WELL
AS ACCURATE
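The distinction can be made concrete with a short sketch. Everything here is hypothetical: the true value, the two sets of instrument readings, and the helper name describe. Bias (mean minus true value) stands in for accuracy, and the standard deviation of the readings stands in for precision.

```python
from statistics import mean, stdev

TRUE_VALUE = 10.0  # hypothetical known reference value

def describe(readings, true_value):
    """Return (bias, spread): bias gauges accuracy, spread gauges precision."""
    bias = mean(readings) - true_value
    spread = stdev(readings)
    return bias, spread

# Hypothetical readings from two instruments measuring the same reference.
instrument_a = [10.1, 9.9, 10.0, 10.2, 9.8]   # centered on the true value, more scatter
instrument_b = [12.0, 12.1, 11.9, 12.0, 12.0]  # tightly grouped, but off target

bias_a, spread_a = describe(instrument_a, TRUE_VALUE)
bias_b, spread_b = describe(instrument_b, TRUE_VALUE)

print(round(bias_a, 3), round(spread_a, 3))
print(round(bias_b, 3), round(spread_b, 3))
```

In this made-up case instrument B is the more precise of the two but the less accurate, which is why a laboratory instrument needs to be checked for both.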
Coefficient of Variation
• The precision of a new instrument will be compared to the precision of an old instrument
• CV = $\dfrac{\text{STANDARD DEVIATION}}{\text{MEAN}} \times 100\%$
OR
% DIFFERENCE = $\dfrac{\text{LOW \#} - \text{HIGH \#}}{\text{HIGH \#}} \times 100\%$
COEFFICIENT OF VARIATION (CV) IS THE STANDARD DEVIATION RELATIVE TO THE MEAN OF THAT SAMPLE
• MAY BE EXPRESSED AS A % OF THE MEAN
CV = $\dfrac{s}{\bar{x}} \times 100$
PURPOSE OF DETERMINING COEFFICIENT OF VARIATION
IS TO COMPARE VARIATION OF TWO DIFFERENT SAMPLES OR
PRECISION OF TWO DIFFERENT INSTRUMENTS OR METHODS
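A sketch of that comparison, assuming the sample standard deviation (n - 1). The readings for the old and new instruments are hypothetical and deliberately on different scales, since the CV is what makes such samples comparable; the last sample is the five values from the mean example.

```python
from statistics import mean, stdev

def coefficient_of_variation(sample):
    """CV as a percentage: (standard deviation / mean) x 100."""
    return stdev(sample) / mean(sample) * 100

# Hypothetical repeat readings from two instruments on different scales.
old_instrument = [98, 102, 100, 101, 99]
new_instrument = [49, 51, 50, 50, 50]

cv_old = coefficient_of_variation(old_instrument)
cv_new = coefficient_of_variation(new_instrument)

# Data from the mean example earlier in the deck.
cv_example = coefficient_of_variation([10, 12, 8, 11, 12])

print(round(cv_old, 2), round(cv_new, 2), round(cv_example, 2))
```

The lower CV indicates the more precise instrument relative to its own mean, here the hypothetical new one.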
Chi Square Analysis
• A STATISTICAL MEASURING INSTRUMENT THAT DETERMINES HOW WELL A SET OF DATA SUPPORT THE HYPOTHESIS OR EXPECTED VALUES
• [MAJOR USE IS IN GENETICS]
• [EMPLOYS THE PUNNETT SQUARE]
• PREDICTIONS ARE BASED ON PROBABILITY
Chi Square Analysis
• TESTS WHETHER ITEMS IN VARIOUS CATEGORIES DEVIATE OR ARE THE SAME
• NULL HYPOTHESIS MEANS THE DATA MEET EXPECTATIONS OR SHOW LITTLE DIFFERENCE
• A PROBABILITY OF 0.05 OR LESS SHOWS AN EXTREME DIFFERENCE FROM THE EXPECTED OBSERVATION / HYPOTHESIS
• THE SMALLER THE CHI-SQUARE VALUE, THE GREATER THE LIKELIHOOD THE DATA SUPPORT THE HYPOTHESIS
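A minimal goodness-of-fit sketch in plain Python. The counts are hypothetical (a made-up genetic cross with an expected 3:1 ratio from a Punnett square), and the function name chi_square is an assumption. The statistic is the sum of (observed - expected)^2 / expected, compared against 3.841, the standard critical value for 1 degree of freedom at a probability of 0.05.

```python
# Critical chi-square value for 1 degree of freedom at p = 0.05.
CRITICAL_VALUE_DF1_P05 = 3.841

def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical offspring counts: dominant vs. recessive phenotype.
observed = [290, 110]
total = sum(observed)

# Expected counts under a 3:1 ratio.
expected = [total * 3 / 4, total * 1 / 4]

statistic = chi_square(observed, expected)

# A statistic below the critical value means the deviation is not significant
# at the 0.05 level, so the data are consistent with the expected ratio.
fits_expectation = statistic < CRITICAL_VALUE_DF1_P05

print(round(statistic, 3), fits_expectation)
```

With two categories there is one degree of freedom; more categories would need a different critical value from a chi-square table.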
IN SUMMARY:
• THERE ARE A VARIETY OF DATA ANALYSIS INSTRUMENTS
• EACH INSTRUMENT IS BEST SUITED TO MEASURE CERTAIN PARAMETERS OF DATA
• SCIENTISTS AND RESEARCHERS USE THE INSTRUMENTS TO INTERPRET TEST RESULTS