8/12/2019 1. Introduction to Excel, Data Presentation and Descriptive Statistics
MBA431 Quantitative Business Analysis Dr. Hong Chen
LAB ONE
Introduction to Microsoft Excel, Graphical Presentation of Data & Descriptive Statistics
Ref: pp. 13-14, 78-79, Chapter 4, Selvanathan et al. (2011)
Part 1. Introduction to Microsoft Excel
Basic statistical analysis can be done easily in Microsoft Excel and its add-ins, one of which is the Data Analysis ToolPak, an Excel add-in program.
1.1 Opening Excel
If you don't find Excel on your computer desktop, go to Start, All Programs, Microsoft Office and click Microsoft Excel 2010 (2007 will also do).
From the Excel screen, click File on the main menu, select New from the drop-down menu, and then click Create.
1.2 The Excel workbook and worksheet
Excel files are called workbooks. A workbook contains worksheets (by default three, named Sheet1 to Sheet3; you can add as many worksheets as you need). You can work on any of these sheets and on any other sheets you create. To change the worksheet, click the tab of the sheet you wish to move to.
A worksheet consists of rows and columns. The rows are numbered, and the columns are identified by letters. Each cell in the worksheet is identified by the combination of its column letter and row number, e.g. A1 refers to the first cell in the worksheet, and D3 is the cell in the fourth column and the third row.
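The letter-number addressing can be illustrated outside Excel. The Python sketch below uses a hypothetical helper, `parse_cell_reference`, that turns a reference such as D3 into a (column number, row number) pair. It assumes plain references without `$` signs or sheet names, and it also handles the two-letter columns (AA, AB, ...) that Excel uses after column Z.

```python
import re


def parse_cell_reference(ref: str) -> tuple[int, int]:
    """Split an A1-style reference such as "D3" into (column number, row number).

    Column letters work like base 26 with digits A=1 ... Z=26 (no zero digit),
    so A is column 1, Z is column 26 and AA is column 27.
    """
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError(f"not an A1-style cell reference: {ref!r}")
    letters, row_text = match.groups()
    column = 0
    for letter in letters.upper():
        column = column * 26 + (ord(letter) - ord("A") + 1)
    return column, int(row_text)


print(parse_cell_reference("A1"))    # (1, 1): first column, first row
print(parse_cell_reference("D3"))    # (4, 3): fourth column, third row
print(parse_cell_reference("AA10"))  # (27, 10)
```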
A cell becomes active when you move the mouse pointer (which appears as a large plus sign) over it and click, e.g. cell D5 in Figure 1. In the active cell, you can type a number, word or formula.
You can also move the active cell with the four arrow keys (Up, Down, Left and Right), which appear on your keyboard as arrows pointing up, down, left and right respectively.
At the bottom left-hand corner of the screen you will see the word Ready. As you begin to type something into the active cell, the word Ready changes to Enter.
1.3 Inputting data
To input data, open a new workbook by clicking the File tab on the menu bar and then selecting New.
Data are usually stored in columns. Activate the cell in the first row of the column in which you plan to type the data.
You may type the name of the variable if you wish. For example, if you plan to type your assignment marks in column A, you may type Assignment Marks in cell A1. Press the Enter key on your keyboard and cell A2 becomes active.
Begin typing the marks, pressing Enter after each one. Use the arrow keys or the mouse pointer to move to a new column if you wish to enter another set of numbers.
1.4 Importing data files
Data files for this course's computer practices can be downloaded from Moodle. To import a file, click the File tab and select Open on the drop-down menu. Browse the directories to find the required file, double-clicking each directory along the path until you reach the file you wish to open.
The file will appear in the form in which it was saved.
1.5 Data Analysis ToolPak
The Data Analysis ToolPak is a group of statistical functions that comes with Excel. You can find the ToolPak by clicking the Data tab on the menu bar and then Data Analysis in the Analysis sub-menu. If the ToolPak does not appear in the menu, follow these steps to add it:
Click the File tab and select Options to open the Excel Options dialogue. From the options list, click Add-Ins, which displays another menu. Under Manage, make sure you select Excel Add-ins and then click Go.
Tick Analysis ToolPak and then click OK. To access the Analysis ToolPak, simply click the Data tab and then Data Analysis in the Analysis sub-menu.
There are 19 menu items in Data Analysis. Click the one you wish to use, and follow the instructions described in the textbook or lab session notes.
1.6 Formula bar and Insert function fx
On the Formulas tab of the menu bar you will find the fx Insert Function button. Clicking this button opens further menus that allow you to choose functions that perform various calculations.
1.7 Saving workbooks
To save a new file, click the File tab on the menu bar and select Save As on the drop-down menu. Enter the new file name and click Save.
To save an already saved file under the same name, choose Save on the drop-down menu, and the original file will be overwritten.
Part 2. Graphical Presentation of Data
2.1 Line plot of time series data
Use the Insert tab on the Excel menu bar to draw a line plot. We use the dataset in the Excel file XR02-81 to draw a line plot and describe the trend of job vacancies in New South Wales (NSW) over 1991-2008.
1) Highlight cells B1 to B19.
2) Click the Insert tab on the menu bar, choose Line from the Charts submenu, and then click the first type of chart in the second row (when you place the mouse pointer on the chart, it shows the name Line with Markers).
3) Note that the horizontal axis does not show the years. To make the horizontal axis show the years, go to the Design tab and click Select Data in the Data submenu.
4) In the pop-up dialogue named Select Data Source, click Edit under Horizontal (Category) Axis Labels.
5) Move your mouse pointer to the Input Range box and click. Go back to Sheet1, highlight cells A2 to A19, and then click OK in the pop-up box.
6) Click OK in the Select Data Source dialogue, which will produce a line plot as follows. (You can continue working on the format of the plot and choose options from the right-click menu.)
[Figure: line plot of NSW job vacancies, 1991 to 2008]
7) To put all three time series in one graph, simply highlight the three series by selecting cells B1 to D19. Then follow Steps 2 to 6. The multiple series line plot will be as follows:
[Figure: line plot of job vacancies for NSW, Victoria and Australia, 1991 to 2008]
2.2 Histogram to present frequency distribution
For the rest of this practice we use the M-Status (marital status) series in Column C in the Excel file XR02-82.
1) We use the Analysis ToolPak for this practice. To plot a histogram we need to define the classes (categories) for the variable. To do that, type class, 1, 2, 3, 4 in cells G1 to G5 respectively (see below).
2) From the Data tab, go to the last submenu and click Data Analysis (refer to Part 1.5 to add the Analysis ToolPak to Excel if you cannot find it in the submenu).
3) Choose Histogram from the list.
4) In the pop-up Histogram dialogue, move the mouse pointer to the Input Range box and click. Go back to worksheet Sheet1 and highlight cells C1 to C501 (500 observations in total in this sample, plus the label in C1). Switch back to the pop-up dialogue.
5) Move the mouse pointer to the Bin Range box and click. Go back to worksheet Sheet1 and highlight cells G1 to G5. Switch back to the pop-up dialogue.
6) Check (i.e. tick) the Labels option to indicate that the first row contains the series names.
7) Choose Output Range from the Output Options. Move the mouse pointer to the Output Range box and click. Go back to the worksheet and highlight cell H1 (this decides where the output will be displayed). Switch back to the pop-up box.
8) Check (i.e. tick) the Chart Output box, then click OK. A histogram will be produced. Note that the chart has a fifth category named More. Simply delete the last row in the newly generated frequency table, so that the histogram chart no longer shows the fifth category (see below).
9) Note that in today's practice classes are denoted by single values rather than intervals. These values can be regarded as the mid-points of the classes.
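The counting behind the Histogram tool can be sketched in Python. Excel's documentation describes a value as falling in a bin when it is greater than the previous bin value and less than or equal to the current one; the sketch assumes that rule. The helper `histogram_counts` and the short list of status codes are hypothetical (they are not the XR02-82 data), and the extra last slot holds values above the last class, i.e. the fifth category removed in step 8.

```python
from bisect import bisect_left


def histogram_counts(data, bins):
    """Count observations per bin using the "previous bin < x <= bin" rule.

    The first bin takes every x <= bins[0]; the extra last slot collects
    every x > bins[-1]. `bins` must be sorted in ascending order.
    """
    counts = [0] * (len(bins) + 1)  # one slot per class plus an overflow slot
    for x in data:
        # Index of the first bin value >= x, i.e. the smallest upper bound
        # that is not below x; equals len(bins) when x exceeds every bin.
        counts[bisect_left(bins, x)] += 1
    return counts


# Hypothetical codes standing in for the M-Status column.
status = [1, 2, 2, 3, 1, 4, 2, 1, 2, 3]
classes = [1, 2, 3, 4]
for label, count in zip(classes + ["overflow"], histogram_counts(status, classes)):
    print(label, count)
```

Because the classes here are single integer codes, each bin simply counts how often its code occurs, which is why the class values can be read as mid-points in step 9.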
2.3 Bar charts to present frequency distribution
1) Click the Insert tab on the menu bar, choose Bar from the Charts submenu, and then click the first type of chart in the first row (when you place the mouse pointer on the chart, it shows the name Clustered Bar).
2) Place the mouse pointer on the empty chart and click. From the Design tab, click Select Data in the Data submenu.
3) Move the mouse pointer to the Chart data range box and click. Return to Sheet1, highlight cells I1 to I5, and then click OK. In the current example, the Horizontal (Category) Axis Labels are set automatically.
4) You can try the other types of bar charts to see different views.
2.4 Pie charts to present relative frequency distribution
1) We can use pie charts to present both the frequency distribution and the relative frequency distribution. For the latter, we first need to calculate the relative frequencies. To do that, in cell J1 type Relative Frequency, then press Enter on your keyboard, which makes cell J2 active. In J2 type =I2/500 (i.e. divide the value in cell I2 by the total sample size 500), and press Enter to finish the command. To duplicate the command in other cells, move the mouse pointer to the bottom-right corner of cell J2 (you will see the mouse pointer become a +). Click the bottom-right corner of J2 and drag the mouse pointer over cells J3 to J5 (caution: keep holding down the left mouse button and don't release it until J3 to J5 are covered). J3 to J5 are then filled with the relative frequencies; Excel adjusts the cell reference as it copies, so J3 contains =I3/500, J4 contains =I4/500, and so on.
2) Click the Insert tab on the menu bar, choose Pie from the Charts submenu, and then click the first type of chart in the first row.
3) Place the mouse pointer on the empty chart and click. From the Design tab, click Select Data in the Data submenu.
4) Move the mouse pointer to the Chart data range box and click. Return to Sheet1, highlight cells J1 to J5, and then click OK. In the current example, the Horizontal (Category) Axis Labels are set automatically.
5) To add data labels to the pie chart, move the mouse pointer to the pie chart, right-click on the pie area and choose Add Data Labels.
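The relative frequency calculation in step 1 divides each class frequency by the sample size. In this minimal sketch the helper `relative_frequencies` and the class frequencies are hypothetical (they are not the counts obtained from XR02-82). It divides by the sum of the frequencies, which equals the sample size only if no observation falls outside the listed classes.

```python
def relative_frequencies(counts):
    """Divide each class frequency by the total count (the =I2/500 step)."""
    n = sum(counts)
    return [count / n for count in counts]


# Hypothetical frequencies for classes 1 to 4, summing to 500.
frequencies = [125, 250, 75, 50]
shares = relative_frequencies(frequencies)
print(shares)  # [0.25, 0.5, 0.15, 0.1]
```

The relative frequencies always add up to 1, which is why the pie chart slices together make up the whole circle.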
2.5 Ogive to present cumulative frequency distribution
1) We use an ogive to present the cumulative relative frequency distribution, so we first need to calculate the cumulative relative frequencies. To do that:
In cell K1 type Cumulative Relative Frequency, then press Enter on your keyboard, which makes cell K2 active. In K2 type =J2 (i.e. the first class's cumulative relative frequency is the relative frequency of the first class), and press Enter to finish the command.
In K3 type =J3+K2 (i.e. the second class's cumulative relative frequency is the sum of the second class's relative frequency and the cumulative relative frequency of the preceding class).
To duplicate the command in other cells, move the mouse pointer to the bottom-right corner of cell K3 (you will see the mouse pointer become a +). Click the bottom-right corner of K3 and drag the mouse pointer down to cell K5 (caution: keep holding down the left mouse button and don't release it until K3 to K5 are covered). K3 to K5 are then filled with the cumulative relative frequencies; the value in K5, the last class, should be 1.
2) Click the Insert tab on the menu bar, choose Line from the Charts submenu, and then click the first type of chart in the second row.
3) Place the mouse pointer on the empty chart and click. From the Design tab, click Select Data in the Data submenu.
4) Move the mouse pointer to the Chart data range box and click. Return to Sheet1, highlight cells K1 to K5, and then click OK.
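The two formulas in step 1 (=J2, then =J3+K2 copied down) form a running total. Python's `itertools.accumulate` computes the same running sum; the helper name `cumulative_relative_frequencies` and the relative frequencies below are hypothetical.

```python
from itertools import accumulate


def cumulative_relative_frequencies(relative):
    """Running totals of the relative frequencies.

    The first value equals the first relative frequency (=J2); each later
    value adds the current relative frequency to the previous total (=J3+K2).
    """
    return list(accumulate(relative))


relative = [0.25, 0.5, 0.15, 0.1]  # hypothetical relative frequencies
cumulative = cumulative_relative_frequencies(relative)
print(cumulative)
```

When the classes cover every observation, the last running total is 1 up to floating-point rounding, matching the check on cell K5.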
Part 3. Descriptive statistics
We use the dataset in the Excel file XM04-06 to compute summary statistics for the marks of 100 students.
1. The maximum value can be found by typing =max(A2:A101). Note that A2:A101 is the input range.
2. The minimum value can be found by typing =min(A2:A101).
3. The range is the difference between the above two, found by typing =B2-B3, where B2 is the cell containing the maximum and B3 is the cell containing the minimum.
4. To calculate the mean of the marks series, in any blank cell type =average(A2:A101).
5. To calculate the median of the series, in any blank cell type =median(A2:A101).
6. To calculate the mode of the series, in any blank cell type =mode(A2:A101).
7. To calculate the (sample) variance of the series, in any blank cell type =var(A2:A101).
8. To calculate the (sample) standard deviation of the series, in any blank cell type =stdev(A2:A101).
Check that the s.d. is the square root of the variance. You can do this by typing =sqrt() with the reference of the cell containing the variance inside the brackets.
9. The coefficient of variation can be calculated by typing =B9/B5, where B5 is the cell containing the mean and B9 is the cell containing the standard deviation in my practice.
10. The above statistics can also be obtained with the Descriptive Statistics tool of the Data Analysis ToolPak (its Summary statistics option). Follow the steps below to obtain the summary:
a) Click the Data tab on the menu bar, choose Data Analysis in the last submenu, and then select Descriptive Statistics.
b) In the Input Range box, type $A$2:$A$101 (or type $A$1:$A$101 if you include the cell containing the variable name, and then tick the Labels in first row box).
c) Under Output options, select Output Range and type the starting cell reference for the output, e.g. $D$2 (the cell should be blank to avoid overwriting existing data).
d) Tick the check box for Summary statistics.
e) Click OK.
Check that the values obtained from the formulas are consistent with those in the summary table.
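Items 1 to 9 can be cross-checked outside Excel. This sketch uses Python's standard `statistics` module on a short, made-up list of marks (not the XM04-06 data); the helper name `summarize` is hypothetical. `statistics.variance` and `statistics.stdev` use the sample divisor n - 1, the same as Excel's VAR and STDEV. `statistics.mode` returns the first mode it meets (Python 3.8 or later), so compare it with MODE only when the data have a single mode, as here.

```python
import statistics


def summarize(marks):
    """Compute the statistics from items 1 to 9 for a list of numbers."""
    mean = statistics.mean(marks)
    sd = statistics.stdev(marks)  # sample standard deviation (divisor n - 1)
    return {
        "maximum": max(marks),
        "minimum": min(marks),
        "range": max(marks) - min(marks),
        "mean": mean,
        "median": statistics.median(marks),
        "mode": statistics.mode(marks),
        "variance": statistics.variance(marks),  # sample variance (divisor n - 1)
        "standard deviation": sd,
        "coefficient of variation": sd / mean,
    }


marks = [55, 62, 70, 70, 81, 90, 94]  # hypothetical marks
for name, value in summarize(marks).items():
    print(f"{name}: {value}")
```

As in the worksheet check, squaring the standard deviation returns the variance up to rounding.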
Percentiles
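To find the kth percentile value, first arrange the data in either ascending or descending order. Here we arrange the data in ascending order. (The PERCENTILE.INC function itself does not require sorted data, but sorting lets you check its results by eye.)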
1. Move the mouse pointer over the column heading (e.g. A if the series is in column A) and click. This selects the whole column.
2. Right-click the mouse and choose Copy.
3. Move the mouse pointer to cell G1 and click. Right-click the mouse and choose Paste. The whole series should be copied to column G.
4. From the Data tab, click the Sort A to Z button in the Sort & Filter submenu. The data are arranged in ascending order.
5. From the Formulas tab, click Insert Function. From the Select a Category drop-down list, choose Statistical, and from Select a function, choose PERCENTILE.INC.
6. In the Array box, type G2:G101 (the input range). Type 0.1 in the K box, which gives the 10th percentile value, 38.9; type 0.25 in the K box, which gives the 25th percentile (i.e. Q1) value, 64.75; type 0.5 in the K box, which gives the 50th percentile (i.e. median or Q2) value, 81; type 0.75 in the K box, which gives the 75th percentile (i.e. Q3) value, 90; and type 1 in the K box, which gives the 100th percentile value (i.e. the last observation in the sorted sample), 100.
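PERCENTILE.INC interpolates linearly between neighbouring sorted values. The sketch below assumes the usual inclusive rule: the position is h = (n - 1)k, counted from 0, which is also the rule behind Python's `statistics.quantiles(..., method="inclusive")`. The function name `percentile_inc` and the sample values are hypothetical.

```python
import math


def percentile_inc(values, k):
    """k-th percentile (0 <= k <= 1) by linear interpolation, inclusive rule.

    Sorts a copy of the data (steps 1 to 4 above), locates position
    h = (n - 1) * k counted from 0, and interpolates between the two
    neighbouring order statistics.
    """
    if not values:
        raise ValueError("need at least one value")
    if not 0 <= k <= 1:
        raise ValueError("k must lie between 0 and 1")
    data = sorted(values)
    h = (len(data) - 1) * k
    lower = math.floor(h)
    if lower == len(data) - 1:  # k == 1 (or a single value): the largest observation
        return data[-1]
    return data[lower] + (h - lower) * (data[lower + 1] - data[lower])


sample = [40, 15, 50, 35, 20]  # hypothetical, deliberately unsorted
for k in (0.1, 0.25, 0.5, 0.75, 1):
    print(k, percentile_inc(sample, k))
```

With these five values, k = 0.25 lands exactly on the second sorted value, 20 (h = 1), while k = 0.1 gives h = 0.4 and hence 15 + 0.4 * (20 - 15) = 17.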