1. Introduction to Excel, Data Presentation and Descriptive Statistics


  • 8/12/2019 1. Introduction to Excel, Data Presentation and Descriptive Statistics


    MBA431 Quantitative Business Analysis Dr. Hong Chen


    LAB ONE

    Introduction to Microsoft Excel, Graphical Presentation of Data & Descriptive Statistics

    Ref: pp. 13-14, 78-79, Chapter 4, Selvanathan et al. (2011)

    Part 1. Introduction to Microsoft Excel

    Basic statistical analysis can be done easily in Microsoft Excel with the help of some plug-ins, one of which is the Data Analysis ToolPak, an Excel add-in program.

    1.1 Opening Excel

    If you don't find Excel on your computer desktop, go to Start, All Programs, Microsoft Office and click Microsoft Excel 2010 (2007 will also do).

    From the Excel screen, click File on the main menu, select New from the drop-down menu, and then click Create.

    1.2 The Excel workbook and worksheet

    Excel files are called workbooks. A workbook contains worksheets (three by default, named Sheet1 to Sheet3; you can add more as needed). You can work on any of these sheets and on any others you create. To change worksheet, use your mouse pointer and click the sheet you wish to move to.

    A worksheet consists of rows and columns. The rows are numbered, and the columns are identified by letters. Each cell in the worksheet is identified by the combination of its column letter and row number, e.g. A1 refers to the first cell in the worksheet, and D3 is the cell in the fourth column and the third row.
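    The letter-plus-number addressing described above can be sketched in a few lines of Python. This is only an illustration, not part of Excel; the function name parse_cell is hypothetical, and the handling of multi-letter columns (AA, AB, ...) follows the usual spreadsheet convention that columns continue after Z.

```python
import re


def parse_cell(ref):
    """Split an A1-style reference into (column_number, row_number), both 1-based."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    letters, digits = match.groups()
    column = 0
    for ch in letters.upper():
        # Column letters behave like a base-26 number with A=1 ... Z=26.
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return column, int(digits)


print(parse_cell("A1"))  # (1, 1): first column, first row
print(parse_cell("D3"))  # (4, 3): fourth column, third row
```

    The result is given as (column, row) to match the order in which the reference itself is written.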

    A cell becomes active when you move the mouse pointer (which appears as a large plus sign) to it and click, e.g. cell D5 in Figure 1. In the active cell, you can type in a number, word or formula.

    You can also move between cells with any of the four Up, Down, Left or Right arrow keys, which appear on your keyboard as arrows pointing up, down, left and right respectively.

    At the bottom left-hand corner of the screen you will see the word Ready. As you begin to type something into the active cell, the word Ready changes to Enter.


    1.3 Inputting data

    To input data, open a new workbook by clicking the File tab on the menubar and then selecting New.

    Data are usually stored in columns. Activate the cell in the first row of the column in which you plan to type the data.

    You may type the name of the variable if you wish. E.g. if you plan to type your assignment marks in column A, you may type Assignment Marks in cell A1. Hit the Enter key on your keyboard and cell A2 becomes active.

    Begin typing the marks, following each one with Enter. Use the arrow keys or the mouse pointer to move to a new column if you wish to enter another set of numbers.

    1.4 Importing data files

    Data files for this course's computer practices can be downloaded from Moodle. To import a file, click the File tab and select Open on the drop-down menu. Browse the directories to find the required file. Double-click each of the directories along the path until you reach the file you wish to open.

    The file will appear in the form in which it was saved.

    1.5 Data Analysis ToolPak

    The Data Analysis ToolPak is a group of statistical functions that comes with Excel. You can find the ToolPak by clicking the Data tab on the menubar and then Data Analysis in the Analysis sub-menu. If the ToolPak does not appear in the menu, follow these steps to add it:

    Click on the File tab and select Excel Options.

    From the options list, click on Add-Ins, which will display another menu. Make sure that under Manage you select Excel Add-Ins, and then click Go.

    Select Analysis ToolPak and then click OK.

    To access the Analysis ToolPak, simply click the Data tab and then Data Analysis in the Analysis sub-menu.

    There are 19 menu items in Data Analysis. Click the one you wish to use, and follow the instructions described in the textbook or lab session notes.

    1.6 Formula bar and Insert function fx

    On the Formulas tab of the menubar you will find the fx Insert Function button. Clicking this button opens further menus that let you choose functions that perform various calculations.

    1.7 Saving workbooks

    To save a new file, click the File tab on the menubar and select the option Save As on the drop-down menu. Enter the new file name and click Save.

    To save an already saved file under the same name, choose Save on the drop-down menu, and the original file will be overwritten.


    Part 2. Graphical Presentation of Data

    2.1 Line plot of time series data

    Use the Insert tab on the Excel menubar to draw a line plot. We use the dataset in the Excel file XR02-81 to draw a line plot and describe the trend of job vacancies in New South Wales (NSW) over 1991-2008.

    1) Highlight cells from B1 to B19.

    2) Click the Insert tab on the menubar, choose Line from the Charts submenu, and then click the first type of chart in the second row (when you place the mouse pointer on the chart it shows the name Line with Markers).

    3) Note that the horizontal axis does not show the years. To make the horizontal axis show the years, click Select Data in the Data submenu of the Design tab.

    4) In the pop-up dialogue named Select Data Source, click Edit under Horizontal (Category) Axis Labels.

    5) Move your mouse pointer to the Input Range box and click. Back in Sheet1, highlight cells A2 to A19, and then click OK in the pop-up box.

    6) Click OK in the Select Data Source dialogue, which will produce a line plot as follows. (You can continue working on the format of the plot and choose options from the right-click menu.)

    [Figure: line plot of the NSW series, 1991-2008; vertical axis from 0.0 to 60.0]

    7) To put all three time series in one graph, simply highlight the three series by covering cells from B1 to D19. Then follow Steps 2~6. The multiple-series line plot will be as follows:

    [Figure: line plot of the NSW, Victoria and Australia series, 1991-2008; vertical axis from 0.0 to 200.0]

    2.2 Histogram to present frequency distribution

    For the rest of this practice we use the M-Status (marriage status) series in Column C in the Excel file XR02-82.

    1) We use the Analysis ToolPak for this practice. To plot a histogram we need to define the classes (categories) for the variable. To do that, type class, 1, 2, 3, 4 in cells G1 to G5 respectively (see below).

    2) From the Data tab go to the last submenu, Analysis, and click Data Analysis (refer to Part 1.5 to add it to Excel if you cannot find it in the submenu).

    3) Choose Histogram from the list.

    4) In the pop-up Histogram dialogue, move the mouse pointer to the Input Range box and click. Go back to worksheet Sheet1 and highlight cells from C1 to C501 (in total 500 observations in this sample). Switch back to the pop-up dialogue.

    5) Move the mouse pointer to the Bin Range box and click. Go back to worksheet Sheet1 and highlight cells from G1 to G5. Switch back to the pop-up dialogue.

    6) Check (i.e. tick) the Labels option to indicate that the first row contains the series names.

    7) Choose Output Range from the Output Options. Move the mouse pointer to the Output Range box and click. Go back to the worksheet and highlight cell H1 (this decides where the output will be displayed). Switch back to the pop-up box.

    8) Check (i.e. tick) the Chart Output box. Then click OK. A histogram will be produced accordingly. Note that the chart has a fifth category named More. Simply delete the last row in the newly generated frequency table, so that the histogram chart is without the fifth category (see below).

    9) Note that in our practice today classes are denoted by single values rather than intervals. These values can be regarded as mid-points of classes.
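    The counting behind the frequency table in the steps above can be sketched in Python. This is a minimal illustration under two assumptions: each value is counted in the first bin whose value is greater than or equal to it, and anything larger than the last bin goes into an extra overflow row (the fifth category mentioned in Step 8). The function name histogram_counts and the sample codes are hypothetical, not the XR02-82 data.

```python
from bisect import bisect_left


def histogram_counts(values, bins):
    """Count values per bin: a value v is counted in the first bin b with v <= b.

    The returned list has one extra final entry for values above the last bin.
    """
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)  # last slot is the overflow row
    for v in values:
        # bisect_left gives the index of the first bin that is >= v,
        # or len(bins) when v exceeds every bin.
        counts[bisect_left(bins, v)] += 1
    return counts


# Hypothetical status codes, for illustration only.
sample = [1, 2, 2, 3, 1, 4, 2, 1, 3, 2]
counts = histogram_counts(sample, [1, 2, 3, 4])
for label, count in zip(["1", "2", "3", "4", "overflow"], counts):
    print(label, count)
```

    With classes denoted by single values, as in this practice, the overflow row stays at zero, which is why it can safely be deleted from the table.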

    2.3 Bar charts to present frequency distribution

    1) Click the Insert tab on the menubar, choose Bar from the Charts submenu, and then click the first type of chart in the first row (when you place the mouse pointer on the chart it shows the name Clustered Bar).

    2) Place the mouse pointer on the empty chart and click. From the Design tab, click Select Data in the Data submenu.

    3) Move the mouse pointer to the Chart data range box and click. Return to Sheet1, highlight cells I1 to I5, and then click OK. In the current example, the Horizontal (Category) Axis Labels are set automatically.

    4) You can try the other types of bar charts to see different views.

    2.4 Pie charts to present relative frequency distribution

    1) We can use pie charts to present a frequency distribution and a relative frequency distribution. For the latter, we first need to calculate relative frequency. To do that, type Relative Frequency in cell J1, then press Enter on your keyboard, which makes cell J2 active. In J2 type =I2/500 (i.e. divide the value in cell I2 by the total sample size, 500), and press Enter to finish the command. To duplicate the command in other cells, move the mouse pointer to the bottom-right corner of cell J2 (you will see the mouse pointer become a +). Click the bottom-right corner of J2 and drag the mouse pointer over cells J3 to J5 (caution: keep the left mouse button pressed and don't release it until J3 to J5 are covered). You will then see that J3 to J5 are filled with values of relative frequency.

    2) Click the Insert tab on the menubar, choose Pie from the Charts submenu, and then click the first type of chart in the first row.

    3) Place the mouse pointer on the empty chart and click. From the Design tab, click Select Data in the Data submenu.

    4) Move the mouse pointer to the Chart data range box and click. Return to Sheet1, highlight cells J1 to J5, and then click OK. In the current example, the Horizontal (Category) Axis Labels are set automatically.

    5) To add data labels to the pie chart, move the mouse pointer to the pie chart, right-click on the pie area and choose Add Data Labels.
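    The relative frequency calculation in Step 1 (each class frequency divided by the sample size) can be sketched in Python as a cross-check. The function name relative_frequencies and the class frequencies below are hypothetical; they are chosen only so that they sum to 500, and are not the counts from XR02-82.

```python
def relative_frequencies(frequencies):
    """Divide each class frequency by the total number of observations."""
    total = sum(frequencies)
    if total == 0:
        raise ValueError("no observations")
    return [f / total for f in frequencies]


# Hypothetical class frequencies that sum to 500, for illustration only.
freqs = [200, 150, 100, 50]
rel = relative_frequencies(freqs)
print(rel)  # [0.4, 0.3, 0.2, 0.1]
```

    Dividing by the computed total rather than a typed-in 500 is a small design choice: the shares still sum to one if the sample size changes.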


    2.5 Ogive to present cumulative frequency distribution

    1) We use ogive plots to present the cumulative relative frequency distribution. Thus we need to calculate cumulative relative frequency. To do that:

    In cell K1 type Cumulative Relative Frequency, then press Enter on your keyboard, which makes cell K2 active. In K2 type =J2 (i.e. the first class's cumulative relative frequency is the relative frequency of the first class), and press Enter to finish the command.

    In K3 type =J3+K2 (i.e. the second class's cumulative relative frequency is the sum of the second class's relative frequency and the cumulative relative frequency of the preceding class).

    To duplicate the command in other cells, move the mouse pointer to the bottom-right corner of cell K3 (you will see the mouse pointer become a +). Click the bottom-right corner of K3 and drag the mouse pointer over cells K4 to K5 (caution: keep the left mouse button pressed and don't release it until K3 to K5 are covered). You will then see that K3 to K5 are filled with values of cumulative relative frequency.

    2) Click the Insert tab on the menubar, choose Line from the Charts submenu, and then click the first type of chart in the second row.

    3) Place the mouse pointer on the empty chart and click. From the Design tab, click Select Data in the Data submenu.

    4) Move the mouse pointer to the Chart data range box and click. Return to Sheet1, highlight cells K1 to K5, and then click OK.
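    The running-sum rule used in Step 1 (each class's cumulative value is its own relative frequency plus the cumulative value of the preceding class) can be sketched in Python as follows. The function name cumulative and the relative frequencies are hypothetical and for illustration only.

```python
from itertools import accumulate


def cumulative(relative):
    """Running total: entry i is the sum of relative[0] through relative[i]."""
    return list(accumulate(relative))


# Hypothetical relative frequencies for four classes.
rel = [0.4, 0.3, 0.2, 0.1]
cum = cumulative(rel)
for share, running in zip(rel, cum):
    print(f"{share:.2f}  {running:.2f}")
```

    The last cumulative value should equal one (up to rounding), which is a quick check that no class has been left out.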


    Part 3. Descriptive statistics

    We use the dataset in the Excel file XM04-06 to summarize sample statistics for 100 students' marks.

    1. The maximum value can be found by typing =max(A2:A101). Note that A2:A101 is the input range.

    2. The minimum value can be found by typing =min(A2:A101).

    3. The range is the difference between the two values above, found by typing =B2-B3, where B2 is the cell holding the maximum and B3 is the cell holding the minimum.

    4. To calculate the mean of the marks series, in any blank cell type =average(A2:A101).

    5. To calculate the median of the series, in any blank cell type =median(A2:A101).

    6. To calculate the mode of the series, in any blank cell type =mode(A2:A101).

    7. To calculate the variance of the series, in any blank cell type =var(A2:A101).

    8. To calculate the standard deviation of the series, in any blank cell type =stdev(A2:A101). Check that the s.d. is the square root of the variance. You can do this by typing =sqrt(.) where . is the cell where the variance is.

    9. The coefficient of variation can be calculated by typing =B9/B5, where B5 is the cell holding the mean and B9 is the cell holding the standard deviation in my practice.

    10. The above statistics can also be obtained by using the Summary Statistics function of the Data Analysis ToolPak. Follow the steps below to obtain the summary:

    a) Click the Data tab on the menubar, choose the last submenu, click Data Analysis and select Descriptive Statistics.

    b) In the Input Range box, type in $A$2:$A$101 (or $A$1:$A$101 if the cell containing the variable name is included, in which case tick the box Labels in first row).

    c) Under Output options click the check box for Output Range, and type the starting cell reference for the output, e.g. $D$2 (again, the cell should be blank to avoid overwriting existing data).

    d) Tick the check box for Summary Statistics.

    e) Click OK.

    Check that the values obtained from the formulas above are consistent with those in the Summary table.
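    As a further cross-check outside Excel, the same statistics can be sketched with Python's standard statistics module. This assumes that =var and =stdev are the sample versions (divisor n - 1), which is what statistics.variance and statistics.stdev compute; ties for the mode may be resolved differently from Excel. The function name describe and the marks are hypothetical, not the XM04-06 data.

```python
import statistics


def describe(marks):
    """Return the descriptive statistics computed in steps 1 to 9 above."""
    largest = max(marks)
    smallest = min(marks)
    mean = statistics.mean(marks)
    stdev = statistics.stdev(marks)  # sample s.d., divisor n - 1
    return {
        "max": largest,
        "min": smallest,
        "range": largest - smallest,
        "mean": mean,
        "median": statistics.median(marks),
        "mode": statistics.mode(marks),
        "variance": statistics.variance(marks),  # sample variance, divisor n - 1
        "stdev": stdev,
        "cv": stdev / mean,  # coefficient of variation
    }


# Hypothetical marks, for illustration only.
marks = [55, 60, 60, 70, 80]
for name, value in describe(marks).items():
    print(name, value)
```

    As in Step 8, the standard deviation squared should reproduce the variance.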

    Percentiles

    To find the kth percentile value, you first need to arrange the data in either ascending or descending order. Here we arrange the data in ascending order.

    1. Move the mouse pointer over the column reference (e.g. A if the series is in column A) and click. Note that by doing this the whole column should be selected.

    2. Right-click the mouse and choose Copy.

    3. Move the mouse pointer to cell G1 and click. Right-click the mouse and choose Paste. The whole series should be copied to column G.

    4. From the Data tab, click the Sort A to Z button in the Sort & Filter submenu. The data are now arranged in ascending order.

    5. From the Formulas tab, click Insert Function. From the Select a category drop-down list, choose Statistical, and from Select a function, choose PERCENTILE.INC.

    6. In the Array box, type in G2:G101 (the input range). Type 0.1 in the K box, which gives the 10th percentile value, 38.9; type 0.25 in the K box, which gives the 25th percentile (i.e. Q1) value, 64.75; type 0.5 in the K box, which gives the 50th percentile (i.e. median or Q2) value, 81; type 0.75 in the K box, which gives the 75th percentile (i.e. Q3) value, 90; and type 1 in the K box, which gives the 100th percentile value (i.e. the last observation's value in the sample), 100.
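    To see how a percentile value such as 64.75 can fall between two observations, here is a minimal Python sketch of an inclusive percentile with linear interpolation. It assumes the rule commonly described for PERCENTILE.INC: the value sits at position k(n - 1) in the sorted data, counting from zero, and is interpolated between the two neighbouring observations. The function name percentile_inc and the data are hypothetical.

```python
import math


def percentile_inc(data, k):
    """Inclusive percentile for 0 <= k <= 1, with linear interpolation."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 <= k <= 1:
        raise ValueError("k must lie between 0 and 1")
    ordered = sorted(data)
    position = k * (len(ordered) - 1)  # fractional 0-based position
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    # Move the given fraction of the way from the lower to the upper neighbour.
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# Hypothetical marks, for illustration only.
marks = [40, 10, 30, 20]
for k in (0, 0.25, 0.5, 0.75, 1):
    print(k, percentile_inc(marks, k))
```

    With k = 0 and k = 1 the function returns the smallest and largest observations, matching the remark above that the 100th percentile is the last value in the sorted sample.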