Data Analysis, Visualization, and Use A practical guide for program improvement Version 2 July, 2017




Created by

Caleb Parker, Research Associate, FHI 360

Katherine Lew, Technical Advisor, FHI 360

Amita Mehrotra, Project Manager, FHI 360


Contents

Introduction
Data Analytics Lifecycle
SECTION 1: DEFINITION OF THE ANALYSIS
SECTION 2: EXPLORATION OF THE DATA
    2.1: OVERVIEW OF DATABASES
    2.2: TYPES OF QUANTITATIVE DATA
    2.3: DATABASE CLEANING AND FORMATTING
        Technical Steps: Cleaning and formatting a database
    2.4: SUMMARY TABLES
        Technical Steps: Creating Summary Tables
SECTION 3: VISUALIZATION OF THE DATA
    3.1: CHART COMPONENTS AND TYPES
    3.2: CREATING CHARTS AND GRAPHS OF YOUR DATA
        Technical Steps: Creating a chart in MS Excel
    3.3: GUIDELINES FOR DESIGNING CHARTS AND GRAPHS
        Primary Guidelines for All Charts
        Guidelines for Communicating Ideas within Charts
SECTION 4: INTERPRETATION OF THE VISUALS
    4.1: Interpretation examples
    4.2: Policy Advocacy Example
SECTION 5: COMMUNICATION OF THE FINDINGS
    5.1: Create a Communication Product
        Policy Advocacy Example
        Additional Examples


Introduction

Programs being implemented throughout the world for human development

activities generate data. The data could be as simple as recording the number of

teachers receiving a training to increase attendance in their schools. And it can be

as complex as recording the ongoing clinical history of thousands of individuals

on anti-retroviral therapy used to track their health and ensure their adherence to

the medication. Teams collect the data to report their activities to funders and

local authorities, and use the data to help make decisions for the program,

among other reasons.

But just looking at spreadsheets is not enough for us to find meaning in the data. So how do the collected data become useful information that someone can easily understand and then use to make decisions? This is done through a process called data analytics (or data analysis), which can be defined as the process of turning raw data into useable information shared with a specific audience.

A critical component of the analysis is data visualization. Data are visualized in

graphics like charts, graphs, maps, symbols, tables, and text, and then by

examining the visuals, users can identify patterns, trends, and differences in the

data.

The final product of data analytics is not simply one graph or chart, but is a

collection of well-designed visuals that are intentionally crafted to answer a

specific question of the data that a specific audience can use for decision-making.

The interpretation of the findings represented in the visuals is clearly described in

this final product.


This manual is a guide designed to support data analytics for program staff and anyone involved in collecting and using data to make decisions. Throughout this manual, we will cover the entire process of data analytics with the following objectives:

• Improve collaboration between technical teams and the monitoring and evaluation (M&E) teams

• Strengthen the capacity of staff to create and design visualizations of the data for analysis

• Strengthen the capacity of staff to analyze data for decision-making, and create products to communicate the interpretation of the analysis


Data Analytics Lifecycle

Data analytics is the process of turning raw data into useable information shared

for decision-making. The process of data analytics for our purposes comes in five

steps. Each step is necessary to produce an informative, useful understanding of

the data for the audience.

1. Definition of the analysis
   a. What is the purpose of this analysis?
   b. What are the actual questions being asked of the data?
   c. Who is the audience that will see and use the analysis?

2. Exploration of the data
   a. Ensure the database is cleaned, complete, and formatted properly.
   b. Identify the variables in the data, the unit of measure, and the geographic, demographic, and time dimensions of the data.
   c. Aggregate and summarize the data.

3. Visualization of the data
   a. Determine the most appropriate graphic to use to visualize the data.

4. Interpretation of the visualization
   a. What patterns, trends, and differences exist in the data as found in the visualization?
   b. Why do those patterns, trends, and differences exist in the visualizations?
   c. What conclusions and recommendations can you determine from the visualizations?

5. Communication of the findings
   a. What information/messages should be shared with the audience, using what form of visualization?


SECTION 1: DEFINITION OF THE ANALYSIS

Every analysis and visualization is based on a purpose: to understand some part of the data and to derive meaning about certain actions or results that were recorded in the data. Before conducting data analysis, consider what the analysis will inform, and who the intended end-user, or audience, is. This information can be organized into a formal Data Analysis Plan. The plan is usually set up at the beginning of the program before activities begin and amended during the program.

Data Analysis Plan: A Data Analysis Plan (DAP) is a guide that describes the question being asked of the data, the analyses that must be done, and who the audience will be. Some components of the DAP included in the example in Figure 3 may not be necessary for each project, and some may need to be added. Therefore, DAPs should be adapted to suit the needs of the project.

The plan should begin with an objective that captures the overall idea that the analyses will address, followed by one or more specific analysis questions, as described in Figure 3. DAPs can consist of several objectives, each with its own analysis questions. The example in Figure 3 shows one objective with one analysis question. Each analysis question may require more than one visualization to understand the data. In the example, note that two visualizations are being created for the analysis question. Space is also provided for details on the indicator, the source of the data, and the time, demographic, and geographic characteristics.
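The structure of a DAP described above (an objective, its analysis questions, and the visualizations planned for each question) can also be written down as a small data structure. The following is only a sketch: the class names, field names, and all of the example content are hypothetical and are not taken from Figure 3.

```python
from dataclasses import dataclass, field


@dataclass
class Visualization:
    """One planned visual for an analysis question (field names are assumptions)."""
    title: str
    indicator: str     # what is being measured
    data_source: str   # where the data come from
    time: str          # time characteristic, e.g. a reporting period
    demographic: str   # demographic characteristic, if any
    geographic: str    # geographic characteristic, e.g. district


@dataclass
class AnalysisQuestion:
    question: str
    audience: str
    visualizations: list = field(default_factory=list)


@dataclass
class Objective:
    statement: str
    questions: list = field(default_factory=list)


# Hypothetical plan: one objective, one analysis question, two visualizations.
plan = Objective(
    statement="Understand attendance at teacher training workshops",
    questions=[
        AnalysisQuestion(
            question="How many teachers attended each workshop, by gender?",
            audience="Program managers",
            visualizations=[
                Visualization("Attendance by workshop", "Teachers trained",
                              "Training register", "Year 1", "All", "District"),
                Visualization("Attendance by gender", "Teachers trained",
                              "Training register", "Year 1", "Gender", "District"),
            ],
        )
    ],
)

for q in plan.questions:
    print(q.question, "->", len(q.visualizations), "visualization(s)")
```

Writing the plan this way makes it easy to see which details are still blank for a given visualization before any analysis begins.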


SECTION 2: EXPLORATION OF THE DATA

This section begins with an overview of databases, discusses cleaning and

transforming the database, and then describes how to explore the data.

2.1: OVERVIEW OF DATABASES

While software packages may differ, the arrangement and overall structure of a database are the same, and every database is composed of the same components. This subsection covers important terminology that will be used throughout the manual.

1. Indicator: Measurable variable used as a representation of an associated factor or quantity. Indicators often comprise one or more data elements. For example, the indicator for the number of children testing positive for HIV may be composed of multiple variables in the table if the definition of “child” is spread across different age groups; the variables of “2-5 years”, “6-8 years”, and “9-11 years” for HIV-positive test results must then be added together to calculate the indicator.

2. Variable or Field (Column): Each column in the database provides space for one unique characteristic that can vary for each data point within a given range (e.g., 0 to 500). All values in a column are of the same data type (i.e., nominal or categorical, ranked or ordered, or continuous; see Section 2.2). One variable could be the name of the individual health facility, while another could be the total number of women receiving care and treatment for HIV. Variable names are maintained in a header row, which is always the very first row (Row 1 in MS Excel).

3. Data Point, Feature, or Element (Row): Every row in the database is a single unit of data, referred to in several ways. In this manual, it will usually be referred to as a data point. Each data point consists of that unit’s information across all variables. Each data point must be unique, and must be of the same unit of measure (see definition below). Note that the first row of the database must always be reserved for variable names.


4. Data Value (Cell): The values for each data point (row) at each variable (column) are found where the row and column intersect. Cells cannot be merged across rows or columns in the database. All values within one variable must be formatted the same way. For example, a variable that describes the number of hours spent in a training could have its data values entered as “8”, “8:00”, “eight”, and so on. Choose one format for all data values within each variable.

5. Class or Category: The data values of one variable can be aggregated (grouped together) into classes. Variables that have categorical data can easily be grouped into classes. For example, a variable “type of facility” has only three values throughout a database: “health post,” “clinic,” and “hospital.” The data can therefore be aggregated into three different classes and analyzed.

6. Unique Identification Field: Each data point (row) of the database may have some way to separate it easily from another data point. While the names of facilities may be the unit of measure, and each one is unique for each row, having a new field that enumerates each facility name can help manage the data more easily. Text-based fields (like the names of districts) are more likely to cause problems, because names can be spelled differently (e.g., Carroogou versus Karougou).

7. Unit of Measure: Each data point (row) of the database must fit in the same kind of measurement. This can be done with a geographic (or spatial) characteristic, such as an individual client, an individual health facility, or individual districts of a nation, and is often referred to as the “organizational unit” (note that for mapping, the database’s unit of measure must be geographic). This could be done using other variables as well, or a combination of variables. Each database row must be unique, meaning that each specific combination of a unit of measure and variable(s) cannot be repeated. Databases that are disaggregated to lower levels of measure (e.g., health facilities, schools, organization’s centers) make analyses easier than those that are aggregated to higher levels (e.g., regions, school districts, entire organizations).
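To make the terminology above concrete, here is a minimal sketch of a tiny database held as a list of rows. All facility names, variable names, and numbers are hypothetical. It checks that the unique identification field really is unique, and calculates an indicator by adding together the age-group variables that define “child”.

```python
# Hypothetical database: each dict is one data point (row);
# each key is a variable (column); each value is a data value (cell).
rows = [
    {"facility_id": 1, "facility": "Facility A", "pos_2_5": 3, "pos_6_8": 1, "pos_9_11": 2},
    {"facility_id": 2, "facility": "Facility B", "pos_2_5": 0, "pos_6_8": 4, "pos_9_11": 1},
    {"facility_id": 3, "facility": "Facility C", "pos_2_5": 2, "pos_6_8": 2, "pos_9_11": 0},
]

# The age-group variables that together make up the "children" indicator.
CHILD_AGE_VARIABLES = ["pos_2_5", "pos_6_8", "pos_9_11"]


def has_unique_ids(data, id_field):
    """Return True if no value of the identification field is repeated."""
    ids = [row[id_field] for row in data]
    return len(ids) == len(set(ids))


def children_positive(row):
    """Indicator: total children testing positive, summed across age groups."""
    return sum(row[name] for name in CHILD_AGE_VARIABLES)


assert has_unique_ids(rows, "facility_id")

for row in rows:
    row["children_positive"] = children_positive(row)
    print(row["facility"], row["children_positive"])
```

A numeric identification field is used here because, as noted above, text names are more likely to be spelled inconsistently.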


2.2: TYPES OF QUANTITATIVE DATA

Each variable will consist of one of these three types of data: nominal (categorical), ranked (ordered), or continuous. Nominal and ranked data can be text or numbers, but continuous data can only be numeric: if any text is written in a cell of a continuous column, the software may not be able to recognize the column as continuous, which limits the ability to visualize the data.

1. Nominal or Categorical: These are data which have no specific order or

value associated with them. One must not come before or after another

(order), and one is not “better” or “greater than” another (value). Usually,

nominal data for our purposes include names for districts, regions, or

health facilities and schools. While one health facility may be described as

a higher quality than another, here we are only considering the actual

name and nothing else. Other variables for those health facilities will place

a value to it (e.g., number of services offered), and an order to it (e.g., type

of health facility).

2. Ranked or Ordered: Data that fit into this category have attributes where

one unit of data comes before or after another. The actual value between

different data units is not important. For our purposes, this could be the

type of administrative boundary: national boundaries come first, then

regions, then districts, and so on. The type of health facility is another

example. Health facilities are classified by services that are offered,

including health posts, health centers, and hospitals. Health posts offer

fewer services, while hospitals offer more services: in this way, they can be

organized by the services they provide.

3. Continuous: For these variables, the data are numeric and those numbers

indicate order (one comes before another) and value (one is greater than

or less than another). The number of clients seen at a health facility is one

such example. If that facility sees 100 clients during one month, and then

the next month sees 150 clients, you can say more clients were seen in the

second month. Continuous data can be classified with specific data ranges

as well, which then makes the data type ranked or ordered. Classifying

continuous data is useful in mapping for better visualizations, but may not

be necessary in other visualization formats.
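As noted above, continuous data can be classified into ranges, which turns them into ranked data. The sketch below shows one way to do this; the class breaks and labels are hypothetical and would need to be chosen to suit the actual data.

```python
def classify(value, breaks, labels):
    """Return the label of the first class whose upper bound is >= value.

    `breaks` holds the upper bounds of all classes except the last;
    `labels` holds one label per class, in ascending order.
    """
    for upper, label in zip(breaks, labels):
        if value <= upper:
            return label
    return labels[-1]  # anything above the last break falls in the top class


# Hypothetical class breaks for "clients seen per month".
BREAKS = [50, 100]
LABELS = ["Low (0-50)", "Medium (51-100)", "High (101+)"]

clients_per_month = [100, 150, 42]
classes = [classify(v, BREAKS, LABELS) for v in clients_per_month]
print(classes)
```

The resulting labels have an order (low, medium, high) but no longer carry the exact value, which is what makes them ranked rather than continuous.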


2.3: DATABASE CLEANING AND FORMATTING

Before exploring the data, (1) the data must be cleaned, and (2) the database must be formatted and set up for easy analysis. The processes for cleaning and formatting the data often overlap, and are presented as one in this subsection.

The goal of data cleaning is to make certain of the accuracy and completeness of the data. This also includes ensuring that no data points are duplicated, and that missing data values are supposed to be blank as opposed to being accidentally deleted or not entered.

Formatting the database means that the database is structured in a way that

analysis can easily and directly be carried out. This includes making data values

for each variable consistent in spelling and format, creating new variables needed

for the analysis, as well as removing empty columns and rows.
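The cleaning checks described above (duplicated data points, blank values that need to be confirmed) can be sketched in a few lines of code. The data and function names below are hypothetical. A minimum/average/maximum summary is included as a quick plausibility check for values that look wrong.

```python
from collections import Counter
from statistics import mean

# Hypothetical data: one row per facility.
rows = [
    {"facility": "Clinic A", "clients": 40},
    {"facility": "Clinic B", "clients": 900},   # suspiciously high: verify
    {"facility": "Clinic C", "clients": None},  # blank: intended, or never entered?
    {"facility": "Clinic A", "clients": 40},    # exact duplicate of the first row
]


def duplicated_rows(data):
    """Return each row that appears more than once (listed a single time)."""
    counts = Counter(tuple(sorted(row.items())) for row in data)
    return [dict(key) for key, n in counts.items() if n > 1]


def missing_values(data, variable):
    """Return the rows where the given variable is blank."""
    return [row for row in data if row[variable] in (None, "")]


def summarize(data, variable):
    """Minimum, average, and maximum of the non-blank values of a variable."""
    values = [row[variable] for row in data if row[variable] not in (None, "")]
    return {"minimum": min(values), "average": mean(values), "maximum": max(values)}


print("Duplicates:", duplicated_rows(rows))
print("Blank values:", missing_values(rows, "clients"))
print("Summary:", summarize(rows, "clients"))
```

The code only flags rows for review. Whether a blank is legitimate, or whether 900 is a typing error, still has to be confirmed against the original records.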


In Figure 5, the original database was cleaned and transformed in the following ways:

• Unmerge all merged cells, and write the data values for each data point

• Remove blank row in the middle of the database

• Dates should be formatted the same way

• Workshop hours should be formatted the same way

• Gender - use the same spelling for each gender consistently

• Role - use the same spelling for each role consistently

• Organization - use the same spelling for each organization consistently
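The kinds of fixes listed above can be illustrated with a short sketch. Every mapping, date format, and value here is an assumption made for illustration (for example, that slash-separated dates are written day first); the real rules depend on how the data were entered.

```python
from datetime import datetime

# Hypothetical lookup tables; extend them to match the spellings found in the data.
NUMBER_WORDS = {"eight": 8}
GENDER_SPELLINGS = {"f": "Female", "female": "Female", "m": "Male", "male": "Male"}


def clean_hours(raw):
    """Convert entries such as "8", "8:00", or "eight" to a number of hours."""
    text = str(raw).strip().lower()
    if text in NUMBER_WORDS:
        return float(NUMBER_WORDS[text])
    if ":" in text:
        hours, minutes = text.split(":")
        return int(hours) + int(minutes) / 60
    return float(text)


def clean_date(raw, formats=("%Y-%m-%d", "%d/%m/%Y")):
    """Rewrite a date in one consistent format (YYYY-MM-DD)."""
    for fmt in formats:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date: {raw!r}")


def clean_category(raw, mapping):
    """Replace known spelling variants with one consistent spelling."""
    return mapping.get(raw.strip().lower(), raw.strip())


print([clean_hours(v) for v in ["8", "8:00", "eight"]])
print(clean_date("03/07/2017"))
print(clean_category(" F ", GENDER_SPELLINGS))
```

Values that match no rule are left unchanged (categories) or raise an error (dates), so that unexpected entries are noticed rather than silently altered.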


Technical Steps: Cleaning and formatting a database

Copy the original database into a new tab called “Formatted Database”. Before

doing anything to the database, make a copy. Only work on the copy, leaving the

original untouched for data quality control. If some error corrupts your database,

you should always have an original to return to. Therefore, name the original

database worksheet as “Original Database”. Create a new worksheet in the same

MS Excel file and name it “Formatted Database”. Make all changes in the

“Formatted Database” worksheet.

Document changes to the database in a “Notes” tab: Every change made to the

original database should be documented. Consider the likelihood that someone

may ask you to make a slight change to your visual weeks after you created it, or,

harder still, if someone else needs to make changes using your files. For you and

others to easily know how to make changes and quickly create updated visuals,

having such documentation is essential. This section outlines some key

considerations for keeping track of changes made to an Excel spreadsheet.

Make a new worksheet/tab in the same MS Excel file called “Notes” to explain all variable names that are not easy to understand in the database, including new variables that you calculate or add. Also include explanations of any equations used to generate new data.

The database is now ready to be cleaned and formatted.

1. Open the “Formatted Database” worksheet. Make sure that all cleaning and formatting are done in this worksheet.

2. Scan the database to make sure all the variables are present that you

need for the analysis.

3. Scan the database to make sure all the data points (rows) are present.

4. Scan the data values for each variable to see if they are generally what

you would expect to find. For any anomalies, the data should be corrected

before proceeding.

• For example, if you know that no clinic treated more than 100

people, but you find one that treated 900, check the data to

ensure this is correct or what the correct number should be.

• For large databases, you can summarize the data. Calculate the

“average,” “minimum”, and “maximum” to get a better sense of

the data.


5. Format the data values for each variable so that they are consistent. For

text, each category should be spelled the exact same way, including how

letters are capitalized. For numeric data, no text values should appear in

the variable.

• For example, the data value for a health post might be written in

the database as “hpost”, “helth post”, “healthpost”, and “post”.

Having four ways of spelling this single value makes aggregating

the data challenging. Therefore, change all the different names

given to health post to be one consistent value, like “Health Post”.

6. The variable names should be placed in the very top row. Each variable name must fit in that single header row, and cannot be part of a merged cell.

7. Remove any blank rows or columns.

8. For the next steps, consider if any new variables need to be created. Think about the type of outcomes you will need to answer the question

for your analysis. Will you need to compare different time points, or show

differences between population groups? Will you need to show the data

at a level of aggregation? You may need to create new variables that are

calculated using other variables.

• Aggregation: An indicator may be represented by multiple variables that break the indicator into finer parts, such as by showing multiple age ranges, multiple kinds of professions, or workshop names. Grouping or aggregating these multiple variables into one or two variables can make interpretation easier. This may not be necessary in all instances, however.

For example, if the visual will only need to show the total number of adults and children who were tested for HIV, but the database shows ten distinct age ranges, you must create two new variables (one for children, and one for adults). Add the variables that qualify as children together into the “children” variable, and add the variables that qualify as adults together into the “adults” variable.

• Percentages: If you need to show a percentage, such as the percent of men tested out of all people tested, create this new variable by dividing the number of “men” by the “total people” who were tested.


• Change over time: When the goal of the analysis is to compare one value to another, consider making a new variable that measures the amount of change. This new variable could be presented in the visual instead of, or as a complement to, the raw numbers. Showing a “percentage change” value makes the comparison easier for the audience to interpret.

The difference between “year 1” and “year 2”, for example, can be shown first as raw totals in side-by-side bar charts. Showing either a total change as a raw number, or a percentage change, on top of this chart or in addition to it can greatly help the audience interpret its meaning. The formula for percent change over time is ((t2-t1)/t1), expressed as a percentage, where t1 is the value at time period one, and t2 is the value at time period two.

• Normalization: To be able to compare data point values for one variable, you may need to have the data normalized. Normalization transforms the data values from raw numbers to values that are along a common scale, which allows for a much easier comparison between data points. This could be used to compare capacity reached at each health facility, especially if each health facility has a unique number for capacity.

For example, one health facility can only treat 17 people, and another can treat 48: if they both treat 17 people, the first facility is at 100% capacity, whereas the second is at only 35%. If you only consider that they both served 17 people, it may appear that they are functioning equally.
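The four kinds of new variables described in step 8 are each a one-line calculation. The sketch below shows them side by side. The function names, variable names, and numbers are hypothetical, except for the capacity figures (17 and 48), which come from the example above. Percent change is measured relative to the first time period, which is the usual convention.

```python
def aggregate(row, variables):
    """Aggregation: add several finer variables into one new variable."""
    return sum(row[name] for name in variables)


def percentage(part, total):
    """Percentage: the part divided by the total, times 100."""
    return 100 * part / total


def percent_change(t1, t2):
    """Change over time: change from period one to period two, relative to period one."""
    return 100 * (t2 - t1) / t1


def percent_of_capacity(treated, capacity):
    """Normalization: express a raw count on a common 0-100% scale."""
    return 100 * treated / capacity


# Hypothetical row with three age-range variables for people tested.
row = {"tested_0_4": 10, "tested_5_9": 15, "tested_10_14": 5}
children = aggregate(row, ["tested_0_4", "tested_5_9", "tested_10_14"])

print(children)                              # children tested, all age ranges combined
print(percentage(30, 120))                   # percent of men among all people tested
print(percent_change(200, 250))              # percent change from year 1 to year 2
print(round(percent_of_capacity(17, 17)))    # first facility from the example
print(round(percent_of_capacity(17, 48)))    # second facility from the example
```

In a spreadsheet these would each be a formula in a new column, and every such formula should be recorded in the “Notes” tab.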


2.4: SUMMARY TABLES

Summarizing the data into smaller tables will allow you to make charts and graphs more easily. Summarization is simply aggregating data based on certain demographic or geographic characteristics. Each visualization may require a separate summary table, depending on what the chart or graph is supposed to show.


Technical Steps: Creating Summary Tables

1. In the Data Analysis Plan, identify the first visualization that must be

created.

2. Identify the variables that apply to this visualization.

3. Create a Pivot Table

• Select the entire database in the “Formatted Database” tab.

• Go to the “Insert” tab at the very top menu, and select “Pivot Table”.

• A window will appear; simply click “OK”. A new tab will be created. This is where the summary table will be placed.

4. Identify how the data will be aggregated by data points (rows). This could be done by a demographic, geographic, or other variable. The classes of this variable will become the individual data points (rows) in the summary table: each row will be a unique class of the variable. (Note that the time characteristic should be part of the variable, not the data point.) The data points for this visual could be a list of the regions, or a list of age groups, or a list of workshop titles.

• In the “Pivot Table” pane that appeared at the right of the page, find the variable that will form the rows. Click, hold, and drag that variable over to the box beneath the term “Rows”. In the Pivot Table, each class will appear once in the first column.

5. Identify the variables to add for the columns.

• In the “Pivot Table” pane that appeared at the right of the page, find the variable that you want to show as the column. Click, hold, and drag that variable over to the box beneath the term “Values”. In the Pivot Table, a summary of data will appear for each class.

• Note that if the data need to be summed, counted, or averaged, you can change this function. Select the drop-down menu beside the variable listed in the “Values” box, choose “Value Field Settings,” and select the option that best fits your needs.


6. Make the summary table. The Pivot Table is always linked to the database in the “Formatted Database” worksheet. However, we recommend copying the entire pivot table, and pasting the table as values immediately below the pivot table. This will “de-link” the summary table, which makes it easier to work with. Only work with this “de-linked” summary table, and not the pivot table.

• In the summary table, change the variable names as needed so that they make sense. These variable names will ultimately become the labels in the chart.
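For readers working outside MS Excel, the logic of the summary table built in the steps above can be sketched in code: one variable forms the rows, and a second variable is summed, counted, or averaged for each class. This is not how Excel implements Pivot Tables; it is only an illustration of the same aggregation, using hypothetical data and names.

```python
from collections import defaultdict

# Hypothetical formatted database: one row per workshop session.
rows = [
    {"region": "North", "workshop": "Data Use", "participants": 12},
    {"region": "North", "workshop": "Charts", "participants": 8},
    {"region": "South", "workshop": "Data Use", "participants": 15},
    {"region": "South", "workshop": "Charts", "participants": 5},
    {"region": "North", "workshop": "Data Use", "participants": 10},
]


def summarize(data, row_variable, value_variable, func=sum):
    """Group rows by the classes of `row_variable` and apply `func`
    (for example sum, len for a count, or an average) to `value_variable`."""
    groups = defaultdict(list)
    for row in data:
        groups[row[row_variable]].append(row[value_variable])
    return {label: func(values) for label, values in sorted(groups.items())}


by_region = summarize(rows, "region", "participants")               # summed
sessions = summarize(rows, "workshop", "participants", func=len)    # counted

print(by_region)
print(sessions)
```

Because the result is a plain dictionary rather than a live link to the database, it plays the same role as the “de-linked” summary table recommended in step 6.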


SECTION 3: VISUALIZATION OF THE DATA

Visualizing data helps us derive meaning from tables and text. In order to visualize our data, we must first understand the various types of graphs and charts available to us, and then how to transform our database into meaningful information through aggregation.

3.1: CHART COMPONENTS AND TYPES

Visuals can be formatted in several ways that make it easier for the end user to focus on the data rather than being distracted by the background chart components. First, it is necessary to know what the various chart components are and how they can be formatted to enhance understanding of data presented in the visual. The components of charts and graphs are listed in the diagram below.

Microsoft Excel alone has 15 different types of charts and graphs to choose from, which are only some of the options found throughout graphic design. For our purposes, however, we will be limiting the number of charts we reference to the kinds that best meet our common needs.


A. Bar/Column Charts (vertical bars): Data values are shown as bars that start at the bottom of the chart area on the horizontal axis and increase in height along the vertical axis. Variable classes (e.g., men, women, children) are shown along the horizontal axis, and the amounts (data values) of each class are shown by the height of the bars.

B. Bar/Column Charts (horizontal bars): Data values are shown as bars that start at the side (or center) of the chart area along the vertical axis and increase in width along the horizontal axis. Variable classes (e.g., men, women, children) are shown along the vertical axis, and the amounts (data values) of each class are shown by the width of the bars.


C. Scatterplot: Scatterplots are usually used to display two different continuous

variables at once; one along the horizontal axis, and the other along the vertical

axis. A relationship may exist between the two variables if the points form a

pattern on the scatterplot. However, for our use, scatterplots may also be used in

the same way that bar/column charts show quantities of data for one variable.

D. Line Graph: Data values are shown as points which start at the left side of the

horizontal axis and move to the right side, indicating a movement through time.

The points are then connected with a line. The slope of the line between two

points indicates a change in value (increase or decrease) over time, and stable

values will show a flat line parallel to the horizontal axis.


E. Pie Chart: Data values are represented by “slices” that indicate a proportion of

a whole. The entire pie equals 100%, or the sum of all the component

parts. Each slice (a data value) equals a percentage of the whole, and when all

slices are added together, they must equal 100%.


3.2: CREATING CHARTS AND GRAPHS OF YOUR DATA

The summary tables created in the previous section will be used to generate

charts and graphs of the data. Before starting the technical work of visualizing

the data in the summary tables, consider the following examples to decide

what kind of chart to choose.

A. Compare categories of one variable at one period of time. Start with

Bar/Column Charts. Each class of the variable is represented by one bar/column.


B. Compare categories and sub-categories of one variable at one period of

time. Are you comparing data values from multiple variables with the same

classes at one period of time? Start with the Bar/Column Chart. Each class is

represented by one bar or column per variable; multiple bars will be on each class

to represent the different variables. The bars within each class touch one

another on the chart so that they can easily be compared.
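
As a minimal sketch of how a summary table for this kind of grouped chart could be prepared outside MS Excel, the Python snippet below reshapes a list of records into one row per class with one value per variable. The class names, variable names, and counts are made-up illustration values, and `pivot_for_grouped_bars` is a hypothetical helper, not a function from MS Excel or from this guide.

```python
from collections import defaultdict

def pivot_for_grouped_bars(records):
    """Turn (class, variable, value) records into {class: {variable: total}}."""
    table = defaultdict(lambda: defaultdict(int))
    for class_name, variable, value in records:
        # Add up values that share the same class and variable
        table[class_name][variable] += value
    # Convert to plain dictionaries so the result prints cleanly
    return {class_name: dict(values) for class_name, values in table.items()}

# Hypothetical records: one value per class and variable
records = [
    ("Men", "Tested", 120), ("Men", "Positive", 14),
    ("Women", "Tested", 150), ("Women", "Positive", 21),
    ("Children", "Tested", 60), ("Children", "Positive", 3),
]

summary = pivot_for_grouped_bars(records)
for class_name, values in summary.items():
    print(class_name, values)
```

Each key of `summary` corresponds to one group of touching bars, and each inner value to one bar in that group.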


C. Compare categories of one variable over multiple time periods. Start with

a Line Graph. Each class has its values represented by dots at each time

point; a line then connects the points for each class. Classes are shown by

different colors.


D. Compare categories and sub-categories of one variable over multiple

time periods. Are you comparing data values from one variable by multiple

classes over multiple time periods? This becomes hard to visualize, especially in

one visual. Consider showing two or more visuals side by side. Each visual could

be a line graph showing classes for a single variable. To compare the data

between the graphs, be sure that the graphs have the same numeric range on the

axis and show the same classes.
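
One way to keep side-by-side graphs comparable is to compute a single axis range from all the data before drawing any of them. The Python sketch below assumes non-negative counts and rounds the shared maximum up to a tidy step; `shared_axis_range`, the step size, and the sample values are illustrative assumptions only.

```python
import math

def shared_axis_range(*series, step=10):
    """Return one (low, high) axis range that covers every series.

    Assumes non-negative values. The lower bound is fixed at 0 and the
    upper bound is the overall maximum rounded up to a multiple of `step`.
    """
    overall_max = max(value for values in series for value in values)
    high = math.ceil(overall_max / step) * step
    return (0, high)

# Hypothetical quarterly counts for two graphs shown side by side
clinic_a = [12, 18, 25, 31]
clinic_b = [40, 44, 52, 47]

axis_range = shared_axis_range(clinic_a, clinic_b)
print(axis_range)  # (0, 60)
```

The same range would then be set manually as the axis minimum and maximum on each graph.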


E. Compare categories of a variable that are parts of a whole. Start with a Pie

Chart. Each of the parts becomes a slice of the “pie”. The data should be ordered

from greatest to least for ease of interpretation. Consider collapsing the data

classes into fewer classes to make interpretation easier.
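
The ordering and collapsing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the clinic names and counts are invented, the limit of four slices is an arbitrary choice, and `prepare_pie_slices` is a hypothetical helper. It assumes the total is greater than zero.

```python
def prepare_pie_slices(counts, max_slices=4):
    """Order classes from greatest to least and collapse the smallest into 'Other'."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    if len(ordered) > max_slices:
        # Keep the largest classes and sum the rest into one 'Other' slice
        kept = ordered[:max_slices - 1]
        other_total = sum(value for _, value in ordered[max_slices - 1:])
        ordered = kept + [("Other", other_total)]
    total = sum(value for _, value in ordered)
    # Each slice is (name, value, percentage of the whole)
    return [(name, value, round(100 * value / total, 1)) for name, value in ordered]

# Hypothetical counts per clinic
counts = {"Clinic A": 50, "Clinic B": 30, "Clinic C": 10, "Clinic D": 6, "Clinic E": 4}

slices = prepare_pie_slices(counts)
for name, value, percent in slices:
    print(f"{name}: {value} ({percent}%)")
```

Because the percentages are rounded, they may not always add up to exactly 100 with other data; check the sum before labeling the slices.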


F. Compare two variables with continuous data at one time period. For this

kind of comparison, a scatterplot is used. Both variables are placed on the chart,

one on each axis.


G. Compare categories of one variable against a target. Start with a bar/column

chart. The first bar is the actual data, and the second bar is the target. This is

especially important when the categories do not have the same target value.

On top of the column/bar chart, another graph can be added to show the

percentage of the target reached. Using the Scatterplot chart to represent the

percentage of the target reached against the backdrop of the real numbers can

be helpful to understand how the values differ. A target can be shown by creating

a line at the target value, and when all actual values are added, their distances

above and below that target line will indicate how close or far they are from the

target.
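
The percentage of the target reached is the actual value divided by the target value, multiplied by 100. A minimal Python sketch of that calculation follows; the district names and numbers are hypothetical, and `percent_of_target` is an illustrative helper that assumes every category has a non-zero target.

```python
def percent_of_target(actuals, targets):
    """Return {category: percent of target reached}, rounded to one decimal."""
    return {
        category: round(100 * actuals[category] / targets[category], 1)
        for category in actuals
    }

# Hypothetical actual values and targets; note that the targets differ by category
actuals = {"District 1": 450, "District 2": 300, "District 3": 520}
targets = {"District 1": 500, "District 2": 400, "District 3": 500}

progress = percent_of_target(actuals, targets)
for category, percent in progress.items():
    print(f"{category}: {percent}% of target")
```

Values above 100 indicate that a category exceeded its target.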


H. Highlight the time when a program started or when an event occurred

related to the data. Using a line graph to indicate the change over time of a

variable, simply add an icon through the “Insert/Shapes” tab to the graph.


Technical Steps: Creating a chart in MS Excel.

Create your chart or graph using MS Excel.

1. For the first summary table, determine what chart will be created.

Use the questions above to determine what chart is best.

2. Next, highlight the entire summary table, including the variable

names and all the data.

3. With the entire summary table highlighted, choose the “Insert” tab

from the top menu, and choose “Recommended Charts”. A new

window will appear that shows a preview of what the chart looks

like with your data.

4. Select the chart that you need, and choose “OK”.

5. The chart appears next to the table.

6. To improve the design of the chart, proceed to the next section.


3.3: GUIDELINES FOR DESIGNING CHARTS AND GRAPHS

For the audience to understand your chart well, it must be designed properly.

This subsection covers some common techniques to consider in order to have a

chart that looks nice and is easy to interpret.

Primary Guidelines for All Charts

1. Keep the chart simple. Visuals that are overly complicated and show too

much data make it difficult for the end user to interpret. Visuals should be

created so that they are easily and quickly understood. Ideally, the visual

should only show one main idea. While this may not always be possible,

keep it as a rule of thumb.

Consider that more than one visual can be used to answer the analysis

question. You may begin with a simple bar chart about the data, for

example, and then continue to make more charts and graphs that delve

deeper into certain variables.

We strongly recommend not using the 3-D effects for your charts. They

may look interesting, but they actually distract from the information in the

visual, and can even cause the data to be misinterpreted.


2. Sort the data. In general, visuals are most easily understood when they

are ordered in some way. This could be an order that the audience

expects, such as the alphabetized ordering of districts by name. You could

also order the data using the data values so the audience can easily

interpret groups of the data. For example, a chart showing data ordered

by values will show the districts with the highest values on one side and

the districts with the lowest values on the other. By organizing data

in some order, the audience can interpret the information more easily.
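
Both orderings mentioned above can be produced before the chart is drawn. The Python sketch below uses invented district names and values; it sorts once by name and once by value from highest to lowest, breaking ties by name.

```python
# Hypothetical values per district
values = {"District C": 410, "District A": 150, "District D": 340, "District B": 210}

# Order the audience may expect: alphabetical by district name
by_name = sorted(values.items())

# Order by data value, highest first; ties fall back to the name
by_value = sorted(values.items(), key=lambda item: (-item[1], item[0]))

print([name for name, _ in by_name])
print([name for name, _ in by_value])
```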


3. Choose appropriate colors. The selection of colors is not easy, but

getting the right colors chosen is extremely important to producing a

visual that is easy to interpret. MS Excel provides predetermined color

schemes that often work well as they are, or only require a few changes.

We strongly recommend not using more than one bright, bold color.

Having bold colors beside each other can be very visually distracting.

Instead, use softer, less saturated colors. The defaults provided in MS

Excel usually offer color choices that may work well together.


4. Write informative titles and subtitles. The objective of many exercises

to create visuals is to add information to a larger presentation that shares

a story about the data. However, the visual can easily end up as a stand-alone

graphic on the desk of someone who has never seen the

presentation. Because of the potential for misinterpretation of the data, or

inability to use the data presented in the visual, each graphic should come

with basic referential information.

For showing project data, consider using the following format. A title and

sometimes a subtitle or two must be used to indicate the primary points

about the graphic: (1) the variable, (2) the demographic characteristics, (3)

the spatial unit, and (4) the time period of the data. Additionally,

appropriate secondary information more specific to the graphic should be

used, such as axis labels to describe the units of measure, and a legend

that describes the meaning for each symbol. However, the actual parts of

the title that you include should be based on your project, and on what is

helpful for the audience to understand your chart.

TITLE: [Variable Name], [Demographic Specifics], [Unit of Measure]

SUBTITLE 1: [Project the Data Comes From], [Time Period that

Pertains to the Data]

SUBTITLE 2: [Country and other location information]

Example:

HIV Positivity Testing Results of Adults by Clinic

CHASS Project, 2016 Quarter 3

Mozambique
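
If many charts are produced for the same project, the title format can be applied consistently with a small helper. The Python sketch below is one possible way to do this; `build_chart_title` and its parameter names are hypothetical, and the joining words "of" and "by" are an assumption taken from the example title above.

```python
def build_chart_title(variable, demographic, grouping, project, period, location):
    """Assemble a title and two subtitles, one per line."""
    title = f"{variable} of {demographic} by {grouping}"
    subtitle_1 = f"{project}, {period}"
    subtitle_2 = location
    return "\n".join([title, subtitle_1, subtitle_2])

title = build_chart_title(
    variable="HIV Positivity Testing Results",
    demographic="Adults",
    grouping="Clinic",
    project="CHASS Project",
    period="2016 Quarter 3",
    location="Mozambique",
)
print(title)
```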


5. Be consistent. Your audience wants to find patterns and guides in how

you present your data. When presenting multiple charts with the same

variable, for example, ensure that the same color and style are used for that

variable in each chart. When presenting charts that are meant to be

compared side-by-side, the axis values must be consistent for a correct

interpretation.


Guidelines for Communicating Ideas within Charts

The remaining guidelines are helpful when designing the chart to communicate

certain ideas of your data; thus, refer to these guidelines once you have studied

the information in the chart and know what the data in the chart mean.

6. Indicate what things are related and unrelated. Features that are

related can have the same color or similar colors; their symbols can also

be touching each other or appear grouped together. Features that are not

related should have colors that stand apart, and their symbols should

appear separated.


7. Indicate how things are “good” or “bad.” Features that have desired or

positive values, and other features that have poor or negative values can

be shown by using color schemes to help convey those messages.


8. Highlight what is important. The most important feature should stand

apart from other features in some way. This can be done by using a bold

color, or writing the text larger or in bold. This is only necessary when

trying to emphasize one part of the visual.
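
As a small illustration of this guideline, the Python sketch below assigns one bold color to the feature being emphasized and a muted gray to everything else. The color codes and clinic names are arbitrary assumptions, and `highlight_colors` is a hypothetical helper; the resulting list could be applied bar by bar in whichever charting tool is used.

```python
def highlight_colors(categories, important, bold="#C0392B", muted="#B0B0B0"):
    """Return one color per category: bold for the important one, muted for the rest."""
    return [bold if category == important else muted for category in categories]

# Hypothetical categories, with Clinic B as the feature to emphasize
categories = ["Clinic A", "Clinic B", "Clinic C"]
colors = highlight_colors(categories, "Clinic B")
print(colors)
```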

9. Format the visual based on the medium. A visual intended to be

displayed in a presentation will be formatted differently than a visual

intended to be viewed on a printed page or individual screen. You can

create your chart or graph in the software and export the chart as an

image that can be easily pasted into a PowerPoint presentation or Word

document.

To know the actual font size to use for PowerPoint presentations, project

a draft copy of the chart and see how easily it can be read from 3 to 5

meters away. You must also consider the size for axis and legend labels; if

they are not legible to your audience during the presentation, they will

not be able to interpret the meaning easily.


SECTION 4: INTERPRETATION OF THE

VISUALS

How do we understand what our data mean? This section covers how to find

patterns and trends in the visuals, understand what the visuals tell us about the

data, and how to draw conclusions and make recommendations for the intended

audience.

The guidance for interpretation consists of four ideas, each with a series of

questions. Not all the questions will be relevant for each chart or graph, but they

can be useful guides when trying to find meaning in the data.

Step 1. Identify the patterns, trends, similarities, and differences

Step 2. Explain why those patterns, trends, similarities, and differences exist

Step 3. Draw conclusions

Step 4. Identify solutions

Questions for Steps 1 to 3:

1. Identify: What looks similar? What things are grouped together?
   Explain: Why are the data similar?
   Conclude: What do the similarities indicate about the program?

2. Identify: What looks different?
   Explain: Why are the data different?
   Conclude: What do these differences indicate about the program?

3. Identify: Is this what you expected to find or not?
   Explain: If this is not expected, why does it look like this?
   Conclude: If this is not expected, what does this indicate about the program?

4. Identify (if you show data over time): How does the data change over time?
   Explain: Why does the data change over time?
   Conclude: What do these changes over time indicate about the program?

5. Identify (if you have a target): What is the relationship of the data to the target?
   Explain: Why was the target met or not?
   Conclude: What does it indicate about the program that the target was/was not met?

6. Identify (if you have two variables): What is the relationship between the two values?
   Explain: Why does this relationship exist?
   Conclude: What does this relationship indicate about the program?

7. Identify (if you have marked events on line graphs): How do the data differ before and after the event?
   Explain: What impact did the event have on the data?
   Conclude: What does this event’s impact indicate about the program?

Question for Step 4 (applies to all rows): What recommendations do you have based on your conclusions to improve a program, policy, or situation?


4.1: Interpretation examples

Charts and graphs created in earlier sections are used as examples.

A. Compare categories and sub-categories of one variable at one period of

time.


B. Compare categories and sub-categories of one variable over multiple

time periods.


C. Compare categories of a variable that are parts of a whole.


D. Compare two variables with continuous data at one time period.


E. Compare categories of one variable against a target.


4.2: Policy Advocacy Example

The example below explores the meaning within the data about a pilot program

to expand HTC provision to include lay health workers. The findings indicate they

could help increase the total numbers overall.


SECTION 5: COMMUNICATION OF THE

FINDINGS

Do not expect that your audience will be able to simply look at a graph or chart

and understand the meaning. Consider the process that we have spent working

to interpret the data ourselves; your audience has not had this same opportunity.

They are relying on you to do more than just show a table or chart; they need

you to explain what the data means, how to interpret the visuals, and what

conclusions can be made.

5.1: Create a Communication Product

Once you understand the data, you can prepare the findings for sharing. The

charts and graphs produced so far need to be tailored for the audience based on

their capacity to understand the findings and their need to use the findings. You

have an obligation to produce a “communication product”, a product that

transforms the graphs and charts into useable information for the decision-

makers, like a presentation, a report, a poster, or blog post.

This product should also include a recommendation or an action that the

audience should consider and that relates directly to the goals, such as an

advocacy strategy. This section discusses the options for designing a product

tailored to your audience, while accurately reflecting the findings in the data.


A. Message Overview. Your communication product should consist of these

basic ideas. The way in which you present these ideas will ultimately depend on

the considerations found in the following section.

1. State the problem or issue you found.

2. Show the evidence of the problem or issue.

3. Offer a recommendation or solution to resolve the problem or issue.

B. Formatting the Communication Product. Several critical factors that will

affect what the communication product looks like are presented here, and should

be considered as the communication product is developed.

1. Know your audience. Consider conducting an audience analysis, a tool to

help you understand the main target of your information, including how

they consume information.

A target of an advocacy strategy, for example, is the person or

organization you are trying to influence to make a policy change. The

primary target is a key decision-maker such as the CEO of a hospital, the

head of a provincial AIDS committee, or the Minister of Health. The

secondary target has some influence over the primary target and may

include a chief of staff, a celebrity, or a well-known business leader.

Target audiences can include other decision-makers as well, like internal

team members who need to use the data to improve the program for the

next quarter, local clinic staff who can use the data to better understand

their clients’ behaviors, or local community-based organizations that want

to improve services they provide. The funders and sponsors of your

program are also important target audiences that frequently require

information on the program.

Whoever your target audience may be, answering the series of questions

below will help you compose your communication product:

a. Level of knowledge about the issue: Is the audience well-informed

already with accurate information? What is their depth of

knowledge?

b. Level of support or opposition for the issue (for advocacy): Has

your audience supported or opposed this issue or related issues?

c. Level of literacy: Do you know if the audience is literate? Would

the audience benefit more from communication done through visuals

than through written text?


d. Level of comfort with data, math, and technical concepts: What

level of technical expertise does the audience have in the specific

topic? Would information be best presented with charts and

graphs, or through a different form like with oral presentations or

story-telling?

e. Education level and job function: Does the audience have a

background in the topic or in a related topic? How does the

audience’s job relate to this topic?

f. Interests and motivations: What factors does the audience

consider when forming their opinions about new information?

g. Key words and phrases to use and avoid: What words or phrases

should be used to convey the messages to this audience that will

help them relate to the topic? And what should be avoided to

prevent confusion or because the meaning of the phrase implies

something negative?

h. Expectations: What are the expectations of the audience reviewing

the information you provide?

2. Method and setting for delivery. How will you be sharing the

information with your audience? When thinking about the manner in

which to deliver your product, consider whether they would find a presentation

most useful, or a question-and-answer session with an expert on the data.

Settings include:

a. In person at a meeting where you can share slide shows and/or

printed material, with time for in-depth explanations and

discussion

b. Brief meetings to quickly share your main ideas

c. Presenting your work as part of a larger meeting or conference

where your audience will be attending

d. Through indirect communication as with email

3. Type of communication product. What is the actual product that you

will generate to communicate your ideas?

a. Printed materials like reports, pamphlets, and posters

b. Electronic materials like presentations and slide shows

c. Online materials like blog posts and web page content


4. The level of detail to include will vary.

a. Focus on the details relevant to the audience’s needs and

interests.

b. The amount of time they may give to hear your message will

influence the amount of information you can have in your product.

If you have 10 minutes with your audience, you might only be able

to produce a one-page, simple message. But if you are provided a

workshop setting where you have plenty of time for presentations

and discussion, you may want to generate a presentation with

slides that explain your ideas and recommendations.

c. Too much information can cause a busy audience to

lose interest, and even overlook the key messages,

regardless of the amount of time provided.

d. Overall, simple messages may be better received by your

audience than highly complex messages, unless the audience

requests or expects a great amount of detail.

C. Maintain the Integrity of the Data. You are more than simply an advocate

for your recommendations. You are responsible for managing the data properly

as you explore it, for creating the most appropriate charts and graphs based on

the type of data you have, and for interpreting the visualizations to the best of

your ability. And when the time comes for you to communicate your findings and

advocate for certain recommendations, you are ethically bound to convey the

information and interpretations as accurately as possible without making even

minor changes that may affect the outcome to favor your ideas.


Policy Advocacy Example

To continue the policy example from Section 4, the findings from the analysis and

the visualization were crafted into a communication product. Note that the

communication product in Figure 32 uses the chart as only a portion of the entire

visual; the key messages are written as brief statements and listed in the order in

which they should be read (from top to bottom).


Additional Examples

The visualizations from previous sections are crafted into communication

products as examples in this section.
