Data Analysis,
Visualization,
and Use
A practical guide for
program improvement
Version 2
July, 2017
Created by
Caleb Parker, Research Associate, FHI 360
Katherine Lew, Technical Advisor, FHI 360
Amita Mehrotra, Project Manager, FHI 360
Contents
Introduction
Data Analytics Lifecycle
SECTION 1: DEFINITION OF THE ANALYSIS
SECTION 2: EXPLORATION OF THE DATA
2.1: OVERVIEW OF DATABASES
2.2: TYPES OF QUANTITATIVE DATA
2.3: DATABASE CLEANING AND FORMATTING
Technical Steps: Cleaning and formatting a database
2.4: SUMMARY TABLES
Technical Steps: Creating Summary Tables
SECTION 3: VISUALIZATION OF THE DATA
3.1: CHART COMPONENTS AND TYPES
3.2: CREATING CHARTS AND GRAPHS OF YOUR DATA
Technical Steps: Creating a chart in MS Excel
3.3: GUIDELINES FOR DESIGNING CHARTS AND GRAPHS
Primary Guidelines for All Charts
Guidelines for Communicating Ideas within Charts
SECTION 4: INTERPRETATION OF THE VISUALS
4.1: Interpretation examples
4.2: Policy Advocacy Example
SECTION 5: COMMUNICATION OF THE FINDINGS
5.1: Create a Communication Product
Policy Advocacy Example
Additional Examples
Introduction
Programs being implemented throughout the world for human development
activities generate data. The data could be as simple as recording the number of
teachers receiving a training to increase attendance in their schools. And it can be
as complex as recording the ongoing clinical history of thousands of individuals
on anti-retroviral therapy used to track their health and ensure their adherence to
the medication. Teams collect the data to report their activities to funders and
local authorities, and use the data to help make decisions for the program,
among other reasons.
But just looking at spreadsheets is not enough for us to find meaning in the data.
So how do the data collected become useful information that someone can easily
understand and then use to make decisions? This is done through a process
called data analytics (or data analysis), which can be defined as the process
of turning raw data into useable information shared with a specific audience.
A critical component of the analysis is data visualization. Data are visualized in
graphics like charts, graphs, maps, symbols, tables, and text, and then by
examining the visuals, users can identify patterns, trends, and differences in the
data.
The final product of data analytics is not simply one graph or chart, but is a
collection of well-designed visuals that are intentionally crafted to answer a
specific question of the data that a specific audience can use for decision-making.
The interpretation of the findings represented in the visuals is clearly described in
this final product.
This manual is a guide designed to support data analytics for program staff and
anyone involved in collecting and using data to make decisions. Throughout
this manual, we will cover the entire process of data analytics with the following
objectives:
• Improve collaboration between technical teams and the monitoring and
evaluation (M&E) teams
• Strengthen the capacity of staff to create and design visualizations of the
data for analysis
• Strengthen the capacity of staff to analyze data for decision-making, and
create products to communicate the interpretation of the analysis
Data Analytics Lifecycle
Data analytics is the process of turning raw data into useable information shared
for decision-making. The process of data analytics for our purposes comes in five
steps. Each step is necessary to produce an informative, useful understanding of
the data for the audience.
1. Definition of the analysis
a. What is the purpose of this analysis?
b. What are the actual questions being asked of the data?
c. Who is the audience that will see and use the analysis?
2. Exploration of the data
a. Ensure the database is cleaned, complete and formatted properly.
b. Identify the variables in the data, the unit of measure, and the
geographic, demographic, and time dimensions of the data.
c. Aggregate and summarize the data.
3. Visualization of the data
a. Determine the most appropriate graphic to use to visualize the
data.
4. Interpretation of the visualization
a. What patterns, trends and differences exist in the data as found in
the visualization?
b. Why do those patterns, trends, and differences exist in the
visualizations?
c. What conclusions and recommendations can you determine from
the visualizations?
5. Communication of the findings
a. What information/messages should be shared with the audience,
using what form of visualization?
SECTION 1: DEFINITION OF THE ANALYSIS
Every analysis and visualization is based on a purpose: to understand some part of
the data and to derive meaning about the actions or results that were recorded in
the data. Before conducting data analysis, consider what the analysis will inform,
and who the intended end-user, or audience, is. This information can be
organized into a formal Data Analysis Plan. The plan is usually set up at the
beginning of the program before activities begin and amended during the
program.
Data Analysis Plan: A Data Analysis Plan (DAP) is a guide that describes the
question of the data, the analyses that must be done, and who the audience will
be. Some components of the DAP included in the example in Figure 3 may not be
necessary for each project, and some may need to be added. Therefore, DAPs
should be adapted to suit the needs of the project.
The plan should begin with an objective that captures the overall idea that the
analyses will address, and then one or more specific analysis questions will be
written, as described in Figure 3. DAPs can consist of several objectives, each with
its own analysis questions. The example in Figure 3 shows one objective with
one analysis question. Each analysis question may require more than one
visualization to understand the data. In the example, note that two visualizations
are being created for the analysis question. Space is also provided for details on
the indicator, the source of the data, and the time, demographic, and geographic
characteristics.
SECTION 2: EXPLORATION OF THE DATA
This section begins with an overview of databases, discusses cleaning and
transforming the database, and then describes how to explore the data.
2.1: OVERVIEW OF DATABASES
While software packages may differ, the arrangement and overall structure of a
database are the same, and every database is composed of the same components. This subsection
covers important terminology that will be used throughout the manual.
1. Indicator: Measurable variable used as a representation of an associated
factor or quantity. Indicators often comprise one or more data elements.
The indicator for the number of children testing positive for HIV may be
comprised of multiple variables in the table if the definition of “child” is
represented in multiple variables across different age groups; the
variables of “2-5 years”, “6-8 years”, and “9-11 years” for HIV positive test
results must then be added together to calculate the indicator.
2. Variable or Field (Column): Each column in the database provides space
for one unique characteristic that can vary for each data point within a
given range (e.g., 0 to 500). All values in a column are of the same data type (i.e., nominal,
ranked, or continuous). A variable could be the name of the individual
health facility, while another could be the total number of women
receiving care and treatment for HIV. Variable names are maintained in a
header row, which is always the very first row (Row 1 in MS Excel).
3. Data Point, Feature, or Element (Row): Every row in the database is a
single unit of data, referred to in several ways. For this manual, it will be
often referred to as a data point. Each point consists of that unit’s
information across all variables. Each data point must be unique, and
must be of the same unit of measure (see definition below). Note that the
first row of the database must always be reserved for variable names.
4. Data Value (Cell): The values for each data point (row) at each variable
(column) are found where the row and column intersect. Cells cannot be
merged across rows or columns in the database. All values within one
variable must be formatted the same way. For example, a variable that
describes the number of hours spent in a training could have the data values
listed as “8”, “8:00”, “eight”, and so on. Choose one format for all data
values within each variable.
5. Class or Category: The data values of one variable can be aggregated
(grouped together) into classes. Variables that have categorical data can
easily be grouped into classes. For example, a variable “type of facility”
has only three values throughout a database: “health post,” “clinic,” and
“hospital.” The data can therefore be aggregated into three different
classes and analyzed.
6. Unique Identification Field: Each data point (row) of the database may
have some way to separate it easily from another data point. While the
names of facilities may be the unit of measure, and each one of them is
unique for each row, having a new field that enumerates each facility
name can help manage the data more easily. Text-based fields (like the
name of districts) have more chances of causing problems, such as how
names can be spelled differently (e.g., Carroogou versus Karougou).
7. Unit of Measure: Each data point (row) of the database must fit in the
same kind of measurement. This can be done with a geographic (or
spatial) characteristic, such as an individual client, an individual health
facility, or individual districts of a nation, and is often referred to as the
“organizational unit” (note that for mapping, the database’s unit of
measure must be geographic). This could be done using other variables as
well, or a combination of variables. Each database row must be unique,
meaning that each specific combination of a unit of measure and
variable(s) cannot be repeated. Databases that are disaggregated to lower
levels of measure (e.g., health facilities, schools, organization’s centers)
make analyses easier than those that are aggregated to higher levels (e.g.,
regions, school districts, entire organizations).
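The terminology above can be sketched outside of MS Excel as well. The following is a minimal illustration in Python, not part of the original workflow: each row is one health facility (the unit of measure), the indicator is calculated by adding the age-group variables together, and a small check confirms that the unique identification field is not repeated. The variable names and values (facility_id, pos_2_5, and so on) are hypothetical.

```python
# Hypothetical database: one row per health facility (the unit of measure).
rows = [
    {"facility_id": 1, "facility": "Clinic A", "pos_2_5": 3, "pos_6_8": 1, "pos_9_11": 2},
    {"facility_id": 2, "facility": "Clinic B", "pos_2_5": 0, "pos_6_8": 4, "pos_9_11": 1},
]

# Variables that together define "child" for this indicator (an assumption).
CHILD_VARIABLES = ["pos_2_5", "pos_6_8", "pos_9_11"]


def children_positive(row):
    """Indicator: sum of the age-group variables that define 'child'."""
    return sum(row[name] for name in CHILD_VARIABLES)


def duplicated_ids(rows, key="facility_id"):
    """Return identifier values that appear on more than one row."""
    seen, repeated = set(), set()
    for row in rows:
        value = row[key]
        if value in seen:
            repeated.add(value)
        seen.add(value)
    return sorted(repeated)


indicator_by_facility = {row["facility"]: children_positive(row) for row in rows}
print(indicator_by_facility)  # {'Clinic A': 6, 'Clinic B': 5}
print(duplicated_ids(rows))   # []
```

An empty list from duplicated_ids suggests that every row is unique for the chosen identifier; any value it returns would need to be reviewed in the database.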
2.2: TYPES OF QUANTITATIVE DATA
Each variable will consist of one of these three types of data: nominal (categorical), ranked (ordered),
or continuous. Nominal and ranked data can be text or numbers, but continuous data
can only be numeric: if any text is written in a cell for this column, the software
may not be able to recognize it as continuous, therefore limiting the ability to
visualize the data.
1. Nominal or Categorical: These are data which have no specific order or
value associated with them. One must not come before or after another
(order), and one is not “better” or “greater than” another (value). Usually,
nominal data for our purposes include names for districts, regions, or
health facilities and schools. While one health facility may be described as
a higher quality than another, here we are only considering the actual
name and nothing else. Other variables for those health facilities assign
a value (e.g., number of services offered) or an order (e.g., type
of health facility).
2. Ranked or Ordered: Data that fit into this category have attributes where
one unit of data comes before or after another. The actual value between
different data units is not important. For our purposes, this could be the
type of administrative boundary: national boundaries come first, then
regions, then districts, and so on. The type of health facility is another
example. Health facilities are classified by services that are offered,
including health posts, health centers, and hospitals. Health posts offer
fewer services, while hospitals offer more services: in this way, they can be
organized by the services they provide.
3. Continuous: For these variables, the data are numeric and those numbers
indicate order (one comes before another) and value (one is greater than
or less than another). The number of clients seen at a health facility is one
such example. If that facility sees 100 clients during one month, and then
the next month sees 150 clients, you can say more clients were seen in the
second month. Continuous data can be classified with specific data ranges
as well, which then makes the data type ranked or ordered. Classifying
continuous data is useful in mapping for better visualizations, but may not
be necessary in other visualization formats.
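Classifying continuous data into ranges can be illustrated with a short Python sketch. The class boundaries and labels below are hypothetical and would need to be chosen to suit the data; the point is only that each number is replaced by an ordered class.

```python
# Hypothetical class boundaries: (lower bound, label), in ascending order.
CLASSES = [(0, "0-49"), (50, "50-99"), (100, "100-149"), (150, "150+")]


def classify(value):
    """Return the label of the highest class whose lower bound is <= value."""
    label = None
    for lower, name in CLASSES:
        if value >= lower:
            label = name
    if label is None:
        raise ValueError(f"value below the lowest class: {value}")
    return label


clients_per_month = [100, 150, 42, 75]
print([classify(v) for v in clients_per_month])
# ['100-149', '150+', '0-49', '50-99']
```

Once classified, the data behave as ranked data: the classes have an order, but the exact difference between two facilities in the same class is no longer visible.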
2.3: DATABASE CLEANING AND FORMATTING
Before exploring the data, (1) the data must be cleaned, and (2) the database
must be formatted and set up for easy analysis. The processes for cleaning and
formatting the data are often overlapping, and are presented as one in this
subsection.
The goal of data cleaning is to make certain of the accuracy and completeness of
the data. This also includes ensuring that no data points are duplicated, and
that missing data values are supposed to be blank as opposed to being
accidentally deleted or not entered.
Formatting the database means that the database is structured in a way that
analysis can easily and directly be carried out. This includes making data values
for each variable consistent in spelling and format, creating new variables needed
for the analysis, as well as removing empty columns and rows.
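For readers who work with data outside of MS Excel, the same cleaning and formatting ideas can be sketched in Python. This is only an illustration under stated assumptions: the variable names (facility, facility_type, clients) and the spelling table are hypothetical, and in practice duplicated rows should be reviewed before they are removed.

```python
# Hypothetical mapping from spellings seen in the data to one consistent value.
FACILITY_SPELLINGS = {
    "hpost": "Health Post",
    "helth post": "Health Post",
    "healthpost": "Health Post",
    "post": "Health Post",
    "health post": "Health Post",
}


def clean(rows):
    """Drop blank rows, standardize the facility type, and drop exact duplicates."""
    cleaned, seen = [], set()
    for row in rows:
        # A blank row has no value in any variable.
        if all(str(value).strip() == "" for value in row.values()):
            continue
        row = dict(row)  # work on a copy so the original rows stay untouched
        key = row["facility_type"].strip().lower()
        row["facility_type"] = FACILITY_SPELLINGS.get(key, row["facility_type"].strip())
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            continue  # identical to a row that was already kept
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned


original = [
    {"facility": "Site 1", "facility_type": "hpost", "clients": 17},
    {"facility": "", "facility_type": "", "clients": ""},
    {"facility": "Site 2", "facility_type": "Helth Post", "clients": 48},
    {"facility": "Site 1", "facility_type": "Health Post", "clients": 17},
]
formatted = clean(original)
for row in formatted:
    print(row)
```

The function returns a new list and leaves the original list unchanged, which mirrors the advice to keep an untouched copy of the original database.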
In Figure 5, the original database was cleaned and transformed in the following ways:
• Unmerge all merged cells, and write the data values for each data point
• Remove blank row in the middle of the database
• Dates should be formatted the same way
• Workshop hours should be formatted the same way
• Gender - use the same spelling for each gender consistently
• Role - use the same spelling for each role consistently
• Organization - use the same spelling for each organization consistently
Technical Steps: Cleaning and formatting a database
Copy the original database into a new tab called “Formatted Database”. Before
doing anything to the database, make a copy. Only work on the copy, leaving the
original untouched for data quality control. If some error corrupts your database,
you should always have an original to return to. Therefore, name the original
database worksheet as “Original Database”. Create a new worksheet in the same
MS Excel file and name it “Formatted Database”. Make all changes in the
“Formatted Database” worksheet.
Document changes to the database in a “Notes” tab: Every change made to the
original database should be documented. Consider the likelihood that someone
may ask you to make a slight change to your visual weeks after you created it, or,
harder still, if someone else needs to make changes using your files. For you and
others to easily know how to make changes and quickly create updated visuals,
having such documentation is essential. This section outlines some key
considerations for keeping track of changes made to an Excel spreadsheet.
Make a new worksheet/tab in the same MS Excel file called “Notes” to explain
all variable names that are not easy to understand in the database, including new
variables that you calculate or add. Also include explanations of any equations
used to generate new data.
The database is now ready to be cleaned and formatted.
1. Open the “Formatted Database” worksheet. Make sure that all cleaning and
formatting are done to this database.
2. Scan the database to make sure all the variables are present that you
need for the analysis.
3. Scan the database to make sure all the data points (rows) are present.
4. Scan the data values for each variable to see if they are generally what
you would expect to find. For any anomalies, the data should be corrected
before proceeding.
• For example, if you know that no clinic treated more than 100
people, but you find one that treated 900, check the data to
ensure this is correct or what the correct number should be.
• For large databases, you can summarize the data. Calculate the
“average,” “minimum”, and “maximum” to get a better sense of
the data.
5. Format the data values for each variable so that they are consistent. For
text, each category should be spelled the exact same way, including how
letters are capitalized. For numeric data, no text values should appear in
the variable.
• For example, the data value for a health post might be written in
the database as “hpost”, “helth post”, “healthpost”, and “post”.
Having four ways of spelling this single value makes aggregating
the data challenging. Therefore, change all the different names
given to health post to be one consistent value, like “Health Post”.
6. The variable names should be placed in the very top row. A variable can
only be in one row, and cannot be part of a merged cell.
7. Remove any blank rows or columns.
8. For the next steps, consider if any new variables need to be created. Think about the type of outcomes you will need to answer the question
for your analysis. Will you need to compare different time points, or show
differences between population groups? Will you need to show the data
at a level of aggregation? You may need to create new variables that are
calculated using other variables.
• Aggregation: An indicator may be represented by multiple
variables that break the indicator into finer parts, such as by
showing multiple age ranges, multiple kinds of professions, or
workshop names. Grouping or aggregating these multiple
variables into one or two variables can help with the ease of
interpretation. This may not be necessary in all instances, however.
For example, if the visual will only need to show the total number
of adults and children who were tested for HIV, but the database
shows ten distinct age ranges, you must create two new variables
(one for children, and one for adults) and add the variables that
qualify as children together into the “children” variable, and add
the variables that qualify as adults together into the “adults”
variable.
• Percentages: If you need to show a percentage, such as the
percent of men tested out of all people tested, create this new
variable by dividing the “men” over the “total people” who were
tested.
• Change over time: When the goal of the analysis is to compare
one value to another, consider making a new variable that
measures the amount of change. This new variable could be
presented in the visual instead of or as a complement to the raw
numbers. Showing a value of “percentage change” makes it easier
for the audience to interpret.
The difference between “year 1” and “year 2”, for example, can be
shown first as raw totals in side-by-side bar charts. Showing either
a total change as a raw number, or a percentage change on top of
this chart or in addition to this chart can greatly help one to
interpret meaning. The formula for creating a percent change over
time is ((t2-t1)/t1) × 100, where t1 is the value at time period one, and t2
is the value at time period two. For example, a change from 100 to 150 is a 50% increase.
• Normalization: To be able to compare data point values for one
variable, you may need to have the data normalized.
Normalization transforms the data values from raw numbers to
values that are along a common scale, which allows for a much
easier comparison between data points. This could be used to
compare capacity reached at each health facility, especially if each
health facility has a unique number for capacity.
For example, one health facility can only treat 17 people, and
another can treat 48: if they both treat 17 people, the first facility is
at 100% capacity, whereas the second is at only 35%. If you only
consider that they both served 17 people, it may appear that they
are functioning equally.
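The calculated variables described in step 8 can be written out as a short Python sketch. It is an illustration only: the age-range variable names and the choice of which ranges count as children are hypothetical, and the percent change is computed relative to the starting value t1, which is the usual convention.

```python
def percentage(part, total):
    """Share of the total, as a percentage."""
    return part / total * 100


def percent_change(t1, t2):
    """Change from time period one (t1) to time period two (t2), relative to t1."""
    return (t2 - t1) / t1 * 100


def percent_of_capacity(treated, capacity):
    """Normalize a raw count onto a common 0-100 scale."""
    return treated / capacity * 100


# Aggregation: hypothetical age-range variables collapsed into two new variables.
row = {"age_0_4": 12, "age_5_14": 20, "age_15_24": 35, "age_25_plus": 58}
row["children"] = row["age_0_4"] + row["age_5_14"]
row["adults"] = row["age_15_24"] + row["age_25_plus"]

print(row["children"], row["adults"])      # 32 93
print(round(percentage(40, 125)))          # 32 (40 men out of 125 people tested)
print(round(percent_change(100, 150)))     # 50
print(round(percent_of_capacity(17, 17)))  # 100
print(round(percent_of_capacity(17, 48)))  # 35
```

The last two lines reproduce the capacity example above: both facilities treated 17 people, but only the first is at full capacity.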
2.4: SUMMARY TABLES
Summarizing the data into smaller tables will allow you to make charts and
graphs more easily. The summarization is simply aggregating data based on
certain demographic or geographic characteristics. Each visualization may require
a separate summary table, which will depend on what the chart or graph is
supposed to show.
Technical Steps: Creating Summary Tables
1. In the Data Analysis Plan, identify the first visualization that must be
created.
2. Identify the variables that apply to this visualization.
3. Create a Pivot Table
• Select the entire database in the “Formatted Database” tab.
• Go to the “Insert” tab at the very top menu, and select “Pivot
Table”.
• A window will appear; simply click “OK”. A new tab has been
created. This is where the summary table will be placed.
4. Identify how the data will be aggregated by data points (rows).
This could be done by either a demographic, geographic, or other
variable. This will become individual data points (rows) in the summary
table: each row will be a unique class of the variable. (note that the time
characteristic should be part of the variable, not the data point). The data
points for this visual could be a list of the regions, or a list of age groups,
or a list of workshop titles.
• In the “Pivot Table” pane that appeared at the right of the page,
find the variable that will form the rows. Click, hold, and drag that
variable over to the box beneath the term “Rows”. In the Pivot
Table, each class will appear once in the first column.
5. Identify the variables to add for the columns.
• In the “Pivot Table” pane that appeared at the right of the page,
find the variable that you want to show as the column. Click, hold,
and drag that variable over to the box beneath the term “Values”.
In the Pivot Table, a summary of data will appear for each class.
• Note that if the data need to be summed, counted, or averaged,
you can change this function. Select the drop-down menu beside
the variable listed in the “Values” box, choose “Value Field
Settings,” and select the option that best fits your needs.
6. Make the summary table. The Pivot Table is always linked to the
database in the “Formatted Database” worksheet. However, we recommend
copying the entire pivot table, and pasting the table as values
immediately below the pivot table. This will “de-link” the summary table,
which makes it easier to work with. Only work with this “de-linked”
summary table, and not the pivot table.
• In the summary table, change the variable names as needed so
that they make sense. These variable names will ultimately become
the labels in the chart.
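The aggregation that a Pivot Table performs can also be sketched in a few lines of Python, which may help clarify what the “Rows” and “Values” boxes do. This is a minimal illustration, not a replacement for the MS Excel steps; the function name summarize and the variables region and participants are hypothetical.

```python
from collections import defaultdict


def summarize(rows, row_variable, value_variable, how="sum"):
    """Aggregate value_variable for each class of row_variable."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[row_variable]].append(row[value_variable])
    functions = {
        "sum": sum,
        "count": len,
        "average": lambda values: sum(values) / len(values),
    }
    aggregate = functions[how]
    # Each class appears once, as it does in the first column of a Pivot Table.
    return {name: aggregate(values) for name, values in sorted(groups.items())}


database = [
    {"region": "North", "participants": 12},
    {"region": "South", "participants": 30},
    {"region": "North", "participants": 8},
    {"region": "South", "participants": 10},
]
print(summarize(database, "region", "participants"))
# {'North': 20, 'South': 40}
print(summarize(database, "region", "participants", how="average"))
# {'North': 10.0, 'South': 20.0}
```

The how argument plays the role of “Value Field Settings”, and the returned dictionary is a plain copy of the results, comparable to the “de-linked” summary table.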
SECTION 3: VISUALIZATION OF THE DATA
Visualizing data helps us derive meaning from tables and text. In order to
visualize our data, we must first understand the various types of graphs and
charts available to us, and then how to transform our database into meaningful
information through aggregation.
3.1: CHART COMPONENTS AND TYPES
Visuals can be formatted in several ways that make it easier for the end user to
focus on the data rather than being distracted by the background chart
components. First, it is necessary to know what the various chart components are
and how they can be formatted to enhance understanding of data presented in
the visual. The components of charts and graphs are listed in the diagram below.
Microsoft Excel alone has 15 different types of charts and graphs to choose from,
which are only some of the options found throughout graphic design. For our
purposes, however, we will be limiting the number of charts we reference to the
kinds that best meet our common needs.
A. Bar/Column Charts (vertical bars): Data values are shown as bars that start
at the bottom of the chart area on the horizontal axis and increase height along
the vertical axis. Variable classes (e.g., men, women, children) are shown along
the horizontal axis, and the amount (data value) of each class is shown in the
height of the bars.
B. Bar/Column Charts (horizontal bars): Data values are shown as bars that
start at the side (or center) of the chart area along the vertical axis and increase in
width along the horizontal axis. Variable classes (e.g., men, women, children) are
shown along the vertical axis, and the amount (data value) of each class is
shown in the width of the bars.
C. Scatterplot: Scatterplots are usually used to display two different continuous
variables at once; one along the horizontal axis, and the other along the vertical
axis. Potential relationships exist between the two variables if they form patterns
on the scatterplot. However, for our use, scatterplots may be used in the same
way that bar/column charts show quantities of data for one variable.
D. Line Graph: Data values are shown as points which start at the left side of the
horizontal axis and move to the right side, indicating a movement through time.
The points are then connected with a line. The slope of the line between two
points indicates some value change (increase or decrease) over time, and stable
values will show a straight line relative to the horizontal axis.
E. Pie Chart: Data values are represented by “slices” that indicate a proportion of
a whole number. The entire pie equals 100%, or the sum of all the component
parts. Each slice (a data value) equals a percentage of the whole, and when all
slices are added together, they must equal 100%.
3.2: CREATING CHARTS AND GRAPHS OF YOUR DATA
The summary tables created in the previous section will be used to generate
charts and graphs of the data. To begin the technical work of visualizing the data
in the summary tables, begin by considering the following examples to know
what kind of chart to choose.
A. Compare categories of one variable at one period of time. Start with
Bar/Column Charts. Each class of the variable is represented by one bar/column.
B. Compare categories and sub-categories of one variable at one period of
time. Are you comparing data values from multiple variables with the same
classes at one period of time? Start with the Bar/Column Chart. Each class is
represented by one bar or column per variable; multiple bars will be on each class
to represent the different variables. The bars for each class will
touch one another on the chart so that they can easily be compared.
C. Compare categories of one variable over multiple time periods. Start with
a Line Graph. Each class will have its values represented by dots at each time
point; then a line will connect the points for each class. Classes will be shown by
different colors.
D. Compare categories and sub-categories of one variable over multiple
time periods. Are you comparing data values from one variable by multiple
classes over multiple time periods? This becomes hard to visualize, especially in
one visual. Consider showing two or more visuals side by side. Each visual could
be a line graph showing classes for a single variable. To compare the data
between the graphs, be sure that the graphs have the same numeric range on the
axis, and the same classes are shown.
E. Compare categories of a variable that are parts of a whole. Start with a Pie
Chart. Each of the parts becomes a slice of the “pie”. The data should be ordered
from greatest to least for ease of interpretation. Consider collapsing the data
classes into fewer classes to make interpretation easier.
F. Compare two variables with continuous data at one time period. For this
kind of comparison, a scatterplot is used. Both variables are placed on the chart,
one on each axis.
G. Compare categories of one variable against a target. Start with bar/column
chart. The first bar is the actual data, and the second bar is the target. This is
especially important when the categories do not have the same target value.
On top of the column/bar chart, another graph can be added to show the
percentage of the target reached. Using the Scatterplot chart to represent the
percentage of the target reached against the backdrop of the real numbers can
be helpful to understand how the values differ. A target can be shown by creating
a line at the target value, and when all actual values are added, their distances
above and below that target line will indicate how close or far they are from the
target.
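The two values used in this kind of chart, the percentage of the target reached and the distance above or below the target, can be calculated with a short Python sketch. The district names and numbers are hypothetical and serve only to show the arithmetic when each category has its own target.

```python
def percent_of_target(actual, target):
    """Percentage of the target reached; values above 100 exceed the target."""
    return actual / target * 100


# Hypothetical districts, each with its own target value.
districts = [
    {"district": "A", "actual": 80, "target": 100},
    {"district": "B", "actual": 45, "target": 30},
    {"district": "C", "actual": 60, "target": 120},
]

for d in districts:
    d["pct_of_target"] = percent_of_target(d["actual"], d["target"])
    d["gap"] = d["actual"] - d["target"]  # above (+) or below (-) the target
    print(d["district"], f'{d["pct_of_target"]:.0f}%', d["gap"])
```

District B served fewer people than district A in raw numbers, yet it is the only one above its target, which is why showing the percentage alongside the real numbers can be helpful.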
H. Highlight the time when a program started or when an event occurred
related to the data. Using a line graph to indicate the change over time of a
variable, simply add an icon through the “Insert/Shapes” tab to the graph.
Technical Steps: Creating a chart in MS Excel.
Create your chart or graph using MS Excel.
1. For the first summary table, determine what chart will be created.
Use the questions above to determine what chart is best.
2. Next, highlight the entire summary table, including the variable
names and all the data.
3. With the entire summary table highlighted, choose the “Insert” tab
from the top menu, and choose “Recommended Charts”. A new
window will appear that shows a preview of what the chart looks
like with your data.
4. Select the chart that you need, and choose “OK”.
5. The chart appears next to the table.
6. To improve the design of the chart, proceed to the next section.
3.3: GUIDELINES FOR DESIGNING CHARTS AND GRAPHS
For the audience to understand your chart well, it must be designed properly.
This subsection covers some common techniques to consider in order to have a
chart that looks nice and is easy to interpret.
Primary Guidelines for All Charts
1. Keep the chart simple. Visuals that are overly complicated and show too
much data make it difficult for the end user to interpret. Visuals should be
created so that they are easily and quickly understood. Ideally, the visual
should only show one main idea. While this may not always be possible,
keep it as a rule of thumb.
Consider that more than one visual can be used to answer the analysis
question. You may begin with a simple bar chart about the data, for
example, and then continue to make more charts and graphs that delve
deeper into certain variables.
We strongly recommend not using 3-D effects for your charts. They may
look interesting, but they distract from the information in the visual
and can even cause the data to be misinterpreted.
2. Sort the data. In general, visuals are most easily understood when they
are ordered in some way. This could be an order that the audience
expects, such as districts listed alphabetically by name. You could
also order the data by value so the audience can easily see which items
group together. For example, a chart ordered by value places the districts
with the highest values on one side and the districts with the lowest
values on the other.
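The two orderings described here can be sketched in a few lines of Python. The district names and values are invented for illustration. In MS Excel, the equivalent step is to sort the summary table before creating the chart.

```python
# Hypothetical district values, invented for illustration.
values = {"District C": 48, "District A": 120, "District D": 75, "District B": 96}

# Option 1: an order the audience expects (alphabetical by district name).
by_name = sorted(values.items())

# Option 2: order by data value, highest first, so that high and low
# districts end up on opposite sides of the chart.
by_value = sorted(values.items(), key=lambda item: item[1], reverse=True)

print("Alphabetical:", [name for name, _ in by_name])
print("By value:    ", [name for name, _ in by_value])
```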
3. Choose appropriate colors. Selecting colors is not easy, but
choosing the right ones is extremely important to producing a
visual that is easy to interpret. MS Excel provides predetermined color
schemes that often work well as they are, or require only a few changes.
We strongly recommend not using more than one bright, bold color.
Bold colors placed beside each other can be very visually distracting.
Instead, use softer, less saturated colors.
4. Write informative titles and subtitles. Visuals are often created to
add information to a larger presentation that tells a story about the
data. However, a visual can easily end up as a stand-alone graphic on the
desk of someone who has never seen the presentation. Because the data
could then be misinterpreted or go unused, each graphic should come
with basic referential information.
For showing project data, consider using the following format. A title and
sometimes a subtitle or two must be used to indicate the primary points
about the graphic: (1) the variable, (2) the demographic characteristics, (3)
the spatial unit, and (4) the time period of the data. Additionally,
appropriate secondary information more specific to the graphic should be
used, such as axis labels to describe the units of measure, and a legend
that describes the meaning for each symbol. However, the actual parts of
the title that you include should be based on your project, and on what is
helpful for the audience to understand your chart.
TITLE: [Variable Name], [Demographic Specifics], [Unit of Measure]
SUBTITLE 1: [Project the Data Comes From], [Time Period that
Pertains to the Data]
SUBTITLE 2: [Country and other location information]
Example:
HIV Positivity Testing Results of Adults by Clinic
CHASS Project, 2016 Quarter 3
Mozambique
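The format above can be sketched as a small Python helper that assembles the title and subtitles from their parts. The function name and the joining words "of" and "by" are assumptions, chosen only so that the output matches the example heading. The input values are the ones from the example.

```python
def build_chart_heading(variable, demographic, unit, project, period, location):
    """Return the title and two subtitles as a list of three lines.

    The joining words "of" and "by" are an assumption taken from the
    example heading; adjust them to suit your own variables.
    """
    title = f"{variable} of {demographic} by {unit}"
    subtitle_1 = f"{project}, {period}"
    subtitle_2 = location
    return [title, subtitle_1, subtitle_2]


heading = build_chart_heading(
    variable="HIV Positivity Testing Results",
    demographic="Adults",
    unit="Clinic",
    project="CHASS Project",
    period="2016 Quarter 3",
    location="Mozambique",
)
print("\n".join(heading))
```

Keeping the parts separate makes it easier to apply the same heading format to every chart in a report.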
5. Be consistent. Your audience wants to find patterns and guides in how
you present your data. When presenting multiple charts with the same
variable, for example, ensure that the same color and style are used for that
variable in each chart. When presenting charts that are meant to be
compared side-by-side, the axis values must be consistent for a correct
interpretation.
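One way to keep axis values consistent is to compute a single axis range from all of the data being compared and apply it to every chart. The Python sketch below shows this under stated assumptions: the clinic values are invented, the function name is hypothetical, and starting the axis at zero and rounding up to a step of 10 are design choices rather than rules. In MS Excel, the resulting minimum and maximum would be entered by hand in each chart's axis options.

```python
import math


def shared_axis_limits(*series, step=10):
    """Return one (minimum, maximum) pair to apply to every chart compared."""
    all_values = [value for one_series in series for value in one_series]
    if not all_values:
        raise ValueError("at least one value is required")
    # Assumption: start at zero unless the data contain negative values.
    low = min(0, math.floor(min(all_values) / step) * step)
    # Round the maximum up to the next multiple of step.
    high = math.ceil(max(all_values) / step) * step
    return low, high


# Hypothetical values for two charts that will be shown side by side.
clinic_a = [12, 35, 48]
clinic_b = [60, 72, 91]

limits = shared_axis_limits(clinic_a, clinic_b)
print(limits)
```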
Guidelines for Communicating Ideas within Charts
The remaining guidelines are helpful when designing the chart to communicate
certain ideas of your data; thus, refer to these guidelines once you have studied
the information in the chart and know what the data in the chart mean.
6. Indicate what things are related and unrelated. Features that are
related can have the same color or similar colors; their symbols can also
be touching each other or appear grouped together. Features that are not
related should have colors that stand apart, and their symbols should
appear separated.
7. Indicate how things are “good” or “bad.” Use color schemes to
distinguish features that have desired or positive values from features
that have poor or negative values.
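As a minimal sketch of this idea, the Python snippet below assigns one color to values that meet a target and another to values that fall short. The target, the clinic values, and the two hex color codes are assumptions for illustration. Any pair of colors that your audience reads as "good" and "bad" would serve.

```python
# Assumed colors; substitute whatever your audience reads as good and bad.
GOOD_COLOR = "#2E7D32"  # a muted green
BAD_COLOR = "#C62828"   # a muted red


def color_for(value, target):
    """Return the 'good' color when the value meets or exceeds the target."""
    return GOOD_COLOR if value >= target else BAD_COLOR


# Hypothetical values for illustration only.
target = 80
results = {"Clinic 1": 92, "Clinic 2": 67, "Clinic 3": 80}

bar_colors = {clinic: color_for(value, target) for clinic, value in results.items()}
print(bar_colors)
```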
8. Highlight what is important. The most important feature should stand
apart from other features in some way. This can be done by using a bold
color, or writing the text larger or in bold. This is only necessary when
trying to emphasize one part of the visual.
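A simple way to apply this guideline is to give the most important feature one bold color and every other feature the same muted color. The Python sketch below illustrates the idea. The category names and hex color codes are assumptions.

```python
# Assumed colors: one bold color for emphasis, one muted gray for the rest.
HIGHLIGHT_COLOR = "#D55E00"
MUTED_COLOR = "#BDBDBD"


def highlight_colors(categories, important):
    """Return one color per category, with only the important one in bold."""
    return [HIGHLIGHT_COLOR if c == important else MUTED_COLOR for c in categories]


# Hypothetical categories for illustration.
districts = ["District A", "District B", "District C"]
colors = highlight_colors(districts, important="District B")
print(colors)
```

This also follows the earlier recommendation to use no more than one bright, bold color.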
9. Format the visual based on the medium. A visual intended to be
displayed in a presentation will be formatted differently than a visual
intended to be viewed on a printed page or individual screen. You can
create your chart or graph in the software and export the chart as an
image that can be easily pasted into a PowerPoint presentation or Word
document.
To know the actual font size to use for PowerPoint presentations, project
a draft copy of the chart and see how easily it can be read from 3 to 5
meters away. You must also consider the size of axis and legend labels; if
the labels are not legible during the presentation, your audience will
not be able to interpret the chart easily.
SECTION 4: INTERPRETATION OF THE VISUALS
How do we understand what our data mean? This section covers how to find
patterns and trends in the visuals, how to understand what the visuals tell us
about the data, and how to draw conclusions and make recommendations for the
intended audience.
The guidance for interpretation consists of four ideas, each with a series of
questions. Not all the questions will be relevant for each chart or graph, but they
can be useful guides when trying to find meaning in the data.
The four ideas:
1. Identify the patterns, trends, similarities, and differences
2. Explain why those patterns, trends, similarities, and differences exist
3. Draw conclusions
4. Identify solutions

Questions for ideas 1 to 3, by row:

Row 1
Identify: What looks similar? What things are grouped together?
Explain: Why are the data similar?
Conclude: What do the similarities indicate about the program?

Row 2
Identify: What looks different?
Explain: Why are the data different?
Conclude: What do these differences indicate about the program?

Row 3
Identify: Is this what you expected to find or not?
Explain: If this is not expected, why does it look like this?
Conclude: If this is not expected, what does this indicate about the program?

Row 4 (if you show data over time)
Identify: How does the data change over time?
Explain: Why does the data change over time?
Conclude: What do these changes over time indicate about the program?

Row 5 (if you have a target)
Identify: What is the relationship of the data to the target?
Explain: Why was the target met or not?
Conclude: What does it indicate about the program that the target was/was not met?

Row 6 (if you have two variables)
Identify: What is the relationship between the two values?
Explain: Why does this relationship exist?
Conclude: What does this relationship indicate about the program?

Row 7 (if you have marked events on line graphs)
Identify: How do the data differ before and after the event?
Explain: What impact did the event have on the data?
Conclude: What does this event’s impact indicate about the program?

Question for idea 4 (all rows):
What recommendations do you have based on your conclusions to improve a program, policy, or situation?
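For the question about how the data differ before and after a marked event, a quick numerical check can support what you see in the line graph. The Python sketch below compares the average value before and after the event. The monthly values, the event position, and the function name are invented for illustration. A difference in averages does not by itself show that the event caused the change, so treat the result as a starting point for the "why" questions.

```python
from statistics import mean


def before_after_summary(values, event_index):
    """Compare the average of the periods before and after an event.

    event_index is the position of the first period after the event.
    """
    if not 0 < event_index < len(values):
        raise ValueError("event_index must leave at least one period on each side")
    mean_before = mean(values[:event_index])
    mean_after = mean(values[event_index:])
    return {
        "mean_before": mean_before,
        "mean_after": mean_after,
        "change": mean_after - mean_before,
    }


# Hypothetical monthly values; the event occurred before the fourth month.
monthly_values = [40, 42, 41, 55, 58, 60]
summary = before_after_summary(monthly_values, event_index=3)
print(summary)
```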
4.1: Interpretation examples
Charts and graphs created in earlier sections are used as examples.
A. Compare categories and sub-categories of one variable at one period of
time.
B. Compare categories and sub-categories of one variable over multiple
time periods.
C. Compare categories of a variable that are parts of a whole.
D. Compare two variables with continuous data at one time period.
E. Compare categories of one variable against a target.
4.2: Policy Advocacy Example
The example below explores the meaning within the data about a pilot program
to expand HTC provision to include lay health workers. The findings indicate
that lay health workers could help increase the total HTC numbers overall.
SECTION 5: COMMUNICATION OF THE FINDINGS
Do not expect that your audience will be able to simply look at a graph or chart
and understand the meaning. Consider the process that we have spent working
to interpret the data ourselves; your audience has not had this same opportunity.
They are relying on you to do more than just show a table or chart; they need
you to explain what the data mean, how to interpret the visuals, and what
conclusions can be drawn.
5.1: Create a Communication Product
Once you understand the data, you can prepare the findings for sharing. The
charts and graphs produced so far need to be tailored to the audience based on
their capacity to understand the findings and how they need to use them. You
have an obligation to produce a “communication product”, a product that
transforms the graphs and charts into useable information for the
decision-makers, such as a presentation, a report, a poster, or a blog post.
This product should also include a recommendation or an action for the
audience to consider that relates directly to your goals, such as an
advocacy strategy. This section discusses the options for designing a product
tailored to your audience, while accurately reflecting the findings in the data.
A. Message Overview. Your communication product should consist of these
basic ideas. The way in which you present these ideas will ultimately depend on
the considerations found in the following section.
1. State the problem or issue you found.
2. Show the evidence of the problem or issue.
3. Offer a recommendation or solution to resolve the problem or issue.
B. Formatting the Communication Product. Several critical factors that will
affect what the communication product looks like are presented here, and should
be considered as the communication product is developed.
1. Know your audience. Consider conducting an audience analysis, a tool to
help you understand the main target of your information, including how
they consume information.
A target of an advocacy strategy, for example, is the person or
organization you are trying to influence to make a policy change. The
primary target is a key decision-maker such as the CEO of a hospital, the
head of a provincial AIDS committee, or the Minister of Health. The
secondary target has some influence over the primary target and may
include a chief of staff, a celebrity, or a well-known business leader.
Target audiences can include other decision-makers as well, like internal
team members who need to use the data to improve the program for the
next quarter, local clinic staff who can use the data to better understand
their clients’ behaviors, or local community-based organizations that want
to improve services they provide. The funders and sponsors of your
program are also important target audiences that frequently require
information on the program.
Whoever your target audience may be, answering the series of questions
below will help you compose your communication product:
a. Level of knowledge about the issue: Is the audience well-informed
already with accurate information? What is their depth of
knowledge?
b. Level of support or opposition for the issue (for advocacy): Has
your audience supported or opposed this issue or related issues?
c. Level of literacy: Do you know if the audience is literate? Would
the audience benefit more from communication through visuals
than through written text?
d. Level of comfort with data, math, and technical concepts: What
level of technical expertise does the audience have in the specific
topic? Would information be best presented with charts and
graphs, or through a different form like with oral presentations or
story-telling?
e. Education level and job function: Does the audience have a
background in the topic or in a related topic? How does the
audience’s job relate to this topic?
f. Interests and motivations: What factors does the audience
consider when forming their opinions about new information?
g. Key words and phrases to use and avoid: What words or phrases
should be used to convey the messages to this audience that will
help them relate to the topic? And what should be avoided to
prevent confusion or because the meaning of the phrase implies
something negative?
h. Expectations: What are the expectations of the audience reviewing
the information you provide?
2. Method and setting for delivery. How will you be sharing the
information with your audience? When thinking about how to deliver
your product, consider whether the audience would find a presentation
most useful, or a question and answer session with an expert on the data.
Settings include:
a. In person at a meeting where you can share slide shows and/or
printed material, with time for in-depth explanations and
discussion
b. Brief meetings to quickly share your main ideas
c. Presenting your work as part of a larger meeting or conference
that your audience will attend
d. Through indirect communication such as email
3. Type of communication product. What is the actual product that you
will generate to communicate your ideas?
a. Printed materials like reports, pamphlets, and posters
b. Electronic materials like presentations and slide shows
c. Online materials like blog posts and web page content
4. The level of detail to include will vary.
a. Focus on the details relevant to the audience’s needs and
interests.
b. The amount of time they may give to hear your message will
influence the amount of information you can have in your product.
If you have 10 minutes with your audience, you might only be able
to produce a one-page, simple message. But if you are provided a
workshop setting where you have plenty of time for presentations
and discussion, you may want to generate a presentation with
slides that explain your ideas and recommendations.
c. Providing too much information can cause a busy audience to
lose interest, and even overlook the key messages,
regardless of the amount of time provided.
d. Overall, simple messages may be better received by your
audience than highly complex messages, unless the audience
requests or expects a great amount of detail.
C. Maintain the Integrity of the Data. You are more than simply an advocate
for your recommendations. You are responsible for managing the data properly
as you explore it, for creating the most appropriate charts and graphs based on
the type of data you have, and for interpreting the visualizations to the best of
your ability. And when the time comes for you to communicate your findings and
advocate for certain recommendations, you are ethically bound to convey the
information and interpretations as accurately as possible without making even
minor changes that may affect the outcome to favor your ideas.
Policy Advocacy Example
To continue the policy example from Section 4, the findings from the analysis and
the visualization were crafted into a communication product. Note that the
communication product in Figure 32 uses the chart as only a portion of the entire
visual; the key messages are written as brief statements and listed in the order in
which they should be read (from top to bottom).
Additional Examples
The visualizations from previous sections are crafted into communication
products as examples in this section.