
PREVALENCE STUDIES - Universidade Federal de Goiás


INTRODUCTION
  Definition
  Use in public health and research

MEASUREMENTS OF PREVALENCE
  Point prevalence
  Period prevalence
  Lifetime prevalence

EXAMPLES OF PREVALENCE STUDIES
  Seroprevalence studies
  Repeat prevalence studies

METHODOLOGY
  Sampling
  Sample size
  Primary and secondary source of data
  Types of bias
    Survivor bias
    Participant selection bias
    Observer bias

DATA ANALYSIS
  Measurement of prevalence
  Stratification
  Logistic regression

ADVANTAGES AND LIMITATIONS

CHECKLIST FOR THE DESIGN OF A PREVALENCE STUDY

ADDITIONAL READING

EXERCISES

DATA FILE DICTIONARY

Prevalence studies

INTRODUCTION

Definition - Prevalence, or cross-sectional, studies are the most common population-based epidemiological studies. They are designed to estimate the frequency of a health event in a population at a point in time or over a short period of time. Cross-sectional studies can also be used to investigate associations between risk factors and disease, although they are not the most efficient design for studying causality.

A prevalence rate (P) is estimated by:

$$P = \frac{\text{number of persons with the event of interest (infection, disease, etc.)}}{\text{population at risk of presenting the event of interest during a given time}}$$

The population at risk is usually the population living in the study area, or it is defined by geographical, administrative, demographic, occupational or other parameters, such as being a client of health services. The prevalence rate is reported on a population base, e.g. 5 cases of a disease per 100 inhabitants (5%).

Prevalence is influenced by the incidence (I) and the mean duration (D) of the disease. As a proportion, its numerator is part of its denominator and it has no unit; its value ranges from 0 to 1. When the incidence and the population dynamics are constant, prevalence (P) may be calculated by:

$$P = I \times D$$

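The duration of a disease can be obtained when the incidence and prevalence are known. An area reporting, for example, an incidence of 3.3 new cases of tuberculosis per 100,000 inhabitants per year and a prevalence rate of 19.8 per 100,000 would estimate the average duration of the disease as:

$$D = \frac{P}{I} = \frac{19.8}{3.3} = 6$$

The result is expressed in the time unit of the incidence rate: 6 years for an incidence given per year (a duration of 6 months would correspond to an incidence of 3.3 per 100,000 per month).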
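The relation between prevalence, incidence and duration can be checked numerically. The following is a minimal Python sketch using the figures quoted above; the function name `mean_duration` is illustrative and not part of the original material.

```python
def mean_duration(prevalence, incidence):
    """Mean duration D = P / I, valid only when incidence and population are stable.

    Both rates must share the same denominator (here per 100,000 inhabitants);
    the result is expressed in the time unit of the incidence rate.
    """
    if incidence <= 0:
        raise ValueError("incidence must be positive")
    return prevalence / incidence

# Figures from the text: P = 19.8 per 100,000 and I = 3.3 per 100,000 per time unit
duration = mean_duration(19.8, 3.3)
print(round(duration, 1))
```

The sketch deliberately leaves the time unit to the caller, since it is determined entirely by how the incidence rate is expressed.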

Prevalence can be evaluated by a single (cross-sectional) measurement. Conversely, incidence requires at least two measurements of the same population: one at the beginning of a follow-up, to exclude those who are already sick or infected, and another to detect the emergence of new cases. For infectious diseases of rapid evolution, prevalence measurement has little meaning. For events (infections and diseases) of longer or chronic duration, prevalence may indicate the risk of exposure for susceptible individuals.

Use in Public Health and Research - Prevalence studies are often used as a baseline measurement for the monitoring of control programmes. They are also used in the selection of participants for other studies, such as case-control studies, cohort studies and clinical trials. For example, an initial serological screening for Trypanosoma cruzi infection among a large population of schoolchildren in a rural area of Brazil reported a prevalence of 7.9% (95% confidence interval 6.8%-9.1%) (Andrade et al., 1992). A sample of the seropositive children was then selected to participate in a clinical trial evaluating the efficacy of benznidazole as a specific treatment. In addition, seropositive children and matched seronegative controls were compared (case-control design) to evaluate environmental, familial and nutritional risk factors associated with T. cruzi infection.

MEASUREMENTS OF PREVALENCE

Prevalence is considered a simple measure of the burden of disease. In infectious diseases, however, its interpretation requires more extensive knowledge of the mechanisms by which the infection is transmitted, of the pre-clinical and clinical evolution of the disease, and of the duration of the infection/disease and the case-fatality rate. The criteria for defining infection and disease, and the probable duration of the event until recovery or death, must be specified at the design stage of a prevalence study.

The most commonly used types of prevalence rate are point, period and lifetime prevalence.

Point prevalence - is the total number of individuals with a disease or an attribute at a specific point in time divided by the population at risk of having the attribute or disease at that point in time. The term prevalence rate has been used as a synonym for the point prevalence rate.

Period prevalence - is the total number of individuals with a disease or an attribute in a given period of time divided by the population at risk of having the attribute or disease during the same period of time.

Lifetime prevalence - is the total number of persons known to have had the disease or attribute for at least part of their life.

Figure 1 illustrates the concepts of point and period prevalence in malaria. The point prevalence in endemic areas of malaria can be obtained by the parasitological screening of a population over a short period of time. Differences between the prevalence of infection and the incidence of clinical cases depend on the levels of endemicity. According to the example in figure 1, at the beginning of 1992 the point prevalence of symptomatic malaria was 4 cases, and 5 new cases were diagnosed during the year (incidence), yielding a period prevalence of 9. At the beginning of 1993 the point prevalence of infection was 12 cases and the number of clinical cases 3, which illustrates the differences between point prevalences of infection and disease, respectively.


Prevalence estimates in control activities are influenced by operational and diagnostic criteria. Changes in case definition, treatment schemes and discharge criteria may change prevalence figures. Mass interventions potentially interfere with the transmissibility of an infectious disease, its incidence and duration, and the characteristics of the infection/disease in existing cases. In the case of leprosy control, for example, target areas for elimination are defined as those with prevalence rates below 1 case per 10,000.

Figure 2 illustrates the concepts of point and period prevalence for leprosy. Assume 500 cases (N) at the beginning of the period (t0) and that all new cases (A = 250), regardless of their clinical form, occurred at the same time, at mid-year (t1). The period prevalence (Δt1) is 750 cases: 500 at the start of the period plus 250 new cases. Assuming that at time t1 there were 350 discharges (B = 350), the prevalence at point t1 is the net number of cases (N - B = 150) plus the new cases (A = 250), which totals 400 cases. Thus, in a situation of stable incidence, reduction of the point prevalence will depend on the number of patients treated (cured or discharged) and the proportion that defaulted from treatment.
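The arithmetic of the leprosy example can be written out as a short Python sketch. The function names are illustrative; the results are case counts (numerators) only, because the text does not give the population at risk needed to turn them into rates.

```python
def period_prevalence(cases_at_start, new_cases):
    """Cases present at any time in the period: existing cases plus new ones."""
    return cases_at_start + new_cases

def point_prevalence(cases_at_start, new_cases, discharges):
    """Cases still present at a given point: existing minus discharged, plus new."""
    return cases_at_start - discharges + new_cases

# Values from the leprosy example: N cases at t0, A new cases and B discharges at t1
N, A, B = 500, 250, 350
period = period_prevalence(N, A)
point_t1 = point_prevalence(N, A, B)
print(period, point_t1)
```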


EXAMPLES OF PREVALENCE STUDIES

Seroprevalence studies - are particularly useful for infectious diseases that induce an antibody response or other biological markers. Seroprevalence studies are used to determine the geographic distribution of a large number of infections, such as hepatitis A, B and C and HIV, and in surveys before and after vaccination to evaluate antibody seroconversion. Prevalence is estimated by age and sex in order to understand the dynamics of transmission of infection in the community. This type of analysis allows the identification of high-risk areas within the community and of carriers, immune and susceptible individuals. The analysis should indicate the current and past disease/infection/immunity situation, providing useful information to predict the future risk of transmission.

Repeat prevalence studies - are prevalence studies repeated at intervals, generally of years or decades. They differ from cohort studies in that they do not follow the same group of individuals; instead, they evaluate independent samples of a given population at different periods of time. With probability sampling, it is therefore unlikely that the same individuals will be selected again to measure the events of interest. Repeated prevalence studies are used to evaluate the health/disease/infection status and trends of large populations; they are important for the planning of health services and for the detection of changes in the health situation over time. Examples are nutritional and household surveys in censuses.

METHODOLOGY

Sampling

Random sampling - prevalence studies generally require random sampling of a population. A probability sample substantially increases the chance that the participants will be representative of the target population and so assures the internal validity of the study. This also allows for the extrapolation of study results to other communities (external validity).

[Figure 2 – Leprosy: point and period prevalence. N = 500 cases at t0; new cases (A = 250) and discharges (B = 350) at t1; N - B = 150 cases remain under treatment, some of whom default; time axis t0, t1, t2]

Systematic sampling – a systematic sampling will use some type of pre-established sequence to select participants, for example, from files of medical histories, houses on a street, or patients presenting spontaneously in ambulatory clinics.

Stratified sampling - this involves dividing the population into distinct subgroups according to some important characteristic and selecting a random sample from each subgroup. If the proportion of the sample drawn from each stratum is the same as that stratum's proportion of the total population, all strata will be fairly represented with regard to the number of persons in the sample. A two-stage sampling method was developed by EPI-WHO to evaluate vaccination coverage and the quality of health services: 30 urban settlements are selected, and 7 children in the given age group are selected in each settlement.
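Proportional allocation across strata can be illustrated with a small Python sketch. The stratum names and population counts below are hypothetical, chosen only for illustration, and the largest-remainder rounding rule is one reasonable choice among several rather than a method prescribed by the text.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Split sample_size across strata in proportion to each stratum's population.

    Uses largest-remainder rounding so the allocations sum exactly to sample_size.
    """
    total = sum(strata_sizes.values())
    exact = {name: sample_size * size / total for name, size in strata_sizes.items()}
    alloc = {name: int(value) for name, value in exact.items()}
    leftover = sample_size - sum(alloc.values())
    # Hand the remaining units to the strata with the largest fractional parts
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc

# Hypothetical strata (illustrative population counts, not from the text)
strata = {"urban": 60000, "peri-urban": 30000, "rural": 10000}
allocation = proportional_allocation(strata, 200)
print(allocation)
```

Within each stratum, the allocated number of persons would then be drawn by simple random sampling.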

Sample size

While a probability sample gives a study internal validity, the precision of the prevalence estimate obtained depends on the sample size. Thus, the width of the confidence interval (the interval estimated to contain the prevalence in the population) reflects the degree of precision conferred by the size of the sample chosen.

The size of a sample for simple random sampling is calculated as follows:

$$n = \frac{Z^2 \, P(1-P)}{D^2}$$

where:
Z = the value of the standard normal distribution for the desired confidence level (Z = 1.96 for the 95% confidence interval, 95% CI)
P = the expected prevalence
D = the highest acceptable error in the estimate (half-width of the CI, a measure of precision)

For example, to estimate the seropositivity for dengue virus antibody in a population of about 1 million inhabitants with an expected prevalence of 15% (P = 0.15) and a 95% CI with a total width of 12% (D = 0.06), the number of persons to be studied would be:

$$n = \frac{1.96^2 \times 0.15\,(1 - 0.15)}{0.06^2} \approx 136 \text{ persons}$$

The estimated sample size should be increased to compensate for eventual refusals or losses.
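The sample size formula and the adjustment for refusals can be sketched in Python as follows. The function names are illustrative, and the 10% loss used in the example is an assumed figure, not one given in the text.

```python
import math

def sample_size(expected_prevalence, half_width, z=1.96):
    """n = Z^2 * P * (1 - P) / D^2 for simple random sampling."""
    p, d = expected_prevalence, half_width
    if not 0 < p < 1 or d <= 0:
        raise ValueError("need 0 < P < 1 and D > 0")
    return z ** 2 * p * (1 - p) / d ** 2

def inflate_for_nonresponse(n, expected_loss):
    """Increase n so that about n participants remain after refusals or losses."""
    return math.ceil(n / (1 - expected_loss))

n = sample_size(0.15, 0.06)                    # dengue example from the text
n_recruit = inflate_for_nonresponse(n, 0.10)   # 10% loss is an assumed figure
print(round(n), n_recruit)
```

Running the function with several plausible values of P and D is a simple way to see how sensitive the required sample is to the assumptions.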

The sample size conveys an idea of the order of magnitude of the population needed for the study, but must not be treated as rigid, as it is calculated on the basis of an estimated parameter (the expected prevalence). This estimate is usually obtained from a review of the literature. Sample sizes should be calculated for different estimates of prevalence and precision, in keeping with the purpose of the study. A balance between what is desirable and what is practically possible should be achieved. Opinion surveys are generally conducted on about 1,000 persons to obtain good precision (for example, a 95% CI with a maximum width of 6%). It should be emphasized that prevalence studies are not suited to events with a low frequency of occurrence.

Primary and Secondary Source of Data

Official information systems - Sources of secondary data can be useful in prevalence studies and should be used whenever possible as a first approach to the question to be answered. Some of the advantages of using existing databases for epidemiological purposes are their low cost, the availability of information over a long period of time, which allows secular trends to be evaluated, and possible access through computerized systems.

In some cases, data generated by the information systems of control programmes make it possible to build up time series. Other sources are the medical records of general or referral hospitals and of sentinel hospitals for infectious diseases.

The interpretation of secondary data requires a knowledge of the coverage and quality of the information; of changes in the definition of cases over time; of administrative actions such as changes from voluntary to transitory reporting, and changes in established interventions and report forms.

The epidemiological interpretation should recognize the limitations and quality of the existing database, and the potential biases associated with the ascertainment of the disease and the selection of cases for treatment.

Collecting primary data - Self-administered questionnaires and interviews are common ways of obtaining information on morbidity, frequency of symptoms and variables of interest in prevalence surveys. In addition, laboratory tests for the detection of biological markers can be used to measure the prevalence of infection/disease and risk factors. Primary data are generally of higher quality than secondary data: a planned epidemiological study can collect data in a standardized and validated way, completeness can be assured, and careful recording makes the database useful for analysis and interpretation.

Types of bias

Survivor bias - systematic error arising in cross-sectional studies because only prevalent cases are included. Cases with rapid evolution and early deaths are excluded, while cases with longer survival tend to be over-represented. Since the probability of surviving a disease affects its prevalence, studies based on prevalent cases generate associations that reflect determinants of the survival of cases.

Participant selection bias - convenience sampling, such as that prompted by the accessibility of persons presenting at public health services, specialized clinics, referral services, etc., can introduce participant selection bias. Self-selected or voluntary participants tend to be healthier than the overall population. Sampling based on risk-behavior subgroups tends to overestimate the prevalence of some sexually transmitted diseases. Some population subgroups, such as blood donors and pregnant women, are more likely to show prevalence rates closer to those of the overall population.

The refusal of individuals to participate in prevalence studies, to be interviewed or to donate biological material can also introduce participant selection bias, and must be held to a minimum to ensure that the sample is representative of the base population. Alternatives for reducing the number of such refusals must be provided in the protocol in advance. The differences between participants and non-participants must be evaluated in relation to demographic and risk variables.

Stored collections of clinical and laboratory specimens are sometimes used to estimate the prevalence of some diseases. Serum and biological material banks that do not record a description of the population from which the specimens were taken, the sampling method used, and the circumstances in which they were obtained are without value for epidemiological purposes. For the results obtained from these tests to represent the actual prevalence, all the requirements of a project design must be satisfied: (a) clear purposes, (b) representation of the population of interest, (c) sample size, and (d) knowledge of the tests to be used, their sensitivity and specificity, the limits of their accuracy and their significance.

Observer bias - a flaw in measuring exposure or outcome that results in differential quality (accuracy) of information between comparison groups. Standard procedures for interviews and clinical examination will assure a uniform evaluation of associations and effects in a survey.

DATA ANALYSIS

Measurement of prevalence

The confidence interval of a prevalence shows the degree of uncertainty of the estimate. Assuming a random sample of the population of interest, the larger the sample studied, the more precise the observed proportion will be. A 95% confidence interval may be presented in figures or graphically. The width of the interval reflects its precision: the upper and lower limits will be close to each other when the estimate is made from a large sample. When two or more proportions show overlapping confidence intervals, it is usually assumed that there is no statistically significant difference between them. However, the Chi-square test or Fisher's exact test is the appropriate procedure to compare two proportions.
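The effect of sample size on the width of the interval can be illustrated with a minimal Python sketch. It uses the normal-approximation (Wald) interval, which is an assumption made here for simplicity; other methods, such as the Wilson score or exact binomial intervals, exist and may give slightly different limits. The counts are hypothetical.

```python
import math

def prevalence_ci(cases, n, z=1.96):
    """Prevalence with a normal-approximation (Wald) confidence interval."""
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

# Hypothetical counts: same observed prevalence (15%), two sample sizes
small = prevalence_ci(150, 1000)
large = prevalence_ci(1500, 10000)
for p, lower, upper in (small, large):
    print(f"{p:.3f} ({lower:.3f} - {upper:.3f})")
```

The tenfold larger sample gives an interval roughly one third as wide, since the width shrinks with the square root of the sample size.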

In prevalence studies the association between exposure and disease can be evaluated. The odds ratio (OR) can be calculated as an estimate of the relative risk, especially when the disease/outcome is rare. In these circumstances the ratio between the two prevalences (exposed and not exposed), called the prevalence ratio (PR), can also be used. For example, to study the association between homelessness and history of sexually transmitted disease (STD), 496 children (101 homeless children and 395 family children working in the streets) were interviewed about their history of STD. An STD history was reported by 24.8% of the homeless children and 3.5% of the family children working in the streets (Porto et al., 1994). The results are presented in the following table:

                  History of STD
Children      Yes            No              Total
Homeless      25 (24.8%)     76 (75.2%)      101 (100%)
In-street     14 (3.5%)      381 (96.5%)     395 (100%)
Total         39 (7.9%)      457 (92.1%)     496 (100%)

$$PR = \frac{25/101}{14/395} \approx 6.98 \quad (95\%\ \text{CI } 3.7 - 12.9)$$

$$OR = \frac{25 \times 381}{14 \times 76} \approx 8.95 \quad (95\%\ \text{CI } 4.2 - 19.1)$$

Therefore, the prevalence of a reported STD history was about 7 times higher for homeless children than for family children working in the streets.
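The prevalence ratio and odds ratio for the STD table can be reproduced with a short Python sketch. The confidence intervals here use the usual log-scale normal approximation, which is an assumption: it reproduces the interval quoted for the PR, but the interval quoted for the OR in the text is somewhat wider and presumably came from a different method. From the unrounded counts the PR is about 6.98.

```python
import math

def ratio_ci(estimate, se_log, z=1.96):
    """Confidence interval built on the log scale and transformed back."""
    log_est = math.log(estimate)
    return math.exp(log_est - z * se_log), math.exp(log_est + z * se_log)

def prevalence_ratio(a, n1, c, n0):
    """PR for a cases among n1 exposed versus c cases among n0 unexposed."""
    pr = (a / n1) / (c / n0)
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    return pr, ratio_ci(pr, se)

def odds_ratio(a, b, c, d):
    """OR from a 2x2 table (a, b = exposed yes/no; c, d = unexposed yes/no)."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, ratio_ci(or_, se)

# Counts from the table: homeless 25 yes / 76 no; in-street 14 yes / 381 no
pr, pr_ci = prevalence_ratio(25, 101, 14, 395)
or_, or_ci = odds_ratio(25, 76, 14, 381)
print(f"PR = {pr:.2f} ({pr_ci[0]:.1f} - {pr_ci[1]:.1f})")
print(f"OR = {or_:.2f} ({or_ci[0]:.1f} - {or_ci[1]:.1f})")
```

Because the outcome is not rare among homeless children (about 25%), the OR is noticeably larger than the PR, which is why the two measures should not be used interchangeably here.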

Stratification - The main technique for evaluating confounding and for examining interaction (effect modification) between risk factors is stratification. Stratified analysis is usually done in the following stages:

1. Divide the study population into strata of the potential confounding variable;
2. Calculate estimates of the effect of exposure (prevalence ratio and confidence interval) for each specific stratum in relation to the baseline exposure level;
3. Determine whether the magnitude of the differences between the prevalence ratios of the different strata suggests interaction or confounding;
4. Estimate a summary (pooled) measure by the Mantel-Haenszel method in case of confounding.
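The stages above can be sketched in Python for a two-stratum example. The data are hypothetical and were chosen so that the stratum-specific prevalence ratios agree with each other but differ from the crude ratio, which is the pattern that suggests confounding. The summary estimate uses the standard Mantel-Haenszel formula for a ratio of proportions; function names are illustrative.

```python
def stratum_prs(strata):
    """Prevalence ratio within each stratum."""
    return [(a / n1) / (c / n0) for a, n1, c, n0 in strata]

def crude_pr(strata):
    """Prevalence ratio ignoring the stratification variable."""
    a = sum(s[0] for s in strata)
    n1 = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    n0 = sum(s[3] for s in strata)
    return (a / n1) / (c / n0)

def mantel_haenszel_pr(strata):
    """Mantel-Haenszel summary prevalence ratio.

    Each stratum is (a, n1, c, n0): a cases among n1 exposed,
    c cases among n0 unexposed.
    """
    num = den = 0.0
    for a, n1, c, n0 in strata:
        total = n1 + n0
        num += a * n0 / total
        den += c * n1 / total
    return num / den

# Hypothetical strata of a potential confounder (not data from the text)
strata = [(20, 100, 10, 100), (30, 60, 40, 160)]
specific = stratum_prs(strata)
crude = crude_pr(strata)
summary = mantel_haenszel_pr(strata)
print(specific, round(crude, 3), round(summary, 3))
```

If the stratum-specific ratios had differed markedly from each other, that would point to interaction, and a single summary estimate would not be appropriate.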

Logistic Regression - Although stratification can be used to adjust the prevalence for more than one confounding variable, a large number of strata tends to produce cells with small numbers of observations, with loss of precision in the calculations. This limitation of stratification in the simultaneous adjustment for several confounding variables can be overcome to some extent by the use of modeling techniques. Multivariate models help in understanding the predictive value of a set of variables related to a particular outcome. If the endpoint is binary, logistic regression models can be applied to prevalence studies to assess the effect of one exposure in the presence of additional risk factors. When the endpoint is continuous, linear regression is the option. Logistic regression is usually done in the following steps:

1. Identify the variables to be included in the model;
2. Recode the variables if necessary;
3. Select the outcome variable and the other variables (predictors) to be considered in the model;
4. Create dummy variables when necessary;
5. Estimate the adjusted odds ratios.
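To make the estimation step of logistic regression concrete, the following is a minimal pure-Python sketch that fits a logistic model by Newton-Raphson on grouped data. It is an illustration only, not the algorithm of any particular statistical package, and all function names are hypothetical. With a single binary exposure (the STD table in the text) the fitted odds ratio equals the crude OR; adjusted odds ratios would be obtained by adding further 0/1 dummy variables to each tuple and supplying one group per combination of values.

```python
import math

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x

def fit_logistic(groups, max_iter=50, tol=1e-10):
    """Fit logit(p) = b0 + b1*x1 + ... by Newton-Raphson on grouped data.

    groups: list of (dummies, positives, total), dummies being a tuple of 0/1.
    Returns the coefficient list [b0, b1, ...].
    """
    k = len(groups[0][0]) + 1
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for dummies, positives, total in groups:
            row = [1.0] + [float(v) for v in dummies]
            eta = sum(b * v for b, v in zip(beta, row))
            p = 1.0 / (1.0 + math.exp(-eta))
            weight = total * p * (1.0 - p)
            for i in range(k):
                grad[i] += row[i] * (positives - total * p)
                for j in range(k):
                    info[i][j] += row[i] * row[j] * weight
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# One binary exposure, counts from the STD table in the text:
# homeless (exposed) 25/101, in-street (unexposed) 14/395
groups = [((1,), 25, 101), ((0,), 14, 395)]
beta = fit_logistic(groups)
odds_ratio = math.exp(beta[1])
print(round(odds_ratio, 2))
```

The exponential of each coefficient other than the intercept is the odds ratio for that variable, holding the other variables in the model constant.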

ADVANTAGES AND LIMITATIONS

Prevalence studies are considered less time-consuming than cohort or case-control studies (when based on incident cases), less expensive, and operationally less complex than other epidemiological study designs. Whenever possible, they should be based on samples of the general population and not on selected populations such as health service users.

Prevalence studies are not suitable for rare or short-duration diseases, which afflict few persons at any given point in time. It is frequently difficult to separate cause and effect (risk factor and disease) because exposure and disease are measured at the same time; for this reason, prevalence studies cannot be used to test etiological hypotheses.

CHECKLIST FOR THE DESIGN OF A PREVALENCE STUDY

1. Define the importance of the question to be answered
   - check the literature and other studies in the field
   - verify whether the research question generates scientific knowledge and possibly an impact on public health

2. Frame clearly the questions to be answered
   - explain the event to be measured and the population to be studied
   - make sure that the questions can be answered in technical and operational terms

3. Describe the technical procedures for evaluating the event of interest
   - laboratory methods, interviews, questionnaires, clinical examination
   - interpretation and categorization of the parameters to be evaluated

4. Establish the sampling procedures
   - define the reference population and the population to be studied
   - establish the sampling method to be used

5. Calculate the sample size
   - estimate the expected prevalence to calculate sample size
   - define the acceptable precision (error) in your estimate of prevalence

6. Discuss ethical issues
   - risks vs. benefits
   - availability of medical care for participants in whom the event is detected
   - confidentiality of the results
   - use of preexisting biological samples and serum banks

7. Describe the stages of analysis of the data
   - give the parameters (proportion and 95% CI, mean), and describe the statistical methods and comparison subgroups
   - conduct a stratified analysis or logistic regression analysis to adjust for potential confounders

ADDITIONAL READING

ANDRADE, A.L.S.S., ZICKER, F., LUQUETTI, A.O., OLIVEIRA, R.M., SILVA, S.A., SOUZA, J.M.P. & MARTELLI, C.M.T. Surveillance of Trypanosoma cruzi transmission by serological screening of schoolchildren. WHO Bulletin, 70(5):625-9, 1992.

BEAGLEHOLE, R., BONITA, R. & KJELLSTRÖM, T. Basic Epidemiology. Geneva: World Health Organization, 1993.

GORDIS, L. Epidemiology, 3rd ed. Elsevier Science, 2004.

GIESECKE, J. Modern Infectious Disease Epidemiology. Boston: Little, Brown and Company, 2003.

HENNEKENS, C.H. & BURING, J.E. Epidemiology in Medicine, 5th ed. Boston/Toronto: Little, Brown and Company, 1987.

PAUL, J.R. & WHITE, C. Serological Epidemiology. New York and London: Academic Press, 1973.

PORTO, S.O.B., CARDOSO, D.D.P., QUEIROZ, D.A.O., ROSA, H., ANDRADE, A.L.S.S., ZICKER, F. & MARTELLI, C.M.T. Prevalence and risk factors for HBV infection among street youth in Central Brazil. Journal of Adolescent Health, 15:577-81, 1994.

EXERCISES

Files:
1. ViewScreen
2. ViewHepbprev

Exercise 1

Serologic screening for Trypanosoma cruzi. A serologic survey was conducted to estimate the prevalence of T. cruzi infection in schoolchildren aged 7 to 12 years residing in endemic rural areas of central Brazil. Blood samples were collected on filter paper from 1,990 children for indirect hemagglutination (IHA), indirect immunofluorescence (IIF) and ELISA. Details of the study and methodology are given in Andrade et al., 1992. The plan of analysis includes (a) comparison of seroprevalence by each technique, and (b) prevalence ratio by sex and age group. Use the screen data table, included in the EpiGuide.MDB project, to answer the following questions.

**Before starting the exercise, route out the results to an HTML file named "Results SCREEN":

ROUTEOUT 'Results SCREEN' [Figure 1]

[Figure 1 – Route Out results]
1 - Click on RouteOut
2 - Define the folder to save the HTM file
3 - Write the file name
4 - Mark this box if you want to replace an existing file

Question 1. Calculate the prevalence rate and 95% CI obtained with each laboratory technique used. Is there any statistically significant difference between the results?

Note 1:
READ 'C:\EPIGUIDE\EpiGuide.mdb':viewSCREEN [Figure 2]
FREQ IHA [Figure 3]
FREQ IIF
FREQ ELISA

[Figure 2 – Read data file]
1 - Click on Read from the Analysis Commands tree
2 - Change to the desired project: EPIGUIDE.MDB
3 - Identify the data file you will use in the exercise
4 - Click OK

[Figure 3 – Frequencies command]

Question 2. Estimate the prevalence and 95% CI of T. cruzi infection, defining a positive diagnosis as having at least 2 positive serological tests. Calculate the seroprevalence and 95% CI for each municipality. Is there any difference between the areas? Calculate the prevalence and prevalence ratio by sex. Is there any association between sex and T. cruzi infection?

Note 2: [To continue the exercise you must create a new variable called “POS”. Consider (+) as positive to at least two serological tests and (-) as negative.]

DEFINE POS [standard] [Figure 4]
[Assign all records with a negative result (-)]
ASSIGN POS = (-) [Figure 5]
[Identify the positive records (+) - positivity to at least 2 tests]
IF IHA = "P" and IIF = "P" THEN
    ASSIGN POS = (+)
END [Figure 6]
IF IHA = "P" and ELISA = "P" THEN
    ASSIGN POS = (+)
END
IF ELISA = "P" and IIF = "P" THEN
    ASSIGN POS = (+)
END
FREQ POS

TABLES MUN POS [Figure 7]
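For readers who want to check the classification rule outside Epi Info, the logic of the three IF blocks above (positive when at least two of the three tests are positive) can be sketched in Python. The records are hypothetical, and the code "N" for a negative result is an assumed coding; only "P" for positive appears in the exercise.

```python
def is_positive(iha, iif, elisa):
    """Return True when at least two of the three tests are positive ("P")."""
    return sum(result == "P" for result in (iha, iif, elisa)) >= 2

# Hypothetical records as (IHA, IIF, ELISA); "N" for negative is an assumed coding
records = [("P", "P", "N"), ("P", "N", "N"), ("N", "P", "P"), ("N", "N", "N")]
flags = [is_positive(*record) for record in records]
prevalence = sum(flags) / len(flags)
print(flags, prevalence)
```

Counting positive results and comparing with 2 is equivalent to testing the three pairs of tests one by one, as the Epi Info commands do.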

Steps shown in Figure 3 (Frequencies command):
1 - Click Frequencies
2 - Choose the variable(s)
3 - Click OK when finished

For Epitable:
Run EPITABLE to calculate the 95% CI: select DESCRIBE, then select PROPORTION, then select SIMPLE RANDOM SAMPLING
[Use the results of the previous table]

For Open Epi: [Figure 8]
Access OPEN EPI from the EpiGuide CD or from www.openepi.com. From the Open Epi menu choose PROPORTIONS from the COUNTS folder. Click on the ENTER NEW DATA button. Populate the table provided with the numbers for the Numerator and Denominator, using the results of the previous command. Click CALCULATE. A result window will be shown. Note the results and close the result window.

TABLES SEX POS

[Figure 4 – Define new variable]
1 - Click Define to create a new variable
2 - Type the new variable name

[Figure 5 – Assign values]
1 - Click Assign
2 - Choose the variable to receive the new values
3 - Choose from the Available Variables to construct the expression
4 - Revise the Assign Expression
5 - Click OK

[Figure 6 – IF command]
1 - Click IF to establish conditions for the new variable
2 - Choose the variable to build the condition
3 - Create the condition(s) to assign the values for the new variable
4 - Click THEN to access the THEN Block
5 - Click ASSIGN
6 - Choose the variable to receive the new values
7 - Choose from the Available variables to construct the expression
8 - Revise the assign expression
9 - Click ADD to return to the IF window
10 - Click OK when finished

[Figure 7 – Tables command]
1 - Click on Tables
3 - Choose the Outcome variable
4 - Click OK

[Figure 8 – Open Epi - Proportions]
1 - Click Proportion
2 - Click Enter New Data
3 - Type the values for Numerator and Denominator
4 - Click Calculate

Question 3. Compare the T.cruzi infection prevalence between the different ages. Is there any trend between seropositivity and age? Create a new variable AGEGR as 7-9 and 10-12 years old. Calculate the prevalence and prevalence ratio for each age stratum.

Note 3:
TABLES AGE POS
[Note the results - absolute values for negative and positive for each age]

From the UTILITIES menu of the EPI WIN main page select STATCALC and select CHI SQUARE FOR TREND
[Use the data produced by the preceding table]
Press F10 to leave STATCALC
Return to ANALYSIS
[Create a new variable "AGEGR" to group by age stratum]
DEFINE AGEGR [standard]
RECODE AGE TO AGEGR
    7 - 9 = 2
    10 - 12 = 1
END [Figure 9]

TABLES AGEGR POS

[Figure 9 – Recode values into a new variable]
1 - Define the new variable
2 - Click Recode
3 - Choose source variable
4 - Choose destination variable (new)
5 - Type old values or range of values
6 - Type the new values. Press Enter to go to the next line
7 - Click OK

Question 4. Based on the results of questions 2 and 3, is there any association between exposure to T. cruzi in childhood and sex and/or age?

[If you are doing the Advanced exercise leave Analysis OPEN and proceed to Question 5]

[If you are not doing the Advanced analysis of this exercise you can Exit Analysis]

EXIT [to leave ANALYSIS]

Advanced Exercise

Question 5. What are the adjusted estimates for age group and sex (expressed as odds ratios, OR) after applying logistic regression? What happened to the associations between exposure to T. cruzi and sex or age?

[To apply logistic regression analysis you will have to recode the variables SEX and AGEGR into SEX_R and AGEGR_R]

DEFINE SEX_R [standard]
RECODE SEX TO SEX_R
1=1
2=0
END

DEFINE AGEGR_R
RECODE AGEGR TO AGEGR_R
1=1
2=0
END

LOGISTIC POS = SEX_R AGEGR_R [Figure 10]

EXIT [to leave ANALYSIS]
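Logistic regression expresses associations as odds ratios rather than prevalence ratios, and the two agree closely only when the outcome is uncommon. The Python sketch below compares both measures on a single 2x2 table; the function names and counts are hypothetical and are not taken from the exercise data.

```python
def prevalence_ratio(a, b, c, d):
    """PR from a 2x2 table: a, b = exposed positive/negative; c, d = unexposed."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds ratio (cross-product ratio) from the same 2x2 table."""
    return (a * d) / (b * c)


# Hypothetical tables: a common outcome, then a rare one.
for label, table in [("common outcome", (20, 80, 10, 90)),
                     ("rare outcome", (2, 998, 1, 999))]:
    pr = prevalence_ratio(*table)
    oratio = odds_ratio(*table)
    print(f"{label}: PR = {pr:.3f}, OR = {oratio:.3f}")
```

With the common outcome the OR (2.25) overstates the PR (2.0); with the rare outcome the two are nearly identical. This is worth keeping in mind when comparing the logistic output with the stratified tables.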


[Figure 10 – Logistic Regression]

Exercise 2

Prevalence of and risk factors for hepatitis B infection. A cross-sectional study was designed to measure the prevalence of serologic markers for hepatitis B virus (HBV) infection in first-time blood donors and convicts, and to evaluate risk factors associated with seropositivity. The view Hepbprev, part of the EPIGUIDE.MDB project, includes results of HBsAg and anti-HBs (ELISA) for 1,033 blood donors and 201 convicts, and 14 potential risk factor variables. Details of the methodology and population are in Martelli et al., 1990. Positivity to HBsAg or to anti-HBs was taken as HBV infection. The plan of analysis was designed to (1) evaluate the prevalence of the HBsAg and anti-HBs markers in the groups of donors and convicts; (2) compare sex and age distribution and frequency of potential risk factors between groups; and (3) calculate the prevalence ratio of HBV positivity between exposure groups.

** Before starting the exercise, route the results to an HTML file named “Results HEPBPREV”:
ROUTEOUT 'Results HEPBPREV'

Question 1. Estimate the prevalence of HBV carriers (HBsAg), immune individuals (anti-HBs) and susceptible individuals (absence of markers) in the study population. Construct a prevalence table for HBV markers for donors and convicts. Discuss the results.

Steps shown in Figure 10 (Logistic Regression):
1 – Click Logistic Regression
2 – Choose the Outcome
3 – Choose the Independent variables
4 – Click Make Dummy to create dummy variables
5 – Type a table name to save the residuals
6 – Click OK

Note 1:
READ 'C:\EPIGUIDE\EpiGuide.mdb':viewHepbprev
[For the variables HBSAG and ANTIHBSAG exclude code -1 (no information)]
SELECT HBSAG <> -1 AND GROUP = 2 [Figure 11]
FREQ HBSAG [Figure 12]
SELECT (to disable selection) [Figure 13]
SELECT ANTIHBSAG <> -1 AND GROUP = 2
FREQ ANTIHBSAG
SELECT
[Follow the same commands for GROUP=1]

[Create variable EXP (exposure to hepatitis B virus). For the variable “EXP” exclude the no-information entries HBSAG = -1 AND ANTIHBSAG = -1]

DEFINE EXP [standard]
ASSIGN EXP = 2 [negative results]
IF HBSAG = 1 OR ANTIHBSAG = 1 THEN
    EXP = 1
END
IF HBSAG = -1 AND ANTIHBSAG = -1 THEN
    EXP = (.)
END
TABLES GROUP EXP
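The order of the ASSIGN and IF statements above determines how each record is classified. The Python sketch below mirrors that sequence so the rule can be checked on individual marker combinations; the function name `derive_exp` is hypothetical, `None` stands in for the Epi Info missing value (.), and the codes follow the data file dictionary (-1 no information, 1 positive, 2 negative).

```python
from typing import Optional

NO_INFO, POSITIVE, NEGATIVE = -1, 1, 2  # codes from the data file dictionary


def derive_exp(hbsag: int, antihbsag: int) -> Optional[int]:
    """Mirror the DEFINE/ASSIGN/IF sequence for EXP; None stands for missing (.)."""
    exp: Optional[int] = NEGATIVE                # ASSIGN EXP = 2
    if hbsag == POSITIVE or antihbsag == POSITIVE:
        exp = POSITIVE                           # either marker positive
    if hbsag == NO_INFO and antihbsag == NO_INFO:
        exp = None                               # both markers without information
    return exp


# Walk through a few marker combinations.
for pair in [(1, 2), (2, 2), (-1, -1), (-1, 2)]:
    print(pair, "->", derive_exp(*pair))
```

Note that, under the rules as written, a record with one marker missing and the other negative is classified as negative (EXP = 2), not as missing.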

[Figure 11 – Select command]
1 – Click Select
2 – Choose the variable(s) to build the selection criteria
3 – Define the selection criteria
4 – Click OK when finished

[Figure 12 – Frequencies command]

[Figure 13 – Cancel Select]

Question 2. Compare the age and sex distribution of the study populations. Stratify the two study populations by age groups (<=29 years; 30-39 years; >=40 years) and compare seroprevalences in each age stratum. Is there any seropositivity trend among the blood donors? Give possible explanations.

Steps shown in Figure 12 (Frequencies command):
1 – Click Frequencies
2 – Choose the variable(s)
3 – Click OK when finished

Steps shown in Figure 13 (Cancel Select):
1 – Click Cancel Select
2 – Click OK to cancel current selection criteria

Note 2:
MEANS AGE GROUP TABLES = (-) [Figure 14]
SELECT SEX <> 9
TABLES SEX GROUP
SELECT
[Create variable “AGEGR” (age groups)]
DEFINE AGEGR [standard]
IF AGE >= 15 AND AGE < 30 THEN
    ASSIGN AGEGR = 1
END
IF AGE >= 30 AND AGE < 40 THEN
    ASSIGN AGEGR = 2
END
IF AGE >= 40 THEN
    ASSIGN AGEGR = 3
END
TABLES GROUP EXP STRATAVAR = AGEGR [Figure 15]
[Note the results – absolute numbers]
From the Epi Info main page select the UTILITIES menu, then select STATCALC and select CHI SQUARE FOR TREND
[Use data from the preceding table (GROUP = 2) for each level of “AGEGR”]
Press F10 to leave STATCALC
Return to ANALYSIS

[Figure 14 – Means command]
1 – Click Means
2 – Choose the variable to apply the means command
3 – Choose the variable to use for comparison
4 – Click Settings
5 – Uncheck the Show Tables in Output box

[Figure 15 – Tables with stratification]

Question 3. Compare the frequency of the potential risk factors for exposure to HBV (TRANSF, INJMED, INJDRUG, TATTOO, VDRL, STD) among convicts and blood donors. Are the differences statistically significant?

Note 3:
SELECT <name of variable> <> -1
TABLES <name of variable> GROUP
Ex: SELECT TRANSF <> -1
Ex: TABLES TRANSF GROUP
SELECT
Repeat the procedure for each variable

Question 4. Calculate the prevalence ratio (PR) and 95% CI for incarceration (as a risk factor) for HBV taking the blood donors as reference group (PR=1). Are the convicts a population at higher risk of HBV? Discuss. Within the group of convicts, does the number of years of incarceration (YEXP) increase the probability of HBV infection? Comment on a possible confounding factor in this univariate analysis.

Steps shown in Figure 15 (Tables with stratification):
1 – Click Tables
2 – Choose the Exposure variable
3 – Choose the Outcome variable
4 – Choose the variable to stratify by
5 – Click OK

Note 4:
TABLES GROUP EXP
SELECT GROUP = 1
TABLES YEXP EXP
[Note the results – absolute numbers]
Select STATCALC from the UTILITIES menu in the Epi Info main page, then select CHI SQUARE FOR TREND
[Use the data produced by the preceding table (GROUP=1)]
Press F10 to leave STATCALC
Return to ANALYSIS
SELECT
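The prevalence ratio and its 95% CI from the GROUP by EXP table can also be worked out by hand. The Python sketch below assumes the usual log-scale (Taylor series) interval; the function name `prevalence_ratio_ci` and the counts are hypothetical and are not the exercise results.

```python
import math


def prevalence_ratio_ci(a, n1, c, n0, z=1.96):
    """PR and CI from positives/total in the exposed (a, n1) and reference (c, n0) groups."""
    pr = (a / n1) / (c / n0)
    # Standard error of ln(PR) for two independent proportions.
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lower = math.exp(math.log(pr) - z * se_log)
    upper = math.exp(math.log(pr) + z * se_log)
    return pr, lower, upper


# Hypothetical counts: 20/100 positive among the exposed, 10/100 in the reference group.
pr, lower, upper = prevalence_ratio_ci(20, 100, 10, 100)
print(f"PR = {pr:.2f}, 95% CI = {lower:.2f} to {upper:.2f}")
```

An interval that includes 1, as in this hypothetical case, means the data are compatible with no difference in prevalence between the groups.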

Question 5. Assuming age as a confounding variable, adjust the PR for HBV associated with incarceration by each age group. Discuss results.

Note 5:
TABLES GROUP EXP STRATAVAR = AGEGR
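Adjusting for age amounts to combining the stratum-specific tables into one summary ratio. The Python sketch below assumes a Mantel-Haenszel summary ratio of the kind commonly reported for stratified tables; the function name `mantel_haenszel_pr` and the counts are hypothetical, chosen so that the crude and adjusted values differ.

```python
def mantel_haenszel_pr(strata):
    """Mantel-Haenszel summary prevalence ratio.

    strata -- iterable of (a, n1, c, n0) per stratum, where a/n1 is
              positives/total in the exposed group and c/n0 is
              positives/total in the reference group.
    """
    numerator = 0.0
    denominator = 0.0
    for a, n1, c, n0 in strata:
        total = n1 + n0
        numerator += a * n0 / total
        denominator += c * n1 / total
    return numerator / denominator


# Hypothetical age strata; the stratum-specific PR is 2.0 in both.
strata = [(4, 20, 20, 200), (80, 200, 4, 20)]
crude = (sum(s[0] for s in strata) / sum(s[1] for s in strata)) / (
    sum(s[2] for s in strata) / sum(s[3] for s in strata))
print(f"crude PR = {crude:.2f}, adjusted PR = {mantel_haenszel_pr(strata):.2f}")
```

Here the crude PR (3.5) is larger than the adjusted PR (2.0) because the exposed group is concentrated in the stratum with higher prevalence, which is the pattern to look for when judging confounding by age.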

Question 6. Calculate the PR and 95% CI of the potential risk variables mentioned in question 3. Which variables show an association with HBV infection? Recalculate prevalence ratios for convicts and donors separately, considering incarceration as a possible confounding factor. Which risk factors remained statistically significantly associated with HBV after adjustment?

Note 6:
SELECT <name of variable> <> -1
TABLES <name of variable> EXP
Ex: SELECT TRANSF <> -1
Ex: TABLES TRANSF EXP
TABLES TRANSF EXP STRATAVAR = GROUP
[Enter the same commands as for the other variables]
[Use STATCALC to calculate the PR for each]
SELECT
EXIT [to close ANALYSIS]

Question 7. Given the dynamics of HBV transmission in areas of low/medium endemicity, what are the principal methodological considerations relative to the study population? Consider selection and survivor biases. Discuss the HBV serologic screening of blood donors, vaccination of children in areas of low endemicity, and vaccination of risk groups.


REFERENCES

ANDRADE, A.L.S.S., ZICKER, F., LUQUETTI, A.O., OLIVEIRA, R.M., SILVA, S.A., SOUZA, J.M.P. & MARTELLI, C.M.T. Surveillance of Trypanosoma cruzi transmission by serological screening of schoolchildren. WHO Bulletin, 70(5):625-9, 1992.

MARTELLI, C.M.T., ANDRADE, A.L.S.S., CARDOSO, D.D.P., SOUSA, L.C.S., SILVA, S.A., SOUSA, M.A. & ZICKER, F. Soroprevalência e fatores de risco para a infecção pelo vírus da Hepatite B pelos marcadores AgHBs e Anti-HBs em prisioneiros e primodoadores de sangue. Revista de Saúde Pública, 24(4):270-6, 1990.

For Analysis:

DEAN, A.G., ARNER, T.G., SUNKI, G.G., FRIEDMAN, R., LANTINGA, M., SANGAM, S., ZUBIETA, J.C., SULLIVAN, K.M., BRENDEL, K.A., GAO, Z., FONTAINE, N., SHU, M. & FULLER, G. Epi Info™, a database and statistics program for public health professionals. Centers for Disease Control and Prevention, Atlanta, Georgia, USA, 2002. http://www.cdc.gov/epiinfo/downloads.htm

DEAN, A.G., DEAN, J.A., COULOMBIER, D. et al. Epi Info™, Version 6.04, a word processing, database, and statistics program for public health on IBM-compatible microcomputers. http://www.cdc.gov/epiinfo/Epi6/ei6.htm

DEAN, A., SULLIVAN, K. & SOE, M.M. OpenEpi - Open Source Epidemiologic Statistics for Public Health. http://www.openepi.com


DATA FILE DICTIONARY

Project: EPIGUIDE.MDB
File: Screen

Variable   Description               Code        Description of code
ID         Identification number     1 to 1991
MUN        Municipality              1           Posse
                                     2           Simolândia
                                     3           Guarani
SEX        Sex                       1           Male
                                     2           Female
AGE        Age in years              7 to 12
IHA        Hemagglutination test     P           Positive
                                     N           Negative
IIF        Immunofluorescence test   P           Positive
                                     N           Negative
ELISA      ELISA                     P           Positive
                                     N           Negative

Project: EPIGUIDE.MDB
File: Hepbprev

Variable    Description                              Code       Description of code
ID          Identification number
AGE         Age in years                             15 to 71
SEX         Sex                                      1          Male
                                                     2          Female
                                                     9          No information
TRANSF      Blood transfusion                        -1         No information
                                                     1          Yes
                                                     2          No
INJMED      Use of injectable medicine               -1         No information
                                                     1          Yes
                                                     2          No
INJDRUG     Use of injectable drug                   -1         No information
                                                     1          Yes
                                                     2          No
TATTOO      Presence of tattoo                       -1         No information
                                                     1          Yes
                                                     2          No
ACP         Acupuncture                              -1         No information
                                                     1          Yes
                                                     2          No
HBSAG       Serology for HBsAg                       -1         No information
                                                     1          Positive
                                                     2          Negative
ANTIHBSAG   Serology for anti-HBs                    -1         No information
                                                     1          Positive
                                                     2          Negative
VDRL        Serology for VDRL                        -1         No information
                                                     1          Positive
                                                     2          Negative
YEXP        Years of incarceration                   -1         Blood donors
                                                     0          Less than 1 year
                                                     1          1 year
                                                     2          2 or more years
GROUP       Population under study                   1          Convicts
                                                     2          First-time donors
STD         Report of sexually transmitted disease   1          Yes
                                                     2          No