
IMPLEMENTING A SAMPLING FRAME OF ELEMENTARY AND SECONDARY SCHOOLS IN CANADA

Simon Cheung and Marianne Gossen, Statistics Canada
Simon Cheung, Statistics Canada, R.H. Coats Building, 16th floor, Ottawa, Ontario, K1A 0T6, Canada

[email protected]

ABSTRACT

In Canada, education is a provincial and territorial responsibility, with a ministry of education in each province and territory. The involvement of governments in education is extensive, generating a wealth of administrative data on many aspects of education. The education statistics program of Statistics Canada has benefited a great deal from these administrative data. However, the delay in receiving and processing these data has prevented them from being used as an effective school frame. This paper discusses the need for an effective sampling frame of elementary and secondary schools in Canada and the strategy of Statistics Canada for implementing its school frame.

Key Words: Sampling frame, school, stratum jumper, relative cv

1. INTRODUCTION

Canada comprises ten provinces and three territories, each of which, within the federative system of shared powers, is responsible for education. Nevertheless, Statistics Canada, as a national statistical agency, has a legislated mandate for the collection, analysis and dissemination of statistics on all aspects of Canadian life. Thus, Statistics Canada’s mandate includes the responsibility for collecting and disseminating education statistics.

Each of the provinces and territories has developed its own educational structures and institutions. The variety of learning programs offered reflects the diversity of the country’s historical and cultural heritage. In section two, this paper discusses the diversity of elementary and secondary education in Canada and the extensive involvement of governments in the administration of schools. The education statistics program at Statistics Canada has benefited a great deal from the wealth of administrative data on education available through the provincial governments.

The education statistics program at Statistics Canada has been in existence for many decades. In the past, much of the focus was on input factors into education. Education outcomes were mainly measured by graduation statistics. Occasionally, employment outcomes were also measured by surveys of graduates (e.g. the National Graduate Survey) and by employment status by educational attainment in the census of population and the Canadian Labour Force Survey.

Recently, a new model of the education system in Canada has been adopted in order to reflect more fully research and policy needs in education. As discussed further in section three, this model expands the focus of the education statistics program to environmental and process factors in education, and to the relationships among all factors. The new data needs revealed by this model bring out the need for a sampling frame of elementary and secondary schools in Canada.

To implement a frame of elementary and secondary schools in Canada, we first examined the possibility of doing so using existing administrative data. The issues and challenges of using administrative data are well known; see, e.g., Brackstone (1987). Section four discusses the two major challenges in our case, namely timeliness and quality. Our strategy for implementing a school frame is presented in section five.

2. THE ELEMENTARY AND SECONDARY EDUCATION SYSTEM IN CANADA

In Canada, the responsibility for education rests with provincial and territorial governments. Each province and territory has developed its own system for education, and the structure can differ from jurisdiction to jurisdiction. These structures and institutions reflect the different circumstances of regions separated by great distances and the diversity of the country’s historical and cultural heritage. Public education is provided free to all Canadian citizens and permanent residents up to the end of secondary school, normally age 18. The ages for compulsory schooling vary from province to province¹; generally it is required from age 6 or 7 to age 16.

Figure 1 illustrates the similarities and differences in levels of elementary and secondary schooling in Canada. In general, the system runs from pre-school or kindergarten through primary school to secondary (or high) school, up to and including grade 12. One exception is in the province of Quebec, where secondary (general) schooling ends at grade 11 while the stream of vocational training ends at grade 13. Currently, students in Ontario also take an extra year beyond grade 12 to complete the Ontario Academic Courses before entering university.

Elementary and secondary schools in Canada offer very diverse learning programs and are organized under different administrative arrangements. The three main types of administrative arrangements are: (1) provincially funded public schools, (2) private or independent schools, and (3) federally funded aboriginal schools on reserves. Within each of these three categories there is a wide variety of types of schools, with the schools differing in their learning programs, their focus of study or the student population which they target. For example, alternative schools may offer very different learning programs (e.g. for students with special family and supervision needs) from most other public schools. Adult or continuing education centres, schools for the visually or hearing impaired, and institutional schools all have a different focus, targeting specific student sub-populations. Distance learning or correspondence schools also add to the diversity through the method by which their programs are delivered. These are just a few examples of the wide variety of schools that exist in Canada.

The involvement of the provincial governments in education is generally extensive but varies somewhat between provinces. The responsibilities of governments in education include the formation of school districts or school boards, education policies such as curriculum guidelines and student-to-teacher ratios, and the funding of special education programs. In the province of Ontario, the education ministry also recently took on the responsibility of redistributing education taxes to school boards, the selection of certain textbooks and, as in some other provinces, the establishment of province-wide examinations for students in selected grades. Individual school boards are responsible for the local operation and administration of their schools. School boards are the primary agents in collecting administrative information from schools throughout the year, i.e. data on student enrollment, teaching staff, types of education programs and expenditures in education. These administrative data are then passed on to the provincial ministries and eventually to Statistics Canada.

The amount of administrative data collected and the method of data collection can vary between provinces. In the large provinces, the ministries collect very comprehensive data from schools and school boards and fully process them before the computer files are sent to Statistics Canada. In the other provinces, the provincial ministry sends questionnaires developed by Statistics Canada to its school boards (and eventually to schools). The completed questionnaires are collected by the province and returned to Statistics Canada in electronic or paper form, as the case may be.

¹ For brevity, “province” in this paper is taken to include the three territories.

Private or independent schools provide an alternative to the publicly funded schools. They may operate in any province if they meet the general standards for elementary and secondary schools prescribed by that jurisdiction. About 60% of private and independent schools receive some funding assistance from provincial or federal governments, especially for offering heritage and minority language education programs. The degree to which private schools are funded by provincial governments varies from province to province, with some provinces offering no funding at all to private schools. Many of the private and independent schools are also known to the ministries through program certification or curriculum approval. Statistics Canada maintains a list of the private and independent schools in Canada and surveys them directly each year.

Provincial governments have no jurisdiction on aboriginal reserves in most of the country. Schools on these reserves are funded by the federal government department, Indian and Northern Affairs Canada (INAC). Statistics Canada collects data for these schools directly from INAC.

Table 1 shows the distribution of schools and students by province or territory and main type of school in the school year 1996-97. There were about 500 school boards and 16,000 schools in Canada. About 87% of the schools were public schools administered by the provincial education ministries through their school boards. They accounted for 95% of the elementary and secondary students. About 9% of the schools (with 5% of the students) were private or independent schools.

The extensive administrative data generated by the provincial and federal governments have facilitated the task of collecting education statistics in Canada. In particular, the administrative data are informative in describing the inputs to education and the educational attainment of students. We also expect that the quality of these administrative data will continue to improve with the advancement of computer technologies, and that they will continue to form a strong base for the Canadian education statistics program.

3. IMPORTANCE OF A FRAME OF ELEMENTARY AND SECONDARY SCHOOLS IN CANADA

For many decades, the education statistics program at Statistics Canada has focused mainly on publishing statistics on student enrollment (e.g. by age, sex and grade), programs of learning, teaching staff and expenditures in education. In other words, much of the focus was on input factors into education. Education outcomes were mainly measured by graduation statistics. These two types of statistics have been published primarily on the basis of administrative data obtained through provincial ministries of education. Occasionally, employment outcomes were also measured by surveys of graduates (e.g. the National Graduate Survey) and by employment status by educational attainment in the census of population and the Canadian Labour Force Survey. This section discusses an expanded model of the education system in Canada that has been adopted recently in order to reflect research and policy needs in education. This model expands the focus of the education statistics program to environmental and process factors in education, and to the relationships among all factors. The new data needs revealed by this model bring out the need for a sampling frame of elementary and secondary schools in Canada.

The reader is referred to the paper by Drew, Lipps and Hodgkinson (1998) for an explanation of this model and the corresponding survey initiatives supporting the model. Here, we have included excerpts from that paper to explain the model so as to bring out the roles and requirements of the school frame at Statistics Canada.

Figure 2 illustrates the model adopted by Statistics Canada. It identifies the various factors and elements affecting the Canadian education system. This is a simple input-process-output model with the environmental factors added.

Table 1: Elementary and secondary schools by province or territory and type of school, 1996-97.

                                            Public Federal-Indian Affairs                Private                    All
Province                      Schools     Students   Schools     Students   Schools     Students   Schools     Students
Newfoundland & Labrador           460      106,316         0            0         2          178       462      106,494
Prince Edward Island               66       24,537         1           45         3          232        70       24,814
Nova Scotia                       461      163,625        12        1,373        21        2,164       494      167,162
New Brunswick                     375      133,635        10          798        18          821       403      135,254
Quebec                          2,689    1,027,211        36        6,281       322      103,630     3,047    1,137,122
Ontario                         5,107    2,070,103       102       12,792       562       78,843     5,771    2,161,738
Manitoba                          711      195,166        52       15,265        87       13,395       850      223,826
Saskatchewan                      812      196,391        69       13,388        41        2,975       922      212,754
Alberta                         1,678      527,639        58       10,499       173       21,980     1,909      560,118
British Columbia                1,642      607,191       115        5,648       307       55,529     2,064      668,368
Yukon                              29        6,372         0            0         1            6        30        6,378
Northwest Terr. & Nunavut          85       18,047         0            0         0            0        85       18,047
Canada                         14,115    5,076,233       455       66,089     1,537      279,753    16,107    5,422,075

Figure 2: Model of the Education System

Environmental Context/Forces
  National & Global Factors: Fiscal Context; Transformation to Knowledge Based Economy; Globalization of the National Economy
  Community Factors: Average Income in Community; Employment & Unemployment Rates for the Community; Average Educational Attainment in the Community; % of School-Age Population Living in Poverty; Health Status of School-Age Children

Inputs
  Demographics: School age pop. as % of total Population; Mix of Population (Immigrants, Aboriginal, etc.)
  Province & Board: Level of Funding by Prov & School Brd; School Board Resource Staff
  School: School Resources; Teacher's Education & Experience
  Family: Family Size; Parent's Education; Family Income
  Children: Past Academic Achievement; Attitude towards School; Intellectual Ability

Processes
  Province: Curriculum
  School: Use of ICT; School Safety/Discipline; Principal's Leadership; Special Needs Children
  Classroom: Class Size; Teacher's Expectations; Hours of Instruction by Subject; Homework; Teaching Approach

Outcomes
  Labour: Unemployment/Employment rates after graduation; Fit between employment & field of study; Employability Skills
  Education: Graduation Rates (High School, Post-Secondary); Time taken to Complete Education; Academic Achievement
  Social: Life Skills; Health; Citizenship

The various elements of the model and the relationships among them help identify the information needs for education research and policy development in Canada.

Briefly, “environmental context refers to the host of factors which are beyond the direct influence of the education system, but nevertheless impact on it. For example, community factors such as the average income and educational attainment of the community may influence students living in that community, as well as potentially impacting on financing of education. National and global forces exert pressures on the system. The fiscal context of governments has an impact - fiscal restraint and deficit cutting measures place pressure on the education system and its ability to cope with change and other forces acting upon it. Among such other factors are the transition to a knowledge based economy, and the technology revolution.

Education inputs include demographic factors such as the size of the school aged population, financial resources devoted to education, the supply of teachers, schools and resources, student enrollments, family factors such as parents’ education, and student level factors such as innate ability, peer influences, etc.

Educational processes describe how the education system functions, given the inputs it receives. Processes include the provincial curriculum, and school level factors such as the availability and use of Information Communications Technology (ICT) and issues such as technical support, the school environment as it is shaped by the principal’s leadership style, school discipline and safety, and special needs programming. Additionally there are classroom level factors - such as class size and characteristics, ICT both as a subject of study and its use in teaching other subjects, a number of factors relating to the teacher - the fit between specialty and subjects taught, training issues, expectations of students, assignment of homework, and instruction hours by subject.

The model distinguishes three types of outcomes of the education system. The first are education outcomes. These include graduation rates at various levels of education, and assessments of learning achievement - particularly as they relate to the curriculum or expectations of what students at a particular level should know. Second are labour market outcomes. They indicate how individuals fare in the labour market once leaving the education system. Basic measures include employment and unemployment rates of individuals recently leaving the education system by different levels of educational attainment. Other measures could examine how well the education system has prepared individuals for the labour market - for example in developing “employability skills”. A third category of outcomes are social outcomes of education - the role played by the education system in developing rounded and well-adjusted individuals capable of contributing positively to society.”

In the context of this paper, the model identifies the important tasks of collecting information on school factors, especially in relation to the input and process modules, namely information on school resources, teacher’s education and experience, use of ICT, school safety/discipline, principal’s leadership, school neighbourhood and special needs children. In order for policy makers to make decisions which will lead to an improvement in the effectiveness of the education system by improving the outcomes as described in the model, there is a need for nationally comparable information on schools, teachers and educational processes. The availability of a timely school frame with reliable coverage in all parts of the country and all segments of the elementary and secondary school systems would facilitate the collection of nationally comparable data. Undoubtedly, some of the information elements could be collected without using a school frame, e.g. by surveying students or teachers in household surveys, such as the National Longitudinal Survey of Children and Youth. However, these alternative methods are most likely more expensive to implement and pose difficult methodological problems when attempting to produce estimates related to the population of schools or teachers. Consequently, the development of an effective frame of elementary and secondary schools in Canada has become an important strategic initiative in completing the framework of the education statistics program at Statistics Canada.

4. A SCHOOL FRAME USING EXISTING ADMINISTRATIVE DATA

For constructing a school frame at Statistics Canada, an immediate question is whether this can be done using the existing system of administrative data that Statistics Canada possesses on education. In this section, we conclude that the administrative data, as currently obtained from provincial ministries, could not provide an effective school frame.

As mentioned in section 2, Statistics Canada receives administrative data on elementary and secondary education from the provinces throughout the year (e.g. student enrollment, teaching staff, student graduations, minority language programs and education expenditures). In many cases, these data are provided in a large number of computer files. All these administrative data are processed and stored in a database at Statistics Canada. For decades now, this database has served its original functions of data tabulation and analysis quite well.

Can this same database be used as our school frame? Our first concern was the lack of timeliness of these data when used as a school frame. It usually takes a year or two to receive and process these administrative data. These data are initially compiled by each school and reported to the school boards one month after the school year begins. In most provinces, these data are in turn further processed and analyzed for administrative purposes by the school boards and provincial ministries before they are sent to Statistics Canada. Consequently, by the time the data become available, they would be two years old in comparison to the current school and student population. With the special co-operation of the provincial ministries, Statistics Canada has recently attempted to shorten this delay in order to conduct quick surveys of schools. In these cases, our experience indicates that the data could be provided by the ministries at the end of the school year at the earliest. Assuming that this improvement in timeliness can continue, our examination turns to the impact of designing school surveys using the data of the preceding year.

Clearly, previous year data would become useless when major structural changes are taking place in a province. These changes often affect the formation and number of school boards as well as the mission, existence and clientele of individual schools. In 1994, the province of Newfoundland discontinued school boards which had been created on a denominational basis. In 1998, the province of Quebec re-aligned its school boards from a denominational to a linguistic basis. In 1999, the province of Ontario reduced the number of school boards by half through amalgamations. The process of school closure within each board in Ontario is still ongoing today. In these cases, it would be impossible to identify or describe the school population based on previous year data. In view of the major impact that structural changes can have, the school frame system must be prepared to deal with these changes in a timely and effective manner.

In the case where there is no major structural change to the education system, we examine the potential impact of using a one-year-old frame. Of particular interest are frame errors which may cause survey undercoverage or inefficient samples. Survey undercoverage would be caused by unknown birth units (i.e. new schools) or by misclassifying frame units as out-of-scope. In the case of stratified sampling, a technique which Statistics Canada surveys often adopt, sample efficiency can be reduced considerably if frame data fail to form strata of homogeneous units. To examine these concerns, we looked at student enrollments between school years 1996-97 and 1997-98, when few school closures took place and few new schools opened. Student enrollment is often chosen as a key stratification variable in designing school surveys. Figure 3 is a scatter plot showing total enrollment in schools² operating in 1996-97 and 1997-98. Most of the points in the plot lie quite close to the identity line. Since schools tend to be stratified into wide ranges of enrollment level, Figure 3 suggests that one-year changes in total enrollment may not result in many stratum jumpers, except for schools around the stratum boundaries.

² To facilitate subsequent comparison to Figure 2, schools in the provinces of Ontario and British Columbia have been excluded.

Table 2: Effects of 1996-97 stratification data on estimating 1997-98 total enrollment.

Province              Relative CV  Bias with 96-97 stratification (%)  Stratum jumpers (%)  Births (%)  Deaths (%)
Prince Edward Island  1.02         -1.5                                1.5                  1.4         2.9
Nova Scotia           1.08         -1.6                                2.2                  3.2         2.2
New Brunswick         1.14         -0.5                                5.0                  1.3         6.7
Quebec                1.06         -1.2                                6.2                  2.5         2.2
Manitoba              1.07         -2.1                                6.9                  4.8         4.4
Saskatchewan          1.08         -0.4                                5.3                  1.4         2.8

Table 3: Effects of 1996-97 stratification data on estimating (1997-98) 15-year-old enrollment.

Province              Relative CV  Bias with 96-97 stratification (%)  Stratum jumpers (%)  Births (%)  Deaths (%)
Prince Edward Island  1.01         -1.7                                11.1                 3.7         3.7
Nova Scotia           1.04         -0.4                                14.4                 3.9         4.9
New Brunswick         1.26         -0.3                                14.2                 8.5         26.0
Quebec                1.03         -0.7                                7.1                  3.5         4.5
Manitoba              1.17         -3.1                                23.9                 15.9        13.5
Saskatchewan          1.31         -0.28                               19.5                 4.8         4.9

Table 2 presents the effects of sampling from a one-year-old frame in terms of relative CV, bias and proportions of stratum jumpers, birth units and death units. The relative CV is the ratio of the sampling CV for estimating current (1997-98) total enrollment based on the old (1996-97) frame to that based on the current (1997-98) frame. The CVs are calculated assuming optimum sample allocation to strata. The strata were defined as proposed by Singleton (1998). The stratum definition incorporated the location, type and grade range of school as well as total enrollment. For estimating total enrollment (or variables which correlate strongly with it), the results indicate a noticeable erosion in sampling efficiency (CV) corresponding to the percentage of stratum jumpers and deaths when using an out-of-date frame, though the magnitude of erosion may perhaps still be acceptable. The bias due to birth units is generally small compared to the CV erosion. The province of New Brunswick has the biggest jump in the expected CVs, though its combined percentage of stratum jumpers and deaths is comparable to that of Manitoba. The explanation for this becomes more apparent in our discussion later on. Our overall observation is that, for school surveys collecting data which correlate well with total enrollment, using frame data from the preceding year may not result in a great loss of sample efficiency or in large coverage errors.
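The relative CV described above can be sketched in a few lines of Python. This is a minimal illustration and not the computation used for Table 2: it assumes simple random sampling within strata, Neyman (optimum) allocation, no finite population correction, strata formed from enrollment size classes only (the strata in the paper also used location, type and grade range), and only schools present in both years. The function names, cut-offs and enrollment figures are all hypothetical.

```python
import statistics
from collections import defaultdict


def stratum_of(enrollment, boundaries):
    """Index of the size class containing `enrollment` (boundaries are lower cut-offs)."""
    return sum(enrollment >= b for b in boundaries)


def neyman_cv(y, strata, n):
    """CV of the stratified estimate of the total of `y` under Neyman allocation.

    Assumes simple random sampling within strata and ignores the finite
    population correction, so that Var = (sum_h N_h * S_h)^2 / n.
    """
    groups = defaultdict(list)
    for value, h in zip(y, strata):
        groups[h].append(value)
    # pstdev is used so that single-school strata contribute 0 rather than fail.
    weighted_sd = sum(len(v) * statistics.pstdev(v) for v in groups.values())
    return (weighted_sd / n ** 0.5) / sum(y)


def relative_cv(old_enrol, cur_enrol, boundaries, n):
    """Ratio of the CV with old-frame strata to the CV with current-frame strata."""
    old_strata = [stratum_of(x, boundaries) for x in old_enrol]
    cur_strata = [stratum_of(x, boundaries) for x in cur_enrol]
    # In both cases the variable being estimated is current enrollment.
    return neyman_cv(cur_enrol, old_strata, n) / neyman_cv(cur_enrol, cur_strata, n)


# Hypothetical enrollments for ten continuing schools (not data from the paper).
old_enrol = [80, 95, 120, 240, 260, 310, 480, 520, 610, 900]
cur_enrol = [85, 280, 118, 235, 290, 330, 470, 505, 640, 880]
boundaries = [100, 300, 500]  # hypothetical size-class cut-offs

print(round(relative_cv(old_enrol, cur_enrol, boundaries, n=4), 3))
```

Under these simplifying assumptions the sample size cancels out of the ratio, so the relative CV depends only on how well each set of strata groups schools of similar current enrollment. In the toy data one school moves from the smallest size class to the next, which is enough to push the ratio above 1.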

Unfortunately, the same conclusion may not hold when total enrollment is not the desired stratification variable, especially in cases where sampling is targeted at a specific subpopulation of schools and students. Such is the case with the Canadian Youth in Transition Survey³ (YITS) which targets 15-year-old students in its school sample. Figure 4 is a plot of the number of 15-year-old students in schools between 1996-97 and 1997-98 (i.e. same as Figure 3 except for 15-year-old students). The use of the enrollment for a single year of age, as opposed to total school enrollment, as a stratification variable would make it necessary to have strata which are much narrower, especially for the smaller schools. With the same construction as Table 2, Table 3 shows the impacts when we simulate the YITS stratification using the old (1996-97) and current (1997-98) frame data. The results show a significant frame degradation in all cells. Frame stratification using preceding year data could lead to a large number of stratum jumpers and deaths, resulting in a significant loss of sample efficiency. In the case of New Brunswick and Saskatchewan, the high levels of births and deaths may indicate important structural changes to schools with 15-year-old students that were overlooked when looking at total enrollment only.

³ The Youth in Transition Survey, a longitudinal survey initiative of Statistics Canada and the federal ministry of Human Resources Development Canada, examines in depth issues related to youth’s transition between education and the world of work. In its first cycle, the sample cohort of 15-year-old students was integrated with the OECD initiative of the Program for International Student Assessment (PISA).
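The frame-change categories reported in Tables 2 and 3 (stratum jumpers, births and deaths) and the undercoverage bias caused by births can be illustrated with a short Python sketch. The paper does not state the denominators behind its percentages, so the choices below are assumptions: jumpers and deaths are expressed relative to the old frame and births relative to the current frame. The school identifiers, enrollments and cut-offs are hypothetical.

```python
def stratum_of(enrollment, boundaries):
    """Index of the size class containing `enrollment` (boundaries are lower cut-offs)."""
    return sum(enrollment >= b for b in boundaries)


def compare_frames(old, current, boundaries):
    """Compare two frames given as dicts mapping school id -> enrollment."""
    births = current.keys() - old.keys()      # on the current frame only
    deaths = old.keys() - current.keys()      # on the old frame only
    continuing = old.keys() & current.keys()  # on both frames
    jumpers = {
        s for s in continuing
        if stratum_of(old[s], boundaries) != stratum_of(current[s], boundaries)
    }
    current_total = sum(current.values())
    missed = sum(current[s] for s in births)  # enrollment not covered by the old frame
    return {
        # Denominators are assumptions, not taken from the paper.
        "jumper_pct": 100 * len(jumpers) / len(old),
        "birth_pct": 100 * len(births) / len(current),
        "death_pct": 100 * len(deaths) / len(old),
        # Relative bias of the estimated total if births cannot be sampled.
        "coverage_bias_pct": -100 * missed / current_total,
    }


# Hypothetical frames for two consecutive school years.
old_frame = {"A": 90, "B": 110, "C": 250, "D": 480, "E": 700}
cur_frame = {"A": 105, "B": 115, "C": 240, "E": 690, "F": 60}

result = compare_frames(old_frame, cur_frame, boundaries=[100, 300, 500])
print(result)
```

With narrower strata, as required when stratifying on the enrollment of a single year of age, the same year-to-year changes cross more boundaries, which is one way to read the higher jumper rates in Table 3.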

In short, we conclude that the current delay of nearly one year after the school year in obtaining administrative data from provincial ministries is most undesirable if we rely on them as the school frame. With special efforts, provincial ministries can provide Statistics Canada with the required administrative data by the end of the school year, shortening the delay by 12 months. Under this scenario, the resulting school frame may only be of acceptable quality for designing surveys when survey variables correlate well with total student enrollment. However, such a frame may not be useful for designing surveys which focus on specific segments of the school or student population. It is also important to note that we must be able to deal with massive changes to the school frame in the event that the school system undergoes major structural changes. The need for a timely and accurate school frame is most acute in view of the narrow window of time within the school year for surveying students and staff without affecting school activities.

5. THE DEVELOPMENT OF A SCHOOL FRAME IN CANADA

In view of the potentially high risks in using a school frame based on administrative data currently obtained from provinces, Statistics Canada has decided to adopt a new strategy for implementing its school frame. This plan will be adjusted if necessary as more experience is gained from field operations.

The school frame carries key information about each school board and each elementary or secondary school in Canada. Information is kept at both the board and school levels to reflect their relationship and to facilitate survey follow-up. Conceivably, a future survey may also want to select school boards as primary sampling units when a multi-stage sample design is warranted. For every school board and school on the frame, the data to be included (i.e. collected or generated) are geocoding of the board or school, board identification and address (e.g. board number, board name, building number, building name, post box, rural route number, street, city and postal code), contact information (e.g. name and title of contact person, telephone and fax numbers, and email and website addresses), and other key characteristics of the board/school (e.g. first and second languages, grade range, student enrollment, funding source, type of board, religion, and start and closing dates).

As mentioned above, to be useful, the school frame needs to be timely and up-to-date. The frame must be in place shortly after the school year has started so that schools can be surveyed without affecting school activities (e.g. school breaks and examinations). To meet this requirement, Statistics Canada will be collecting the frame data directly from school boards as opposed to provincial ministries. School boards would have no difficulty in supplying the frame data since they represent only a small part of the administrative data in their school board report to provincial ministries by October 1 each year. This strategy requires contacting only about 500 school boards and is far more feasible than contacting about 16,000 schools in Canada. To start off, all school boards were contacted in April 2000 for constructing the initial school frame. This first round of frame construction obtained responses for over 90% of schools in the first six weeks. The school boards will be followed up twice a year (i.e. fall and spring) for updating the school frame.

In the past few years several new initiatives have been put in place at Statistics Canada to improve the communication and foster closer collaboration between Statistics Canada and the provinces. This increase in collaboration and communication should alert Statistics Canada to any changes in the school systems as they happen. When major changes occur in the school system, Statistics Canada will approach provincial ministries where necessary to update the list of school boards in August (just before school starts). The usual frame updates in the fall and spring will then follow as mentioned above. Moreover, whenever the need arises, additional frame data will be collected from school boards for mounting surveys on specific segments of the school or student population. As an example, the number of students of a specific age (e.g. 15) in all schools could be collected for designing possible future cycles of the Program for International Student Assessment (PISA).


5. SUMMARY

In Canada, education is a provincial responsibility, with a ministry of education in each province. With close collaboration and partnership of provincial education ministries, Statistics Canada has adopted a model of the education system and the corresponding framework of the education statistics program which would meet the research and policy development needs in Canada. The model brings out the strategic importance of implementing a sampling frame of elementary and secondary schools, which will enable efficient studies related to school and school neighbourhood factors affecting children’s learning.

The Canadian education system is very diverse in the education programs offered across the country. Fortunately, the task of collecting education statistics has been made easier by the wealth of administrative data available due to the extensive involvement of governments in education. However, the delay in obtaining and processing administrative data has made them of limited use as a school frame. To overcome this problem, Statistics Canada is taking new initiatives to implement a timely and accurate sampling frame of elementary and secondary schools in Canada. With the active participation of provincial ministries and school boards, the success of this development strategy is assured for 90% of the schools (and 95% of the students). With respect to the remaining schools (i.e. private and independent schools), Statistics Canada will continue to survey them directly every year. To ensure maximum coverage, additional data sources such as the Business Register of Statistics Canada, the Canadian Almanac and the MDR’s directory of schools will be exploited in identifying new schools.

ACKNOWLEDGEMENT

We would like to thank all our colleagues in Statistics Canada, whose suggestions and efforts helped improve this paper. In particular, our thanks go to Doug Drew and Raynald Lortie, Centre for Education Statistics, for sharing their insight on the subject of this paper.

REFERENCES

Brackstone, G. J. (1988). “Statistical Uses of Administrative Data: Issues and Challenges,” Proceedings of the 1987 International Symposium on Statistical Uses of Administrative Data, Statistics Canada, Ottawa, Ontario.

Drew, D., Lipps, G., and Hodgkinson, D. (1998). “New Information Initiatives in Canada to Support Policy Issues in Education,” paper presented at the Fourth International Meeting, Statistical Information Trustees of Mexico, Mexico City, September 1998.

Singleton, J. (1998). “Methodological Overview: Sample of Schools,” Statistics Canada internal document.


EVALUATING THE COVERAGE OF THE U.S. NATIONAL CENTER FOR EDUCATION STATISTICS’ PUBLIC AND PRIVATE SCHOOL FRAMES USING DATA FROM THE NATIONAL ASSESSMENT OF EDUCATIONAL PROGRESS

Hyunshik Lee, John Burke, and Keith Rust, Westat, Inc.
Hyunshik Lee, Westat, Inc., 1650 Research Blvd., Rockville, MD 20850

[email protected]

ABSTRACT

The National Center for Education Statistics (NCES) maintains databases of public and private elementary and secondary schools in the United States. These are called the Common Core of Data (CCD) and the Private School Survey (PSS), respectively. These databases are used as sampling frames for NCES surveys of schools and students. There is always a time lag between the reference period of a frame and the actual data collection. During that time, critical school variables, such as in-scope status, enrollment and addresses, can change. These changes create potential issues of coverage for these sampling frames. This presentation will report on an evaluation exercise of these NCES frames using data from a commercially available database of schools, Quality Education Data (QED), and information from a fielded survey, the National Assessment of Educational Progress (NAEP). The study will evaluate the impact of changes in critical CCD and PSS variables that are used to construct sampling frames of schools.

Key Words: Eligibility, Contact information, Survey disposition, Enrollment

1. INTRODUCTION

The National Assessment of Educational Progress (NAEP) conducts annual (for national assessments) and biennial (for state-by-state assessments) surveys of elementary and secondary school students, in subjects such as mathematics, science, reading, and writing. The samples are two-stage (for the state-by-state assessments) or three-stage (for the national assessments), in which the final stage sample is of students, and the penultimate stage is of schools. Both public and private schools are included.

The key to obtaining high quality coverage of the populations of students of interest lies principally in obtaining complete and accurate lists of public and private schools throughout the United States. The National Center for Education Statistics (NCES) maintains a publicly available file of public schools, and also maintains a file of private schools that can be used as frames for NCES-sponsored surveys, such as NAEP. Thus an important issue for planning and conducting each cycle of NAEP is the quality of these files, both in absolute terms, and in comparison to alternative sources for NAEP frames.

In this paper we combine information from the NCES school files, an alternative list of schools maintained by Quality Education Data, Inc., and information obtained during the 1998 national NAEP assessment, to conduct a limited evaluation of the NCES school files that were available at the time of NAEP sample selection in the spring of 1997. Sections 2 through 5 provide the necessary background information that enables the reader to understand the analyses that are presented later, and their findings and limitations. In Section 2 we discuss the NCES public school file, known as the Common Core of Data (CCD), and in Section 3 we similarly discuss the NCES list of private schools, developed for the Private School Survey (PSS), conducted by NCES. In Section 4 we discuss alternative lists of public and private schools available from commercial organizations. Section 5 briefly describes the design of NAEP, and its sampling frames. Having established the necessary background, Sections 6 through 8 then describe the analyses that were undertaken, the results of these, and the conclusions that we have drawn. Section 6 covers the evaluation of CCD, the NCES public school list, while Section 7 discusses the PSS, the private school list. In Section 8 we give some summary conclusions.

2. NCES PUBLIC SCHOOL FRAME: COMMON CORE OF DATA (CCD)

The Common Core of Data (CCD) is a program of the National Center for Education Statistics (NCES) of the U.S. Department of Education. Its primary goal is to annually produce a comprehensive, national statistical database of all public elementary and secondary schools and school districts. The NCES website states that the objectives of the CCD program are twofold: (1) to provide an official listing of public elementary and secondary schools and school districts in the nation, which can be used to select samples for other NCES surveys; (2) to provide basic information and descriptive statistics on public elementary and secondary schools and schooling in general.

The program consists of five surveys, of which three are nonfiscal and the other two are fiscal. Most of the data are collected from the state education agencies, which report to the surveys based on their administrative records. The surveys cover 50 states, the District of Columbia, Department of Defense Schools, and the outlying areas. One of the three nonfiscal surveys produces a public school frame file, called the Public School Universe, which is used in the study. This school file contains information on all public elementary and secondary schools (approximately 89,000) in operation during a school year, including school name, address, phone number, school type, grade span, enrollment by grade, student race/ethnicity, the number of classroom teachers, etc.

The CCD products are available free of charge and the public school files are available on the NCES web site starting with the 1987/88 school year. The CCD public school file (the Public School Universe) will be referred to as CCD in this paper.

3. NCES PRIVATE SCHOOL FRAME: PRIVATE SCHOOL SURVEY (PSS)

The Private School Survey (PSS) is the private school analogue to the CCD universe survey of public schools, although in contrast to CCD, PSS is conducted only every other year. In addition to basic name, address and telephone information, the survey captures student enrollment by grade and by race/ethnicity, counts of teachers employed, type of school/program, religious orientation or purpose, and membership in private school organizations. There is no PSS equivalent to the CCD education agency file, not even for Roman Catholic schools, most of which are organized by dioceses much as public schools are organized by local school district.

PSS uses a dual frame approach consisting of a list frame and a PSU-based area sample frame. Every other year the previous PSS school universe file is freshened with newly opened schools found by canvassing a broad variety of sources. National membership lists are obtained from private school associations and religious denominations. State departments of education are asked to provide private school lists, as are state health and recreation departments. These lists are unduplicated against the existing PSS universe file and newly identified schools are added to the list frame.

The more costly frame building efforts are limited to the area frame, which consists of 124 geographically defined primary sampling units (PSUs) selected to provide representation by Census region, metro/nonmetro status and percent of students enrolled in private schools. The area sample list building is more intensive and attempts to identify all private schools within the PSUs. The primary source of new school listings is the Yellow Pages, although non-Roman Catholic religious institutions, local education agencies, chambers of commerce and local government offices are contacted as well. As with the list frame, potential new schools are unduplicated against the area frame before being added.

PSS is a mail survey. Schools that do not mail back a completed questionnaire are followed up by telephone. The PSS universe file contains records for survey nonrespondents, as well as for respondents, although not much data is available for nonrespondents beyond name and address. In terms of maintaining the best possible coverage, however, the retention of PSS nonrespondents on the file is an advantage to other surveys using the PSS file as a private school sampling frame.

4. OTHER SCHOOL FRAME DATA SOURCES

There are two major commercially available school frame data sources: Quality Education Data and Market Data Retrieval of Dun and Bradstreet. While these firms provide all needed frame information to conduct a school survey, the main emphasis is on marketing rather than on statistical aspects. Since we did not study the latter database, we omit its description, which can be found in Hamann (2000).

QED’s National Education Database is a comprehensive education database covering all U.S. and Canadian educational institutions, including both public and nonpublic elementary and secondary schools, post-secondary institutions, and school districts. It has been produced since 1981 by Quality Education Data, Inc., headquartered in Denver, Colorado, which is a research and database company focused exclusively on education. The QED on-line database with multiple files and tables is the core data resource from which production files of various forms and sizes are built on a monthly basis. It is updated continually, in contrast to CCD, which has an annual production cycle. One of the primary data sources for QED updates is CCD; the company also does its own data collection using mail surveys, and updates are telephone-verified. QED contains almost all the data items that are contained in CCD and more, such as information on technology (cable, computer, video, satellite dish, on-line capability, etc.), and an education climate index and Orshansky wealth indicator in the case of district level data. Because of this close relation between the QED database and CCD, the QED database maintains the CCD identification numbers for most records on QED.

5. NATIONAL ASSESSMENT OF EDUCATIONAL PROGRESS (NAEP) AND ITS SAMPLING FRAMES

The purpose of the 1998 NAEP was to assess the achievement of U.S. public and private school students in civics, reading and writing at grades 4, 8 and 12. Although 1998 NAEP also involved state-by-state assessments, the focus of this description and of the analyses that follow is the National assessment alone. National NAEP employed a multistage probability sample design, where counties or groups of counties were the first-stage sampling units, elementary and secondary schools were the second-stage units, and the third and final stage involved the selection of students within schools.

A total of 94 primary sampling units (PSUs) were selected, and within them a sample of 2,683 public and private schools. Of the 883 schools selected for the grade 4 assessment, 733 schools actually participated; for grade 8, 761 schools participated out of 953 sampled; and for grade 12, 608 schools out of 847. Various blocks or packages of exercises were administered in these schools to 36,104 4th graders, 48,797 8th graders, and 48,588 12th graders, for a total of 133,489 assessed students overall.

Within each PSU a list of schools was constructed using two sources. Public, Bureau of Indian Affairs, and Department of Defense Education Activity schools were obtained from the March 1997 version of QED. Catholic and other private schools were obtained from the Private School Survey (PSS) developed for the NCES 1995-1996 Schools and Staffing Survey.

In order to be eligible for NAEP, a school has to be a regular elementary or secondary school and to offer one of the three targeted grades: 4, 8 or 12. Regular schools are schools with students who are classified as being in a specific grade (as opposed to schools having only “ungraded” classrooms). This includes statewide magnet schools and charter schools, but strictly vocational schools where students receive their academic instruction at another location are not considered to be regular schools. Note that although there is a PSS survey item where respondents can identify their school as regular or other than regular, that variable was not used to subset the file during private school frame construction.

A school's grade span is the range from the lowest to the highest grade for which the school has nonzero student enrollment, and a grade span variable can be found on both the QED and PSS files. A school record whose grade span did not cover the NAEP-targeted grade was not included in the school frame for that grade. For private schools the issue of grade span was a bit more complicated. The 1995-96 PSS questionnaire design permitted a school to report both that it offered a particular grade and also that there were no students enrolled at that grade. To avoid coverage problems NAEP redefined the grade span for private schools in terms of whether the grade was offered or had nonzero enrollment. A school appeared in the frame for a particular grade without regard to its eligibility status for either of the two other designated grades. As a result there is considerable overlap among the three grade-level frames.
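The grade-level frame rules above can be illustrated with a short sketch. It is a reading of the description, not the NAEP production procedure: the record layout, the field names and the coding of kindergarten as grade 0 are assumptions made for the example, and the private school rule is implemented as a span running from the lowest to the highest grade that is either offered or has nonzero enrollment.

```python
TARGET_GRADES = (4, 8, 12)


def public_span(enrollment):
    """Span of grades with nonzero enrollment (public school rule)."""
    grades = [g for g, n in enrollment.items() if n > 0]
    return (min(grades), max(grades)) if grades else None


def private_span(offered, enrollment):
    """Span of grades that are offered OR have nonzero enrollment."""
    grades = set(offered) | {g for g, n in enrollment.items() if n > 0}
    return (min(grades), max(grades)) if grades else None


def grade_frames(schools):
    """Build one frame (list of school ids) per targeted grade."""
    frames = {g: [] for g in TARGET_GRADES}
    for s in schools:
        if s["sector"] == "public":
            span = public_span(s["enrollment"])
        else:
            span = private_span(s["offered"], s["enrollment"])
        if span is None:
            continue
        for g in TARGET_GRADES:
            # A school enters a grade frame regardless of its status
            # for the other two targeted grades.
            if span[0] <= g <= span[1]:
                frames[g].append(s["id"])
    return frames


# Hypothetical records; kindergarten is coded as grade 0.
schools = [
    {"id": "P1", "sector": "public", "enrollment": {g: 30 for g in range(0, 9)}},
    {"id": "P2", "sector": "public", "enrollment": {g: 90 for g in range(9, 13)}},
    # Private school that reports offering grade 8 with no students enrolled.
    {"id": "R1", "sector": "private", "offered": {7, 8}, "enrollment": {7: 20, 8: 0}},
    {"id": "R2", "sector": "private", "offered": {11}, "enrollment": {11: 10, 12: 15}},
]

frames = grade_frames(schools)
print(frames)
```

School P1 appears in both the grade 4 and grade 8 frames, which is the overlap among grade-level frames noted in the text, and R1 is kept in the grade 8 frame even though it reports no grade 8 students.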

For each grade-level sample, schools were selected without replacement and with probabilities proportional to an assigned measure of size. The measure of size was a function of the number of grade-eligible students in that school, estimated as total enrollment divided by the grade span. For example, a school offering grades K to 12 would have grade-eligible students estimated as 100 if the total school enrollment was 1,300. Further details of school sampling, subject and session allocation and student sampling are omitted, as they are not relevant to the frame quality issues addressed by this paper.
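The estimate of grade-eligible students in the example above is simple arithmetic and can be written out as follows. The function name and the coding of kindergarten as grade 0 are assumptions for illustration; the function that turns this estimate into the final measure of size is not given in the text and is not reproduced here.

```python
def estimated_grade_enrollment(total_enrollment, lowest_grade, highest_grade):
    """Estimate grade-eligible students as total enrollment per grade.

    The grade span is counted as the number of grades from the lowest
    to the highest grade inclusive (kindergarten coded as grade 0).
    """
    n_grades = highest_grade - lowest_grade + 1
    return total_enrollment / n_grades


# The K to 12 example from the text: 13 grades, 1,300 students.
k12_estimate = estimated_grade_enrollment(1300, 0, 12)
print(k12_estimate)  # 100.0
```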

6. COMPARISON OF NCES PUBLIC SCHOOL FRAME WITH OTHER FRAMES

To evaluate the quality of CCD as a sampling frame for school surveys in comparison with other frames, which are as comprehensive as CCD in terms of coverage and data content, our study strategy was to use the NAEP survey results, treating them as true, and to compare them with frame data to check their accuracy. For the public school frame, this strategy excludes MDR since QED was the main data source to build the 1998 NAEP sampling frame for public school samples.

This study strategy is cost-efficient since there is no need for extra data collection. However, it comes with some limitations since the study results are subject to sampling error, nonsampling error, and matching error. Nonresponse as a source of nonsampling error can be important if the nonresponse pattern is correlated with frame data sources. We believe this is not the case for our study and that the other errors are small enough to draw meaningful conclusions, especially in terms of the relative merit of the competing frames.

When comparing different sampling frames for a survey, the main issues to be considered are coverage, breadth of available data items, and accuracy of the frame data. Since all the databases (CCD, PSS, QED) we compared in the study have all essential frame data such as contact information, student enrollment information, school type, grade span, and student race/ethnicity characteristics, we focus on the issues of coverage and accuracy of frame data.

NAEP has two main programs, national and state assessments, each of which has two components, the public school assessment and private school assessment. The study was carried out separately for these two assessments since their sampling frames were different. In this section we report some of the results of the study on the public school frame using the data from the 1998 NAEP national program (the study results for the private school frame will be presented in Section 7). Even though the study was carried out using both national and state samples and the state program had a much larger sample size than the national program, in this paper we present only the results from the national program since the state results cannot be extended to the whole universe due to the fact that not all states participated in the state program.

We used the 1998 NAEP data and the 1995/96 version of CCD. The timeliness of the CCD files has been improved greatly in recent years and this version would have been available for the 1998 survey year if CCD had been released under the current production schedule. In reality, only the 1994/95 version of CCD was available at the time of frame building for the 1998 NAEP public school samples, which started in April 1997. The use of the 1995/96 CCD file in the study instead of the 1994/95 version would give more favorable results for CCD and also reflect current and future circumstances under which NAEP and other school surveys would be conducted. The 1998 NAEP public school sampling frame was built using the March 1997 version of QED. From now on, we will refer to QED and CCD without a version indicator to mean those versions used in the study unless specified otherwise.

Even though QED has a variable that links to CCD, not every NAEP sample school could be matched to a record on CCD. However, a task that matched the NAEP data from 1990 to 1998 to the 1995/96 version of CCD was carried out prior to this study (see Dymowski, 1998, for detail). The task used matching software that was developed for the task and achieved a very high match rate. We used the result of this work, and the weighted match rate for the 1998 NAEP public schools was about 98 percent.

The disposition information collected during school recruitment provides valuable information on school eligibility irrespective of the school’s participation in the assessment. It includes information on whether a sampled school is open or closed; if it is open, whether it is regular or not; and if it is regular, whether it has the sampled grade or not. Table 6.1 presents the weighted distribution of the different disposition codes for the national samples. The confidence intervals were computed by Wilson’s score method (Newcombe, 1998) with the effective sample size, chosen for its superior performance.
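For readers unfamiliar with the method, the following is a minimal sketch of a Wilson score interval in which the nominal sample size is replaced by an effective sample size. It is not the code used for Table 6.1. In particular, the effective sample size is taken here as the nominal sample size divided by a design effect, and the design effect shown is a hypothetical value, since the design effects used in the study are not reported in this paper.

```python
from math import sqrt


def wilson_interval(p_hat, n_eff, z=1.96):
    """Wilson score interval for a proportion.

    p_hat: estimated (weighted) proportion, between 0 and 1.
    n_eff: effective sample size (assumed here to be n / design effect).
    z:     normal quantile; 1.96 gives an approximate 95% interval.
    """
    denom = 1 + z * z / n_eff
    centre = (p_hat + z * z / (2 * n_eff)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n_eff + z * z / (4 * n_eff * n_eff)) / denom
    return centre - half, centre + half


# Illustration with a hypothetical design effect of 2.0.
n = 532          # hypothetical number of sampled schools
deff = 2.0       # hypothetical design effect
lower, upper = wilson_interval(0.975, n / deff)
print(round(100 * lower, 1), round(100 * upper, 1))
```

Unlike the usual normal approximation, the interval is asymmetric around the estimate and stays within 0 and 1, which matters for proportions close to the boundary such as those in Table 6.1.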

A small percentage of sampled schools (selected from the QED-based frame) were found to be nonregular (0-3.3%), contrary to what QED indicated on the frame. Another small percentage (0.4-1.5%) of sample schools were found to be closed. Among open regular schools, a smaller percentage (0.2-1%) of schools were found ineligible because they did not have the sampled grade. Overall, the weighted eligibility rates range from 95% (for the Grade 8 sample) to 99% (for the Grade 12 sample).

The NAEP samples were matched to CCD and the weighted match rates were 98.7, 96.1, and 99.3%, respectively, for Grades 4, 8, and 12. This means that 0.7-3.9% of NAEP sample schools were not on CCD. Among those found on CCD, 92.4-96.6% of schools were coded as eligible on CCD. This implies that if CCD is used as the sampling frame, the sample starts with 4-7% less coverage than when QED is used as the frame, if one assumes that QED includes all eligible schools on CCD. This is not an unreasonable assumption since QED uses CCD as its primary data source.

We now consider the school coverage by CCD status of NAEP sample schools coded as open with sampled grade, shown in Table 6.2. Over 94% of schools were found to be on CCD with the sampled grade. The school coverage tells only a part of the story since the ultimate survey unit is the student, not the school. So it is useful to look at the student coverage as well, also shown in Table 6.2. The student coverage is in general better than the school coverage, which means that the schools missed by CCD tend to be small.
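The relationship between school coverage and student coverage can be made concrete with a small sketch. It assumes that school coverage is a ratio of summed school weights and that student coverage is the same ratio with each school weight multiplied by the school's grade enrollment; the exact estimator behind Table 6.2 is not spelled out in this paper, and the records below are hypothetical.

```python
def weighted_coverage(schools):
    """Return (school coverage, student coverage) as proportions.

    schools: list of (school_weight, grade_enrollment, on_frame) tuples.
    School coverage weights each school by its sampling weight; student
    coverage weights each school by weight times enrollment (assumed).
    """
    school_total = sum(w for w, _, _ in schools)
    school_covered = sum(w for w, _, on in schools if on)
    student_total = sum(w * e for w, e, _ in schools)
    student_covered = sum(w * e for w, e, on in schools if on)
    return school_covered / school_total, student_covered / student_total


# Hypothetical sample: the one school missing from the frame is small.
sample = [
    (10.0, 80, True),
    (10.0, 60, True),
    (10.0, 20, False),
]

school_cov, student_cov = weighted_coverage(sample)
print(round(100 * school_cov, 1), round(100 * student_cov, 1))
```

When the schools missing from the frame are smaller than average, the student coverage exceeds the school coverage, which is the pattern reported in Table 6.2.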


Table 6.1. Distribution of the dispositions of the 1998 NAEP sample public schools

Grade  Disposition              Freq  Unweighted (%)  Weighted (%)  95% CI lower  95% CI upper
4      Open with sampled grade   522            98.1          97.5          95.3          98.7
4      Open w/o sampled grade      5             0.9           1.0           0.4           2.5
4      Not a regular school        0             0.0           0.0           N/A           N/A
4      Closed                      5             0.9           1.5           0.6           3.6
8      Open with sampled grade   497            98.0          94.9          88.7          97.8
8      Open w/o sampled grade      2             0.4           0.4           0.1           1.5
8      Not a regular school        3             0.6           3.3           1.0          10.2
8      Closed                      5             1.0           1.3           0.4           3.8
12     Open with sampled grade   527            99.1          99.2          97.8          99.7
12     Open w/o sampled grade      1             0.2           0.2           0.0           1.2
12     Not a regular school        1             0.2           0.2           0.0           1.2
12     Closed                      3             0.6           0.4           0.1           1.5

Table 6.2. School and student coverage by CCD status of 1998 NAEP sample public schools with their disposition coded as open with sampled grade (weighted estimates in terms of school weight)

Grade  CCD status                 Freq  School (%)  Student (%)  School 95% CI  Student 95% CI
4      On CCD with sampled grade   501        96.5         96.6  (94.9, 97.6)   (95.0, 98.2)
4      On CCD w/o sampled grade     11         2.2          1.8  (1.2, 4.1)     (0.6, 3.0)
4      Not on CCD                   10         1.4          1.6  (0.7, 2.7)     (0.6, 2.6)
8      On CCD with sampled grade   470        94.1         95.4  (90.0, 96.6)   (93.2, 97.6)
8      On CCD w/o sampled grade     14         3.5          2.3  (1.7, 7.2)     (0.9, 3.7)
8      Not on CCD                   13         2.4          2.3  (1.2, 4.6)     (0.9, 3.7)
12     On CCD with sampled grade   517        96.8         98.3  (92.8, 98.6)   (97.1, 99.5)
12     On CCD w/o sampled grade      7         2.7          1.2  (1.0, 6.8)     (0.0, 2.4)
12     Not on CCD                    3         0.5          0.5  (0.2, 1.6)     (0.0, 1.1)

Next, we examine the accuracy of contact information on QED and CCD against the information collected during NAEP field operations. We consider this using only cooperating non-substitute schools, and thus the sample sizes for this analysis are smaller than for the previous analysis of the coverage issues. School names and addresses on QED are generally fairly accurate: fewer than 5% of them were found to be inaccurate. On the other hand, the telephone numbers were found to be inaccurate more often. About 6-10% of area codes and 4-6% of local numbers were faulty. As explained in Pierkarski et al. (1999), the ever-increasing demand for new phone numbers is in general the main reason why area codes change more frequently than local phone numbers, even when a school does not change location.

The results for CCD are much worse than those for QED. School names were incorrect on CCD 16-22% of the time, depending on the sample grade. Addresses are even worse, with 20-25% faulty rates. In many cases, CCD gives only mailing addresses with a wrong P.O. Box number, and CCD school names miss important descriptions such as “elementary.” Since exact matching of names and addresses from two different sources was not possible, we applied certain matching criteria, and thus the results could be different if the criteria are changed. However, the relative merits of different frames would not be much affected since the same criteria were applied to both QED and CCD.

Moreover, it is clear that QED maintains better contact information than CCD when we compare the telephone numbers, since there is no such ambiguity in matching phone numbers as there is for names and addresses. On CCD, the faulty phone number rates were 10-15% for the area code and 14-16% for the local number.

Lastly, we compare enrollment data from QED and CCD with the field collected enrollment data. NAEP uses an estimated enrollment to determine the size measure for probability proportional to size (PPS) sampling. The estimated enrollment is calculated as the average grade enrollment (the total school enrollment divided by the grade span, that is, the number of grades from the lowest grade to the highest grade).


Even though individual grade enrollment is available on QED (and CCD), it has not been used directly because it fluctuates too much over time. On the other hand, the estimated enrollment may be too inaccurate. Therefore, it was a good opportunity to evaluate not only the frame information on enrollment on QED and CCD but also the NAEP procedure to determine the size measure.

We first examine the significance of the differences between the field collected enrollment data and the estimated and individual grade enrollment data from QED and CCD. Student's t-test was performed to test whether or not the weighted average differences are significantly different from zero, as shown in Table 6.3. The differences for the estimated enrollment used by NAEP were significant at the 5% level for all three grade samples and were especially pronounced for the Grade 12 sample.

Table 6.3. Comparison of mean grade enrollment obtained from 1998 NAEP field operations with mean estimated grade enrollment from QED and CCD, and with mean grade enrollment from QED and CCD

Grade  Sample size  NAEP mean  Other enrollment data source   Mean  t-value  P-value (%)  Significant at 5%
4              422       72.2  Estimate from QED              75.8     -4.1          0.0  Yes
4              422       72.2  QED grade enrollment           71.5      0.6         52.6  No
4              422       72.2  Estimate from CCD              72.2      0.1         92.6  No
4              422       72.2  CCD grade enrollment           72.9     -0.6         54.9  No
8              374      168.9  Estimate from QED             174.0     -2.2          2.9  Yes
8              374      168.9  QED grade enrollment          165.9      1.2         24.9  No
8              374      168.9  Estimate from CCD             167.8      0.5         63.1  No
8              374      168.9  CCD grade enrollment          171.0     -0.9         37.3  No
12             402      176.4  Estimate from QED             212.0    -18.6          0.0  Yes
12             402      176.4  QED grade enrollment          165.5      4.7          0.0  Yes
12             402      176.4  Estimate from CCD             205.3    -11.9          0.0  Yes
12             402      176.4  CCD grade enrollment          166.7      4.7          0.0  Yes

If the CCD enrollment data had been used with the NAEP size determination procedure, the results would have been much better (see the estimate from CCD rows in Table 6.3). Generally, individual grade enrollment on both QED and CCD is a better size measure than the estimate based on QED enrollment data. Also, CCD enrollment information provides a better size measure, either directly or via the NAEP estimation procedure. This is somewhat strange because QED starts from CCD data and updates CCD information through its own data collection efforts. Examining scatter plots of field enrollment data against the QED grade enrollment, we found that there are quite a few cases with zero grade enrollment even though field observed enrollments were far from zero. It may be just a one-time phenomenon. If, however, this problem has occurred systematically over time, QED needs to correct the systematic error. It is also revealing that both data sources failed to provide a good size measure for the Grade 12 sample (CCD did much better, though). A thorough investigation may be warranted to find out the cause.

We also performed regression analysis in the hope of finding a better predictor of the size measure, using the linear model

$$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i, \qquad \varepsilon_i \overset{\text{iid}}{\sim} (0, \sigma^2),$$

where $y_i$ is the field observed enrollment, $x_{1i}$ and $x_{2i}$ are frame enrollment data, either estimated (by the NAEP procedure) or individual grade enrollment, $\alpha$, $\beta_1$, and $\beta_2$ are regression coefficients, and $\varepsilon_i$ is the error term. We assumed a constant error variance. The R-square measure was slightly improved when two independent variables were used rather than one, and CCD provided slightly better predictors than QED. The R-square measure ranges from 0.80 to 0.91, as shown in Table 6.4 along with the estimated regression coefficients. Note that both predictor variables are more or less equally important. The regression predicted size measure has the advantages of being both stable over time and more accurate than the current measure. Therefore, the regression predicted size measure is recommended for any school survey that needs a size measure, regardless of the frame data source.
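As an illustration of how a regression predicted size measure could be obtained, the sketch below fits the two-predictor linear model by ordinary least squares using only the Python standard library. It is an unweighted fit on hypothetical data and is not the procedure used to produce Table 6.4, which may have incorporated survey weights; the function names and the example values are assumptions made for the example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_size_model(y, x1, x2):
    """Unweighted OLS fit of y = alpha + beta1*x1 + beta2*x2.

    y:  field observed grade enrollment
    x1: estimated grade enrollment from the frame (NAEP procedure)
    x2: individual grade enrollment from the frame
    Returns (alpha, beta1, beta2, r_square).
    """
    rows = [(1.0, a, b) for a, b in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    alpha, beta1, beta2 = solve(xtx, xty)
    fitted = [alpha + beta1 * a + beta2 * b for a, b in zip(x1, x2)]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return alpha, beta1, beta2, 1 - ss_res / ss_tot


# Hypothetical enrollment data for five schools.
field = [58.0, 85.0, 97.0, 148.0, 186.0]
frame_estimated = [60.0, 80.0, 100.0, 150.0, 200.0]
frame_grade = [55.0, 90.0, 95.0, 160.0, 180.0]

alpha, beta1, beta2, r_square = fit_size_model(field, frame_estimated, frame_grade)
predicted_size = [alpha + beta1 * a + beta2 * b
                  for a, b in zip(frame_estimated, frame_grade)]
print(round(alpha, 2), round(beta1, 2), round(beta2, 2), round(r_square, 3))
```

The fitted values in `predicted_size` play the role of the regression predicted size measure: once the coefficients are estimated from a past survey, they can be applied to the two frame variables of every school on a new frame.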

The discussion given in this section was extracted from Lee (2000), which presents much more detailed accounts of the study results.


Table 6.4. Results of regression model fitting: estimated regression coefficients and some test statistics

| Grade | Data | Predictor x1 | Predictor x2 | α̂ | P-value | β̂1 | P-value | β̂2 | P-value | R-square |
|---|---|---|---|---|---|---|---|---|---|---|
| 4 | QED | Estimated | Grade | 7.2 | 0 | 0.68 | 0 | 0.19 | 1 | 0.81 |
| 4 | CCD | Estimated | Grade | 7.6 | 0 | 0.39 | 0 | 0.50 | 0 | 0.80 |
| 8 | QED | Estimated | Grade | 14.9 | 0 | 0.39 | 0 | 0.53 | 0 | 0.87 |
| 8 | CCD | Estimated | Grade | 9.7 | 0 | 0.38 | 0 | 0.56 | 0 | 0.90 |
| 12 | QED | Estimated | Grade | 6.8 | 0 | 0.52 | 0 | 0.36 | 0 | 0.91 |
| 12 | CCD | Estimated | Grade | 4.6 | 4 | 0.29 | 0 | 0.67 | 0 | 0.91 |

7. COMPARISON OF NCES PRIVATE SCHOOL FRAME WITH OTHER FRAMES

Prior to 1996 the QED school file was the source of the NAEP private school sampling frame. In 1996 a combined QED/PSS frame was used, and in 1998 and beyond, PSS alone. For this analysis we use the results of the 1998 NAEP private school recruitment process and survey experience to evaluate the quality of the PSS file relative to QED.

Each school sampled for NAEP is assigned a disposition code, which tracks the status of the school from recruitment through assessment. During the field period these codes facilitate the computation of response rates, and later on during weighting they are crucial to proper nonresponse adjustment. Despite the care exercised in selecting the NAEP school samples, inevitably some sampled schools turn out not to be eligible. This has the potential to compromise the study if the student sample size were to fall too short of the target. NAEP allows a limited amount of substitution for refusals, but an ineligible school is a permanent loss. NAEP samples are restricted to regular schools containing grades 4, 8 or 12. There are three reasons why a sampled school may be ineligible. It may have closed. Even if open, it may not be a regular school. Or if open, and a regular school, there may be no students enrolled at the sampled grade. To truly understand the results of the school recruitment process and what it has to say about PSS frame quality, it helps to have some background information on NAEP school frame construction and how it is decided that a school will be included on the private school sampling frame.

For the purposes of NAEP private school frame construction, every school listed on the 1995-96 PSS universe file was assumed to be open in 1998, except for those that PSS classified as out-of-scope. In particular, PSS nonrespondents were included as well as PSS respondents. On the PSS questionnaire there is a question (“What type of school/program is this?”) which allows a school to identify itself as Regular, Montessori, Elementary or secondary with a special program emphasis, Special education, Vocational/technical, Early childhood/daycare or Alternative. This item was not referenced in determining what records would or would not be included in the NAEP school frame. Restricting the frame to include only those schools self-identifying as ‘regular’ could lead to coverage problems. Instead NAEP chose to err on the side of inclusiveness with the knowledge that this would result in a higher rate of ineligibility among sampled schools. This attrition due to ineligibility was offset in advance by selecting a larger school sample to begin with.

The PSS questionnaire asks both whether an individual grade was offered and how many students were enrolled at that grade. During data cleaning NCES forced consistency between these two fields, i.e. the ‘do you offer’ flag was set to yes if and only if the enrollment count for that grade was greater than zero. In the interest of avoiding coverage bias Westat asked to be given a file containing the flags prior to editing and used both the unedited flag and the enrollment to decide whether or not a school had the grade of interest. Nonrespondents to the Private School Survey are included on the PSS file but the records contain little descriptive data beyond name and address. Since grade span information was not available for these schools, they were included in the NAEP frames at all three grades (4, 8 and 12). These decisions risked a potential increase in ineligibility, in order to avoid bias.

From the disposition code one can tell retrospectively whether or not a school was open, was a regular school and had the specific grade for which the school was sampled. Table 7.1 presents the distribution of private school dispositions for 1998 National NAEP.


Table 7.1. Distribution of 1998 National NAEP private school dispositions

| Grade | School disposition | Freq | Unweighted (%) | Weighted (%) | 95% confidence interval, lower | 95% confidence interval, upper |
|---|---|---|---|---|---|---|
| 4 | Open and offers sampled grade | 307 | 87.5 | 82.3 | 76.6 | 88.0 |
| 4 | Open but does not offer sampled grade | 15 | 4.3 | 6.2 | 2.6 | 9.8 |
| 4 | Not a regular school | 13 | 3.7 | 4.2 | 2.0 | 6.5 |
| 4 | Closed | 16 | 4.6 | 7.3 | 3.1 | 11.4 |
| 8 | Open and offers sampled grade | 378 | 84.8 | 77.6 | 73.1 | 82.2 |
| 8 | Open but does not offer sampled grade | 31 | 7.0 | 9.5 | 5.9 | 13.1 |
| 8 | Not a regular school | 19 | 4.3 | 5.7 | 2.9 | 8.5 |
| 8 | Closed | 18 | 4.0 | 7.2 | 3.7 | 10.7 |
| 12 | Open and offers sampled grade | 220 | 69.8 | 61.5 | 55.2 | 67.8 |
| 12 | Open but does not offer sampled grade | 51 | 16.2 | 19.8 | 14.2 | 25.3 |
| 12 | Not a regular school | 32 | 10.2 | 15.9 | 10.8 | 21.0 |
| 12 | Closed | 12 | 3.8 | 2.8 | 1.4 | 4.2 |

Most private schools sampled for 1998 NAEP were eligible. Weighted estimates of the proportion eligible on the PSS-derived school frames range from 61.5 percent for grade 12 to 82.3 percent for grade 4. The observed rates of ineligibility are partly an artifact of the frame building procedures described in the preceding paragraphs. Those procedures go a long way towards explaining the relatively high percentages of grade 12 schools that do not have the sampled grade (19.8 percent) or that have the sampled grade but are not regular schools (15.9 percent). The extent to which closed schools were found in the NAEP samples is strictly a PSS quality issue. Slightly over 7 percent of sampled schools were determined to be closed at grades 4 and 8, and a little less than 3 percent at grade 12.

The NAEP private school sample was merged against the QED file in order to evaluate whether QED information more accurately predicted the dispositions encountered during school recruitment. Table 7.2 examines eligible schools only. It is noteworthy how great a proportion of PSS-sampled eligible schools could not be found on the QED file or, if found, did not have the sampled grade. This ranged from around 16 percent for grade 8 to 27 percent for grade 12. However, if the issue is reframed in terms of students, the apparent QED undercoverage shrinks to less than 9 percent for grade 8 and under 7 percent for grade 12. One cannot conclude from this, however, that PSS provides superior school and student coverage relative to QED. Since PSS alone constituted the 1998 private school sampling frame, there is no way of using the 1998 NAEP sample to identify eligible schools from QED that are not on PSS.

Table 7.2. QED status of PSS schools classified during 1998 NAEP as eligible

| Grade | QED status | Freq | Weighted coverage (%), school | Weighted coverage (%), student | 95% confidence interval, school | 95% confidence interval, student |
|---|---|---|---|---|---|---|
| 4 | On QED with sampled grade | 266 | 80.2 | 90.7 | (74.6, 85.8) | (87.0, 94.4) |
| 4 | On QED without sampled grade | 5 | 1.5 | 2.0 | (0.1, 2.9) | (0.0, 4.1) |
| 4 | Not on QED | 36 | 18.3 | 7.3 | (13.0, 23.6) | (4.5, 10.2) |
| 8 | On QED with sampled grade | 336 | 83.8 | 91.2 | (77.6, 90.1) | (87.5, 94.8) |
| 8 | On QED without sampled grade | 14 | 3.5 | 3.4 | (1.3, 5.6) | (1.2, 5.7) |
| 8 | Not on QED | 28 | 12.7 | 5.4 | (7.2, 18.2) | (2.8, 7.9) |
| 12 | On QED with sampled grade | 180 | 72.7 | 93.4 | (65.1, 80.3) | (91.1, 95.6) |
| 12 | On QED without sampled grade | 12 | 7.4 | 2.1 | (3.3, 11.4) | (1.0, 3.3) |
| 12 | Not on QED | 28 | 19.9 | 4.5 | (13.2, 26.6) | (2.5, 6.5) |


Similar analyses were performed looking at PSS schools found to be ineligible during NAEP recruitment, by individual category of ineligibility (does not offer sampled grade, not a regular school, closed). This was with the goal of ascertaining whether QED had correctly identified these schools as ineligible. The general conclusion was that had QED been used as the sampling frame most of the PSS ineligibles would have had no chance of selection, but usually because the school was not listed on QED.

NAEP school recruitment requires accurate contact information. In the worst case, errors in name, address or telephone number can lead to the wrong school being recruited or to a disposition of ‘closed’ when the school is merely unlocatable. School name, street address, city, zip code, area code and local telephone number were corrected as necessary by field staff as part of NAEP survey operations. This updated contact information was compared to the original PSS information and also to QED information in instances where a matching school record could be located on the file. With the exception of area code, and to a much lesser extent local telephone number, the contact information on PSS appeared to be of very high quality. In all likelihood the telephone numbers were out of date because PSS is conducted every other year. About half the time QED had the correct area codes when PSS did not.

Having accurate information on the number of eligible students enrolled in a school is critical to the NAEP sample design. Enrollment at the sampled grade is estimated during school frame preparation. The measure of size used in school sampling is a function of this estimate and projections of student sample yields are made using these estimates. For these reasons it is important that the frame estimate be close to the actual number of eligible students in cooperating schools. If it deviates greatly it can result in highly variable student weights or even a shortfall in the student sample yield.

While enrollments for each individual grade are available on PSS, they were not used directly. NAEP estimates a school’s per-grade enrollment as the total number of students enrolled divided by the number of grades taught. If a particular school offers more than one of grades 4, 8 or 12, the same estimated grade enrollment is attributed to all of the relevant grades offered. This procedure was developed when QED was used as the sampling frame and enrollment by grade was not available. However, it has made sense to continue using it for two reasons. The PSS frame is out of date and the exact enrollment by grade will vary over time. This procedure smoothes out that variability. Also, for some small schools the enrollment for a particular grade may be zero one year and nonzero the next. It would not be desirable to exclude such a school from the sampling frame on the basis of such a volatile measure.
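The per-grade estimate described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, grades are coded as integers with kindergarten as 0, and the example school is invented.

```python
NAEP_GRADES = (4, 8, 12)


def estimated_grade_enrollment(total_enrollment, grades_taught):
    """Total enrollment divided by the number of grades taught."""
    if len(grades_taught) == 0:
        raise ValueError("school must teach at least one grade")
    return total_enrollment / len(grades_taught)


def frame_size_measures(total_enrollment, grades_taught):
    """Attribute the same per-grade estimate to every NAEP grade the school offers."""
    per_grade = estimated_grade_enrollment(total_enrollment, grades_taught)
    return {grade: per_grade for grade in NAEP_GRADES if grade in grades_taught}


# Invented K-8 school with 270 students across 9 grades (0 = kindergarten).
k8_school = frame_size_measures(270, range(0, 9))
# k8_school == {4: 30.0, 8: 30.0}
```

Because the estimate depends only on total enrollment and grade span, a school whose grade 4 enrollment happens to be zero in the frame year still receives a positive size measure, which is the smoothing effect the text describes.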

The last analysis presented here compares frame estimates of the number of eligible students with what was actually encountered in the field at the time of student sampling. The comparisons were made using both PSS and QED estimates and are restricted to cooperating schools that appeared on both frames. Table 7.3 summarizes the distribution of differences between the frame estimates of eligible students and the true number.

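Table 7.3. Percentage distribution of schools, by difference between frame estimate and true enrollment and by frame

| Frame estimate minus true enrollment | Grade 4, PSS | Grade 4, QED | Grade 8, PSS | Grade 8, QED | Grade 12, PSS | Grade 12, QED |
|---|---|---|---|---|---|---|
| -28 or less | 0 | 0 | 1 | 1 | 2 | 2 |
| -27 to -17 | 1 | 2 | 0 | 1 | 1 | 1 |
| -16 to -6 | 9 | 13 | 7 | 9 | 10 | 12 |
| -5 to 5 | 63 | 61 | 55 | 53 | 47 | 40 |
| 6 to 16 | 22 | 21 | 32 | 30 | 28 | 32 |
| 17 to 27 | 4 | 3 | 4 | 6 | 9 | 9 |
| 28 or more | 1 | 0 | 1 | 0 | 3 | 4 |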

The modal category is always centered about zero, where the actual student count was within plus or minus five of the estimated count. There is an asymmetry to the distribution, with a greater number of schools overestimating the enrollment by 6 to 16 students than underestimating the enrollment by the same amount. This asymmetry is most pronounced at grade 8. Then the distribution quickly trails off with long tails in both directions. The QED and PSS distributions are very similar. From this one can conclude that the enrollment information on the PSS and QED files is highly suitable for NAEP purposes. Neither of the two files is superior to the other in this respect.
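The categories of Table 7.3 can be reproduced with a small binning routine. The sketch below is illustrative only: the function names and the example schools are invented, and rounding the difference to a whole number of students before binning is an assumption, since the source does not say how fractional estimates were handled.

```python
from collections import Counter

# Inner categories of Table 7.3 as (label, low, high), bounds inclusive.
INNER_BINS = [
    ("-27 to -17", -27, -17),
    ("-16 to -6", -16, -6),
    ("-5 to 5", -5, 5),
    ("6 to 16", 6, 16),
    ("17 to 27", 17, 27),
]
LABELS = ["-28 or less"] + [label for label, _, _ in INNER_BINS] + ["28 or more"]


def difference_category(frame_estimate, true_enrollment):
    """Return the Table 7.3 category for frame estimate minus true enrollment."""
    diff = round(frame_estimate - true_enrollment)  # assumption: whole students
    if diff <= -28:
        return "-28 or less"
    if diff >= 28:
        return "28 or more"
    for label, low, high in INNER_BINS:
        if low <= diff <= high:
            return label
    raise AssertionError("unreachable: the bins cover every integer")


def percent_distribution(pairs):
    """Percentage of schools per category for (frame_estimate, true_enrollment) pairs."""
    counts = Counter(difference_category(est, true) for est, true in pairs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no schools supplied")
    return {label: 100.0 * counts[label] / total for label in LABELS}


# Invented example: four cooperating schools.
schools = [(30.0, 28), (45.0, 52), (60.0, 41), (25.0, 25)]
distribution = percent_distribution(schools)
```

A positive difference means the frame overestimated enrollment, which matches the sign convention in the table's first column.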


The above discussion of PSS frame quality was excerpted from Burke (1999), which gives a much more detailed account of these and related topics.

8. CONCLUSIONS

In order to evaluate the NCES school lists as a sampling frame for NCES school surveys, we used the data collected during the 1998 NAEP field operations. The disposition code maintained in the NAEP Field Management System for all sample schools provides information on school eligibility and to a certain extent on coverage. Moreover, participating schools provide richer data on updated contact information and student enrollment information. Treating these pieces of information as true even though they are not error-free, we evaluated the sampling frames used for the 1998 NAEP and their possible best alternatives that could be used for any NCES school surveys. QED was used for the public school samples and PSS was used for the non-public school samples.

The study found that the frames used for the 1998 NAEP are better in general than the alternatives. QED as the public school frame maintains better contact information, school type information, and grade information but slightly worse enrollment information than CCD. It is difficult to address the coverage issues entirely due to lack of information on the true population with perfect coverage. However, considering that a higher percentage of eligible schools were not found or were coded as ineligible on CCD than on QED, QED appears to provide better coverage than CCD. Nonetheless, it would be very interesting to conduct a similar study using the 2000 NAEP data, where CCD was used as the public school sampling frame, so that CCD and QED would play opposite roles in the study. Complementing the current study, such a study would provide a better picture of the coverage issue.

PSS also maintains good contact information, with the notable exception of the area code and, to a much lesser extent, the local phone number. Both PSS and QED provide equally good size measures. The frame creation procedure from PSS intentionally includes possible out-of-scope schools to reduce undercoverage. Thus, the eligibility rates of the PSS samples were much lower than those seen from the QED public school frame. It is clear that QED was missing many NAEP eligible schools that were included on PSS. However, again we cannot fully address the question of relative coverage, as we do not have information about the full population, and in particular about eligible schools included on QED but not on PSS.

The regression analysis of the field-collected enrollment data with frame enrollment data for public schools demonstrates that the NAEP size measure determination procedure can be improved using the regression-predicted values of student enrollment. The new procedure is suggested for any school surveys that need a size measure based on student enrollment.

As the discussant pointed out, the school population is becoming more volatile. Maintaining a good school frame is therefore becoming both more difficult and more important, and it would be desirable to evaluate a school frame periodically to ensure that it adequately reflects the survey population.

9. REFERENCES

Broughman, S.P., and L.A. Colaciello (1998), “Private School Universe Survey, 1995-96,” NCES 98-229, Washington, DC: National Center for Education Statistics, U.S. Department of Education.

Burke, J. (1999), “An Evaluation of PSS Data Quality in Relation to 1998 NAEP Survey Experience,” Technical Report submitted to NCES by Westat as a part of ESSI Work Task 1.2.83.1.

Dymowski, R. (1998), “ESSI Work Task 1.2.76.1 – 1990 through 1998 NAEP Samples: NCES Identifiers,” Technical Report submitted to NCES by Westat as a part of ESSI Work Task 1.2.76.1.

Hamann, T.A. (2000), “Evaluating the Coverage of the U.S. National Center for Education Statistics’ Public Elementary/Secondary School Frame,” Invited paper presented at the Second International Conference on Establishment Surveys.

Lee, H. (2000), “An Evaluation of QED and CCD as the Sampling Frame for the National Assessment of Education Progress (NAEP),” Technical Report submitted to NCES by Westat as a part of ESSI Work Task 1.2.83.1.

Newcombe, R.G. (1998), “Two-sided Confidence Intervals for the Single Proportion: Comparison of Seven Methods,” Statistics in Medicine, 17, pp. 857-872.

Pierkarski, L., G. Kaplan, and J. Prestegaard (1999), “Telephony and Telephone Sampling: The Dynamics of Change,” paper presented at the Annual Conference of the American Association for Public Opinion Research.


EVALUATING THE COVERAGE OF THE U.S. NATIONAL CENTER FOR EDUCATION STATISTICS’ PUBLIC ELEMENTARY/SECONDARY SCHOOL FRAME

Thomas A. Hamann, U.S. Census Bureau, Governments Division, Washington Plaza II - Room 511, Washington, DC 20233


ABSTRACT

This paper compares the public school frame used by the National Center for Education Statistics (NCES), the Common Core of Data (CCD), with commercially available database files of schools from two private U.S. firms, Market Data Retrieval (MDR) and Quality Education Data (QED). The primary objective of this evaluation is to determine the accuracy and completeness of the list of schools in the 1994-95 CCD Public Elementary/Secondary School Universe Survey. This evaluation effort examines public schools by school type (regular, special, alternative and vocational). The three data files (CCD, MDR, and QED) each average about 85,000 schools. The number of schools found on only one of two files averages roughly 3,000, which amounts to a discrepancy in coverage between the files of about three to four percent of all public schools. In general, the CCD coverage of all schools and regular schools matched that of MDR and QED fairly closely. The CCD appears to have broader coverage of special education and alternative schools than do the other two files, while both MDR and QED include considerably more vocational schools than does the CCD. Recommendations include methods that call for adopting an assertive approach for improving the CCD. Such an approach would involve creating a CCD school universe survey form that accommodates the various types of schools and state views and compiling a more complete CCD list. This would enable potential future reconciliation of the CCD file to the MDR and QED files, and improve the CCD as a sampling frame for other surveys. A primary use of the results of this evaluation should be reconciliation by the state CCD coordinators of their state’s non-matching schools.

Key Words: Common Core of Data, coverage evaluation, sampling frame

1. INTRODUCTION

1.1 Scope and Purpose of Evaluation

The Common Core of Data (CCD) is the National Center for Education Statistics’ (NCES) primary database on elementary and secondary public education in the United States. The annual CCD census is a comprehensive national statistical database of all public elementary and secondary schools and school districts that contains comparable data across all states. The CCD surveys are designed to provide an official listing of public elementary and secondary schools and school districts that can be used to select samples for other NCES surveys, and to provide directory information for a variety of users. In addition, the CCD provides basic information and descriptive statistics on public elementary and secondary schools, students, and staff. The CCD is an important resource for policymakers and researchers at the state and local levels.

The school coverage evaluation described herein, commissioned by NCES, was conducted by the Governments Division of the United States Census Bureau. Its objective is to determine the accuracy and completeness of the list of schools used for the 1994-95 Common Core of Data’s Public Elementary/Secondary School Universe survey (referred to as the CCD file in this paper). The CCD file was primarily compared to those files of two private firms, Market Data Retrieval (MDR) and Quality Education Data (QED). The CCD collects data on a mail survey for the schools in existence as of October 1 of the survey year (1994-95). The MDR and QED files represent data collected by a mail and phone survey covering the survey school year (1994-95). The CCD file was also compared to other sources, including the listings of schools from the Bureau of Indian Affairs (BIA) and the Department of Defense (DOD).

The CCD consists of three nonfiscal surveys. The nonfiscal surveys are the Public Elementary/Secondary School Universe, the Local Education Agency Universe and the State Nonfiscal Survey. Together, these surveys provide school names and addresses, and demographic information on students and staff in the public schools. The information is collected at the school, local education agency, and state level, respectively. The CCD survey listing of schools is essentially determined by the respondents themselves. That is, it lists the schools that were reported by state CCD survey coordinators.

This paper reports the results of research and analysis undertaken by the Census Bureau staff. It has undergone a more limited review by the Census Bureau than its official publications. This report is released to inform parties and to encourage discussion.


Market Data Retrieval (MDR) has collected education data for 30 years. The MDR education database covers all levels of the education process from preschool through college in the United States and Canada, including public libraries. The MDR survey annually collects comprehensive data on educational institutions and personnel via mail and phone canvassing of school and district administrators. MDR offers mailing lists, database marketing services, state-by-state school directories, and customized statistical reports and analyses about the education market. MDR’s products and services are used to provide highly targeted mailing lists for direct mail marketing, telemarketing campaigns, market research, and product development.

Quality Education Data (QED) has collected education data since 1981. QED has built a comprehensive database of educational institutions, encompassing “every single K-12 school and school district in the United States.” The QED National Education Database covers all public school districts and both public and nonpublic schools and supports all QED products and services. These include market research, marketing databases, database design, annual research reports tracking critical educational trends, and customized database and mailing lists to the education market. Each year, QED mails surveys to school and school district officials to collect information, including name and address, financial, demographic, technology, program, faculty, and facility data. Similar to MDR, all data received from the mail survey are telephone-verified by QED market researchers.

1.2 Methodology and Limitations of Evaluation

Methodology

The comparison of data files for this evaluation was undertaken through several steps. First, survey forms and relevant documents containing definitions and classification criteria for the CCD, MDR and QED surveys were obtained. Next, it was verified that these files were for approximately the same time period. The CCD survey definitions were then compared with those found in the MDR and QED survey materials. Fourth, differences in definitions and classification criteria that might affect coverage were identified. Fifth, common data fields were identified and the most efficient approaches to linking the files were determined. Finally, the CCD file was compared to each outside source (QED and MDR files, school directories, etc.) separately. The entire universe of schools was compared for all three files. No samples were drawn for this evaluation.

The record linkage, or school matching, process involved three phases. Initially, schools were matched electronically on CCD school identification number between the entire CCD file and the entire MDR and QED files. This was accomplished for each state of the United States and the District of Columbia.¹ Not all schools listed on the QED and MDR files are assigned a CCD identification number. In such cases, efforts were made to “hand match” the schools without CCD identification numbers by school name, address, and grade range. Within a given state, all non-matching schools were then compared by school type, such as regular or vocational. For example, schools coded as special education on both the CCD and MDR files were compared. Lastly, the remaining non-matching schools on the CCD file were compared to the entire MDR file, and non-matching schools in the MDR file were compared to the entire CCD file.

The objective of the data file comparison was to generate accurate counts of matching and, particularly, non-matching schools (schools counted as non-matching were those found in one database file that did not appear anywhere in the other databases) between the three data files. The process used to compare the CCD file to both the MDR and the QED files was identical. No attempt was made to specifically compare the QED and MDR files.
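The matching phases described above can be sketched roughly as follows. This is a simplified illustration, not the procedure actually used: the record layout (`ccd_id`, `state`, `name`, `address`), the function names, and the sample records are all hypothetical, and the subjective hand match is replaced here by an exact comparison of normalized name and address within a state.

```python
def normalize(text):
    """Uppercase, drop periods and commas, and collapse whitespace."""
    return " ".join(text.upper().replace(".", "").replace(",", "").split())


def has_usable_id(record):
    """False when the identifier field is blank or zero-filled."""
    ident = record.get("ccd_id", "").strip()
    return bool(ident) and set(ident) != {"0"}


def link_key(record):
    """Stand-in for the hand match: state plus normalized name and address."""
    return (record["state"], normalize(record["name"]), normalize(record["address"]))


def match_files(ccd, other):
    """Count electronic and fallback matches; list schools found on one file only."""
    ccd_by_id = {r["ccd_id"]: r for r in ccd}
    matched_ids = set()
    leftovers = []
    electronic = 0
    # Phase 1: electronic match on the CCD identification number.
    for rec in other:
        if has_usable_id(rec) and rec["ccd_id"] in ccd_by_id:
            matched_ids.add(rec["ccd_id"])
            electronic += 1
        else:
            leftovers.append(rec)
    # Phase 2: fallback match on name and address for the remaining records.
    ccd_by_key = {link_key(r): r for r in ccd if r["ccd_id"] not in matched_ids}
    hand = 0
    other_only = []
    for rec in leftovers:
        hit = ccd_by_key.get(link_key(rec))
        if hit is None:
            other_only.append(rec)
        else:
            matched_ids.add(hit["ccd_id"])
            hand += 1
    ccd_only = [r for r in ccd if r["ccd_id"] not in matched_ids]
    return {"electronic": electronic, "hand": hand,
            "ccd_only": ccd_only, "other_only": other_only}


# Invented records for illustration.
ccd_file = [
    {"ccd_id": "010000100001", "state": "AL", "name": "Jefferson Elementary School", "address": "12 Main St."},
    {"ccd_id": "010000100002", "state": "AL", "name": "Lincoln High School", "address": "400 Oak Ave"},
    {"ccd_id": "010000100003", "state": "AL", "name": "Valley Alternative School", "address": "9 Hill Rd"},
]
other_file = [
    {"ccd_id": "010000100001", "state": "AL", "name": "Jefferson Elem School", "address": "12 Main Street"},
    {"ccd_id": "000000000000", "state": "AL", "name": "LINCOLN HIGH SCHOOL", "address": "400 Oak Ave."},
    {"ccd_id": "", "state": "AL", "name": "County Adult Learning Center", "address": "77 Pine St"},
]
result = match_files(ccd_file, other_file)
```

In this toy run the first record matches on identifier despite the differing name spelling, the second matches only through the name and address fallback, and one school is left unmatched on each file.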

Limitations

Several limitations to the matching efforts of this evaluation are recognized. The MDR and QED files often use the NCES (CCD) school identification number for schools that are also listed on the CCD file. However, not all schools listed on these two files have a CCD identification number assigned to them. The QED file contains many schools with zero-filled NCES school identification number fields. Similarly, the MDR files contain numerous records with a blank field for the NCES school identification number. The inability to match all schools by identification number, in some instances, resulted in a subjective assessment of whether or not two schools were the same school. This may have led to an overstated number of non-matches.

¹ The CCD and QED surveys provided data for the outlying areas of the U.S. (American Samoa, Guam, the Northern Mariana Islands, Puerto Rico, the U.S. Virgin Islands, and the Department of Defense) while MDR did not provide these data. These data are not included in this paper.


Other hindrances to matching efforts include apparent situations of shared mailing addresses between schools or common education agency (school district) addresses given rather than the address of the individual school. In some cases one school on a file matches to two or three schools on the other file. These one-to-many, and many-to-one, matches help to explain the differences in the total non-matching school counts between the three data files.

Although every effort was made to ensure that similar categories of data (school types) were compared between the data files, it is recognized that some inherent definition, categorization, and coverage differences exist between the CCD, MDR, and QED surveys. The appearance of a school on the MDR or QED files but not on the CCD file, or vice versa, does not necessarily suggest that either file is in error. For example, the QED and MDR surveys collect information on adult schools while such schools are considered out-of-scope on the CCD survey. The method of comparison employed in this evaluation, i.e., a comparison of one data file to another, does not account for shortcomings each survey encountered initially in distinguishing (and classifying) a given school’s school type from another. The over- and under-reporting of certain school types is a concern for all three surveys compared in this evaluation.² The findings and analyses provided herein are descriptive statistics on coverage differences between the CCD, MDR, and QED files and not coverage estimates for the CCD survey. No attempt was made to estimate the total number of schools or the number of schools that were missed or over-counted on the three files.

Finally, it is recognized that some of the identified non-matching schools, for a particular school type in a given state for example, may now appear on a file as the result of being collected during one of the survey cycles conducted since the survey year (1994-95) used for this evaluation.

2. FINDINGS FROM COMPARISON OF DATA FILES

2.1 General Findings

In 1994-95, a total of 86,220 public schools were included on the CCD file, in comparison with 83,953 schools on the MDR file and 87,135 on the QED file. Table 1 highlights the national counts and percentages of both electronic and hand matches found between the CCD and the MDR and QED files.

Table 1. Summary of Matching Schools – The Common Core of Data File Compared to the Market Data Retrieval and Quality Education Data Files: 1994-95 School Year

| Data File* | | CCD and MDR: electronic matches | CCD and MDR: hand matches | CCD and QED: electronic matches | CCD and QED: hand matches |
|---|---|---|---|---|---|
| CCD | No. | 76,923 | 6,601 | 72,719 | 10,437 |
| CCD | % | 89.2 | 7.7 | 84.3 | 12.1 |
| MDR | No. | 76,923 | 4,208 | -- | -- |
| MDR | % | 91.6 | 5.0 | -- | -- |
| QED | No. | -- | -- | 72,719 | 9,170 |
| QED | % | -- | -- | 83.5 | 10.5 |

Key: The 76,923 electronic matches found between the CCD and MDR files ÷ the total number of CCD schools (86,220) = 89.2%.
Note: *Does not include the outlying areas or DOD overseas schools. No attempt was made to compare QED and MDR files.
Sources: MDR, QED, CCD.

The average number of non-matching schools (schools included in only one of two files) was about 3,000 (approximately 3 to 4 percent) (Table 2). The discrepancy between the CCD file and the other two files was somewhat evenly distributed among all the states, with several states having one percent or less of their schools appear on the MDR or QED files but not on the CCD file.

Table 2. Summary of Non-matching Schools - The Common Core of Data File Compared to the Market Data Retrieval and Quality Education Data Files: 1994-95 School Year

| Data File | Total no. of schools | No. of schools NOT found on CCD | No. of schools NOT found on QED | No. of schools NOT found on MDR |
|---|---|---|---|---|
| CCD | 86,220 | - | 3,600 | 3,011 |
| QED | 87,135 | 2,786 | - | - |
| MDR | 83,953 | 2,842 | - | - |

Notes: No attempt was made to compare QED and MDR to each other.
Sources: MDR, QED, CCD.

² A technical review panel was conducted in March 1999 by NCES to address the difficulty in the CCD survey of distinguishing and categorizing all types of vocational education schools from other school types.


The coverage percentages presented below (Table 3) show separate comparisons of the CCD file with the MDR and QED files. For example, 3.6 percent of schools included on the CCD file were not on the MDR file. Conversely, 3.3 percent of MDR’s schools were not reported on the CCD file. In general, the CCD coverage of all schools and regular schools matched that of MDR and QED fairly closely. It is notable that regular schools, while accounting for over 90 percent of all the schools on all three files, accounted for only between 1 and 2 percent of the schools included on one of two files. Assuming that larger numbers of schools reflect better coverage, the CCD appears to have a broader coverage of special education and alternative schools than do the other two files, while both MDR and QED include considerably more vocational schools than does the CCD file.

Table 3. Coverage Gap – Percent of Schools Included on Only One of Two Data Files: 1994-95 School Year

| CCD School Type | Percent schools on CCD but not MDR | Percent schools on CCD but not QED | Percent schools on MDR but not on CCD | Percent schools on QED but not on CCD |
|---|---|---|---|---|
| Total | 3.6 | 4.1 | 3.3 | 3.2 |
| Regular | 1.0 | 1.6 | 1.6 | 1.0 |
| Special Ed. | 47.9 | 57.0 | 34.8 | 26.3 |
| Vocational | 8.2 | 6.8 | 44.8 | 56.3 |
| Alternative | 73.5 | 79.3 | 20.1 | 17.0 |

Key: Total schools on CCD but not MDR ÷ total schools on MDR = 3.6%
Sources: MDR, QED, CCD.
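The "Total" row of Table 3 can be checked against the counts in Table 2. The sketch below is only a verification aid with hypothetical variable names. The key states the denominator for the "on CCD but not" columns (the other file's total); for the "not on CCD" columns the source gives no denominator, and dividing by the CCD total is an inference made here because it reproduces the published 3.3 and 3.2.

```python
# Counts taken from Table 2.
TOTAL_SCHOOLS = {"CCD": 86220, "MDR": 83953, "QED": 87135}
ON_CCD_NOT_ON = {"MDR": 3011, "QED": 3600}   # CCD schools missing from the other file
NOT_ON_CCD = {"MDR": 2842, "QED": 2786}      # other file's schools missing from CCD


def percent(part, whole):
    """Percentage rounded to one decimal place, as in Table 3."""
    return round(100.0 * part / whole, 1)


# Denominator per the table key: total schools on the other file.
on_ccd_but_not = {f: percent(ON_CCD_NOT_ON[f], TOTAL_SCHOOLS[f]) for f in ("MDR", "QED")}

# Denominator assumed (not stated in the source): total schools on the CCD file.
not_on_ccd = {f: percent(NOT_ON_CCD[f], TOTAL_SCHOOLS["CCD"]) for f in ("MDR", "QED")}

print(on_ccd_but_not)  # {'MDR': 3.6, 'QED': 4.1}
print(not_on_ccd)      # {'MDR': 3.3, 'QED': 3.2}
```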

Several points concerning these coverage comparisons should be kept in mind. First, not all schools listed on the QED and MDR files have been assigned a NCES (CCD) identification number. When an MDR or QED school did not have a CCD identification number, a subjective assessment was made about whether cases on the two files were actually the same school. This may have led to an underestimate of the number of matches. Second, comparisons were made on school identification, not school type: a school that was listed as "vocational" on the CCD file and "alternative" on the QED file was still considered a match between the two files. Third, there were apparent situations of shared mailing addresses between schools, or of a school district's address being used for multiple schools. This could occur on one file but not on the other. Cases of one-to-many correspondence were considered matches across the files. That is, if the CCD listed a Jefferson Elementary School and a Jefferson Middle School (or Jefferson Alternative School) at a single address, while the MDR listed only a Jefferson School at that same address, both of the CCD schools were considered to have a match on the MDR file. Finally, the comparisons do not take into account some inherent definition, categorization and coverage differences among the three surveys.

The QED and MDR collect information on adult schools, which are considered out-of-scope for the CCD. This means that some legitimate differences in coverage should be expected. Between 15 and 16 percent of the schools found on the QED and MDR files but not on the CCD file are out-of-scope adult schools. The number and percentage breakdown of the schools found on the other two files but not on CCD file is shown below (Table 4).

Table 4. Summary of Missing CCD Schools - The Common Core of Data File Compared to the Market Data Retrieval and Quality Education Data Files: 1994-95 School Year

                            CCD Schools
                            In-scope                     Out-of-scope
                            PK only or    All other
Schools:                    K only        grades         Adult           Total
On MDR, not CCD     No.     127           2,263          452             2,842
                    %       4.5           79.6           15.9
On QED, not CCD     No.     238           2,118          430             2,786
                    %       8.5           76.0           15.4

Sources: MDR, QED, CCD.

Missing information, such as missing students, is a serious omission when trying to understand the consequences of school undercoverage. Using data from the MDR or QED files, additional information can be learned about cases potentially missing from the CCD file. In this case, the number of potentially missing CCD students (based on the number of schools found on the QED or MDR files, but not on the CCD file) is not substantial. For example, the 941,360 students enrolled in the 2,786 schools found on the QED file, but not the CCD file, represent 2.1 percent of the CCD student population (Table 5).

Table 5. Potentially Missing CCD Students – The Common Core of Data File Compared to the Market Data Retrieval and Quality Education Data Files: 1994-95 School Year

Schools:             Number of Students    Percent of Total Student Population
On QED, not CCD      941,360               2.1
On MDR, not CCD      948,923               2.2
On CCD, not QED      481,533               1.1
On CCD, not MDR      337,024               0.8

Key: The student population of the schools found on the QED file but not the CCD file ÷ the total CCD student population = 2.1 percent. The total student population of the CCD, QED, and MDR files is 44,031,399, 45,834,927, and 44,606,013, respectively.
Sources: MDR, QED, CCD.
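The percentages in Table 5 follow directly from the key. The sketch below recomputes them; the function name is hypothetical, and it assumes that the total CCD student population is the denominator in every row, which the key states only for the first row.

```python
CCD_TOTAL_STUDENTS = 44_031_399  # total CCD student population, from the Table 5 key

# Students enrolled in schools found on only one of two files (Table 5).
students_in_nonmatching_schools = {
    "On QED, not CCD": 941_360,
    "On MDR, not CCD": 948_923,
    "On CCD, not QED": 481_533,
    "On CCD, not MDR": 337_024,
}


def share_of_ccd_students(students: int, ccd_total: int = CCD_TOTAL_STUDENTS) -> float:
    """Students in non-matching schools as a percent of the CCD student total."""
    return round(100.0 * students / ccd_total, 1)


shares = {
    label: share_of_ccd_students(count)
    for label, count in students_in_nonmatching_schools.items()
}
for label, pct in shares.items():
    print(f"{label}: {pct} percent")
```

With the CCD total as the common denominator, the recomputed values agree with the published column to one decimal place.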

Eliminating schools considered out-of-scope on the CCD file (such as adult schools) as well as schools consisting of pre-kindergarten students and/or kindergarten students only lessens the percentage of potentially missing CCD students. Based on the enrollment of schools found only on the QED and MDR files, this percentage is reduced to 0.6 and 1.0 percent, respectively.

School type is an important piece of information for many users of these data files. There are cases in which a school appears on both the CCD and MDR or QED files, but is classified as a different type. These discrepancies are uncommon among the regular school listings. However, of the 1,783 special education schools included on the MDR file and the 1,520 special education schools on the QED file, some 109 and 110, respectively, are listed as a different type of school on the CCD file (Table 6). This occurs for 43 of the 1,230 vocational schools on the MDR file and 85 of the 1,420 vocational schools on the QED file; and 204 of MDR's 1,768 alternative schools and 230 of the 1,766 alternative schools shown for QED. Overall, the classification differences between CCD and the other files do not appear substantial.

Table 6. Classification Gap – Number of Schools for which the Common Core of Data and the Market Data Retrieval or Quality Education Data Files Differ in School Type: 1994-95 School Year

CCD Classification
differs from:         Special Education    Vocational    Alternative
MDR listed as         109                  43            204
QED listed as         110                  85            230

Key: 43 vocational schools on the MDR file were listed as some other type on the CCD file.
Note: The MDR and QED files listed 75 and 53 adult schools, respectively, that matched to a CCD school of a different type.
Sources: MDR, QED, CCD.

Other findings of this evaluation are worth noting. The CCD federal agency type included BIA and DOD schools. Almost all of the BIA schools in the CCD file were listed (coded) under the federal agency type. Between 120 and 150 Bureau of Indian Affairs (BIA) schools were not on the CCD file based on a comparison with other sources. The 1994-95 CCD file does not specifically code (by agency type) for the domestic DOD schools. The DOD schools were found throughout the CCD file and most were coded as ‘regular’ for school type. In addition, charter and adult education schools (as classified on the MDR and QED files) matched to other school types on the CCD file. A limited number of schools that matched on school identification number between the three files differed in school name.

2.2 Comparison by School Type

The CCD, QED, and MDR files identify school types as regular, vocational, special education, alternative, and adult. The CCD file does not specifically code adult schools, while the MDR file does not have a code for alternative schools. However, the MDR school type for regular, special education, and vocational schools includes schools with the characteristic of alternative education. Thus, alternative schools identified within the MDR data file are essentially a subset of the four school types - regular, special education, vocational, and adult.


Regular Schools

There were 80,373 schools coded as regular on the CCD file. The MDR file listed 80,214 regular schools, while the QED file listed 79,024 such schools (Table 7). For all three data files, the number of regular schools accounted for over 90 percent of all the schools found on the file.

Table 7. Summary of School Totals, by School Type for the Common Core of Data File, Market Data Retrieval and Quality Education Data Files: 1994-95 School Year

Data File    Regular    Special Education    Vocational    Alternative
CCD          80,373     2,014                895           2,938
MDR          80,214     1,783                1,230         1,768
QED          79,024     1,520                1,420         1,766

Notes: The MDR and QED files list 460 and 602 adult schools, respectively.
Sources: MDR, QED, CCD.

There were 1,288 schools classified as regular on the MDR file that were not found on the CCD file (Table 8). These 1,288 schools were distributed rather evenly throughout the states - 19 states had 20 or more regular schools not found on the CCD file. For 2 of these 19 states, New Mexico and Washington, the regular schools found on the MDR file and not on the CCD file accounted for over 95 percent of all such schools found in those particular states. California, with 285 of these schools, by far reported the highest total for any state.

Table 8. Non-matching Regular Schools - The Common Core of Data File Compared to the Market Data Retrieval and Quality Education Data Files: 1994-95 School Year

             No. of schools NOT found on data file:
Data File    CCD        QED        MDR
CCD          -          1,237      756
QED          824        -          -
MDR          1,288      -          -

Notes: No attempt was made to compare QED and MDR to each other.
Sources: MDR, QED, CCD.

The 824 schools classified as regular on the QED file that were not found on the CCD file represented one-third of all schools not matching between files. California with 184 non-matching schools accounted for about 22 percent of these schools. Only two states did not report at least one non-matching regular school.

There were 756 regular schools listed on the CCD file that did not appear on the MDR file - about one-quarter of such schools. Washington, with 131 non-matching schools, was the only state to account for as much as 10 percent of these schools.

There were 1,237 regular schools listed on the CCD file that did not appear on the QED file. Twenty-three states had fewer than 10 such schools. New York, Texas and Washington accounted for 100, 113, and 142, respectively, of these regular schools. As a percentage of the total schools found on the CCD file, but not on the QED file for these respective states, New York at just under 75 percent (100 of 138 schools) was most notable.

A few schools coded as regular on the MDR or QED files matched (on identification number or name and address) between these files and the CCD file, but the school type in the CCD file was not regular. These schools are discussed in more detail later.

Special Education Schools

There were 2,014 special education schools on the CCD file. The MDR file listed 1,783 special education schools, while the QED file had 1,520 such schools (Table 7). For each file, the special education schools represented only about two percent of all the schools listed.

There were 701 schools classified as special education on the MDR file that were not on the CCD file (Table 9). Ohio and California, with 209 and 127 schools respectively, accounted for about one-half of these schools. Ohio’s 209 special education schools made up about 87 percent of that state’s non-matching total of 241. Thirty states had 5 or fewer special education schools on the MDR file but not the CCD file.

Table 9. Non-matching Special Education Schools - The Common Core of Data File Compared to the Market Data Retrieval and Quality Education Data Files: 1994-95 School Year

             No. of schools NOT found on data file:
Data File    CCD        QED        MDR
CCD          -          866        854
QED          530        -          -
MDR          701        -          -

Notes: No attempt was made to compare QED and MDR to each other.
Sources: MDR, QED, CCD.

There were 530 schools classified as special education on the QED file that were not on the CCD file. California and Ohio accounted for 148 and 103, respectively, of these schools. Again, Ohio was more notable in that the special education schools accounted for almost three-fourths of all its non-matching schools, while in California they represented less than 20 percent of such schools. Fourteen states did not have any non-matching special education schools.

There were 854 special education schools listed on the CCD file that did not appear on the MDR file. Six states - California with 57 schools, Vermont and Florida with 59 schools each, Illinois with 61 schools, Minnesota with 103 schools, and Texas with 116 schools - accounted for about one-half of this total. Vermont’s total of 59 special education schools was remarkable because they represented over 90 percent of that state’s total non-matching schools.

There were 866 special education schools listed on the CCD file that did not appear on the QED file. Texas accounted for 116 (slightly less than 30 percent of that state’s total of 390) of these schools. In three states, Delaware, North Dakota, and Vermont, the number of special education schools represented more than 75 percent of all schools found on the CCD file, but not the QED file for that state.

For the schools coded as special education on the MDR file, 109 matched on identification number or name and address between the MDR and CCD files, but the school type on the CCD file was not special education (Table 6). Of these 109 schools, 72 were listed on the CCD file as regular schools, 8 were listed as vocational, and 29 were listed as alternative. For the schools coded as special education on the QED file, 110 matched on identification number or name and address between the QED and CCD files, but the school type on the CCD file was not special education. Of these 110 schools, 78 were listed on the CCD file as regular schools, 2 were listed as vocational, and 30 were listed as alternative.

Vocational Schools

There were 895 (about 1 percent) vocational education schools on the CCD file. The MDR file listed 1,230 vocational schools, while the QED file reported 1,420 such schools (Table 7).

Of the 2,842 schools found on the MDR file that were not found on the CCD file, 401 were classified as vocational (Table 10). Kentucky and Alabama, with 71 and 68 respectively, accounted for the highest state totals for these schools. In addition to these two states, for only two other states - Oklahoma and Vermont - did the number of vocational schools represent at least one-half of the non-matching schools for that state. Seventeen states did not have a single vocational school that appeared on the MDR file and not on the CCD file.

Table 10. Non-matching Vocational Schools - The Common Core of Data File Compared to the Market Data Retrieval and Quality Education Data Files: 1994-95 School Year

             No. of schools NOT found on data file:
Data File    CCD        QED        MDR
CCD          -          96         101
QED          504        -          -
MDR          401        -          -

Notes: No attempt was made to compare QED and MDR to each other.
Sources: MDR, QED, CCD.


There were 504 schools classified as vocational on the QED file that were not on the CCD file. Again, Kentucky and Alabama were notable, with each accounting for 72 of these schools (together, about 30 percent). Twenty states had one or fewer schools on the QED file but not on the CCD.

There were 101 vocational schools (less than four percent of the 3,011 total) listed on the CCD file that did not appear on the MDR file. No state had a remarkably high number of these schools. Only four states had as many as ten, while 26 did not have any.

There were 96 vocational schools listed on the CCD file that did not appear on the QED file. Only two states, Mississippi with 14 and Texas with 16, had notable totals. Perhaps more relevant, while Texas’ total accounted for less than five percent of all the schools found on the CCD file but not on the QED file for that state, Mississippi’s vocational schools accounted for almost 30 percent of that state’s total.

Of the schools coded as vocational on the MDR file, 43 matched on identification number or name and address between the MDR and CCD files, but the school type in the CCD file was not vocational (Table 6). Of these 43 schools, 21 were listed on the CCD file as regular schools, 8 were listed as special education, and 14 were listed as alternative. For the schools coded as vocational on the QED file, 85 matched on identification number or name and address between the QED and CCD files, but the school type in the CCD file was not vocational. Of these 85 schools, 35 were listed on the CCD file as regular schools, 20 were listed as special education, and 30 were listed as alternative.

Alternative Schools

There were 2,938 alternative education schools on the CCD file - between 3 and 4 percent of the total. The MDR file listed 1,768 schools with an alternative program, while the QED file reported 1,766 alternative schools (Table 7).

There were 592 schools with an alternative education program on the MDR file not on the CCD file (Table 11). The MDR survey considered alternative education to be a characteristic, while the CCD and QED files treated it as a school type. A school could be regular, special, vocational or adult and also report that it had an alternative program. Thus, the 592 schools identified as having an alternative program are reflected in the count (2,842) of schools found on the MDR file but not on the CCD file.

Table 11. Non-matching Alternative Schools - The Common Core of Data File Compared to the Market Data Retrieval and Quality Education Data Files: 1994-95 School Year

             No. of schools NOT found on data file:
Data File    CCD        QED        MDR
CCD          -          1,401      1,300
QED          498        -          -
MDR          592        -          -

Notes: No attempt was made to compare QED and MDR to each other.
Sources: MDR, QED, CCD.
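The by-type counts in Tables 8 through 11 can be cross-checked against the totals quoted in the text. The sketch below is a reader's consistency check rather than part of the original evaluation; the variable names are hypothetical, the adult counts are taken from Table 4, and the MDR alternative count (592) is left out of the MDR sum because MDR treats alternative education as a characteristic that overlaps the other types.

```python
# Schools on the first file but not on the second, by school type (Tables 8-11).
ccd_not_mdr = {"regular": 756, "special_ed": 854, "vocational": 101, "alternative": 1_300}
ccd_not_qed = {"regular": 1_237, "special_ed": 866, "vocational": 96, "alternative": 1_401}

# Adult counts come from Table 4 (out-of-scope for the CCD).
qed_not_ccd = {
    "regular": 824, "special_ed": 530, "vocational": 504,
    "alternative": 498, "adult": 430,
}
# MDR alternative schools (592) overlap the other types, so they are omitted here.
mdr_not_ccd = {"regular": 1_288, "special_ed": 701, "vocational": 401, "adult": 452}

totals = {
    "CCD, not MDR": sum(ccd_not_mdr.values()),
    "CCD, not QED": sum(ccd_not_qed.values()),
    "QED, not CCD": sum(qed_not_ccd.values()),
    "MDR, not CCD": sum(mdr_not_ccd.values()),
}
for label, total in totals.items():
    print(f"{label}: {total:,}")
```

Three of the sums reproduce totals that appear in the text: 3,011 (CCD, not MDR), 2,786 (QED, not CCD) and 2,842 (MDR, not CCD).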

Of the 2,786 schools on the QED file that were not on the CCD file, there were 498 classified as alternative education. California accounted for 189 (about 40 percent) of these schools. Of all the states that had at least 10 alternative schools found on the QED file but not on the CCD file, only Maryland with 15 reported a number that represented at least half of their non-matching schools.

There were 1,300 (about 43 percent of the total 3,011) alternative education schools listed on the CCD file that did not appear on the MDR file. Two states, Minnesota and California, with 372 and 181 respectively, combined to represent nearly one-half of all the alternative schools found on the CCD file but not on MDR file. In both instances, these totals represent more than 70 percent of that state’s non-matching schools. Three other states, Colorado, Nevada, and South Carolina, although having smaller numbers (at least ten) of non-matching alternative schools, reported such totals that accounted for about three-fourths of their non-matching schools.

There were 1,401 alternative education schools listed on the CCD file that did not appear on the QED file. Three states accounted for over one-half of these schools. Minnesota, with 387 non-matching alternative schools (75 percent of all the schools listed for the state), is notable. Seventeen states had 2 or fewer alternative schools that were found on the CCD file but not on the QED file.


For the schools coded as alternative education (i.e., schools with an alternative education program) on the MDR file, 204 matched on identification number or name and address between the MDR and CCD files, but the school type on the CCD file was not alternative education (Table 6). Of these 204 schools, 141 schools were listed on the CCD file as regular schools, 55 were listed as special education, and 8 were listed as vocational. For the schools coded as alternative education on the QED file, 230 matched on identification number or name and address between the QED and CCD files, but the school type on the CCD file was not alternative education. Of these 230 schools, 155 schools were listed on the CCD file as regular schools, 66 were listed as special education, and 9 were listed as vocational.

Adult Schools

Based on the MDR and QED files, there were 460 and 602 adult education schools, respectively, in the United States during the 1994-95 school year. Both of these files contained schools classified as adult that matched to other school types on the CCD file. There were 75 schools coded as adult education on the MDR file that matched to the CCD file. As CCD school types, these 75 schools were classified in the following manner: 10 as regular, 6 as special education, 14 as vocational, and 45 as alternative. There were 53 schools coded as adult education on the QED file that matched to the CCD file. Of these 53 schools, 4 were classified as regular, 1 as special education, 6 as vocational, and 42 as alternative on CCD school type.

3. RECOMMENDATIONS AND CONCLUSION

3.1 Recommendations

Highlights of the recommendations based on the findings of this evaluation include the following:

Reconcile state listing of non-matching schools and the CCD file to other sources to compile a more complete list of schools.

State coordinators should review and reconcile their state’s listing of non-matching schools (schools that appear on QED or MDR, but not on the CCD file, and vice-versa). This effort should provide information as to why discrepancies exist and allow a judgement to either impose the CCD scope and definition or restate the CCD scope and definition if a state appears to be excluding schools that should be on the CCD.

Reconcile the varying classifications, definitions, and reporting of schools.

Cases in which a school appears on both the CCD file and the MDR or QED files, but is classified as a different school type, need to be examined. Efforts to reconcile such discrepancies should include the continued improvement of definitions. The addition of identifiers in the CCD file for schools that have an adult education component (much like what has been, or will be, done for charter, BIA and DOD schools; see footnote 3) would be useful.

Query respondents about their ability to report whether regular schools have special, alternative, vocational or adult components in addition to the main curriculum.

Determining this ability may allow the CCD to adopt an approach to school classification based on school characteristics, like the types of programs offered. Such an approach, similar to the manner in which MDR identifies alternative schools, would allow a school to have more than one record on a data file and report unequivocally that they have more than one type of program.
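A characteristic-based classification of this kind could be represented as sketched below. This is only one possible layout and not the CCD record format: the class, field and identifier names are hypothetical, and the sketch keeps a single record per school with a set of program flags, a variant of the multiple-record approach mentioned in the recommendation.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Program(Flag):
    """Program types a school may offer; several can be combined."""
    REGULAR = auto()
    SPECIAL_EDUCATION = auto()
    VOCATIONAL = auto()
    ALTERNATIVE = auto()
    ADULT = auto()


@dataclass
class SchoolRecord:
    school_id: str      # hypothetical identifier
    name: str
    programs: Program   # combined flags, e.g. REGULAR | ALTERNATIVE

    def offers(self, program: Program) -> bool:
        """True if the school reports the given program type."""
        return bool(self.programs & program)


# Illustrative records only.
schools = [
    SchoolRecord("X001", "Example High School", Program.REGULAR | Program.VOCATIONAL),
    SchoolRecord("X002", "Example Learning Center", Program.ALTERNATIVE | Program.ADULT),
    SchoolRecord("X003", "Example Elementary School", Program.REGULAR),
]

# Count schools by characteristic; a school can be counted under several types.
counts = {p.name: sum(s.offers(p) for s in schools) for p in Program}
print(counts)
```

Because a school may carry several flags, counts by program type no longer sum to the number of schools, which is the same property that makes the MDR alternative count overlap its other school types.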

Request the state coordinators to report the full legal name of each school.

This would likely eliminate some confusion that might exist when comparing and attempting to match two schools that have similar names but different addresses, for example. This may require a larger field length for school name (the CCD added 30 characters - for a total of 50 characters - to the name field in 1998), but given the fact that an additional field has been added to accommodate both mailing and physical location addresses for every school, this should not be a significant issue.

3 The CCD survey began including charter schools with the 1998-99 survey cycle. The BIA and DOD will begin self-reporting their schools with the 1999-2000 survey cycle.


Compare the CCD file to the MDR and QED files and other sources, such as the DOD and BIA lists, and the National Charter Schools Directory.

Use the state non-matching school listings and files generated by this evaluation and add fields to the CCD file for the MDR and QED school identification numbers (assuming proprietary issues do not prevent this). Alternatively, work with the MDR and QED staffs to keep their identification number links up to date. Such an effort could be an annual CCD survey function, or perhaps more practically, periodic evaluative efforts such as the current one could be undertaken to address CCD school coverage issues. A statistical analysis of coverage error to supplement the descriptive statistics and provide estimates of CCD list coverage is recommended.

3.2 Summary

Findings suggest that the CCD file is a quality data source and listing of public schools when compared to other sources. The CCD appears to have a broader coverage of special education and alternative schools than do the MDR and QED files. Despite the shortcomings outlined in this evaluation, the CCD is an accurate, comprehensive statistical database of this nation’s public elementary and secondary schools, particularly so with respect to its coverage of regular schools. The specific recommendations made herein include methods that call for adopting an assertive approach for improving the CCD. Such an approach would involve creating a CCD school universe survey form that accommodates the various types of schools and state views and compiling a more complete CCD list. This would enable potential future reconciliation of the CCD file to the MDR and QED files, and ultimately, improve the CCD as a sampling frame for other surveys. For this approach to be effective, a primary use of the findings and results of this evaluation must be reconciliation by the state CCD coordinators of their state’s non-matching schools. Undertaking the suggestions put forth in this report will help ensure a better CCD for the future.

4. REFERENCES

Bureau of Indian Affairs. Office of Indian Education Programs. Online. September 1997. http://shaman.unm.edu/oiep/address.htm.

Department of Defense. Online. November 1997. http://www.tmn.com/dodea/home.htm.

Market Data Retrieval Database. School Year 1994-95. A company of the Dun & Bradstreet Corporation. Shelton, CT.

Quality Education Data. School Year 1994-95. A division of Peterson’s. Princeton, NJ.

U.S. Department of Education, National Center for Education Statistics, “Public Elementary/Secondary Universe Survey: School Year 1994-95.”

U.S. Department of Education, National Center for Education Statistics, “Local Education Agency Universe Survey: School Year 1994-95.”


DISCUSSION OF SCHOOL SAMPLING FRAME PAPER

Fritz Scheuren, Urban Institute, 1402 Ruffner Rd., Alex VA 22302

[email protected]

Key Words: Multiple systems estimation, record linkage, sampling efficiency, coverage

Let me begin by stating the thanks we all owe the authors for their fine papers. I thoroughly enjoyed them. They certainly make a contribution to the current meetings, in that the authors have tackled for schools what business surveys attempt for establishments and enterprises. Indeed, one of the themes in my remarks is that more use of this commonality could have been developed by all the authors. Surprisingly none of them cited, for example, any of the literature that grew out of the earlier (first) International Conference on Establishment Surveys. Another useful reference, at least for the US papers, is the NCES Technical Report entitled An Assessment of the Accuracy of CCD Data, dated March 1995.

As a profession we are at various stages in the continuing revolution in computing and in the employment of increasingly sophisticated mathematical and statistical modeling in our work. This is true not only of end user analyses but also of routine data gathering tasks too, including the maintenance and evaluation of sampling frames. While the authors almost certainly recognize this, they could have developed this aspect further in their papers.

One final general comment: I would have liked to see data producers, including the current authors, recast their producer concerns more into the language of their clients. Admittedly this is hard to do but may be well worth the effort; otherwise the legitimate "housekeeping" concerns that the authors have so ably addressed may not get the attention and resources they need.

1. Cheung and Gossen. The Cheung and Gossen paper focuses on new developments now occurring at Statistics Canada to build or rebuild their national school frame into a single, highly centralized and more efficient system. The existing system is well described in the paper as well as the progress being made on the new system. A key objective, as the authors state, for the new system is to eliminate processing delays that impact the timeliness of the frame and its overall coverage.

To illustrate the concerns raised, two figures are provided in the paper (figures 3 and 4); these show how quickly enrollment can be dated and hence the weakness in using even one-year-old data in stratification. While perhaps overstated somewhat, this weakness certainly is a problem, especially when it results in undercoverage and not just inefficiency (see table 2 in the paper). For continuing schools one alternative not discussed would be to employ two stratifiers, with the second (say, number of teachers) being correlated with enrollment but not as rapidly changing.

The solid nature of the Canadian public school administrative data comes through very clearly. The way independent schools are treated statistically was not prominently featured. Even though presently only a minor part of the school universe, more information on them would have been welcome.

Additional information would have been appreciated on how the administrative data are used in survey estimation. Of particular interest would have been some comments on mass imputation (e.g., Kovar and Whitridge 1995) and calibration estimation (e.g., Brewer 1999), since the frames being built will almost certainly be rich enough for these methods to warrant consideration.

2. Hamann. The Hamann paper is another excellent effort. In it, the author compares three main sources of US public school data: the Common Core of Data (CCD), a government universe survey of public schools, plus two commercially available files of schools entitled Quality Education Data (QED) and Market Data Retrieval (MDR). All three of these files are intended to be complete, but as the paper makes clear, each has deficiencies. Some additional specialized files examined included those kept by the Bureau of Indian Affairs, the Department of Defense and the Center for Education Reform.

The author carefully details gaps in the comparative coverage of the CCD, relative to each of the other frames. This is a most useful exercise and will be invaluable to users, even though the particular frames compared (for 1994-95) are now dated. In the final version of Hamann’s paper in these proceedings, the author nicely strengthens the baseline being attempted by making it possible for the reader to do a dual systems estimate (e.g., Marks, Krotki and Seltzer 1974; Pollock 2000) to see what the overall coverage of CCD was. To carry out a triple systems estimate would require a match of the MDR and QED files, which may be out of scope. Interestingly, dual systems estimation is already employed routinely in building the US Private School Survey (PSS) frame (see, for example, Broughman and Colaciello 1999). The final Hamann paper also provides ample details on counts of schools missed; additionally, it nicely develops the implications of coverage differences on statistics like school enrollment. The next paper does this too, albeit differently.
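For readers unfamiliar with dual systems estimation, the basic (Lincoln-Petersen) form of the estimator can be sketched as follows. This is a generic illustration and not a computation from any of the papers: the function name is hypothetical, the counts in the example are made up, and the estimator assumes that the two lists are independent, that every school has the same chance of appearing on a given list, and that matching is error-free.

```python
def dual_systems_estimate(n_first: int, n_second: int, n_matched: int):
    """Lincoln-Petersen estimate of the true number of units.

    n_first   -- units on the first list (e.g. a frame)
    n_second  -- units on the second, independent list
    n_matched -- units found on both lists

    Returns (estimated total, estimated coverage rate of the first list).
    """
    if n_matched <= 0:
        raise ValueError("at least one matched unit is required")
    if n_matched > min(n_first, n_second):
        raise ValueError("matches cannot exceed the size of either list")
    estimated_total = n_first * n_second / n_matched
    # Coverage of the first list: n_first / estimated_total = n_matched / n_second.
    coverage_first = n_matched / n_second
    return estimated_total, coverage_first


# Made-up counts, for illustration only.
total, coverage = dual_systems_estimate(n_first=900, n_second=800, n_matched=720)
print(total, coverage)
```

In practice, out-of-scope units (such as adult schools) would need to be removed from both lists first, and one-to-many matches would have to be resolved so that the matched count is the same from either side.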

The paper ends with some very sensible recommendations that are worth repeating and even strengthening: First, as the author states, there should be periodic matches of the universe files across all systems. Second, the way that the National Center for Education Statistics (NCES) has the states develop the CCD should be reexamined and, among other things, more systematic quality checking done. Third, I would add that on a sample basis the QED and MDR should be matched annually to the CCD and to each other. These matches should check for both coverage and content differences, using the state of the art linkage software that is now available from Winkler and his colleagues at the Census Bureau. Once the linkages are done, triple systems estimates could be made so that overall coverage can be ascertained and its impact integrated into mean square error measures. This three-way matching by NCES would strengthen quality management efforts on a state-by-state basis. A thorough root cause analysis should accompany the matching and resources for an annual continuous improvement (KAIZEN) program ought to be set aside to implement what is learned.

3. Lee, Burke and Rust. This paper covers some of the same ground as Hamann. There are some scope differences, however. In particular, later data were used (for 1997-98) and both public and private schools were examined. Lee, Burke and Rust start off with the clear objective of determining for the National Assessment of Educational Progress (NAEP), whether the QED frame is better than a combination of CCD for public schools and PSS for private ones. This time, unlike in the Hamann paper, the MDR frame was not examined at all. On balance the CCD and PSS combination seems to win out on some variables, but not on others.

Because the data are later, the results in the Lee, Burke and Rust paper would seem to have advantages over Hamann's. The authors are quite thorough in the way they handle the matching, although more would have been appreciated on the computer algorithms that were used and what the estimated linkage error rate was. The paper successfully examined the consequences of mistakes in the frame and not simply overall coverage. The regression modeling done in this connection is certainly a tour-de-force. One minor quibble: I would have liked more on the limitations arising from the deficiencies in the use of the NAEP sample in the presence of NAEP nonresponse.

There are many good recommendations in the paper – perhaps more than there will be resources to implement. Frames are always out of date (Lessler and Kalsbeek 1992); indeed, this can be one of their main deficiencies. In this connection the authors are particularly to be commended for their idea of matching CCD's across time -- the CCD used as a frame and the one for the year the survey is done. Indeed, this idea might be the way to give content to one of Hamann’s recommendations above.

The maturity and subtleties of the thinking in this paper are a delight, especially as informed by the other papers at this session. I particularly liked the “Quality Profile” flavor of much of what I read. The leadership of NCES in this area (e.g., Jabine 1994; NCES 2000) has been noteworthy for many years.

4. Concluding Comments. There need to be more parallel presentations like these given at future conferences. The combination of such papers enhances communication. Because the Canadian and US statistical approaches are different, the inevitable question of “why” gets asked and this leads to the possibility for creative synthesis. As I mentioned at the outset, more might have been said in all the papers about past work, especially that done on business surveys. But, clearly, the joint treatment of common problems in the sampling of schools has worked to the benefit of us all.

While my mother’s family are Canadian, I do not have enough detailed knowledge to address the challenges coming in education in Canada. I will, however, hazard some remarks about the US, where I have two sons still in high school. The US education system is poised for great changes: vouchers, charter schools and increased home schooling are obvious examples. Home schooling has already become quite important and yet has not really been integrated into existing school frames. There is a big voucher effort going on in Florida. In fact, I am helping to design a sample of paired public-private schools to evaluate it.

Predictably, these and other changes could lead to smaller and shorter-lived schools, with the coverage challenge for the survey practitioner becoming more like trimming a swaying Christmas tree with a string of blinking lights. New techniques will be needed, of course, to measure and minimize the increasing uncertainty. We are not going to be able to assure ourselves that coverage error is a small stable problem. The emphasis on timeliness, so evident in the Cheung and Gossen paper, needs to be taken especially to heart. The dot.com world of e-business is already a challenge for business surveys, where physical presence is becoming virtual and national borders have almost no meaning. This kind of world may be coming for school surveys too. When and how much are the questions, not whether.

One place that business surveys might learn from school surveys is in the degree of access to frame data. The CCD data are publicly available, found at <http://nces.ed.gov/ccd/pubschuniv.html>. Unlike business names and addresses, in the United States data about individual public schools are available for all to see and use. This openness of the list (not currently available in Canada) offers a case study for both advocates and opponents of making similar information available for commercial establishments.

A final point. The current revolution that is bringing the whole society into the information age has a special impact on traditional information-producing organizations. Ironically, they can fail to seize the opportunity their deep knowledge gives them if they simply continue their traditional relationships with customers and do not fully realize that the information age is an age of service and not mainly of products (e.g., Fellegi 1999). A service is an activity done with the customer and not just for them. Our producer language must be recast into that of the client and success or quality seen from the client’s perspective (e.g., Brackstone 1999).

5. References

Archer, D. (1995) “Maintenance of Business Registers”, Business Survey Methods, 85-100, John Wiley & Sons, Inc. New York.

Brackstone, G. (1999) “Managing Data Quality in a Statistical Agency”, Survey Methodology, 25, 2, 139-150.

Brewer, K. R. W. (1999) “Cosmetic Calibration with Unequal Probability Sampling”, Survey Methodology, 25, 2, 205-212.

Broughman, S. and Colaciello, L. (1999) Private School Survey, 1997-1998, NCES Number 1999-319, Washington, D.C.: National Center for Education Statistics.

Fellegi, I. P. (1999) “Statistical Services - Preparing for the Future”, Survey Methodology, 25, 2, 113-128.

Jabine, T. (1994) “Quality Profile for SASS Aspects of the Quality of Data in the Schools and Staffing Survey (SASS)”, NCES Technical Report, NCES 94-340, U.S. Department of Education, Office of Educational Research and Improvement. Washington, D.C. See also National Center for Education Statistics for an updated Quality Profile for SASS forthcoming 2000 as NCES 2000-308.

Kovar, J. G. and Whitridge, P. J. (1995) “Imputation of Business Survey Data”, Business Survey Methods, 403-424, John Wiley & Sons, Inc. New York.

Lessler, J. and Kalsbeek, W. (1992) Nonsampling Errors in Surveys, John Wiley & Sons, Inc. New York.

Marks, E., Seltzer, W., Krótki, K. (1974) Population growth estimation: a handbook of vital statistics measurement, Population Council: New York.

Pollock, K. H. (2000) “Capture-Recapture Models”, Journal of the American Statistical Association, 95, 449, 293-296.

Salvucci, S., Parker, A., Zhang, F., and Li, B. (1995) An Assessment of the Accuracy of CCD Data, NCES Technical Report.

Winkler, W. E. (1995) “Matching and Record Linkage”, Business Survey Methods, 355-384, John Wiley & Sons, Inc. New York.
