

International Statistical Review (2006), 74, 3, 357–378, Printed in Wales by Cambrian Printers © International Statistical Institute

Towards an Integrated Statistical System at Statistics Netherlands*

Nico Heerschap and Leon Willenborg

Statistics Netherlands, Voorburg, The Netherlands

Summary

Changes in circumstances put pressure on Statistics Netherlands (SN) to redesign the way its statistics are produced. Key developments are: the changing needs of data-users, growing competition, pressure to reduce the survey burden on enterprises, emerging new technologies and methodologies and, first and foremost, the need for more efficiency because of budget cuts.

This paper describes how SN, and especially its business statistics, can adapt to these new circumstances. We envisage an optimum situation as one with a single standardised production line for all statistics and a central data repository at its core. This single production line is supported by generic and standardised tools, metadata and workflow management.

However, it is clear that such an optimum situation cannot be realised in just a few years. It should be seen as the point on the horizon. Therefore, we also describe the first transformation steps from the product-based stovepipe-oriented statistical process of the past to a more integrated process of the future.

A similar modernisation process exists in the area of social statistics. In the near future both systems of business and social statistics are expected to connect at pivotal points and eventually converge on one overall business architecture for SN. Discussions about such an overall business architecture for SN have already been started and the first core projects have been set up.

Key words: Official statistics; Process redesign; Integrated statistics; Output driven; Metadata infrastructure; Workflow management; Data repository; Output data warehouse; Strategy; Stovepipe model.

1 Introduction

Since the beginning of the 1990s Statistics Netherlands (SN) has been in a state of continuous change. The main reason is that the way of producing statistics no longer fits the changing circumstances. To adapt to new circumstances, an alternative way of producing statistics is necessary. The ideas about such an alternative way of producing statistics are described in this article. The process side is examined more closely, although other organisational elements, such as skills, workforce, management and culture, should certainly not be neglected in a transition phase.

This paper is based on research done for a re-engineering project for the Division of Business Statistics. Therefore most of the focus is on the production process of business statistics. However, a similar modernisation programme exists in the Division of Social Statistics. It should be kept in mind that the same new process, as described in this paper, can be implemented for the production of social statistics as well, at least seen from a broader perspective. This coincides with current developments within SN, which point in the direction of one overall business architecture for all statistics, merging the separate business architectures for social statistics, business statistics and the National Accounts (NA).

* The views expressed in this paper are those of the authors and do not necessarily reflect the policies of Statistics Netherlands.

The modernisation programme of SN fits international discussions about the renewal of statistical processes in terms of “integrated statistical file systems”, “data warehouses” and “overall enterprise architectures”, already presented from the 1960s onwards (e.g. Nordbotten, 1967; Sundgren, 1999; Gillman et al., 2001 and Dunnet & Osborne, 2005). In recent years these discussions have been further triggered by the greater attention given to centralised metadata as the core of those general systems. More and more National Statistical Offices (NSOs) are streamlining (parts of) their processes and systems in this direction. However, the point of action can differ per NSO. Forerunners are, for example, Australia, Canada, the United Kingdom, New Zealand and the Nordic countries as well as organisations like the OECD and the U.S. Bureau of the Census.

The paper is organised as follows. First we describe the main reasons for changing the current situation within SN in more detail (section 2). Then we sketch the ideal long-term situation and the point on the horizon (section 3). This is followed by the main projects for the short and medium term which are embedded in everyday reality, especially as it concerns the business statistics (section 4). Finally, we examine the extension to an SN-wide architecture somewhat further (section 5). We end the paper with some conclusions (section 6).

2 Old Situation

2.1 Product Stovepipe Model

Until 2001 the situation of the statistical processes at SN could be described as a pure product stovepipe model. See Figure 1.

Every single product stovepipe corresponded to a particular theme of related statistics with its own specific production line. All of the processing from survey design to dissemination and publication took place within the stovepipe. Each had its own customer base and data suppliers. In fact each lived in a small world supported by its own tailor-made information systems. At the end of the chain National Accounts (NA) merged and integrated the results of some of these product stovepipes, sometimes repeating part of the stovepipe process in a slightly different way.

2.2 Advantages of a Product Stovepipe Model

This way of producing statistics has its own logic. To start with, the situation in a product stovepipe model is conveniently arranged. It is impossible to control all processes of all stovepipes from one point in the organisation because the span of control is simply too big. Also the technical means to support a more comprehensive system were not advanced enough. This automatically led to a division of labour where activities are grouped together within smaller, manageable units. Originally this was done on the basis of a product-oriented rather than a process-oriented view, creating the product stovepipes.

Furthermore, the different steps in the statistical process of a product stovepipe are closely linked. For example, somebody working on the output usually also has inside knowledge about (the problems of) the input and throughput. People are more motivated because they are often involved in all the steps of the process and because they have a clear responsibility for the final product. They know a great deal about the specific subject matter (specialisation). These are all reasons why the quality of the output increases, at least within the scope of the stovepipe.

A product stovepipe is flexible in the sense that it is able to adapt more quickly to changes in the market place of its specific domain. Finally, a product stovepipe model is less vulnerable because the individual stovepipes are self-supporting. A problem in one of the stovepipes usually does not affect other stovepipes. Given the smaller scale, problems are also relatively easy to trace and solve.

Figure 1. Product stovepipe model or product view.

2.3 Changing Circumstances

If there had been no change in circumstances, maybe there would be no need to change the situation within SN. However, this is clearly not the case.

Firstly, there are the changing needs of customers. As the world becomes more complex and interrelated, there is an increasing need for integrated and consistent data. Customers also want data more quickly. Often new themes emerge, like the new economy, globalisation, the ageing population or environmental issues. This means that there is a growing need to integrate data across the limits of the stovepipes, not at the end of the process, at the macro-level within the NA, but much earlier at the micro- and meso-level.

Customers have become more critical and demanding, and in a changing market SN is losing its monopoly position in many stovepipe domains. Nowadays individual statistics can be produced easily and often more quickly by other institutions because most data are electronically available. The competition is further reinforced by the growing possibilities offered by the Internet, both for making (statistical) information available and for accessing it.

Furthermore, the output of SN relates more and more to the growing information needs of the European Union (EU). A danger is that SN is becoming an annex to the statistical office of the EU (Eurostat). In some cases, an EU-regulation is the main reason for the existence of a survey. That is not always a sound foundation for survival.

Secondly, to ensure the competitiveness of Dutch enterprises, there is a growing pressure to reduce their administrative burden, which is valued at circa 17 billion euros each year. Reducing this burden is a major policy issue of the current Dutch government.

Thirdly, there is continuous pressure to cut costs and thereby reduce staff, but still produce the same or even more output. This wish for more efficiency is increasing now that the Dutch government wants to cut budgets. The trend to do more with less is likely to persist in the future, as the development towards a smaller government with a smaller budget will continue.

And, finally, there is the pressure from new developments in information technology and statistical methodology. As new technology becomes available, there is always an urge to deploy it (push), especially in the case of a strong wish for more efficiency (pull). It is a matter of ‘enabling technologies’, too often presented as the solution for all existing problems.

2.4 Disadvantages of a Product Stovepipe Model

The changes in the environment of SN increasingly uncover the disadvantages of a product stovepipe model as a way of producing statistics.

For example, it is rather difficult to maintain coordination between the different product stovepipes. In fact there is a natural tendency for them to drift apart, leading to a variety of methodological solutions and a low level of standardisation. Also, there is no natural overall integration of data from the different areas and sources, because the data are processed in different product stovepipes. This means that there is no consistency between data on, for example, production, trade, investments and financial aspects of an enterprise. Integration of some of the data at the macro-level is carried out at the end of the process, within the NA. This is clearly not very efficient. Problems that arise in this phase of the process are difficult to trace back to their respective sources. The fact that there is no integration of data also implies that different figures can be presented about the same phenomenon.

This situation not only confuses many customers: if their information needs are not confined to one product stovepipe, they are also forced to shop around and obtain their data from different units in the organisation. Moreover, the data obtained in this way are not very consistent.

The collection of the information needed is not organised very well. In fact every product stovepipe collects the information on its own and for its own purposes, approaching an enterprise with the same kind of questions, often with only slightly different definitions. This increases the survey burden more than necessary. Overall account management and coordination is lacking. So, reducing these administrative costs is not only of interest for the continuing cooperation of Dutch enterprises, but also for SN itself. Besides giving SN a bad image, duplicating activities for every stovepipe on the input side is not very efficient.

Furthermore, staff rarely move from one isolated stovepipe to another. It is like moving from one tribe to another, learning new customs and habits and unlearning old ones. As a consequence there is little job rotation. And if somebody leaves the stovepipe a severe problem may arise, as all the knowledge this person acquired disappears as well. This problem is compounded by the lack of documentation or the failure to update existing documentation. The in-crowd does not feel much need for it. This means that transparency of the processes and reproducibility of the results are not always secured.

And last but not least, a product stovepipe model is not very efficient. Every statistic has its own way of organising things. The processes and supporting software are often reinvented, duplicated and tailor-made. This also implies duplicating automation work when developing and maintaining the systems, at relatively high costs. The advantages of economies of scale are not exploited.

The growing feeling that the disadvantages strongly outweigh the advantages of a product stovepipe model in a changing world caused a series of reorganisations within SN, starting in the early 1990s and continuing until the present day.

So, although a necessary precondition, the fast growing possibilities of information technology are certainly not the only drivers for change. Especially the combination with the inescapable external pressure on SN from the 1990s onwards to downsize and to be more efficient has been an important turning point. This has created the sense of urgency to actually initiate a process of modernisation within SN, that is to seek solutions away from the existing stovepipe model and to deploy the available technology.

3 Optimising the Statistical Production Process: the Long Term View

3.1 Strategic Corporate Goals

In a transformation process organisations usually have a long-term strategy where they position themselves in the market place. Besides the mission, main markets and core activities, this is described in the strategic corporate goals. Throughout all recent reorganisations, the main strategic corporate goals of SN have been:

1. to better accommodate the changing needs of its (key) customers. In the end, this is crucial for the survival of SN. This can be achieved foremost by better exploiting the unique position of SN, that is the capability to integrate different individual sets of (micro)data into one interrelated and consistent knowledge base or theme-oriented bases. Besides the production of individual statistics, the main value added of SN can then be defined as the integrating crossroad on the statistical information highway. To take full advantage of such a position SN needs to seek cooperation with other research institutions and maintain strategic alliances and knowledge networks.

2. reducing the survey burden of enterprises. Besides further coordination between statistics and questionnaires, this can be realised in two ways:

a. to optimise the use of external administrative sources, as the replacement of total surveys or questions in surveys. The general rule is that own surveys should be limited as much as possible in favour of administrative sources and, if surveys are conducted, that this is done electronically whenever possible. In line with this, SN should use the general infrastructures which are put in place for business-to-government communication. This applies not only to a central transaction portal (e-government) but also to a set of centralised registers for businesses, persons, addresses and real estate.

b. to approach respondents in their own environment. This means, for example, (1) using definitions that respondents can understand and that are harmonised with business or tax administrations or (2) giving the respondent the option to respond in different ways (a multi-channel approach).

3. more efficiency by redesigning the current situation towards more standardised statistical processes with the optimum use of new information technologies and statistical methodologies (e.g. generic tools and standard solutions).

4. adaptation of the structure, management and staff to the new situation, especially when it concerns the mismatch between existing skills and skills needed.

In pure business terms this means that the changing circumstances pressed SN to get better and quicker output, i.e. statistics, for less input and processing costs. Less input and processing costs can be translated to a new way of producing statistics with a smaller but more highly professional workforce. This implies higher productivity through increasing economies of scale, leading to a leaner organisation.

3.2 First Step: a Process-driven Model

The development towards an optimum situation starts from the disadvantages of a product stovepipe model and follows a more or less natural evolutionary path of improvements.

A first step then is to merge all separate and duplicated activities at the output and especially the input side of the different product stovepipes, as a result of which:

• one central contact centre is set up to approach and answer questions of all respondents of all surveys. This is also the first step towards further coordination of all data collection, primary or secondary. This is not only cost effective, but foremost it minimises the survey burden of enterprises and improves contactability and relations management. This is crucial for the necessary improvement of the image of SN.

• one helpdesk is set up to answer and coordinate all information needs of all customers of all product stovepipes. An improved customer service has already been created for the dissemination of data through the Internet, namely StatLine, the Internet database of SN. In recent years this strategy of “one window for all data dissemination services” has, furthermore, been supported by the creation of one central unit which better accommodates the specific information needs of all strategic customers and researchers (micro-data). With this, the focus of the customer service is slowly shifting from routine and low value tasks (e.g. call centre) to high value personalised consultancy (Gates, 1999).

Then there is the tendency to gain efficiency by merging similar production processes that operate as isolated product stovepipes into as few standardised production lines as possible, supported by generic (software) tools in every step of the production chain, thereby exploiting economies of scale.

From an organisational point of view it is then almost a natural step to shift from a product stovepipe model to a process-driven model, meaning separate units for all input activities, all throughput activities and all output activities. See Figure 2.

Figure 2. First step: towards a process stovepipe, a process driven model.

Such a shift creates demand-supply chains, especially if the activities of the NA are also included. These demand-supply chains are further elaborated by an increasing use of register information on the input side and an expanding cooperation with other research institutions on the output side. This implies that SN is more and more dependent on external partners for the quality of its final products.

In fact, this is the current situation in which the Division of Business Statistics finds itself, although the different product stovepipes are still visible in parts of the organisation.

3.3 Dangers of a Process Stovepipe Model

In a process-driven model it is easy to step into the same trap as with the product stovepipe model, namely that communication and coordination between the different steps in the chains are not formally organised. If that is the case there is the danger of replacing a product stovepipe model by a process stovepipe model, with disadvantages such as:

• it is unclear who is responsible for the total process, as responsibility is divided over three (or more) different units. After every step intermediary products “are thrown over the wall” to the next step. People thereby lose sight of the quality of the final product, which is liable to become suboptimal.

• pressure on the quality of the output, because the output phase lacks adequate knowledge about the ins and outs of the input and throughput phases.

• difficulties for people to move from one process stovepipe to another. Also problems can arise with motivation, as a part of the staff does not have the feeling that they contribute to the final product.

Especially for the smaller surveys where such a process stovepipe model is created, it is difficult to see the advantages and the effect on efficiency. Efficiency in work is counterbalanced by the extra effort needed for communication and coordination.

3.4 Second Step: one Single Standardised Production Line

If the goal is to have the lowest possible number of production lines, the ultimate goal would be to have only one single standardised production line for all statistics, using either primary or secondary data. The core of such a production line would be one central data repository that functions as the transactional database (Baseline) as well as the analytical database (StatBase).

Such an “ideal process” is described in Figure 3. The following steps can be imagined:

1. It all starts with the operationalisation of the user needs into the desired statistical information in terms of concepts, output specifications, detail and quality. In this pre-input or design phase decisions must also be made about which input data (primary, secondary or a combination of both) are needed to produce the desired output; how to combine and integrate different, possibly already collected, data sources; and, more advanced, whether model-based estimates can be used (quicker and cheaper output). In this phase the methodological framework is also determined, including rules for observation and processing and the relationships between input and output domains (n-to-m relationships). Benefits and costs have to be carefully balanced. An important principle is that this design phase is clearly separated from the actual implementation, that is steps 2 to 8.

2. If these decisions are made, the data can be collected. This means that the respondents (e.g. enterprises) and register holders are approached. All data collection activities are coordinated from one point in the organisation, so that respondents and data suppliers have to deal with only one contact centre. In the case of surveys this process is supported by generic tools to draw samples, to generate the different modes of the questionnaire (e.g. by paper or electronically) and to process the responses (input machines). Efficiency can be improved further by combining different surveys into one questionnaire, by optimising the use of already collected data in the process (e.g. smaller samples) etc.

A key issue here is a controllable logistical process, including relations management with respondents and register holders.

Exchange of data is standardised, both in form and in content, as much as possible (e.g. XML, XBRL etc.). Different approaches (multi-channel) are used for different groups of respondents in combination with different statistics.

Figure 3. One production line for all business statistics as the optimum process.

3. An important prerequisite for the process is the availability of coordinated populations or backbones of units. For business statistics, for example, these are made available through the Central Business Register (CBR) of SN.

4. Subsequently the primary or secondary data received are directly loaded into the central data repository (the transactional part, Baseline) and connected to the units of the related and coordinated backbones (input phase). This also includes the translation from data related to observational units to data related to statistical units. The coupling takes place on the basis of identifiable keys. The result of the coupling process is recorded, meaning that the population coverage can be calculated at any time. For business statistics, extra efficiency can be gained for the smaller and medium-sized enterprises by ensuring a one-to-one relationship between the statistical unit and the observational unit, for example, units used in tax registers.

5. After the data are linked to one of the backbones, the data can be made available for a throughput process of coding, checking, editing and also (micro) integration. This leads eventually, often through all kinds of intermediate results, to publishable data. Although data can be changed here, existing data are never overwritten. This means that management of the different versions is crucial. This is also the case when changes have to be made as a result of new information on already collected data.

Not all raw data can be processed simultaneously, because the data come from different sources with different quality and timeliness and they are obtained differently. Therefore, processing takes place on the basis of clearly defined datasets or views. How these views (a section of units, variables and time) are composed depends not only on the data source(s) used, but foremost on the required output (theme-oriented). This coincides with the fact that in micro-integration it is clearly impossible to check and edit all available variables. Choices must be made and managed.

The fact that a lot of (integrated) data, primary and secondary, are available in the data repository enhances the data checking, editing and imputation processes.

6. When a view is considered to be ‘good enough’, it is copied to the output side of the data repository, StatBase (a data warehouse). This will be micro data as well as aggregated data. Views can only be added and cannot be changed or deleted here.

The whole transformation process from input to output is recorded by transformation rules for the variables and units and coupling rules in the case of integration.

7. The data warehouse then is the basis for consistent and integrated data marts, which are derivations of the views processed in the throughput phase. These data marts are mostly interrelated. In the case of business statistics, there is one central data mart, which consists of an integrated and consistent set of key economic variables. The other data marts, which can be considered as satellites, are then connected to this central data mart through one of the key variables.

The data marts can be micro data, aggregates or time-series, which can be made available for the different expert groups within SN (output phase). These expert groups are based on coherent themes, like the domains of care, production, crime, tourism or agriculture. Expert groups can then use the data in two different ways:

a. for (standard) publications and (ad hoc) dissemination of data to the different customers. This way of using data is strictly controlled, authorised and fixed, thus always transparent and reproducible. Confidentiality is a major issue. The internet, including the internet database StatLine, has clearly become the primary distribution channel, not only for the dissemination of data but also for the access to data by, for example, remote access and remote execution. Gradually a set of web services will evolve. For strategic customers this is supported by personalised consultancy. A special case here is that micro data are made available in a restricted way to external researchers.

b. information development. This is an internal process, in fact related to the design phase, where the statistical analyst can integrate and manipulate datasets without limitations. It is his or her experimental garden for new statistics. Data confidentiality only plays a marginal role. This way of using the data is especially supported by OLAP tools (Online Analytical Processing).

The system provides the statistical analyst with tools to translate the output data to various formats, such as the format of StatLine, the format of Eurostat (GESMES) or formats such as ASCII, XML, SPSS, Excel etc.

8. Last but not least, already integrated data are supplied to the National Accounts (NA), where macro-integration of demand and supply takes place, resulting in different kinds of (satellite) accounts.

All activities in the different steps of the process are supported by standardised and generic(software) tools, which are contained in centrally maintained libraries (in Figure 3 representedby the boxes with an ‘L’). These libraries can easily be updated with the latest technologies andmethodologies, ideally without bothering the users.

Although there is only one standardised production line, it does not mean that all data follow exactly the same path through the chain. There can be differentiation in the workflow for different datasets, especially as it concerns the step of checking and editing (step 5 in Figure 3). Different datasets may need different checks (e.g. small and big enterprises); some datasets need imputation, other datasets do not; register data need to be treated differently from survey data; some data need to be weighted and other data are observed integrally etc. Tools will be strongly rule-based (e.g. the transformation steps).

In fact, once the data is added to the transactional database Baseline, it does not move from one place to another. Instead, the various actors process the data for their respective step in the production chain by looking at the data through different views, using different (software) tools. So, the data are not actually routed, but the activities of the different actors in the production chain are scheduled, which also implies that the specific status of the data should be recorded carefully. This scheduling must be managed by the deployment of a central Workflow Management System (WFMS), otherwise it becomes too complex. The WFMS also supports the management of the processes as it monitors progress and delivers information, if necessary, to adjust the implementation.
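
The idea that the data stay in place while the activities are scheduled and the status is recorded can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the step names, the dataset names and the `Workflow` class do not describe an actual SN system or a particular WFMS product.

```python
from dataclasses import dataclass, field

# Hypothetical step names, loosely following the production chain described above.
SURVEY_STEPS = ["connect_to_backbone", "check_and_edit", "impute", "weight", "integrate"]
# Register data are observed integrally, so this (assumed) path skips imputation and weighting.
REGISTER_STEPS = ["connect_to_backbone", "check_and_edit", "integrate"]


@dataclass
class Workflow:
    """Records which steps a dataset has completed; the data themselves never move."""
    dataset: str
    steps: list
    done: list = field(default_factory=list)

    def next_step(self):
        """Return the next scheduled step, or None when all steps are completed."""
        return self.steps[len(self.done)] if len(self.done) < len(self.steps) else None

    def complete(self, step):
        """Record that `step` has finished; steps must be completed in scheduled order."""
        if step != self.next_step():
            raise ValueError(f"{step!r} is not the next step for {self.dataset}")
        self.done.append(step)

    def status(self):
        """Status string that a monitoring function could report."""
        nxt = self.next_step()
        return "finished" if nxt is None else f"waiting for {nxt}"


survey = Workflow("retail_survey", SURVEY_STEPS)
register = Workflow("tax_register", REGISTER_STEPS)
survey.complete("connect_to_backbone")
print(survey.status())    # waiting for check_and_edit
print(register.status())  # waiting for connect_to_backbone
```

The two step lists show how one production line can still carry differentiated workflows: only the schedule differs per dataset, not the tools or the storage.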

3.5 Central Data Repository as the Core

The central data repository is the core of the imagined single standardised production line. It supports both transactional processing (in Baseline) and analytical processing (in StatBase), although the two are clearly separated from each other. It is clear that both parts must be geared to each other as much as possible. Both databases serve as junctions in the statistical process where data are stored.

In a simplified form this data repository has three related dimensions, namely:

1. the coordinated backbones of units or populations. For business statistics, these backbones are mainly based on the CBR. The statistical units (in Baseline and StatBase) can be, for example, the enterprise group, the enterprise, the local unit, a functional unit or a spatial unit. But backbones (in Baseline) can also consist of observation units, including tax or register units. There can be relationships between the different units, which must also be maintained. For example an enterprise can have more than one local unit, or, a statistical unit can be derived from more than one observation unit. A backbone is longitudinal and cumulative in time with periods of validity for all units. The maintenance is event-driven. Especially for reasons of coherence and integration, it is essential that these backbones are coordinated between statistics. These coordinated backbones do not only have a function for the output, but also for sampling, coupling of datasets and statistics, that is the demography of enterprises.

2. the attributes of these units, the variables. A distinction can be made between variables needed for the maintenance of the backbones, and variables measured to make the actual statistics, including quality indications.

3. the time dimension.

In other words, for business statistics, the data repository contains backbones of all units or entities in the economic domain of which data is collected on the basis of variables, seen in time. See Figure 4.
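
The three dimensions can be made concrete with a small Python sketch. It is an illustration under assumptions only: the class `Unit`, its field names and the example identifiers and values are hypothetical and do not reflect the actual data model of Baseline or StatBase.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Unit:
    """A unit in a backbone with a period of validity (valid_to=None: still valid)."""
    unit_id: str
    unit_type: str                     # e.g. "enterprise" or "local_unit"
    valid_from: date
    valid_to: Optional[date] = None
    parent_id: Optional[str] = None    # relationship, e.g. local unit -> enterprise

    def is_valid_on(self, day: date) -> bool:
        return self.valid_from <= day and (self.valid_to is None or day <= self.valid_to)


# Dimension 1: the backbone. It is cumulative: units are closed, never deleted.
backbone = [
    Unit("E1", "enterprise", date(2000, 1, 1)),
    Unit("L1", "local_unit", date(2000, 1, 1), parent_id="E1"),
    Unit("L2", "local_unit", date(2002, 6, 1), date(2004, 12, 31), parent_id="E1"),
]

# Dimensions 2 and 3: values keyed by (unit, variable, period).
observations = {
    ("E1", "turnover", 2003): 1200.0,
    ("E1", "turnover", 2005): 1500.0,
    ("E1", "employment", 2003): 14.0,
}


def population(unit_type: str, day: date) -> list:
    """Identifiers of the units of one type that are valid on a reference date."""
    return [u.unit_id for u in backbone if u.unit_type == unit_type and u.is_valid_on(day)]


print(population("local_unit", date(2003, 7, 1)))  # ['L1', 'L2']
print(population("local_unit", date(2005, 7, 1)))  # ['L1']
```

Because closed units are kept with their validity period, the population on any past reference date can be reconstructed, which is what makes the backbone usable for sampling as well as for longitudinal output.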

The repository serves production purposes as well as end-users. When fully implemented, (the applications or the tools of) the different users will have a uniform interface to all (input) data and metadata regardless of the source. In a technical sense applications and users will not directly interact with the repository. Instead, “interfaces” are provided which applications and users can “call” when they need to interact with the repository. In this way applications and users don’t need to know the physical data model underlying the repository or the low level business rules and constraints, but only define, in terms of metadata, what they are seeking to do at a higher business level. This means that, for further development, it is more important to standardise these interfaces than the applications or tools themselves.
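
A minimal Python sketch of this separation between interface and physical storage follows. The interface `Repository`, its method `get` and the row layout are assumptions chosen for illustration; the paper does not specify how the interfaces are defined.

```python
from typing import Protocol


class Repository(Protocol):
    """Hypothetical interface: callers ask by variable and period, not by table or column."""

    def get(self, variable: str, period: int) -> dict: ...


class InMemoryRepository:
    """One possible implementation; its physical layout stays hidden behind get()."""

    def __init__(self):
        # Assumed physical model for this sketch: rows of (unit, variable, period, value).
        self._rows = [
            ("E1", "turnover", 2005, 1500.0),
            ("E2", "turnover", 2005, 300.0),
            ("E1", "employment", 2005, 14.0),
        ]

    def get(self, variable: str, period: int) -> dict:
        """Return {unit_id: value} for one variable in one period."""
        return {u: v for (u, var, p, v) in self._rows if var == variable and p == period}


def total(repo: Repository, variable: str, period: int) -> float:
    """An application that depends only on the interface, not on the storage."""
    return sum(repo.get(variable, period).values())


print(total(InMemoryRepository(), "turnover", 2005))  # 1800.0
```

If the storage is later reorganised, only the implementing class changes; `total` and any other application written against the interface remain untouched.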

Figure 4. Main dimensions of the central data repository.

3.6 Third Step: an Output and Metadata Driven Process

The last step in this natural path to an optimum situation is that the process becomes more output-driven, supported by a full-fledged metadata infrastructure (see again Figure 3).

Output-driven

Such an output-driven process starts with the customer information needs. They guide all steps in the underlying process. Activities should only be performed when they serve these information needs. However, it must be clear that output-driven does not mean swimming with every tide, immediately reacting to every wish or change in information needs. To a large extent the output of SN is based on a more or less fixed core. This coincides with the fact that it would be too costly to change things all the time on the input side, also for the respondents and register holders. Besides, it takes time to set up or change a survey.

Output-driven means that there is a strong relationship between the desired output and the specific domains of the different expert groups, regardless of the division between business and social statistics; or, the desired output and the variables, which are checked, edited and integrated in relation to each other (the constructed views); or, the desired output and the questions, which are combined on the same questionnaire; or, the desired output and the control over the complete production chain (e.g. the quality of the input and output between the different steps in the chain). Output-driven also means that the limited capacity for processing is allocated optimally to those areas which are strategic (e.g. integration) and not to areas which are less important for the output.

Metadata infrastructure

If an organisation is divided process-wise, it is essential that communication and coordination are set up product-wise, and vice versa. The deployment of a full-fledged metadata system is a crucial prerequisite, because it provides the necessary cohesion between the different links and tools in the chain. Metadata also complements a process view with a required data view. This not only concerns conceptual metadata about, for example, units, variables and classifications, but also metadata about the quality of the data, events and especially the processes, including the rules for all the transformations of the variables and units and coupling. Metadata can be reused when the same information is needed in other parts of the production process.

Besides its crucial role for communication and coordination in the processes, a good metadata infrastructure has other advantages. Obviously data without metadata has no meaning. Metadata is a necessary prerequisite for the accessibility of the data. Centralised metadata also creates pressure for more coordination of the concepts and definitions used. This is a key issue for the integration of datasets. Metadata also enhances reproducibility and provides information to improve the processes themselves. Bottlenecks can be discovered. Metadata contributes to efficiency because it can be reused in (other parts of) the process. An important condition is that the metadata or business rules are separated from and not embedded in the software (rule-based processing). And, finally, metadata can be used to generate, in runtime, parts of the functionality (e.g. questionnaires, input machines and checking rules for different datasets).
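
The separation of business rules from software can be sketched as follows in Python. The rule format, the dataset names and the thresholds are hypothetical; the sketch only shows the principle that rules are held as data and a single generic function applies them.

```python
import operator

# Checking rules held as metadata (plain data), separate from the software that applies them.
# Rule contents and dataset names are hypothetical.
RULES = {
    "small_enterprises": [
        {"variable": "turnover", "op": ">=", "value": 0},
        {"variable": "employment", "op": "<", "value": 10},
    ],
    "big_enterprises": [
        {"variable": "turnover", "op": ">=", "value": 0},
        {"variable": "employment", "op": ">=", "value": 10},
    ],
}

# Generic mapping from rule operators to comparison functions.
OPS = {">=": operator.ge, "<": operator.lt, "==": operator.eq}


def violations(record: dict, dataset: str) -> list:
    """Return the rules (as readable strings) that a record fails for a given dataset."""
    failed = []
    for rule in RULES[dataset]:
        check = OPS[rule["op"]]
        if not check(record[rule["variable"]], rule["value"]):
            failed.append(f'{rule["variable"]} {rule["op"]} {rule["value"]}')
    return failed


record = {"turnover": -50, "employment": 4}
print(violations(record, "small_enterprises"))  # ['turnover >= 0']
```

Changing a check for one dataset then means editing an entry in `RULES`, which a statistician could maintain, rather than changing and redeploying the program.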

It should be made clear that it is not only a question of technology, that is the creation of infrastructure and tools for metadata. It is also a question of sufficient resources and attitude. The drawing up of metadata, including documentation, is seen as boring and has a low status. It often turns out to be one of the last duties to be carried out in a working process. The priority is low. So it is important to make the benefits of metadata visible and realise that sufficient resources should be made available to make it work. Metadata are as important as data, and metadata need as much work as data (Sundgren, 2003).

3.7 Advantages of a Single Standardised Production Line

A process with a single standardised production line for all statistics, with a central data repository supported by generic (software) tools and metadata and workflow management systems, has many advantages, which are fully in line with the strategic goals set by SN. They present a strong business case. Such a process provides possibilities to reconcile opposing goals, such as a leaner and cheaper production process (efficiency) and less survey burden as well as better quality and more flexible output.

Some of the desired outcomes, from input to output, are (e.g. Vosselman & Willeboordse (1997), Willenborg & Heerschap (2002), ABS (2004)):

• reducing the number of times a data provider is approached unnecessarily by a better coordination of all data collection activities and the optimum use of secondary information (e.g. also smaller samples). This not only reduces the response burden, but it also improves the image of SN and is more cost effective.

• better conditions for the use and enforcement of harmonised and coordinated backbones or populations.

• the existence of a comprehensive set of (integrated) data better supports the processing of other datasets: meaning fewer revisions of published data, greater consistency, smaller errors through better estimates and imputations. This may not only have a positive effect on the quality, but also on the timeliness and detail.

• a better infrastructure for (micro) integration or, at least, the (automated) confrontation and subsequent coordination and integration of data that are derived from different registers and surveys, thus resulting in more consistent aggregates and better data quality. This also implies better possibilities to exploit secondary information in place of or alongside survey data.

• a uniform and consistent archive and a central output database StatBase for all statistics, which stores all historical and recent data. It will be the one and only source for all statistical aggregates from which all publications are compiled and data is disseminated. It provides the necessary flexibility at the output side, providing possibilities to respond more quickly to changing needs in the market place.

• better possibilities for analyses and information development towards an increased range of new products with, for example:

  - more flexibility in combining statistical data across the different subject matters, leading to a broader view on, for example, the economic domain of enterprises, relating different variables with the possibility of producing new indicators, such as labour productivity and effects of innovation.

  - a better production of consistent time series, by understanding that the trends reflect real world developments while the effects of changes in the population can be shown separately. This is followed by, for example, the possibility to track individual enterprises (e.g. the top 500) over time, as well as specified panels of enterprises, which is important for specific types of longitudinal analyses.

  - a quicker dissemination through the use of estimation models (e.g. nowcasting).

  - better possibilities to attain consistency between micro data and their related aggregates, i.e. the micro data should add up to the aggregate.

• better facilitation of the processes of the NA with more consistent and integrated data and metadata.

• better circumstances to enforce centralised metadata and proper documentation. This enables a structured and reproducible way of publication and dissemination. Standardised concepts, formats and proper documentation also enhance the accessibility for statistical analysts and, with that, for customers. Customer information (a client database) for policymaking is more easily available. It also provides possibilities for a better alignment between the world of the respondents (input) on the one hand and the publication of data (output) and concepts of the NA on the other. Implementation is also seen as a key enabler for consolidation and enhancement of existing centralised corporate metadata stores.

• better coordination and communication between the different steps in the production chain, also more strongly based on the output needs.

• more efficiency (by economies of scale) and lower processing costs through lower technological support costs for developing and maintaining a smaller number of data stores and associated applications. These costs not only pertain to IT, but also to salaries and other processing costs. The introduction of a WFMS allows us to manage production better, thereby improving the efficiency in the deployment of the available staff.

• more rapid development of best practices, including the increased understanding of new technologies and methodologies. Also, the implementation of new software and methodology will be quicker and more manageable through one production line with standardised libraries than through stovepipe solutions.

Ultimately, the outcome should be to optimise the possibilities to serve the needs of the customers better in a cheaper and more flexible way, especially as it concerns integrated and consistent data and a quicker dissemination.

3.8 Main Risk Factors of a Single Standardised Production Line

However, there is another side to the coin. A process with only one single standardised production line where much of the data collection is based on secondary information has risk factors that are not to be underestimated. These risk factors should be identified and dealt with.

First of all, the increased scope and complexity of the processes, the systems and the software will lead to more vulnerability. Also, the higher the complexity the more difficult it will be to adapt the system to changing circumstances. So adaptability should be a major prerequisite of the architecture (e.g. software, data components and hardware). For example, modifications of the content should not necessitate modifications in the architecture and tools.

In the transition there is much pressure to create more efficiency and to optimise the use of register data. However, it is not always clear how this affects the quality of the output, which the outside world sees as one of the strong points of SN.

More dependency on registers is another risk factor. SN does not have as much influence on the content and form of registers as one would hope. Registers are set up for administrative, not statistical, use. They can come and go depending on the administrative needs. Therefore, these sources may lack stability and continuity. So, it is crucial to map these risks clearly (e.g. linked to the importance of the output) and evaluate them and to have backup scenarios available (e.g. model-based).

There is also the danger of too much optimism concerning the use of registers. Administrative sources only provide a part of the variables needed. Timeliness or quality of certain registers may be too poor to be acceptable, and their information may be biased. With the use of registers the tendency is to eliminate all redundancy in the processes. This can cause problems because certain variables (e.g. employment) are also used as control variables (in questionnaires). So, there will always be a need for surveys of the businesses themselves.

The desire for a more output-driven process raises the question of who the customers are and how their information needs can be evaluated. How big is the need for integrated data? Maybe SN has to allow for the need for more specific and detailed microdata, also regional and quarterly data, on the one side and the need for more integrated, theme oriented data on the other. Even in the new situation not all customers' needs can be met.

Another question is whether the economies of scale are carried through too far. Can the situation be controlled and managed efficiently? This was the ultimate legitimation for stovepipes. Is there enough communication and coordination between the different steps of the process? Is the responsibility for the total process embodied in the organisation and the supporting software? A workflow management system and a metadata infrastructure are crucial prerequisites.

The growing use of registers on the input side and the need for more cooperation with other research institutes on the output side expands the demand-and-supply chains considerably. However, it is not yet very clear how SN should position itself in these newly initiated demand-and-supply chains, in which SN is usually the weaker partner.

Finally, also from an organisational point of view, risks tend to creep in. New developments require more resources. However, day-to-day (production) problems often tend to overrule work on long-term solutions. That becomes an even bigger problem when budget cuts persist. Often the tendency is to plan on the basis of a tight and often ambitious timetable. Disbelief and scepticism are easily created when the timetable slides due to unexpected complications. And the skills of management and staff do not always suit the new situation. This also applies to the dominant corporate culture, where the urgency for change is often not felt as acutely as the current situation requires. Ten years of continuous reorganisations have established a strong inward-orientation, often losing sight of the needs of the customers.

So, although the business value of such a comprehensive system is very high, this is also true for the associated risks and the initial costs. It is crucial to assess these risks and to monitor and control them continuously.

4 First Steps Towards an Integrated System for Business Statistics

4.1 From Vision to Reality

The optimum process described is an ideal target for the long run. However, it is clear that this cannot be realised in just a few years. We may actually never reach the optimum situation, as many statistical and technical problems still remain to be solved. In the case of the business statistics the following examples can be mentioned.

The steps at the start of the process of connecting the data to the backbones, checking and editing and micro-integration can become too complex (see section 3.4). There is little experience within SN with such a complex process in the area of business statistics.

A major problem concerns the process of flexible and consistent weighting from the total set of micro data to aggregate views or data marts. In the context of social statistics there is some experience with this problem through repeated weighting (Renssen et al., 2001). However, it is already clear that this method cannot be applied one-to-one in the context of business statistics.

The question is still open how to deal with the absence of observed data, for example as a result of the fact that enterprises are not drawn in the sample or the fact that they did not or only partly respond. One way of dealing with this problem is to use (mass) imputation, especially useful when consistency between micro data and their aggregates is important. However, this affects other statistical indicators such as variance.
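
The trade-off can be illustrated with a small Python sketch. Mean imputation is used here only because it is the simplest method to show; it is an assumption of this sketch, not the imputation method used or proposed by SN, and the values are made up.

```python
import statistics

# Hypothetical turnover values for a small population; None marks units without observed data.
observed = [100.0, 140.0, None, 90.0, None, 170.0]

respondents = [x for x in observed if x is not None]
mean_resp = statistics.mean(respondents)

# Mass imputation by the respondent mean: every unit now has a value.
completed = [mean_resp if x is None else x for x in observed]

# The completed micro data add up to the estimated population total.
print(sum(completed))  # 750.0

# The spread of the completed data is smaller than that of the respondents,
# because the imputed values all sit exactly at the mean.
print(statistics.variance(respondents) > statistics.variance(completed))  # True
```

The completed file gives consistent totals for any aggregation of the micro data, but a variance computed naively from it understates the true variability, which is the side effect referred to above.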

Although a crucial element of the proposed situation, it is not very easy to get consensus on how to deal with metadata, that is: how to set up an infrastructure, what should and what should not be part of the system and how this fits day-to-day reality (e.g. coordination, ownership, maintenance and change management). However, the first steps are made in this area on a SN-wide level (see section 5).

Another problem concerns the dissemination of data in combination with statistical disclosure control for business statistics. The data disclosure strategy must be determined in a rather early stage, that is: which cells on what level can be published. Because this cannot be changed at a later stage this may conflict with the ever changing information needs of the customers and the desired flexibility.
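
What it means to decide in advance which cells can be published can be sketched in Python with a simple minimum-frequency rule. The rule, the threshold and the table are assumptions for illustration; they are not the disclosure strategy of SN, and a real strategy would also have to handle dominance and secondary suppression.

```python
# Hypothetical minimum-frequency rule: a cell is publishable only if at least
# MIN_UNITS enterprises contribute to it. The threshold value is an assumption.
MIN_UNITS = 3

# Cells of a hypothetical output table: (industry, region) -> contributing units and total.
cells = {
    ("retail", "north"): {"n_units": 12, "turnover": 5400.0},
    ("retail", "south"): {"n_units": 2, "turnover": 900.0},
    ("hotels", "north"): {"n_units": 5, "turnover": 2100.0},
}


def disclosure_plan(table: dict, min_units: int = MIN_UNITS) -> dict:
    """Decide per cell, before publication, whether it may be published."""
    return {key: cell["n_units"] >= min_units for key, cell in table.items()}


plan = disclosure_plan(cells)
published = {key: cells[key]["turnover"] for key, ok in plan.items() if ok}
print(sorted(published))  # [('hotels', 'north'), ('retail', 'north')]
```

Once such a plan is fixed for one table layout, a later request for a different breakdown of the same data cannot simply be served, since differences between overlapping tables could reveal a suppressed cell.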

At a technical level, there is still little experience with the building and the maintenance of big data warehouses. Or in more general terms, can IT and methodology support the new situation that will serve integration, workflow management, flexibility and both real-time transaction processing and analysis?

And last but not least, we have to reckon with the existing situation. The investments in and the results of the reorganisations within the Division of Business Statistics of the last 15 years must also be taken into account. At the moment almost all the production of institutional business statistics is already handled through one production line (the Impect project), limiting the number of stovepipes considerably. Also the processing of secondary data, such as registers and administrative sources, has already been handled through one production line (the Baseline project) for several years. However, this is still strongly limited to tax and social security data. Concerning the business register (CBR) there is a project (the ABR-project) underway to update the system totally, so it fits new circumstances (main goals: efficiency, alignment with other external registers, more coordination and better quality). And finally, on the output side, a database (Microlab) which is the main archive and output tool for most institutional statistics is operational.

4.2 Short and Medium Term Strategies

Taking all these factors into account, for the short and medium term (until 2008), in the Division of Business Statistics a strategy with a step-by-step development is chosen. Cathedral building is avoided. The described optimum situation can be seen as the guiding light on the horizon. This implies that existing projects, which fit the general strategy, are continued and new projects are initiated in all parts of the production process. This not only concerns the processes and the underlying IT-systems, but certainly also the content. For example, the replacement of (parts of) surveys by administrative data, model-based derivations and estimations, integration of datasets and theme-orientated publications.

Some key projects in the Division of Business Statistics are (CBS/BES, 2005; see Figure 5):

• statistical production:

  - total production process (main goal: more efficiency):
    • further innovation towards one or the least possible number of standardised production lines with generic tools and in such a way that it facilitates more statistics than the current Impect-production line. Main project: Prodonna.
    • the adaptation of the business register to new developments, e.g. the setting up of an external business register (BBR). Main project: ABR.

  - input (main goal: reduction of the survey burden for enterprises):
    • further reduction of the use of primary surveys by optimising the use of administrative sources. Main project: Screening of digital registrations.
    • extending the availability of electronic modes for questionnaires. Main project: E-data reporting. This also relates to the coordination of the collection of enterprise information at the government level (e-government), meaning: an infrastructure, where information about enterprises is collected once, and subsequently shared by all government organisations.
    • better alignment between input and output, resulting in a smaller set of input variables being asked.
    • centralisation of in and outbound contacts with respondents (one contact centre), including a strategy to enforce response.
    • adaptation to XBRL as the accountancy standard.

  - throughput (main goal: efficiency):
    • extension of the use of secondary information, especially the use of tax data for the estimation of all kinds of production variables (e.g. turnover, monthly as well as yearly), starting with the smaller enterprises. Main projects: PS+ and KS in waves.
    • a clear separation of the business rules and the software needed (less dependency on IT experts).
    • very small surveys should be processed by software that can be maintained by statisticians (e.g. SPSS).
    • see total process (standardisation and generic tools). See: Prodonna.

  - output (main goals: better accessibility and use of data, further integration/consistency of datasets and flexibility):
    • building a first version of the central output data warehouse (StatBase), as a first step. This project goes hand in hand with the necessary conversion of an existing older output database (Microlab). The business functionality associated with this older output database will be replaced and improved with the long term aim in mind. This means a central storage, archive and output tool for all business statistics. Main project: ESB-Basis (Enterprise Service Buss).
    • improvement of the quality (fewer updates) and timeliness of the output.
    • better insight into the use of statistics and customer satisfaction.

• information development:

  - the presentation of a set of more consistent economic data through micro-integration, starting with some key variables taken from annual statistics, such as production, trade, investments, R&D and employment, including data from social statistics. This services general output and the NA. More consistency is realised in two ways:
    • because of their impact on the overall figures, the biggest enterprises will already be made consistent in the data collection stage, using primary and secondary information. Main project: Top XXX. The number of enterprises involved will be gradually extended.
    • for the smaller enterprises consistency will be gained mainly by using statistical methodology. This can be at the micro and at the meso-level (direct estimates of the output cells). Main project: ESB-Integration.

  - the setting up of more coherent output themes, such as care, new economy and agriculture. In some cases across divisions.

  - elaboration of the collaboration with other institutions (e.g. knowledge institute).

Figure 5. Main projects for the short and medium term within the Division for Business Statistics.

5 The Extension to a SN-wide Architecture

Not only in the Division of Business Statistics, but also in the Division of Social Statistics a similar redesigning process has been going on for some time. At the SN-wide level, this has recently led to a development away from the now separate business architectures per division towards a more general architecture for SN as a whole. The first proposals for such a SN-wide architecture fit the views described here in sections 3.4 to 3.6, but also include the activities of the NA and related statistics. Existing and new projects in both divisions should gradually converge to this long term view. The core of this view is (Ypma et al., 2005):

• to use administrative sources as much as possible;

• to be able to integrate these sources mutually and to combine them with survey data;

• and, in an efficient way, to produce consistent statistical information, both for individual statistics as well as for general themes.

The proposals for a SN-wide architecture have led to the initiation of five so-called overall core projects:

• data collection. This project is responsible for a SN-wide strategy for data collection and the implementation of the supporting IT-systems. It concerns all the data collection, both for business as well as social statistics. Part of this project deals with e-data reporting. The aim here is to further digitalise the data collection, as much as possible. The scope includes data collection through surveys as well as the support to increase the use of administrative sources. When surveys are conducted, preference is given to the electronic mode, either web based online, web based offline or e-mail.

• infrastructure for data storage and data exchange. This project is responsible for the development of a conceptual model for data storage and data exchange and, in a later phase, for the translation of this model to IT-systems. It relates to the repository, described in sections 3.4 and 3.5 (see Figure 3), including the interfaces to the software used for the processing of the data in the throughput phase.

• generic tools for the processing of data. An important prerequisite here is that these generic tools can be used and changed by statisticians without the intervention of IT experts. They should be rule-based, with the aim to increase flexibility but at the same time to reduce maintenance costs.

• centralised infrastructure for metadata, that can be used in all the steps of the production chain. The starting point is the coordinated description of the variables, classifications and population units as well as the data collection and processing methods used. Recently a conceptual model, the so called “view model”, was presented. The next step is the proof of concept of this model on the basis of some pilots.

• remote access and remote execution to improve the accessibility to statistical data and (web-based) services for external researchers.

These projects are coordinated through one overall, SN-wide, business architecture. This general modernisation programme is supplemented with the current redesign of the NA.

The development towards one overall architecture can, in fact, be seen as a third wave of adaptations of the statistical processes within SN, strongly related to the organizational structure at that moment.

The first wave started in the 1980s, with the computerization of the stovepipe surveys, based on individual solutions and tailor-made applications. These adaptations were strongly driven by the growing possibilities of information technology. The growing impact of the Internet started to play a role. In particular, the growing external pressure for more efficiency in the 1990s led to the second wave of adaptations at the turn of the century, aiming for more standardised processes and standardised tools for all statistics. The use of the Internet had become an integral part of the output of most of the processes. This went together with an organizational shift from a product stovepipe model to a process oriented model, providing the possibility to exploit the economies of scale. However, this was limited by the historic separation of social and business statistics. This led to the current organizational structure, that is a divisional structure separating business statistics, social statistics and the NA, all with their own architectures. The recent proposals for one overall SN-wide business architecture mark a third wave of adaptations, which is to be even more efficient by further standardisation and the exploitation of economies of scale, disregarding the possible differences between social and business statistics. The possibilities of the Internet will be fully used, both on the output side as well as on the input side, including the use of administrative sources.

Seen from a broader perspective, there are certainly many similarities between the production of business and social statistics. However, it is still an open question whether this holds true in all steps of the production chain at a more detailed level. Think, for example, of differences in (the dynamics and characteristics of) the populations, the type of variables, the size of the files, the way datasets are integrated, the handling of non-response, the weighting and the disclosure strategy.

So, making such an overall architecture eventually work will not be a matter of presenting a global vision, but much more of implementation, of migrating the current legacy systems while ensuring “business as usual”, and of the possibilities to change the mindset of the employees. SN is still at the beginning of this third wave of modernisation of the statistical processes. Such efforts are too often hampered by unrealistic planning: wanting too much too quickly. Internationally, too, there are hardly any successful examples from which SN can learn.

Organisational structure

Another crucial question is whether this new way of producing statistics can be successfully implemented within the current divisional structure. A single standardised statistical process for all statistics will eventually require further fine-tuning of the organisational structure.

This may lead to a situation with one input division (“data production factory”) and one output division. A hybrid situation will probably evolve. The structure for collecting and processing the data will be process-oriented. On the output side, however, it is quite conceivable that a situation will develop with different expert groups that are product-oriented or, better, theme-oriented. There will no longer be a specific separation between business and social statistics. These expert groups will be rather small and flexible. New expert groups can easily be set up, depending on the needs of the market place. A first example of such an expert group is the integration of all the statistics on care.

As said earlier, for very small surveys it is not efficient to divide the production process into different steps that are carried out by different organisational units. However, such surveys are still obliged to use the generic (software) tools, the metadata infrastructure and the central data repository as their data storage, archive and output tool.

6 Conclusions

Every organisational design has its advantages and disadvantages. There is no single best solution. Those organisations that can best adapt to the changes in the market place will have the greatest chance to survive (contingency theory; Donaldson, 2001; Mintzberg, 1986). This is the situation in which SN finds itself. Changing circumstances have forced SN to change its way of producing statistics and thereby to redesign its statistical processes. A product stovepipe model is no longer in line with the changing needs of customers, growing competition, the wish to reduce the response burden by optimising the use of secondary information, the availability of new information technologies and statistical methodologies and, above all, the pressure for more efficiency. To survive, SN has to adapt to these new circumstances. It must reposition itself in the market place by adding new value to its output. The bottom line is rather simple: if SN is unable to add new value, it has no reason to exist.

To adapt to these new circumstances, an optimum situation has been sketched with the following characteristics. There is only one standardised production line for all statistics, with a central data repository as its core, separated into a part for data transactions (Baseline) and a part for analysis to support data dissemination and information development (StatBase). The process is output- and metadata-driven. Economies of scale are key. Production is supported by generic and centrally maintained (software) tools. The knowledge required for these tools (insofar as they cannot be bought) is concentrated rather than diffused. A workflow management system manages all data processing, meaning that data can virtually follow different routes through the production chain depending on the source, the characteristics and the situation (e.g. skills and number of staff). A central metadata infrastructure is the cornerstone, with as key elements communication, coordination, coherence, reusability and reproducibility. Data collection leans more and more on secondary information (registers). The required integration of the different data sets takes place as early in the process as possible and the order disconnection point (e.g. micro to macro data) is put as far towards the end as possible.
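
The routing idea described above, in which a workflow manager sends data along different routes through the production chain depending on its source and characteristics, can be illustrated with a minimal sketch. Everything named here (the `Dataset` fields, the `ROUTES` table, the step names) is a hypothetical illustration and not a description of SN's actual systems; the only points taken from the text are that routes depend on source and characteristics, that integration comes early, and that everything ends in the central repository.

```python
from dataclasses import dataclass, field

# Assumed routing table: processing steps per source type.
# Step names are illustrative only.
ROUTES = {
    "register": ["link_to_population_frame", "integrate", "store_in_repository"],
    "survey": ["collect", "link_to_population_frame", "integrate",
               "weight", "store_in_repository"],
}


@dataclass
class Dataset:
    name: str
    source: str                      # "register" (secondary) or "survey" (primary)
    needs_editing: bool = False      # a characteristic that changes the route
    log: list = field(default_factory=list)  # steps actually carried out


def plan_route(dataset):
    """Choose the list of steps from the dataset's own metadata."""
    if dataset.source not in ROUTES:
        raise ValueError(f"no route defined for source {dataset.source!r}")
    route = list(ROUTES[dataset.source])
    if dataset.needs_editing:
        # Insert an editing step just before integration.
        route.insert(route.index("integrate"), "edit")
    return route


def run(dataset):
    """Carry out the planned route; each step is a stub that only logs its name."""
    for step_name in plan_route(dataset):
        dataset.log.append(step_name)
    return dataset


if __name__ == "__main__":
    for ds in (Dataset("vat_register", "register"),
               Dataset("production_survey", "survey", needs_editing=True)):
        print(ds.name, "->", run(ds).log)
```

In this sketch the route is derived entirely from metadata attached to the dataset, so adding a new source type means adding one entry to the table rather than building a new stovepipe.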

The business processes and IT systems are aligned in such a way that the reusability of existing materials, such as data, metadata, software, interfaces, methodology and even people, is encouraged throughout all phases of the production chain. From a technological point of view, the overall strategy is to minimise the number of platforms and applications in order to limit costs and to maximise compatibility among products and services.

The impact of such an optimum situation can be characterised by the notions of improved quality, flexibility, accessibility and coherence, as well as the implementation of optimum data collection techniques; the optimum use of all information, secondary and primary; full possibilities to produce integrated statistics; and highly standardised and sound documentation. Fully implemented and managed, this bears, at least in principle, a huge efficiency potential. It is a key component in achieving many of the strategic goals of SN. It supports the central business processing requirements and it is in line with the general business architecture.

However, changing the processes and systems is not enough. The central management of several aspects of the process should be embedded much more firmly, namely workflow, metadata (coordination and harmonisation) and output/customer needs. The focus in content must be on integration towards theme-oriented datasets, not only to be able to present more consistent data or to improve the quality of the processes, but foremost to produce new statistics, especially if datasets of business and social statistics can be integrated.

The described optimum situation cannot be reached within a few years, since there are still many technical and statistical problems that have to be solved. There is still some way to go. A step-by-step approach is desirable and realistic.

This article, and in particular the examples used, are based on research carried out for the Division of Business Statistics. The challenges mentioned above, however, have dominated the developments in the Division of Social Statistics as well. These developments will, in the long run, converge on the implementation of one overall Business Architecture for SN as a whole: a situation in which the architectures of both statistical divisions and that of the NA are merged. The general opinion is that this should not lead to a transformation with one big bang. An evolution rather than a revolution is advocated. Only in the actual implementation can similarities and differences in concepts, also between social and business statistics, be tested.

Such a transformation process is not without risk. The risks have to be identified and dealt with. A new element in this process is that SN has to realise that it is becoming more of a partner in the different production chains of statistical information than the main contractor. The role of intermediary of statistical information is different from that of the main producer of statistics.

This modernisation process fits the experiences of other international statistical organisations, where similar developments are going on. However, successful examples of the actual implementation of big, general, overall systems in this area are still rare. Large-scale efforts often have to be re-evaluated and downsized. The issue seems to be not so much a strong vision for the long run as the possibilities for implementation, the migration of current legacy systems and the required change in culture and organisation. Therefore, international cooperation is strongly advocated.

Acknowledgements

The authors kindly thank the Referee and the Joint-Editor for their constructive comments.


References and related reading

ABS (2004). The ABS Input Data Warehouse. The Australian Bureau of Statistics (ABS), The Survey Statistician.

Bethlehem, J., Kent, J. & Ypma, W. (1999). On the use of metadata in statistical data processing. UNECE Work session on Statistical Metadata, Geneva, Switzerland.

CBS (2005). Rekenen op statistieken (Counting on statistics), ICT-Masterplan. Internal report, Statistics Netherlands.

CBS/BES (2005). BEET (Business statistics, Efficient, Effective and on Time). Internal report, Statistics Netherlands.

Colledge, M.J. (1999). Statistical integration through metadata management. International Statistical Review, 67, 79–98.

Cook, L. (1999). Managing in a networked statistical system. Statistics New Zealand, Wellington, New Zealand.

Donaldson, L. (2001). The contingency theory of organizations. Sage Publications.

Dunnet, G. & Osborne, G. (2005). A new information model for a national statistical office in the 21st century. Presentation ISI conference 2005, Sydney, Australia.

Froeschl, K.A. (1997). Metadata management in statistical information processing. Wien-New York: Springer.

Gates, W. (1999). Business @ the speed of thought. New York: Warner Books.

Gillman, D.W., Appel, M.V. & LaPlant, W.P. (1996). Design principles for a unified statistical data/metadata system. Proceedings of the 8th International conference on Scientific and Statistical Database Management, pp. 150–155. U.S. Bureau of the Census, Washington, United States.

Gillman, D.W. (2001). Corporate metadata repository (CMR) model. Proceedings of the MetaNet Conference, Voorburg, Netherlands.

Graves, R. & Hutton, T. (2003). The Statistical Town Plan. Paper for the UN-ECE/Eurostat/OECD Meeting on The Management of Statistical Information Systems, Geneva, Switzerland.

Gyorki, I. & Pap, I. Metadata driven statistical data warehouse system at the Hungarian Central Statistical Office. Paper for the UNECE/Eurostat/OECD Work session on Statistical Metadata, Geneva, Switzerland.

Inmon, W.H. (2002). Building the data warehouse (3rd edition). Wiley.

Johanis, P. & Bellerose, P. (2004). Use of standardized metadata to find, select and access statistical data. Paper UN-ECE/Eurostat/OECD Work session on Statistical Metadata, Geneva, Switzerland.

Joshua, D. & Johnson, T. (2005). Meeting the Development Challenges at ONS: A case study. Paper for the UN-ECE/Eurostat/OECD Meeting on the Management of Statistical Information Systems, Bratislava, Slovakia.

Kimball, R. (1998). The data warehouse lifecycle toolkit. Wiley.

Klep, J. (1999). Future challenges in the management of statistical metadata. Paper for the UNECE Work session on Statistical Metadata, Geneva, Switzerland.

Mintzberg, H. (1986). Structure in fives; designing effective organisations. Prentice Hall.

Nordbotten, S. (1967). Purposes, problems and ideas related to statistical file systems. Proceedings from the 36th session of the ISI, Sydney, Australia.

Pedersen, L. & Jespersen, N. (2003). Cheaper, faster, better: what else is new? Re-engineering the statistical production in digital Denmark. Paper for the UN-ECE/Eurostat/OECD Meeting on the Management of Statistical Information Systems, Geneva, Switzerland.

Renssen, R.H., Kroese, A.H. & Willeboordse, A.J. (2001). Aligning Estimates by Repeated Weighting. Report, Statistics Netherlands.

Samuelson, L. & Thygesen, L. (2004). Building OECD’s new statistical information system. Paper for the UN-ECE/Eurostat/OECD Meeting on the Management of Statistical Information Systems, Geneva, Switzerland.

Sundgren, B. (1999). An information systems architecture for national and international statistical organisations. Paper for the Seminar on Management of Statistical Information Technology, Geneva, Switzerland.

Sundgren, B. (2003). Developing and implementing statistical metadata systems. Metanet, Epros-project nr. IST-1999-29093.

Vosselman, W. & Willeboordse, A.J. (1997). Breaking down the walls between business statistics. Report, Statistics Netherlands.

Willeboordse, A.J. (2000). Towards a New Statistics Netherlands: blueprint for a process oriented organisation structure. Internal paper, Statistics Netherlands.

Willenborg, L. & Heerschap, N. (2002). Plans for an ESB, the Enterprise Service Buss. Internal report, Statistics Netherlands.

Ypma, W., Huigen, R., Kazemier, B., Renssen, R. & Van Velzen, J., et al. (2005). Doorkijk statistisch proces van de toekomst (View on the statistical process of the future). Internal report, Statistics Netherlands.

Zeila, K. (2004). Metadata driven integrated statistical data management system. Paper for the UNECE/Eurostat/OECD Meeting on the Management of Statistical Information Systems, Geneva, Switzerland.

Résumé

Under pressure from a rapidly changing environment, Statistics Netherlands (SN) is compelled to revise the process by which its statistics are produced. The key developments are: new demands from data users, growing competition, the requirement to reduce the burden that statistical surveys place on enterprises, the introduction of new techniques and methods and, above all, the need for greater efficiency because of budget cuts.

This contribution shows how SN, and in particular its business statistics, can adapt to the new circumstances. The optimum situation as we envisage it centres on a single production line for all business statistics and a centralised storage system. This single production line is based on generic standards for software, metadata and workflow management.

It is clear, however, that such an optimum situation cannot be realised within a few years; one may even wonder whether it can be realised at all. We see it rather as a direction, a point on the horizon. We therefore emphasise the first steps of the transformation that must lead us from the traditional statistical process, based on a specific production line for each product, to the generic and integrated process of the future.

A similar redesign is under way for social statistics. Soon the systems of social and business statistics will have to communicate at pivotal points, and they will eventually converge on a generalised infrastructure. The debate on such a generalised infrastructure for SN has already begun.

[Received August 2005, accepted March 2006]