CSU Data Stewardship Committee Kickoff Meeting April 4, 2003

What is Data Stewardship?

• “Data stewardship is the process of managing information necessary to support program and financial managers, and assuring data captured and reported is accurate, accessible, timely, and useable for decision-making and activity monitoring.”

– U.S. Department of the Interior

What is Data Stewardship? (cont’d)

• “Data Stewardship has, as its main objective, the management of the corporation's data assets in order to improve their reusability, accessibility, and quality. It is the Data Stewards' responsibility to approve business naming standards, develop consistent data definitions, determine data aliases, develop standard calculations and derivations, document the business rules of the corporation, monitor the quality of the data in the data warehouse, define security requirements, and so forth….”

– Claudia Imhoff, Ph.D., President, Intelligent Solutions, Inc.

What is Data Stewardship? (cont’d)

• “Stewardship programs focus on improving data quality, reducing data duplication, formalizing accountability for data, and improving business and IT productivity. An effective Data Stewardship program will rapidly improve the ROI from data warehousing and business intelligence efforts, application integration efforts, ERP, CRM, content and knowledge management, and EAI efforts.”

– Robert Seiner, Publisher, The Data Administration Newsletter

Why do we need data stewardship?

• Consider the costs of poor data quality
– Incorrect enrolled student counts
– Incorrect flexibly scheduled course section counts
– Incorrect alumni data

• Why reinvent the wheel – and differently every time, at that!
– Reports that claim to show the same information, but with different results
– Leads to decisions based on information that is incorrect or improperly understood

Data Stewardship Committee charge

• The charge of the Data Stewardship Committee (DSC) is to define, validate, organize and protect data assets, thus enabling areas throughout the University to make decisions based upon high-quality, easily usable information

Creating our common vision

• What products should we develop?

– Data marts

– Data quality metrics

– Metadata repository/data dictionary

– Other?

Creating our common vision (cont’d)

• What services should we provide?

– Change control

– Other?

What is a data mart?

• “…the restriction of the data warehouse to a single business process or to a group of related business processes targeted toward a particular business group.”

– Ralph Kimball, Ph.D., CEO, Ralph Kimball Associates

• “A data mart is a subject-specific collection of organizational data which can be used for analytical purposes relating to specific business questions or functions. A data mart contains only that data which is needed to respond to the specified business questions.”

– David Fuller

What is a data mart? (cont’d)

• Data marts are usually derived by taking many tables and “flattening” them into a few tables

• Data marts are easier to query and report from than the many source tables they are derived from
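To make “flattening” concrete, here is a minimal sketch using Python's standard-library sqlite3 module. The source tables (student, term, enrollment), their columns, the sample rows, and the mart name enrollment_mart are all hypothetical stand-ins, not CSU's actual schema; the point is only that the joins are done once, when the mart is built, so report queries read a single table.

```python
import sqlite3

# In-memory database holding three hypothetical, normalized source tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student    (emplid TEXT PRIMARY KEY, name TEXT);
CREATE TABLE term       (term_code TEXT PRIMARY KEY, term_descr TEXT);
CREATE TABLE enrollment (emplid TEXT, term_code TEXT, class_nbr INTEGER, units REAL);

INSERT INTO student VALUES ('001', 'Jane Doe'), ('002', 'John Roe');
INSERT INTO term VALUES ('SP03', 'Spring 2003');
INSERT INTO enrollment VALUES
    ('001', 'SP03', 101, 3), ('001', 'SP03', 102, 4), ('002', 'SP03', 101, 3);
""")

# Build the mart once: one row per enrollment, with the student name and
# term description copied in, so report writers never repeat these joins.
conn.executescript("""
CREATE TABLE enrollment_mart AS
SELECT e.term_code, t.term_descr, e.emplid, s.name, e.class_nbr, e.units
FROM enrollment AS e
JOIN student AS s ON s.emplid = e.emplid
JOIN term    AS t ON t.term_code = e.term_code;
""")

# A typical report is now a single-table GROUP BY with no joins.
rows = conn.execute("""
SELECT term_descr, COUNT(DISTINCT emplid) AS students, SUM(units) AS total_units
FROM enrollment_mart
GROUP BY term_descr
""").fetchall()
print(rows)  # [('Spring 2003', 2, 10.0)]
```

Because the mart copies descriptive columns into every row, it trades storage and refresh effort for simpler queries; in this sketch it would have to be rebuilt whenever the source tables change.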

What are data quality metrics?

• “…there is no meaningful concept of data quality in the real world; it is only as a by-product of the deficiencies of abstracting and representing reality that data quality arises as an issue at all.”
– Matt Duckham, Dept. of Computer Science, University of Keele, UK

• Metrics are ways to measure data quality
– How many values in a column are valid (internal consistency)?
  • What state is ‘ZZ’?
– How many values across columns are consistent (external consistency)?
  • Why does ‘Ms.’ Jane Doe have a gender of ‘Male’?

• Metrics show data quality improving or worsening over time

• We are developing a data quality architecture (DQA) for use with a variety of data sources
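The two kinds of checks above can each be expressed as a simple rate. The sketch below is an illustration under assumptions: the record layout, the abbreviated list of valid state codes, and the salutation-to-gender mapping are all made up, and a real DQA would presumably draw its reference values from steward-maintained tables rather than hard-coded sets.

```python
from dataclasses import dataclass

# Illustrative, deliberately incomplete reference list of valid state codes.
VALID_STATES = {"CA", "NV", "OR", "AZ", "WA"}

# Hypothetical mapping from salutation to the gender value it implies.
SALUTATION_GENDER = {"Mr.": "Male", "Ms.": "Female", "Mrs.": "Female"}


@dataclass
class Person:
    name: str
    salutation: str
    gender: str
    state: str


def validity_rate(people):
    """Internal consistency: share of rows whose state code is in the reference list."""
    valid = sum(1 for p in people if p.state in VALID_STATES)
    return valid / len(people)


def consistency_rate(people):
    """External consistency: share of rows whose salutation agrees with gender.
    A salutation that implies no gender (e.g. 'Dr.') is treated as consistent."""
    ok = sum(1 for p in people
             if SALUTATION_GENDER.get(p.salutation, p.gender) == p.gender)
    return ok / len(people)


people = [
    Person("Jane Doe", "Ms.", "Male", "ZZ"),     # fails both checks
    Person("John Roe", "Mr.", "Male", "CA"),
    Person("Ann Lee", "Dr.", "Female", "NV"),
    Person("Mary Poe", "Mrs.", "Female", "OR"),
]

print(f"state validity: {validity_rate(people):.0%}")                     # 75%
print(f"salutation/gender consistency: {consistency_rate(people):.0%}")  # 75%
```

Running the same functions against each periodic extract and storing the resulting rates is one way to get the improving-or-worsening trend described on the slide.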

What is a metadata repository?

• “Meta data is all physical data and knowledge-containing information about the business and technical processes, and data, used by a corporation…. While meta data repositories perform all of the functions of a data dictionary, their scope is far greater.”

– David Marco, President, Enterprise Warehousing Solutions

• What features might a metadata repository have?

What is a metadata repository? (cont’d)

• Definitions of columns and tables.

• The ability to determine which tables contain a given column, or a column with a given description – e.g., which tables contain “Academic Sub-Plan”.

• The ability to “query the queries” – e.g., find all existing queries with “IPEDS” in their description.
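A minimal sketch of the first two lookups, assuming the repository stores one row per table/column with its description and one row per saved query with its description. The repository tables meta_column and meta_query and every sample row are hypothetical; only ACAD_PROG appears in the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical repository tables; names are illustrative, not a product schema.
CREATE TABLE meta_column (table_name TEXT, column_name TEXT, description TEXT);
CREATE TABLE meta_query  (query_name TEXT, description TEXT);

INSERT INTO meta_column VALUES
    ('ACAD_PROG',    'ACAD_PROG',     'Academic Program'),
    ('ACAD_PLAN',    'ACAD_PLAN',     'Academic Plan'),
    ('ACAD_SUBPLAN', 'ACAD_SUB_PLAN', 'Academic Sub-Plan'),
    ('STUDENT_TERM', 'ACAD_SUB_PLAN', 'Academic Sub-Plan');
INSERT INTO meta_query VALUES
    ('Q_FALL_ENRL', 'IPEDS fall enrollment extract'),
    ('Q_GRAD_RATE', 'IPEDS graduation rate cohort'),
    ('Q_ALUM_LIST', 'Alumni mailing list');
""")


def tables_with_column(term):
    """Tables containing a column whose name equals term or whose description contains it.
    Note: LIKE is case-insensitive for ASCII and treats '%' and '_' in term as wildcards."""
    rows = conn.execute(
        "SELECT DISTINCT table_name FROM meta_column "
        "WHERE column_name = ? OR description LIKE ? ORDER BY table_name",
        (term, f"%{term}%"))
    return [r[0] for r in rows]


def queries_mentioning(keyword):
    """'Query the queries': saved queries whose description contains keyword."""
    rows = conn.execute(
        "SELECT query_name FROM meta_query WHERE description LIKE ? ORDER BY query_name",
        (f"%{keyword}%",))
    return [r[0] for r in rows]


print(tables_with_column("Academic Sub-Plan"))  # ['ACAD_SUBPLAN', 'STUDENT_TERM']
print(queries_mentioning("IPEDS"))               # ['Q_FALL_ENRL', 'Q_GRAD_RATE']
```

As the following slides note, the repository can only capture such descriptions; the text in the description column would still have to come from subject matter experts.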

What is a metadata repository? (cont’d)

• The ability to determine which queries reference a given column and/or specific values of that column – e.g., which queries use employee status “L” as one of their criteria?

• The ability to determine the path (menu group > panel group > panel) to follow to reach a particular panel – e.g., how do I get to the “Application Data” panel?

• The ability to determine which columns from which tables appear on a given panel, and vice versa – e.g., from which table and column is “Program Action” populated in panel X, and conversely, which panels, in addition to panel X, are populated by ACAD_PROG?
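The reverse lookups on this slide amount to indexing query criteria and panel fields by table and column. This sketch assumes that information has already been harvested into two in-memory lists; every query, panel, table, and column name is hypothetical except ACAD_PROG, and the menu-path lookup is left out because it would depend on the navigation structure of the actual system.

```python
from collections import namedtuple

# Hypothetical harvested metadata: one entry per query criterion and per panel field.
Criterion = namedtuple("Criterion", "query table column value")
PanelField = namedtuple("PanelField", "panel label table column")

CRITERIA = [
    Criterion("Q_LEAVE_LIST", "JOB", "EMPL_STATUS", "L"),
    Criterion("Q_ACTIVE_HC", "JOB", "EMPL_STATUS", "A"),
    Criterion("Q_LEAVE_FTE", "JOB", "EMPL_STATUS", "L"),
    Criterion("Q_PROG_ACT", "ACAD_PROG", "PROG_ACTION", "DISC"),
]

PANEL_FIELDS = [
    PanelField("PANEL_X", "Program Action", "ACAD_PROG", "PROG_ACTION"),
    PanelField("PANEL_X", "Effective Date", "ACAD_PROG", "EFFDT"),
    PanelField("PANEL_Y", "Academic Program", "ACAD_PROG", "ACAD_PROG"),
    PanelField("PANEL_Z", "Application Date", "ADM_APPL", "APPL_DT"),
]


def queries_using(table, column, value=None):
    """Queries with a criterion on table.column, optionally restricted to one value."""
    return sorted({c.query for c in CRITERIA
                   if c.table == table and c.column == column
                   and (value is None or c.value == value)})


def field_source(panel, label):
    """(table, column) that populates the labeled field on a panel, or None."""
    for f in PANEL_FIELDS:
        if f.panel == panel and f.label == label:
            return f.table, f.column
    return None


def panels_using_table(table):
    """All panels that display at least one column from the given table."""
    return sorted({f.panel for f in PANEL_FIELDS if f.table == table})


print(queries_using("JOB", "EMPL_STATUS", "L"))   # ['Q_LEAVE_FTE', 'Q_LEAVE_LIST']
print(field_source("PANEL_X", "Program Action"))  # ('ACAD_PROG', 'PROG_ACTION')
print(panels_using_table("ACAD_PROG"))            # ['PANEL_X', 'PANEL_Y']
```

The same two lists also support impact analysis for change control: before a new value is added to a column, queries_using on that column lists the queries whose owners may need to be told.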

What is a metadata repository? (cont’d)

• A metadata repository can capture data definitions, not create them

• Definitions more detailed than those already stored somewhere must be provided by subject matter experts (SMEs)

• The metadata repository can provide a framework for the systematic capture and publication of this metadata

What is change control?

• In this context, it means controlling certain changes to the data
– Adding new values to critical columns
– Any other changes that can impact reporting

• These changes should be brought to the attention of this committee before they are made
– Data users can assess and discuss the impact
– The changes can be published before they are made
– Users can modify reports as needed before they risk publishing incorrect information

• But how do we define critical data?

Identifying Critical Data

• Data stewardship over all CSU data is not cost-effective

• Identifying the critical data over which we will maintain stewardship is the prerequisite to
– developing data marts
– implementing a data quality architecture
– developing a metadata repository/data dictionary
– instituting change control over our data

University Data

• “University data are institutional assets and are held by the university to support its fundamental instructional, research, and public service missions.”

– Arizona State University

University Data (cont’d)

• “UNIVERSITY INFORMATION -- A data element is considered UNIVERSITY INFORMATION if it provides support to and meets the needs of units of the University. Examples of UNIVERSITY INFORMATION include, but are not limited to, many of the elements supporting financial management, student curricula, payroll, personnel management, and capital equipment inventory. Data may be considered UNIVERSITY INFORMATION if it satisfies one or more of the following criteria:
A. It is used for planning, managing, reporting, or auditing a major administrative function;
B. It is referenced or used by an organizational unit to conduct University business;
C. It is included in an official University administrative report;
D. It is used to derive an element that meets the criteria above.

…Data that may be managed locally may yet have significant impact if it is used in a manner that can impact University operations….”

– Georgia State University

University Data (cont’d)

• No one owns University data but the University

• We may be data stewards, custodians, users, producers, etc., but we are not owners of the data

Tasks

• Identifying University Data
– Which columns in which tables (or which fields on which panels) do we need to
  • Define?
  • Use in reporting / analysis / decision-making?
  • Quality assure?
  • Exercise change control over?

• What are our relative priorities?
– Data marts
– Data quality
– Data dictionary
– Other products?

Future possibilities

• Statistical analysis of data

• Data mining
– The “diapers and beer” discovery
– Registration patterns
– Student attrition patterns

• Prerequisites
– Better data structure
– Better data quality
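The “diapers and beer” story is the classic market-basket example of association analysis: find items that tend to appear together, then measure how often a rule such as “diapers implies beer” holds. The sketch below counts item pairs and computes support and confidence over a handful of made-up baskets; the same counting would apply to, for example, the set of courses each student registered for in a term. It is a brute-force illustration, not a scalable algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Made-up baskets; each could instead be one student's registered courses.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "wipes"},
    {"beer", "chips"},
    {"diapers", "beer", "wipes"},
]


def count(itemset):
    """Number of baskets containing every item in itemset."""
    return sum(1 for b in baskets if itemset <= b)


def support(itemset):
    """Fraction of all baskets containing every item in itemset."""
    return count(itemset) / len(baskets)


def confidence(antecedent, consequent):
    """Share of baskets with the antecedent that also contain the consequent."""
    return count(antecedent | consequent) / count(antecedent)


# Count every item pair (sorted so each pair has one canonical order).
pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))

print(pair_counts.most_common(1))            # [(('beer', 'diapers'), 3)]
print(support({"diapers", "beer"}))          # 0.6
print(confidence({"diapers"}, {"beer"}))     # 0.75
```

Rules found this way only describe co-occurrence in the data; as the slide's prerequisites suggest, their usefulness depends on the structure and quality of the underlying records.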

Thank You!