© 2005 Your name here© 2005 Open Commons License
Case Study: Integrating K-12 Education into the National Information Exchange Model
Dan McCreary
Dan McCreary & Associates
Background
Dan McCreary - Dan McCreary & Associates
President of a consulting firm that focuses on metadata-driven IT strategy and infrastructure development for:
– Service Oriented Architectures (SOA)
– Model Driven Architecture and Development (MDA, MDD)
– Data warehousing and Business Intelligence (BI)
– Metadata management training
Hired in January of 2005 to build and populate an enterprise-wide metadata registry for the Minnesota Department of Education, in partnership with the Wisconsin Department of Public Instruction and the Michigan Department of Education
Presentation Web site: http://www.danmccreary.com/presentations/semweb2006
Agenda
Case study of building a “semantic garden” for K-12 metadata with a modest budget for a state agency (~$150K)
A place where your metadata can take root, grow and bloom
Target a broad audience with goal of concept retention – use of images and metaphors
Overview of Presentation
1970 Sci-Fi Classic: “The Forbin Project”
A New Intersystem Language!
Lesson: Before you take over the world you must exchange semantically precise metadata!
Big Hairy Audacious Goals: Search Agents
Legislator: What statewide programs increase test scores?
District Superintendent: What “subgroups” in my district need the most help in math to meet NCLB guidelines?
School Principal: What areas do new teachers need help in?
Teacher: What areas do my students need the most help to pass statewide assessments?
“Shopping” for Metadata
Your “shopping cart” is full of Data Elements
Key Business Drivers
Emphasis on “data driven decision making”
Need for longitudinal data analysis (i.e. a data warehouse) driven by the No Child Left Behind (NCLB) Act
Required consistency across:
– Time
– School districts
– Grade-levels (K-12)
– Assessment-subjects (reading, writing, math)
Need for cost-effective application interoperability and the desire to “break down application silos”
Technology Drivers
Desire to promote Service Oriented Architectures (SOA)
– Web services
– Build a library of exchange documents
– Consistent web-form definitions
Desire to promote Model-driven Architecture (MDA)
– Model driven development (MDD)
– Model driven testing (MDT)
Migration from “procedural” to “declarative” programming
– Procedural programming is over-emphasized and makes business logic maintainable only by programmers
– Declarative programming and transformation is much more appropriate when large metadata databases are available
– Metadata driven systems allow more non-programmers to maintain business logic
Avoid invention of new standards
– Desire to “build upon” other machine-readable standards
– ISO metadata registries do exist
Promotion of Loosely Coupled Systems
Tightly Coupled
– Like a wine glass
– Fragile
– Breaks easily when there are changes in either the source or destination system
Loosely Coupled
– Like a rubber ball
– Resilient
– Allows change and interoperability regression testing without breaking interfaces
– Example: the addition of new data elements
NCLB
US Department of Education effort to measure student “proficiency” deltas for nine subgroup populations (Asian, Black, Hispanic, Native American, Special Ed etc.) within each state over time and measure incremental gains in achievement levels
Introduced concept of Adequate Yearly Progress (AYP) for a School and School District (if any sub-group fails, your school and district fail)
Each state defines “proficiency” independently, so state-to-state comparisons are not practical at this time
Multiple political interpretations of NCLB not discussed here:
– Republican vs. Democratic
– Rural vs. Suburban vs. Inner City
– Public vs. Private Educational Funding
US Dept of Ed. releasing $53 million in grants for longitudinal data systems to individual states
Message from the Department of Education: “Build your statewide assessment metadata garden”
US Department of Justice/Department of Homeland Security initiative to build a federal metadata registry based on Global Justice XML Data Model (GJXDM) project
Complies with federal ISO/IEC 11179 metadata registry guidelines (with a few exceptions)
Introduced very successful tools for subschema generation in conjunction with large ontologies in building XML exchange documents
Introduced concepts of “Universal” and “Core” classification schemes
Available today in an XML Schema and an Excel spreadsheet
Subschema generation tools may be available in 3Q of 2006
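NIEM Scope
[Figure: NIEM scope diagram with a “You Are Here” marker. Source: http://www.niem.gov/implementation.php]
NIEM Type “Classification Scheme”
[Figure: types grouped into three tiers: Universal, Common and Domain Specific. Domain-specific types shown: Student, Teacher. Other types shown: Activity, Address, Aircraft, Assessment, Boat, Case, Clothing, Contact, Document, Event, Image, Long/Lat, Location, Organization, Person, Residence, Street, Vehicle]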
High Level Structure of the NIEM
The NIEM loosely follows ISO-11179 metadata registry guidelines
The structure is a subclass hierarchy of “Concepts”
Start with an abstract Thing
Start with shared upper-ontology “Concepts” (blue)
Add properties that each have Representation Terms (orange)
Add subclasses and then subclass properties (yellow)
[Diagram: Thing at the root. ConceptType subclasses: Activity, Document, Person, Organization. PropertyType examples: ActivityStartDate, ActivityEndDate, PersonBirthDate. Education Extensions: Student, Teacher and Enrollment, with properties StudentStateAssignedID and EnrollmentStateDate]
Reuse and Extension Strategy
Match: If a NIEM data element met our needs, we used the NIEM data element and created an OWL sameAs statement with a high-precision match (Note: The definitions must match exactly)
Trim: If a NIEM data element had more detail than we needed, we created a local definition and a sameAs link with a lower precision match level.
Extend: If the NIEM didn’t have everything we needed, we created a local definition, added to the definition and created a sameAs link with a medium match level.
New: If there was no data element that matched what we needed, we created a new one and put it in our local namespace.
Submit: If this is not a state-specific data element and we think other states may use it, we can submit it to the NIEM for inclusion.
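The reuse rules above can be sketched as a small decision function. This is only an illustration, not the project's actual tooling: the names `Strategy`, `choose_strategy`, `should_submit` and the boolean inputs are hypothetical, and the precision labels simply restate the slide.

```python
from enum import Enum


class Strategy(Enum):
    MATCH = "match"    # reuse the NIEM element as-is
    TRIM = "trim"      # local definition with less detail than the NIEM element
    EXTEND = "extend"  # local definition that adds to the NIEM element
    NEW = "new"        # new element in the local namespace


# Precision recorded on the sameAs link for each strategy (None means no link).
SAME_AS_PRECISION = {
    Strategy.MATCH: "high",
    Strategy.TRIM: "lower",
    Strategy.EXTEND: "medium",
    Strategy.NEW: None,
}


def choose_strategy(niem_has_element: bool,
                    definitions_match_exactly: bool = False,
                    niem_has_more_detail: bool = False) -> Strategy:
    """Pick a reuse strategy for one candidate data element."""
    if not niem_has_element:
        return Strategy.NEW
    if definitions_match_exactly:
        return Strategy.MATCH
    if niem_has_more_detail:
        return Strategy.TRIM
    return Strategy.EXTEND


def should_submit(strategy: Strategy, state_specific: bool,
                  useful_to_other_states: bool) -> bool:
    """Assumption: 'Submit' is read here as a follow-up to 'New' only."""
    return (strategy is Strategy.NEW
            and not state_specific
            and useful_to_other_states)


print(choose_strategy(True, definitions_match_exactly=True))
print(SAME_AS_PRECISION[choose_strategy(True)])
```

Keeping the precision level in a separate table makes it easy to store it alongside each sameAs link in the registry.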
A Semantic Equivalence Registry
Goal: create semantic maps to a single federal metadata standard, not many standards
[Diagram, left: Minnesota's metadata registry mapped directly to N other metadata registries (R2, R3, ... RN): the O(N²) problem]
[Diagram, right: Minnesota's metadata registry mapped to the NIEM, which connects registries R2, R3, ... RN: the O(N) problem]
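A quick count shows why the hub approach scales better. The sketch below assumes the worst case in which every registry needs a mapping to every other registry, and counts each pair once; the function names are made up for this illustration.

```python
def pairwise_mappings(n_registries: int) -> int:
    """Mappings needed when every registry maps directly to every other one."""
    return n_registries * (n_registries - 1) // 2


def hub_mappings(n_registries: int) -> int:
    """Mappings needed when every registry maps only to one shared hub (e.g. the NIEM)."""
    return n_registries


for n in (5, 10, 50):
    print(f"{n} registries: {pairwise_mappings(n)} pairwise vs {hub_mappings(n)} via a hub")
```

With 50 registries the pairwise approach needs 1225 mappings, while the hub approach needs 50.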
ISO/IEC 11179 XML Tag Name
A standard naming convention, used by most state and federal agencies that follow the ISO guidelines, for all XML data elements that “cross the wire”
Frequently called the “Data Element ISO name”
niem:PersonBirthDate
– Namespace (domain): niem
– Object Class Term (leftmost): Person
– Property Term (follows object class term): Birth
– Representation Term (rightmost): Date
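The naming convention can be illustrated with a small parser. This is a simplified sketch: `IsoName` and `parse_iso_name` are hypothetical names, and the code assumes the object class term is a single leading word, which does not hold for every real data element.

```python
import re
from typing import NamedTuple


class IsoName(NamedTuple):
    namespace: str
    object_class: str
    property_term: str
    representation_term: str


# Splits UpperCamelCase into words and keeps acronyms such as "ID" together.
_WORD = re.compile(r"[A-Z][a-z0-9]+|[A-Z]+(?![a-z])")


def parse_iso_name(qname: str) -> IsoName:
    """Split 'prefix:ObjectPropertyRepresentation' into its four parts."""
    namespace, _, local = qname.partition(":")
    if not local:
        raise ValueError(f"expected 'prefix:Name', got {qname!r}")
    words = _WORD.findall(local)
    if len(words) < 3:
        raise ValueError(f"{local!r} needs object class, property and representation terms")
    # Assumption: first word = object class, last word = representation term,
    # everything in between = property term.
    return IsoName(namespace, words[0], "".join(words[1:-1]), words[-1])


print(parse_iso_name("niem:PersonBirthDate"))
```

For `niem:PersonBirthDate` this yields the namespace `niem`, object class `Person`, property term `Birth` and representation term `Date`.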
The Data Mapping: The “Frontline” of Semantics
Left: A sample School District “flat file dump” from the Learning Management System (e.g. Moodle) of one school district (many data elements omitted for clarity)
Right: A mapping to an ISO-named and defined statewide XML schema standard for online learning classes. Note how much easier the names and definitions make it to tell the semantics of each data element.
Screen shot from Altova MapForce™
Need a “Semantically Aware” Mapper
Mapping tools have “auto connect matching children” but they require that the data element names be identical
They do not yet have the ability to “look up synonyms” in a metadata registry to establish the equivalence of two data elements
We need semantic-aware tools!
Goal: Add menu item for “Consult Semantic Broker”
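A “semantically aware” auto-connect could look roughly like the sketch below. It is not a feature of any existing mapping tool: the function `auto_connect`, the flat-file column names and the synonym table are all hypothetical, with the synonym table standing in for a lookup against a metadata registry.

```python
def auto_connect(source_fields, target_fields, synonyms):
    """Map source fields to target fields: exact name matches first, then registry synonyms."""
    targets = set(target_fields)
    mapping = {}
    for field in source_fields:
        if field in targets:
            mapping[field] = field  # what today's tools already do
            continue
        for candidate in synonyms.get(field, ()):
            if candidate in targets:  # the proposed "consult semantic broker" step
                mapping[field] = candidate
                break
    return mapping


# Hypothetical flat-file columns and registry synonyms.
source = ["dob", "lname", "PersonGivenName", "homeroom"]
target = ["PersonBirthDate", "PersonSurName", "PersonGivenName"]
registry_synonyms = {"dob": ["PersonBirthDate"], "lname": ["PersonSurName"]}

print(auto_connect(source, target, registry_synonyms))
```

Fields with no exact match and no registered synonym (`homeroom` here) stay unmapped, so a person still reviews them.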
Constrain Exchange Document Data Element Selection
When creating an exchange document, we can now quickly select data elements from a list derived from a metadata registry that has semantically-precise definitions and namespaces
This can be done by business analysts (B.A.s) with under a week of training and does not require programmers
Constraints can be added to this document or a second constraint schema
Schema creation using Altova XMLSpy™ and importing a GJXDM subschema
Hypertext Links and Data Element Links
The Hypertext Web
The current web is focused on linking published documents with HTML
The Semantic Web
The semantic web is about linking data elements in published metadata registries
[Diagram: data element links between Metadata Registry A and Metadata Registry B]
Challenges: Education Standards
Lack of machine-readable metadata registries for K-12 metadata with synonyms
Many standards
• Minnesota historical 80-column fixed-width punch-card driven file format standards
• US Dept of Ed. National Center for Education Statistics (NCES)
• Common Core of Data (CCD)
• Education Data Exchange Network (EDEN)
• SchoolMatters
• Schools Interoperability Framework (SIF)
• eXtensible Business Reporting Language (XBRL)
No published synonyms in any of the above standards
As of December 2005, no K-12 education-specific data elements in the NIEM metadata registry
Lack of useful data element definitions:
– Document: “Details about inherent and frequently used characteristics of a document.”
Metadata Publishing Standards
Lack of a single standard to publish metadata elements (XML Schema, Topic Maps, ISO/IEC-11179, OWL, XMDR) that includes metadata registry concepts
OWL is one of the few standards with “synonym” statements, but few tools currently support OWL and inter-metadata registry synonym statements
OWL appears to be the best candidate for “over the wire” representations and the most easily extensible but it is not a metadata registry standard
Challenge: We Need Semantic Aware Tools
Lack of semantically-precise production tools
– Altova XMLSpy™ – excellent graphical schema design and management but no semantics in the XML schema standards
– Stanford Medical Informatics Protégé (Open-Source)
– Altova SemanticWorks™ (1st release in October of 2005)
ISO/IEC 11179 metadata registry tools are expensive
– Frequently above $100K before customization
– Some lack workflow and public/private publishing
– Several excellent solutions if you have >$1M budget and consulting dollars
Ideal: A zero-footprint, AJAX-based, drag-and-drop, semantically-aware Open-Source schema design and data mapping tool that consults one or more synonym registries
Predict this is 3-4 years away (unless I get a grant)
Tools Used
Built initial version using a collection of Open-Source tools and inexpensive Altova tools (XMLSpy™, MapForce™ and SemanticWorks™)
Model-driven development using XML Schemas for the model of the registry
– Define XML Schemas for all metadata registry structures (meta-metadata)
– XSL transforms of the data dictionary schema
– XSL transforms of the XSL transforms for impact analysis
XML Transforms for metadata publishing and visualizations
Apache Ant build scripts to publish to public web site and private intranet site
Eclipse 3.1 IDE to build and maintain Ant scripts
Saxon 8 XSLT Java libraries
Extensive use of XSLT 2.0 and XPath 2.0
FreeMind open source mind mapping tool with excellent XML interfaces
Various data element editing forms (Castor, Struts, JSP, ASP, MS-Access)
Diagram From ISO-11179 Specification
[Diagram: a Data Element Concept is made up of an Object Class and a Property; a Data Element is made up of an Object Class, a Property and a Representation. The associations are labeled with (1:1) and (1:N) cardinalities]
Taken from Figure 1, "Fundamental Model for Data Elements", ISO/IEC 11179-1:2004(E), page 11 (non-normative)
UML Model for RDF
[Diagram: an RDF Statement has a Subject (a Resource), a Predicate (a Property) and an Object. In a ResourceValuedStatement the Object is a Resource; in a LiteralValuedStatement the Object is a Literal, which may be a TypedLiteral]
See Lee W. Lacy: OWL: Representing Information Using the Web Ontology Language, p. 82
UML Model of Metadata Registry
A Data Dictionary is composed of many Data Elements
All Data Elements must have required names and ISO definitions
Each Data Element must be either a Concept or a Property of a Data Element Concept
Each property is associated with a single concept and has a Property Name and a Representation Term
Some properties (where the representation term is of type Code) have one or more Enumerated Values
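[Diagram (simplified for clarity): a Data Dictionary contains Data Elements, each with a Name and an ISO-Definition. Data Element Concept and Property are each a subClassOf Data Element. A Property has a Property Name and a Representation Term. An Enumerated Value has a Code and a Definition]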
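The registry model described on this slide can be sketched with a few data classes. This is an illustration only: the class and field names follow the slide, but the validation rules are a simplified reading of it, and the sample definition text is invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EnumeratedValue:
    code: str
    definition: str


@dataclass
class DataElement:
    name: str
    iso_definition: str

    def __post_init__(self):
        # All data elements must have a name and an ISO definition.
        if not self.name or not self.iso_definition:
            raise ValueError("every data element needs a name and an ISO definition")


@dataclass
class Concept(DataElement):
    """A Data Element Concept, such as Person or Student."""


@dataclass
class Property(DataElement):
    concept: Concept  # each property belongs to exactly one concept
    property_name: str
    representation_term: str
    enumerated_values: List[EnumeratedValue] = field(default_factory=list)

    def __post_init__(self):
        super().__post_init__()
        # Assumption: only properties with the Code representation term carry enumerations.
        if self.enumerated_values and self.representation_term != "Code":
            raise ValueError("only Code properties may have enumerated values")


@dataclass
class DataDictionary:
    elements: List[DataElement] = field(default_factory=list)


# Hypothetical sample content.
person = Concept("Person", "A human being.")
birth_date = Property("PersonBirthDate", "The date a person was born.",
                      concept=person, property_name="Birth", representation_term="Date")
dictionary = DataDictionary([person, birth_date])
print(len(dictionary.elements))
```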
Representation Terms (ebXML Core Component Tech Spec v1.9)
1. Amount – Monetary value with units of currency.
2. BinaryObject – Set of finite-length sequences of binary octets. (secondary: Graphic, Picture, Sound, Video)
3. Code – Character string that for brevity represents a specific meaning where the values are enumerated and each value has a clear definition.
4. DateAndTime – Date + time; a point in time where both date and time are known. (secondary: Date, Time)
5. Identifier – Character string used to establish identity of, and uniquely distinguish one instance of an object within an ID scheme. (authorized abbreviation: ID)
6. Indicator – Boolean (exactly two mutually exclusive values).
7. Measure – Numeric value determined by measurement with units.
8. Number – Assigned or determined by calculation. (secondary: Value, Rate, Percent)
9. Quantity – Non-monetary numeric value or count with units.
10. Text – Character string generally in the form of words. (secondary: Name)
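One practical use of this list is checking that a data element name ends in an approved representation term. The sketch below is a rough heuristic under stated assumptions: it treats the secondary terms and the `ID` abbreviation from the list above as aliases of their primary term, and it matches on a case-sensitive name suffix. The function name `representation_term_of` is hypothetical.

```python
PRIMARY_TERMS = [
    "Amount", "BinaryObject", "Code", "DateAndTime", "Identifier",
    "Indicator", "Measure", "Number", "Quantity", "Text",
]

# Secondary terms and the authorized abbreviation, mapped to their primary term.
ALIASES = {
    "Graphic": "BinaryObject", "Picture": "BinaryObject",
    "Sound": "BinaryObject", "Video": "BinaryObject",
    "Date": "DateAndTime", "Time": "DateAndTime",
    "ID": "Identifier",
    "Value": "Number", "Rate": "Number", "Percent": "Number",
    "Name": "Text",
}


def representation_term_of(element_name: str):
    """Return the primary representation term the name ends with, or None."""
    suffixes = {term: term for term in PRIMARY_TERMS}
    suffixes.update(ALIASES)
    # Try longer suffixes first so "DateAndTime" is found before "Time".
    for suffix in sorted(suffixes, key=len, reverse=True):
        if element_name.endswith(suffix):
            return suffixes[suffix]
    return None


for name in ("PersonBirthDate", "ContactEmailID", "PersonGivenName", "Person"):
    print(name, representation_term_of(name))
```

A name with no recognized term, such as `Person`, returns `None`, which would flag it as a concept rather than a property.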
Publishing Metaphor
Publishing implies high-quality information is shared with a large audience
Emphasis on multi-state reviews and clarity to a diverse base of consumers
Commitment to accuracy and change control
The Psychology of Sharing and Trust
Research done in mid-1990s by Adele Goldberg and others
Groups only tend to share objects with other people or systems they trust
We need to create systems for building trust
– Have a defined peer review process (see 11179 standards)
– Have experts with credibility play a role in approval
– Publish list of users of metadata
– Publish test cases
– Publish change control process
– Publish success stories
Metadata Publishing Workflow Funnel
Develop a simple workflow system for publishing data elements
Include harvesting areas for simple glossary-of-terms entries found in documentation and web sites, and use metadata “scrapers” to inventory all columns in relational database systems
Get stakeholder teams to “accept” data elements, review them and take on the data stewardship role for these data elements
Commit to change-control only after data elements are marked “approved for publication” by over 50% of the stewardship team
Exclude sensitive information from public web sites (data sources)
[Funnel diagram: Metadata Harvesters feed the Glossary of Terms, which narrows to Initial Draft, then Under Review, then Approved for Publication]
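The funnel can be modeled as a simple state machine. This is a minimal sketch, not the workflow system used in the project: the `Candidate` class and the steward names are hypothetical, and the only rule enforced is the "over 50% of the stewardship team" approval threshold from the slide.

```python
from dataclasses import dataclass, field
from typing import Set

STAGES = ["Glossary of Terms", "Initial Draft", "Under Review", "Approved for Publication"]


@dataclass
class Candidate:
    name: str
    stewards: Set[str]
    stage: str = STAGES[0]
    approvals: Set[str] = field(default_factory=set)

    def approve(self, steward: str) -> None:
        if steward not in self.stewards:
            raise ValueError(f"{steward!r} is not on the stewardship team")
        self.approvals.add(steward)

    def has_majority(self) -> bool:
        # "Over 50%" means strictly more than half of the team.
        return len(self.approvals) * 2 > len(self.stewards)

    def advance(self) -> None:
        """Move one stage down the funnel; publication needs majority approval."""
        index = STAGES.index(self.stage)
        if index == len(STAGES) - 1:
            raise ValueError("already approved for publication")
        next_stage = STAGES[index + 1]
        if next_stage == "Approved for Publication" and not self.has_majority():
            raise ValueError("needs approval from over 50% of the stewardship team")
        self.stage = next_stage


element = Candidate("PersonBirthDate", stewards={"steward-1", "steward-2", "steward-3"})
element.advance()  # Initial Draft
element.advance()  # Under Review
element.approve("steward-1")
element.approve("steward-2")
element.advance()  # two of three stewards approved
print(element.stage)
```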
Model-Driven Development
[Diagram components: XML Form Editors; Data Elements (500 small XML files); Subversion; Data Dictionary (single, large XML file); Transforms (Saxon 8) run by Apache Ant; outputs: HTML, OWL, FreeMind, PDF, MindManager, Excel, SQL; targets: RDBMS, OLAP Cubes, SemanticWorks, Protégé, Intranet, Public Web Server]
Visualization
People will not trust what they don’t understand
They tend to understand concepts if you make them clear
Visualizations are the best way to promote clarity to a subgroup
Focus attention and remove “chart junk”
Quickly display a subgroup’s data elements under review
Let them pick the colors!
50 line XSLT
Sample from FreeMind: Open Source mind mapping tool
Results
http://education.state.mn.us/datadictionary
Store Semantic Mappings to Foreign Data Elements Directly in the Metadata Registry
Current metadata registry standards do not clearly specify where and how semantic equivalence and precision are stored.
owl:sameAs and owl:equivalentClass
OWL is different from XML Schema because it addresses data element semantics
– XML Schema has no way of declaring two data types as "equivalent"
– XML Schema was designed to create a way to validate a data set used in messaging systems
OWL was designed to manage metadata
– OWL “equivalentClass” operator for class equivalence
– OWL “sameAs” operator for instance equivalence
– Example: NIEM:Person = SUMO:Human = CYC:Individual
[Diagram: Metadata Registry A and Metadata Registry B linked by Metadata Equivalence Mappings]
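Because equivalence is symmetric and transitive, a registry can answer "are these two data elements the same?" without storing every pair. The sketch below uses a union-find structure for that; it is an assumption about how such a lookup might be built, not how any OWL tool works, and it ignores both the precision levels on the links and the difference between sameAs and equivalentClass.

```python
class EquivalenceRegistry:
    """Tracks groups of data element names declared equivalent to one another."""

    def __init__(self):
        self._parent = {}

    def _find(self, name: str) -> str:
        """Return the representative of the group that contains name."""
        self._parent.setdefault(name, name)
        while self._parent[name] != name:
            # Path halving keeps later lookups short.
            self._parent[name] = self._parent[self._parent[name]]
            name = self._parent[name]
        return name

    def same_as(self, a: str, b: str) -> None:
        """Record that a and b are equivalent."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_a] = root_b

    def equivalent(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)


registry = EquivalenceRegistry()
registry.same_as("NIEM:Person", "SUMO:Human")
registry.same_as("SUMO:Human", "CYC:Individual")
print(registry.equivalent("NIEM:Person", "CYC:Individual"))
```

Only two statements were recorded, yet the registry can infer that `NIEM:Person` and `CYC:Individual` are equivalent.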
Future: Semantic Mappers and Semantic Brokers
[Diagram: a Report Request in Model A reaches a Metadata Translation Service, which consults the Metadata Registry (Model A to Model B metadata mappings) using RDF queries with XML results, sends SQL or XMLA queries in Model B to the Data Warehouse (RDBMS), receives a TDS in Model B, and returns an XML Response in Model A]
Gartner: Vocabulary-based transformation
XMLA: XML for Analysis
What Data Elements Are Important?
It costs time and money for each data element you add to your metadata registry (over $1,000 per data element)
The more unimportant data elements are in your metadata registry, the harder it becomes to detect duplicates
Prioritization criteria should be developed to determine what Data Elements should have priority
Metadata “scraping tools” developed to pull candidate Data Elements from databases, spreadsheets and documents
We developed a six-step set of criteria for determining the value of a data element in the data dictionary
Anything can be in a Glossary but only about 10% of Glossary data items are promoted to a data element
[Diagram: Low Value Data Elements vs. High Value Data Elements]
Wikipedia Rocks!
It is currently burdensome to add new metadata to the registry
Would like to add “Edit this data element” (à la wikis)
Ideally a “Semantic Wiki”
See: Wikipedia: “Semantic Wiki”
Wantlist Standards
<?xml version="1.0" encoding="UTF-8"?>
<w:wantList w:release="3.0.3" xmlns:w="http://gjxdmtools.gtri.gatech.edu/wantList/1">
<w:element w:prefix="j" w:name="ContactEmailID" w:isReference="false"/>
<w:element w:prefix="j" w:name="ContactTelephoneNumber" w:isReference="false"/>
<w:element w:prefix="j" w:name="Person" w:isReference="false"/>
<w:element w:prefix="j" w:name="PersonBirthDate" w:isReference="false"/>
<w:element w:prefix="j" w:name="PersonGivenName" w:isReference="false"/>
<w:element w:prefix="j" w:name="PersonSurName" w:isReference="false"/>
</w:wantList>
Metadata management tools could share data element wantlists with other tools.
If you don’t have an appropriate data element, you should be able to look it up in a clearinghouse of metadata with precise ISO definitions (e.g. Swoogle)
Web service queries and metadata translation services could be used
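To show how another tool might consume a wantlist, here is a short sketch that reads the document above with Python's standard library. The function name `wanted_elements` is made up for this example; the namespace URI and attribute names come from the wantlist itself, and the sample is shortened to three elements.

```python
import xml.etree.ElementTree as ET

W_NS = "http://gjxdmtools.gtri.gatech.edu/wantList/1"

WANTLIST_XML = """<?xml version="1.0" encoding="UTF-8"?>
<w:wantList w:release="3.0.3" xmlns:w="http://gjxdmtools.gtri.gatech.edu/wantList/1">
<w:element w:prefix="j" w:name="ContactEmailID" w:isReference="false"/>
<w:element w:prefix="j" w:name="Person" w:isReference="false"/>
<w:element w:prefix="j" w:name="PersonBirthDate" w:isReference="false"/>
</w:wantList>"""


def wanted_elements(xml_text: str):
    """Return the requested data elements as 'prefix:Name' strings."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    names = []
    # Elements and attributes are both in the wantlist namespace.
    for element in root.findall(f"{{{W_NS}}}element"):
        prefix = element.get(f"{{{W_NS}}}prefix")
        name = element.get(f"{{{W_NS}}}name")
        names.append(f"{prefix}:{name}")
    return names


print(wanted_elements(WANTLIST_XML))
```

The attributes carry the `w:` prefix, so they must be looked up with the full namespace URI rather than by their bare names.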
McCreary’s Top 10 Recommendations
1. Organizations and applications that exchange data should be encouraged to publish their metadata in a machine-readable format to facilitate agent interoperability
2. Published data dictionaries should drive exchange document creation standards and published web services; metadata registry “shopping cart” tools should be accessible to non-programmers
3. Data warehouse initiatives should attempt to reuse and integrate existing federal metadata standards
4. Federal and state agencies should follow ISO/IEC 11179 and Data Reference Model (DRM) guidelines and use formal representation terms for all data element properties
5. Fundamentals of metadata publishing and transformation training should be encouraged by data architects and integration managers
6. Metadata standards should continue to be developed with the goal of building semantic integration brokers and agents
7. Producers of data mapping software should integrate semantic equivalency statements into automated mapping systems
8. XML integration appliance vendors should include semantic integration services to make integration easier
9. Organizations should perform ROI analysis on semantic integration
10. Awards should be given to organizations that publish useful and high-quality metadata
Things to Ponder…
Just like the ARPANET and DAML, some worthy standards come from US federally funded efforts. But they will need to “evolve” before they are widely adopted outside government projects.
Before you “take over the world”, you need to publish your metadata with your stakeholders
Metadata publishing is 80% social engineering and 20% technical engineering and is achieved through building shared meaning via trust building systems
Standards are complex. Sometimes the more general they are, the more widely adopted they are but the more abstract they become. Some standards frequently need an expert interpreter to adjust for local business needs
People need to understand something before they trust it. One of the best ways is to build tools to allow users to visualize their data elements
When planting a metadata garden, start small and keep weeding out the unimportant and redundant data elements
Agents
Open The Door To The Semantic Web!
Metadata publishing is hard
It is a foundation upon which the Semantic Web will be built
The benefits are indirect and need strong executive sponsorship
Metadata publishing is no “silver bullet”
I believe it is the most direct way to get to the Semantic Web
This will be the most practical way to build intelligent agents
References
Web site for paper: www.danmccreary.com/presentations/semweb2006
Data dictionary for Minnesota Department of Education: education.state.mn.us/datadictionary
ISO-11179 metadata registry standards
National Information Exchange Model (NIEM.gov)
Wikipedia articles:
– Metadata registry
– ISO/IEC 11179
– Representation term
– Metadata publishing
– Semantic broker
Questions & Answers
If software is ever going to be able to effectively inter-operate (in ways that were not explicitly preconceived and engineered), it will be because applications share enough of the semantics of their data elements.
Doug Lenat, Cycorp, Semantic Technology Conference 2005
Contact Information
Dan McCreary, President
Dan McCreary & Associates
Dan <at> danmccreary.com
http://www.danmccreary.com
also: http://www.LinkedIn.com