Data management


Purpose

• Have a peer conversation about data architectures

• Interactive

• Share your experience

• 7 problem spaces

Data management

• "Data Resource Management is the development and execution of architectures, policies, practices and procedures that properly manage the full data lifecycle needs of an enterprise." (DAMA)

• "Data management is the development, execution and supervision of plans, policies, programs and practices that control, protect, deliver and enhance the value of data and information assets." (DAMA)

Aspects

1. Data Governance

2. Data Architecture, Analysis & Design

3. Database Management

4. Data Security Management

5. Data Quality Management

6. Reference & Master Data Management

7. Data Warehousing & Business Intelligence Management

8. Document, Record & Content Management

9. Metadata Management

Aspects

Structured data

• Largely database driven

• Authoritative administrative systems (PeopleSoft, SAP)

• Predictable, manageable growth

• Data entry standards

• Cross-referencing and data warehouses

• Designed largely by DBAs, not data architects

• Complex queries are difficult: high barrier to entry, need to understand the layout of the databases

Unstructured data

• Individually generated

• In filesystems

• No standardized metadata

• Two schools of thought: library-style nested metadata, tagged metadata

• Rich media formats cannot be easily mined and searched

• Management is a nightmare: growing demands, regulatory issues
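
The two schools of thought on metadata can be contrasted in a small Python sketch. The record, the file name, the tag names and both helper functions are hypothetical illustrations, not a standard schema.

```python
# Hypothetical metadata for one recorded lecture, expressed both ways.

# Library style: one fixed, nested hierarchy; an item lives at exactly one path.
nested = {
    "Fall 08": {
        "Russian 303": {
            "Lesson 2": {"file": "lecture02.mp4"},
        }
    }
}

# Tag style: a flat set of labels per item; no single hierarchy is imposed.
tagged = {
    "lecture02.mp4": {"fall-08", "russian-303", "lesson-2", "overcoat"},
}


def find_nested(tree, path):
    """Walk a fixed path through the hierarchy; return the node found or None."""
    node = tree
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def find_tagged(index, wanted):
    """Return the files carrying every tag in `wanted`, in any order."""
    wanted = set(wanted)
    return sorted(name for name, tags in index.items() if wanted <= tags)
```

The nested form only answers questions that follow its hierarchy from the top, while the tagged form can be queried on any combination of labels but offers no guarantee that people tag consistently.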

Structured data Gap 1

• Finding stuff. Most institutions have data warehouses. Did you ever try to query them? Good luck. There are standard queries set up for people, and if you want something different, take a number, pull out your checkbook, write a proposal to a bureaucratic committee, and you may get it at some point.

• How are you dealing with this issue now and in the future?

Structured data Gap 2

• Information is not easily consumable. The best example of how we have addressed the lack of consumability: our systems of record (PeopleSoft and SAP) have all the information about people at the institution. IdM systems perform complex joins and present this data in a consumable, standardized system: LDAP with an agreed-upon schema (eduPerson). Ultimately this is what allows us to federate authentication across institutions. Many records, however, have not been normalized (class rosters, grades, etc.).

• How are you dealing with this issue now and in the future?
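
As a rough illustration of the kind of join an IdM system performs, here is a minimal Python sketch. The two source dictionaries, their field names (`netid`, `name`, `category`), the `example.edu` domain and the function itself are assumptions made up for this example; only the output attribute names (`uid`, `cn`, `eduPersonPrincipalName`, `eduPersonAffiliation`, `eduPersonPrimaryAffiliation`) come from the LDAP and eduPerson schemas.

```python
def build_directory_entry(person_id, hr_records, student_records, domain="example.edu"):
    """Join two hypothetical systems of record into one eduPerson-style entry."""
    hr = hr_records.get(person_id)
    student = student_records.get(person_id)
    if hr is None and student is None:
        return None  # unknown person in both systems

    source = hr or student
    affiliations = []
    if hr is not None:
        affiliations.append(hr["category"])  # e.g. "faculty" or "staff"
    if student is not None:
        affiliations.append("student")
    affiliations.append("member")

    return {
        "uid": source["netid"],
        "cn": source["name"],
        "eduPersonPrincipalName": f'{source["netid"]}@{domain}',
        "eduPersonAffiliation": affiliations,
        # Assumption: the first affiliation found is treated as primary.
        "eduPersonPrimaryAffiliation": affiliations[0],
    }


# Hypothetical extracts from an HR system and a student system.
hr_records = {"1001": {"netid": "asmith", "name": "Alex Smith", "category": "staff"}}
student_records = {
    "1001": {"netid": "asmith", "name": "Alex Smith"},
    "1002": {"netid": "bjones", "name": "Bo Jones"},
}

entry = build_directory_entry("1001", hr_records, student_records)
```

Consumers then read one flat, predictable entry per person instead of learning the layout of each source system. Nothing comparable exists in this sketch, or in most institutions, for rosters and grades.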

Structured data Gap 3

• Privilege management. Most systems allow per-person privileges: authorization in the application is based on a person's name, not on an institutional role. This exposes us to a privilege snowball.

• How are you dealing with this issue now and in the future?
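
The snowball effect can be shown with a minimal Python sketch that contrasts per-person grants with privileges derived from current roles. The role names, privilege names and user name are hypothetical; in practice the roles would come from the IdM system.

```python
# Hypothetical role-to-privilege mapping.
ROLE_PRIVILEGES = {
    "registrar-staff": {"view_grades", "edit_rosters"},
    "instructor": {"view_rosters", "submit_grades"},
}


def effective_privileges(current_roles, role_privileges=ROLE_PRIVILEGES):
    """Derive privileges from the roles a person holds right now."""
    granted = set()
    for role in current_roles:
        granted |= role_privileges.get(role, set())
    return granted


# Per-person grants only ever grow unless someone remembers to revoke them.
per_person_grants = {"asmith": {"view_grades", "edit_rosters"}}
per_person_grants["asmith"] |= {"view_rosters", "submit_grades"}  # after a job change

# Role-based: the same job change simply replaces the role list.
after_job_change = effective_privileges(["instructor"])
```

With role-derived privileges, leaving the registrar's office removes the registrar privileges automatically. With per-person grants, the old privileges stay until someone cleans them up.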

Unstructured data Gap 1

• Electronically recorded lectures, talks, etc. How do we know what is there? Some metadata is generated at the outset: Fall 08, Russian 303, Dostoevsky, Lesson 2, The Overcoat. But do I really have to listen to the whole thing to find one piece of information? And what if I am looking for references to Dead Souls and the ordinary person? I would listen to Lesson 7 on other works, and I would never find the reference that was made to ordinary people in the Overcoat lecture.

• Some advances at MIT and Duke with speech-to-text, but where is the metadata (the searchable text) kept, and is that metadata anyway?
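
One way to picture searchable speech-to-text output is an inverted index over timestamped transcript segments. The Python sketch below assumes a transcript is already available as (lecture, offset in seconds, text) tuples. The sample segments are invented for illustration, and this does not describe how the MIT or Duke systems work.

```python
import re
from collections import defaultdict


def build_index(segments):
    """Map each lowercase word to the (lecture, start_seconds) pairs where it occurs."""
    index = defaultdict(list)
    for lecture, start_seconds, text in segments:
        for word in set(re.findall(r"[a-z']+", text.lower())):
            index[word].append((lecture, start_seconds))
    return index


def search(index, phrase):
    """Return segments containing every word of `phrase` (not necessarily adjacent)."""
    words = re.findall(r"[a-z']+", phrase.lower())
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)


# Hypothetical speech-to-text output: (lecture, offset in seconds, transcribed text).
segments = [
    ("Lesson 2", 0, "Today we discuss The Overcoat"),
    ("Lesson 2", 1260, "the ordinary person also appears in Dead Souls"),
    ("Lesson 7", 300, "Other works include Dead Souls"),
]
index = build_index(segments)
```

Searching this index for "dead souls" returns a hit inside Lesson 2 as well as Lesson 7, which is exactly the reference the course-level metadata would miss. Whether such an index counts as metadata or as derived content is the open question.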

Unstructured Data Gap 2

• Faculty and students are continuously putting stuff onto the internet in some third-party tool (Flickr, LinkedIn, MySpace, etc.). Where are the digital certs and digital signing to manage the intellectual property (IP) in the ether, especially as it gets cross-linked? In a recent survey of the CSG mailing list, fewer than 5% of folks from top R1 institutions digitally sign their mail. If we do not sign it now, how will we ever find it and assert the IP? Perhaps there is one thing we can learn from the RIAA.

Unstructured data Gap 3

• Standardized media formats. More and more medical records (X-rays, MRIs, etc.) are electronic. I sure hope someone will be able to open that reference hip MRI when I need a hip replacement in 17 years. If they cannot, how else will they see my arthritic progression? Actually, isn't it ridiculous that in this day and age you have to ask to get your records released on paper when you move to another HMO? I am sure my doctor read all 285 pages (1.75 in) of it in the 20 minutes allocated for our visit.

Unstructured data Gap 4

• E-discovery: anyone thinking about changes to their data management or metadata structures in response to e-discovery requests? How do decentralized environments play into the ability to produce data? How do you enforce institutional data management policy?