Why, what was the idea?
1. Create a data infrastructure
2. Data + the knowledge products that are produced on the basis of data
a) Efficient access to large volumes of data
b) Promote comparative analysis
c) Support dissemination of knowledge
d) Support the idea that knowledge has to be empirically based
e) Create an infrastructure that may grow by its own force
How
• A distributed model: data stored and maintained locally, with modern technology substituting for central institutions
• One common entry point, a portal
• One common metadata standard, which we were supposed to contribute to
• One technical solution
• One common multilingual thesaurus
More hows
• A requirement was that the user communities participated, allowed themselves to be activated and invested some resources
a) Developing a classification of resources
b) Use common metadata standard
Give better semantics / ontology
Help solve some language issues
Produce more heterogeneous data
Produce better quality of data
Give better administration of data
Resource promotion and integration
• Tools for publishing and finding data
• Guidelines for publishing and finding data
• Access control
• And there should be room for others; we could go beyond CESSDA
The Portal
• Metadata is all about communication
• A set of tools + an idea: data is the core that facilitates a "conversation"
• Technology, functionality
• Multilingual thesaurus
• Metadata standard
Activity in numbers
• 10 000 man-hours
• 40+ persons
• 41 deliverables
• 3 workshops
• 7 meetings
• 15 presentations
• 33 teleconferences
• The portal contains:
– 3000 studies
– 500 000 objects
Economic situation Year 3
           Total budget      Spent   Remaining
NSD             750 478    659 073      91 405
UKDA            374 044    296 557      77 487
DDA             183 868    158 777      25 091
FSD             125 188    113 751      11 437
NESSTAR         532 688    423 504     109 184
EKKE             58 570     56 210       2 360
SIDOS           156 829
ZA               27 844     22 984       4 860
List of deliverables
D1.1 - Project Initiation Document
D3.1 - Functional Specification and Design - M3
D5.1 - Guidelines Thesaurus construction & translation
D1.2 - Quality Assurance Plan
D2.1 - User Analysis Report - M6
D3.2 - MADIERA Prototype - M6
D7.1 - Dissemination Plan - M6
D1.3 - Periodic Progress Report (6-month) - M7
D2.2 - Usability test - MADIERA Prototype - M8
D3.3 - MADIERA Beta Version 1 - M15
D3.3a - MADIERA Beta Version 2 - M17
D3.3b - MADIERA Publisher Beta Version B - M17
D4.1 - Recommendation - Geo-referencing system
D6.1 - Guidelines - Content provision & access control
D2.3 - Usability test - MADIERA Beta version
D1.4 - Periodic Progress Report (12-month) - M14
D4.2 - Methodology identification comparable elements
D3.4 - MADIERA Version 1.0 - M23
D4.3 - Naming and identification recommendation
D5.2 - Report on adm mechanisms for thesaurus maintenance - M18
D6.2 - User guides and training packs for content provision - M18
D6.3 - First version of hyper-linked information space demonstrator - M23
D6.4 - Data archive content provision workshop
D6.5 - Workshop on content metadata (CDG/DDI)
D7.2 - On-going dissemination events
D7.3 - User guides and training packs - M23
D8.2 - Workshops for non-archive data providers
D2.4 - Usability test - MADIERA Version 1 - M24
D1.5 - Periodic Progress Report (18-month) - M19
D5.3 - Extended multilingual thesauri - M24
D6.6 - Hyperlinked information-space demonstrator version 2 - M24
D1.6 - Periodic Progress Report (24-month) - M26
D4.4 - Package of revised recommendations - M27
D5.4 - Evaluation Workshops - M30
D1.7 - Periodic Progress Report (30-month) - M31
D1.8 - Third annual report - M38
D2.5 - Final usability test report - M38
D3.5 - MADIERA Version 1.1 - M38
D5.5 - Additional thesaurus hierarchies - M38
D8.3 - Technological Implementation Plan - M41
D1.8 - Final Report - M41
The Portal
We have data identified at 3 levels: Study, Variable group and Variable

                         Study   Variable group   Variable
Free text search           X           X             X
CESSDA Classification      X
ELSST 1                    X
ELSST 2                    X           X             X
Archives                   X
NUTS                       X
The Portal
• The free-text search lets the user specify a completely free search term. If you search for “sausage”, you will presently get 1 hit, at variable level. This term (sausage) seems not to be in ELSST (yet).
• If you search for “radio”, you get 12 951 hits. “Radio” is a word used in many languages (all languages with data on the servers).
• If you search for “fjernsyn”, you get 2 911 hits. “Fjernsyn” is the Norwegian word for television.
• If we expand the word “fjernsyn” to its equivalents in other languages, we get 10 311 hits. Such an expansion checks against ELSST and picks up the translations.
• Common for all: searching in free text may give hits at all three levels of data. When browsing, some terms (keywords) are automatically translated back to the user.
• The CESSDA classification is a controlled vocabulary used for the DDI element topcClas, which is at study level: <codeBook> <stdyDscr> <stdyInfo> <subject> <topcClas>. If this term is used systematically, we can set up a catalogue structure. A study could then typically be published in more than one catalogue.
• ELSST1 is of finer granularity than the CESSDA classification. It gives the impression of an alphabetically sorted list of keywords, and it gives easy access to translations and to the systematic structure with synonyms and related terms. But it works at study level: <codeBook> <stdyDscr> <stdyInfo> <subject> <keyword>.
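The difference between a plain free-text search and a search expanded through the thesaurus can be sketched as below. This is only a minimal illustration: the tiny thesaurus, the indexed texts and the function names are hypothetical and do not reproduce ELSST or the portal's actual index.

```python
# Hypothetical miniature thesaurus: concept -> preferred term per language.
# The real ELSST content is not reproduced here.
THESAURUS = {
    "television": {"en": "television", "no": "fjernsyn", "de": "fernsehen"},
    "radio": {"en": "radio", "no": "radio", "de": "radio"},
}

# Hypothetical index entries: (level, text).
DOCUMENTS = [
    ("study", "Fjernsyn og radio i Norge"),
    ("variable", "Hours of television watched per day"),
    ("variable group", "Fernsehen und Freizeit"),
    ("variable", "Sausage consumption"),
]


def expand(term):
    """Return the term plus its translations if the thesaurus knows it."""
    needle = term.lower()
    for translations in THESAURUS.values():
        if needle in translations.values():
            return set(translations.values())
    # Unknown terms (such as "sausage") are searched as typed.
    return {needle}


def search(term, expand_term=False):
    """Free-text search over all levels, optionally with expansion."""
    terms = expand(term) if expand_term else {term.lower()}
    return [
        (level, text)
        for level, text in DOCUMENTS
        if any(t in text.lower() for t in terms)
    ]


print(len(search("fjernsyn")))                    # plain search
print(len(search("fjernsyn", expand_term=True)))  # expanded search
```

In this toy data the expanded search returns more hits than the plain one, and the hits come from different levels, which mirrors the behaviour described on the slide.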
The Portal
• ELSST2 matches on a few key text fields (title, abstract, keywords, subject, etc.). The most important thing about the "etc." is that it searches DDI elements at three different levels: study, variable group (name) and variable (label, text, concept).
• Archives lists the servers under the portal; for every server, the studies are listed in alphabetical order.
• The NUTS list gives units at different levels of NUTS. The search could use coordinates inserted in geoBndBox. I don't know how this is done (which DDI elements are used).
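Matching a term at the three levels could look roughly like the sketch below. It assumes a simplified, namespace-free DDI codebook; the sample document, the element names beyond those quoted on the slides (for example varGrp, var, labl, txt, concept) and the function name are assumptions made for illustration, not the portal's implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified codebook used only for this sketch.
SAMPLE = """
<codeBook>
  <stdyDscr>
    <stdyInfo>
      <subject>
        <keyword>television</keyword>
        <topcClas>Media</topcClas>
      </subject>
    </stdyInfo>
  </stdyDscr>
  <dataDscr>
    <varGrp name="Television habits"/>
    <var name="v1">
      <labl>Hours of television per day</labl>
      <concept>television</concept>
    </var>
    <var name="v2"><labl>Age</labl></var>
  </dataDscr>
</codeBook>
"""


def match_levels(xml_text, term):
    """Report where a term matches: study, variable group, variable."""
    root = ET.fromstring(xml_text.strip())
    needle = term.lower()

    def has(texts):
        return any(needle in (t or "").lower() for t in texts)

    # Study level: keyword and topcClas under <subject>.
    study = has(e.text for e in root.findall("./stdyDscr/stdyInfo/subject/*"))
    # Variable group level: the group's name.
    group = has(g.get("name") for g in root.findall("./dataDscr/varGrp"))
    # Variable level: label, text and concept of each variable.
    variables = [
        v.get("name")
        for v in root.findall("./dataDscr/var")
        if has(c.text for c in v if c.tag in ("labl", "txt", "concept"))
    ]
    return {"study": study, "variable group": group, "variable": variables}


print(match_levels(SAMPLE, "television"))
```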
Functionality: Geo-Cartography
Finding data by geography
Europe: a mixture of political, administrative and statistical units
Code, Name, Coordinates
Problem: Publish
Functionality: Naming Conventions
Objective: For a user to be able to update (metadata)
1. Add to metadata of a study
2. Use could also lead to changes, corrections, updates
Distinguish between two components of an identification:
Identifier (static) – version code (dynamic)
Elements that we identify consist of data and metadata
Elements could also be a complex mixture of instances that make up a study
And studies could be part of series
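One way to picture the split between a static identifier and a dynamic version code is the small sketch below. The class name, the integer version and the "identifier:version" string form are all hypothetical choices for illustration; the slides do not prescribe a concrete syntax.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VersionedId:
    identifier: str   # static part: never changes for the element
    version: int = 1  # dynamic part: changes with every update

    def bump(self):
        """Return the ID of the next version; the identifier is kept."""
        return replace(self, version=self.version + 1)

    def __str__(self):
        # Hypothetical rendering, chosen only for this example.
        return f"{self.identifier}:{self.version}"


original = VersionedId("study-0001")
corrected = original.bump()  # e.g. after a metadata correction
print(original, corrected)
```

Because the identifier stays fixed, references to the study remain valid across updates, while the version code tells two states of the same element apart.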
Functionality: Naming Conventions
Series
Study
Instance 1 Instance 2
Data
Metadata
All this described as a complex set of modules
Data from data producers
Metadata from archives
DDI 3.0
ID   Module                    Simple   Complex   P/L
W    Wrapper                   1..1     1..1      L
A    Archive                   1..1     1..1      L
G    Group                     0..0     1..n      P
C    Concept                   1..1     1..n      P
DC   Data Collection           1..n     1..n      P
I    Instrumentation           1..n     1..n      P
LC   Logical Data Structure    1..n     1..n      P
PS   Physical Data Structure   1..n     1..n      L
PI   Physical Instance         1..n     1..n      L
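The cardinalities in the Simple and Complex columns can be read as rules on how many instances of each module an instance may contain. The sketch below checks module counts against such rules; the function names and the example counts are hypothetical, and only the cardinality strings are taken from the table.

```python
# Cardinalities copied from the Simple and Complex columns of the table.
SIMPLE = {"W": "1..1", "A": "1..1", "G": "0..0", "C": "1..1", "DC": "1..n",
          "I": "1..n", "LC": "1..n", "PS": "1..n", "PI": "1..n"}
COMPLEX = {"W": "1..1", "A": "1..1", "G": "1..n", "C": "1..n", "DC": "1..n",
           "I": "1..n", "LC": "1..n", "PS": "1..n", "PI": "1..n"}


def parse_cardinality(spec):
    """Turn '1..n' into (1, None) and '0..0' into (0, 0)."""
    low, high = spec.split("..")
    return int(low), (None if high == "n" else int(high))


def violations(counts, rules):
    """List module IDs whose count falls outside the allowed range."""
    bad = []
    for module, spec in rules.items():
        low, high = parse_cardinality(spec)
        count = counts.get(module, 0)
        if count < low or (high is not None and count > high):
            bad.append(module)
    return bad


# Hypothetical instance: one of each module except Group, two collections.
counts = {"W": 1, "A": 1, "C": 1, "DC": 2, "I": 1, "LC": 1, "PS": 1, "PI": 1}
print(violations(counts, SIMPLE))   # fits the Simple profile
print(violations(counts, COMPLEX))  # Complex requires at least one Group
```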