Weaving the Web of European Social Science

Jostein Ryssevik
Director of Technology and Development, Nesstar

NESSTAR Limited, www.nesstar.com



The last page of the Internet

Go there

Next


Tools for thought

“Much more time went into finding or obtaining information than into digesting it.”

Dr. J.C.R. Licklider

Maximize: (time spent on digesting and thinking) / (time spent on finding and accessing)


As We May Think

• Vannevar Bush and the Memex (1945)
• A piece of “furniture” able to take in and hold large amounts of scientific information on microfilm
• ...all indexed
• ...all hyperlinked (or linked together by trails)
• ...even supporting the idea of user-supplied links as well as user-supplied annotations
• ...allowing public information to be privatized/customized
• ...as well as private information to become public


50 years later

• ...but even today, comparative social science research in Europe is hampered by the fragmentation of the scientific information space.

• Data, information and knowledge are scattered in space and divided by technical, linguistic and institutional barriers.

• As a consequence, too much of the research is based on data from a single nation, carried out by a single-nation team of researchers and communicated to a single-nation audience.


Scenarios from “The Social Science Dream Machine”

…a user is looking for data on political trust from three different regions of Europe

He uses the geographical interface of a research location service to circle the relevant regions on the map and enters the additional search criteria.

From the returned hit list he is able to browse layers of increasingly detailed metadata describing potential sources.

He is even allowed to perform simple statistical analyses and visualizations on-line to make sure that the data fulfil his requirements. Several datasets can be brought to the desktop at the same time to ease the comparison.

As soon as a decision has been made, the chosen datasets can be downloaded and automatically converted to the format of his favorite statistical package. All relevant metadata travels along with the data to assist the researcher in his analysis.


Another scenario

…a user analysing a group of variables in dataset X would like to know if there are similar datasets from other countries that could be used for a comparative study.

…by hitting the “get comparable dataset” button, a list of potential datasets is immediately returned.

…she would also like to have an overview of knowledge products (papers, articles etc.) based on this study

…as references and links to derived knowledge products are an integrated part of the metadata of each single dataset, a sorted list can be displayed with a single keystroke. Some of these references also include e-mail and website addresses of the relevant researchers.

…finding a problem with one of the variables, she writes a note and appends it to the “user experience” section of the metadata to alert future users (she also submits a rating of the resource according to an established quality standard within her community)

...and when the research paper is ready and published in an on-line journal, links to the dataset are added to allow future users to revisit her analysis


Yet another scenario... (I promise – this is the last one)

…a user that is reading an article in an on-line journal finds a link that connects him to the data that was used by the author to underpin the argument. The link allows the user to rerun the analysis, and also to dig deeper into the same data-source.

...through his “digital research assistant” (an active agent), he is also made aware of several other relevant data sources published after the article was written, and he uses these to challenge the conclusion of the author

...he also leaves the query with his “digital research assistant” to make sure that he is alerted if a new dataset meeting his requirements is published somewhere around the world at a later stage (the agent is instructed to only include datasets from “trusted sources” and moreover to exclude any dataset below a certain sample size)


The Web revolution – a global Memex

• The very first really global information system
• The very first “many to many” communication technology
• From many local to a single global hypertext-space
• True multi-media: The Web has taken all existing media as its content
• Decentralized
• Scalable


Local versus global information systems

Local: Limited amount of information (bounded)
Global: Unlimited amount of information (unbounded)

Local: The complete information space is known, categorized and indexed
Global: Only parts of the information space are known, categorized and indexed

Local: A single unified system of concepts securing unambiguous categorization of resources as well as semantic interoperability across resources (“complete understanding”)
Global: Several partially competing systems of concepts producing alternative categorizations of resources as well as low levels of semantic interoperability across resources (“partial understanding”)

Local: Deep standardization
Global: Shallow standardization

The original Web:
- a naming convention (the URL)
- a simplistic hypertext format (HTML)
- a simple transport protocol (HTTP)

Local: Limited scalability
Global: Unlimited scalability

Local: Link integrity
Global: Broken links (the famous 404)

Local: Centralized: a single publisher and registration authority
Global: Decentralized: millions of publishers, no central authority

“The design of the Web fundamentally differed from traditional hypertext systems in sacrificing link integrity for scalability.”

Tim Berners-Lee: “Web Architecture: Describing and Exchanging Data”, http://www.w3.org/1999/04/WebData


Towards the Semantic Web

• From documents to data
• From brainware to software
• From machine readable to machine “understandable” information
• Metadata: the glue of the Semantic Web
• A framework for “knowledge representation”: RDF
• The introduction of namespaces (allowing different systems of terms and concepts to cohabitate in a single information system)
• Partial understanding/agreement
• The vision: the creation of a dynamic framework facilitating cooperation/interoperability across domains and communities, gradually expanding the “web of understanding”.
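The namespace idea can be illustrated with a small sketch. It does not use any RDF library: plain Python tuples stand in for RDF statements (subject, predicate, object). The Dublin Core namespace URI is real; the second vocabulary, the dataset URI and the values are hypothetical and only serve the illustration. Two vocabularies use the same local name, "coverage", with different meanings, and the namespace keeps them apart. An application that knows only one vocabulary still gets a useful "partial understanding".

```python
# Plain tuples stand in for RDF statements: (subject, predicate, object).
DC = "http://purl.org/dc/elements/1.1/"        # Dublin Core element set
ARCHIVE = "http://example.org/archive-terms#"  # hypothetical local vocabulary
DATASET = "http://example.org/datasets/42"     # hypothetical resource


def term(namespace: str, local_name: str) -> str:
    """Qualify a local term name with its namespace URI."""
    return namespace + local_name


statements = [
    (DATASET, term(DC, "title"), "Political trust survey"),
    (DATASET, term(DC, "coverage"), "Norway"),
    # Same local name, different vocabulary, different meaning.
    (DATASET, term(ARCHIVE, "coverage"), "adults aged 18-79"),
]


def understood(all_statements, known_namespaces):
    """Keep the statements whose predicate belongs to a known vocabulary."""
    return [
        s for s in all_statements
        if any(s[1].startswith(ns) for ns in known_namespaces)
    ]


# An application that only knows Dublin Core ignores the rest without failing.
dc_only = understood(statements, [DC])
everything = understood(statements, [DC, ARCHIVE])
```

Here `dc_only` holds the two Dublin Core statements and `everything` holds all three; the two "coverage" predicates never collide because their full names differ.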

“The Semantic Web is an extension of the current Web in which information is given well defined meaning, better enabling computers and people to work in cooperation.”

Tim Berners-Lee

“Human endeavor is caught in an eternal tension between the effectiveness of small groups acting independently and the need to mesh with the wider community.

A small group can innovate rapidly and efficiently, but this produces a subculture whose concepts are not understood by others. Coordinating actions across a large group, however, is painfully slow and takes an enormous amount of communication.

The world works across the spectrum between these extremes, with a tendency to start small—from the personal idea—and move toward a wider understanding over time.”

Tim Berners-Lee, James Hendler and Ora Lassila: “The Semantic Web”, Scientific American, May 2001

The current Web The Semantic Web


The Semantic Web at work

[Figure: the path from data (100110100101110110011001) to knowledge products, contrasting brainware alone with software combined with brainware]


Hype or reality.....?

• Not many Semantic Web applications out there so far
• RDF has not really taken off
• ...and the other layers of the Semantic Web are still at the planning stage
• ...“it will never work”
• ...“will be overshadowed and overtaken by the other buzzword of today's Web community: Web services”
• ...on the other hand:
• ...Berners-Lee and his crew have been right before
• ...the amount of RDF-coded information is increasing
• ...the number of tools and platforms is increasing
• ...the “big guys” are gradually entering the scene, most notably Adobe’s new “eXtensible Metadata Platform” (XMP), which uses RDF to model and represent metadata within documents and other information objects.


Metadata – data about.....

[Figure: unlabeled stuff versus labeled stuff]

The bean example is taken from: A Manager’s Introduction to Adobe eXtensible Metadata Platform, http://www.adobe.com/products/xmp/pdfs/whitepaper.pdf


The functions of metadata

• Finding
• Understanding
• Assessing
• Sharing


Users of metadata

• Metadata for human consumption
– ...any human-readable information that allows a user to find, assess and understand information objects in order to use them in a sound way
• Metadata for machine consumption
– ...any machine-processable information that allows a piece of software to find, assess and “understand” information objects in order to manipulate them in a sound way on behalf of a human user
• Metadata meant for humans and metadata meant for machines are often just different representations of the same type of information.

An example from the world of statistics

Human-readable information about data quality: information about sampling methods and procedures, allowing a human reader to assess whether a statistical resource has been created according to a certain standard

Machine-understandable information about data quality:
• sample size
• response rate
• sampling methodology (according to a specific classification of methodologies: a controlled vocabulary identified by a namespace, in which the meaning of each term is well defined)
• the number of publications produced on the basis of the resource
• ratings (given by persons and organizations according to a well-defined rating system, allowing a user to instruct a software agent to skip resources where the quality score given by a defined set of trusted organizations is below a certain limit)
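The machine-understandable quality fields listed above could be sketched as a small record that a software agent filters on. This is an illustration only: the field names, the rater identifiers and the 1-5 rating scale are hypothetical, and averaging the trusted ratings is one assumed way of forming "the quality score given by a defined set of trusted organizations".

```python
from dataclasses import dataclass, field


@dataclass
class QualityMetadata:
    sample_size: int
    response_rate: float   # fraction between 0 and 1
    sampling_method: str   # term from a controlled vocabulary (hypothetical)
    publications: int      # publications produced on the basis of the resource
    ratings: dict = field(default_factory=dict)  # rater id -> score


def trusted_score(meta, trusted):
    """Average score over trusted raters; None if none of them rated it."""
    scores = [score for rater, score in meta.ratings.items() if rater in trusted]
    return sum(scores) / len(scores) if scores else None


def acceptable(meta, trusted, min_score):
    """True when the trusted score exists and reaches the limit (assumed policy)."""
    score = trusted_score(meta, trusted)
    return score is not None and score >= min_score


TRUSTED = {"archive-a", "archive-b"}  # hypothetical trusted organizations
example = QualityMetadata(
    sample_size=1500,
    response_rate=0.62,
    sampling_method="stratified-random",
    publications=12,
    ratings={"archive-a": 4, "archive-b": 2, "unknown-rater": 5},
)
```

With these values the untrusted rating is ignored, so `trusted_score(example, TRUSTED)` is 3.0, and the agent would skip the resource for any limit above that. Treating an unrated resource as not acceptable is a design choice of the sketch, not something the slide prescribes.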


Metadata on the Web

Data, linked to:
• Catalogue info (Finding)
• Data dictionary (Understanding)
• Data quality (Assessing)
• Articles/reports (Sharing 1)
• Notes/discussions (Sharing 2)

...all hyperlinked


An extended metadata concept

• Not only descriptions of data but also knowledge products derived from the use of data

• A dynamic concept: metadata are constantly developing throughout the lifetime of a dataset

• A variety of authors, many of whom are not data producers

...as opposed to what we might call the “Cathedral View” on metadata:

.... metadata seen as a coherent and centralised collection of information with clearly defined boundaries, provided by a single authority for a defined community of users.


The Data Documentation Initiative (DDI)

• Established in 1995 to create a universally supported metadata standard for the social science community
• Initiated and organised by the Inter-university Consortium for Political and Social Research (ICPSR), Michigan, USA
• Members come from social science data archives and libraries in the USA, Canada and Europe and from major producers of statistical data
• First version of the standard expressed as an SGML DTD
• Translated to XML in 1997
• Extensive testing carried out spring-summer 1999
• DDI 1.0 published spring 2000
• DDI 1.1, with minor revisions and some additions, published autumn 2001
• The DDI 2.0 process???


Achievements

• Acceptance
– fast take-up in the community of data archives and data libraries world-wide

• Community building
– revitalised the co-operation and sharing of know-how and technologies among the archives and libraries

• Strengthening of the ties to the data producers

• Software development


The DDI in action – what do we know?

• The cost of migrating a data archive to the DDI is high (much higher than the cost of any DDI-compliant archiving software)

• The DDI is a “cathedral standard” (no data provider is buying the full package; they are all using the DDI building blocks to build their own more modest “parish churches”).

• The DDI is a very loose structure loaded with alternatives and ambiguities (a single study described by two different archives will probably look quite different)

• The DDI is only telling half the story (data providers will have to add their own local guidelines on top of the DDI, such as controlled vocabularies and mandatory elements, to secure internal standardization)

• The DDI is inflexible (there is no extension mechanism that allows a data provider to add local elements without breaking the standard)

• A pure “bottom-up” approach: The DDI is used to describe concrete files or products coming out of the statistical process. It has no level of abstraction above or beyond a physical statistical product

• Machine-understandable versus human-understandable: Using XML does not automatically create metadata that is complete and logical enough to drive software processes
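The last two points (local guidelines on top of the standard, and XML not being enough by itself) can be sketched with Python's standard `xml.etree.ElementTree`. The sample document below is well-formed XML, yet it fails a local completeness rule. The element names are modelled on the DDI codebook structure and should be checked against the actual DTD; the list of "locally mandatory" paths is a hypothetical archive guideline, not part of the DDI.

```python
import xml.etree.ElementTree as ET

# Hypothetical local guideline: elements this archive treats as mandatory,
# written as paths below the root element.
MANDATORY_PATHS = [
    "stdyDscr/citation/titlStmt/titl",
    "stdyDscr/stdyInfo/sumDscr/nation",
]

# Well-formed, but the nation element is missing.
SAMPLE = """<codeBook>
  <stdyDscr>
    <citation><titlStmt><titl>Political trust survey</titl></titlStmt></citation>
  </stdyDscr>
</codeBook>"""


def missing_elements(xml_text, paths=MANDATORY_PATHS):
    """Return the mandatory paths that are absent or empty in the document."""
    root = ET.fromstring(xml_text)
    missing = []
    for path in paths:
        element = root.find(path)
        if element is None or not (element.text or "").strip():
            missing.append(path)
    return missing


problems = missing_elements(SAMPLE)
```

Parsing succeeds, so the document is valid as far as XML syntax goes, but `problems` lists the missing nation path. Software that relies on that element would need this kind of extra check before it could process the study description.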


Nesstar - vision

To develop a truly distributed platform for electronic publishing of statistical data, building on object technology, open (metadata-) standards and lightweight Internet protocols.

....or simply

To bring the models, technologies and collective energy of the Web to the world of statistics.


Nesstar – an overview

• An architecture for a totally distributed global data library

• The ability to locate multiple data sources across national boundaries

• The ability to browse detailed information about these data sources

• ..and to do data analysis and visualisation over the net

• ..or to download the appropriate subset of data in one of a number of formats

• Supporting standard micro-data as well as aggregated tables/cubes

• Allowing the user to bookmark/hyperlink resources in the data and metadata repositories
– searches
– datasets
– analyses (tables, models etc.)

• ..and to hyperlink these resources from external Web objects (like texts)

• Support for metadata indexing using a multilingual thesaurus

• Powerful data preparation tools, including a system for remote publishing of data to NESSTAR servers


Nesstar - background

• Originally funded by the EC under the Electronic Publishing Program
– NESSTAR (1998-1999)
– FASTER (2000-2001)

• Nesstar Limited established in June 2001
– Owned jointly by the Norwegian Social Science Data Services (NSD) and the UK Data Archive
– Offices in Colchester (UK) and Bergen (Norway)


Nesstar - a fully distributed Data Web

[Figure: end-user clients connected to data publishers/owners]


MADIERA (Multilingual Access to Data Infrastructures of the European Research Area)

• An EC-funded project to commence in October 2002
• Purpose: to develop and build a Europe-wide virtual social science data library based on Nesstar and Semantic Web technology
• Building blocks:
– A shared metadata standard (the DDI)
– A shared technological platform (Nesstar)
– A multilingual thesaurus (ELSST)
– A classification system for social science data resources
– A georeferencing system for social science data resources
– A methodology for identification of comparable data
– A feedback system for user-supplied information
– A system for creating links between data and knowledge products
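How a multilingual thesaurus supports cross-national searching can be sketched as follows. The idea shown is the general one: terms in each language map to a language-neutral concept, and datasets are indexed by concept rather than by words. All concept identifiers, terms and dataset names here are made up for illustration and are not taken from the actual thesaurus or from Nesstar.

```python
# Each concept has a language-neutral id and a preferred term per language.
THESAURUS = {
    "C001": {"en": "political trust", "no": "politisk tillit",
             "de": "politisches Vertrauen"},
    "C002": {"en": "unemployment", "no": "arbeidsledighet",
             "de": "Arbeitslosigkeit"},
}

# Datasets are indexed by concept id, whatever language they are described in.
DATASET_INDEX = {
    "no-survey-1999": {"C001"},
    "de-panel-2000": {"C001", "C002"},
    "uk-labour-2001": {"C002"},
}


def concept_for(term_text):
    """Find the concept id for a term in any supported language, or None."""
    wanted = term_text.strip().lower()
    for concept_id, labels in THESAURUS.items():
        if wanted in (label.lower() for label in labels.values()):
            return concept_id
    return None


def search(term_text):
    """Return the datasets indexed under the concept the term maps to."""
    concept_id = concept_for(term_text)
    if concept_id is None:
        return []
    return sorted(
        name for name, concepts in DATASET_INDEX.items()
        if concept_id in concepts
    )


hits = search("Politisk tillit")
```

A search phrased in Norwegian returns both the Norwegian and the German dataset, because both are indexed under the same concept. A term that is not in the thesaurus returns an empty list rather than falling back to free-text matching, which keeps the sketch short.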


Where are we heading....?

From data graveyards to knowledge greenhouses

?