Web Science: An Interdisciplinary Approach to Understanding the Web ACM Paper (James Hendler, Nigel Shadbolt, Wendy Hall, Tim Berners-Lee, and Daniel Weitzner)

Overview

Most computer science departments have been slow to adapt or respond to the Web's influence on computer science. Surprisingly, its standards, protocols, and architectures are not studied in great detail.

CS has for many years recognized the importance of networking protocols (TCP/IP), the Internet, and so on.

However, the Web, despite having its own protocols, algorithms and architectural principles, is often viewed as just an application of the network.

This may have been the correct point of view at the dawn of the Internet. However, the Web is clearly the most used and one of the most transformative applications in the history of computing, even of human communications. Irrefutably, the Web has changed how the world communicates. Openness and 24/7 availability have revolutionized the communication grid.

The number of Web pages is greater than the population of the world...

New algorithms underlying modern search are fundamental to Web use.

Hadoop, an open-source software framework that supports data-intensive distributed applications on large clusters of commodity computers, makes it possible to explore these algorithms and experiment with large-scale Web-programming practices.
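
As a rough illustration of the programming model that Hadoop implements (MapReduce), the sketch below builds a small inverted index, the basic structure behind keyword search. It runs in a single process in plain Python and does not use Hadoop's own API. The page names, the texts, and every function name are assumptions made up for this example.

```python
from collections import defaultdict

def map_page(url, text):
    """Map step: emit one (word, url) pair per distinct word on a page."""
    return [(word, url) for word in set(text.lower().split())]

def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_postings(word, urls):
    """Reduce step: turn the grouped urls into a sorted posting list."""
    return word, sorted(urls)

def build_index(pages):
    """Run map, shuffle, and reduce in sequence over a dict of url -> text."""
    emitted = []
    for url, text in pages.items():
        emitted.extend(map_page(url, text))
    return dict(reduce_postings(w, u) for w, u in shuffle(emitted).items())

# Hypothetical two-page crawl.
PAGES = {"a.example": "web science", "b.example": "web graph"}
index = build_index(PAGES)
```

On a real cluster the map and reduce steps would run in parallel on many machines; here they are ordinary function calls so the data flow stays visible.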

Human interaction on the Web

• social networking

• tagging

• data integration

• information retrieval

• Web ontologies

These form the basis of a new area called "social computing".

Whether in CS studies or in information-school courses, the Web is often studied exclusively as the delivery vehicle for content, technical or social, rather than as an object of study in its own right.

Where physical science is commonly regarded as an analytic discipline that aims to find laws that generate or explain observed events, CS is predominantly synthetic (like mathematics), in that formalisms and algorithms are created in order to support specific desired behaviors.

The Web needs to be studied and understood as a phenomenon but also as something to be engineered for future growth and capabilities.

The Web is also an infrastructure of artificial languages and protocols; it is a piece of engineering.

The Web is also the interaction of human beings creating, linking, and consuming information, which generates the Web's overall behavior.

The Web is part of a wider system of human interaction; it has profoundly affected society, with each emerging wave creating new challenges and opportunities in making information more available to wider sectors of the population than ever before.

Web applications are not built with a single user in mind, or for a single computer. All Web applications are by default built for distributed use. A popular Web application can spread much like an outbreak of a flu virus: its use grows exponentially.

The Web, thought of as a macro system, is the use of the micro system by many users interacting with one another in often unpredicted ways.

One unexpected result was the gaming of search algorithms to improve search rank, which led to a need for better search technologies to defeat the gaming.

The essence of our understanding of what succeeds on the Web and how to develop better Web applications is that we must create new ways to understand how to design systems to produce the effect we want.

How do we design and build something at the micro level and have it function in a desirable way at the macroscale?

How do we predict other side effects and the emergent properties of the macro?

Understanding the Web requires more than a simple analysis of technological issues; it also requires analysis of the social dynamics of perhaps millions of users.

Because of the multi-user (social) nature of the Web, its science is inherently interdisciplinary.

WEB GRAPH

One way to model the Web is to define it in terms of a graph where the nodes are Web pages and the edges are the hypertext links among these nodes.

This graph grows on the order of seven million new pages a day.
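
A minimal sketch of this model, assuming pages are identified by their URIs and links are directed. The class name, its methods, and the three example URIs are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

class WebGraph:
    """Directed graph: nodes are page URIs, edges are hypertext links."""

    def __init__(self):
        self.out_links = defaultdict(set)  # page -> pages it links to
        self.in_links = defaultdict(set)   # page -> pages that link to it

    def add_link(self, source, target):
        """Record one hyperlink; both endpoints become nodes."""
        self.out_links[source].add(target)
        self.in_links[target].add(source)
        # Touch the reverse entries so every page appears in both maps.
        self.out_links[target]
        self.in_links[source]

    def pages(self):
        """All pages seen so far, as a set of URIs."""
        return set(self.out_links)

    def in_degree(self, page):
        """Number of distinct pages linking to this page."""
        return len(self.in_links.get(page, ()))

    def out_degree(self, page):
        """Number of distinct pages this page links to."""
        return len(self.out_links.get(page, ()))


# Made-up three-page Web.
graph = WebGraph()
graph.add_link("http://a.example/", "http://b.example/")
graph.add_link("http://a.example/", "http://c.example/")
graph.add_link("http://b.example/", "http://c.example/")
```

Keeping both directions of each link makes questions such as "which pages refer to this one?" cheap to answer, which is what link-based ranking relies on.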

Algorithms developed to exploit properties of the Web Graph

The HITS and PageRank algorithms rate the validity or importance of a Web page by the hyperlinks that refer to it, assuming that a heavily linked page is more likely to be important. This assumption led to the development of powerful search engines for finding content on the Web.
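
The idea can be sketched as a simplified PageRank computed by repeated redistribution of rank along links. This is an illustration under stated assumptions, not the production algorithm of any search engine: the damping value 0.85 is the commonly quoted default, the fixed iteration count stands in for a convergence test, and the three-page graph is made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank.

    links maps each page to the list of pages it links to.
    Returns a dict page -> rank; the ranks sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: "a" and "b" both link to "c", and "c" links back to "a".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In this toy graph "c" ends up with the highest rank because two pages refer to it, and "b" with the lowest because nothing refers to it.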

The edges of the Web graph represent single instantiations of the result of calling the HTTP protocol with a GET request, which returns an HTML document based upon the URI (universal resource identifier).

Many HTML documents are made up of a variety of objects embedded in the document. These can be, for example, images on different servers or formatting specifications such as Cascading Style Sheets and XML DTD documents, all of which can be captured by a crawler and, in turn, used in defining the Web graph.
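
A crawler can tell ordinary hyperlinks apart from embedded objects by looking at the tag that carries the reference. The sketch below uses Python's standard html.parser module on a made-up page. The class name, the choice of tags to treat as embedded, and the sample URIs are assumptions for illustration, and no network request is made.

```python
from html.parser import HTMLParser

class ResourceCollector(HTMLParser):
    """Separate hyperlinks (<a href>) from embedded objects (images, scripts, style sheets)."""

    def __init__(self):
        super().__init__()
        self.hyperlinks = []
        self.embedded = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.hyperlinks.append(attrs["href"])
        elif tag in ("img", "script") and attrs.get("src"):
            self.embedded.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            self.embedded.append(attrs["href"])


def collect(html):
    """Return (hyperlinks, embedded objects) found in one HTML document."""
    collector = ResourceCollector()
    collector.feed(html)
    return collector.hyperlinks, collector.embedded


# Hypothetical page that pulls a style sheet and an image from other servers.
SAMPLE = (
    '<html><head><link rel="stylesheet" href="http://static.example/site.css"></head>'
    '<body><img src="http://images.example/logo.png">'
    '<a href="http://other.example/page.html">next</a></body></html>'
)
hyperlinks, embedded = collect(SAMPLE)
```

Whether embedded objects become nodes of their own or are folded into the page that uses them is a modeling decision; the sketch only keeps the two kinds of reference apart.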

Such a model would be static; to study social interaction, this type of model must be enhanced so that it depicts that interaction accurately.

However, the design of the Web's protocols and services must also be considered, since without them the Web would not be scale-free.

Core Design Components of the Web:

• Identification of resources

• Representation of resource state

• Protocols that support interaction between agents and resources in the space

The richness of the request protocol implies that there is an underlying attribute that has yet to be incorporated into the Web graph: the user-dependent state.

When considering the Web as an application of the Internet, a static model just will not do.

Consider an HTTP POST request rather than an HTTP GET…
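
The difference can be seen by building the two kinds of request side by side. With GET the parameters travel in the URI; with POST they travel in the request body, so the URI alone no longer says what was asked for. The sketch uses the standard urllib modules and only constructs the request objects, without sending them. The host name and the query parameter are made up.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical search parameters.
params = urlencode({"q": "web science"})

# GET: the state is visible in the URI itself.
get_request = Request("http://search.example/find?" + params)

# POST: the same state is carried in the body; the URI stays the same for every query.
post_request = Request("http://search.example/find", data=params.encode("ascii"))
```

A graph built only from URIs would treat every POST to the same address as one node, however different the responses are.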

Sometimes sites generate complex URIs that use GET requests to pass on state, thus obscuring the identity of the actual resources.

URIs that carry state are used heavily in Web apps. These occurrences have yet to be completely analyzed.
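
One way to study such URIs is to separate the part that identifies the resource from the part that carries state. The sketch below does this with the standard urllib.parse module. Which parameters count as state is an assumption: the names in STATE_PARAMS are hypothetical, and a real analysis would need a list tuned to the sites being crawled.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names treated as user or session state.
STATE_PARAMS = {"sessionid", "sid", "token"}

def split_state(uri):
    """Return (resource URI without state parameters, dict of state parameters)."""
    parts = urlsplit(uri)
    query = parse_qsl(parts.query, keep_blank_values=True)
    state = {k: v for k, v in query if k.lower() in STATE_PARAMS}
    kept = [(k, v) for k, v in query if k.lower() not in STATE_PARAMS]
    # Rebuild the URI from the remaining parameters and drop any fragment.
    resource = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
    return resource, state


resource, state = split_state("http://shop.example/item?id=42&sessionid=abc123")
```

Two visitors with different session identifiers would then map to the same resource, which is closer to what a node in the Web graph is meant to represent.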

Udi Manber at Google commented that on average 20%-25% of daily searches have never been submitted before. He points out that this makes search incredibly difficult.

Note: Google receives over 100 million queries per day.
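
Putting the two figures together gives a sense of scale. The short calculation below treats 100 million as a lower bound, since the note says "over 100 million", and uses whole-number percentages to avoid rounding error. The function name is made up for this example.

```python
QUERIES_PER_DAY = 100_000_000  # lower bound taken from the note above

def novel_queries(total, percent):
    """Number of never-before-seen queries for a whole-number percentage."""
    return total * percent // 100

low = novel_queries(QUERIES_PER_DAY, 20)
high = novel_queries(QUERIES_PER_DAY, 25)
print(f"{low:,} to {high:,} new queries per day")
```

That is at least 20 to 25 million queries a day for which no earlier query can serve as a guide.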

Analyzing the Web solely as a graph ignores many of its dynamics.

The study of Web dynamics must take into account the growth of the Web and how the creation and use of new applications affect its dynamics and architecture.

Modern sophisticated Web sites provide powerful user-interface functionality by running large script systems within the browser. These applications access the underlying remote data model through Web APIs.

New forms of global systems are appearing that rely on the user’s computing power and on the massive storage available on Web servers for easy data retrieval.

User-generated content sites

• Store personal information

• Security issues

• Awareness of public access to information

• Advances in three-party authentication protocols to secure data (Trust)

Standard mathematical analysis of the Web has been inconclusive.

Linguistic analysis through tagging may provide some insight, however complex.

The dynamics of any “social machine” are highly complex, and dozens of academic papers from multiple disciplines have been written about them.

The idea of a social machine was introduced in Weaving the Web, which conjectured that the architectural design of the Web would allow developers, and thus end users, to use computer technology to help provide the management function for social systems as they were realized online.

Examples of Social Machines

• Blogging

• MySpace

• Facebook

The Web of Data is an emerging area of study that involves the heavy use of tagging provided by many of what are known as Web 2.0 technologies.

What is tagging?

Articles, blogs, photos, videos, and all manner of other Web resources may be annotated with user-generated keywords, or tags, that can later be used for searching or browsing these resources.

Tagging can enhance metadata to explain content or objects being described.
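
A minimal sketch of how tags can support searching and browsing, assuming tags are compared case-insensitively and resources are identified by URI. The class, its method names, and the example resources are hypothetical.

```python
from collections import defaultdict

class TagIndex:
    """Keep user-generated tags in both directions: tag -> resources and resource -> tags."""

    def __init__(self):
        self._by_tag = defaultdict(set)
        self._by_resource = defaultdict(set)

    def tag(self, resource, *tags):
        """Annotate a resource with one or more tags."""
        for raw in tags:
            name = raw.strip().lower()
            self._by_tag[name].add(resource)
            self._by_resource[resource].add(name)

    def search(self, *tags):
        """Resources that carry every one of the given tags."""
        sets = [self._by_tag.get(t.strip().lower(), set()) for t in tags]
        return set.intersection(*sets) if sets else set()

    def tags_for(self, resource):
        """Tags attached to a resource, usable as extra metadata about it."""
        return set(self._by_resource.get(resource, set()))


# Made-up photo and blog resources.
index = TagIndex()
index.tag("http://photos.example/1", "Sunset", "beach")
index.tag("http://photos.example/2", "beach", "family")
index.tag("http://blog.example/post", "beach")
```

Searching for "beach" returns all three resources, while searching for "beach" and "family" together narrows the result to the second photo.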

Ambiguous tags

Example: A tag may be useful in a specific social context, since it can designate a particular individual. The use of a tag as metadata depends on such a context and on the “network effect”. The deeper meaning of the tag…

Use of Metadata

The use of metadata is found in recent applications of semantic Web technologies and represents an important paradigm shift that is a significant element of emerging Web technologies.

The semantic Web will allow programmers and users alike to refer to real-world objects without concern for the underlying documents in which these things, abstract and concrete, are described.
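
The idea can be sketched with statements of the form subject, predicate, object, each remembering the document it came from. The sketch uses plain Python tuples rather than any RDF library, and the "ex:" identifiers and document URIs are made up for illustration.

```python
# Each statement: (subject, predicate, object, source document).
# All identifiers and document URIs below are hypothetical.
STATEMENTS = [
    ("ex:TimBL", "ex:name", "Tim Berners-Lee", "http://a.example/people.html"),
    ("ex:TimBL", "ex:wrote", "ex:WeavingTheWeb", "http://b.example/books.html"),
    ("ex:WeavingTheWeb", "ex:title", "Weaving the Web", "http://b.example/books.html"),
]

def describe(subject, statements):
    """Collect everything said about one thing, ignoring which document said it."""
    return {(p, o) for s, p, o, _doc in statements if s == subject}

def sources(subject, statements):
    """The documents that mention the thing, for when provenance does matter."""
    return {doc for s, _p, _o, doc in statements if s == subject}


facts = describe("ex:TimBL", STATEMENTS)
```

The caller asks about the person, not about a page: the two facts come from two different documents, yet they are returned together.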

The semantic Web arena reflects two principal nexuses of activity. One tends to focus on data (and the Web), and the other on the domain (semantics).
