The Web 2.0 revolution and the promise of Science 2.0

David Stuart

Abstract: The term ‘Web 2.0’ was coined to refer to certain attributes that differentiated some of the most successful websites at the beginning of the twenty-first century from those that were less than successful. It gained widespread attention and has been adopted by individuals in organizations in many sectors. This chapter discusses three of these attributes in detail: the web as a platform; harnessing collective intelligence; and data as the ‘next Intel Inside’. Particular attention is given to their application within science, where the ideas can be translated as transforming the world of scholarly publishing, providing new opportunities for citizen science and even offering a new scientific paradigm.

Key words: Science 2.0, cloud computing, open data, linked data, citizen science.

While the dot-com bubble burst in 2001, the growth and usage of the Internet continued: between March 2001 and March 2003 the number of Internet users worldwide grew from 458 million to 608 million, a growth of almost 33 per cent (Internet World Stats, 2011). In 2003 the term ‘Web 2.0’ was coined to reflect a new post-crash vision of the web.


In his seminal paper, ‘What is Web 2.0’, Tim O’Reilly produced a list of features that distinguished those websites that had most successfully survived the dot-com bubble from those that had not: the web as a platform; harnessing collective intelligence; data as the ‘next Intel Inside’; the end of the software release cycle; lightweight programming models; software above the level of a single device; and a rich user experience (O’Reilly, 2005). The term quickly caught on, and the ‘2.0’ suffix was added to everything from ‘library’ to ‘love’; it was used to reflect both a user-centric vision and the adoption of technologies that fell under the Web 2.0 banner by specific fields of study (Scholz, 2008). However, the term has not been universally popular. Some have questioned whether Web 2.0 is anything more than vague marketing jargon, while others have questioned the technologies and principles that fall under its banner. Nonetheless, whatever the merits of the term or the technologies and practices that fall under its banner, Web 2.0 has been a significant influence on discussions about how we use the web, and these technologies and practices have been adopted by a wide range of organizations and individuals making use of the web. In this chapter, the changing nature of Web 2.0 and the technologies involved, as well as their application within the realm of science, are discussed within the context of three of O’Reilly’s themes: the web as a platform; harnessing collective intelligence; and data as the ‘next Intel Inside’ (O’Reilly, 2005). While the details may have changed since 2005, these three themes continue to be the core of our understanding of Web 2.0. This is not to say that the other themes do not continue to be an important part of the web, but rather that issues surrounding the end of the software release cycle, lightweight programming models, software above the level of a single device and a rich user experience, are more appropriately
discussed within the context of these three main themes. For example, the subject of lightweight programming models is discussed in the context of data as the ‘next Intel Inside’, and the subject of software above the level of a single device inevitably forms part of the discussion of the web as a platform.

This chapter could have focused solely on the harnessing of collective intelligence; as O’Reilly and Battelle (2009) stated in their revisiting of the subject of Web 2.0: ‘Web 2.0 is all about harnessing collective intelligence’. However, it is important to understand not only how people are harnessing collective intelligence, but also the computing paradigm within which it takes place, and this is discussed in the first section of this chapter: The web as a platform. What quickly becomes clear is how much the world has changed since 2005, as technologies and concepts that are now widely discussed and adopted were then still relatively niche products: the term ‘cloud computing’ was not popularized by Eric Schmidt until a conference in 2006 (Qian et al., 2009), Facebook was not made available to everyone until September 2006 and their platform was not launched until 2007 (Facebook, 2011), Apple’s iPhone was not unveiled until January 2007 (Honan, 2007), Amazon’s Kindle in November 2007, and the iPad in 2010. Each of these ideas and technologies has had an impact on people’s understanding of the web, the way they connect to it, and the expectations they have from it.

Just as important as the changing computing paradigm has been the growth in the importance ascribed to data, branded as the ‘next Intel Inside’. The leading-edge web-based commercial companies that were the focus of O’Reilly’s original paper have been joined by a wide range of non-profit organizations, governments, research institutions and individuals trying to make data publicly available online.


This wide range of individuals and organizations has different objectives for the data that they are making available, and while the lightweight programming model that O’Reilly discussed continues to have an important role to play in making data available, there is also increasing interest in other approaches, such as making data available in a linked data format. As will be discussed later, while linked data is not the lightweight programming model that was the driving force behind the adoption of early application programming interfaces (APIs), the higher level of complexity offers the opportunity for a web that is both more semantic and more integrated.

The adoption of Web 2.0 technologies and ideas in the realm of science has resulted in the inevitable coining of ‘Science 2.0’. The term has been used variously to refer to different stages of the research process: Shneiderman (2008) uses the term to refer to an approach to scientific investigation that makes use of Web 2.0 technologies to gather data, whereas Waldrop’s interpretation focuses on the use of Web 2.0 technologies for communication within the scientific community (Waldrop, 2008). Both are an important part of the future of science, and within this chapter a broad definition of Science 2.0 is taken to include both of these uses. Burgelman et al. (2010) have identified three significant trends in Science 2.0: a growth in scientific publishing, a growth in scientific authorship and a growth in data availability. These trends may be seen as the scientific equivalents of the three Web 2.0 themes selected for discussion: the web as a platform has enabled a wide variety of new types of publication; harnessing collective intelligence is recognized not only to include those within the traditional science community but also to embrace those beyond the walls of academia; and data as the ‘next Intel Inside’ offers a new approach to science.


The web as a platform: web services, the cloud and the app

In ‘What is Web 2.0’, the first principle of Web 2.0 is the use of the web as a platform, emphasizing the value of providing services automatically through the web rather than focusing on the traditional desktop software or services that necessitate lengthy human negotiation (O’Reilly, 2005). There is now a host of software available through the web: from applications that are so integrated in our experience of the web that we do not give them a second thought, to those that have been so embedded in our desktop experience (and have grown to such complexity) that web versions continue to feel like poor imitations.

Probably the most important piece of software we use on the web is one we do not even think of as software: the search engine. While it is so integrated with our online experience as to be for some people synonymous with browsing, a consideration of the alternative of a search engine application running on a person’s personal computer (PC) quickly demonstrates some of the advantages of using the web as a platform. In simple terms, a search engine can be thought of in three parts: the robot, the indexer and the ranking algorithm. The robot (also known as a web crawler or spider) is a program that downloads pages from the web in an iterative fashion. Starting with a list of seed URLs, a robot will download the web pages at the URLs and extract any URLs it finds in those web pages, which will then be downloaded in turn; a process that is repeated until all the web pages that are required have been downloaded. To create a search engine that is in any way equal to that of one of the major search engines would require the robot to download billions of pages, and with new pages being created all the time and many of the pages updated on a
regular basis it would be an endless task. The index enables the search engine to match documents to a query without having to search through every document each time a query is entered. At the very minimum, an index is likely to include all the words in a document, although it may also include a host of other features, such as anchor text on those pages linking to a site, the type of page (e.g. .pdf, .rtf or .doc), or the creation date. While a more extensive index may help with the retrieval of more pertinent results, it will nonetheless take up more computer processing power. Finally, with the simplest of queries returning millions of hits, the results need to be ranked. While this may once have been based on the frequency or position of search terms on a page and how other websites link to a site, a search engine such as Google now uses over two hundred signals when ranking web pages (Google, 2011). The downloading and ranking of billions of web pages would take huge bandwidth and processing power, far beyond that which is available to the average user. Even if it were possible, it would be a huge waste of resources, as few of the pages would ever need to be discovered by the person who had downloaded and indexed them. Although there will be popular searches that many people use, these will be dwarfed by the long tail of niche searches that will only be used occasionally by a handful of people (Anderson, 2006). It is only efficient to index the web when there is a sufficiently large audience to make use of the index. The large audience can also provide useful feedback for a search engine, and help to improve the overall system. If, for example, on one particular search, people are regularly clicking on one particular link rather than another, the search engine can start to rank the selected item more highly; this is an example of harnessing collective intelligence, which will be discussed in the next section.
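
Before moving on, the division of labour between the robot and the indexer described above can be made concrete with a minimal sketch. Python is used here purely for illustration; the seed URL and the page limit are arbitrary assumptions, and a real search engine would add politeness rules, markup stripping, deduplication and far richer index structures.

```python
import re
from collections import defaultdict, deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collects the href targets of <a> tags found in a downloaded page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl_and_index(seed_urls, max_pages=20):
    """The 'robot' iteratively downloads pages from a frontier of URLs, while the
    'indexer' records every word against the URLs on which it appears."""
    frontier = deque(seed_urls)
    seen = set(seed_urls)
    index = defaultdict(set)
    downloaded = 0
    while frontier and downloaded < max_pages:
        url = frontier.popleft()
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", errors="ignore")
        except Exception:
            continue  # unreachable or non-text resource: skip it
        downloaded += 1
        # Indexer: a crude inverted index (markup is not stripped in this sketch).
        for word in re.findall(r"[a-z]+", html.lower()):
            index[word].add(url)
        # Robot: extract links and add any unseen ones to the frontier.
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return index

# A query is then answered by intersecting posting sets, e.g.:
# index = crawl_and_index(["https://example.org/"])
# results = index.get("collective", set()) & index.get("intelligence", set())
```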


A wide variety of traditional desktop software is now available online, although it has been met with varying degrees of success; whereas email clients seem to have made a smooth transition from desktop application to web service, the transition to web-based versions of office software has gained less penetration. Many email users are now seemingly happy to switch between desktop and webmail clients as necessary to suit their needs, with the same account often allowing access via either a web interface or a desktop client. While a desktop client may have additional functionality and allow the reading and drafting of responses without an Internet connection, web-based versions offer a convenience of accessibility that is not possible with desktop clients. A Pew survey in 2008 found that 56 per cent of those asked said they used webmail services such as Hotmail, Gmail or Yahoo, in comparison to 29 per cent who said they used online application programs such as Google Documents (a browser-based alternative to Microsoft Office) or Adobe Photoshop Express (an image editing web application) (Pew, 2008). The gap between the use of the two types of services is in fact likely to be much larger once the time spent using the applications is taken into account, in addition to the emphasis on webmail access over traditional email accounts: whereas many users are likely to solely use an online mail service, the use of Google Docs or Adobe Photoshop Express is more likely to supplement rather than replace existing desktop services. One of the reasons for the high penetration of web-based email services is likely to be the relative importance of accessibility over functionality, as most email messages are simple text. In contrast, office documents will often make use of a far wider range of functionality, much of which will not yet be available in the online equivalent.


While online versions are likely to see increased functionality in the future, it is not enough that they are merely online equivalents of traditional office applications. Revisiting the web as a platform in Web Squared: Web 2.0 Five Years On, Tim O’Reilly and John Battelle (2009) emphasize that it is the network as platform that is important. This means it is not just about offering existing desktop software through the browser, but about building applications that get better the more people use them, such as enabling documents to be accessed by multiple users at the same time and analysing the way the word processor is used so that the word processor can be improved.

Although desktop services are being made available online, Web 2.0 and the web as a platform are more often thought of in association with many of the social media services that have emerged. Social media builds upon the ideas of Web 2.0 for the creation and exchange of user-generated content, and over the first decade of the twenty-first century, technologies such as blogs, wikis and social network sites have become increasingly well-established in a wide range of fields (Kaplan and Haenlein, 2010).

Blogs (frequently updated websites displaying posts in reverse-chronological order) are one of the longest established social media technologies. The term was first applied to this genre of website in 1997, and the launch of free web-based software in 1999 saw an explosion in the number of blogs online (Blood, 2000). Web-based blogging software makes the process of updating blogs very simple, and their popularity has also seen the creation of many sub-genres of microblog, with sites like Twitter being used primarily for the sharing of text messages of 140 characters or less, and Tumblr for the quick sharing of any type of content from a browser or desktop. Importantly, a blog platform such as WordPress provides not only a simple piece of software for
individuals to share their opinions with the world, but also a plug-in architecture that allows users to extend the functionality of a blog. Where many blogs or microblogs are being hosted by the same site, such as LiveJournal or Twitter, communities emerge that have more in common with social network sites.

Social network sites have been defined as sites that enable users to create public profiles in a bounded system, to create connections with other users, and to navigate and view both their own as well as other users’ connections (Boyd and Ellison, 2007). While a site such as Twitter may be thought of as primarily a microblogging service, it can also be seen to adhere to each of these criteria and so may be thought of as a social network site. Since SixDegrees.com, the first recognizable social network site launched in 1997, many different social network sites have come and gone (ibid.). It has been suggested that social network sites can be categorized according to three main types: those for networking, those for socializing and those for navigation (Thelwall and Stuart, 2009). While a site such as LinkedIn may be seen as primarily for networking in a professional capacity with people who a user may or may not already know, a site such as Facebook is primarily used for socializing with people a user already knows. Navigation refers to those sites that, while having social network site characteristics, are primarily focused on providing access to content. For example, the photo-hosting site Flickr and the video-hosting site YouTube both have social networking capabilities, although they are primarily for accessing content, with the social network functionality providing a filtering facility. Whereas blogging can be seen primarily as an individual practice, social network sites enable users to engage more easily with one another and one another’s content, although both may encourage a fantasy of participation (Dean, 2008). Although
users may believe their opinions are being listened to and are having an impact, in most cases they are either not being read or are being ignored. Whereas it was once publishing that was considered a claim to authority, now it is attention that is increasingly important. The increased use of certain social network sites in favour of the traditional blog has led to claims that blogging is dead or waning; rather than some inherent fault with blogging, however, it is a reflection of the emergence of more specialized tools, which may be more appropriate for particular situations.1 In September 2011 Nielsen reported that social network sites and blogs accounted for 23 per cent of the time Americans spent online (Nielsen, 2011).

Since 2007, social network sites have attempted to offer increased functionality by providing application platforms on which external developers can build applications; these both provide a marketplace for application designers and provide users with access to functionality that social network sites would not have the time or money to develop themselves. The most popular of these applications have been downloaded tens of millions of times, and although most downloaded applications are often games (e.g. Cityville, a city-building simulation game), there are also more obviously useful applications, such as those providing additional communication functionality (e.g. Windows Live Messenger) or enabling the editing of office documents (e.g. Microsoft’s Docs). The increasing power that a small number of social network sites have over the way people interact online should, however, be cause for concern. Facebook, the largest social network site and currently reporting over 800 million active users, now has a significant amount of power over an increasingly important communication platform, with the potential to dictate the sort of content that people share (Facebook, 2011). Rather
than the limits of freedom of speech being established by the courts and the rule of law, it is increasingly at the whim of a social network site or the tyranny of the masses. While the size of Facebook and the advantage resulting from the network effect make it seem unlikely that it will lose its market dominance in the near future, the rapid fall of MySpace is a reminder that no website is invulnerable. Once the most popular social network site, MySpace was bought by News Corporation in 2005 for $580 million, but was sold in 2011 for a mere $35 million. It may be that Facebook is not overtaken by a single competitor, but rather by a set of open standards, as there is increasing interest in distributed approaches to social network sites that will prevent any one site achieving such market dominance in the future, and allow individuals and organizations to take control of their own data (Stuart, 2011a).

The other type of social media site that has grown in popularity over the first decade of the twenty-first century, and that in one specific case has become infamous, is the wiki. Wikis enable the collaborative creation and editing of web pages through a web browser using either a simple markup language or a text editor. The most famous of these, Wikipedia, provides both an example of what is possible through a wiki and its limitations. Many times the size of its rival encyclopaedias, and far more popular, Wikipedia has nonetheless attracted criticism for relying on volunteers to cover whichever areas happen to interest them and for allowing anyone to contribute. It relies on a variation of Linus’s Law: ‘Given enough eyeballs, all bugs are shallow’, which was first applied to the development of open source software (and named after Linus Torvalds, who started the development of the open source operating system, Linux) (Raymond, 1999). The expectation is that, given enough users, factual errors will be quickly spotted; this is not always the case, however, as a
large number of users on a site does not mean a large number of users visiting every page equally. Most famously, the journalist John Seigenthaler’s Wikipedia biography was changed to falsely suggest that he had been linked with the assassinations of John and Robert Kennedy, and the information remained unchallenged for a number of months, with the resulting controversy leading to new guidelines for the biographies of living persons. Although a study by the journal Nature shortly after the controversy erupted showed not dissimilar levels of accuracy in a comparison between Wikipedia and the Encyclopaedia Britannica, the science pages that were analysed are not necessarily the most contentious or the most likely to be vandalized (Giles, 2005). While Wikipedia is the most well-known wiki, used by 53 per cent of American adult Internet users (Zickhur and Rainie, 2011), there are important differences in people’s understanding of how the site works, the credibility they assign to the information they read and their willingness to follow up sources (Flanagin and Metzger, 2011; Menchen-Trevino and Hargittai, 2011). Used properly, however, wikis can provide an ideal platform for group collaboration, allowing the quick and simple creation and editing of web pages, and allowing versioning for edits to be rolled back.
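
A toy model of that versioning, not how any production wiki engine actually stores its pages, might look like the following sketch, in which every edit is kept so that a bad change can be rolled back as soon as another pair of eyeballs spots it.

```python
class WikiPage:
    """A wiki page that keeps every revision so any edit can be rolled back."""

    def __init__(self, text=""):
        self.history = [text]          # revision 0 is the initial text

    def edit(self, new_text):
        self.history.append(new_text)  # every edit becomes a new revision

    def rollback(self, revision):
        """Restore an earlier revision by re-publishing it as the newest one."""
        self.history.append(self.history[revision])

    @property
    def current(self):
        return self.history[-1]

page = WikiPage("Linus Torvalds began developing Linux in 1991.")
page.edit("Linus Torvalds began developing Linux in 1891.")  # vandalism or an honest mistake
page.rollback(0)                                              # another reader spots and reverts it
print(page.current)  # Linus Torvalds began developing Linux in 1991.
```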

All of these Web 2.0 applications may be thought of as taking place in ‘the cloud’, a metaphor for the Internet, with ‘cloud computing’ generally used to refer to ‘storing, accessing, and sharing data, applications, and computer power in cyberspace’ (Anderson and Rainie, 2010). The cloud may be seen as the natural conclusion of the provision of services through the web, moving from the provision of software services to the provision of computing power itself. While cloud computing incorporates social software, it also takes the current generation of Web 2.0 services to the next level. Web 2.0 services as we generally think of them
typically provide one particular service through a web browser, with users needing to go to different sites for different services; although Facebook now provides a platform in addition to its core social network service, it is far more limited than the services that could be available through the network as a platform. While cloud computing includes software as a service and data as a service, it can also include hardware as a service, and so could enable the virtualization of hardware (Wang et al., 2010). The era of the PC is synonymous with running applications on the desktop, as opposed to multiple users time-sharing first on mainframes and then on smaller mini-computers. While PCs originally offered convenience in comparison to the time-sharing of limited computing resources, such an approach may be seen as extremely wasteful as IT infrastructure becomes increasingly complex. Organizations are spending an increasing amount of time and money on the IT infrastructure of an organization: between the end of the 1960s and the year 2000, information technology went from less than 10 per cent of an American company’s capital equipment budget to 45 per cent (Carr, 2009). With the regular installing, configuring and updating of software, and with computer resources quickly becoming outdated, the outsourcing of computer platforms can be seen as the smart solution; especially as much of the current computing infrastructure sits idle most of the time (Wang et al., 2010). Using the network rather than the desktop as a platform means that organizations and individuals with the necessary skills can tap into the computing power they need as and when they need it, rather than constantly having to update systems and software to deal with peak demand; this would potentially allow new innovative Internet services without large capital outlays (Armbrust et al., 2009).
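
As a concrete illustration of this pay-as-you-go model, the sketch below uses boto3, the Python interface to Amazon Web Services, to provision a virtual machine only for as long as a job needs it; the machine image identifier is a placeholder, credentials are assumed to be configured separately, and the same pattern applies to any provider that exposes compute through an API.

```python
import boto3

def run_burst_job(ami_id="ami-0123456789abcdef0", instance_type="t3.micro"):
    """Provision a virtual machine on demand, run a job, then release the hardware.
    The AMI id above is a placeholder, and AWS credentials are assumed to be
    configured in the environment or in ~/.aws/credentials."""
    ec2 = boto3.client("ec2")

    # Rent the hardware only when the workload actually arrives.
    response = ec2.run_instances(
        ImageId=ami_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
    )
    instance_id = response["Instances"][0]["InstanceId"]

    try:
        # Wait until the machine is running, then dispatch the workload
        # (omitted here: for example, push a job over SSH or via a queue).
        ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    finally:
        # Release the hardware so it is not paid for while sitting idle.
        ec2.terminate_instances(InstanceIds=[instance_id])

if __name__ == "__main__":
    run_burst_job()
```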


Widespread usage of the cloud for both storage and processing power can be seen as a natural destination for the web of today, and in a ‘Future of the Internet’ survey carried out as part of the Pew Internet & American Life Project, 71 per cent of technology experts participating in the survey agreed with the statement that: ‘By 2020, most people won’t do their work with software running on a general-purpose PC. Instead, they will work in Internet-based applications such as Google Docs, and in applications run from smartphones’ (Anderson and Rainie, 2010). There are still a number of challenges ahead before cloud computing becomes more widely adopted, especially in the area of privacy and confidentiality, where there have been several high profile failures (Ryan, 2011); in September 2011, for example, Dropbox, a file hosting service, briefly allowed unauthorized access to accounts (Dropbox, 2011). Nonetheless, such concerns are likely to be quickly overcome when users find that there are significant advantages to the online services that are offered.

Not everything is moving to the cloud, however, and it is noticeable that Pew’s survey grouped Internet-based applications together with applications run from smartphones. Although there has been a change in the way increasing numbers of people access software through their desktops, the rise of the app store has created a renaissance of the downloaded application through the widespread use of mobile phones and tablets. No prediction of the number of future downloads currently seems too high, with one report claiming mobile app downloads will reach 98 billion by 2015 (Perez, 2011). Like computers, smartphones and mobile phones that run a high-level operating system (such as the Apple iPhone or the Android), are capable of running multiple programs at the same time, and are an increasingly significant part of the mobile phone market (Nielsen, 2009). They are merging people’s home and work lives, they are always turned on
and smartphone users reportedly spend less time doing other activities after getting a smartphone (Ofcom, 2011). Between O’Reilly’s original paper in 2005, and his revisiting of the subject in 2009, the focus had moved from the web as a platform to the network as a platform, and the always-turned-on smartphone with an increasing number of sensors is an ideal way to connect to the network. Although mobile applications show us that it is possible to do this without putting the products on the web, in the same way that there is a risk in the dominance of a single social network site such as Facebook, there is the inevitable risk of vendors taking too much control of the phone as a platform; Apple, for example, has a censorship policy for apps and does not allow certain types of content in the app store (Dredge, 2011). Time will tell whether downloaded applications will be a short-term option until improvements in mobile telecommunications and greater functionality enabled by HTML5 will push many of these applications to the cloud, or whether the simplicity it provides as a way of getting money for applications will see developers continue to focus on the downloaded application.

The network as a science platform

One of the most noticeable trends caused by the introduction of the web as a platform has been the growth in scientific publishing, including new publishing models for the traditional journal article, the adoption of new technologies for the traditional article and more informal means of publication (Burgelman et al., 2010).

The rapid rise in journal costs towards the end of the twentieth century, coupled with developments in IT, led to questions being raised about the suitability of the traditional publishing model. With the outputs of publicly funded research
increasingly beyond the budgets of many public institutions, let alone unaffiliated individuals, alternative models have been proposed that enable open access (OA) to research findings. Unlike traditional paper publications where there are significant costs for every additional copy of a journal, the costs for someone to read an article online are negligible; online software can also ease the process of publishing work online. The two main variations of OA that have evolved are the ‘green road to OA’ and the ‘gold road to OA’; whereas green OA journals allow researchers to deposit articles in their institutional repositories, gold OA journals are freely available online (Harnad et al., 2004). In addition, there is an increasingly popular hybrid variation, where authors can pay for their specific article to be made freely available on the online version of a journal that would otherwise require a subscription. While the principles of OA have been widely embraced, and many research funders have a policy of making research outcomes available in an OA manner, there is nonetheless a wide variation in practices across different fields, and the overall proportion of papers that are freely available online remains small (Björk et al., 2010).

The nature of the article itself is also changing, as publishers are investigating new ways of capturing and sharing knowledge. In 2009 Elsevier’s Cell published its ‘article of the future’, which provided new ways for readers to navigate a journal’s contents. The Journal of Visualized Experiments uses video to share experiments and research findings, capturing knowledge that may be lost in the process of writing a journal article;2 in contrast, WebmedCentral3 has taken a wiki approach to the publishing of OA biomedical articles, endeavouring to publish articles within 48 hours of them being submitted, and relying on post-publication peer review. As well as attempting new ways of publishing articles, there have also been attempts at
finding new ways of reading articles; Utopia Documents, for example, is a portable document format (PDF) reader that integrates data analysis and visualization tools with published articles (Attwood et al., 2010). We have moved to an era of enhanced publications, where the traditional written document is increasingly expected to be accompanied by other types of content.

It may be that one of the consequences of the OA movement will be a greater acceptance of variation in the format that ideas and results take; it is often not the final polished journal article that is freely available online, but a preprint version that does not adhere to the journal’s layout or reflect reviewers’ comments. This is part of a move towards ‘liquid science’, where draft and non-finalized outputs are available alongside the traditional, finalized publication (Burgelman et al., 2010). Scientists are no longer restricted to the traditional peer-reviewed research article, and the growth in publishing also includes the widespread adoption of social media technologies. These new technologies allow for the capturing and sharing of a far wider range of knowledge and new working practices, and there are increased expectations that researchers will make a wider range of knowledge available and engage further in new working practices. Many social media technologies have already been widely adopted within the research process, and none more so than the research project blog, which has seemingly become an essential part of every research project. Implemented properly, they can be a crucial part of the research process, not only providing the opportunity to share information with the wider academic community as well as members of the general public, but also providing an opportunity to gain feedback at a point when it can still contribute to the scientific process. The more personal science blog also provides the opportunity to include a wider
range of content that either could not, or would not, be included in the traditional journal: not only writing about the research and its outputs, but also opinions on other areas of a researcher’s activities, such as conferences, teaching (Bukvova et al., 2010) and other people’s research; in this respect, blogs can provide an important area for post-publication feedback (Nature editorial, 2010). Importantly, it has been suggested that blogs have the potential to bridge the long-recognized ‘two cultures’ divide between the sciences and the humanities (Wilkins, 2008). As with any blog, establishing a science blog can take time and effort; networks of blogs, such as ScienceBlogs, can ease the process, although conflicts of interest may emerge, as blog platforms and bloggers have different interests.4

As well as blogs, researchers are increasingly making use of social network sites, both those aimed specifically at the scientific research community, such as Mendeley and myExperiment, and more general sites such as Facebook or Twitter. Each of the three types of social network site (navigation, networking and socializing [Thelwall and Stuart, 2009]) may be seen to have an important role in the research process. However, while socializing may be thought to have an important role in the spreading of research findings and ideas, the focus of social network sites in science is more often on the more professional idea of networking.

Schleyer et al. (2008) have identified five attributes that a social network site would need to effectively connect researchers with potential collaborators. Firstly, it would need to help users find other researchers they are compatible with, rather than just those with the necessary expertise. In many ways this may be easier with a less formal social network site, where people can get a better idea of what people are like. Secondly, it would need to help users find researchers in different domains. The real strength of social network sites is
the ability to build on the potential of weak ties and to traverse different communities; a social network site that is too specialized would fail to make use of this factor. Thirdly, it would need to help with establishing contact. Establishing contact does not necessarily need to be a formal proposal for collaboration, but the social network site may facilitate a number of exchanges that build up to this. Fourthly, it would need high-quality information that was comprehensive, complete and up-to-date. Although this high-quality information takes time and effort to produce, researchers have to see the benefits of the expanded functionality that this would bring. Finally, it would need to be kept up-to-date and be able to integrate into the user’s existing work patterns. Unless a social network site can be incorporated into their regular online activity, either through their workflow or their social interactions, researchers are unlikely to keep pages up-to-date. Different social network sites can be seen as being better at meeting some of these requirements than others.

Although Twitter was not primarily designed as a social network site for professional networking, it nonetheless does meet some of Schleyer et al.’s criteria (ibid.). Unlike the more formal social networking sites, the regular posting of comments throughout the day on topics of interest is likely to give researchers a far better idea of compatibility than a list of research interests. As a popular network site that appeals to a wide range of people who mostly have public accounts, Twitter can be seen to provide a good opportunity for building potentially collaborative relationships, especially as communication is relatively simple and non-intrusive, rather than requiring explicit collaborative proposals. The simplicity of Twitter also means that it can be regularly updated, and easily incorporated into people’s private and professional activities; people using Twitter have been found to share more information than they did before they started
using the site (Letierce et al., 2010). While Twitter profiles do not contain high-quality, comprehensive information, being simply profiles of 140 characters with a home page URL, the use of hashtags enables people to come together around specific topics, ideas and conferences. Hashtags may even be used to identify the most popular topics at a conference (ibid.).
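
Identifying popular conference topics in this way is computationally trivial: hashtags can simply be extracted from a collection of tweets and counted, as in the sketch below, where the sample messages and the hashtag names are invented for illustration.

```python
import re
from collections import Counter

def top_hashtags(tweets, n=3):
    """Count hashtag occurrences across a collection of tweet texts and return
    the n most frequent, a crude proxy for the most popular topics."""
    counts = Counter(
        tag.lower()
        for tweet in tweets
        for tag in re.findall(r"#\w+", tweet)
    )
    return counts.most_common(n)

# Invented messages standing in for tweets collected during a conference.
sample = [
    "Great keynote on open data #scio11 #opendata",
    "Slides from my talk are now online #scio11",
    "The poster session queue is longer than the coffee queue #scio11 #opendata",
]
print(top_hashtags(sample))  # [('#scio11', 3), ('#opendata', 2)]
```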

In comparison, Mendeley is a more academic social network site.5 While it allows people to build a profile, make connections and share updates, what really distinguishes it from other social network sites are its attempts to integrate itself into researchers’ work patterns through the creation of its own reference management tools. Combining reference management tools with an online network creates the potential for identifying additional resources, and as Mendeley also enables open link resolvers and the public storage of articles, it can ease the process of downloading articles (Hicks, 2011) (see Figure 1.1).

Figure 1.1 Mendeley desktop

Source: http://www.linuxexpres.cz/uploads/gallery/detail/3034.jpg


Although Mendeley incorporates a navigational element, a navigational social network site such as YouTube makes this feature its principal purpose. YouTube has successfully marketed itself to the general public; however, it nonetheless has potential for the scientific community, as it provides a simple platform for the sharing of videos with a large number of people: something that would be difficult for a small group to get the bandwidth for. A good example of public engagement through YouTube is ‘The Periodic Table of Videos’6 from the University of Nottingham, where videos were initially created for each of the 118 elements. Many of these videos have now been viewed hundreds of thousands of times, appealing to an audience as diverse as six-year-olds in Nova Scotia and Nobel laureates (Haran and Poliakoff, 2011). Social network sites also offer the potential for videos to go viral, rapidly circulating and being shared between individuals online; for this reason it is important in areas such as health promotion that the right information is available to counteract any false information (Freeman and Chapman, 2008).

Wikipedia also provides an opportunity for making sure that the public have access to accurate information, by inviting researchers to add their expertise to pages that are often among the most highly ranked in search engine results. This may involve contributing to the medical pages, which often contain outdated and incomplete references to inaccessible resources (Ozdoba, 2011), expanding the stub-articles that can dominate specialized areas such as ornithology (Bond, 2011) or proactively putting information on Wikipedia as a form of public outreach, as has been attempted in the area of forestry management (Radtke and Munsell, 2010).

Progress in science is achieved through researchers sharing their findings with one another; as Bernal noted in 1939: ‘The
growth of modern science coincided with a definite rejection of the idea of secrecy’. Different social media technologies can be combined to create a more open science, beyond the mere open publishing of research articles. It is important that we do not see science as a binary of open or closed, but as a continuum. At the closed end of the continuum, researchers will neither publish their findings, nor even inform people they are engaging in the scientific process; at its most open, researchers could share all the information relating to the scientific process, not necessarily just at the end, but throughout the process. ‘Open science’ has been defined as: ‘making methodologies, data and results available on the Internet, through transparent working practices’ (Lyon, 2009); Open Notebook Science (ONS) is a practice that makes the process as open as possible, by organizing scientific production according to the public disclosure of its achievements and failures and its related data and procedures, so that it is ‘analysed and discussed openly to further advance science by solving and addressing specific problems’ (Vera, 2009). It is the open discussion of both achievements and failures that is important to scientific progress, as otherwise much time is spent repeating experiments and following the same dead ends, or potentially giving a positive bias to results. The possible harm caused by the creation of a positive bias in results is particularly acute with regard to clinical research trials, where a false impression of a drug’s effectiveness could be created by only reporting on the one trial where it was effective, rather than the nineteen where it was not. As such, the International Committee of Medical Journal Editors (ICMJE, 2009) instigated a policy of only publishing the results of clinical trials if they had been registered before taking place.
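
The scale of the distortion that selective reporting introduces is easy to demonstrate with a small simulation. The sketch below invents twenty trials of a completely ineffective drug and compares the average result across all of them with the single most flattering trial that a selective reporter might choose to publish; the trial sizes and recovery rate are arbitrary assumptions.

```python
import random

def simulated_trial(patients_per_arm=50, recovery_rate=0.4):
    """One trial of a drug with no real effect: both arms share the same underlying
    recovery rate, so any observed difference is pure chance."""
    drug = sum(random.random() < recovery_rate for _ in range(patients_per_arm))
    placebo = sum(random.random() < recovery_rate for _ in range(patients_per_arm))
    return (drug - placebo) / patients_per_arm  # difference in observed recovery rates

differences = [simulated_trial() for _ in range(20)]

print(f"mean difference across all 20 trials: {sum(differences) / len(differences):+.3f}")
print(f"most flattering single trial: {max(differences):+.3f}")
# Typically the mean sits close to zero while the 'best' trial suggests an apparent
# benefit of ten percentage points or more, an effect that does not exist.
```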

ONS is generally a bottom-up approach to publishing, with the individual or research group adopting the publishing
tools that can be incorporated most easily into their working habits. Social media technologies have become popular due to the ease with which they allow people to create and share content, and it is not surprising to find that many of these technologies form the basis of open science notebooks. For example, Cameron Neylon’s LaBLog makes use of blogging software,7 while the Bradley Laboratory makes use of wiki software.8 The widespread adoption of open science notebooks will depend heavily on the availability of technology that can simply capture the wide variety of data produced and integrate it into a researcher’s work practices. The smartphone, and the simple user-friendly apps that can be created for it, offer the potential to seamlessly integrate data collection and publishing into a researcher’s activities.

Mobile applications can also be incorporated more directly into experiments, which would enable access to a large number of users from a broad demographic without necessitating a working relationship with the phone companies themselves (Coulton and Bamford, 2011). While incorporating apps into experiments could have great potential, there are also a number of difficulties to overcome. Although creating an application (or at least hiring a developer to do it) may be a relatively simple process, there are other issues that need to be considered (Cramer et al., 2011; Henze et al., 2011): the additional costs from marketing and helping a large number of users to utilize the apps; the methodological problems involved in determining the demographic of end users, collecting data over time, and analysing large data-sets; and the ethical issues of making sure that end users are aware of how their data is being used, and that this data is secure.

The potential of the network as a platform for the scientific community does not stop with the adoption of social media technologies and mobile applications, and it is
important to recognize more fully the possibilities of the cloud. For example, Chen et al. (2011) emphasize the contribution that cloud computing could make to scientific computing/computational science, which makes use of large-scale simulations and data analysis. Rather than needing to establish an expensive infrastructure for the running of a relatively small experiment, researchers could make use of resources that would be available as and when they needed them. This would make cloud computing particularly appropriate for small research groups (Truong and Dustdar, 2011).

Issues regarding privacy and confidentiality that apply to the individual making use of cloud computing can cause even greater damage in organizations where they may not only be risking access to their own data, but potentially illegally risking access to others’ data (Gellman, 2009). As Ryan (2011) has demonstrated, issues regarding privacy and confidentiality are not only raised by the incorporation of large-scale cloud computing for dealing with private data, but can also creep in quite innocuously in something as seemingly harmless as the adoption of a conference management system.

Harnessing collective intelligence

As well as increased accessibility, the big advantage of having the network as a platform is that it eases the process of harnessing collective intelligence; as O’Reilly and Battelle (2009) stated in their revisiting of the subject of Web 2.0: ‘Web 2.0 is all about harnessing collective intelligence’. There is no single approach to the harnessing of collective intelligence, but all approaches are built around the idea that
tapping into users’ knowledge and experience can enable the provision of better services. One approach is to capture users’ existing behaviour to provide better services, while another is to change users’ behaviour so that they can provide a better service; a further approach is to provide incentives so that users are willing to contribute to something they may not be interested in.

Google Search is perhaps the most famous example of a site that successfully made use of the implicit information available on the web, with its PageRank algorithm for ranking search results. While the comparison had already been made between academic citations and hyperlinks, and the idea that a page with more hyperlinks pointing to it was more likely to be of interest on a topic than one with fewer hyperlinks, PageRank took this idea a step further by weighting the value of each hyperlink according to the number of links that page in turn had (Brin and Page, 1998). While this helped Google to reach market dominance, the number of quality hyperlinks is now only one of over 200 signals that are being used for ranking purposes, some of which are likely to include aspects of users’ browsing behaviour (Google, 2011): for example, a group at Microsoft Research Asia proposed BrowseRank, a method for calculating page importance based on user behaviour and the time spent on web pages (Liu et al., 2008). PageRank’s influence has dwindled not only because Google has identified other factors that may help to increase the relevance of search results to users, but also because PageRank increasingly became the focus of search engine optimization. The dominant position of Google in the search market means that there is a lot of value in being at the top of its search results for particular queries. While PageRank was more difficult to manipulate than the keyword stuffing that was previously used to manipulate earlier search
engines, it nonetheless led to the creation of link farms, with sites being automatically created and linked to one another in an attempt to manipulate a website’s PageRank. This may be seen as an example of Gresham’s law of economics, where bad money drives out good money; it is a problem that regularly appears, as users increasingly try to take advantage of the system.
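
The recursion at the heart of PageRank, in which a page's importance depends on the importance of the pages that link to it, can be captured in a few lines. The toy link graph below is invented, and the damping factor of 0.85 is the value commonly quoted for the original algorithm; this is an illustrative simplification, not Google's production ranking.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively compute PageRank for a graph given as {page: [pages it links to]}.
    Each page shares its current score equally among the pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page, targets in links.items():
            if not targets:
                continue  # dangling pages are ignored in this simplified sketch
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# An invented four-page web: C is linked to by every other page and so ends up ranked highest.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A", "D"], "D": ["C"]}
for page, score in sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```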

Whereas PageRank made use of information that was already available, many other successful services created systems where the user’s contributions are specifically designed to add value to the system as a whole. The online shopping store Amazon, for example, not only makes suggestions on what a reader might like according to what other readers have bought, but also encourages users to rate and write reviews of books on the site. It is not only through the writing of reviews that web users can add value to web services, however, and the challenge for web services has been precisely to find ways to engage users in the process of creating value. This may be through promoting a story on Digg, liking a page on Facebook, tagging a site on Delicious, uploading content to Flickr or YouTube or by engaging with other people on social network sites like Facebook or Twitter. The successful harnessing of free contributions from the public has not only created online sources such as Wikipedia, but also a variety of extremely popular open source software, including the Mozilla Browser, Open Office (the open source alternative to Microsoft Office) and the operating system Linux.
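
At its simplest, the 'readers who bought this also bought' style of suggestion is a matter of counting which items co-occur in past orders, as in the sketch below; the purchase histories are invented, and real recommendation systems are considerably more sophisticated than raw co-occurrence counts.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_cooccurrence(orders):
    """Count how often each pair of items appears together in the same order."""
    together = defaultdict(Counter)
    for order in orders:
        for a, b in combinations(set(order), 2):
            together[a][b] += 1
            together[b][a] += 1
    return together

def also_bought(together, item, n=2):
    """Recommend the items most frequently bought alongside the given item."""
    return [other for other, _ in together[item].most_common(n)]

# Invented purchase histories standing in for real customer data.
orders = [
    ["The Long Tail", "Wikinomics", "Here Comes Everybody"],
    ["The Long Tail", "Here Comes Everybody"],
    ["Wikinomics", "Here Comes Everybody"],
]
print(also_bought(build_cooccurrence(orders), "The Long Tail"))
# ['Here Comes Everybody', 'Wikinomics']
```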

The idea that these free sources can provide an alternative to traditional proprietary services has not been without contention: criticisms have focused on the quality, the economic model and the morality of free sources. Keen’s (2007) polemic against ‘the cult of the amateur’, described in his book of the same title, focuses very much on the
narcissism of Web 2.0 models, as they give everyone a voice irrespective of the value of what they have to say, and also undermine the traditional business models of the music and film industry. However, although Web 2.0 technologies have enabled everyone to have a voice, this does not mean that everyone is being listened to: social media technologies are as much about providing a filter for content as about the publishing of content. While the web has indeed undermined many established business models, it is important to recognize that such industries do not have an inalienable right to exist: the record industry has no more right to exist with its traditional structure in the face of new technologies than the chamber orchestra did when faced with the invention of the record. The creation of new innovations is necessarily accompanied by the destruction of existing models, and rather than focusing on the demise of the existing business model, the focus should be on the quality of the end product and the conditions for those contributing to the process (Schumpeter, 1942). In many cases the contributors of Web 2.0 content are working for nothing; the contributions of the individual are lost among the wisdom of the crowd, and second-order expressions (such as remixes) are not as valuable as first-order expressions (Lanier, 2010). It would nonetheless be a mistake to write off much of the content that is being created online.

Journalism is not only a profession that is struggling to deal with the changing media landscape, but also one where the bottom line has become more important than quality journalism, especially in areas such as local news (Davies, 2008). However, as the traditional expression goes, necessity is the mother of invention, and journalism is an area that increasingly looks to take advantage of the potential of harnessing collective intelligence. While there are undoubtedly a large number of narcissistic blogs, there are also those that
are written by people who see the blog and other new publishing methods as an opportunity to make a more valuable contribution to their community. Grassroots journalism, for example, offers the potential to provide an alternative to the news promoted by large media conglomerates; hyperlocal news blogs have emerged as an attempt to provide traditional local news for a small geographic area (Greenslade, 2010). While it has been suggested that traditional media companies continue to have the upper hand in the area of investigative reporting as individuals may not have the resources to fully investigate a story, it may simply be a matter of organization rather than any real advantage on behalf of these traditional media companies (Gillmor, 2006). News stories can put stress even on the largest of news organizations, which may not have the resources to investigate them fully. Following the scandal of UK Members of Parliament’s alleged misuse of expenses and allowances, for example, 458,832 pages of documents about their expenses were made available to the media; rather than trying to analyse all of these documents itself, the Guardian newspaper made them available online and asked members of the public to classify each type of document and highlight it if it seemed worthy of further investigation.9

In many situations there may be insufficient incentives for users to contribute towards an organization that has particular aims and objectives; for this reason, many organizations have attempted to incentivize the process by offering prizes. For example, in an attempt to improve its recommendation algorithm, Netflix offered a $1 million prize to whoever could substantially improve predictions about how someone would rate a particular film. Most such prizes are not so substantial, and so this is often a cheap way for organizations to encourage innovation around the data they capture; the approach has already been adopted
by a wide range of organizations, including the World Bank,10 the UK government,11 and Mendeley, the reference manager and academic social network.12 Ethical concerns about such an approach have been raised, however, as it can be seen as a way of undercutting traditional labour markets; and while there may be a few winners, most participants will essentially be working for nothing. This is also illustrated by the web’s ability to harness collective intelligence by facilitating the process of matching organizations with a workforce: Amazon’s Mechanical Turk, for example, enables this to happen by breaking tasks down into small parts for which users need be paid only a few cents each.13

Collective science

The harnessing of collective intelligence is not unique to the Web 2.0 community but has been a pivotal part of the scientific process since the emergence of the first scientific journals in the seventeenth century. As Isaac Newton famously stated in his reworking of Bernard de Chartres’ quotation, ‘If I have seen a little further it is by standing on the shoulders of giants’. No one discovers anything alone; instead, researchers necessarily build upon the work of others. Traditionally, this was achieved once the research was finished and the findings were shared in research journals among other scientists so that they could then build upon this work, giving credit in the form of citations. While OA has opened research to a wider community of users, and ONS has enabled the sharing of a wider range of knowledge than ever before, the harnessing of collective intelligence can still go much further.

One of the trends in science identified by Burgelman et al. (2010) is the growth of scientific authorship. In the same way that social media has enabled everyone to be a producer
as well as a consumer of popular content such as videos and images, they argue that everyone should potentially be able to contribute to the scientific process. The idea is obviously appealing, and Clay Shirky (2010) has suggested that there is currently a significant cognitive surplus waiting to be tapped into, a small proportion of which could make a significant contribution to society: the 200 billion hours that Americans alone spend watching TV every year are calculated to be the equivalent of two thousand Wikipedia projects. Although the idea that people will give up significant proportions of their free time so that they can contribute to science may at first seem utopian, there is an increasing number of examples where people do just that; and with the massification of higher education providing an ever more educated workforce, people can potentially contribute far more than ever before. It is also important to recognize that the scientific process has never been a simple linear process with basic research funded at one end and innovations coming one after the other; instead, it relies on both formal and informal feedback loops and communication between different sections of society (Lundvall, 1992; Gibbons et al., 1994; Etzkowitz and Leydesdorff, 1995).

The harnessing of collective intelligence for the scientific process can range from something as simple as social bookmarking sites and folksonomies, to far more extensive projects (Stock, 2007). Whereas ONS is generally a person-centric approach to the adoption of Web 2.0 technologies, open source science and citizen science are project-centric approaches to the harnessing of collective intelligence. Open source science is the adoption for science of working practices previously applied to the development of open source software, and involves breaking up a task so that the contributions of those with the requisite skills can be combined. Citizen science may be thought of as broader
than open source science, as this is about harnessing the capabilities of members of the general public who do not necessarily have specialist skills.

Silverton (2009) has defined a citizen scientist as ‘a volunteer who collects and/or processes data as part of a scientific enquiry’. The engagement may be either passive or active (Borne et al., 2009): for example, in the case of SETI@home, which analyses radio telescope signals in the search for extraterrestrial intelligence, all that is required of the citizen scientist is the downloading of a piece of software that runs either as a screensaver or while the user is working. In contrast, Galaxy Zoo, which makes use of citizen scientists in the classification of galaxies, requires volunteers to actively look at each image and classify it themselves. Citizen science itself is not new; there have been a number of offline examples, going back as far as the National Audubon Society’s annual Christmas bird count, which began in 1900 (Cohn, 2008), and it continues to be an important part of many environmental and ecological studies (Silverton, 2009). The successful application of citizen science falls in the overlap between areas of research that are of interest both to the public and to research scientists, where the public are able to carry out research that scientists can make use of (Borne et al., 2009). As such, many of the projects make use of visual recognition, which humans are often better at than computers. For example, the Old Weather project makes use of citizen contributions to transcribe weather observations recorded on Royal Navy ships, contributing to our understanding of climate change,14 while Dickens Journals Online makes use of citizens to correct the optical character recognition errors that were made during the process of scanning Dickens’ weekly magazines.15 Citizen science need not be only about the benefit to the project, but can also provide an education
benefit (Silverton, 2009); iSpot, for example, provides a social networking site for natural history enthusiasts that encourages an interest in biodiversity.16 The scope of citizen science is necessarily limited, however, when it relies on members of the general public without specialist skills.

Open source science demands a higher level of expertise from its armchair contributors than citizen science, and incorporates their contributions alongside those of researchers working in real-world laboratories. It involves breaking up projects into smaller tasks and openly sharing the findings to overcome some of the economic barriers to scientific progress. In its purest form, open source science provides an opportunity for volunteers to contribute to areas of science that would not otherwise be funded; for example, The Synaptic Leap coordinates the development of drugs that are not being developed within the profit-driven system,17 such as those for tropical diseases (Kepler et al., 2006). Less idealistically, open source science is also used to support non-competitive collaboration in an attempt to reduce the duplication of effort and the squandering of resources (Patlak, 2010); this is nonetheless an important concern now that drug discovery is increasingly expensive and growth depends primarily on an increase in spending (Edwards, 2008). Efforts are also increasingly being made to incentivize external contributions in a streamlined fashion: InnoCentive provides a platform for organizations to set challenges and offer a reward,18 while Kaggle provides a similar service specifically for predictive modelling competitions.19

Data as the ‘next Intel Inside’

In his original paper on Web 2.0, O’Reilly (2005) described data as the ‘next Intel Inside’, by which he meant that an
organization that successfully exploits the data it has gathered will have the edge over its competitors. In the case of Amazon, this may mean analysing its customers’ buying habits so it can give them better suggestions for further reading than other booksellers; while for Google, it may mean analysing users’ search engine habits so it can improve the ranking of results. What O’Reilly also recognized was that it was not necessary for organizations to mine this data alone; by making data publicly available, external parties could also make use of it in an innovative fashion. When Amazon’s database is made publicly available, it is no longer necessary for Amazon to try to identify all the potential books that everyone may want, or to create the perfect interface for browsing the content; instead, there is the potential for external developers to make use of the data and provide tailored solutions to sub-sections of users. Equally, when Twitter creates an extensive set of APIs that allow external developers to interact with its data, it is no longer necessary for Twitter to create one interface suitable for everybody; instead, external developers can create different interfaces suitable for different groups of people and provide additional functionality such as the addition of audio, video or images.

The provision of easily accessible APIs has played an important part in extending the reach of many of the leading-edge Web 2.0 services, including Amazon, Google, eBay, Flickr and Twitter. This has not only led to the creation of a host of additional software built around Web 2.0 services, but also to numerous mashups: a mashup is defined as a web application ‘that uses content from more than one source to create a single new graphical interface’ (Fichter, 2009); so even if people do not have the requisite skills to write the code that will enable them to make use of certain data themselves, they can create simple mashups
with online tools such as Yahoo! Pipes,20 which allows the manipulation of online data through a graphical user interface. The leading-edge Web 2.0 organizations have therefore now been joined by a host of other organizations that are making their data available online, from the respectable to the less-than-respectable.
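
To make the idea concrete, the following is a minimal sketch (in Python) of what a mashup does behind its graphical interface: content is pulled from more than one source and combined into a single new view. The URLs and field names are hypothetical placeholders rather than real services.

    import json
    from urllib.request import urlopen

    def fetch_json(url):
        """Download a JSON document and parse it into Python objects."""
        with urlopen(url) as response:
            return json.load(response)

    # Two hypothetical sources: local venues and nearby bus stops.
    venues = fetch_json("http://example.org/venues.json")
    stops = fetch_json("http://example.org/bus-stops.json")

    # Combine the two sources into a single structure keyed by postcode,
    # the kind of merged view a mashup would then display on a map.
    combined = {}
    for venue in venues:
        combined.setdefault(venue["postcode"], {})["venue"] = venue["name"]
    for stop in stops:
        combined.setdefault(stop["postcode"], {})["nearest_stop"] = stop["name"]

    print(json.dumps(combined, indent=2))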

Most noticeable has been the increased interest in having data made available by national governments in both the developed and the developing world, including the US,21 the UK,22 Greece23 and Kenya.24 The publishing of government data plays an important part in making governments accountable to members of the public, and it has therefore been increasingly called for both by the media and by non-profit organizations such as the Sunlight Foundation. It has been suggested that, given the importance now ascribed to open government data, it could play an important role in benchmarking government services, in contrast to traditional metrics, which can quickly become saturated (Osimo, 2008). Although government data theoretically enables individuals to hold governments to account, in practice most individuals are unlikely to have the requisite skills to collect and analyse the data themselves. Drawing conclusions based on the available data may require not only the ability to identify a particular data source, but also the skills to extract and manipulate the particular data required from one or more sources. As such, there is increased interest in the role of traditional intermediaries who provide access to these new sources of data: both journalists (Bradshaw, 2010) and library and information science professionals (Stuart, 2011b).

There are also economic incentives for the releasing of public data, and the financial crisis of 2008 gave this additional impetus. Governments are one of the largest producers of data on a range of topics that touch on every
area of people’s lives, from cradle to grave. Whereas some data may have immediately obvious potential economic value, such as the Ordnance Survey mapping data, other data sets have less immediately obvious value, such as how children are transported to schools. The key realization, however, is that it is not for a small group of officials to determine which data sets have value, as they cannot be expected to identify all the potential ways in which the data might be used; instead, as wide a range of data sets as possible should be made available. In many countries, if members of the public want access to data that has not been published by public institutions, they can now submit Freedom of Information requests. In the UK this covers not only government data, but also research that has been carried out in public universities.

As well as the release of organizations’ own data, the publishing of other people’s data has also increasingly been used as a weapon in recent years, most famously by Wikileaks, a website that publishes anonymously leaked sensitive information from governments and other organizations and has regularly made headlines around the world: controversial examples include the membership list of the British National Party, the five hundred and seventy thousand pager messages that were intercepted on September 11th, and diplomatic cables from US embassies.25 While the practice may be controversial, it is nonetheless increasingly accepted, with the publication of the US embassy cables being a joint venture between Wikileaks and five traditional media organizations: the Guardian, The New York Times, Der Spiegel, Le Monde and El Pais. Although only a small proportion of the quarter of a million cables were published, the event raises a number of questions about the role of the press in a networked world (Benkler, 2011). Additionally, while
Wikileaks has published leaked information, other groups have gone further: in response to the controversial pursuit of alleged file sharers by ACS:Law in the UK, the hacktivist group Anonymous published details of those who had been pursued and the films that had been downloaded, thus making ACS:Law accountable to the Information Commissioner’s Office.

With such large quantities of data available from a host of organizations, the question quickly becomes: what is the best way to make use of the data? In O’Reilly’s original article he focused on the importance of lightweight programming models, and made a comparison between Simple Object Access Protocol (SOAP) and Representational State Transfer (REST). Whereas SOAP enables applications to exchange information over the web in an Extensible Markup Language (XML) format, REST makes use of the structure of the web to encode queries in the form of a Uniform Resource Identifier (URI). Of the two competing methods, REST is by far the simpler, and the accessing of data can be as simple as entering a URI into the address bar of a web browser. As such, REST has become the most popular method of sharing data via APIs: at the beginning of October 2011, ProgrammableWeb, a website that keeps a database of APIs that are available online and some of the mashups that make use of them, listed 2918 RESTful services compared to 680 SOAP services.
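
The difference in weight between the two approaches can be sketched briefly in Python. In the RESTful style the query is simply encoded in a URI, whereas SOAP wraps the same request in an XML envelope sent by POST; the endpoint and parameters below are hypothetical and shown for illustration only.

    from urllib.parse import urlencode
    from urllib.request import urlopen

    # REST: the query is the URI, and could equally be typed into a
    # browser's address bar.
    params = urlencode({"q": "open data", "format": "json"})
    rest_uri = "http://api.example.org/search?" + params
    # response = urlopen(rest_uri).read()

    # SOAP: the same request wrapped in an XML envelope, typically sent by
    # POST together with additional headers and a WSDL service description.
    soap_body = """<?xml version="1.0"?>
    <soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
      <soap:Body>
        <Search><q>open data</q></Search>
      </soap:Body>
    </soap:Envelope>"""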

Although ease of accessibility is an important consideration in making a web service available, it is not the only consideration, and there can be advantages to additional levels of complexity. For example, as Twitter has grown, its priorities have changed; Twitter’s growth was enabled in part by its simple RESTful API, which allowed individuals to write applications that could interact with the service very easily, and which led to a host of integrated web
services and applications for different platforms. Its basic authentication merely required the embedding of a username and password at the front of a URL (e.g. http://username:password@…), enabling even those with very limited programming skills to interact with the Twitter service. As the site grew, however, security came to be seen as more important, while encouraging application development became less so. As such, basic authentication was deprecated in favour of OAuth authentication. While it is more convoluted from the programmer’s perspective, requiring the passing of access tokens backwards and forwards, it is far more secure from the user’s perspective, as they no longer have to give their username and password to the applications they use, but can sign in via Twitter.
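
The contrast between the two authentication schemes can be sketched as follows. This is illustrative only: the URLs and keys are placeholders, the OAuth example relies on the third-party requests and requests_oauthlib packages, and Twitter’s actual endpoints and parameters have changed over time.

    import requests                        # third-party package
    from requests_oauthlib import OAuth1   # third-party package

    # Basic authentication: the user's credentials travel with every request,
    # here embedded directly in the URI, as Twitter once permitted.
    basic_uri = "http://alice:secret@api.example.org/statuses/update.json"

    # OAuth: the application holds tokens issued by the service instead of
    # the user's password, and each request is signed with them.
    auth = OAuth1(
        client_key="APP_KEY",              # identifies the application
        client_secret="APP_SECRET",
        resource_owner_key="USER_TOKEN",   # issued when the user signs in
        resource_owner_secret="USER_TOKEN_SECRET",
    )
    response = requests.post(
        "https://api.example.org/statuses/update.json",
        data={"status": "Hello, world"},
        auth=auth,  # the request is signed; the user's password is never shared
    )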

The biggest advantage of additional complexity is the potential for the creation of a semantic web, one in which the data is structured in such a fashion that it is meaningful to computers as well as people across different websites. Most of the data currently available on the web is unstructured, and where data is structured it is often in an idiosyncratic format specific to a particular website. This means that many online tasks currently require far more human input than would be necessary if the data were structured according to widely agreed standards. The difference between the current unstructured web and a semantic web may be illustrated with an example of someone who wants to spend the afternoon bowling in Birmingham, and who searches the web to find out where the nearest bowling alley is and how much it will cost. The searcher would reasonably expect all the necessary information to be available on the web, rather than expecting a search engine to explicitly answer the question ‘Where can I go bowling in Birmingham and how much will it cost?’ The searcher is therefore likely to enter the
keywords ‘bowling’ and ‘Birmingham’ and then visit the relevant pages to determine how much it costs to go bowling, making sure that the pages are indeed referring to the same type of bowling (e.g. tenpin not crown green) and the same Birmingham (e.g. UK not Alabama) that the searcher is interested in. The experienced searcher may add additional terms to refine the search if there are too many false drops, while the search engine may make use of the searcher’s previous searches and location to provide a more personalized ranking. Where the information has been provided in a structured form, however, there is the possibility of more specialized search services: Google’s recipe search, for example, enables the narrowing of results by ingredients, calories and even cooking time.26 Yet while it may be possible to automatically infer the structure of some information, or it may be in the interests of a search engine to create a bespoke solution for large websites of structured data, such as Yell.com, for most websites the onus is on them to make their data adhere to a widely adopted format. The move towards what is increasingly becoming a web of data includes a wide range of technologies, from simple markup using existing HyperText Markup Language (HTML) tags, to a whole framework of technologies that underlie the Semantic Web.27
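
As a brief illustration of what a widely adopted format means in practice, the bowling-alley information above could be published as structured data using the Resource Description Framework. The sketch below uses the third-party rdflib package; the URIs and vocabulary terms are invented for illustration rather than drawn from any recommended schema.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF

    EX = Namespace("http://example.org/vocab/")                  # invented vocabulary
    alley = URIRef("http://example.org/venues/birmingham-bowl")  # invented identifier

    g = Graph()
    g.add((alley, RDF.type, EX.BowlingAlley))
    g.add((alley, EX.locality, Literal("Birmingham")))
    g.add((alley, EX.pricePerGame, Literal(6.50)))

    # Serialize the triples in Turtle, one of the standard Semantic Web formats.
    print(g.serialize(format="turtle"))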

The Semantic Web received widespread attention following the publication in 2001 of articles in Nature (Berners-Lee and Hendler, 2001) and Scientific American (Berners-Lee et al., 2001) that promised an increased range of automated agents that would be able to carry out many of our mundane online tasks. However, the family of complex specifications involved in the Semantic Web’s attempt to provide a framework to deal with every type of information meant that it was a long way from O’Reilly’s idea of lightweight programming models, and so it was not instantly widely adopted. Nevertheless, during the intervening
years many of the necessary standards have become established, such as Web Ontology Language (OWL) for creating ontologies, and SPARQL Protocol and RDF Query Language (SPARQL) for querying semantic data. More recently, this has gained increased interest under the banner of linked data, which has been described by Tim Berners-Lee as ‘the web done right’ (Miller, 2008). Linked data uses the framework of the Semantic Web with some additional requirements: URIs should be used as the names for things; people should be able to use Hypertext Transfer Protocol (HTTP) to look up these URIs; looking up a URI should provide useful information; and data should include links to other URIs so more things can be discovered (Miller, 2008).
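
In practice these principles mean that a thing is named with an HTTP URI, and that looking the URI up, with content negotiation used to request a machine-readable format, returns useful data containing yet more URIs to follow. A minimal sketch in Python, using DBpedia (the linked data version of Wikipedia discussed below) as the example resource:

    from urllib.request import Request, urlopen

    uri = "http://dbpedia.org/resource/Birmingham"              # the URI naming the city
    request = Request(uri, headers={"Accept": "text/turtle"})   # ask for RDF, not HTML
    with urlopen(request) as response:
        # Print the first few hundred bytes of the returned Turtle data.
        print(response.read()[:500].decode("utf-8", errors="replace"))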

The Semantic Web becomes far more useful as data is openly interlinked, and we have seemingly reached a tipping point where an increasing number of organizations from different sectors have been making data available in a linked data format: the Open University is making library catalogue and course material available as linked data,28 while the British Museum has already made its catalogue of objects available,29 and the BBC has produced linked data sets for its wildlife content and its music database, as well as its database of television and radio programmes. There has also been interest in converting data that is already available: for example, as part of its release of a large number of data sets, the UK government republished much of this data in a linked data format.30 Others are converting data sets that are currently open online: the structured information within Wikipedia, for example, has been extracted and published as linked data.31 The usual criticisms of Wikipedia are equally valid, if not more so, with regard to its structured data, as the automatic extraction provides the opportunity for additional errors, yet its wide variety of data means it plays a central role in the linked data graph.
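
Once data sets such as DBpedia are available as linked data they can be queried with SPARQL. The following is a minimal sketch using the third-party SPARQLWrapper package; the class and property names reflect the DBpedia ontology at the time of writing and may change.

    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("http://dbpedia.org/sparql")
    sparql.setQuery("""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT ?city ?population WHERE {
            ?city a dbo:City ;
                  dbo:populationTotal ?population .
        } LIMIT 10
    """)
    sparql.setReturnFormat(JSON)

    # Each binding pairs a city's URI with its recorded population.
    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["city"]["value"], row["population"]["value"])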

Figure 1.2 Linking open data cloud diagram. Source: Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net

A diagram of the interlinked data sets (Figure 1.2) has been published regularly since May 2007, during which time it has grown from 12 data sets to 295. The actual number of data sets available in a semantic format will be much higher, although these are not all interlinked.

Although linked data has increased interest in the Semantic Web, it nonetheless still requires a certain level of technical knowledge to make data available, and to make use of data that is available. As such, there has also been increased interest in embedding semantic data within existing HTML pages, and various standards have emerged, such as microformats, microdata and Resource Description Framework in attributes (RDFa). While there are advantages to making data available in different formats, and lightweight programming models are likely to attract more initial interest, there has been a growing recognition over the last ten years of the importance and great potential of open data, whatever format it is made available in. And the more data that is available, the more that can be done with it.

Open science data

As with the harnessing of collective intelligence, the recognition of the potential advantages of sharing large quantities of data preceded the web. The UK’s first data archive, the Social Science Research Council Data Bank (now known as the UK Data Archive), was established in 1967 in response to data banks that had already been created in the US, Germany and Holland (UK Data Archive, 2007). While the web has had an obvious impact on the way people can access such data stores, in many ways the bigger impact has come from people embracing the Web 2.0 idea of
having a platform for the sharing of data, which has allowed the publishing of data on a huge scale. For example, the UK Data Archive not only offers a curated data service, but has also established the UKDA-store for researchers to deposit their own data, funded by the UK’s Economic and Social Research Council.32 While the data in this store may not conform to the standards of the main store, the data nonetheless has far more potential value when made available in an imperfect form than when not made available at all.

The UK Data Archive is just one of an increasing number of data repositories being made available online. As well as being created to support a specific set of researchers, repositories have also been created for specific publications,33 or for specific types of data (e.g. the Protein Data Bank34 and the Biological Magnetic Resonance Data Bank35). As well as those designed specifically for the scientific community, there are also a range of other tools aimed at a more general audience, which may nonetheless be used by the scientific community: Many Eyes, for example, provides a simple interface for sharing and visualizing data online.36 The web allows the public sharing not only of data, but also of visualizations. This potentially reduces duplication, and can also encourage an exchange of ideas as to how the data may be used: Google Fusion Tables,37 for example, enables not only the public sharing and visualizing of data, but also the joining of data from multiple tables (Gonzalez et al., 2010). An important part of Web 2.0 has been putting tools into the hands of users, such as encouraging employees to blog rather than using official press releases as the only public face of an organization; and in this respect, freely available public tools have an important contribution to make to the creation of a more innovative environment.
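
The joining of separately published tables is the kind of operation such services support; a minimal local sketch of the same idea, using the third-party pandas package and invented data, is shown below.

    import pandas as pd

    # Two small tables that might have been published separately.
    measurements = pd.DataFrame({
        "site": ["A", "B", "C"],
        "rainfall_mm": [12.0, 7.5, 20.1],
    })
    locations = pd.DataFrame({
        "site": ["A", "B", "C"],
        "latitude": [52.48, 52.45, 52.51],
        "longitude": [-1.90, -1.93, -1.87],
    })

    # Join on the shared 'site' column so the measurements can be mapped
    # or visualized alongside their locations.
    combined = measurements.merge(locations, on="site")
    print(combined)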

Encouraging researchers to deposit large quantities of data is unlikely to be easy. Even when a community of users
supports an idea in principle, as is the case with OA articles, papers that are freely available online may still remain in the minority (Björk et al., 2010). As such, it is important that, wherever possible, data should be captured and shared automatically: for example, myExperiment,38 a repository and social network for the sharing of bioinformatics workflows (Goble et al., 2010), also makes its data available as linked data.

It has been suggested that the large quantities of data now available have produced a new paradigm of science, in which computers can be used to gain understanding from the vast quantities of data available (Bell, 2009); and it is important to recognize the value of the data being created through the use of Web 2.0 services, even when this is not explicitly scientific. The large quantities of data now available from social network sites such as Twitter and Facebook form the basis of numerous studies. For example, statistically significant correlations have been found between the mood of Twitter users and the Dow Jones Industrial Average (Bollen et al., 2011), and the US military are investing millions of dollars in open-source intelligence based on Web 2.0 services (Weinberger, 2011). Whereas it once would have seemed surprising for people to share such large quantities of personal information, such sharing is now incorporated into numerous applications, as people make use of mobile apps that share everything from the routes they jog to how much they weigh. Not only will data increase in quantity as people share ever greater varieties of data over longer periods of time, but, through the Internet of Things, data collection will potentially extend to objects in the real world, each with its own unique identifier, such as a radio-frequency identification (RFID) tag (Gershenfeld et al., 2004).
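
The kind of analysis described above can be sketched very simply: a daily ‘mood’ series derived from social media is correlated with a daily series of market movements. The data below are synthetic and the sketch is not the method of Bollen et al. (2011), only an illustration of the general approach.

    import numpy as np

    rng = np.random.default_rng(0)
    mood = rng.normal(size=200)                        # synthetic daily mood scores
    index_change = 0.4 * mood + rng.normal(size=200)   # synthetic daily index changes

    # Pearson correlation between the two series.
    correlation = np.corrcoef(mood, index_change)[0, 1]
    print(f"Pearson correlation: {correlation:.2f}")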

Notes

1. See, for example, Kopytoff (2011).
2. See http://www.jove.com.
3. See http://www.webmedcentral.com.
4. See http://scienceblogs.com.
5. See http://www.mendeley.com.
6. See http://www.periodicvideos.com.
7. See http://biolab.isis.rl.ac.uk/camerons_labblog.
8. See http://usefulchem.wikispaces.com.
9. See http://mps-expenses.guardian.co.uk.
10. See http://appsfordevelopment.challengepost.com.
11. See http://www.showusabetterway.co.uk/.
12. See http://dev.mendeley.com/api-binary-battle.
13. See http://www.mturk.com.
14. See https://www.zooniverse.org/project/oldweather.
15. See http://www.djo.org.uk.
16. See www.ispot.org.uk.
17. See http://www.thesynapticleap.org.
18. See www.innocentive.com.
19. See www.kaggle.com.
20. See http://pipes.yahoo.com.
21. See http://data.gov.
22. See http://data.gov.uk.
23. See http://geodata.gov.gr.
24. See www.opendata.go.ke.
25. A far-right, although not illegal, political group in the UK.
26. See www.google.com/landing/recipes/.
27. That is, the Resource Description Framework.
28. See http://data.open.ac.uk/.
29. See http://collection.britishmuseum.org.
30. See http://data.gov.uk.
31. See http://dbpedia.org.
32. See http://store.data-archive.ac.uk.
33. See, for example, Dryad at http://datadryad.org/.
34. See www.pdb.org.
35. See www.bmrb.wisc.edu.
36. See http://www-958.ibm.com/software/data/cognos/manyeyes/.
37. See www.google.com/fusiontables/.
38. See http://www.myexperiment.org.