Web 2.0 for e-Science Environments


Web 2.0 for e-Science Environments. SKG2007 Xian Hotel, Xian China October 29 2007 Geoffrey Fox and Marlon Pierce Computer Science, Informatics, Physics Community Grids Laboratory Indiana University Bloomington IN 47401 gcf@indiana.edu http://www.infomall.org

  • Web 2.0 for e-Science Environments

    SKG2007, Xian Hotel, Xian, China, October 29 2007

    Geoffrey Fox and Marlon Pierce

    Computer Science, Informatics, Physics

    Community Grids Laboratory

    Indiana University, Bloomington IN 47401


  • Applications, Infrastructure, Technologies

    This field is confused by inconsistent use of terminology; I define:

    Web Services, Grids and (aspects of) Web 2.0 (Enterprise 2.0) are technologies

    Grids could be everything (Broad Grids implementing some sort of managed web) or reserved for specific architectures like OGSA or Web Services (Narrow Grids)

    These technologies combine and compete to build electronic infrastructures termed e-infrastructure or Cyberinfrastructure

    e-moreorlessanything is an emerging application area of broad importance that is hosted on the infrastructures e-infrastructure or Cyberinfrastructure

    e-Science or perhaps better e-Research is a special case of e-moreorlessanything

  • Relevance of Web 2.0

    They say that Web 1.0 was a read-only Web while Web 2.0 is the wildly read-write collaborative Web

    Web 2.0 can help e-Science in many ways

    Its tools can enhance scientific collaboration, i.e. effectively support virtual organizations, in different ways from grids

    The popularity of Web 2.0 can provide high quality technologies and software that (due to large commercial investment) can be very useful in e-Science and preferable to Grid or Web Service solutions

    The usability and participatory nature of Web 2.0 can bring science and its informatics to a broader audience

    Web 2.0 can even help the emerging challenge of using multicore chips, i.e. in improving parallel computing programming and runtime environments

  • Best Web 2.0 Sites -- 2006

    Extracted from http://web2.wsj2.com/

    All important capabilities for e-Science

    Social Networking

    Start Pages

    Social Bookmarking

    Peer Production News

    Social Media Sharing

    Online Storage (Computing)

  • Web 2.0, Grids and Web Services I

    Web Services have clearly defined protocols (SOAP) and a well defined mechanism (WSDL) to define service interfaces

    There is good .NET and Java support

    The so-called WS-* specifications provide a rich, sophisticated but complicated standard set of capabilities for security, fault tolerance, meta-data, discovery, notification etc.

    Narrow Grids build on Web Services and provide a robust managed environment with growing but still small adoption in Enterprise systems and distributed science (so called e-Science)

    Web 2.0 supports a similar architecture to Web services but has developed in a more chaotic but remarkably successful fashion with a service architecture with a variety of protocols including those of Web and Grid services

    Over 500 Interfaces defined at http://www.programmableweb.com/apis

    Web 2.0 also has many well known capabilities with Google Maps and Amazon Compute/Storage services of clear general relevance

    There are also Web 2.0 services supporting novel collaboration modes and user interaction with the web as seen in social networking sites, portals, MySpace, YouTube

  • Web 2.0 Systems like Grids have Portals, Services, Resources

    Captures the incredible development of interactive Web sites enabling people to create and collaborate

  • Web 2.0, Grids and Web Services II

    I once thought Web Services were inevitable but this is no longer clear to me

    Web services are complicated, slow and non functional

    WS-Security is unnecessarily slow and pedantic (canonicalization of XML)

    WS-RM (Reliable Messaging) seems to have poor adoption and doesn't work well in collaboration

    WSDM (distributed management) specifies a lot

    There are de facto Web 2.0 standards like Google Maps and powerful suppliers like Google/Microsoft which define the architectures/interfaces

    One can easily combine SOAP (Web Service) based services/systems with HTTP messages but dominance of lowest common denominator suggests additional structure/complexity of SOAP will not easily survive
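
As a rough illustration of the "additional structure" that SOAP carries compared with a plain HTTP message, the sketch below expresses one and the same call both ways. The service, its namespace `http://example.org/geocode`, the `Lookup` operation and its parameters are all hypothetical and do not come from the slides; only the SOAP 1.1 envelope namespace is real.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/geocode"  # hypothetical service namespace


def soap_request(operation, params):
    """Wrap an operation call in a SOAP 1.1 envelope and return it as XML text."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def rest_request(base_url, operation, params):
    """Express the same call as a plain HTTP GET URL."""
    return f"{base_url}/{operation}?{urlencode(params)}"


# Hypothetical call: look up a city in an imaginary geocoding service.
params = {"city": "Bloomington", "state": "IN"}
soap_message = soap_request("Lookup", params)
rest_url = rest_request("http://example.org/geocode", "Lookup", params)

print(len(soap_message), "characters of SOAP payload")
print(rest_url)
```

The REST form fits in a single URL, while the SOAP form needs an envelope, a body and namespace declarations before any WS-* headers are added.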

  • Distribution of APIs and Mashups per Protocol

    [Charts: number of mashups and number of APIs per protocol]

    SOAP is quite a small fraction

  • Where did Narrow Grids and Web Services go wrong?

    Too much Computing: historically one (including narrow grids) has tried to increase computing capabilities by

        Optimizing performance of codes at cost of re-usability

        Exploiting all possible CPUs such as Graphics co-processors and idle cycles (across administrative domains)

        Linking central computers together such as NSF/DoE/DoD supercomputer networks without clear user requirements

    Next Crisis in technology area will be the opposite problem: commodity chips will be 32-128 way parallel in 5 years time and we currently have no idea how to use them, especially on clients

    Only 2 releases of standard software (e.g. Office) in this time span

    Interoperability Interfaces will be for data not for infrastructure

    Google, Amazon, TeraGrid, European Grids will not interoperate at the resource or compute (processing) level but rather at the data streams flowing in and out of independent Grid islands

    Data focus is consistent with Semantic Grid/Web but not clear if latter has learnt the usability message of Web 2.0

    One needs to share computing, data, people in e-moreorlessanything; Grids initially focused on computing but data and people are more important

    eScience is healthy as is e-moreorlessanything

    Most Grids are solving wrong problem at wrong point in stack with a complexity that makes friendly usability difficult

  • Some Web 2.0 Activities at IU

    Use of Blogs, RSS feeds, Wikis etc.

    Use of Mashups for Cheminformatics Grid workflows

    Moving from Portlets to Gadgets in portals (or at least supporting both)

    Use of Connotea to produce tagged document collections such as http://www.connotea.org/user/crmc for parallel computing

    Semantic Research Grid integrates multiple tagging and search systems and copes with overlapping inconsistent annotations

    MSI-CIEC portal augments Connotea to tag a mix of URL and URIs e.g. NSF TeraGrid use, PIs and Proposals

    Hopes to support collaboration (for Minority Serving Institution faculty)

    Multicore SALSA project using for Parallel Programming 2.0

  • Use blog to create posts. Display blog RSS feed in MediaWiki.
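
The slide does not say how the feed is displayed in MediaWiki. One possible approach, sketched here under that caveat, is to convert the RSS items into a wikitext bullet list of external links. The sample feed, its URLs and the function name `rss_to_wikitext` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 feed standing in for a real blog feed.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example lab blog</title>
    <item><title>First post</title><link>http://example.org/blog/1</link></item>
    <item><title>Second post</title><link>http://example.org/blog/2</link></item>
  </channel>
</rss>"""


def rss_to_wikitext(rss_xml):
    """Turn each RSS <item> into a MediaWiki bullet with an external link."""
    root = ET.fromstring(rss_xml)
    lines = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        link = item.findtext("link", default="")
        # MediaWiki external link syntax is [URL label]
        lines.append(f"* [{link} {title}]")
    return "\n".join(lines)


wikitext = rss_to_wikitext(SAMPLE_RSS)
print(wikitext)
```

The resulting text could be pasted into, or written by a bot to, a wiki page; a MediaWiki RSS extension would be an alternative that avoids the conversion step.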

  • Semantic Research Grid (SRG)

    Integrates tagging and search system that allows users to use multiple sites and consistently integrate them with traditional citation databases

    We built a mashup linking to del.icio.us, CiteULike, Connotea allowing exchange of tags between sites and between local repositories

    Repositories also link to local sources (PubsOnline) and Google Scholar (GS) and Windows Live Academic (WLA)

    GS has number of cited publications. WLA has Digital Object Identifier (DOI)

    We implement a rather more powerful access control mechanism

    We build heuristic tools to mine web lists for citations

    We have an event based architecture (consistency model) allowing change actions to be preserved and selectively changed

    Supports integrating different inconsistent views of a given document and its updates on different tagging systems


  • MSI-CIEC Portal

    MSI-CIEC: Minority Serving Institution CyberInfrastructure Empowerment Coalition

  • NSF Grants Tag System

    NSF provides information (in XML) on all of the grants a particular person worked on

    We downloaded, parsed, and bookmarked this info using a little scavenger robot

    Each grant is represented by a bookmark and tagged with relevant information in MSI-CIEC Portal

    Grant tags point to URLs of the NSF award page

    The investigators are imported as users

    Each has a bookmark for each project they worked on

    They are also represented in the tags of these projects

    Can now form research collaborations by linking researchers with common tags

    Hopefully will enable broader collaborations and not just those between usual suspects
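
The tagging and linking steps above can be sketched as follows. This is only an illustration: the XML layout, the element names (`award`, `investigator`, `keyword`), the sample data and the function names are assumptions, not the actual NSF format or the MSI-CIEC portal code.

```python
from collections import defaultdict
from itertools import combinations
import xml.etree.ElementTree as ET

# Invented sample data; the real NSF XML schema is not shown in the slides.
SAMPLE_AWARDS = """<awards>
  <award url="http://example.org/award/1">
    <investigator>A. Smith</investigator>
    <investigator>B. Jones</investigator>
    <keyword>grid</keyword><keyword>portal</keyword>
  </award>
  <award url="http://example.org/award/2">
    <investigator>C. Lee</investigator>
    <keyword>portal</keyword><keyword>tagging</keyword>
  </award>
  <award url="http://example.org/award/3">
    <investigator>B. Jones</investigator>
    <investigator>D. Chen</investigator>
    <keyword>multicore</keyword>
  </award>
</awards>"""


def load_bookmarks(xml_text):
    """One bookmark per grant: award page URL, investigators, and tags."""
    bookmarks = []
    for award in ET.fromstring(xml_text).iter("award"):
        people = [p.text for p in award.iter("investigator")]
        tags = {k.text for k in award.iter("keyword")}
        bookmarks.append({"url": award.get("url"), "people": people, "tags": tags})
    return bookmarks


def suggest_collaborations(bookmarks):
    """Pair researchers who share tags but have no grant in common."""
    person_tags = defaultdict(set)
    for bookmark in bookmarks:
        for person in bookmark["people"]:
            person_tags[person] |= bookmark["tags"]
    # Pairs already working together (the "usual suspects") are skipped.
    existing = {frozenset(pair)
                for bookmark in bookmarks
                for pair in combinations(bookmark["people"], 2)}
    suggestions = []
    for a, b in combinations(sorted(person_tags), 2):
        shared = person_tags[a] & person_tags[b]
        if shared and frozenset((a, b)) not in existing:
            suggestions.append((a, b, sorted(shared)))
    return suggestions


suggestions = suggest_collaborations(load_bookmarks(SAMPLE_AWARDS))
for a, b, shared in suggestions:
    print(a, "<->", b, "via", shared)
```

Excluding existing co-investigators is one way to favor the broader collaborations mentioned above; a real system would likely also weight tags by how specific they are.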

  • Superior (from broad usage) technologies of Web 2.0

    Mash-ups can replace Workflow

    Gadgets can replace Portlets

    UDDI replaced by user generated registries

  • Mashups v Workflow?

    Mashup Tools are reviewed at http://blogs.zdnet.com/Hinchcliffe/?p=63

    Workflow Tools are reviewed by Gannon and Fox http://grids.ucs.indiana.edu/ptliupages/publications/Workflow-overview.pdf

    Both include scripting in PHP, Python, sh etc. as both implement distributed programming at level of services

    Mashups use all types of service interfaces and perhaps do not have the potential robustness (security) of Grid service approach

    Mashups typically pure HTTP (REST)

  • Grid Workflow Datamining in Earth Science

    Work with Scripps Institute

    Grid services controlled by scripting workflow process real time data from ~70 GPS Sensors in Southern California

    [Figure labels: NASA GPS, Earthquake]

  • Grid Workflow Data Assimilation in Earth Science

    Grid services triggered by abnormal events and controlled by workflow process real time data from radar and high resolution simulations for tornado forecasts

    Typical graphical interface to service composition

    Taverna another well known Grid/Web Service workflow tool

    Recent Web 2.0 visual Mashup tools include Yahoo Pipes and Microsoft Popfly

  • Parallel Programming 2.0

    Web 2.0 Mashups will (by definition the largest market) drive composition tools for Grid, web and parallel programming

    Parallel Programming 2.0 will build on Mashup tools like Yahoo Pipes and Microsoft Popfly

  • Web 2.0 Mashups and APIs

    http://www.programmableweb.com/apis has (Sept 12 2007) 2312 Mashups and 511 Web 2.0 APIs, with Google Maps the most often used in Mashups

    This is the Web 2.0 UDDI (service registry)

  • The List of Web 2.0 APIs

    Each site has API and its features

    Divided into broad categories

    Only a few used a lot (49 APIs used in 10 or more mashups)

    RSS feed of new APIs

    Google Maps dominates but Amazon S3 growing in popularity

  • Now to Portals*Grid-style por