01 March 2011 Kaiser: COMS E6125 1
COMS E6125 Web-enHanced Information Management (WHIM)
Prof. Gail Kaiser
Spring 2011
Today’s Topics:
• What is Web 2.0?
• Information Sharing and Privacy
• Applications Beyond the Web
Tim O’Reilly, September 2005
Netscape vs. Google: The Web As Platform
• Netscape: free web browser as flagship to establish market for high-priced server products that push content to the “webtop” – but servers also turned out to be commodities
• Google: Native web application, never sold or packaged or ported, delivered as a service with no scheduled software releases, massively scalable - core competency is data management
Akamai vs. BitTorrent: Internet Decentralization
• Akamai: Treats network as platform at deeper level of stack, transparent caching and content delivery that eases bandwidth congestion – also limited by business model catering to large providers
• BitTorrent: P2P file fragment downloads, every client is also a server, the service automatically gets better the more people use it - architecture of participation
Harness Collective Intelligence
• Google PageRank using link structure
• eBay enabler of user activity requiring critical mass
• Amazon uses community activity to produce better search results (real-time “most popular” computation)
• Wikipedia – radical experiment in trust, profound change in content creation
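The PageRank bullet above, ranking pages purely from link structure, can be illustrated with a minimal power-iteration sketch. The four-page graph, the damping factor of 0.85, the iteration count, and the function name `pagerank` are assumptions chosen for illustration; this is a textbook simplification, not Google's production algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a dict mapping page -> list of pages it links to.

    Every link target is assumed to also appear as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its damped rank evenly among its outlinks.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "hub", which links back to "a".
toy_web = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
ranks = pagerank(toy_web)
```

In this toy graph the page with the most inbound links ends up with the highest score, and "a" outranks "b" and "c" only because the highly ranked "hub" links to it.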
Harness Collective Intelligence
• Web of connections grows organically
• Viral marketing – if a site or product relies on advertising to get the word out, it isn’t Web 2.0
• Peer-production open source development of much web infrastructure – Linux, Apache, MySQL, various Perl, PHP, Python
• Network effects from user contributions are the key to market dominance
Blogosphere
• Blogging vs. personal home pages, replaced personal diary, daily opinion column, NNTP (Usenet’s Network News Transfer Protocol), now being supplanted by Facebook and Twitter
• RSS (Really Simple Syndication) allows subscribing to a page – the incremental (or live) web
• Permalink builds bridges between weblogs, affects PageRank search results
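Subscribing to a page through RSS amounts to fetching its feed repeatedly and noticing which items are new. The sketch below shows that idea with Python's standard `xml.etree.ElementTree`; the inline feed, the example.org permalinks, and the function name `new_items` are hypothetical, and a real reader would fetch the feed over HTTP rather than from a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, newest item first.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Weblog</title>
    <link>http://example.org/</link>
    <description>Hypothetical feed</description>
    <item>
      <title>Second post</title>
      <link>http://example.org/2011/03/second-post</link>
    </item>
    <item>
      <title>First post</title>
      <link>http://example.org/2011/02/first-post</link>
    </item>
  </channel>
</rss>"""


def new_items(feed_xml, seen_links):
    """Return (title, permalink) pairs not seen before and remember them."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.findall("./channel/item"):
        link = item.findtext("link")
        if link not in seen_links:
            fresh.append((item.findtext("title"), link))
            seen_links.add(link)
    return fresh


# The subscriber has already read the first post.
seen = {"http://example.org/2011/02/first-post"}
updates = new_items(SAMPLE_FEED, seen)
```

Here the permalink doubles as the item's identity, which is why stable permalinks matter for both subscription and cross-weblog linking.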
Perpetual Beta
• Software delivered as a service, not a product
• Upgrades every day vs. every 2-3 years
• Operations and monitoring must become core competencies
• Scripting languages as duct tape
• Innovation in assembly
AJAX: Rich User Experiences
• Standards-based presentation using XHTML and CSS
• Dynamic display and interaction using the Document Object Model
• Data Interchange and manipulation using XML and XSLT
• Asynchronous data retrieval using XMLHttpRequest
• JavaScript binding everything together
Infoware
• Database management as core competency
• Specialized databases: web crawl, distributed file databases
• Map databases: starting with MapQuest, many services now license the same data from NAVTEQ (digital street maps), DigitalGlobe (satellite images)
• Amazon licensed ISBN registry from Bowker but added publisher-supplied data and user annotations
• Mashups based on lightweight programming model create value-added data
Key issue: Who owns the data?
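The Amazon and mashup bullets describe the same move: start from licensed base data and layer additional sources on top, keyed by a shared identifier. A minimal sketch of that join follows; the function name `enrich_catalog`, the field names, and the placeholder identifiers such as `isbn-0001` are all made up for illustration and do not reflect any real registry format.

```python
def enrich_catalog(registry, publisher_data, user_reviews):
    """Merge three sources keyed by ISBN into one value-added record each."""
    enriched = {}
    for isbn, record in registry.items():
        merged = dict(record)                         # licensed base record
        merged.update(publisher_data.get(isbn, {}))   # publisher-supplied fields
        merged["reviews"] = list(user_reviews.get(isbn, []))  # user annotations
        enriched[isbn] = merged
    return enriched


# Placeholder data; none of these values come from a real catalog.
registry = {"isbn-0001": {"title": "Example Title"}}
publisher_data = {"isbn-0001": {"cover": "cover-0001.jpg"}}
user_reviews = {"isbn-0001": ["Helpful overview."]}
catalog = enrich_catalog(registry, publisher_data, user_reviews)
```

The merged record combines fields contributed by three different parties, which is exactly where the ownership question on this slide becomes hard to answer.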
Information Sharing: Web 1.0
• The original purpose of the Web!
• Generally viewed as an information resource, download without upload
• Websites owned by “someone else” may store your information in a database – usually limited to basic identification (name, address, phone number, credit card) and “preferences”
• Personal websites (e.g., hosted by GeoCities) might be universally browsable but in practice visited by few
• Key issue: Who owns the data?
Information Sharing: Web 2.0
• Message boards with user-supplied content
• Portals with user-selected content “portlets”
• Blogs, wikis, news feeds, texting
• Social networking, collaborative filtering
• RIAs (rich internet applications)
• The Web as Platform, widgets, mashups, user-supplied applications
Key issue: Who owns the data?
The Right To Privacy
• Secrecy (confidentiality): The extent to which we are known to others
• Anonymity: The extent to which we are the subject of others’ attention
• Solitude: The extent to which others have access to us
Rights to Sue (wrt Privacy)
• Intrusion upon seclusion or solitude, or into private affairs
• Public disclosure of embarrassing private facts
• Inaccurate reporting: Publicity that places a person in a false light in the public eye
• Appropriation of identity: “identity theft”
A New Yorker cartoon from 1993
But in 2011, your browser (and its addons, plugins, etc.) know
• You’ve searched for local veterinarians and groomers
• You’ve read reviews comparing flea powders
• You’ve ordered “chew sticks” and “squeaky toys”
• You’ve printed coupons for Alpo
• You’ve downloaded 101 Dalmatians and Lassie “on demand” movies
• Your email contains sales notices from petco.com
• Your “My Pictures” folder contains 100s of images of fire hydrants and frisbees
Web Tracking
• Bits: How Do They Track You?
• Data collection events:
– Pages displayed
– Search queries entered
– Videos played
– Advertising displayed (both same party and third party)
• In December 2007 alone, Yahoo collected 400 billion events, AOL 100 billion, Google 91 billion, Microsoft 51 billion
From study by comScore published in NY Times online 3/9/08
Caveats
• Not all of this data is useful
• Not all of it is retained by the companies with access to it
• Much of it cannot be traced back to individuals
• Several data collection events may be triggered by a single Web page
• Does not include user-volunteered data (website registration, social networking)
Why Track?
• Targeted advertising supports “free” services and content (ad serving was the first widely deployed mashup)
• But collected information can be used for other purposes…
• Ad choice (e.g., TACO, Evidon)
Privacy Before and After
• Before the Web, you participated in a variety of activities
• These might have involved groups of people, in public or private, possibly even “the press”
• Photos or recordings might have been taken, with or without your knowledge
• You might have borrowed or purchased books or magazines related to your activities
• You might have sent/received letters by snailmail
• What is different now? Does it matter?
Privacy Before and After
• Before the Web, you might have typed your name, address, phone number, birth date, social security number, bank account numbers, credit card numbers, etc. into your PC for personal storage
• It was unlikely anyone outside your household could access your PC
• Now you type at least part of that information into your PC all the time (if you make online purchases and/or sign up for online services)
• And you have no idea who might be reading it, either from your PC (if connected to the Internet) or from the websites you sent it to
Privacy Before and After
• Your name, phone number, address were always easily available (phone book, reverse listings)
• So was your birth date, although harder to obtain (birth records, driver’s license)
• And your SSN - lots of forms ask for it
• Your checking account and/or credit card numbers were available through the issuing banks and the merchants where you made purchases
• So what is different now? Does it matter?
Web 2.0 Applications for Scientific Communities
• Scientists collaborating together in the same lab on the same project share:
– Data: specimens, samples, materials, observations, etc.
– Tools: instruments, software, hardware
– Knowledge: open discussion, whiteboard
Real-world social networking
• However, there are time and space constraints
• More significantly, this model does not scale well to communities of scientists working on different projects but who could possibly learn from each other’s expertise, experience, etc.
CSCW Approaches
• CSCW (Computer-Supported Cooperative Work) aims to augment same-time/same-place collaboration but more significantly different-time/different-place collaborations and communities
• Current generation CSCW systems support data sharing (e.g., PNNL Collaboratories) and/or tool sharing (e.g., UIUC BioCoRE)
• However, these systems do not address knowledge sharing: how/when/where/why to use tools and data
Knowledge Sharing
• Knowledge sharing is partially enabled through labor-intensive static approaches: publications, email lists, wikis, chat, shared display, etc.
• We seek to enable automatic knowledge sharing - without requiring “extra work” on the part of scientists
Social Networking Metaphor
• Some online social networking is a form of CSCW that is potentially enjoyable and profitable but still requires “extra work”, with dynamism limited by explicit user participation
– Facebook, LinkedIn, Twitter, etc.
• Other social networking automatically records what people do online to aggregate, data mine and disseminate in an enjoyable and profitable fashion, with no “extra work” required - but can be enhanced by very simple user actions (e.g., ratings)
– Collaborative filtering – “people like you …”
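The “people like you …” idea can be sketched as user-based collaborative filtering over implicitly recorded activity. The version below scores each unseen item by the Jaccard similarity between the target user's history and the histories of the users who have that item; the user names, item names, and the functions `jaccard` and `recommend` are assumptions for illustration, and real systems use larger data sets and more refined similarity measures.

```python
def jaccard(a, b):
    """Overlap between two sets: |intersection| / |union| (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, histories, top_n=3):
    """Suggest items used by people whose histories resemble the target's."""
    mine = histories[target]
    scores = {}
    for user, items in histories.items():
        if user == target:
            continue
        similarity = jaccard(mine, items)
        # Credit each item the target has not seen with this user's similarity.
        for item in items - mine:
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; break ties alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, score in ranked[:top_n] if score > 0]


# Hypothetical activity logs recorded without any explicit user effort.
histories = {
    "ann": {"book_a", "book_b", "book_c"},
    "bob": {"book_a", "book_b", "book_d"},
    "cat": {"book_e"},
}
suggestions = recommend("ann", histories)
```

Only "bob" overlaps with "ann", so only his unseen item is suggested; an explicit rating, the "very simple user action" mentioned above, could be folded in as a weight on each history entry.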
genSpace Overview
• We combine implicit and explicit social networking concepts in our approach to knowledge sharing
• Prototype implemented as a set of plugins for geWorkbench, a platform for analysis and visualization tools for integrated genomics
• Records, aggregates, data mines and disseminates geWorkbench users’ activities with tools and tool sequences (workflows)
Questions genSpace Can Answer
• What do I do next?
• Which tools work well together?
• Where does this tool fit in a typical workflow?
• Who do I know who also uses this tool?
• How can I get help (from an expert who is online right now)?
genSpace Features
• Collaborative Workflow Composition: past history of analysis tool usage is used to identify commonly-occurring sequences/workflows
• Tool Suggestions: suggests analysis tools that may be useful, based on what tools were previously used
• Social Networking: allows users to associate with each other and share knowledge within groups
• Data Suggestions: suggests data sets based upon previous analyses and collaborative filtering (CF)
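The workflow-composition and tool-suggestion features can be pictured as counting which tool tends to follow which in logged sessions. The sketch below is one plausible reading of those bullets, not genSpace's actual implementation; the tool names, the session data, and the functions `build_transitions` and `suggest_next` are hypothetical.

```python
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Count how often each tool is immediately followed by another tool."""
    transitions = defaultdict(Counter)
    for tools in sessions:
        for current, following in zip(tools, tools[1:]):
            transitions[current][following] += 1
    return transitions


def suggest_next(transitions, tool, top_n=2):
    """Return the tools that most often followed `tool` in past sessions."""
    return [name for name, _count in transitions[tool].most_common(top_n)]


# Hypothetical logged sessions; each list is one user's tool sequence.
sessions = [
    ["normalize", "cluster", "visualize"],
    ["normalize", "cluster", "annotate"],
    ["normalize", "filter", "cluster", "visualize"],
]
transitions = build_transitions(sessions)
after_normalize = suggest_next(transitions, "normalize")
```

The same counts address "What do I do next?" and "Which tools work well together?"; longer commonly occurring workflows could be found by counting longer subsequences in the same way.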
genSpace Architecture
Privacy/Confidentiality Concerns
• Users can choose anonymous logging or disable it entirely
• Security/privacy of the activity logs is being investigated (data sets are NOT recorded*)
• Issues when users change their collaborative networks and/or opt out preferences
• Must we provide privacy by default?
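The logging choices in the first bullet can be sketched as a per-user preference checked before anything is recorded. The slides do not say how genSpace implements anonymous logging, so everything here is an assumption: the preference values `"named"`, `"anonymous"`, and `"off"`, the function `log_event`, and the use of a keyed hash. A keyed hash yields a stable pseudonym, which is strictly pseudonymity rather than full anonymity; dropping the user field entirely would be the stricter alternative.

```python
import hashlib
import hmac


def log_event(log, user, tool, mode, secret=b"hypothetical-server-secret"):
    """Append a tool-usage event while honoring the user's logging preference.

    mode is one of the assumed values "named", "anonymous", or "off".
    Only the tool name is logged; data sets are never recorded.
    """
    if mode not in ("named", "anonymous", "off"):
        raise ValueError("unknown logging mode: " + mode)
    if mode == "off":
        return  # User opted out: record nothing at all.
    if mode == "anonymous":
        # Replace the name with a keyed hash so events can still be grouped.
        digest = hmac.new(secret, user.encode("utf-8"), hashlib.sha256).hexdigest()
        user = "anon-" + digest[:12]
    log.append({"user": user, "tool": tool})


activity_log = []
log_event(activity_log, "alice", "cluster", "off")
log_event(activity_log, "alice", "cluster", "anonymous")
log_event(activity_log, "alice", "visualize", "anonymous")
log_event(activity_log, "bob", "cluster", "named")
```

Making `"anonymous"` or `"off"` the initial value of the preference would be one way to provide privacy by default.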
Research in the Cloud
• geWorkbench, most other analysis tools are “fat” desktop applications
• Why not create a browser-based client?
More open questions for genSpace
• What other CSCW techniques can help support researchers?
• How can we efficiently address privacy concerns while providing helpful recommendations?
Summary
• genSpace embodies an approach to knowledge sharing that is based on social networking metaphors
• genSpace is built on the geWorkbench platform for integrated genomics
• Potentially applicable to other kinds of scientists and engineers, including software engineers
Next Assignments
• Full paper due Tuesday March 8th
• Project Proposal due Tuesday March 8th