UC Berkeley
Web 2.0 Applications
EuroSys 2010 Tutorial
Armando Fox
UC Berkeley Reliable Adaptive Distributed Systems Lab
Who I Am
• Adjunct Prof. at UC Berkeley Computer Science
• Research
– 2006-now: applying machine learning to problems of datacenter-scale applications
– 2001-2006: Recovery-Oriented Computing (ROC)
– 1996-2000: Mobile computing meets SaaS
• Teaching: undergraduate Software-as-a-Service/Software Engineering
• Developer & maintainer of an active Web app
• Know just enough about languages to be dangerous
Where I Work: RAD Lab 5-Year Mission
Enable 1 person to develop, deploy, and operate a next-generation Internet application at scale
• Key enabling technology: statistical machine learning
– management, scaling, anomaly detection, performance prediction...
• Interdisciplinary: 7 faculty, ~30 PhD students, ~6 undergrads, ~1 sysadmin
• Engagement with industrial affiliates keeps us honest
Goals & Non-Goals
• Goals
– New Web 2.0 features, technologies, challenges
– Web 2.0 & Software Engineering Education
– Server-centric view, though the client is highly nontrivial
– Assumption: basic familiarity with Web 1.0
• Non-goals
– Plug our own research (you can read it elsewhere)
– Teach you to code (plenty of good frameworks, docs)
– Instead, know the landscape & where to go next
• Disclaimers
– My views are mine alone, etc.
– Specific tools mentioned for the sake of example only
Key Messages
• Social Computing & Rich UIs
• DADO teams (develop, assess, deploy, operate) vs. waterfall
• Agile, Behavior-Driven Development vs. Big Design Up Front
• High-productivity tools, languages, frameworks: undergrads deploy ready-to-use apps in ~weeks
• Cloud computing is a game changer for Web education, research, & business
Outline of topics
• Web 1.0 review & what’s new in 2.0
• Web 2.0 application frameworks
• Service-oriented architecture
• DADO, a new view of software development
• Deployment
• Education
• Research Challenges
WEB 1.0 REVIEW & WHAT’S NEW IN 2.0
Software-as-a-Service (SaaS) Evolution
• (Dates are approximate...)
• 1990: Web 0.9 (physicists using NCSA Mosaic)
• 1995: Web 1.0 (static & some dynamic content, e-commerce, Netscape)
• 1997: "Content is King" => "Services are King" (email, search engines, photo sharing...)
• 2000: Web 2.0 (rich UIs, social computing)
• 2004: SaaS & SOA (Service-Oriented Architectures) (Google Maps, Amazon S3...)
• 2008: Cloud Computing (pay as you go)
The Web is a Client-Server, Request-Reply Architecture
• HTTP (Hypertext Transfer Protocol): ASCII-based request/reply protocol that runs over TCP
– HTTPS: variant that first establishes a symmetrically-encrypted channel via a public-key handshake, so it is suitable for sensitive info
• By convention, servers listen on TCP port 80 (HTTP) or 443 (HTTPS)
• Uniform Resource Identifier (URI) format: scheme, host, port, resource, parameters, fragment
http://search.com:80/img/search/file?t=banana&client=firefox#p2
[Figure: Web browser, DNS server, and Web server connected by "a series of tubes"; numbered steps 1 and 2]
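To make the URI format above concrete, Ruby's standard `uri` library can split the example URI into its components. This is a small sketch. The helper name `uri_parts` and the hash keys are illustrative only.

```ruby
require 'uri'

EXAMPLE = 'http://search.com:80/img/search/file?t=banana&client=firefox#p2'

# Decompose a URI into the parts listed above, using Ruby's standard URI class.
def uri_parts(str)
  u = URI(str)
  {
    scheme:   u.scheme,                                   # "http"
    host:     u.host,                                     # "search.com"
    port:     u.port,                                     # 80, also http's default
    resource: u.path,                                     # "/img/search/file"
    params:   URI.decode_www_form(u.query.to_s).to_h,     # {"t"=>"banana", ...}
    fragment: u.fragment                                  # "p2"; browsers do not send it
  }
end

uri_parts(EXAMPLE).each { |part, value| puts "#{part}: #{value.inspect}" }
```

The fragment is interpreted only by the browser. Everything before `#` is what the browser actually requests from the server.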
A Conversation With a Web Server
• Browser sends:
GET /index.html HTTP/1.0
User-Agent: Mozilla/4.73 [en] (X11; U; Linux 2.0.35 i686)
Host: www.yahoo.com
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
Accept-Language: en
Accept-Charset: iso-8859-1,*,utf-8
• Server replies:
HTTP/1.0 200 OK
Content-Length: 16018
Set-Cookie: B=2vsconq5p0h2n
Content-Type: text/html
<html><head><title>Yahoo!</title><base href=http://www.yahoo.com/> …etc.
• Repeat for embedded content (images, stylesheets, scripts...) <img width=230 height=33 src="http://us.a1.yimg.com/us.yimg.com/a/an/anchor/icons2.gif">
(Callouts: HTTP method & URI; cookie data: up to 4 KiB; MIME content type)
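The server reply above has a simple structure: a status line, then header lines, then a blank line, then the body. The minimal parser below shows that layout. It is a sketch that assumes CRLF line endings and no repeated header names, and the body is truncated. In practice a client library such as Ruby's `Net::HTTP` does this work.

```ruby
# The reply from the slide. The body is truncated here, so it is shorter
# than the Content-Length the server announced.
RAW_REPLY = "HTTP/1.0 200 OK\r\n" \
            "Content-Length: 16018\r\n" \
            "Set-Cookie: B=2vsconq5p0h2n\r\n" \
            "Content-Type: text/html\r\n" \
            "\r\n" \
            "<html><head><title>Yahoo!</title>"

# Split a raw HTTP reply into status line fields, headers, and body.
def parse_reply(raw)
  head, body = raw.split("\r\n\r\n", 2)          # blank line ends the headers
  status_line, *header_lines = head.split("\r\n")
  version, code, reason = status_line.split(' ', 3)
  headers = header_lines.to_h do |line|
    name, value = line.split(':', 2)
    [name.downcase, value.strip]                 # header names are case-insensitive
  end
  { version: version, status: Integer(code), reason: reason,
    headers: headers, body: body }
end
```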
Cookies
• On first visit to a server, the browser may receive a cookie from the server in an HTTP header
– Data is arbitrary (up to 4 KB long)
– typically opaque: interpretation is up to the server
– usually HMAC’d or encrypted, since the client is untrusted
• Browser automatically passes the appropriate cookie back to the server on each request
– Server may update the cookie value with any response
– Thus the server can synthesize the concept of a “session” using this
• Many, many uses
– track user’s ID (canonical use: authentication)
– track session state (up to 4 KB) or a handle to it
– before cookies, “fat URLs” were used for this in Web 1.0
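The "usually HMAC’d" point deserves an example, because the client is untrusted. The server can append a keyed MAC to the cookie value and reject any cookie whose MAC does not verify. This is a minimal sketch using Ruby's `openssl` library. The secret, the `value--mac` encoding, and the helper names are assumptions for illustration, not any framework's actual format.

```ruby
require 'openssl'

# Hypothetical server-side secret; a real app would load a long random one from config.
COOKIE_SECRET = 'replace-with-a-long-random-secret'

def cookie_mac(value)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), COOKIE_SECRET, value)
end

# Append an HMAC so the server can detect a client that tampers with the value.
def sign_cookie(value)
  "#{value}--#{cookie_mac(value)}"
end

# Return the original value if the HMAC checks out, otherwise nil.
def verify_cookie(cookie)
  value, sep, mac = cookie.rpartition('--')
  return nil if sep.empty?
  secure_equal?(cookie_mac(value), mac) ? value : nil
end

# Constant-time comparison, so response timing does not reveal how many bytes matched.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end
```

An HMAC gives integrity, not secrecy: the client can still read the value. To hide the contents as well, the cookie must be encrypted.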
XML (eXtensible Markup Language)
<?xml version="1.0" encoding="UTF-8"?>
<book year="1967">
  <title>The politics of experience</title>
  <author>
    <firstname>Ronald</firstname>
    <lastname>Laing</lastname>
  </author>
</book>
• Really a metalanguage for describing hierarchical, semistructured, schema-less data
• XML Document Type Definition (DTD) specifies structural & content constraints on a particular document type
HTML, XHTML & Beyond
• XHTML: a document conforming to a particular DTD describing a hierarchical collection of HTML elements
– Variants: Strict, Transitional (a/k/a “loose”, for compatibility with the deteriorating HTML syntax of 1990-95), Frameset
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
• Element types include:
– inline (headings, tables, lists...)
– embedded (images, video, Java applets, JavaScript code...)
– fill-in forms: text, radio/check buttons, dropdown menus..., marshaling arguments into either the URI or the request body
• CSS (Cascading Stylesheets) for presentation
– Strict XHTML forbids presentational markup
– Idea: complete separation of appearance from structure
Selectors identify specific tag(s)
<link rel="stylesheet" href="mystyles.css"/> <div class="pageFrame" id="pageHead">
<h1> Welcome, <span id="custName">Armando</span> </h1></div>
• tag name: h1
• class name: .pageFrame
• element ID: #pageHead
• tag name & class: div.pageFrame
• tag name & ID: span#custName
• descendant relationship: .pageFrame h1, div h1
• descendant relationship: div #custName
• child relationship: h1 > #custName
• .pageFrame, #pageHead, and div.pageFrame all match the outer div above
CSS Styles apply visual styling based on selectors
• In mystyles.css (static asset with MIME type text/css):
div.pageFrame { background-image: url('/banner.gif'); }
h1 { font-size: large; float: left; }
#custName:hover { background-color: yellow; font-weight: bold; }
• Style properties include borders, background images, and layout directives (floating, absolute positioning, min/max scaled sizes, etc.)
• Changing style properties has the side effect of re-rendering
– e.g., change the display or visibility property to show/hide elements
Dynamic content generation
• Most Web 1.0 (e-commerce) sites actually run a program that generates the output
• Originally: templates with embedded code “snippets”
• Eventually, embedded code became “tail that wagged the dog” and moved out of the Web server
• Languages/frameworks evolved to capture common tasks
– Perl, PHP, Python, Ruby on Rails, ASP, ASP.NET, JavaServer Pages (JSP), JavaBeans/J2EE, ...
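Ruby's standard ERB library, the same template language Rails uses for `.html.erb` views, illustrates the "templates with embedded code snippets" idea. The template and the helper `render_results` are hypothetical. Untrusted values are HTML-escaped before insertion.

```ruby
require 'erb'

# Static HTML with embedded Ruby "snippets": <%= ... %> emits a value,
# <%- ... -%> runs code and (with trim_mode '-') leaves no stray whitespace.
PAGE = ERB.new(<<~TEMPLATE, trim_mode: '-')
  <h1>Results for <%= ERB::Util.html_escape(term) %></h1>
  <ul>
  <%- results.each do |r| -%>
    <li><%= ERB::Util.html_escape(r) %></li>
  <%- end -%>
  </ul>
TEMPLATE

def render_results(term, results)
  PAGE.result_with_hash(term: term, results: results)
end

puts render_results('white rabbit', ['Alice', '<Cheshire Cat>'])
```

As generated pages grew, the embedded code came to dominate the markup. That is the "tail that wagged the dog" problem that pushed logic out of templates and into frameworks.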
SaaS 3-tier architecture
• Common Gateway Interface (CGI): allows the Web server to run a program
– Server maps some URIs to application names
– App is run and gets handed the complete HTTP request, including headers
• “Arguments” embedded in the URL with “&” syntax, or sent as the request body (with POST), e.g.
http://www.foo.com/search?term=white%20rabbit&show=10&page=1
• App generates the entire response
– content (HTML? an image? some JavaScript?)
– HTTP headers & response code
• Plug-in modules for Web servers allow long-running CGI programs & link to language interpreters
[Figure: 3-tier architecture: HTTP server, application (app server), persistent storage]
• Various frameworks have evolved to capture this common structure
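Here is a minimal CGI-style program for the search URL above. It is a sketch, not a complete implementation of the CGI spec. The Web server passes the query string in the `QUERY_STRING` environment variable, runs the program, and relays what the program prints: headers, a blank line, then content. The function name `cgi_response` is hypothetical.

```ruby
require 'erb'
require 'uri'

# Build a complete CGI response from the environment the Web server provides.
def cgi_response(env)
  params = URI.decode_www_form(env.fetch('QUERY_STRING', '')).to_h
  term = ERB::Util.html_escape(params.fetch('term', ''))   # untrusted input
  page = params.fetch('page', '1').to_i
  body = "<p>Page #{page} of results for #{term}</p>\n"
  # The app generates the entire response: headers, a blank line, then content.
  "Content-Type: text/html\r\n" \
    "Content-Length: #{body.bytesize}\r\n" \
    "\r\n" \
    "#{body}"
end

print cgi_response(ENV) if __FILE__ == $PROGRAM_NAME
```

Spawning a fresh process like this for every request is costly. That cost is why the plug-in modules mentioned above keep a long-running interpreter loaded instead.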
3-Tier Deployment
• HTTP server (“web server”)
– “fat” (e.g. Apache): support virtual hosts, plugins for multiple languages, URL rewriting, reverse proxying, ....
– “thin” (nginx, thin, Tomcat, ...): bare-bones machinery to support one language/framework; no frills
• App server
1. separate server process, front-ended by a “thin” HTTP server
2. or linked to an Apache worker via FastCGI or web server plug-in: mod_perl, mod_php, mod_rails, ...
– Apache can spawn/quiesce/reap independent processes
• Persistent storage
– most commonly an RDBMS (MySQL, PostgreSQL, etc.)
– communicate w/app via proprietary or standardized database “connector” (ODBC, JDBC, ...)
• Hence LAMP: Linux, Apache, MySQL, PHP/Perl
Frameworks
• Support for more languages: Apache modules (mod_perl, mod_php, mod_rails ...)
– avoid spawning a new process per request
– typically embed the language interpreter in Apache
• Support for common idioms like sessions
– Cookie management
– virtualize the connection to the database
– “dispatcher” interactions with the front-end HTTP server
• Early “templating systems” (e.g. PHP) vs. modern “full stack frameworks” (e.g. Rails)
Example: Rails, a Ruby-based Model/View/Controller Framework
[Figure: Firefox talks to Apache, which hands requests (via CGI or other dispatching) to your app in the Ruby interpreter; Rails routing => controllers/*.rb; models/*.rb <=> tables in a relational database (MySQL or SQLite3); Rails rendering => views/*.html.erb]
Model, View, Controller
• Models: subclasses of ActiveRecord::Base
• Views: templates rendered by ActionView
• Controllers: subclasses of ApplicationController
• Implemented almost entirely in Ruby
• Distributed as a Ruby “gem” (collection of related libraries & tools)
• Connectors for most popular databases
A trip through a Rails app
1. Declarative routes map URLs to actions (methods in a class) and unmarshal parameters from the URL or form
2. Actions can set variables that are visible to views
3. Every controller action eventually renders something:
– HTML page: view template with variables expanded
– Response to AJAX request
– Error page
Example request: http://.../foo/my_action?x=Howdy
• routes.rb => app/controllers/foo_controller.rb:
def my_action
  @var = params[:x]
end
• => app/views/foo/my_action.html.erb:
<p> Hey, <%= @var %></p>
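The three steps above (route, action, render) can be imitated in a few lines of plain Ruby. This toy is NOT the Rails API. `ROUTES`, `VIEWS`, `dispatch`, and `FooController#render` are hypothetical stand-ins. One simplification to note: real Rails copies the controller's instance variables into a separate view object, while this sketch evaluates the template directly in the controller's binding.

```ruby
require 'erb'
require 'uri'

# Toy stand-in for a Rails controller.
class FooController
  attr_reader :params

  def initialize(params)
    @params = params
  end

  # The action sets an instance variable for the view to use.
  def my_action
    @var = params[:x]
  end

  # Simplified rendering: evaluate the template in this controller's binding,
  # so @var set by the action is visible inside the template.
  def render(template)
    ERB.new(template).result(binding)
  end
end

# Toy views keyed like app/views/<controller>/<action>; toy routes table.
VIEWS  = { 'foo/my_action' => '<p> Hey, <%= ERB::Util.html_escape(@var) %></p>' }
ROUTES = { '/foo/my_action' => [FooController, :my_action] }

def dispatch(url)
  uri = URI(url)
  controller_class, action = ROUTES.fetch(uri.path)                          # 1. route
  params = URI.decode_www_form(uri.query.to_s).to_h { |k, v| [k.to_sym, v] } # unmarshal
  controller = controller_class.new(params)
  controller.public_send(action)                                             # 2. action
  view_dir = controller_class.name.sub(/Controller\z/, '').downcase
  controller.render(VIEWS.fetch("#{view_dir}/#{action}"))                    # 3. render
end

puts dispatch('http://example.com/foo/my_action?x=Howdy')   # prints: <p> Hey, Howdy</p>
```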
ActiveRecord, an object-relational mapping layer
class User < ActiveRecord::Base
end
• table name inferred from class name (User => users)
• columns introspected from database
• example of convention over configuration
# To find by column values:
armando = User.find_by_name('fox')
armando = User.find_by_name_and_birthdate('fox', Date.parse('May 12, 1968'))
armando.birthdate = Date.parse('June 6, 1969')
armando.save!
# To find only a few, and sort by an attribute
old_guys = User.find(:all,
  :conditions => ["birthdate < ?", Date.new(1980, 1, 1)],
  :order => "birthdate")
users table: id*, name, birthdate (* = primary key)
(The ? placeholder in :conditions protects from SQL injection attacks)
ActiveRecord Associations
Tables:
• users: id*, name, description
• pics: id*, user_id**, filename
(* = primary key, ** = foreign key)
SELECT * FROM users u JOIN pics p ON u.id = p.user_id;
class User < ActiveRecord::Base
  has_many :pics
end
class Pic < ActiveRecord::Base
  belongs_to :user
end
thisuser.pics << Pic.new(...)
Pic.find(:all).sort_by { |p| p.user.birthdate }   # all pics, ordered by owner's birthdate
Multiple joins
user has_many :memberships; has_many :groups, :through => :memberships
group has_many :memberships; has_many :users, :through => :memberships
membership belongs_to :user; belongs_to :group
• Can now write user.groups, group.users, etc.
• Separates relationships from storage schema
Tables:
• memberships: user_id**, group_id**, status
• groups: id*, name, topic
• users: id*, name, description
Rails & Security
• Application-based attacks on Web 2.0 apps
– SQL injection (defense: sanitize untrusted user input, e.g. with bound query parameters)
– Cross-site request forgery (defense: include a session authentication token in forms)
– Cross-site scripting (defense: escape untrusted content before rendering it)
– Good frameworks help protect against these
• Infrastructure-based attacks (DDoS, etc.)
– Your deployment provider matters (more on this later)
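The session-token defense against cross-site request forgery works as follows. The server stores a random token in the session and embeds it in each form it renders. A state-changing request is accepted only if it echoes that token back. Another site's page cannot read the token, so it cannot forge it. This is a minimal sketch in which a plain Hash stands in for the framework's session store, which is an assumption. Rails offers built-in support through `protect_from_forgery`.

```ruby
require 'securerandom'

# Return the session's CSRF token, creating one on first use.
def csrf_token!(session)
  session[:csrf_token] ||= SecureRandom.hex(32)
end

# Accept a submission only if it carries the session's token.
def valid_csrf?(session, submitted)
  expected = session[:csrf_token]
  return false unless expected.is_a?(String) && submitted.is_a?(String)
  return false unless expected.bytesize == submitted.bytesize
  # Constant-time comparison, so timing does not leak a partial match.
  expected.bytes.zip(submitted.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
end

session = {}
hidden_field = %(<input type="hidden" name="authenticity_token" value="#{csrf_token!(session)}"/>)
```

The field name in `hidden_field` is illustrative only.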
What’s new in Web 2.0?
• Primitive UI => Rich UI
– enable “desktop-like” interactive Web apps
– enable browser as universal app platform on cell phones
• “Mass customize” to consumer => Social computing
– tagging (Digg), collaborative filtering (Amazon reviews), etc. => primary value from users & their social networks
– write-heavy workloads (Web 1.0 was read-mostly)
– lots of short writes with hard-to-capture locality (hard to shard)
• Libraries => Service-oriented architecture
– Integrate power of other sites with your own (e.g. mashups that exploit Google Maps; Google Checkout shopping cart/payment)
– Pay-as-you-go democratization of “services are king”
– Focus on your core innovation
• Buy & rack => Pay-as-you-go Cloud Computing
Rich Internet Apps (RIAs)
• Closing the gap between desktop & Web
– Highly responsive UIs that don’t require a server roundtrip per action
– More flexible drawing/rendering facilities (e.g. sprite-based animation)
– Implies sophisticated client-side programmability
– Local storage, so apps can function when disconnected
• Early example: Google Docs + Google Gears
– includes offline support, local storage, support for video, support for arbitrary drawing, ...
• Currently many technologies: Google Gears, Flash, MS Silverlight...
– client interpreter must be embedded in browser (plugin, extension, etc.)
– typically has access to low-level browser state => new security issues
– N choices for framework * M browsers = N*M security headaches
• Proposed HTML5 may obsolete some of these
Rich UI with AJAX (Asynchronous JavaScript and XML)
• Web 1.0 GUI: click => page reload
• Web 2.0: click => page can update in place
– also timer-based interactions, drag-and-drop, animations, etc.
How is this done?
1. Document Object Model (c.1998, W3C) represents the document as a hierarchy of elements
2. JavaScript (c.1995; now ECMAScript) makes the DOM available programmatically, allowing modification of page elements after the page has loaded
3. XMLHttpRequest (c.2000) allows async HTTP transactions decoupled from page reload
4. JavaScript libraries (jQuery, Prototype, script.aculo.us) encapsulate useful abstractions
DOM & JavaScript: Document = tree of objects
• hierarchical object model representing an HTML or XML doc
• Exposed to JavaScript interpreter
– Inspect DOM element value/attribs
– Change value/attribs => redisplay or fetch new content from server
• Every element can be given a unique ID
• JavaScript code can walk the DOM tree or select specific nodes via provided methods
<input type="text" name="phone_number" id="phone_number"/>
<script type="text/javascript">
  var phone = document.getElementById('phone_number');
  phone.value = '555-1212';
  phone.disabled = true;
  document.images[0].src = "http://.../some_other_image.jpg";
</script>
JavaScript
• A browser-embedded scripting language
– OOP: objects (prototype-based), first-class functions, closures
– dynamic: dynamic types, code generation at runtime
– JS code can be embedded inline into the document...
<script type="text/javascript">
<!-- protect older browsers
calculate = function() { ... }
// -->
</script>
– ...or referenced remotely: <script src="http://evil.com/Pwn.js"></script>
• Current page DOM available via window, document objects
– Handlers (callbacks) for UI & timer events can be attached to JS code, either inline or by function name: onClick, onMouseOver, ...
• Changing attributes/values of DOM elements has side effects, e.g.:
<a href="#" onClick="this.innerHTML='Presto!'">Click me</a>
AJAX == Asynchronous JavaScript And XML
• Recipe:
– attach JS handlers to events on DOM objects
– in the handler, inspect/modify DOM elements and optionally do an asynchronous HTTP request to the server
– register a callback to receive the server response
– the response callback modifies the DOM using server-provided info
• JavaScript as a target language
– Google Web Toolkit (GWT): compile Java => emit JS
– Rails: runtime code generation ties app abstractions to JS
JavaScript example for AJAX
var r = new XMLHttpRequest();
r.open("GET", "http://www.example.com", true);  // last arg true: script should not block (important!)
r.onreadystatechange = function() { ... };      // callback during XHR processing
r.send(request_content);                        // e.g., form fields for a POST; null for a GET
• In the callback:
– inspect r.readyState: 0 (unsent), 1 (opened), 2 (headers received), 3 (loading), 4 (done)
– r.status contains the HTTP status of the response
– r.responseText contains the response content
• Libraries like jQuery and Prototype abstract this and provide some cross-browser support
Example: AJAX via Rails
• Embedded Ruby code in HTML template:
link_to_remote('Show article',
  :update => 'article_content',
  :url => { :action => 'get_article_text', :id => article },
  :before => "Element.show('spinner')",
  :loading => "Element.hide('spinner'); Element.show('stopwatch')",
  :success => "Element.hide('stopwatch')",
  404 => "alert('Article text not found!')",
  :failure => "alert('Some other error')")
• Delivered page contains JS that embeds calls to Prototype, defines and dispatches to callback handlers, etc.
• Simple auto-completion handler:
observe_field('student[last_name]',
  :url => { :controller => 'students', :action => 'lookup_by_lastname' },
  :update => 'lastname_completions')
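On the server side, an action like `get_article_text` often has to serve two kinds of request: a full page for an ordinary click, and just the fragment that `:update => 'article_content'` will swap in place for an AJAX request. This sketch assumes the common convention that the JavaScript library marks XHRs with the header `X-Requested-With: XMLHttpRequest`. Prototype and jQuery follow this convention, and Rails exposes the check as `request.xhr?`. The function name is hypothetical.

```ruby
require 'erb'

# Return only the fragment for an AJAX update, or a full page otherwise.
# request_headers is assumed to be a Hash with header names as sent.
def article_response(request_headers, article_text)
  content = ERB::Util.html_escape(article_text)
  if request_headers['X-Requested-With'] == 'XMLHttpRequest'
    content   # becomes the new innerHTML of #article_content
  else
    %(<html><body><div id="article_content">#{content}</div></body></html>)
  end
end
```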
Sidebar: It’s Tough Being a Browser
• Users now expect “Web apps” to include animation, sound, 3D graphics, disconnected operation, responsive GUI...
– Browser =~ new OS: manage mutually-untrusting apps (sites)
[Figure source: Robert O’Callahan (Mozilla.org), Inside Firefox]
Social Computing
• Web 1.0: add value via mass customization
– select content/presentation for you based on best guesses about your interests
– resource: demographic/analytic data about you
• Web 2.0: add value via connecting to social network
– vendor: your friends’ interests are a good indicator of your interests
– user: value added to existing content == how your friends interact with it
– resource: your social network
• From social networking site to social network as a way of structuring applications
Social Computing
• Amount of content “created” by each user is small!
– e.g., Digg an article, rate a video, play a Facebook game
• but it still creates lots of short random writes
– consider the “Like” feature on Facebook
– social graphs are naturally hard to partition (though I would love to see a paper about this from FB)
• question for Web 2.0 developers is not whether social computing is part of your app, but how
• later we will discuss technical architecture of “connecting” an app to social networks
SOA
Amazon.com: Web 1.0 SOA
• ~50 “two-pizza” teams of “developer/operators”
• ~10 operators
– monitor the whole site
– page the resolvers on alarm
• ~1000 resolvers
– 10-15 per team, 1 on-call 24x7
– monitor own service, fix problems
• Over 140 code change commits/month
• Internal microcosm of service-oriented architecture (as were Yahoo, Google, others)
[Figure: pools of web servers in front of services A, B, and C, each backed by its own DB]
P. Bodík et al., Advanced Tools for Operators at Amazon.com, Proc. ICAC 2005
What is SOA?
• Use other services as RPC servers for your app
• Web 1.0: large sites organized this way internally
– Yahoo!, Amazon, Google, ...
– External “services” available, but getting them is high-touch: DoubleClick ads, Akamai content distribution
• Web 2.0: consumer-facing service APIs, typically pay-as-you-go (vs. contractual)
– Services: Google AdSense, Google Analytics, Amazon CloudFront...
– Platforms: Facebook, Google Maps, ...
– Mashups, e.g. housingmaps.com
– User-composable services, e.g. Yahoo Pipes
SOA == RPC
• Transport: HTTP(S)
• Data interchange: XML DTD (e.g., RSS), JSON
• Request protocol:
– SOAP (Simple Object Access Protocol)
– JSON-RPC
• On the horizon: WebHooks (HTTP POST callback, for “push”)
JSON-RPC
• Open connection to designated port on server
• Send HTTP method & request URI, with MIME type of body set to application/json
• Then send request body:
{
  "version": "1.1",
  "method": "confirmFruitPurchase",
  "id": "194521489",
  "params": [ [ "apple", "orange", "pear" ], 1.123 ]
}
• Response might be something like this:
{
  "version": "1.1",
  "result": "done",
  "error": null,
  "id": "194521489"
}
• You have to handle substantially all errors
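With Ruby's `json` library, building the request body above and checking the reply comes down to a few lines. The method name, params, and id are taken from the example. The helper names are hypothetical. The actual HTTP POST with `Content-Type: application/json` is left out; it could be sent with, e.g., `Net::HTTP`. The reply check follows the "handle substantially all errors" point: verify that the id matches and that `error` is null before trusting `result`.

```ruby
require 'json'

# Build a JSON-RPC 1.1-style request body like the one above.
def rpc_request(method, params, id)
  JSON.generate('version' => '1.1', 'method' => method, 'id' => id, 'params' => params)
end

# Parse a reply; raise unless it answers our request and reports no error.
def rpc_result(response_body, expected_id)
  resp = JSON.parse(response_body)
  raise "id mismatch: #{resp['id'].inspect}" unless resp['id'] == expected_id
  raise "RPC error: #{resp['error'].inspect}" unless resp['error'].nil?
  resp['result']
end

body = rpc_request('confirmFruitPurchase', [%w[apple orange pear], 1.123], '194521489')
reply = '{"version": "1.1", "result": "done", "error": null, "id": "194521489"}'
puts rpc_result(reply, '194521489')   # prints: done
```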
RSS
• Request is a regular HTTP GET to a specified URL
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Altarena Playhouse Ticket Availability</title>
    <link>http://www.audience1st.com/altarena/store</link>
    <description>Altarena Ticket Availability</description>
    <item>
      <title>Sylvia - Friday, May 14, 8:00 PM – Buy now</title>
      <link>http://.../store?showdate_id=347</link>
      <guid isPermaLink="false">http://.../store?ts=1271058414</guid>
    </item>
    <item> ... </item>
  </channel>
</rss>
AJAX and SOA
• AJAX: client => server
– client makes (async) requests to HTTP server
– client-side JavaScript upcall receives the reply and decides what to do
– commonly, the response includes XHTML/XML to update the page, or JavaScript to execute
– doesn’t really make sense except in the context of a client
• SOA: server => server or client => server
– one principal makes (sync or async) requests to an HTTP server
– formerly, the principal was a server running some app
– today, powerful JavaScript clients blur the line
• Facebook plug-in apps
• Facebook platform (“Facebook Connect”)
[Figure: message flows between the browser, Facebook.com, and your app in two styles labeled "AJAX" and "SOA"; labels include FB data, FBQL, html, html+xfbml, REST, "REST via JavaScript & XFBML", and "HTML IFRAME w/FB content"]
Google Maps
• Your app embeds JavaScript-heavy client code (provided by Google)
– client-side functionality: clear/draw overlays, etc.
– server-side functionality: fetch new map, rescale, geocoding
• Attach callbacks (handled by your app) to UI actions
• Result of callback can trigger additional calls to Google Maps code, which in turn contacts GMaps servers
[Figure: your app serves html+js to the browser, which then interacts with Google Maps; steps 1-4]
Mashups: housingmaps.com
Two ways to do it...
[Figure: "thin" browser client vs. "fat" browser client; components: Web 2.0 app, Craigslist.org, Google Maps]
+ Client portability
+/– Client performance (both app download & JavaScript execution)
+ Availability of utility libraries for app development
– Privacy/trustworthiness of aggregator app
– Caching
REST (Representational State Transfer) Philosophy
• Architectural style (not a standard per se):
– Client-server, stateless, cacheability indicated
– a/k/a post-hoc description of properties that made Web 1.0 successful by constraining SOA interactions
• In the context of SOA for Web 2.0
– HTTP is the transport; HTTP methods (GET, PUT, etc.) are the only commands
– Reify the idea that a URI names a resource (broadly...)
– A client holding a representation of a resource has enough info to request modification of that resource on the server
– cookie can encode part of the transferred state
• If your app is RESTful, it’s easy to “SOA”-ify
REST with HTTP examples
• Collection URI, such as http://example.com/customers/257/orders
– HTTP GET: list the members of the collection, complete with their member URIs for further navigation
– HTTP PUT: replace the entire collection with another collection
– HTTP POST: create a new entry in the collection; the ID created is usually included as part of the data returned by this operation
– HTTP DELETE: delete the entire collection
• Element URI, such as http://example.com/resources/7HOU57Y
– HTTP GET: retrieve a representation of the addressed member of the collection in an appropriate MIME type
– HTTP PUT: update (or create) the addressed member of the collection
– HTTP POST: treat the addressed member as a collection in its own right and create a new subordinate of it
– HTTP DELETE: delete the addressed member of the collection
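The table above amounts to a dispatch on (HTTP method, URI). The toy in-memory resource below implements it without any HTTP server: `handle` returns `[status, body]`. Assumptions, all hypothetical: the class name, numeric member IDs under the collection URI (the table's element example uses a different URI), and the status codes chosen. Element-level POST ("create a subordinate") is left out and returns 405.

```ruby
class OrdersResource
  COLLECTION = '/customers/257/orders'
  ELEMENT = %r{\A#{Regexp.escape(COLLECTION)}/(\d+)\z}

  def initialize
    @items = {}      # id => representation
    @next_id = 1
  end

  # Route on the URI first, then on the HTTP method.
  def handle(verb, path, body = nil)
    if path == COLLECTION
      collection(verb, body)
    elsif (m = ELEMENT.match(path))
      element(verb, m[1].to_i, body)
    else
      [404, nil]
    end
  end

  private

  def uri_for(id)
    "#{COLLECTION}/#{id}"
  end

  def add(item)
    id = @next_id
    @next_id += 1
    @items[id] = item
    id
  end

  def collection(verb, body)
    case verb
    when 'GET'                      # list member URIs
      [200, @items.keys.map { |id| uri_for(id) }]
    when 'POST'                     # create a new entry, return its URI
      [201, uri_for(add(body))]
    when 'PUT'                      # replace the entire collection
      @items.clear
      (body || []).each { |item| add(item) }
      [200, @items.keys.map { |id| uri_for(id) }]
    when 'DELETE'                   # delete the entire collection
      @items.clear
      [204, nil]
    else
      [405, nil]
    end
  end

  def element(verb, id, body)
    case verb
    when 'GET'                      # retrieve one member
      @items.key?(id) ? [200, @items[id]] : [404, nil]
    when 'PUT'                      # update (or create) the member
      status = @items.key?(id) ? 200 : 201
      @items[id] = body
      @next_id = [@next_id, id + 1].max
      [status, body]
    when 'DELETE'                   # delete the member
      return [404, nil] unless @items.key?(id)
      @items.delete(id)
      [204, nil]
    else                            # element-level POST omitted in this sketch
      [405, nil]
    end
  end
end
```

Because every operation is named by the URI plus the method, a client needs no extra RPC vocabulary. This is the sense in which a RESTful app is easy to "SOA"-ify.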
AGILE DEVELOPMENT & WEB 2.0
New models of software development
• DADO process: supports evolution, 1 group
• Waterfall: static handoff model, N groups
[Figure: DADO as a cycle (Develop, Assess, Deploy, Operate) vs. waterfall as a one-way sequence Develop => Assess => Deploy => Operate]
Why is this here?
• For many, a new way to develop software
• Highly productive: undergraduates produce complete working apps, with tests, in weeks
• Great structural fit for Web 2.0 applications
• Amazingly good tools: “make it fun” is just as important for testing as for development
(Short) History of Software Engineering
• “1/3 of software development projects fail or are abandoned outright because of cost overruns, delays, and reduced functionality”
• IRS Tax Modernization System:
– “The IRS must recognize that technology is an enabler, not a driver, of business success, and that it needs a strategic plan with business objectives that drive the use of technology.” (House Commission on Restructuring the IRS, 1997 report)
• Denver Airport Baggage Handling System
– 1.5-year delay, $1M/day during modifications/repairs, ultimately abandoned 10 years later (source: Wikipedia)
• Software Development Failures: Anatomy of Abandoned Projects, K. Ewusi-Mensah, 2003
“Big Design Up Front”
• Started with an elaborate, detailed specification of what the customer wants
– 100s of pages
• Problem: customers may change their minds
– change wrecks the schedule in unpredictable ways
– some use cases may have been forgotten or misrepresented
• But change is inevitable
– “If a problem has no solution, it may not be a problem, but a fact; not to be solved, but to be coped with over time” (Shimon Peres, Israeli foreign minister)
Agile Development
Big Design Up Front
• Time, resources, and scope “fixed”
• Changing one affects the others, as well as quality
• Manage the plan
• Try to minimize change
Agile
• Time, resources, and quality fixed
• Changing time or resources affects scope
• Manage the priorities
• Change as you learn more
Agile methods break tasks into small increments with minimal planning, and do not directly involve long-term planning. Each iteration involves a team working through a full SW development cycle including planning, requirements analysis, design, coding, unit testing, and acceptance testing when a working product is demonstrated to stakeholders. This helps minimize overall risk, and lets the project adapt to changes quickly. An iteration may not add enough functionality to warrant a market release, but the goal is to have an available release (with minimal bugs) at the end of each iteration. Multiple iterations may be required to release a product or new features.
Test-Driven/Behavior-Driven Development
• Behavior driven: start from behaviors, and behavior spec == acceptance test
– Start from user behavior by writing the code you wish you had (results in a better API than top-down design)
– Script the tests you’d single-step manually
– when done, get an automatable integration/acceptance test for free
• Test driven: write tests first
– debugging, testing, and isolating bugs all need modular code; writing the test first ensures code is modular/debuggable
User Stories for Acceptance/Integration Testing
• A story from the user perspective that provides business value to a stakeholder and is testable
• As a [type of stakeholder]
I want to [perform some task]
so that I can [reach some goal]
– Complete Web app has 100’s or 1000’s of stories
– Long stories (“epics”) broken down to smaller chunks
• Development proceeds in fixed-period iterations (typically 2 weeks)
– Each story small enough to implement in 1 iteration
– Developer estimates difficulty (points) to implement
– “Deliver” (release) N new points/iteration (velocity)
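Velocity turns point estimates into a schedule forecast. Average the points actually delivered in recent iterations, then divide the remaining backlog by that average. In this sketch, averaging over the last 3 iterations is an assumption (tools differ), and the numbers in the usage are made up.

```ruby
# Average points delivered over the most recent `window` iterations.
def velocity(points_per_iteration, window: 3)
  recent = points_per_iteration.last(window)
  raise ArgumentError, 'no completed iterations' if recent.empty?
  recent.sum.fdiv(recent.size)
end

# Whole iterations needed to deliver the remaining backlog at that velocity.
def iterations_remaining(backlog_points, velocity)
  raise ArgumentError, 'velocity must be positive' unless velocity.positive?
  (backlog_points / velocity).ceil
end

v = velocity([8, 12, 10, 11])       # last 3: (12 + 10 + 11) / 3 = 11.0
puts iterations_remaining(50, v)    # 50 / 11.0 = 4.5..., rounded up: 5 (10 weeks at 2 weeks each)
```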
A Feature Comprises Several User Stories
Feature: Subscriber purchases additional tickets
  As a season subscriber
  I want to go to the Store page
  So that I can buy discounted tickets for a show

  Scenario: Subscriber logs in
    Given I am logged in as a subscriber
    When I visit the "Store" page
    Then I should see the Subscriber message

  Scenario: Subscriber offered discount ticket price
    Given I am on the "Store" page
    And there are upcoming performances of "Chicago"
    When I select the show "Chicago"
    Then "Subscriber Discount" should appear in the "Ticket Prices" menu
(Parts of a feature: 1. title, 2. narrative, 3. user stories)
Rails Testing Ecosystem
• Unit testing: RSpec (based on Java Spec)
– more expressive, and Ruby-specific
– extensive support for isolation (mocking & stubbing) by exploiting Ruby dynamic language features
• Integration/acceptance testing: Cucumber
– can be used for non-Ruby systems
– bridges user stories and integration tests
• Cucumber on Rails
– Web browser interactions: use Webrat or Selenium to emulate or script browser interactions, incl. JavaScript
– (Optional) Use RSpec facilities to set up preconditions, check postconditions of tests
Given...
• Regular expressions match scenario text to test code
• “Steps” implement Given, When, Then
• Given: set up preconditions either directly or via Webrat/Selenium
Given /^I am logged in as a subscriber$/ do
  visit '/customers/login'
  @customer = customers(:tom_the_subscriber)
  fill_in 'customer_login', :with => @customer.login
  fill_in 'customer_password', :with => @customer.pass
  click_button 'Login'
  response.should match(/Login successful/)
end
When... Then...
• When: use Webrat or Selenium to emulate browser or drive a real browser
• Then: use RSpec (unit test) facilities to check outcome (should, should_receive, etc.)
When /^I visit the "Store" page$/i do
  visit '/store'
end
Then /^I should see the (.*) message$/ do |msg|
  response.should have_selector("div.#{msg}")
  response.should match(Regexp.new("Welcome,.*" + @customer.first_name))
end
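The binding between scenario text and step definitions is ordinary Ruby. Each step pairs a regexp with a block, and the regexp's capture groups become the block's arguments. The toy registry below shows the mechanism. It is not Cucumber's implementation. `StepRegistry` is hypothetical, and the step bodies here just record or return values.

```ruby
class StepRegistry
  Step = Struct.new(:regexp, :block)

  def initialize
    @steps = []
  end

  # Register a step definition: a regexp plus the code to run.
  def define(regexp, &block)
    @steps << Step.new(regexp, block)
  end

  # Run one scenario line: drop the Gherkin keyword, find exactly one
  # matching step, and pass its capture groups to the block.
  def run(line)
    text = line.strip.sub(/\A(?:Given|When|Then|And|But)\s+/, '')
    matches = @steps.filter_map { |step| (md = step.regexp.match(text)) && [step, md] }
    raise "Undefined step: #{text}" if matches.empty?
    raise "Ambiguous step: #{text}" if matches.size > 1
    step, md = matches.first
    step.block.call(*md.captures)
  end
end

steps = StepRegistry.new
visited = []
steps.define(/^I visit the "(.*)" page$/i) { |page| visited << page }
steps.define(/^I should see the (.*) message$/) { |msg| "div.#{msg}" }
steps.run('When I visit the "Store" page')                 # visited is now ["Store"]
steps.run('Then I should see the Subscriber message')      # returns "div.Subscriber"
```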
Expectations example
describe "transferring a ticket" do
  context "when recipient doesn't exist" do
    # Preconditions before each test
    before(:each) do
      @t = Ticket.new(...)
      @from = Customer.find(:first)
      @from.tickets << @t
      @from.save!
      @to = create_nonexistent_customer_id()
    end
    # Test case 1
    it "should not cause an error" do
      lambda { @t.transfer_to_customer(@to) }.should_not raise_error
    end
    # Test case 2
    it "should not remove from original owner" do
      @t.transfer_to_customer(@to)
      @from.tickets.should include(@t)
    end
  end
end
Expectations example
describe "successful purchase" do
  it "should contact the payment gateway" do
    Store.should_receive(:pay_via_gateway).
      with(@amount, @credit_card, @params).
      exactly(1).times.and_return(@success)
    Store.purchase!(...)
  end
end
• Expectation modifiers: at_least(n).times, any_number_of_times
• Argument modifiers: with(:any_args), with()
• Return value modifiers: and_return(val)
• Ruby dynamic language features are used to implement this test scaffolding
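Two Ruby features make mock expectations like `should_receive` possible. `define_singleton_method` replaces a method on one object at runtime, and modifier methods that return `self` give the `with(...).exactly(n).times` chaining style. The toy below uses both. It is not RSpec's implementation: `ToyExpectation` and the `Store` demo class are hypothetical, and only the `with`, `exactly`, `times`, and `and_return` modifiers are imitated.

```ruby
class ToyExpectation
  def initialize(target, method_name)
    @method_name = method_name
    @calls = []
    @expected_args = nil
    @expected_count = nil
    @return_value = nil
    expectation = self
    # Replace the method on this one object only, at runtime.
    target.define_singleton_method(method_name) do |*args|
      expectation.record(args)
    end
  end

  # Chainable modifiers: each returns self.
  def with(*args)
    @expected_args = args
    self
  end

  def exactly(n)
    @expected_count = n
    self
  end

  def times
    self
  end

  def and_return(value)
    @return_value = value
    self
  end

  # Called by the replaced method: remember the call, return the canned value.
  def record(args)
    @calls << args
    @return_value
  end

  # Check the expectation after the code under test has run.
  def verify!
    if @expected_count && @calls.size != @expected_count
      raise "#{@method_name}: expected #{@expected_count} call(s), got #{@calls.size}"
    end
    if @expected_args && @calls.any? { |args| args != @expected_args }
      raise "#{@method_name}: unexpected arguments #{@calls.inspect}"
    end
    true
  end
end

# Hypothetical class under test, echoing the Store.purchase! example above.
class Store
  def self.purchase!(amount, card)
    pay_via_gateway(amount, card) == :success
  end
end

expectation = ToyExpectation.new(Store, :pay_via_gateway)
                .with(10.0, 'card-123').exactly(1).times.and_return(:success)
Store.purchase!(10.0, 'card-123')   # uses the stubbed gateway
expectation.verify!                 # passes: one call, expected arguments
```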
Outside-In Development: Red/Green/Refactor
For each step in user story
  1. Write the step definition
  2. Run & watch it fail
  For each behavior of underlying objects/models
    – Write unit test (expectation)
    3. Watch it fail
    4. Implement just enough to pass
    5. Refactor if needed
  6. Watch user story step pass
  7. Refactor step(s) if needed
Tracking Progress with PivotalTracker.com
Summary: Agile & Behavior-Driven Development
• Agile, iteration-based process based on user stories
• Planning, coding, testing proceed as a cycle by 1 person
• Test-first promotes modularity, debuggability, and a concrete measure of progress
• Attention to productivity in testing tools as well as dev tools
– Student projects in Berkeley SaaS class: ~50% of LOC were testing-related
DEPLOYMENT
Scaling via Replication
• The “most general” deployment scenario for a 3-tier Web app
  – Many Web servers, possibly including static-asset servers
  – L4/L7 load balancers distribute load among them
• Caches and reverse proxies remember previously computed content
  – whole-page caching
  – page-fragment caching, query caching
  – Apache in reverse-proxy mode, or memcached process(es) addressed by the app server
• Integration of caching with app logic varies by framework
[Diagram: the Internet, load balancers (LB), Web servers (WS), caches ($), an asset server (AssetSvr), app servers (App), and databases (DB?)]
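The caching bullets describe the cache-aside pattern: look the key up first, and only on a miss do the expensive work (render the fragment, run the query) and store the result. A minimal sketch, assuming an in-process Hash with per-entry expiry stands in for memcached; `FragmentCache` and its methods are illustrative names, not a framework API.

```ruby
# Cache-aside sketch: a Hash with per-entry expiry stands in for memcached.
class FragmentCache
  Entry = Struct.new(:value, :expires_at)

  attr_reader :hits, :misses

  # clock is injectable so expiry can be tested without sleeping.
  def initialize(ttl_seconds, clock: -> { Time.now })
    @ttl = ttl_seconds
    @clock = clock
    @store = {}
    @hits = 0
    @misses = 0
  end

  # Return the cached value for key if still fresh; otherwise compute it
  # with the block (e.g. render a page fragment or run a query) and keep it.
  def fetch(key)
    entry = @store[key]
    if entry && entry.expires_at > @clock.call
      @hits += 1
      return entry.value
    end
    @misses += 1
    value = yield
    @store[key] = Entry.new(value, @clock.call + @ttl)
    value
  end

  # Drop an entry when the underlying data changes.
  def invalidate(key)
    @store.delete(key)
  end
end

db_queries = 0
cache = FragmentCache.new(60)
2.times { cache.fetch("sidebar/user/42") { db_queries += 1; "<ul>...</ul>" } }
puts "database queries: #{db_queries}" # the second request is served from the cache
```

Invalidating on writes is what keeps cached fragments consistent with the database, and it is where integration between the cache and the framework's model layer matters.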
“Scale makes availability affordable”
• Goal: interchangeability (send any user request to any available server)
  – each server handles 1/N of the load
  – affinity can be used to “soft-pin” users to particular servers
  – requires good support for a session-state abstraction in the app framework
• Lose 1 server => lose 1/N of capacity
  – load balancers have logic to detect failed servers & remove them from rotation until they are resurrected
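The interchangeability goal and the load balancer's failure handling can be sketched in a few lines. Everything here is hypothetical (`ToyBalancer`, the server names, and the CRC32-based choice of a user's "home" server): users are soft-pinned to one server while it is healthy, failed servers drop out of rotation until marked up again, and with N servers, losing one removes 1/N of capacity.

```ruby
require "zlib"

# Hypothetical load balancer showing round robin, soft affinity, and
# removal of failed servers from rotation.
class ToyBalancer
  def initialize(servers)
    @servers = servers
    @down = {}  # servers currently out of rotation
    @next = 0   # round-robin cursor
  end

  # A health checker would call these when a server stops/starts responding.
  def mark_down(server)
    @down[server] = true
  end

  def mark_up(server)
    @down.delete(server)
  end

  def healthy
    @servers.reject { |s| @down[s] }
  end

  # Soft affinity: a request with a user id goes to that user's "home" server
  # (chosen by a stable hash of the id) while it is up; otherwise, and for
  # anonymous requests, fall back to round robin over healthy servers.
  def route(user_id = nil)
    candidates = healthy
    raise "no healthy servers" if candidates.empty?
    if user_id
      home = @servers[Zlib.crc32(user_id.to_s) % @servers.size]
      return home unless @down[home]
    end
    server = candidates[@next % candidates.size]
    @next += 1
    server
  end
end

lb = ToyBalancer.new(%w[app1 app2 app3])
puts lb.route("alice") # alice's home server while it stays healthy
```

Because a pinned user can be moved when their server fails, session state has to live somewhere every server can reach, which is why the slide stresses a session-state abstraction in the framework. Production balancers such as haproxy implement these policies in configuration rather than application code.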
Asset Servers
• For serving static assets (images, sound clips, CSS, etc.)
• Separate Web server process, with a configuration optimized for fast static file serving
• Web 2.0: use Amazon S3 (blob store) or CloudFront (CDN)
  – helps to have a good asset-server abstraction in the app framework
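A good asset-server abstraction means views never hard-code where static files live. A minimal sketch, assuming hypothetical host names like `assets0.example.com`: the helper maps each path to one of several asset hosts, always the same host for the same file, so moving assets to S3 or CloudFront only changes the host template passed in. `AssetUrlHelper` is an illustrative name, not a framework class.

```ruby
require "zlib"

# Hypothetical asset-URL helper: views call asset_url, and the helper decides
# which asset host serves each file.
class AssetUrlHelper
  # host_template must contain %d, replaced by a host index 0...host_count.
  def initialize(host_template, host_count)
    @host_template = host_template
    @host_count = host_count
  end

  def asset_url(path)
    path = "/#{path}" unless path.start_with?("/")
    # Hash the path so a given file always maps to the same host,
    # which keeps browser and CDN caches effective.
    index = Zlib.crc32(path) % @host_count
    "https://#{format(@host_template, index)}#{path}"
  end
end

helper = AssetUrlHelper.new("assets%d.example.com", 4)
puts helper.asset_url("images/logo.png")
```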
Deploying a new release
• Check out new code on production server(s)
• Run database schema migrations, if any
• Quiesce old version, soft-restart new version
• If necessary, temporarily disable access during the quasi-atomic switchover
• Differentiate between asset servers, code servers, database machines
• Be prepared to roll back if any problems arise
• Tools like Capistrano help automate the above steps
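The switchover and rollback steps are commonly implemented (Capistrano uses this layout) with one directory per release and a `current` symlink that the server runs from. The hypothetical sketch below covers only those two steps; checkout, migrations, and restarts are left out, and `ReleaseSwitcher` is an invented name. It builds the new link under a temporary name and renames it over `current`: on POSIX systems `rename` replaces the old link in one step, so the path `current` always resolves to either the old release or the new one.

```ruby
require "fileutils"

# Hypothetical layout: <root>/releases/<name>/ plus <root>/current -> releases/<name>
class ReleaseSwitcher
  def initialize(root)
    @releases = File.join(root, "releases")
    @current = File.join(root, "current")
    FileUtils.mkdir_p(@releases)
  end

  # Create a directory for a new release (a real deploy would check out code here).
  def add_release(name)
    dir = File.join(@releases, name)
    FileUtils.mkdir_p(dir)
    dir
  end

  # Point `current` at the given release: create a temporary symlink, then
  # rename it over `current` so the switch happens in a single step.
  def activate(name)
    target = File.join(@releases, name)
    raise ArgumentError, "no such release: #{name}" unless Dir.exist?(target)
    tmp = "#{@current}.tmp"
    File.unlink(tmp) if File.symlink?(tmp) # leftover from an interrupted deploy
    File.symlink(target, tmp)
    File.rename(tmp, @current)
  end

  def current_release
    File.basename(File.readlink(@current))
  end

  # Roll back by re-pointing `current` at the previous release (by name order).
  def rollback
    names = Dir.children(@releases).sort
    idx = names.index(current_release)
    raise "nothing to roll back to" if idx.nil? || idx.zero?
    activate(names[idx - 1])
  end
end
```

Release names are assumed to be timestamps so that sorting by name gives deployment order. Undoing database migrations is the hard part of a real rollback, which this sketch ignores.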
Deployment scenarios (& approximate pricing)
• Buy/rack/install/configure it yourself... that’s so Web 1.0
• Shared hosting ($3/month)
  – turnkey support for popular frameworks, hosted versions of popular building blocks (e.g. MySQL)
  – highly variable performance, multitenant per machine
• Virtual private host ($10/month)
  – better isolation and security through virtualization
  – substantially more administration
• “Framework VM” or “curated” environments (Heroku, Google AppEngine, Force.com)
  – pricing varies
  – hosted extensions: memcached, profiling, etc.
  – integration of 3rd-party hosted services, e.g. Amazon S3 backup
• Cloud Computing
Pay-as-you-go Cloud Computing
“Instances” (price) | Platform | Cores | Memory | Disk
Small ($0.085/hr) | 32-bit | 1 | 1.7 GB | 160 GB
Large ($0.34/hr) | 64-bit | 4 | 7.5 GB | 850 GB, 2 spindles
XLarge ($0.68/hr) | 64-bit | 8 | 15.0 GB | 1690 GB, 3 spindles
Options: extra memory, extra CPU, extra disk, ...
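To compare these hourly prices with the monthly hosting prices on the previous slide, multiply by the hours in a month. A quick sketch, assuming a 30-day month (720 hours) and ignoring storage and data-transfer charges: an always-on Small instance comes to about $61 per month, Large about $245, and XLarge about $490.

```ruby
HOURS_PER_MONTH = 24 * 30 # assumption: 30-day month, instance never stopped

# Hourly on-demand prices from the table above (US$).
HOURLY_RATES = { "Small" => 0.085, "Large" => 0.34, "XLarge" => 0.68 }.freeze

def monthly_cost(hourly_rate, hours = HOURS_PER_MONTH)
  (hourly_rate * hours).round(2)
end

HOURLY_RATES.each do |name, rate|
  puts format("%-6s $%8.2f per month", name, monthly_cost(rate))
end
```

Pay-as-you-go changes the picture when an instance is not needed all month: `monthly_cost(0.085, 100)` is $8.50 for 100 hours.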
A Berkeley View of Cloud Computing (2/09)
abovetheclouds.cs.berkeley.edu
• Goal: stimulate discussion on what’s new
  – Clarify terminology
  – Quantify comparisons
  – Identify challenges & opportunities
• UC Berkeley perspective
  – industry engagement but no axe to grind
  – users of Cloud Computing since late 2007
• New: pay-as-you-go, utility computing
  – Illusion of infinite resources on demand (minutes)
  – Fine-grained billing: release == don’t pay, no minimum
Cloud Economics 101
• Cloud Computing User: static provisioning for peak is wasteful, but necessary for SLA
[Figure: Demand and Capacity over Time for a “statically provisioned” data center (unused resources marked) vs. a “virtual” data center in the cloud]
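The gap between the demand and capacity curves can be put in numbers with a toy calculation. Assuming a made-up 12-hour demand curve measured in servers, static provisioning pays for the peak every hour, while idealized elastic provisioning pays only for each hour's demand; the difference is the "unused resources" area.

```ruby
# Hypothetical demand curve: servers needed in each hour of a 12-hour period.
DEMAND = [2, 2, 3, 5, 8, 10, 10, 8, 5, 3, 2, 2].freeze

# Static provisioning pays for `peak` servers every hour (default: the true peak).
def static_server_hours(demand, peak = demand.max)
  peak * demand.size
end

# Idealized elasticity: pay for exactly what each hour needs.
def elastic_server_hours(demand)
  demand.sum
end

def utilization(demand, peak = demand.max)
  elastic_server_hours(demand).fdiv(static_server_hours(demand, peak))
end

static = static_server_hours(DEMAND)
elastic = elastic_server_hours(DEMAND)
puts "static: #{static} server-hours, elastic: #{elastic}, unused: #{static - elastic}"
puts "utilization with static provisioning: #{(utilization(DEMAND) * 100).round}%"
```

Here static provisioning buys 120 server-hours for 60 server-hours of demand (50% utilization). If the peak prediction is too optimistic, say 15 servers instead of 10, `static_server_hours(DEMAND, 15)` grows to 180 and utilization falls to one third, the overprovisioning risk discussed below.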
Cloud Economics 101
• Cloud Computing Provider: could save energy
[Figure: Demand and Capacity over Time for a “statically provisioned” data center (unused resources marked) vs. a real data center in the cloud]
Risk of Overprovisioning
• Underutilization results if “peak” predictions are too optimistic
[Figure: Demand and Capacity over Time for a static data center (unused resources marked)]
New Scenarios Enabled by “Risk Transfer” to Cloud
• “Cost associativity” from linear pricing: 1,000 CPUs for 1 hour cost the same as 1 CPU for 1,000 hours (@$0.10/hour)
  – Washington Post converted Hillary Clinton’s travel documents to post on the WWW <1 day after they were released
  – RAD Lab graduate students demonstrate an improved Hadoop (batch job) scheduler on 1,000 servers
• Major enabler for SaaS startups
  – Animoto traffic doubled every 12 hours for 3 days when released as a Facebook plug-in
  – Scaled from 50 to >3500 servers
  – ...then scaled back down
• Goal: fix any transient problem by adding/removing nodes
  – Single-node performance becomes much less important
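Cost associativity is just linear pricing: cost = CPUs × hours × rate, so only the product of CPUs and hours matters, and finishing a job 1,000 times sooner costs nothing extra. A short check at the slide's illustrative $0.10 per CPU-hour (`compute_cost` is a hypothetical helper):

```ruby
RATE_PER_CPU_HOUR = 0.10 # US$, the slide's illustrative price

# With linear pricing, cost depends only on the product cpus * hours.
def compute_cost(cpus, hours, rate = RATE_PER_CPU_HOUR)
  cpus * hours * rate
end

wide = compute_cost(1000, 1)  # 1,000 CPUs for 1 hour
long = compute_cost(1, 1000)  # 1 CPU for 1,000 hours
puts format("wide: $%.2f, long: $%.2f", wide, long)
```

Both cost $100. The catch, assumed away here, is that the job must actually parallelize across 1,000 machines.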
Classifying Clouds for Web 2.0
• Instruction Set VM (Amazon EC2)
• Managed runtime VM (Microsoft Azure)
• Curated “IDE-as-a-service” (Heroku)
• Platform as service (Google AppEngine, Force.com)
• flexibility/portability vs. built-in functionality
[Diagram: spectrum from “lower-level, less managed” to “higher-level, more managed, more value-added SW”, placing EC2, Joyent, Azure, Heroku, AppEngine, and Force.com along it]
Summary: Deployment
• “Deployment-as-a-service” increasingly common
  – monthly pay-as-you-go curated environment (Heroku)
  – hourly pay-as-you-go cloud computing (EC2)
  – hybrid: overflow from fixed capacity to elastic capacity
  – Remember administration costs when comparing!
• A good framework can help at deployment time
  – Separate abstractions for different types of state: session state, asset server, caching, database
  – ORM: natural fit for social computing, and abstracts away from SQL (vs. Web 1.0 PHP, e.g.)
  – REST: encourages you to make your app RESTful from the start, so that “SOA”-ifying it is trivial
• Scaling structured storage: open challenge
EDUCATION
Software Education in 2010 (or: the case for teaching SaaS)
• “Depth-first” CS curricula vs. Web 2.0 breadth
  – DB, Networks, OS, SW Eng/Languages, Security, ...
  – Medium of instruction for SW Eng. courses not tracking the languages/tools/techniques actually in use
  – Students learn bad practices by osmosis so they can create Web apps
• New: languages & tools are actually good now
  – Ruby, Python, etc. are tasteful and allow reinforcing important CS concepts (higher-order programming, closures, etc.)
  – order-of-magnitude greater productivity than 1 generation ago, including for testing
Team Skills
• Web 2.0 apps are increasingly built by loosely coupled teams doing DADO
• Technical as well as “social” team skills needed
  – repository management
  – branching, tagging, merging
  – distributing responsibility during collaboration
• Web 2.0 SaaS == great fit for undergrad education
  – Apps can be developed/deployed on a semester timescale
  – Rapid gratification => projects outlive the course
  – Team skills in the context of agile development
SaaS Using RoR at Cal: Course Goals
• What’s different about DADO for SaaS
  – Basic *ilities: horizontal scaling, load balancing, H/A
  – Consistency, caching, database scaling, CAP
  – Benchmarking, tuning, understanding SLAs
• How CS “big ideas” make RoR high-productivity
  – H.O. programming, metaprogramming, introspection => ActiveRecord ORM
  – runtime code generation => AJAX support
• Major vehicle: DADO an app of your choice, in teams of 2-3; deploy to public cloud
  – zero to prototype in ~6 weeks
  – assume OOP skills, but no DB or web programming
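To connect the second bullet to running code: the toy below illustrates how higher-order programming, metaprogramming, and introspection combine in an ORM. It is not ActiveRecord's code (`ToyModel`, `attributes`, and `finder_column` are invented names, and records live in an array instead of a database). A class macro generates accessors at run time, and `method_missing` plus introspection answer dynamic finders such as `find_by_email` that are never defined anywhere.

```ruby
# Toy ORM base class (illustrative only; not ActiveRecord's implementation).
class ToyModel
  # Class macro: declare columns and generate accessors at run time.
  def self.attributes(*names)
    @attribute_names = names
    names.each { |name| attr_accessor name }
  end

  def self.attribute_names
    @attribute_names || []
  end

  # In-memory "table"; each subclass gets its own array.
  def self.records
    @records ||= []
  end

  def self.create(**values)
    record = new
    values.each { |column, value| record.public_send("#{column}=", value) }
    records << record
    record
  end

  # Introspection: which declared column does a find_by_* name refer to?
  def self.finder_column(method_name)
    column = method_name.to_s[/\Afind_by_(\w+)\z/, 1]
    column && attribute_names.include?(column.to_sym) ? column.to_sym : nil
  end

  # Metaprogramming: answer find_by_<column> calls that were never defined.
  def self.method_missing(method_name, *args, &block)
    column = finder_column(method_name)
    return super unless column
    records.find { |record| record.public_send(column) == args.first }
  end

  def self.respond_to_missing?(method_name, include_private = false)
    !finder_column(method_name).nil? || super
  end
end

class Customer < ToyModel
  attributes :name, :email
end

Customer.create(name: "Ann", email: "ann@example.com")
Customer.create(name: "Bob", email: "bob@example.com")
puts Customer.find_by_email("bob@example.com").name # prints Bob
```

Defining `respond_to_missing?` alongside `method_missing` keeps `respond_to?` honest about the generated finders; pairing the two is a general Ruby convention.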
Comparison to other SW Eng./programming courses
• Open-ended project
  – vs. “fill in blanks” programming
• Focus on SaaS
  – vs. Android, Java desktop apps, etc.
• Focus on RoR as high-level framework
• Projects expected to work
  – vs. working pieces but no artifact
  – most projects actually do work, some continue life outside class
• Focus on how “big ideas” in languages/programming enable high productivity
Topic coverage & labs
• “Hello World” web app in Rails
• Unit-test-driven design of a specified module
• User-story-driven design of an app (work in teams of 2 or 3 students)
• Deploy own app to Amazon EC2
• Use Cloudstone benchmark app to saturate MySQL database (using EC2)
• Experiment with different types of caching to observe effect on database saturation
• Final demo: publicly deployed app, short talk
Web 2.0 SaaS as Course Driver
• Majority of students: ability to design own app was key to appeal of the course
  – design things they or their peers would use
• High-productivity frameworks => projects work
  – actual gratification from using CS skills, vs. getting N complex pieces of Java code to work but not integrate
• Fast-paced semester is good fit for agile iteration-based design
• Tools used are same as in industry
Cloud Computing as a Supporting Technology
• Elasticity is great for courses!
  – Donation from AWS; ~$100/student
  – Watch a database fall over: ~200 servers needed
  – Lab deadlines, final project demos
• VM image simplifies courseware distribution
  – Prepare image ahead of time
  – Students can be root if they need to install weird SW, libs...
• Students get better hardware
  – cost associativity
  – cloud provider updates HW more frequently
• VM images compatible with Eucalyptus, enabling hybrid cloud computing
Moving to cloud computing
What | Before | After
Compute servers | 4 nodes of R cluster | EC2
Storage | local Thumper | S3, EBS
Authentication | login per student, MySQL username/tables per student, ssh key for SVN per student | EC2 keypair + Google account
Database | Berkeley ITS shared MySQL | MySQL on EC2
Version control | local SVN repository | Google Code SVN
Horizontal scaling | ??? | EC2 + haproxy/nginx
Software stack management | burden on Jon Kuroda | create AMI
Success stories
Success stories, cont.
• Fall 2009 project: matching undergrads to research opportunities
• Fall 2009 project: Web 2.0 AJAXy course scheduler with links to professor reviews
• Spring 2010 projects: apps to stress RAD Lab infrastructure
  – gRADit: vocabulary review as a game
  – RADish: comment filtering taken to a whole new level
SaaS Courses at Cal
Lower div. | Upper div. | Grad.
Understand Web 2.0 app structure ✔
Understand high-level abstraction toolkits like RoR ✔ ✔
How high-level abstractions are implemented ✔ ✔
Scaling/operational challenges of SaaS ✔ ✔
Develop & deploy SaaS app ✔ ✔
Implement new abstractions, languages, or analysis for SaaS ✔
Planning a SaaS course?
• Pick a highly productive framework
  – Projects can be deployed, and will actually work
  – Students can use production-quality tools & methods
  – We used Ruby on Rails; Google AppEngine is probably also a good choice
• Avail yourself of *-as-a-service
  – Google Code for Subversion version control
  – PivotalTracker for project tracking
  – EC2 for app deployment (Amazon is very good about donating AWS credits for education)
• Tie high-productivity mechanisms back to CS “big ideas”
  – Code generation, introspection/reflection, metaprogramming, higher-order programming
• Steal our materials (http://radlab.cs.berkeley.edu)
Summary: Education
• Web 2.0 SaaS is a great motivator for teaching software skills
  – students get to build artifacts they themselves use
  – some projects continue after the course is over
  – opportunity to (re-)introduce “big ideas” in software development/architecture
• Cloud computing is a great fit for CS courses
  – elasticity around project deadlines
  – easier administration of courseware
  – students can take work product with them after the course (e.g. use of Eucalyptus in RAD Lab)
WEB 2.0 RESEARCH
What’s New in Web 2.0
• Very large structured data storage that scales elastically with the app
• Understanding & generating large spikes
• Operational problems: finding the “needle in the haystack”
• Renewed focus on client-side challenges (JavaScript, client security, browser performance)
• Cloud Computing enables large scale and elasticity
Cloud Computing
• Cost associativity makes it possible to obtain results on 100’s or 1000’s of servers
  – Console log mining
  – BOOM (declarative cloud programming)
  – SCADS (SIGMOD 2010 demo)
• Eucalyptus makes hybrid cloud computing reasonably practical
  – Run small experiments locally, then “scale up” to cloud for paper results
• Why aren’t you using cloud computing yet?
Example: Facebook
• Facebook has 2 datacenters, 1 per coast
  – reads spread across both
  – writes only to W. Coast; periodically (~10 minutes) replicated to E. Coast
  – >2000 MySQL servers, >25 TB RAM for memcached
• Challenge: inconsistency due to stale data
  – I change my status message => friends on the East Coast datacenter don’t see the change for 10 min
  – What if an E. Coast person changes their own status??
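One widely used mitigation for the second question, a user not seeing their own change, is read-your-writes routing: for a while after a user writes, serve that user's reads from the master copy, while everyone else reads the nearby, possibly stale, replica. The sketch below is hypothetical (`GeoStore`, the window length, and the manual `replicate!` call are all assumptions) and does not describe how Facebook actually handled this. It also does nothing for the friends on the East Coast, who still wait for replication.

```ruby
# Hypothetical geo-replicated store: writes go to the west-coast master;
# the east-coast replica is refreshed only when replicate! runs
# (e.g. every ~10 minutes).
class GeoStore
  def initialize(window_seconds, clock: -> { Time.now })
    @master = {}
    @replica = {}
    @recent_writers = {} # user => time of that user's last write
    @window = window_seconds
    @clock = clock
  end

  def write(user, key, value)
    @master[key] = value
    @recent_writers[user] = @clock.call
  end

  # Periodic batch replication from master to replica.
  def replicate!
    @replica = @master.dup
  end

  # East-coast read: a user who wrote recently reads the master
  # (read-your-writes); everyone else reads the local, possibly stale, replica.
  def read_east(user, key)
    wrote_at = @recent_writers[user]
    if wrote_at && @clock.call - wrote_at < @window
      @master[key]
    else
      @replica[key]
    end
  end
end

store = GeoStore.new(600)
store.write(:alice, "status:alice", "at EuroSys")
p store.read_east(:alice, "status:alice") # alice sees her own write
p store.read_east(:bob, "status:alice")   # nil until replicate! runs
```

The window should be at least as long as the replication interval (here ~10 minutes, so 600 seconds); otherwise a user can fall back to the stale replica before their write has arrived.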
Web at 100 feet: georeplication & CDNs
Source: “How Facebook Works”, Technology Review, Jul/Aug 2008
SCADS: Scalable, Consistency-Adjustable Data Storage
• Most popular websites follow the same pattern
  – Outgrow initial prototype (on MySQL) due to scale
  – Build large, complicated ad-hoc systems to deal with scaling limitations as they arise
• Want scale independence as new users join:
  – No changes to application
  – Cost per user & request latency don’t increase
• Key innovations
  1. Performance-{safe,insightful} query language
  2. Declarative performance/consistency tradeoffs
  3. Automatic scale up & down using machine learning
M. Armbrust et al., SCADS: Scalable Consistency-Adjustable Data Storage for Interactive Applications. Proc. CIDR 2009.
M. Armbrust et al., PIQL: A Performance-Insightful Query Language. Proc. SOCC 2010.