Tutorial for Web Mining Project: Cloud Computing Platform
Introduction
In the MIS 510 project, your team is required to create a web business, with a complete web site and business functionality for specific customers, using either the Google App Engine or the Amazon EC2 platform.
Because Google App Engine and Amazon EC2 have distinct interfaces, service features, and pricing policies, this tutorial gives separate instructions for each platform.
Google App Engine Tutorial
Written by Jonathan Jiang
updated by Julian Guo
Overview
- A cloud platform for publishing web applications.
- Simple, web-based application management console.
- Developers can focus on application logic without worrying about hardware, system administration, scalability, etc.
- Supports Java, Python, and Go.
Guideline
0. Preparation
1. Create a Google Web Application Project
2. Debug, Run and Deploy
3. Interaction with User
4. Use Cloud Database
5. Pricing
0. Preparation
0.1. Sign up for a Google App Engine account: https://appengine.google.com/start
0.2. Download the App Engine SDK: http://code.google.com/appengine/downloads.html
0.3. For Java/Eclipse users, it is recommended to download the Eclipse plugins to build, debug, and deploy your application: http://code.google.com/eclipse/docs/download.html
0.1. Sign up for a Google App Engine account:
You need to log in to your Gmail account to see this page. Sometimes your [email protected] account does not work. If so, sign up for a new one.
This is your application ID; write it down.
Steps 0.2 and 0.3
Steps 0.2 and 0.3 can be combined in Eclipse:
- Go to Help -> Install New Software.
- Type https://dl.google.com/eclipse/plugin/3.7 in the "Work with" field and press Enter.
- Choose the required packages and download them.
These are required. Others are optional.
1.Create a Google Web Application Project
1.1 Create a New Project
Now you should be able to create a Google App Engine project in Eclipse:
New -> Web Application Project
Type the project name and package you like, then choose the Google SDKs you want to use. Typically you only need 'Use Google App Engine' for your SDK.
1.2 File Structure of the Web Application
src/ includes all source files for your application.
  - Java source code
  - META-INF/ includes other configuration files.
war/ includes all the files that are deployed and actually used on the server.
  - WEB-INF/ includes the libraries used, compiled classes, and configuration files.
  - Images, data, HTML, and JSP files are put directly under the war/ folder.
In the WEB-INF folder, there are two configuration files:
- appengine-web.xml
- web.xml
The first five lines of appengine-web.xml look like this:
    <?xml version="1.0" encoding="utf-8"?>
    <appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
      <application>your application ID</application>
      <version>1</version>
    </appengine-web-app>
Don't forget to add your registered application ID between the <application> tags.
web.xml is SUPER IMPORTANT. It is mainly responsible for mapping URIs to your servlet classes and web pages (examples are provided later).
2. Debug, Run and Deploy the Web Application
2.1. Debug and Run
The Eclipse plugin has already created a Hello World example for you, so you can run your project directly to test whether it works.
Right-click the project folder -> Debug As -> Web Application.
In Debug mode, Google App Engine creates a server on your local machine, and your project runs on that local server. If it starts successfully, the console displays a line confirming that the server is running.
If you use Eclipse, the server is running at http://localhost:8888/. Open a web browser and paste this link to test your project.
While the server is running in debug mode, any changes to your project files should be detected automatically by Google App Engine, so you don't have to rebuild the project (but you still need to refresh the browser to see the changes).
* Don't over-trust this statement. If you keep encountering the same error, it is very likely that simply rebuilding the project will help.
An exception is web.xml: if you make changes to it, you must rebuild your project.
2.2. Deploy
When you are satisfied with your application, you can deploy it to the cloud environment Google provides so that users all over the world have access to it.
Simply click the 'Deploy' icon and enter the account information for your App Engine account.
Now you can visit your application at http://your-applicationID.appspot.com
3. Interaction with User
Often, you want your application not only to present static information, but also to interact with users.
Your system needs to pass user inputs from web pages to your Java or Python program.
Here we provide a JSP/Java example of a movie-related web mining application. This example returns a movie's plot based on the movie name given by the user.
[Figure: User Input -> Your Application (Interface: Web Pages/API; Web Mining Component: Server-Side Logic) -> Output]
3.1 Receive User Input
Create form_input.jsp and add the following lines between the <body> </body> tags:
    Input a movie name here:
    <form action="/processinput" method="post">
      <div><input type="text" name="moviename" size="60"></div>
      <div><input type="submit" value="Submit" /></div>
    </form>
When the user visits form_input.jsp, it shows a field for input.
You want to pass the input to your Java servlet (your background program), say, SampleServlet.java.
You need to configure web.xml to let the system know how to map the form submission URI to the appropriate Java class. The following example shows such a mapping:
http://your application ID.appspot.com/processinput -> SampleServlet.class
    <?xml version="1.0" encoding="utf-8"?>
    <web-app xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://java.sun.com/xml/ns/javaee"
        xmlns:web="http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd"
        xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd"
        version="2.5">
      <servlet>
        <servlet-name>Sample</servlet-name>
        <servlet-class>mis510.SampleServlet</servlet-class>
      </servlet>
      <servlet-mapping>
        <servlet-name>Sample</servlet-name>
        <url-pattern>/processinput</url-pattern>
      </servlet-mapping>
      <welcome-file-list>
        <welcome-file>index.html</welcome-file>
      </welcome-file-list>
    </web-app>
3.2 Process User Input
Copy the following code to SampleServlet.java. Use the req.getParameter() method to obtain the user input (movie name) and process it in SampleServlet.java. An external API is used to retrieve the movie's plot from the web.
    package mis510;

    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.http.*;
    import myUtility.IMDB_Handler;

    public class SampleServlet extends HttpServlet {
        public void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
            String movieName = req.getParameter("moviename");
            // The following code retrieves the input movie's plot using an external web API.
            IMDB_Handler imdbAPI = new IMDB_Handler();
            try {
                String movieID = imdbAPI.convert(movieName); // convert movie name to its IMDB ID
                String result = imdbAPI.getPlot(movieID);    // get the movie plot
                req.setAttribute("result", result);          // the movie plot is saved in the attribute "result"
                req.getRequestDispatcher("form_input.jsp").forward(req, resp); // return the user to the original page
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
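req.getParameter() returns null when the form field is missing, and users may submit blank text, so it can help to check the movie name before calling the external API. The following is a minimal sketch of such a check. The class MovieNameInput, the method normalize, and the 200-character limit are hypothetical names and values chosen for illustration; they are not part of samplecode.rar.

```java
// Hypothetical helper for illustration; not part of the tutorial's sample code.
final class MovieNameInput {

    // Assumed upper bound on the length of a movie name.
    private static final int MAX_LENGTH = 200;

    /**
     * Returns the movie name with surrounding whitespace removed and inner
     * whitespace runs collapsed to one space, or null when the input is
     * missing, blank, or longer than MAX_LENGTH.
     */
    static String normalize(String raw) {
        if (raw == null) {
            return null;
        }
        String cleaned = raw.trim().replaceAll("\\s+", " ");
        if (cleaned.isEmpty() || cleaned.length() > MAX_LENGTH) {
            return null;
        }
        return cleaned;
    }
}
```

In doPost, one possible use is to call MovieNameInput.normalize(req.getParameter("moviename")) and forward back to form_input.jsp with an error message when the result is null.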
Here's a snippet of the API code. The complete sample code is given in 'samplecode.rar'.
    public class IMDB_Handler {
        static private String EndPoint = "http://imdbapi.com/";

        public IMDB_Handler() {}

        /* public static String convert(String moviename) {} */

        public String getPlot(String movieID) {
            String RESTurl, plot;
            RESTurl = EndPoint + "?id=" + movieID;
            try {
                HTTPProxy handler = new HTTPProxy();
                String content = handler.GetContent(RESTurl);
                JSONObject jobj = new JSONObject(content);
                plot = jobj.getString("plot_simple");
            } catch (Exception e) {
                plot = "The API server is currently down.";
            }
            return plot;
        }
    }
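The snippet above builds the REST URL by plain string concatenation, which only works while the parameter value contains no spaces or reserved characters. A minimal sketch of a safer variant follows, using the standard java.net.URLEncoder. The class RestUrlBuilder and the method buildUrl are hypothetical names introduced here for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration; not part of the tutorial's sample code.
final class RestUrlBuilder {

    /** Builds endPoint?paramName=value with the value URL-encoded as UTF-8. */
    static String buildUrl(String endPoint, String paramName, String value) {
        try {
            return endPoint + "?" + paramName + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }
}
```

For example, RestUrlBuilder.buildUrl("http://imdbapi.com/", "id", movieID) produces the same URL as the concatenation in getPlot() for a plain ID, while a value such as "a b&c" is encoded as "a+b%26c".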
3.3 Return the Output to User
Now you can display the results to the user by adding a line to the designated JSP page. In this example, we use the same JSP page as for the user input. form_input.jsp should now look like:
    <body>
    Input a movie name here:
    <form action="/processinput" method="post">
      <div><input type="text" name="moviename" size="60"></div>
      <div><input type="submit" value="Submit" /></div>
    </form>
    <%=request.getAttribute("result") %> <%-- add this line to display the value in "result" --%>
    </body>
Try it at http://localhost:8888/form_input.jsp
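When form_input.jsp is opened directly, no servlet has set the "result" attribute yet, so the expression <%=request.getAttribute("result") %> prints the text "null". The value is also written into the page without escaping. A minimal sketch of a helper that handles both cases is shown below. HtmlText and forDisplay are hypothetical names introduced here, not part of samplecode.rar.

```java
// Hypothetical helper for illustration; not part of the tutorial's sample code.
final class HtmlText {

    /** Returns "" for null, otherwise the value with HTML special characters escaped. */
    static String forDisplay(Object value) {
        if (value == null) {
            return "";
        }
        String text = value.toString();
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

Assuming the class is placed in one of the project's packages and imported into the JSP page, the display line could become <%= HtmlText.forDisplay(request.getAttribute("result")) %>.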
4. Use Cloud Database
Situations where using a cloud database may help:
- Remembering user activities.
- Storing the results of the web mining process to speed up the next inquiry.
- Uploading a large file that is a component of your application.
- ...
In the next slides, we show an example of using Google Datastore to save and retrieve users' comments on movies.
Update form_input.jsp to receive user comments:
    <form action="/processinput" method="post">
      Input Movie Name Here:
      <div><input type="text" name="moviename" size="60"></div>
      Input Your Name Here:
      <div><input type="text" name="username" size="60"></div>
      Type Your Comment Here:
      <div><textarea name="comment" rows="3" cols="60"></textarea></div>
      <div><input type="submit" value="Submit" /></div>
    </form>
    Movie Plot: <br>
    <%=request.getAttribute("result") %>
4.1 Google Datastore
4.1.1 Store Comments
Add this component to SampleServlet.java (for the complete sample, please refer to samplecode.rar):
    // Store the user comments:
    Key movieKey = KeyFactory.createKey("MovieComment", movieName);
    String content = req.getParameter("comment");
    String username = req.getParameter("username");
    Date date = new Date();
    Entity comment = new Entity("Comment", movieKey);
    comment.setProperty("user", username);
    comment.setProperty("date", date);
    comment.setProperty("content", content);

    DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
    datastore.put(comment);

    try {
        req.getRequestDispatcher("form_input.jsp?moviename=" + movieName).forward(req, resp);
    } catch (ServletException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
With this component added, the modified SampleServlet.java looks like this (for the complete sample, please refer to samplecode.rar):
    package mis510;

    import java.io.IOException;
    import java.util.Date;
    import javax.servlet.ServletException;
    import javax.servlet.http.*;
    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Entity;
    import com.google.appengine.api.datastore.Key;
    import com.google.appengine.api.datastore.KeyFactory;
    import myUtility.IMDB_Handler;

    public class SampleServlet extends HttpServlet {
        public void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
            String movieName = req.getParameter("moviename");
            IMDB_Handler imdbAPI = new IMDB_Handler();
            try {
                String movieID = imdbAPI.convert(movieName); // convert movie name to its IMDB ID
                String result = imdbAPI.getPlot(movieID);    // get the movie plot
                req.setAttribute("result", result);          // the movie plot is saved in the attribute "result"

                // Store the user comments:
                Key movieKey = KeyFactory.createKey("MovieComment", movieName);
                String content = req.getParameter("comment");
                String username = req.getParameter("username");
                Date date = new Date();
                Entity comment = new Entity("Comment", movieKey);
                comment.setProperty("user", username);
                comment.setProperty("date", date);
                comment.setProperty("content", content);

                DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
                datastore.put(comment);

                req.getRequestDispatcher("form_input.jsp?moviename=" + movieName).forward(req, resp);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
4.1.2 Retrieve Comments
Add this component to form_input.jsp, before <html> (for the complete sample, please refer to samplecode.rar):
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
    <%@ page import="java.util.List" %>
    <%@ page import="com.google.appengine.api.datastore.DatastoreServiceFactory" %>
    <%@ page import="com.google.appengine.api.datastore.DatastoreService" %>
    <%@ page import="com.google.appengine.api.datastore.Query" %>
    <%@ page import="com.google.appengine.api.datastore.Entity" %>
    <%@ page import="com.google.appengine.api.datastore.FetchOptions" %>
    <%@ page import="com.google.appengine.api.datastore.Key" %>
    <%@ page import="com.google.appengine.api.datastore.KeyFactory" %>
Add this component to form_input.jsp, between <body> and </body> (for the complete sample, please refer to samplecode.rar):
    <%
        String movieName = request.getParameter("moviename");
        if (movieName == null) movieName = "default";
        DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
        Key movieKey = KeyFactory.createKey("MovieComment", movieName);
        Query query = new Query("Comment", movieKey).addSort("date", Query.SortDirection.DESCENDING);
        List<Entity> comments = datastore.prepare(query).asList(FetchOptions.Builder.withLimit(5));
        if (comments.isEmpty()) {
    %>
    <p>The movie '<%= movieName %>' has no comments posted yet.</p>
    <%
        } else {
    %>
    <p>Comments for movie '<%= movieName %>'.</p>
    <%
            for (Entity comment : comments) {
                if (comment.getProperty("user") == null) {
    %>
    <p>An anonymous person wrote:</p>
    <%
                } else {
    %>
    <p><b><%= comment.getProperty("user") %></b> wrote:</p>
    <%
                }
    %>
    <blockquote><%= comment.getProperty("content") %></blockquote>
    <%
            }
        }
    %>
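The query in the JSP code selects the "Comment" entities stored under one movie's key, sorts them by date in descending order, and keeps at most five. The sketch below reproduces that selection logic on an ordinary in-memory list so it can be tried without the App Engine SDK. It is only an illustration of the query's meaning, not of how Datastore executes it; CommentQuerySketch, Comment, and latestFor are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Date;
import java.util.List;

// Hypothetical in-memory illustration; it does not use the Datastore API.
final class CommentQuerySketch {

    /** Plain-Java stand-in for one "Comment" entity and the movie it belongs to. */
    static final class Comment {
        final String movieName;
        final String user;
        final Date date;
        final String content;

        Comment(String movieName, String user, Date date, String content) {
            this.movieName = movieName;
            this.user = user;
            this.date = date;
            this.content = content;
        }
    }

    /** Returns up to 'limit' comments for the given movie, newest first. */
    static List<Comment> latestFor(List<Comment> all, String movieName, int limit) {
        List<Comment> matches = new ArrayList<Comment>();
        for (Comment c : all) {
            if (c.movieName.equals(movieName)) {   // same role as the ancestor key filter
                matches.add(c);
            }
        }
        Collections.sort(matches, new Comparator<Comment>() {
            public int compare(Comment a, Comment b) {
                return b.date.compareTo(a.date);   // descending by date
            }
        });
        int end = Math.min(limit, matches.size()); // same role as withLimit(5)
        return new ArrayList<Comment>(matches.subList(0, end));
    }
}
```

Calling CommentQuerySketch.latestFor(all, "Up", 5) on a mixed list returns only the comments for "Up", with the most recent one first.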
Advantages of Google Datastore:
- Google provides the data management capacity for you.
- Very flexible (schemaless).
- Option to view and manage the data online: log in to Google App Engine at https://appengine.google.com/ and choose your application -> Datastore Viewer.
Disadvantages:
- Limit of 1 GB of free data storage quota, compared to 10 GB for Amazon EC2.
- Only for small data objects (entities) in Datastore. To store larger data, Google Blobstore can be used: http://code.google.com/appengine/docs/java/blobstore/overview.html
5. Cost
Google App Engine sets a resource usage quota for free applications.
Free Quota for Major Resources
    Resource                              Daily Limit
    Frontend Instance Hours               28 hr
    High Replication Datastore Storage    1 GB
    Datastore Reads                       50k ops
    Datastore Writes                      50k ops
    Outgoing Network Traffic              1 GB
    Incoming Network Traffic              1 GB
For more details: https://cloud.google.com/products/app-engine/#pricing
5. Pricing
For resource usage exceeding the quota, Google charges at the rates below.
Billing Rate for Major Resources
    Resource                    Rate
    Frontend Instance Hours     $0.08/hour
    Datastore Amount            $0.18/GB/month
    Datastore Reads             $0.06/100k read ops
    Datastore Writes            $0.09/100k write ops
    Outgoing Network Traffic    $0.12/GB
    Incoming Network Traffic    Free
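To see how the free quota and the billing rates combine, the following sketch estimates one day's charge from the two tables above. It rests on assumptions that are not stated in the tutorial: the free quota is subtracted separately for each resource, and the monthly storage rate is spread evenly over a 30-day month. AppEngineCostSketch and its method names are hypothetical.

```java
// Rough daily cost estimate; a sketch under the assumptions named in the comments.
final class AppEngineCostSketch {

    // Free daily quotas, taken from the quota table.
    static final double FREE_INSTANCE_HOURS = 28.0;
    static final double FREE_STORAGE_GB = 1.0;
    static final double FREE_READ_OPS = 50000.0;
    static final double FREE_WRITE_OPS = 50000.0;
    static final double FREE_OUTGOING_GB = 1.0;

    // Billing rates in US dollars, taken from the rate table.
    static final double RATE_INSTANCE_HOUR = 0.08;
    static final double RATE_STORAGE_GB_MONTH = 0.18;
    static final double RATE_PER_100K_READS = 0.06;
    static final double RATE_PER_100K_WRITES = 0.09;
    static final double RATE_OUTGOING_GB = 0.12;

    // Assumption: the monthly storage rate is prorated over 30 days.
    static final double DAYS_PER_MONTH = 30.0;

    /** Usage above the free quota, never negative. */
    static double overQuota(double used, double free) {
        return Math.max(0.0, used - free);
    }

    /** Estimated charge for one day; incoming traffic is free and therefore omitted. */
    static double dailyCost(double instanceHours, double storageGb,
                            double readOps, double writeOps, double outgoingGb) {
        return overQuota(instanceHours, FREE_INSTANCE_HOURS) * RATE_INSTANCE_HOUR
             + overQuota(storageGb, FREE_STORAGE_GB) * RATE_STORAGE_GB_MONTH / DAYS_PER_MONTH
             + overQuota(readOps, FREE_READ_OPS) / 100000.0 * RATE_PER_100K_READS
             + overQuota(writeOps, FREE_WRITE_OPS) / 100000.0 * RATE_PER_100K_WRITES
             + overQuota(outgoingGb, FREE_OUTGOING_GB) * RATE_OUTGOING_GB;
    }
}
```

Under these assumptions, a day with 48 instance hours, 1 GB stored, 250k reads, 150k writes, and 3 GB of outgoing traffic comes to 20 x 0.08 + 2 x 0.06 + 1 x 0.09 + 2 x 0.12 = $2.05, while usage within every quota costs nothing.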
Costs vary greatly depending on resource usage. The following table gives a rough estimate of daily costs for typical apps:
                            App 1    App 2     App 3
    Datastore               1 GB     10 GB     10 GB
    Bandwidth (in & out)    1 GB     1 GB      5 GB
    Cost                    Free     $5/day    $15/day
Suggestions for reducing cost:
- Log in to the App Engine Console and set a daily budget. This is the safest way to control your cost, but resource usage exceeding this budget will not be allowed (so your app will throw errors).
- Reduce instance hours.
- Datastore is expensive.
- Debug on your local server most of the time (completely free!).
- Deploy the full version of your app only during the last weeks of MIS 510.
Applying these suggestions will reduce the cost of your project.
Amazon EC2 Tutorial
Written by Julian Guo
Amazon Elastic Compute Cloud (Amazon EC2)
Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers.
- Simple web service interface.
- Complete control of your computing resources.
- Obtain and boot new server instances quickly.
- Quickly scale capacity as your computing requirements change.
- Pay only for the capacity that you actually use.
Tutorial Guideline
1. Sign Up for EC2
2. Launch an Instance
3. Connect to Windows Instance
4. Connect to Unix/Linux Instance
5. Application Example
6. Pricing
7. Resources
1. Sign Up for EC2
Sign up for an Amazon EC2 account: http://aws.amazon.com/ec2/
If you have an Amazon shopping account, just use that account.
2. Launch an Instance
Sign in to the AWS Management Console (choose EC2): http://aws.amazon.com/console/
AWS Management Console
Create and Download a Key Pair
A key pair is a security credential similar to a password, which you use to securely connect to your instance after it's running.
Choose an Amazon Machine Image (AMI)
- Amazon Linux
- Windows Server 2008 with SQL Server
- Red Hat/Ubuntu/Debian Linux
This is just like choosing a virtual machine. You can choose 64-bit or 32-bit machines. Prices differ between machines.
Configure Firewall (create a security group)
Create rules to allow access to the instance.
- For a Windows server, we need HTTP port 80, MS SQL port 1433, Remote Desktop port 3389, and HTTP port 8080 (for Tomcat).
- For Linux, we need SSH to log in (to use PuTTY and WinSCP).
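Once the security group rules are in place and the instance is running, you can check from your own machine whether a given port is actually reachable. The following is a minimal sketch using the standard java.net.Socket class. PortCheck and isReachable are hypothetical names, and the host you pass in would be your instance's public DNS name or Elastic IP.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration; not part of the tutorial.
final class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unknown host: treat all of these as unreachable.
            return false;
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful can be done if closing fails.
            }
        }
    }
}
```

For a Windows server configured as above, a check such as PortCheck.isReachable(host, 3389, 3000) should succeed before you try Remote Desktop. A false result can also mean that the service itself is not running, not only that the firewall rule is missing.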
3. Connect to Windows Instance
Go to the AWS Management Console and locate the instance on the Instances page.
Right-click the instance and select Get Windows Password.
Use Remote Desktop to log in with the decrypted password.
Get an Elastic IP (static IP)
- Click "Elastic IP" in "Navigation".
- Click "Allocate New Address".
- Associate the address with your instance.
- An Elastic IP address is a desirable resource. You should release the address if you don't want to associate it with any instance. Otherwise, Amazon will charge you money!
Login with Elastic IP
Get a Windows Server!
Manage and Control the Server
- Stop = shut down the computer
- Reboot = restart the computer
- Terminate = throw away your computer!
You can monitor your instance in AWS management console
4. Connect to Unix/Linux Instance
Install PuTTY on your Windows machine.
Start PuTTYgen (e.g., from the Start menu, click All Programs > PuTTY > PuTTYgen).
Click Load and browse to the location of the private key file that you want to convert (e.g., hello.pem), then convert it into hello.ppk.
Save hello.ppk somewhere.
Use PuTTY to connect
- Open PuTTY.
- Use the Public DNS as the hostname.
- Use root (Red Hat), bitnami (Ubuntu), or ec2-user (Amazon Linux) as the username.
- Click SSH -> Auth to load the .ppk file.
Example: Log in to a Red Hat System
Use WinSCP to connect
- Install WinSCP on your Windows machine.
- Use the Public DNS as the hostname.
- Use root (Red Hat), bitnami (Ubuntu), or ec2-user (Amazon Linux) as the username.
- Load the .ppk file (get it from PuTTYgen).
- Click Login.
Example: Log in to a Red Hat System
5. Application Example (deploying my project from last year)
- Use a Micro On-Demand Instance.
- Run Windows Server 2008 with an Elastic IP.
- SQL Server 2008 R2 is ready.
- Install Firefox, Java JRE, Tomcat 7.0 (server), Eclipse IDE, and Dropbox (for data transmission).
- Deploy my web application on this server (run a Tomcat server in Eclipse)!
- Access the web application via HTTP port 8080.
Tomcat on Eclipse
6. Pricing
Pay only for what you use. There is no minimum fee.
See Details: http://aws.amazon.com/ec2/pricing/
Estimate your monthly bill using AWS Simple Monthly Calculator.
You might pay for:
- EC2 instances
- Elastic IP
- Data transfer (in and out)
- Amazon EBS storage
Amazon EC2 Instance Purchasing Options
Amazon EC2 offers three purchasing models that give you the flexibility to optimize your costs.
- On-Demand Instances: pay for compute capacity by the hour.
- Reserved Instances: one-time payment (1-year or 3-year term), cheaper than On-Demand Instances.
- Spot Instances: bid for unused Amazon EC2 capacity (can be very cheap if you have a good bidding strategy).
For a typical MIS 510 project, you might pay $20-30 in total. You can prepare your code on a local platform and deploy the project code on EC2 for only 1-2 weeks. To save running hours, you can shut down the EC2 instance at night.
Note: prices also vary between regions.
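Because On-Demand Instances are paid for by the hour, the effect of shutting the instance down at night is easy to estimate. The sketch below multiplies an hourly rate by the running hours. The rate of $0.10 per hour used in the example is a made-up placeholder, not an actual AWS price, and the sketch ignores Elastic IP, data transfer, and EBS storage charges. Ec2CostSketch and instanceCost are hypothetical names.

```java
// Rough On-Demand instance cost; the hourly rate must be looked up on the AWS pricing page.
final class Ec2CostSketch {

    /** Instance cost in US dollars for running hoursPerDay hours on each of 'days' days. */
    static double instanceCost(double hourlyRateUsd, double hoursPerDay, int days) {
        return hourlyRateUsd * hoursPerDay * days;
    }
}
```

With the placeholder rate of $0.10 per hour, running around the clock for 14 days gives 0.10 x 24 x 14 = $33.60, while running 16 hours a day over the same period gives 0.10 x 16 x 14 = $22.40.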
Appendix. Resources
Documentation and tutorials:
- http://code.google.com/appengine/docs/
- http://aws.amazon.com/documentation/ec2/
Google App Engine main page: http://code.google.com/appengine
Amazon AWS main page: http://aws.amazon.com/
AI Lab's resource for MIS 510: http://ai.arizona.edu/mis510/