5/12/16
Canadian Bioinformatics Workshops
www.bioinformatics.ca
6/14/16
Introduction to cloud computing
Malachi Griffith, Obi Griffith, Francis Ouellette
bioinformatics.ca RNA sequencing and analysis
Learning objectives of the course
• Module 0: Introduction to cloud computing
• Module 1: Introduction to RNA Sequencing
• Module 2: Alignment and Visualization
• Module 3: Expression and Differential Expression
• Module 4: Isoform Discovery and Alternative Expression
• Tutorials
– Use the AWS EC2 console to set up an EC2 instance
– Log in to instance from command line
Learning objectives of module 0
• Introduction to cloud computing concepts
• Introduction to cloud computing providers
• Use the Amazon EC2 console to create an instance for each student
– Will be used for many hands-on tutorials throughout the course
• How to log in to your cloud instance
[Figure: Disk Capacity vs Sequencing Capacity, 1990-2012. Log-scale axes: disk storage (Mbytes/$) and DNA sequencing (bp/$). Series: hard disk storage (MB/$), doubling time = 14 months; pre-next-gen sequencing (bp/$), doubling time = 19 months; next-gen sequencing (bp/$), doubling time = 4 months.]
About DNA and computers
• We'll hit the $1000 genome during 2015-?, then need to think about the $100 genome.
• The doubling time of sequencing has been ~5-6 months.
• The doubling time of storage and network bandwidth is ~12 months.
• The doubling time of CPU speed is ~18 months.
• The cost of sequencing a base pair will eventually equal the cost of storing a base pair.
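To make the gap between these doubling times concrete, here is a rough worked example. The five-year (60-month) horizon and the choice of 6 months for sequencing (the upper end of the ~5-6 month range) are assumptions made only for illustration. With doubling time T in months, capacity per dollar grows by a factor of 2^(t/T) after t months (the `aligned` environment assumes the amsmath package):

```latex
\[
\text{growth after } t \text{ months} = 2^{\,t/T}
\]
\[
\begin{aligned}
\text{sequencing } (T = 6):\quad & 2^{60/6} = 2^{10} = 1024 \\
\text{storage and bandwidth } (T = 12):\quad & 2^{60/12} = 2^{5} = 32 \\
\text{CPU speed } (T = 18):\quad & 2^{60/18} \approx 10
\end{aligned}
\]
```

Under these assumptions, sequencing output per dollar would grow roughly 30 times faster than storage and roughly 100 times faster than CPU speed over the same five years.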
What is the general biomedical scientist to do?
• Lots of data
• Poor IT infrastructure in many labs
• Where do they go?
• Write more grants?
• Get bigger hardware?
Cloud computing providers
• Amazon AWS
– https://aws.amazon.com/
• Google Cloud
– https://cloud.google.com/
• DigitalOcean
– https://www.digitalocean.com/
• Others I have not tried:
– Microsoft Azure (https://azure.microsoft.com/en-us/)
– Rackspace Cloud (http://www.rackspace.com/cloud)
Amazon Web Services (AWS)
• Infinite storage (scalable): S3 (Simple Storage Service)
• Compute per hour: EC2 (Elastic Compute Cloud)
• Ready when you are High Performance Computing
• Multiple football fields of HPC throughout the world
• HPC capacity is expanded one container at a time
Some of the challenges of cloud computing:
• Not cheap!
• Getting files to and from there
• Not the best solution for everybody
• Standardization
• PHI: personal health information & security concerns
• In the USA: HIPAA act, PSQIA act, HITECH act, Patriot act, CLIA and CAP programs, etc.
– http://www.biostars.org/p/70204/
Some of the advantages of cloud computing:
• We received a grant from Amazon, so this course is supported by an ‘AWS in Education grant award’.
• There are better ways of transferring large files, and now AWS makes it free to upload files.
• A number of datasets exist on AWS (e.g. 1000 Genomes data).
• Many useful bioinformatics AMIs (Amazon Machine Images) exist on AWS: e.g. cloudbiolinux & CloudMan (Galaxy) – now one for this course!
• Many flavors of cloud available, not just AWS
In this workshop:
• Some tools (data) are
– on your computer
– on the web
– on the cloud.
• You will become efficient at traversing these various spaces, finding the resources you need, and using what is best for you.
• There are different ways of using the cloud:
1. Command line (like your own very powerful Unix box)
2. With a web browser (e.g. Galaxy): not in this workshop
Things we have set up:
• Loaded data files to an ftp server
• We brought up an Ubuntu (Linux) instance, and loaded a whole bunch of software for NGS analysis.
• We then cloned this, and made separate instances for everybody in the class.
• We’ve simplified the security: you basically all have the same login and file access, and opened ports. In your own world you would be more secure.
Amazon AWS documentation
https://github.com/griffithlab/rnaseq_tutorial/wiki/Intro-to-AWS-Cloud-Computing
http://aws.amazon.com/console/
Logging into Amazon AWS
Log into AWS console
https://364840684323.signin.aws.amazon.com/console
Select "EC2" service
Make sure you are in the Oregon region
Launch a new Instance
Choose an AMI – Find the CSHL SEQTEC 2015 AMI in the Community AMIs
Search for: cshl_seqtec_2015_v3 - ami-58031239 (US West - Oregon)
Choose "m4.2xlarge" instance type, then "Next: Configure Instance Details".
Select "Protect against accidental termination", then "Next: Add Storage".
You should see "snap-xxxxxxx" (32GB) and "snap-xxxxxxx" (500GB) as the two storage volumes selected. Then, "Next: Tag Instance".
Create a tag like “Name=ObiGriffith” [use your own name]. Then hit "Next: Configure Security Group".
Important: Don’t forget to name your instance
Select an Existing Security Group, choose "SSH_HTTP_8081_IN_ALL_OUT". Then hit "Review and Launch".
Review the details of your instance, note the warnings, then hit Launch
Choose an existing key pair: "CBW" and then Launch.
View Instances to see your new instance spinning up!
Find YOUR instance, select it, and then hit connect for instructions on how to connect
Take note of your IP address and the instructions on changing permissions for the key file (Note, we will log in as ubuntu NOT root)
Opening a ‘terminal session’ on a Mac
In a Finder window ‘Applications’ -> ‘Utilities’ -> ‘Terminal’
Or on your dock
Add the Terminal app to your dock
Creating a working directory on your Mac called ‘cbw’
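The screenshot for this step did not survive in the handout, so here is a minimal sketch of one way to do it from the Terminal. It assumes you want the directory directly under your home folder; the `WORKDIR` variable is only an illustrative name, not part of the course material.

```shell
# Assumed location: a 'cbw' folder in your home directory
WORKDIR="${WORKDIR:-$HOME/cbw}"

# -p: do not complain if the directory already exists
mkdir -p "$WORKDIR"

# Move into the new directory and confirm where you are
cd "$WORKDIR"
pwd
```

If you prefer another location, set `WORKDIR` before running these lines and use the same path when saving the key file.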
Obtain your AWS ‘key’ file from course wiki
Go to course wiki, “Presentations” page
On Mac: Control + click, then “Save Link As”
Save key file to your new ‘cbw’ directory
Viewing the ‘key’ file once downloaded
Changing file permissions of your ‘key’ file (Mac/Linux)
ls -l (long listing)
drwx------+ 67 ogriffit staff 2278 22 May 21:25 ../
-rw-r--r--@ 1 ogriffit staff 1696 22 May 21:31 CBW.pem
rwx : owner, rwx : group, rwx : world
r = read (4), w = write (2), x = execute (1)
Whichever way you add these 3 numbers, you know which integers were used (6 is always 4+2, 5 is 4+1, 4 is by itself, 0 is none of them, etc.)
So, when you have: chmod 400 <filename>
it is “r” for the file owner only
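The effect of `chmod 400` can be checked safely on a throwaway file before touching the real key. This is only an illustrative sketch: the temporary file stands in for `CBW.pem`, and the variable names are made up for the example.

```shell
# Work on a throwaway file so the real key is not touched
demo_key="$(mktemp)"

# Start from rw-r--r-- (644), like the freshly downloaded key in the listing
chmod 644 "$demo_key"
before="$(ls -l "$demo_key" | cut -c1-10)"

# 400 = read (4) for the owner, nothing (0) for group and world
chmod 400 "$demo_key"
after="$(ls -l "$demo_key" | cut -c1-10)"

echo "before: $before"   # before: -rw-r--r--
echo "after:  $after"    # after:  -r--------

# Clean up the throwaway file
rm -f "$demo_key"
```

`cut -c1-10` keeps only the type and permission characters, so a trailing `@` or `+` marker such as the ones in the listing is ignored.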
Logging into your instance
Mac/Linux
cd cbw/
chmod 400 CBW.pem
ssh -i CBW.pem ubuntu@[YOUR INSTANCE IP ADDRESS]
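A small dry-run sketch can help avoid typos when filling in the IP address. The address below is a documentation placeholder and not a real instance, and the variable names are illustrative assumptions; substitute the IP shown for your own instance in the EC2 console.

```shell
# Placeholder IP (documentation range); replace with your instance's public IP
INSTANCE_IP="${INSTANCE_IP:-203.0.113.10}"

# Key file name from the course; assumed to be in the current directory
KEY_FILE="${KEY_FILE:-CBW.pem}"

# Build the command first so it can be inspected before running it
SSH_CMD="ssh -i $KEY_FILE ubuntu@$INSTANCE_IP"
echo "Would run: $SSH_CMD"

# To actually connect, uncomment the next line:
# $SSH_CMD
```

Note that ssh refuses private key files that other users can read, which is why the `chmod 400` step comes before the login.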
Copying files from AWS to your computer (using a web browser)
http://[YOUR INSTANCE IP ADDRESS]/
Logging out of your instance
Mac/Linux – simply type exit
exit
Note, this disconnects the terminal session (ssh connection) to your cloud instance. But, your cloud instance is still running! See next slide for how to stop your instance.
When you are done for the day you can “Stop” your instance – Don’t Terminate!
Go to AWS EC2 Dashboard, select “Instances” tab, then find your instance. Right-click and choose ‘Instance State’ -> ‘Stop’
Next morning, you can “Start” your instance again
Go to AWS EC2 Dashboard, select “Instances” tab, then find your instance. Right-click and choose ‘Instance State’ -> ‘Start’
When you restart your instance you will need to find your new IP address. Select your instance and “Connect” or look in Description tab. Then go back to instructions for “Logging into your instance”
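The course itself uses the web console for this step. As an optional alternative, the lookup can be sketched with the AWS command line interface. This assumes the `aws` CLI is installed and configured with credentials of your own, which the workshop does not provide; the function name `find_instance_ip` is hypothetical, and the tag value is whatever name you gave your instance.

```shell
# Hypothetical helper: print the public IP of a running instance by its Name tag.
# Assumes the AWS CLI is installed and configured; us-west-2 is the Oregon region.
find_instance_ip() {
  aws ec2 describe-instances \
    --region us-west-2 \
    --filters "Name=tag:Name,Values=$1" "Name=instance-state-name,Values=running" \
    --query "Reservations[].Instances[].PublicIpAddress" \
    --output text
}

# Example call (not run here), using the tag from the "Tag Instance" step:
# find_instance_ip "ObiGriffith"
```

The printed address is the one to use in the ssh command from “Logging into your instance”.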
So, at this point:
• Your Mac is ready for the workshop
• If it is not, you know where to get the information you need
• You know how to log in to AWS
• The next step is to log in to your Linux machine on AWS and learn the basics of a Linux command line