Applied CyberInfrastructure Concepts
ISTA 420/520 Fall 2015

Nirav Merchant (nirav@email.arizona.edu), Bio Computing & iPlant Collaborative
Eric Lyons, Plant Sciences & iPlant Collaborative
University of Arizona
http://goo.gl/p4j3m or https://sites.google.com/site/appliedciconcepts/

Will Computers Crash Genomics? Science Vol 331 Feb 2011


Tasks for today

• Log into shell.u.arizona.edu (ssh); also learn how to transfer files the old school way
• Shell: what is it good for?
• Navigating in the shell
• Working with GNU core utils
• Data analysis on the command line
• Building your Big Data tool kit


LINUX fundamentals

• ssh to shell.u.arizona.edu
• Directory structure
• Permissions
• Listing, >, <, | and use of ' and " and ;
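The items on this slide can be tried out in a scratch directory. The following is a minimal sketch, not the exercise used in class; the file name `fruit.txt` and the `project/data` directory are made up for illustration.

```shell
# Work in a throwaway directory (hypothetical names throughout).
workdir=$(mktemp -d)
cd "$workdir"

# Directory structure: make a nested directory and move around in it.
mkdir -p project/data
cd project/data
pwd                      # show where we are
cd ../..                 # back up two levels to the scratch directory

# > writes a command's output to a file (overwriting it).
printf 'banana\napple\ncherry\n' > fruit.txt

# < feeds a file to a command's standard input.
sort < fruit.txt

# | sends one command's output into the next command.
sort fruit.txt | head -n 1      # prints: apple

# Permissions: remove write access for everyone, then list the result.
chmod a-w fruit.txt
ls -l fruit.txt

# Single quotes keep text literal; double quotes still expand variables.
name=fruit.txt
echo 'file is $name'     # prints: file is $name
echo "file is $name"     # prints: file is fruit.txt

# ; runs commands one after another on the same line.
echo first; echo second
```

Running each line by hand and comparing the output with what you expected is a quick way to check your understanding of redirection and quoting.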


ssh keys and managing them

• Quick intro to keys (public, private)
• Where will you use these keys?
• Let's create keys to allow easier login to shell.u
• Use linux/mac tutorial
  – https://www.digitalocean.com/community/tutorials/how-to-set-up-ssh-keys--2
• Windows users visit putty:
  – https://www.digitalocean.com/community/tutorials/how-to-create-ssh-keys-with-putty-to-connect-to-a-vps
  – https://www.digitalocean.com/community/tutorials/how-to-use-pageant-to-streamline-ssh-key-authentication-with-putty
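As a rough sketch of what key creation looks like on linux/mac, the commands below generate a key pair in a scratch directory. The key type (ed25519), the file name `id_demo`, the empty passphrase, and the `netid` login are assumptions made for illustration; the linked tutorial may use different choices, and a real key should normally live in `~/.ssh` and carry a passphrase.

```shell
# Generate a demo key pair in a throwaway directory (hypothetical file name).
keydir=$(mktemp -d)

# -t key type, -f output file, -N passphrase (empty here only for the demo),
# -C comment stored in the public key, -q quiet.
ssh-keygen -t ed25519 -f "$keydir/id_demo" -N "" -C "demo key" -q

# The private key stays on your machine; the .pub file is what you share.
ls -l "$keydir"
cat "$keydir/id_demo.pub"

# To enable key-based login, the public key is appended to
# ~/.ssh/authorized_keys on the server, for example with:
#   ssh-copy-id -i "$keydir/id_demo.pub" netid@shell.u.arizona.edu
# (netid is a placeholder for your own login name.)
```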


Process management

• Use of &, bg, fg
• kill, nice, renice
• Detaching and why you need tmux (or screen)
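A small sketch of these commands, using `sleep` as a stand-in for a long-running job. `bg` and `fg` only work in an interactive shell with job control, so they appear here as comments rather than as runnable lines.

```shell
# Start a low-priority job in the background; & returns the prompt at once.
nice -n 5 sleep 60 &
pid=$!                          # $! holds the PID of the last background job
echo "started background job with PID $pid"

# Lower its priority further (unprivileged users can only raise niceness).
renice -n 10 -p "$pid" > /dev/null

# In an interactive shell you could instead:
#   press Ctrl-Z to suspend a foreground job
#   bg      resume it in the background
#   fg      bring it back to the foreground
#   jobs    list the jobs of this shell

# Ask the job to terminate, then collect its exit status.
kill "$pid"
wait "$pid" 2>/dev/null || true
echo "job $pid has ended"
```

Background jobs started this way still belong to your login session, which is the reason tmux (or screen) is useful: a session inside tmux keeps running after you disconnect and can be reattached later.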


GNU Core utils
http://www.gnu.org/software/coreutils/manual/html_node/index.html#toc_Introduction

Commands we will work with
• cat: Concatenate and write files
• tac: Concatenate and write files in reverse
• nl: Number lines and write files
• head: Output the first part of files
• tail: Output the last part of files
• split: Split a file into pieces
• csplit: Split a file into context-determined pieces
• wc: Print newline, word, and byte counts
• sum: Print checksum and block counts
• cksum: Print CRC checksum and byte counts
• md5sum: Print or check MD5 digests
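To get a feel for several of these commands, the sketch below runs them on a tiny file. The file `sample.txt` and its contents are invented for illustration, and the commands assume GNU coreutils as found on a typical Linux host.

```shell
# Work in a throwaway directory so nothing in your home directory is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a small five-line sample file (hypothetical data).
printf 'alpha\nbeta\ngamma\ndelta\nepsilon\n' > sample.txt

cat sample.txt            # print the file as-is
tac sample.txt            # print the lines in reverse order
nl sample.txt             # print with line numbers
head -n 2 sample.txt      # first two lines
tail -n 2 sample.txt      # last two lines
wc -l sample.txt          # count lines

# Split into pieces of two lines each: part_aa, part_ab, part_ac.
split -l 2 sample.txt part_
ls part_*

# Record a checksum, then verify the file against it.
md5sum sample.txt > sample.md5
md5sum -c sample.md5
```

The checksum step is the same pattern you would use after copying a large data file between machines: compute the digest on both ends and compare.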


Hands on
http://blog.comsysto.com/2013/04/25/data-analysis-with-the-unix-shell/

• How are you going to get the data from git?
• What is missing in this data set? (How to fix?)
• Do you have access to gnuplot?
• Make a plot described in this exercise
  – Save the plot as pdf output
  – How are you going to view the pdf?
  – Run this without interactive prompts, i.e. straight from the command line
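One possible way to run gnuplot without interactive prompts is to put the plotting commands in a script file and pass that file on the command line. The sketch below uses invented data (`points.dat`), not the exercise data set, and assumes your gnuplot build includes the `pdfcairo` terminal; if it does not, `set terminal` inside gnuplot lists the terminals that are available.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical two-column data: x value, y value.
printf '1 2\n2 4\n3 8\n4 16\n' > points.dat

# Write the plotting commands to a script so no prompt is needed.
cat > plot.gp <<'EOF'
set terminal pdfcairo
set output 'points.pdf'
set xlabel 'x'
set ylabel 'y'
plot 'points.dat' using 1:2 with linespoints title 'sample'
EOF

# Run gnuplot in batch mode if it is installed on this machine.
if command -v gnuplot >/dev/null 2>&1; then
    gnuplot plot.gp || echo "gnuplot failed; check the terminal type" >&2
else
    echo "gnuplot not found on this machine" >&2
fi
```

On a remote host without a display, the resulting pdf can be copied back to your own machine for viewing.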


Preview pieces of toolbox

• http://datascienceatthecommandline.com/
• We will work through Step 5 and go straight to commands
• We will work with csvkit today
  – http://csvkit.readthedocs.org/
  – Download the sample data set from city country
  – Install pip and then install csvkit
  – Explore the multiple commands
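As a starting point for exploring csvkit, here is a sketch that runs a few of its commands on a tiny CSV file. The file `cities.csv` and its rows are invented for illustration and are not the class data set; the csvkit commands only run if csvkit is already installed (for example with `pip install --user csvkit`).

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data with a header row.
cat > cities.csv <<'EOF'
city,country,population
Tucson,USA,530000
Osaka,Japan,2700000
Lyon,France,510000
Tokyo,Japan,13500000
EOF

if command -v csvlook >/dev/null 2>&1; then
    csvcut -n cities.csv                       # list the column names
    csvcut -c city,country cities.csv          # keep only two columns
    csvgrep -c country -m Japan cities.csv     # rows where country matches Japan
    csvsort -c population -r cities.csv | csvlook   # sort descending, pretty-print
    csvstat cities.csv                         # summary statistics per column
else
    echo "csvkit not found; install it with pip first" >&2
fi
```

Because every csvkit command reads and writes CSV, the commands can be chained with pipes in the same way as the core utils.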


Next class

• Please practice your command line skills
• Get a github account