96-Summer Bioinformatics Programming Practicum (II)
Bioinformatics with Perl
8/13~8/22 蘇中才
8/24~8/29 張天豪
8/31 曾宇鳳
Schedule
Date        Time         Subject                                            Speaker
8/13 (Mon)  13:30~17:30  Perl Basics                                        蘇中才
8/15 (Wed)  13:30~17:30  Programming Basics                                 蘇中才
8/17 (Fri)  13:30~17:30  Regular expression                                 蘇中才
8/20 (Mon)  13:30~17:30  Retrieving Data from Protein Sequence Database     蘇中才
8/22 (Wed)  13:30~17:30  Perl combines with Genbank, BLAST                  蘇中才
8/24 (Fri)  13:30~17:30  PDB database and structure files                   張天豪
8/27 (Mon)  8:30~12:30   Extracting ATOM information                        張天豪
8/27 (Mon)  13:30~17:30  Mapping of Protein Sequence IDs and Structure IDs  張天豪
8/31 (Fri)  13:30~17:30  Final and Examination                              曾宇鳳
Regular expression
File handle
File handle
  Reserved file handle
  File manipulation
  File test operator
  File status
  Localtime
Reserved file handle
  STDIN
  STDOUT
  STDERR
  DATA
  ARGV
  ARGVOUT
File handle - open
Input
  open SEQ, "seq.txt";
  open SEQ, "< seq.txt";
Output
  open SEQ, "> seq.txt";
Appended output
  open LOG, ">> log.txt";
File handle - close
Input/Output
  close SEQ;
  close LOG;
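The open modes above can also be written with the three-argument form of open and a lexical file handle, which many Perl programmers prefer today. The following is only a sketch under assumptions: the temporary directory and the file name log.txt inside it are placeholders chosen for illustration, not files from the course.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch location; the slides use seq.txt and log.txt directly.
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/log.txt";

# Write mode (">") creates the file or truncates an existing one.
open my $out, '>', $path or die "cannot write $path: $!\n";
print {$out} "first line\n";
close $out or die "cannot close $path: $!\n";

# Append mode (">>") keeps the existing content and adds to the end.
open my $log, '>>', $path or die "cannot append to $path: $!\n";
print {$log} "second line\n";
close $log or die "cannot close $path: $!\n";

# Read mode ("<") returns the file line by line.
open my $in, '<', $path or die "cannot read $path: $!\n";
my @lines = <$in>;
close $in;

print scalar(@lines), " lines read\n";
```

Keeping the mode in its own argument means a file name that happens to start with ">" or "<" cannot change how the file is opened.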
File handle - die
Error handling
  die "<your error message>";
  $! : system error message
Example
#!/usr/bin/perl -w
#log.pl : try to write to a read-only file
open LOG, ">> disorder.fa" or die "LOG ERROR:$!\n";
# write log
close LOG;
File handle - warn
Warning handling
  warn "<your error message>";
  $! : system error message
Example
  open LOG, ">> disorder.txt" or warn "LOG ERROR:$!";
File copy
#!/usr/bin/perl -w
#copy1.pl : copy data from the input file into the output file
open INPUT, "<disorder.fa" or die "disorder.fa can't be opened\n";
open OUTPUT, ">temp.fa" or die "temp.fa can't be created\n";
my $line;
while ( $line = <INPUT> ) {
    chomp $line;
    print OUTPUT "$line\n";
}
close INPUT;
close OUTPUT;
File test operators (1/3)
Operator  Description
-A        Returns the time since OPERAND was last accessed, in days, measured from when the program started.
-b        Tests if OPERAND is a block device.
-B        Tests if OPERAND is a binary file. If OPERAND is a file handle, then the current buffer is examined, instead of the file itself.
-c        Tests if OPERAND is a character device.
-C        Returns the time since the inode of OPERAND was last changed, in days, measured from when the program started.
-d        Tests if OPERAND is a directory.
-e        Tests if OPERAND exists.
-f        Tests if OPERAND is a regular file as opposed to a directory, symbolic link or other type of file.
File test operators (2/3)
Operator  Description
-g        Tests if OPERAND has the setgid bit set.
-k        Tests if OPERAND has the sticky bit set.
-l        Tests if OPERAND is a symbolic link. Under DOS, this operator always will return false.
-M        Returns the time since OPERAND was last modified, in days, measured from when the program started.
-o        Tests if OPERAND is owned by the effective uid. Under DOS, it always returns true.
-O        Tests if OPERAND is owned by the real uid. Under DOS, it always returns true.
-p        Tests if OPERAND is a named pipe.
-r        Tests if OPERAND can be read from.
File test operators (3/3)
Operator  Description
-R        Tests if OPERAND can be read from by the real uid/gid. Under DOS, it is identical to -r.
-s        Returns the size of OPERAND in bytes. Therefore, it returns true if the size is non-zero.
-S        Tests if OPERAND is a socket.
-t        Tests if OPERAND is opened to a tty.
-T        Tests if OPERAND is a text file. If OPERAND is a file handle, then the current buffer is examined, instead of the file itself.
-u        Tests if OPERAND has the setuid bit set.
-w        Tests if OPERAND can be written to.
-W        Tests if OPERAND can be written to by the real uid/gid. Under DOS, it is identical to -w.
-x        Tests if OPERAND can be executed.
-X        Tests if OPERAND can be executed by the real uid/gid. Under DOS, it is identical to -x.
-z        Tests if OPERAND size is zero.
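A few of these operators can be combined to classify a path. The sketch below is illustrative only: describe is a hypothetical helper, and empty.fa and full.fa are scratch files created in a temporary directory rather than files from the course.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch files used only for this demonstration.
my $dir   = tempdir( CLEANUP => 1 );
my $empty = "$dir/empty.fa";
my $full  = "$dir/full.fa";

open my $fh1, '>', $empty or die "cannot create $empty: $!\n";
close $fh1;

open my $fh2, '>', $full or die "cannot create $full: $!\n";
print {$fh2} ">seq1\nACGT\n";
close $fh2;

# describe is a hypothetical helper that applies -e, -d, -z, -f and -s in turn.
sub describe {
    my ($path) = @_;
    return 'missing'   if not -e $path;    # does not exist
    return 'directory' if -d $path;        # is a directory
    return 'empty'     if -z $path;        # exists with size zero
    return 'file of ' . ( -s $path ) . ' bytes' if -f $path;
    return 'other';
}

for my $path ( $dir, $empty, $full, "$dir/no_such_file" ) {
    print "$path: ", describe($path), "\n";
}
```

Testing -e first keeps the later tests from returning undef for a path that is not there.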
File copy +
#!/usr/bin/perl -w
#copy2.pl : copy data from the input file into the output file
if (not -e "disorder1.fa") {
    die "disorder1.fa doesn't exist\n";
}
# die never returns, so this message is printed only when the file exists
print "continue to open disorder1.fa\n";
open INPUT, "<disorder1.fa" or die "disorder1.fa can't be opened\n";
if (-e "temp.fa") {
    warn "temp.fa already exists\n";
    print "continue to write temp.fa\n";
}
open OUTPUT, ">temp.fa" or die "temp.fa can't be created\n";
my $line;
while ( $line = <INPUT> ) {
    chomp $line;
    print OUTPUT "$line\n";
}
close OUTPUT;
close INPUT;
Exercise
File handle
File size
Get the size of a file
  my $size = -s "disorder.fa";
Check file size
  if ( -s "disorder.fa" > 5*1024) { … }
  if ($size=-s "disorder.fa" > 5*1024) {
      print "disorder.fa has $size bytes\n";
  }
What's the value of $size ? Why ?
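One way to check your answer to the question above is to run both forms side by side. This is a sketch under assumptions: a 6000-byte temporary file stands in for disorder.fa, and the names $flag and $limit are introduced only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical 6000-byte file standing in for disorder.fa.
my ( $fh, $path ) = tempfile( UNLINK => 1 );
binmode $fh;
print {$fh} 'A' x 6000;
close $fh;

my $limit = 5 * 1024;

# "=" binds more loosely than ">", so this stores the result of the
# comparison ( (-s $path) > $limit ), not the file size.
my $flag = -s $path > $limit;

# With parentheses the size is stored first and then compared.
my $size;
if ( ( $size = -s $path ) > $limit ) {
    print "file has $size bytes\n";
}
print "flag = $flag\n";
```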
Exercise – linenumber.pl
Input (disorder.fa)
  >GCN4_YEAST (P03069) General control protein GCN4 - Saccharomyces cerevisiae (Baker's yeast).
  MSEYQPSLFALNPMGFSPLDGSKSTNENVSASTSTAKPMVGQLIFDKFIKTEEDPIIKQD
  TPSNLDFDFALPQTATAPDAKTVLPIPELDDAVVESFFSSSTDSTPMFEYENLEDNSKEW
  ...
  EHAYSRARTKNNYGSTIEGLLDLPDDDAPEEAGLAAPRLSFLPAGHTRRLSTAPPTDVSL
  GDELHLDGEDVAMAHADALDDFDLDMLGDGDSPGPGFTPHDSAPYGALDMADFEFEQMFT
  DALGIDEYGG
Output
  1 >GCN4_YEAST (P03069) General control protein GCN4 - Saccharomyces cerevisiae (Baker's yeast).
  2 MSEYQPSLFALNPMGFSPLDGSKSTNENVSASTSTAKPMVGQLIFDKFIKTEEDPIIKQD
  3 TPSNLDFDFALPQTATAPDAKTVLPIPELDDAVVESFFSSSTDSTPMFEYENLEDNSKEW
  ...
  128 EHAYSRARTKNNYGSTIEGLLDLPDDDAPEEAGLAAPRLSFLPAGHTRRLSTAPPTDVSL
  129 GDELHLDGEDVAMAHADALDDFDLDMLGDGDSPGPGFTPHDSAPYGALDMADFEFEQMFT
  130 DALGIDEYGG
Regular expression
File status, localtime
File information - stat
0 dev device number of filesystem
1 ino inode number
2 mode file mode (type and permissions)
3 nlink number of (hard) links to the file
4 uid numeric user ID of file's owner
5 gid numeric group ID of file's owner
6 rdev the device identifier (special files only)
7 size total size of file, in bytes
8 atime last access time in seconds since the epoch
9 mtime last modify time in seconds since the epoch
10 ctime inode change time in seconds since the epoch (*)
11 blksize preferred block size for file system I/O
12 blocks actual number of blocks allocated
File status
#!/usr/bin/perl -w
#stat.pl : show the information of the file
my $fn = shift @ARGV;
die "please enter a filename\n" if (not defined($fn));
die "$fn doesn't exist\n" if (not -e $fn);
my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
    $atime,$mtime,$ctime,$blksize,$blocks) = stat($fn);
print "device = $dev\n";
print "inode = $ino\n";
print "mode = $mode\n";
print "number of links = $nlink\n";
print "user id = $uid\n";
print "group id = $gid\n";
print "rdev = $rdev\n";
print "size = $size\n";
print "atime = $atime\n";
print "mtime = $mtime\n";
print "ctime = $ctime\n";
print "block size = $blksize\n";
print "blocks = $blocks\n";
Local time
#!/usr/bin/perl -w
#localtime1.pl : show the readable time of the file
my $fn = shift @ARGV;
die "please enter a filename\n" if (not defined($fn));
die "$fn doesn't exist\n" if (not -e $fn);
my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
    $atime,$mtime,$ctime,$blksize,$blocks) = stat($fn);
# localtime in scalar context returns a readable date string
my $alocal = localtime $atime;
my $mlocal = localtime $mtime;
my $clocal = localtime $ctime;
print "atime = $alocal\n";
print "mtime = $mlocal\n";
print "ctime = $clocal\n";
Local time +
#!/usr/bin/perl -w
#localtime2.pl : show the user-defined time of the file
my $fn = shift @ARGV;
die "please enter a filename\n" if (not defined($fn));
die "$fn doesn't exist\n" if (not -e $fn);
my ($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
$atime,$mtime,$ctime,$blksize,$blocks) = stat($fn);
my ($sec,$min,$hour,$day,$mon,$year,$wday,$yday,$isdst) = localtime $mtime;
print "mtime = ($year/$mon/$day $hour:$min:$sec ($wday;$yday;$isdst)\n";
Local time
  $sec : 0~59
  $min : 0~59
  $hour : 0~23
  $day : 1~31
  $mon : 0~11 (0 = January)
  $year : years since 1900 (add 1900 to get the calendar year)
  $wday : 0 (Sunday) ~ 6 (Saturday)
  $yday : 0 (Jan 1) ~ 364 or 365
  $isdst: positive if daylight saving time is in effect, zero otherwise
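Because $mon starts at 0 and $year counts from 1900, both need adjusting before they are shown to a reader. The sketch below assumes a hypothetical helper named format_fields and an ISO-like output layout chosen only for illustration; the field values passed in the first call are those for Thu Aug 2 10:10:16 2007.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# format_fields is a hypothetical helper, not part of the course scripts.
# It takes the list returned by localtime and fixes the month and year offsets.
sub format_fields {
    my ( $sec, $min, $hour, $day, $mon, $year ) = @_;
    return sprintf '%04d-%02d-%02d %02d:%02d:%02d',
        $year + 1900, $mon + 1, $day, $hour, $min, $sec;
}

# Fixed field values: sec, min, hour, day, mon, year, wday, yday, isdst.
my $fixed = format_fields( 16, 10, 10, 2, 7, 107, 4, 213, 0 );
print "$fixed\n";    # prints "2007-08-02 10:10:16"

# The same helper applied to the current time.
print format_fields( localtime time ), "\n";
```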
Exercise
localtime
Quiz – localtime
my ($sec,$min,$hour,$day,$mon,$year,$wday,$yday,$isdst) = localtime $mtime;
print "mtime = ($year/$mon/$day $hour:$min:$sec ($wday;$yday;$isdst)\n";
  Output: mtime = (107/7/2 10:10:16 (4;213;0)
my $mlocal = localtime $mtime;
print "mtime = $mlocal\n";
  Output: mtime = Thu Aug 2 10:10:16 2007
my ($mlocal) = localtime $mtime;
  Output: ?
Exercise
How can we show the time information of disorder.fa like "2007/8/2 10:10:16 (Thu)"?
  Hint: year, month and weekday
  @weekDays = qw(Sun Mon Tue Wed Thu Fri Sat);
How can we show the time information of disorder.fa like "Aug 2 2007 10:10:16 (Thu)"?
  @months = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
Regular expression
Basic
How do we search for a word in a text file?
Unix command
  grep
Perl
  Regular expression
An example of Regular expression
#!/usr/bin/perl -w
#google1.pl : check string with/without a certain pattern
while (1) {
    print "Please enter your query:";
    my $line = <>;
    last if (not defined($line));    # stop at end of input
    if ($line =~ /google/) {
        print "Found!!!\n";
    }
    else {
        print "No match\n";
    }
}
If we want to find the following words
google, g01gle, g12gle, gabgle, …, gxxgle
ggle, gogle, google, gooogle, …, go…ogle
gogle, google, gooogle, …, go…ogle
google, goooogle, goooooogle, …, goo…oogle
ggle, gogle, google, gooogle, …, go…ogle, gagle, gaagle, gaaagle, gbgle, gbbgle, …
Meta-character
Wildcard (.)
  Matches any single character except "\n"
Quantifier
  ? : zero or one of the preceding character
  * : zero or more of the preceding character
  + : one or more of the preceding character
If we want to find the following words
google, g01gle, g12gle, gabgle, …, gxxgle
  /g..gle/
ggle, gogle, google, gooogle, …, go…ogle
  /go*gle/
gogle, google, gooogle, …, go…ogle
  /go+gle/
google, goooogle, goooooogle, …, goo…oogle
  /g(oo)+gle/
ggle, gogle, google, gooogle, …, go…ogle, gagle, gaagle, gaaagle, gbgle, gbbgle, …
  /g.*gle/
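The patterns above can be tried against a small word list to see which words each one accepts. This is a sketch: the word list and the helper matching_words are made up for illustration, and the ^ and $ anchors are added so that each pattern has to cover the whole word.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample words chosen to separate the five patterns.
my @words = qw(ggle gogle google gooogle goooogle g12gle gabgle);

# matching_words is a hypothetical helper: it returns the words that
# the given pattern matches from the first character to the last.
sub matching_words {
    my ($regex) = @_;
    return grep { /^$regex$/ } @words;
}

for my $source ( 'g..gle', 'go*gle', 'go+gle', 'g(oo)+gle', 'g.*gle' ) {
    my @hits = matching_words(qr/$source/);
    print "/$source/ : @hits\n";
}
```

With this list, /g..gle/ accepts only the words with exactly two characters between "g" and "gle", while /g(oo)+gle/ accepts only an even, non-zero number of "o" characters.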
Character class
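  [ ] : character class
  -   : range inside a class
  ^   : negation when it is the first character inside a class
Examples
  [abcdefghijklmnopqrstuvwxyz] or [a-z]
  [0123456789] or [0-9]
  [abcxyz]
  [02468] or [^13579] (the same only when the character is a digit)
  [A-Za-z0-9]
Character class shorthand
  [\d] : [0-9]
  [\w] : [A-Za-z0-9_]
  [\s] : [\f\t\n\r ]
Something you don't want
  [\D] : [^\d]
  [\W] : [^\w]
  [\S] : [^\s]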
How about [\s\S] ?
What is the difference between . and [\s\S] ?
Please think …
  /google/
  /g[\d][\d]gle/
  /g..gle/
  /g[\w]*gle/
  /g.*gle/
  /g[\d\D]*gle/
  /g..........gle/
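The difference between . and [\s\S] only shows up when the text contains a newline. A minimal sketch, using a made-up test string and the /s flag for comparison:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical test string: a newline sits between "g" and "gle".
my $text = "g\ngle";

my $dot        = $text =~ /g.gle/      ? 1 : 0;    # "." does not match "\n"
my $class      = $text =~ /g[\s\S]gle/ ? 1 : 0;    # [\s\S] matches any character
my $dot_with_s = $text =~ /g.gle/s     ? 1 : 0;    # /s lets "." match "\n" too

print "dot: $dot, class: $class, dot with /s: $dot_with_s\n";
```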
Additional quantifiers
  |     : alternation (either side may match)
  {n,m} : at least n and at most m repetitions
Examples
  /(google|Google)/ or /(G|g)oogle/
  /g..........gle/ or /g.{10}gle/
  /go{0,100}gle/
  /g(oo)+gle/ or /g(oo){1,}gle/
Additional quantifiers
  ^ : beginning of the string
  $ : end of the string
  \b : boundary of a word
  \B : any position that is not a word boundary
Examples
  /^google$/
  /\bgoogle\b/
Additional quantifiers
  ( ) : grouping; the text matched inside is captured
  \1, \2, … : backreference to the first, second, … group
Examples
  /g(o)\1gle/
  /g([\S])\1gle/
Output (matched variable)
  $1, $2, …
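A backreference is used inside the pattern, while $1 is read after the match succeeds. The sketch below assumes a hypothetical helper repeated_char and three made-up words.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# repeated_char is a hypothetical helper: it returns the character that
# appears twice in a row between "g" and "gle", or undef if there is none.
sub repeated_char {
    my ($word) = @_;
    # (\S) captures one character; \1 requires the same character again.
    return $word =~ /g(\S)\1gle/ ? $1 : undef;
}

for my $word (qw(google gaagle gabgle)) {
    my $char = repeated_char($word);
    if ( defined $char ) {
        print "$word: repeated character is '$char'\n";
    }
    else {
        print "$word: no repeated pair\n";
    }
}
```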
Exercise
Basic regular expression
Exercise
How to extract these words ?
gogle, gooogle, gooooogle, gooooooogle (No ggogles)
g11gle, g33gle, g55gle, g77gle, g99gle (excluding gg99gles)
What do those mean ?
/g[\d]+gle/
/go?gle/
/g([\w])([\w])\2\1gle/
Magic variable - $_
Original
while ($line = <>) {
    chomp($line);
    if ($line =~ /google/) {
        print "$line\n";
    }
}
Magic
while (<>) {
    chomp;
    if (/google/) {
        print "$_\n";
    }
}
Magic variable - $_
#!/usr/bin/perl -w
#google2.pl : check string with/without a certain pattern
print "Please enter your query:";
while (<>) {
    chomp;
    if (/google/) {
        print "Found!!!\n";
    }
    else {
        print "No match\n";
    }
    print "Please enter your query:";
}
Regular expression
Flags
Regular Expression
String matching
  m// or //
String substitution
  s///
String transliteration
  tr/// or y///
Matching
Complete syntax
  m//
Examples
  m/google/
  m/g(oo){0,}gle/
Others
  m<google>, m[google], m!google!, …
Flag options
  /i : case-insensitive matching
  /s : let . match any character, including "\n" (like [\d\D])
  /m : multi-line mode; ^ and $ also match at the start and end of each line
Examples
  google, Google, GOOGLE, gOOGLE, GooGle, …
  m/google/i
Matched patterns
  $& : the string matched by the last successful match
  $` : the text before $& (prefix)
  $' : the text after $& (suffix)
Examples
  $string = "Microsoft google Yahoo";
  $string =~ m/google/i;
  print "[$`][$&][$']\n";
Output
  [Microsoft ][google][ Yahoo]
Matched pattern - $&, $`, $'
#!/usr/bin/perl -w
#google3.pl : check string with/without a certain pattern
print "Please enter your query:";
while (<>) {
    chomp;
    if (m/google/i) {
        print "Match:[$&]\n";
        print "prefix : [$`]\n";
        print "suffix : [$']\n";
    }
    else {
        print "No match\n";
    }
    print "Please enter your query:";
}
Substitution
Complete syntax
  s/// or s###
Examples
  $string =~ s/google/GOOGLE/
  s/(google|GOOGLE)/Microsoft/
Others
  s#^https://#http://#;
Flag options
  /i : case-insensitive matching
  /s : let . match any character, including "\n" (like [\d\D])
  /g : replace every match, not only the first
Examples
  s/google/yahoo/sg
  s/\s+/ /g
  s/^\s+//
  s/\s+$//
  s#^.*/##s
Case modifiers (used inside the replacement)
  \U : upper case until \E
  \L : lower case until \E
  \E : end-point for case setting
  \u : upper case for the next character only
  \l : lower case for the next character only
Examples
  s/(GOOGLE)/\L$1/ig
  s/(\w+) kiss (\w+)/\U$2\E was kissed by $1/i;
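Several of the substitutions above can be chained on one string. This is a sketch on made-up input; the variable names are chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input with extra spaces at both ends and in the middle.
my $text = "  Tom   kiss Mary  ";
$text =~ s/^\s+//;      # remove leading whitespace
$text =~ s/\s+$//;      # remove trailing whitespace
$text =~ s/\s+/ /g;     # squeeze every run of whitespace into one space

# \U upper-cases $2 until \E; $1 is inserted unchanged.
( my $passive = $text ) =~ s/(\w+) kiss (\w+)/\U$2\E was kissed by $1/i;

# /i matches any capitalisation, /g replaces every match, \L lower-cases it.
( my $lower = 'GOOGLE and Google' ) =~ s/(google)/\L$1/ig;

# \u upper-cases only the next character.
( my $name = 'mary' ) =~ s/(\w+)/\u$1/;

print "$text\n$passive\n$lower\n$name\n";
```

Copying the string in parentheses before =~ leaves the original value untouched, which is handy when both versions are needed.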
Transliteration
Complete syntax
  tr/// or y///
Examples
  $string =~ tr/a-z/A-Z/
  $string =~ tr/a-z/b-za/
  $string =~ tr/ATCG/TAGC/
  $string =~ tr/ATCG/UAGC/
Transliteration - flags
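/d : delete the listed characters that have no replacement
  $text =~ tr/ //d;
/c : complement the search list, so every character not listed is replaced
  $text =~ tr/0-9/*/c;
/s : squeeze each run of the same replaced character into one
  $text =~ tr/a-zA-Z//s;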
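The tr operator and its flags can be tried on short strings. The following sketch uses made-up inputs; the sequence ATGCCG and the variable names are assumptions for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical DNA fragment.
my $dna = 'ATGCCG';

# Complement each base, then reverse the string to read the other strand.
( my $complement = $dna ) =~ tr/ATCG/TAGC/;
my $reverse_complement = scalar reverse $complement;

# With an empty replacement list and no flags, tr only counts the matches.
my $gc_count = ( $dna =~ tr/GC// );

( my $no_spaces = 'A T G' )    =~ tr/ //d;         # /d deletes the spaces
( my $masked   = 'ab12cd' )    =~ tr/0-9/*/c;      # /c replaces the non-digits
( my $squeezed = 'aaabbbccc' ) =~ tr/a-zA-Z//s;    # /s squeezes repeated letters

print "$complement $reverse_complement $gc_count\n";
print "$no_spaces $masked $squeezed\n";
```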
Exercise
Replacement
Exercise
An Ingenious Love Letter
  http://love.english.tw/post/43/630
  Stored in love.txt
How to decode this letter?
  Please replace "I" with your name.
  Please replace "you" with your boyfriend's or girlfriend's name.
  Please replace "we" with both of your names.