gavin-powell
Bioinformatics Course, Day 3
MySQL
Topics
● Databases
● MySQL
● SQL
● Permissions
● Usage
● Examples
What are databases?
● DBMS (database management systems)
● Data storage and provision
● Software running on servers
● Designed for high-capacity, high-availability usage
● Models: relational, object-oriented, hierarchical, network
● Examples: Oracle, PostgreSQL, MySQL, Sybase, DB2, dBASE, Microsoft SQL Server
What do they do?
● Record storing
● Indexing for quick access
● Data organization
● Data processing
Application areas
● BioPharma
● E-Commerce
● Education
● Energy
● Finance
● Government
● Media
● Retail
● Telecom
● Transport
Anywhere with large data volumes!
MySQL Customers
● Bayer
● Sanger
● Ensembl
● Google
● Yahoo
● Ticketmaster
● Deutsche Post
● State of New York
● UNICEF
● Yamaha
● Wikipedia
● BT
● Nokia
● Lufthansa
Why MySQL?
● World's most popular open source database (8 million active installations and 50,000 downloads per day)
● High-performance
● Reliable
● Easy to use
● Free!
What is SQL?
● Structured Query Language
● Create, modify, retrieve and manipulate data
● 1970s, IBM: Structured English Query Language ("SEQUEL"), later SQL
● Simple command set
● Intuitive
SQL Examples
● Most important: data selection
SELECT name,sequence FROM swissprot WHERE name = 'TLR4_HUMAN';
● Data insertion:
INSERT INTO blast VALUES('TLR4_HUMAN', 'TLR4_PANPA', 1e-104);
● Data update:
UPDATE installs SET version = 8 WHERE db = 'uniprot';
● Data deletion:
DELETE FROM blast WHERE expect > 1e-20;
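The four statement types above can be tried without a server. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL; the column names of the blast table (query, hit, expect) and the second hit are hypothetical, since the slides do not show the table layout.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical layout of the blast table: query name, hit name, expect value.
cur.execute("CREATE TABLE blast (query TEXT, hit TEXT, expect REAL)")

# Data insertion (the second row is made up to give DELETE something to remove).
cur.execute("INSERT INTO blast VALUES (?, ?, ?)", ("TLR4_HUMAN", "TLR4_PANPA", 1e-104))
cur.execute("INSERT INTO blast VALUES (?, ?, ?)", ("TLR4_HUMAN", "WEAK_HIT", 1e-5))

# Data selection: best hits first.
cur.execute("SELECT hit, expect FROM blast WHERE query = 'TLR4_HUMAN' ORDER BY expect")
rows_before = cur.fetchall()

# Data update.
cur.execute("UPDATE blast SET expect = ? WHERE hit = ?", (1e-100, "TLR4_PANPA"))

# Data deletion: remove weak hits.
cur.execute("DELETE FROM blast WHERE expect > 1e-20")
conn.commit()

cur.execute("SELECT hit, expect FROM blast")
rows_after = cur.fetchall()

print(rows_before)
print(rows_after)
```

The SQL itself is the same in the mysql client; only the connection step differs.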
MySQL setup
[Diagram: one MySQL server with several MySQL clients connected to it, through local and remote access]
MySQL accounts
● Administrator: root
● Users: kahokamp, guest1 (not necessarily the same as login names)
● Passwords: ******* (not necessarily the same as login passwords)
Permissions
● Assigned by administrator
● Multiple levels:
  – Access
  – Database usage
  – Select, Insert, Update, Delete, Drop, ...
● May depend on host
Connection
Command line access:
$ mysql -h localhost -u guest -p
Connection
$ mysql -h localhost -u guest -p
● mysql: MySQL client program
● -h localhost: server host
● -u guest: user name
● -p: prompt for password
Connection
$ mysql -h localhost -u guest -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 427 to server version: 5.0.18-standard-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql>
Connection
Remote command line access:
$ mysql -h bioinf.gen.tcd.ie -u guest -p uniprot
● uniprot: preselect a database
Connection
Using Perl:
use strict;
use warnings;
use DBI;    # module

# access details
my $user = 'guest';
my $host = 'bioinf.gen.tcd.ie';
my $password = '';
my $db = 'uniprot';

# database connection
my $dbh = DBI->connect("DBI:mysql:database=$db;host=$host", $user, $password)
    or die "Cannot connect: $DBI::errstr\n";

# query
my $statement = "SELECT sequence FROM swissprot WHERE name = 'TLR4_HUMAN'";
my $sth = $dbh->prepare($statement);
my $rv = $sth->execute;
unless ($rv >= 1) {
    die "No match!";
}

# data retrieval
my ($sequence) = $sth->fetchrow_array;
print "$sequence\n";
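The same steps (module, access details, connection, query, data retrieval) look much alike in other languages. Below is a minimal sketch in Python, assuming the built-in sqlite3 module and an in-memory database as a stand-in for the MySQL server; the helper name fetch_sequence is hypothetical, and the single test entry is taken from the SQL examples in this deck. It also passes the name as a bound parameter instead of pasting it into the SQL string.

```python
import sqlite3


def fetch_sequence(conn, name):
    """Return the sequence stored for `name`, or None if there is no match."""
    # The "?" placeholder lets the driver handle quoting of the value.
    cur = conn.execute("SELECT sequence FROM swissprot WHERE name = ?", (name,))
    row = cur.fetchone()
    return row[0] if row is not None else None


# Stand-in for the real server: a tiny in-memory copy of the table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE swissprot (name TEXT PRIMARY KEY, sequence TEXT)")
conn.execute("INSERT INTO swissprot VALUES (?, ?)", ("GWA_SEPOF", "GW"))

print(fetch_sequence(conn, "GWA_SEPOF"))
print(fetch_sequence(conn, "NO_SUCH_ENTRY"))
```

Returning None for a missing entry plays the role of the "No match!" check in the Perl script.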
Connection
Using the Web (phpMyAdmin):
Orientation
Show what's available:
mysql> SHOW TABLES;
+-------------------+
| Tables_in_uniprot |
+-------------------+
| swissprot         |
+-------------------+
1 row in set (0.00 sec)
mysql>
Orientation
What other databases are there?
mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| test               |
| uniprot            |
| uniprotKB8         |
+--------------------+
4 rows in set (0.00 sec)
mysql> USE TEST;
Database changed
mysql>
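Organization
[Diagram: the MySQL Server holds databases, and each database holds tables]
● Databases: uniprot, test
● Tables in uniprot: swissprot
● Tables in test: test1, test2, test3, test4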
Permissions
● Creation of databases:
  – Normally only by administrator (root)
● Creation of tables:
  – All users with the appropriate permissions
● Special database 'test':
  – Normally accessible by all users
● Special user 'guest':
  – Limited access
  – Empty password
Work flow
Create database
Create table(s)
Insert data
Query database
Table creation
Table columns need to be defined!
Column types:
– Text
– Numbers
– Dates
– Binary data
– Sets
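Defining columns means naming each one and giving it a type in a CREATE TABLE statement. The following is a minimal sketch, assuming Python's sqlite3 module as a stand-in for MySQL; the table name entry and its columns are hypothetical. SQLite accepts the declared types but treats them more loosely than MySQL does, and it has no SET type, so that column type is left out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table with one column per type family from the list above
# (text, number, date, binary data).
conn.execute("""
    CREATE TABLE entry (
        name     VARCHAR(20),
        length   INTEGER,
        created  DATE,
        checksum BLOB
    )
""")

# List the declared columns: each row is (cid, name, type, notnull, default, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(entry)")]
for col_name, col_type in columns:
    print(col_name, col_type)

# Insert one row and read it back.
conn.execute("INSERT INTO entry VALUES (?, ?, ?, ?)",
             ("GWA_SEPOF", 2, "2004-01-16", b"\x00\x01"))
stored = conn.execute("SELECT name, length, created FROM entry").fetchone()
print(stored)
```

In the mysql client, DESCRIBE entry; gives the same kind of column overview as the PRAGMA call used here.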
SQL Examples
SELECT name,length FROM swissprot;
+-------------+--------+
| name        | length |
+-------------+--------+
| 104K_THEAN  |    893 |
| 104K_THEPA  |    924 |
| 108_LYCES   |    102 |
| 10KD_VIGUN  |     75 |
...
| ZYX_CHICK   |    542 |
| ZYX_HUMAN   |    572 |
| ZYX_MOUSE   |    564 |
+-------------+--------+
222289 rows in set (0.89 sec)
SQL Examples
SELECT name,length FROM swissprot LIMIT 10;
+-------------+--------+
| name        | length |
+-------------+--------+
| 104K_THEAN  |    893 |
| 104K_THEPA  |    924 |
| 108_LYCES   |    102 |
| 10KD_VIGUN  |     75 |
| 110KD_PLAKN |    296 |
| 11S2_SESIN  |    459 |
| 11S3_HELAN  |    493 |
| 11SB_CUCMA  |    480 |
| 128UP_DROME |    368 |
| 12AH_CLOS4  |     29 |
+-------------+--------+
10 rows in set (0.00 sec)
SQL Examples
SELECT name,length FROM swissprot LIMIT 222279,10;
+-------------+--------+
| name        | length |
+-------------+--------+
| ZYG12_CAEEL |    774 |
| ZYG1_CAEBR  |    709 |
| ZYG1_CAEEL  |    706 |
| ZYGBL_HUMAN |    766 |
| ZYGBL_MOUSE |    779 |
| ZYGBL_PONPY |    766 |
| ZYS3_CHLRE  |    371 |
| ZYX_CHICK   |    542 |
| ZYX_HUMAN   |    572 |
| ZYX_MOUSE   |    564 |
+-------------+--------+
10 rows in set (0.60 sec)
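LIMIT with two numbers reads as "skip the first, then return the second": LIMIT 222279,10 skips 222279 rows and returns the next 10. The following is a minimal sketch on ten rows, assuming Python's sqlite3 module as a stand-in for MySQL (SQLite accepts the same LIMIT offset,count form). The rows are the first ten entries shown earlier; an ORDER BY is added because row order is not guaranteed without one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE swissprot (name TEXT, length INTEGER)")

# The first ten entries from the LIMIT 10 example.
entries = [
    ("104K_THEAN", 893), ("104K_THEPA", 924), ("108_LYCES", 102),
    ("10KD_VIGUN", 75), ("110KD_PLAKN", 296), ("11S2_SESIN", 459),
    ("11S3_HELAN", 493), ("11SB_CUCMA", 480), ("128UP_DROME", 368),
    ("12AH_CLOS4", 29),
]
conn.executemany("INSERT INTO swissprot VALUES (?, ?)", entries)

# LIMIT count: the first three rows.
first_three = conn.execute(
    "SELECT name, length FROM swissprot ORDER BY name LIMIT 3").fetchall()

# LIMIT offset, count: skip seven rows, then return three.
last_three = conn.execute(
    "SELECT name, length FROM swissprot ORDER BY name LIMIT 7, 3").fetchall()

print(first_three)
print(last_three)
```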
SQL Examples
SELECT name,length FROM swissprot ORDER BY length LIMIT 10;
+------------+--------+
| name       | length |
+------------+--------+
| GWA_SEPOF  |      2 |
| ACI_TRIGI  |      3 |
| GRWM_HUMAN |      3 |
| LUXE_VIBFI |      3 |
| TRH_BOMOR  |      3 |
| TRH_NOTVI  |      3 |
| TRH_PIG    |      3 |
| TRH_SHEEP  |      3 |
| ACH1_ACHFU |      4 |
| DCML_PSECH |      4 |
+------------+--------+
10 rows in set (0.00 sec)
SQL Examples
SELECT * FROM swissprot WHERE length = 2;
+-----------+-----------+---------+------------+------------+------------+------------------+-----------+------+----------------------------------------------------------------------------------------------------------------+--------+---------------------------------------+------------------+--------+-------------+------+------------+----------+----------------------------------------------------+
| name      | accession | version | dataset    | created    | modified   | prot_name        | component | type | lineage                                                                                                        | tax_id | organism                              | checksum         | length | seq_version | mass | seq_date   | sequence | keyword                                            |
+-----------+-----------+---------+------------+------------+------------+------------------+-----------+------+----------------------------------------------------------------------------------------------------------------+--------+---------------------------------------+------------------+--------+-------------+------+------------+----------+----------------------------------------------------+
| GWA_SEPOF | P83570    |      15 | Swiss-Prot | 2004-01-16 | 2006-02-07 | Neuropeptide GWa |           |      | Eukaryota; Metazoa; Mollusca; Cephalopoda; Coleoidea; Neocoleoidea; Decapodiformes; Sepioidea; Sepiidae; Sepia |   6610 | Sepia officinalis (Common cuttlefish) | 7378100000000000 |      2 |           1 |  261 | 2003-06-01 | GW       | Amidation; Direct protein sequencing; Neuropeptide |
+-----------+-----------+---------+------------+------------+------------+------------------+-----------+------+----------------------------------------------------------------------------------------------------------------+--------+---------------------------------------+------------------+--------+-------------+------+------------+----------+----------------------------------------------------+
1 row in set (0.00 sec)
SQL Examples
SELECT name,sequence,organism,prot_name FROM swissprot WHERE length = 2;
+-----------+----------+---------------------------------------+------------------+
| name      | sequence | organism                              | prot_name        |
+-----------+----------+---------------------------------------+------------------+
| GWA_SEPOF | GW       | Sepia officinalis (Common cuttlefish) | Neuropeptide GWa |
+-----------+----------+---------------------------------------+------------------+
1 row in set (0.00 sec)
SQL Examples
SELECT * FROM swissprot WHERE length = 2 \G
*************************** 1. row ***************************
       name: GWA_SEPOF
  accession: P83570
    version: 15
    dataset: Swiss-Prot
    created: 2004-01-16
   modified: 2006-02-07
  prot_name: Neuropeptide GWa
  component:
       type:
    lineage: Eukaryota; Metazoa; Mollusca; Cephalopoda; ...
     tax_id: 6610
   organism: Sepia officinalis (Common cuttlefish)
   checksum: 7378100000000000
     length: 2
seq_version: 1
       mass: 261
   seq_date: 2003-06-01
   sequence: GW
    keyword: Amidation; Direct protein sequencing; Neuropeptide
1 row in set (0.01 sec)
SQL Examples
SELECT name,length FROM swissprot ORDER BY length DESC LIMIT 10;
+-------------+--------+
| name        | length |
+-------------+--------+
| DIG1_CAEEL  |  13100 |
| SYNE1_HUMAN |   8797 |
| ANC1_CAEEL  |   8545 |
| UNC89_CAEEL |   8081 |
| OBSCN_HUMAN |   7968 |
| LGRC_BREPA  |   7756 |
| BPA1_MOUSE  |   7389 |
| R1AB_CVMJH  |   7180 |
| R1AB_CVMA5  |   7176 |
| R1AB_CVM2   |   7124 |
+-------------+--------+
10 rows in set (0.00 sec)
SQL Examples
SELECT name,length FROM swissprot WHERE length < 10;
+-------------+--------+
| name        | length |
+-------------+--------+
| GWA_SEPOF   |      2 |
| ACI_TRIGI   |      3 |
| GRWM_HUMAN  |      3 |
| LUXE_VIBFI  |      3 |
...
| UPA7_HUMAN  |      9 |
| XYLA_STRS8  |      9 |
| YBFR_AZOVI  |      9 |
+-------------+--------+
365 rows in set (0.01 sec)
SQL Examples
SELECT COUNT(*) FROM swissprot WHERE length < 10;
+----------+
| count(*) |
+----------+
|      365 |
+----------+
1 row in set (0.00 sec)
SQL Examples
SELECT DISTINCT length FROM swissprot WHERE length < 10;
+--------+
| length |
+--------+
|      2 |
|      3 |
|      4 |
|      5 |
|      6 |
|      7 |
|      8 |
|      9 |
+--------+
8 rows in set (0.00 sec)
SQL Examples
SELECT length, COUNT(length) FROM swissprot WHERE length < 10 GROUP BY length;
+--------+---------------+
| length | COUNT(length) |
+--------+---------------+
|      2 |             1 |
|      3 |             7 |
|      4 |            22 |
|      5 |            30 |
|      6 |            18 |
|      7 |            50 |
|      8 |           103 |
|      9 |           134 |
+--------+---------------+
8 rows in set (0.00 sec)
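COUNT(*), DISTINCT and GROUP BY answer three related questions: how many rows, which values, and how many rows per value. The following is a minimal sketch, assuming Python's sqlite3 module as a stand-in for MySQL and using only the ten shortest entries listed earlier, so the counts are smaller than those for the full table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE swissprot (name TEXT, length INTEGER)")

# The ten shortest entries from the ORDER BY length example.
entries = [
    ("GWA_SEPOF", 2), ("ACI_TRIGI", 3), ("GRWM_HUMAN", 3), ("LUXE_VIBFI", 3),
    ("TRH_BOMOR", 3), ("TRH_NOTVI", 3), ("TRH_PIG", 3), ("TRH_SHEEP", 3),
    ("ACH1_ACHFU", 4), ("DCML_PSECH", 4),
]
conn.executemany("INSERT INTO swissprot VALUES (?, ?)", entries)

# How many rows match?
total = conn.execute(
    "SELECT COUNT(*) FROM swissprot WHERE length < 10").fetchone()[0]

# Which different lengths occur?
distinct = [row[0] for row in conn.execute(
    "SELECT DISTINCT length FROM swissprot WHERE length < 10 ORDER BY length")]

# How many rows per length?
per_length = conn.execute(
    "SELECT length, COUNT(length) FROM swissprot "
    "WHERE length < 10 GROUP BY length ORDER BY length").fetchall()

print(total)
print(distinct)
print(per_length)
```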
SQL Examples
CREATE TABLE test.splen SELECT length, COUNT(length) FROM swissprot GROUP BY length;
Query OK, 2717 rows affected (0.30 sec)
Records: 2717 Duplicates: 0 Warnings: 0
SQL Examples
SELECT * FROM test.splen ORDER BY `COUNT(length)` DESC LIMIT 10;
+--------+---------------+
| length | COUNT(length) |
+--------+---------------+
|    379 |          1004 |
|    146 |           921 |
|    141 |           749 |
|    156 |           694 |
|    148 |           633 |
|    207 |           591 |
|    155 |           590 |
|    152 |           579 |
|    215 |           573 |
|    119 |           570 |
+--------+---------------+
10 rows in set (0.01 sec)
SQL Examples
SELECT name,organism FROM swissprot WHERE NAME LIKE 'TLR4\_PA%';
+------------+---------------------------------+
| name       | organism                        |
+------------+---------------------------------+
| TLR4_PANPA | Pan paniscus (Pygmy chimpanzee) |
| TLR4_PAPAN | Papio anubis (Olive baboon)     |
+------------+---------------------------------+
2 rows in set (0.00 sec)
Wild-cards:
● _ (single character)
● % (any number of characters)
Escape with backslash (\)!
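The effect of escaping is easiest to see next to an unescaped pattern. The following is a minimal sketch, assuming Python's sqlite3 module as a stand-in for MySQL; the helper names_like and the entry TLR4XPANPA are hypothetical and exist only to show what an unescaped underscore matches. Unlike MySQL, SQLite has no default escape character, so the sketch declares one with an ESCAPE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE swissprot (name TEXT)")
conn.executemany(
    "INSERT INTO swissprot VALUES (?)",
    [("TLR4_HUMAN",), ("TLR4_PANPA",), ("TLR4_PAPAN",),
     ("TLR4XPANPA",)],  # made-up name, not a real entry
)


def names_like(pattern):
    """Return all names matching a LIKE pattern, with backslash as escape."""
    query = ("SELECT name FROM swissprot "
             "WHERE name LIKE ? ESCAPE '\\' ORDER BY name")
    return [row[0] for row in conn.execute(query, (pattern,))]


escaped = names_like(r"TLR4\_PA%")     # underscore taken literally
unescaped = names_like("TLR4_PA%")     # underscore matches any one character
lowercase = names_like(r"tlr4\_pa%")   # LIKE ignores ASCII case by default
exact = conn.execute(
    "SELECT name FROM swissprot WHERE name = 'TLR__PANPA'").fetchall()

print(escaped)
print(unescaped)
print(lowercase)
print(exact)
```

The last query returns nothing because = compares the string literally; wild-cards only apply with LIKE.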
SQL Examples
SELECT name,organism FROM swissprot WHERE NAME LIKE 'tlr4\_PA%';
+------------+---------------------------------+
| name       | organism                        |
+------------+---------------------------------+
| TLR4_PANPA | Pan paniscus (Pygmy chimpanzee) |
| TLR4_PAPAN | Papio anubis (Olive baboon)     |
+------------+---------------------------------+
2 rows in set (0.00 sec)
Case-insensitive (unless the column has a binary type or collation)!
SQL Examples
SELECT name,organism FROM swissprot WHERE NAME = 'TLR__PANPA';
Empty set (0.00 sec)
SQL Examples
SELECT name,organism FROM swissprot WHERE NAME LIKE 'TLR__PANPA';
+------------+---------------------------------+
| name       | organism                        |
+------------+---------------------------------+
| TLR4_PANPA | Pan paniscus (Pygmy chimpanzee) |
+------------+---------------------------------+
1 row in set (0.00 sec)
SQL Examples
SELECT name,length FROM swissprot WHERE NAME REGEXP '^TLR[4-9]\_HUMAN';
+------------+--------+
| name       | length |
+------------+--------+
| TLR4_HUMAN |    839 |
| TLR5_HUMAN |    858 |
| TLR6_HUMAN |    796 |
| TLR7_HUMAN |   1049 |
| TLR8_HUMAN |   1041 |
| TLR9_HUMAN |   1032 |
+------------+--------+
6 rows in set (0.00 sec)
Normalization
● Optimize database design
● Avoid duplication of data
● Least redundancy in tables
Normalization
Name        Keyword
TLR4_HUMAN  Direct protein sequencing; Glycoprotein; Immune response; Inflammatory response; Innate immunity; Leucine-rich repeat;
TLR4_MOUSE  Disease mutation; Glycoprotein; Immune response; Inflammatory response; Innate immunity; Leucine-rich repeat;
TLR4_BOVIN  Glycoprotein; Immune response; Inflammatory response; Innate immunity; Leucine-rich repeat;
Bad design! Repetition of entries, difficult to index and awkward to search
Normalization
Alternative design:
Name        Keyword1                   Keyword2         Keyword3
TLR4_HUMAN  Direct protein sequencing  Glycoprotein     Immune response
TLR4_MOUSE  Disease mutation           Glycoprotein     Immune response
TLR4_BOVIN  Glycoprotein               Immune response  Inflammatory response
Not optimal either:
● Different number of keywords for each entry
● Still very repetitive
Normalization
Normalized version:
table1:
Name        Sequence
TLR4_HUMAN  MAREASDPDDFAAEKAEASKMAREASDDDDFAAEKAEASKMAREASDDDDFAAEKAEASK
TLR4_MOUSE  MAREASDPDDFAAEKAEASKMAREASDDDDFAAEKAEASKMAREASDDDDFAAEKAEASK
TLR4_BOVIN  MAREASDPDDFAAEKAEASKMAREASDDDDFAAEKAEASKMAREASDDDDFAAEKAEASK
table2:
ID  Keyword
1   Direct protein sequencing
2   Disease mutation
3   Glycoprotein
4   Immune response
table3:
ID1  ID2
1    TLR4_HUMAN
2    TLR4_MOUSE
3    TLR4_HUMAN
3    TLR4_MOUSE
3    TLR4_BOVIN
SELECT name,sequence FROM table1, table2, table3 WHERE keyword = 'Glycoprotein' AND ID = ID1 AND ID2 = name;
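The normalized layout can be sketched as three small tables joined through the keyword IDs. The following is a minimal sketch, assuming Python's sqlite3 module as a stand-in for MySQL; the table names (protein, keyword, protein_keyword), the column names, the helper proteins_with_keyword and the short placeholder sequences are hypothetical, while the keyword IDs and links are the ones from the slide. It uses explicit JOIN clauses, which express the same query as the comma-separated FROM list with WHERE conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE protein (name TEXT PRIMARY KEY, sequence TEXT);
    CREATE TABLE keyword (id INTEGER PRIMARY KEY, keyword TEXT);
    CREATE TABLE protein_keyword (keyword_id INTEGER, protein_name TEXT);
""")

# Placeholder sequences, not real data.
conn.executemany("INSERT INTO protein VALUES (?, ?)", [
    ("TLR4_HUMAN", "SEQ_A"), ("TLR4_MOUSE", "SEQ_B"), ("TLR4_BOVIN", "SEQ_C"),
])
conn.executemany("INSERT INTO keyword VALUES (?, ?)", [
    (1, "Direct protein sequencing"), (2, "Disease mutation"),
    (3, "Glycoprotein"), (4, "Immune response"),
])
# Each keyword is stored once; this table only links IDs to protein names.
conn.executemany("INSERT INTO protein_keyword VALUES (?, ?)", [
    (1, "TLR4_HUMAN"), (2, "TLR4_MOUSE"),
    (3, "TLR4_HUMAN"), (3, "TLR4_MOUSE"), (3, "TLR4_BOVIN"),
])


def proteins_with_keyword(conn, keyword):
    """Return the names of all proteins linked to the given keyword."""
    query = """
        SELECT p.name
        FROM protein AS p
        JOIN protein_keyword AS pk ON pk.protein_name = p.name
        JOIN keyword AS k ON k.id = pk.keyword_id
        WHERE k.keyword = ?
        ORDER BY p.name
    """
    return [row[0] for row in conn.execute(query, (keyword,))]


print(proteins_with_keyword(conn, "Glycoprotein"))
print(proteins_with_keyword(conn, "Disease mutation"))
```

Keyword 4 has no links in the slide's third table, so a search for "Immune response" returns an empty list here.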
More Info
● MySQL tutorials on the web
● Learning MySQL (O'Reilly)
● http://dev.mysql.com/doc/ (searchable and browsable on-line)