DATA MINING PROJECT (cis-734)
PROTEIN SEARCH ENGINE
URL – http://web.njit.edu/~sm363
Submitted By: Asad Siddiqui ([email protected])
Supriya Malhotra ([email protected])
Ojus Bathla ([email protected])
Table of Contents
Topics
1. Introduction
2. E-R Diagram
3. Database Schema
4. SOAP Implementation
5. Source Code (Database Tables)
6. Screenshot (Tables)
7. Source Code (HTML/JSP)
8. Screenshots (Project)
Introduction
This project implements a biological database using data mining techniques. Its
output is meant to resemble that produced by SYSTERS. The schema has two tables.
The first, systers_protein_table, holds the information about clusters of proteins:
for each individual protein it stores the description, protein name, gene name and
the number of the cluster it belongs to. The second, protein_sequences_table, stores
the protein sequences.
The project uses a three-tier architecture: the front end is HTML, the middle tier is
JSP and the back end is Oracle. The JSP pages connect to the Oracle tables through
JDBC (Java Database Connectivity), so when the user runs a query from the HTML page,
the JSP code is executed and fetches the results from the tables it is connected to.
The application works like a search engine: it searches the local database and also
links out to the web. A user can search the biological data in four ways, because
four attributes are offered as search criteria: protein name, database, raccno and
cluster number. A user who is familiar with this biological data only needs to type
a value into any of the fields and click Search to get the desired results.
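The four-field search described above can be sketched as a small query builder. This is only an illustration under stated assumptions: the class name ProteinSearchQuery and its build method are hypothetical and are not part of the project; the table and column names are taken from the schema in this report. The sketch skips empty fields and uses ? placeholders, so the values could later be bound with a JDBC PreparedStatement instead of being concatenated into the SQL text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not from the project): builds a search over the four criteria. */
final class ProteinSearchQuery {
    final String sql;          // SQL text with ? placeholders
    final List<String> params; // values to bind, in placeholder order

    private ProteinSearchQuery(String sql, List<String> params) {
        this.sql = sql;
        this.params = params;
    }

    /** Blank or null inputs are ignored; at least one field must be filled in. */
    static ProteinSearchQuery build(String pname, String db, String raccno, String clusterNo) {
        // Column names follow the systers_protein_table definition in this report.
        Map<String, String> criteria = new LinkedHashMap<>();
        criteria.put("s.pname", pname);
        criteria.put("s.db", db);
        criteria.put("s.raccno", raccno);
        criteria.put("s.cluster_no", clusterNo);

        List<String> conditions = new ArrayList<>();
        List<String> params = new ArrayList<>();
        for (Map.Entry<String, String> entry : criteria.entrySet()) {
            String value = entry.getValue();
            if (value != null && !value.trim().isEmpty()) {
                conditions.add(entry.getKey() + " = ?");
                // Upper-case the input, as the report's query does with upper(...).
                params.add(value.trim().toUpperCase(Locale.ROOT));
            }
        }
        if (conditions.isEmpty()) {
            throw new IllegalArgumentException("at least one search field is required");
        }
        String sql = "select * from systers_protein_table s, protein_sequences_table p"
                + " where (" + String.join(" or ", conditions) + ")"
                + " and s.raccno = p.accno";
        return new ProteinSearchQuery(sql, params);
    }

    public static void main(String[] args) {
        ProteinSearchQuery query = build("", "tre", null, "136821");
        System.out.println(query.sql);
        System.out.println(query.params);
    }
}
```

Keeping the SQL text and the bound values separate is a design choice of this sketch; it avoids quoting problems when a user types characters such as an apostrophe into a search field.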
E-R DIAGRAM and DATABASE SCHEMA
The following E-R diagram shows how the two tables are related to each other. The
E-R diagram and the schema are the clearest way to document this relationship. In
protein_sequences_table, the attribute accno is a foreign key referencing the
raccno attribute of systers_protein_table, where raccno is the primary key.
[E-R diagram: Systers_Protein_Table (DB, RACCNO, PNAME, DESC, GNAME, IDENTICAL, FRAGMENT OF, CLUSTER NO) is linked by the relationship "Sequence Is" to Protein_Sequence_Table (ACCNO, SEQUENCE, CLUSTERNO).]
SCHEMA
The protein tables were created on the NJIT Oracle server.
JSP pages are used to query the database by protein ID, database name, protein name, etc.
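The primary key / foreign key relationship described above can be illustrated with a small in-memory join. This is a sketch only: the class SchemaJoinSketch and its nested types are hypothetical names introduced for illustration, and only a subset of the columns is modelled. It pairs each sequence row with the protein whose raccno equals its accno, which is the same condition the project's SQL query uses (s.raccno = p.accno).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the two-table schema; not part of the project code. */
final class SchemaJoinSketch {

    /** A row of systers_protein_table (subset of columns). */
    static final class Protein {
        final String db, raccno, description, clusterNo;
        Protein(String db, String raccno, String description, String clusterNo) {
            this.db = db;
            this.raccno = raccno;
            this.description = description;
            this.clusterNo = clusterNo;
        }
    }

    /** A row of protein_sequences_table. */
    static final class SequenceRow {
        final String accno, sequence, clusterNo;
        SequenceRow(String accno, String sequence, String clusterNo) {
            this.accno = accno;
            this.sequence = sequence;
            this.clusterNo = clusterNo;
        }
    }

    /** Joins sequences to proteins on accno = raccno, enforcing both key constraints. */
    static List<String> join(List<Protein> proteins, List<SequenceRow> sequences) {
        Map<String, Protein> byRaccno = new HashMap<>();
        for (Protein p : proteins) {
            // raccno is the primary key, so it must be unique.
            if (byRaccno.put(p.raccno, p) != null) {
                throw new IllegalStateException("duplicate primary key: " + p.raccno);
            }
        }
        List<String> joined = new ArrayList<>();
        for (SequenceRow s : sequences) {
            // accno is a foreign key, so it must match an existing raccno.
            Protein p = byRaccno.get(s.accno);
            if (p == null) {
                throw new IllegalStateException("foreign key violation: " + s.accno);
            }
            joined.add(p.raccno + " | " + p.description + " | " + s.sequence);
        }
        return joined;
    }
}
```

In the real system Oracle enforces both constraints; the sketch only makes explicit what the database checks when rows are inserted and joined.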
SOAP IMPLEMENTATION
What is SOAP?
• SOAP stands for Simple Object Access Protocol
• SOAP is a communication protocol
• SOAP is for communication between applications
• SOAP is a format for sending messages
• SOAP is designed to communicate via Internet
• SOAP is platform independent
• SOAP is language independent
• SOAP is based on XML
• SOAP is simple and extensible
• SOAP allows you to get around firewalls
Why do we use SOAP?
• It is important for application development to allow Internet communication
between programs.
• Today's applications communicate using Remote Procedure Calls (RPC) between
objects like DCOM and CORBA, but HTTP was not designed for this.
• A better way to communicate between applications is over HTTP, because HTTP
is supported by all Internet browsers and servers. SOAP was created to
accomplish this.
• SOAP provides a way to communicate between applications running on different
operating systems, with different technologies and programming languages.
SYNTAX RULES
Here are some important syntax rules:
• A SOAP message MUST be encoded using XML
• A SOAP message MUST use the SOAP Envelope namespace
• A SOAP message MUST use the SOAP Encoding namespace
• A SOAP message must NOT contain a DTD reference
• A SOAP message must NOT contain XML Processing Instructions
Skeleton SOAP Message:
<?xml version="1.0"?>
<soap:Envelope xmlns:soap="web.njit.edu/~aas44/soap-envelope"
soap:encodingStyle="web.njit.edu/~aas44/soap-encoding">
<soap:Header>
...
</soap:Header>
<soap:Body>
...
...
<soap:Fault>
...
...
</soap:Fault>
</soap:Body>
</soap:Envelope>
SOAP BODY
• The required SOAP Body element contains the actual SOAP message intended
for the ultimate endpoint of the message.
• Immediate child elements of the SOAP Body element may be namespace-
qualified. SOAP defines one element inside the Body element in the default
namespace. This is the SOAP Fault element, which is used to indicate error
messages.
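As a rough illustration of the Body and Fault elements described above, the following sketch builds a SOAP message that carries a Fault and reads the error text back with the standard Java XML parser. Several things here are assumptions and not part of the project: the class name SoapFaultSketch, the fault text, and the use of the standard SOAP 1.1 envelope namespace (http://schemas.xmlsoap.org/soap/envelope/) rather than the project-specific URIs shown in the skeleton.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical illustration of a SOAP Fault inside the Body; not project code. */
final class SoapFaultSketch {
    // Assumption: the standard SOAP 1.1 envelope namespace.
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    /** Builds an envelope whose Body contains a single Fault element. */
    static String faultEnvelope(String faultString) {
        return "<?xml version=\"1.0\"?>"
                + "<soap:Envelope xmlns:soap=\"" + SOAP_NS + "\">"
                + "<soap:Body><soap:Fault>"
                + "<faultcode>soap:Client</faultcode>"
                + "<faultstring>" + escape(faultString) + "</faultstring>"
                + "</soap:Fault></soap:Body></soap:Envelope>";
    }

    /** Escapes the characters that are not allowed in XML text content. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Returns the faultstring of the message, or null if the Body has no Fault. */
    static String readFault(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required to check the envelope namespace
            doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("message is not well-formed XML", e);
        }
        Element root = doc.getDocumentElement();
        if (!SOAP_NS.equals(root.getNamespaceURI()) || !"Envelope".equals(root.getLocalName())) {
            throw new IllegalArgumentException("root element is not a SOAP Envelope");
        }
        NodeList faults = doc.getElementsByTagNameNS(SOAP_NS, "Fault");
        if (faults.getLength() == 0) {
            return null;
        }
        NodeList strings = ((Element) faults.item(0)).getElementsByTagName("faultstring");
        return strings.getLength() == 0 ? null : strings.item(0).getTextContent();
    }
}
```

A receiver would typically call readFault first and only process the rest of the Body when it returns null.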
Source Code – Database Tables
To create the tables in Oracle, we run the following statements:
1. TO CREATE THE TABLE SYSTERS PROTEIN
create table systers_protein_table(
  db varchar2(4),
  raccno varchar2(25),
  pname varchar2(25),
  Description varchar2(200),
  Gene_name varchar2(25),
  Identical_To varchar2(25),
  Fragment_Of varchar2(25),
  Cluster_No varchar2(25),
  primary key (raccno));
2. TO CREATE THE TABLE SYSTERS PROTEIN SEQUENCE TABLE
create table protein_sequences_table(
  accno varchar2(25),
  sequence varchar2(1000),
  ClusterNo varchar2(25),
  foreign key (accno) references systers_protein_table (raccno));
To insert rows into the tables, we run the following statements:
1. TO INSERT ROWS IN FIRST TABLE
begin
  insert into systers_protein_table values ('TRE','Q9PU83','Q9PU83','Vitamin D receptor','NULL','NULL','NULL',136821);
  insert into systers_protein_table values ...
end;
2. TO INSERT ROWS IN SECOND TABLE
begin
  insert into protein_sequences_table values ('Q9PU83','GeneTRE|Q9PU83|Q9PU83 (215 AA) Vitamin D receptor (Fragment) [Crocodylus niloticus (Nile crocodile) (African crocodile)]ILTDEEVQRKREMIMKRKEEEALKESMKPKLSEEQQNVIDILLEAHRKTYDPTYSDFTQFRPPVRSSEEQRLTRSSSVLTQGFSSEDSSEPFGSSPDSVEHGMFSNLMLSEPEESASMSINFSPLTMLPHLADLVxYSIQKVIGFAKMIPGFRDLTAEDQIALLKSSAIEVIMLRSNQSFTLEDMSWNCGSNDFKYKVSDVTQAGHNMELLEPLV',136821);
end;
Screenshot - Tables
Sample rows from the Oracle tables that were created (not the entire table):
ACCNO SEQUENCE CLUSTERNO
Q9PU83
GeneTRE|Q9PU83|Q9PU83 (215 AA) Vitamin D receptor (Fragment) [Crocodylus niloticus (Nile crocodile) (African crocodile)]ILTDEEVQRKREMIMKRKEEEALKESMKPK LSEEQQNVIDILLEAHRKTYDPTYSDFTQFRPPVRSSEEQRLTRSSSVLTQGFSSEDSSEPFGSSPDSVEHGMFSNLMLSEPEESASMSINFSPLTMLPHLADLVxYSIQKVIGFAKMIPGFRDLTAEDQIALLKSSAIEVIMLRSNQSF TLEDMSWNCGSNDFKYKVSDVTQAGHNMELLEPLV
136821
Q9PTN2
>TRE|Q9PTN2|Q9PTN2 (453 AA) Vitamin D receptor [Brachydanio rerio (Zebrafish) (Danio rerio)]MLTENSAVNSGGKSKCEAGACESTVNGDATSLMDLMAVSTSATGQDQFDRNAPPICGV CGMLTENSAVNSGGKSKCEAGACESTVNGDATSLMDLMAVSTSATGQDQFDRNAPPICGVCGMMKEFILTDEEVQRKKDLIMKRKEEEAAREARKPRLSDEQMQIINSLVEAHHKTYDDSYSDFVRFRPPVREGPVTRSASRAASLHSLS DASSDSFNHSPESVDTKLNFSNLLMMYQDSGSPDSSEEDQQSRLSMLPHLADLVSYSIQKVIGFAKMIPGFRDLTAEDQIALLKSSAIEIIMLRSNQSFSLEDMSWSCGGPDFKYCINDVTKAGHTLELLEPLVKFQVGLKKLKLHEEEH VL
136821
ENSANGP00000010943
>AG|ENSANGP00000010943 (236 AA) Gene:ENSANGG00000008454 Clone:AAAB01008839 Contig:AAAB01008839_60 Chr:3R Basepair:35325398 Status:novelNNKKPQKAPHHRCTM ASFDVYDRSSWYFGAMSRQDATDLLLNERESGVFLVRDSTTIVGDFVLCVREDSKVSHYIINKLPSGDECFVYRIGDQTFADLPDLLSFYKLHYLDTTPLRRPMVRRLEKVIGKFDFDGSDPDDLPFKKGEILHIISKDEEQWWTARNGA GQTGQIPVPYLPALARVKQERVPNAYDETALKLSVGDVIKVLKTNINGQWEGELKGKIGHFPFTHVEFIDE
136822
CG1587-PA
>DM|CG1587-PA (271 AA) Gene:CG1587 Clone:4 Contig:4_3759 Chr:4 Basepair:230506 Status:known MDTFDVSDRNSWYFGPMSRQDATEVLMNERERGVFLVRDSNSIAGDYVLCVREDTKVSN YIINKVQQQDQIVYRIGDQSFDNLPKLLTFYTLHYLDTTPLKRPACRRVEKVIGKFDFVGSDQDDLPFQRGEVLTIVRKDEDQWWTARNSSGKIGQIPVPYIQQYDDYMDEDAIDKNEPSISGSSNVFESTLKRTDLNRKLPAYARVKQS RVPNAYDKTALKLEIGDIIKVTKTNINGQWEGELNGKNGHFPFTHVEFVDDCDLSKNSTEIC
136822
Q8JIZ9
>TRE|Q8JIZ9|Q8JIZ9 (329 AA) Pregnane X receptor (Fragment) [Brachydanio rerio (Zebrafish) (Danio rerio)] YAAYKSTGYHFNAMTCEGCKGFCRRAMKRPAQLCCPFQSACVITK SNRRQCQSCRLQKCL SIGMKRELIMSDEAVEKRRLQIRRKRMQEEPVTLTPQQEAVIQELLNAHKKTFDMTCAHF SQFRPLDRGQKSVSESSPVTNGSWIDHRPIAEDPVQWVFNSTSLSSSSSSYQSLDKEKKH FKSGSFTSLPHF TDLTTYMIKNVINFGKTLTMFRALVMEDQISLLKGATFEIILIHFNMF FNEVTGIWECGPLQYCMDDAFRAGFQHHLLDPMMNFHYTLRKLRLHEEEYVLMQALSLFS PDRPGVTDHKVIDRNQETLALTLKTYIEA
136821
Q8QGH6
>TRE|Q8QGH6|Q8QGH6 (322 AA) Pregnane X receptor (Fragment) [Brachydanio rerio (Zebrafish) (Danio rerio)] GMKRELIMSDEAVEKRRLQIRRKRMQEEPVTLTPQQEAVIQELLN AHKKTFDMTCAHFSQ FRPLDRDQKSVSESSPLTNGSWIDHRPIAEDPMQWVFNPTSLSSSSSSYQSLDNKEKKHF KSGNFSSLPHFTDLTTYMIKNVINFGKTLTMFRALVMEDQISLLKGATFEIILIHFNMFF NEVTGIWECGPL QYCMDDAFRAGFQHHLLDPMMNFHYTLRKLRLHEEEYVLMQALSLFSP DRPGVTDHKVIDRNQETLALTLKTYIEAKRNGPEKHLLFPKIMGCLTEMRSMNEEYTKQV LKIQDMQPEVSPLWLEIISKDT
136821
Q90WS4
>TRE|Q90WS4|Q90WS4 (270 AA) Putative vitamin D receptor (Fragment) [Elaphe sp] RKAMFTCPFNGDCKITKDNRRHCQACRLKRCVDIGMMKEFILTDEEVQRKREMIMKRKEE EALKESLKPK LLEEQQRVIEILLEAHRKTYDPTYSDFSQFRPPVRQNEKEHTSRSSNMTP GFSFSDDSSDTSSFSSEPMMLSSLELNDDSTSMSIDFSHLSMLPHLADLVSYSIQKVIGF AKMIPGFRSLTAEDQIALLKSSAIEVIMLRSNQSFSLE DMSWFCGSNDFKYQVSDVTQAG HSLDLLEPLVKFQISLKKLNLHEEEHVLLM
136821
ENSP00000325217
>HS|ENSP00000325217 (465 AA) Gene:ENSG00000144852 Clone:AC069444 Contig:AC069444.17.1.165093 Chr:3 Basepair:119128142 Status:known SILCTGLFKVDPRGEVGAK NLPPSSPRGPEANLEVRPKESWNHADFVHCEDTESVPGKPS VNADEEVGGPQICRVCGDKATGYHFNVMTCEGCKGFFRRAMKRNARLRCPFRKGACEITR KTRRQCQACRLRKCLESGMKKEMIMSDEAVEERRALIKRKKSERTGT QPLGVQGLTEEQR MMIRELMDAQMKTFDTTFSHFKNFRPGVLSSGCELPESLQAPSREEAAKWSQVRKDLCSL KVSLQLRGEDGSVWNYKPPADSGGKEIFSLLPHMADMSTYMFKGIISFAKVISYFRDLPI EDQISLLKGAAFEL CQLRFNTVFNAETGTWECGRLSYCLEDTAGGFQQLLLEPMLKFHYM LKKLQLHEEEYVLMQAISLFSPDRPGVLQHRVVDQLQEQFAITLKSYIECNRPQPAHRFL FLKIMAMLTELRSINAQHTQRLLRIQDIHPFATPLMQELFGI TGS
136821
ENSP00000273389
>HS|ENSP00000273389 (434 AA) Gene:ENSG00000144852 Clone:AC069444 Contig:AC069444.17.1.165093 Chr:3 Basepair:119128142 Status:known LEVRPKESWNHADFVHCED TESVPGKPSVNADEEVGGPQICRVCGDKATGYHFNVMTCEG CKGFFRRAMKRNARLRCPFRKGACEITRKTRRQCQACRLRKCLESGMKKEMIMSDEAVEE RRALIKRKKSERTGTQPLGVQGLTEEQRMMIRELMDAQMKTFDTTFS HFKNFRLPGVLSS GCELPESLQAPSREEAAKWSQVRKDLCSLKVSLQLRGEDGSVWNYKPPADSGGKEIFSLL PHMADMSTYMFKGIISFAKVISYFRDLPIEDQISLLKGAAFELCQLRFNTVFNAETGTWE CGRLSYCLEDTAGG FQQLLLEPMLKFHYMLKKLQLHEEEYVLMQAISLFSPDRPGVLQHR VVDQLQEQFAITLKSYIECNRPQPAHRFLFLKIMAMLTELRSINAQHTQRLLRIQDIHPF ATPLMQELFGITGS
136821
Q96AC7
>TRE|Q96AC7|Q96AC7 (378 AA) Nuclear receptor subfamily 1, group I, member 2 [Homo sapiens (Human)] MTCEGCKGFFRRAMKRNARLRCPFRKGACEITRKTRRQCQACRLRKCLESG MKKEMIMSD EAVEERRALIKRKKSERTGTQPLGVQGLTEEQRMMIRELMDAQMKTFDTTFSHFKNFRPG VLSSGCELPESLQAPSREEAAKWSQVRKDLCSLKVSLQLRGEDGSVWNYKPPADSGGKEI FSLLPHMADMSTYMFKGI ISFAKVISYFRDLPIEDQISLLKGAAFELCQLRFNTVFNAETGTWECGRLSYCLEDTAGGFQQLLLEPMLKFHYMLKKLQLHEEEYVLMQAISLFSPDRPGV LQHRVVDQLQEQFAITLKSYIECNRPQPAHRFLFLKIMAMLTELRS INAQHTQRLLRIQD IHPFATPLMQELFGITGS
136821
O75469
>SPR|O75469|PXR_HUMAN (434 AA) Orphan nuclear receptor PXR (Pregnane X receptor) (Orphan nuclear receptor PAR1) (Steroid and xenobiotic receptor) (SXR ) [Homo sapiens (Human)] MEVRPKESWNHADFVHCEDTESVPGKPSVNADEEVGGPQICRVCGDKATGYHFNVMTCEG CKGFFRRAMKRNARLRCPFRKGACEITRKTRRQCQACRLRKCLESGMKKEMIMSDEAVEE RRA LIKRKKSERTGTQPLGVQGLTEEQRMMIRELMDAQMKTFDTTFSHFKNFRLPGVLSS GCELPESLQAPSREEAAKWSQVRKDLCSLKVSLQLRGEDGSVWNYKPPADSGGKEIFSLL PHMADMSTYMFKGIISFAKVISYFRDLPIED QISLLKGAAFELCQLRFNTVFNAETGTWE CGRLSYCLEDTAGGFQQLLLEPMLKFHYMLKKLQLHEEEYVLMQAISLFSPDRPGVLQHR VVDQLQEQFAITLKSYIECNRPQPAHRFLFLKIMAMLTELRSINAQHTQRLLRIQDIHP F ATPLMQELFGITGS
136821
Q8SQ00
>TRE|Q8SQ00|Q8SQ00 (330 AA) Pregnane X receptor (Fragment) [Sus scrofa (Pig)] GMRKEMIMSDAAVEQRRALIRRKKREQIGAQPPGAKGLTEEQRTMISELMNAQMKTFDTT FTHFKNFRLPE VLSSSLEIPECLQTPSSREEAAKWSKLREDLCSVKLSLQLRGEDGSVWN YKPPADNSGKEIFSLLPHIADMSTYMFKGIINFAKVISYFRDLPIEDQISLLKGATFELC QLRFNTVFNAETGTWECGRLSYSLEDPSGGFQQLLLQPM LKFHYMLKKLQLHKEEYVLMQ AISLFSPDRPGVVQRQVVDQLQERFAITLKAYIECNRPQPAHRFLFLKIMAMLTELRSIN AQHTQRLLRIQDIHPFATPLMQELFSITES
136821
Q8SQ01
>TRE|Q8SQ01|Q8SQ01 (434 AA) Pregnane X receptor [Macaca mulatta (Rhesus macaque)] MEVRPKEGWNHADFVYCEDTEFAPGKPTVNADEEVGGPQICRVCGDKATGYHFNVMTCEG CKGFFRR AMKRNARLRCPFRKGACEITRKTRRQCQACRLRKCLESGMKKEMIMSDAAVEE RRALIKRKKRERIGTQPPGVQGLTEEQRMMIRELMDAQMKTFDTTFSHFKNFRLPGVLSS GCEMPESLQAPSREEAAKWNQVRKDLWSVKVSVQL RGEDGSVWNYKPPADNGGKEIFSLL PHMADMSTYMFKGIINFAKVISYFRDLPIEDQISLLKGATFELCQLRFNTVFNAETGTWE CGRLSYCLEDPAGGFQQLLLEPMLKFHYMLKKLQLHEEEYVLMQAISLFSPDRPGVVQHR VV DQLQEQYAITLKSYIECNRPQPAHRFLFLKIMAMLTELRSINAQHTQRLLRIQDIHPF ATPLMQELFGITGS
136821
CG1587-PB
>DM|CG1587-PB (253 AA) Gene:CG1587 Clone:4 Contig:4_3759 Chr:4 Basepair:230506 Status:known MDTFDVSDRNSWYFGPMSRQDATEVLMNERERGVFLVRDSNSIAGDYVLCDQIVYRIG DQ SFDNLPKLLTFYTLHYLDTTPLKRPACRRVEKVIGKFDFVGSDQDDLPFQRGEVLTIVRK DEDQWWTARNSSGKIGQIPVPYIQQYDDYMDEDAIDKNEPSISGSSNVFESTLKRTDLNR KLPAYARVKQSRVPNAYDKTALKLE IGDIIKVTKTNINGQWEGELNGKNGHFPFTHVEFV DDCDLSKNSTEIC
136822
Q95RW2
>TRE|Q95RW2|Q95RW2 (253 AA) LD08427p (CG1587-PB) [Drosophila melanogaster (Fruit fly)] MDTFDVSDRNSWYFGPMSRQDATEVLMNERERGVFLVRDSNSIAGDYVLCDQIVYRIGDQ SF DNLPKLLTFYTLHYLDTTPLKRPACRRVEKVIGKFDFVGSDQDDLPFQRGEVLTIVRK DEDQWWTARNSSGKIGQIPVPYIQQYDDYMDEDAIDKNEPSISGSSNVFESTLKRTDLNR KLPAYARVKQSRVPNAYDKTALKLEIGDII KVTKTNINGQWEGELNGKNGHFPFTHVEFV DDCDLSKNSTEIC
136822
ENSP00000273389
>HS|ENSP00000273389 (434 AA) Gene:ENSG00000144852 Clone:AC069444 Contig:AC069444.17.1.165093 Chr:3 Basepair:119128142 Status:known LEVRPKESWNHADFVHCED TESVPGKPSVNADEEVGGPQICRVCGDKATGYHFNVMTCEG CKGFFRRAMKRNARLRCPFRKGACEITRKTRRQCQACRLRKCLESGMKKEMIMSDEAVEE RRALIKRKKSERTGTQPLGVQGLTEEQRMMIRELMDAQMKTFDTTFS HFKNFRLPGVLSS GCELPESLQAPSREEAAKWSQVRKDLCSLKVSLQLRGEDGSVWNYKPPADSGGKEIFSLL PHMADMSTYMFKGIISFAKVISYFRDLPIEDQISLLKGAAFELCQLRFNTVFNAETGTWE CGRLSYCLEDTAGG FQQLLLEPMLKFHYMLKKLQLHEEEYVLMQAISLFSPDRPGVLQHR VVDQLQEQFAITLKSYIECNRPQPAHRFLFLKIMAMLTELRSINAQHTQRLLRIQDIHPF ATPLMQELFGITGS
136821
P47941
>SPR|P47941|CRKL_MOUSE (303 AA) Crk-like protein [Mus musculus (Mouse)] MSSARFDSSDRSAWYMGPVTRQEAQTRLQGQRHGMFLVRDSSTCPGDYVLSVSENSRVSH YIINSLPNRRFKIGDQE FDHLPALLEFYKIHYLDTTTLIEPAPRYPSPPVGSVSAPNLPT AEENLEYVRTLYDFPGNDAEDLPFKKGELLVIIEKPEEQWWSARTKDGRVGMIPVPYVEK LVRSSPHGKHGNRNSNSYGIPEPAHAYAQPQTTTPLPTVASTPGA AINPLPSTQNGPVFA KAIQKRVPCAYDKTALALEVGDIVKVTRMNINGQWEGEVNGRKGLFPFTHVKIFDPQNPD DNE
136822
SINFRUP00000150362
>FR|SINFRUP00000150362 (325 AA) Gene:SINFRUG00000141661 Clone:scaffold_326 Contig:scaffold_326 Chr:Chr_scaffold_326 Basepair:104364 Status:known MAGNF DAEDRDSWYWGRLTRQEAVSLLQGQRHGVFLVRDxISIRGGYVLSVSENSKVSHY IINSVSDNRQCENDIAFPLSGLTPPYFRIGDQEFEALPALLEFYKIHYLDTTALIEPVSK AQHTGFISSSAGVPPPSQEEAEFVRALFDFSGN DEEDLPFRKGDILRVLEKPEEQWWNAA NQEGRAGMIPVPYVEKYRPASPTAAALGPTTSVPGQVPEGGRPTGGTDGMAGAQDNPLCD PGQYAQPVVNAQLPNLQNGPVYARVIQKRVPNAYDKTALALEVGEMVKVTKINVNGQWEG ECKGKRGHFPFTHVRLMEQQHPDGD
136822
CG1587-PC
>DM|CG1587-PC (271 AA) Gene:CG1587 Clone:4 Contig:4_3759 Chr:4 Basepair:230506 Status:known MDTFDVSDRNSWYFGPMSRQDATEVLMNERERGVFLVRDSNSIAGDYVLCVREDTKVS NY IINKVQQQDQIVYRIGDQSFDNLPKLLTFYTLHYLDTTPLKRPACRRVEKVIGKFDFVGS DQDDLPFQRGEVLTIVRKDEDQWWTARNSSGKIGQIPVPYIQQYDDYMDEDAIDKNEPSI SGSSNVFESTLKRTDLNRKLPAYAR VKQSRVPNAYDKTALKLEIGDIIKVTKTNINGQWE GELNGKNGHFPFTHVEFVDDCDLSKNSTEIC
136822
SINFRUP00000144694
>FR|SINFRUP00000144694 (315 AA) Gene:SINFRUG00000136477 Clone:scaffold_306 Contig:scaffold_306 Chr:Chr_scaffold_306 Basepair:110782 Status:known MAGNF DAEDRNSWYWGRLSRQEAVSLLQGQRHGVFLVRDSSTIHGDYVLSVSENSKVSHY IINSISNNRQSGPGSAHPRFRIGDQEFVALPALLEFYKIHYLDTTTLIEPINKSRLTSFI NVGPGGGPPQRLEDEYVRALFDFPGNDEEDLPF KKGDILRVLEKPEEQWWNAQNSEGRAG MIPVPYVEKYRPASPSLVAGHGLPGGPPGGTGMQGNSDGSAAQTSAPLLGDPSQYAQPTP LPNLQNGPVFARAIQKRVPNAYDKTALALEVGDTVKVTKINVNGQWEGECKGKRGHFPFT HVKLLDQHSAEDELS
136822
SINFRUP00000164144
>FR|SINFRUP00000164144 (299 AA) Gene:SINFRUG00000154261 Clone:scaffold_3683 Contig:scaffold_3683 Chr:Chr_scaffold_3683 Basepair:4533 Status:known MSTS RFDSADRSAWYFGPVSRHEAQNRLQGQKHGIFLVRDSSTCHGDYVLSVSENSKVSH YIINSLPNKRFKIGDREFEHLPALLEFYKYHYLDTTTLIEPASRYPSTLSCPVQPAGPED NLEYVRTLYDFTGSDAEDLPFKKGEVLVILEK PEEQWWSARNKDGRVGMIPVPYVEKLAR PAPLPGQPGHGSRNSNSYGVPEPSHAVVHAYALPQTPSPLPAPGPVINPQNGPAMAKAIQ KRVPCAYDKTALALEVGDIVKVTRMNINGQWEGEVNGRRGLFPFTHVKIIDAQNPDESD
136822
Q8R5B8
>TRE|Q8R5B8|Q8R5B8 (98 AA) Similar to v-crk avian sarcoma virus CT10 oncogene homolog-like [Mus musculus (Mouse)] MSSARFDSSDRSAWYMGPVTRQEAQTRLQGQRHGMF LVRDSSTCPGDYVLSVSENSRVSH YIINSLPNRRFKIGDQEFDHLPALLEFYKIHYLDTTTM
136822
Q8JIB3
>TRE|Q8JIB3|Q8JIB3 (367 AA) C-fos proto-oncogene [Coturnix coturnix (Common quail)] MMYQGFAGEYEAPSSRCSSASPAGDSLTYYPSPADSFSSMGSPVNSQDFCTDLAVSSANF VPTVT AISTSPDLQWLVQPTLISSVAPSQNRGHPYGVPPPAPPAAYSRPAVLKAPGGRGQ SIGRRGKVEQLSPEEEEKRRIRRERNKMAAAKCRNRRRELTDTLQAETDQLEEEKSALQA EIANLLKEKEKLEFILAAHRPACKMPEELRFSE ELAAATALDLGAPSPAAAEETFALPxM TEAPPAVPPKEPSGSGLELKAEPFDELLFSTGPREASRSVPDMDLPGASSFYASDWEPLG AGSSGELEPLCTPVVTCTPCPSTYTSTFVFTYPEADAFPSCAAAHRKGSSSNEPSSDSLS SPTLLAL
136823
P79702
>SPR|P79702|FOS_CYPCA (347 AA) Proto-oncogene protein c-fos (Cellular oncogene fos) [Cyprinus carpio (Common carp)] MMFTSLNADCDASSRCSTASAAAESVACYPLNQT QKFTELSVSSASFVPTVTAISSCPDL QWMVQPMVSSVAPSNGGARSYNPNPYPKMRVTGTKSPNSNKRARAEQLSPEEEEKKRVRR ERNKMAAAKCRNRRRELTDTLQAETDELEDEKSALQNDIANLLKEKERLEFILAAHKPIC K IPSSSVSPIPAASVPEIHSITTSVVSTANAPVTTSSSSSLFSSTASTDSFGSTVEISDL EPTLEESLELLAKAELETARSVPDVDLSSSLYARDWESLYTPANNDLEPLCTPVVTRTPA CTTYTSSFTFTYPENDVFPSCGPVHRRGS SSNDQSSDSLNSPTLLTL
136823
Q8HZP6
>TRE|Q8HZP6|Q8HZP6 (381 AA) Immediate early protein [Felis silvestris catus (Cat)] MMFSGFNADYEASSSRCSSASPAGDNLSYYHSPADSFSSMGSPVNAQDFCTDLAVSSANF IPTVTA ISTSPDLQWLVQPTLVSSVAPSQTRAPHPYGVPAPSAGAYSRAGVVKTVTAGGR AQSIGRRGKVEQLSPEEEEKRRIRRERNKMAAAKCRNRRRELTDTLQAETDQLEDEKSAL QTEIANLLKEKEKLEFILAAHRPACKIPDDLGFP EEMSVASLDLSGGLPEAATPESEEAF TLPLLNDPEPKPSVEPVKSISSMELKAEPFDDFLFPASSRPSGSETARSVPDMDLSGSFY AADWEPLHGGSLGMGPMATELEPLCTPVVTCTPSCTTYTSSFVFTYPEADSFPSCGAAHR K GSSSNEPSSDSLSSPTLLAL
136823
P01101
>SPR|P01101|FOS_MOUSE (380 AA) Proto-oncogene protein c-fos (Cellular oncogene fos) [Mus musculus (Mouse)] MMFSGFNADYEASSSRCSSASPAGDSLSYYHSPADSFSSMGSP VNTQDFCADLSVSSANF IPTVTAISTSPDLQWLVQPTLVSSVAPSQTRAPHPYGLPTQSAGAYARAGMVKTVSGGRA QSIGRRGKVEQLSPEEEEKRRIRRERNKMAAAKCRNRRRELTDTLQAETDQLEDEKSALQ TEIANLLKEK EKLEFILAAHRPACKIPDDLGFPEEMSVASLDLTGGLPEASTPESEEAFT LPLLNDPEPKPSLEPVKSISNVELKAEPFDDFLFPASSRPSGSETSRSVPDVDLSGSFYA ADWEPLHSNSLGMGPMVTELEPLCTPVVTCTPGCTTYT SSFVFTYPEADSFPSCAAAHRK GSSSNEPSSDSLSSPTLLAL
136823
O88479
MMFSGFNADYEASSSRCSSASPAGDSLSYYHSPADSFSSMGSPVNAQDFCTDLSVSSANF IPTVTAISTSPDLQWLVQPTLVSSVAPSQTRAPHPYGVPTPSTGAYSRAGMVKTVSGGRA QSIGRRGKVEQLSPEEEEKRRIRRERNK MAAAKCRNRRRELTDTLQAETDQLEDEKSALQ TEIANLLKEKEKLEFILAAHRPACKIPDDLGFPEEMFVASLDLTGGLPEATTPESEEAFS LPLLNDPEPKPSLEPVKSISNVELKAEPFDDFLFPASSRPSGSETTARSVPDMDLS GSFY AADWEPLHSSSLGMGPMVTELEPLCTPVVTCTPSCTTYTSSFVFTYPEADSFPSCAAAHR KGSSSNEPSSDSLSSPTLLAL
136823
O97930
>TRE|O97930|O97930 (380 AA) P55-C-FOS proto-oncogene protein (Cellular oncogene C-FOS) (C-FOS) [Sus scrofa (Pig)] MMFSGFNADYEASSSRCSSASPAGDSLSYYHSPADS FSSMGSPVNAQDFCTDLAVSSVNF IPTVTAISISPDLQWLVQPTLVSSVAPSQTRAPHPYGVPTPSAGAYSRAGAVKTMPGGRAQSIGRRGKVEQLSPEEEEKRRIRRERNKMAAAKCRNRRRELTDTLQAETDQLEDEKSALQ TEI ANLLKEKEKLEFILAAHRPACKIPDDLGFPEEMSVASLDLSGGLPEAATPESEEAFT LPLLNDPEPKPSVEPVKKVSSMELKAEPFDDFLFPASSRPGGSETARSVPDMDLSGSFYA ADWEPLHGGSLGMGPMATELEPLCTPVVTCT PSCTAYTSSFVFTYPEADSFPSCAAAHRK GSSSNEPSSDSLSSPTLLAL
136823
P01102
>SPR|P01102|FOS_MSVFB (381 AA) p55-v-fos transforming protein [FBJ murine osteosarcoma virus] MMFSGFNADYEASSFRCSSASPAGDSLSYYHSPADSFSSMGSPVNTQDFCADLSVS SANF IPTVTATSTSPDLQWLVQPTLVSSVAPSQTRAPHPYGLPTQSAGAYARAEMVKTVSGGRA QSIGRRGKVEQLSPEEEEKRRIRRERNKMAAAKCRNRRRELTDTLQAETDQLEDKKSALQ TEIANLLKEKEKLEFILAAHRPA CKIPDDLGFPEEMSVASLDLTGGLPEASTPESEEAFT LPLLNDPEPKPSLEPVKSISNVELKAEPFDDFLFPASSRPSGSETSRSVPNVDLSGSFYA ADWEPLHSNSLGMGPMVTELEPLCTPVVTCTPLLRLPELTHAAGPVSSQRR QGSRHPDVP LPELVHYREEKHVFPQRFPST
136823
P12841
>SPR|P12841|FOS_RAT (380 AA) Proto-oncogene protein c-fos (Cellular oncogene fos) [Rattus norvegicus (Rat)] MMFSGFNADYEASSSRCSSASPAGDSLSYYHSPADSFSSMGS PVNTQDFCADLSVSSANF IPTVTAISTSPDLQWLVQPTLVSSVAPSQTRAPHPYGLPTPSTGAYARAGVVKTMSGGRA QSIGRRGKVEQLSPEEEEKRRIRRERNKMAAAKCRNRRRELTDTLQAETDQLEDEKSALQ TEIANLLKE KEKLEFILAAHRPACKIPNDLGFPEEMSVTSLDLTGGLPEATTPESEEAFT LPLLNDPEPKPSLEPVKNISNMELKAEPFDDFLFPASSRPSGSETARSVPDVDLSGSFYA ADWEPLHSSSLGMGPMVTELEPLCTPVVTCTPSCTTY TSSFVFTYPEADSFPSCAAAHRK GSSSNEPSSDSLSSPTLLAL
Source Code for the HTML pages:
The first page is the index.html page. The following is the source code for the page:
<html>
<body background="bck.jpg">
<font color="blue" face="Times New Roman" size="5">
<br>
<hr>
<center>
<h1><b><i>CIS-734 DATA MINING</i></b></h1>
<hr>
<br><br><img src="W1.jpg" border="3"><br><br>
<a href="ab.doc">DOCUMENTATION OF PROJECT</a><br>
<a href="link.html">IMPLEMENTATION OF PROJECT</a><br>
<br><br><br><br>
</center>
<p align="left"><font size="4">SUBMITTED BY:<br>
ASAD SIDDIQUI.<br>
E-Mail:<a href="mailto:[email protected]">[email protected]</a><br>
SUPRIYA MALHOTRA.<br>
E-Mail:<a href="mailto:[email protected]">[email protected]</a><br>
OJUS BATHLA<br>
E-Mail:<a href="mailto:[email protected]">[email protected]</a><br>
</font></p>
</font>
</body>
</html>
The second page is link.html, which is opened by the 'IMPLEMENTATION OF PROJECT' link. The following is the source code for the page:
<html>
<body background="bck.jpg">
<font color="blue"><b><i>
<br><br>
<center>
<font face="Times New Roman" size="5">SEARCH ENGINE<hr><br></font>
<form action="prots.jsp" method="post">
Protein Name: <input type="text" name="txtpname"> <input type="submit" name="butpname" value="SEARCH">
<br><br>
Database: <input type="text" name="txtdb"> <input type="submit" name="butdb" value="SEARCH">
<br><br>
Raccno: <input type="text" name="txtraccno"> <input type="submit" name="butraccno" value="SEARCH">
<br><br>
Cluster: <input type="text" name="txtclusterno"> <input type="submit" name="butcluster" value="SEARCH">
<br><br><br>
<input type="reset" name="reset">
</form>
</center>
</i></b></font>
</body>
</html>
The third page is aa.html, which displays the result table after a search. The following is the source code for the page:
<TD><table align=center border= 1 ><tr><td align=right>Database Name: </td> <td> <%=dbname%> </td>
<td align=right> Raccno: </td> <td> <%=raccno%> </td> <td align=right> Protein Name: </td> <td><%=pname%> </td>
<td align=right> Gene Name: </td> <td> <%=gname%> </td> <td align=right>Cluster No: </td> <td><%=clusterno%> </td>
<td align=right>Description: </td> <td><%=description%> </td>
<td align=right>Sequence No:</td> <td> <a href="seq.jsp?seq=<%=seq%>&pname=<%=pname%>" >Sequence Details</a> </td> </tr>
</table> </TD>
JSP SOURCE CODE
Our project uses the JSP page prots.jsp. The following is the source code of that page:
<%@page contentType="text/html"%>
<%@page pageEncoding="UTF-8"%>
<%@page import="java.io.*" %>
<%@page import="java.net.*" %>
<%@page import="java.sql.*" %>
<%@page import="javax.servlet.*" %>
<%@page import="javax.servlet.http.*" %>
<%@page import="java.util.*" %>
<%@page import="java.sql.Connection" %>
<%@page import="java.sql.DriverManager" %>
<%@page import="java.sql.SQLException" %>
<html> <head><title align= center>Search Result</title></head> <body background="bck.jpg">
<%
try {
    String db = "";
    String raccno = null;
    String pname = null;
    String description = null;
    String gname = null;
    String identical_to = null;
    String tre = "TRE ";
    // String fragment_no=null;
    String cluster_no = null;
    String seq = null;

    // Read the four search fields submitted by link.html
    String txtpname = request.getParameter("txtpname");
    String txtdb = request.getParameter("txtdb");
    String txtraccno = request.getParameter("txtraccno");
    String txtclusterno = request.getParameter("txtclusterno");

    // Load the Oracle JDBC driver
    DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver());
    String url = "jdbc:oracle:thin:@prophet.njit.edu:1521:course";
    try {
        String url1 = System.getProperty("JDBC_URL");
        if (url1 != null)
            url = url1;
    } catch (Exception e) {
        // If there is any security exception, ignore it
        // and use the default
    }

    // Connect to the database
    Connection conn = DriverManager.getConnection(url, "sm363", "hariom");

    // Create a Statement
    Statement stmt = conn.createStatement();
    Statement stmtdburl = conn.createStatement();
    ResultSet rs = stmt.executeQuery("select * from systers_protein_table s, protein_sequences_table p where (s.pname=upper('"+txtpname+"') or s.db=upper('"+txtdb+"') or s.raccno=upper('"+txtraccno+"') or s.cluster_no=upper('"+txtclusterno+"')) and (s.raccno = p.accno)");
    //ResultSet rs = stmt.executeQuery("select * from systers_protein_table s, protein_sequences_table p where s.raccno = p.accno");
%> <br><table align=center border= 1 ><tr><td align=right>Database Name: </td>
<td align=right> Raccno: </td>
<td align=right>Description: </td>
<td align=right>Cluster No: </td>
<td align=right> Gene Name: </td>
<td align=right>Identical_to: </td>
<td align=right>Cluster No: </td><td align=right>SEQUENCE: </td> </tr>
<%
    boolean flag1 = true;
    while (rs.next()) {
        flag1 = false;
        // Columns 1-8 come from systers_protein_table, column 10 is p.sequence
        db = rs.getString(1);
        raccno = rs.getString(2);
        pname = rs.getString(3);
        description = rs.getString(4);
        gname = rs.getString(5);
        identical_to = rs.getString(6);
        // fragment_of=rs.getString(7);
        cluster_no = rs.getString(8);
        seq = rs.getString(10);
%>
<tr>
<td> <%=db%> </td>
<% if (db.equals("TRE")) { %>
<td> <a href="http://ca.expasy.org/uniprot/<%=raccno%>"><%=raccno%></a> </td>
<% } if (db.equals("DM")) { %>
<td> <a href="http://www.ensembl.org/Drosophila_melanogaster/protview?peptide=<%=raccno%>"><%=raccno%></a> </td>
<% } %>
<td> <a href="seq.jsp?raccno=<%=raccno%>"><%=raccno%></a> </td>
<td><%=description%> </td>
<td><%=cluster_no%> </td>
<td> <%=gname%> </td>
<td><%=identical_to%> </td>
<td><%=cluster_no%> </td>
<td><%=seq%> </td>
</tr>
<%}%> </table>
<%
    // No matching rows: forward to the error page
    if (flag1) {
        RequestDispatcher dis = request.getRequestDispatcher("/errorprotein.jsp");
        if (dis != null)
            dis.forward(request, response);
    }
%>
<%
} catch (Exception e) {
    out.println("Got an exception! " + e);
}
%>
</body></html>
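The page above chooses the raccno link according to the source database of each row. The following sketch isolates that decision in a plain Java helper so it can be read and tested outside the JSP. The class name ProteinLinks and the method externalUrl are hypothetical; the two external URL prefixes and the seq.jsp fallback are copied from the listing. Unlike the page, which always prints the seq.jsp link in addition to any external link, this sketch returns exactly one link per row, which is a simplification made here for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper (not project code): picks the link target for a raccno. */
final class ProteinLinks {

    /** Returns the external page for TRE and DM entries, else the local seq.jsp page. */
    static String externalUrl(String db, String raccno) {
        String id = encode(raccno);
        if ("TRE".equals(db)) {
            return "http://ca.expasy.org/uniprot/" + id;
        }
        if ("DM".equals(db)) {
            return "http://www.ensembl.org/Drosophila_melanogaster/protview?peptide=" + id;
        }
        // Any other database: show the locally stored sequence.
        return "seq.jsp?raccno=" + id;
    }

    /** URL-encodes the accession so unusual characters cannot break the link. */
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(externalUrl("TRE", "Q9PU83"));
        System.out.println(externalUrl("DM", "CG1587-PA"));
        System.out.println(externalUrl("HS", "ENSP00000325217"));
    }
}
```

Writing "TRE".equals(db) rather than db.equals("TRE") also keeps the helper safe when the db column is null for a row.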
SCREENSHOTS – Project Execution
The screenshots below show, step by step, how our project is executed. They also show how it connects to the web when we click on a raccno link:
When we click on the raccno link in the above page, it takes us to the following web page. This shows that our project is also connected to the web.
When we click on the description of a particular raccno, it retrieves data from the local host, as shown below:
Summary
The project was implemented successfully. It can be tested at http://web.njit.edu/~sm363 and works like a search engine, retrieving relevant data from the local database that was created as well as from the web. It is a user-friendly search engine. We learnt a lot from this project, including data mining techniques and technologies such as JSP, HTML, SOAP and SQL. We also learnt how to connect a database to a website using Java database connectivity.