
Example Alignment formats

 


FASTA

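Each sequence begins with a header line of the form >SEQ_ID. The sequence itself follows on the subsequent lines and continues until the next header line (for example >SEQ2_ID) or the end of the file.

Note: FASTA format is a sequence file format; it does not mean the output of the fasta program.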

 

Example FASTA format

>LCAT_RAT
MGLPGSPWQWVLLLLGLLLPPATSFWLLNVLFPPHTTPKAELSNHTRPVILVPGCMGNRLEAKLDKPNVVNW
LCYRKTEDFFTIWLDFNMFLPLGVDCWIDNTRVVYNRSSGHMSNAPGVQIRVPGFGKTYSVEYLDDNKLAGY
LNTLVQNLVNNGYVRDETVRAAPYDWRLAPRQQDEYYQKLAGLVEEMYAAYGKPVFLIGHSLGCLHVLHFLL
RQPQSWKDHFIDGFISLGAPWGGSIKPMRILASGDNQGIPIMSNIKLREEQRITTTSPWMFPAHHVWPEDHV
FISTPNFNYTGQDFERFFADLHFEEGWHMFLQSRDLLAGLPAPGVEVYCLYGVGMPTAHTYIYDHNFPYKDP
VAALYEDGDDTVATRSTELCGQVQGRQSQGVHLLRMNGTDHLNMVFSNKTLEHINAILLGAYPHGTPKSPTA
SLGPPPTE

>LCAT_MOUSE
MGLPGSPWQRVLLLLGLLLPPATPFWLLNVLFPPHTTPKAELSNHTRPVILVPGCLGNRLEAKLDKPDVVNW
MCYRKTEDFFTIWLDFNLFLPLGVDCWIDNTRIVYNHSSGRVSNAPGVQIRVPGFGKTESVEYVDDNKLAGY
LHTLVQNLVNNGYVRDETVRAAPYDWRLAPHQQDEYYKKLAGLVEEMYAAYGKPVFLIGHSLGCLHVLHFLL
RQPQSWKDHFIDGFISLGAPWGGSIKAMRILASGDNQGIPILSNIKLKEEQRITTTSPWMLPAPHVWPEDHV
FISTPNFNYTVQDFERFFTDLHFEEGWHMFLQSRDLLERLPAPGVEVYCLYGVGRPTPHTYIYDHNFPYKDP
VAALYEDGDDTVATRSTELCGQWQGRQSQPVHLLPMNETDHLNMVFSNKTMEHINAILLGAYRTPKSPAA
SPSPPPPE
>LCAT_PAPAN
MGPPGSPWQWVPLLLGLLLPPAAPFWLLNVLFPPHTTPKAELSNHTRPVILVPGCLGNQLEAKLDKPDVVNW
MCYRKTEDFFTIWLDLNMFLPLGVDCWIDNTRVVYNRSSGLVSNAPGVQIRVPGFGKTYSVEYLDSSKLAGY
LHTLVQNLVNNGYVRDETVRAAPYDWRLEPGQQEEYYHKLAGLVEEMHAAYGKPVFLIGHSLGCLHLLYFLL
RQPQAWKDRFIDGFISLGAPWGGSIKPMLVLASGDNQGIPIMSSIKLKEEQRITTTSPWMFPSRLAWPEDHV
FISTPSFNYTGRDFQRFFADLHFEEGWYMWLQSRDLLAGLPAPGVEVYCLYGVGLPTPRTYIYDHGFPYTDP
VDVLYEDGDDTVATRSTELCGLWQGRQPQPVHLLPLRGIQHLNMVFSNQTLEHINAILLGAYRQGPPASLTA
SPEPPPPE
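
The rule above (a header line starting with >, then sequence lines up to the next header) can be sketched in a few lines of Python. This is a minimal illustration, not the reader used by Jalview or MamPol; the function name parse_fasta is hypothetical, and it assumes the ID is the first whitespace-delimited word after the > character.

```python
def parse_fasta(text):
    """Return a list of (seq_id, sequence) pairs from FASTA-formatted text.

    Assumption (not stated in the format description): the ID is the first
    whitespace-delimited word following '>'.
    """
    records = []
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines between records are ignored
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            header = line[1:].strip()
            seq_id = header.split()[0] if header else ""
            chunks = []
        elif seq_id is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records


if __name__ == "__main__":
    sample = ">SEQ_ID\nMGLPGSPWQW\nVLLLLGLLLP\n\n>SEQ2_ID\nMGPPGSPWQW\n"
    for name, seq in parse_fasta(sample):
        print(name, len(seq))
```

Returning a list of pairs rather than a dictionary keeps the input order and tolerates duplicate IDs.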

 

Clustal

The word CLUSTAL must appear at the beginning of a line before the alignment starts. The rest of that line, for example
W (1.7) multiple sequence alignment
is not required. Spaces are not allowed in the sequence string; use - or . for gaps instead. The consensus line is optional.

 

Example CLUSTAL format

CLUSTAL W (1.7) multiple sequence alignment


LCAT_RAT        MGLPGSPWQWVLLLLGLLLPPATSFWLLNVLFPPHTTPKAELSNHTRPVILVPGCMGNRL
LCAT_MOUSE      MGLPGSPWQRVLLLLGLLLPPATPFWLLNVLFPPHTTPKAELSNHTRPVILVPGCLGNRL
LCAT_PAPAN      MGPPGSPWQWVPLLLGLLLPPAAPFWLLNVLFPPHTTPKAELSNHTRPVILVPGCLGNQL
LCAT_HUMAN      MGPPGSPWQWVTLLLGLLLPPAAPFWLLNVLFPPHTTPKAELSNHTRPVILVPGCLGNQL
LCAT_PIG        ------------------------FWLLNVLFPPHTTPKAELSNHTRPVILVPGCLGN--
RABLCAT_1       MGPPGSPWQWVLLLLGLLLPPAAPFWLLNVLFPPHTTPKAELSNHTRPVILVPGCLGNQL
S45131          ----------------------------------------------HPVVMVPGVISTGI
YND2_YEAST      ----------------------------------------------HPVVMVPGVISTGI
                                                              :**::*** :..  

LCAT_RAT        EAKLDKPNVVNWLCYRKTEDFFTIWLDFNMFLPLGVDCWIDNTRVVYNRSSGHMSNAPGV
LCAT_MOUSE      EAKLDKPDVVNWMCYRKTEDFFTIWLDFNLFLPLGVDCWIDNTRIVYNHSSGRVSNAPGV
LCAT_PAPAN      EAKLDKPDVVNWMCYRKTEDFFTIWLDLNMFLPLGVDCWIDNTRVVYNRSSGLVSNAPGV
LCAT_HUMAN      EAKLDKPDVVNWMCYRKTEDFFTIWLDLNMFLPLGVDCWIDNTRVVYNRSSGLVSNAPGV
LCAT_PIG        ------PDVVNWMCYR-----FTIWLDLNMFLPLGVD-----------------------
RABLCAT_1       EAKLDKPSVVNWMCYRKTEDFFTIWLDLNMFLPLGVDCWIDNTRVVYNRSSGRVVISPGV
S45131          EGVIGDDECDSSAHFRKR-----LWGSFYMLRTMVMDKVCWLKHVMLDPETGL--DPPNF
YND2_YEAST      EGVIGDDECDSSAHFRKR-----LWGSFYMLRTMVMDKVCWLKHVMLDPETGL--DPPNF
                       .  .   :*       :* .: :: .: :*                       

Optionally, numbers can appear at the end of each line, e.g.
LCAT_RAT        MGLPGSPWQWVLLLLGLLLPPATSFWLLNVLFPPHTTPKAELSNHTRPVILVPGCMGNRL 64
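
As a rough sketch of how the rules above could be applied, the following Python function reads the interleaved blocks and joins the pieces of each sequence. The name parse_clustal is hypothetical, and the code assumes that consensus lines always begin with whitespace (as in the example) and that any third field on a sequence line is the optional trailing number.

```python
def parse_clustal(text):
    """Return {seq_id: aligned_sequence} from CLUSTAL-formatted text."""
    lines = text.splitlines()

    # The word CLUSTAL must start a line before the alignment begins.
    start = None
    for i, line in enumerate(lines):
        if line.startswith("CLUSTAL"):
            start = i + 1
            break
    if start is None:
        raise ValueError("no line beginning with CLUSTAL found")

    seqs = {}
    for line in lines[start:]:
        # Skip blank lines and consensus lines (assumed to start with a space).
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        name, chunk = fields[0], fields[1]
        # fields[2], if present, is the optional residue count; it is ignored.
        seqs[name] = seqs.get(name, "") + chunk
    return seqs


if __name__ == "__main__":
    sample = (
        "CLUSTAL W (1.7) multiple sequence alignment\n\n"
        "SEQ_A   MGLP-SPW 7\n"
        "SEQ_B   MGPPGSPW 8\n"
        "        **.* ***\n\n"
        "SEQ_A   QWVL\n"
        "SEQ_B   QRVL\n"
    )
    for name, seq in parse_clustal(sample).items():
        print(name, seq)
```

Because spaces are not allowed inside the sequence string, splitting each line on whitespace is enough to separate the ID, the sequence chunk, and the optional number.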

 

PIR

This is similar to FASTA format, except that a title line follows the >SEQ_ID line, the ID is slightly modified, and each sequence is terminated by a *. The ID line has the form >P1;SEQ_ID, where SEQ_ID is the ID of the sequence.

 

Example PIR format

>P1;OPSD_SHEEP
RHODOPSIN
MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLA
VADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFT
WVMALACAAPPLVGWSRYIPQGMQCSCGALYFTLKPEINNESFVIYMFVVHFSIPLIVIFFCYGQLVFTVKEAAAQQQES
ATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSDFGPIFMTIPAFFAKSSSVYNPVIYIMMNKQFRNCMLTT
LCCGKNPLGDDE--ASTTVSKTETSQ-----VAPA*
>P1;OPSD_BOVIN
RHODOPSIN
MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIMLGFPINFLTLYVTVQHKKLRTPLNYILLNLA
VADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFT
WVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTPHEETNNESFVIYMFVVHFIIPLIVIFFCYGQLVFTVKEAAAQQQES
ATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFRNCMVTT
LCCGKNPLGDDE--ASTTVSKTETSQ-----VAPA*
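
The three differences from FASTA (the >P1;SEQ_ID header, the title line, and the terminating *) can be illustrated with a short Python sketch. The function name parse_pir is hypothetical; the code assumes that the title occupies exactly one line and that the * appears at the end of the last sequence line, as in the example above.

```python
def parse_pir(text):
    """Return a list of (seq_id, title, sequence) tuples from PIR text."""
    records = []
    lines = iter(text.splitlines())
    for line in lines:
        line = line.strip()
        if not line.startswith(">"):
            continue  # skip anything outside a record
        # Header has the form ">P1;SEQ_ID"; keep the part after the ';'.
        code, _, seq_id = line[1:].partition(";")
        seq_id = seq_id or code  # fall back if no ';' is present
        # The line after the header is the title (assumed to be one line).
        title = next(lines, "").strip()
        chunks = []
        for seq_line in lines:
            seq_line = seq_line.strip()
            chunks.append(seq_line.rstrip("*"))
            if seq_line.endswith("*"):
                break  # '*' terminates the sequence
        records.append((seq_id, title, "".join(chunks)))
    return records


if __name__ == "__main__":
    sample = ">P1;SEQ_ID\nEXAMPLE TITLE\nMNGTEGPNFY\nVPFS--KT*\n"
    for seq_id, title, seq in parse_pir(sample):
        print(seq_id, title, seq)
```

Sharing one iterator between the outer and inner loops lets the inner loop consume the sequence lines of a record, so the outer loop resumes at the next header.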

 

MSF (GCG)

This is tricky to define. Jalview looks for lines containing
Name:
to define the sequence IDs. Anything after a line beginning with
//
is treated as sequence data.
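
The heuristic just described can be sketched in Python as follows. This is an illustration under stated assumptions, not Jalview's actual code: the name parse_msf is hypothetical, the ID is taken to be the first word after Name:, and after the // line only lines whose first word is a known ID are treated as sequence data, with the space-separated blocks of ten joined together.

```python
def parse_msf(text):
    """Return {seq_id: aligned_sequence} from MSF (GCG) formatted text."""
    seqs = {}
    in_alignment = False
    for line in text.splitlines():
        stripped = line.strip()
        if not in_alignment:
            if stripped.startswith("//"):
                in_alignment = True  # everything after this is sequence data
            elif "Name:" in stripped:
                # Assumed: the ID is the first word following "Name:".
                name = stripped.split("Name:", 1)[1].split()[0]
                seqs[name] = ""
            continue
        fields = stripped.split()
        # Lines that do not start with a known ID (blank lines, position
        # rulers) are skipped.
        if fields and fields[0] in seqs:
            seqs[fields[0]] += "".join(fields[1:])
    return seqs


if __name__ == "__main__":
    sample = (
        "PileUp\n\n"
        "   MSF:  12  Type: P    Check:  0   ..\n\n"
        " Name: SEQ_A oo  Len:  12  Check:  0  Weight:  1.00\n"
        " Name: SEQ_B oo  Len:  12  Check:  0  Weight:  1.00\n\n"
        "//\n\n"
        "SEQ_A   MGLPGSPWQW VL\n"
        "SEQ_B   ....GSPWQR VL\n"
    )
    for name, seq in parse_msf(sample).items():
        print(name, seq)
```

The checksum and length values in the sample are placeholders, since the sketch does not validate them.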

 

Example MSF (GCG) format

PileUp



   MSF:  570  Type: P    Check:  5858   .. 

 Name: LCAT_RAT oo  Len:  570  Check:  5435  Weight:  1.00
 Name: LCAT_MOUSE oo  Len:  570  Check:  4982  Weight:  1.00
 Name: LCAT_PAPAN oo  Len:  570  Check:  7409  Weight:  1.00
 Name: LCAT_PIG oo  Len:  570  Check:  7761  Weight:  1.00
 Name: LCAT_HUMAN oo  Len:  570  Check:  7204  Weight:  1.00
 Name: RABLCAT_1 oo  Len:  570  Check:  9507  Weight:  1.00
 Name: S45131 oo  Len:  570  Check:  7898  Weight:  1.00
 Name: YND2_YEAST oo  Len:  570  Check:  5662  Weight:  1.00

//



LCAT_RAT        MGLPGSPWQW VLLLLGLLLP PATSFWLLNV LFPPHTTPKA ELSNHTRPVI 
LCAT_MOUSE      MGLPGSPWQR VLLLLGLLLP PATPFWLLNV LFPPHTTPKA ELSNHTRPVI 
LCAT_PAPAN      MGPPGSPWQW VPLLLGLLLP PAAPFWLLNV LFPPHTTPKA ELSNHTRPVI 
LCAT_PIG        .......... .......... ....FWLLNV LFPPHTTPKA ELSNHTRPVI 
LCAT_HUMAN      MGPPGSPWQW VTLLLGLLLP PAAPFWLLNV LFPPHTTPKA ELSNHTRPVI 
RABLCAT_1       MGPPGSPWQW VLLLLGLLLP PAAPFWLLNV LFPPHTTPKA ELSNHTRPVI 
S45131          .......... .......... .......... .......... ......HPVV 
YND2_YEAST      .......... .......... .......... .......... ......HPVV 


LCAT_RAT        LVPGCMGNRL EAKLDKPNVV NWLCYRKTED FFTIWLDFNM FLPLGVDCWI 
LCAT_MOUSE      LVPGCLGNRL EAKLDKPDVV NWMCYRKTED FFTIWLDFNL FLPLGVDCWI 
LCAT_PAPAN      LVPGCLGNQL EAKLDKPDVV NWMCYRKTED FFTIWLDLNM FLPLGVDCWI 
LCAT_PIG        LVPGCLGN.. ......PDVV NWMCYR.... .FTIWLDLNM FLPLGVD... 
LCAT_HUMAN      LVPGCLGNQL EAKLDKPDVV NWMCYRKTED FFTIWLDLNM FLPLGVDCWI 
RABLCAT_1       LVPGCLGNQL EAKLDKPSVV NWMCYRKTED FFTIWLDLNM FLPLGVDCWI 
S45131          MVPGVISTGI EGVIGDDECD SSAHFRKR.. ...LWGSFYM LRTMVMDKVC 
YND2_YEAST      MVPGVISTGI EGVIGDDECD SSAHFRKR.. ...LWGSFYM LRTMVMDKVC

* From Michele Clamp (michele@ebi.ac.uk), in http://www.es.embnet.org/Doc/jalview/formats.html.

 


DGM UAB