Complete information about AXT format can be found here
As described in the link above, each alignment block in an AXT file contains three lines: a summary line and two sequence lines. Blocks are separated from one another by blank lines. The summary line contains coordinate and size information about the alignment. It consists of 9 required fields, in this order:
Alignment number : The alignment numbering starts with 0 and increments by 1.
Chromosome (primary organism)
Alignment start (primary organism) : The first base is numbered 1.
Alignment end (primary organism) : The end base is included.
Chromosome (aligning organism)
Alignment start (aligning organism)
Alignment end (aligning organism)
Strand (aligning organism) : If the strand value is "-", the values of the aligning organism's start and end fields are relative to the reverse-complemented coordinates of its chromosome.
Score
example: BlastZ-net alignments between human chromosome 1 and the mouse genome
0 1 2040 2122 17 66048810 66048885 + 7968
GATTGGAGGAAAGATGAGTGAGAGCATCAACTTCTCTCACAACCTAGGCCAGTAAGTAGTGCTTGTGCTCATCTCCTTGGCTG
GGTTGGAGGGAAGATGAGTGAAGGGATCAATTTCTCTGATGACCTGGGCCGGTAGG-------TGTGGTGTCCTCTTTGTCTG
1 1 123187 123246 7 25002102 25002162 + 148569
CTGCCCCTGCCCTGACTCCCAGCCCTG-TGGGGGTCCTGACCGCACCTCACCTGGCTCAGA
CTACTCCTGTCCCCACTCCCAGCCCTGCTGGGGGCCCTGACCCCACCTCTCCAGGCTCGGA
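The nine-field summary line described above can be read with a few lines of Python. This is a minimal sketch, not a reference parser: the names `AxtSummary`, `parse_axt_summary` and `read_axt_blocks` are hypothetical, and the reader assumes every block is exactly one summary line followed by two sequence lines, with blank lines used only as separators.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class AxtSummary(NamedTuple):
    """The 9 fields of an AXT summary line, in file order."""
    number: int
    primary_chrom: str
    primary_start: int      # 1-based
    primary_end: int        # inclusive
    aligning_chrom: str
    aligning_start: int
    aligning_end: int
    aligning_strand: str    # "+" or "-"
    score: int


def parse_axt_summary(line: str) -> AxtSummary:
    """Split a summary line on whitespace and convert the numeric fields."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}: {line!r}")
    number, p_chr, p_start, p_end, a_chr, a_start, a_end, strand, score = fields
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand value: {strand!r}")
    return AxtSummary(int(number), p_chr, int(p_start), int(p_end),
                      a_chr, int(a_start), int(a_end), strand, int(score))


def read_axt_blocks(lines: Iterable[str]) -> Iterator[Tuple[AxtSummary, str, str]]:
    """Yield (summary, primary_sequence, aligning_sequence) for each block.

    Assumption: blank lines only separate blocks, so the non-blank lines
    can be consumed three at a time.
    """
    content = [ln.strip() for ln in lines if ln.strip()]
    if len(content) % 3 != 0:
        raise ValueError("number of non-blank lines is not a multiple of 3")
    for i in range(0, len(content), 3):
        yield parse_axt_summary(content[i]), content[i + 1], content[i + 2]
```

For the first example block, `parse_axt_summary("0 1 2040 2122 17 66048810 66048885 + 7968")` gives a primary interval of 2040 to 2122 on chromosome 1, which spans 83 bases because the end base is included.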
This format is our own extension of the AXT format. It has an extended header and allows the query sequence (primary organism) to be on the - strand, whereas standard AXT always assumes the query sequence is on the + strand.
The header now has 12 space-separated columns (instead of 9 in the original AXT format):
Alignment number
Chromosome (primary organism)
Alignment start (primary organism) : The first base is numbered 1.
Alignment end (primary organism) : The end base is included.
Strand (primary organism)
Chromosome (aligning organism)
Alignment start (aligning organism)
Alignment end (aligning organism)
Strand (aligning organism)
Score
Chromosome length (primary organism)
Chromosome length (aligning organism)
example: BlastZ-net alignments between human chromosome 1 and the mouse genome
0 1 90200 90289 + 1 178533830 178533921 + 126712 247249719 197069962
ATAGCCCATTAGGCCTCAATGAAGTCTTATGCAAGACCAGAAGCCAATTTGCCATTT--AAGGTGATTCTCCATGTTTCTGCTCTAACTGTG
AAGGTCTATTAACTGTTGAATAAGTCTTACACAAACACAGAAGCCAATCCTCCTTTTTGTAGGTGATTCTCCATGCTGCTGTTCTCACTTGG
1 1 122259 122362 + 7 24996509 24996614 + 148569 247249719 145134094
GCTGATGATGGCTTTAGCACCACCGACACCGATCTCAAGTTCAAGGAGTGGGTGACCGAC--TGAGAGTGGGGACAACTCTGGGGAGGAGCCAGAGGGCAACAAGG
GCTGATGACAGCTTTGGCACCACCGACATTGATCTCAAGTGCAAGGAACGAGTGACTGACAGTGAAAGTGGAGACAGCTCTGGGGAGGACCCAGAGGGTAACAAGG
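Because the extended header carries both chromosome lengths, coordinates reported on the - strand can be mapped back to the + strand. The sketch below parses the 12 columns and performs that conversion. It assumes that - strand coordinates in this format follow the same convention as standard AXT (1-based, inclusive, relative to the reverse-complemented chromosome); the text above does not state this explicitly. The function names are hypothetical.

```python
from typing import Dict, Tuple, Union

# Column names in file order; the names themselves are illustrative.
EXTENDED_AXT_COLUMNS = (
    "number", "primary_chrom", "primary_start", "primary_end", "primary_strand",
    "aligning_chrom", "aligning_start", "aligning_end", "aligning_strand",
    "score", "primary_length", "aligning_length",
)
TEXT_COLUMNS = {"primary_chrom", "primary_strand", "aligning_chrom", "aligning_strand"}


def parse_extended_axt_header(line: str) -> Dict[str, Union[int, str]]:
    """Parse the 12-column header; non-text columns are converted to int."""
    fields = line.split()
    if len(fields) != len(EXTENDED_AXT_COLUMNS):
        raise ValueError(f"expected 12 fields, got {len(fields)}: {line!r}")
    return {
        name: (value if name in TEXT_COLUMNS else int(value))
        for name, value in zip(EXTENDED_AXT_COLUMNS, fields)
    }


def to_forward_strand(start: int, end: int, strand: str, chrom_length: int) -> Tuple[int, int]:
    """Return 1-based inclusive coordinates on the + strand.

    Assumption: on the - strand, start and end count from the first base
    of the reverse-complemented chromosome.
    """
    if strand == "+":
        return start, end
    if strand != "-":
        raise ValueError(f"unexpected strand value: {strand!r}")
    return chrom_length - end + 1, chrom_length - start + 1
```

Under that assumption, an interval 1 to 10 on the - strand of a 100-base chromosome corresponds to bases 91 to 100 on the + strand.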
Complete information about MAF format can be found here
As described in the link above, the MAF format is line-oriented. Each multiple alignment ends with a blank line. Each sequence in an alignment is on a single line, which can get quite long, but there is no length limit. Words in a line are delimited by any white space. Lines starting with # are considered to be comments. Lines starting with ## can be ignored by most programs, but contain meta-data of one form or another.
Each alignment begins with an 'a' line that sets variables for the entire alignment block. The 'a' is followed by name=value pairs; here they give the score of the alignment.
The 's' lines together with the 'a' lines define a multiple alignment. The 's' lines have the following fields which are defined by position rather than name=value pairs.
src : The name of one of the source sequences for the alignment. For sequences that are resident in a browser assembly, the form 'database.chromosome' allows automatic creation of links to other assemblies. Non-browser sequences are typically referenced by the species name alone.
start : The start of the aligning region in the source sequence. This is a zero-based number. If the strand field is '-' then this is the start relative to the reverse-complemented source sequence.
size : The size of the aligning region in the source sequence. This number is equal to the number of non-dash characters in the alignment text field below.
strand : Either '+' or '-'. If '-', then the alignment is to the reverse-complemented source.
srcSize : The size of the entire source sequence, not just the parts involved in the alignment.
text : The nucleotides (or amino acids) in the alignment and any insertions (dashes) as well.
example: GERP (Genomic Evolutionary Rate Profiling)
a score=8.617251
s hsap.chr1 240616145 23 - 247249719 AAGGAGTCCTAGAATGGAGCACA
s ptro.chr1 223263538 23 - 229974691 AAGGAGTCCTAGAATGGAGCACA
s mmul.chr1 218580358 23 - 228252215 AAGGAGTCCTAGAATGGAGTATA
s cfam.chr5 28512329 23 - 91976430 AAGGAGTCCTAGGATGGAGTACA
s btau.chr1 81704165 23 + 102834029 AAGGAGTCCTGGAAAGGAGCACA
s mdom.chr4 43504342 16 - 430141050 AAGGAA-ACTAGCATGG------
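The positional fields of an 's' line, and the rule that size equals the number of non-dash characters in the text, can be checked as follows. This is an illustrative sketch with hypothetical function names. The strand conversion follows from the field definitions above: start is zero-based and, on the '-' strand, counted on the reverse-complemented source.

```python
from typing import Dict, Tuple, Union


def parse_maf_s_line(line: str) -> Dict[str, Union[int, str]]:
    """Parse one MAF 's' line into its six positional fields."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != "s":
        raise ValueError(f"not a MAF 's' line: {line!r}")
    _, src, start, size, strand, src_size, text = fields
    record = {
        "src": src,
        "start": int(start),        # zero-based
        "size": int(size),          # aligned bases, gaps excluded
        "strand": strand,
        "src_size": int(src_size),  # length of the whole source sequence
        "text": text,
    }
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand value: {strand!r}")
    # size must equal the number of non-dash characters in the text field.
    if record["size"] != len(text) - text.count("-"):
        raise ValueError(f"size does not match alignment text: {line!r}")
    return record


def forward_interval(record: Dict[str, Union[int, str]]) -> Tuple[int, int]:
    """Return the zero-based, half-open interval on the '+' strand."""
    start, size, src_size = record["start"], record["size"], record["src_size"]
    if record["strand"] == "+":
        return start, start + size
    # On '-', start is counted from the end of the forward sequence.
    return src_size - start - size, src_size - start
```

For example, the `mdom.chr4` line above has a 23-character text containing 7 dashes, which matches its size of 16.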
This is a FASTA-like format for pairwise alignments. The header is predefined and contains the alignment coordinates, the score and the species name. FASTA blocks are separated from one another by a number sign (#).
example: Human - Chicken Pairwise BlastZ-net alignment
>chr:1|start:5646|end:5996|strand:1|score:10605|Homo sapiens
AGGGCCCGCTCACCTTGCTCCTGCTCCTTCTGCTGCTGCTTCTCCAGCTTTCGCTCCTTCATGCTGCGCAGCTTGGCCTT
GCCGATGCCCCCAGCTTGGCGGATGGACTCTAGCAGAGTGGC-CAGCCACCGGAGGGGTCAACCACTTCCCTGGGAGCTC
CCTGGACTGGAGCCGGGAGGTGGGGAACAGGGCAAGGAGGAAAGGCTGCTCAGGCA--GGGCTGGGGAAGCTTACTGTGT
CCAAGAGCCTGCTGGGAGGGAAGTCACCTCCCCTCAAACGAGGAGCCCTGCGCTGGGGAGGCCGGACC-------TTTGG
AGACTGTGTGTGGGGGCCTGGGCACTGACTTCTGCAACCAC
>chr:1|start:62073494|end:62073803|strand:1|score:10605|Gallus gallus
AGAGCCTCAGCACCTTGTTCTTGTTCCTTTTGCTTCTTCTTCTCCAGCTTCCTCTCCTTGACACTGCGGAGGTTGGCCTT
CCCAATGCCCCCTGCCTGGCGAATGGATTCCAGGAGACTGGCACGCCCAGTTGAGGGGTTCACCACCTCCTTTGGGGCTC
CCTGGACCGAAGCTG------------CAGGGACAGGAAGGA--------CATGCATCAGCCTCGGGTA-----------
--------TAGCTGCAAGCCCAGCCAAATCCCCCCACAGAAAGGG--CTGCTCT---GATGCCCTGCCATCTTCTTTTTG
GGACTGTGCG-------CCTGACACAGACACTTTCAGGCAC
#
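The predefined header can be split into its parts as shown below. This is a sketch based only on the example above: it assumes the header is a list of `key:value` pairs separated by `|`, with the species name as the last, unlabelled field. The function name `parse_pairwise_header` is hypothetical, and the meaning of the strand value (1 in the example) is not interpreted here.

```python
from typing import Dict, Union


def parse_pairwise_header(header: str) -> Dict[str, Union[int, str]]:
    """Parse a header such as '>chr:1|start:5646|end:5996|strand:1|score:10605|Homo sapiens'."""
    if not header.startswith(">"):
        raise ValueError(f"not a FASTA header: {header!r}")
    *pairs, species = header[1:].strip().split("|")
    record: Dict[str, Union[int, str]] = {"species": species}
    for pair in pairs:
        key, sep, value = pair.partition(":")
        if not sep:
            raise ValueError(f"expected key:value, got {pair!r}")
        record[key] = value
    # Chromosome names stay as text (they are not always numeric);
    # the remaining fields in the example are integers.
    for key in ("start", "end", "strand", "score"):
        if key in record:
            record[key] = int(record[key])
    return record
```

Applied to the human header in the example, this returns chromosome "1", start 5646, end 5996, score 10605 and species "Homo sapiens".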
These are FASTA-like formats for homologues and aligned homologues respectively. The headers are defined by the user, who can select any of the available attributes. FASTA blocks are separated from one another by a number sign (#).
example: Human - Mouse Homologues (Peptide)
>Homo sapiens|16|ENSG00000162073|ENST00000318782|ortholog_one2one
MAFLAGPRLLDWASSPPHLQFNKFVLTGYRPASSGSGCLRSLFYLHNELGNIYTHGLALL
GFLVLVPMTMPWGQLGKDGWLGGTHCVACLAPPAGSVLYHLFMCHQGGSAVYARLLALDM
CGVCLVNTLGALPIIHCTLACRPWLRPAALVGYTVLSGVAGWRALTAPSTSARLRAFGWQ
AAARLLVFGARGVGLGSGAPGSLPCYLRMDALALLGGLVNVARLPERWGPGRFDYWGNSH
QIMHLLSVGSILQLHAGVVPDLLWAAHHACPRD
>Mus musculus|17|ENSMUSG00000023909|ENSMUST00000024702|ortholog_one2one
MAFLTGPRLLDWASSPPHLQFNKFVLTGYRPASSGSGCLRSLFYLHNELGNIYTHGLALL
GFLVLVPMTMPWSQLGKDGWLGGTHCVACLVPPAASVLYHLFMCHQGGSPVYTRLLALDM
CGVCLVNTLGALPIIHCTLACRPWLRPAALMGYTALSGVAGWRALTAPSTSARLRAFGWQ
AGARLLVFGARGVGLGSGAPGSLPCYLRMDALALLGGLVNVARLPERWGPGRFDYWGNSH
QIMHLLSVGSILQLHAGVVPDLLWAAHHACPPD
#
example: Human - Mouse Homologues (Aligned Peptide)
>Homo sapiens|12|ENSG00000182196|ENST00000315580|ortholog_one2one
MERAGPAGEEGGAREGRLLPRAPGAWVLRACAERAALEVGAASADTGVRGCGARGPAPLL
ASAGGGRARDGTWGVRTKGSGAALPSRPASRAAPRPEASSPPLPLEKARGGLSGPQGGRA
RGAMAHVGSRKRSRSRSRSR-G-RGSEKRKKKSRKDTSRNCSASTSQGRKASTAPGAEAS
PSPCITERSKQKARRRTRSSSSSSSSSSSSSSSSSSSSSSSSSDGRKKRGKYKDKRRKKK
KK--RKKLKKKGKEKAEA-QQVEALPGPSLDQWHRSAGEEEDGPVLTDEQKSRIQAMKPM
TKEEWDARQSIIRKVVDPETGRTRLIKGDGEVLEEIVTKERHREINKQATRGDCLAFQMR
AGLLP
>Mus musculus|5|ENSMUSG00000029404|ENSMUST00000031351|ortholog_one2one
------------------------------------------------------------
------------------------------------------------------------
---MAHVGSRKRSRSRSRSRSGRRGSEKRSKRSSKDASRNCSASRSQGHKAGSASGVE--
------ERSKHKAQRTSRSSSTSSSSS--SSSSA---SSSSSSDGRKKRAKHKEKKRKKK
KKKRKKKLKKRVKEKAVAVHQAEALPGPSLDQWHRSAGEDNDGPVLTDEQKSRIQAMKPM
TKEEWDARQSVIRKVVDPETGRTRLIKGDGEVLEEIVTKERHREINKQATRGDGLAFQMR
TGLLP
#
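Files in these FASTA-like formats can be grouped into blocks by splitting on the number sign lines. The sketch below is one possible reader, with hypothetical function names. Headers are returned as lists of attributes without interpretation, because their content is chosen by the user. The length check for aligned homologues assumes that aligned sequences within a block are padded with dashes to the same length, as in the example above.

```python
from typing import Iterable, List, Tuple

Record = Tuple[List[str], str]   # (header attributes, sequence)


def read_fasta_blocks(lines: Iterable[str]) -> List[List[Record]]:
    """Group FASTA records into blocks terminated by a '#' line."""
    blocks: List[List[Record]] = []
    current: List[list] = []          # mutable [attributes, sequence] pairs
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "#":
            if current:
                blocks.append([(attrs, seq) for attrs, seq in current])
                current = []
        elif line.startswith(">"):
            current.append([line[1:].split("|"), ""])
        else:
            if not current:
                raise ValueError("sequence line found before any header")
            current[-1][1] += line    # sequences may span several lines
    if current:                       # tolerate a missing final '#'
        blocks.append([(attrs, seq) for attrs, seq in current])
    return blocks


def is_aligned(block: List[Record]) -> bool:
    """True if all sequences in the block have the same length."""
    return len({len(seq) for _, seq in block}) <= 1
```

A block from an aligned homologues file should satisfy `is_aligned`, while unaligned homologue sequences will generally differ in length.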
© 2024 Inserm. Hosted by genouest.org. This product includes software developed by Ensembl.