Common Feature Tables

Ensembl Core Database Schema


The dna_align_feature table

seq_region_id, seq_region_start, seq_region_end, seq_region_strand: sequence position of the alignment.
hit_name: the name or primary ID of the aligned sequence in its source database, which is described in the associated analysis.
hit_start, hit_end, hit_strand: sequence position of the aligned stretch of DNA on the hit_name sequence.
score, evalue, perc_ident: selected quality measures for the alignment. The API supports score as a filter criterion.
cigar_line: a string that describes the alignment as pieces of matching sequence (M), deleted bases (D) and inserted bases (I).
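
To illustrate how a cigar_line might be read, here is a minimal Python sketch. It assumes a compact run-length encoding in which each M, D or I letter is preceded by an optional count, with a missing count meaning 1; that encoding and the function names parse_cigar and op_totals are assumptions made for illustration, not something the table description above specifies.

```python
import re

# Assumed token shape: an optional run length followed by one of M, D, I.
_TOKEN = re.compile(r"(\d*)([MDI])")


def parse_cigar(cigar_line):
    """Split a cigar string into (length, operation) pairs.

    A missing run length is read as 1 (assumption, see lead-in).
    """
    if not re.fullmatch(r"(?:\d*[MDI])+", cigar_line):
        raise ValueError(f"unexpected cigar string: {cigar_line!r}")
    return [(int(n) if n else 1, op) for n, op in _TOKEN.findall(cigar_line)]


def op_totals(cigar_line):
    """Sum the run lengths per operation (M, D, I)."""
    totals = {"M": 0, "D": 0, "I": 0}
    for length, op in parse_cigar(cigar_line):
        totals[op] += length
    return totals


print(parse_cigar("10M3D5M"))
print(op_totals("10M3D5M"))
```

The sketch deliberately stops at counting: which of the two aligned sequences a D or I piece refers to depends on the convention in use and is not decided here.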

The protein_align_feature table

Describes an alignment between an external peptide sequence and a piece of Ensembl DNA sequence. It is equivalent to the dna_align_feature table, with the following differences:

there is no hit_strand, as a protein sequence cannot have a negative strand;
the hit coordinates are interpreted as amino acid counts;
the cigar_line uses DNA base pair counts for the alignment description.
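
Because the hit coordinates count amino acids while the alignment description counts DNA base pairs, the two lengths differ by a factor of three. The following Python sketch shows the conversion; it assumes that hit_start and hit_end are inclusive coordinates (an assumption, not stated above), and the function names are hypothetical.

```python
def hit_length_aa(hit_start, hit_end):
    """Length of the aligned peptide stretch in amino acids.

    Assumes inclusive start and end coordinates.
    """
    if hit_end < hit_start:
        raise ValueError("hit_end must not be smaller than hit_start")
    return hit_end - hit_start + 1


def hit_length_bp(hit_start, hit_end):
    """Equivalent length in DNA base pairs (three bases per amino acid)."""
    return 3 * hit_length_aa(hit_start, hit_end)


print(hit_length_aa(11, 20), hit_length_bp(11, 20))
```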

The repeat_features and repeat_consensus tables

Repetitive regions on the genome are stored in the repeat_features table. Each repeat is further classified by an entry in the repeat_consensus table.
repeat_start, repeat_end and score can be retrieved through the API, but the API does not use them and cannot filter by these criteria. Repeats can be retrieved by their consensus sequence (type).
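
Since score is not a filter criterion here, any score-based selection would have to happen after retrieval. A minimal Python sketch of that idea, grouping already-fetched rows by consensus: the rows are invented sample data, and the key name repeat_consensus_id and the function group_by_consensus are assumptions for illustration only.

```python
from collections import defaultdict


def group_by_consensus(repeat_rows, min_score=None):
    """Group repeat rows by consensus, optionally dropping low scores.

    repeat_rows is a list of dicts standing in for retrieved rows
    (hypothetical layout, not the API's return type).
    """
    groups = defaultdict(list)
    for row in repeat_rows:
        if min_score is not None and row["score"] < min_score:
            continue  # client-side filter; not done by the API
        groups[row["repeat_consensus_id"]].append(row)
    return dict(groups)


# Invented sample rows for illustration.
rows = [
    {"repeat_consensus_id": 1, "repeat_start": 1, "repeat_end": 120, "score": 850.0},
    {"repeat_consensus_id": 1, "repeat_start": 5, "repeat_end": 60, "score": 300.0},
    {"repeat_consensus_id": 2, "repeat_start": 1, "repeat_end": 300, "score": 1200.0},
]

for consensus_id, members in group_by_consensus(rows, min_score=500).items():
    print(consensus_id, len(members))
```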

The prediction_transcripts table

A prediction transcript is a low-quality gene prediction for which little information is available (compared with the information stored for a gene, see below). Each prediction transcript links to one or more prediction exons and to the analysis that generated the prediction and its prediction exons.

Prediction exons

rank: the position of the exon in the transcript, counting from 1 at the 5' end.
start_phase: describes whether the prediction exon starts with the first, second or third base of a codon.
- Phase 0: starts with the first base of the codon
- Phase 1: starts with the second base of the codon
- Phase 2: starts with the third base of the codon
score and pvalue store scores from the prediction algorithm; they are not filter criteria.
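
The phase definitions above imply some simple arithmetic, sketched here in Python. The function names are hypothetical, and the sketch assumes that consecutive exons (by rank) continue the same reading frame without gaps.

```python
def _check_phase(start_phase):
    if start_phase not in (0, 1, 2):
        raise ValueError("start_phase must be 0, 1 or 2")


def bases_before_first_full_codon(start_phase):
    """Number of leading bases that belong to a codon begun earlier.

    Phase 0 -> 0, phase 1 -> 2, phase 2 -> 1.
    """
    _check_phase(start_phase)
    return (3 - start_phase) % 3


def next_start_phase(start_phase, exon_length):
    """Phase at which the following exon would start.

    Assumes the next exon continues the reading frame directly.
    """
    _check_phase(start_phase)
    return (start_phase + exon_length) % 3


# An exon in phase 0 with 4 bases ends one base into a new codon,
# so the following exon would start with the second base (phase 1).
print(next_start_phase(0, 4))
```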
