This document describes the tables that make up the Ensembl Regulation schema. Tables are grouped logically by their function, and the purpose of each table is explained. Web front-end derived from the Ensembl webcode and Ensembl schema databases. WormBase ParaSite is a website presenting draft genome sequences for helminths. This creates the schema for the empty database you created in step 3. Note that we are using the example MySQL settings, with /data/mysql as the install directory.


This document describes the tables that make up the Ensembl Compara schema. Tables are listed in groups by category, and the purpose of each table is explained. Several examples are also given. They are intended to allow people to familiarise themselves with the schema. The overall diagram can be found here. It is used to store options on clades and groups of species. It was initially developed for the gene tree view. Each category corresponds to data stored in different tables.

This table contains information about the comparisons stored in the database. This table contains the distribution of the gene order conservation scores. Species-tree used in the Compara analyses incl.

Installing the Ensembl Data

This table contains different names, aliases and meta data for the taxa used in Ensembl. This table stores species trees used in Compara. Description of the genome assemblies, sequences, genes, etc. This table contains information about the version of the genome assemblies used in this database. This table defines the genomic sequences used in the comparative genomics analysis.

This table includes alternative sequences for Member, like sequences with flanking regions. These tables store information about genomic alignments in the Compara schema. Contains all the syntenic relationships found and the relative orientation of both syntenic regions.

This table contains the genomic regions corresponding to every synteny relationship found. There are two genomic regions for every synteny relationship. This table is the key table for the genomic alignments.

This allows the user to also access the tree alignments as normal multiple alignments. This table is used to index tree alignments, e. These alignments include inferred ancestral sequences. The tree required to index these sequences is stored in this table.

This table stores the structure of the tree. This table contains the coordinates and all the information needed to rebuild genomic alignments. Every entry corresponds to one of the aligned sequences. Several scores are stored per row. These tables store information about gene alignments, trees and homologies.


This table stores the raw local alignment results of peptide-to-peptide alignments returned by a BLAST run. The hits are actually stored in species-specific tables rather than in a single table. The following query corresponds to a particular hit found between a Homo sapiens protein and an Anolis carolinensis protein:

This table holds the gene tree data structure, such as root, relation between parent and child, leaves, etc. In our data structure, all the trees of a given clusterset are arbitrarily connected to the same root.

This makes it easier to store and query, in the same database, the data from independent tree-building analyses. Hence the "biological roots" of the trees are the children nodes of the main clusterset root.
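The clusterset arrangement described above can be sketched in a few lines of Python. This is only an illustration: the node ids and the `(node_id, parent_id)` row layout are made up for the example and are not taken from the actual gene tree tables.

```python
from collections import defaultdict

# Hypothetical toy rows: (node_id, parent_id). All ids are invented.
NODES = [
    (1, None),   # clusterset root: an arbitrary node shared by every tree in the set
    (10, 1),     # biological root of a first tree
    (11, 10),
    (12, 10),
    (20, 1),     # biological root of a second tree
    (21, 20),
    (22, 20),
]


def biological_roots(nodes, clusterset_root_id):
    """Return the ids of the nodes whose parent is the clusterset root."""
    return sorted(
        node_id for node_id, parent_id in nodes if parent_id == clusterset_root_id
    )


def leaves_under(nodes, root_id):
    """Return the ids of all leaves found below `root_id`."""
    children = defaultdict(list)
    for node_id, parent_id in nodes:
        children[parent_id].append(node_id)

    leaves = []
    stack = [root_id]
    while stack:
        node = stack.pop()
        kids = children.get(node, [])
        if kids:
            stack.extend(kids)   # internal node: keep descending
        else:
            leaves.append(node)  # no children: this is a leaf
    return sorted(leaves)


print(biological_roots(NODES, 1))
print(leaves_under(NODES, 10))
```

Because every tree hangs from the same artificial root, listing the trees of a clusterset reduces to listing the children of a single node.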

See the examples below. The following query returns the root nodes of the independent trees stored in the database. The database is able to contain several sets of trees computed on the same genes. This table contains arbitrary data related to gene trees.

This table contains all the genomic homologies. The following query defines a pair of paralogous Xenopus genes. This table contains the sequences corresponding to every genomic homology relationship found. As written in the homology table section, both schema and API can deal with more than pairwise relationships. The cigar line defines the sequence of matches or mismatches and deletions in the alignment. The following query refers to the two homologue sequences from the first Xenopus paralogy object.
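To make the role of the cigar line concrete, here is a minimal Python sketch that rebuilds a gapped sequence from an ungapped one. It assumes a simple encoding that the text above does not spell out: `M` marks positions that consume a residue (match or mismatch), `D` marks gap positions in this sequence, and a missing count means 1. The function name `expand_cigar` is hypothetical.

```python
import re


def expand_cigar(sequence, cigar_line):
    """Insert gaps into `sequence` following `cigar_line`.

    Assumed encoding (not confirmed by the text above):
      nM -> copy the next n residues of the sequence
      nD -> emit n gap characters
    A missing count is read as 1.
    """
    aligned = []
    pos = 0
    for count, op in re.findall(r"(\d*)([MD])", cigar_line):
        n = int(count) if count else 1
        if op == "M":
            aligned.append(sequence[pos:pos + n])
            pos += n
        else:
            aligned.append("-" * n)
    if pos != len(sequence):
        raise ValueError("cigar line does not consume the whole sequence")
    return "".join(aligned)


print(expand_cigar("ACDEFG", "2M3D4M"))  # AC---DEFG
```

Storing the cigar line next to each member keeps the ungapped sequence in one place while still allowing the alignment to be rebuilt on demand.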

The gene and peptide sequences of the second homologue can be retrieved in the same way. Various information related to member genes and proteins is stored in the database, either loaded from Core databases or aggregated from Compara analyses.

Ensembl Core – Schema documentation

This table stores data about projected transcripts in the gene-annotation process, which is used to help the clustering. This table can only be used when both genomes have been loaded. This table stores cross-references for gene members derived from the core databases.

It is used by Bio:: This table contains all the group homologies found. This table contains the proteins corresponding to every protein family relationship found. The following query refers to the members of the protein family PTHR This table stores different HMM-based profiles used and produced by gene trees. History of the gene-tree and family IDs across different versions of Ensembl. Empty strings are treated as NULLs.

This query defines which API version must be used to access this database.
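The original query is not reproduced here. As a rough sketch of the idea, the snippet below looks up a schema version in a key/value `meta` table. Everything in it is an assumption for illustration: the simplified table layout, the `schema_version` key, the value `99`, and the use of an in-memory SQLite database in place of a MySQL server.

```python
import sqlite3


def schema_version(conn):
    """Return the schema version stored in the meta table, or None if absent."""
    row = conn.execute(
        "SELECT meta_value FROM meta WHERE meta_key = 'schema_version'"
    ).fetchone()
    return int(row[0]) if row else None


# Toy database standing in for a real Compara database (assumed layout).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meta ("
    " meta_id INTEGER PRIMARY KEY,"
    " species_id INTEGER,"
    " meta_key TEXT,"
    " meta_value TEXT)"
)
conn.execute(
    "INSERT INTO meta (species_id, meta_key, meta_value) "
    "VALUES (NULL, 'schema_version', '99')"  # made-up version number
)

print(schema_version(conn))
```

A client would compare the returned number with the version of the API it has installed before running any other query.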


These are our current entries: This query shows all the EPO alignments in this database: This example shows how to get the lineage for Homo sapiens: Here is an example of how to get the taxonomic ID for a species: Imported from the core databases. locator (varchar, NULL): used for production purposes or for user configuration in in-house installations.
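The lineage and taxonomic ID lookups mentioned above both come down to following parent links in a taxonomy tree. The Python sketch below shows the idea on a toy table; the `TAXA` dictionary is a deliberately simplified subset in which intermediate ranks are skipped, and the function names are hypothetical.

```python
# Toy taxonomy: taxon_id -> (name, parent taxon_id).
# Simplified for illustration; the parent links skip intermediate ranks.
TAXA = {
    9606: ("Homo sapiens", 9605),
    9605: ("Homo", 9604),
    9604: ("Hominidae", 9443),
    9443: ("Primates", None),
}


def lineage(taxa, taxon_id):
    """Return the names from the top of the toy tree down to `taxon_id`."""
    names = []
    while taxon_id is not None:
        name, parent_id = taxa[taxon_id]
        names.append(name)
        taxon_id = parent_id
    return names[::-1]


def taxon_id_for(taxa, species_name):
    """Return the taxon id whose name equals `species_name`, or None."""
    for taxon_id, (name, _parent_id) in taxa.items():
        if name == species_name:
            return taxon_id
    return None


print(lineage(TAXA, 9606))
print(taxon_id_for(TAXA, "Homo sapiens"))
```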

The numeric identifier of the codon-table that applies to this dnafrag https: It shows the dnafrag the member is on.

May be 0 when the sequence is not available in the sequence table, e. Whether there are SeqEdits that modify the transcript sequence. Whether there are SeqEdits that modify the protein sequence. The following query refers to the LastZ alignment between medaka and zebrafish: This column defines the window size used to calculate the average score key: Shows the target of the projection, i. Shows the source of the projection key: The percentage of identity between the two members. Indicates the gene to which the xref applies.

Indicates to which external database the xref belongs. We are working on keeping IDs stable between releases. This prevents the date from changing even if we accidentally remove the entry and have to re-load it. Name of the species set e. Used to match similar types. Source of the data. Currently either "ensembl" or "ucsc" if data were imported from UCSC. Boolean value which defines whether this rank is used or not in the abbreviated lineage.

Represents which genome the dnafrag is part of. When this happens, the exon coordinates don’t match the transcript sequence. When this happens, the protein sequence doesn’t match the transcript sequence.

Used for pairwise comparison. Defines the percentage of identity between both sequences. Level of orthologous layer.

Used in self alignments to ensure only one Bio:: GenomicAlignBlock is visible when you have more than 1 block covering the same region. The scores are stored at different resolution levels. This column defines the window size used to calculate the average score.
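As an illustration of what a window size means for the stored scores, the Python sketch below averages per-position scores over fixed windows. The use of non-overlapping windows and the name `window_averages` are assumptions made for this example; the text above does not say how the windows are laid out.

```python
def window_averages(scores, window_size):
    """Average `scores` over consecutive, non-overlapping windows.

    The last window may be shorter than `window_size` and is averaged
    over the values it actually contains.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    averages = []
    for start in range(0, len(scores), window_size):
        window = scores[start:start + window_size]
        averages.append(sum(window) / len(window))
    return averages


per_position = [1, 2, 3, 4, 5, 6]
print(window_averages(per_position, 1))  # full resolution
print(window_averages(per_position, 2))  # coarser resolution
```

Larger windows give fewer values per region, which is why several resolution levels can be kept side by side.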

The difference between the expected and observed variation, i.