Description of the EF-Hand Calcium-Binding Proteins Data Library

Description of Aspects of the EF-Hand Calcium-Binding Proteins Data Library

The EF-Hand Calcium-Binding Proteins Data Library (EF-Hand CaBP-DL) is a highly curated collection of sequence, structural, and functional information about the EF-Hand superfamily of calcium-binding proteins. It was conceived, designed, and implemented by Melanie Nelson, a former graduate student in Walter Chazin's lab.

Information Integrity
Information Storage
Information Access
- browsable interface
- search interface
Data Feeding

Information Integrity

All information that is not obtained directly from another public database has been published in a peer-reviewed journal. Users of the data library can view the reference associated with a particular piece of information via the InfoCard. The InfoCards also hold information about who submitted the data to the EF-Hand CaBP-DL, and which library administrator checked the information and validated it for inclusion in the database. Every piece of information included in the data library has been validated by a library administrator.

Information Storage

There are two types of information stored in the data library: 1) information that is stored in a relational database and written to browsable HTML pages on a regular schedule, and 2) information that is stored solely in HTML pages.

The majority of information is stored in the relational database, which is implemented in the PostgreSQL database management system. The database was designed using the relational paradigm, although normalization is occasionally broken for convenience of reporting the information. The entity-relationship model for the database is available online. As can be seen in this model, the information in the database is organized around proteins. Each mutant or isoform of a protein is considered a unique protein in the database, and is assigned a unique identifier (the prot_id). This allows storage of the calcium-binding constants, for instance, of two isoforms of parvalbumin from the same species, and clearly indicates that the two sets of binding constants are for two different chemical entities. It also allows storage of any type of information about a mutant, eliminating the need to predict which types of information about a mutant will be useful. All of the isoforms and mutants of a given protein are associated by a common group identifier (the group_id). This allows all of the isoforms and mutants of a given protein to be grouped together for reporting purposes. For instance, the three human isoforms of caltractin each have a distinct prot_id, but share a single group_id with each other and any other isoforms or mutants of caltractin from various species that are stored in the database. When the protein home page for caltractin is generated, information from all of the isoforms and mutants from various species is included.
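The relationship between prot_id and group_id described above can be sketched in a few lines. This is only an illustration, not the library's actual schema: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the table name, the name column, the placeholder rows, and the helper proteins_in_group are all assumptions. Only the identifiers prot_id and group_id come from the description.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the PostgreSQL database.
conn = sqlite3.connect(":memory:")

# Hypothetical, simplified table: only prot_id and group_id are named in the
# description; the table name and the "name" column are assumptions.
conn.executescript("""
    CREATE TABLE protein (
        prot_id  INTEGER PRIMARY KEY,  -- one row per isoform or mutant
        group_id INTEGER NOT NULL,     -- shared by all forms of one protein
        name     TEXT NOT NULL
    );
""")

# Placeholder rows, not real library content.
conn.executemany(
    "INSERT INTO protein (prot_id, group_id, name) VALUES (?, ?, ?)",
    [
        (1, 10, "caltractin, human isoform 1"),
        (2, 10, "caltractin, human isoform 2"),
        (3, 10, "caltractin, human isoform 3"),
        (4, 20, "parvalbumin, isoform A"),
        (5, 20, "parvalbumin, isoform B"),
    ],
)


def proteins_in_group(conn, group_id):
    """Return (prot_id, name) for every isoform or mutant in one group."""
    return conn.execute(
        "SELECT prot_id, name FROM protein WHERE group_id = ? ORDER BY prot_id",
        (group_id,),
    ).fetchall()


# Each isoform keeps its own prot_id, but one query on group_id collects
# them all, as would be needed when building a protein home page.
for prot_id, name in proteins_in_group(conn, 10):
    print(prot_id, name)
```

Because each isoform or mutant has its own prot_id, any other table (binding constants, for example) could refer to exactly one chemical entity, while group_id alone is enough to gather related entries for reporting.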

Information Access

All information that is stored in the relational database can be accessed both by searching the database and by browsing the web pages in the data library. The web pages that are supported by the database are regenerated via Perl scripts on a regular schedule. The maintenance of the browsable interface is an important design feature of the EF-Hand CaBP-DL because it allows users to find information even if they do not have a clear idea of what they are trying to find. However, the ability to directly search the underlying database is also important, because it allows users to associate different types of information about the various proteins together. We believe that as the amount of information in the data library grows, this ability will allow the identification of unexpected correlations among different properties of the proteins.
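The regeneration step can be pictured as a function that turns database rows into a static HTML page. The sketch below is written in Python rather than the Perl the library actually uses, and the function name render_protein_home_page, the page layout, and the row format are assumptions made for illustration.

```python
from html import escape


def render_protein_home_page(group_name, members):
    """Build a static HTML page for one protein group.

    members is an iterable of (prot_id, name) tuples, in the shape a
    database query might return them (an assumed format).
    """
    rows = "\n".join(
        f"<tr><td>{prot_id}</td><td>{escape(name)}</td></tr>"
        for prot_id, name in members
    )
    title = escape(group_name)
    return (
        f"<html><head><title>{title}</title></head>\n"
        f"<body>\n<h1>{title}</h1>\n"
        "<table>\n<tr><th>prot_id</th><th>Name</th></tr>\n"
        f"{rows}\n</table>\n</body></html>\n"
    )


# A scheduled job could call this for every group and write each result
# to a file served by the web server.
page = render_protein_home_page(
    "caltractin",
    [(1, "caltractin, human isoform 1"), (2, "caltractin, human isoform 2")],
)
print(page)
```

Writing pages out on a schedule keeps browsing independent of the database at request time, at the cost of pages lagging behind the database until the next regeneration.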

The browsable interface for the EF-Hand CaBP-DL is divided into four main sections: general information (which includes functional information), sequence information, structural information, and analytical tools. There is also a section of links to other web resources and a picture gallery, as well as a collection of information with limited access (this information is mostly unpublished work from the Chazin lab, and can only be accessed from within the University and by some of our collaborators).

General Information includes the main entry point for information about a particular protein: the protein home pages. The home page for a protein summarizes or links to all information that is stored in the relational database about that protein. In addition to these pages, the general information section includes lists of references, evolutionary information, information about mutants of proteins, and information about proteins that are targets of EF-hand CaBPs.
Sequence Information includes alignments of the proteins included in the data library, information about the amino acid composition of the binding loops, and "home pages" for each position in the conserved EF-hand sequence motif. These individual amino acid home pages include the identity of the amino acid at that particular position in all species of all proteins included in the data library, as well as information about mutations at that position.
Structural Information includes links to the entries in the Protein Data Bank (PDB) for all known structures of EF-Hand calcium-binding proteins, NMR assignments of some proteins, and detailed structural information derived from the available structures. This detailed information includes interhelical angles, information about hydrogen bonds, dihedral angles, inter-residue contact lists, and solvent accessible surface area information.
Analytical Tools is a collection of web-based scripts for analyzing the information in the data library. This is currently the most limited section of the library, with only two scripts available: a per-residue solvent accessible surface calculator and HomologFinder, a script that takes a residue input by the user and reports its homologs in the other proteins.

The search interface for the relational database allows three types of searches:

A field-based system for retrieving specific information about specific proteins allows users to generate a table that directly compares the various properties of different proteins.
The text-based search form allows users to search for the protein or proteins in the database that satisfy the user's search constraints.
The reference search form allows users to find specific references that are stored in the data library. Each reference on the report page has a link to a page showing the information from that reference that is stored in the database.

Data Feeding

The data library will only be as useful as the information it contains. Therefore, it is imperative that its information content grows and is kept current. This is too large a task for any one person or laboratory to undertake, so we have provided online forms for submitting new information to the data library. We hope that the larger community will help us to maintain and expand this resource by submitting information about EF-hand CaBPs to the database. Each piece of information that is submitted is reviewed by a library administrator before it is actually deposited in the database. Data integrity is also ensured by the requirement for the inclusion of the reference from which the information was obtained. Only data that has been published in a peer-reviewed article will be accepted into the database.
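The submission rules above (a reference is required, and an administrator must review each item before deposit) can be sketched as a small validation step. This is a hypothetical illustration in Python, not the library's submission code: the Submission class, its field names, and the functions find_problems and approve are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """One submitted piece of information (hypothetical structure)."""
    prot_id: int
    property_name: str
    value: str
    reference: str = ""     # citation of the peer-reviewed article
    submitted_by: str = ""
    validated_by: str = ""  # filled in only when an administrator approves


def find_problems(sub):
    """Return a list of reasons the submission cannot be accepted yet."""
    problems = []
    if not sub.reference.strip():
        problems.append("missing reference")
    if not sub.submitted_by.strip():
        problems.append("missing submitter")
    return problems


def approve(sub, administrator):
    """Record the reviewing administrator, refusing incomplete submissions."""
    problems = find_problems(sub)
    if problems:
        raise ValueError("cannot approve: " + "; ".join(problems))
    sub.validated_by = administrator
    return sub


# Placeholder values only; a submission without a reference is rejected
# before it could reach the database.
incomplete = Submission(1, "example property", "example value", submitted_by="a user")
print(find_problems(incomplete))
```

Keeping the submitter and the validating administrator on the record mirrors what the InfoCards are described as showing for each piece of information.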

We welcome any ideas, suggestions, critiques, or comments about the current status of the data library and our future plans for it. Please e-mail: cabp_admin@structbio.vanderbilt.edu.