Protein Domains & Families

InterProScan

Integrated search in PROSITE, Pfam, PRINTS and other family and domain databases. InterPro is a database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences.

http://www.ebi.ac.uk/Tools/pfa/iprscan/

CDD Search

Conserved Domain Database Search @ NCBI

http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi

PANTHER Families

PANTHER version 7.0 contains 6594 protein families, each with a phylogenetic tree relating modern-day genes in 48 organisms. Expert biologists have divided each family into subfamilies, which are generally orthologous groups but may also contain recently duplicated paralogs. Each family and subfamily is also represented as a hidden Markov model (HMM), which can be used to classify new sequences to an existing subfamily.

http://www.pantherdb.org/panther/

TIGRFAMs

TIGRFAMs is a resource of protein families built on hidden Markov models (HMMs). It consists of curated multiple sequence alignments, HMMs for protein sequence classification, and associated information designed to support automated annotation of (mostly prokaryotic) proteins.

http://www.tigr.org/TIGRFAMs/index.shtml

Pfam

The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).

http://pfam.sanger.ac.uk/

Search Pfam

Find Pfam families within your sequence of interest.

http://pfam.sanger.ac.uk/search?tab=searchSequenceBlock

PRODOM

ProDom is a comprehensive set of protein domain families automatically generated from the SWISS-PROT and TrEMBL sequence databases.

http://prodom.prabi.fr/prodom/current/html/home.php

DOUTfinder

Try a DOUTfinder analysis of your protein, which will help you evaluate subsignificant domain hits when other databases have failed.

http://mendel.imp.ac.at/dout/

SYSTERS

SYSTERS (short for SYSTEmatic Re-Searching) is a collection of graph-based algorithms to hierarchically partition a large set of protein sequences into homologous families and superfamilies. The methods now unified under the name SYSTERS are based on an all-against-all database search (using Smith-Waterman comparisons on a GeneMatcher machine).

http://systers.molgen.mpg.de/
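
As a rough illustration of graph-based partitioning from all-against-all scores, the sketch below links any two sequences whose pairwise score reaches a threshold and reports the connected components as clusters. This is not the SYSTERS algorithm itself; the function name `cluster`, the sequence labels and the scores are all hypothetical, and real Smith-Waterman scores would come from an external search.

```python
def cluster(pairs, threshold):
    """Group sequences into connected components of the similarity graph.

    pairs: iterable of (seq_a, seq_b, score) from an all-against-all search.
    Two sequences are linked when their score is >= threshold.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in pairs:
        root_a, root_b = find(a), find(b)
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical pairwise scores, for illustration only.
pairs = [("p1", "p2", 250), ("p2", "p3", 180), ("p3", "p4", 40), ("p4", "p5", 300)]
print(cluster(pairs, threshold=100))  # stricter cutoff: smaller, family-like groups
print(cluster(pairs, threshold=30))   # looser cutoff: groups merge
```

Running the same data at a lower threshold merges the groups found at the higher one, which gives a simple picture of how a hierarchy of families and superfamilies can arise from a single set of pairwise comparisons.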

CDART

The Conserved Domain Architecture Retrieval Tool (CDART) performs similarity searches of the NCBI Entrez Protein Database based on domain architecture, defined as the sequential order of conserved domains in proteins.

http://www.ncbi.nlm.nih.gov/Structure/lexington/lexington.cgi?cmd=rps
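
To make the idea of a domain architecture concrete, the sketch below treats each protein as an ordered list of domain names and ranks database entries by how many domains they share with the query in the same order (a longest common subsequence). This is only an assumed scoring scheme for illustration, not CDART's actual method; the protein names and the helper names `shared_in_order` and `rank_by_architecture` are hypothetical.

```python
def shared_in_order(a, b):
    """Length of the longest common subsequence of two domain lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rank_by_architecture(query, database):
    """Return protein names sharing at least one domain, best match first."""
    scored = [(shared_in_order(query, arch), name) for name, arch in database.items()]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [name for score, name in scored if score > 0]


# Hypothetical proteins with their domains listed from N- to C-terminus.
database = {
    "protA": ["SH3", "SH2", "Pkinase_Tyr"],
    "protB": ["SH2", "SH3"],
    "protC": ["PH", "SH2", "Pkinase_Tyr"],
    "protD": ["WD40"],
}
print(rank_by_architecture(["SH3", "SH2", "Pkinase_Tyr"], database))
```

Because order matters, protB scores lower than protC even though both share two domain names with the query.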

PANDIT

PANDIT is a collection of multiple sequence alignments and phylogenetic trees covering many common protein domains.

http://www.ebi.ac.uk/goldman-srv/pandit/

AnDom

AnDom helps to assign structural domains to protein sequences and to classify them according to SCOP.

http://coot.embl.de/AnDom/Usage.html

SUPERFAMILY

SUPERFAMILY is a database of structural and functional protein annotations for all completely sequenced organisms.

http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/

ProtMap

Proteins from complete genomes have been clustered by sequence similarity into groups: COGs or, in the case of viruses, VOGs. Genome ProtMap maps each protein from a COG/VOG back to its genome, and displays all the genomic segments coding for members of this particular group of related proteins.

http://www.ncbi.nlm.nih.gov/sutils/protmap.cgi?cluster=COG4690E&result=map

ProtClustDB

The NCBI Entrez Protein Clusters database is a collection of Reference Sequence (RefSeq) proteins from the complete genomes of prokaryotes, plasmids, and organelles grouped and annotated based on sequence similarity and protein function.

http://www.ncbi.nlm.nih.gov/sites/entrez?db=proteinclusters

PROSITE

PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them.

http://www.expasy.ch/prosite/

ScanProsite

Scans a sequence against PROSITE, or a pattern against the UniProt Knowledgebase (Swiss-Prot and TrEMBL).

http://www.expasy.ch/tools/scanprosite/
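
For a feel of what scanning a sequence against a PROSITE pattern involves, the sketch below translates the basic PROSITE pattern syntax into a Python regular expression and reports every match position. It is a simplified illustration, not the ScanProsite implementation: the function names are hypothetical, the test sequence is made up, and less common syntax (such as `>` inside square brackets) is not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as N-{P}-[ST]-{P}. into a regex string."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored to the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored to the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        core, low, high = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":                  # x matches any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # {ABC} means any residue except A, B, C
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def scan(pattern, sequence):
    """Return (0-based start, matched text) for every match, including overlapping ones."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


# N-glycosylation site pattern scanned against a made-up sequence.
print(prosite_to_regex("N-{P}-[ST]-{P}."))
print(scan("N-{P}-[ST]-{P}.", "AANGSAANPSA"))
```

The lookahead wrapper in `scan` is a design choice so that overlapping sites are all reported; a plain `re.finditer` on the translated pattern would skip matches that start inside an earlier one.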

HAMAP

High-quality Automated and Manual Annotation of microbial Proteomes. HAMAP is a system, based on manual protein annotation, that identifies and semi-automatically annotates proteins that are part of well-conserved families or subfamilies: the HAMAP families. HAMAP is based on manually created family rules and is applied to bacterial, archaeal and plastid-encoded proteins.

http://www.expasy.ch/sprot/hamap/

svmprot

SVM-Prot: Web-Based Support Vector Machine Software for Functional Classification of a Protein from Its Primary Sequence.

http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi

PIRSF

The PIRSF classification system is based on whole proteins rather than on the component domains; therefore, it allows annotation of generic biochemical and specific biological functions, as well as classification of proteins without well-defined domains.

http://pir.georgetown.edu/pirsf/

CDTree

CDTree: a protein domain hierarchy viewer and editor

http://www.ncbi.nlm.nih.gov/Structure/cdtree/cdtree.shtml

EVEREST

EVEREST is a system for automatic identification and classification of protein domains. It combines methodologies from the fields of finite metric spaces, machine learning and statistical modeling, and achieves state-of-the-art results. The process begins by constructing a database of protein segments that emerge in an all vs. all pairwise sequence comparison.

http://www.everest.cs.huji.ac.il/index.php

ProtoNet

ProtoNet provides automatic hierarchical classification of protein sequences. The site allows users to study the clustering as well as its qualities.

http://www.protonet.cs.huji.ac.il/index.php

Pandora

PANDORA: keyword-based analysis of protein sets by integration of annotation sources.

http://www.pandora.cs.huji.ac.il/

Jevtrace2

Jevtrace is an implementation of the evolutionary trace method. The software expands on the evolutionary trace by allowing manipulation of the input data and parameters of analysis, and presents a number of novel tree-inspired analyses of protein families.

http://compbio.berkeley.edu/people/marcin/jevtrace/

BLOCKS

Blocks are multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins. BLOCKS is being discontinued.

http://blocks.fhcrc.org/blocks/blocks_search.html

SBASE

SBASE is a collection of protein domain sequences collected from the literature, from protein sequence databases and from genomic databases (Vlahovicek et al., 2002). The protein domains are defined by their sequence boundaries given by the publishing authors or in one of the primary sequence databases (Swiss-Prot, PIR, TrEMBL etc.). Domain groups are included if they have well-defined sequence boundaries, and if they can be distinguished from other sequences using a similarity search technique.

http://hydra.icgeb.trieste.it/sbase/

mkdom

mkdom 2 is the program used to build the ProDom database.

http://prodom.prabi.fr/prodom/xdom/welcome.html

CluSTr

The CluSTr database offers an automatic classification of UniProt Knowledgebase and IPI proteins into groups of related proteins. The clustering is based on analysis of all pairwise comparisons between protein sequences. The database provides links to InterPro, which integrates information on protein families, domains and functional sites from PROSITE,

http://www.ebi.ac.uk/clustr/