Journal article icon

Journal article

Using phylogeny to improve genome-wide distant homology recognition

Abstract:
The gap between the number of known protein sequences and structures continues to widen, particularly as a result of sequencing projects for entire genomes. Recently there have been many attempts to generate structural assignments to all genes on sets of completed genomes using fold-recognition methods. We developed a method that detects false positives made by these genome-wide structural assignment experiments by identifying isolated occurrences. The method was tested using two sets of assignments, generated by SUPERFAMILY and PSI-BLAST, on 150 completed genomes. A phylogeny of these genomes was built and a parsimony algorithm was used to identify isolated occurrences by detecting occurrences that cause a gain at leaf level. Isolated occurrences tend to have high e-values, and in both sets of assignments, a sudden increase in isolated occurrences is observed for e-values >10^-8 for SUPERFAMILY and >10^-4 for PSI-BLAST. Conditions to predict false positives are based on these results. Independent tests confirm that the predicted false positives are indeed more likely to be incorrectly assigned. Evaluation of the predicted false positives also showed that the accuracy of profile-based fold-recognition methods might depend on secondary structure content and sequence length. We show that false positives generated by fold-recognition methods can be identified by considering structural occurrence patterns on completed genomes; occurrences that are isolated within the phylogeny tend to be less reliable. The method provides a new independent way to examine the quality of fold assignments and may be used to improve the output of any genome-wide fold assignment method.
Publication status:
Published
Peer review status:
Peer reviewed

Actions


Access Document


Authors


More by this author
Institution:
University of Oxford
Division:
MPLS
Department:
Statistics
Role:
Author
More by this author
Institution:
University of Oxford
Division:
MPLS
Department:
Statistics
Role:
Author
More by this author
Institution:
University of Oxford
Division:
MPLS
Department:
Statistics
Research group:
Oxford Protein Informatics Group
Role:
Author


Publisher:
Public Library of Science
Journal:
PLoS Computational Biology More from this journal
Volume:
3
Issue:
1
Article number:
e3
Publication date:
2007-01-01
Edition:
Publisher's version
DOI:
EISSN:
1553-7358
ISSN:
1553-734X


Language:
English
Keywords:
Subjects:
UUID:
uuid:00e41229-1c9f-4c1d-ac3a-b51d34bbbe8f
Local pid:
ora:1810
Deposit date:
2008-03-20

Terms of use



Views and Downloads






If you are the owner of this record, you can report an update to it here: Report update to this record

TO TOP