Journal article

ProteinGLUE multi-task benchmark suite for self-supervised protein modeling

Abstract:
Self-supervised language modeling is a rapidly developing approach for the analysis of protein sequence data. However, work in this area is heterogeneous and diverse, making comparison of models and methods difficult. Moreover, models are often evaluated only on one or two downstream tasks, making it unclear whether the models capture generally useful properties. We introduce the ProteinGLUE benchmark for the evaluation of protein representations: a set of seven per-amino-acid tasks for evaluating learned protein representations. We also offer reference code, and we provide two baseline models with hyperparameters specifically trained for these benchmarks. Pre-training was done on two tasks, masked symbol prediction and next sentence prediction. We show that pre-training yields higher performance on a variety of downstream tasks such as secondary structure and protein interaction interface prediction, compared to no pre-training. However, the larger base model does not outperform the smaller medium model. We expect the ProteinGLUE benchmark dataset introduced here, together with the two baseline pre-trained models and their performance evaluations, to be of great value to the field of protein sequence-based property prediction. Availability: code and datasets from https://github.com/ibivu/protein-glue
Publication status:
Published
Peer review status:
Peer reviewed

Access Document

Publisher copy:
10.1038/s41598-022-19608-4
Publication website:
https://www.nature.com/articles/s41598-022-19608-4.pdf

Authors

Institution:
University of Oxford
Role:
Author
ORCID:
0000-0002-3757-5313
Role:
Author
ORCID:
0000-0002-5464-822X
Role:
Author
ORCID:
0000-0002-7971-6209
Role:
Author
ORCID:
0000-0003-0298-873X
Role:
Author
ORCID:
0000-0002-0189-5817


Publisher:
Nature Research
Journal:
Scientific Reports
Volume:
12
Issue:
1
Pages:
16047-16047
Article number:
16047
Publication date:
2022-09-26
DOI:
10.1038/s41598-022-19608-4
EISSN:
2045-2322
ISSN:
2045-2322


Language:
English
Pubs id:
2397001
Local pid:
pubs:2397001
Source identifiers:
W4297243376
Deposit date:
2026-03-31
This ORA record was generated from metadata provided by an external service. It has not been edited by the ORA Team.
