Patch-based separable transformer for visual recognition

Sun, S; Yue, X; Zhao, H; Torr, PHS; Bai, S

Journal article

Abstract:: The computational complexity of transformers limits their wide deployment in frameworks for visual recognition. Recent work [9] significantly accelerates network processing by reducing the resolution at the beginning of the network; however, it remains hard to generalize directly to downstream tasks such as object detection and segmentation, as CNNs can. In this paper, we present a transformer-based architecture that retains both local and global interactions within the network and is transferable to other downstream tasks. The proposed architecture reforms the original full spatial self-attention into pixel-wise local attention and patch-wise global attention. This factorization reduces the computational cost while retaining information at different granularities, which helps generate the multi-scale features required by different tasks. By exploiting the factorized attention, we construct a Separable Transformer (SeT) for visual modeling. Experimental results show that SeT outperforms previous state-of-the-art transformer-based approaches and its CNN counterparts on three major tasks: image classification, object detection, and instance segmentation.
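
The factorization described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' SeT implementation: the record gives no architectural details, so the function names (`factorized_attention`, `pair_counts`), the use of non-overlapping square patches, mean pooling to form one token per patch, identity query/key/value projections, and the additive merge of local and global outputs are all assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention over the last two axes: (..., n, d).
    d = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d)
    return softmax(scores) @ v

def factorized_attention(x, patch):
    """Hypothetical sketch: local attention inside patches, then global
    attention across patch tokens. x has shape (H, W, C)."""
    H, W, C = x.shape
    assert H % patch == 0 and W % patch == 0, "patch must divide H and W"
    gh, gw = H // patch, W // patch

    # Split the map into non-overlapping patches: (gh*gw, patch*patch, C).
    p = x.reshape(gh, patch, gw, patch, C).transpose(0, 2, 1, 3, 4)
    p = p.reshape(gh * gw, patch * patch, C)

    # Pixel-wise local attention: each pixel attends within its own patch.
    # Assumption: q = k = v = input (no learned projections).
    local = attention(p, p, p)

    # Patch-wise global attention: one mean-pooled token per patch
    # (an assumption), attending across all patches.
    tokens = local.mean(axis=1)               # (gh*gw, C)
    glob = attention(tokens, tokens, tokens)  # (gh*gw, C)

    # Assumption: merge by adding the patch-level context to every pixel.
    out = local + glob[:, None, :]
    out = out.reshape(gh, gw, patch, patch, C).transpose(0, 2, 1, 3, 4)
    return out.reshape(H, W, C)

def pair_counts(H, W, patch):
    # Number of query-key pairs scored by full vs. factorized attention.
    n = H * W              # total pixels
    m = patch * patch      # pixels per patch
    g = n // m             # number of patches
    return n * n, g * m * m + g * g

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = factorized_attention(rng.standard_normal((8, 8, 4)), patch=4)
    print(y.shape)
    print(pair_counts(56, 56, 7))
```

Under these assumptions, a 56×56 map with 7×7 patches scores 9,834,496 query-key pairs with full attention but 157,760 with the factorized form, which is the kind of saving the abstract refers to; the real cost of SeT depends on design details not given in this record.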

Publication status:: Published

Peer review status:: Peer reviewed

Cite

APA Style

Sun, S., Yue, X., Zhao, H., Torr, P. H. S., & Bai, S. (2022). Patch-based separable transformer for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(7), 9241–9247.

MLA Style

Sun, S., et al. “Patch-Based Separable Transformer for Visual Recognition.” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 7, IEEE, 2022, pp. 9241–47.

Chicago Style

Sun, S, X Yue, H Zhao, PHS Torr, and S Bai. 2022. “Patch-Based Separable Transformer for Visual Recognition.” IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (7): 9241–47.

Access Document

Files:: Sun_et_al_2022_Patch_based_separable.pdf

(Accepted manuscript, pdf, 709.5KB)

Publisher copy:: 10.1109/TPAMI.2022.3231725

Authors

Sun, S

Role:: Author

Yue, X

Role:: Author

Zhao, H

Institution:: University of Oxford
Division:: MPLS
Department:: Engineering Science
Role:: Author

Torr, PHS

Role:: Author

Bai, S

Role:: Author

Publisher:: IEEE
Journal:: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume:: 45
Issue:: 7
Pages:: 9241 - 9247
Publication date:: 2022-12-23
Acceptance date:: 2022-12-18
DOI:: 10.1109/TPAMI.2022.3231725
EISSN:: 1939-3539
ISSN:: 0162-8828

Language:: English
Keywords:: object detection, FFR, image classification, instance segmentation, transformer
Pubs id:: 1325768
Local pid:: pubs:1325768
Deposit date:: 2023-02-10

Terms of use

Copyright holder:: IEEE
Rights statement:: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Notes:: This is the accepted manuscript version of the article. The final version is available from IEEE at: 10.1109/TPAMI.2022.3231725

Licence:: Terms and Conditions of Use for Oxford University Research Archive
