Journal article

Manycore algorithms for batch scalar and block tridiagonal solvers

Abstract:
Engineering, scientific, and financial applications often require the simultaneous solution of a large number of independent tridiagonal systems of equations with varying coefficients. Since the number of systems is large enough to offer considerable parallelism on manycore systems, the choice between different tridiagonal solution algorithms, such as Thomas, Cyclic Reduction (CR), or Parallel Cyclic Reduction (PCR), needs to be reexamined. This work investigates the optimal choice of tridiagonal algorithm for CPU, Intel MIC, and NVIDIA GPU, with a focus on minimizing the amount of data transferred to and from main memory, using novel algorithms and the register-blocking mechanism, and on maximizing the achieved bandwidth. It also considers block tridiagonal solutions, which are sometimes required in Computational Fluid Dynamics (CFD) applications. A novel Thomas solver based on work sharing and register blocking is also presented.
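For orientation, the batch setting described in the abstract can be illustrated with a minimal sketch of the classical Thomas algorithm applied to a list of independent systems. This is a plain sequential reference under stated assumptions, not the register-blocked or work-sharing solver presented in the article: the function names `thomas_solve` and `thomas_solve_batch` are hypothetical, and the systems are assumed to need no pivoting (for example, because they are diagonally dominant).

```python
def thomas_solve(a, b, c, d):
    """Solve one tridiagonal system with the classical Thomas algorithm.

    a: sub-diagonal (a[0] is unused), b: main diagonal,
    c: super-diagonal (c[-1] is unused), d: right-hand side.
    Assumption: no pivoting is required (e.g. diagonal dominance).
    """
    n = len(b)
    c_star = [0.0] * n
    d_star = [0.0] * n

    # Forward sweep: eliminate the sub-diagonal.
    c_star[0] = c[0] / b[0]
    d_star[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * c_star[i - 1]
        c_star[i] = c[i] / denom
        d_star[i] = (d[i] - a[i] * d_star[i - 1]) / denom

    # Backward substitution.
    x = [0.0] * n
    x[-1] = d_star[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_star[i] - c_star[i] * x[i + 1]
    return x


def thomas_solve_batch(systems):
    """Solve many independent systems, each given as a tuple (a, b, c, d).

    The systems do not depend on each other, so on a manycore device each
    one could be assigned to its own thread; here they are simply looped over.
    """
    return [thomas_solve(a, b, c, d) for (a, b, c, d) in systems]
```

Each system costs O(n) operations and is inherently sequential along its own length, which is why the parallelism in the batch case comes from the number of independent systems rather than from within a single solve.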
Publication status:
Published
Peer review status:
Peer reviewed

Files:
Publisher copy:
10.1145/2956571

Authors

Institution:
University of Oxford
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Mathematical Institute
Role:
Author



Publisher:
Association for Computing Machinery
Journal:
ACM Transactions on Mathematical Software
Volume:
42
Issue:
4
Article number:
31
Publication date:
2016-06-30
Acceptance date:
2015-09-01
DOI:
10.1145/2956571
EISSN:
1557-7295
ISSN:
0098-3500


Pubs id:
pubs:570072
UUID:
uuid:6985082c-8c58-4549-bfb8-6051b72b1fdc
Local pid:
pubs:570072
Source identifiers:
570072
Deposit date:
2015-10-13