Parallel QR factorization using Givens rotations in MPI-CUDA for multi-GPU

Miguel Tapia-Romero; Amilcar Meneses-Viveros; Erika Hernandez-Rubio

doi:10.14569/IJACSA.2020.0110578

Escuela Superior de Cómputo (ESCOM)

Research output: Contribution to journal, Article, peer-reviewed

1 Scopus citation

Abstract

Modern supercomputers incorporate multi-core processors and graphics processing units (GPUs). Applications running on these computers exploit these technologies through scalable programs that work with multi-core processors and accelerators such as GPUs. QR factorization is essential for several numerical tasks, such as solving systems of linear equations, computing the inverse of a matrix, or diagonalizing a matrix, to name a few. There are several factorization algorithms, such as LU, Cholesky, Givens, and Householder, among others. An efficient parallel implementation of each factorization algorithm depends on the structure of the data and the type of parallel architecture used. A common strategy in parallel programming is to break a problem into subproblems and solve them on different processing units. This is very useful when dealing with complex problems or when the data is too large to fit in the available memory. However, it is not clear how data partitioning affects subtask performance when the subtasks are mapped to processing units, specifically to GPUs. This work explores the partitioning of large symmetric matrices for QR factorization using Givens rotations and presents a parallel implementation using MPI and CUDA.
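The abstract names Givens rotations as the QR method without showing how a rotation works. The following is a minimal sequential sketch in Python, assuming a dense matrix stored as a list of rows; the function names `givens` and `givens_qr` are hypothetical, and this is neither the authors' MPI-CUDA implementation nor their partitioning scheme. It illustrates the property that parallel versions rely on: each rotation touches only two rows, so rotations on disjoint row pairs are independent of one another.

```python
import math


def givens(a, b):
    """Return (c, s, r) such that [[c, s], [-s, c]] applied to (a, b) gives (r, 0)."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0, 0.0  # both entries are zero: use the identity rotation
    return a / r, b / r, r


def givens_qr(A):
    """Sequential Givens QR of an m x n matrix stored as a list of rows (m >= n).

    Returns (Q, R) where Q is m x m orthogonal, R is m x n upper triangular,
    and Q @ R reproduces A up to rounding error.
    """
    m, n = len(A), len(A[0])
    R = [[float(x) for x in row] for row in A]
    Q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        # Annihilate the entries below the diagonal of column j, bottom to top,
        # each time rotating the adjacent row pair (i - 1, i).
        for i in range(m - 1, j, -1):
            if R[i][j] == 0.0:
                continue  # already zero, no rotation needed
            c, s, _ = givens(R[i - 1][j], R[i][j])
            p, q = i - 1, i
            # Only rows p and q of R change; columns left of j are already zero there.
            for k in range(j, n):
                rp, rq = R[p][k], R[q][k]
                R[p][k] = c * rp + s * rq
                R[q][k] = -s * rp + c * rq
            R[q][j] = 0.0  # store an exact zero instead of a rounding residue
            # Update Q <- Q G^T so that A == Q R holds after every rotation.
            for k in range(m):
                qp, qq = Q[k][p], Q[k][q]
                Q[k][p] = c * qp + s * qq
                Q[k][q] = -s * qp + c * qq
    return Q, R


if __name__ == "__main__":
    A = [[4.0, 1.0, 2.0],
         [1.0, 3.0, 0.0],
         [2.0, 0.0, 5.0]]  # small symmetric test matrix
    Q, R = givens_qr(A)
    for row in R:
        print(["%.4f" % x for x in row])
```

Multiplying the returned `Q` by `R` should reproduce the input up to rounding. The bottom-up, column-by-column order used here is the simplest sequential one; parallel schemes reorder the rotations so that independent row pairs are processed at the same time on different processing units.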

Original language	English
Pages (from-to)	636-645
Number of pages	10
Journal	International Journal of Advanced Computer Science and Applications
Volume	11
Issue number	5
DOIs	https://doi.org/10.14569/IJACSA.2020.0110578
State	Published - 2020

Keywords

CUDA
Givens factorization
Heterogeneous programming
Scalable parallelism

Cite this

@article{774643209aac4370a62ce6b6fdbed17b,
  title = "Parallel {QR} factorization using {Givens} rotations in {MPI-CUDA} for multi-{GPU}",
  keywords = "CUDA, Givens factorization, Heterogeneous programming, Scalable parallelism",
  author = "Miguel Tapia-Romero and Amilcar Meneses-Viveros and Erika Hernandez-Rubio",
  note = "Publisher Copyright: {\textcopyright} 2020 Science and Information Organization.",
  year = "2020",
  doi = "10.14569/IJACSA.2020.0110578",
  language = "English",
  volume = "11",
  pages = "636--645",
  journal = "International Journal of Advanced Computer Science and Applications",
  issn = "2158-107X",
  publisher = "Science and Information Organization",
  number = "5",
}

TY  - JOUR
T1  - Parallel QR factorization using Givens rotations in MPI-CUDA for multi-GPU
AU  - Tapia-Romero, Miguel
AU  - Meneses-Viveros, Amilcar
AU  - Hernandez-Rubio, Erika
PY  - 2020
Y1  - 2020
KW  - CUDA
KW  - Givens factorization
KW  - Heterogeneous programming
KW  - Scalable parallelism
UR  - http://www.scopus.com/inward/record.url?scp=85085772673&partnerID=8YFLogxK
U2  - 10.14569/IJACSA.2020.0110578
DO  - 10.14569/IJACSA.2020.0110578
M3  - Article
AN  - SCOPUS:85085772673
SN  - 2158-107X
VL  - 11
SP  - 636
EP  - 645
JO  - International Journal of Advanced Computer Science and Applications
JF  - International Journal of Advanced Computer Science and Applications
IS  - 5
ER  - 
