MPI all-2-all transposition

The module ed_hamiltonian_normal_common defines several variables shared across the Hamiltonian setup for ed_mode = normal. It also contains the procedure vector_transpose_mpi(), which implements the MPI AlltoAllV parallel transposition of a matrix. This is the key function for the massively parallel execution of matrix-vector products discussed in j.cpc.2021.108261.

Description

Defines global variables related to the sector Hamiltonian construction. It also contains vector_transpose_mpi(), which implements the MPI AlltoAllV parallel matrix transposition.

Quick access

Routines: vector_transpose_mpi()

Used modules

sf_misc
- assert_shape()
sf_linalg
- kronecker_product()
- eye()
sf_sp_linalg
- sp_lanc_tridiag()
ed_input_vars: Contains all global input variables that the user can set through the input file. A specific procedure, ed_read_input(), should be called to read the input file using the parse_input_variable() procedure from SciFortran. Every variable is first set to a default value, then looked for in the input file and updated, and finally looked for on the command line (std.input) and updated using the notation variable_name=variable_value(s) (case independent).
ed_vars_global: Contains all variables, arrays and derived types instances shared throughout the code. Specifically, it contains definitions of the effective_bath, the gfmatrix and the sector data structures.
ed_bath: Contains routines for setting, accessing, manipulating and clearing the bath of the Impurity problem.
ed_aux_funx: Hosts a number of auxiliary procedures required in different parts of the code. Specifically, it implements the fermionic creation/annihilation operators, the binary decomposition of the integer representation of Fock states, and the setup of the local impurity Hamiltonian.
ed_sector: Contains procedures to construct the symmetry sectors corresponding to a given set of quantum numbers \(\vec{Q}\). In particular, it allocates and builds the sector_map connecting the states of a given sector with the corresponding Fock states.
ed_setup: Contains procedures to set up the Exact Diagonalization calculation, executing all internal consistency checks and allocation of the global memory.
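
The default, input file, command line precedence described for ed_input_vars can be illustrated with a small stand-alone sketch. It is written in Python for brevity and is not the SciFortran parser: the helper names parse_overrides and resolve are hypothetical, the comma as separator for multiple values is an assumption, and the variable names and values in the usage lines are only illustrative.

```python
def parse_overrides(argv):
    """Parse tokens of the form name=value[,value...] (hypothetical helper).

    Names are lower-cased so that lookups are case independent.
    The comma separator for multiple values is an assumption.
    """
    overrides = {}
    for token in argv:
        if "=" not in token:
            continue  # ignore anything that is not a name=value token
        name, _, raw = token.partition("=")
        overrides[name.strip().lower()] = [v.strip() for v in raw.split(",")]
    return overrides


def resolve(defaults, file_values, argv):
    """Layer the three sources: defaults, then input file, then command line."""
    merged = {name.lower(): value for name, value in defaults.items()}
    merged.update({name.lower(): value for name, value in file_values.items()})
    merged.update(parse_overrides(argv))
    return merged


# Illustrative values only.
defaults = {"NORB": ["1"], "NBATH": ["6"]}
from_file = {"Nbath": ["4"]}
from_cli = ["norb=2", "ULOC=2.0,0.0"]
settings = resolve(defaults, from_file, from_cli)
```

Here the command-line token norb=2 overrides the default even though the case differs, while Nbath keeps the value read from the file because no command-line token mentions it.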

Subroutines and functions

subroutine ed_hamiltonian_normal_common/vector_transpose_mpi(nrow, qcol, a, ncol, qrow, b)

Performs the parallel transposition of the vector a, viewed as a matrix of dimensions [nrow, qcol], using the MPI AlltoAllV procedure, which transfers data such that the j-th block sent from process i is received by process j and placed as block i. This parallel transposition involves the minimum amount of data transfer necessary to execute the matrix-vector product, removing communication congestion and unlocking optimal parallel scaling.

See j.cpc.2021.108261 for a detailed description of the algorithm implemented in this procedure.
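
The block exchange rule stated above (block j sent by process i is placed as block i on process j) can be checked with a serial emulation. The following is a minimal sketch in plain Python, not the Fortran implementation: no MPI is involved, the function names are hypothetical, and assigning the remainder of an uneven split to the last rank is an assumption made only for this example.

```python
def split_sizes(n, nproc):
    """Block sizes for n items over nproc ranks.

    Assumption: the remainder of an uneven split goes to the last rank.
    """
    sizes = [n // nproc] * nproc
    sizes[-1] += n % nproc
    return sizes


def offsets(sizes):
    """Starting index of each block, given the block sizes."""
    out, acc = [], 0
    for s in sizes:
        out.append(acc)
        acc += s
    return out


def emulate_transpose(global_matrix, nproc):
    """Emulate the all-to-all transposition on nproc fictitious ranks.

    Input on rank i : a with shape (nrow, qcol_i), a column slab of the matrix.
    Output on rank j: b with shape (ncol, qrow_j), its row slab, transposed.
    """
    nrow, ncol = len(global_matrix), len(global_matrix[0])
    qcols, qrows = split_sizes(ncol, nproc), split_sizes(nrow, nproc)
    col_off, row_off = offsets(qcols), offsets(qrows)

    # Each rank starts with all rows but only its own columns.
    a_local = [
        [row[col_off[i]:col_off[i] + qcols[i]] for row in global_matrix]
        for i in range(nproc)
    ]

    # Exchange: block j of rank i (rows owned by j) becomes block i on rank j.
    recv = [[None] * nproc for _ in range(nproc)]
    for i in range(nproc):
        for j in range(nproc):
            recv[j][i] = a_local[i][row_off[j]:row_off[j] + qrows[j]]

    # Join the received blocks side by side, then transpose locally.
    b_local = []
    for j in range(nproc):
        rows = [
            sum((recv[j][i][r] for i in range(nproc)), [])
            for r in range(qrows[j])
        ]
        b_local.append([[rows[r][c] for r in range(qrows[j])] for c in range(ncol)])
    return a_local, b_local


# A 5 x 7 matrix distributed over 3 fictitious ranks.
G = [[10 * r + c for c in range(7)] for r in range(5)]
a_loc, b_loc = emulate_transpose(G, 3)
```

In this emulation every rank exchanges exactly one block with every other rank, and each matrix element moves at most once, which is the property the description attributes to the parallel transposition.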

Options:

nrow [integer] – Global number of rows
qcol [integer] – Local number of columns on each process
ncol [integer] – Global number of columns
qrow [integer] – Local number of rows on each process

Parameters:

a (nrow, qcol) [real] – Input vector to be transposed
b (ncol, qrow) [real] – Output vector \(b = a^T\)