Cosma

Jul 20, 2023

Distributed communication-optimal matrix multiplication algorithm

COSMA is a parallel, high-performance, GPU-accelerated, matrix-matrix multiplication algorithm that is communication-optimal for all combinations of matrix dimensions, number of processors and memory sizes, without the need for any parameter tuning. The key idea behind COSMA is to first derive a tight optimal sequential schedule and only then parallelize it, preserving I/O optimality between processes. This stands in contrast with the 2D and 3D algorithms, which fix process domain decomposition upfront and then map it to the matrix dimensions, which may result in asymptotically more communication. The final design of COSMA facilitates the overlap of computation and communication, ensuring speedups and applicability of modern mechanisms such as RDMA. COSMA allows to not utilize some processors in order to optimize the processor grid, which reduces the communication volume even further and increases the computation volume per processor.



Checkout these related ports:
  • Zn_poly - C library for polynomial arithmetic
  • Zimpl - Language to translate the LP models into .lp or .mps
  • Zegrapher - Software for plotting mathematical objects
  • Zarray - Dynamically typed N-D expression system based on xtensor
  • Z3 - Z3 Theorem Prover
  • Yices - SMT solver
  • Yacas - Yet Another Computer Algebra System
  • Xtensor - Multi-dimensional arrays with broadcasting and lazy computing
  • Xtensor-python - Python bindings for xtensor
  • Xtensor-io - Xtensor plugin to read/write images, audio files, numpy npz and HDF5
  • Xtensor-blas - BLAS extension to xtensor
  • Xspread - Spreadsheet program for X and terminals
  • Xppaut - Graphical tool for solving differential equations, etc
  • Xplot - X11 plotting package
  • Xlife++ - XLiFE++ eXtended Library of Finite Elements in C++