Contributor
Nikolaos Chatzikonstantinou

Benchmarking Parallel Performance of Numerical MPI Packages


Mentors
Francesco Ballarin, Drew Parsons
Organization
Debian
Technologies
Python, C++, MPI, Fortran, Plotly, Debian, BLAS, dpkg, FEniCS, schroot, ScaLAPACK, PT-SCOTCH, Hypre, MUMPS, PETSc, NWChem
Topics
continuous integration, benchmarking, parallelism, regression testing, software distribution, numerical libraries
Deliver an automated method for Debian maintainers to test the parallel performance of selected numerical Debian packages on clusters, in particular to catch performance regressions introduced by updates and to verify that increased cluster resources yield the performance gains predicted by scaling models such as Amdahl’s and Gustafson’s laws.

A number of numerical packages (typically software libraries) available from the Debian repositories are designed to make use of parallelism. I will identify key functions to benchmark and key parameters to benchmark with. Those parameters may include the function input (for example, nontrivial data), the cluster environment (such as the number of nodes), the implementations of dependencies (e.g. BLAS), and the allowed deviation from the expected benchmark metric (e.g. running time) before a regression is said to occur.

Cluster tests will be configured for execution on the Grid5000 network (https://www.grid5000.fr/) in partnership with project mentor and Debian developer Prof. Drew Parsons. The FEniCS suite of packages for finite element computation will be used as a target case study to develop test protocols. The protocols will then be applied to a selection of other parallelised packages. Finally, I will report the results in web pages for ease of consumption.

To enable more exhaustive testing of Debian parallel packages, I will audit the official Debian repository (and online sources), searching by keyword for numerical packages that use parallelism beyond the set identified above. I will read their documentation and research the problems involved to find the right functions and data to test, and I will use the initial data gathered during my benchmarks to establish a baseline against which future regression tests will be compared. I will consult references on benchmarking numerical packages to make sure my metrics are meaningful. I will use Plotly to produce the graphs of the metrics shown on the web pages.
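
As a rough illustration of the checks described above, the following Python sketch shows how the speedups predicted by Amdahl’s and Gustafson’s laws could be computed, and how a measured running time might be compared against a baseline with an allowed deviation. The function names, the `parallel_fraction` parameter, and the 10% default threshold are hypothetical and chosen only for this example; the actual metrics and thresholds are to be determined during the project.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Predicted speedup for a fixed problem size (Amdahl's law).

    parallel_fraction: share of the serial running time that can be
    parallelised, between 0 and 1 (hypothetical input for this sketch).
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def gustafson_speedup(parallel_fraction, workers):
    """Predicted scaled speedup when the problem grows with the
    number of workers (Gustafson's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * workers


def is_regression(baseline_seconds, measured_seconds, allowed_deviation=0.10):
    """Return True if the measured running time exceeds the baseline
    by more than the allowed relative deviation (assumed 10% here)."""
    return measured_seconds > baseline_seconds * (1.0 + allowed_deviation)


if __name__ == "__main__":
    # Illustrative values only, not measurements.
    for n in (1, 2, 4, 8, 16):
        print(n, amdahl_speedup(0.9, n), gustafson_speedup(0.9, n))
    print(is_regression(baseline_seconds=100.0, measured_seconds=115.0))
```

In practice the allowed deviation would probably need to be set per package and per benchmark, since run-to-run noise on a shared cluster can vary considerably.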