TOD-Tree: Task-Overlapped Direct send Tree Image Compositing for Hybrid MPI Parallelism

Authors: Grosset, A. V. Pascal; Prasad, Manasa; Christensen, Cameron; Knoll, Aaron; Hansen, Charles
Editors: C. Dachsbacher and P. Navrátil
Date issued: 2015 (record date 2015-05-24)
DOI: https://doi.org/10.2312/pgv.20151157
Pages: 67-76

Abstract: Modern supercomputers have very powerful multi-core CPUs. The programming model on these supercomputers is switching from pure MPI to MPI for inter-node communication combined with shared memory and threads for intra-node communication. Consequently, the bottleneck in most systems is no longer computation but communication between nodes. In this paper, we present a new compositing algorithm for hybrid MPI parallelism that focuses on communication avoidance and on overlapping communication with computation, at the expense of evenly balancing the workload. The algorithm has three stages: a direct-send stage in which nodes are arranged in groups and exchange regions of an image, followed by a tree-compositing stage and a gather stage. We compare our algorithm with radix-k and binary-swap from the IceT library in a hybrid OpenMP/MPI setting, show strong scaling results, and explain how we generally achieve better performance than these two algorithms.

Keywords: I.3.1 [Computer Graphics]: Hardware Architecture—Parallel processing; I.3.2 [Computer Graphics]: Graphics Systems—Distributed/network graphics
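The three stages named in the abstract (grouped direct send, tree compositing, gather) can be illustrated with a minimal single-process sketch. This is not the authors' implementation: it omits MPI, threading, and the communication/computation overlap that the paper is about, and all names (`over`, `composite_images`, `tod_tree_composite`) are illustrative. It only shows that grouping images, compositing within each group, and then tree-compositing the group results yields the same image as a straight front-to-back composite, since the over operator is associative.

```python
# Hypothetical sketch of the compositing structure described in the abstract.
# A real implementation would distribute images across MPI ranks (e.g. with
# mpi4py), have each group member own one image region during direct send,
# and overlap sends/receives with compositing work.

def over(front, back):
    """Porter-Duff 'over' for premultiplied RGBA pixels (r, g, b, a)."""
    fr, fg, fb, fa = front
    br, bg, bb, ba = back
    t = 1.0 - fa
    return (fr + t * br, fg + t * bg, fb + t * bb, fa + t * ba)

def composite_images(images):
    """Composite a depth-ordered list of equally sized images, pixel by pixel."""
    result = images[0]
    for img in images[1:]:
        result = [over(f, b) for f, b in zip(result, img)]
    return result

def tod_tree_composite(images, group_size):
    """Sequential stand-in for the three stages: direct send, tree, gather."""
    # Stage 1 (direct send): nodes are arranged in groups; within a group each
    # member would receive one region of every peer's image and composite it.
    # Here we simply composite each group's images in full.
    groups = [images[i:i + group_size] for i in range(0, len(images), group_size)]
    partials = [composite_images(g) for g in groups]
    # Stage 2 (tree compositing): pairwise combine group results up a tree.
    while len(partials) > 1:
        partials = [composite_images(partials[i:i + 2])
                    for i in range(0, len(partials), 2)]
    # Stage 3 (gather): the root assembles the final image; trivial here
    # because no regions were actually distributed.
    return partials[0]
```

Because over is associative, `tod_tree_composite(images, group_size)` matches `composite_images(images)` up to floating-point rounding, regardless of the group size chosen.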