Implementing a parallel C++ runtime system for scalable parallel systems
Synchronization minimization in a SPMD execution model