Deriving good transformations for mapping nested loops on hierarchical parallel machines in polynomial time
Generalized unimodular loop transformations for distributed memory multiprocessors