Polly can perform classical loop transformations, exploit OpenMP level parallelism, expose SIMDization opportunities. However, due to the lack of a machine-specific performance model and missing optimizations, these transformations sometimes lead to compile and execution time regressions, and the generated code is at least one order of magnitude off in comparison to the corresponding vendor implementations. The goal of the project is to reduce such influence through implementation of optimizations aimed to produce code compatible with the best implementations of BLAS and an attempt to avoid vectorization of loops, when it is not profitable for the target architecture. It could be a step in transformation of Polly into an optimization pass used in standard -O3 optimizations.




  • Tobias Grosser