At the conferences in Pisa (GPU Computing in High Energy Physics) and Rome (Perspectives of GPU Computing in Physics and Astrophysics), we presented a new version of our code with improved performance and many new features that allow a wider range of applications.

The code is not yet ready for release, but it will be available soon.

With the new code we reach more than 80% of the theoretical peak throughput achievable with the 12-parameter representation of SU(3), a major improvement over cuLGT1 (the version currently available on this website).
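For readers unfamiliar with the 12-parameter representation: only the first two rows of each SU(3) matrix are stored (6 complex numbers, i.e. 12 reals), and the third row is recomputed as the complex conjugate of the cross product of the first two, which follows from unitarity and unit determinant. This trades a few extra arithmetic operations for less memory traffic. The C++ sketch below illustrates the idea only; the function name `unpackSu3`, the type aliases and the row-major (re, im) packing order are assumptions made for this example and are not taken from the cuLGT code.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <vector>

using Complex = std::complex<float>;
using Matrix3 = std::array<std::array<Complex, 3>, 3>;

// Hypothetical helper (not the cuLGT API): expands 12 stored reals into a
// full 3x3 SU(3) matrix. Assumed layout: (re, im) pairs of the first two
// rows in row-major order.
Matrix3 unpackSu3(const std::vector<float>& packed) {
    assert(packed.size() == 12);

    Matrix3 u{};
    // Copy the two stored rows.
    for (int row = 0; row < 2; ++row) {
        for (int col = 0; col < 3; ++col) {
            const int k = 2 * (3 * row + col);
            u[row][col] = Complex(packed[k], packed[k + 1]);
        }
    }

    // Third row = complex conjugate of the cross product of rows 0 and 1.
    const auto& a = u[0];
    const auto& b = u[1];
    u[2][0] = std::conj(a[1] * b[2] - a[2] * b[1]);
    u[2][1] = std::conj(a[2] * b[0] - a[0] * b[2]);
    u[2][2] = std::conj(a[0] * b[1] - a[1] * b[0]);
    return u;
}
```

As a quick check, packing the first two rows of diag(i, i, -1) and calling `unpackSu3` recovers the third row (0, 0, -1).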

Besides these performance improvements, the code has been completely rewritten in a modular form, which makes it very easy to include the gauge fixing procedures in existing code.

Performance of the Landau gauge overrelaxation code on a GeForce GTX 580 on a 32^4 lattice in single precision.


Performance on different GPUs for the same code and lattice. Mixed precision (MP) is essentially single precision, but with the innermost part of the update calculated in double precision, which gives better numerical stability.
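The general pattern behind mixed precision can be sketched in a few lines of C++: data stay in single precision, while a sensitive accumulation is carried out in double precision and rounded back to single precision at the end. This is a generic illustration, not the cuLGT kernel; the text above does not specify which operations of the update are promoted to double precision, and the function names `dotSingle` and `dotMixed` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pure single precision: inputs, accumulator and result are all float.
float dotSingle(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Mixed precision (illustrative): inputs and result are float, but the
// products and the running sum are computed in double.
float dotMixed(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += static_cast<double>(a[i]) * static_cast<double>(b[i]);
    }
    return static_cast<float>(sum);
}
```

The difference shows up when many small contributions are added to a large one: in `dotSingle` contributions below the float rounding threshold of the running sum are lost, while `dotMixed` retains them until the final rounding.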