At the conferences in Pisa (GPU Computing in High Energy Physics) and Rome (Perspectives of GPU Computing in Physics and Astrophysics), we presented a new version of our code with improved performance and many new features that allow a wider range of applications.
The code is not yet ready for release, but it will be available soon.
With the new code we reach more than 80% of the theoretical peak throughput achievable with the 12-parameter representation of SU(3), which is a major improvement over cuLGT1 (the version currently available on this website).
Besides these performance improvements, the code has been completely rewritten and modularized, which makes it very easy to integrate the gauge fixing procedures into existing code.
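For readers unfamiliar with the 12-parameter representation mentioned above: an SU(3) matrix has 9 complex entries (18 real numbers), but because it is unitary with unit determinant, its third row equals the complex conjugate of the cross product of the first two rows. Storing only the first two rows (12 real numbers) therefore reduces the memory traffic per link, at the cost of a few extra arithmetic operations. The following is a minimal sketch of that reconstruction in plain Python, for illustration only; the function names are hypothetical and are not part of the cuLGT API.

```python
import cmath


def cross_conj(a, b):
    """Return the complex conjugate of the cross product a x b of two complex 3-vectors."""
    return [
        (a[1] * b[2] - a[2] * b[1]).conjugate(),
        (a[2] * b[0] - a[0] * b[2]).conjugate(),
        (a[0] * b[1] - a[1] * b[0]).conjugate(),
    ]


def reconstruct_su3(first_two_rows):
    """Rebuild a full 3x3 SU(3) matrix from its first two rows (hypothetical helper).

    Assumes the two rows really are the first two rows of an SU(3) matrix,
    i.e. they are orthonormal; no check is performed here.
    """
    a = [complex(x) for x in first_two_rows[0]]
    b = [complex(x) for x in first_two_rows[1]]
    return [a, b, cross_conj(a, b)]


if __name__ == "__main__":
    # Diagonal SU(3) element diag(e^{i theta}, e^{i phi}, e^{-i(theta + phi)}).
    theta, phi = 0.4, 1.1
    rows = [
        (cmath.exp(1j * theta), 0, 0),
        (0, cmath.exp(1j * phi), 0),
    ]
    u = reconstruct_su3(rows)
    # Analytically, the third row is (0, 0, e^{-i(theta + phi)}).
    print(u[2])
```

On a GPU the same reconstruction would typically be done in registers after loading the 12 stored numbers, so the extra arithmetic is cheap compared with the saved memory bandwidth.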

