New Compiler GCC 6 with HSA and Zen Support

Version 6 of the GCC compiler, announced some time ago, is apparently close to release. GCC is the standard compiler on Linux and translates C source code (as well as code in other languages such as C++) into executable machine code.

New in version 6 is support for AMD's upcoming Zen CPU architecture and for HSA (Heterogeneous System Architecture), in which the CPU and GPU share memory to work on a task together. HSA was first supported by AMD's Kaveri APUs and is fully supported by Carrizo.

Here is the list of GCC 6 features relevant to the x86 world:

Heterogeneous System Architecture

GCC can now generate HSAIL (Heterogeneous System Architecture Intermediate Language) for simple OpenMP device constructs if configured with --enable-offload-targets=hsa. A new libgomp plugin then runs the HSA GPU kernels implementing these constructs on HSA-capable GPUs via a standard HSA run time.

If the HSA compilation back end determines that it cannot output HSAIL for a particular input, it gives a warning by default. These warnings can be suppressed with -Wno-hsa. For example, the HSA back end does not implement compilation of code that uses function pointers, automatic allocation of variable-sized arrays, or functions with variadic arguments, along with a number of other less common programming constructs.

When compilation for HSA is enabled, the compiler attempts to compile composite OpenMP constructs
#pragma omp target teams distribute parallel for
into parallel HSA GPU kernels.
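A minimal sketch of such a composite construct is a saxpy loop (y = a*x + y). The function name saxpy and the map clauses are our own illustrative choices, not taken from the release notes; whether GCC actually emits an HSA kernel for it depends on how the compiler was configured and on the restrictions listed above. Built with -fopenmp but without an offload device, the region simply runs on the host; built without -fopenmp, the pragma is ignored.

```c
/* y[i] = a * x[i] + y[i] as a composite OpenMP target construct.
   With a GCC configured using --enable-offload-targets=hsa this loop is a
   candidate for an HSA GPU kernel; otherwise it runs on the host.
   Note: no function pointers, VLAs or variadic calls inside the region,
   since the HSA back end cannot compile those. */
void saxpy(int n, float a, const float *x, float *y)
{
#pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Compiled for example with gcc -O2 -fopenmp, the same source works with and without an HSA device; only the place where the loop executes changes.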

IA-32/x86-64

GCC now supports the Intel CPU named Skylake with AVX-512 extensions through -march=skylake-avx512. The switch enables the following ISA extensions: AVX-512F, AVX-512VL, AVX-512CD, AVX-512BW, AVX-512DQ.
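To see which of these subsets a build actually enables, a translation unit can inspect GCC's predefined macros (__AVX512F__, __AVX512VL__ and so on) at compile time and ask the running CPU via the GCC built-in __builtin_cpu_supports. The helper names and the bit layout below are hypothetical; the sketch assumes an x86 target. With GCC 6 and -march=skylake-avx512 we would expect all five bits to be set, with the default x86-64 target none.

```c
/* Hypothetical bit layout for the five AVX-512 subsets of skylake-avx512. */
enum {
    HAS_AVX512F  = 1 << 0,
    HAS_AVX512VL = 1 << 1,
    HAS_AVX512CD = 1 << 2,
    HAS_AVX512BW = 1 << 3,
    HAS_AVX512DQ = 1 << 4
};

/* Subsets the compiler was allowed to use for this file (set by -march/-m flags). */
unsigned avx512_compile_mask(void)
{
    unsigned m = 0;
#ifdef __AVX512F__
    m |= HAS_AVX512F;
#endif
#ifdef __AVX512VL__
    m |= HAS_AVX512VL;
#endif
#ifdef __AVX512CD__
    m |= HAS_AVX512CD;
#endif
#ifdef __AVX512BW__
    m |= HAS_AVX512BW;
#endif
#ifdef __AVX512DQ__
    m |= HAS_AVX512DQ;
#endif
    return m;
}

/* Whether the CPU executing the program supports the AVX-512 foundation. */
int cpu_runs_avx512f(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f") != 0;
}
```

A binary built with -march=skylake-avx512 but started on a CPU for which cpu_runs_avx512f() returns 0 is likely to crash with an illegal instruction, which is why distributions rarely build packages this way.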

Support for the new AMD instructions monitorx and mwaitx has been added, including new intrinsics and built-in functions. It is enabled through the option -mmwaitx. The instructions monitorx and mwaitx implement the same functionality as the older monitor and mwait instructions; in addition, mwaitx adds a configurable timer. The timer value is passed as the third argument and stored in register %ebx.
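A sketch of how the intrinsics _mm_monitorx(address, extensions, hints) and _mm_mwaitx(extensions, hints, timer) from GCC's x86intrin.h might be used to wait for a flag to change. The helper names are hypothetical. We assume an x86-64 target, that CPUID leaf 0x80000001 reports MONITORX/MWAITX in ECX bit 29, and that extension bit 0x2 enables the timer, as we read AMD's documentation; the function attribute target("mwaitx") stands in for compiling the file with -mmwaitx.

```c
#include <cpuid.h>
#include <x86intrin.h>

/* Assumption: CPUID 0x80000001, ECX bit 29 = MONITORX/MWAITX supported. */
int cpu_has_mwaitx(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx))
        return 0;                    /* extended leaf not available */
    return (ecx >> 29) & 1;
}

/* Sleep until *flag no longer equals old. Each individual mwaitx sleep is
   bounded by 'timeout' (third argument, placed in %ebx); the loop then
   re-checks the flag. Only call after cpu_has_mwaitx() returned 1. */
__attribute__((target("mwaitx")))
void wait_for_change(volatile int *flag, int old, unsigned int timeout)
{
    while (*flag == old) {
        _mm_monitorx((const void *)flag, 0, 0); /* arm the address monitor */
        if (*flag != old)                       /* re-check to avoid a lost wake-up */
            break;
        _mm_mwaitx(0x2, 0, timeout);            /* 0x2 = enable timer (assumed) */
    }
}
```

The re-check between monitorx and mwaitx matters: a write that lands before the monitor is armed would otherwise go unnoticed until the timer expires.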

x86-64 targets now allow stack realignment from a word-aligned stack pointer using the command-line option -mstackrealign or __attribute__ ((force_align_arg_pointer)). This allows functions compiled with a vector-aligned stack to be invoked from objects that keep only word alignment.
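A minimal sketch of the per-function variant: the attribute makes GCC realign the stack on entry, as -mstackrealign would do for every function. The callback sum16 is hypothetical and stands for any function that may be entered from code which only guarantees word alignment while it keeps 16-byte-aligned locals.

```c
#include <string.h>

/* Realign the stack on entry, so the 16-byte-aligned local buffer stays
   aligned even if the caller left the stack only word-aligned. The
   compiler may then access buf with aligned vector instructions. */
__attribute__((force_align_arg_pointer))
int sum16(const int *v)
{
    int buf[16] __attribute__((aligned(16)));
    memcpy(buf, v, sizeof buf);
    int s = 0;
    for (int i = 0; i < 16; i++)
        s += buf[i];
    return s;
}
```

The realignment costs a few instructions per call, so it is usually applied only to entry points reachable from foreign code rather than program-wide.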

Support has been added for the address spaces __seg_fs, __seg_gs, and __seg_tls. They can be used to access data via the %fs and %gs segments without having to resort to inline assembly. Usage is described in the "Named Address Spaces" section of the GCC manual.

Support for AMD Zen (family 17h) processors is now available through the -march=znver1 and -mtune=znver1 options.
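Build flags and the CPU that actually runs the binary are two separate things. The sketch below checks both: at run time via GCC's __builtin_cpu_is("amdfam17h") and at compile time via the __znver1__ macro, which to our knowledge GCC defines under -march=znver1. Both helper names are hypothetical; an x86 target is assumed.

```c
/* Run time: is this process executing on an AMD family 17h (Zen) core? */
int running_on_zen(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_is("amdfam17h") != 0;
}

/* Compile time: was this file built with -march=znver1? */
int built_for_znver1(void)
{
#ifdef __znver1__
    return 1;
#else
    return 0;
#endif
}
```

With gcc -O2 -march=znver1, built_for_znver1() returns 1 regardless of the machine; running_on_zen() only returns 1 on Zen hardware.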

Unfortunately, most distributions and applications will not automatically benefit from these optimizations, because packages are usually installed precompiled from repositories. Anyone who wants optimized code has to choose a distribution that is compiled from source, or at least compile the relevant applications from source themselves. For open-source software this is feasible, at least in theory, provided one has the necessary skills.

How much optimized code can gain is shown by the GCC builds with different CPU flags that we published some time ago for the Bulldozer architecture. Here is a comment from that article:

Unsurprisingly, encoding a WAV file to MP3 takes less time the more closely the build is optimized for the CPU architecture. The differences between the builds, however, are at a similar level on every CPU. Steamroller is about 5 % faster than Piledriver, and Excavator adds another 10 percentage points. Between encoding with the standard build on Piledriver and with the maximally optimized build on Excavator, there is a performance gain of no less than 65 % for the same WAV file at identical configuration and identical clock speed. Hello, software developers! There is untapped potential here that can be unlocked simply with a compiler flag!

Source: GCC 6 Release Series Changes, New Features, and Fixes

Related links:

Rumor: Zen engineering samples at 3.0 GHz shipped to partners (06.04.2016)
Zen with L0 cache and µOp buffer (29.02.2016)
Zen on schedule, and AMD APUs in 2017 with console performance (27.11.2015)
Analysis of the presumed Zen architecture (06.10.2015)
AMD Piledriver vs. Steamroller vs. Excavator: performance comparison of the architectures (14.08.2015)
AMD releases open-source HSA runtime for Linux (14.11.2014)
AMD announces release of open-source HSA software stack (10.11.2014)