TOP500 Expands Exaflops Capacity Amidst Low Turnover

FRANKFURT, Germany; BERKELEY, Calif.; and KNOXVILLE, Tenn.—The 56th edition of the TOP500 saw the Japanese Fugaku supercomputer solidify its number one status in a list that reflects a flattening performance growth curve. Although two new systems managed to make it into the top 10, the full list recorded the smallest number of new entries since the project began in 1993.

The entry level to the list moved up to 1.32 petaflops on the High Performance Linpack (HPL) benchmark, a small increase from the 1.23 petaflops recorded in the June 2020 rankings. In a similar vein, the aggregate performance of all 500 systems grew from 2.22 exaflops in June to just 2.43 exaflops on the latest list. Likewise, average concurrency per system barely increased at all, growing from 145,363 cores six months ago to 145,465 cores in the current list.

There were, however, a few notable developments in the top 10, including two new systems, as well as a new high-water mark set by the top-ranked Fugaku supercomputer. Thanks to additional hardware, Fugaku grew its HPL performance to 442 petaflops, a modest increase from the 416 petaflops the system achieved when it debuted in June 2020. More significantly, Fugaku increased its performance on the new mixed-precision HPL-AI benchmark to 2.0 exaflops, besting the 1.4 exaflops mark it recorded six months ago. These represent the first benchmark measurements above one exaflop for any precision on any type of hardware.

Here is a brief rundown of the current top 10 systems:

  • Fugaku remains at the top spot, growing its Arm A64FX capacity from 7,299,072 cores to 7,630,848 cores. The additional hardware enabled its new world-record 442 petaflops result on HPL, roughly three times the performance of the number two system on the list. Fugaku was constructed by Fujitsu and is installed at the RIKEN Center for Computational Science (R-CCS) in Kobe, Japan.
  • Summit, an IBM-built system at Oak Ridge National Laboratory (ORNL) in Tennessee, remains the fastest system in the US with a performance of 148.8 petaflops. Summit has 4,356 nodes, each housing two 22-core Power9 CPUs and six NVIDIA Tesla V100 GPUs.
  • Sierra, a system at Lawrence Livermore National Laboratory in California, is ranked third with an HPL mark of 94.6 petaflops. Its architecture is very similar to Summit's, with each of its 4,320 nodes equipped with two Power9 CPUs and four NVIDIA Tesla V100 GPUs.
  • Sunway TaihuLight, a system developed by China's National Research Center of Parallel Computer Engineering & Technology (NRCPC) and installed at the National Supercomputing Center in Wuxi, is listed at number four. It is powered exclusively by Sunway SW26010 processors and achieves 93 petaflops on HPL.
  • At number five is Selene, an NVIDIA DGX A100 SuperPOD installed in-house at NVIDIA Corp. It was listed at number seven in June but has since doubled in size, moving it up the list by two positions. The system pairs AMD EPYC processors with NVIDIA's new A100 GPUs for acceleration and achieved 63.4 petaflops on HPL as a result of the upgrade.
  • Tianhe-2A (Milky Way-2A), a system developed by China's National University of Defense Technology (NUDT) and deployed at the National Supercomputer Center in Guangzhou, is ranked sixth. It is powered by Intel Xeon CPUs and NUDT's Matrix-2000 DSP accelerators and achieves 61.4 petaflops on HPL.
  • A new supercomputer, known as the JUWELS Booster Module, debuts at number seven on the list. The Atos-built BullSequana machine was recently installed at Forschungszentrum Jülich (FZJ) in Germany. It is part of a modular system architecture; a second, Xeon-based JUWELS Module is listed separately on the TOP500 at position 44. The modules are integrated using the ParTec Modulo Cluster Software Suite. Like the number five Selene system, the Booster Module uses AMD EPYC processors with NVIDIA A100 GPUs for acceleration. Running by itself, the JUWELS Booster Module achieved 44.1 petaflops on HPL, which makes it the most powerful system in Europe.
  • HPC5, a Dell PowerEdge system installed by the Italian company Eni S.p.A., is ranked eighth. It achieves a performance of 35.5 petaflops using Intel Xeon Gold CPUs and NVIDIA Tesla V100 GPUs. It is the most powerful system on the list used for commercial purposes at a customer site.
  • Frontera, a Dell C6420 system installed at the Texas Advanced Computing Center of the University of Texas last year, is now listed at number nine. It achieves 23.5 petaflops using 448,448 of its Intel Xeon Platinum cores.
  • The second new system in the top 10 is Dammam-7, which is ranked tenth. Installed at Saudi Aramco in Saudi Arabia, it is the second commercial supercomputer in the current top 10. The HPE Cray CS-Storm system uses Intel Xeon Gold CPUs and NVIDIA Tesla V100 GPUs and reached 22.4 petaflops on the HPL benchmark.

Other TOP500 Highlights

A total of 149 systems on the list use accelerator/co-processor technology, up from 146 six months ago. Of these, 140 use NVIDIA chips.

Intel continues to dominate the TOP500 processor share, with over 90 percent of systems equipped with Xeon or Xeon Phi chips. Despite the recent rise of alternative processor architectures in high performance computing, AMD processors (including the Hygon chip) appear in only 21 systems on the current list, alongside ten Power-based systems and just five Arm-based systems. However, the number of systems with AMD-based processors has doubled over the past six months.

The breakdown of system interconnects is largely unchanged from recent lists, with Ethernet used in about half the systems (254), InfiniBand in about a third (182), OmniPath in about one-tenth (47), and Myrinet in one system; the remainder use custom interconnects (38) and proprietary networks (6). InfiniBand-connected systems continue to dominate in aggregate capacity, with more than an exaflop of performance. Since Fugaku uses the proprietary Tofu D interconnect, the aggregate performance of the six proprietary-network systems (472.9 petaflops) is nearly equal to that of the 254 Ethernet-based systems (477.7 petaflops).

China continues to lead in system share with 212 machines on the list, handily beating out the US with 113 systems and Japan with 34. However, despite its smaller number of systems, the US continues to lead the list in aggregate performance with 668.7 petaflops to China's 564.0 petaflops. Thanks mainly to the number one Fugaku system, Japan's aggregate performance of 593.7 petaflops also edges out that of China.

Green500 Results

The most energy-efficient system on the Green500 is the new NVIDIA DGX SuperPOD in the US. It achieved 26.2 gigaflops/watt power efficiency during its 2.4-petaflops HPL performance run and is listed at position 172 on the TOP500.

Next on the list is the previous Green500 champion, MN-3. Although it improved its score from 21.1 to 26.0 gigaflops/watt, it slips into the number two position. The system uses the MN-Core chip, an accelerator optimized for matrix arithmetic. It is ranked number 332 on the TOP500.

Number three on the Green500 is the Atos-built JUWELS Booster Module installed at Forschungszentrum Jülich (FZJ) in Germany. It achieves 25.0 gigaflops/watt and is ranked seventh on the TOP500.

In fourth position is Spartan-2, another Atos-built machine. It achieves 24.3 gigaflops/watt on HPL and is ranked at position 148 on the TOP500 list.

The fifth-ranked system on the Green500 is Selene, with an efficiency of 24.0 gigaflops/watt. It also occupies the number five spot on the TOP500.

With the exception of the MN-3 system, the top five Green500 systems all use the new NVIDIA A100 GPU as an accelerator, and all four of these systems use AMD EPYC as their main CPU.

Of the top 40 systems on the Green500, 37 leverage accelerators, two use A64FX vector processors, and one (TaihuLight) uses a Sunway many-core processor.

Extrapolating the 26.2 gigaflops/watt power efficiency of the NVIDIA DGX SuperPOD linearly to an exaflop would result in a power consumption of 38 MW (ignoring the additional hardware needed for scaling).
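For reference, the arithmetic behind that extrapolation is a single division: an exaflop is 10^18 floating-point operations per second, so

    10^18 flops/s ÷ (26.2 × 10^9 flops/W) ≈ 3.8 × 10^7 W ≈ 38 MW.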

HPCG Results

The TOP500 list has incorporated results from the High-Performance Conjugate Gradient (HPCG) benchmark, which provides an alternative metric for assessing supercomputer performance and is meant to complement the HPL measurement.
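For context, HPCG times the solution of a large sparse linear system with a preconditioned conjugate gradient solver, stressing memory bandwidth and interconnect performance rather than the dense floating-point throughput HPL rewards. The sketch below is a minimal, purely illustrative Python implementation of the core conjugate gradient iteration only; the actual benchmark operates on a sparse 3D grid problem with a multigrid preconditioner, neither of which is shown here.

    import numpy as np

    def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
        # Solve A x = b for a symmetric positive-definite matrix A.
        x = np.zeros_like(b)
        r = b - A @ x           # initial residual
        p = r.copy()            # initial search direction
        rs_old = r @ r
        for _ in range(max_iter):
            Ap = A @ p
            alpha = rs_old / (p @ Ap)       # step length along p
            x = x + alpha * p
            r = r - alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs_old) * p   # next conjugate direction
            rs_old = rs_new
        return x

    # Toy example: a small SPD system (illustrative only).
    A = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    print(conjugate_gradient(A, b))  # approx. [0.0909, 0.6364]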

The list-leading Fugaku improved its result to a record 16.0 HPCG-petaflops. The two US Department of Energy systems, Summit at ORNL and Sierra at LLNL, are second and third, respectively, on the HPCG benchmark. Summit achieved 2.93 HPCG-petaflops and Sierra 1.80 HPCG-petaflops. The only other systems to break the petaflops barrier on HPCG are the upgraded Selene system at 1.62 petaflops and the new JUWELS Booster Module at 1.28 petaflops.

HPL-AI Results

The HPL-AI benchmark seeks to highlight the convergence of HPC and artificial intelligence (AI) workloads based on machine learning and deep learning by solving a system of linear equations using novel, mixed-precision algorithms that exploit modern hardware.
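To make the idea concrete, here is a minimal sketch of the mixed-precision approach (in Python, and purely illustrative rather than the HPL-AI reference implementation): solve the system in low precision, then recover full double-precision accuracy with a few steps of iterative refinement. Float32 stands in here for the half precision used on real accelerator hardware, and a production code would factor the matrix once and reuse the factors instead of calling a fresh solve at each step.

    import numpy as np

    def mixed_precision_solve(A, b, refinements=5):
        # Low-precision copy of the matrix (float32 stands in for half precision).
        A_low = A.astype(np.float32)
        # Initial solve entirely in low precision.
        x = np.linalg.solve(A_low, b.astype(np.float32)).astype(np.float64)
        for _ in range(refinements):
            r = b - A @ x   # residual computed in double precision
            # Correction from another low-precision solve (a real code would
            # reuse a single low-precision LU factorization here).
            d = np.linalg.solve(A_low, r.astype(np.float32)).astype(np.float64)
            x += d
        return x

    rng = np.random.default_rng(0)
    n = 200
    A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned test matrix
    b = rng.standard_normal(n)
    x = mixed_precision_solve(A, b)
    print(np.linalg.norm(A @ x - b))  # residual near double-precision level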

The top-ranked system on this benchmark is RIKEN's Fugaku, which achieved 2.0 exaflops of mixed-precision computation. At number two is ORNL's Summit supercomputer, which achieved 0.55 exaflops, followed by NVIDIA's Selene, which turned in an HPL-AI result of 0.25 exaflops.

About the TOP500 List

The first version of what became today's TOP500 list started as an exercise for a small conference in Germany in June 1993. Out of curiosity, the authors decided to revisit the list in November 1993 to see how things had changed. About that time, they realized they might be onto something and decided to continue compiling the list, which is now a much-anticipated, much-watched, and much-debated twice-yearly event.

The TOP500 list is compiled by Erich Strohmaier and Horst Simon of Lawrence Berkeley National Laboratory; Jack Dongarra of the University of Tennessee, Knoxville; and Martin Meuer of ISC Group, Germany.