AMD Instinct™ Expands Ecosystem and Delivers Exascale-Class Technology for HPC and AI Applications

– Powered by the AMD CDNA™ 2 architecture and AMD ROCm™ 5, new AMD Instinct MI210 GPUs accelerating insights and discovery for mainstream users –

SANTA CLARA, Calif., March 22, 2022 (GLOBE NEWSWIRE) — AMD (NASDAQ: AMD) today announced the availability of the AMD Instinct™ ecosystem with expanded system support from partners including ASUS, Dell Technologies, Gigabyte, HPE, Lenovo and Supermicro, the new AMD Instinct™ MI210 accelerator and the robust capabilities of ROCm™ 5 software. Altogether, the AMD Instinct and ROCm ecosystem offers exascale-class technology to a broad base of HPC and AI customers, addressing the growing demand for compute-accelerated data center workloads and reducing the time to insights and discovery.

“With twice the platforms available compared to our previous generation accelerators, growing customer adoption across HPC and AI applications, and new support from commercial ISVs in key workloads, we’re continuing to drive adoption of the AMD Instinct MI200 accelerators and the ROCm 5 software ecosystem,” said Brad McCredie, corporate vice president, Data Center GPU and Accelerated Processing, AMD. “Now with the addition of the AMD Instinct MI210 accelerator to the MI200 family, our customers can choose the accelerator that works best for their workloads, whether they need leading-edge accelerated processing for large-scale HPC and AI workloads, or access to exascale-class technology in a commercial format.”

“The LUMI supercomputer, powered by AMD EPYC processors and AMD Instinct MI200 accelerators, will provide a generational leap in performance for large-scale simulations and modeling as well as AI and deep learning workloads, to solve some of the biggest questions in research,” said Pekka Manninen, Director of the LUMI Leadership and Computing Facility, CSC. “We’ve utilized AMD Instinct MI210 accelerators to get hands-on experience with the Instinct MI200 family, preparing our scientists to tackle the many challenging and complex projects they will run once LUMI is fully deployed.”

Powering the Future of HPC and AI
The AMD Instinct MI200 series accelerators are designed to power discoveries in exascale systems, enabling researchers, scientists and engineers to tackle our most pressing challenges, from climate change to vaccine research. The AMD Instinct MI210 accelerators specifically enable exascale-class technologies for customers who need fantastic HPC and AI performance in a PCIe® format. Powered by the AMD CDNA™ 2 architecture, AMD Instinct MI210 accelerators extend AMD performance leadership in double precision (FP64) compute on PCIe form factor cards¹. They also deliver a robust solution for accelerated deep learning training, offering a broad range of mixed-precision capabilities based on AMD Matrix Core technology.

Driving ROCm Adoption
The AMD ROCm platform is an open software platform that allows researchers, scientists and engineers to tap the power of AMD Instinct accelerators to drive scientific discoveries. It is built on a foundation of numerous applications and libraries that power top HPC and AI applications.

With ROCm 5, AMD extends its software platform by adding new hardware support for the AMD Instinct MI200 series accelerators and the AMD Radeon™ PRO W6800 professional graphics card, plus Red Hat® Enterprise Linux® 8.5 support, increasing the accessibility of ROCm for developers and enabling outstanding performance across key workloads.

Additionally, through the AMD Infinity Hub, the central location for open-source applications that are ported and optimized for AMD GPUs, end users can easily find, download and install containerized HPC apps and ML frameworks. AMD Infinity Hub application containers are designed to reduce the traditionally difficult task of obtaining and installing software releases, while allowing users to learn from shared experiences and problem-solving opportunities.

Expanding Partner and Customer Ecosystem
As more purpose-built applications are optimized to work with ROCm and AMD Instinct accelerators, AMD continues to grow its software ecosystem with the addition of commercial ISVs, including Ansys®, Cascade Technologies, and TempoQuest. These ISVs provide applications for accelerated workloads including computational fluid dynamics (CFD), weather forecasting, computer-aided engineering (CAE) and more. These updates build on existing application support in ROCm, which spans HPC, AI and machine learning applications including AMBER, Chroma, CP2K, GRID, GROMACS, LAMMPS, MILC, Mini-HACC, NAMD, NAMD 3.0, ONNX-RT, OpenMM, PyTorch, RELION, SPECFEM3D Cartesian, SPECFEM3D Globe, and TensorFlow.

AMD is also enabling partners like ASUS, Dell Technologies, Gigabyte, HPE, Lenovo and Supermicro, and system integrators including Colfax, Exxact, KOI Computers, Nor-Tech, Penguin and Symmetric, to offer differentiated solutions that address next-generation computing challenges. Supercomputing customers are already taking advantage of the benefits offered via these new customer wins, including the Frontier installation at Oak Ridge National Laboratory, KTH/Dardel, CSC/LUMI and CINES/Adastra.

Enabling Access for Customers and Partners
The AMD Accelerator Cloud offers customers an environment to remotely access and evaluate AMD Instinct accelerators and AMD ROCm software. Whether it’s porting legacy code, benchmarking an application or testing multi-GPU or multi-node scaling, the AMD Accelerator Cloud gives prospective customers and partners quick and easy access to the latest GPUs and software. The AMD Accelerator Cloud is also used to power events such as hackathons and ROCm training sessions offered to both existing and prospective customers, allowing developers to hone their skills and learn how to get the most out of AMD Instinct accelerators.

MI200 Series Specifications

AMD Instinct MI210: 104 Compute Units; 6,656 Stream Processors; FP64 | FP32 Vector (Peak): up to 22.6 TF; FP64 | FP32 Matrix (Peak): up to 45.3 TF; FP16 | bf16 (Peak): up to 181.0 TF; INT8 (Peak): up to 181.0 TOPS; HBM2e ECC Memory: 64GB; Memory Bandwidth: up to 1.6 TB/sec; Form Factor: PCIe®

AMD Instinct MI250: 208 Compute Units; 13,312 Stream Processors; FP64 | FP32 Vector (Peak): up to 45.3 TF; FP64 | FP32 Matrix (Peak): up to 90.5 TF; FP16 | bf16 (Peak): up to 362.1 TF; INT8 (Peak): up to 362.1 TOPS; HBM2e ECC Memory: 128GB; Memory Bandwidth: 3.2 TB/sec; Form Factor: OCP Accelerator Module (OAM)

AMD Instinct MI250X: 220 Compute Units; 14,080 Stream Processors; FP64 | FP32 Vector (Peak): up to 47.9 TF; FP64 | FP32 Matrix (Peak): up to 95.7 TF; FP16 | bf16 (Peak): up to 383.0 TF; INT8 (Peak): up to 383.0 TOPS; HBM2e ECC Memory: 128GB; Memory Bandwidth: 3.2 TB/sec; Form Factor: OCP Accelerator Module (OAM)
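As a sanity check on the table, the peak figures follow the standard formula: stream processors × peak boost clock × FLOPs per lane per cycle. The sketch below is a minimal illustration, not AMD's methodology; it reproduces the MI210 numbers from footnote 1 using the 1,700 MHz peak boost clock stated there. The per-lane multipliers (2 for vector FMA, 4 for FP64 matrix, 16 for FP16/bf16 matrix) are inferred from the ratios in the table rather than stated by AMD.

```python
def peak_tflops(stream_processors: int, boost_clock_ghz: float,
                flops_per_cycle: int = 2) -> float:
    """Peak theoretical throughput in TFLOPS.

    flops_per_cycle is the number of floating-point operations each
    stream processor retires per clock (2 = one fused multiply-add).
    """
    return stream_processors * boost_clock_ghz * flops_per_cycle / 1000.0

# AMD Instinct MI210: 6,656 stream processors at 1,700 MHz (footnote 1).
fp64_vector = peak_tflops(6656, 1.7)       # FP64 vector FMA   -> ~22.6 TF
fp64_matrix = peak_tflops(6656, 1.7, 4)    # FP64 Matrix Core  -> ~45.3 TF
fp16_matrix = peak_tflops(6656, 1.7, 16)   # FP16/bf16 matrix  -> ~181.0 TF
```

The same formula with a 1,700 MHz clock also reproduces the MI250 and MI250X vector figures from their 13,312 and 14,080 stream processor counts, though AMD does not publish those clocks in this release.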

About AMD

For more than 50 years AMD has driven innovation in high-performance computing, graphics and visualization technologies. Billions of people, leading Fortune 500 businesses and cutting-edge scientific research institutions around the world rely on AMD technology daily to improve how they live, work and play. AMD employees are focused on building leadership high-performance and adaptive products that push the boundaries of what is possible. For more information about how AMD is enabling today and inspiring tomorrow, visit the AMD (NASDAQ: AMD) website, blog, LinkedIn and Twitter pages.

©2022 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD CDNA, AMD Instinct, Radeon, ROCm and combinations thereof are trademarks of Advanced Micro Devices, Inc. PCIe is a registered trademark of PCI-SIG Corporation. Red Hat, Red Hat Enterprise Linux, and the Red Hat logo are trademarks or registered trademarks of Red Hat, Inc. or its subsidiaries in the U.S. and other countries. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

Linux is the registered trademark of Linus Torvalds in the U.S. and other countries.

___________________________
1 MI200-41 — Calculations conducted by AMD Performance Labs as of Jan 14, 2022, for the AMD Instinct™ MI210 (64GB HBM2e PCIe® card) accelerator at 1,700 MHz peak boost engine clock resulted in 45.3 TFLOPS peak theoretical double precision (FP64 Matrix), 22.6 TFLOPS peak theoretical double precision (FP64), and 181.0 TFLOPS peak theoretical Bfloat16 format precision (BF16) floating-point performance.

Calculations conducted by AMD Performance Labs as of Sep 18, 2020, for the AMD Instinct™ MI100 (32GB HBM2 PCIe® card) accelerator at 1,502 MHz peak boost engine clock resulted in 11.54 TFLOPS peak theoretical double precision (FP64), and 184.6 TFLOPS peak theoretical half precision (FP16) floating-point performance.

Published results for the NVIDIA Ampere A100 (80GB) GPU accelerator, at a boost engine clock of 1,410 MHz, resulted in 19.5 TFLOPS peak double precision Tensor Core (FP64 Tensor Core), 9.7 TFLOPS peak double precision (FP64) and 39 TFLOPS peak Bfloat16 format precision (BF16) theoretical floating-point performance. The TF32 data format is not IEEE compliant and is not included in this comparison.
https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-ampere-architecture-whitepaper.pdf, page 15, Table 1.