Tag:

Instinct

AMD says Instinct MI400X GPU is 10X faster than MI300X, will power Helios rack-scale system with EPYC ‘Venice’ CPUs

by admin June 13, 2025

AMD gave a preview of its first in-house designed rack-scale solution called Helios at its Advancing AI event on Thursday. The system is set to be based on the company’s next-generation EPYC ‘Venice’ processors, will use its Instinct MI400-series accelerator, and will rely on network connections featuring the upcoming Pensando network cards. Overall, the company says that the flagship MI400X is 10 times more powerful than the MI300X, which is remarkable progress given that the MI400X will be released about three years after the MI300X.

When it comes to rack-scale solutions for AI, AMD clearly trails behind Nvidia. This is going to change a bit this year as cloud service providers (such as Oracle OCI), OEMs, and ODMs will build and deploy rack-scale solutions based on the Instinct MI350X-series GPUs, but those systems will not be designed by AMD, and they will have to interconnect each 8-way system using Ethernet, not low-latency high-bandwidth interconnects like NVLink.

Year: 2025 | 2026 | 2024 | 2025 | 2026 | 2027
Density: 128 | NVL72 | NVL144 | NVL576
GPU Architecture: CDNA 4 | CDNA 5 | Blackwell | Blackwell Ultra | Rubin | Rubin Ultra
GPU/GPU+CPU: MI355X | MI400X | GB200 | GB300 | VR200 | VR300
Compute Chiplets: 256 | 144 | 576
GPU Packages: 128 | 144
FP4 PFLOPs (Dense): 1280 | 1440 | 720 | 1080 | 3600 | 14400
HBM Capacity: 36 TB | 51 TB | 14 TB | 21 TB | 147 TB
HBM Bandwidth: 1024 TB/s | 1,400 TB/s | 576 TB/s | 936 TB/s | 4,608 TB/s
CPU: EPYC ‘Turin’ | EPYC ‘Venice’ | 72-core Grace | 88-core Vera
NVSwitch/UALink/IF: – | UALink/IF | NVSwitch 5.0 | NVSwitch 6.0 | NVSwitch 7.0
NVSwitch Bandwidth: 3600 GB/s | 7200 GB/s | 14400 GB/s
Scale-Out: 800G, copper | 1600G, optics
Form-Factor Name: OEM/ODM proprietary | Helios | Oberon | Kyber

The real change will occur next year with the first AMD-designed rack-scale system called Helios, which will use Zen 6-powered EPYC ‘Venice’ CPUs, CDNA ‘Next’-based Instinct MI400-series GPUs, and Pensando ‘Vulcano’ network interface cards (NICs) that are rumored to increase the maximum scale-up world size to beyond eight GPUs, which will greatly enhance their capabilities for training and inference. The system will adhere to OCP standards and enable next-generation interconnects such as Ultra Ethernet and Ultra Accelerator Link, supporting demanding AI workloads.

“So let me introduce the Helios AI rack,” said Andrew Dieckman, corporate VP and general manager of AMD’s data center GPU business. “Helios is one of the system solutions that we are working on based on the Instinct MI400-series GPU, so it is a fully integrated AI rack with EPYC CPUs, Instinct MI400-series GPUs, Pensando NICs, and then our ROCm stack. It is a unified architecture designed for both frontier model training as well as massive scale inference [that] delivers leadership compute density, memory bandwidth, scale out interconnect, all built in an open OCP-compliant standard supporting Ultra Ethernet and UALink.”

From a performance point of view, AMD’s flagship Instinct MI400-series AI GPU (we will refer to it as the Instinct MI400X, though this is not the official name, and we will refer to CDNA Next as CDNA 5) doubles performance compared to the Instinct MI355X and increases memory capacity by 50% and bandwidth by more than 100%. While the MI355X delivers 10 dense FP4 PFLOPS, the MI400X is projected to hit 20 dense FP4 PFLOPS.
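As a rough sanity check of the generational claims above, the ratios can be recomputed from the per-package figures quoted in this article (10 and 20 dense FP4 PFLOPS, 288 GB and 432 GB of HBM). This is only a sketch: the dictionary layout and the `gain` helper are illustrative assumptions, not anything published by AMD.

```python
# Per-package figures as quoted in this article (not official spec sheets).
MI355X = {"fp4_dense_pflops": 10.0, "hbm_gb": 288}
MI400X = {"fp4_dense_pflops": 20.0, "hbm_gb": 432}


def gain(new: float, old: float) -> float:
    """Return the new value as a multiple of the old one (hypothetical helper)."""
    return new / old


fp4_gain = gain(MI400X["fp4_dense_pflops"], MI355X["fp4_dense_pflops"])
hbm_gain = gain(MI400X["hbm_gb"], MI355X["hbm_gb"])

print(f"Dense FP4 compute: {fp4_gain:.1f}x")
print(f"HBM capacity: +{(hbm_gain - 1) * 100:.0f}%")
```

With these inputs, compute doubles and HBM capacity grows by 50%, matching the figures stated in the text.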

Overall, the company says that the flagship MI400X is 10 times more powerful than the MI300X, which is remarkable progress given that the MI400X will be released about three years after the MI300X.

“When you look at our product roadmap and how we continue to accelerate, with MI355X, we have taken a major leap forward [compared to the MI300X]: we are delivering 3X the amount of performance on a broad set of models and workloads, and that is a significant uptick from the previous trajectory we were on from the MI300X with the MI325X,” said Dieckman. “Now, with the Instinct MI400X and Helios, we bend that curve even further, and Helios is designed to deliver up to 10X more AI performance on the most advanced frontier models in the high end.”

Year: 2024 | 2025 | 2024 | 2025 | 2026 | 2027
Architecture: CDNA 4 | CDNA 5 | Blackwell | Blackwell Ultra | Rubin
GPU: MI355X | MI400X | B200 | B300 (Ultra) | VR200 | VR300 (Ultra)
Process Technology: N3P | 4NP | N3P (3NP?)
Physical Configuration: 2 x Reticle Sized GPU | 2 x Reticle Sized GPUs | 2 x Reticle Sized GPUs, 2x I/O chiplets | 4 x Reticle Sized GPUs, 2x I/O chiplets
Packaging: CoWoS-S | CoWoS-L
FP4 PFLOPs (per Package): 100
FP8/INT6 PFLOPs (per Package): 5/- | 10/? | 4.5
INT8 PFLOPS (per Package): 4.5 | 0.319
BF16 PFLOPs (per Package): 2.5 | 2.25
TF32 PFLOPs (per Package): 1.12 | 2.5
FP32 PFLOPs (per Package): 153.7 | 1.12 | 0.083
FP64/FP64 Tensor TFLOPs (per Package): 78.6 | 1.39
Memory: 288 GB HBM3E | 432 GB HBM4 | 192 GB HBM3E | 288 GB HBM3E | 288 GB HBM4 | 1 TB HBM4E
Memory Bandwidth: 8 TB/s | almost 20 TB/s | 8 TB/s | 4 TB/s | 13 TB/s | 32 TB/s
HBM Stacks:
NVLink/UALink: Infinity Fabric | UALink, Infinity Fabric | NVLink 5.0, 200 GT/s | NVLink 6.0 | NVLink 7.0
SerDes speed (Gb/s unidirectional): 224G
GPU TDP: 1400 W | 1600 W (?) | 1200 W | 1400 W | 1800 W | 3600 W
CPU: 128-core EPYC ‘Turin’ | EPYC ‘Venice’ | 72-core Grace | 88-core Vera

The new MI400X accelerator will also surpass Nvidia’s Blackwell Ultra, which is currently ramping up. However, when it comes to comparison with Nvidia’s next-generation Rubin R200 that delivers 50 dense FP4 PFLOPS, AMD’s MI400X will be around 2.5 times slower. Still, AMD will have an ace up its sleeve, which is memory bandwidth and capacity (see tables for details). Similarly, Helios will outperform Nvidia’s Blackwell Ultra-based NVL72 and Rubin-based NVL144.

However, it remains to be seen how Helios will stack up against NVL144 in real-world applications. Also, it will be extremely hard to beat Nvidia’s NVL576 both in terms of compute performance and memory bandwidth in 2027, though by that time, AMD will likely roll out something new.

At least, this is what AMD communicated at its Advancing AI event this week: the company plans to continue evolving its integrated AI platforms with next-generation GPUs, CPUs, and networking technology, extending its roadmap well into 2027 and beyond.

Follow Tom’s Hardware on Google News to get our up-to-date news, analysis, and reviews in your feeds. Make sure to click the Follow button.

AMD unwraps 2027 AI plans: Verano CPU, Instinct MI500X GPU, next-gen AI rack

by admin June 12, 2025

AMD is accelerating its CPU, GPU, and AI rack-scale solutions roadmaps to a yearly cadence, so the company is set to introduce its all-new EPYC ‘Verano’ CPU, Instinct MI500-series accelerators, and next-generation rack-scale AI solution in 2027, the company revealed at its Advancing AI event.

“We are already deep in the development of our 2027 rack-scale solution that will push the envelope even further on performance efficiency and scalability with our next generation Verano CPUs and Instinct MI500X-series GPUs,” said Lisa Su, chief executive of AMD, at the event.

AMD’s 2026 plans for rack-scale AI solutions already look impressive as the company’s first in-house designed Helios rack-scale system for AI will be based on AMD’s 256-core EPYC ‘Venice’ processor (expected to deliver a 70% generation-to-generation performance improvement); Instinct MI400X-series accelerators projected to double AI inference performance compared to the Instinct MI355X; and Pensando ‘Vulcano’ 800 GbE network cards compliant with the UEC 1.0 specification. But the company is set to introduce something even more impressive the following year.

That would be AMD’s second-generation rack-scale system powered by its EPYC ‘Verano’ processors, Instinct MI500X-series accelerators, and Pensando ‘Vulcano’ 800 GbE NICs.

AMD did not reveal any specifications or performance numbers for its second-generation rack-scale solution, EPYC ‘Verano’ processors, or Instinct MI500X-series GPUs. However, based on a picture the company provided, the post-Helios rack-scale machine will feature more compute blades, thus boosting performance density. This alone points to higher performance and power consumption, which will come in handy as this system will have to rival Nvidia’s NVL576 ‘Kyber’ system based on 144 Rubin Ultra packages (each packing four reticle-sized compute elements).

Production of EPYC ‘Verano’ CPUs and Instinct MI500X-series accelerators in 2027 aligns perfectly with TSMC’s roll-out of its A16 process technology in late 2026, its first production node to offer backside power delivery, a technology particularly useful for heavy-duty datacenter CPUs and GPUs. We do not know whether AMD’s 2027 processors and accelerators will rely on TSMC’s A16, though it isn’t unreasonable to speculate.

AMD’s Instinct MI355X accelerator will reportedly consume 1,400 watts

by admin June 12, 2025

Mark Papermaster, chief technology officer of AMD, formally introduced the company’s Instinct MI355X accelerators for AI and HPC at ISC 2025 — revealing massive performance improvements for AI inference, but also pointing to nearly doubled power consumption of the new flagship GPU compared to its predecessor from 2023, reports ComputerBase.

AMD’s CDNA 4 enters the scene

AMD’s Instinct MI350X-series GPUs are based on the CDNA 4 architecture, which introduces support for FP4 and FP6 precision formats alongside FP8 and FP16. These lower-precision formats have grown in relevance in AI workloads, particularly for inference. AMD positions its Instinct MI350X processors primarily for inference, which makes sense: the scale-up world size of the MI350X remains limited to eight GPUs, which limits how well these accelerators can compete with Nvidia’s Blackwell GPUs. Still, Pegatron is readying a 128-way MI350X machine.

AMD’s Instinct MI350X family of AI and HPC GPUs consists of two models: the default Instinct MI350X module with a 1000W power consumption designed for air cooling as well as the higher-performance Instinct MI355X that will consume up to 1400W and will be designed primarily for direct liquid cooling (even though AMD believes that some of its clients will be able to use air cooling with the MI355X).

Both SKUs will come with 288GB HBM3E memory that will offer up to 8 TB/s of bandwidth, but the MI350X will offer a maximum FP4/FP6 performance of 18.45 PFLOPS, whereas the MI355X is said to push the maximum FP4/FP6 performance to 20.1 PFLOPS. On paper, both Instinct MI350X models outperform Nvidia’s B300 (Blackwell Ultra) GPU, which tops out at 15 FP4 PFLOPS, though it remains to be seen how AMD’s MI350X and MI355X perform in real-world applications.

AMD Instinct MI325X GPU | AMD Instinct MI350X GPU | AMD Instinct MI350X Platform (8x OAM) | AMD Instinct MI355X GPU | AMD Instinct MI355X Platform (8x OAM)
GPUs: Instinct MI325X OAM | Instinct MI350X OAM | 8x Instinct MI350X OAM | Instinct MI355X OAM | 8x Instinct MI355X OAM
GPU Architecture: CDNA 3 | CDNA 4
Dedicated Memory Size: 256 GB HBM3E | 288 GB HBM3E | 2.3 TB HBM3E | 288 GB HBM3E | 2.3 TB HBM3E
Memory Bandwidth: 6 TB/s | 8 TB/s | 8 TB/s per OAM | 8 TB/s | 8 TB/s per OAM
Peak Half Precision (FP16) Performance: 2.61 PFLOPS | 4.6 PFLOPS | 36.8 PFLOPS | 5.03 PFLOPS | 40.27 PFLOPS
Peak Eight-bit Precision (FP8) Performance: 5.22 PFLOPS | 9.228 PFLOPS | 72 PFLOPS | 10.1 PFLOPS | 80.53 PFLOPS
Peak Six-bit Precision (FP6) Performance: – | 18.45 PFLOPS | 148 PFLOPS | 20.1 PFLOPS | 161.06 PFLOPS
Peak Four-bit Precision (FP4) Performance: – | 18.45 PFLOPS | 148 PFLOPS | 20.1 PFLOPS | 161.06 PFLOPS
Cooling: Air | DLC / Air
Typical Board Power (TBP): 1000W Peak | 1000W Peak per OAM | 1400W Peak | 1400W Peak per OAM

Compared with its predecessor, the MI350X is listed at approximately 9.2 FP8 PFLOPS and the faster MI355X at 10.1 FP8 PFLOPS, up from the 2.61/5.22 FP8 PFLOPS (without/with structured sparsity) of the Instinct MI325X, which represents a significant performance improvement. Meanwhile, the MI355X also outperforms Nvidia’s B300 by 0.1 FP8 PFLOPS.
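The platform columns in the table above appear to be roughly eight times the per-OAM figures, give or take rounding. A minimal sketch of that check for the MI355X, assuming the listed values are taken exactly as printed here (the `relative_gap` helper and the data layout are hypothetical):

```python
GPUS_PER_PLATFORM = 8  # the platform column is labeled "8x OAM"

# (per-OAM PFLOPS, listed platform PFLOPS) for the MI355X, from the table.
MI355X = {
    "FP16": (5.03, 40.27),
    "FP8": (10.1, 80.53),
    "FP6": (20.1, 161.06),
    "FP4": (20.1, 161.06),
}


def relative_gap(per_gpu: float, listed_platform: float,
                 n: int = GPUS_PER_PLATFORM) -> float:
    """Relative difference between the listed platform figure and n x per-GPU."""
    expected = per_gpu * n
    return (listed_platform - expected) / expected


gaps = {fmt: relative_gap(*values) for fmt, values in MI355X.items()}
for fmt, gap in gaps.items():
    print(f"{fmt}: {gap:+.2%} vs. 8 x per-OAM")
```

For the MI355X, every listed platform figure lands within 1% of eight times the per-OAM value, which suggests the small differences come from rounding of the per-OAM numbers.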

Faster GPUs incoming

Papermaster expressed confidence that the industry will continue to develop even more powerful CPUs and accelerators for supercomputers to achieve zettascale performance in about a decade from now. However, that performance will come at the cost of a steep increase in power consumption, which is why a supercomputer offering zettaFLOPS-class performance could consume 500 MW of power, half of what a nuclear power plant can produce.

At ISC 2025, AMD presented data showing that top supercomputers have consistently followed a trajectory where compute performance doubles roughly every 1.2 years. The graph covered performance from 1990 to the present, demonstrating peak system GFLOPs. Early growth was driven by CPU-only systems, but from around 2005, a shift to heterogeneous architectures that mix CPUs with GPUs and accelerators took over. Now, in what AMD calls the ‘AI Acceleration Era,’ systems like El Capitan and Frontier are pushing beyond 1 ExaFLOP, continuing the exponential growth trend with increasingly AI-specialized hardware.
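To see what a 1.2-year doubling period implies, the growth can be worked out directly. This is a back-of-the-envelope sketch that assumes perfectly steady exponential growth, which real systems only approximate. The function names are illustrative.

```python
import math

DOUBLING_PERIOD_YEARS = 1.2  # figure from AMD's ISC 2025 slide, per the article


def growth_factor(years: float,
                  doubling_period: float = DOUBLING_PERIOD_YEARS) -> float:
    """Performance multiple after `years` of steady doubling."""
    return 2.0 ** (years / doubling_period)


def years_to_scale(factor: float,
                   doubling_period: float = DOUBLING_PERIOD_YEARS) -> float:
    """Years needed to grow performance by `factor` at a steady doubling rate."""
    return math.log2(factor) * doubling_period


decade_growth = growth_factor(10)
exa_to_zetta_years = years_to_scale(1000)  # exascale -> zettascale is 1,000x

print(f"Growth over 10 years: about {decade_growth:.0f}x")
print(f"Years for a 1,000x gain: about {exa_to_zetta_years:.1f}")
```

Under that assumption, a decade yields a gain of roughly 320x, and the full 1,000x step from exascale to zettascale takes about 12 years, which is broadly in line with the "about a decade" estimate.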

But performance comes at a cost of power consumption. To maintain performance growth, memory bandwidth and power scaling have become urgent challenges. AMD’s slide indicated that GPU memory bandwidth must more than double every two years to preserve the ratio of bandwidth per FLOPS. This has required increasing the number of HBM stacks per GPU, which in turn results in larger and more power-hungry GPUs and modules.

Indeed, power consumption of accelerators for supercomputers is increasing rapidly. While AMD’s Instinct MI300X introduced in mid-2023 consumed 750W peak, the Instinct MI355X, set to be formally unveiled this week, will feature a peak power consumption of 1,400W. Papermaster envisions 1,600W accelerators in 2026 – 2027 and then 2,000W processors later this decade. Nvidia seems to be even more ambitious when it comes to power consumption, as its Rubin Ultra GPUs featuring four reticle-sized compute chiplets are expected to consume as much as 3,600W.

The good news is that in addition to increased power consumption, supercomputers and accelerators have also been gaining performance efficiency rapidly. Another one of AMD’s ISC 2025 keynote slides illustrated that performance efficiency increased from about 3.2 GFLOPS/W in 2010 to approximately 52 GFLOPS/W by the time exascale systems like Frontier arrived.

Looking ahead, maintaining this pace of performance scaling will require doubling energy efficiency every 2.2 years. A projected zettascale system delivering 1,000× exaflop-class performance would need around 500 MW of power at an efficiency level of 2,140 GFLOPs/W (a 41-fold increase from today). Without such gains, future supercomputers could demand gigawatt-scale energy — comparable to an entire nuclear power plant, making them way too expensive to operate.
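The zettascale power figures follow from dividing the target performance by the efficiency. A minimal sketch using only the numbers quoted above (1,000 exaFLOPS, 2,140 GFLOPS/W, and about 52 GFLOPS/W for Frontier-era systems); the helper name is an assumption made for illustration:

```python
import math

ZETTAFLOPS = 1e21                    # 1,000x an exaFLOPS (1e18)
TARGET_GFLOPS_PER_WATT = 2140.0      # projected zettascale efficiency
CURRENT_GFLOPS_PER_WATT = 52.0       # approximate exascale-era efficiency
EFFICIENCY_DOUBLING_YEARS = 2.2      # required doubling period, per the article


def power_megawatts(flops: float, gflops_per_watt: float) -> float:
    """System power in MW for a given performance and efficiency."""
    watts = flops / (gflops_per_watt * 1e9)
    return watts / 1e6


target_mw = power_megawatts(ZETTAFLOPS, TARGET_GFLOPS_PER_WATT)
status_quo_mw = power_megawatts(ZETTAFLOPS, CURRENT_GFLOPS_PER_WATT)
efficiency_gain = TARGET_GFLOPS_PER_WATT / CURRENT_GFLOPS_PER_WATT
years_needed = math.log2(efficiency_gain) * EFFICIENCY_DOUBLING_YEARS

print(f"Power at target efficiency: {target_mw:.0f} MW")
print(f"Power at today's efficiency: {status_quo_mw / 1000:.1f} GW")
print(f"Efficiency gain required: {efficiency_gain:.0f}x")
print(f"Years at one doubling per 2.2 years: {years_needed:.1f}")
```

With these inputs the target works out to roughly 467 MW (the "around 500 MW" in the text), against about 19 GW at today's efficiency. The required efficiency gain is about 41x, or just under 12 years of doubling every 2.2 years.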

AMD believes that to increase the performance of supercomputers dramatically a decade from now, not only will it need to make a number of architectural breakthroughs, but the industry will also have to provide memory bandwidth that keeps pace with compute capabilities. Still, using nuclear reactors to power supercomputers in the 2030s seems to be an increasingly realistic possibility.

ASE adopts AMD CPUs, begins evaluating Instinct MI300-series GPUs for AI

by admin June 6, 2025

AMD this week said in a blog post that ASE Technology, the world’s largest outsourced semiconductor assembly and test (OSAT) provider, has transitioned to EPYC and Ryzen processors across its data centers and client systems, respectively. The transition has resulted in significant performance improvements and energy efficiency gains. However, perhaps more important is that ASE is now evaluating AMD’s Instinct MI300-series processors for AI workloads.

By adopting AMD’s EPYC processors for servers and Ryzen CPUs for client desktop and laptop PCs, ASE achieved a 50% boost in system performance and a 6.5% reduction in power consumption compared to its previous infrastructure, which resulted in a 30% decrease in total cost of ownership, delivering both operational and financial benefits. AMD’s blog does not disclose which processors ASE used before adopting AMD-based solutions, nor does it indicate whether all systems in ASE’s fleet now use EPYC or Ryzen processors. However, the mention of operational and financial benefits points to a substantial adoption of AMD-based systems.

“We need to handle a big volume of data analysis, including leading-edge technology for AI applications and our smart factories,” said Jekyll Chen, Director of IT Infrastructure for ASE. “We work for many semiconductor companies. Our challenges are the need for high performance, low latency, and high core count, in alignment with ASE’s ESG policy. Stability and scalability are two primary goals for us.”

ASE Technology Holdings is the world’s largest outsourced semiconductor assembly and test provider, with packaging facilities in China, Japan, Korea, Malaysia, Singapore, and Taiwan. The company has worked with AMD on advanced 2.5D packaging since 2007, and this largely resulted in the invention of high-bandwidth memory (HBM). However, while ASE does provide packaging services for AMD these days, we are unsure whether ASE packages AMD’s AI GPUs, as Instinct processors utilize TSMC’s CoWoS technology.

AMD says that many companies are adopting or evaluating its Instinct processors for on-prem AI inference, though ASE is probably the first company of this scale to confirm the evaluation of these accelerators. In fact, the confirmation may indicate that ASE is close to adopting Instinct MI300-series GPUs for its internal AI workloads.

“We must perform data processing, run AI algorithms, and make sure everything operates smoothly, efficiently, and with the flexibility needed in our smart factories,” Chen said. “For client PCs, we need to make sure that they meet the needs of engineering design and the high-performance objectives of digital transformation. We also evaluated the performance, stability, core count, efficiency, total cost of ownership, AI speed, and multi-tasking capabilities of the new servers.”
