Tag: Blackwell

Nvidia CEO Jensen Huang Next to a B200 Node

Nvidia RTX PRO 6000D (B40) Blackwell GPUs reportedly set to supersede banned H20 accelerators in China

by admin May 26, 2025



Following the ban of its Hopper H20 accelerators in China, Nvidia is reportedly planning to launch new Blackwell-based solutions at a lower price this year, per Reuters. With mass production anticipated by June, these parts could be widely available in the Chinese market by Q3 or Q4. While technical details are still emerging, we can already discern some important specifications.

Due to stringent U.S. export policies targeting China, the Hopper family has been at the center of a cat-and-mouse game between U.S. regulators and Nvidia. Even before their official debut, the flagship H100 and H200 accelerators were subject to export bans. Nvidia introduced the H800 to work around these regulations, but it met a similar fate in October 2023. The cut-down H20 then served as Nvidia’s primary AI solution for the Chinese market, until it was banned by the current administration last month, forcing Nvidia to write off $5.5 billion in GPU supply.

Reuters reports that Nvidia’s follow-up to the H20 will be based on the Blackwell architecture, more specifically, the RTX Pro 6000D. Further clarification by tipster Jukanlosreve on X, citing a report from China’s GF Securities, suggests the RTX Pro 6000D will be dubbed B40 (likely a successor to the Ada Lovelace-based L40). Reuters classifies it as a server-class GPU that uses conventional GDDR7 memory instead of HBM and notably avoids TSMC’s CoWoS packaging technology, likely signaling a monolithic design.



Silicon possibilities and B40 pricing

There are two possibilities based on the available data: this GPU is based either on datacenter-grade GB1xx Blackwell silicon or on consumer-grade GB2xx Blackwell silicon. The former is unlikely, as those dies only feature HBM memory controllers at the silicon level. If the B40 uses a GB2xx die, it would be a derivative of the GB202 chip (found in the RTX 5090 and RTX Pro 6000 Blackwell) and would lack NVLink support. The report estimates the price of the B40 at $6,500 to $8,000, which is less than the H20 and comparable to Nvidia’s global RTX Pro 6000 workstation models.

The HGX H20 could be configured as an eight-GPU system, but without NVLink, the B40 would likely face challenges in multi-GPU setups. Nvidia’s latest RTX Pro Blackwell servers employ up to eight RTX Pro 6000 GPUs connected via ConnectX-8 SuperNICs with integrated PCIe 6.0 switches for GPU-to-GPU communication. This setup is likely what we’ll see for the B40, with scaling beyond eight GPUs expected to be handled by Nvidia’s Spectrum-X networking platform. Since details are scarce, this is speculation on our part, so please don’t read it as gospel.

GB200 Grace Blackwell Superchip

DGX B200 Blackwell node sets world record, breaking over 1,000 TPS/user

by admin May 24, 2025



Nvidia has reportedly set another AI world record, breaking the 1,000 tokens per second (TPS) per user barrier with Meta’s Llama 4 Maverick large language model, according to Artificial Analysis in a post on LinkedIn. The result was achieved with Nvidia’s latest DGX B200 node, which features eight Blackwell GPUs.

Nvidia outperformed the previous record holder, AI chipmaker SambaNova, by 31%, achieving 1,038 TPS/user against SambaNova’s prior record of 792 TPS/user. According to Artificial Analysis’s benchmark report, Nvidia and SambaNova are well ahead of the rest of the field on this metric: Amazon and Groq achieved scores just shy of 300 TPS/user, while Fireworks, Lambda Labs, Kluster.ai, CentML, Google Vertex, Together.ai, Deepinfra, Novita, and Azure all scored below 200 TPS/user.

Blackwell’s record-breaking result was achieved using a plethora of performance optimizations tailor-made for the Llama 4 Maverick architecture. Nvidia reportedly made extensive software optimizations using TensorRT and trained a speculative decoding draft model using Eagle-3 techniques, which accelerate LLM inference by predicting several tokens ahead of time and verifying them with the main model. These two optimizations alone delivered a 4x performance uplift over Blackwell’s best prior results.
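For readers unfamiliar with the idea, speculative decoding lets a small, fast draft model propose a handful of tokens ahead, which the large target model then verifies, keeping the longest matching prefix and falling back to its own prediction at the first mismatch. The Python sketch below is a minimal, greedy illustration of that loop; `draft_next` and `target_next` are hypothetical stand-ins for the two models, not Nvidia’s Eagle-3 or TensorRT code.

```python
# Minimal sketch of greedy speculative decoding.
# `draft_next` and `target_next` are hypothetical callables that return the
# most likely next token id for a given token sequence (draft = small/fast
# model, target = large model whose output quality we want to preserve).
from typing import Callable, List

def speculative_decode(
    prompt: List[int],
    draft_next: Callable[[List[int]], int],
    target_next: Callable[[List[int]], int],
    k: int = 4,                 # tokens proposed by the draft model per round
    max_new_tokens: int = 64,
) -> List[int]:
    tokens = list(prompt)
    generated = 0
    while generated < max_new_tokens:
        # 1) Draft model cheaply proposes k tokens ahead of the current context.
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)

        # 2) Target model verifies the proposals. A real implementation scores
        #    all k positions in one batched forward pass; the per-token loop
        #    here is only for clarity.
        accepted, correction = 0, None
        for i, t in enumerate(draft):
            expected = target_next(tokens + draft[:i])
            if expected != t:
                correction = expected   # target disagrees: use its token instead
                break
            accepted += 1

        tokens += draft[:accepted]      # keep the verified prefix
        generated += accepted
        if correction is not None:
            tokens.append(correction)
            generated += 1
    return tokens
```

When the draft model guesses well, the target model effectively emits several tokens per verification pass, which is where the latency win comes from.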



FP8 data types (rather than BF16) were also used for the Attention operations and for the Mixture of Experts (MoE) layers, the technique that took the world by storm when DeepSeek R1 popularized it, boosting throughput without sacrificing accuracy. Nvidia also shared a variety of other optimizations its software engineers made at the CUDA kernel level to push performance further, including techniques such as spatial partitioning and GEMM weight shuffling.
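To make the Mixture of Experts idea concrete: instead of running every token through the full feed-forward network, a small router picks a couple of “experts” per token, so only a fraction of the model’s weights are active for any given token. The NumPy toy below is a generic top-2 routing sketch with made-up dimensions, purely illustrative and unrelated to Nvidia’s actual kernels.

```python
import numpy as np

# Toy top-2 Mixture-of-Experts layer. Each token is routed to the two experts
# with the highest gate scores, and their outputs are blended using the
# renormalized gate weights. All dimensions and weights are made up.
rng = np.random.default_rng(0)
d_model, n_experts, top_k, n_tokens = 64, 8, 2, 16

gate_w = rng.standard_normal((d_model, n_experts))            # router weights
experts = rng.standard_normal((n_experts, d_model, d_model))  # one matrix per expert
x = rng.standard_normal((n_tokens, d_model))                  # token activations

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

scores = softmax(x @ gate_w)                   # (n_tokens, n_experts) gate scores
top = np.argsort(scores, axis=-1)[:, -top_k:]  # indices of the chosen experts

out = np.zeros_like(x)
for t in range(n_tokens):
    chosen = top[t]
    weights = scores[t, chosen] / scores[t, chosen].sum()  # renormalize over top-k
    for w, e in zip(weights, chosen):
        out[t] += w * (x[t] @ experts[e])      # only top_k experts run per token
```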

TPS/user is an AI performance metric that stands for tokens per second per user. Tokens are the basic units of text that LLM-powered software such as Copilot and ChatGPT operates on; when you type a question into ChatGPT or Copilot, your words and characters are broken down into tokens. The LLM consumes those tokens and generates its answer as a stream of new tokens.

The “user” part of TPS/user refers to single-user benchmarking rather than batched throughput. This distinction matters to AI chatbot developers because it reflects the responsiveness an individual user actually experiences: the faster a GPU cluster can generate tokens per second for a single user, the faster an AI chatbot responds to you.
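As a concrete example of the arithmetic, TPS/user is simply the number of output tokens delivered to one user divided by the wall-clock time that user waited; at the record figure of roughly 1,038 TPS/user, that works out to about one token per millisecond. The numbers in the snippet below are hypothetical and only illustrate the calculation.

```python
# Hypothetical single-user measurement illustrating the TPS/user metric.
def tokens_per_second_per_user(output_tokens: int, elapsed_seconds: float) -> float:
    """Output tokens generated for one user divided by the wall-clock time waited."""
    return output_tokens / elapsed_seconds

# e.g. 2,076 tokens streamed to one user over 2.0 seconds:
print(tokens_per_second_per_user(2076, 2.0))  # -> 1038.0, roughly 1 token per ms
```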

