By | November 14, 2023
Image of H200 GPU system

Today at SC23, NVIDIA unveiled the next wave of technologies that will lift scientific and industrial research centers around the world to new levels of performance and energy efficiency.

“NVIDIA’s hardware and software innovations are creating a new class of AI supercomputers,” said Ian Buck, vice president of the company’s high-performance computing and hyperscale data center operations, in a special address at the conference.

Some of the systems will pack memory-enhanced NVIDIA Hopper accelerators, others a new NVIDIA Grace Hopper system architecture. All will use the extended parallelism to run a full stack of accelerated software for generative AI, HPC and hybrid quantum computing.

Buck described the new NVIDIA HGX H200 as “the world’s leading AI computing platform.”

NVIDIA H200 Tensor Core GPUs pack HBM3e memory to run growing generative AI models.

The H200 holds up to 141 GB of HBM3e and is the first AI accelerator to use the ultra-fast memory technology. Running models such as GPT-3, NVIDIA H200 Tensor Core GPUs provide an 18x performance increase over previous generations of accelerators.

Among other generative AI benchmarks, they zip through 12,000 tokens per second on a Llama2-13B large language model (LLM).

Buck also revealed a server platform that links four NVIDIA GH200 Grace Hopper Superchips on an NVIDIA NVLink interconnect. The quad configuration packs a whopping 288 Arm Neoverse cores and 16 petaflops of AI performance, with up to 2.3 terabytes of high-speed memory, into a single compute node.

Image of quad GH200 server node
Server nodes based on four GH200 Superchips will deliver 16 petaflops of AI performance.
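
As a rough check on the quad-node figures quoted above, the sketch below divides the node totals evenly across the four superchips. The even split and the reading of "2.3 terabytes" as 2,300 decimal gigabytes are assumptions made for illustration; the names `QUAD_NODE` and `per_superchip` are hypothetical and do not come from NVIDIA.

```python
# Quad GH200 node totals as quoted in the article.
QUAD_NODE = {
    "superchips": 4,
    "arm_cores": 288,
    "ai_petaflops": 16.0,
    "memory_gb": 2300,  # "up to 2.3 terabytes", read as decimal GB (assumption)
}


def per_superchip(node):
    """Split node-level totals evenly across the superchips (assumed even split)."""
    n = node["superchips"]
    return {
        "arm_cores": node["arm_cores"] // n,
        "ai_petaflops": node["ai_petaflops"] / n,
        "memory_gb": node["memory_gb"] / n,
    }


share = per_superchip(QUAD_NODE)
for name, value in share.items():
    print(f"{name}: {value}")
```

Under these assumptions each superchip contributes 72 Arm cores, 4 petaflops of AI performance and 575 GB of memory. The memory figure is an upper bound per chip, since the article quotes the node total as "up to" 2.3 terabytes.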

A GH200 Superchip using the open source NVIDIA TensorRT-LLM library is 100 times faster than a dual-socket x86 processor system and almost 2 times more energy efficient than an x86 + H100 GPU server.

“Accelerated computing is sustainable computing,” Buck said. “By harnessing the power of accelerated computing and generative AI, we can together drive innovation across industries while reducing our impact on the environment.”

NVIDIA powers 38 of 49 new TOP500 systems

The latest TOP500 list of the world’s fastest supercomputers reflects the shift towards accelerated, energy-efficient supercomputers.

Thanks to new systems powered by NVIDIA H100 Tensor Core GPUs, NVIDIA is now delivering more than 2.5 exaflops of HPC performance across these world-leading systems, up from 1.6 exaflops in the May ranking. NVIDIA’s contribution to the top 10 alone reaches nearly an exaflop of HPC and 72 exaflops of AI performance.

The new list includes the highest number of systems ever using NVIDIA technology, 379 compared to 372 in May, including 38 of 49 new supercomputers on the list.
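
To put the list figures in proportion, the short sketch below turns the counts quoted above into percentages. The helper names are hypothetical, and the list size of 500 is an assumption based on the TOP500 name rather than a figure stated in this article.

```python
TOP500_SIZE = 500  # assumed list size, taken from the list's name


def share_percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def growth_percent(old, new):
    """Return the relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


new_system_share = share_percent(38, 49)           # NVIDIA-powered newcomers
list_share = share_percent(379, TOP500_SIZE)       # NVIDIA-powered systems overall
count_growth = growth_percent(372, 379)            # change in system count since May
hpc_growth = growth_percent(1.6, 2.5)              # change in HPC exaflops since May

print(f"share of new systems: {new_system_share:.1f}%")
print(f"share of full list:   {list_share:.1f}%")
print(f"system count growth:  {count_growth:.1f}%")
print(f"HPC exaflops growth:  {hpc_growth:.1f}%")
```

On these numbers, NVIDIA technology is in roughly 78 percent of the newcomers and about 76 percent of the full list, while quoted HPC performance grew by about 56 percent even though the system count rose by only about 2 percent.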

Microsoft Azure leads the newcomers with its Eagle system using H100 GPUs in NDv5 instances to reach number 3 with 561 petaflops. Mare Nostrum5 in Barcelona was ranked number 8, and NVIDIA Eos – which recently set new AI training records on MLPerf’s benchmarks – came in at number 9.

NVIDIA GPUs show their energy efficiency, powering 23 of the top 30 systems on the Green500. And they retained No. 1 with the H100 GPU-based Henri system, delivering 65.09 gigaflops per watt for the Flatiron Institute in New York.

Gen AI explores COVID

Showing what is possible, Argonne National Laboratory used NVIDIA BioNeMo, a generative AI platform for biomolecular LLMs, to develop GenSLMs, a model that can generate gene sequences that closely resemble real variants of the coronavirus. Using NVIDIA GPUs and data from 1.5 million COVID genome sequences, it can also quickly identify new virus variants.

The work won the Gordon Bell Special Prize last year, and the model was trained on supercomputers including Argonne’s Polaris system, the US Department of Energy’s Perlmutter and NVIDIA’s Selene.

That’s “just the tip of the iceberg; the future is full of possibilities, as generative AI continues to redefine the landscape of scientific exploration,” Kimberly Powell, vice president of healthcare at NVIDIA, said in the special address.

Save time, money and energy

With the latest technology, accelerated workloads can see an order of magnitude reduction in system cost and energy use, Buck said.

For example, Siemens collaborated with Mercedes to analyze aerodynamics and related acoustics for its new EQE electric vehicles. Simulations that took weeks on CPU clusters ran significantly faster on the latest NVIDIA H100 GPUs. In addition, Hopper GPUs let them reduce costs by 3x and power consumption by 4x (below).

Chart showing performance and energy efficiency of H100 GPUs

Hitting 200 Exaflops starting next year

Scientific and industrial progress comes from all corners of the world where the latest systems are used.

“We’re already seeing a combined 200 exaflops of AI on Grace Hopper supercomputers going into production in 2024,” Buck said.

They include the massive supercomputer JUPITER at Germany’s Jülich Center. It can deliver 93 exaflops of performance for AI training and 1 exaflop for HPC applications, while consuming only 18.2 megawatts of power.
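
For a sense of scale, the sketch below converts the JUPITER figures quoted above into gigaflops per watt and sets them beside the Henri result mentioned earlier. This is a back-of-the-envelope estimate from the headline numbers, not a measured Green500 result, and the function name is hypothetical.

```python
def gigaflops_per_watt(flops, watts):
    """Convert a flops rate and a power draw into gigaflops per watt."""
    return flops / watts / 1e9


# JUPITER figures quoted in the article: 1 exaflop for HPC at 18.2 megawatts.
jupiter_hpc = gigaflops_per_watt(1e18, 18.2e6)

# Henri's Green500 figure quoted in the article (a measured benchmark result).
HENRI_GREEN500 = 65.09

print(f"JUPITER (estimate from quoted figures): {jupiter_hpc:.1f} gigaflops per watt")
print(f"Henri (Green500, as quoted):            {HENRI_GREEN500:.2f} gigaflops per watt")
```

The estimate comes out at roughly 55 gigaflops per watt. The two numbers are only loosely comparable, since the article does not say how JUPITER's performance and power figures were measured.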

Chart of deployed performance of supercomputers using NVIDIA GPUs through 2024
Research centers are poised to switch on a tsunami of GH200 performance.

Based on Eviden’s BullSequana XH3000 liquid-cooled system, JUPITER will use the NVIDIA quad GH200 system architecture and NVIDIA Quantum-2 InfiniBand networks for climate and weather prediction, drug discovery, hybrid quantum computing and digital twins. JUPITER quad GH200 nodes will be configured with 864 GB of high-speed memory.

It’s one of several new supercomputers using Grace Hopper that NVIDIA announced at SC23.

The HPE Cray EX2500 system from Hewlett Packard Enterprise will use the quad GH200 to power many AI supercomputers coming online next year.

For example, HPE is using the quad GH200 to power OFP-II, an advanced HPC system in Japan shared by the University of Tsukuba and the University of Tokyo, as well as the DeltaAI system, which will triple the computing power of the US National Center for Supercomputing Applications.

HPE is also building the Venado system for Los Alamos National Laboratory, the first GH200 system to be deployed in the United States. In addition, HPE is building GH200 supercomputers in the Middle East, Switzerland and the UK.

Grace Hopper in Texas and Beyond

At the Texas Advanced Computing Center (TACC), Dell Technologies is building the Vista supercomputer with NVIDIA Grace Hopper and Grace CPU Superchips.

More than 100 global companies and organizations, including NASA Ames Research Center and TotalEnergies, have already purchased early access Grace Hopper systems, Buck said.

They join previously announced GH200 users such as SoftBank and the University of Bristol, as well as the massive Leonardo system with 14,000 NVIDIA A100 GPUs delivering 10 exaflops of AI performance for Italy’s Cineca consortium.

The view from the supercomputer center

Leaders from supercomputing centers around the world shared their plans and ongoing work on the latest systems.

“We have collaborated with MeteoSwiss, ECMWF as well as researchers from ETH EXCLAIM and NVIDIA’s Earth-2 project to create an infrastructure that will push the frontier of all dimensions of big data analysis and extreme scale,” Thomas Schulthess, head of the Swiss National Supercomputing Center, said of work on the Alps supercomputer.

“There are really impressive energy efficiency gains in our stacks,” Dan Stanzione, executive director of TACC, said of Vista.

It’s “really the stepping stone to move users from the types of systems we’ve done in the past to looking at this new Grace Arm CPU and Hopper GPU tightly coupled combination and … we want to scale out by probably a factor of 10 or 15 from what we deploy with Vista when we deploy Horizon in a couple of years,” he said.

Accelerating the Quantum Journey

Researchers are also using today’s accelerated systems to pave the way for tomorrow’s supercomputers.

In Germany, JUPITER will “revolutionize scientific research in climate, materials, drug discovery and quantum computing,” said Kristel Michielsen, who leads Jülich’s research group on quantum information processing.

“JUPITER’s architecture also enables seamless integration of quantum algorithms with parallel HPC algorithms, and this is mandatory for efficient quantum-HPC hybrid simulations,” she said.

CUDA Quantum drives progress

The special address also showed how NVIDIA CUDA Quantum, a platform for programming CPUs, GPUs and quantum computers (also known as QPUs), is advancing quantum computing research.

For example, researchers at BASF, the world’s largest chemical company, have pioneered a new hybrid quantum-classical method to simulate chemicals that can protect humans against harmful metals. They join researchers at Brookhaven National Laboratory and HPE who are separately pushing the boundaries of science with CUDA Quantum.

NVIDIA also announced a partnership with Classiq, a developer of quantum programming tools, to create a life sciences research center at the Tel Aviv Sourasky Medical Center, Israel’s largest teaching hospital. The center will use Classiq software and CUDA Quantum running on an NVIDIA DGX H100 system.

Separately, Quantum Machines will deploy the first NVIDIA DGX Quantum, a system using Grace Hopper Superchips, at the Israel National Quantum Center, which aims to drive advances across scientific fields. The DGX system will be coupled to a superconducting QPU from Quantware and a photonic QPU from ORCA Computing, both powered by CUDA Quantum.

Logos for NVIDIA CUDA Quantum partners

“In just two years, our NVIDIA quantum computing platform has amassed over 120 partners (above), a testament to its open, innovative platform,” Buck said.

Taken together, the work across many discovery areas reveals a new trend that combines accelerated data center-scale computing with NVIDIA’s full-stack innovation.

“Accelerated computing is paving the way for sustainable computing with advances that deliver not only great technology, but a more sustainable and impactful future,” he concluded.

See NVIDIA’s SC23 special address below.
