Boost HPC workloads with Micron DDR5 and 4th Gen AMD EPYC processors 

Krishna Yalamanchi, Sudharshan Vazhkudai | November 2022

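AMD and Micron collaborate with the goal of delivering a best-in-class user experience on client and data center platforms. To that end, the two companies have a joint server lab in Austin, where we work to reduce the time needed to validate server memory and run joint workload testing during validation and launch. In this blog, we look at benchmark results for some common HPC workloads using Micron DDR5 data center memory and 4th Gen AMD EPYC™ Processors, as both of these products are shipping now.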

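High-performance computing (HPC) workloads have historically been the domain of some of the world's fastest supercomputers. These are often large-scale, data-intensive workloads that are split into millions of operations running in parallel and that use terabytes of data. These complex workloads are dedicated to solving some of humankind’s most challenging problems: weather and climate simulations; seismic modeling; chemical, physics and biological analysis; and more.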

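With advances in computer architectures, these workloads are increasingly hosted on very large "scale-out" clusters of high-performance servers. These clusters require the latest and greatest compute, fabric, memory and storage infrastructure to address the scalability, low latency and performance needs of such critical workloads. While server CPUs have grown in performance and throughput, the bandwidth provided by DDR4 memory has become a bottleneck over the past few years. There is not enough memory bandwidth to feed the growing number of high-performance cores.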

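Micron DDR5 memory and the new AMD Zen 4 server architectures featuring 4th Gen AMD EPYC Processors change that. Now, server CPUs and memory can be better balanced, unlocking performance and efficiency for the most demanding workloads. DDR5 memory helps organizations reach these insights faster, whether on premises or in the cloud. Consider some of the proof points produced when benchmarking Micron DDR5 with the latest AMD Zen 4 96-core CPU and industry-standard HPC workloads. All of our test results showed a performance improvement of roughly two times.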

Two times the memory bandwidth with Micron DDR5 and 4th Gen AMD EPYC processors using STREAM

STREAM¹ is a simple, well-known benchmark for measuring memory bandwidth in HPC computers. It captures peak memory bandwidth for HPC systems.
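
As a rough illustration of how a STREAM-style bandwidth figure is derived, the sketch below applies STREAM's byte-counting convention for 8-byte elements to a timed kernel pass. The function name and the 0.16-second timing are assumptions made for illustration only; the post does not say which kernel its headline number comes from, and the timing is not a measured value.

```python
# Bytes moved per array element by each STREAM kernel, following STREAM's
# counting convention for 8-byte (double-precision) elements.
BYTES_PER_ELEMENT = {
    "copy": 16,   # c[i] = a[i]            -> 1 read + 1 write
    "scale": 16,  # b[i] = q * c[i]        -> 1 read + 1 write
    "add": 24,    # c[i] = a[i] + b[i]     -> 2 reads + 1 write
    "triad": 24,  # a[i] = b[i] + q * c[i] -> 2 reads + 1 write
}


def stream_bandwidth_gbs(kernel: str, n_elements: int, seconds: float) -> float:
    """Return bandwidth in GB/s (10**9 bytes per second) for one kernel pass."""
    bytes_moved = BYTES_PER_ELEMENT[kernel] * n_elements
    return bytes_moved / seconds / 1e9


# Vector size taken from the footnote; the elapsed time is hypothetical.
n = 2_500_000_000
elapsed = 0.16  # seconds for one Triad pass (assumed, not measured)
print(f"{stream_bandwidth_gbs('triad', n, elapsed):.0f} GB/s")
```

With a vector of 2.5 billion elements, one Triad pass moves 60 GB, so a sustained rate in the high 300s of GB/s corresponds to a pass time of roughly 0.16 seconds.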

Software stack used for this workload

  • Alma 9 Linux kernel 5.14
  • STREAM.f  11-29-2021 release

[Figure: bar graph showing that Micron DDR5 provides more bandwidth]

Test setup

  • DDR4 system: 3rd Gen AMD EPYC Processors with 64 cores and 3.7 GHz; the DDR4 3200 MHz system² is fully populated with 64GB RDIMMs
  • DDR5 system: 4th Gen AMD EPYC Processors with 96 cores and 3.7 GHz; the DDR5 4800 MHz system³ is fully populated with 64GB RDIMMs

Test results

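  • Double the memory bandwidth: 378 GB/s for a single-socket DDR5 system
  • This means customers can run larger artificial intelligence/machine learning (AI/ML) projects, or perform more HPC computation, using the increased memory bandwidth of DDR5.

[Figure: bar graph showing the relative gain of DDR5 versus DDR4]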
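
For context, the measured figure can be set against a theoretical per-socket peak. The sketch below is a back-of-the-envelope estimate under assumptions that are not stated in this post: 8 memory channels per socket for the DDR4 platform and 12 for the DDR5 platform (taken from the public specifications of the EPYC 7763 and EPYC 9654 named in the footnotes), a 64-bit data path per channel, and the quoted 3200 and 4800 speeds read as mega-transfers per second. The function name is hypothetical.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int,
                       bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s for one socket.

    bytes_per_transfer=8 assumes a 64-bit data path per channel.
    """
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


# Channel counts are assumptions based on public processor specifications.
ddr4_peak = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=3200)
ddr5_peak = peak_bandwidth_gbs(channels=12, mega_transfers_per_s=4800)

measured_ddr5 = 378.0  # GB/s, single-socket DDR5 result reported above

print(f"DDR4 theoretical peak: {ddr4_peak:.1f} GB/s")
print(f"DDR5 theoretical peak: {ddr5_peak:.1f} GB/s")
print(f"Peak ratio DDR5/DDR4:  {ddr5_peak / ddr4_peak:.2f}")
print(f"Measured DDR5 as a share of peak: {measured_ddr5 / ddr5_peak:.0%}")
```

Under these assumptions, the gain comes from both the higher transfer rate and the larger number of channels per socket.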

Weather research and forecasting (WRF)⁴ runs two times faster with Micron DDR5

This HPC workload code is used by the weather and climate community, and the model is widely used for meteorological applications. WRF typically performs well on traditional HPC architectures that support high floating-point processing, high memory bandwidth and a low-latency network. For this effort, the Continental United States (CONUS) at 2.5-km lateral resolution was chosen.

Software stack used for this workload 

  • Alma 9 Linux kernel 5.14 
  • WRF 2.3.5 & 4.3.3 
  • Open MPI v4.1.1

Test setup

  • DDR4 system: 3rd Gen AMD EPYC Processors with 64 cores and 3.7 GHz; the DDR4 3200 MHz system² is fully populated with 64GB RDIMMs
  • DDR5 system: 4th Gen AMD EPYC Processors with 96 cores and 3.7 GHz; the DDR5 4800 MHz system³ is fully populated with 64GB RDIMMs

Test results

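  • We were able to execute 1.3567 time steps per second using Micron DDR5 and 4th Gen AMD EPYC Processors, compared with 2.8533 time steps per second.
  • Faster execution times mean weather forecasters can choose larger datasets or run more models. Both efforts lead to improved forecasts.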

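OpenFOAM⁵ with Micron DDR5 runs two times faster

OpenFOAM is an open-source HPC workload for computational fluid dynamics (CFD), used in a wide variety of industries to reduce development time and costs. It simulates physical interactions in applications ranging from consumer product design to aerospace design. One simulation included in the dataset features a motorcycle turbulence simulation. For this model, OpenFOAM calculates steady air flow around a motorcycle and rider. OpenFOAM load-balances the computation according to the number of processes specified by the user, and then decomposes the mesh into parts for each process to solve. Once the solve is complete, the mesh and solution are recomposed into a single domain.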
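
The decompose, solve and recompose flow described above can be pictured with a small conceptual sketch. This is not OpenFOAM's implementation or API: the function names are hypothetical, the "mesh" is just a flat range of cell indices, and the per-process solve is a stand-in that returns a dummy value per cell. It only shows how work might be split evenly across a user-chosen number of processes and merged back into one domain.

```python
from typing import Dict, List


def decompose(n_cells: int, n_procs: int) -> List[range]:
    """Split cell indices 0..n_cells-1 into n_procs contiguous, near-equal parts."""
    base, extra = divmod(n_cells, n_procs)
    parts, start = [], 0
    for rank in range(n_procs):
        # The first `extra` ranks take one additional cell to balance the load.
        size = base + (1 if rank < extra else 0)
        parts.append(range(start, start + size))
        start += size
    return parts


def solve_part(cells: range) -> Dict[int, float]:
    """Stand-in for the per-process solve: one dummy value per cell."""
    return {cell: cell * 0.5 for cell in cells}


def reconstruct(partials: List[Dict[int, float]]) -> List[float]:
    """Merge per-process results back into a single, ordered domain."""
    merged: Dict[int, float] = {}
    for part in partials:
        merged.update(part)
    return [merged[cell] for cell in sorted(merged)]


# Hypothetical sizes chosen for illustration: 1,000 cells over 96 processes.
parts = decompose(n_cells=1000, n_procs=96)
solution = reconstruct([solve_part(p) for p in parts])
print(len(parts), min(len(p) for p in parts), max(len(p) for p in parts))
```

In a real run each part would be solved by a separate MPI process, and the split would follow the mesh geometry rather than a simple index range.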

Software stack used for this workload

  • OpenFOAM CFD Software (v8) with motorBike mesh size of 600 x 240 x 240
  • Alma 9 Linux kernel 5.14 
  • Open MPI v4.1.1

Test setup

  • DDR4 system: 3rd Gen AMD EPYC Processors with 64 cores and 3.7 GHz; the DDR4 3200 MHz system² is fully populated with 64GB RDIMMs
  • DDR5 system: 4th Gen AMD EPYC Processors with 96 cores and 3.7 GHz; the DDR5 4800 MHz system³ is fully populated with 64GB RDIMMs

Test results

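Our tests demonstrated a 2.4 times relative gain for OpenFOAM, which is regarded as one of the top five HPC software platforms and has a large open-source community. Used widely in universities and R&D centers, the software's high parallelism takes advantage of both memory (increased bandwidth) and CPU features (such as denser cores).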
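
The relative gain can be recomputed from the runtimes listed in footnote 5. The short sketch below does that arithmetic, taking relative gain to mean DDR4 runtime divided by DDR5 runtime; that definition and the function name are assumptions, since the post does not spell out how the figure was derived.

```python
# Runtimes in seconds from footnote 5: (DDR4 system, DDR5 system).
runtimes_s = {
    "1004040": (1144, 478),
    "1084646": (1633, 698),
    "1305252": (2522, 1091),
}


def relative_gain(ddr4_seconds: float, ddr5_seconds: float) -> float:
    """Speedup of the DDR5 system over the DDR4 system (assumed definition)."""
    return ddr4_seconds / ddr5_seconds


for variation, (ddr4, ddr5) in runtimes_s.items():
    print(f"{variation}: {relative_gain(ddr4, ddr5):.2f}x")
```

The three variations come out between roughly 2.3 and 2.4 times, with the first one rounding to the 2.4 times figure quoted above.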

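Molecular dynamics⁶ with Micron DDR5 runs two times faster

CP2K is an open-source quantum chemistry tool that can be used for many applications, including simulations of solid-state biological systems. CP2K provides a general framework for different modeling methods, such as DFT, using the mixed Gaussian and plane waves approaches GPW and GAPW. The example we look at is linear-scaling density functional theory (DFT) of water (H2O), consisting of 6,144 atoms in a 39 cubic angstrom box (2,048 water molecules in total).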

Software stack used for this workload

  • H2O-DFT-LS.NREP4 & H2O-DFT-LS
  • Alma 9 Linux kernel 5.14

Test setup

  • DDR4 system: 3rd Gen AMD EPYC Processors with 64 cores and 3.7 GHz; the DDR4 3200 MHz system² is fully populated with 64GB RDIMMs
  • DDR5 system: 4th Gen AMD EPYC Processors with 96 cores and 3.7 GHz; the DDR5 4800 MHz system³ is fully populated with 64GB RDIMMs

Test results

Our tests demonstrated a 2.1 times relative gain for molecular dynamics, and this scales well with more cores and more memory bandwidth.

Summary

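The results above are just the start, and just a few examples of HPC workloads. The ability to better match high-performance, high-bandwidth memory with the incredible performance delivered by new server processors, such as 4th Gen AMD EPYC Processors, will be a watershed for HPC customers. We can expect to see more proof points like these, showing how enterprise data centers and cloud operators can use Micron DDR5 on these new platforms to unlock new levels of performance and efficiency. We look forward to sharing these with you in the coming months. To learn more about Micron DDR5 and data center workload benefits, visit Micron.com/ddr5.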

1. Our STREAM benchmark setup used a vector size of 2.5 billion (see "STREAM Benchmark - AMD") and was run on a one-CPU system
2. The AMD DDR4 system is an AMD EPYC 7763 with 64 cores and DDR4-3200 MHz, fully populated with 64GB RDIMMs
3. The AMD DDR5 system is an AMD EPYC 9654 with 96 cores and DDR5-4800 MHz, fully populated with 64GB RDIMMs
4. WRF with a 12.5-km CONUS, with storage I/O counted, ran for 929 seconds on the DDR4 system and 287 seconds on the DDR5 system. The above example is from a WRF 2.5-km CONUS that ran 2.8533 time steps per second and 1.3567 time steps per second.
5. For OpenFOAM, we ran three variations:
5a. 1004040 runtimes = 1,144 seconds on the DDR4 system and 478 seconds on the DDR5 system
5b. 1084646 runtimes = 1,633 seconds on the DDR4 system and 698 seconds on the DDR5 system
5c. 1305252 runtimes = 2,522 seconds on the DDR4 system and 1,091 seconds on the DDR5 system
6. Molecular dynamics workload ran for 2,519 seconds on the DDR4 system and for 1,242 seconds on the DDR5 system

Sr Manager, Ecosystem Enablement

Krishna Yalamanchi

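Krishna is a senior ecosystem development manager focused on DDR5 and CXL solutions. Previously, Krishna led the SAP HANA migration for Intel IT and launched 3rd and 4th Gen Intel Xeon for SAP workloads through their ecosystem of SI partners, OEMs and cloud service providers.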

Director, Workload Analytics

Sudharshan Vazhkudai

Dr. Sudharshan S. Vazhkudai is the Director of System Architecture / Workload Analytics at Micron. He leads a team spread across Austin and Hyderabad, India, focusing on understanding the composability of the memory/storage (DDR, CXL, HBM and NVMe) product hierarchy and optimizing system architectures for data center workloads.