Enabling New Server Architectures With The CXL Interconnect

The ever-growing demand for higher-performance computing is motivating the exploration of new compute offload architectures for the data center. Artificial intelligence and machine learning (AI/ML) workloads are just one example of the increasingly complex and demanding workloads pushing data centers away from the classic server computing architecture. These demanding workloads can benefit greatly from lower-latency, coherent memory architectures. This is where the Compute Express Link (CXL) standard comes in.

CXL was first introduced in 2019 and has emerged as an enabling technology for interconnecting computing resources. It provides a means of interconnecting, in a cache-coherent manner, a wide range of computing elements including CPUs, GPUs, systems on chip (SoCs), memory, and more. This is particularly compelling in a world of heterogeneous computing, where purpose-built accelerators offload targeted workloads from the CPU. As workloads become increasingly challenging, more and more memory resources are deployed with accelerators. CXL provides a means to share those memory resources across CPUs and accelerators for greater performance, efficiency, and improved total cost of ownership (TCO).

CXL adopted the ubiquitous PCIe standard for its physical layer protocol, harnessing the standard's tremendous industry momentum. At the time CXL was first launched, PCIe 5.0 was the latest standard, and CXL 1.0, 1.1 and the subsequent 2.0 generation all used PCIe 5.0's 32 GT/s signaling. CXL 3.0 was released in 2022 and adopted PCIe 6.0 as its physical interface. CXL 3.0, like PCIe 6.0, uses PAM4 signaling to boost the data rate to 64 GT/s.
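As a rough illustration of what the jump from 32 GT/s to 64 GT/s means for a typical x16 link, the short sketch below computes raw per-direction bandwidth from the signaling rate and lane count. It deliberately ignores encoding and FLIT/FEC overhead, so the figures are approximate upper bounds rather than measured throughput.

```c
#include <stdio.h>

/* Raw per-direction bandwidth of a PCIe/CXL link in GB/s:
 * (transfer rate in GT/s) x (lanes) / 8 bits per byte.
 * Protocol overheads (128b/130b encoding on PCIe 5.0, FLIT/FEC on
 * PCIe 6.0 / CXL 3.0) are ignored, so these are upper bounds. */
static double raw_gbps(double gt_per_s, int lanes)
{
    return gt_per_s * lanes / 8.0;
}

int main(void)
{
    printf("CXL 1.x/2.0 (PCIe 5.0, 32 GT/s) x16: ~%.0f GB/s per direction\n",
           raw_gbps(32.0, 16));   /* ~64 GB/s */
    printf("CXL 3.0     (PCIe 6.0, 64 GT/s) x16: ~%.0f GB/s per direction\n",
           raw_gbps(64.0, 16));   /* ~128 GB/s */
    return 0;
}
```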

To support a broad range of use cases, the CXL standard defines three protocols: CXL.io, CXL.cache and CXL.mem. CXL.io provides a non-coherent load/store interface for I/O devices and is used for discovery, enumeration, and register accesses. CXL.cache enables devices such as accelerators to efficiently access and cache host memory for improved performance. With CXL.io plus CXL.cache, the following usage model is possible: an accelerator-based NIC (a Type 1 device in CXL parlance) can coherently cache host memory on the accelerator, perform networking or other functions, and then pass ownership of the memory to the CPU for additional processing.
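Because CXL.io rides on standard PCIe configuration mechanisms, a CXL device can be discovered by walking a device's PCIe extended capability list and looking for a Designated Vendor-Specific Extended Capability (DVSEC) carrying the CXL Consortium vendor ID. The sketch below is a minimal, Linux-only illustration that reads a device's config space from sysfs; the sysfs path, the DVSEC capability ID (0x0023), and the CXL vendor ID (0x1E98) reflect the author's reading of the PCIe and CXL specifications and should be verified against the current documents before use.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define CFG_SPACE_SIZE   4096
#define EXT_CAP_START    0x100
#define CAP_ID_DVSEC     0x0023   /* PCIe Designated Vendor-Specific Ext. Cap. */
#define CXL_VENDOR_ID    0x1E98   /* CXL Consortium vendor ID (assumed) */

static uint32_t read_dw(const uint8_t *cfg, unsigned off)
{
    return (uint32_t)cfg[off] | ((uint32_t)cfg[off + 1] << 8) |
           ((uint32_t)cfg[off + 2] << 16) | ((uint32_t)cfg[off + 3] << 24);
}

int main(int argc, char **argv)
{
    /* Expects a path like /sys/bus/pci/devices/0000:3a:00.0/config
     * (reading the full 4 KB of config space usually requires root). */
    if (argc != 2) {
        fprintf(stderr, "usage: %s <pci-config-file>\n", argv[0]);
        return 1;
    }

    uint8_t cfg[CFG_SPACE_SIZE] = {0};
    FILE *f = fopen(argv[1], "rb");
    if (!f) {
        perror("open");
        return 1;
    }
    size_t n = fread(cfg, 1, sizeof(cfg), f);
    fclose(f);
    if (n < EXT_CAP_START) {
        fprintf(stderr, "short read (%zu bytes); no extended config space\n", n);
        return 1;
    }

    /* Walk the PCIe extended capability list starting at offset 0x100. */
    unsigned off = EXT_CAP_START;
    while (off && off < CFG_SPACE_SIZE) {
        uint32_t hdr    = read_dw(cfg, off);
        uint16_t cap_id = hdr & 0xFFFF;
        unsigned next   = (hdr >> 20) & 0xFFC;

        if (cap_id == CAP_ID_DVSEC) {
            uint32_t dvsec1 = read_dw(cfg, off + 4);  /* DVSEC header 1 */
            uint32_t dvsec2 = read_dw(cfg, off + 8);  /* DVSEC header 2 */
            if ((dvsec1 & 0xFFFF) == CXL_VENDOR_ID)
                printf("CXL DVSEC found at 0x%03x (DVSEC ID 0x%04x)\n",
                       off, dvsec2 & 0xFFFF);
        }
        if (next <= off)   /* guard against malformed capability lists */
            break;
        off = next;
    }
    return 0;
}
```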

The combination of the CXL.io, CXL.cache and CXL.mem protocols enables a further compelling use case. With these three protocols, a host and an accelerator with attached memory (a Type 2 device) can share memory resources in a cache-coherent manner. This provides enormous architectural flexibility by giving processors, whether hosts or accelerators, access to greater capacity and memory bandwidth across their combined memory resources. One application that benefits from lower-latency coherent access to CPU-attached memory is natural language processing (NLP). NLP models require an amount of memory that is typically larger than can fit on a single accelerator card.
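From the host software's point of view, CXL-attached memory that a device exposes to the operating system typically appears on Linux as an additional, often CPU-less, NUMA node, so existing NUMA-aware allocation interfaces can place data in it. The sketch below uses libnuma to allocate and touch a buffer on a specific node; the node number is purely a placeholder that depends on the platform, and whether device memory is surfaced this way depends on how the BIOS and OS are configured.

```c
/* Build with: gcc -o cxl_alloc cxl_alloc.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    /* The NUMA node backing the CXL memory is platform-specific; pass it
     * in (e.g. "./cxl_alloc 2" if node 2 is the CXL-attached memory). */
    if (argc != 2) {
        fprintf(stderr, "usage: %s <numa-node>\n", argv[0]);
        return 1;
    }
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    int node = atoi(argv[1]);
    if (node < 0 || node > numa_max_node()) {
        fprintf(stderr, "node %d out of range\n", node);
        return 1;
    }

    size_t size = 1UL << 30;                   /* 1 GiB working buffer */
    void *buf = numa_alloc_onnode(size, node); /* bind pages to that node */
    if (!buf) {
        fprintf(stderr, "allocation on node %d failed\n", node);
        return 1;
    }

    memset(buf, 0, size);  /* touch the pages so they are actually placed */
    printf("1 GiB allocated and faulted in on NUMA node %d\n", node);

    numa_free(buf, size);
    return 0;
}
```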

Rambus offers a CXL 2.0 Interface Subsystem (Controller and PHY) as well as a CXL 3.0 PHY (PCIe 6.0 PHY) that are ideal for performance-intensive devices such as AI/ML accelerators. These solutions build on over 30 years of high-speed signaling expertise and extensive experience with PCIe and CXL.

Lou Ternullo is senior director of product marketing at Rambus.
