By Bill Gervasi with contributions from Joe O'Hare
The industry is in the process of adopting Compute Express Link, or CXL®, as the primary fabric for interconnecting a variety of processors, I/O resources, memory resources, and storage for data centers, hyperscalers, and similar computing clusters. Artificial intelligence is similarly soaring in popularity and demanding massive data sets for its learning processes. CXL’s simplicity of integration into systems over the standardized Peripheral Component Interconnect Express (PCIe) bus encourages its adoption in embedded applications as well. CXL is also likely to move into the next generation of automobiles as cars adopt data center technology. CXL allows resources to be mixed in a fashion that meets the needs of these systems while giving each end user the flexibility to deploy a different mix.
FLIT-MRAM is a memory component that operates in “Type 3” mode (CXL.io and CXL.mem) to provide a unique non-volatile memory solution for emerging CXL applications. It supports I/O transactions for discovery, configuration, telemetry gathering, and extended operations while also supporting direct memory accesses. FLIT-MRAM deploys a low pin count interface with higher performance and lower power than industry-standard interfaces such as LPDDR, while providing valuable data persistence. It enables a disaggregated persistent memory solution for compute.
CXL Memory Challenges
CXL memory expansion is helping address the needs of artificial intelligence applications and other in-memory compute programs that demand ever-increasing amounts of high-performance memory. The following concerns with DRAM-based CXL memory expansion modules impact the adoption of this feature:
- DRAM Volatility/pFail: DRAMs lose data on power failure, putting in-flight operations at risk at all times.
- DRAM Quality of Service: Access times to DRAM must accommodate interruptions for refreshes.
Everspin FLIT-MRAM provides a unique solution for CXL memory expansion that addresses each of these concerns.
Challenge: DRAM Volatility
The memory cell of a DRAM is a capacitor whose stored charge represents the on or off state of a data bit. If power is lost, DRAMs immediately lose all data content. This weakness of DRAM architecture has driven computer systems design for decades: a hierarchy of memory tiers must include non-volatile memory resources such as Solid State Drives (SSDs) for saving data permanently in case of power loss. Checkpointing from DRAM to SSD typically consumes 7% of system performance and power.
Spin-transfer Torque (STT) Magnetoresistive Random Access Memory (MRAM) is a non-volatile memory technology that utilizes the spin-transfer torque property, the manipulation of electron spin with a polarizing current, to set the magnetic state of a magnetic tunnel junction (MTJ) representing the binary 1’s and 0’s of a memory bit cell. This state is not lost on power failure, and data is immediately available when power is restored. In essence, an MRAM operates like a combination of DRAM and NAND, greatly simplifying system architecture and reducing unnecessary data movement, including checkpointing. Recovery time can be much faster because database reconstruction from previous transactions does not require multiple accesses to slow SSD resources.
FLIT-MRAM, as shown in Figure 1, is a flow control unit (FLIT) based solution. CXL FLITs are packets containing 64 data bytes (512 bits), matching the natural size of one processor cache line, so an MRAM core with a 512-bit wide memory port can transfer one FLIT payload on each clock cycle. The wide port allows a lower clock rate, reducing power and improving efficiency. FLIT-MRAM deploys a wide I/O memory device with centralized memory control logic, consolidating into one device the features usually scattered across many.
Figure 1: FLIT-MRAM Centralized Control Functions
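To make the FLIT-to-cache-line mapping concrete, here is a minimal sketch (my illustration, not a FLIT-MRAM interface definition) of the 64-byte payload that a 512-bit internal port can move in a single cycle:

```c
/* Illustrative only: one CXL FLIT data payload carries 64 bytes (512 bits),
 * the same size as a host cache line, so a 512-bit internal MRAM port can
 * accept or return one payload per clock. */
#include <stdint.h>

#define FLIT_DATA_BYTES 64u                  /* 512 bits */

typedef struct {
    uint8_t bytes[FLIT_DATA_BYTES];          /* one cache line = one FLIT data payload */
} flit_payload_t;

/* One wide internal transfer moves the whole line, instead of the many
 * narrow beats a DDRx burst would need. */
_Static_assert(sizeof(flit_payload_t) * 8 == 512, "payload must be 512 bits");
```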
Metadata is a hot topic in computer memory design. Metadata allows the system to store “hidden” information about memory regions, and the amount of metadata desired is increasing. As shown in Figure 2, FLIT-MRAM allows metadata to be added under the coverage of the same ECC codes that protect the data, and each bit of metadata added increases the die size by only 0.2%, so customers can define the number of bits they need for their application. FLIT-MRAM can support a robust 3-bit error correction scheme with a 6.4% die adder, i.e., better protection at lower cost than multi-device solutions.
Figure 2: FLIT-MRAM Array with ECC Coverage on Metadata
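As a rough illustration of that scaling (my arithmetic from the 0.2%-per-bit figure above, not an Everspin specification), the die-area adder grows linearly with the metadata width:

```latex
\[
\text{area adder} \approx n_{\text{meta}} \times 0.2\%
\quad\Rightarrow\quad
8\ \text{bits} \approx 1.6\%,\qquad
16\ \text{bits} \approx 3.2\%,\qquad
32\ \text{bits} \approx 6.4\%
\]
```

Read against the same per-bit scaling, the quoted 6.4% adder for the 3-bit correction scheme corresponds to roughly 32 additional stored bits per 512-bit line.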
Challenge: Memory Determinism
DRAM controllers include a “refresh” function that must recharge every single bit of the memory device more than 30 times every second, and refresh recovery time consumes 4.5% of available memory transfer bandwidth. At higher temperatures, between 85 °C and 95 °C, this penalty increases to 9%. FLIT-MRAM has no refresh requirement, making the memory interface deterministic and improving the QoS of the application.
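As a back-of-the-envelope check on those percentages (my arithmetic, assuming representative DDR-class timing of roughly 7.8 µs between refresh commands, tREFI, and roughly 350 ns to complete each one, tRFC; actual values vary by device and density):

```latex
\[
\frac{t_{\mathrm{RFC}}}{t_{\mathrm{REFI}}} \approx \frac{350\ \text{ns}}{7800\ \text{ns}} \approx 4.5\%,
\qquad
\frac{t_{\mathrm{RFC}}}{t_{\mathrm{REFI}}/2} \approx \frac{350\ \text{ns}}{3900\ \text{ns}} \approx 9\%
\]
```

The second figure reflects the halved refresh interval typically mandated above 85 °C; FLIT-MRAM pays neither penalty.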
CXL differs from traditional memory interfaces in another respect that impacts QoS: it supports full duplex operation, where reads and writes can occur simultaneously. Traditional DRAM interfaces, such as double data rate (DDR), are half duplex: switching from read to write and back requires inserting delay “bubbles” that impact performance and QoS. This impact is visualized in Figure 3, which compares FLIT-MRAM to low-power DDR (LPDDR) memory. At the lower left of the plot, under light application load, the latency of the CXL interface gives LPDDR a slight advantage. However, as the application load gets heavier, as is typical with multi-core processors, the half-duplex nature of LPDDR dominates and the access times increase dramatically. FLIT-MRAM leverages CXL’s full duplex nature to hide these penalties, and performance stays more constant even under heavy load.
Figure 3: Performance of FLIT-MRAM Versus LPDDR
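The toy model below is entirely illustrative (the timing numbers are assumptions, not LPDDR or FLIT-MRAM specifications); it shows why turnaround bubbles cost more as the read/write mix gets busier, while a full-duplex link is unaffected:

```c
/* Toy model: effective bandwidth of a half-duplex bus versus the fraction
 * of transfers that reverse direction. All numbers are assumptions chosen
 * only to illustrate the trend. */
#include <stdio.h>

int main(void) {
    const double burst_ns      = 10.0;  /* assumed time to move one 64-byte burst */
    const double turnaround_ns = 15.0;  /* assumed read<->write bus turnaround bubble */

    for (double mix = 0.0; mix <= 1.0; mix += 0.25) {
        /* mix = probability that a transfer changes direction from the previous one */
        double half_duplex = burst_ns / (burst_ns + mix * turnaround_ns);
        double full_duplex = 1.0;       /* reads and writes flow on separate paths */
        printf("direction changes: %3.0f%%  half-duplex: %3.0f%%  full-duplex: %3.0f%%\n",
               mix * 100.0, half_duplex * 100.0, full_duplex * 100.0);
    }
    return 0;
}
```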
General Purpose Acceleration
CXL offers another very useful feature: CXL.io, a protocol based on the PCIe interface. This back channel into the controller is valuable because commands not supported over DDR become possible. One example is the ability to perform a memory fill operation. Naturally, the CXL.io back channel allows many new functions to be added over time; compression, encryption, processing-in-memory, and so forth may be added to the roadmap or into customer-specific solutions.
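As a sketch of what such an extended operation could look like, the structure below imagines a memory fill request submitted through the device’s command mailbox over CXL.io. The opcode, field names, and layout are hypothetical assumptions for illustration, not part of the CXL specification or the FLIT-MRAM command set:

```c
/* Hypothetical memory-fill request carried over the CXL.io back channel.
 * Everything here is illustrative; a real device would define its own
 * vendor-specific opcode and payload layout. */
#include <stdint.h>

#define OPCODE_MEM_FILL 0xC000u     /* assumed vendor-defined opcode */

typedef struct {
    uint64_t start_dpa;             /* device physical address to begin filling */
    uint64_t length_bytes;          /* bytes to fill, a multiple of 64 */
    uint8_t  pattern[64];           /* one cache line repeated across the range */
} __attribute__((packed)) mem_fill_request_t;

/* The host submits the request once; the controller writes the pattern
 * internally, so no fill data ever crosses the CXL.mem data path. */
```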
FLIT-MRAM for Artificial Intelligence
The proposed FLIT-MRAM provides a simple expansion mechanism for AI solutions, and its non-volatility can be leveraged advantageously. FLIT-MRAM need not replace all AI memory, which commonly uses HBM to hold its data sets; instead, it can sit alongside HBM, with the application partitioning “checkpoint” data into the FLIT-MRAM to give the overall solution the required resilience to power failure. With a low pin count of 32 active signals and a reasonably long reach allowing 100 mm between processor and FLIT-MRAM, incremental memory may be added to increase the total memory footprint and eliminate checkpointing. Since each FLIT-MRAM operates independently, any or all may be put into deep sleep or turned off to save power when not in use.
Figure 4: AI Processor with HBM and FLIT-MRAM
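The fragment below is a minimal sketch of how an application might place checkpoint state in a FLIT-MRAM-backed region, assuming the expansion memory is surfaced to Linux as a DAX character device; the device path and sizes are assumptions for illustration only:

```c
/* Minimal sketch: map a persistent FLIT-MRAM-backed region and keep only
 * checkpoint state there, leaving HBM for the working data set.
 * "/dev/dax0.0" and the 1 GiB size are assumptions for illustration. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define CKPT_BYTES (1ull << 30)                /* 1 GiB checkpoint window */

int main(void) {
    int fd = open("/dev/dax0.0", O_RDWR);      /* assumed device node for the region */
    if (fd < 0) { perror("open"); return 1; }

    void *ckpt = mmap(NULL, CKPT_BYTES, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (ckpt == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    /* The training loop writes model/optimizer snapshots into ckpt. Because
     * the backing media is non-volatile, a power loss does not force a
     * replay from SSD; caches still need flushing (e.g., clwb + sfence) at
     * each checkpoint boundary to make the stores durable. */

    munmap(ckpt, CKPT_BYTES);
    close(fd);
    return 0;
}
```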
FLIT-MRAM for Automotive Applications
Automotive electronics have evolved into critical, data-center-class applications as cars become more networked, incorporate more sensors and controls, manage multiple high definition displays, and increase their artificial intelligence capabilities. Car electronics designs are already migrating to PCIe as the fabric connecting multiple high performance system-on-a-chip (SoC) processors. Each SoC typically has nine LPDDR DRAMs on a 144-bit data bus, with no shared memory. A possible redesign replaces the nine LPDDR devices with three FLIT-MRAMs and optionally adds more to the PCIe fabric as shared memory resources to simplify passing data between the processors. A car with FLIT-MRAM as its memory resource enables instant-on, a feature that can literally save lives.
Figure 5: Automotive Solution with LPDDR Versus FLIT-MRAM
Conclusion
FLIT-MRAM rethinks the architecture of DRAM solutions by leveraging the increasingly popular CXL infrastructure. Replacing the DRAM DDRx physical interface with a CXL interface allows memory transactions to match the 64-byte cache lines of processors via the 64-byte FLIT payload of CXL.
FLIT-MRAM brings transparent non-volatility to the processing architecture, improving both quality of service and data reliability.
Potential applications include notebook computers, AI accelerators, automobiles, and industrial control systems, but increasingly, even server motherboard suppliers are considering whether this form of memory expansion helps them scale the memory wall and improve system performance.