W04 Rapid Design Space Explorations of Novel Hardware Solutions: from Atoms to Applications

Start
End
Organiser
Michael Niemier, University of Notre Dame, United States
Organiser
Ian O'Connor, École Centrale de Lyon, France

Organisers: Michael Niemier, Ian O'Connor, Siddharth Joshi, and Lorenzo Ciampolini (mniemier [at] nd [dot] edu) – United States & France
 

At a high level, there is a need to integrate physics-aware models of non-volatile memories (NVMs), thermal properties of silicon and memory devices, and advances in interconnect and packaging solutions (e.g., chiplets) with system-level architectural exploration. Device-centric work can be informed by AI-guided materials discovery efforts and tooling, while compilers-centric work will provide paths to programmability for novel hardware solutions. The resulting impact of this cyber-infrastructure would be manifold. (1) Researchers at lower levels of the design stack can use said tools to evaluate the efficacy of novel materials/devices on application-level workloads, thereby prioritizing efforts in said space; (2) researchers at higher levels of the design stack can (a) be informed by the practical capabilities of novel hardware solutions (which can subsequently guide research at the architectural and/or algorithmic levels) and (b) use said tools to sweep a range of optimistic and pessimistic assumptions for novel devices to more rapidly identify “thresholds” for figures of merit (FOM) that are ultimately required to positively impact application-level performance from the top down.  The workshop will capture the scope of this vast design space, identify existing infrastructure from the research community that may address the above challenges, identify gaps and/or ways to link seemingly disparate design tools to address said gaps, and simultaneously identify new ways for the design automation community to focus research that spans from the atomistic to the application level.

More technically, this workshop will capture how modeling various aspects of NVMs, 2.5D/3D interconnects, and architectures, including thermal, electrical, and analytical models, can be integrated into design space exploration (DSE) tools such as Timeloop and ZigZag.  Talks will discuss how to enhance existing DSE frameworks to facilitate modeling for next-generation accelerator use (e.g., thermally/chiplet-aware map spaces) to best meet the needs of future users. Among others, presentations will consider how to integrate/refine analytical models for novel memory systems across various abstraction levels, and how models can be calibrated with detailed device, interconnect, and thermal modeling to inform the toolset across abstraction layers. (The latter will also encompass emerging research threads such as AI-guided materials discovery to accelerate the development of logic, memory, and interconnect technologies that can achieve key performance indicators that are necessary to satisfactorily address the compute requirements of emerging workloads.)  We will also consider how cycle-accurate architectural simulators could be employed in conjunction with Timeloop/ZigZag to study chiplet-based designs such as a highly multi-threaded CPU, a high-end GPU, and/or a neural engine, as well as optimal data mapping strategies. Compilers-based infrastructure will map compute kernels from machine learning (ML) APIs such as TensorFlow and PyTorch and can drive research from the bottom-up or top-down.
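To make the level of abstraction concrete, the kind of first-order analytical cost model that Timeloop- and ZigZag-style DSE tools build upon can be sketched as follows. This is a minimal illustration only: all per-access energies, bandwidths, and the example layer sizes are hypothetical placeholders, not calibrated technology data.

```python
# First-order, roofline-style analytical cost model in the spirit of
# Timeloop/ZigZag. All parameters below are illustrative placeholders.

def layer_cost(macs, dram_bytes, sram_bytes,
               peak_macs_per_cycle=256,
               dram_bw_bytes_per_cycle=32,
               e_mac_pj=0.5,
               e_sram_pj_per_byte=1.0,
               e_dram_pj_per_byte=100.0):
    """Return (latency_cycles, energy_pJ) for one mapped layer."""
    compute_cycles = macs / peak_macs_per_cycle
    memory_cycles = dram_bytes / dram_bw_bytes_per_cycle
    # Roofline-style bound: the slower of compute and off-chip traffic dominates.
    latency = max(compute_cycles, memory_cycles)
    energy = (macs * e_mac_pj
              + sram_bytes * e_sram_pj_per_byte
              + dram_bytes * e_dram_pj_per_byte)
    return latency, energy

# Example: a 1M-MAC layer under two mappings. The "tiled" mapping reuses
# data on-chip, cutting DRAM traffic and shifting the layer from
# memory-bound to compute-bound.
naive = layer_cost(macs=1_000_000, dram_bytes=400_000, sram_bytes=1_200_000)
tiled = layer_cost(macs=1_000_000, dram_bytes=50_000, sram_bytes=1_200_000)
```

Real DSE frameworks replace these scalars with multi-level memory hierarchies and search over the mapping space; the sketch only shows why reducing off-chip traffic is the dominant lever for both latency and energy.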

The workshop will architect a path toward an infrastructure that will deliver an enhanced, extensible analytical modeling toolset, validated models, and actionable design insights. Said frameworks will afford the academic community at large, as well as industrial partners working at all levels of the design stack, the capability to quantitatively evaluate and co-design next-generation memory systems with advanced workloads.
 

W04.1 Design Space Exploration Frameworks

Session Start
Session End
Session chair
Ian O'Connor, École Centrale de Lyon, France
Presentations

W04.1.1 Workshop Introduction, Overview, and Welcome

Start
End
Speaker
Ian O'Connor, École Centrale de Lyon, France
Speaker
Michael Niemier, University of Notre Dame, United States

W04.1.2 Exploiting analytical modeling for efficient deployment of emerging AI workloads on diverse accelerator hardware

Start
End
Speaker
Arne Symons, KU Leuven, Belgium

As AI models continue to evolve, efficiently mapping them onto accelerator hardware becomes increasingly important and increasingly complex. This talk gives an accessible introduction to the fundamentals of AI accelerator mapping and hardware modeling, showing how performance, energy, and memory efficiency depend on the interaction between workload structure, hardware architecture, and execution strategy. We will outline analytical methods that make these trade-offs visible and support systematic design space exploration. We use these foundations to look at two timely directions: emerging sequence workloads such as state space models, and novel accelerator platforms such as AMD's AIE NPU platform. Together, these examples illustrate how mapping and modeling techniques can help bridge established accelerator principles with the requirements of new workloads and new hardware targets, while also connecting naturally to practical deployment flows through an MLIR-based compiler interface.

Hypothesizing Autonomous Accelerator Design

Start
End
Speaker
Zhiru Zhang, Cornell University, United States

Computing is undergoing a fundamental transition, with performance and efficiency gains increasingly driven by specialized accelerators. Yet a longstanding disconnect remains between how these accelerators are designed, how they are modeled and characterized, and how they are programmed. This gap slows hardware innovation, complicates the software stack, and makes accelerators far harder to evolve than the rapidly changing applications they are meant to serve. While increasingly capable coding agents can help alleviate some of these challenges, many key pieces are still missing to truly close this loop. In this talk, I will share lessons from our recent work on (1) workload mapping for emerging accelerator architectures, (2) abstractions that help unify accelerator design and programming, and (3) agentic approaches to compiler construction. I will also discuss how these directions may collectively move us closer to a future of more autonomous accelerator design.

W04.1.3 Top-Down Analysis via Integrated Compilers Frameworks

Start
End
Speaker
Jeronimo Castrillon, TU Dresden, Germany

W04.1.4 Memory Key Performance Indicators - From Materials to Array-Level Analysis with Ferroelectrics

Start
End
Speaker
Michael Niemier, University of Notre Dame, United States
Speaker
Asif Khan, Georgia Institute of Technology, United States

Ferroelectric memory technologies offer exciting opportunities for future low-energy computing, but realizing their full potential requires a clear connection between device-level properties and system-level performance. We present a framework for evaluating how ferroelectric memory key performance indicators (KPIs) (e.g., repeatability, retention, endurance, read disturb, and conductance-state separation) propagate to array- and application-level figures of merit including area, latency, energy, and accuracy.

Our approach combines array-level modeling, including peripheral-circuit overheads, with physics-based device models and experimental data to quantify both architectural tradeoffs and application-facing impact. This analysis not only helps identify where ferroelectric devices are most advantageous, but also creates a feedback path from workloads and hardware mapping back to the device community by revealing which material and device targets are most consequential in practice. In particular, for AI-relevant workloads, this co-design view helps expose how improvements in one device dimension may shift constraints elsewhere, underscoring the need for quantitative benchmarking across levels of abstraction.
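The flavor of this KPI-to-figure-of-merit propagation can be illustrated with a minimal sketch. The cell area, per-cell read energy, and peripheral overheads below are hypothetical placeholders, not measured ferroelectric data:

```python
# Sketch of propagating device-level KPIs to array-level figures of merit.
# All device and peripheral parameters are illustrative placeholders.

def array_fom(rows, cols, cell_area_um2, e_cell_read_pj,
              periphery_area_overhead=0.4,
              e_periphery_pj_per_read=5.0):
    """Return (total_area_um2, energy_per_row_read_pJ) for one subarray."""
    cell_array_area = rows * cols * cell_area_um2
    # Peripheral circuits (sense amps, drivers, decoders) add fixed overheads.
    total_area = cell_array_area * (1 + periphery_area_overhead)
    # One row activation reads `cols` cells plus per-access peripheral energy.
    e_row_read = cols * e_cell_read_pj + e_periphery_pj_per_read
    return total_area, e_row_read
```

Even this toy model exposes the feedback path the abstract describes: once peripheral overheads dominate, further reducing per-cell read energy yields diminishing array-level returns, which redirects device-level optimization targets.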

While the primary focus of the talk is on ferroelectric devices, we will briefly comment on how the same evaluation framework can be extended to other emerging memory technologies such as ECRAM. More broadly, the presentation aims to highlight a practical atoms-to-applications methodology for assessing and guiding the development of ferroelectric memories for future computing systems.
 

W04.2 Break

Session Start
Session End
Session chair
Michael Niemier, University of Notre Dame, United States

Break

W04.3 Infrastructure for Design Space Exploration Framework - Bottom-up Meets Top-down

Session Start
Session End
Session chair
Michael Niemier, University of Notre Dame, United States
Presentations

W04.3.1 Spin-Orbit-Torque MRAM for Information Storage and Database Search

Start
End
Speaker
Azad Naeemi, Georgia Institute of Technology, United States

The first part of this talk will focus on cross-layer modeling and design of SOT-MRAM chips based on a comprehensive set of experimentally validated physical models for nanoscale SOT devices and physical design of memory cells, subarrays, peripheral circuits, memory controllers, and the full chip. At the device level, tradeoffs among the write current, error rate, and write time will be quantified and will be used to design and optimize memory subarrays and to perform design-technology co-optimization (DTCO) for the entire memory chip based on place and route (PnR). The second part of the talk will present the design and benchmarking of SOT-MRAM content-addressable memories (CAMs) for nearest neighbor search and will show how BEOL-compatible Transition Metal Dichalcogenide (TMD) Thin-Film Resistors can be used to significantly improve the resolution of CAMs.
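As a rough illustration of the write-current/error-rate/time tradeoff quantified at the device level, a simplified Néel-Brown-style thermal-activation model can be sketched as follows. The thermal stability factor, attempt time, and critical-current normalization are placeholder values, and the sketch ignores the precessional switching regime relevant at large current overdrive:

```python
import math

# Simplified thermal-activation (Neel-Brown-style) model of magnetic
# switching, illustrating how write error rate trades off against
# write current and pulse width. All parameters are placeholders.

def write_error_rate(i_over_ic, pulse_ns, delta=60.0, tau0_ns=1.0):
    """Probability that the cell fails to switch within the write pulse.

    i_over_ic : write current normalized to the critical current
    delta     : thermal stability factor (energy barrier / kT)
    tau0_ns   : attempt time
    """
    # Mean switching time grows exponentially as the write current
    # drops below the critical current.
    tau_ns = tau0_ns * math.exp(delta * (1.0 - i_over_ic))
    return math.exp(-pulse_ns / tau_ns)
```

The monotonic tradeoff this captures (more current or a longer pulse buys a lower error rate, at higher energy) is exactly the kind of device-level relation that feeds subarray optimization and DTCO in the talk.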

W04.3.2 Computing close to memory: a co-design perspective

Start
End
Speaker
Giovanni Ansaloni, EPFL, Switzerland

Next-generation computing architectures will have to confront the demise of scaling laws and the unabated increase in AI workloads. Against this backdrop, Compute Memories (CMs) are especially promising, since they drastically reduce ever-more costly data movements while offering massive parallelism. Nonetheless, the development of CMs is hampered by the paucity of exploration frameworks for investigating hardware/software co-designed solutions. In this talk, I illustrate two complementary approaches that address this challenge, based on open hardware and on system simulation frameworks, respectively. The talk also details the architecture of domain-specific CMs for AI using such strategies, each resulting in a >100X performance increase compared to traditional processor-centric execution. I will highlight differences in capabilities, target scenarios, and implementation philosophies.

W04.3.3 New Architectural Solutions and Mapping Spaces

Start
End
Speaker
Vijay Reddi, Harvard University, United States

W04.3.4 Improving Memory KPIs from the Bottom-Up

Start
End
Speaker
James Rondinelli, Northwestern University, United States