Abstract
Modern processors still struggle with the memory wall: the widening gap between core speed and main-memory latency. Hardware data prefetchers mitigate this bottleneck by predicting future memory accesses and fetching their cache lines early. Berti is a recently published first-level-cache prefetcher that outperforms state-of-the-art designs by tracking local deltas (the difference between the cache-line addresses of two demand accesses issued by the same instruction) and issuing only high-confidence requests.
This talk tells the story of taking Berti from paper to reality. We begin by modelling a Sargantana-like core in a cycle-accurate simulator and building a comprehensive prefetching interface. Next, we adapt Berti’s algorithm to the interface of Sargantana, refining its delta selection to respect the core’s constraints. A tailored “Berti-for-Sargantana” version is then evaluated. Finally, we describe the ongoing integration of this design into Sargantana’s RTL, closing the loop between research insight and deployable hardware.
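
To make the notion of a local delta concrete, the following is a minimal sketch and not Berti's actual implementation. It assumes 64-byte cache lines and a small per-instruction history; the names `LINE_SIZE`, `HISTORY_DEPTH`, and `local_deltas` are hypothetical and chosen only for illustration. The timeliness and confidence tracking that Berti uses to select deltas is not modelled here.

```python
from collections import defaultdict, deque

LINE_SIZE = 64      # bytes per cache line (assumed for this sketch)
HISTORY_DEPTH = 4   # past accesses remembered per instruction (hypothetical)


def local_deltas(accesses, history_depth=HISTORY_DEPTH):
    """Collect local deltas from a trace of (instruction_pointer, byte_address).

    A local delta is the difference between the cache-line addresses of two
    demand accesses made by the same instruction.
    """
    history = defaultdict(lambda: deque(maxlen=history_depth))
    deltas = defaultdict(list)
    for ip, addr in accesses:
        line = addr // LINE_SIZE
        # Compare against earlier lines touched by this same instruction only.
        for previous_line in history[ip]:
            deltas[ip].append(line - previous_line)
        history[ip].append(line)
    return dict(deltas)


# Illustrative trace: one instruction streams through consecutive lines,
# another touches a single unrelated address.
trace = [(0x400, 0x1000), (0x500, 0x8000), (0x400, 0x1040), (0x400, 0x1080)]
print(local_deltas(trace))
```

In this trace the instruction at `0x400` touches three consecutive cache lines, so its recorded deltas are +1 and +2, while the instruction at `0x500` has only one access and therefore produces no delta.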

Simranjit Singh is a research assistant in the Computer Architecture & Parallel Systems (CAPS) group at the University of Murcia, from which he recently graduated in Computer Engineering. His research focuses on advanced cache-prefetching techniques, simulation frameworks for high-performance processor research, and the hardware implementation of research proposals. He was awarded the Severo Ochoa Visiting Researcher Grant by the Barcelona Supercomputing Center to integrate the Berti prefetcher into the Sargantana processor.
Speakers
Speaker: Simranjit Singh, research assistant in the Computer Architecture & Parallel Systems (CAPS) group at the University of Murcia
Host: Lluc Alvarez, Computer Sciences - UNCORE Cache hierarchy and interconnects, BSC