Fixing newly discovered side channel will likely take a major toll on performance.
A newly discovered vulnerability baked into Apple’s M-series of chips allows attackers to extract secret keys from Macs when they perform widely used cryptographic operations, academic researchers have revealed in a paper published Thursday.
The flaw—a side channel allowing end-to-end key extractions when Apple chips run implementations of widely used cryptographic protocols—can’t be patched directly because it stems from the microarchitectural design of the silicon itself. Instead, it can only be mitigated by building defenses into third-party cryptographic software that could drastically degrade M-series performance when executing cryptographic operations, particularly on the earlier M1 and M2 generations. The vulnerability can be exploited when the targeted cryptographic operation and the malicious application with normal user system privileges run on the same CPU cluster.
Beware of hardware optimizations
The threat resides in the chips’ data memory-dependent prefetcher, a hardware optimization that predicts the memory addresses of data that running code is likely to access in the near future. By loading the contents into the CPU cache before it’s actually needed, the DMP, as the feature is abbreviated, reduces latency between the main memory and the CPU, a common bottleneck in modern computing. DMPs are a relatively new phenomenon found only in M-series chips and Intel’s 13th-generation Raptor Lake microarchitecture, although older forms of prefetchers have been common for years.
Security experts have long known that classical prefetchers open a side channel that malicious processes can probe to obtain secret key material from cryptographic operations. This vulnerability is the result of the prefetchers making predictions based on previous access patterns, which can create changes in state that attackers can exploit to leak information. In response, cryptographic engineers have devised constant-time programming, an approach that ensures that all operations take the same amount of time to complete, regardless of their operands. It does this by keeping code free of secret-dependent memory accesses or structures.
The breakthrough of the new research is that it exposes a previously overlooked behavior of DMPs in Apple silicon: Sometimes they confuse memory content, such as key material, with the pointer value that is used to load other data. As a result, the DMP often reads the data and attempts to treat it as an address to perform memory access. This “dereferencing” of “pointers”—meaning the reading of data and leaking it through a side channel—is a flagrant violation of the constant-time paradigm.Advertisement
The team of researchers consists of:
- Boru Chen, University of Illinois Urbana-Champaign
- Yingchen Wang, University of Texas at Austin
- Pradyumna Shome, Georgia Institute of Technology
- Christopher W. Fletcher, University of California, Berkeley
- David Kohlbrenner, University of Washington
- Riccardo Paccagnella, Carnegie Mellon University
- Daniel Genkin, Georgia Institute of Technology
In an email, they explained:
Prefetchers usually look at addresses of accessed data (ignoring values of accessed data) and try to guess future addresses that might be useful. The DMP is different in this sense as in addition to addresses it also uses the data values in order to make predictions (predict addresses to go to and prefetch). In particular, if a data value “looks like” a pointer, it will be treated as an “address” (where in fact it’s actually not!) and the data from this “address” will be brought to the cache. The arrival of this address into the cache is visible, leaking over cache side channels.
Our attack exploits this fact. We cannot leak encryption keys directly, but what we can do is manipulate intermediate data inside the encryption algorithm to look like a pointer via a chosen input attack. The DMP then sees that the data value “looks like” an address, and brings the data from this “address” into the cache, which leaks the “address.” We don’t care about the data value being prefetched, but the fact that the intermediate data looked like an address is visible via a cache channel and is sufficient to reveal the secret key over time.
In Thursday’s paper, the team explained it slightly differently:
Our key insight is that while the DMP only dereferences pointers, an attacker can craft program inputs so that when those inputs mix with cryptographic secrets, the resulting intermediate state can be engineered to look like a pointer if and only if the secret satisfies an attacker-chosen predicate. For example, imagine that a program has secret s, takes x as input, and computes and then stores y = s ⊕ x to its program memory. The attacker can craft different x and infer partial (or even complete) information about s by observing whether the DMP is able to dereference y. We first use this observation to break the guarantees of a standard constant-time swap primitive recommended for use in cryptographic implementations. We then show how to break complete cryptographic implementations designed to be secure against chosen-input attacks.
ARS VIDEO
How Scientists Respond to Science Deniers
Enter GoFetch
The attack, which the researchers have named GoFetch, uses an application that doesn’t require root access, only the same user privileges needed by most third-party applications installed on a macOS system. M-series chips are divided into what are known as clusters. The M1, for example, has two clusters: one containing four efficiency cores and the other four performance cores. As long as the GoFetch app and the targeted cryptography app are running on the same performance cluster—even when on separate cores within that cluster—GoFetch can mine enough secrets to leak a secret key.
The attack works against both classical encryption algorithms and a newer generation of encryption that has been hardened to withstand anticipated attacks from quantum computers. The GoFetch app requires less than an hour to extract a 2048-bit RSA key and a little over two hours to extract a 2048-bit Diffie-Hellman key. The attack takes 54 minutes to extract the material required to assemble a Kyber-512 key and about 10 hours for a Dilithium-2 key, not counting offline time needed to process the raw data.
The GoFetch app connects to the targeted app and feeds it inputs that it signs or decrypts. As its doing this, it extracts the app secret key that it uses to perform these cryptographic operations. This mechanism means the targeted app need not perform any cryptographic operations on its own during the collection period.Advertisement
The RSA and Diffie-Hellman keys were processed on implementations from Go and OpenSSL and the Kyber and Dilithium from CRYSTALS-Kyber and CRYSTALS-Dilithium. All four implementations employ constant-time programming, proving that the DMPs in Apple silicon defeat the widely deployed defense.
GoFetch isn’t the first time researchers have identified threats lurking in Apple DMPs. The optimization was first documented in 2022 research that discovered a previously unknown “pointer-chasing DMP” in both the M1 and Apple’s A14 Bionic chip for iPhones. The research, from a different assemblage of academics, gave rise to Augury, an attack that identified and exploited a memory side channel that leaked pointers. Ultimately, Augury was unable to mix data and addresses when constant-time practices were used, a shortcoming that may have given the impression the DMP didn’t pose much of a threat.
“GoFetch shows that the DMP is significantly more aggressive than previously thought and thus poses a much greater security risk,” the GoFetch authors wrote on their website. “Specifically, we find that any value loaded from memory is a candidate for being dereferenced (literally!). This allows us to sidestep many of Augury’s limitations and demonstrate end-to-end attacks on real constant-time code.”
Penalizing performance
Like other microarchitectural CPU side channels, the one that makes GoFetch possible can’t be patched in the silicon. Instead, responsibility for mitigating the harmful effects of the vulnerability falls on the people developing code for Apple hardware. For developers of cryptographic software running on M1 and M2 processors, this means that in addition to constant-time programming, they will have to employ other defenses, almost all of which come with significant performance penalties.
One of the most effective mitigations, known as ciphertext blinding, is a good example. Blinding works by adding/removing masks to sensitive values before/after being stored to/loaded from memory. This effectively randomizes the internal state of the cryptographic algorithm, preventing the attacker from controlling it and thus neutralizing GoFetch attacks. Unfortunately, the researchers said, this defense is both algorithm-specific and often costly, potentially even doubling the computing resources needed in some cases, such as for Diffie-Hellman key exchanges.
One other defense is to run cryptographic processes on the previously mentioned efficiency cores, also known as Icestorm cores, which don’t have DMP. One approach is to run all cryptographic code on these cores. This defense, too, is hardly ideal. Not only is it possible for unannounced changes to add DMP functionality to efficiency cores, running cryptographic processes here will also likely increase the time required to complete operations by a nontrivial margin. The researchers mention several ad-hoc defenses, but they are equally problematic.
The DMP on the M3, Apple’s latest chip, has a special bit that developers can invoke to disable the feature. The researchers don’t yet know what kind of penalty will occur when this performance optimization is turned off. (The researchers noted that the DMP found in Intel’s Raptor Lake processors doesn’t leak the same sorts of cryptographic secrets. What’s more, setting a special DOIT bit also effectively turns off the DMP.)Advertisement
Readers should remember that whatever penalties result will only be felt when affected software is performing specific cryptographic operations. For browsers and many other types of apps, the performance cost may not be noticeable.
“Longer term, we view the right solution to be to broaden the hardware-software contract to account for the DMP,” the researchers wrote. “At a minimum, hardware should expose to software a way to selectively disable the DMP when running security-critical applications. This already has nascent industry precedent. For example, Intel’s DOIT extensions specifically mention disabling their DMP through an ISA extension. Longer term, one would ideally like finer-grain control, e.g., to constrain the DMP to only prefetch from specific buffers or designated non-sensitive memory regions.”
Apple representatives declined to comment on the record about the GoFetch research.
End users who are concerned should check for GoFetch mitigation updates that become available for macOS software that implements any of the four encryption protocols known to be vulnerable. Out of an abundance of caution, it’s probably also wise to assume, at least for now, that other cryptographic protocols are likely also susceptible.
“Unfortunately, to assess if an implementation is vulnerable, cryptanalysis and code inspection are required to understand when and how intermediate values can be made to look like pointers in a way that leaks secrets,” the researchers advised. “This process is manual and slow and does not rule out other attack approaches.”
Post updated to note that the GoFetch app connects directly to the targeted app.