# Cache attacks: Software side-channel and fault attacks

Clémentine Maurice, Graz University of Technology May 11, 2017—Spring school on Security & Correctness in the IoT 2017

## · Clémentine Maurice

- PhD in computer science, Postdoc @ Graz University Of Technology
- 🎔 @BloodyTangerine
- 🗹 clementine.maurice@iaik.tugraz.at

## · Clémentine Maurice

- PhD in computer science, Postdoc @ Graz University Of Technology
- 🎔 @BloodyTangerine
- 🗹 clementine.maurice@iaik.tugraz.at
- + Secure Systems team: Daniel Gruss, Michael Schwarz, Moritz Lipp



 everyday hardware: servers, workstations, laptops, smartphones...



 everyday hardware: servers, workstations, laptops, smartphones...

• remote side-channel attacks

#### $\cdot$ safe software infrastructure $\rightarrow$ no bugs, e.g., Heartbleed

- $\cdot$  safe software infrastructure  $\rightarrow$  no bugs, e.g., Heartbleed
- $\cdot$  does not mean safe execution

- $\cdot$  safe software infrastructure  $\rightarrow$  no bugs, e.g., Heartbleed
- $\cdot$  does not mean safe execution
- information leaks because of the hardware it runs on
- $\cdot\,$  no "bug" in the sense of a mistake  $\rightarrow$  lots of performance optimizations

- $\cdot$  safe software infrastructure  $\rightarrow$  no bugs, e.g., Heartbleed
- $\cdot$  does not mean safe execution
- information leaks because of the hardware it runs on
- $\cdot\,$  no "bug" in the sense of a mistake  $\rightarrow$  lots of performance optimizations
- $\rightarrow\,$  crypto and sensitive info., e.g., keystrokes and mouse movements

• via power consumption, electromagnetic leaks

#### Sources of leakage



- via power consumption, electromagnetic leaks
  - ightarrow targeted attacks, physical access

- via power consumption, electromagnetic leaks
  - $\rightarrow\,$  targeted attacks, physical access
- $\cdot$  via shared hardware and microarchitecture

- via power consumption, electromagnetic leaks
  - $\rightarrow\,$  targeted attacks, physical access
- $\cdot$  via shared hardware and microarchitecture
  - $\rightarrow$  remote attacks





#### • new microarchitectures yearly



- new microarchitectures yearly
- + performance improvement  $\approx 5\%$



- new microarchitectures yearly
- + performance improvement  $\approx 5\%$
- very small optimizations: caches, branch prediction...



- new microarchitectures yearly
- + performance improvement  $\approx 5\%$
- very small optimizations: caches, branch prediction...
- ... leading to side channels



- new microarchitectures yearly
- + performance improvement  $\approx 5\%$
- very small optimizations: caches, branch prediction...
- ... leading to side channels
- no documentation on this intellectual property

• "Intel x86 documentation has more pages than the 6502 has transistors"

Ken Shirriff, http://www.righto.com/2013/09/intel-x86-documentation-has-more-pages.html

- "Intel x86 documentation has more pages than the 6502 has transistors"
- 6502: 8-bit microprocessor, used in the Apple II, Commodore 64, Atari 800...

Ken Shirriff, http://www.righto.com/2013/09/intel-x86-documentation-has-more-pages.html

- $\cdot$  "Intel x86 documentation has more pages than the 6502 has transistors"
- 6502: 8-bit microprocessor, used in the Apple II, Commodore 64, Atari 800...
  - year: 1975  $\rightarrow$  3510 transistors

Ken Shirriff, http://www.righto.com/2013/09/intel-x86-documentation-has-more-pages.html

- $\cdot$  "Intel x86 documentation has more pages than the 6502 has transistors"
- 6502: 8-bit microprocessor, used in the Apple II, Commodore 64, Atari 800...
  - year: 1975  $\rightarrow$  3510 transistors
- 22-core Intel Xeon Broadwell-E5

Ken Shirriff, http://www.righto.com/2013/09/intel-x86-documentation-has-more-pages.html

- $\cdot$  "Intel x86 documentation has more pages than the 6502 has transistors"
- 6502: 8-bit microprocessor, used in the Apple II, Commodore 64, Atari 800...
  - year: 1975  $\rightarrow$  3510 transistors
- 22-core Intel Xeon Broadwell-E5
  - year: 2016  $\rightarrow$  7.2 billion transistors

Ken Shirriff, http://www.righto.com/2013/09/intel-x86-documentation-has-more-pages.html

- $\cdot$  "Intel x86 documentation has more pages than the 6502 has transistors"
- 6502: 8-bit microprocessor, used in the Apple II, Commodore 64, Atari 800...
  - year: 1975  $\rightarrow$  3510 transistors
- 22-core Intel Xeon Broadwell-E5
  - year: 2016  $\rightarrow$  7.2 billion transistors
- Intel Software Developer's Manuals (sept. 2016): 4670 pages

Ken Shirriff, http://www.righto.com/2013/09/intel-x86-documentation-has-more-pages.html

- $\cdot$  "Intel x86 documentation has more pages than the 6502 has transistors"
- 6502: 8-bit microprocessor, used in the Apple II, Commodore 64, Atari 800...
  - year: 1975  $\rightarrow$  3510 transistors
- 22-core Intel Xeon Broadwell-E5
  - year: 2016  $\rightarrow$  7.2 billion transistors
- Intel Software Developer's Manuals (sept. 2016): 4670 pages
- $\cdot$  (there are actually more manuals than just the SDM)

Ken Shirriff, http://www.righto.com/2013/09/intel-x86-documentation-has-more-pages.html

- Background
- Cache covert channels
- Side-channel attacks on crypto, keystrokes, and KASLR
- Fault attacks on DRAM in JavaScript
- Countermeasures and conclusion

# Cache Attack Techniques

#### MOV-Move

| Opcode        | Instruction                                     | Op/<br>En | 64-Bit<br>Mode | Compat/<br>Leg Mode | Description                                                 |
|---------------|-------------------------------------------------|-----------|----------------|---------------------|-------------------------------------------------------------|
| 88 /r         | MOV r/m8,r8                                     | MR        | Valid          | Valid               | Move r8 to r/m8.                                            |
| REX + 88 /r   | MOV <i>r/m8<sup>***,</sup> r8<sup>***</sup></i> | MR        | Valid          | N.E.                | Move <i>r8</i> to <i>r/m8.</i>                              |
| 89 /r         | MOV r/m16,r16                                   | MR        | Valid          | Valid               | Move <i>r16</i> to <i>r/m16.</i>                            |
| 89 /r         | MOV r/m32,r32                                   | MR        | Valid          | Valid               | Move <i>r32</i> to <i>r/m32.</i>                            |
| REX.W + 89 /r | MOV r/m64,r64                                   | MR        | Valid          | N.E.                | Move <i>r64</i> to <i>r/m64.</i>                            |
| 8A /r         | MOV <i>r8,r/m8</i>                              | RM        | Valid          | Valid               | Move <i>r/m8</i> to <i>r8.</i>                              |
| REX + 8A /r   | MOV r8***,r/m8***                               | RM        | Valid          | N.E.                | Move <i>r/m8</i> to <i>r8.</i>                              |
| 8B /r         | MOV r16,r/m16                                   | RM        | Valid          | Valid               | Move <i>r/m16</i> to <i>r16.</i>                            |
| 8B /r         | MOV <i>r32,r/m32</i>                            | RM        | Valid          | Valid               | Move <i>r/m32</i> to <i>r32.</i>                            |
| REX.W + 8B /r | MOV r64,r/m64                                   | RM        | Valid          | N.E.                | Move <i>r/m64</i> to <i>r64.</i>                            |
| 8C /r         | MOV r/m16,Sreg**                                | MR        | Valid          | Valid               | Move segment register to r/m16.                             |
| REX.W + 8C /r | MOV r/m64,Sreg**                                | MR        | Valid          | Valid               | Move zero extended 16-bit segment register to <i>r/m64.</i> |
| 8E /r         | MOV Sreg,r/m16**                                | RM        | Valid          | Valid               | Move <i>r/m16</i> to segment register.                      |
| REX.W + 8E /r | MOV Sreg,r/m64**                                | RM        | Valid          | Valid               | Move <i>lower 16 bits of r/m64</i> to segment register.     |
| AO            | MOV AL,moffs8*                                  | FD        | Valid          | Valid               | Move byte at ( <i>seg:offset</i> ) to AL.                   |
| REX.W + AO    | MOV AL, moffs8*                                 | FD        | Valid          | N.E.                | Move byte at (offset) to AL.                                |
| A1            | MOV AX,moffs16*                                 | FD        | Valid          | Valid               | Move word at (seg:offset) to AX.                            |
| A1            | MOV EAX,moffs32*                                | FD        | Valid          | Valid               | Move doubleword at (seg:offset) to EAX.                     |
| REX.W + A1    | MOV RAX, moffs64*                               | FD        | Valid          | N.E.                | Move quadword at (offset) to RAX.                           |

| 64-Bit Mode Excep          | tions                                                                                                                                                                        |  |
|----------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| #GP(0)                     | If the memory address is in a non-canonical form.                                                                                                                            |  |
|                            | If an attempt is made to load SS register with NULL segment selector when CPL = 3.                                                                                           |  |
|                            | If an attempt is made to load SS register with NULL segment selector when CPL < 3 and CPL $\neq$ RPL.                                                                        |  |
| #GP(selector)              | If segment selector index is outside descriptor table limits.                                                                                                                |  |
|                            | If the memory access to the descriptor table is non-canonical.                                                                                                               |  |
|                            | If the SS register is being loaded and the segment selector's RPL and the segment descriptor's DPL are not equal to the CPL.                                                 |  |
|                            | If the SS register is being loaded and the segment pointed to is a nonwritable data segment.                                                                                 |  |
|                            | If the DS, ES, FS, or GS register is being loaded and the segment pointed to is not a data or<br>readable code segment.                                                      |  |
|                            | If the DS, ES, FS, or GS register is being loaded and the segment pointed to is a data or nonconforming code segment, but both the RPL and the CPL are greater than the DPL. |  |
| #SS(0)                     | If the stack address is in a non-canonical form.                                                                                                                             |  |
| #SS(selector)              | If the SS register is being loaded and the segment pointed to is marked not present.                                                                                         |  |
| <pre>#PF(fault-code)</pre> | If a page fault occurs.                                                                                                                                                      |  |
| #AC(0)                     | If alignment checking is enabled and an unaligned memory reference is made while the<br>current privilege level is 3.                                                        |  |
| #UD                        | If attempt is made to load the CS register.                                                                                                                                  |  |
|                            | If the LOCK prefix is used.                                                                                                                                                  |  |

11

 $\cdot$  lots of exceptions for mov

- $\cdot$  lots of exceptions for mov
- $\cdot$  but accessing data loads it to the cache

- $\cdot$  lots of exceptions for  $mo\nu$
- but accessing data loads it to the cache
- $\rightarrow$  side effects on computations!



• data can reside in



- $\cdot$  data can reside in
  - CPU registers


- $\cdot\,$  data can reside in
  - CPU registers
  - different levels of the CPU cache



- $\cdot\,$  data can reside in
  - CPU registers
  - different levels of the CPU cache
  - main memory



- $\cdot\,$  data can reside in
  - CPU registers
  - different levels of the CPU cache
  - main memory
  - disk storage



• L1 and L2 are private



- L1 and L2 are private
- last-level cache



- L1 and L2 are private
- last-level cache
  - divided in slices



- L1 and L2 are private
- last-level cache
  - $\cdot$  divided in slices
  - shared across cores



- L1 and L2 are private
- last-level cache
  - divided in slices
  - shared across cores
  - inclusive

| Address |  | Index | Offset |
|---------|--|-------|--------|
|---------|--|-------|--------|



Data loaded in a specific set depending on its address



Data loaded in a specific set depending on its address

Several ways per set



Data loaded in a specific set depending on its address

Several ways per set

Cache line loaded in a specific way depending on the replacement policy





#### cache hits cache misses



- very short timings
- rdtsc instruction: cycle-accurate timestamps

- very short timings
- rdtsc instruction: cycle-accurate timestamps

```
[...]
rdtsc
function()
rdtsc
[...]
```

• do you measure what you think you measure?

- do you measure what you think you measure?
- out-of-order execution

- do you measure what you think you measure?
- $\cdot \, \, \text{out-of-order}$  execution  $\rightarrow$  what is really executed

| rdtsc      | rdtsc      | rdtsc                 |
|------------|------------|-----------------------|
| function() | []         | rdtsc                 |
| []         | rdtsc      | <pre>function()</pre> |
| rdtsc      | function() | []                    |

• use pseudo-serializing instruction rdtscp (recent CPUs)

- use pseudo-serializing instruction rdtscp (recent CPUs)
- and/or use serializing instructions like cpuid

- use pseudo-serializing instruction rdtscp (recent CPUs)
- and/or use serializing instructions like cpuid
- and/or use fences like mfence

- use pseudo-serializing instruction rdtscp (recent CPUs)
- and/or use serializing instructions like cpuid
- and/or use fences like mfence

Intel, How to Benchmark Code Execution Times on Intel IA-32 and IA-64 Instruction Set Architectures White Paper, December 2010.

 $\cdot\,$  cache attacks  $\rightarrow$  exploit timing differences of memory accesses

- $\cdot\,$  cache attacks  $\rightarrow$  exploit timing differences of memory accesses
- attacker monitors which lines are accessed, not the content

- $\cdot\,$  cache attacks  $\rightarrow$  exploit timing differences of memory accesses
- $\cdot$  attacker monitors which lines are accessed, not the content
- covert channel: two processes communicating with each other
  - not allowed to do so, e.g., across VMs

- $\cdot\,$  cache attacks  $\rightarrow$  exploit timing differences of memory accesses
- $\cdot$  attacker monitors which lines are accessed, not the content
- covert channel: two processes communicating with each other
  - not allowed to do so, e.g., across VMs
- side-channel attack: one malicious process spies on benign processes
  - e.g., steals crypto keys, spies on keystrokes

- two (main) techniques
  - 1. Flush+Reload (Gullasch et al., Osvik et al., Yarom et al.)
  - 2. Prime+Probe (Percival, Osvik et al., Liu et al.)
- exploitable on x86 and ARM

D. Gullasch et al. "Cache Games – Bringing Access-Based Cache Attacks on AES to Practice". In: S&P'11. 2011.

Y. Yarom et al. "Flush+Reload: a High Resolution, Low Noise, L3 Cache Side-Channel Attack". In: USENIX Security Symposium. 2014.

D. A. Osvik et al. "Cache Attacks and Countermeasures: the Case of AES". In: CT-RSA 2006. 2006.

C. Percival. "Cache missing for fun and profit". In: Proceedings of BSDCan. 2005.

F. Liu et al. "Last-Level Cache Side-Channel Attacks are Practical". In: S&P'15. 2015.



Step 1: Attacker maps shared library (shared memory, in cache)



Step 1: Attacker maps shared library (shared memory, in cache)



Step 1: Attacker maps shared library (shared memory, in cache)

Step 2: Attacker flushes the shared cache line



**Step 1:** Attacker maps shared library (shared memory, in cache)

Step 2: Attacker flushes the shared cache line

Step 3: Victim loads the data



Step 1: Attacker maps shared library (shared memory, in cache)

Step 2: Attacker flushes the shared cache line

Step 3: Victim loads the data

Step 4: Attacker reloads the data

# What if there is no shared memory?



• inclusive LLC: superset of L1 and L2



• inclusive LLC: superset of L1 and L2


• inclusive LLC: superset of L1 and L2



• inclusive LLC: superset of L1 and L2



- inclusive LLC: superset of L1 and L2
- data evicted from the LLC is also evicted from L1 and L2



- inclusive LLC: superset of L1 and L2
- data evicted from the LLC is also evicted from L1 and L2
- a core can evict lines in the private L1 of another core

Victim address space

Cache

Attacker address space



Step 1: Attacker primes, *i.e.*, fills, the cache (no shared memory)



Step 1: Attacker primes, *i.e.*, fills, the cache (no shared memory)

Step 2: Victim evicts cache lines while running



Step 1: Attacker primes, *i.e.*, fills, the cache (no shared memory)

Step 2: Victim evicts cache lines while running



Step 1: Attacker primes, *i.e.*, fills, the cache (no shared memory)

**Step 2:** Victim evicts cache lines while running

Step 3: Attacker probes data to determine if set has been accessed



Step 1: Attacker primes, *i.e.*, fills, the cache (no shared memory)

**Step 2:** Victim evicts cache lines while running

Step 3: Attacker probes data to determine if set has been accessed

We need to evict caches lines without **clflush** or shared memory:

- 1. which addresses do we access to have congruent cache lines?
- 2. without any privilege?
- 3. and in which order do we access them?

We need to evict caches lines without **clflush** or shared memory:

- 1. which addresses do we access to have congruent cache lines?
- 2. without any privilege?
- 3. and in which order do we access them?

# Last-level cache addressing



- $\cdot\,$  last-level cache  $\rightarrow$  as many slices as cores
- undocumented hash function that maps a physical address to a slice
- designed for performance



Undocumented function  $\rightarrow$  impossible to target the same set in the same slice



Let's reverse-engineer the last-level cache!

1. find some way to map one address to one slice

- 1. find some way to map one address to one slice
- 2. repeat for every address with a 64B stride

- 1. find some way to map one address to one slice
- 2. repeat for every address with a 64B stride
- 3. infer a function out of it

- with performance counters (Maurice et al., 2015)
- $\cdot$  with a timing attack
  - using clflush (using Gruss et al., 2016)
  - using memory accesses (Yarom et al., 2015)

C. Maurice et al. "Reverse Engineering Intel Complex Addressing Using Performance Counters". In: RAID'15. 2015.

D. Gruss et al. "Flush+Flush: A Fast and Stealthy Cache Attack". In: DIMVA'16. 2016.

Y. Yarom et al. "Mapping the Intel Last-Level Cache". In: Cryptology ePrint Archive, Report 2015/905 (2015), pp. 1–12.

- with performance counters (Maurice et al., 2015)
- $\cdot$  with a timing attack
  - using clflush (using Gruss et al., 2016)
  - using memory accesses (Yarom et al., 2015)

C. Maurice et al. "Reverse Engineering Intel Complex Addressing Using Performance Counters". In: RAID'15. 2015.

D. Gruss et al. "Flush+Flush: A Fast and Stealthy Cache Attack". In: DIMVA'16. 2016.

Y. Yarom et al. "Mapping the Intel Last-Level Cache". In: Cryptology ePrint Archive, Report 2015/905 (2015), pp. 1–12.









#### 1. translate virtual to physical address with /proc/pid/pagemap

- 1. translate virtual to physical address with /proc/pid/pagemap
- 2. set up monitoring session

- 1. translate virtual to physical address with /proc/pid/pagemap
- 2. set up monitoring session
- 3. repeat access to a single address

- 1. translate virtual to physical address with /proc/pid/pagemap
- 2. set up monitoring session
- 3. repeat access to a single address
  - $\rightarrow~{\tt clflush}$  is already counted as an access

- 1. translate virtual to physical address with /proc/pid/pagemap
- 2. set up monitoring session
- 3. repeat access to a single address
  - $\rightarrow~{\tt clflush}$  is already counted as an access
- 4. read UNC\_CBO\_CACHE\_LOOKUP event for each CBo

- 1. translate virtual to physical address with /proc/pid/pagemap
- 2. set up monitoring session
- 3. repeat access to a single address
  - $\rightarrow~{\tt clflush}$  is already counted as an access
- 4. read UNC\_CBO\_CACHE\_LOOKUP event for each CBo
- 5. slice is the one that has the maximum lookup events

# Mapping physical addresses to slices



■CBo 0 ■CBo 1 ■CBo 2 ■CBo 3

1. linear function  $(2^n$  number of cores): XOR of address bits

- 1. linear function  $(2^n$  number of cores): XOR of address bits
  - solve equations with linear algebra

- 1. linear function  $(2^n$  number of cores): XOR of address bits
  - solve equations with linear algebra
  - $\cdot$  brute force

- 1. linear function  $(2^n$  number of cores): XOR of address bits
  - $\cdot$  solve equations with linear algebra
  - $\cdot$  brute force
- 2. non-linear function (the rest): more complicated
#### 3 functions, depending on the number of cores

|         |    | Address bit |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |   |          |          |          |
|---------|----|-------------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|---|----------|----------|----------|
|         |    | 3           | 3        | 3        | 3        | 3        | 3        | 3        | 3        | 2        | 2        | 2        | 2        | 2        | 2        | 2        | 2        | 2        | 2        | 1        | 1        | 1        | 1        | 1        | 1        | 1        | 1        | 1        | 1        | 0 | 0        | 0        | 0        |
|         |    | 7           | 6        | 5        | 4        | 3        | 2        | 1        | 0        | 9        | 8        | 7        | 6        | 5        | 4        | 3        | 2        | 1        | 0        | 9        | 8        | 7        | 6        | 5        | 4        | 3        | 2        | 1        | 0        | 9 | 8        | 7        | 6        |
| 2 cores | 00 |             |          |          |          |          | $\oplus$ |          | $\oplus$ |          | $\oplus$ | $\oplus$ | $\oplus$ | $\oplus$ | $\oplus$ |          | $\oplus$ |          | $\oplus$ |          | $\oplus$ | $\oplus$ | $\oplus$ |          | $\oplus$ |          | $\oplus$ |          | $\oplus$ |   |          |          | $\oplus$ |
| 4 cores | 00 | 1           |          |          |          | $\oplus$ | $\oplus$ |          | $\oplus$ |          | $\oplus$ | $\oplus$ | $\oplus$ | $\oplus$ | $\oplus$ |          | $\oplus$ |          | $\oplus$ |          | $\oplus$ | $\oplus$ | $\oplus$ |          | $\oplus$ |          | $\oplus$ |          | $\oplus$ |   |          |          | $\oplus$ |
|         | 01 |             |          |          | $\oplus$ | $\oplus$ |          | $\oplus$ |          | $\oplus$ | $\oplus$ |          | $\oplus$ |          | $\oplus$ | $\oplus$ | $\oplus$ | $\oplus$ | $\oplus$ | $\oplus$ |          |   |          | $\oplus$ |          |
|         | 00 |             | $\oplus$ | $\oplus$ |          | $\oplus$ | $\oplus$ |          | $\oplus$ |          | $\oplus$ | $\oplus$ | $\oplus$ | $\oplus$ | $\oplus$ |          | $\oplus$ |          | $\oplus$ |          | $\oplus$ | $\oplus$ | $\oplus$ |          | $\oplus$ |          | $\oplus$ |          | $\oplus$ |   |          |          | $\oplus$ |
| 8 cores | 01 | $\oplus$    |          | $\oplus$ | $\oplus$ | $\oplus$ |          | $\oplus$ |          | $\oplus$ | $\oplus$ |          | $\oplus$ |          | $\oplus$ | $\oplus$ | $\oplus$ | $\oplus$ | $\oplus$ | $\oplus$ |          |   |          | $\oplus$ |          |
|         | 02 | $\oplus$    | $\oplus$ | $\oplus$ | $\oplus$ |          |          | $\oplus$ |          |          | $\oplus$ |          |          | $\oplus$ | $\oplus$ |          |          |   | $\oplus$ |          |          |

Function valid for Sandy Bridge, Ivy Bridge, Haswell, Broadwell

**#1.** Cache Covert Channels

• malicious privacy gallery app



- malicious privacy gallery app
  - no permissions except accessing your images



- malicious privacy gallery app
  - no permissions except accessing your images
- malicious weather widget



- $\cdot$  malicious privacy gallery app
  - no permissions except accessing your images
- malicious weather widget
  - no permissions except accessing the Internet



- malicious privacy gallery app
  - no permissions except accessing your images
- malicious weather widget
  - no permissions except accessing the Internet



- malicious privacy gallery app
  - no permissions except accessing your images
- malicious weather widget
  - no permissions except accessing the Internet



- malicious privacy gallery app
  - no permissions except accessing your images
- malicious weather widget
  - no permissions except accessing the Internet



- sender transmits bits with cache hits and misses
- receiver monitors a cache line or a cache set to receive 1 bit

- sender transmits bits with cache hits and misses
- receiver monitors a cache line or a cache set to receive 1 bit
- transmitting on several lines or sets to send several bits

- sender transmits bits with cache hits and misses
- receiver monitors a cache line or a cache set to receive 1 bit
- transmitting on several lines or sets to send several bits
- Flush+Reload: using cache line from a shared library/executable
  - sender: accesses the line to send '1', stays idle to send '0'
  - receiver: reloads the line, then flushes it

- sender transmits bits with cache hits and misses
- receiver monitors a cache line or a cache set to receive 1 bit
- transmitting on several lines or sets to send several bits
- Flush+Reload: using cache line from a shared library/executable
  - sender: accesses the line to send '1', stays idle to send '0'
  - receiver: reloads the line, then flushes it
- Prime+Probe: agreeing on a cache set
  - sender: primes the set to send '1', stays idle to send '0'
  - $\cdot$  receiver: probes the set

- programs competing for
  - $\cdot$  the CPU cache
  - scheduling

C. Maurice et al. "Hello from the Other Side: SSH over Robust Cache Covert Channels in the Cloud". In: NDSS'17. 2017.

- programs competing for
  - $\cdot$  the CPU cache
  - scheduling
- literature: covert channels stop working with noise on the machine

C. Maurice et al. "Hello from the Other Side: SSH over Robust Cache Covert Channels in the Cloud". In: NDSS'17. 2017.

- programs competing for
  - $\cdot$  the CPU cache
  - scheduling
- literature: covert channels stop working with noise on the machine
- solution? "Just use error-correcting codes"

C. Maurice et al. "Hello from the Other Side: SSH over Robust Cache Covert Channels in the Cloud". In: NDSS'17. 2017.

- programs competing for
  - $\cdot$  the CPU cache
  - scheduling
- literature: covert channels stop working with noise on the machine
- solution? "Just use error-correcting codes"
- let's investigate this!

C. Maurice et al. "Hello from the Other Side: SSH over Robust Cache Covert Channels in the Cloud". In: NDSS'17. 2017.



## Why can't we just use error correcting codes?





# Why can't we just use error correcting codes?







(c) Sender descheduled: insertions

# Why can't we just use error correcting codes?





(d) Receiver descheduled: deletions

- physical layer:
  - $\cdot\,$  transmits words as a sequence of '0's and '1's
  - deals with synchronization errors
- data-link layer:
  - divides data to transmit into packets
  - corrects the remaining errors

• sender and receiver agree on one set

- sender and receiver agree on one set
- receiver probes the set continuously

- $\cdot$  sender and receiver agree on one set
- $\cdot$  receiver probes the set continuously
- sender transmits '0' doing nothing
  - $\rightarrow~$  lines of the receiver still in cache  $\rightarrow~$  fast access

- $\cdot$  sender and receiver agree on one set
- receiver probes the set continuously
- sender transmits '0' doing nothing
  - $\rightarrow~$  lines of the receiver still in cache  $\rightarrow~$  fast access
- sender transmits '1' accessing addresses in the set
  - $\rightarrow~{\rm evicts}~{\rm lines}~{\rm of}~{\rm the}~{\rm receiver}\rightarrow {\rm slow}~{\rm access}$

• need a set of addresses in the same cache set and same slice

# Eviction set generation

- need a set of addresses in the same cache set and same slice
- problem: slice number depends on all bits of the physical address

### Eviction set generation

- need a set of addresses in the same cache set and same slice
- problem: slice number depends on all bits of the physical address



• we can build a set of addresses in the same cache set and same slice

# Eviction set generation

- need a set of addresses in the same cache set and same slice
- problem: slice number depends on all bits of the physical address



- we can build a set of addresses in the same cache set and same slice
- without knowing which slice



| Cache Sets |  |  |  |  |  |  |  |  |  |  |  |  |  |
|------------|--|--|--|--|--|--|--|--|--|--|--|--|--|
|            |  |  |  |  |  |  |  |  |  |  |  |  |  |
|            |  |  |  |  |  |  |  |  |  |  |  |  |  |
|            |  |  |  |  |  |  |  |  |  |  |  |  |  |
|            |  |  |  |  |  |  |  |  |  |  |  |  |  |

receiver eviction sets





receiver eviction sets







# Jamming agreement







receiver eviction sets






# Jamming agreement













# Jamming agreement















# Jamming agreement







| Cache Sets |  |  |  |  |  |  |  |
|------------|--|--|--|--|--|--|--|
|            |  |  |  |  |  |  |  |
|            |  |  |  |  |  |  |  |
|            |  |  |  |  |  |  |  |
|            |  |  |  |  |  |  |  |





| Cache Sets |  |  |  |  |  |  |  |
|------------|--|--|--|--|--|--|--|
|            |  |  |  |  |  |  |  |
|            |  |  |  |  |  |  |  |
|            |  |  |  |  |  |  |  |
|            |  |  |  |  |  |  |  |











## Sending the first image



### Handling synchronization errors



## Handling synchronization errors

- deletion errors: request-to-send scheme that also serves as ack
  - 3-bit sequence number
  - request: encoded sequence number (7 bits)



## Handling synchronization errors

- deletion errors: request-to-send scheme that also serves as ack
  - 3-bit sequence number
  - request: encoded sequence number (7 bits)
- · '0'-insertion errors: error detection code  $\rightarrow$  Berger codes
  - appending the number of '0's in the word to itself
  - ightarrow property: a word cannot consist solely of '0's



# Synchronization (before)



# Synchronization (after)



# Synchronization (after)



# Synchronization (after)



### Data-link layer: Error correction

• Reed-Solomon codes to correct the remaining errors

#### Data-link layer: Error correction

- Reed-Solomon codes to correct the remaining errors
- RS word size = physical layer word size = 12 bits
- packet size =  $2^{12} 1 = 4095$  RS words
- 10% error-correcting code: 409 parity and 3686 data RS words



# Error correction (after)



| Environment | Bit rate   | Error rate | Noise |
|-------------|------------|------------|-------|
| Native      | 75.10 KBps | 0.00%      | _     |

| Environment | Bit rate   | Error rate | Noise       |
|-------------|------------|------------|-------------|
| Native      | 75.10 KBps | 0.00%      | -           |
| Native      | 36.03 KBps | 0.00%      | stress -m 1 |

| Environment | Bit rate   | Error rate | Noise       |
|-------------|------------|------------|-------------|
| Native      | 75.10 KBps | 0.00%      | -           |
| Native      | 36.03 KBps | 0.00%      | stress -m 1 |
| Amazon EC2  | 45.25 KBps | 0.00%      | -           |

| Environment | Bit rate   | Error rate | Noise                                       |
|-------------|------------|------------|---------------------------------------------|
| Native      | 75.10 KBps | 0.00%      | -                                           |
| Native      | 36.03 KBps | 0.00%      | stress -m 1                                 |
| Amazon EC2  | 45.25 KBps | 0.00%      | -                                           |
| Amazon EC2  | 45.09 KBps | 0.00%      | web server serving files on sender VM       |
| Amazon EC2  | 42.96 KBps | 0.00%      | <pre>stress -m 2 on sender VM</pre>         |
| Amazon EC2  | 42.26 KBps | 0.00%      | stress -m 1 on receiver VM                  |
| Amazon EC2  | 37.42 KBps | 0.00%      | web server on all 3 VMs, stress -m 4 on 3rd |
|             |            |            | VM, stress -m 1 on sender and receiver VMs  |
| Amazon EC2  | 34.27 KBps | 0.00%      | stress -m 8 on third VM                     |

## Building an SSH connection



#### Between two instances on Amazon EC2

| Noise                                 | Connection   |
|---------------------------------------|--------------|
| No noise                              | $\checkmark$ |
| stress -m 8 on third VM               | $\checkmark$ |
| Web server on third VM                | $\checkmark$ |
| Web server on SSH server VM           | $\checkmark$ |
| Web server on all VMs                 | $\checkmark$ |
| <pre>stress -m 1 on server side</pre> | unstable     |

#### Between two instances on Amazon EC2

| Noise                                 | Connection   |
|---------------------------------------|--------------|
| No noise                              | $\checkmark$ |
| <pre>stress -m 8 on third VM</pre>    | $\checkmark$ |
| Web server on third VM                | $\checkmark$ |
| Web server on SSH server VM           | $\checkmark$ |
| Web server on all VMs                 | $\checkmark$ |
| <pre>stress -m 1 on server side</pre> | unstable     |

Telnet also works with occasional corrupted bytes with **stress** -m 1

• cache covert channels are practical
- cache covert channels are practical
- even in the cloud, even in presence of extraordinary noise

- cache covert channels are practical
- even in the cloud, even in presence of extraordinary noise
- $\cdot$  our robust covert channel supports an SSH connection

- cache covert channels are practical
- even in the cloud, even in presence of extraordinary noise
- $\cdot$  our robust covert channel supports an SSH connection
- we extended Amazon's product portfolio :)

- cache covert channels are practical
- even in the cloud, even in presence of extraordinary noise
- our robust covert channel supports an SSH connection
- we extended Amazon's product portfolio :)



- cache covert channels are practical
- even in the cloud, even in presence of extraordinary noise
- our robust covert channel supports an SSH connection
- we extended Amazon's product portfolio :)



# #2. Side-Channel Attack on Crypto

• AES T-Tables: fast software implementation

D. A. Osvik et al. "Cache Attacks and Countermeasures: the Case of AES". In: CT-RSA 2006. 2006.

- AES T-Tables: fast software implementation
- uses precomputed look-up tables

D. A. Osvik et al. "Cache Attacks and Countermeasures: the Case of AES". In: CT-RSA 2006. 2006.

- AES T-Tables: fast software implementation
- uses precomputed look-up tables
- one-round known-plaintext attack by Osvik et al.
  - *p* plaintext and *k* secret key
  - intermediate state  $x^{(r)} = (x_0^{(r)}, \dots, x_{15}^{(r)})$  at each round r
  - first round, accessed table indices are

$$x_i^{(0)} = p_i \oplus k_i$$
 for all  $i = 0, \dots, 15$ 

D. A. Osvik et al. "Cache Attacks and Countermeasures: the Case of AES". In: CT-RSA 2006. 2006.

- AES T-Tables: fast software implementation
- uses precomputed look-up tables
- one-round known-plaintext attack by Osvik et al.
  - *p* plaintext and *k* secret key
  - intermediate state  $x^{(r)} = (x_0^{(r)}, \dots, x_{15}^{(r)})$  at each round r
  - first round, accessed table indices are

$$x_i^{(0)} = p_i \oplus k_i \qquad \text{for all } i = 0, \dots, 15$$

#### $\rightarrow\,$ recovering accessed table indices $\Rightarrow\,$ recovering the key

D. A. Osvik et al. "Cache Attacks and Countermeasures: the Case of AES". In: CT-RSA 2006. 2006.

• monitoring which T-Table entry is accessed ( $k_0 = 0 \times 0 0$ )





plaintext byte values

Prime+Probe

• it's an old attack...

M. Lipp et al. "ARMageddon: Cache Attacks on Mobile Devices". In: USENIX Security Symposium. 2016.

- it's an old attack...
- everything should be fixed by now...

M. Lipp et al. "ARMageddon: Cache Attacks on Mobile Devices". In: USENIX Security Symposium. 2016.

- it's an old attack...
- everything should be fixed by now...
- $\cdot\,$  Bouncy Castle on Android  $\rightarrow$  default implementation uses T-Tables

M. Lipp et al. "ARMageddon: Cache Attacks on Mobile Devices". In: USENIX Security Symposium. 2016.

- Flush+Reload
  - RSA: 96.7% of secret key bits in a single signature (Yarom et al.)
  - AES: full key recovery in 30000 dec. (a few seconds) (Guelmezoglu et al.)
- Prime+Probe
  - El Gamal (sliding window): full key recovery in 12 min. (Liu et al.)

Y. Yarom et al. "Flush+Reload: a High Resolution, Low Noise, L3 Cache Side-Channel Attack". In: USENIX Security Symposium. 2014

B. Gülmezoglu et al. "A Faster and More Realistic Flush+Reload Attack on AES". In: COSADE'15. 2015

F. Liu et al. "Last-Level Cache Side-Channel Attacks are Practical". In: S&P'15. 2015.

• you should never write crypto with secret-dependent memory accesses

- you should never write crypto with secret-dependent memory accesses
- most recent implementations are now protected against these attacks
- $\cdot$  e.g., really hard to use AES T-Tables by mistake on OpenSSL

- you should never write crypto with secret-dependent memory accesses
- most recent implementations are now protected against these attacks
- e.g., really hard to use AES T-Tables by mistake on OpenSSL
- big issue is adoption and legacy code still running

#3. Side-Channel Attack onKeystroke Timings

· locating event-dependent memory access  $\rightarrow$  Cache Template Attacks

D. Gruss et al. "Cache Template Attacks: Automating Attacks on Inclusive Last-Level Caches". In: USENIX Security Symposium. 2015.

- + locating event-dependent memory access  $\rightarrow$  Cache Template Attacks
- $\cdot$  learning phase

D. Gruss et al. "Cache Template Attacks: Automating Attacks on Inclusive Last-Level Caches". In: USENIX Security Symposium. 2015.

- + locating event-dependent memory access  $\rightarrow$  Cache Template Attacks
- $\cdot$  learning phase
  - 1. shared library or executable is mapped

D. Gruss et al. "Cache Template Attacks: Automating Attacks on Inclusive Last-Level Caches". In: USENIX Security Symposium. 2015.

- + locating event-dependent memory access  $\rightarrow$  Cache Template Attacks
- learning phase
  - 1. shared library or executable is mapped
  - 2. trigger an event while Flush+Reload one address

D. Gruss et al. "Cache Template Attacks: Automating Attacks on Inclusive Last-Level Caches". In: USENIX Security Symposium. 2015.

- + locating event-dependent memory access  $\rightarrow$  Cache Template Attacks
- learning phase
  - 1. shared library or executable is mapped
  - 2. trigger an event while Flush+Reload one address
    - ightarrow cache hit: address used by the library/executable

D. Gruss et al. "Cache Template Attacks: Automating Attacks on Inclusive Last-Level Caches". In: USENIX Security Symposium. 2015.

- · locating event-dependent memory access  $\rightarrow$  Cache Template Attacks
- learning phase
  - 1. shared library or executable is mapped
  - 2. trigger an event while Flush+Reload one address
    - ightarrow cache hit: address used by the library/executable
  - 3. repeat step 2 for every address

D. Gruss et al. "Cache Template Attacks: Automating Attacks on Inclusive Last-Level Caches". In: USENIX Security Symposium. 2015.

## Which addresses to monitor?

- cache template matrix
  - = how many cache hits for each pair (event, address)?



Addresses

## Which addresses to monitor?

- cache template matrix
  - = how many cache hits for each pair (event, address)?



## Which addresses to monitor?

- cache template matrix
  - = how many cache hits for each pair (event, address)?



## Spying on keystrokes

- + Flush+Reload and Evict+Reload: fine-grained attacks  $\rightarrow$  spy on keystrokes
- high-resolution timers  $\rightarrow$  precise inter-keystroke timing
- next step: infer typed words with Hidden Markov Models



M. Lipp et al. "ARMageddon: Cache Attacks on Mobile Devices". In: USENIX Security Symposium. 2016

distinguishing between different types of events by monitoring access time



M. Lipp et al. "ARMageddon: Cache Attacks on Mobile Devices". In: USENIX Security Symposium. 2016.

• obtaining precise timings is easy

- obtaining precise timings is easy
- further computations are needed to derive typed words

- obtaining precise timings is easy
- further computations are needed to derive typed words
- but the attack also allows distinguishing key groups

- obtaining precise timings is easy
- further computations are needed to derive typed words
- but the attack also allows distinguishing key groups
- $\rightarrow$  reduces search space for, e.g., password retrieval

# #4. Side-Channel Attack on KASLR

prefetch fetches the line of data from memory containing the specified byte

6 prefetch instructions:

- prefetcht0: suggests CPU to load data into L1
- prefetcht1: suggests CPU to load data into L2
- prefetcht2: suggests CPU to load data into L3
- prefetchnta: suggests CPU to load data for non-temporal access
- prefetchw: suggests CPU to load data with intention to write
- prefetchwt1: suggests CPU to load vector data with intention to write
# prefetch according to Intel

#### NOTE

Intel Corporation. Intel® 64 and IA-32 Architectures Optimization Reference Manual. 2014

# Using the PREFETCH instruction is recommended only if data does not fit in cache.

Intel Corporation. Intel® 64 and IA-32 Architectures Optimization Reference Manual. 2014

Using the PREFETCH instruction is recommended only if data does not fit in cache. Use of software prefetch should be limited to memory addresses that are managed or owned within the application context.

Intel Corporation. Intel® 64 and IA-32 Architectures Optimization Reference Manual. 2014

Using the PREFETCH instruction is recommended only if data does not fit in cache. Use of software prefetch should be limited to memory addresses that are managed or owned within the application context. Prefetching to addresses that are not mapped to physical pages can experience non-deterministic performance penalty.

Intel Corporation. Intel® 64 and IA-32 Architectures Optimization Reference Manual. 2014

Using the PREFETCH instruction is recommended only if data does not fit in cache. Use of software prefetch should be limited to memory addresses that are managed or owned within the application context. Prefetching to addresses that are **not mapped** to physical pages can experience **non-deterministic** performance penalty. For example specifying a NULL pointer (OL) as address for a prefetch can cause long delays.

Intel Corporation. Intel® 64 and IA-32 Architectures Optimization Reference Manual. 2014

A little bit more background before continuing...

# Address translation



# Address translation caches



#### Today's operating systems:



# Kernel Address Space Layout Randomization (KASLR)



• same driver, different offset at each boot

# Kernel Address Space Layout Randomization (KASLR)



- same driver, different offset at each boot
- leaking kernel/driver addresses defeats KASLR

# Kernel direct-physical map



• OS X, Linux, BSD, Xen PVM (Amazon EC2)

# Kernel direct-physical map



- OS X, Linux, BSD, Xen PVM (Amazon EC2)
- not Windows

# Let's go back to prefetch!

• tells the CPU "I might need that later"

D. Gruss et al. "Prefetch Side-Channel Attacks: Bypassing SMAP and Kernel ASLR". In: CCS'16. 2016

- tells the CPU "I might need that later"
- hint—may be ignored by the CPU

D. Gruss et al. "Prefetch Side-Channel Attacks: Bypassing SMAP and Kernel ASLR". In: CCS'16. 2016

- tells the CPU "I might need that later"
- hint—may be ignored by the CPU
- generates no faults

D. Gruss et al. "Prefetch Side-Channel Attacks: Bypassing SMAP and Kernel ASLR". In: CCS'16. 2016

- tells the CPU "I might need that later"
- hint—may be ignored by the CPU
- generates no faults

Property #1: do not check privileges

D. Gruss et al. "Prefetch Side-Channel Attacks: Bypassing SMAP and Kernel ASLR". In: CCS'16. 2016

• operand is a virtual address

D. Gruss et al. "Prefetch Side-Channel Attacks: Bypassing SMAP and Kernel ASLR". In: CCS'16. 2016

- operand is a virtual address
- but it needs to translate the virtual address to a physical address

D. Gruss et al. "Prefetch Side-Channel Attacks: Bypassing SMAP and Kernel ASLR". In: CCS'16. 2016

- operand is a virtual address
- but it needs to translate the virtual address to a physical address

Property #2: execution time varies

D. Gruss et al. "Prefetch Side-Channel Attacks: Bypassing SMAP and Kernel ASLR". In: CCS'16. 2016



| Cache | ache | h | С | Ca | ( |
|-------|------|---|---|----|---|
|-------|------|---|---|----|---|









 $\cdot\,$  cache hit  $\rightarrow$  physical address in kernel mapping is the correct translation

# Address-translation oracle



Exploiting property #2



Exploiting property #2



 $\cdot$  timing depends on where the translation stops

# Prefetch side-channel attacks

• variants of cache attacks (e.g., Flush+Prefetch)

- variants of cache attacks (e.g., Flush+Prefetch)
- Rowhammer attacks on privileged addresses

- variants of cache attacks (e.g., Flush+Prefetch)
- Rowhammer attacks on privileged addresses
- recovering translation levels of a process (→ /proc/pid/pagemap)
  → now privileged → bypasses ASLR

- variants of cache attacks (e.g., Flush+Prefetch)
- Rowhammer attacks on privileged addresses
- recovering translation levels of a process ( $\rightarrow$  /proc/pid/pagemap)
  - $\rightarrow$  now privileged  $\rightarrow$  bypasses ASLR
- translating virtual addresses to physical addresses ( $\rightarrow$  /proc/pid/pagemap)  $\rightarrow$  now privileged  $\rightarrow$  re-enables ret2dir exploits

- variants of cache attacks (e.g., Flush+Prefetch)
- Rowhammer attacks on privileged addresses
- + recovering translation levels of a process ( $\rightarrow \texttt{/proc/pid/pagemap})$ 
  - $\rightarrow \mbox{ now privileged } \rightarrow \mbox{ bypasses ASLR}$
- translating virtual addresses to physical addresses ( $\rightarrow$  /proc/pid/pagemap)  $\rightarrow$  now privileged  $\rightarrow$  re-enables ret2dir exploits
- locating kernel drivers
  - $\rightarrow$  bypasses KASLR

For all mapped pages, found with the translation-level oracle
1. evict translation caches: Sleep() / access large memory buffer

- 1. evict translation caches: Sleep() / access large memory buffer
- 2. perform syscall to driver

- 1. evict translation caches: Sleep() / access large memory buffer
- 2. perform syscall to driver
- 3. time prefetch(page address)

- 1. evict translation caches: Sleep() / access large memory buffer
- 2. perform syscall to driver
- 3. time prefetch(page address)
- $\rightarrow\,$  fastest average access time is a driver page

- 1. evict translation caches: Sleep() / access large memory buffer
- 2. perform syscall to driver
- 3. time prefetch(page address)
- $\rightarrow\,$  fastest average access time is a driver page

Full attack on Windows 10 in < 12 seconds

### Defeating KASLR by locating kernel driver (2)



Page offset in kernel driver region

prefetch does not allow you to access memory

D. Gruss et al. "KASLR is Dead: Long Live KASLR". In: ESSoS'17. to appear. 2017.

- prefetch does not allow you to access memory
- doesn't mean that it is a good idea to not perform any check

D. Gruss et al. "KASLR is Dead: Long Live KASLR". In: ESSoS'17. to appear. 2017.

- prefetch does not allow you to access memory
- doesn't mean that it is a good idea to not perform any check
- other microarchitectural attacks targeted KASLR

D. Gruss et al. "KASLR is Dead: Long Live KASLR". In: ESSoS'17. to appear. 2017.

- prefetch does not allow you to access memory
- doesn't mean that it is a good idea to not perform any check
- other microarchitectural attacks targeted KASLR
- still not completely dead

D. Gruss et al. "KASLR is Dead: Long Live KASLR". In: ESSoS'17. to appear. 2017.

#5. Fault Attacks on DRAM in JavaScript













- bits in cells in rows
- access: activate row, copy to row buffer

- $\cdot\,$  cells leak  $\rightarrow$  repetitive refresh necessary
- + refresh  $\approx$  reading (destructive) + writing same data again
- maximum interval between refreshes to guarantee data integrity

- $\cdot\,$  cells leak  $\rightarrow$  repetitive refresh necessary
- + refresh  $\approx$  reading (destructive) + writing same data again
- maximum interval between refreshes to guarantee data integrity
- $\cdot\,$  cells leak faster upon proximate accesses  $\rightarrow$  Rowhammer

DRAM bank



row buffer













- only non-cached accesses reach DRAM
- $\cdot$  original attacks use  ${\tt clflush}$  instruction
- $\rightarrow\,$  flush line from cache
- $\rightarrow\,$  next access will be served from DRAM






















# Rowhammer (with clflush)



# Rowhammer (with clflush)



# Rowhammer (with clflush)



- the core of Rowhammer is essentially a Flush+Reload loop
- $\cdot$  as much an attack on DRAM as on cache

- idea: avoid clflush to be independent of specific instructions  $\rightarrow$  no clflush in JavaScript

D. Gruss et al. "Rowhammer.js: A Remote Software-Induced Fault Attack in JavaScript". In: DIMVA'16. 2016.

- idea: avoid clflush to be independent of specific instructions  $\rightarrow$  no clflush in JavaScript
- our approach: use regular memory accesses for eviction
  - $\rightarrow$  techniques from cache attacks!

D. Gruss et al. "Rowhammer.js: A Remote Software-Induced Fault Attack in JavaScript". In: DIMVA'16. 2016.

- idea: avoid clflush to be independent of specific instructions  $\rightarrow$  no clflush in JavaScript
- our approach: use regular memory accesses for eviction
  - $\rightarrow\,$  techniques from cache attacks!
  - $\rightarrow$  Rowhammer, Prime+Probe style!

D. Gruss et al. "Rowhammer.js: A Remote Software-Induced Fault Attack in JavaScript". In: DIMVA'16. 2016.



























*n* accesses for an *n*-way cache



#### *n* accesses for an *n*-way cache



*n* accesses for an *n*-way cache



#### *n* accesses for an *n*-way cache



*n* accesses for an *n*-way cache



#### *n* accesses for an *n*-way cache



#### *n* accesses for an *n*-way cache



#### *n* accesses for an *n*-way cache



#### *n* accesses for an *n*-way cache



n accesses for an n-way cache



- no LRU replacement
- only 75% success rate on Haswell

n accesses for an n-way cache



- no LRU replacement
- only 75% success rate on Haswell
- $\cdot$  more accesses  $\rightarrow$  higher success rate, but too slow

- 1. uncached memory accesses: need to reach DRAM
- 2. fast memory accesses: race against the next row refresh

- 1. uncached memory accesses: need to reach DRAM
- 2. fast memory accesses: race against the next row refresh
- $\rightarrow\,$  optimize the eviction rate and the timing

## Cache eviction strategies: The beginning



 $\rightarrow$  fast and effective on Haswell: eviction rate >99.97%

We evaluated more than 10000 strategies...

| strategy   | # accesses | eviction rate | loop time |
|------------|------------|---------------|-----------|
| ₽-1-1-1-17 | 17         |               |           |
| ₽-1-1-1-20 | 20         |               |           |

Executed in a loop, on a Haswell with a 16-way last-level cache

We evaluated more than 10000 strategies...

| strategy           | # accesses | eviction rate | loop time |
|--------------------|------------|---------------|-----------|
| ₽-1-1-1-17         | 17         | 74.46% 🗡      |           |
| <i>P</i> -1-1-1-20 | 20         | 99.82% 🗸      |           |

Executed in a loop, on a Haswell with a 16-way last-level cache
| strategy           | # accesses | eviction rate | loop time |
|--------------------|------------|---------------|-----------|
| ₽-1-1-1-17         | 17         | 74.46% 🗡      | 307 ns 🗸  |
| <i>P</i> -1-1-1-20 | 20         | 99.82% 🗸      | 934 ns 🗡  |

Executed in a loop, on a Haswell with a 16-way last-level cache

| strategy   | # accesses | eviction rate | loop time |
|------------|------------|---------------|-----------|
| ₽-1-1-1-17 | 17         | 74.46% 🗡      | 307 ns 🗸  |
| ₽-1-1-1-20 | 20         | 99.82% 🗸      | 934 ns 🗡  |
| ₽-2-1-1-17 | 34         |               |           |

Executed in a loop, on a Haswell with a 16-way last-level cache

| strategy   | # accesses | eviction rate | loop time |
|------------|------------|---------------|-----------|
| ₽-1-1-1-17 | 17         | 74.46% 🗡      | 307 ns 🗸  |
| ₽-1-1-1-20 | 20         | 99.82% 🗸      | 934 ns 🗡  |
| ₽-2-1-1-17 | 34         | 99.86% 🗸      |           |

Executed in a loop, on a Haswell with a 16-way last-level cache

| strategy           | # accesses | eviction rate | loop time |
|--------------------|------------|---------------|-----------|
| ₽-1-1-1-17         | 17         | 74.46% 🗡      | 307 ns 🗸  |
| <i>P</i> -1-1-1-20 | 20         | 99.82% 🗸      | 934 ns 🗡  |
| ₽-2-1-1-17         | 34         | 99.86% 🗸      | 191 ns 🗸  |

Executed in a loop, on a Haswell with a 16-way last-level cache

| strategy           | # accesses | eviction rate | loop time |
|--------------------|------------|---------------|-----------|
| ₽-1-1-1-17         | 17         | 74.46% 🗡      | 307 ns 🗸  |
| <i>P</i> -1-1-1-20 | 20         | 99.82% 🗸      | 934 ns 🗡  |
| ₽-2-1-1-17         | 34         | 99.86% 🗸      | 191 ns 🗸  |
| ₽-2-2-1-17         | 64         |               |           |

Executed in a loop, on a Haswell with a 16-way last-level cache

| strategy           | # accesses | eviction rate | loop time |
|--------------------|------------|---------------|-----------|
| ₽-1-1-1-17         | 17         | 74.46% 🗡      | 307 ns 🗸  |
| <i>P</i> -1-1-1-20 | 20         | 99.82% 🗸      | 934 ns 🗡  |
| ₽-2-1-1-17         | 34         | 99.86% 🗸      | 191 ns 🗸  |
| <i>P</i> -2-2-1-17 | 64         | 99.98% 🗸      |           |

Executed in a loop, on a Haswell with a 16-way last-level cache

| strategy           | # accesses | eviction rate | loop time |
|--------------------|------------|---------------|-----------|
| ₽-1-1-1-17         | 17         | 74.46% 🗡      | 307 ns 🗸  |
| <i>P</i> -1-1-1-20 | 20         | 99.82% 🗸      | 934 ns 🗡  |
| ₽-2-1-1-17         | 34         | 99.86% 🗸      | 191 ns 🗸  |
| ₽-2-2-1-17         | 64         | 99.98% 🗸      | 180 ns 🗸  |

Executed in a loop, on a Haswell with a 16-way last-level cache

| strategy           | # accesses | eviction rate | loop time |
|--------------------|------------|---------------|-----------|
| ₽-1-1-1-17         | 17         | 74.46% 🗡      | 307 ns 🗸  |
| <i>P</i> -1-1-1-20 | 20         | 99.82% 🗸      | 934 ns 🗡  |
| ₽-2-1-1-17         | 34         | 99.86% 🗸      | 191 ns 🗸  |
| <i>P</i> -2-2-1-17 | 64         | 99.98% 🗸      | 180 ns 🗸  |

 $\rightarrow$  more accesses, smaller execution time?

Executed in a loop, on a Haswell with a 16-way last-level cache

*P*-2-1-1-17 (34 accesses, 191ns)

Miss (intended)

#### *P*-2-1-1-17 (34 accesses, 191ns)

Miss (intende

| Miss       | Miss       |
|------------|------------|
| (intended) | (intended) |

#### *𝒫*-2-1-1-17 (34 accesses, 191ns)



| Miss<br>(intended) | Miss<br>(intended) | н |
|--------------------|--------------------|---|
|--------------------|--------------------|---|

#### *𝒫*-2-1-1-17 (34 accesses, 191ns)



| Miss<br>(intended) | Miss<br>(intended) | н | 4 Miss |
|--------------------|--------------------|---|--------|
|--------------------|--------------------|---|--------|

#### *P*-2-1-1-17 (34 accesses, 191ns)



| Miss<br>(intended) | Miss<br>(intended) | н | 4 Miss |
|--------------------|--------------------|---|--------|
|--------------------|--------------------|---|--------|

#### *P*-2-1-1-17 (34 accesses, 191ns)



| Miss<br>(intended) | Miss<br>(intended) | н | 4 Miss |
|--------------------|--------------------|---|--------|
|--------------------|--------------------|---|--------|

### P-2-1-1-17 (34 accesses, 191ns)



| Miss<br>(intended) | Miss<br>(intended) | н | 4 Miss |
|--------------------|--------------------|---|--------|
|--------------------|--------------------|---|--------|

### P-2-1-1-17 (34 accesses, 191ns)



| Miss<br>(intended) | Miss<br>(intended) | н | 4 Miss |
|--------------------|--------------------|---|--------|
|--------------------|--------------------|---|--------|

### P-2-1-1-17 (34 accesses, 191ns)



| Miss<br>(intended) | Miss<br>(intended) | н | 4 Miss |
|--------------------|--------------------|---|--------|
|--------------------|--------------------|---|--------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss HHHHHHH |
|-------------------|
|-------------------|

| Miss<br>(intended) | Miss<br>(intended) | н | 4 Miss |
|--------------------|--------------------|---|--------|
|--------------------|--------------------|---|--------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss<br>(intended) (intended) H H H H | нн | 4 |
|--------------------------------------------|----|---|
|--------------------------------------------|----|---|

| Miss<br>(intended) | Miss<br>(intended) | н | Miss | Miss |
|--------------------|--------------------|---|------|------|
|--------------------|--------------------|---|------|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss HHHHHHHH Miss |
|-------------------------|
|-------------------------|

| Miss<br>(intended) | Miss<br>(intended) | н | Miss | Miss |
|--------------------|--------------------|---|------|------|
|--------------------|--------------------|---|------|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss<br>(intended) (intended) | Н Н Н Н H H H Miss | ľ |
|------------------------------------|--------------------|---|
|------------------------------------|--------------------|---|

| Miss<br>(intended) | Miss<br>(intended) | н | Miss | Miss |
|--------------------|--------------------|---|------|------|
|--------------------|--------------------|---|------|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss<br>(intended) (intended) | НИНИНИИ Miss | 4 |  |
|------------------------------------|--------------|---|--|
|------------------------------------|--------------|---|--|

| Miss<br>(intended) | Miss<br>(intended) | н | Miss | Miss |
|--------------------|--------------------|---|------|------|
|--------------------|--------------------|---|------|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss<br>(intended) (intended) | нныныны | Miss | н | , |
|------------------------------------|---------|------|---|---|
|------------------------------------|---------|------|---|---|

| Miss<br>(intended) | Miss<br>(intended) | н | Miss | Miss | Miss |
|--------------------|--------------------|---|------|------|------|
|--------------------|--------------------|---|------|------|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss<br>(intended) (intended) | нининии | Miss H H H |
|------------------------------------|---------|------------|
|------------------------------------|---------|------------|

| Miss<br>(intended) (ii | Miss<br>ntended) | Miss | Miss | Miss |
|------------------------|------------------|------|------|------|
|------------------------|------------------|------|------|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss<br>(intended) (intended) | ННИНИНИ Miss | н | н | I | 1 | H |  | • |
|------------------------------------|--------------|---|---|---|---|---|--|---|
|------------------------------------|--------------|---|---|---|---|---|--|---|

| Miss Miss<br>(intended) (intended) | н | Miss | Miss | Miss |
|------------------------------------|---|------|------|------|
|------------------------------------|---|------|------|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss<br>(intended) (intended) | H | ŀ |  | 1 | н | н | • |  | н | н | Miss | ļ | 1 | н | н | ŀ | 4 | н | ľ |  |
|------------------------------------|---|---|--|---|---|---|---|--|---|---|------|---|---|---|---|---|---|---|---|--|
|------------------------------------|---|---|--|---|---|---|---|--|---|---|------|---|---|---|---|---|---|---|---|--|

| Miss Miss<br>(intended) (intended) | н | Miss | Miss | Miss |
|------------------------------------|---|------|------|------|
|------------------------------------|---|------|------|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss<br>(intended) (intended) | нинини м | 55 ИНИНИИ |
|------------------------------------|----------|-----------|
|------------------------------------|----------|-----------|

| Miss<br>(intended) | Miss<br>(intended) | н | Miss | Miss | Miss |
|--------------------|--------------------|---|------|------|------|
|--------------------|--------------------|---|------|------|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss<br>(intended) (intended) |  | н | н |  | н | ŀ | 4 | ŀ |  | н | ŀ | 4 | н |  | Miss | H |  | н | н | н |  | 4 | н | ŀ |  |
|------------------------------------|--|---|---|--|---|---|---|---|--|---|---|---|---|--|------|---|--|---|---|---|--|---|---|---|--|
|------------------------------------|--|---|---|--|---|---|---|---|--|---|---|---|---|--|------|---|--|---|---|---|--|---|---|---|--|

| Miss Miss<br>(intended) (intended) | н | Miss | Miss | Miss | I |
|------------------------------------|---|------|------|------|---|
|------------------------------------|---|------|------|------|---|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss (intended) H H H H H H | Miss HHHHHHHH Miss |
|----------------------------------|--------------------|
|----------------------------------|--------------------|

| Miss Miss<br>(intended) (intended) | н | Miss | Miss | Miss | н | Miss |
|------------------------------------|---|------|------|------|---|------|
|------------------------------------|---|------|------|------|---|------|

### P-2-1-1-17 (34 accesses, 191ns)

| Miss Miss (intended) H H H H H H | Miss HHHHHHHH Miss |
|----------------------------------|--------------------|
|----------------------------------|--------------------|

| Miss Miss<br>(intended) (intended) | н | Miss | Miss | Miss | н | Miss |
|------------------------------------|---|------|------|------|---|------|
|------------------------------------|---|------|------|------|---|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss H H H H H H H H H | Miss HHHHHHHH Miss | I |
|-----------------------------|--------------------|---|
|-----------------------------|--------------------|---|

| Miss Miss<br>(intended) (intended) | н | Miss | Miss | Miss | н | Miss |
|------------------------------------|---|------|------|------|---|------|
|------------------------------------|---|------|------|------|---|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss<br>(intended) | Miss<br>(intended) | нынынын | Miss | нынынын | Miss H H |
|--------------------|--------------------|---------|------|---------|----------|
|--------------------|--------------------|---------|------|---------|----------|

| Miss Miss<br>(intended) (intended) | н | Miss | Miss | Miss | н | Miss |
|------------------------------------|---|------|------|------|---|------|
|------------------------------------|---|------|------|------|---|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss<br>(intended) (intended) | НИНИНИИ Miss | H H H H H H H Miss | . нин |
|------------------------------------|--------------|--------------------|-------|
|------------------------------------|--------------|--------------------|-------|

| Miss Miss<br>(intended) (intended) | н | Miss | Miss | Miss | н | Miss |
|------------------------------------|---|------|------|------|---|------|
|------------------------------------|---|------|------|------|---|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss (intended) H H H H H H H H H | Miss H H H H H H H Miss H H | нн |
|----------------------------------------|-----------------------------|----|
|----------------------------------------|-----------------------------|----|

| Miss Miss<br>(intended) (intended) | н | Miss | Miss | Miss | н | Miss |
|------------------------------------|---|------|------|------|---|------|
|------------------------------------|---|------|------|------|---|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss (intended) H H H H H H | tiss HHHHHHHH Miss HHHHH |
|----------------------------------|--------------------------|
|----------------------------------|--------------------------|

| Miss Miss<br>(intended) (intended) | н | Miss | Miss | Miss | н | Miss |
|------------------------------------|---|------|------|------|---|------|
|------------------------------------|---|------|------|------|---|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss (intended) HHHHHHHHH Miss | N H H H H H H H Miss | ннннн |
|-------------------------------------|----------------------|-------|
|-------------------------------------|----------------------|-------|
| Miss Miss<br>(intended) (intended) | н | Miss | Miss | Miss | н | Miss | Miss |
|------------------------------------|---|------|------|------|---|------|------|
|------------------------------------|---|------|------|------|---|------|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss (intended) H H H H H Mis | s HHHHHHH Miss HHHHHHHH |
|------------------------------------|-------------------------|
|------------------------------------|-------------------------|

| Miss Miss<br>(intended) (intended) | н | Miss | Miss | Miss | н | Miss | Miss |
|------------------------------------|---|------|------|------|---|------|------|
|------------------------------------|---|------|------|------|---|------|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss<br>(intended) (intended) | н мі | as Miss | Miss | H Miss | Miss | Miss |
|------------------------------------|------|---------|------|--------|------|------|
|------------------------------------|------|---------|------|--------|------|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss (recented) utaliju nju u m Miss (recented) |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|

| Miss Miss<br>(intended) (intended) | н мі | as Miss | Miss | H Miss | Miss | Miss |
|------------------------------------|------|---------|------|--------|------|------|
|------------------------------------|------|---------|------|--------|------|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss<br>(intended) (intended) | н мі | as Miss | Miss | H Miss | Miss | Miss |
|------------------------------------|------|---------|------|--------|------|------|
|------------------------------------|------|---------|------|--------|------|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss (intended) HHHHHHHHH Miss | H H H H H H H H Miss | нннннн | Miss H |
|-------------------------------------|----------------------|--------|--------|
|-------------------------------------|----------------------|--------|--------|

| Miss Miss<br>(intended) (intended) | н мі | as Miss | Miss | H Miss | Miss | Miss |
|------------------------------------|------|---------|------|--------|------|------|
|------------------------------------|------|---------|------|--------|------|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss H H H H H H H H H H H H | Miss ИИНИНИИ / | Miss HHHHHHHH Miss | s ннн |
|-----------------------------------|----------------|--------------------|-------|
|-----------------------------------|----------------|--------------------|-------|

| Miss Miss<br>(intended) (intended) | н мі | as Miss | Miss | H Miss | Miss | Miss |
|------------------------------------|------|---------|------|--------|------|------|
|------------------------------------|------|---------|------|--------|------|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss (intended) H H H H H H H H Miss | HHHHHHHH Miss | HHHHHHH Mis | s нини |
|-------------------------------------------|---------------|-------------|--------|
|-------------------------------------------|---------------|-------------|--------|

| Miss Miss<br>(intended) (intended) | н мі | as Miss | Miss | H Miss | Miss | Miss |
|------------------------------------|------|---------|------|--------|------|------|
|------------------------------------|------|---------|------|--------|------|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss (intended) H H H H H H Miss | H H H H H H H Miss | H H H H H H H Miss | нннн |
|---------------------------------------|--------------------|--------------------|------|
|---------------------------------------|--------------------|--------------------|------|

| Miss Miss<br>(intended) (intende | ) н | Miss | Miss | Miss | н | Miss | Miss | Miss | H |
|----------------------------------|-----|------|------|------|---|------|------|------|---|
|----------------------------------|-----|------|------|------|---|------|------|------|---|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss (intended) H H H H H H Miss | H H H H H H H Miss | НИНИНИИ Miss | нинии |
|---------------------------------------|--------------------|--------------|-------|
|---------------------------------------|--------------------|--------------|-------|

| Miss Miss H<br>(intended) (intended) | Miss | Miss | Miss | H Miss | Miss | Miss | H Miss |
|--------------------------------------|------|------|------|--------|------|------|--------|
|--------------------------------------|------|------|------|--------|------|------|--------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss H<br>(intended) (intended) | Miss Miss | Miss H | Miss | Miss | Miss | H Miss | Miss |
|--------------------------------------|-----------|--------|------|------|------|--------|------|
|--------------------------------------|-----------|--------|------|------|------|--------|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss (intended) H Miss | Miss Miss | H Miss | Miss | Miss | H Miss | Miss | Miss |
|-----------------------------|-----------|--------|------|------|--------|------|------|
|-----------------------------|-----------|--------|------|------|--------|------|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss<br>(intended) | Miss<br>(intended) | H Miss | Miss | Miss | H Miss | Miss | Miss | H Miss | Miss | Miss | ŀ |
|--------------------|--------------------|--------|------|------|--------|------|------|--------|------|------|---|
|--------------------|--------------------|--------|------|------|--------|------|------|--------|------|------|---|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss (intended) HHHHHHHHH Miss | H H H H H H H Miss | ННННННН Miss | нинии |
|-------------------------------------|--------------------|--------------|-------|
|-------------------------------------|--------------------|--------------|-------|

| Miss Miss<br>(intended) (intended) | н | Miss | Miss | Miss | H Miss | Miss | Miss | H Miss | Miss | Miss | H Miss |
|------------------------------------|---|------|------|------|--------|------|------|--------|------|------|--------|
|------------------------------------|---|------|------|------|--------|------|------|--------|------|------|--------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss HHHHHHHHHHHH | H H H H H H H Miss | HHHHHHH Miss | нинии |
|------------------------|--------------------|--------------|-------|
|------------------------|--------------------|--------------|-------|

| Miss<br>(intended) | Miss<br>(intended) | H Miss | Miss | Miss | H Miss | Miss | Miss | H Miss | Miss | Miss | H Miss | Miss |
|--------------------|--------------------|--------|------|------|--------|------|------|--------|------|------|--------|------|
|--------------------|--------------------|--------|------|------|--------|------|------|--------|------|------|--------|------|

### *P*-2-1-1-17 (34 accesses, 191ns)

| Miss Miss (intended) HHHHHHHHH Miss | H H H H H H H Miss | ННННННН Miss | нинии |
|-------------------------------------|--------------------|--------------|-------|
|-------------------------------------|--------------------|--------------|-------|

## **Evaluation on Haswell**



- $\cdot\,$  cache eviction fast enough to replace <code>clflush</code>
- $\cdot$  independent of programming language and available instructions

E. Bosman et al. "Dedup Est Machina: Memory Deduplication as an Advanced Exploitation Vector". In: S&P'16. 2016

- cache eviction fast enough to replace clflush
- $\cdot$  independent of programming language and available instructions
- lessons learned from side channels to perform a fault attack

E. Bosman et al. "Dedup Est Machina: Memory Deduplication as an Advanced Exploitation Vector". In: S&P'16. 2016

- cache eviction fast enough to replace clflush
- $\cdot$  independent of programming language and available instructions
- $\cdot$  lessons learned from side channels to perform a fault attack
- first remote fault attack, from a browser

E. Bosman et al. "Dedup Est Machina: Memory Deduplication as an Advanced Exploitation Vector". In: S&P'16. 2016

- cache eviction fast enough to replace clflush
- $\cdot$  independent of programming language and available instructions
- $\cdot$  lessons learned from side channels to perform a fault attack
- first remote fault attack, from a browser
- if you think a fault is not exploitable, think again

E. Bosman et al. "Dedup Est Machina: Memory Deduplication as an Advanced Exploitation Vector". In: S&P'16. 2016

Countermeasures

- $\cdot$  different levels: hardware, system, application
- different goals
  - remove interferences
  - add noise to interferences
  - make it impossible to measure interferences

- $\cdot$  clflush
  - unprivileged line eviction

# $\cdot$ clflush

· unprivileged line eviction  $\rightarrow$  make it privileged

# $\cdot$ clflush

- $\cdot$  unprivileged line eviction  $\rightarrow$  make it privileged
- leaks timing information

# $\cdot$ clflush

- $\cdot$  unprivileged line eviction  $\rightarrow$  make it privileged
- + leaks timing information  $\rightarrow$  make it constant-time

# $\cdot$ clflush

- $\cdot$  unprivileged line eviction  $\rightarrow$  make it privileged
- + leaks timing information  $\rightarrow$  make it constant-time

### $\cdot$ rdtsc

• unprivileged fine-grained timing

# $\cdot$ clflush

- $\cdot$  unprivileged line eviction  $\rightarrow$  make it privileged
- + leaks timing information  $\rightarrow$  make it constant-time

### $\cdot$ rdtsc

+ unprivileged fine-grained timing  $\rightarrow$  make it privileged

# $\cdot$ clflush

- · unprivileged line eviction  $\rightarrow$  make it privileged
- + leaks timing information  $\rightarrow$  make it constant-time

### $\cdot$ rdtsc

- unprivileged fine-grained timing  $\rightarrow$  make it privileged
- $\rightarrow\,$  require changes to the architecture

# $\cdot$ clflush

- $\cdot \text{ unprivileged line eviction} \rightarrow \text{make it } \underline{\text{privileged}}$
- + leaks timing information  $\rightarrow$  make it constant-time

### $\cdot$ rdtsc

- unprivileged fine-grained timing  $\rightarrow$  make it privileged
- $\rightarrow\,$  require changes to the architecture
- $\rightarrow$  attacks still possible (e.g., Prime+Probe)

stop sharing cache

G. Irazoqui et al. "Cross processor cache attacks". In: AsiaCCS'16. 2016

Y. Oren et al. "The Spy in the Sandbox: Practical Cache Attacks in JavaScript and their Implications". In: CCS'15. 2015

- stop sharing cache  $\rightarrow$  attacks are getting better
  - + first attacks on L1  $\rightarrow$  same core

G. Irazoqui et al. "Cross processor cache attacks". In: AsiaCCS'16. 2016

Y. Oren et al. "The Spy in the Sandbox: Practical Cache Attacks in JavaScript and their Implications". In: CCS'15. 2015

- · stop sharing cache  $\rightarrow$  attacks are getting better
  - + first attacks on L1 ightarrow same core ightarrow stop sharing core

G. Irazoqui et al. "Cross processor cache attacks". In: AsiaCCS'16. 2016

Y. Oren et al. "The Spy in the Sandbox: Practical Cache Attacks in JavaScript and their Implications". In: CCS'15. 2015

- $\cdot$  stop sharing cache  $\rightarrow$  attacks are getting better
  - + first attacks on L1  $\rightarrow$  same core  $\rightarrow$  stop sharing core
  - + current attacks on LLC  $\rightarrow$  across cores

G. Irazoqui et al. "Cross processor cache attacks". In: AsiaCCS'16. 2016

Y. Oren et al. "The Spy in the Sandbox: Practical Cache Attacks in JavaScript and their Implications". In: CCS'15. 2015

- $\cdot$  stop sharing cache  $\rightarrow$  attacks are getting better
  - + first attacks on L1 ightarrow same core ightarrow stop sharing core
  - + current attacks on LLC  $\rightarrow$  across cores  $\rightarrow$  stop sharing CPU

G. Irazoqui et al. "Cross processor cache attacks". In: AsiaCCS'16. 2016

Y. Oren et al. "The Spy in the Sandbox: Practical Cache Attacks in JavaScript and their Implications". In: CCS'15. 2015

- $\cdot$  stop sharing cache  $\rightarrow$  attacks are getting better
  - + first attacks on L1 ightarrow same core ightarrow stop sharing core
  - + current attacks on LLC  $\rightarrow$  across cores  $\rightarrow$  stop sharing CPU
  - 2016: first attack across processors

G. Irazoqui et al. "Cross processor cache attacks". In: AsiaCCS'16. 2016

Y. Oren et al. "The Spy in the Sandbox: Practical Cache Attacks in JavaScript and their Implications". In: CCS'15. 2015
# Hardware level: Stop sharing hardware?

- $\cdot$  stop sharing cache  $\rightarrow$  attacks are getting better
  - + first attacks on L1 ightarrow same core ightarrow stop sharing core
  - + current attacks on LLC  $\rightarrow$  across cores  $\rightarrow$  stop sharing CPU
  - 2016: first attack across processors  $\rightarrow$  what next?

G. Irazoqui et al. "Cross processor cache attacks". In: AsiaCCS'16. 2016

Y. Oren et al. "The Spy in the Sandbox: Practical Cache Attacks in JavaScript and their Implications". In: CCS'15. 2015

# Hardware level: Stop sharing hardware?

- $\cdot$  stop sharing cache  $\rightarrow$  attacks are getting better
  - + first attacks on L1 ightarrow same core ightarrow stop sharing core
  - + current attacks on LLC  $\rightarrow$  across cores  $\rightarrow$  stop sharing CPU
  - 2016: first attack across processors  $\rightarrow$  what next?
- not an option for cost reasons in the cloud

G. Irazoqui et al. "Cross processor cache attacks". In: AsiaCCS'16. 2016

Y. Oren et al. "The Spy in the Sandbox: Practical Cache Attacks in JavaScript and their Implications". In: CCS'15. 2015

# Hardware level: Stop sharing hardware?

- · stop sharing cache  $\rightarrow$  attacks are getting better
  - + first attacks on L1 ightarrow same core ightarrow stop sharing core
  - + current attacks on LLC  $\rightarrow$  across cores  $\rightarrow$  stop sharing CPU
  - 2016: first attack across processors  $\rightarrow$  what next?
- not an option for cost reasons in the cloud
- what about JavaScript attacks?

G. Irazoqui et al. "Cross processor cache attacks". In: AsiaCCS'16. 2016

Y. Oren et al. "The Spy in the Sandbox: Practical Cache Attacks in JavaScript and their Implications". In: CCS'15. 2015

• secure cache designs: Random-Permutation Cache, Partition-Locked Cache

Z. Wang et al. "New cache designs for thwarting software cache-based side channel attacks". In: ACM SIGARCH Computer Architecture News 35.2 (June 2007), p. 494

J. Kong et al. "Hardware-software integrated approaches to defend against software cache-based side channel attacks". In: HPCA'09. 2009.

A. Fuchs et al. "Disruptive Prefetching: Impact on Side-Channel Attacks and Cache Designs". In: SYSTOR'15. 2015

- secure cache designs: Random-Permutation Cache, Partition-Locked Cache
  - $\rightarrow$  expensive, not always high performance

Z. Wang et al. "New cache designs for thwarting software cache-based side channel attacks". In: ACM SIGARCH Computer Architecture News 35.2 (June 2007), p. 494

J. Kong et al. "Hardware-software integrated approaches to defend against software cache-based side channel attacks". In: HPCA'09. 2009.

A. Fuchs et al. "Disruptive Prefetching: Impact on Side-Channel Attacks and Cache Designs". In: SYSTOR'15. 2015

- secure cache designs: Random-Permutation Cache, Partition-Locked Cache
  - $\rightarrow$  expensive, not always high performance
- · disruptive prefetching: random hardware prefetches

Z. Wang et al. "New cache designs for thwarting software cache-based side channel attacks". In: ACM SIGARCH Computer Architecture News 35.2 (June 2007), p. 494

J. Kong et al. "Hardware-software integrated approaches to defend against software cache-based side channel attacks". In: HPCA'09. 2009.

A. Fuchs et al. "Disruptive Prefetching: Impact on Side-Channel Attacks and Cache Designs". In: SYSTOR'15. 2015

- secure cache designs: Random-Permutation Cache, Partition-Locked Cache
  - $\rightarrow$  expensive, not always high performance
- disruptive prefetching: random hardware prefetches
  - ightarrow adding noise makes attacks harder, not impossible

Z. Wang et al. "New cache designs for thwarting software cache-based side channel attacks". In: ACM SIGARCH Computer Architecture News 35.2 (June 2007), p. 494

J. Kong et al. "Hardware-software integrated approaches to defend against software cache-based side channel attacks". In: HPCA'09. 2009.

A. Fuchs et al. "Disruptive Prefetching: Impact on Side-Channel Attacks and Cache Designs". In: SYSTOR'15. 2015

- secure cache designs: Random-Permutation Cache, Partition-Locked Cache
  - $\rightarrow$  expensive, not always high performance
- disruptive prefetching: random hardware prefetches
  - ightarrow adding noise makes attacks harder, not impossible
- $\rightarrow$  trade-off security/performance/cost

Z. Wang et al. "New cache designs for thwarting software cache-based side channel attacks". In: ACM SIGARCH Computer Architecture News 35.2 (June 2007), p. 494

J. Kong et al. "Hardware-software integrated approaches to defend against software cache-based side channel attacks". In: HPCA'09. 2009.

A. Fuchs et al. "Disruptive Prefetching: Impact on Side-Channel Attacks and Cache Designs". In: SYSTOR'15. 2015

- secure cache designs: Random-Permutation Cache, Partition-Locked Cache
  - $\rightarrow$  expensive, not always high performance
- disruptive prefetching: random hardware prefetches
  - ightarrow adding noise makes attacks harder, not impossible
- $\rightarrow$  trade-off security/performance/cost
- $\rightarrow$  performance and cost win, no implementation by manufacturers

Z. Wang et al. "New cache designs for thwarting software cache-based side channel attacks". In: ACM SIGARCH Computer Architecture News 35.2 (June 2007), p. 494

J. Kong et al. "Hardware-software integrated approaches to defend against software cache-based side channel attacks". In: HPCA'09. 2009.

A. Fuchs et al. "Disruptive Prefetching: Impact on Side-Channel Attacks and Cache Designs". In: SYSTOR'15. 2015

• L1 cache cleansing

Y. Zhang et al. "Düppel: retrofitting commodity operating systems to mitigate cache side channels in the cloud". In: CCS'13. 2013

H. Raj et al. "Resource Management for Isolation Enhanced Cloud Services". In: CCSW'09. 2009

B. C. Vattikonda et al. "Eliminating fine grained timers in Xen". In: CCSW'11. 2011

M. Schwarz et al. "Fantastic Timers and Where to Find Them: High-Resolution Microarchitectural Attacks in JavaScript". In: FC'17. 2017.

- L1 cache cleansing
  - $\rightarrow\,$  if applied to LLC  $\rightarrow$  same as no cache, disastrous performance

Y. Zhang et al. "Düppel: retrofitting commodity operating systems to mitigate cache side channels in the cloud". In: CCS'13. 2013

H. Raj et al. "Resource Management for Isolation Enhanced Cloud Services". In: CCSW'09. 2009

B. C. Vattikonda et al. "Eliminating fine grained timers in Xen". In: CCSW'11. 2011

M. Schwarz et al. "Fantastic Timers and Where to Find Them: High-Resolution Microarchitectural Attacks in JavaScript". In: FC'17. 2017.

- L1 cache cleansing
  - $\rightarrow\,$  if applied to LLC  $\rightarrow$  same as no cache, disastrous performance
- $\cdot$  page coloring  $\rightarrow$  reduces cache sharing

Y. Zhang et al. "Düppel: retrofitting commodity operating systems to mitigate cache side channels in the cloud". In: CCS'13. 2013

H. Raj et al. "Resource Management for Isolation Enhanced Cloud Services". In: CCSW'09. 2009

B. C. Vattikonda et al. "Eliminating fine grained timers in Xen". In: CCSW'11. 2011

M. Schwarz et al. "Fantastic Timers and Where to Find Them: High-Resolution Microarchitectural Attacks in JavaScript". In: FC'17. 2017.

# System level: Prevention

- L1 cache cleansing
  - $\rightarrow$  if applied to LLC  $\rightarrow$  same as no cache, disastrous performance
- $\cdot$  page coloring  $\rightarrow$  reduces cache sharing
  - $\rightarrow$  limited number of colors + bad performance
  - $\rightarrow$  doesn't prevent Flush+Reload

Y. Zhang et al. "Düppel: retrofitting commodity operating systems to mitigate cache side channels in the cloud". In: CCS'13. 2013

H. Raj et al. "Resource Management for Isolation Enhanced Cloud Services". In: CCSW'09. 2009

B. C. Vattikonda et al. "Eliminating fine grained timers in Xen". In: CCSW'11. 2011

M. Schwarz et al. "Fantastic Timers and Where to Find Them: High-Resolution Microarchitectural Attacks in JavaScript". In: FC'17. 2017.

# System level: Prevention

- L1 cache cleansing
  - $\rightarrow$  if applied to LLC  $\rightarrow$  same as no cache, disastrous performance
- $\cdot$  page coloring  $\rightarrow$  reduces cache sharing
  - $\rightarrow$  limited number of colors + bad performance
  - $\rightarrow$  doesn't prevent Flush+Reload
- $\cdot\,$  noise in timers or no timer

Y. Zhang et al. "Düppel: retrofitting commodity operating systems to mitigate cache side channels in the cloud". In: CCS'13. 2013

H. Raj et al. "Resource Management for Isolation Enhanced Cloud Services". In: CCSW'09. 2009

B. C. Vattikonda et al. "Eliminating fine grained timers in Xen". In: CCSW'11. 2011

M. Schwarz et al. "Fantastic Timers and Where to Find Them: High-Resolution Microarchitectural Attacks in JavaScript". In: FC'17. 2017.

# System level: Prevention

- L1 cache cleansing
  - $\rightarrow$  if applied to LLC  $\rightarrow$  same as no cache, disastrous performance
- $\cdot$  page coloring  $\rightarrow$  reduces cache sharing
  - $\rightarrow$  limited number of colors + bad performance
  - $\rightarrow$  doesn't prevent Flush+Reload
- $\cdot\,$  noise in timers or no timer
  - ightarrow adding noise makes attacks harder, not impossible
  - $\rightarrow$  removing timers is not realistic

Y. Zhang et al. "Düppel: retrofitting commodity operating systems to mitigate cache side channels in the cloud". In: CCS'13. 2013

H. Raj et al. "Resource Management for Isolation Enhanced Cloud Services". In: CCSW'09. 2009

B. C. Vattikonda et al. "Eliminating fine grained timers in Xen". In: CCSW'11. 2011

M. Schwarz et al. "Fantastic Timers and Where to Find Them: High-Resolution Microarchitectural Attacks in JavaScript". In: FC'17. 2017.

#### System level: Detect on-going attacks

- using performance counters to monitor cache hits and cache misses
- $\rightarrow~{\rm risk}$  of false positives



N. Herath et al. "These are Not Your Grand Daddys CPU Performance Counters – CPU Hardware Performance Counters for Security". In: Black Hat 2015 Briefings. 2015

D. Gruss et al. "Flush+Flush: A Fast and Stealthy Cache Attack". In: DIMVA'16. 2016

- CacheAudit : static analysis of source code
- Cache Template Attacks : dynamic approach

G. Doychev et al. "CacheAudit: A Tool for the Static Analysis of Cache Side Channels". In: USENIX Security Symposium. 2013 D. Gruss et al. "Cache Template Attacks: Automating Attacks on Inclusive Last-Level Caches". In: USENIX Security Symposium. 2015

- CacheAudit : static analysis of source code
- Cache Template Attacks : dynamic approach
- $\rightarrow\,$  limited to side-channels  $\rightarrow$  covert channels still possible
- $\rightarrow\,$  most effective for critical code

G. Doychev et al. "CacheAudit: A Tool for the Static Analysis of Cache Side Channels". In: USENIX Security Symposium. 2013 D. Gruss et al. "Cache Template Attacks: Automating Attacks on Inclusive Last-Level Caches". In: USENIX Security Symposium. 2015

- CacheAudit : static analysis of source code
- Cache Template Attacks : dynamic approach
- $\rightarrow\,$  limited to side-channels  $\rightarrow$  covert channels still possible
- ightarrow most effective for critical code
- $\rightarrow\,$  see Boris' talk this afternoon!

G. Doychev et al. "CacheAudit: A Tool for the Static Analysis of Cache Side Channels". In: USENIX Security Symposium. 2013 D. Gruss et al. "Cache Template Attacks: Automating Attacks on Inclusive Last-Level Caches". In: USENIX Security Symposium. 2015

- square-and-multiply-always algorithm
- bit-sliced AES implementation
- hardware implementations (AES-NI, etc.)

- square-and-multiply-always algorithm
- bit-sliced AES implementation
- hardware implementations (AES-NI, etc.)
- $\rightarrow\,$  protecting crypto is possible and necessary!
- $\rightarrow\,$  a few CVEs that have been treated: CVE-2005-0109, CVE-2013-4242, CVE-2014-0076, CVE-2016-0702, CVE-2016-2178

# **Bigger Perspective and Conclusions**

# rdseed and floating point operations

#### $\cdot$ rdseed

- request a random seed to the hardware random number generator
- $\cdot$  fixed number of precomputed random bits, takes time to regenerate them
- $\rightarrow$  covert channel

D. Evtyushkin et al. "Covert Channels through Random Number Generator: Mechanisms, Capacity Estimation and Mitigations". In: CCS'16. 2016 M. Andrysco et al. "On subnormal floating point and abnormal timing". In: S&P'15. 2015

# rdseed and floating point operations

#### $\cdot$ rdseed

- request a random seed to the hardware random number generator
- fixed number of precomputed random bits, takes time to regenerate them
- $\rightarrow$  covert channel
- $\cdot$  fadd, fmul
  - floating-point unit
  - floating point operations running time depends on the operands
  - ightarrow bypassing Firefox's same origin policy via SVG filter timing attack

D. Evtyushkin et al. "Covert Channels through Random Number Generator: Mechanisms, Capacity Estimation and Mitigations". In: CCS'16. 2016 M. Andrysco et al. "On subnormal floating point and abnormal timing". In: S&P'15. 2015

### • jmp

- + branch prediction and branch target prediction  $\rightarrow$  branch prediction unit
- ightarrow covert channels, side-channel attacks on crypto, bypassing kernel ASLR

O. Aciiçmez et al. "Predicting secret keys via branch prediction". In: CT-RSA 2007. 2007

D. Evtyushkin et al. "Jump over ASLR: Attacking branch predictors to bypass ASLR". In: MICRO'16. 2016

Y. Jang et al. "Breaking kernel address space layout randomization with intel TSX". In: CCS'16. 2016

# ·jmp

- + branch prediction and branch target prediction  $\rightarrow$  branch prediction unit
- $\rightarrow~{\rm covert}$  channels, side-channel attacks on crypto, bypassing kernel ASLR
- TSX instructions
  - extension for transactional memory support in hardware
  - $\rightarrow$  bypassing kernel ASLR

O. Aciiçmez et al. "Predicting secret keys via branch prediction". In: CT-RSA 2007. 2007

D. Evtyushkin et al. "Jump over ASLR: Attacking branch predictors to bypass ASLR". In: MICRO'16. 2016

Y. Jang et al. "Breaking kernel address space layout randomization with intel TSX". In: CCS'16. 2016

• more a problem of CPU design than Instruction Set Architecture

- more a problem of CPU design than Instruction Set Architecture
- $\cdot$  hard to patch  $\rightarrow$  issues linked to performance optimizations

- more a problem of CPU design than Instruction Set Architecture
- $\cdot\,$  hard to patch  $\rightarrow$  issues linked to performance optimizations
- quick fixes like removing instructions won't work

- more a problem of CPU design than Instruction Set Architecture
- $\cdot\,$  hard to patch  $\rightarrow$  issues linked to performance optimizations
- quick fixes like removing instructions won't work
- $\rightarrow\,$  looking forward to Boris' talk this afternoon!

# **Questions?**

Contact

clementine.maurice@iaik.tugraz.at
@BloodyTangerine

# Cache attacks: Software side-channel and fault attacks

Clémentine Maurice, Graz University of Technology May 11, 2017—Spring school on Security & Correctness in the IoT 2017