## Introduction to micro-architectural attacks

Clémentine Maurice, CNRS, IRISA April 30, 2019—Ben Gurion University, Israel

#### **Clémentine Maurice**

- since 2017: CNRS tenured researcher, working at IRISA lab, EMSEC group
- 2016–2017: postdoc at TU Graz (Austria)
- 2012–2015: PhD (Technicolor/Eurecom)



#### Everyday hardware: servers, workstations, laptops, smartphones...



#### $\cdot$ safe software infrastructure $\rightarrow$ no bugs, e.g., buffer overflows

- $\cdot$  safe software infrastructure  $\rightarrow$  no bugs, e.g., buffer overflows
- $\cdot$  does not mean safe execution

- $\cdot$  safe software infrastructure  $\rightarrow$  no bugs, e.g., buffer overflows
- $\cdot$  does not mean safe execution
- information leaks because of implementation and hardware
- $\cdot\,$  no "bug" in the sense of a mistake  $\rightarrow$  lots of performance optimizations

- $\cdot$  safe software infrastructure  $\rightarrow$  no bugs, e.g., buffer overflows
- $\cdot$  does not mean safe execution
- information leaks because of implementation and hardware
- $\cdot\,$  no "bug" in the sense of a mistake  $\rightarrow$  lots of performance optimizations
- $\rightarrow\,$  crypto and sensitive info., e.g., keystrokes and mouse movements

• via power consumption, electromagnetic leaks

#### Sources of leakage



- via power consumption, electromagnetic leaks
  - $\rightarrow\,$  targeted attacks, physical access
  - ightarrow mostly performed on embedded devices

- via power consumption, electromagnetic leaks
  - $\rightarrow\,$  targeted attacks, physical access
  - $\rightarrow\,$  mostly performed on embedded devices
- $\cdot$  via the timing and micro-architecture

- $\cdot$  via power consumption, electromagnetic leaks
  - $\rightarrow\,$  targeted attacks, physical access
  - $\rightarrow\,$  mostly performed on embedded devices
- $\cdot$  via the timing and micro-architecture
  - $\rightarrow$  remote attacks, no physical access required

## Example: Cache attack on RSA square-and-multiply exponentiation (1/2)

mbedTLS version 2.3.0 (fixed since)

```
Algorithm 1: Square-and-multiply exponentiation
Input: base b, exponent e, modulus n
Output: b^e \mod n
X \leftarrow 1
for i \leftarrow bitlen(e) downto 0 do
    X \leftarrow \text{multiply}(X, X)
    if e_i = 1 then
        X \leftarrow \text{multiply}(X, b)
    end
end
return X
```

## Example: Cache attack on RSA square-and-multiply exponentiation (2/2)

• raw Prime+Probe cache trace on the buffer holding the multiplier b



## Example: Cache attack on RSA square-and-multiply exponentiation (2/2)

- raw Prime+Probe cache trace on the buffer holding the multiplier b
- processed with a simple moving average



## Example: Cache attack on RSA square-and-multiply exponentiation (2/2)

- raw Prime+Probe cache trace on the buffer holding the multiplier b
- processed with a simple moving average
- allows to clearly see the bits of the exponent



• no physical access to the device

- no physical access to the device
- can execute unprivileged code on the same machine as victim

- $\cdot$  no physical access to the device
- can execute unprivileged code on the same machine as victim
- what are the scenarios in which this happens?

- $\cdot$  no physical access to the device
- · can execute unprivileged code on the same machine as victim
- what are the scenarios in which this happens?
  - you install some program on your machine/smartphone
  - you have a virtual machine on some physical machine (cloud)
  - some JavaScript runs on a web page







- new microarchitectures yearly
- + performance improvement  $\approx 5\%$



- new microarchitectures yearly
- + performance improvement  $\approx 5\%$
- very small optimizations: caches, branch prediction...



- new microarchitectures yearly
- performance improvement  $\approx 5\%$
- very small optimizations: caches, branch prediction...
- ... leading to side channels



- new microarchitectures yearly
- + performance improvement  $\approx 5\%$
- very small optimizations: caches, branch prediction...
- ... leading to side channels
- no documentation on this intellectual property

#### $\cdot$ "Intel x86 documentation has more pages than the 6502 has transistors"

Ken Shirriff, http://www.righto.com/2013/09/intel-x86-documentation-has-more-pages.html

- "Intel x86 documentation has more pages than the 6502 has transistors"
- 6502: 8-bit microprocessor, used in the Apple II, Commodore 64, Atari 800...

Ken Shirriff, http://www.righto.com/2013/09/intel-x86-documentation-has-more-pages.html

- "Intel x86 documentation has more pages than the 6502 has transistors"
- 6502: 8-bit microprocessor, used in the Apple II, Commodore 64, Atari 800...
  - year: 1975  $\rightarrow$  3510 transistors

Ken Shirriff, http://www.righto.com/2013/09/intel-x86-documentation-has-more-pages.html

- $\cdot$  "Intel x86 documentation has more pages than the 6502 has transistors"
- 6502: 8-bit microprocessor, used in the Apple II, Commodore 64, Atari 800...
  - year: 1975  $\rightarrow$  3510 transistors
- 22-core Intel Xeon Broadwell-E5

Ken Shirriff, http://www.righto.com/2013/09/intel-x86-documentation-has-more-pages.html

- $\cdot$  "Intel x86 documentation has more pages than the 6502 has transistors"
- 6502: 8-bit microprocessor, used in the Apple II, Commodore 64, Atari 800...
  - year: 1975  $\rightarrow$  3510 transistors
- 22-core Intel Xeon Broadwell-E5
  - year: 2016  $\rightarrow$  7.2 billion transistors

Ken Shirriff, http://www.righto.com/2013/09/intel-x86-documentation-has-more-pages.html

- $\cdot$  "Intel x86 documentation has more pages than the 6502 has transistors"
- 6502: 8-bit microprocessor, used in the Apple II, Commodore 64, Atari 800...
  - year: 1975  $\rightarrow$  3510 transistors
- 22-core Intel Xeon Broadwell-E5
  - year: 2016  $\rightarrow$  7.2 billion transistors
- Intel Software Developer's Manuals (may. 2018): 4844 pages

Ken Shirriff, http://www.righto.com/2013/09/intel-x86-documentation-has-more-pages.html

- $\cdot$  "Intel x86 documentation has more pages than the 6502 has transistors"
- 6502: 8-bit microprocessor, used in the Apple II, Commodore 64, Atari 800...
  - year: 1975  $\rightarrow$  3510 transistors
- 22-core Intel Xeon Broadwell-E5
  - year: 2016  $\rightarrow$  7.2 billion transistors
- Intel Software Developer's Manuals (may. 2018): 4844 pages
- (there are actually more manuals than just the SDM)

Ken Shirriff, http://www.righto.com/2013/09/intel-x86-documentation-has-more-pages.html

# Background on caches

- this is not boring background to maybe understand better the remainder
- we actually do really really need to understand how caches work in great details to perform cache attacks
- pay attention and ask questions if you don't understand something :)

• ideal memory: zero latency, infinite capacity, zero cost, infinite bandwidth

- ideal memory: zero latency, infinite capacity, zero cost, infinite bandwidth
- $\cdot$  ideal memory requirements oppose each other

- ideal memory: zero latency, infinite capacity, zero cost, infinite bandwidth
- $\cdot$  ideal memory requirements oppose each other
- $\cdot$  bigger is slower  $\rightarrow$  bigger: takes longer to determine the location
- + faster is more expensive  $\rightarrow$  memory technology: SRAM vs. DRAM vs. Disk
- higher bandwidth is more expensive  $\rightarrow$  need more banks, more ports, higher frequency, or faster technology

slower access

SRAM: static random access memory

faster access

- $\cdot\,$  slower access
- higher density (1 transistor
  + 1 capacitor per cell)

- faster access
- lower density (6 transistors per cell)

- $\cdot\,$  slower access
- higher density (1 transistor
  + 1 capacitor per cell)
- lower cost

- faster access
- lower density (6 transistors per cell)
- higher cost

- $\cdot$  slower access
- higher density (1 transistor
  + 1 capacitor per cell)
- lower cost
- $\cdot$  charge loss over time  $\rightarrow$  requires refresh

- faster access
- lower density (6 transistors per cell)
- higher cost
- no need for refresh

• we can't have both large and fast with a single level of memory

- $\cdot$  we can't have both large and fast with a single level of memory
- have multiple levels of storage

- $\cdot$  we can't have both large and fast with a single level of memory
- have multiple levels of storage
- progressively bigger and slower as the levels are farther from the processor

- $\cdot$  we can't have both large and fast with a single level of memory
- have multiple levels of storage
- progressively bigger and slower as the levels are farther from the processor
- ensure most of the data the processor needs is kept in the fast(er) level(s)







• CPU registers



- CPU registers
- different levels of the CPU cache



- CPU registers
- different levels of the CPU cache
- main memory



- CPU registers
- different levels of the CPU cache
- main memory
- disk storage

• **temporal locality**: a program tends to reference the same memory location many times and all within a small window of time (e.g., loops)

- **temporal locality**: a program tends to reference the same memory location many times and all within a small window of time (e.g., loops)
- anticipation: recently accessed data will be accessed again soon

- **temporal locality**: a program tends to reference the same memory location many times and all within a small window of time (e.g., loops)
- anticipation: recently accessed data will be accessed again soon
- idea: store recently accessed data in automatically managed fast memory

• **spatial locality**: a program tends to reference a **cluster of memory locations at a time**, e.g., sequential instruction access, array traversal

- **spatial locality**: a program tends to reference a **cluster of memory locations at a time**, e.g., sequential instruction access, array traversal
- anticipation: nearby data will be accessed soon

- **spatial locality**: a program tends to reference a **cluster of memory locations at a time**, e.g., sequential instruction access, array traversal
- anticipation: nearby data will be accessed soon
- idea: store addresses adjacent to the recently accessed one in automatically managed fast memory
  - logically divide memory into equal size blocks (lines)
  - $\cdot$  fetch to cache the accessed block in its entirety

- manual: programmer manages data movement across levels
  - painful for substantial programs
  - only used in some embedded systems

- manual: programmer manages data movement across levels
  - painful for substantial programs
  - only used in some embedded systems
- **automatic**: hardware manages data movement across levels, transparently to the programmer

- manual: programmer manages data movement across levels
  - painful for substantial programs
  - only used in some embedded systems
- **automatic**: hardware manages data movement across levels, transparently to the programmer
  - the average programmer doesn't need to know about it, how big it is, or how it works to write a correct program

- manual: programmer manages data movement across levels
  - painful for substantial programs
  - only used in some embedded systems
- **automatic**: hardware manages data movement across levels, transparently to the programmer
  - the average programmer doesn't need to know about it, how big it is, or how it works to write a correct program
  - what about a fast program?

- manual: programmer manages data movement across levels
  - painful for substantial programs
  - only used in some embedded systems
- **automatic**: hardware manages data movement across levels, transparently to the programmer
  - the average programmer doesn't need to know about it, how big it is, or how it works to write a correct program
  - what about a fast program?
  - what about side channels?!

- block/line: unit of storage in the cache  $\rightarrow$  memory is logically divided into cache blocks that map to locations in the cache
- $\cdot$  when data is referenced
  - hit: if in cache, use cached data instead of accessing memory
  - miss: if not in cache, bring block into cache
    - ightarrow maybe have to kick something else out to do it

- placement: where and how to place/find a block in cache?
- replacement: what data to remove to make room in cache?
- granularity of management: size of blocks? uniform?
- write policy: what do we do about writes?
- instructions/data: do we treat them separately?

| Address | Tag | Index | Offset |  |
|---------|-----|-------|--------|--|
|---------|-----|-------|--------|--|

Cache



Cacilie

Data loaded in a specific set depending on its address



Data loaded in a specific set depending on its address Several ways per set



Data loaded in a specific set depending on its address

Several ways per set

Cache line loaded in a specific way depending on the replacement policy

Small exercise: compute the set index of the address (110010101111110)b for a cache with the following design:

- 8B cache lines
- 16 cache sets
- 2 ways

Small exercise: compute the set index of the address (110010101111110)b for a cache with the following design:

- $\cdot \,$  8B cache lines  $\rightarrow$  3 bits
- $\cdot$  16 cache sets  $\rightarrow$  4 bits
- 2 ways

Small exercise: compute the set index of the address (110010101111110)b for a cache with the following design:

- $\cdot\,$  8B cache lines  $\rightarrow$  3 bits
- $\cdot$  16 cache sets  $\rightarrow$  4 bits
- 2 ways

Set index: (1111)b  $\rightarrow$  15

Small exercise: compute the set index of the address (110010101111110)b for a cache with the following design:

- $\cdot\,$  8B cache lines  $\rightarrow$  3 bits
- $\cdot$  16 cache sets  $\rightarrow$  4 bits
- 2 ways

Set index: (1111)b  $\rightarrow$  15

Bonus question: what is the size of the cache?

Small exercise: compute the set index of the address (110010101111110)b for a cache with the following design:

- $\cdot\,$  8B cache lines  $\rightarrow$  3 bits
- $\cdot$  16 cache sets  $\rightarrow$  4 bits
- 2 ways

Set index: (1111)b  $\rightarrow$  15

**Bonus question**: what is the size of the cache?  $8 \times 16 \times 2 = 256B$ 

- program knows about virtual addresses, machine knows about physical addresses
- MMU does the translation between virtual to physical address
- $\cdot\,$  addresses are used for the index and the tag  $\rightarrow$  is virtual or physical used?

- program knows about virtual addresses, machine knows about physical addresses
- MMU does the translation between virtual to physical address
- $\cdot\,$  addresses are used for the index and the tag  $\rightarrow$  is virtual or physical used?
- trade-off depending on the level!
- 4 possibilities: VIVT, VIPT, PIPT (PIVT)

Virtually-indexed, virtually-tagged (VIVT)

Virtually-indexed, virtually-tagged (VIVT)

- $\cdot$   $\checkmark$  fast: no need to translate addresses
- X aliasing issues: same virtual address maps to several different physical addresses
  - $\rightarrow$  tag is not unique  $\rightarrow$  flushing the cache on context switches

Virtually-indexed, physically-tagged (VIPT)

Virtually-indexed, physically-tagged (VIPT)

- $\cdot$  needs TLB translation for the tag, but cache set can be looked up in parallel
- 🗸 still quite fast
- $\cdot$   $\checkmark$  avoiding aliasing if set index bits come from the page offset

Virtually-indexed, physically-tagged (VIPT)

- $\cdot$  needs TLB translation for the tag, but cache set can be looked up in parallel
- 🗸 still quite fast
- ✓ avoiding aliasing if set index bits come from the page offset
   → X limits the size of VIPT caches (page size × # of sets)
- used e.g., in L1 on Intel

 $\rightarrow$  4KB pages and 64B lines  $\rightarrow$  cannot have more than  $2^6=64$  sets

Physically-indexed, physically-tagged (PIPT)

Physically-indexed, physically-tagged (PIPT)

- $\cdot\,$  needs TLB translation for the tag and the set index
- X slower because of address translation
- $\cdot$   $\checkmark$  no aliasing issues
- $\cdot$   $\checkmark$  no limit for the number of sets  $\rightarrow$  good for bigger levels
- used e.g., in L2 and L3 on Intel

#### Physically-indexed, virtually-tagged (PIVT)

Physically-indexed, virtually-tagged (PIVT)

★ the worst of both worlds
 → rarely used in practice

Which block in the set to replace on a cache miss?

Which block in the set to replace on a cache miss?

- FIFO
- least recently used
- least frequently used
- $\cdot$  random
- hybrid
- ...



n accesses for an n-way cache with a LRU replacement policy



• LRU replacement policy: oldest entry replaced first



- LRU replacement policy: oldest entry replaced first
- timestamps for every cache line



- LRU replacement policy: oldest entry replaced first
- timestamps for every cache line
- access updates timestamp



- LRU replacement policy: oldest entry replaced first
- timestamps for every cache line
- access updates timestamp
- "perfect" LRU is complex to implement, it usually is pseudo-LRU





- LRU replacement policy: oldest entry replaced first
- timestamps for every cache line
- access updates timestamp
- $\cdot$  "perfect" LRU is complex to implement, it usually is pseudo-LRU



- LRU replacement policy: oldest entry replaced first
- timestamps for every cache line
- access updates timestamp
- $\cdot$  "perfect" LRU is complex to implement, it usually is pseudo-LRU



- LRU replacement policy: oldest entry replaced first
- timestamps for every cache line
- access updates timestamp
- $\cdot$  "perfect" LRU is complex to implement, it usually is pseudo-LRU



- LRU replacement policy: oldest entry replaced first
- timestamps for every cache line
- access updates timestamp
- "perfect" LRU is complex to implement, it usually is pseudo-LRU



- LRU replacement policy: oldest entry replaced first
- timestamps for every cache line
- access updates timestamp
- "perfect" LRU is complex to implement, it usually is pseudo-LRU



- LRU replacement policy: oldest entry replaced first
- timestamps for every cache line
- access updates timestamp
- $\cdot$  "perfect" LRU is complex to implement, it usually is pseudo-LRU

• LRU vs. random, which one is better?

- LRU vs. random, which one is better?
- example: 4-way cache, cyclic references to A, B, C, D, E

- LRU vs. random, which one is better?
- example: 4-way cache, cyclic references to A, B, C, D, E
- $\rightarrow~$  0% hit rate with LRU policy
  - set thrashing: when the "program working set" in a set is larger than set associativity
- ightarrow random replacement policy is better when thrashing occurs

- LRU vs. random, which one is better?
- example: 4-way cache, cyclic references to A, B, C, D, E
- $\rightarrow~$  0% hit rate with LRU policy
  - set thrashing: when the "program working set" in a set is larger than set associativity
- ightarrow random replacement policy is better when thrashing occurs
  - in practice, depends on workload, similar average hit rate for LRU and random



- no LRU replacement
- older entries are not necessary replaced
- switch from LRU to non-LRU from Sandy Bridge to Ivy Bridge



- no LRU replacement
- older entries are not necessary replaced
- switch from LRU to non-LRU from Sandy Bridge to Ivy Bridge



- no LRU replacement
- older entries are not necessary replaced
- switch from LRU to non-LRU from Sandy Bridge to Ivy Bridge



- no LRU replacement
- older entries are not necessary replaced
- switch from LRU to non-LRU from Sandy Bridge to Ivy Bridge



- no LRU replacement
- older entries are not necessary replaced
- switch from LRU to non-LRU from Sandy Bridge to Ivy Bridge



- no LRU replacement
- older entries are not necessary replaced
- switch from LRU to non-LRU from Sandy Bridge to Ivy Bridge



- no LRU replacement
- older entries are not necessary replaced
- switch from LRU to non-LRU from Sandy Bridge to Ivy Bridge



- no LRU replacement
- older entries are not necessary replaced
- switch from LRU to non-LRU from Sandy Bridge to Ivy Bridge



- no LRU replacement
- older entries are not necessary replaced
- switch from LRU to non-LRU from Sandy Bridge to Ivy Bridge



set-associative



- set-associative
- L1 and L2 are private



- set-associative
- L1 and L2 are private
- last-level cache



- set-associative
- L1 and L2 are private
- last-level cache
  - $\cdot$  divided in slices



- set-associative
- L1 and L2 are private
- last-level cache
  - divided in slices
  - shared across cores



- set-associative
- L1 and L2 are private
- last-level cache
  - $\cdot$  divided in slices
  - shared across cores
  - inclusive

User programs can optimize cache usage:

- prefetch: suggest CPU to load data into cache
- clflush: throw out data from from all caches

based on virtual addresses

### On my Intel Core i5-5200U (2 cores, 4 threads)

|            | L1d   | L1i   | L2     | L3   |
|------------|-------|-------|--------|------|
| level size | 32 KB | 32 KB | 256 KB | 3 MB |
| line size  | 64 B  | 64 B  | 64 B   | 64 B |
| # ways     | 8     | 8     | 8      | 12   |
| # sets     | 64    | 64    | 512    | 4096 |
| inclusive? | no    | no    | no     | yes  |

| event                | latency   |
|----------------------|-----------|
| 1 CPU cycle          | 0.3 ns    |
| level 1 cache access | 0.9 ns    |
| level 2 cache access | 2.8 ns    |
| level 3 cache access | 12.9 ns   |
| main memory access   | 120 ns    |
| solid-state disk I/O | 50-150 us |
| rotational disk I/O  | 1-10 ms   |

| event                | latency   | scaled latency |
|----------------------|-----------|----------------|
| 1 CPU cycle          | 0.3 ns    | 1 s            |
| level 1 cache access | 0.9 ns    | 3 s            |
| level 2 cache access | 2.8 ns    | 9 s            |
| level 3 cache access | 12.9 ns   | 43 s           |
| main memory access   | 120 ns    | 6 min          |
| solid-state disk I/O | 50-150 us | 2-6 days       |
| rotational disk I/O  | 1-10 ms   | 1-12 months    |





### cache hits cache misses



Cache attacks techniques

 $\cdot\,$  cache attacks  $\rightarrow$  exploit timing differences of memory accesses

- $\cdot\,$  cache attacks  $\rightarrow$  exploit timing differences of memory accesses
- attacker monitors which lines are accessed, not the content

- $\cdot\,$  cache attacks  $\rightarrow$  exploit timing differences of memory accesses
- attacker monitors which lines are accessed, not the content
- covert channel: two processes communicating with each other
  - not allowed to do so, e.g., across VMs

- $\cdot\,$  cache attacks  $\rightarrow$  exploit timing differences of memory accesses
- $\cdot\,$  attacker monitors which lines are accessed, not the content
- covert channel: two processes communicating with each other
  - not allowed to do so, e.g., across VMs
- side-channel attack: one malicious process spies on benign processes
  - e.g., steals crypto keys, spies on keystrokes

How every timing attack works:

- learn timing of different corner cases
- later, we recognize these corner cases by timing only

How every timing attack works:

- learn timing of different corner cases
- later, we recognize these corner cases by timing only
- here, corner cases: hits and misses

- 1. build two cases: cache hits and cache misses
- 2. time each case many times (get rid of noise)

- 1. build two cases: cache hits and cache misses
- 2. time each case many times (get rid of noise)
- 3. we have a histogram!

- 1. build two cases: cache hits and cache misses
- 2. time each case many times (get rid of noise)
- 3. we have a histogram!
- 4. find a threshold to distinguish the two cases

Loop:

- 1. measure time
- 2. access variable (always cache hit)
- 3. measure time
- 4. update histogram with delta

Loop:

- 1. flush variable (clflush instruction)
- 2. measure time
- 3. access variable (always cache miss)
- 4. measure time
- 5. update histogram with delta

### cache hits cache misses



### Finding the threshold

- $\cdot\,$  as high as possible  $\rightarrow$  most cache hits are below
- no cache miss below



# How to measure time accurately? (1/3)

- very short timings
- rdtsc instruction: cycle-accurate timestamps

# How to measure time accurately? (1/3)

- very short timings
- rdtsc instruction: cycle-accurate timestamps

```
[...]
rdtsc
function()
rdtsc
[...]
```

• do you measure what you think you measure?

- do you measure what you think you measure?
- out-of-order execution

- do you measure what you think you measure?
- $\cdot \, \, \text{out-of-order}$  execution  $\rightarrow$  what is really executed

| rdtsc                 | rdtsc      | rdtsc      |
|-----------------------|------------|------------|
| <pre>function()</pre> | []         | rdtsc      |
| []                    | rdtsc      | function() |
| rdtsc                 | function() | []         |

• use pseudo-serializing instruction rdtscp (recent CPUs)

- use pseudo-serializing instruction rdtscp (recent CPUs)
- and/or use serializing instructions like cpuid

- use pseudo-serializing instruction rdtscp (recent CPUs)
- and/or use serializing instructions like cpuid
- and/or use fences like mfence

- use pseudo-serializing instruction rdtscp (recent CPUs)
- and/or use serializing instructions like cpuid
- and/or use fences like mfence

Intel, How to Benchmark Code Execution Times on Intel IA-32 and IA-64 Instruction Set Architectures White Paper, December 2010.

### Cache attacks techniques

- two (main) techniques
  - 1. Flush+Reload (Gullasch et al., Osvik et al., Yarom et al.)
  - 2. Prime+Probe (Percival, Osvik et al., Liu et al.)
- exploitable on x86 and ARM
- $\cdot$  used for both covert channels and side-channel attacks

Yuval Yarom et al. "Flush+Reload: a High Resolution, Low Noise, L3 Cache Side-Channel Attack". In: USENIX Security Symposium. 2014.

David Gullasch et al. "Cache Games – Bringing Access-Based Cache Attacks on AES to Practice". In: S&P'11. 2011.

Dag Arne Osvik et al. "Cache Attacks and Countermeasures: the Case of AES". In: CT-RSA 2006. 2006.

Colin Percival. "Cache missing for fun and profit". In: Proceedings of BSDCan. 2005.

Fangfei Liu et al. "Last-Level Cache Side-Channel Attacks are Practical". In: S&P'15. 2015.



Step 1: Attacker maps shared library (shared memory, in cache)



Step 1: Attacker maps shared library (shared memory, in cache)



Step 1: Attacker maps shared library (shared memory, in cache)

Step 2: Attacker flushes the shared cache line



Step 1: Attacker maps shared library (shared memory, in cache)

Step 2: Attacker flushes the shared cache line

Step 3: Victim loads the data



Step 1: Attacker maps shared library (shared memory, in cache)

Step 2: Attacker flushes the shared cache line

Step 3: Victim loads the data

Step 4: Attacker reloads the data

Pros

#### fine granularity: 1 line

### Cons

#### restrictive

 needs clflush instruction (not available e.g., on ARM-v7)

2. needs shared memory

#### Shared library $\rightarrow$ shared in physical memory







































What if there is **no shared memory**?

# What if there is **no shared memory**?

There is no memory deduplication, e.g., on Amazon EC2











- inclusive LLC: superset of L1 and L2
- data evicted from the LLC is also evicted from L1 and L2



- inclusive LLC: superset of L1 and L2
- data evicted from the LLC is also evicted from L1 and L2
- a core can evict lines in the private L1 of another core

| <br> |      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|------|------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|      |      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|      |      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|      |      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|      |      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| <br> | <br> |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|      |      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|      |      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|      |      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|      |      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| <br> | <br> |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|      |      |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
|      |      | Image: second |

Victim address space

Cache

Attacker address space



Step 1: Attacker primes, *i.e.*, fills, the cache (no shared memory)



Step 1: Attacker primes, *i.e.*, fills, the cache (no shared memory)

Step 2: Victim evicts cache lines while running



Step 1: Attacker primes, *i.e.*, fills, the cache (no shared memory)

Step 2: Victim evicts cache lines while running



Step 1: Attacker primes, *i.e.*, fills, the cache (no shared memory)

**Step 2:** Victim evicts cache lines while running

Step 3: Attacker probes data to determine if set has been accessed



Step 1: Attacker primes, *i.e.*, fills, the cache (no shared memory)

**Step 2:** Victim evicts cache lines while running

Step 3: Attacker probes data to determine if set has been accessed

#### Pros

Cons

less restrictive

- 1. no need for **clflush**
- 2. no need for shared memory
- $\rightarrow$  possible from JavaScript

coarser granularity: 1 set

We need to evict caches lines without **clflush** or shared memory:

- 1. which addresses do we access to have congruent cache lines?
- 2. without any privilege?
- 3. and in which order do we access them?

We need:

- 1. an eviction set: addresses in the same set, in the same slice (issue #1 and #2)
- 2. an eviction strategy (issue #3)

• we need addresses that have the same set index: how do we do that?

• we need addresses that have the same set index: how do we do that?



- we need addresses that have the same set index: how do we do that?
- $\cdot$  we want to target the L3 for cross-core attacks
- L3 for a 2-core CPU: 4096 sets, 64B-lines, 12 or 16 ways
- how many bits for the set index?



- we need addresses that have the same set index: how do we do that?
- $\cdot$  we want to target the L3 for cross-core attacks
- L3 for a 2-core CPU: 4096 sets, 64B-lines, 12 or 16 ways
- how many bits for the set index?
- hint hint:  $4096 = 2^{12}$



• L3 is physically indexed

 $\rightarrow$  we need to choose addresses with fixed physical address bits

• issue #1: address translation from virtual to physical is privileged

• L3 is physically indexed

 $\rightarrow$  we need to choose addresses with fixed physical address bits

- issue #1: address translation from virtual to physical is privileged
- reminder: page offset stays the same from virtual to physical address
- $\cdot\,$  typical page size: 4KB  $\rightarrow$  12 bits of page offset
- issue #2: set index bits are not included in the 12 LSB of the address

- $\cdot\,$  we also have 2MB "huge pages"  $\rightarrow$  21 bits of page offset
- $\cdot$  set index bits are included in the 21 LSB of the address



## We have one more problem

• L3 is divided in slices, as many slices as cores

## We have one more problem

- L3 is divided in slices, as many slices as cores
- $\cdot$  I lied to you

## We have one more problem

- L3 is divided in slices, as many slices as cores
- $\cdot$  I lied to you
- $\cdot\,$  we always have 2048 sets per slice  $\rightarrow$  actually 11 bits for the set index
- $\cdot$  but we need to know the slice number
- hash function takes all bits as input, including physical page number bits  $\rightarrow$  outside the known bits from page offset



- $\cdot\,$  last-level cache  $\rightarrow$  as many slices as cores
- undocumented hash function that maps a physical address to a slice
- designed for performance



Undocumented function  $\rightarrow$  impossible to target the same set in the same slice?



Clémentine Maurice et al. "Reverse Engineering Intel Complex Addressing Using Performance Counters". In: RAID'15. 2015.

Undocumented function  $\rightarrow$  impossible to target the same set in the same slice?



#### We reverse-engineered this function!

### 3 functions, depending on the number of cores

|         |    | Address bit |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |          |   |          |          |          |
|---------|----|-------------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|---|----------|----------|----------|
|         |    | 3           | 3        | 3        | 3        | 3        | 3        | 3        | 3        | 2        | 2        | 2        | 2        | 2        | 2        | 2        | 2        | 2        | 2        | 1        | 1        | 1        | 1        | 1        | 1        | 1        | 1        | 1        | 1        | 0 | 0        | 0        | 0        |
|         |    | 7           | 6        | 5        | 4        | 3        | 2        | 1        | 0        | 9        | 8        | 7        | 6        | 5        | 4        | 3        | 2        | 1        | 0        | 9        | 8        | 7        | 6        | 5        | 4        | 3        | 2        | 1        | 0        | 9 | 8        | 7        | 6        |
| 2 cores | 00 |             |          |          |          |          | $\oplus$ |          | $\oplus$ |          | $\oplus$ | $\oplus$ | $\oplus$ | $\oplus$ | $\oplus$ |          | $\oplus$ |          | $\oplus$ |          | $\oplus$ | $\oplus$ | $\oplus$ |          | $\oplus$ |          | $\oplus$ |          | $\oplus$ |   |          |          | $\oplus$ |
| 4 cores | 00 |             |          |          |          | $\oplus$ | $\oplus$ |          | $\oplus$ |          | $\oplus$ | $\oplus$ | $\oplus$ | $\oplus$ | $\oplus$ |          | $\oplus$ |          | $\oplus$ |          | $\oplus$ | $\oplus$ | $\oplus$ |          | $\oplus$ |          | $\oplus$ |          | $\oplus$ |   |          |          | $\oplus$ |
| 4 cores | 01 |             |          |          | $\oplus$ | $\oplus$ |          | $\oplus$ |          | $\oplus$ | $\oplus$ |          | $\oplus$ |          | $\oplus$ | $\oplus$ | $\oplus$ | $\oplus$ | $\oplus$ | $\oplus$ |          |   |          | $\oplus$ |          |
|         | 00 |             | $\oplus$ | $\oplus$ |          | $\oplus$ | $\oplus$ |          | $\oplus$ |          | $\oplus$ | $\oplus$ | $\oplus$ | $\oplus$ | $\oplus$ |          | $\oplus$ |          | $\oplus$ |          | $\oplus$ | $\oplus$ | $\oplus$ |          | $\oplus$ |          | $\oplus$ |          | $\oplus$ |   |          |          | $\oplus$ |
| 8 cores | 01 | $\oplus$    |          | $\oplus$ | $\oplus$ | $\oplus$ |          | $\oplus$ |          | $\oplus$ | $\oplus$ |          | $\oplus$ |          | $\oplus$ | $\oplus$ | $\oplus$ | $\oplus$ | $\oplus$ | $\oplus$ |          |   |          | $\oplus$ |          |
|         | 02 | $\oplus$    | $\oplus$ | $\oplus$ | $\oplus$ |          |          | $\oplus$ |          |          | $\oplus$ |          |          | $\oplus$ | $\oplus$ |          |          |   | $\oplus$ |          |          |

Function valid for Sandy Bridge, Ivy Bridge, Haswell, Broadwell

- 1. construct S set of addresses with the same set index
- 2. access reference address  $x \in S$  (to load it in cache)

- 1. construct S set of addresses with the same set index
- 2. access reference address  $x \in S$  (to load it in cache)
- 3. iteratively access all elements of S

- 1. construct S set of addresses with the same set index
- 2. access reference address  $x \in S$  (to load it in cache)
- 3. iteratively access all elements of S
- 4. measure  $t_1$ , the time it takes to access  $x \rightarrow$  it should be evicted

- 1. construct S set of addresses with the same set index
- 2. access reference address  $x \in S$  (to load it in cache)
- 3. iteratively access all elements of S
- 4. measure  $t_1$ , the time it takes to access  $x \rightarrow$  it should be evicted
- 5. select a random address s from S and remove it

- 1. construct S set of addresses with the same set index
- 2. access reference address  $x \in S$  (to load it in cache)
- 3. iteratively access all elements of S
- 4. measure  $t_1$ , the time it takes to access  $x \rightarrow$  it should be evicted
- 5. select a random address s from S and remove it
- 6. iteratively access all elements of  $S \setminus s$

- 1. construct S set of addresses with the same set index
- 2. access reference address  $x \in S$  (to load it in cache)
- 3. iteratively access all elements of S
- 4. measure  $t_1$ , the time it takes to access  $x \rightarrow$  it should be evicted
- 5. select a random address s from S and remove it
- 6. iteratively access all elements of  $S \setminus s$
- 7. measure  $t_2$ , the time it takes to access  $x \rightarrow$  is it evicted?

- 1. construct S set of addresses with the same set index
- 2. access reference address  $x \in S$  (to load it in cache)
- 3. iteratively access all elements of S
- 4. measure  $t_1$ , the time it takes to access  $x \rightarrow$  it should be evicted
- 5. select a random address s from S and remove it
- 6. iteratively access all elements of  $S \setminus s$
- 7. measure  $t_2$ , the time it takes to access  $x \rightarrow$  is it evicted?
  - + if not  $\rightarrow$  s is part of the same set as  $x \rightarrow$  place it back into S

- 1. construct S set of addresses with the same set index
- 2. access reference address  $x \in S$  (to load it in cache)
- 3. iteratively access all elements of S
- 4. measure  $t_1$ , the time it takes to access  $x \rightarrow$  it should be evicted
- 5. select a random address s from S and remove it
- 6. iteratively access all elements of  $S \setminus s$
- 7. measure  $t_2$ , the time it takes to access  $x \rightarrow$  is it evicted?
  - + if not  $\rightarrow$  s is part of the same set as  $x \rightarrow$  place it back into S
  - if it was evicted  $\rightarrow$  s is not part of the same set as  $x \rightarrow$  discard s

 $\cdot$  if the function is known, we can speed up the process



• for a CPU with c cores: 16/c addresses in the same set and slice per 2MB page

### Prime+Probe: Eviction set

 $\cdot$  if the function is known, we can speed up the process



- $\cdot$  for a CPU with *c* cores: 16/*c* addresses in the same set and slice per 2MB page
- apply same algorithm with groups of addresses instead of single addresses

# We now have an eviction set! What about the eviction strategy?

- attacker fills a set with *n* addresses for a *n*-way cache
- + if the replacement policy is LRU ightarrow access addresses from eviction set 1 by 1
- if the replacement policy is not LRU, eviction rate < 100% $\rightarrow 75\%$  on Haswell



- attacker fills a set with *n* addresses for a *n*-way cache
- + if the replacement policy is LRU ightarrow access addresses from eviction set 1 by 1
- if the replacement policy is not LRU, eviction rate < 100%  $\rightarrow$  75% on Haswell



- attacker fills a set with *n* addresses for a *n*-way cache
- + if the replacement policy is LRU ightarrow access addresses from eviction set 1 by 1
- if the replacement policy is not LRU, eviction rate < 100%  $\rightarrow$  75% on Haswell



- attacker fills a set with *n* addresses for a *n*-way cache
- + if the replacement policy is LRU ightarrow access addresses from eviction set 1 by 1
- if the replacement policy is not LRU, eviction rate < 100%  $\rightarrow$  75% on Haswell



- attacker fills a set with *n* addresses for a *n*-way cache
- + if the replacement policy is LRU ightarrow access addresses from eviction set 1 by 1
- if the replacement policy is not LRU, eviction rate < 100%  $\rightarrow$  75% on Haswell



- attacker fills a set with *n* addresses for a *n*-way cache
- + if the replacement policy is LRU ightarrow access addresses from eviction set 1 by 1
- if the replacement policy is not LRU, eviction rate < 100%  $\rightarrow$  75% on Haswell



- attacker fills a set with *n* addresses for a *n*-way cache
- + if the replacement policy is LRU ightarrow access addresses from eviction set 1 by 1
- if the replacement policy is not LRU, eviction rate < 100%  $\rightarrow$  75% on Haswell



- attacker fills a set with *n* addresses for a *n*-way cache
- + if the replacement policy is LRU ightarrow access addresses from eviction set 1 by 1
- if the replacement policy is not LRU, eviction rate <100%  $\rightarrow75\%$  on Haswell



- attacker fills a set with *n* addresses for a *n*-way cache
- + if the replacement policy is LRU ightarrow access addresses from eviction set 1 by 1
- if the replacement policy is not LRU, eviction rate <100%  $\rightarrow75\%$  on Haswell





with  $a_1 \dots a_9$  in the same cache set

 $\rightarrow$  fast and effective on Haswell: eviction rate >99.97%

In practice, for Prime+Probe on recent processors we need:

- an eviction set, *i.e.*, addresses in the same slice and with the same set index  $\rightarrow$  depends on the addressing
- an eviction strategy, *i.e.*, the order with which we access the eviction set  $\rightarrow$  depends on the replacement policy

### **Uncore IPC Features**

#### AFP – Adaptive Fill Policy

Cache heuristics to identify and segregate streaming applications

#### QLRU – Quad-Age LRU algorithm

- Allows fine-grain "age assignment" on cache allocation
- E.g.: prefetched requests are allocated at "middle age"

#### DPT – Dynamic Prefetch Throttling

- Real-time memory bandwidth monitor
- Directs core prefetchers to reduce prefetch aggressiveness during high memory load scenarios

#### Channel Hashing -- DRAM channel selection mechanism

- Allows channel selection to be made based on multiple address bits
- Historically, it had been "A[6]"
- Allows more even distribution of memory accesses across channels



Conclusion

To perform a side-channel attack on some software you need both:

- $\cdot$  shared and vulnerable hardware
  - no side channel if **every** memory access takes the same time
  - $\cdot$  or if you cannot share the hardware component
- $\cdot$  a vulnerable implementation
  - + vulnerable implementation  $\neq$  vulnerable algorithm
  - $\cdot\,$  we can attack specific implementations of AES and RSA
  - $\cdot\,$  does not mean that AES and RSA are broken
  - $\rightarrow$  not all implementations are created equal
- $\rightarrow$  hardware will most likely stay vulnerable
- $\rightarrow\,$  patch implementations when you can

Constant time is not enough...

Constant time is not enough...

Because an attacker can modify the internal state of the micro-architecture

## Questions?

## Step-by-step attack

- we need:
  - a machine running on Linux (not virtualized)
  - an Intel CPU

- we need:
  - a machine running on Linux (not virtualized)
  - an Intel CPU
- I will demonstrate the steps on my machine but everything is ready so that you can try on yours during this session
- $\cdot$  find a lab partner if you don't have the right setup

• clone the repository:

git clone https://github.com/clementine-m/cache\_template\_attacks.git

- $\cdot$  three folders
  - 1. calibration
  - 2. profiling
  - 3. exploitation

• clone the repository:

git clone https://github.com/clementine-m/cache\_template\_attacks.git

- $\cdot$  three folders
  - 1. calibration
  - 2. profiling
  - 3. exploitation
- note: if you insist on using Windows, you can find some tools in the original git repository https://github.com/IAIK/cache\_template\_attacks, but | don't provide any Windows assistance :)

## #1. Calibration

 $\rightarrow$  Learn timing of different corner cases

cd calibration
make
./calibration

- 1. build two cases: cache hits and cache misses
- 2. time each case many times (get rid of noise)

- 1. build two cases: cache hits and cache misses
- 2. time each case many times (get rid of noise)
- 3. we have a histogram!

- 1. build two cases: cache hits and cache misses
- 2. time each case many times (get rid of noise)
- 3. we have a histogram!
- 4. find a threshold to distinguish the two cases

Loop:

- 1. measure time
- 2. access variable (always cache hit)
- 3. measure time
- 4. update histogram with delta

Loop:

- 1. flush variable (clflush instruction)
- 2. measure time
- 3. access variable (always cache miss)
- 4. measure time
- 5. update histogram with delta

#### cache hits cache misses



- $\cdot$  as high as possible
- most cache hits are below
- no cache miss below



**Cache hits cache misses** 

## #2. Profiling

Open gedit

(Very) ugly one-liner, from the README of the repository

\$ cat /proc/`ps -A | grep gedit | grep -oE "^[0-9]+"`/maps |
grep r-x | grep libgedit

Open gedit

(Very) ugly one-liner, from the README of the repository

\$ cat /proc/`ps -A | grep gedit | grep -oE "^[0-9]+"`/maps |
grep r-x | grep libgedit

If you cannot copy paste ;)

#### Resulting line (memory range, access rights, offset, -, -, file name)

7f6e681ea000-7f6e682c3000 r-xp 00000000 fd:01 6423718 /usr/lib/x86\_64-linux-gnu/gedit/libgedit.so

# \$ cd ../profiling

Change value of **#define MIN\_CACHE\_MISS\_CYCLES** to your threshold

## \$ cd ../profiling

Change value of **#define MIN\_CACHE\_MISS\_CYCLES** to your threshold

#### \$ make \$ sleep 3; ./profiling 200 7f6e681ea000-7f6e682c3000 r-xp 00000000 fd:01 6423718 /usr/lib/x86\_64-linux-gnu/gedit/libgedit.so

... And hold down key in the targeted program

# Profiling (a tiny bit faster)

You are probably not seeing a lot of cache hits

We are searching for hits from offset 0 of the library  $\rightarrow$  nothing handles keystrokes there

We are searching for hits from offset 0 of the library

ightarrow nothing handles keystrokes there

Normally, run the template attack on the whole library but takes a while

- We are searching for hits from offset 0 of the library
- ightarrow nothing handles keystrokes there

Normally, run the template attack on the whole library but takes a while Let's start from a different offset, skipping all non executable parts

\$ sleep 3; ./profiling 200 7f6e681ea000-7f6e682c3000 r-xp
20000 fd:01 6423718 /usr/lib/x86\_64-linux-gnu/gedit/libgedit.so

- We are searching for hits from offset 0 of the library
- ightarrow nothing handles keystrokes there

Normally, run the template attack on the whole library but takes a while Let's start from a different offset, skipping all non executable parts

\$ sleep 3; ./profiling 200 7f6e681ea000-7f6e682c3000 r-xp
20000 fd:01 6423718 /usr/lib/x86\_64-linux-gnu/gedit/libgedit.so

Save offsets with many cache hits!

- We are searching for hits from offset 0 of the library
- ightarrow nothing handles keystrokes there

Normally, run the template attack on the whole library but takes a while Let's start from a different offset, skipping all non executable parts

\$ sleep 3; ./profiling 200 7f6e681ea000-7f6e682c3000 r-xp
20000 fd:01 6423718 /usr/lib/x86\_64-linux-gnu/gedit/libgedit.so

Save offsets with many cache hits!

Ideally, start the profiling without triggering any event to eliminate false positives

#### Output

| <pre>/usr/lib/x86_64-linux-gnu/gedit/libgedit.so,</pre> | 0x20e40, | 15 |
|---------------------------------------------------------|----------|----|
| <pre>/usr/lib/x86_64-linux-gnu/gedit/libgedit.so,</pre> | 0x20e80, | 27 |
| <pre>/usr/lib/x86_64-linux-gnu/gedit/libgedit.so,</pre> | 0x20ec0, | 7  |
| <pre>/usr/lib/x86_64-linux-gnu/gedit/libgedit.so,</pre> | 0x20f00, | 10 |
| <pre>/usr/lib/x86_64-linux-gnu/gedit/libgedit.so,</pre> | 0x20f40, | 16 |
| <pre>/usr/lib/x86_64-linux-gnu/gedit/libgedit.so,</pre> | 0x20f80, | 13 |
| <pre>/usr/lib/x86_64-linux-gnu/gedit/libgedit.so,</pre> | 0x20fc0, | 10 |
| <pre>/usr/lib/x86_64-linux-gnu/gedit/libgedit.so,</pre> | 0x21000, | 18 |
| <pre>/usr/lib/x86_64-linux-gnu/gedit/libgedit.so,</pre> | 0x21040, | 15 |
| <pre>/usr/lib/x86_64-linux-gnu/gedit/libgedit.so,</pre> | 0x21080, | 3  |
| <pre>/usr/lib/x86_64-linux-gnu/gedit/libgedit.so,</pre> | 0x210c0, | 1  |

# #3. Exploitation

Change value of **#define MIN\_CACHE\_MISS\_CYCLES** to your threshold

Change value of **#define MIN\_CACHE\_MISS\_CYCLES** to your threshold

\$ make

\$ ./spy <file> <offset>

Change value of **#define MIN\_CACHE\_MISS\_CYCLES** to your threshold

\$ make

\$ ./spy <file> <offset>

Let's try some offset:

Change value of **#define MIN\_CACHE\_MISS\_CYCLES** to your threshold

\$ make
\$ ./spy <file> <offset>

Let's try some offset: lots of cache hits for 0x20c40!!!

./spy /usr/lib/x86\_64-linux-gnu/gedit/libgedit.so 0x20c40

Change value of **#define MIN\_CACHE\_MISS\_CYCLES** to your threshold

\$ make
\$ ./spy <file> <offset>

Let's try some offset: lots of cache hits for 0x20c40!!!

./spy /usr/lib/x86\_64-linux-gnu/gedit/libgedit.so 0x20c40

A cache hit each time the cursor blinks.

Change value of **#define MIN\_CACHE\_MISS\_CYCLES** to your threshold

\$ make
\$ ./spy <file> <offset>

Let's try some offset: lots of cache hits for 0x20c40!!!

./spy /usr/lib/x86\_64-linux-gnu/gedit/libgedit.so 0x20c40

A cache hit each time the cursor blinks. Not what we want.

Change value of **#define MIN\_CACHE\_MISS\_CYCLES** to your threshold

\$ make
\$ ./spy <file> <offset>

Let's try some offset: lots of cache hits for 0x20c40!!!

./spy /usr/lib/x86\_64-linux-gnu/gedit/libgedit.so 0x20c40

A cache hit each time the cursor blinks. Not what we want. Let's try another one

Change value of **#define MIN\_CACHE\_MISS\_CYCLES** to your threshold

\$ make
\$ ./spy <file> <offset>

Let's try some offset: lots of cache hits for 0x20c40!!!

./spy /usr/lib/x86\_64-linux-gnu/gedit/libgedit.so 0x20c40

A cache hit each time the cursor blinks. Not what we want. Let's try another one

./spy /usr/lib/x86\_64-linux-gnu/gedit/libgedit.so 0x24440

We have more than one cache hit per keystroke, in a very short time.

8588659923476: Cache Hit (167 cycles) after a pause of 1381237 cycles 8588660655587: Cache Hit (158 cycles) after a pause of 182 cycles 8588662014696: Cache Hit (142 cycles) after a pause of 388 cycles 8592435140102: Cache Hit (139 cycles) after a pause of 1254280 cycles 8592435663328: Cache Hit (152 cycles) after a pause of 120 cycles 8592436855980: Cache Hit (161 cycles) after a pause of 322 cycles 8595876762459: Cache Hit (206 cycles) after a pause of 1133098 cycles 8595877338658: Cache Hit (155 cycles) after a pause of 139 cycles 8595877386776: Cache Hit (155 cvcles) after a pause of 9 cvcles 8595877512170: Cache Hit (112 cycles) after a pause of 30 cycles 8595877736734: Cache Hit (152 cycles) after a pause of 57 cycles 8595878749423: Cache Hit (145 cvcles) after a pause of 273 cvcles 8599529228024: Cache Hit (152 cycles) after a pause of 1217393 cycles 8599529824018: Cache Hit (173 cycles) after a pause of 145 cycles 8599530032220: Cache Hit (142 cycles) after a pause of 48 cycles 8599531215638: Cache Hit (145 cycles) after a pause of 334 cycles

• have a look at the flushandreload(void\* addr) function in spy.c

- have a look at the flushandreload(void\* addr) function in spy.c
- · if (kpause > 0)  $\rightarrow$  modify threshold and recompile

- have a look at the flushandreload(void\* addr) function in spy.c
- $\cdot$  if (kpause > 0)  $\rightarrow$  modify threshold and recompile
- no false positives with (kpause > 10000)

#### Going further



- we can now obtain precise timing for keystrokes
- you can also build a complete matrix for each keystroke to identify key groups

#### Going further



- we can now obtain precise timing for keystrokes
- you can also build a complete matrix for each keystroke to identify key groups
- you may want to automate event triggering :)

# Introduction to micro-architectural attacks

Clémentine Maurice, CNRS, IRISA April 30, 2019—Ben Gurion University, Israel Some slides are inspired by Onur Mutlu's lectures on Computer Architecture https://people.inf.ethz.ch/omutlu/lecture-videos.html

#### References i

David Gullasch, Endre Bangerter, and Stephan Krenn. "Cache Games – Bringing Access-Based Cache Attacks on AES to Practice". In: *S&P'11*. 2011.

Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B. Lee. "Last-Level Cache Side-Channel Attacks are Practical". In: *S&P'15.* 2015.

Clémentine Maurice, Nicolas Le Scouarnec, Christoph Neumann, Olivier Heen, and Aurélien Francillon. "Reverse Engineering Intel Complex Addressing Using Performance Counters". In: *RAID'15*. 2015.

Dag Arne Osvik, Adi Shamir, and Eran Tromer. "Cache Attacks and Countermeasures: the Case of AES". In: *CT-RSA 2006*. 2006.

Colin Percival. "Cache missing for fun and profit". In: Proceedings of BSDCan. 2005.

Yuval Yarom and Katrina Falkner. "Flush+Reload: a High Resolution, Low Noise, L3 Cache Side-Channel Attack". In: USENIX Security Symposium. 2014.