| Microarchitectural Data Sampling (MDS) mitigation
 | |
| =================================================
 | |
| 
 | |
| .. _mds:
 | |
| 
 | |
| Overview
 | |
| --------
 | |
| 
 | |
| Microarchitectural Data Sampling (MDS) is a family of side channel attacks
 | |
| on internal buffers in Intel CPUs. The variants are:
 | |
| 
 | |
|  - Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
 | |
|  - Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130)
 | |
|  - Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127)
 | |
|  - Microarchitectural Data Sampling Uncacheable Memory (MDSUM) (CVE-2019-11091)
 | |
| 
 | |
| MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a
 | |
| dependent load (store-to-load forwarding) as an optimization. The forward
 | |
| can also happen to a faulting or assisting load operation for a different
 | |
| memory address, which can be exploited under certain conditions. Store
 | |
| buffers are partitioned between Hyper-Threads so cross thread forwarding is
 | |
| not possible. But if a thread enters or exits a sleep state the store
 | |
| buffer is repartitioned which can expose data from one thread to the other.
 | |
| 
 | |
| MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage
 | |
| L1 miss situations and to hold data which is returned or sent in response
 | |
| to a memory or I/O operation. Fill buffers can forward data to a load
 | |
| operation and also write data to the cache. When the fill buffer is
 | |
| deallocated it can retain the stale data of the preceding operations which
 | |
| can then be forwarded to a faulting or assisting load operation, which can
 | |
| be exploited under certain conditions. Fill buffers are shared between
 | |
| Hyper-Threads so cross thread leakage is possible.
 | |
| 
 | |
| MLPDS leaks Load Port Data. Load ports are used to perform load operations
 | |
| from memory or I/O. The received data is then forwarded to the register
 | |
| file or a subsequent operation. In some implementations the Load Port can
 | |
| contain stale data from a previous operation which can be forwarded to
 | |
| faulting or assisting loads under certain conditions, which again can be
 | |
| exploited eventually. Load ports are shared between Hyper-Threads so cross
 | |
| thread leakage is possible.
 | |
| 
 | |
| MDSUM is a special case of MSBDS, MFBDS and MLPDS. An uncacheable load from
 | |
| memory that takes a fault or assist can leave data in a microarchitectural
 | |
| structure that may later be observed using one of the same methods used by
 | |
| MSBDS, MFBDS or MLPDS.
 | |
| 
 | |
| Exposure assumptions
 | |
| --------------------
 | |
| 
 | |
| It is assumed that attack code resides in user space or in a guest with one
 | |
| exception. The rationale behind this assumption is that the code construct
 | |
| needed for exploiting MDS requires:
 | |
| 
 | |
|  - to control the load to trigger a fault or assist
 | |
| 
 | |
|  - to have a disclosure gadget which exposes the speculatively accessed
 | |
|    data for consumption through a side channel.
 | |
| 
 | |
|  - to control the pointer through which the disclosure gadget exposes the
 | |
|    data
 | |
| 
 | |
| The existence of such a construct in the kernel cannot be excluded with
 | |
| 100% certainty, but the complexity involved makes it extremely unlikely.
 | |
| 
 | |
| There is one exception, which is untrusted BPF. The functionality of
 | |
| untrusted BPF is limited, but it needs to be thoroughly investigated
 | |
| whether it can be used to create such a construct.
 | |
| 
 | |
| 
 | |
| Mitigation strategy
 | |
| -------------------
 | |
| 
 | |
| All variants have the same mitigation strategy at least for the single CPU
 | |
| thread case (SMT off): Force the CPU to clear the affected buffers.
 | |
| 
 | |
| This is achieved by using the otherwise unused and obsolete VERW
 | |
| instruction in combination with a microcode update. The microcode clears
 | |
| the affected CPU buffers when the VERW instruction is executed.
 | |
| 
 | |
| For virtualization there are two ways to achieve CPU buffer clearing:
 | |
| either via the modified VERW instruction or via the L1D Flush command.
 | |
| The latter is issued when L1TF mitigation is enabled so the extra VERW
 | |
| can be avoided. If the CPU is not affected by L1TF then VERW needs to be
 | |
| issued.
 | |
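The choice between the two clearing methods on guest entry can be sketched as a small decision function. This is a minimal illustration only: the enum, the function name and the single boolean input are hypothetical and do not correspond to the identifiers used in the actual KVM code.

```c
#include <stdbool.h>

/* Hypothetical names, used only for this illustration. */
enum guest_entry_clear {
	CLEAR_VIA_L1D_FLUSH,	/* L1D Flush command already clears the buffers */
	CLEAR_VIA_VERW,		/* explicit VERW is required */
};

/*
 * If the L1TF mitigation already issues an L1D flush before entering the
 * guest, the extra VERW can be skipped. Otherwise VERW has to be issued.
 */
static enum guest_entry_clear pick_guest_entry_clear(bool l1d_flush_on_vmentry)
{
	return l1d_flush_on_vmentry ? CLEAR_VIA_L1D_FLUSH : CLEAR_VIA_VERW;
}
```

The sketch assumes that "L1TF mitigation is enabled" can be reduced to one flag, which is a simplification of the real configuration space.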
| 
 | |
| If the VERW instruction with the supplied segment selector argument is
 | |
| executed on a CPU without the microcode update there is no side effect
 | |
| other than a small number of pointlessly wasted CPU cycles.
 | |
| 
 | |
| This does not protect against cross Hyper-Thread attacks except for MSBDS
 | |
| which is only exploitable cross Hyper-Thread when one of the Hyper-Threads
 | |
| enters a C-state.
 | |
| 
 | |
| The kernel provides a function to invoke the buffer clearing:
 | |
| 
 | |
|     mds_clear_cpu_buffers()
 | |
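The following user space sketch shows the shape of such a buffer clearing primitive: VERW is executed with a memory operand that holds a segment selector. It is not the kernel implementation. The function name is hypothetical, the selector is simply read from DS instead of using a kernel segment constant, and on non-x86 builds the function does nothing.

```c
#include <stdint.h>

/*
 * Sketch only. Returns 1 if VERW was executed, 0 if the build target is
 * not x86. Whether any buffer is actually cleared depends entirely on the
 * microcode; without the update VERW merely checks the selector and
 * modifies ZF.
 */
static inline int clear_cpu_buffers_sketch(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
	uint16_t sel;

	/* Read the current data segment selector to use as the operand. */
	__asm__ volatile("mov %%ds, %0" : "=r"(sel));

	/* Memory operand form of VERW; the flags register is clobbered. */
	__asm__ volatile("verw %0" : : "m"(sel) : "cc");
	return 1;
#else
	return 0;
#endif
}
```

VERW is an unprivileged instruction, so the sketch can be run from user space, but it gives no indication of whether the buffers were cleared.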
| 
 | |
| The mitigation is invoked on kernel/userspace, hypervisor/guest and C-state
 | |
| (idle) transitions.
 | |
| 
 | |
| As a special quirk to address virtualization scenarios where the host has
 | |
| the microcode updated, but the hypervisor does not (yet) expose the
 | |
| MD_CLEAR CPUID bit to guests, the kernel issues the VERW instruction in the
 | |
| hope that it might actually clear the buffers. The state is reflected
 | |
| accordingly.
 | |
| 
 | |
| According to current knowledge additional mitigations inside the kernel
 | |
| itself are not required because the necessary gadgets to expose the leaked
 | |
| data cannot be controlled in a way which allows exploitation from malicious
 | |
| user space or VM guests.
 | |
| 
 | |
| Kernel internal mitigation modes
 | |
| --------------------------------
 | |
| 
 | |
|  ======= ============================================================
 | |
|  off      Mitigation is disabled. Either the CPU is not affected or
 | |
|           mds=off is supplied on the kernel command line
 | |
| 
 | |
|  full     Mitigation is enabled. CPU is affected and MD_CLEAR is
 | |
|           advertised in CPUID.
 | |
| 
 | |
|  vmwerv   Mitigation is enabled. CPU is affected and MD_CLEAR is not
 | |
|           advertised in CPUID. That is mainly for virtualization
 | |
|           scenarios where the host has the updated microcode but the
 | |
|           hypervisor does not expose MD_CLEAR in CPUID. It's a best
 | |
|           effort approach without guarantee.
 | |
|  ======= ============================================================
 | |
| 
 | |
| If the CPU is affected and mds=off is not supplied on the kernel command
 | |
| line then the kernel selects the appropriate mitigation mode depending on
 | |
| the availability of the MD_CLEAR CPUID bit.
 | |
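The mode selection described above can be summarized in a few lines of C. This is a sketch under stated assumptions: the enum and function names are hypothetical, and the three boolean inputs stand in for the CPU bug flag, the mds=off command line option and the MD_CLEAR CPUID bit.

```c
#include <stdbool.h>

/* Hypothetical names mirroring the three modes in the table. */
enum mds_mode_sketch {
	MODE_OFF,
	MODE_FULL,
	MODE_VMWERV,
};

static enum mds_mode_sketch select_mds_mode(bool cpu_affected,
					    bool cmdline_off,
					    bool has_md_clear)
{
	/* Unaffected CPU or mds=off: no mitigation. */
	if (!cpu_affected || cmdline_off)
		return MODE_OFF;

	/* Affected CPU: the mode depends on whether MD_CLEAR is advertised. */
	return has_md_clear ? MODE_FULL : MODE_VMWERV;
}
```

For example, an affected CPU inside a guest whose hypervisor hides MD_CLEAR yields MODE_VMWERV, the best effort mode.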
| 
 | |
| Mitigation points
 | |
| -----------------
 | |
| 
 | |
| 1. Return to user space
 | |
| ^^^^^^^^^^^^^^^^^^^^^^^
 | |
| 
 | |
|    When transitioning from kernel to user space the CPU buffers are flushed
 | |
|    on affected CPUs when the mitigation is not disabled on the kernel
 | |
|    command line. The mitigation is enabled through the static key
 | |
|    mds_user_clear.
 | |
| 
 | |
|    The mitigation is invoked in prepare_exit_to_usermode() which covers
 | |
|    all but one of the kernel to user space transitions.  The exception
 | |
|    is when we return from a Non Maskable Interrupt (NMI), which is
 | |
|    handled directly in do_nmi().
 | |
| 
 | |
|    (The reason that NMI is special is that prepare_exit_to_usermode() can
 | |
|     enable IRQs.  In NMI context, NMIs are blocked, and we don't want to
 | |
|     enable IRQs with NMIs blocked.)
 | |
| 
 | |
| 
 | |
| 2. C-State transition
 | |
| ^^^^^^^^^^^^^^^^^^^^^
 | |
| 
 | |
|    When a CPU goes idle and enters a C-State the CPU buffers need to be
 | |
|    cleared on affected CPUs when SMT is active. This addresses the
 | |
|    repartitioning of the store buffer when one of the Hyper-Threads enters
 | |
|    a C-State.
 | |
| 
 | |
|    When SMT is inactive, i.e. either the CPU does not support it or all
 | |
|    sibling threads are offline, CPU buffer clearing is not required.
 | |
| 
 | |
|    The idle clearing is enabled on CPUs which are only affected by MSBDS
 | |
|    and not by any other MDS variant. The other MDS variants cannot be
 | |
|    protected against cross Hyper-Thread attacks because the Fill Buffer and
 | |
|    the Load Ports are shared. So on CPUs affected by other variants, the
 | |
|    idle clearing would be a window dressing exercise and is therefore not
 | |
|    activated.
 | |
| 
 | |
|    The invocation is controlled by the static key mds_idle_clear which is
 | |
|    switched depending on the chosen mitigation mode and the SMT state of
 | |
|    the system.
 | |
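The conditions under which idle clearing is switched on can be condensed into a single predicate. This is an illustrative sketch, not the kernel code: the function name and its parameters are hypothetical, and the real implementation toggles the static key mds_idle_clear rather than returning a value.

```c
#include <stdbool.h>

/*
 * Idle clearing is only useful when:
 *  - the mitigation is not disabled,
 *  - the CPU is affected by MSBDS and by no other MDS variant,
 *  - SMT is active, so a sibling can inherit the store buffer entries.
 */
static bool want_idle_clear(bool mitigation_enabled, bool msbds_only,
			    bool smt_active)
{
	return mitigation_enabled && msbds_only && smt_active;
}
```

On a CPU affected by MFBDS or MLPDS as well, msbds_only is false and the predicate stays false regardless of the SMT state, matching the reasoning above.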
| 
 | |
|    The buffer clear is only invoked before entering the C-State to prevent
 | |
|    stale data from the idling CPU from spilling to the Hyper-Thread
 | |
|    sibling after the store buffer got repartitioned and all entries are
 | |
|    available to the non idle sibling.
 | |
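Why the clear has to happen before entering the C-State can be illustrated with a toy model. The code below is not a description of real hardware. It assumes a hypothetical eight-entry store buffer split in half between two threads, and models repartitioning as the remaining thread simply seeing all entries.

```c
#include <stdbool.h>
#include <string.h>

#define SB_ENTRIES 8	/* hypothetical size, chosen for the example */

struct toy_store_buffer {
	unsigned char data[SB_ENTRIES];
};

/* Thread 1 owns the upper half while both threads are active. */
static void thread1_store(struct toy_store_buffer *sb, unsigned char val)
{
	memset(&sb->data[SB_ENTRIES / 2], val, SB_ENTRIES / 2);
}

/* Stand-in for the buffer clear issued before going idle. */
static void thread1_clear(struct toy_store_buffer *sb)
{
	memset(&sb->data[SB_ENTRIES / 2], 0, SB_ENTRIES / 2);
}

/*
 * Thread 1 stores a marker value and then goes idle. After the modelled
 * repartitioning thread 0 can see every entry. Returns how many entries
 * still hold thread 1's marker.
 */
static int stale_entries_after_idle(bool clear_before_idle)
{
	struct toy_store_buffer sb = { { 0 } };
	const unsigned char secret = 0xA5;
	int i, leaked = 0;

	thread1_store(&sb, secret);
	if (clear_before_idle)
		thread1_clear(&sb);

	for (i = 0; i < SB_ENTRIES; i++)
		if (sb.data[i] == secret)
			leaked++;

	return leaked;
}
```

In this model, skipping the clear leaves all of thread 1's entries visible to the sibling, while clearing before idle leaves none.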
| 
 | |
|    When coming out of idle the store buffer is partitioned again so each
 | |
|    sibling has half of it available. The CPU coming back from idle could
 | |
|    then be speculatively exposed to contents of the sibling. The buffers are
 | |
|    flushed either on exit to user space or on VMENTER so malicious code
 | |
|    in user space or the guest cannot speculatively access them.
 | |
| 
 | |
|    The mitigation is hooked into all variants of halt()/mwait(), but does
 | |
|    not cover the legacy ACPI IO-Port mechanism because the ACPI idle driver
 | |
|    was superseded around 2010 by the intel_idle driver, which is
 | |
|    preferred on all affected CPUs which are expected to gain the MD_CLEAR
 | |
|    functionality in microcode. Aside from that, the IO-Port mechanism is a
 | |
|    legacy interface which is only used on older systems which are either
 | |
|    not affected or do not receive microcode updates anymore.
 | 
