.. _mmu_notifier:

When do you need to notify inside page table lock?
==================================================

When clearing a pte/pmd we are given a choice to notify the event under the
page table lock (the notify versions of \*_clear_flush call
mmu_notifier_invalidate_range()). But that notification is not necessary in all
cases.

For a secondary TLB (non-CPU TLB) such as an IOMMU TLB or a device TLB (when
the device uses something like ATS/PASID to get the IOMMU to walk the CPU page
table to access a process virtual address space), there are only two cases in
which you need to notify the secondary TLB while holding the page table lock
when clearing a pte/pmd:

  A) the page backing the address is freed before
     mmu_notifier_invalidate_range_end()
  B) a page table entry is updated to point to a new page (COW, write fault
     on zero page, __replace_page(), ...)

Case A is obvious: you do not want to risk the device writing to a page that
might now be used by some completely different task.

Case B is more subtle. For correctness it requires the following sequence to
happen:

  - take the page table lock
  - clear the page table entry and notify ([pmd/pte]p_huge_clear_flush_notify())
  - set the page table entry to point to the new page

If clearing the page table entry is not followed by a notification before the
new pte/pmd value is set, then you can break a memory model such as C11 or
C++11 for the device.

Consider the following scenario (the device uses a feature similar to
ATS/PASID):

Take two addresses addrA and addrB such that \|addrA - addrB\| >= PAGE_SIZE. We
assume they are write protected for COW (the other cases of B apply too).

::

 [Time N] --------------------------------------------------------------------
 CPU-thread-0  {try to write to addrA}
 CPU-thread-1  {try to write to addrB}
 CPU-thread-2  {}
 CPU-thread-3  {}
 DEV-thread-0  {read addrA and populate device TLB}
 DEV-thread-2  {read addrB and populate device TLB}
 [Time N+1] ------------------------------------------------------------------
 CPU-thread-0  {COW_step0: {mmu_notifier_invalidate_range_start(addrA)}}
 CPU-thread-1  {COW_step0: {mmu_notifier_invalidate_range_start(addrB)}}
 CPU-thread-2  {}
 CPU-thread-3  {}
 DEV-thread-0  {}
 DEV-thread-2  {}
 [Time N+2] ------------------------------------------------------------------
 CPU-thread-0  {COW_step1: {update page table to point to new page for addrA}}
 CPU-thread-1  {COW_step1: {update page table to point to new page for addrB}}
 CPU-thread-2  {}
 CPU-thread-3  {}
 DEV-thread-0  {}
 DEV-thread-2  {}
 [Time N+3] ------------------------------------------------------------------
 CPU-thread-0  {preempted}
 CPU-thread-1  {preempted}
 CPU-thread-2  {write to addrA which is a write to new page}
 CPU-thread-3  {}
 DEV-thread-0  {}
 DEV-thread-2  {}
 [Time N+4] ------------------------------------------------------------------
 CPU-thread-0  {preempted}
 CPU-thread-1  {preempted}
 CPU-thread-2  {}
 CPU-thread-3  {write to addrB which is a write to new page}
 DEV-thread-0  {}
 DEV-thread-2  {}
 [Time N+5] ------------------------------------------------------------------
 CPU-thread-0  {preempted}
 CPU-thread-1  {COW_step3: {mmu_notifier_invalidate_range_end(addrB)}}
 CPU-thread-2  {}
 CPU-thread-3  {}
 DEV-thread-0  {}
 DEV-thread-2  {}
 [Time N+6] ------------------------------------------------------------------
 CPU-thread-0  {preempted}
 CPU-thread-1  {}
 CPU-thread-2  {}
 CPU-thread-3  {}
 DEV-thread-0  {read addrA from old page}
 DEV-thread-2  {read addrB from new page}

So here, because at time N+2 the clearing of the page table entry was not
paired with a notification to invalidate the secondary TLB, the device sees the
new value for addrB before seeing the new value for addrA. This breaks total
memory ordering for the device.

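The scenario can be replayed in a small user-space toy model. This is only a
sketch under stated assumptions, not kernel code: ``struct toy_mm``,
``dev_tlb_invalidate()``, ``dev_read_page()``, ``cow_update()`` and
``device_sees_b_before_a()`` are hypothetical names invented for the
illustration, pages are plain integers, and the interleaving is fixed rather
than concurrent.

```c
#include <assert.h>
#include <stdbool.h>

enum { ADDR_A, ADDR_B, NR_ADDRS };
enum { OLD_A = 1, NEW_A, OLD_B, NEW_B };	/* hypothetical page ids */

/* Toy model: one page table entry and one device TLB entry per address. */
struct toy_mm {
	int pte[NR_ADDRS];	/* page mapped by the CPU page table */
	int dev_tlb[NR_ADDRS];	/* page cached by the device TLB, -1 if none */
};

/* Stand-in for invalidating the secondary TLB for one address. */
static void dev_tlb_invalidate(struct toy_mm *mm, int addr)
{
	assert(addr >= 0 && addr < NR_ADDRS);
	mm->dev_tlb[addr] = -1;
}

/* Device access: use the cached translation, else walk the page table. */
static int dev_read_page(struct toy_mm *mm, int addr)
{
	assert(addr >= 0 && addr < NR_ADDRS);
	if (mm->dev_tlb[addr] < 0)
		mm->dev_tlb[addr] = mm->pte[addr];
	return mm->dev_tlb[addr];
}

/* COW_step1: clear the entry, optionally notify, then set the new page. */
static void cow_update(struct toy_mm *mm, int addr, int new_page,
		       bool notify_under_lock)
{
	mm->pte[addr] = -1;			/* clear page table entry */
	if (notify_under_lock)
		dev_tlb_invalidate(mm, addr);	/* notify before the new value */
	mm->pte[addr] = new_page;		/* point to the new page */
}

/*
 * Replays the timeline and returns true when the device observes the new
 * page for addrB while still observing the old page for addrA.
 */
static bool device_sees_b_before_a(bool notify_under_lock)
{
	struct toy_mm mm = {
		.pte = { OLD_A, OLD_B },
		.dev_tlb = { -1, -1 },
	};
	int page_a, page_b;

	/* Time N: the device reads both addresses and fills its TLB. */
	dev_read_page(&mm, ADDR_A);
	dev_read_page(&mm, ADDR_B);

	/* Time N+2: both CPU threads update their page table entry. */
	cow_update(&mm, ADDR_A, NEW_A, notify_under_lock);
	cow_update(&mm, ADDR_B, NEW_B, notify_under_lock);

	/*
	 * Only the thread handling addrB reaches its
	 * mmu_notifier_invalidate_range_end(); the other is still preempted.
	 */
	dev_tlb_invalidate(&mm, ADDR_B);

	/* Final step: the device reads both addresses again. */
	page_a = dev_read_page(&mm, ADDR_A);
	page_b = dev_read_page(&mm, ADDR_B);
	return page_a == OLD_A && page_b == NEW_B;
}
```

In this model ``device_sees_b_before_a(false)`` reports the ordering violation
described above, while ``device_sees_b_before_a(true)`` does not, because the
stale translation for addrA is dropped before the new entry becomes visible.
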
When changing a pte to write protect it, or to point to a new write-protected
page with the same content (KSM), it is fine to delay the
mmu_notifier_invalidate_range() call to mmu_notifier_invalidate_range_end(),
outside the page table lock. This is true even if the thread doing the page
table update is preempted right after releasing the page table lock but before
calling mmu_notifier_invalidate_range_end().
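
Why the delayed invalidation is harmless for reads in the KSM case can be
sketched with another user-space toy model. It is an illustration only and
rests on two assumptions that are labeled in the code: the old page is not
freed before mmu_notifier_invalidate_range_end() (otherwise case A applies),
and only device reads are modeled. ``struct toy_state``, ``dev_read()``,
``toy_invalidate_range_end()`` and ``ksm_merge_reads_consistent()`` are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

enum { OLD_PAGE, KSM_PAGE, NR_PAGES };

struct toy_state {
	int content[NR_PAGES];	/* data held by each page */
	bool freed[NR_PAGES];	/* true once a page has been released */
	int pte;		/* page mapped by the CPU page table */
	int dev_tlb;		/* page cached by the device TLB, -1 if none */
};

/* Device read: use the cached translation, else walk the page table. */
static int dev_read(struct toy_state *s)
{
	if (s->dev_tlb < 0)
		s->dev_tlb = s->pte;
	assert(!s->freed[s->dev_tlb]);	/* a stale entry must not hit a freed page */
	return s->content[s->dev_tlb];
}

/*
 * Deferred invalidation. ASSUMPTION: the old page is released only after
 * the secondary TLB has been invalidated here.
 */
static void toy_invalidate_range_end(struct toy_state *s)
{
	s->dev_tlb = -1;
	s->freed[OLD_PAGE] = true;
}

/* Returns true if the device reads the same value at every step. */
static bool ksm_merge_reads_consistent(void)
{
	struct toy_state s = {
		.content = { 42, 42 },	/* KSM: both pages hold the same data */
		.pte = OLD_PAGE,
		.dev_tlb = -1,
	};
	int before, stale, after;

	before = dev_read(&s);		/* device TLB now caches OLD_PAGE */
	s.pte = KSM_PAGE;		/* update under the lock, no notify */
	stale = dev_read(&s);		/* updater preempted: stale TLB entry used */
	toy_invalidate_range_end(&s);
	after = dev_read(&s);		/* fresh walk reaches KSM_PAGE */
	return before == stale && stale == after;
}
```

Because both pages hold identical content, the read through the stale device
TLB entry returns the same value as the read after the deferred invalidation.
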
