
Split page table lock
=====================

Originally, the mm->page_table_lock spinlock protected all page tables of the
mm_struct. But this approach leads to poor page fault scalability of
multi-threaded applications due to high contention on the lock. To improve
scalability, split page table lock was introduced.

With split page table lock, each table has its own lock that serializes
access to that table. At the moment we use split lock for PTE and PMD
tables. Access to higher-level tables is protected by mm->page_table_lock.

There are helpers to lock/unlock a table and other accessor functions:
 - pte_offset_map_lock()
	maps PTE and takes PTE table lock, returns pointer to the taken
	lock;
 - pte_unmap_unlock()
	unlocks and unmaps PTE table;
 - pte_alloc_map_lock()
	allocates PTE table if needed and takes the lock, returns pointer
	to the taken lock or NULL if allocation failed;
 - pte_lockptr()
	returns pointer to PTE table lock;
 - pmd_lock()
	takes PMD table lock, returns pointer to the taken lock;
 - pmd_lockptr()
	returns pointer to PMD table lock;
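To make the contract of the first two helpers concrete, here is a userspace C model, not kernel code. Every model_* name is hypothetical, the lock is a minimal atomic test-and-set stand-in for spinlock_t, and the 512-entry table size is an assumption. It shows the usual pairing: the lock pointer handed back by the map-and-lock helper is the one passed to the unlock helper, and only the table that holds the entry is locked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for spinlock_t: a minimal test-and-set lock. */
typedef struct { atomic_int locked; } model_spinlock_t;

static void model_spin_init(model_spinlock_t *l) { atomic_init(&l->locked, 0); }

static void model_spin_lock(model_spinlock_t *l)
{
	while (atomic_exchange(&l->locked, 1))
		;	/* spin until the current holder releases it */
}

static void model_spin_unlock(model_spinlock_t *l) { atomic_store(&l->locked, 0); }

static int model_spin_is_locked(model_spinlock_t *l) { return atomic_load(&l->locked); }

#define MODEL_PTRS_PER_PTE 512	/* entries per table; an assumption */

/* Hypothetical PTE table: its entries plus its own split lock. */
struct model_pte_table {
	uint64_t entries[MODEL_PTRS_PER_PTE];
	model_spinlock_t ptl;
};

/* Models the pte_offset_map_lock() contract: lock only the table that
 * holds the entry, report which lock was taken, return the entry. */
static uint64_t *model_pte_offset_map_lock(struct model_pte_table *table,
					   size_t index,
					   model_spinlock_t **ptlp)
{
	*ptlp = &table->ptl;
	model_spin_lock(*ptlp);
	return &table->entries[index];
}

/* Models pte_unmap_unlock(): release the lock returned above. There is no
 * temporary mapping to undo in this userspace model. */
static void model_pte_unmap_unlock(uint64_t *pte, model_spinlock_t *ptl)
{
	(void)pte;
	assert(model_spin_is_locked(ptl));
	model_spin_unlock(ptl);
}
```

Two threads updating entries in different tables take different locks and never contend; with a single mm->page_table_lock the same two updates would serialize.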

Split page table lock for PTE tables is enabled at compile time if
CONFIG_SPLIT_PTLOCK_CPUS (usually 4) is less than or equal to NR_CPUS.
If split lock is disabled, all tables are guarded by mm->page_table_lock.

Split page table lock for PMD tables is enabled if it is enabled for PTE
tables and the architecture supports it (see below).
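The selection logic can be modeled with the preprocessor. This is a hypothetical userspace sketch: the MODEL_* macros stand in for NR_CPUS, CONFIG_SPLIT_PTLOCK_CPUS and CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK, and the lock type is a placeholder because only the choice of lock matters here. The kernel's own macros for this (USE_SPLIT_PTE_PTLOCKS and USE_SPLIT_PMD_PTLOCKS, to our understanding) may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build settings; in a real kernel these come from Kconfig. */
#ifndef MODEL_NR_CPUS
#define MODEL_NR_CPUS 64
#endif
#ifndef MODEL_SPLIT_PTLOCK_CPUS
#define MODEL_SPLIT_PTLOCK_CPUS 4
#endif
#ifndef MODEL_ARCH_ENABLE_SPLIT_PMD_PTLOCK
#define MODEL_ARCH_ENABLE_SPLIT_PMD_PTLOCK 1
#endif

/* PTE split lock: only a CPU-count threshold. */
#define MODEL_USE_SPLIT_PTE_PTLOCKS (MODEL_NR_CPUS >= MODEL_SPLIT_PTLOCK_CPUS)
/* PMD split lock: requires the PTE split lock and architecture support. */
#define MODEL_USE_SPLIT_PMD_PTLOCKS \
	(MODEL_USE_SPLIT_PTE_PTLOCKS && MODEL_ARCH_ENABLE_SPLIT_PMD_PTLOCK)

typedef struct { int placeholder; } model_spinlock_t;	/* state irrelevant here */

struct model_mm { model_spinlock_t page_table_lock; };
struct model_table { model_spinlock_t ptl; };		/* a PTE or PMD table */

/* Models pte_lockptr(): which lock guards this PTE table? */
static model_spinlock_t *model_pte_lockptr(struct model_mm *mm,
					   struct model_table *pte_table)
{
#if MODEL_USE_SPLIT_PTE_PTLOCKS
	(void)mm;
	return &pte_table->ptl;
#else
	(void)pte_table;
	return &mm->page_table_lock;
#endif
}

/* Models pmd_lockptr(): which lock guards this PMD table? */
static model_spinlock_t *model_pmd_lockptr(struct model_mm *mm,
					   struct model_table *pmd_table)
{
#if MODEL_USE_SPLIT_PMD_PTLOCKS
	(void)mm;
	return &pmd_table->ptl;
#else
	(void)pmd_table;
	return &mm->page_table_lock;
#endif
}
```

Building with -DMODEL_NR_CPUS=2 disables both split locks, and both helpers then return &mm->page_table_lock.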

Hugetlb and split page table lock
---------------------------------

Hugetlb can support several page sizes. We use split lock only for the PMD
level, not for PUD.

Hugetlb-specific helpers:
 - huge_pte_lock()
	takes PMD split lock for a PMD_SIZE page, mm->page_table_lock
	otherwise;
 - huge_pte_lockptr()
	returns pointer to the table lock;
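A userspace sketch of the lock choice made by the hugetlb helpers. The model_* names are hypothetical, the page sizes are x86-64-style assumptions (4 KiB base, 2 MiB PMD, 1 GiB PUD), and the lock is a single-threaded placeholder that only records whether it is held.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE (1UL << 12)	/* assumed 4 KiB base page */
#define MODEL_PMD_SIZE  (1UL << 21)	/* assumed 2 MiB PMD-level page */
#define MODEL_PUD_SIZE  (1UL << 30)	/* assumed 1 GiB PUD-level page */

typedef struct { int held; } model_spinlock_t;	/* single-threaded placeholder */

struct model_mm { model_spinlock_t page_table_lock; };
struct model_table { model_spinlock_t ptl; };	/* table holding the huge entry */

/* Models huge_pte_lockptr(): split lock only for PMD-sized pages. */
static model_spinlock_t *model_huge_pte_lockptr(unsigned long huge_size,
						struct model_mm *mm,
						struct model_table *table)
{
	if (huge_size == MODEL_PMD_SIZE)
		return &table->ptl;		/* PMD level: per-table lock */
	assert(huge_size != MODEL_PAGE_SIZE);	/* must really be a huge page */
	return &mm->page_table_lock;		/* e.g. PUD level: mm-wide lock */
}

/* Models huge_pte_lock(): pick the lock, take it, return it. */
static model_spinlock_t *model_huge_pte_lock(unsigned long huge_size,
					     struct model_mm *mm,
					     struct model_table *table)
{
	model_spinlock_t *ptl = model_huge_pte_lockptr(huge_size, mm, table);

	assert(!ptl->held);
	ptl->held = 1;
	return ptl;
}
```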

Support of split page table lock by an architecture
---------------------------------------------------

There's no need for special enabling of PTE split page table lock:
everything required is done by pgtable_page_ctor() and pgtable_page_dtor(),
which must be called on PTE table allocation / freeing.
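The allocation side for PTE tables might look like the following userspace model. It is a sketch, not an architecture's real pte_alloc_one(): the model_* names, the fault-injection flag, and the counter used to check that every constructor is paired with a destructor are all hypothetical. The point it illustrates is that a constructor failure must undo the page allocation and report failure to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for spinlock_t; only its state matters here. */
typedef struct { int held; } model_spinlock_t;

/* Hypothetical page holding a PTE table. */
struct model_page {
	model_spinlock_t ptl;		/* models page->ptl */
	bool ptl_ready;			/* set by the ctor, cleared by the dtor */
	unsigned long entries[512];	/* the PTE table itself */
};

static bool model_ctor_should_fail;	/* hypothetical fault injection */
static long model_live_ctors;		/* ctors not yet matched by a dtor */

/* Models pgtable_page_ctor(): prepare the split lock. May fail. */
static bool model_pgtable_page_ctor(struct model_page *page)
{
	if (model_ctor_should_fail)
		return false;
	page->ptl.held = 0;
	page->ptl_ready = true;
	model_live_ctors++;
	return true;
}

/* Models pgtable_page_dtor(): tear down what the ctor set up. */
static void model_pgtable_page_dtor(struct model_page *page)
{
	assert(page->ptl_ready);
	page->ptl_ready = false;
	model_live_ctors--;
}

/* Models an architecture's PTE table allocation path. */
static struct model_page *model_pte_alloc_one(void)
{
	struct model_page *page = calloc(1, sizeof(*page));

	if (!page)
		return NULL;
	if (!model_pgtable_page_ctor(page)) {
		free(page);	/* undo the allocation before reporting failure */
		return NULL;
	}
	return page;
}

/* Models the matching free path: dtor first, then release the page. */
static void model_pte_free(struct model_page *page)
{
	model_pgtable_page_dtor(page);
	free(page);
}
```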

Make sure the architecture doesn't use the slab allocator for page table
allocation: slab uses page->slab_cache for its pages, and this field
shares storage with page->ptl.
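The slab conflict comes from the two fields overlapping in a union. The miniature struct page below is hypothetical and far simpler than the real one; it only shows that a write to slab_cache overwrites whatever a page-table user stored in ptl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { unsigned int raw; } model_spinlock_t;	/* 0 means unlocked */
struct model_kmem_cache;				/* opaque cache type */

/* Hypothetical, heavily simplified struct page: slab_cache and ptl are
 * two views of the same storage. */
struct model_page {
	unsigned long flags;
	union {
		struct model_kmem_cache *slab_cache;	/* used by slab pages */
		model_spinlock_t ptl;			/* used by page-table pages */
	};
};

/* What the slab allocator does to pages it owns: record its cache. */
static void model_slab_claim_page(struct model_page *page,
				  struct model_kmem_cache *cache)
{
	page->slab_cache = cache;
}

static int model_ptl_is_unlocked(const struct model_page *page)
{
	return page->ptl.raw == 0;
}
```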

PMD split lock only makes sense if you have more than two page table
levels.

Enabling PMD split lock requires a pgtable_pmd_page_ctor() call on PMD table
allocation and a pgtable_pmd_page_dtor() call on freeing.

Allocation usually happens in pmd_alloc_one(), freeing in pmd_free() and
pmd_free_tlb(), but make sure you cover all PMD table allocation / freeing
paths: e.g. X86_PAE preallocates a few PMDs in pgd_alloc().
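The PMD paths, including a preallocation loop in the style of X86_PAE's pgd_alloc(), can be sketched as a userspace model. All model_* names, the count of four preallocated PMDs, and the ctor failure budget are assumptions for illustration. A failure part-way through the loop must run the destructor on, and free, every table already constructed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_PREALLOCATED_PMDS 4	/* assumed count, PAE-style */

struct model_pmd_page {
	int ptl;		/* placeholder for the split PMD lock */
	bool ctor_done;
};

static int model_ctor_budget = 1000;	/* hypothetical: ctor fails at 0 */
static int model_live_ctors;		/* ctors not yet matched by a dtor */

/* Models pgtable_pmd_page_ctor(): may fail. */
static bool model_pgtable_pmd_page_ctor(struct model_pmd_page *p)
{
	if (model_ctor_budget-- <= 0)
		return false;
	p->ptl = 0;
	p->ctor_done = true;
	model_live_ctors++;
	return true;
}

/* Models pgtable_pmd_page_dtor(). */
static void model_pgtable_pmd_page_dtor(struct model_pmd_page *p)
{
	assert(p->ctor_done);
	p->ctor_done = false;
	model_live_ctors--;
}

/* Models pmd_alloc_one(): allocate, construct, undo on ctor failure. */
static struct model_pmd_page *model_pmd_alloc_one(void)
{
	struct model_pmd_page *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	if (!model_pgtable_pmd_page_ctor(p)) {
		free(p);
		return NULL;
	}
	return p;
}

/* Models pmd_free(): destruct, then release. */
static void model_pmd_free(struct model_pmd_page *p)
{
	model_pgtable_pmd_page_dtor(p);
	free(p);
}

/* Models PMD preallocation in pgd_alloc(): every table goes through the
 * ctor, and a failure unwinds the tables already set up. */
static bool model_preallocate_pmds(struct model_pmd_page *pmds[MODEL_PREALLOCATED_PMDS])
{
	for (int i = 0; i < MODEL_PREALLOCATED_PMDS; i++) {
		pmds[i] = model_pmd_alloc_one();
		if (!pmds[i]) {
			while (i-- > 0)
				model_pmd_free(pmds[i]);
			return false;
		}
	}
	return true;
}
```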

With everything in place you can set CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK.

NOTE: pgtable_page_ctor() and pgtable_pmd_page_ctor() can fail, and the
failure must be handled properly.

page->ptl
---------

page->ptl is used to access the split page table lock, where 'page' is the
struct page of the page containing the table. It shares storage with
page->private (and a few other fields in the union).

To avoid increasing the size of struct page and to get the best performance,
we use a trick:
 - if spinlock_t fits into a long, we use page->ptl as the spinlock itself,
   so we can avoid indirect access and save a cache line.
 - if the size of spinlock_t is bigger than the size of a long, we use
   page->ptl as a pointer to a spinlock_t and allocate it dynamically. This
   allows using split lock with DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC enabled,
   but costs one more cache line for indirect access.

When dynamic allocation is needed, the spinlock_t is allocated in
pgtable_page_ctor() for PTE tables and in pgtable_pmd_page_ctor() for PMD
tables.
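The two layouts can be modeled with a compile-time switch. In this hypothetical userspace sketch, defining MODEL_DEBUG_SPINLOCK makes the stand-in lock larger than a long, as lock debugging does to the real spinlock_t. The model_* names are illustrative, though the kernel has, to our understanding, a similar ALLOC_SPLIT_PTLOCKS switch and ptlock_ptr() accessor. Only model_ptlock_ptr() knows which layout is in use, which is why callers go through a helper rather than touching ptl directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical knob: a debug lock that no longer fits into a long. */
#ifdef MODEL_DEBUG_SPINLOCK
typedef struct { unsigned int raw; const char *owner; unsigned long magic[2]; } model_spinlock_t;
#define MODEL_ALLOC_SPLIT_PTLOCKS 1
#else
typedef struct { unsigned int raw; } model_spinlock_t;
#define MODEL_ALLOC_SPLIT_PTLOCKS 0
#endif

_Static_assert(MODEL_ALLOC_SPLIT_PTLOCKS == (sizeof(model_spinlock_t) > sizeof(long)),
	       "allocate the lock exactly when it does not fit into a long");

/* Hypothetical miniature struct page: ptl shares storage with private. */
struct model_page {
	unsigned long flags;
	union {
		unsigned long private;
#if MODEL_ALLOC_SPLIT_PTLOCKS
		model_spinlock_t *ptl;	/* points to a separately allocated lock */
#else
		model_spinlock_t ptl;	/* the lock itself, stored in place */
#endif
	};
};

/* Models ptlock_ptr(): the single accessor that hides the layout. */
static model_spinlock_t *model_ptlock_ptr(struct model_page *page)
{
#if MODEL_ALLOC_SPLIT_PTLOCKS
	return page->ptl;
#else
	return &page->ptl;
#endif
}

/* Models the ctor's lock setup: allocate only when the lock doesn't fit. */
static bool model_ptlock_init(struct model_page *page)
{
#if MODEL_ALLOC_SPLIT_PTLOCKS
	page->ptl = calloc(1, sizeof(*page->ptl));
	if (!page->ptl)
		return false;	/* this is why the ctor can fail */
#endif
	model_ptlock_ptr(page)->raw = 0;	/* unlocked */
	return true;
}

/* Models the dtor's lock teardown. */
static void model_ptlock_free(struct model_page *page)
{
#if MODEL_ALLOC_SPLIT_PTLOCKS
	free(page->ptl);
	page->ptl = NULL;
#else
	(void)page;
#endif
}
```

In the default build the lock lives inside the page structure and struct model_page stays two longs wide; with -DMODEL_DEBUG_SPINLOCK the same callers work unchanged, but each lock access goes through an extra pointer.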

Never access page->ptl directly; use the appropriate helper.