[deliverable/linux.git] / Documentation / RCU / arrayRCU.txt

Using RCU to Protect Read-Mostly Arrays


Although RCU is more commonly used to protect linked lists, it can
also be used to protect arrays.  Three situations are as follows:

1.  Hash Tables

2.  Static Arrays

3.  Resizeable Arrays

Each of these situations are discussed below.


Situation 1: Hash Tables

Hash tables are often implemented as an array, where each array entry
has a linked-list hash chain.  Each hash chain can be protected by RCU
as described in the listRCU.txt document.  This approach also applies
to other array-of-list situations, such as radix trees.


Situation 2: Static Arrays

Static arrays, where the data (rather than a pointer to the data) is
located in each array element, and where the array is never resized,
have not been used with RCU.  Rik van Riel recommends using seqlock in
this situation, which would also have minimal read-side overhead as long
as updates are rare.

Quick Quiz:  Why is it so important that updates be rare when
	     using seqlock?


Situation 3: Resizeable Arrays

Use of RCU for resizeable arrays is demonstrated by the grow_ary()
function used by the System V IPC code.  The array is used to map from
semaphore, message-queue, and shared-memory IDs to the data structure
that represents the corresponding IPC construct.  The grow_ary()
function does not acquire any locks; instead its caller must hold the
ids->sem semaphore.

The grow_ary() function, shown below, does some limit checks, allocates a
new ipc_id_ary, copies the old to the new portion of the new, initializes
the remainder of the new, updates the ids->entries pointer to point to
the new array, and invokes ipc_rcu_putref() to free up the old array.
Note that rcu_assign_pointer() is used to update the ids->entries pointer,
which includes any memory barriers required on whatever architecture
you are running on.

	static int grow_ary(struct ipc_ids* ids, int newsize)
	{
		struct ipc_id_ary* new;
		struct ipc_id_ary* old;
		int i;
		int size = ids->entries->size;

		if(newsize > IPCMNI)
			newsize = IPCMNI;
		if(newsize <= size)
			return newsize;

		new = ipc_rcu_alloc(sizeof(struct kern_ipc_perm *)*newsize +
				    sizeof(struct ipc_id_ary));
		if(new == NULL)
			return size;
		new->size = newsize;
		memcpy(new->p, ids->entries->p,
		       sizeof(struct kern_ipc_perm *)*size +
		       sizeof(struct ipc_id_ary));
		for(i=size;i<newsize;i++) {
			new->p[i] = NULL;
		}
		old = ids->entries;

		/*
		 * Use rcu_assign_pointer() to make sure the memcpyed
		 * contents of the new array are visible before the new
		 * array becomes visible.
		 */
		rcu_assign_pointer(ids->entries, new);

		ipc_rcu_putref(old);
		return newsize;
	}

The ipc_rcu_putref() function decrements the array's reference count
and then, if the reference count has dropped to zero, uses call_rcu()
to free the array after a grace period has elapsed.

The array is traversed by the ipc_lock() function.  This function
indexes into the array under the protection of rcu_read_lock(),
using rcu_dereference() to pick up the pointer to the array so
that it may later safely be dereferenced -- memory barriers are
required on the Alpha CPU.  Since the size of the array is stored
with the array itself, there can be no array-size mismatches, so
a simple check suffices.  The pointer to the structure corresponding
to the desired IPC object is placed in "out", with NULL indicating
a non-existent entry.  After acquiring "out->lock", the "out->deleted"
flag indicates whether the IPC object is in the process of being
deleted, and, if not, the pointer is returned.

	struct kern_ipc_perm* ipc_lock(struct ipc_ids* ids, int id)
	{
		struct kern_ipc_perm* out;
		int lid = id % SEQ_MULTIPLIER;
		struct ipc_id_ary* entries;

		rcu_read_lock();
		entries = rcu_dereference(ids->entries);
		if(lid >= entries->size) {
			rcu_read_unlock();
			return NULL;
		}
		out = entries->p[lid];
		if(out == NULL) {
			rcu_read_unlock();
			return NULL;
		}
		spin_lock(&out->lock);

		/* ipc_rmid() may have already freed the ID while ipc_lock
		 * was spinning: here verify that the structure is still valid
		 */
		if (out->deleted) {
			spin_unlock(&out->lock);
			rcu_read_unlock();
			return NULL;
		}
		return out;
	}


Answer to Quick Quiz:

	The reason that it is important that updates be rare when
	using seqlock is that frequent updates can livelock readers.
	One way to avoid this problem is to assign a seqlock for
	each array entry rather than to the entire array.
Commit	Line	Data
1da177e4 LT	1	Using RCU to Protect Read-Mostly Arrays
	2
	3
	4	Although RCU is more commonly used to protect linked lists, it can
	5	also be used to protect arrays. Three situations are as follows:
	6
	7	1. Hash Tables
	8
	9	2. Static Arrays
	10
	11	3. Resizeable Arrays
	12
	13	Each of these situations are discussed below.
	14
	15
	16	Situation 1: Hash Tables
	17
	18	Hash tables are often implemented as an array, where each array entry
	19	has a linked-list hash chain. Each hash chain can be protected by RCU
	20	as described in the listRCU.txt document. This approach also applies
	21	to other array-of-list situations, such as radix trees.
	22
	23
	24	Situation 2: Static Arrays
	25
	26	Static arrays, where the data (rather than a pointer to the data) is
	27	located in each array element, and where the array is never resized,
	28	have not been used with RCU. Rik van Riel recommends using seqlock in
	29	this situation, which would also have minimal read-side overhead as long
	30	as updates are rare.
	31
	32	Quick Quiz: Why is it so important that updates be rare when
	33	using seqlock?
	34
	35
	36	Situation 3: Resizeable Arrays
	37
	38	Use of RCU for resizeable arrays is demonstrated by the grow_ary()
	39	function used by the System V IPC code. The array is used to map from
	40	semaphore, message-queue, and shared-memory IDs to the data structure
	41	that represents the corresponding IPC construct. The grow_ary()
	42	function does not acquire any locks; instead its caller must hold the
	43	ids->sem semaphore.
	44
	45	The grow_ary() function, shown below, does some limit checks, allocates a
	46	new ipc_id_ary, copies the old to the new portion of the new, initializes
	47	the remainder of the new, updates the ids->entries pointer to point to
	48	the new array, and invokes ipc_rcu_putref() to free up the old array.
	49	Note that rcu_assign_pointer() is used to update the ids->entries pointer,
	50	which includes any memory barriers required on whatever architecture
	51	you are running on.
	52
	53	static int grow_ary(struct ipc_ids* ids, int newsize)
	54	{
	55	struct ipc_id_ary* new;
	56	struct ipc_id_ary* old;
	57	int i;
	58	int size = ids->entries->size;
	59
	60	if(newsize > IPCMNI)
	61	newsize = IPCMNI;
	62	if(newsize <= size)
	63	return newsize;
	64
65	new = ipc_rcu_alloc(sizeof(struct kern_ipc_perm )newsize +
66	sizeof(struct ipc_id_ary));
67	if(new == NULL)
68	return size;
69	new->size = newsize;
70	memcpy(new->p, ids->entries->p,
71	sizeof(struct kern_ipc_perm )size +
72	sizeof(struct ipc_id_ary));
73	for(i=size;i<newsize;i++) {
74	new->p[i] = NULL;
75	}
76	old = ids->entries;
77
78	/*
79	* Use rcu_assign_pointer() to make sure the memcpyed
80	* contents of the new array are visible before the new
81	* array becomes visible.
82	*/
83	rcu_assign_pointer(ids->entries, new);
84
85	ipc_rcu_putref(old);
86	return newsize;
87	}
88
89	The ipc_rcu_putref() function decrements the array's reference count
90	and then, if the reference count has dropped to zero, uses call_rcu()
91	to free the array after a grace period has elapsed.
92
93	The array is traversed by the ipc_lock() function. This function
94	indexes into the array under the protection of rcu_read_lock(),
95	using rcu_dereference() to pick up the pointer to the array so
96	that it may later safely be dereferenced -- memory barriers are
97	required on the Alpha CPU. Since the size of the array is stored
98	with the array itself, there can be no array-size mismatches, so
99	a simple check suffices. The pointer to the structure corresponding
100	to the desired IPC object is placed in "out", with NULL indicating
101	a non-existent entry. After acquiring "out->lock", the "out->deleted"
102	flag indicates whether the IPC object is in the process of being
103	deleted, and, if not, the pointer is returned.
104
105	struct kern_ipc_perm* ipc_lock(struct ipc_ids* ids, int id)
106	{
107	struct kern_ipc_perm* out;
108	int lid = id % SEQ_MULTIPLIER;
109	struct ipc_id_ary* entries;
110
111	rcu_read_lock();
112	entries = rcu_dereference(ids->entries);
113	if(lid >= entries->size) {
114	rcu_read_unlock();
115	return NULL;
116	}
117	out = entries->p[lid];
118	if(out == NULL) {
119	rcu_read_unlock();
120	return NULL;
121	}
122	spin_lock(&out->lock);
123
124	/* ipc_rmid() may have already freed the ID while ipc_lock
125	* was spinning: here verify that the structure is still valid
126	*/
127	if (out->deleted) {
128	spin_unlock(&out->lock);
129	rcu_read_unlock();
130	return NULL;
131	}
132	return out;
133	}
134
135
136	Answer to Quick Quiz:
137
138	The reason that it is important that updates be rare when
139	using seqlock is that frequent updates can livelock readers.
140	One way to avoid this problem is to assign a seqlock for
141	each array entry rather than to the entire array.