This document gives a brief introduction to the caching
mechanisms in the sunrpc layer that are used, in particular,
for NFS authentication.

CACHES
======
The caching replaces the old exports table and allows for
a wide variety of values to be cached.

There are a number of caches that are similar in structure though
quite possibly very different in content and use.  There is a corpus
of common code for managing these caches.

Examples of caches that are likely to be needed are:
  - mapping from IP address to client name
  - mapping from client name and filesystem to export options
  - mapping from UID to list of GIDs, to work around NFS's limitation
    of 16 gids.
  - mappings between local UID/GID and remote UID/GID for sites that
    do not have uniform uid assignment
  - mapping from network identity to public key for crypto authentication.

The common code handles such things as:
   - general cache lookup with correct locking
   - supporting 'NEGATIVE' as well as positive entries
   - allowing an EXPIRED time on cache items, and removing
     items after they expire and are no longer in use.

   Future code extensions are expected to handle:
   - making requests to user-space to fill in cache entries
   - allowing user-space to directly set entries in the cache
   - delaying RPC requests that depend on as-yet incomplete
     cache entries, and replaying those requests when the cache entry
     is complete.
   - maintaining last-access times on cache entries
   - cleaning out old entries when the caches become full

The code for performing a cache lookup is also common, but in the form
of a template, i.e. a #define.
Each cache defines a lookup function by using the DefineCacheLookup
macro, or the simpler DefineSimpleCacheLookup macro.

Creating a Cache
----------------

1/ A cache needs a datum to cache.  This is in the form of a
   structure definition that must contain a
     struct cache_head
   as an element, usually the first.
   It will also contain a key and some content.
   Each cache element is reference counted and contains
   expiry and update times for use in cache management.
2/ A cache needs a "cache_detail" structure that
   describes the cache.  This stores the hash table, and some
   parameters for cache management.
3/ A cache needs a lookup function.  This is created using
   the DefineCacheLookup macro.  This lookup function is used both
   to find entries and to update entries.  The normal mode for
   updating an entry is to replace the old entry with a new
   entry.  However it is possible to allow update-in-place
   for those caches where it makes sense (no atomicity issues
   or indirect reference counting issues).
4/ A cache needs to be registered using cache_register().  This
   includes it on a list of caches that will be regularly
   cleaned to discard old data.  For this to work, some
   thread must periodically call cache_clean.
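
As an illustration of step 1, the following is a minimal user-space
sketch of a datum that embeds a cache head as its first element.  The
struct cache_head shown here is only a stand-in so that the example
compiles outside the kernel; its fields are assumptions and the real
definition may differ.  struct demo_ent and demo_from_head() are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Stand-in for the kernel's struct cache_head.  The field names and
 * types are assumptions made for this sketch only. */
struct cache_head {
	struct cache_head *next;	/* hash chain link */
	time_t expiry_time;		/* when the entry stops being valid */
	time_t last_refresh;		/* when the entry was last updated */
	int refcnt;			/* reference count */
	unsigned long flags;		/* e.g. valid / negative markers */
};

/* Hypothetical datum: maps a client name to an options word. */
struct demo_ent {
	struct cache_head h;		/* embedded head, placed first */
	char key[32];			/* the key */
	unsigned int options;		/* the content */
};

/* Recover the datum from a pointer to its embedded head. */
struct demo_ent *demo_from_head(struct cache_head *ch)
{
	assert(ch != NULL);
	return (struct demo_ent *)((char *)ch - offsetof(struct demo_ent, h));
}
```

Common code only ever sees the embedded head, so each cache needs some
way, such as the helper above, to get back to its own structure.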
   
Using a cache
-------------

To find a value in a cache, call the lookup function, passing it a
datum which contains the key, and possibly content, and a flag saying
whether to update the cache with new data from the datum.   Depending
on how the cache lookup function was defined, it may take an extra
argument to identify the particular cache in question.

Except in cases of kmalloc failure, the lookup function
will return a new datum which will store the key and
may contain valid content, or may not.
This datum is typically passed to cache_check which determines the
validity of the datum and may later initiate an upcall to fill
in the data.
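
The kind of decision that a validity check has to make can be modelled
in user space as follows.  This is a simplified sketch and not the
kernel's cache_check: the flag names, the structure and the choice of
return codes (0, -EAGAIN, -ENOENT) are all assumptions made for the
example.

```c
#include <assert.h>
#include <errno.h>
#include <time.h>

#define DEMO_VALID	0x1UL	/* content has been filled in */
#define DEMO_NEGATIVE	0x2UL	/* entry records "no such item" */

/* Hypothetical, cut-down cache head. */
struct demo_head {
	time_t expiry_time;
	unsigned long flags;
};

/* Returns 0 for a usable positive entry, -ENOENT for a usable
 * negative entry, and -EAGAIN when the entry has no valid content or
 * has expired, which is the case where an upcall would be wanted. */
int demo_check(const struct demo_head *h, time_t now)
{
	assert(h != NULL);
	if (!(h->flags & DEMO_VALID) || h->expiry_time <= now)
		return -EAGAIN;
	if (h->flags & DEMO_NEGATIVE)
		return -ENOENT;
	return 0;
}
```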

cache_check can be passed a "struct cache_req *".  This structure is
typically embedded in the actual request and can be used to create a
deferred copy of the request (struct cache_deferred_req).  This is
done when the found cache item is not up to date, but there is reason to
believe that userspace might provide information soon.  When the cache
item does become valid, the deferred copy of the request will be
revisited (->revisit).  It is expected that this method will
reschedule the request for processing.
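
The defer-and-revisit pattern described above can be sketched in user
space as follows.  All names here (demo_deferred, demo_item,
demo_request and the functions) are hypothetical, and the ->revisit
signature is simplified; the kernel's struct cache_deferred_req may
look different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a deferred copy of a request. */
struct demo_deferred {
	struct demo_deferred *next;
	void (*revisit)(struct demo_deferred *dreq);
};

/* Hypothetical cache item with a list of requests waiting on it. */
struct demo_item {
	int valid;
	struct demo_deferred *waiting;
};

/* Hypothetical request that embeds its deferral record. */
struct demo_request {
	int scheduled;			/* times the request was requeued */
	struct demo_deferred dreq;
};

/* Park a request on an item that is not yet valid. */
void demo_defer(struct demo_item *item, struct demo_deferred *dreq)
{
	assert(item != NULL && dreq != NULL && dreq->revisit != NULL);
	dreq->next = item->waiting;
	item->waiting = dreq;
}

/* Mark the item valid and revisit everything that waited on it.
 * Returns the number of requests revisited. */
int demo_mark_valid(struct demo_item *item)
{
	int n = 0;

	assert(item != NULL);
	item->valid = 1;
	while (item->waiting != NULL) {
		struct demo_deferred *dreq = item->waiting;

		item->waiting = dreq->next;
		dreq->next = NULL;
		dreq->revisit(dreq);
		n++;
	}
	return n;
}

/* Example ->revisit method: reschedule the owning request. */
void demo_requeue(struct demo_deferred *dreq)
{
	struct demo_request *rq = (struct demo_request *)
		((char *)dreq - offsetof(struct demo_request, dreq));

	rq->scheduled++;
}
```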


Populating a cache
------------------

Each cache has a name, and when the cache is registered, a directory
with that name is created in /proc/net/rpc.

This directory contains a file called 'channel' which is a channel
for communicating between kernel and user for populating the cache.
This directory may later contain other files for interacting
with the cache.

The 'channel' works a bit like a datagram socket. Each 'write' is
passed as a whole to the cache for parsing and interpretation.
Each cache can treat the write requests differently, but it is
expected that a message written will contain:
  - a key
  - an expiry time
  - some content
with the intention that an item in the cache with the given key
should be created or updated to have the given content, and the
expiry time should be set on that item.
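
A helper could build such a message along the following lines.  This is
a sketch under assumptions: it uses space-separated fields with a
trailing newline, takes the expiry time as a plain decimal number, and
expects the key and content to be single fields that have already been
quoted where necessary.  demo_format_update() is a hypothetical name,
and each cache is free to expect something different.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format "key expiry content\n" into buf.  Returns the message length,
 * or -1 if a field contains a raw space or newline or if buf is too
 * small.  The result is meant to be handed to a single write(). */
int demo_format_update(char *buf, size_t len, const char *key,
		       long expiry, const char *content)
{
	int n;

	assert(buf != NULL && key != NULL && content != NULL);
	if (strpbrk(key, " \n") != NULL || strpbrk(content, " \n") != NULL)
		return -1;
	n = snprintf(buf, len, "%s %ld %s\n", key, expiry, content);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```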

Reading from a channel is a bit more interesting.  When a cache
lookup fails, or when it succeeds but finds an entry that may soon
expire, a request is lodged for that cache item to be updated by
user-space.  These requests appear in the channel file.

Successive reads will return successive requests.
If there are no more requests to return, read will return EOF, but a
select or poll for read will block waiting for another request to be
added.

Thus a user-space helper is likely to:
  open the channel.
    select for readable
    read a request
    write a response
  loop.
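
One pass of that loop might look like the sketch below.  It is only an
illustration: demo_answer() is a hypothetical placeholder that replies
with the first field of the request, an expiry of 0 and the content
"none", where a real helper would consult its own data.  The read and
write descriptors are separate parameters purely so that the sketch can
be exercised with pipes; with a real channel both would be the same
descriptor.

```c
#include <assert.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical answer routine; see the text above. */
int demo_answer(const char *req, char *resp, size_t len)
{
	size_t klen = strcspn(req, " \n");
	int n = snprintf(resp, len, "%.*s 0 none\n", (int)klen, req);

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Wait for one request, read it, and write one response.
 * Returns 1 if a request was answered, 0 on timeout or when the read
 * returned nothing, and -1 on error. */
int demo_serve_one(int rfd, int wfd, int timeout_ms)
{
	struct pollfd pfd = { .fd = rfd, .events = POLLIN };
	char req[1024], resp[1024];
	ssize_t got;
	int ready, n;

	ready = poll(&pfd, 1, timeout_ms);	/* select for readable */
	if (ready < 0)
		return -1;
	if (ready == 0)
		return 0;

	got = read(rfd, req, sizeof(req) - 1);	/* read a request */
	if (got < 0)
		return -1;
	if (got == 0)
		return 0;
	req[got] = '\0';

	n = demo_answer(req, resp, sizeof(resp));
	if (n < 0)
		return -1;
	/* The whole response goes out in one write. */
	if (write(wfd, resp, (size_t)n) != n)
		return -1;
	return 1;
}
```

A helper would open the channel once and then call demo_serve_one() in
a loop.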

If it dies and needs to be restarted, any requests that have not been
answered will still appear in the file and will be read by the new
instance of the helper.

Each cache should define a "cache_parse" method which takes a message
written from user-space and processes it.  It should return an error
(which propagates back to the write syscall) or 0.
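
The shape of such a parse method can be sketched in user space as
follows.  The message layout (key, decimal expiry time and one content
field, separated by exactly one space and ended by a newline), the
struct demo_update type and the function names are assumptions made for
the example and are not taken from any particular cache.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of an update message. */
struct demo_update {
	char key[32];
	long expiry;
	char content[64];
};

/* Split off the next space-separated field, or return NULL at the end. */
static char *next_field(char **p)
{
	char *start = *p, *sp;

	if (start == NULL || *start == '\0')
		return NULL;
	sp = strchr(start, ' ');
	if (sp != NULL) {
		*sp = '\0';
		*p = sp + 1;
	} else {
		*p = NULL;
	}
	return start;
}

/* Returns 0 on success or -EINVAL for a malformed message. */
int demo_parse(const char *msg, size_t len, struct demo_update *u)
{
	char line[128], *p = line, *key, *exp, *content, *end;

	assert(msg != NULL && u != NULL);
	/* Exactly one newline, at the end, and no embedded nul. */
	if (len == 0 || len > sizeof(line) ||
	    memchr(msg, '\n', len) != msg + len - 1 ||
	    memchr(msg, '\0', len) != NULL)
		return -EINVAL;
	memcpy(line, msg, len - 1);
	line[len - 1] = '\0';

	key = next_field(&p);
	exp = next_field(&p);
	content = next_field(&p);
	if (key == NULL || exp == NULL || content == NULL || p != NULL)
		return -EINVAL;
	if (*key == '\0' || strlen(key) >= sizeof(u->key) ||
	    strlen(content) >= sizeof(u->content))
		return -EINVAL;
	u->expiry = strtol(exp, &end, 10);
	if (end == exp || *end != '\0')
		return -EINVAL;
	strcpy(u->key, key);
	strcpy(u->content, content);
	return 0;
}
```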

Each cache should also define a "cache_request" method which
takes a cache item and encodes a request into the buffer
provided.


Note: If a cache has no active readers on the channel, and has had no
active readers for more than 60 seconds, further requests will not be
added to the channel but instead all lookups that do not find a valid
entry will fail.  This is partly for backward compatibility: the
previous NFS exports table was deemed to be authoritative and a
failed lookup meant a definite 'no'.
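
The rule in this note amounts to a small predicate, sketched here in
user space.  The function name, its parameters and the way the time of
the last reader is tracked are assumptions; only the 60 second figure
comes from the text.

```c
#include <assert.h>
#include <time.h>

#define DEMO_READER_GRACE 60	/* seconds, from the note above */

/* Returns 1 if a failed lookup should queue a request on the channel,
 * or 0 if the lookup should simply fail. */
int demo_should_queue(int active_readers, time_t last_reader_seen,
		      time_t now)
{
	assert(active_readers >= 0);
	if (active_readers > 0)
		return 1;
	return (now - last_reader_seen) <= DEMO_READER_GRACE;
}
```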

Request/response format
-----------------------

While each cache is free to use its own format for requests
and responses over the channel, the following is recommended as
appropriate and support routines are available to help:
Each request or response record should be printable ASCII
with precisely one newline character which should be at the end.
Fields within the record should be separated by spaces, normally one.
If spaces, newlines, or nul characters are needed in a field they
must be quoted.  Two mechanisms are available:
1/ If a field begins '\x' then it must contain an even number of
   hex digits, and pairs of these digits provide the bytes in the
   field.
2/ Otherwise a \ in the field must be followed by 3 octal digits
   which give the code for a byte.  Other characters are treated
   as themselves.  At the very least, space, newline, nul, and
   '\' must be quoted in this way.
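
The two quoting mechanisms can be illustrated with the user-space
sketch below.  demo_quote() and demo_unquote() are hypothetical names
and are not the kernel's support routines.  The encoder always uses the
octal form and quotes every byte outside the printable range as well as
space and '\'; the decoder accepts either form and stops at the end of
the field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Quote inlen bytes into out using \ooo escapes.  Returns the length
 * of the quoted field, or -1 if out is too small. */
int demo_quote(const unsigned char *in, size_t inlen, char *out,
	       size_t outlen)
{
	size_t o = 0, i;

	assert(in != NULL && out != NULL);
	if (outlen == 0)
		return -1;
	for (i = 0; i < inlen; i++) {
		unsigned char c = in[i];

		if (c <= ' ' || c >= 0x7f || c == '\\') {
			if (o + 4 >= outlen)
				return -1;
			out[o++] = '\\';
			out[o++] = (char)('0' + ((c >> 6) & 3));
			out[o++] = (char)('0' + ((c >> 3) & 7));
			out[o++] = (char)('0' + (c & 7));
		} else {
			if (o + 1 >= outlen)
				return -1;
			out[o++] = (char)c;
		}
	}
	out[o] = '\0';
	return (int)o;
}

static int hexval(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Decode one field (up to a space, newline or the end of the string).
 * Returns the number of bytes stored, or -1 on malformed input or if
 * out is too small. */
int demo_unquote(const char *in, unsigned char *out, size_t outlen)
{
	size_t o = 0;

	assert(in != NULL && out != NULL);
	if (in[0] == '\\' && in[1] == 'x') {
		/* Mechanism 1: the whole field is pairs of hex digits. */
		for (in += 2; *in && *in != ' ' && *in != '\n'; in += 2) {
			int hi = hexval(in[0]);
			int lo = hi < 0 ? -1 : hexval(in[1]);

			if (lo < 0 || o >= outlen)
				return -1;
			out[o++] = (unsigned char)(hi << 4 | lo);
		}
		return (int)o;
	}
	/* Mechanism 2: \ooo escapes mixed with literal characters. */
	for (; *in && *in != ' ' && *in != '\n'; in++) {
		unsigned char c = (unsigned char)*in;

		if (c == '\\') {
			if (in[1] < '0' || in[1] > '3' ||
			    in[2] < '0' || in[2] > '7' ||
			    in[3] < '0' || in[3] > '7')
				return -1;
			c = (unsigned char)(((in[1] - '0') << 6) |
					    ((in[2] - '0') << 3) |
					    (in[3] - '0'));
			in += 3;
		}
		if (o >= outlen)
			return -1;
		out[o++] = c;
	}
	return (int)o;
}
```

For example, the four bytes 'a', space, 'b', '\' are quoted as
a\040b\134, and the field \x6869 decodes to the two bytes "hi".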