1 ======================
2 RxRPC NETWORK PROTOCOL
3 ======================
4
5The RxRPC protocol driver provides a reliable two-phase transport on top of UDP
6that can be used to perform RxRPC remote operations. This is done over sockets
7of AF_RXRPC family, using sendmsg() and recvmsg() with control data to send and
8receive data, aborts and errors.
9
10Contents of this document:
11
12 (*) Overview.
13
14 (*) RxRPC protocol summary.
15
16 (*) AF_RXRPC driver model.
17
18 (*) Control messages.
19
20 (*) Socket options.
21
22 (*) Security.
23
24 (*) Example client usage.
25
26 (*) Example server usage.
27
28
29========
30OVERVIEW
31========
32
33RxRPC is a two-layer protocol. There is a session layer which provides
34reliable virtual connections using UDP over IPv4 (or IPv6) as the transport
35layer, but implements a real network protocol; and there's the presentation
36layer which renders structured data to binary blobs and back again using XDR
37(as does SunRPC):
38
		+-------------+
		| Application |
		+-------------+
		|     XDR     |		Presentation
		+-------------+
		|    RxRPC    |		Session
		+-------------+
		|     UDP     |		Transport
		+-------------+
48
49
50AF_RXRPC provides:
51
52 (1) Part of an RxRPC facility for both kernel and userspace applications by
53 making the session part of it a Linux network protocol (AF_RXRPC).
54
55 (2) A two-phase protocol. The client transmits a blob (the request) and then
56 receives a blob (the reply), and the server receives the request and then
57 transmits the reply.
58
59 (3) Retention of the reusable bits of the transport system set up for one call
60 to speed up subsequent calls.
61
62 (4) A secure protocol, using the Linux kernel's key retention facility to
63 manage security on the client end. The server end must of necessity be
64 more active in security negotiations.
65
66AF_RXRPC does not provide XDR marshalling/presentation facilities. That is
67left to the application. AF_RXRPC only deals in blobs. Even the operation ID
68is just the first four bytes of the request blob, and as such is beyond the
69kernel's interest.
70
71
72Sockets of AF_RXRPC family are:
73
74 (1) created as type SOCK_DGRAM;
75
76 (2) provided with a protocol of the type of underlying transport they're going
77 to use - currently only PF_INET is supported.
78
79
80The Andrew File System (AFS) is an example of an application that uses this and
81that has both kernel (filesystem) and userspace (utility) components.
82
83
84======================
85RXRPC PROTOCOL SUMMARY
86======================
87
88An overview of the RxRPC protocol:
89
90 (*) RxRPC sits on top of another networking protocol (UDP is the only option
91 currently), and uses this to provide network transport. UDP ports, for
92 example, provide transport endpoints.
93
94 (*) RxRPC supports multiple virtual "connections" from any given transport
95 endpoint, thus allowing the endpoints to be shared, even to the same
96 remote endpoint.
97
98 (*) Each connection goes to a particular "service". A connection may not go
99 to multiple services. A service may be considered the RxRPC equivalent of
100 a port number. AF_RXRPC permits multiple services to share an endpoint.
101
102 (*) Client-originating packets are marked, thus a transport endpoint can be
103 shared between client and server connections (connections have a
104 direction).
105
106 (*) Up to a billion connections may be supported concurrently between one
107 local transport endpoint and one service on one remote endpoint. An RxRPC
108 connection is described by seven numbers:
109
		Local address	}
		Local port	} Transport (UDP) address
		Remote address	}
		Remote port	}
		Direction
		Connection ID
		Service ID
117
118 (*) Each RxRPC operation is a "call". A connection may make up to four
119 billion calls, but only up to four calls may be in progress on a
120 connection at any one time.
121
122 (*) Calls are two-phase and asymmetric: the client sends its request data,
123 which the service receives; then the service transmits the reply data
124 which the client receives.
125
 (*) The data blobs are of indefinite size; the end of a phase is marked with a
     flag in the packet.  The number of packets of data making up one blob may
     not exceed 4 billion, however, as this would cause the sequence number to
     wrap.
130
131 (*) The first four bytes of the request data are the service operation ID.
132
133 (*) Security is negotiated on a per-connection basis. The connection is
134 initiated by the first data packet on it arriving. If security is
135 requested, the server then issues a "challenge" and then the client
136 replies with a "response". If the response is successful, the security is
137 set for the lifetime of that connection, and all subsequent calls made
138 upon it use that same security. In the event that the server lets a
139 connection lapse before the client, the security will be renegotiated if
140 the client uses the connection again.
141
142 (*) Calls use ACK packets to handle reliability. Data packets are also
143 explicitly sequenced per call.
144
145 (*) There are two types of positive acknowledgement: hard-ACKs and soft-ACKs.
146 A hard-ACK indicates to the far side that all the data received to a point
147 has been received and processed; a soft-ACK indicates that the data has
148 been received but may yet be discarded and re-requested. The sender may
149 not discard any transmittable packets until they've been hard-ACK'd.
150
151 (*) Reception of a reply data packet implicitly hard-ACK's all the data
152 packets that make up the request.
153
 (*) A call is complete when the request has been sent, the reply has been
     received and the final hard-ACK on the last packet of the reply has
     reached the server.

 (*) A call may be aborted by either end at any time up to its completion.
159
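The acknowledgement rule above can be pictured with a small sketch of a
sender's retransmission state.  This is an illustration only, not the AF_RXRPC
driver's implementation: struct tx_window, its fields, the helper names and
the eight-slot window are all assumptions introduced here.

```c
#include <stdbool.h>
#include <stdint.h>

#define TX_WINDOW 8	/* hypothetical limit on packets in flight */

/* Hypothetical sender-side state for one phase of one call. */
struct tx_window {
	uint32_t hard_acked;		/* every seq <= this may be discarded */
	uint32_t top;			/* highest seq transmitted so far */
	bool	 soft_acked[TX_WINDOW];	/* per-packet soft-ACK marks */
};

/* Number of packets that must still be held for possible retransmission. */
static uint32_t tx_retained(const struct tx_window *w)
{
	return w->top - w->hard_acked;
}

/* A soft-ACK only marks the packet.  The receiver may yet discard it and
 * re-request it, so the sender keeps its copy. */
static void tx_soft_ack(struct tx_window *w, uint32_t seq)
{
	if (seq > w->hard_acked && seq <= w->top)
		w->soft_acked[seq % TX_WINDOW] = true;
}

/* A hard-ACK says everything up to and including seq has been received and
 * processed, so those packets can finally be released. */
static void tx_hard_ack(struct tx_window *w, uint32_t seq)
{
	if (seq > w->top)
		seq = w->top;
	while (w->hard_acked < seq) {
		w->hard_acked++;
		w->soft_acked[w->hard_acked % TX_WINDOW] = false;
	}
}
```

With four packets sent, a soft-ACK for packet 3 leaves all four retained,
whereas a hard-ACK for packet 2 allows packets 1 and 2 to be dropped.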
160
161=====================
162AF_RXRPC DRIVER MODEL
163=====================
164
165About the AF_RXRPC driver:
166
167 (*) The AF_RXRPC protocol transparently uses internal sockets of the transport
168 protocol to represent transport endpoints.
169
170 (*) AF_RXRPC sockets map onto RxRPC connection bundles. Actual RxRPC
171 connections are handled transparently. One client socket may be used to
172 make multiple simultaneous calls to the same service. One server socket
173 may handle calls from many clients.
174
175 (*) Additional parallel client connections will be initiated to support extra
176 concurrent calls, up to a tunable limit.
177
178 (*) Each connection is retained for a certain amount of time [tunable] after
179 the last call currently using it has completed in case a new call is made
180 that could reuse it.
181
 (*) Each internal UDP socket is retained for a certain amount of time
     [tunable] after the last connection using it is discarded, in case a new
     connection is made that could use it.
185
 (*) A client-side connection is only shared between calls if they have the
     same key struct describing their security (and assuming the calls would
     otherwise share the connection).  Non-secured calls would also be able
     to share connections with each other.
190
191 (*) A server-side connection is shared if the client says it is.
192
193 (*) ACK'ing is handled by the protocol driver automatically, including ping
194 replying.
195
196 (*) SO_KEEPALIVE automatically pings the other side to keep the connection
197 alive [TODO].
198
199 (*) If an ICMP error is received, all calls affected by that error will be
200 aborted with an appropriate network error passed through recvmsg().
201
202
203Interaction with the user of the RxRPC socket:
204
205 (*) A socket is made into a server socket by binding an address with a
206 non-zero service ID.
207
208 (*) In the client, sending a request is achieved with one or more sendmsgs,
209 followed by the reply being received with one or more recvmsgs.
210
211 (*) The first sendmsg for a request to be sent from a client contains a tag to
212 be used in all other sendmsgs or recvmsgs associated with that call. The
213 tag is carried in the control data.
214
215 (*) connect() is used to supply a default destination address for a client
216 socket. This may be overridden by supplying an alternate address to the
217 first sendmsg() of a call (struct msghdr::msg_name).
218
 (*) If connect() is called on an unbound client, a random local port will be
     bound before the operation takes place.
221
222 (*) A server socket may also be used to make client calls. To do this, the
223 first sendmsg() of the call must specify the target address. The server's
224 transport endpoint is used to send the packets.
225
226 (*) Once the application has received the last message associated with a call,
227 the tag is guaranteed not to be seen again, and so it can be used to pin
228 client resources. A new call can then be initiated with the same tag
229 without fear of interference.
230
 (*) In the server, a request is received with one or more recvmsgs, then the
     reply is transmitted with one or more sendmsgs, and then the final ACK
     is received with a last recvmsg.
234
235 (*) When sending data for a call, sendmsg is given MSG_MORE if there's more
236 data to come on that call.
237
238 (*) When receiving data for a call, recvmsg flags MSG_MORE if there's more
239 data to come for that call.
240
241 (*) When receiving data or messages for a call, MSG_EOR is flagged by recvmsg
242 to indicate the terminal message for that call.
243
244 (*) A call may be aborted by adding an abort control message to the control
245 data. Issuing an abort terminates the kernel's use of that call's tag.
246 Any messages waiting in the receive queue for that call will be discarded.
247
248 (*) Aborts, busy notifications and challenge packets are delivered by recvmsg,
249 and control data messages will be set to indicate the context. Receiving
250 an abort or a busy message terminates the kernel's use of that call's tag.
251
252 (*) The control data part of the msghdr struct is used for a number of things:
253
254 (*) The tag of the intended or affected call.
255
256 (*) Sending or receiving errors, aborts and busy notifications.
257
258 (*) Notifications of incoming calls.
259
260 (*) Sending debug requests and receiving debug replies [TODO].
261
 (*) When the kernel has received and set up an incoming call, it sends a
     message to the server application to let it know there's a new call
     awaiting its acceptance [recvmsg reports a special control message].  The
     server application then uses sendmsg to assign a tag to the new call.
     Once that is done, the first part of the request data will be delivered
     by recvmsg.
267
268 (*) The server application has to provide the server socket with a keyring of
269 secret keys corresponding to the security types it permits. When a secure
270 connection is being set up, the kernel looks up the appropriate secret key
271 in the keyring and then sends a challenge packet to the client and
272 receives a response packet. The kernel then checks the authorisation of
273 the packet and either aborts the connection or sets up the security.
274
275 (*) The name of the key a client will use to secure its communications is
276 nominated by a socket option.
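The tagging of calls through the control data can be sketched as follows.  The
helper names rx_set_call_id() and rx_get_call_id() are hypothetical, and the
fallback definitions are assumed values used only if <linux/rxrpc.h> or the C
library does not supply them.  The sketch only builds and reads the control
buffer; it does not touch a socket.

```c
#include <string.h>
#include <sys/socket.h>

#if defined(__has_include)
# if __has_include(<linux/rxrpc.h>)
#  include <linux/rxrpc.h>
#  define HAVE_LINUX_RXRPC_H 1
# endif
#endif
#ifndef HAVE_LINUX_RXRPC_H
enum { RXRPC_USER_CALL_ID = 1 };	/* assumed value if header is absent */
#endif
#ifndef SOL_RXRPC
#define SOL_RXRPC 272			/* assumed value if libc lacks it */
#endif

/* Attach the call's tag to msg using caller-supplied control storage.
 * Returns 0 on success, -1 if the storage is too small. */
static int rx_set_call_id(struct msghdr *msg, void *control, size_t space,
			  unsigned long call_id)
{
	struct cmsghdr *cmsg;

	if (space < CMSG_SPACE(sizeof(call_id)))
		return -1;
	memset(control, 0, CMSG_SPACE(sizeof(call_id)));
	msg->msg_control = control;
	msg->msg_controllen = CMSG_SPACE(sizeof(call_id));

	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = SOL_RXRPC;
	cmsg->cmsg_type = RXRPC_USER_CALL_ID;
	cmsg->cmsg_len = CMSG_LEN(sizeof(call_id));
	memcpy(CMSG_DATA(cmsg), &call_id, sizeof(call_id));
	return 0;
}

/* Recover the tag from a message filled in by recvmsg().
 * Returns 0 and stores the tag in *call_id, or -1 if none is attached. */
static int rx_get_call_id(struct msghdr *msg, unsigned long *call_id)
{
	struct cmsghdr *cmsg;

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_RXRPC &&
		    cmsg->cmsg_type == RXRPC_USER_CALL_ID) {
			memcpy(call_id, CMSG_DATA(cmsg), sizeof(*call_id));
			return 0;
		}
	}
	return -1;
}
```

Because the tag is an unsigned long, an application can store a pointer to its
own per-call state in it, which is one way of using the tag to pin client
resources.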
277
278
279Notes on recvmsg:
280
 (1) If there's a sequence of data messages belonging to a particular call on
     the receive queue, then recvmsg will keep working through them until:
283
284 (a) it meets the end of that call's received data,
285
286 (b) it meets a non-data message,
287
288 (c) it meets a message belonging to a different call, or
289
290 (d) it fills the user buffer.
291
292 If recvmsg is called in blocking mode, it will keep sleeping, awaiting the
293 reception of further data, until one of the above four conditions is met.
294
295 (2) MSG_PEEK operates similarly, but will return immediately if it has put any
296 data in the buffer rather than sleeping until it can fill the buffer.
297
298 (3) If a data message is only partially consumed in filling a user buffer,
299 then the remainder of that message will be left on the front of the queue
300 for the next taker. MSG_TRUNC will never be flagged.
301
302 (4) If there is more data to be had on a call (it hasn't copied the last byte
303 of the last data message in that phase yet), then MSG_MORE will be
304 flagged.
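The way an application might interpret the flags that recvmsg() leaves in
msghdr::msg_flags can be sketched as below.  The enum and the function name
are hypothetical; only MSG_MORE and MSG_EOR come from the text above.

```c
#include <sys/socket.h>

enum rx_progress {
	RX_MORE_DATA,	/* MSG_MORE: more data to come in this phase */
	RX_PHASE_END,	/* neither flag: last byte of this phase was copied */
	RX_CALL_END,	/* MSG_EOR: terminal message, the tag is now free */
};

/* Classify the msg_flags value returned by recvmsg() for one call. */
static enum rx_progress rx_classify_flags(int msg_flags)
{
	if (msg_flags & MSG_EOR)
		return RX_CALL_END;
	if (msg_flags & MSG_MORE)
		return RX_MORE_DATA;
	return RX_PHASE_END;
}
```

A receive loop would typically keep calling recvmsg() for a call while this
returns RX_MORE_DATA, and release any per-call state on RX_CALL_END.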
305
306
307================
308CONTROL MESSAGES
309================
310
311AF_RXRPC makes use of control messages in sendmsg() and recvmsg() to multiplex
312calls, to invoke certain actions and to report certain conditions. These are:
313
	MESSAGE ID              SRT DATA        MEANING
	======================= === =========== ===============================
	RXRPC_USER_CALL_ID      sr- User ID     App's call specifier
	RXRPC_ABORT             srt Abort code  Abort code to issue/received
	RXRPC_ACK               -rt n/a         Final ACK received
	RXRPC_NET_ERROR         -rt error num   Network error on call
	RXRPC_BUSY              -rt n/a         Call rejected (server busy)
	RXRPC_LOCAL_ERROR       -rt error num   Local error encountered
	RXRPC_NEW_CALL          -r- n/a         New call received
	RXRPC_ACCEPT            s-- n/a         Accept new call

	(SRT = usable in Sendmsg / delivered by Recvmsg / Terminal message)
326
327 (*) RXRPC_USER_CALL_ID
328
     This is used to indicate the application's call ID.  It's an unsigned long
     that the app specifies in the client by attaching it to the first data
     message or in the server by passing it in association with an RXRPC_ACCEPT
     message.  recvmsg() passes it in conjunction with all messages except
     RXRPC_NEW_CALL messages.
334
335 (*) RXRPC_ABORT
336
     This can be used by an application to abort a call by passing it to
     sendmsg, or it can be delivered by recvmsg to indicate a remote abort was
     received.  Either way, it must be associated with an RXRPC_USER_CALL_ID to
     specify the call affected.  If an abort is being sent, then error EBADSLT
     will be returned if there is no call with that user ID.
342
343 (*) RXRPC_ACK
344
345 This is delivered to a server application to indicate that the final ACK
346 of a call was received from the client. It will be associated with an
347 RXRPC_USER_CALL_ID to indicate the call that's now complete.
348
349 (*) RXRPC_NET_ERROR
350
351 This is delivered to an application to indicate that an ICMP error message
352 was encountered in the process of trying to talk to the peer. An
353 errno-class integer value will be included in the control message data
354 indicating the problem, and an RXRPC_USER_CALL_ID will indicate the call
355 affected.
356
357 (*) RXRPC_BUSY
358
359 This is delivered to a client application to indicate that a call was
360 rejected by the server due to the server being busy. It will be
361 associated with an RXRPC_USER_CALL_ID to indicate the rejected call.
362
363 (*) RXRPC_LOCAL_ERROR
364
365 This is delivered to an application to indicate that a local error was
366 encountered and that a call has been aborted because of it. An
367 errno-class integer value will be included in the control message data
368 indicating the problem, and an RXRPC_USER_CALL_ID will indicate the call
369 affected.
370
371 (*) RXRPC_NEW_CALL
372
373 This is delivered to indicate to a server application that a new call has
374 arrived and is awaiting acceptance. No user ID is associated with this,
375 as a user ID must subsequently be assigned by doing an RXRPC_ACCEPT.
376
377 (*) RXRPC_ACCEPT
378
379 This is used by a server application to attempt to accept a call and
380 assign it a user ID. It should be associated with an RXRPC_USER_CALL_ID
381 to indicate the user ID to be assigned. If there is no call to be
382 accepted (it may have timed out, been aborted, etc.), then sendmsg will
383 return error ENODATA. If the user ID is already in use by another call,
384 then error EBADSLT will be returned.
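As a rough sketch of how a receiver might walk the control messages listed
above, the following collects them into one summary per recvmsg().  struct
rx_event and rx_parse_cmsgs() are hypothetical names, the abort code and error
number are assumed to be carried as 4-byte values, and the fallback constants
are assumptions used only when <linux/rxrpc.h> is not available.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#if defined(__has_include)
# if __has_include(<linux/rxrpc.h>)
#  include <linux/rxrpc.h>
#  define HAVE_LINUX_RXRPC_H 1
# endif
#endif
#ifndef HAVE_LINUX_RXRPC_H
enum {					/* assumed values if header is absent */
	RXRPC_USER_CALL_ID = 1, RXRPC_ABORT = 2, RXRPC_ACK = 3,
	RXRPC_NET_ERROR = 5, RXRPC_BUSY = 6, RXRPC_LOCAL_ERROR = 7,
	RXRPC_NEW_CALL = 8,
};
#endif
#ifndef SOL_RXRPC
#define SOL_RXRPC 272			/* assumed value if libc lacks it */
#endif

struct rx_event {
	unsigned long call_id;	/* valid only if has_call_id is set */
	int	has_call_id;
	int	type;		/* RXRPC_* condition, or 0 for plain data */
	int32_t	code;		/* abort code or errno-class value */
	int	terminal;	/* set for the "t" rows of the table */
};

static void rx_parse_cmsgs(struct msghdr *msg, struct rx_event *ev)
{
	struct cmsghdr *cmsg;

	memset(ev, 0, sizeof(*ev));
	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level != SOL_RXRPC)
			continue;
		switch (cmsg->cmsg_type) {
		case RXRPC_USER_CALL_ID:
			if (cmsg->cmsg_len >= CMSG_LEN(sizeof(ev->call_id))) {
				memcpy(&ev->call_id, CMSG_DATA(cmsg),
				       sizeof(ev->call_id));
				ev->has_call_id = 1;
			}
			break;
		case RXRPC_ABORT:
		case RXRPC_NET_ERROR:
		case RXRPC_LOCAL_ERROR:
			if (cmsg->cmsg_len >= CMSG_LEN(sizeof(ev->code)))
				memcpy(&ev->code, CMSG_DATA(cmsg),
				       sizeof(ev->code));
			/* fall through */
		case RXRPC_ACK:
		case RXRPC_BUSY:
			ev->type = cmsg->cmsg_type;
			ev->terminal = 1;	/* tag will not be seen again */
			break;
		case RXRPC_NEW_CALL:
			ev->type = cmsg->cmsg_type;	/* no user ID yet */
			break;
		}
	}
}
```

A message carrying only RXRPC_USER_CALL_ID leaves type at 0, which this sketch
treats as ordinary call data.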
385
386
387==============
388SOCKET OPTIONS
389==============
390
391AF_RXRPC sockets support a few socket options at the SOL_RXRPC level:
392
393 (*) RXRPC_SECURITY_KEY
394
395 This is used to specify the description of the key to be used. The key is
396 extracted from the calling process's keyrings with request_key() and
397 should be of "rxrpc" type.
398
399 The optval pointer points to the description string, and optlen indicates
400 how long the string is, without the NUL terminator.
401
402 (*) RXRPC_SECURITY_KEYRING
403
404 Similar to above but specifies a keyring of server secret keys to use (key
405 type "keyring"). See the "Security" section.
406
407 (*) RXRPC_EXCLUSIVE_CONNECTION
408
409 This is used to request that new connections should be used for each call
410 made subsequently on this socket. optval should be NULL and optlen 0.
411
412 (*) RXRPC_MIN_SECURITY_LEVEL
413
414 This is used to specify the minimum security level required for calls on
415 this socket. optval must point to an int containing one of the following
416 values:
417
418 (a) RXRPC_SECURITY_PLAIN
419
420 Encrypted checksum only.
421
422 (b) RXRPC_SECURITY_AUTH
423
424 Encrypted checksum plus packet padded and first eight bytes of packet
425 encrypted - which includes the actual packet length.
426
     (c) RXRPC_SECURITY_ENCRYPT
428
429 Encrypted checksum plus entire packet padded and encrypted, including
430 actual packet length.
431
432
433========
434SECURITY
435========
436
Currently, only the Kerberos 4 equivalent protocol has been implemented
(security index 2 - rxkad).  This requires the rxkad module to be loaded and,
on the client, tickets of the appropriate type to be obtained from the AFS
kaserver or the Kerberos server and installed as "rxrpc" type keys.  This is
normally done using the klog program.  An example simple klog program can be
found at:
443
444 http://people.redhat.com/~dhowells/rxrpc/klog.c
445
446The payload provided to add_key() on the client should be of the following
447form:
448
449 struct rxrpc_key_sec2_v1 {
450 uint16_t security_index; /* 2 */
451 uint16_t ticket_length; /* length of ticket[] */
452 uint32_t expiry; /* time at which expires */
453 uint8_t kvno; /* key version number */
454 uint8_t __pad[3];
455 uint8_t session_key[8]; /* DES session key */
456 uint8_t ticket[0]; /* the encrypted ticket */
457 };
458
459Where the ticket blob is just appended to the above structure.
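Assembling such a payload might look like the sketch below.  The helper name
rxkad_build_payload() is hypothetical, the structure is repeated with ticket[]
as a C99 flexible array member, and the byte order of the fields is not stated
in the text above, so the values are simply stored as given.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Layout from the text above, with ticket[] as a flexible array member. */
struct rxrpc_key_sec2_v1 {
	uint16_t security_index;	/* 2 */
	uint16_t ticket_length;		/* length of ticket[] */
	uint32_t expiry;		/* time at which expires */
	uint8_t  kvno;			/* key version number */
	uint8_t  __pad[3];
	uint8_t  session_key[8];	/* DES session key */
	uint8_t  ticket[];		/* the encrypted ticket */
};

/* Build the payload: the fixed header with the ticket blob appended.
 * Returns a malloc()ed buffer (the caller frees it) and stores its size in
 * *len, or returns NULL on allocation failure or an oversized ticket. */
static void *rxkad_build_payload(uint32_t expiry, uint8_t kvno,
				 const uint8_t session_key[8],
				 const void *ticket, size_t ticket_len,
				 size_t *len)
{
	struct rxrpc_key_sec2_v1 *p;

	if (ticket_len > UINT16_MAX)
		return NULL;
	p = calloc(1, sizeof(*p) + ticket_len);
	if (!p)
		return NULL;

	p->security_index = 2;		/* rxkad */
	p->ticket_length = (uint16_t)ticket_len;
	p->expiry = expiry;
	p->kvno = kvno;
	memcpy(p->session_key, session_key, 8);
	if (ticket_len)
		memcpy(p->ticket, ticket, ticket_len);

	*len = sizeof(*p) + ticket_len;
	return p;
}
```

The returned buffer and length would then be passed to add_key() as the
payload of an "rxrpc" type key.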
460
461
Keys of type "rxrpc_s" must be made available to the server.  They have a
description of "<serviceID>:<securityIndex>" (e.g. "52:2" for an rxkad key for
the AFS VL service).  When such a key is created, it should be given the
server's secret key as the instantiation data (see the example below).
467
468 add_key("rxrpc_s", "52:2", secret_key, 8, keyring);
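The description string can be generated rather than hard-coded.  A minimal
sketch, in which the helper name rxrpc_server_key_desc() is hypothetical:

```c
#include <stdio.h>

/* Format "<serviceID>:<securityIndex>" into buf.
 * Returns 0 on success, -1 if buf is too small. */
static int rxrpc_server_key_desc(char *buf, size_t size,
				 unsigned int service_id,
				 unsigned int security_index)
{
	int n = snprintf(buf, size, "%u:%u", service_id, security_index);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Called with service ID 52 and security index 2, this yields the "52:2"
description used in the example above.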
469
470A keyring is passed to the server socket by naming it in a sockopt. The server
471socket then looks the server secret keys up in this keyring when secure
472incoming connections are made. This can be seen in an example program that can
473be found at:
474
475 http://people.redhat.com/~dhowells/rxrpc/listen.c
476
477
478====================
479EXAMPLE CLIENT USAGE
480====================
481
482A client would issue an operation by:
483
484 (1) An RxRPC socket is set up by:
485
486 client = socket(AF_RXRPC, SOCK_DGRAM, PF_INET);
487
488 Where the third parameter indicates the protocol family of the transport
489 socket used - usually IPv4 but it can also be IPv6 [TODO].
490
491 (2) A local address can optionally be bound:
492
	struct sockaddr_rxrpc srx = {
		.srx_family	= AF_RXRPC,
		.srx_service	= 0,  /* we're a client */
		.transport_type	= SOCK_DGRAM,	/* type of transport socket */
		.transport_len	= sizeof(srx.transport.sin),
		.transport.sin.sin_family	= AF_INET,
		.transport.sin.sin_port		= htons(7000), /* AFS callback */
		.transport.sin.sin_addr.s_addr	= htonl(INADDR_ANY), /* all local interfaces */
	};
	bind(client, (struct sockaddr *)&srx, sizeof(srx));
502
     This specifies the local UDP port to be used.  If not given, a random
     non-privileged port will be used.  A UDP port may be shared between
     several unrelated RxRPC sockets.  Security is handled per RxRPC virtual
     connection.
507
508 (3) The security is set:
509
510 const char *key = "AFS:cambridge.redhat.com";
511 setsockopt(client, SOL_RXRPC, RXRPC_SECURITY_KEY, key, strlen(key));
512
513 This issues a request_key() to get the key representing the security
514 context. The minimum security level can be set:
515
	unsigned int sec = RXRPC_SECURITY_ENCRYPT;
	setsockopt(client, SOL_RXRPC, RXRPC_MIN_SECURITY_LEVEL,
		   &sec, sizeof(sec));
519
520 (4) The server to be contacted can then be specified (alternatively this can
521 be done through sendmsg):
522
	struct sockaddr_rxrpc srx = {
		.srx_family	= AF_RXRPC,
		.srx_service	= VL_SERVICE_ID,
		.transport_type	= SOCK_DGRAM,	/* type of transport socket */
		.transport_len	= sizeof(srx.transport.sin),
		.transport.sin.sin_family	= AF_INET,
		.transport.sin.sin_port		= htons(7005), /* AFS volume manager */
		.transport.sin.sin_addr.s_addr	= ...,
	};
	connect(client, (struct sockaddr *)&srx, sizeof(srx));
532
 (5) The request data should then be posted to the client socket using a series
     of sendmsg() calls, each with the following control message attached:
535
536 RXRPC_USER_CALL_ID - specifies the user ID for this call
537
538 MSG_MORE should be set in msghdr::msg_flags on all but the last part of
539 the request. Multiple requests may be made simultaneously.
540
     If a call is intended to go to a destination other than the default
     specified through connect(), then msghdr::msg_name should be set on the
     first request message of that call.
544
 (6) The reply data will then be posted to the client socket for recvmsg() to
     pick up.  MSG_MORE will be flagged by recvmsg() if there's more reply data
     for a particular call to be read.  MSG_EOR will be set on the terminal
     read for a call.
549
550 All data will be delivered with the following control message attached:
551
552 RXRPC_USER_CALL_ID - specifies the user ID for this call
553
554 If an abort or error occurred, this will be returned in the control data
555 buffer instead, and MSG_EOR will be flagged to indicate the end of that
556 call.
557
558
559====================
560EXAMPLE SERVER USAGE
561====================
562
563A server would be set up to accept operations in the following manner:
564
565 (1) An RxRPC socket is created by:
566
567 server = socket(AF_RXRPC, SOCK_DGRAM, PF_INET);
568
569 Where the third parameter indicates the address type of the transport
570 socket used - usually IPv4.
571
572 (2) Security is set up if desired by giving the socket a keyring with server
573 secret keys in it:
574
	key_serial_t keyring = add_key("keyring", "AFSkeys", NULL, 0,
				       KEY_SPEC_PROCESS_KEYRING);

	const unsigned char secret_key[8] = {
		0xa7, 0x83, 0x8a, 0xcb, 0xc7, 0x83, 0xec, 0x94 };
	add_key("rxrpc_s", "52:2", secret_key, 8, keyring);

	setsockopt(server, SOL_RXRPC, RXRPC_SECURITY_KEYRING, "AFSkeys", 7);
583
584 The keyring can be manipulated after it has been given to the socket. This
585 permits the server to add more keys, replace keys, etc. whilst it is live.
586
 (3) A local address must then be bound:

	struct sockaddr_rxrpc srx = {
		.srx_family	= AF_RXRPC,
		.srx_service	= VL_SERVICE_ID, /* RxRPC service ID */
		.transport_type	= SOCK_DGRAM,	/* type of transport socket */
		.transport_len	= sizeof(srx.transport.sin),
		.transport.sin.sin_family	= AF_INET,
		.transport.sin.sin_port		= htons(7000), /* AFS callback */
		.transport.sin.sin_addr.s_addr	= htonl(INADDR_ANY), /* all local interfaces */
	};
	bind(server, (struct sockaddr *)&srx, sizeof(srx));

 (4) The server is then set to listen out for incoming calls:

	listen(server, 100);

 (5) The kernel notifies the server of pending incoming connections by sending
     it a message for each.  This is received with recvmsg() on the server
     socket.  It has no data, and has a single dataless control message
     attached:

	RXRPC_NEW_CALL

     The address that can be passed back by recvmsg() at this point should be
     ignored since the call for which the message was posted may have gone by
     the time it is accepted - in which case the first call still on the queue
     will be accepted.

 (6) The server then accepts the new call by issuing a sendmsg() with two
     pieces of control data and no actual data:

	RXRPC_ACCEPT		- indicate connection acceptance
	RXRPC_USER_CALL_ID	- specify user ID for this call

 (7) The first request data packet will then be posted to the server socket for
     recvmsg() to pick up.  At that point, the RxRPC address for the call can
     be read from the address fields in the msghdr struct.
624
625 Subsequent request data will be posted to the server socket for recvmsg()
626 to collect as it arrives. All but the last piece of the request data will
627 be delivered with MSG_MORE flagged.
628
629 All data will be delivered with the following control message attached:
630
631 RXRPC_USER_CALL_ID - specifies the user ID for this call
632
633 (8) The reply data should then be posted to the server socket using a series
634 of sendmsg() calls, each with the following control messages attached:
635
636 RXRPC_USER_CALL_ID - specifies the user ID for this call
637
638 MSG_MORE should be set in msghdr::msg_flags on all but the last message
639 for a particular call.
640
641 (9) The final ACK from the client will be posted for retrieval by recvmsg()
642 when it is received. It will take the form of a dataless message with two
643 control messages attached:
644
645 RXRPC_USER_CALL_ID - specifies the user ID for this call
646 RXRPC_ACK - indicates final ACK (no data)
647
648 MSG_EOR will be flagged to indicate that this is the final message for
649 this call.
650
651(10) Up to the point the final packet of reply data is sent, the call can be
652 aborted by calling sendmsg() with a dataless message with the following
653 control messages attached:
654
655 RXRPC_USER_CALL_ID - specifies the user ID for this call
656 RXRPC_ABORT - indicates abort code (4 byte data)
657
658 Any packets waiting in the socket's receive queue will be discarded if
659 this is issued.
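The dataless abort message described above could be assembled as in this
sketch.  rx_build_abort() and RX_ABORT_CTRL_SPACE are hypothetical names, and
the fallback constants are assumptions used only when the system headers do
not provide them.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#if defined(__has_include)
# if __has_include(<linux/rxrpc.h>)
#  include <linux/rxrpc.h>
#  define HAVE_LINUX_RXRPC_H 1
# endif
#endif
#ifndef HAVE_LINUX_RXRPC_H
enum { RXRPC_USER_CALL_ID = 1, RXRPC_ABORT = 2 };	/* assumed values */
#endif
#ifndef SOL_RXRPC
#define SOL_RXRPC 272			/* assumed value if libc lacks it */
#endif

/* Control space needed for the two control messages. */
#define RX_ABORT_CTRL_SPACE \
	(CMSG_SPACE(sizeof(unsigned long)) + CMSG_SPACE(sizeof(uint32_t)))

/* Fill msg with a dataless abort for the call tagged call_id, using
 * caller-supplied control storage.  Returns 0 on success, -1 on error. */
static int rx_build_abort(struct msghdr *msg, void *control, size_t space,
			  unsigned long call_id, uint32_t abort_code)
{
	struct cmsghdr *cmsg;

	if (space < RX_ABORT_CTRL_SPACE)
		return -1;
	memset(msg, 0, sizeof(*msg));	/* no iovec: the message has no data */
	memset(control, 0, RX_ABORT_CTRL_SPACE);
	msg->msg_control = control;
	msg->msg_controllen = RX_ABORT_CTRL_SPACE;

	cmsg = CMSG_FIRSTHDR(msg);	/* which call to abort */
	cmsg->cmsg_level = SOL_RXRPC;
	cmsg->cmsg_type = RXRPC_USER_CALL_ID;
	cmsg->cmsg_len = CMSG_LEN(sizeof(call_id));
	memcpy(CMSG_DATA(cmsg), &call_id, sizeof(call_id));

	cmsg = CMSG_NXTHDR(msg, cmsg);	/* the 4-byte abort code */
	if (!cmsg)
		return -1;
	cmsg->cmsg_level = SOL_RXRPC;
	cmsg->cmsg_type = RXRPC_ABORT;
	cmsg->cmsg_len = CMSG_LEN(sizeof(abort_code));
	memcpy(CMSG_DATA(cmsg), &abort_code, sizeof(abort_code));
	return 0;
}
```

The resulting msghdr would then be handed to sendmsg() on the server socket
with no flags set.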
660
661Note that all the communications for a particular service take place through
662the one server socket, using control messages on sendmsg() and recvmsg() to
663determine the call affected.