==========================================
Xillybus driver for generic FPGA interface
==========================================

Author: Eli Billauer, Xillybus Ltd. (http://xillybus.com)
Email: eli.billauer@gmail.com or as advertised on Xillybus' site.

Contents:

 - Introduction
   -- Background
   -- Xillybus Overview

 - Usage
   -- User interface
   -- Synchronization
   -- Seekable pipes

 - Internals
   -- Source code organization
   -- Pipe attributes
   -- Host never reads from the FPGA
   -- Channels, pipes, and the message channel
   -- Data streaming
   -- Data granularity
   -- Probing
   -- Buffer allocation
   -- Memory management
   -- The "nonempty" message (supporting poll)


INTRODUCTION
============

Background
----------

An FPGA (Field Programmable Gate Array) is a piece of logic hardware, which
can be programmed to become virtually anything that is usually found as a
dedicated chipset: For instance, a display adapter, network interface card,
or even a processor with its peripherals. FPGAs are the LEGO of hardware:
Based upon certain building blocks, you make your own toys the way you like
them. It's usually pointless to reimplement something that is already
available on the market as a chipset, so FPGAs are mostly used when some
special functionality is needed, and the production volume is relatively low
(hence not justifying the development of an ASIC).

The challenge with FPGAs is that everything is implemented at a very low
level, even lower than assembly language. In order to allow FPGA designers to
focus on their specific project, and not reinvent the wheel over and over
again, pre-designed building blocks, IP cores, are often used. These are the
FPGA parallels of library functions. IP cores may implement certain
mathematical functions, a functional unit (e.g. a USB interface), an entire
processor (e.g. ARM) or anything that might come in handy. Think of them as
building blocks, with electrical wires dangling on the sides for connection
to other blocks.

One of the daunting tasks in FPGA design is communicating with a full-blown
operating system (actually, with the processor running it): Implementing the
low-level bus protocol and the somewhat higher-level interface with the host
(registers, interrupts, DMA etc.) is a project in itself. When the FPGA's
function is a well-known one (e.g. a video adapter card, or a NIC), it can
make sense to design the FPGA's interface logic specifically for the project.
A special driver is then written to present the FPGA as a well-known interface
to the kernel and/or user space. In that case, there is no reason to treat the
FPGA differently from any other device on the bus.

It is, however, common that the desired data communication doesn't fit any
well-known peripheral function. Also, the effort of designing an elegant
abstraction for the data exchange is often considered too big. In those cases,
a quicker and possibly less elegant solution is sought: The driver is
effectively written as a user space program, leaving the kernel space part
with just elementary data transport. This still requires designing some
interface logic for the FPGA, and writing a simple ad-hoc driver for the
kernel.

Xillybus Overview
-----------------

Xillybus is an IP core and a Linux driver. Together, they form a kit for
elementary data transport between an FPGA and the host, providing pipe-like
data streams with a straightforward user interface. It's intended as a
low-effort solution for mixed FPGA-host projects, for which it makes sense to
have the project-specific part of the driver running in a user-space program.

Since the communication requirements may vary significantly from one FPGA
project to another (the number of data pipes needed in each direction and
their attributes), there isn't one specific chunk of logic being the Xillybus
IP core. Rather, the IP core is configured and built based upon a
specification given by its end user.

Xillybus presents independent data streams, which resemble pipes or TCP/IP
communication to the user. At the host side, a character device file is used
just like any pipe file. On the FPGA side, hardware FIFOs are used to stream
the data. This is contrary to a common method of communicating through
fixed-size buffers (even though such buffers are used by Xillybus under the
hood). There may be more than a hundred of these streams on a single IP core,
or as few as one, depending on the configuration.

In order to ease the deployment of the Xillybus IP core, it contains a simple
data structure which completely defines the core's configuration. The Linux
driver fetches this data structure during its initialization process, and sets
up the DMA buffers and character devices accordingly. As a result, a single
driver is used to work out of the box with any Xillybus IP core.

The data structure just mentioned should not be confused with PCI's
configuration space or the Flattened Device Tree.

USAGE
=====

User interface
--------------

On the host, all interface with Xillybus is done through /dev/xillybus_*
device files, which are generated automatically as the driver loads. The
names of these files depend on the IP core that is loaded in the FPGA (see
Probing below). To communicate with the FPGA, open the device file that
corresponds to the hardware FIFO you want to send data to or receive data
from, and use plain write() or read() calls, just like with a regular pipe.
In particular, it makes perfect sense to go:

$ cat mydata > /dev/xillybus_thisfifo

$ cat /dev/xillybus_thatfifo > hisdata

possibly pressing CTRL-C at some stage, even though the xillybus_* pipes have
the capability to send an EOF (but may not use it).

The driver and hardware are designed to behave sensibly as pipes, including:

* Supporting non-blocking I/O (by setting O_NONBLOCK on open() ).

* Supporting poll() and select().

* Being bandwidth efficient under load (using DMA) but also handling small
  pieces of data sent across (like TCP/IP) by autoflushing.

A device file can be read only, write only or bidirectional. Bidirectional
device files are treated like two independent pipes (except for sharing a
"channel" structure in the implementation code).
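
Since the device files behave as pipes, ordinary POSIX calls are all that a
user space program needs. The following is a minimal sketch, not taken from
the driver sources, of a non-blocking open combined with poll(). The helper
names open_nonblocking() and read_with_timeout() are made up for this
illustration, and the device path is whatever name the loaded IP core yields
(for instance /dev/xillybus_thatfifo from the example above).

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: open a device file for non-blocking reads. */
int open_nonblocking(const char *path)
{
    return open(path, O_RDONLY | O_NONBLOCK);
}

/*
 * Hypothetical helper: wait up to timeout_ms for data, then read at most
 * len bytes. Returns the byte count, 0 on EOF, or -1 on error. A timeout
 * is reported as -1 with errno set to ETIMEDOUT.
 */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    if (rc == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    return read(fd, buf, len);
}
```

The same functions work on any pollable file descriptor, which makes it easy
to try the program against a regular pipe before the hardware is available.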

Synchronization
---------------

Xillybus pipes are configured (on the IP core) to be either synchronous or
asynchronous. For a synchronous pipe, write() returns successfully only after
some data has been submitted and acknowledged by the FPGA. This slows down
bulk data transfers, and is nearly impossible to use with streams that
require data at a constant rate: There is no data transmitted to the FPGA
between write() calls, in particular when the process loses the CPU.

When a pipe is configured asynchronous, write() returns as soon as there is
enough room in the buffers to store any of the data.

For FPGA to host pipes, asynchronous pipes allow data transfer from the FPGA
as soon as the respective device file is opened, regardless of whether the
data has been requested by a read() call. On synchronous pipes, only the
amount of data requested by a read() call is transmitted.

In summary, for synchronous pipes, data between the host and FPGA is
transmitted only to satisfy the read() or write() call currently handled
by the driver, and those calls wait for the transmission to complete before
returning.

Note that the synchronization attribute has nothing to do with the possibility
that read() or write() completes fewer bytes than requested. There is a
separate configuration flag ("allowpartial") that determines whether such a
partial completion is allowed.
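
When partial completion is allowed, a program that needs a fixed number of
bytes has to loop, as with any UNIX pipe. The sketch below shows one common
way to do that. The names write_all() and read_exact() are illustrative
helpers, not part of Xillybus, and the code assumes a blocking file
descriptor.

```c
#include <errno.h>
#include <unistd.h>

/* Write exactly len bytes, retrying after partial completions.
 * Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes unless EOF arrives first.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_exact(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;  /* EOF */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```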

Seekable pipes
--------------

A synchronous pipe can be configured to have the stream's position exposed
to the user logic at the FPGA. Such a pipe is also seekable on the host API.
With this feature, a memory or register interface can be attached on the
FPGA side to the seekable stream. Reading or writing to a certain address in
the attached memory is done by seeking to the desired address, and calling
read() or write() as required.
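
The seek-then-access pattern can be sketched as follows. The helper names
open_seekable(), mem_write() and mem_read() are hypothetical, and opening
with O_RDWR assumes that the device file was configured as bidirectional.
Whether the address counts bytes or words depends on the logic attached at
the FPGA, so it is passed through unchanged here.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a seekable device file in both directions. */
int open_seekable(const char *path)
{
    return open(path, O_RDWR);
}

/* Seek to addr, then write len bytes. Returns write()'s result or -1. */
ssize_t mem_write(int fd, off_t addr, const void *buf, size_t len)
{
    if (lseek(fd, addr, SEEK_SET) == (off_t)-1)
        return -1;
    return write(fd, buf, len);
}

/* Seek to addr, then read len bytes. Returns read()'s result or -1. */
ssize_t mem_read(int fd, off_t addr, void *buf, size_t len)
{
    if (lseek(fd, addr, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, buf, len);
}
```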


INTERNALS
=========

Source code organization
------------------------

The Xillybus driver consists of a core module, xillybus_core.c, and modules
that depend on the specific bus interface (xillybus_of.c and xillybus_pcie.c).

The bus specific modules are those probed when a suitable device is found by
the kernel. Since the DMA mapping and synchronization functions, which are bus
dependent by their nature, are used by the core module, a
xilly_endpoint_hardware structure is passed to the core module on
initialization. This structure is populated with pointers to wrapper functions
which execute the DMA-related operations on the bus.

Pipe attributes
---------------

Each pipe has a number of attributes which are set when the FPGA component
(IP core) is built. They are fetched from the IDT (the data structure which
defines the core's configuration, see Probing below) by xilly_setupchannels()
in xillybus_core.c as follows:

* is_writebuf: The pipe's direction. A non-zero value means it's an FPGA to
  host pipe (the FPGA "writes").

* channelnum: The pipe's identification number in communication between the
  host and FPGA.

* format: The underlying data width. See Data granularity below.

* allowpartial: A non-zero value means that a read() or write() (whichever
  applies) may return with less than the requested number of bytes. The common
  choice is a non-zero value, to match standard UNIX behavior.

* synchronous: A non-zero value means that the pipe is synchronous. See
  Synchronization above.

* bufsize: Each DMA buffer's size. Always a power of two.

* bufnum: The number of buffers allocated for this pipe. Always a power of two.

* exclusive_open: A non-zero value forces exclusive opening of the associated
  device file. If the device file is bidirectional, and already opened only in
  one direction, the opposite direction may be opened once.

* seekable: A non-zero value indicates that the pipe is seekable. See
  Seekable pipes above.

* supports_nonempty: A non-zero value (which is typical) indicates that the
  hardware will send the messages that are necessary to support select() and
  poll() for this pipe.

Host never reads from the FPGA
------------------------------

Even though PCI Express is hotpluggable in general, a typical motherboard
doesn't expect a card to go away all of a sudden. But since the PCIe card
is based upon reprogrammable logic, a sudden disappearance from the bus is
quite likely as a result of an accidental reprogramming of the FPGA while the
host is up. In practice, nothing happens immediately in such a situation. But
if the host attempts to read from an address that is mapped to the PCI Express
device, that leads to an immediate freeze of the system on some motherboards,
even though the PCIe standard requires a graceful recovery.

In order to avoid these freezes, the Xillybus driver refrains completely from
reading from the device's register space. All communication from the FPGA to
the host is done through DMA. In particular, the Interrupt Service Routine
doesn't follow the common practice of checking a status register when it's
invoked. Rather, the FPGA prepares a small buffer which contains short
messages, which inform the host what the interrupt was about.

This mechanism is used on non-PCIe buses as well for the sake of uniformity.


Channels, pipes, and the message channel
----------------------------------------

Each of the (possibly bidirectional) pipes presented to the user is allocated
a data channel between the FPGA and the host. The distinction between channels
and pipes is necessary only because of channel 0, which is used for
interrupt-related messages from the FPGA, and has no pipe attached to it.

Data streaming
--------------

Even though a non-segmented data stream is presented to the user at both
sides, the implementation relies on a set of DMA buffers which is allocated
for each channel. For the sake of illustration, let's take the FPGA to host
direction: As data streams into the respective channel's interface in the
FPGA, the Xillybus IP core writes it to one of the DMA buffers. When the
buffer is full, the FPGA informs the host about that (appending a
XILLYMSG_OPCODE_RELEASEBUF message to channel 0 and sending an interrupt if
necessary). The host responds by making the data available for reading through
the character device. When all data has been read, the host writes to the
FPGA's buffer control register, allowing the buffer to be overwritten. Flow
control mechanisms exist on both sides to prevent underflows and overflows.

This is not good enough for creating a TCP/IP-like stream: If the data flow
stops momentarily before a DMA buffer is filled, the intuitive expectation is
that the partial data in the buffer will arrive anyhow, despite the buffer not
being completed. This is implemented by adding a field in the
XILLYMSG_OPCODE_RELEASEBUF message, through which the FPGA informs not just
which buffer is submitted, but how much data it contains.

But the FPGA will submit a partially filled buffer only if directed to do so
by the host. This situation occurs when the read() method has been blocking
for XILLY_RX_TIMEOUT jiffies (currently 10 ms), after which the host commands
the FPGA to submit a DMA buffer as soon as it can. This timeout mechanism
balances between bus bandwidth efficiency (preventing a lot of partially
filled buffers being sent) and a latency held fairly low for tails of data.

A similar setting is used in the host to FPGA direction. The handling of
partial DMA buffers is somewhat different, though. The user can tell the
driver to submit all data it has in the buffers to the FPGA, by issuing a
write() with the byte count set to zero. This is similar to a flush request,
but it doesn't block. There is also an autoflushing mechanism, which triggers
an equivalent flush roughly XILLY_RX_TIMEOUT jiffies after the last write().
This allows the user to be oblivious about the underlying buffering mechanism
and yet enjoy a stream-like interface.
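
From user space, the explicit flush request amounts to a write() with a zero
byte count. A minimal sketch follows. The function name request_flush() is
invented for this example, and passing a valid dummy pointer rather than
NULL is merely a cautious choice, not something the driver is known to
require.

```c
#include <unistd.h>

/*
 * Hypothetical helper: ask the driver to submit whatever is pending in the
 * DMA buffers by issuing a zero-length write(). The call does not wait for
 * the data to reach the FPGA. Returns 0 on success, -1 on error.
 */
int request_flush(int fd)
{
    static const char dummy = 0;

    return write(fd, &dummy, 0) < 0 ? -1 : 0;
}
```

A program that sends a short command and expects a prompt response from the
FPGA could call this right after its last write(), instead of waiting for
the autoflush timeout.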

Note that the issue of partial buffer flushing is irrelevant for pipes having
the "synchronous" attribute nonzero, since synchronous pipes don't allow data
to lie around in the DMA buffers between read() and write() anyhow.

Data granularity
----------------

The data arrives or is sent at the FPGA as 8, 16 or 32 bit wide words, as
configured by the "format" attribute. Whenever possible, the driver attempts
to hide this when the pipe is accessed differently from its natural alignment.
For example, reading single bytes from a pipe with 32 bit granularity works
with no issues. Writing single bytes to pipes with 16 or 32 bit granularity
will also work, but the driver can't send partially completed words to the
FPGA, so the transmission of up to one word may be held until it's fully
occupied with user data.

This somewhat complicates the handling of host to FPGA streams, because
when a buffer is flushed, it may contain up to 3 bytes that don't form a word
in the FPGA, and hence can't be sent. To prevent loss of data, these leftover
bytes need to be moved to the next buffer. The parts in xillybus_core.c
that mention "leftovers" in some way are related to this complication.
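
The arithmetic behind the leftovers can be illustrated in a few lines. This
is a user space sketch of the idea only, not the code in xillybus_core.c.
All three function names are hypothetical, and bytes_per_word is assumed to
be 1, 2 or 4, matching the 8, 16 and 32 bit formats.

```c
#include <stddef.h>
#include <string.h>

/* Number of pending bytes that form whole words and can be sent. */
size_t sendable_bytes(size_t pending, size_t bytes_per_word)
{
    return pending - (pending % bytes_per_word);
}

/* Number of trailing bytes that don't complete a word (0 to 3). */
size_t leftover_bytes(size_t pending, size_t bytes_per_word)
{
    return pending % bytes_per_word;
}

/*
 * Copy the leftover tail of buf to the beginning of the next buffer, so
 * that it's completed by data written later. Returns the number of bytes
 * carried over.
 */
size_t carry_leftovers(const unsigned char *buf, size_t pending,
                       size_t bytes_per_word, unsigned char *next)
{
    size_t left = leftover_bytes(pending, bytes_per_word);

    memcpy(next, buf + pending - left, left);
    return left;
}
```

For example, with 11 bytes pending on a 32 bit pipe, 8 bytes are sent and 3
are carried over to the next buffer.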

Probing
-------

As mentioned earlier, the number of pipes that are created when the driver
loads and their attributes depend on the Xillybus IP core in the FPGA. During
the driver's initialization, a blob containing configuration info, the
Interface Description Table (IDT), is sent from the FPGA to the host. The
bootstrap process is done in three phases:

1. Acquire the length of the IDT, so a buffer can be allocated for it. This
   is done by sending a quiesce command to the device, since the acknowledge
   for this command contains the IDT's buffer length.

2. Acquire the IDT itself.

3. Create the interfaces according to the IDT.

Buffer allocation
-----------------

In order to simplify the logic that prevents illegal boundary crossings of
PCIe packets, the following rule applies: If a buffer is smaller than 4kB,
it must not cross a 4kB boundary. Otherwise, it must be 4kB aligned. The
xilly_setupchannels() function allocates these buffers by requesting whole
pages from the kernel, and dividing them into DMA buffers as necessary. Since
all buffers' sizes are powers of two, it's possible to pack any set of such
buffers, with a maximal waste of one page of memory.

All buffers are allocated when the driver is loaded. This is necessary,
since large continuous physical memory segments are sometimes requested,
which are more likely to be available when the system is freshly booted.

The allocation of buffer memory takes place in the same order they appear in
the IDT. The driver relies on a rule that the pipes are sorted with decreasing
buffer size in the IDT. If a requested buffer is larger than or equal to a
page, the necessary number of pages is requested from the kernel, and these
are used for this buffer. If the requested buffer is smaller than a page, one
single page is requested from the kernel, and that page is partially used.
Or, if there already is a partially used page at hand, the buffer is packed
into that page. It can be shown that all pages requested from the kernel
(except possibly for the last) are 100% utilized this way.
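
The packing rule can be modelled with plain offsets. The sketch below is a
simplified simulation, not the driver's allocator: it assumes 4 kB pages,
represents each page request as a page-aligned range of offsets, and expects
buffer sizes that are powers of two and arrive in decreasing order, as the
IDT rule above requires. The names struct packer, pack_buffer() and
crosses_page() are made up for the illustration.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 kB pages */

struct packer {
    size_t next_page;     /* offset of the next fresh page */
    size_t partial_at;    /* next free offset in the partially used page */
    size_t partial_left;  /* bytes still free in that page */
};

/* Place one buffer and return its offset. Start with a zeroed packer. */
size_t pack_buffer(struct packer *pk, size_t size)
{
    size_t at;

    if (size >= SKETCH_PAGE_SIZE) {
        /* A power of two not below the page size is a whole number of
         * pages, so the buffer starts page aligned. */
        at = pk->next_page;
        pk->next_page += size;
        return at;
    }
    if (pk->partial_left < size) {
        /* No partially used page at hand: request a fresh one. */
        pk->partial_at = pk->next_page;
        pk->partial_left = SKETCH_PAGE_SIZE;
        pk->next_page += SKETCH_PAGE_SIZE;
    }
    at = pk->partial_at;
    pk->partial_at += size;
    pk->partial_left -= size;
    return at;
}

/* Non-zero if the range [at, at + size) crosses a page boundary. */
int crosses_page(size_t at, size_t size)
{
    return at / SKETCH_PAGE_SIZE != (at + size - 1) / SKETCH_PAGE_SIZE;
}
```

Because sizes only decrease, the used part of a partial page is always a
multiple of the current buffer size. Hence a fresh page is requested only
when the previous one is completely full, which is why only the last page
can be left partially used.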

Memory management
-----------------

The tricky part about the buffer allocation procedure described above is
freeing and unmapping the buffers, in particular if something goes wrong in
the middle, and the allocations need to be rolled back. The three-stage
probing procedure makes this even more crucial, since temporary buffers are
set up and mapped in the first two of its stages.

To keep the code clean from complicated and bug-prone memory release routines,
there are special routines for allocating memory. For example, instead of
calling kzalloc, there's

void *xilly_malloc(struct xilly_cleanup *mem, size_t size)

which effectively allocates a zeroed buffer of size "size". Its first
argument, "mem", is where this allocation is enlisted, so that it's released
when xillybus_do_cleanup() is called with the same "mem" structure.

Two other functions enlist allocations in this structure: xilly_pagealloc()
for page allocations and xilly_map_single_*() for DMA mapping.
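
The enlist-then-release pattern can be demonstrated in user space. The
sketch below is an analogy only: it uses calloc() and free() in place of the
kernel allocators, and struct cleanup_list, tracked_zalloc() and
cleanup_all() are hypothetical names rather than the driver's own
structures.

```c
#include <stdlib.h>

struct cleanup_entry {
    struct cleanup_entry *next;
    void *ptr;
};

struct cleanup_list {
    struct cleanup_entry *head;
};

/* Allocate a zeroed buffer and enlist it in mem for later release. */
void *tracked_zalloc(struct cleanup_list *mem, size_t size)
{
    struct cleanup_entry *e = malloc(sizeof(*e));
    void *ptr;

    if (!e)
        return NULL;
    ptr = calloc(1, size);
    if (!ptr) {
        free(e);
        return NULL;
    }
    e->ptr = ptr;
    e->next = mem->head;
    mem->head = e;
    return ptr;
}

/* Release everything enlisted in mem. Returns the number of buffers freed. */
int cleanup_all(struct cleanup_list *mem)
{
    int n = 0;

    while (mem->head) {
        struct cleanup_entry *e = mem->head;

        mem->head = e->next;
        free(e->ptr);
        free(e);
        n++;
    }
    return n;
}
```

With this arrangement, an error path at any point of the setup needs a
single cleanup call, regardless of how many allocations succeeded before the
failure.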

The "nonempty" message (supporting poll)
----------------------------------------

In order to support the "poll" method (and hence select() ), there is a small
catch regarding the FPGA to host direction: The FPGA may have filled a DMA
buffer with some data, but not submitted that buffer. If the host waited for
the buffer's submission by the FPGA, there would be a possibility that the
FPGA side has sent data, but a select() call would still block, because the
host has not received any notification about this. This is solved with
XILLYMSG_OPCODE_NONEMPTY messages sent by the FPGA when a channel goes from
completely empty to containing some data.

These messages are used only to support poll() and select(). The IP core can
be configured not to send them, which slightly reduces bandwidth usage.