1 This file contains brief information about the SCSI tape driver.
2 The driver is currently maintained by Kai Mäkisara (email
3 Kai.Makisara@kolumbus.fi)
4
5 Last modified: Sun Feb 24 21:59:07 2008 by kai.makisara
6
7
8 BASICS
9
10 The driver is generic, i.e., it does not contain any code tailored
11 to any specific tape drive. The tape parameters can be specified with
12 one of the following three methods:
13
14 1. Each user can specify the tape parameters he/she wants to use
15 directly with ioctls. This is administratively a very simple and
16 flexible method and applicable to single-user workstations. However,
17 in a multiuser environment the next user finds the tape parameters in
18 the state the previous user left them.
19
20 2. The system manager (root) can define default values for some tape
21 parameters, like block size and density using the MTSETDRVBUFFER ioctl.
22 These parameters can be programmed to come into effect either when a
23 new tape is loaded into the drive or if writing begins at the
24 beginning of the tape. The second method is applicable if the tape
25 drive performs auto-detection of the tape format well (like some
26 QIC-drives). The result is that any tape can be read, writing can be
27 continued using existing format, and the default format is used if
28 the tape is rewritten from the beginning (or a new tape is written
29 for the first time). The first method is applicable if the drive
30 does not perform auto-detection well enough and there is a single
31 "sensible" mode for the device. An example is a DAT drive that is
32 used only in variable block mode (I don't know if this is sensible
33 or not :-).
34
35 The user can override the parameters defined by the system
36 manager. The changes persist until the defaults again come into
37 effect.
38
39 3. By default, up to four modes can be defined and selected using the minor
40 number (bits 5 and 6). The number of modes can be changed by changing
41 ST_NBR_MODE_BITS in st.h. Mode 0 corresponds to the defaults discussed
42 above. Additional modes are dormant until they are defined by the
43 system manager (root). When specification of a new mode is started,
44 the configuration of mode 0 is used to provide a starting point for
45 definition of the new mode.
46
47 Using the modes allows the system manager to give the users choices
48 over some of the buffering parameters not directly accessible to the
49 users (buffered and asynchronous writes). The modes also allow choices
50 between formats in multi-tape operations (the explicitly overridden
51 parameters are reset when a new tape is loaded).
52
53 If more than one mode is used, all modes should contain definitions
54 for the same set of parameters.
55
56 Many Unices contain internal tables that associate different modes to
57 supported devices. The Linux SCSI tape driver does not contain such
58 tables (and will not do so in the future). Instead, a utility
59 program can be made that fetches the inquiry data sent by the device,
60 scans its database, and sets up the modes using the ioctls. Another
61 alternative is to make a small script that uses mt to set the defaults
62 tailored to the system.
63
64 The driver supports fixed and variable block size (within buffer
65 limits). Both the auto-rewind (minor equals device number) and
66 non-rewind devices (minor is 128 + device number) are implemented.
67
68 In variable block mode, the byte count in write() determines the size
69 of the physical block on tape. When reading, the drive reads the next
70 tape block and returns to the user the data if the read() byte count
71 is at least the block size. Otherwise, error ENOMEM is returned.
72
73 In fixed block mode, the data transfer between the drive and the
74 driver is in multiples of the block size. The write() byte count must
75 be a multiple of the block size. This is not required when reading but
76 may be advisable for portability.
77
78 Support is provided for changing the tape partition and partitioning
79 of the tape with one or two partitions. By default support for
80 partitioned tape is disabled for each drive and it can be enabled
81 with the ioctl MTSETDRVBUFFER.
82
83 By default the driver writes one filemark when the device is closed after
84 writing and the last operation has been a write. Two filemarks can be
85 optionally written. In both cases end of data is signified by
86 returning zero bytes for two consecutive reads.
87
88 If rewind, offline, bsf, or seek is done and the previous tape operation was
89 a write, a filemark is written before moving the tape.
90
91 The compile options are defined in the file linux/drivers/scsi/st_options.h.
92
93 4. If the open option O_NONBLOCK is used, open succeeds even if the
94 drive is not ready. If O_NONBLOCK is not used, the driver waits for
95 the drive to become ready. If this does not happen in ST_BLOCK_SECONDS
96 seconds, open fails with the errno value EIO. With O_NONBLOCK the
97 device can be opened for writing even if there is a write protected
98 tape in the drive (commands trying to write something return error if
99 attempted).
100
101
102 MINOR NUMBERS
103
104 The tape driver currently supports 128 drives by default. This number
105 can be increased by editing st.h and recompiling the driver if
106 necessary. The upper limit is 2^17 drives if 4 modes for each drive
107 are used.
108
109 The minor numbers consist of the following bit fields:
110
111     dev_upper    non-rew     mode    dev-lower
112     20 -  8         7        6 5      4      0
113 The non-rewind bit is always bit 7 (the uppermost bit in the lowermost
114 byte). The bits defining the mode are below the non-rewind bit. The
115 remaining bits define the tape device number. This numbering is
116 backward compatible with the numbering used when the minor number was
117 only 8 bits wide.
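
As an illustration of the layout above, the following C sketch splits a
minor number into its fields. It assumes the default of four modes (two
mode bits, at bits 5 and 6). The struct and function names are hypothetical
and are not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper type; the names are illustrative, not from st.h. */
struct st_minor_fields {
    unsigned int device;   /* tape device number */
    unsigned int mode;     /* mode 0..3 (bits 5 and 6) */
    bool non_rewind;       /* bit 7 */
};

/* Assumes the default two mode bits (four modes per drive). */
static struct st_minor_fields st_decode_minor(unsigned int minor)
{
    struct st_minor_fields f;

    assert(minor < (1u << 21));              /* only bits 0..20 are used */
    f.non_rewind = (minor & 0x80u) != 0;
    f.mode = (minor >> 5) & 0x3u;
    /* dev-lower is bits 0-4; dev_upper (bits 8-20) is stacked above it. */
    f.device = (minor & 0x1fu) | ((minor >> 8) << 5);
    return f;
}
```

With this layout, minor 129 decodes to the non-rewind device of tape 1 in
mode 0, and minor 256 is the auto-rewind device of tape 32.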
118
119
120 SYSFS SUPPORT
121
122 The driver creates the directory /sys/class/scsi_tape and populates it with
123 directories corresponding to the existing tape devices. There are autorewind
124 and non-rewind entries for each mode. The names are stxy and nstxy, where x
125 is the tape number and y a character corresponding to the mode (none, l, m,
126 a). For example, the directories for the first tape device are (assuming four
127 modes): st0 nst0 st0l nst0l st0m nst0m st0a nst0a.
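
The naming scheme can be sketched in C as follows. This is only an
illustration that assumes four modes; the function name is hypothetical
and the code is not how the driver itself builds the names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Builds names such as "st0", "nst0l" or "st1a" into buf.
 * Returns the value of snprintf(). Hypothetical helper.
 */
static int st_sysfs_name(char *buf, size_t len, unsigned int tape,
                         unsigned int mode, bool non_rewind)
{
    /* Mode suffixes in order: none, l, m, a (four modes assumed). */
    static const char *const suffix[] = { "", "l", "m", "a" };

    assert(buf != NULL && mode < 4);
    return snprintf(buf, len, "%s%u%s",
                    non_rewind ? "nst" : "st", tape, suffix[mode]);
}
```

The resulting name can be appended to /sys/class/scsi_tape/ to reach the
directory for that device and mode.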
128
129 Each directory contains the entries: default_blksize default_compression
130 default_density defined dev device driver. The file 'defined' contains 1
131 if the mode is defined and zero if not defined. The files 'default_*' contain
132 the defaults set by the user. The value -1 means the default is not set. The
133 file 'dev' contains the device numbers corresponding to this device. The links
134 'device' and 'driver' point to the SCSI device and driver entries.
135
136 Each directory also contains the entry 'options' which shows the currently
137 enabled driver and mode options. The value in the file is a bit mask where the
138 bit definitions are the same as those used with MTSETDRVBUFFER in setting the
139 options.
140
141 A link named 'tape' is made from the SCSI device directory to the class
142 directory corresponding to the mode 0 auto-rewind device (e.g., st0).
143
144
145 BSD AND SYS V SEMANTICS
146
147 The user can choose between these two behaviours of the tape driver by
148 defining the value of the symbol ST_SYSV. The semantics differ when a
149 file being read is closed. The BSD semantics leaves the tape where it
150 currently is whereas the SYS V semantics moves the tape past the next
151 filemark unless the filemark has just been crossed.
152
153 The default is BSD semantics.
154
155
156 BUFFERING
157
158 The driver tries to do transfers directly to/from user space. If this
159 is not possible, a driver buffer allocated at run-time is used. If
160 direct i/o is not possible for the whole transfer, the driver buffer
161 is used (i.e., bounce buffers for individual pages are not
162 used). Direct i/o can be impossible for several reasons, e.g.:
163 - one or more pages are at addresses not reachable by the HBA
164 - the number of pages in the transfer exceeds the number of
165 scatter/gather segments permitted by the HBA
166 - one or more pages can't be locked into memory (should not happen in
167 any reasonable situation)
168
169 The size of the driver buffers is always at least one tape block. In fixed
170 block mode, the minimum buffer size is defined (in 1024 byte units) by
171 ST_FIXED_BUFFER_BLOCKS. With small block size this allows buffering of
172 several blocks and using one SCSI read or write to transfer all of the
173 blocks. Buffering of data across write calls in fixed block mode is
174 allowed if ST_BUFFER_WRITES is non-zero and direct i/o is not used.
175 Buffer allocation uses chunks of memory having sizes 2^n * (page
176 size). Because of this the actual buffer size may be larger than the
177 minimum allowable buffer size.
178
179 NOTE that if direct i/o is used, the small writes are not buffered. This may
180 cause a surprise when moving from 2.4. There, small writes (e.g., tar without
181 -b option) may have had good throughput but this is not true any more with
182 2.6. Direct i/o can be turned off to solve this problem but a better solution
183 is to use bigger write() byte counts (e.g., tar -b 64).
184
185 Asynchronous writing. Writing the buffer contents to the tape is
186 started and the write call returns immediately. The status is checked
187 at the next tape operation. Asynchronous writes are not done with
188 direct i/o and not in fixed block mode.
189
190 Buffered writes and asynchronous writes may in some rare cases cause
191 problems in multivolume operations if there is not enough space on the
192 tape after the early-warning mark to flush the driver buffer.
193
194 Read ahead for fixed block mode (ST_READ_AHEAD). Filling the buffer is
195 attempted even if the user does not want to get all of the data at
196 this read command. Should be disabled for those drives that don't like
197 a filemark to truncate a read request or that don't like backspacing.
198
199 Scatter/gather buffers (buffers that consist of chunks non-contiguous
200 in the physical memory) are used if contiguous buffers can't be
201 allocated. To support all SCSI adapters (including those not
202 supporting scatter/gather), buffer allocation is using the following
203 three kinds of chunks:
204 1. The initial segment that is used for all SCSI adapters including
205 those not supporting scatter/gather. The size of this buffer will be
206 (PAGE_SIZE << ST_FIRST_ORDER) bytes if the system can give a chunk of
207 this size (and it is not larger than the buffer size specified by
208 ST_BUFFER_BLOCKS). If this size is not available, the driver halves
209 the size and tries again until the size of one page. The default
210 settings in st_options.h make the driver try to allocate all of the
211 buffer as one chunk.
212 2. The scatter/gather segments to fill the specified buffer size are
213 allocated so that as many segments as possible are used but the number
214 of segments does not exceed ST_FIRST_SG.
215 3. The remaining segments between ST_MAX_SG (or the module parameter
216 max_sg_segs) and the number of segments used in phases 1 and 2
217 are used to extend the buffer at run-time if this is necessary. The
218 number of scatter/gather segments allowed for the SCSI adapter is not
219 exceeded if it is smaller than the maximum number of scatter/gather
220 segments specified. If the maximum number allowed for the SCSI adapter
221 is smaller than the number of segments used in phases 1 and 2,
222 extending the buffer will always fail.
223
224
225 EOM BEHAVIOUR WHEN WRITING
226
227 When the end of medium early warning is encountered, the current write
228 is finished and the number of bytes is returned. The next write
229 returns -1 and errno is set to ENOSPC. To enable writing a trailer,
230 the next write is allowed to proceed and, if successful, the number of
231 bytes is returned. After this, -1 and the number of bytes are
232 alternately returned until the physical end of medium (or some other
233 error) is encountered.
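
One way an application might use this behaviour is sketched below in C:
write data blocks until ENOSPC is reported, then use the one write that is
allowed to proceed for a trailer. The writer is passed in as a function
pointer so that the logic can be tried without a drive. All names here
(block_writer, write_until_eom, fake_tape, fake_write) are hypothetical,
and fake_write only imitates the return values described above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Same shape as write(2), with a context pointer instead of an fd. */
typedef ssize_t (*block_writer)(void *ctx, const void *buf, size_t len);

/*
 * Writes up to nblocks blocks of blksize bytes. When a write fails with
 * ENOSPC (early warning passed), writes the trailer once and stops.
 * Returns the number of data blocks written, or -1 on any other error
 * or if the trailer could not be written.
 */
static long write_until_eom(block_writer wr, void *ctx,
                            const char *data, size_t blksize, long nblocks,
                            const void *trailer, size_t trailer_len)
{
    long done = 0;

    assert(wr != NULL && blksize > 0);
    while (done < nblocks) {
        ssize_t n = wr(ctx, data + (size_t)done * blksize, blksize);

        if (n == (ssize_t)blksize) {
            done++;
            continue;
        }
        if (n == -1 && errno == ENOSPC) {
            /* The write following the ENOSPC report may proceed. */
            if (wr(ctx, trailer, trailer_len) != (ssize_t)trailer_len)
                return -1;
            return done;
        }
        return -1;
    }
    return done;
}

/* Stand-in for a tape near early warning: accepts `room` writes, then
 * alternates between -1/ENOSPC and success. */
struct fake_tape { int room; int toggle; int writes; };

static ssize_t fake_write(void *ctx, const void *buf, size_t len)
{
    struct fake_tape *t = ctx;

    (void)buf;
    if (t->room > 0) { t->room--; t->writes++; return (ssize_t)len; }
    t->toggle = !t->toggle;
    if (t->toggle) { errno = ENOSPC; return -1; }
    t->writes++;
    return (ssize_t)len;
}
```

With a real drive the writer would be a thin wrapper around write() on the
tape file descriptor. What goes into the trailer is up to the application.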
234
235
236 MODULE PARAMETERS
237
238 The buffer size, write threshold, and the maximum number of allocated buffers
239 are configurable when the driver is loaded as a module. The keywords are:
240
241 buffer_kbs=xxx the buffer size for fixed block mode is set
242 to xxx kilobytes
243 write_threshold_kbs=xxx the write threshold in kilobytes set to xxx
244 max_sg_segs=xxx the maximum number of scatter/gather
245 segments
246 try_direct_io=x try direct transfer between user buffer and
247 tape drive if this is non-zero
248
249 Note that if the buffer size is changed but the write threshold is not
250 set, the write threshold is set to the new buffer size - 2 kB.
251
252
253 BOOT TIME CONFIGURATION
254
255 If the driver is compiled into the kernel, the same parameters can be
256 also set using, e.g., the LILO command line. The preferred syntax is
257 to use the same keyword used when loading as module but prepended
258 with 'st.'. For instance, to set the maximum number of scatter/gather
259 segments, the parameter 'st.max_sg_segs=xx' should be used (xx is the
260 number of scatter/gather segments).
261
262 For compatibility, the old syntax from early 2.5 and 2.4 kernel
263 versions is supported. The same keywords can be used as when loading
264 the driver as module. If several parameters are set, the keyword-value
265 pairs are separated with a comma (no spaces allowed). A colon can be
266 used instead of the equals sign. The definition is prepended by the
267 string st=. Here is an example:
268
269 st=buffer_kbs:64,write_threshold_kbs:60
270
271 The following syntax used by the old kernel versions is also supported:
272
273 st=aa[,bb[,dd]]
274
275 where
276 aa is the buffer size for fixed block mode in 1024 byte units
277 bb is the write threshold in 1024 byte units
278 dd is the maximum number of scatter/gather segments
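
To make the old positional syntax concrete, here is a small C sketch that
splits such a string into its up to three values. It is a hypothetical
user-space helper for illustration only, not the parser used by the
driver, and it does no range checking.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parses "st=aa[,bb[,dd]]". Returns how many of the three values were
 * present (0 if the string does not match). Values that are not present
 * are left untouched.
 */
static int parse_st_legacy(const char *arg, int *buffer_kbs,
                           int *write_threshold_kbs, int *max_sg_segs)
{
    int n;

    assert(arg && buffer_kbs && write_threshold_kbs && max_sg_segs);
    n = sscanf(arg, "st=%d,%d,%d", buffer_kbs, write_threshold_kbs,
               max_sg_segs);
    return n < 0 ? 0 : n;   /* sscanf returns EOF on empty input */
}
```

For example, "st=64,60" yields two values: a 64 kB buffer and a 60 kB
write threshold, with the number of scatter/gather segments left at
whatever the caller had stored before.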
279
280
281 IOCTLS
282
283 The tape is positioned and the drive parameters are set with ioctls
284 defined in mtio.h. The tape control program 'mt' uses these ioctls. Try
285 to find an mt that supports all of the Linux SCSI tape ioctls and
286 opens the device for writing if the tape contents will be modified
287 (look for a package mt-st* from the Linux ftp sites; the GNU mt does
288 not open for writing for, e.g., erase).
289
290 The supported ioctls are:
291
292 The following use the structure mtop:
293
294 MTFSF Space forward over count filemarks. Tape positioned after filemark.
295 MTFSFM As above but tape positioned before filemark.
296 MTBSF Space backward over count filemarks. Tape positioned before
297 filemark.
298 MTBSFM As above but tape positioned after filemark.
299 MTFSR Space forward over count records.
300 MTBSR Space backward over count records.
301 MTFSS Space forward over count setmarks.
302 MTBSS Space backward over count setmarks.
303 MTWEOF Write count filemarks.
304 MTWSM Write count setmarks.
305 MTREW Rewind tape.
306 MTOFFL Set device off line (often rewind plus eject).
307 MTNOP Do nothing except flush the buffers.
308 MTRETEN Re-tension tape.
309 MTEOM Space to end of recorded data.
310 MTERASE Erase tape. If the argument is zero, the short erase command
311 is used. The long erase command is used with all other values
312 of the argument.
313 MTSEEK Seek to tape block count. Uses Tandberg-compatible seek (QFA)
314 for SCSI-1 drives and SCSI-2 seek for SCSI-2 drives. The file and
315 block numbers in the status are not valid after a seek.
316 MTSETBLK Set the drive block size. Setting to zero sets the drive into
317 variable block mode (if applicable).
318 MTSETDENSITY Sets the drive density code to arg. See drive
319 documentation for available codes.
320 MTLOCK and MTUNLOCK Explicitly lock/unlock the tape drive door.
321 MTLOAD and MTUNLOAD Explicitly load and unload the tape. If the
322 command argument x is between MT_ST_HPLOADER_OFFSET + 1 and
323 MT_ST_HPLOADER_OFFSET + 6, the number x is sent to the
324 drive with the command and it selects the tape slot to use in the
325 HP C1553A changer.
326 MTCOMPRESSION Sets compressing or uncompressing drive mode using the
327 SCSI mode page 15. Note that some drives use other methods for
328 control of compression. Some drives (like the Exabytes) use
329 density codes for compression control. Some drives use another
330 mode page but this page has not been implemented in the
331 driver. Some drives without compression capability will accept
332 any compression mode without error.
333 MTSETPART Moves the tape to the partition given by the argument at the
334 next tape operation. The block at which the tape is positioned
335 is the block where the tape was previously positioned in the
336 new active partition unless the next tape operation is
337 MTSEEK. In this case the tape is moved directly to the block
338 specified by MTSEEK. MTSETPART is inactive unless
339 MT_ST_CAN_PARTITIONS is set.
340 MTMKPART Formats the tape with one partition (argument zero) or two
341 partitions (the argument gives in megabytes the size of
342 partition 1 that is physically the first partition of the
343 tape). The drive has to support partitions with size specified
344 by the initiator. Inactive unless MT_ST_CAN_PARTITIONS is set.
345 MTSETDRVBUFFER
346 Is used for several purposes. The command is obtained from count
347 with mask MT_SET_OPTIONS, the low order bits are used as argument.
348 This command is only allowed for the superuser (root). The
349 subcommands are:
350 0
351 The drive buffer option is set to the argument. Zero means
352 no buffering.
353 MT_ST_BOOLEANS
354 Sets the buffering options. The bits are the new states
355 (enabled/disabled) of the following options (the
356 parentheses specify whether the option is global or
357 can be specified differently for each mode):
358 MT_ST_BUFFER_WRITES write buffering (mode)
359 MT_ST_ASYNC_WRITES asynchronous writes (mode)
360 MT_ST_READ_AHEAD read ahead (mode)
361 MT_ST_TWO_FM writing of two filemarks (global)
362 MT_ST_FAST_EOM using the SCSI spacing to EOD (global)
363 MT_ST_AUTO_LOCK automatic locking of the drive door (global)
364 MT_ST_DEF_WRITES the defaults are meant only for writes (mode)
365 MT_ST_CAN_BSR backspacing over more than one record can
366 be used for repositioning the tape (global)
367 MT_ST_NO_BLKLIMS the driver does not ask the block limits
368 from the drive (block size can be changed only to
369 variable) (global)
370 MT_ST_CAN_PARTITIONS enables support for partitioned
371 tapes (global)
372 MT_ST_SCSI2LOGICAL the logical block number is used in
373 the MTSEEK and MTIOCPOS for SCSI-2 drives instead of
374 the device dependent address. It is recommended to set
375 this flag unless there are tapes using the device
376 dependent address (from the old times) (global)
377 MT_ST_SYSV sets the SYSV semantics (mode)
378 MT_ST_NOWAIT enables immediate mode (i.e., don't wait for
379 the command to finish) for some commands (e.g., rewind)
380 MT_ST_SILI enables setting the SILI bit in SCSI commands when
381 reading in variable block mode to enhance performance when
382 reading blocks shorter than the byte count; set this only
383 if you are sure that the drive supports SILI and the HBA
384 correctly returns transfer residuals
385 MT_ST_DEBUGGING debugging (global; debugging must be
386 compiled into the driver)
387 MT_ST_SETBOOLEANS
388 MT_ST_CLEARBOOLEANS
389 Sets or clears the option bits.
390 MT_ST_WRITE_THRESHOLD
391 Sets the write threshold for this device to kilobytes
392 specified by the lowest bits.
393 MT_ST_DEF_BLKSIZE
394 Defines the default block size set automatically. Value
395 0xffffff means that the default is not used any more.
396 MT_ST_DEF_DENSITY
397 MT_ST_DEF_DRVBUFFER
398 Used to set or clear the density (8 bits), and drive buffer
399 state (3 bits). If the value is MT_ST_CLEAR_DEFAULT
400 (0xfffff) the default will not be used any more. Otherwise
401 the lowermost bits of the value contain the new value of
402 the parameter.
403 MT_ST_DEF_COMPRESSION
404 The compression default will not be used if the value of
405 the lowermost byte is 0xff. Otherwise the lowermost bit
406 contains the new default. If the bits 8-15 are set to a
407 non-zero number, and this number is not 0xff, the number is
408 used as the compression algorithm. The value
409 MT_ST_CLEAR_DEFAULT can be used to clear the compression
410 default.
411 MT_ST_SET_TIMEOUT
412 Set the normal timeout in seconds for this device. The
413 default is 900 seconds (15 minutes). The timeout should be
414 long enough for the retries done by the device while
415 reading/writing.
416 MT_ST_SET_LONG_TIMEOUT
417 Set the long timeout that is used for operations that are
418 known to take a long time. The default is 14000 seconds
419 (3.9 hours). For erase this value is further multiplied by
420 eight.
421 MT_ST_SET_CLN
422 Set the cleaning request interpretation parameters using
423 the lowest 24 bits of the argument. The driver can set the
424 generic status bit GMT_CLN if a cleaning request bit pattern
425 is found from the extended sense data. Many drives set one or
426 more bits in the extended sense data when the drive needs
427 cleaning. The bits are device-dependent. The driver is
428 given the number of the sense data byte (the lowest eight
429 bits of the argument; must be >= 18 (values 1 - 17
430 reserved) and <= the maximum requested sense data size),
431 a mask to select the relevant bits (the bits 9-16), and the
432 bit pattern (bits 17-23). If the bit pattern is zero, one
433 or more bits under the mask indicate cleaning request. If
434 the pattern is non-zero, the pattern must match the masked
435 sense data byte.
436
437 (The cleaning bit is set if the additional sense code and
438 qualifier 00h 17h are seen regardless of the setting of
439 MT_ST_SET_CLN.)
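
The following C sketch shows how the mtop requests described above might be
put together in user space. It assumes that <sys/mtio.h> provides struct
mtop, MTIOCTOP and the MT_ST_* constants, as the C library headers on Linux
normally do. The helper names (st_make_op, st_make_drvbuffer, st_do_op) are
hypothetical.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <sys/mtio.h>

/* Fills a struct mtop for a positioning or parameter request. */
static struct mtop st_make_op(short op, int count)
{
    struct mtop m;

    m.mt_op = op;
    m.mt_count = count;
    return m;
}

/*
 * Builds an MTSETDRVBUFFER request: the subcommand occupies the high
 * bits of count and the argument the low-order bits.
 */
static struct mtop st_make_drvbuffer(int subcommand, int argument)
{
    /* The argument must not spill into the subcommand bits. */
    assert(((unsigned int)argument & MT_ST_OPTIONS) == 0);
    return st_make_op(MTSETDRVBUFFER, subcommand | argument);
}

/* Issues the request; returns 0 on success, -1 with errno set otherwise. */
static int st_do_op(int fd, struct mtop m)
{
    return ioctl(fd, MTIOCTOP, &m);
}
```

For instance, st_do_op(fd, st_make_op(MTFSF, 1)) would space forward over
one filemark, and st_make_drvbuffer(MT_ST_SETBOOLEANS, MT_ST_BUFFER_WRITES |
MT_ST_ASYNC_WRITES) builds a request that enables buffered and asynchronous
writes. The latter has to be issued by root.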
440
441 The following ioctl uses the structure mtpos:
442 MTIOCPOS Reads the current position from the drive. Uses
443 Tandberg-compatible QFA for SCSI-1 drives and the SCSI-2
444 command for the SCSI-2 drives.
445
446 The following ioctl uses the structure mtget to return the status:
447 MTIOCGET Returns some status information.
448 The file number and block number within file are returned. The
449 block is -1 when it can't be determined (e.g., after MTBSF).
450 The drive type is either MTISSCSI1 or MTISSCSI2.
451 The number of recovered errors since the previous status call
452 is stored in the lower word of the field mt_erreg.
453 The current block size and the density code are stored in the field
454 mt_dsreg (shifts for the subfields are MT_ST_BLKSIZE_SHIFT and
455 MT_ST_DENSITY_SHIFT).
456 The GMT_xxx status bits reflect the drive status. GMT_DR_OPEN
457 is set if there is no tape in the drive. GMT_EOD means either
458 end of recorded data or end of tape. GMT_EOT means end of tape.
459
460
461 MISCELLANEOUS COMPILE OPTIONS
462
463 The recovered write errors are considered fatal if ST_RECOVERED_WRITE_FATAL
464 is defined.
465
466 The maximum number of tape devices is determined by the define
467 ST_MAX_TAPES. If more tapes are detected at driver initialization, the
468 maximum is adjusted accordingly.
469
470 Immediate return from tape positioning SCSI commands can be enabled by
471 defining ST_NOWAIT. If this is defined, the user should take care that
472 the next tape operation is not started before the previous one has
473 finished. The drives and SCSI adapters should handle this condition
474 gracefully, but some drive/adapter combinations are known to hang the
475 SCSI bus in this case.
476
477 The MTEOM command is by default implemented as spacing over 32767
478 filemarks. With this method the file number in the status is
479 correct. The user can request using direct spacing to EOD by setting
480 ST_FAST_EOM 1 (or using the MT_ST_OPTIONS ioctl). In this case the file
481 number will be invalid.
482
483 When using read ahead or buffered writes the position within the file
484 may not be correct after the file is closed (correct position may
485 require backspacing over more than one record). The correct position
486 within file can be obtained if ST_IN_FILE_POS is defined at compile
487 time or the MT_ST_CAN_BSR bit is set for the drive with an ioctl.
488 (The driver always backs over a filemark crossed by read ahead if the
489 user does not request data that far.)
490
491
492 DEBUGGING HINTS
493
494 To enable debugging messages, edit st.c and #define DEBUG 1. As seen
495 above, debugging can be switched off with an ioctl if debugging is
496 compiled into the driver. The debugging output is not voluminous.
497
498 If the tape seems to hang, I would be very interested to hear where
499 the driver is waiting. With the command 'ps -l' you can see the state
500 of the process using the tape. If the state is D, the process is
501 waiting for something. The field WCHAN tells where the driver is
502 waiting. If you have the current System.map in the correct place (in
503 /boot for the procps I use) or have updated /etc/psdatabase (for kmem
504 ps), ps writes the function name in the WCHAN field. If not, you have
505 to look up the function from System.map.
506
507 Note also that the timeouts are very long compared to most other
508 drivers. This means that the Linux driver may appear hung although the
509 real reason is that the tape firmware has got confused.