Commit | Line | Data |
---|---|---|
9c1b96e3 AK |
1 | The Definitive KVM (Kernel-based Virtual Machine) API Documentation |
2 | =================================================================== | |
3 | ||
4 | 1. General description | |
5 | ||
6 | The kvm API is a set of ioctls that are issued to control various aspects | |
7 | of a virtual machine. The ioctls belong to three classes | |
8 | ||
9 | - System ioctls: These query and set global attributes which affect the | |
10 | whole kvm subsystem. In addition a system ioctl is used to create | |
11 | virtual machines | |
12 | ||
13 | - VM ioctls: These query and set attributes that affect an entire virtual | |
14 | machine, for example memory layout. In addition a VM ioctl is used to | |
15 | create virtual cpus (vcpus). | |
16 | ||
17 | Only run VM ioctls from the same process (address space) that was used | |
18 | to create the VM. | |
19 | ||
20 | - vcpu ioctls: These query and set attributes that control the operation | |
21 | of a single virtual cpu. | |
22 | ||
23 | Only run vcpu ioctls from the same thread that was used to create the | |
24 | vcpu. | |
25 | ||
2044892d | 26 | 2. File descriptors |
9c1b96e3 AK |
27 | |
28 | The kvm API is centered around file descriptors. An initial | |
29 | open("/dev/kvm") obtains a handle to the kvm subsystem; this handle | |
30 | can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this | |
2044892d | 31 | handle will create a VM file descriptor which can be used to issue VM |
9c1b96e3 AK |
32 | ioctls. A KVM_CREATE_VCPU ioctl on a VM fd will create a virtual cpu |
33 | and return a file descriptor pointing to it. Finally, ioctls on a vcpu | |
34 | fd can be used to control the vcpu, including the important task of | |
35 | actually running guest code. | |
36 | ||
37 | In general file descriptors can be migrated among processes by means | |
38 | of fork() and the SCM_RIGHTS facility of unix domain sockets. These | |
39 | kinds of tricks are explicitly not supported by kvm. While they will | |
40 | not cause harm to the host, their actual behavior is not guaranteed by | |
41 | the API. The only supported use is one virtual machine per process, | |
42 | and one vcpu per thread. | |
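The descriptor chain described in this section can be sketched as follows. This is a minimal illustration, not a reference implementation: struct kvm_handles and both helper functions are hypothetical names invented for the example, and the sketch assumes the kernel headers provide <linux/kvm.h>. Only vcpu id 0 is created.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/* Hypothetical helper type: the three descriptor levels of the kvm API. */
struct kvm_handles {
    int sys_fd;   /* from open("/dev/kvm"): accepts system ioctls */
    int vm_fd;    /* from KVM_CREATE_VM: accepts VM ioctls */
    int vcpu_fd;  /* from KVM_CREATE_VCPU: accepts vcpu ioctls */
};

/* Close whatever descriptors were obtained and mark them invalid. */
void kvm_handles_close(struct kvm_handles *h)
{
    if (h->vcpu_fd >= 0)
        close(h->vcpu_fd);
    if (h->vm_fd >= 0)
        close(h->vm_fd);
    if (h->sys_fd >= 0)
        close(h->sys_fd);
    h->sys_fd = h->vm_fd = h->vcpu_fd = -1;
}

/* Returns 0 on success, -1 on failure (every descriptor is then -1). */
int kvm_handles_open(struct kvm_handles *h, const char *dev_path)
{
    h->sys_fd = h->vm_fd = h->vcpu_fd = -1;

    h->sys_fd = open(dev_path, O_RDWR);
    if (h->sys_fd < 0)
        return -1;

    /* System ioctl: creates the VM and returns its descriptor. */
    h->vm_fd = ioctl(h->sys_fd, KVM_CREATE_VM, 0UL);
    if (h->vm_fd < 0)
        goto fail;

    /* VM ioctl: creates vcpu id 0 and returns its descriptor. */
    h->vcpu_fd = ioctl(h->vm_fd, KVM_CREATE_VCPU, 0UL);
    if (h->vcpu_fd < 0)
        goto fail;

    return 0;

fail:
    kvm_handles_close(h);
    return -1;
}
```

A caller would typically pass "/dev/kvm" as dev_path and keep all three descriptors in the process (and, for the vcpu fd, the thread) that created them.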
43 | ||
44 | 3. Extensions | |
45 | ||
46 | As of Linux 2.6.22, the KVM ABI has been stabilized: no backward | |
47 | incompatible changes are allowed. However, there is an extension | |
48 | facility that allows backward-compatible extensions to the API to be | |
49 | queried and used. | |
50 | ||
51 | The extension mechanism is not based on the Linux version number. | |
52 | Instead, kvm defines extension identifiers and a facility to query | |
53 | whether a particular extension identifier is available. If it is, a | |
54 | set of ioctls is available for application use. | |
55 | ||
56 | 4. API description | |
57 | ||
58 | This section describes ioctls that can be used to control kvm guests. | |
59 | For each ioctl, the following information is provided along with a | |
60 | description: | |
61 | ||
62 | Capability: which KVM extension provides this ioctl. Can be 'basic', | |
63 | which means that it will be provided by any kernel that supports | |
64 | API version 12 (see section 4.1), or a KVM_CAP_xyz constant, which | |
65 | means availability needs to be checked with KVM_CHECK_EXTENSION | |
66 | (see section 4.4). | |
67 | ||
68 | Architectures: which instruction set architectures provide this ioctl. | |
69 | x86 includes both i386 and x86_64. | |
70 | ||
71 | Type: system, vm, or vcpu. | |
72 | ||
73 | Parameters: what parameters are accepted by the ioctl. | |
74 | ||
75 | Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL) | |
76 | are not detailed, but errors with specific meanings are. | |
77 | ||
78 | 4.1 KVM_GET_API_VERSION | |
79 | ||
80 | Capability: basic | |
81 | Architectures: all | |
82 | Type: system ioctl | |
83 | Parameters: none | |
84 | Returns: the constant KVM_API_VERSION (=12) | |
85 | ||
86 | This identifies the API version as the stable kvm API. It is not | |
87 | expected that this number will change. However, Linux 2.6.20 and | |
88 | 2.6.21 report earlier versions; these are not documented and not | |
89 | supported. Applications should refuse to run if KVM_GET_API_VERSION | |
90 | returns a value other than 12. If this check passes, all ioctls | |
91 | described as 'basic' will be available. | |
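The version check recommended above might look like the sketch below. The function name kvm_api_is_stable is hypothetical; sys_fd is assumed to be a descriptor obtained from open("/dev/kvm"), and KVM_API_VERSION is assumed to come from <linux/kvm.h>.

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Returns 1 if sys_fd reports the stable API version (KVM_API_VERSION,
 * i.e. 12), and 0 for any other value, including an ioctl failure (-1).
 */
int kvm_api_is_stable(int sys_fd)
{
    int version = ioctl(sys_fd, KVM_GET_API_VERSION, 0UL);

    return version == KVM_API_VERSION;
}
```

An application following the advice above would refuse to continue when this returns 0.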
92 | ||
93 | 4.2 KVM_CREATE_VM | |
94 | ||
95 | Capability: basic | |
96 | Architectures: all | |
97 | Type: system ioctl | |
98 | Parameters: none | |
99 | Returns: a VM fd that can be used to control the new virtual machine. | |
100 | ||
101 | The new VM has no virtual cpus and no memory. An mmap() of a VM fd | |
102 | will access the virtual machine's physical address space; offset zero | |
103 | corresponds to guest physical address zero. Use of mmap() on a VM fd | |
104 | is discouraged if userspace memory allocation (KVM_CAP_USER_MEMORY) is | |
105 | available. | |
106 | ||
107 | 4.3 KVM_GET_MSR_INDEX_LIST | |
108 | ||
109 | Capability: basic | |
110 | Architectures: x86 | |
111 | Type: system | |
112 | Parameters: struct kvm_msr_list (in/out) | |
113 | Returns: 0 on success; -1 on error | |
114 | Errors: | |
115 | E2BIG: the msr index list is too big to fit in the array specified by | |
116 | the user. | |
117 | ||
118 | struct kvm_msr_list { | |
119 | __u32 nmsrs; /* number of msrs in entries */ | |
120 | __u32 indices[0]; | |
121 | }; | |
122 | ||
123 | This ioctl returns the guest msrs that are supported. The list varies | |
124 | by kvm version and host processor, but does not change otherwise. The | |
125 | user fills in the size of the indices array in nmsrs, and in return | |
126 | kvm adjusts nmsrs to reflect the actual number of msrs and fills in | |
127 | the indices array with their numbers. | |
128 | ||
129 | 4.4 KVM_CHECK_EXTENSION | |
130 | ||
131 | Capability: basic | |
132 | Architectures: all | |
133 | Type: system ioctl | |
134 | Parameters: extension identifier (KVM_CAP_*) | |
135 | Returns: 0 if unsupported; 1 (or some other positive integer) if supported | |
136 | ||
137 | The API allows the application to query about extensions to the core | |
138 | kvm API. Userspace passes an extension identifier (an integer) and | |
139 | receives an integer that describes the extension availability. | |
140 | Generally 0 means no and 1 means yes, but some extensions may report | |
141 | additional information in the integer return value. | |
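As an illustration of the query described above, the sketch below wraps KVM_CHECK_EXTENSION. The wrapper name kvm_extension_value is hypothetical, and folding an ioctl failure into "unsupported" is a design choice of this example, not something the API mandates.

```c
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Returns the positive value reported for extension `cap`, or 0 if the
 * extension is unsupported or the query itself failed.
 */
int kvm_extension_value(int sys_fd, long cap)
{
    int ret = ioctl(sys_fd, KVM_CHECK_EXTENSION, cap);

    return ret > 0 ? ret : 0;
}
```

For example, kvm_extension_value(sys_fd, KVM_CAP_USER_MEMORY) being nonzero would indicate that KVM_SET_USER_MEMORY_REGION may be used.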
142 | ||
143 | 4.5 KVM_GET_VCPU_MMAP_SIZE | |
144 | ||
145 | Capability: basic | |
146 | Architectures: all | |
147 | Type: system ioctl | |
148 | Parameters: none | |
149 | Returns: size of vcpu mmap area, in bytes | |
150 | ||
151 | The KVM_RUN ioctl communicates with userspace via a shared | |
152 | memory region. This ioctl returns the size of that region. See the | |
153 | KVM_RUN documentation for details. | |
154 | ||
155 | 4.6 KVM_SET_MEMORY_REGION | |
156 | ||
157 | Capability: basic | |
158 | Architectures: all | |
159 | Type: vm ioctl | |
160 | Parameters: struct kvm_memory_region (in) | |
161 | Returns: 0 on success, -1 on error | |
162 | ||
163 | struct kvm_memory_region { | |
164 | __u32 slot; | |
165 | __u32 flags; | |
166 | __u64 guest_phys_addr; | |
167 | __u64 memory_size; /* bytes */ | |
168 | }; | |
169 | ||
170 | /* for kvm_memory_region::flags */ | |
171 | #define KVM_MEM_LOG_DIRTY_PAGES 1UL | |
172 | ||
173 | This ioctl allows the user to create or modify a guest physical memory | |
174 | slot. When changing an existing slot, it may be moved in the guest | |
175 | physical memory space, or its flags may be modified. It may not be | |
176 | resized. Slots may not overlap. | |
177 | ||
178 | The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which | |
179 | instructs kvm to keep track of writes to memory within the slot. See | |
180 | the KVM_GET_DIRTY_LOG ioctl. | |
181 | ||
182 | It is recommended to use the KVM_SET_USER_MEMORY_REGION ioctl instead | |
183 | of this API, if available. This newer API allows placing guest memory | |
184 | at specified locations in the host address space, yielding better | |
185 | control and easy access. | |
186 | ||
187 | 4.6 KVM_CREATE_VCPU | |
188 | ||
189 | Capability: basic | |
190 | Architectures: all | |
191 | Type: vm ioctl | |
192 | Parameters: vcpu id (apic id on x86) | |
193 | Returns: vcpu fd on success, -1 on error | |
194 | ||
195 | This API adds a vcpu to a virtual machine. The vcpu id is a small integer | |
196 | in the range [0, max_vcpus). | |
197 | ||
198 | 4.7 KVM_GET_DIRTY_LOG (vm ioctl) | |
199 | ||
200 | Capability: basic | |
201 | Architectures: x86 | |
202 | Type: vm ioctl | |
203 | Parameters: struct kvm_dirty_log (in/out) | |
204 | Returns: 0 on success, -1 on error | |
205 | ||
206 | /* for KVM_GET_DIRTY_LOG */ | |
207 | struct kvm_dirty_log { | |
208 | __u32 slot; | |
209 | __u32 padding; | |
210 | union { | |
211 | void __user *dirty_bitmap; /* one bit per page */ | |
212 | __u64 padding; | |
213 | }; | |
214 | }; | |
215 | ||
216 | Given a memory slot, return a bitmap containing any pages dirtied | |
217 | since the last call to this ioctl. Bit 0 is the first page in the | |
218 | memory slot. Ensure the entire structure is cleared to avoid padding | |
219 | issues. | |
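Once the bitmap has been returned, it can be scanned as in the sketch below. Both function names are hypothetical. The code assumes a little-endian host such as x86, where bit n of the bitmap is bit (n % 8) of byte (n / 8); other byte orders would need a different index calculation.

```c
#include <stddef.h>

/*
 * Nonzero if page `n` of the slot is marked dirty.
 * Assumption: little-endian bit layout (bit 0 = first page of the slot).
 */
int page_is_dirty(const unsigned char *bitmap, size_t n)
{
    return (bitmap[n / 8] >> (n % 8)) & 1;
}

/* Counts the dirty pages among the first `npages` pages of the slot. */
size_t dirty_page_count(const unsigned char *bitmap, size_t npages)
{
    size_t count = 0;

    for (size_t n = 0; n < npages; n++)
        count += (size_t)page_is_dirty(bitmap, n);

    return count;
}
```

The caller is expected to supply a bitmap of at least one bit per page in the slot, rounded up to whatever granularity the kernel writes.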
220 | ||
221 | 4.8 KVM_SET_MEMORY_ALIAS | |
222 | ||
223 | Capability: basic | |
224 | Architectures: x86 | |
225 | Type: vm ioctl | |
226 | Parameters: struct kvm_memory_alias (in) | |
227 | Returns: 0 (success), -1 (error) | |
228 | ||
229 | struct kvm_memory_alias { | |
230 | __u32 slot; /* this has a different namespace than memory slots */ | |
231 | __u32 flags; | |
232 | __u64 guest_phys_addr; | |
233 | __u64 memory_size; | |
234 | __u64 target_phys_addr; | |
235 | }; | |
236 | ||
237 | Defines a guest physical address space region as an alias to another | |
238 | region. Useful for aliased addresses, for example the VGA low memory | |
239 | window. Should not be used with userspace memory. | |
240 | ||
241 | 4.9 KVM_RUN | |
242 | ||
243 | Capability: basic | |
244 | Architectures: all | |
245 | Type: vcpu ioctl | |
246 | Parameters: none | |
247 | Returns: 0 on success, -1 on error | |
248 | Errors: | |
249 | EINTR: an unmasked signal is pending | |
250 | ||
251 | This ioctl is used to run a guest virtual cpu. While there are no | |
252 | explicit parameters, there is an implicit parameter block that can be | |
253 | obtained by mmap()ing the vcpu fd at offset 0, with the size given by | |
254 | KVM_GET_VCPU_MMAP_SIZE. The parameter block is formatted as a 'struct | |
255 | kvm_run' (see below). | |
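Obtaining the implicit parameter block could be sketched as follows. The function name kvm_map_run and the size_out parameter are hypothetical; the sanity check against sizeof(struct kvm_run) is an extra precaution added for this example.

```c
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

/*
 * Maps the kvm_run area of vcpu_fd at offset 0.  sys_fd is the /dev/kvm
 * descriptor, used to query the mapping size.  Returns NULL on failure.
 * If size_out is non-NULL, the mapping size is stored there for munmap().
 */
struct kvm_run *kvm_map_run(int sys_fd, int vcpu_fd, size_t *size_out)
{
    int size = ioctl(sys_fd, KVM_GET_VCPU_MMAP_SIZE, 0UL);
    void *area;

    /* Also rejects the -1 error return. */
    if (size < (int)sizeof(struct kvm_run))
        return NULL;

    area = mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE, MAP_SHARED,
                vcpu_fd, 0);
    if (area == MAP_FAILED)
        return NULL;

    if (size_out)
        *size_out = (size_t)size;
    return area;
}
```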
256 | ||
257 | 4.10 KVM_GET_REGS | |
258 | ||
259 | Capability: basic | |
260 | Architectures: all | |
261 | Type: vcpu ioctl | |
262 | Parameters: struct kvm_regs (out) | |
263 | Returns: 0 on success, -1 on error | |
264 | ||
265 | Reads the general purpose registers from the vcpu. | |
266 | ||
267 | /* x86 */ | |
268 | struct kvm_regs { | |
269 | /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */ | |
270 | __u64 rax, rbx, rcx, rdx; | |
271 | __u64 rsi, rdi, rsp, rbp; | |
272 | __u64 r8, r9, r10, r11; | |
273 | __u64 r12, r13, r14, r15; | |
274 | __u64 rip, rflags; | |
275 | }; | |
276 | ||
277 | 4.11 KVM_SET_REGS | |
278 | ||
279 | Capability: basic | |
280 | Architectures: all | |
281 | Type: vcpu ioctl | |
282 | Parameters: struct kvm_regs (in) | |
283 | Returns: 0 on success, -1 on error | |
284 | ||
285 | Writes the general purpose registers into the vcpu. | |
286 | ||
287 | See KVM_GET_REGS for the data structure. | |
288 | ||
289 | 4.12 KVM_GET_SREGS | |
290 | ||
291 | Capability: basic | |
292 | Architectures: x86 | |
293 | Type: vcpu ioctl | |
294 | Parameters: struct kvm_sregs (out) | |
295 | Returns: 0 on success, -1 on error | |
296 | ||
297 | Reads special registers from the vcpu. | |
298 | ||
299 | /* x86 */ | |
300 | struct kvm_sregs { | |
301 | struct kvm_segment cs, ds, es, fs, gs, ss; | |
302 | struct kvm_segment tr, ldt; | |
303 | struct kvm_dtable gdt, idt; | |
304 | __u64 cr0, cr2, cr3, cr4, cr8; | |
305 | __u64 efer; | |
306 | __u64 apic_base; | |
307 | __u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64]; | |
308 | }; | |
309 | ||
310 | interrupt_bitmap is a bitmap of pending external interrupts. At most | |
311 | one bit may be set. This interrupt has been acknowledged by the APIC | |
312 | but not yet injected into the cpu core. | |
313 | ||
314 | 4.13 KVM_SET_SREGS | |
315 | ||
316 | Capability: basic | |
317 | Architectures: x86 | |
318 | Type: vcpu ioctl | |
319 | Parameters: struct kvm_sregs (in) | |
320 | Returns: 0 on success, -1 on error | |
321 | ||
322 | Writes special registers into the vcpu. See KVM_GET_SREGS for the | |
323 | data structures. | |
324 | ||
325 | 4.14 KVM_TRANSLATE | |
326 | ||
327 | Capability: basic | |
328 | Architectures: x86 | |
329 | Type: vcpu ioctl | |
330 | Parameters: struct kvm_translation (in/out) | |
331 | Returns: 0 on success, -1 on error | |
332 | ||
333 | Translates a virtual address according to the vcpu's current address | |
334 | translation mode. | |
335 | ||
336 | struct kvm_translation { | |
337 | /* in */ | |
338 | __u64 linear_address; | |
339 | ||
340 | /* out */ | |
341 | __u64 physical_address; | |
342 | __u8 valid; | |
343 | __u8 writeable; | |
344 | __u8 usermode; | |
345 | __u8 pad[5]; | |
346 | }; | |
347 | ||
348 | 4.15 KVM_INTERRUPT | |
349 | ||
350 | Capability: basic | |
351 | Architectures: x86 | |
352 | Type: vcpu ioctl | |
353 | Parameters: struct kvm_interrupt (in) | |
354 | Returns: 0 on success, -1 on error | |
355 | ||
356 | Queues a hardware interrupt vector to be injected. This is only | |
357 | useful if in-kernel local APIC is not used. | |
358 | ||
359 | /* for KVM_INTERRUPT */ | |
360 | struct kvm_interrupt { | |
361 | /* in */ | |
362 | __u32 irq; | |
363 | }; | |
364 | ||
365 | Note 'irq' is an interrupt vector, not an interrupt pin or line. | |
366 | ||
367 | 4.16 KVM_DEBUG_GUEST | |
368 | ||
369 | Capability: basic | |
370 | Architectures: none | |
371 | Type: vcpu ioctl | |
372 | Parameters: none | |
373 | Returns: -1 on error | |
374 | ||
375 | Support for this has been removed. Use KVM_SET_GUEST_DEBUG instead. | |
376 | ||
377 | 4.17 KVM_GET_MSRS | |
378 | ||
379 | Capability: basic | |
380 | Architectures: x86 | |
381 | Type: vcpu ioctl | |
382 | Parameters: struct kvm_msrs (in/out) | |
383 | Returns: 0 on success, -1 on error | |
384 | ||
385 | Reads model-specific registers from the vcpu. Supported msr indices can | |
386 | be obtained using KVM_GET_MSR_INDEX_LIST. | |
387 | ||
388 | struct kvm_msrs { | |
389 | __u32 nmsrs; /* number of msrs in entries */ | |
390 | __u32 pad; | |
391 | ||
392 | struct kvm_msr_entry entries[0]; | |
393 | }; | |
394 | ||
395 | struct kvm_msr_entry { | |
396 | __u32 index; | |
397 | __u32 reserved; | |
398 | __u64 data; | |
399 | }; | |
400 | ||
401 | Application code should set the 'nmsrs' member (which indicates the | |
402 | size of the entries array) and the 'index' member of each array entry. | |
403 | kvm will fill in the 'data' member. | |
404 | ||
405 | 4.18 KVM_SET_MSRS | |
406 | ||
407 | Capability: basic | |
408 | Architectures: x86 | |
409 | Type: vcpu ioctl | |
410 | Parameters: struct kvm_msrs (in) | |
411 | Returns: 0 on success, -1 on error | |
412 | ||
413 | Writes model-specific registers to the vcpu. See KVM_GET_MSRS for the | |
414 | data structures. | |
415 | ||
416 | Application code should set the 'nmsrs' member (which indicates the | |
417 | size of the entries array), and the 'index' and 'data' members of each | |
418 | array entry. | |
419 | ||
420 | 4.19 KVM_SET_CPUID | |
421 | ||
422 | Capability: basic | |
423 | Architectures: x86 | |
424 | Type: vcpu ioctl | |
425 | Parameters: struct kvm_cpuid (in) | |
426 | Returns: 0 on success, -1 on error | |
427 | ||
428 | Defines the vcpu responses to the cpuid instruction. Applications | |
429 | should use the KVM_SET_CPUID2 ioctl if available. | |
430 | ||
431 | ||
432 | struct kvm_cpuid_entry { | |
433 | __u32 function; | |
434 | __u32 eax; | |
435 | __u32 ebx; | |
436 | __u32 ecx; | |
437 | __u32 edx; | |
438 | __u32 padding; | |
439 | }; | |
440 | ||
441 | /* for KVM_SET_CPUID */ | |
442 | struct kvm_cpuid { | |
443 | __u32 nent; | |
444 | __u32 padding; | |
445 | struct kvm_cpuid_entry entries[0]; | |
446 | }; | |
447 | ||
448 | 4.20 KVM_SET_SIGNAL_MASK | |
449 | ||
450 | Capability: basic | |
451 | Architectures: x86 | |
452 | Type: vcpu ioctl | |
453 | Parameters: struct kvm_signal_mask (in) | |
454 | Returns: 0 on success, -1 on error | |
455 | ||
456 | Defines which signals are blocked during execution of KVM_RUN. This | |
457 | signal mask temporarily overrides the thread's signal mask. Any | |
458 | unblocked signal received (except SIGKILL and SIGSTOP, which retain | |
459 | their traditional behaviour) will cause KVM_RUN to return with -EINTR. | |
460 | ||
461 | Note the signal will only be delivered if not blocked by the original | |
462 | signal mask. | |
463 | ||
464 | /* for KVM_SET_SIGNAL_MASK */ | |
465 | struct kvm_signal_mask { | |
466 | __u32 len; | |
467 | __u8 sigset[0]; | |
468 | }; | |
469 | ||
470 | 4.21 KVM_GET_FPU | |
471 | ||
472 | Capability: basic | |
473 | Architectures: x86 | |
474 | Type: vcpu ioctl | |
475 | Parameters: struct kvm_fpu (out) | |
476 | Returns: 0 on success, -1 on error | |
477 | ||
478 | Reads the floating point state from the vcpu. | |
479 | ||
480 | /* for KVM_GET_FPU and KVM_SET_FPU */ | |
481 | struct kvm_fpu { | |
482 | __u8 fpr[8][16]; | |
483 | __u16 fcw; | |
484 | __u16 fsw; | |
485 | __u8 ftwx; /* in fxsave format */ | |
486 | __u8 pad1; | |
487 | __u16 last_opcode; | |
488 | __u64 last_ip; | |
489 | __u64 last_dp; | |
490 | __u8 xmm[16][16]; | |
491 | __u32 mxcsr; | |
492 | __u32 pad2; | |
493 | }; | |
494 | ||
495 | 4.22 KVM_SET_FPU | |
496 | ||
497 | Capability: basic | |
498 | Architectures: x86 | |
499 | Type: vcpu ioctl | |
500 | Parameters: struct kvm_fpu (in) | |
501 | Returns: 0 on success, -1 on error | |
502 | ||
503 | Writes the floating point state to the vcpu. | |
504 | ||
505 | /* for KVM_GET_FPU and KVM_SET_FPU */ | |
506 | struct kvm_fpu { | |
507 | __u8 fpr[8][16]; | |
508 | __u16 fcw; | |
509 | __u16 fsw; | |
510 | __u8 ftwx; /* in fxsave format */ | |
511 | __u8 pad1; | |
512 | __u16 last_opcode; | |
513 | __u64 last_ip; | |
514 | __u64 last_dp; | |
515 | __u8 xmm[16][16]; | |
516 | __u32 mxcsr; | |
517 | __u32 pad2; | |
518 | }; | |
519 | ||
5dadbfd6 AK |
520 | 4.23 KVM_CREATE_IRQCHIP |
521 | ||
522 | Capability: KVM_CAP_IRQCHIP | |
523 | Architectures: x86, ia64 | |
524 | Type: vm ioctl | |
525 | Parameters: none | |
526 | Returns: 0 on success, -1 on error | |
527 | ||
528 | Creates an interrupt controller model in the kernel. On x86, creates a virtual | |
529 | ioapic, a virtual PIC (two PICs, nested), and sets up future vcpus to have a | |
530 | local APIC. IRQ routing for GSIs 0-15 is set to both PIC and IOAPIC; GSI 16-23 | |
531 | only go to the IOAPIC. On ia64, an IOSAPIC is created. | |
532 | ||
533 | 4.24 KVM_IRQ_LINE | |
534 | ||
535 | Capability: KVM_CAP_IRQCHIP | |
536 | Architectures: x86, ia64 | |
537 | Type: vm ioctl | |
538 | Parameters: struct kvm_irq_level | |
539 | Returns: 0 on success, -1 on error | |
540 | ||
541 | Sets the level of a GSI input to the interrupt controller model in the kernel. | |
542 | Requires that an interrupt controller model has been previously created with | |
543 | KVM_CREATE_IRQCHIP. Note that edge-triggered interrupts require the level | |
544 | to be set to 1 and then back to 0. | |
545 | ||
546 | struct kvm_irq_level { | |
547 | union { | |
548 | __u32 irq; /* GSI */ | |
549 | __s32 status; /* not used for KVM_IRQ_LEVEL */ | |
550 | }; | |
551 | __u32 level; /* 0 or 1 */ | |
552 | }; | |
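The edge-triggered case mentioned above (level set to 1 and then back to 0) can be sketched as below. The helper name kvm_pulse_irq is hypothetical, and the sketch assumes an in-kernel interrupt controller has already been created with KVM_CREATE_IRQCHIP.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Raises and then lowers GSI `gsi` on vm_fd, which is how an
 * edge-triggered interrupt is signalled.  Returns 0 on success, -1 if
 * either ioctl fails.
 */
int kvm_pulse_irq(int vm_fd, unsigned int gsi)
{
    struct kvm_irq_level line;

    memset(&line, 0, sizeof(line));

    line.irq = gsi;
    line.level = 1;
    if (ioctl(vm_fd, KVM_IRQ_LINE, &line) < 0)
        return -1;

    line.irq = gsi;   /* set again: irq shares storage with status */
    line.level = 0;
    if (ioctl(vm_fd, KVM_IRQ_LINE, &line) < 0)
        return -1;

    return 0;
}
```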
553 | ||
554 | 4.25 KVM_GET_IRQCHIP | |
555 | ||
556 | Capability: KVM_CAP_IRQCHIP | |
557 | Architectures: x86, ia64 | |
558 | Type: vm ioctl | |
559 | Parameters: struct kvm_irqchip (in/out) | |
560 | Returns: 0 on success, -1 on error | |
561 | ||
562 | Reads the state of a kernel interrupt controller created with | |
563 | KVM_CREATE_IRQCHIP into a buffer provided by the caller. | |
564 | ||
565 | struct kvm_irqchip { | |
566 | __u32 chip_id; /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */ | |
567 | __u32 pad; | |
568 | union { | |
569 | char dummy[512]; /* reserving space */ | |
570 | struct kvm_pic_state pic; | |
571 | struct kvm_ioapic_state ioapic; | |
572 | } chip; | |
573 | }; | |
574 | ||
575 | 4.26 KVM_SET_IRQCHIP | |
576 | ||
577 | Capability: KVM_CAP_IRQCHIP | |
578 | Architectures: x86, ia64 | |
579 | Type: vm ioctl | |
580 | Parameters: struct kvm_irqchip (in) | |
581 | Returns: 0 on success, -1 on error | |
582 | ||
583 | Sets the state of a kernel interrupt controller created with | |
584 | KVM_CREATE_IRQCHIP from a buffer provided by the caller. | |
585 | ||
586 | struct kvm_irqchip { | |
587 | __u32 chip_id; /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */ | |
588 | __u32 pad; | |
589 | union { | |
590 | char dummy[512]; /* reserving space */ | |
591 | struct kvm_pic_state pic; | |
592 | struct kvm_ioapic_state ioapic; | |
593 | } chip; | |
594 | }; | |
595 | ||
ffde22ac ES |
596 | 4.27 KVM_XEN_HVM_CONFIG |
597 | ||
598 | Capability: KVM_CAP_XEN_HVM | |
599 | Architectures: x86 | |
600 | Type: vm ioctl | |
601 | Parameters: struct kvm_xen_hvm_config (in) | |
602 | Returns: 0 on success, -1 on error | |
603 | ||
604 | Sets the MSR that the Xen HVM guest uses to initialize its hypercall | |
605 | page, and provides the starting address and size of the hypercall | |
606 | blobs in userspace. When the guest writes the MSR, kvm copies one | |
607 | page of a blob (32- or 64-bit, depending on the vcpu mode) to guest | |
608 | memory. | |
609 | ||
610 | struct kvm_xen_hvm_config { | |
611 | __u32 flags; | |
612 | __u32 msr; | |
613 | __u64 blob_addr_32; | |
614 | __u64 blob_addr_64; | |
615 | __u8 blob_size_32; | |
616 | __u8 blob_size_64; | |
617 | __u8 pad2[30]; | |
618 | }; | |
619 | ||
afbcf7ab GC |
620 | 4.27 KVM_GET_CLOCK |
621 | ||
622 | Capability: KVM_CAP_ADJUST_CLOCK | |
623 | Architectures: x86 | |
624 | Type: vm ioctl | |
625 | Parameters: struct kvm_clock_data (out) | |
626 | Returns: 0 on success, -1 on error | |
627 | ||
628 | Gets the current timestamp of kvmclock as seen by the current guest. In | |
629 | conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity in scenarios | |
630 | such as migration. | |
631 | ||
632 | struct kvm_clock_data { | |
633 | __u64 clock; /* kvmclock current value */ | |
634 | __u32 flags; | |
635 | __u32 pad[9]; | |
636 | }; | |
637 | ||
638 | 4.28 KVM_SET_CLOCK | |
639 | ||
640 | Capability: KVM_CAP_ADJUST_CLOCK | |
641 | Architectures: x86 | |
642 | Type: vm ioctl | |
643 | Parameters: struct kvm_clock_data (in) | |
644 | Returns: 0 on success, -1 on error | |
645 | ||
2044892d | 646 | Sets the current timestamp of kvmclock to the value specified in its parameter. |
afbcf7ab GC |
647 | In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity in scenarios |
648 | such as migration. | |
649 | ||
650 | struct kvm_clock_data { | |
651 | __u64 clock; /* kvmclock current value */ | |
652 | __u32 flags; | |
653 | __u32 pad[9]; | |
654 | }; | |
655 | ||
3cfc3092 JK |
656 | 4.29 KVM_GET_VCPU_EVENTS |
657 | ||
658 | Capability: KVM_CAP_VCPU_EVENTS | |
48005f64 | 659 | Extended by: KVM_CAP_INTR_SHADOW |
3cfc3092 JK |
660 | Architectures: x86 |
661 | Type: vcpu ioctl | |
662 | Parameters: struct kvm_vcpu_events (out) | |
663 | Returns: 0 on success, -1 on error | |
664 | ||
665 | Gets currently pending exceptions, interrupts, and NMIs as well as related | |
666 | states of the vcpu. | |
667 | ||
668 | struct kvm_vcpu_events { | |
669 | struct { | |
670 | __u8 injected; | |
671 | __u8 nr; | |
672 | __u8 has_error_code; | |
673 | __u8 pad; | |
674 | __u32 error_code; | |
675 | } exception; | |
676 | struct { | |
677 | __u8 injected; | |
678 | __u8 nr; | |
679 | __u8 soft; | |
48005f64 | 680 | __u8 shadow; |
3cfc3092 JK |
681 | } interrupt; |
682 | struct { | |
683 | __u8 injected; | |
684 | __u8 pending; | |
685 | __u8 masked; | |
686 | __u8 pad; | |
687 | } nmi; | |
688 | __u32 sipi_vector; | |
dab4b911 | 689 | __u32 flags; |
3cfc3092 JK |
690 | }; |
691 | ||
48005f64 JK |
692 | KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that |
693 | interrupt.shadow contains a valid state. Otherwise, this field is undefined. | |
694 | ||
3cfc3092 JK |
695 | 4.30 KVM_SET_VCPU_EVENTS |
696 | ||
697 | Capability: KVM_CAP_VCPU_EVENTS | |
48005f64 | 698 | Extended by: KVM_CAP_INTR_SHADOW |
3cfc3092 JK |
699 | Architectures: x86 |
700 | Type: vcpu ioctl | |
701 | Parameters: struct kvm_vcpu_events (in) | |
702 | Returns: 0 on success, -1 on error | |
703 | ||
704 | Set pending exceptions, interrupts, and NMIs as well as related states of the | |
705 | vcpu. | |
706 | ||
707 | See KVM_GET_VCPU_EVENTS for the data structure. | |
708 | ||
dab4b911 JK |
709 | Fields that may be modified asynchronously by running VCPUs can be excluded |
710 | from the update. These fields are nmi.pending and sipi_vector. Keep the | |
711 | corresponding bits in the flags field cleared to suppress overwriting the | |
712 | current in-kernel state. The bits are: | |
713 | ||
714 | KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel | |
715 | KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector | |
716 | ||
48005f64 JK |
717 | If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in |
718 | the flags field to signal that interrupt.shadow contains a valid state and | |
719 | shall be written into the VCPU. | |
720 | ||
a1efbe77 JK |
721 | 4.32 KVM_GET_DEBUGREGS |
722 | ||
723 | Capability: KVM_CAP_DEBUGREGS | |
724 | Architectures: x86 | |
725 | Type: vcpu ioctl | |
726 | Parameters: struct kvm_debugregs (out) | |
727 | Returns: 0 on success, -1 on error | |
728 | ||
729 | Reads debug registers from the vcpu. | |
730 | ||
731 | struct kvm_debugregs { | |
732 | __u64 db[4]; | |
733 | __u64 dr6; | |
734 | __u64 dr7; | |
735 | __u64 flags; | |
736 | __u64 reserved[9]; | |
737 | }; | |
738 | ||
739 | 4.33 KVM_SET_DEBUGREGS | |
740 | ||
741 | Capability: KVM_CAP_DEBUGREGS | |
742 | Architectures: x86 | |
743 | Type: vcpu ioctl | |
744 | Parameters: struct kvm_debugregs (in) | |
745 | Returns: 0 on success, -1 on error | |
746 | ||
747 | Writes debug registers into the vcpu. | |
748 | ||
749 | See KVM_GET_DEBUGREGS for the data structure. The flags field is not yet | |
750 | used and must be cleared on entry. | |
751 | ||
0f2d8f4d AK |
752 | 4.34 KVM_SET_USER_MEMORY_REGION |
753 | ||
754 | Capability: KVM_CAP_USER_MEMORY | |
755 | Architectures: all | |
756 | Type: vm ioctl | |
757 | Parameters: struct kvm_userspace_memory_region (in) | |
758 | Returns: 0 on success, -1 on error | |
759 | ||
760 | struct kvm_userspace_memory_region { | |
761 | __u32 slot; | |
762 | __u32 flags; | |
763 | __u64 guest_phys_addr; | |
764 | __u64 memory_size; /* bytes */ | |
765 | __u64 userspace_addr; /* start of the userspace allocated memory */ | |
766 | }; | |
767 | ||
768 | /* for kvm_userspace_memory_region::flags */ | |
769 | #define KVM_MEM_LOG_DIRTY_PAGES 1UL | |
770 | ||
771 | This ioctl allows the user to create or modify a guest physical memory | |
772 | slot. When changing an existing slot, it may be moved in the guest | |
773 | physical memory space, or its flags may be modified. It may not be | |
774 | resized. Slots may not overlap in guest physical address space. | |
775 | ||
776 | Memory for the region is taken starting at the address denoted by the | |
777 | field userspace_addr, which must point at user addressable memory for | |
778 | the entire memory slot size. Any object may back this memory, including | |
779 | anonymous memory, ordinary files, and hugetlbfs. | |
780 | ||
781 | It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr | |
782 | be identical. This allows large pages in the guest to be backed by large | |
783 | pages in the host. | |
784 | ||
785 | The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which | |
786 | instructs kvm to keep track of writes to memory within the slot. See | |
787 | the KVM_GET_DIRTY_LOG ioctl. | |
788 | ||
789 | When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of the memory | |
790 | region are automatically reflected into the guest. For example, an mmap() | |
791 | that affects the region will be made visible immediately. Another example | |
792 | is madvise(MADV_DROP). | |
793 | ||
794 | It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl. | |
795 | The KVM_SET_MEMORY_REGION does not allow fine grained control over memory | |
796 | allocation and is deprecated. | |
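Registering a slot, together with the 21-bit alignment recommendation above, can be sketched as follows. Both function names are hypothetical; host_mem is assumed to be memory the caller has already allocated (for example with mmap()) covering the whole slot.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Nonzero if both addresses share their lower 21 bits, i.e. they have the
 * same offset within a 2 MiB large page.
 */
int same_2m_offset(uint64_t guest_phys_addr, uint64_t userspace_addr)
{
    const uint64_t mask = (UINT64_C(1) << 21) - 1;

    return ((guest_phys_addr ^ userspace_addr) & mask) == 0;
}

/*
 * Creates or modifies memory slot `slot` on vm_fd.  Returns the ioctl
 * result: 0 on success, -1 on error.
 */
int set_user_memory(int vm_fd, uint32_t slot, uint64_t guest_phys_addr,
                    void *host_mem, uint64_t size, int log_dirty)
{
    struct kvm_userspace_memory_region region;

    memset(&region, 0, sizeof(region));
    region.slot = slot;
    region.flags = log_dirty ? KVM_MEM_LOG_DIRTY_PAGES : 0;
    region.guest_phys_addr = guest_phys_addr;
    region.memory_size = size;
    region.userspace_addr = (uint64_t)(uintptr_t)host_mem;

    return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}
```

A caller that wants large-page backing could check same_2m_offset() before registering the slot and adjust its host allocation if the check fails.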
3cfc3092 | 797 | |
8a5416db AK |
798 | 4.35 KVM_SET_TSS_ADDR |
799 | ||
800 | Capability: KVM_CAP_SET_TSS_ADDR | |
801 | Architectures: x86 | |
802 | Type: vm ioctl | |
803 | Parameters: unsigned long tss_address (in) | |
804 | Returns: 0 on success, -1 on error | |
805 | ||
806 | This ioctl defines the physical address of a three-page region in the guest | |
807 | physical address space. The region must be within the first 4GB of the | |
808 | guest physical address space and must not conflict with any memory slot | |
809 | or any mmio address. The guest may malfunction if it accesses this memory | |
810 | region. | |
811 | ||
812 | This ioctl is required on Intel-based hosts. This is needed on Intel hardware | |
813 | because of a quirk in the virtualization implementation (see the internals | |
814 | documentation when it pops into existence). | |
815 | ||
71fbfd5f AG |
816 | 4.36 KVM_ENABLE_CAP |
817 | ||
818 | Capability: KVM_CAP_ENABLE_CAP | |
819 | Architectures: ppc | |
820 | Type: vcpu ioctl | |
821 | Parameters: struct kvm_enable_cap (in) | |
822 | Returns: 0 on success; -1 on error | |
823 | ||
824 | Not all extensions are enabled by default. Using this ioctl the application | |
825 | can enable an extension, making it available to the guest. | |
826 | ||
827 | On systems that do not support this ioctl, it always fails. On systems that | |
828 | do support it, it only works for extensions that are supported for enablement. | |
829 | ||
830 | To check if a capability can be enabled, the KVM_CHECK_EXTENSION ioctl should | |
831 | be used. | |
832 | ||
833 | struct kvm_enable_cap { | |
834 | /* in */ | |
835 | __u32 cap; | |
836 | ||
837 | The capability that is supposed to get enabled. | |
838 | ||
839 | __u32 flags; | |
840 | ||
841 | A bitfield indicating future enhancements. Has to be 0 for now. | |
842 | ||
843 | __u64 args[4]; | |
844 | ||
845 | Arguments for enabling a feature. If a feature needs initial values to | |
846 | function properly, this is the place to put them. | |
847 | ||
848 | __u8 pad[64]; | |
849 | }; | |
850 | ||
b843f065 AK |
851 | 4.37 KVM_GET_MP_STATE |
852 | ||
853 | Capability: KVM_CAP_MP_STATE | |
854 | Architectures: x86, ia64 | |
855 | Type: vcpu ioctl | |
856 | Parameters: struct kvm_mp_state (out) | |
857 | Returns: 0 on success; -1 on error | |
858 | ||
859 | struct kvm_mp_state { | |
860 | __u32 mp_state; | |
861 | }; | |
862 | ||
863 | Returns the vcpu's current "multiprocessing state" (though also valid on | |
864 | uniprocessor guests). | |
865 | ||
866 | Possible values are: | |
867 | ||
868 | - KVM_MP_STATE_RUNNABLE: the vcpu is currently running | |
869 | - KVM_MP_STATE_UNINITIALIZED: the vcpu is an application processor (AP) | |
870 | which has not yet received an INIT signal | |
871 | - KVM_MP_STATE_INIT_RECEIVED: the vcpu has received an INIT signal, and is | |
872 | now ready for a SIPI | |
873 | - KVM_MP_STATE_HALTED: the vcpu has executed a HLT instruction and | |
874 | is waiting for an interrupt | |
875 | - KVM_MP_STATE_SIPI_RECEIVED: the vcpu has just received a SIPI (vector | |
876 | accessible via KVM_GET_VCPU_EVENTS) | |
877 | ||
878 | This ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel | |
879 | irqchip, the multiprocessing state must be maintained by userspace. | |
880 | ||
881 | 4.38 KVM_SET_MP_STATE | |
882 | ||
883 | Capability: KVM_CAP_MP_STATE | |
884 | Architectures: x86, ia64 | |
885 | Type: vcpu ioctl | |
886 | Parameters: struct kvm_mp_state (in) | |
887 | Returns: 0 on success; -1 on error | |
888 | ||
889 | Sets the vcpu's current "multiprocessing state"; see KVM_GET_MP_STATE for | |
890 | arguments. | |
891 | ||
892 | This ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel | |
893 | irqchip, the multiprocessing state must be maintained by userspace. | |
894 | ||
9c1b96e3 AK |
895 | 5. The kvm_run structure |
896 | ||
897 | Application code obtains a pointer to the kvm_run structure by | |
898 | mmap()ing a vcpu fd. From that point, application code can control | |
899 | execution by changing fields in kvm_run prior to calling the KVM_RUN | |
900 | ioctl, and obtain information about the reason KVM_RUN returned by | |
901 | looking up structure members. | |
902 | ||
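The mapping step described above could be sketched as follows. The function name map_kvm_run() is hypothetical. The sketch assumes that the size of the mapping is obtained with the KVM_GET_VCPU_MMAP_SIZE system ioctl on the /dev/kvm handle, and that kvm_fd and vcpu_fd were obtained as described in the file descriptors section.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

/* Hypothetical helper: map the kvm_run area of vcpu_fd.
 * kvm_fd is the /dev/kvm handle, used only to ask for the mapping size.
 * Returns NULL on failure. On success *size_out holds the mapped length,
 * which the caller needs later for munmap(). */
static struct kvm_run *map_kvm_run(int kvm_fd, int vcpu_fd, size_t *size_out)
{
    int size;
    void *p;

    assert(size_out != NULL);
    size = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0);
    if (size < (int)sizeof(struct kvm_run))
        return NULL;    /* ioctl failed (-1) or size is implausibly small */

    p = mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE, MAP_SHARED,
             vcpu_fd, 0);
    if (p == MAP_FAILED)
        return NULL;

    *size_out = (size_t)size;
    return p;
}
```

The mapping is shared and writable because application code both sets input fields before KVM_RUN and reads output fields after it returns.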
903 | struct kvm_run { | |
904 | /* in */ | |
905 | __u8 request_interrupt_window; | |
906 | ||
907 | Request that KVM_RUN return when it becomes possible to inject external | |
908 | interrupts into the guest. Useful in conjunction with KVM_INTERRUPT. | |
909 | ||
910 | __u8 padding1[7]; | |
911 | ||
912 | /* out */ | |
913 | __u32 exit_reason; | |
914 | ||
915 | When KVM_RUN has returned successfully (return value 0), this informs | |
916 | application code why KVM_RUN has returned. Allowable values for this | |
917 | field are detailed below. | |
918 | ||
919 | __u8 ready_for_interrupt_injection; | |
920 | ||
921 | If request_interrupt_window has been specified, this field indicates | |
922 | an interrupt can be injected now with KVM_INTERRUPT. | |
923 | ||
924 | __u8 if_flag; | |
925 | ||
926 | The value of the current interrupt flag. Only valid if in-kernel | |
927 | local APIC is not used. | |
928 | ||
929 | __u8 padding2[2]; | |
930 | ||
931 | /* in (pre_kvm_run), out (post_kvm_run) */ | |
932 | __u64 cr8; | |
933 | ||
934 | The value of the cr8 register. Only valid if in-kernel local APIC is | |
935 | not used. Both input and output. | |
936 | ||
937 | __u64 apic_base; | |
938 | ||
939 | The value of the APIC BASE msr. Only valid if in-kernel local | |
940 | APIC is not used. Both input and output. | |
941 | ||
942 | union { | |
943 | /* KVM_EXIT_UNKNOWN */ | |
944 | struct { | |
945 | __u64 hardware_exit_reason; | |
946 | } hw; | |
947 | ||
948 | If exit_reason is KVM_EXIT_UNKNOWN, the vcpu has exited due to unknown | |
949 | reasons. Further architecture-specific information is available in | |
950 | hardware_exit_reason. | |
951 | ||
952 | /* KVM_EXIT_FAIL_ENTRY */ | |
953 | struct { | |
954 | __u64 hardware_entry_failure_reason; | |
955 | } fail_entry; | |
956 | ||
957 | If exit_reason is KVM_EXIT_FAIL_ENTRY, the vcpu could not be run due | |
958 | to unknown reasons. Further architecture-specific information is | |
959 | available in hardware_entry_failure_reason. | |
960 | ||
961 | /* KVM_EXIT_EXCEPTION */ | |
962 | struct { | |
963 | __u32 exception; | |
964 | __u32 error_code; | |
965 | } ex; | |
966 | ||
967 | Unused. | |
968 | ||
969 | /* KVM_EXIT_IO */ | |
970 | struct { | |
971 | #define KVM_EXIT_IO_IN 0 | |
972 | #define KVM_EXIT_IO_OUT 1 | |
973 | __u8 direction; | |
974 | __u8 size; /* bytes */ | |
975 | __u16 port; | |
976 | __u32 count; | |
977 | __u64 data_offset; /* relative to kvm_run start */ | |
978 | } io; | |
979 | ||
2044892d | 980 | If exit_reason is KVM_EXIT_IO, then the vcpu has |
9c1b96e3 AK |
981 | executed a port I/O instruction which could not be satisfied by kvm. |
982 | data_offset describes where the data is located (KVM_EXIT_IO_OUT) or | |
983 | where kvm expects application code to place the data for the next | |
2044892d | 984 | KVM_RUN invocation (KVM_EXIT_IO_IN). Data format is a packed array. |
9c1b96e3 AK |
985 | |
986 | struct { | |
987 | struct kvm_debug_exit_arch arch; | |
988 | } debug; | |
989 | ||
990 | Unused. | |
991 | ||
992 | /* KVM_EXIT_MMIO */ | |
993 | struct { | |
994 | __u64 phys_addr; | |
995 | __u8 data[8]; | |
996 | __u32 len; | |
997 | __u8 is_write; | |
998 | } mmio; | |
999 | ||
2044892d | 1000 | If exit_reason is KVM_EXIT_MMIO, then the vcpu has |
9c1b96e3 AK |
1001 | executed a memory-mapped I/O instruction which could not be satisfied |
1002 | by kvm. The 'data' member contains the written data if 'is_write' is | |
1003 | true, and should be filled by application code otherwise. | |
1004 | ||
ad0a048b AG |
1005 | NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO and KVM_EXIT_OSI, the corresponding |
1006 | operations are complete (and guest state is consistent) only after userspace | |
1007 | has re-entered the kernel with KVM_RUN. The kernel side will first finish | |
67961344 MT |
1008 | incomplete operations and then check for pending signals. Userspace |
1009 | can re-enter the guest with an unmasked signal pending to complete | |
1010 | pending operations. | |
1011 | ||
9c1b96e3 AK |
1012 | /* KVM_EXIT_HYPERCALL */ |
1013 | struct { | |
1014 | __u64 nr; | |
1015 | __u64 args[6]; | |
1016 | __u64 ret; | |
1017 | __u32 longmode; | |
1018 | __u32 pad; | |
1019 | } hypercall; | |
1020 | ||
647dc49e AK |
1021 | Unused. This was once used for 'hypercall to userspace'. To implement |
1022 | such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390). | |
1023 | Note that KVM_EXIT_IO is significantly faster than KVM_EXIT_MMIO. | |
9c1b96e3 AK |
1024 | |
1025 | /* KVM_EXIT_TPR_ACCESS */ | |
1026 | struct { | |
1027 | __u64 rip; | |
1028 | __u32 is_write; | |
1029 | __u32 pad; | |
1030 | } tpr_access; | |
1031 | ||
1032 | To be documented (KVM_TPR_ACCESS_REPORTING). | |
1033 | ||
1034 | /* KVM_EXIT_S390_SIEIC */ | |
1035 | struct { | |
1036 | __u8 icptcode; | |
1037 | __u64 mask; /* psw upper half */ | |
1038 | __u64 addr; /* psw lower half */ | |
1039 | __u16 ipa; | |
1040 | __u32 ipb; | |
1041 | } s390_sieic; | |
1042 | ||
1043 | s390 specific. | |
1044 | ||
1045 | /* KVM_EXIT_S390_RESET */ | |
1046 | #define KVM_S390_RESET_POR 1 | |
1047 | #define KVM_S390_RESET_CLEAR 2 | |
1048 | #define KVM_S390_RESET_SUBSYSTEM 4 | |
1049 | #define KVM_S390_RESET_CPU_INIT 8 | |
1050 | #define KVM_S390_RESET_IPL 16 | |
1051 | __u64 s390_reset_flags; | |
1052 | ||
1053 | s390 specific. | |
1054 | ||
1055 | /* KVM_EXIT_DCR */ | |
1056 | struct { | |
1057 | __u32 dcrn; | |
1058 | __u32 data; | |
1059 | __u8 is_write; | |
1060 | } dcr; | |
1061 | ||
1062 | powerpc specific. | |
1063 | ||
ad0a048b AG |
1064 | /* KVM_EXIT_OSI */ |
1065 | struct { | |
1066 | __u64 gprs[32]; | |
1067 | } osi; | |
1068 | ||
1069 | MOL (Mac-on-Linux) uses a special hypercall interface it calls 'OSI'. To enable | |
1070 | it, we catch hypercalls and exit with this exit struct that contains all the guest gprs. | |
1071 | ||
1072 | If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall. | |
1073 | Userspace can now handle the hypercall and, when it's done, modify the gprs as | |
1074 | necessary. Upon guest entry all guest GPRs will then be replaced by the values | |
1075 | in this struct. | |
1076 | ||
9c1b96e3 AK |
1077 | /* Fix the size of the union. */ |
1078 | char padding[256]; | |
1079 | }; | |
1080 | }; |
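
To make the KVM_EXIT_IO and KVM_EXIT_MMIO descriptions above more concrete, here is a minimal sketch of an exit handler. Everything about the "device" is an assumption made for illustration: the port number, the MMIO address, the two backing registers and the name handle_exit() are hypothetical. Only the struct kvm_run fields and the KVM_EXIT_* constants come from the kvm headers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/kvm.h>

/* Hypothetical device model: a byte-wide register behind one I/O port and
 * an 8-byte scratch register at a made-up guest physical address. */
#define DEMO_PORT      0x3f8
#define DEMO_MMIO_ADDR 0xfe000000ULL

static uint8_t demo_port_reg;
static uint8_t demo_mmio_reg[8];

/* Handle one exit reported in *run. Returns 0 if handled, -1 otherwise.
 * The operation completes only when the caller re-enters with KVM_RUN. */
static int handle_exit(struct kvm_run *run)
{
    assert(run != NULL);
    switch (run->exit_reason) {
    case KVM_EXIT_IO: {
        /* Packed array of count items of size bytes each, located
         * data_offset bytes from the start of kvm_run. */
        uint8_t *data = (uint8_t *)run + run->io.data_offset;
        uint32_t i;

        if (run->io.port != DEMO_PORT || run->io.size != 1)
            return -1;
        for (i = 0; i < run->io.count; i++) {
            if (run->io.direction == KVM_EXIT_IO_OUT)
                demo_port_reg = data[i];   /* guest wrote this byte */
            else
                data[i] = demo_port_reg;   /* guest sees it after KVM_RUN */
        }
        return 0;
    }
    case KVM_EXIT_MMIO:
        if (run->mmio.phys_addr != DEMO_MMIO_ADDR ||
            run->mmio.len > sizeof(run->mmio.data))
            return -1;
        if (run->mmio.is_write)
            memcpy(demo_mmio_reg, run->mmio.data, run->mmio.len);
        else
            memcpy(run->mmio.data, demo_mmio_reg, run->mmio.len);
        return 0;
    default:
        return -1;   /* exit reason not covered by this sketch */
    }
}
```

A caller would typically invoke such a handler in a loop after each successful KVM_RUN and stop when it returns -1. Per the NOTE above, data placed in the buffer for an IN or a read is consumed by the guest only on the next KVM_RUN.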