KVM/ARM VGIC Forwarded Physical Interrupts
==========================================

The KVM/ARM code implements software support for the ARM Generic
Interrupt Controller's (GIC's) hardware support for virtualization by
allowing software to inject virtual interrupts to a VM, which the guest
OS sees as regular interrupts.  The code is commonly known as the VGIC.

Some of these virtual interrupts, however, correspond to physical
interrupts from real physical devices.  One example could be the
architected timer, which itself supports virtualization, and therefore
lets a guest OS program the hardware device directly to raise an
interrupt at some point in time.  When such an interrupt is raised, the
host OS initially handles the interrupt and must somehow signal this
event as a virtual interrupt to the guest.  Another example could be a
passthrough device, where the physical interrupts are initially handled
by the host, but the device driver for the device lives in the guest OS
and KVM must therefore somehow inject a virtual interrupt on behalf of
the physical one to the guest OS.

These virtual interrupts corresponding to a physical interrupt on the
host are called forwarded physical interrupts, but are also sometimes
referred to as 'virtualized physical interrupts' and 'mapped interrupts'.

Forwarded physical interrupts are handled slightly differently compared
to virtual interrupts generated purely by a software emulated device.


The HW bit
----------
Virtual interrupts are signalled to the guest by programming the List
Registers (LRs) on the GIC before running a VCPU.  The LR is programmed
with the virtual IRQ number and the state of the interrupt (Pending,
Active, or Pending+Active).  When the guest ACKs and EOIs a virtual
interrupt, the LR state moves from Pending to Active, and finally to
inactive.

The LRs include an extra bit, called the HW bit.  When this bit is set,
KVM must also program an additional field in the LR, the physical IRQ
number, to link the virtual with the physical IRQ.

When the HW bit is set, KVM must EITHER set the Pending OR the Active
bit, never both at the same time.

Setting the HW bit causes the hardware to deactivate the physical
interrupt on the physical distributor when the guest deactivates the
corresponding virtual interrupt.
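
As a rough illustration of the fields described above, the following
sketch packs a list register value using the GICv2 GICH_LR layout
(virtual ID in bits [9:0], physical ID in bits [19:10], state in bits
[29:28], HW in bit 31).  The helper lr_encode_hw() and the LR_* macro
names are made up for this example and are not KVM's own; the priority
and group fields are left at zero for brevity.

```c
#include <stdint.h>

/* GICv2 GICH_LR field layout; the macro names are illustrative only. */
#define LR_VIRTID_MASK   0x3ffu
#define LR_PHYSID_SHIFT  10
#define LR_PHYSID_MASK   0x3ffu
#define LR_STATE_PENDING (1u << 28)
#define LR_STATE_ACTIVE  (1u << 29)
#define LR_HW            (1u << 31)

/*
 * Hypothetical helper: build an LR value for a forwarded interrupt.
 * With the HW bit set, exactly one of Pending or Active may be
 * requested, so any other state value is rejected with -1.
 */
static int lr_encode_hw(uint32_t virq, uint32_t pirq, uint32_t state,
                        uint32_t *lr)
{
    if (state != LR_STATE_PENDING && state != LR_STATE_ACTIVE)
        return -1;

    *lr = LR_HW | state |
          ((pirq & LR_PHYSID_MASK) << LR_PHYSID_SHIFT) |
          (virq & LR_VIRTID_MASK);
    return 0;
}
```

Rejecting Pending+Active in the helper keeps the "never both" rule in
one place instead of relying on every caller to remember it.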


Forwarded Physical Interrupts Life Cycle
----------------------------------------

The state of forwarded physical interrupts is managed in the following way:

  - The physical interrupt is acked by the host, and becomes active on
    the physical distributor (*).
  - KVM sets the LR.Pending bit, because this is the only way the GICV
    interface is going to present it to the guest.
  - LR.Pending will stay set as long as the guest has not acked the interrupt.
  - LR.Pending transitions to LR.Active on the guest read of the IAR, as
    expected.
  - On guest EOI, the *physical distributor* active bit gets cleared,
    but the LR.Active is left untouched (set).
  - KVM clears the LR on VM exits when the physical distributor
    active state has been cleared.

(*): The host handling is slightly more complicated.  For some forwarded
interrupts (shared), KVM directly sets the active state on the physical
distributor before entering the guest, because the interrupt is never actually
handled on the host (see details on the timer as an example below).  For other
forwarded interrupts (non-shared) the host does not deactivate the interrupt
when the host ISR completes, but leaves the interrupt active until the guest
deactivates it.  Leaving the interrupt active is allowed, because Linux
configures the physical GIC with EOIMode=1, which causes EOI operations to
perform only a priority drop without deactivating the interrupt, allowing the
GIC to receive other interrupts of the default priority.
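
The life cycle above can be read as a small state machine.  The sketch
below models one forwarded interrupt with plain booleans; struct fwd_irq
and all function names are invented for this illustration and do not
correspond to KVM's data structures.

```c
#include <stdbool.h>

/* Illustrative model of one forwarded interrupt; not KVM's own types. */
struct fwd_irq {
    bool phys_active;  /* active bit on the physical distributor */
    bool lr_in_use;    /* an LR is currently programmed for this IRQ */
    bool lr_pending;   /* LR.Pending */
    bool lr_active;    /* LR.Active */
};

/* Host acks the physical interrupt (or KVM sets the state directly). */
static void host_ack(struct fwd_irq *irq)
{
    irq->phys_active = true;
}

/* KVM programs an LR with HW=1 and only the Pending bit set. */
static void kvm_inject(struct fwd_irq *irq)
{
    irq->lr_in_use = true;
    irq->lr_pending = true;
    irq->lr_active = false;
}

/* Guest reads the IAR: LR.Pending transitions to LR.Active. */
static void guest_ack(struct fwd_irq *irq)
{
    if (irq->lr_pending) {
        irq->lr_pending = false;
        irq->lr_active = true;
    }
}

/* Guest EOI: the physical active bit is cleared, LR.Active stays set. */
static void guest_eoi(struct fwd_irq *irq)
{
    if (irq->lr_active)
        irq->phys_active = false;
}

/* On VM exit, KVM clears the LR once the physical active state is gone. */
static void kvm_sync_on_exit(struct fwd_irq *irq)
{
    if (irq->lr_in_use && !irq->phys_active) {
        irq->lr_in_use = false;
        irq->lr_pending = false;
        irq->lr_active = false;
    }
}
```

Calling kvm_sync_on_exit() before the guest has EOIed the interrupt
leaves the LR in place, which mirrors the rule that the LR is only
cleared after the physical active state has been cleared.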


Forwarded Edge and Level Triggered PPIs and SPIs
------------------------------------------------
Forwarded physical interrupts should always be active on the physical
distributor when they are injected into a guest.

Level-triggered interrupts will keep the interrupt line to the GIC
asserted, typically until the guest programs the device to deassert the
line.  This means that the interrupt will remain pending on the physical
distributor until the guest has reprogrammed the device.  Since we
always run the VM with interrupts enabled on the CPU, a pending
interrupt will exit the guest as soon as we switch into the guest,
preventing the guest from ever making progress as the process repeats
over and over.  Therefore, the active state on the physical distributor
must be set when entering the guest, preventing the GIC from forwarding
the pending interrupt to the CPU.  As soon as the guest deactivates the
interrupt, the physical line is sampled by the hardware again and the host
takes a new interrupt if and only if the physical line is still asserted.
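
The argument for level-triggered interrupts boils down to one rule: the
distributor only signals a pending interrupt to the CPU while that
interrupt is not active.  The following is a deliberately simplified
model of that rule (it ignores enable bits, priorities and targeting);
struct level_irq and level_signals_cpu() are names chosen for this
sketch only.

```c
#include <stdbool.h>

/* Simplified view of one level-triggered physical interrupt. */
struct level_irq {
    bool line;    /* the device is asserting its interrupt line */
    bool active;  /* active bit on the physical distributor */
};

/*
 * For a level-triggered interrupt the pending state follows the line.
 * The interrupt reaches the CPU only if it is pending and not active.
 */
static bool level_signals_cpu(const struct level_irq *irq)
{
    return irq->line && !irq->active;
}
```

With line asserted and active clear, entering the guest would cause an
immediate exit; setting active first suppresses that until the guest
deactivates the interrupt.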

Edge-triggered interrupts do not exhibit the same problem with
preventing guest execution that level-triggered interrupts do.  One
option is to not use the HW bit at all, and inject edge-triggered
interrupts from a physical device as pure virtual interrupts.  But that
would potentially slow down handling of the interrupt in the guest,
because a physical interrupt occurring in the middle of the guest ISR
would preempt the guest for the host to handle the interrupt.
Additionally, if you configure the system to handle interrupts on a
separate physical core from that running your VCPU, you still have to
interrupt the VCPU to queue the pending state onto the LR, even though
the guest won't use this information until the guest ISR completes.
Therefore, the HW bit should always be set for forwarded edge-triggered
interrupts.  With the HW bit set, the virtual interrupt is injected and
additional physical interrupts occurring before the guest deactivates
the interrupt simply mark the state on the physical distributor as
Pending+Active.  As soon as the guest deactivates the interrupt, the
host takes another interrupt if and only if there was a physical
interrupt between injecting the forwarded interrupt to the guest and the
guest deactivating the interrupt.
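
The Pending+Active behaviour for edge-triggered interrupts can be
sketched in the same simplified style.  The model below tracks only the
pending and active bits of one interrupt on the physical distributor;
struct edge_irq and the edge_*() helpers are hypothetical names used
for illustration.

```c
#include <stdbool.h>

/* Simplified view of one edge-triggered physical interrupt. */
struct edge_irq {
    bool pending;
    bool active;
};

/* A new edge latches the pending bit; repeated edges coalesce. */
static void edge_arrives(struct edge_irq *irq)
{
    irq->pending = true;
}

/* Host acks the interrupt: pending is consumed, active is set. */
static void edge_host_ack(struct edge_irq *irq)
{
    irq->pending = false;
    irq->active = true;
}

/*
 * Guest deactivates the virtual interrupt; with the HW bit set this
 * clears the physical active bit.  Returns true if the host now takes
 * another interrupt, i.e. an edge arrived while the guest was busy.
 */
static bool edge_guest_deactivate(struct edge_irq *irq)
{
    irq->active = false;
    return irq->pending;
}
```

Edges that arrive while the interrupt is active never disturb the
running VCPU in this model; they only become visible to the host after
the guest has deactivated the interrupt.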

Consequently, whenever we schedule a VCPU with one or more LRs with the
HW bit set, the interrupt must also be active on the physical
distributor.
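
This invariant is easy to express as a check over the LRs that are about
to be loaded.  The sketch below assumes a flat array of LR descriptors
and a boolean table of physical active states indexed by physical IRQ
number; both representations, and the name hw_lrs_consistent(), are
assumptions made for this example rather than anything KVM provides.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal LR descriptor for this sketch. */
struct lr_model {
    bool hw;            /* HW bit */
    unsigned int pirq;  /* physical IRQ number, valid when hw is set */
};

/*
 * Returns true if every LR with the HW bit set refers to a physical
 * interrupt that is currently active on the physical distributor.
 */
static bool hw_lrs_consistent(const struct lr_model *lrs, size_t nr_lrs,
                              const bool *phys_active, size_t nr_irqs)
{
    for (size_t i = 0; i < nr_lrs; i++) {
        if (!lrs[i].hw)
            continue;
        if (lrs[i].pirq >= nr_irqs || !phys_active[lrs[i].pirq])
            return false;
    }
    return true;
}
```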


Forwarded LPIs
--------------
LPIs, introduced in GICv3, are always edge-triggered and do not have an
active state.  They become pending when a device signals them, and as
soon as they are acked by the CPU, they are inactive again.

It therefore doesn't make sense, and is not supported, to set the HW bit
for physical LPIs that are forwarded to a VM as virtual interrupts,
typically virtual SPIs.

For LPIs, there is no other choice than to preempt the VCPU thread if
necessary, and queue the pending state onto the LR.


Putting It Together: The Architected Timer
------------------------------------------
The architected timer is a device that signals interrupts with
level-triggered semantics.  The timer hardware is directly accessed by
VCPUs which program the timer to fire at some point in time.  Each VCPU
on a system programs the timer to fire at different times, and therefore
the hardware is multiplexed between multiple VCPUs.  This is implemented
by context-switching the timer state along with each VCPU thread.

However, this means that a scenario like the following is entirely
possible, and in fact, typical:

1.  KVM runs the VCPU
2.  The guest programs the timer to fire in T+100
3.  The guest is idle and calls WFI (wait-for-interrupt)
4.  The hardware traps to the host
5.  KVM stores the timer state to memory and disables the hardware timer
6.  KVM schedules a soft timer to fire in T+(100 - time since step 2)
7.  KVM puts the VCPU thread to sleep (on a waitqueue)
8.  The soft timer fires, waking up the VCPU thread
9.  KVM reprograms the timer hardware with the VCPU's values
10. KVM marks the timer interrupt as active on the physical distributor
11. KVM injects a forwarded physical interrupt to the guest
12. KVM runs the VCPU

Notice that KVM injects a forwarded physical interrupt in step 11 without
the corresponding interrupt having actually fired on the host.  That is
exactly why we mark the timer interrupt as active in step 10, because
the active state on the physical distributor is part of the state
belonging to the timer hardware, which is context-switched along with
the VCPU thread.
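
Step 6 amounts to working out how much of the guest's programmed
interval is left when the VCPU blocks.  A minimal sketch, assuming the
saved timer state holds an absolute compare value and that "now" is read
from the same counter (the function name and the choice of raw counter
ticks as the unit are assumptions of this example):

```c
#include <stdint.h>

/*
 * Ticks remaining until the guest's compare value is reached.
 * Returns 0 if the deadline has already passed, in which case the
 * VCPU does not need to sleep at all.
 */
static uint64_t ticks_until_expiry(uint64_t cval, uint64_t now)
{
    return cval > now ? cval - now : 0;
}
```

For example, a timer programmed at counter value 1000 to fire 100 ticks
later has a compare value of 1100; if the guest traps at 1030, the soft
timer is armed for the remaining 70 ticks.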

If the guest does not idle because it is busy, the flow looks like this
instead:

1.  KVM runs the VCPU
2.  The guest programs the timer to fire in T+100
3.  At T+100 the timer fires and a physical IRQ causes the VM to exit
    (note that this initially only traps to EL2 and does not run the host ISR
    until KVM has returned to the host).
4.  With interrupts still disabled on the CPU coming back from the guest, KVM
    stores the virtual timer state to memory and disables the virtual hw timer.
5.  KVM looks at the timer state (in memory) and injects a forwarded physical
    interrupt because it concludes the timer has expired.
6.  KVM marks the timer interrupt as active on the physical distributor
7.  KVM enables the timer, enables interrupts, and runs the VCPU

Notice that again the forwarded physical interrupt is injected to the
guest without having actually been handled on the host.  In this case
the host never actually sees the physical interrupt, because the timer
is disabled upon guest return, and the virtual forwarded interrupt is
injected on the KVM guest entry path.
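
The decision KVM makes from the in-memory timer state on the guest entry
path can be sketched as a pure function of the saved control register
and compare value.  The ENABLE (bit 0) and IMASK (bit 1) positions
follow the architected timer control register; struct saved_timer and
timer_should_fire() are names invented for this illustration, and the
real code may take further conditions into account.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architected timer control register bits. */
#define TIMER_CTL_ENABLE (1u << 0)
#define TIMER_CTL_IMASK  (1u << 1)

/* Hypothetical snapshot of the timer state saved on guest exit. */
struct saved_timer {
    uint32_t ctl;   /* saved control register */
    uint64_t cval;  /* saved compare value */
};

/*
 * Returns true if the saved state says the timer has expired and its
 * interrupt is neither disabled nor masked, i.e. a forwarded interrupt
 * should be injected before running the VCPU.
 */
static bool timer_should_fire(const struct saved_timer *t, uint64_t now)
{
    if (!(t->ctl & TIMER_CTL_ENABLE) || (t->ctl & TIMER_CTL_IMASK))
        return false;
    return t->cval <= now;
}
```

Because the check works on the saved copy, it gives the same answer
whether or not the hardware timer is currently enabled, which is what
allows the interrupt to be injected without the host ever taking it.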