
KVM/ARM VGIC Forwarded Physical Interrupts
==========================================

The KVM/ARM code implements software support for the ARM Generic
Interrupt Controller's (GIC's) hardware support for virtualization by
allowing software to inject virtual interrupts to a VM, which the guest
OS sees as regular interrupts.  The code is famously known as the VGIC.

Some of these virtual interrupts, however, correspond to physical
interrupts from real physical devices.  One example is the
architected timer, which itself supports virtualization, and therefore
lets a guest OS program the hardware device directly to raise an
interrupt at some point in time.  When such an interrupt is raised, the
host OS initially handles the interrupt and must somehow signal this
event as a virtual interrupt to the guest.  Another example is a
passthrough device, where the physical interrupts are initially handled
by the host, but the device driver for the device lives in the guest OS
and KVM must therefore somehow inject a virtual interrupt into the guest
OS on behalf of the physical one.

These virtual interrupts corresponding to a physical interrupt on the
host are called forwarded physical interrupts, but are also sometimes
referred to as 'virtualized physical interrupts' or 'mapped interrupts'.

Forwarded physical interrupts are handled slightly differently compared
to virtual interrupts generated purely by a software-emulated device.


The HW bit
----------
Virtual interrupts are signalled to the guest by programming the List
Registers (LRs) on the GIC before running a VCPU.  The LR is programmed
with the virtual IRQ number and the state of the interrupt (Pending,
Active, or Pending+Active).  When the guest ACKs and EOIs a virtual
interrupt, the LR state moves from Pending to Active, and finally to
inactive.
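As an illustration only, the LR state transitions for a purely virtual
interrupt can be modelled in a few lines of C.  The enum values mirror
the two-bit State field of a GICv2 List Register; the type and function
names are hypothetical and are not the kernel's VGIC code.  The sketch
assumes the guest's EOI also deactivates the interrupt, and treats an
ACK as valid only from the Pending state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an LR's state; the values mirror the two-bit
 * State field of a GICv2 List Register. */
enum lr_state {
	LR_INACTIVE       = 0,
	LR_PENDING        = 1,
	LR_ACTIVE         = 2,
	LR_PENDING_ACTIVE = 3,
};

/* Guest reads the IAR: a Pending interrupt becomes Active.  Returns false
 * if this LR has nothing to acknowledge. */
bool lr_guest_ack(enum lr_state *s)
{
	if (*s != LR_PENDING)
		return false;
	*s = LR_ACTIVE;
	return true;
}

/* Guest EOI, assumed to both drop priority and deactivate: the Active
 * part of the state is cleared, any Pending part is kept. */
void lr_guest_eoi(enum lr_state *s)
{
	if (*s == LR_ACTIVE)
		*s = LR_INACTIVE;
	else if (*s == LR_PENDING_ACTIVE)
		*s = LR_PENDING;
}
```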

The LRs include an extra bit, called the HW bit.  When this bit is set,
KVM must also program an additional field in the LR, the physical IRQ
number, to link the virtual IRQ to the physical IRQ.

When the HW bit is set, KVM must EITHER set the Pending OR the Active
bit, never both at the same time.
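A minimal sketch of building an LR value, assuming the GICv2 GICH_LR
layout (VirtualID in bits [9:0], PhysicalID in bits [19:10] when HW is
set, State in bits [29:28], HW in bit 31).  The helper lr_encode() and
the macro names are hypothetical, and priority and group bits are left
at zero for brevity.  The helper refuses Pending+Active when the HW bit
is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field layout assumed from the GICv2 GICH_LR register. */
#define LR_VIRTUALID_MASK 0x3ffu
#define LR_PHYSID_SHIFT   10
#define LR_PHYSID_MASK    (0x3ffu << LR_PHYSID_SHIFT)
#define LR_PENDING_BIT    (1u << 28)
#define LR_ACTIVE_BIT     (1u << 29)
#define LR_HW_BIT         (1u << 31)

/* Builds an LR value in *lr.  Returns false, leaving *lr untouched, if
 * the request would set both Pending and Active together with HW. */
bool lr_encode(uint32_t *lr, uint32_t virq, bool hw, uint32_t pirq,
	       bool pending, bool active)
{
	uint32_t val;

	if (hw && pending && active)
		return false;

	val = virq & LR_VIRTUALID_MASK;
	if (pending)
		val |= LR_PENDING_BIT;
	if (active)
		val |= LR_ACTIVE_BIT;
	if (hw)	/* link the virtual IRQ to the physical IRQ */
		val |= LR_HW_BIT | ((pirq << LR_PHYSID_SHIFT) & LR_PHYSID_MASK);

	*lr = val;
	return true;
}
```

A purely virtual interrupt (hw == false) may still be encoded as
Pending+Active; only the HW case is restricted.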

Setting the HW bit causes the hardware to deactivate the physical
interrupt on the physical distributor when the guest deactivates the
corresponding virtual interrupt.


Forwarded Physical Interrupts Life Cycle
----------------------------------------

The state of forwarded physical interrupts is managed in the following way:

  - The physical interrupt is acked by the host, and becomes active on
    the physical distributor (*).
  - KVM sets the LR.Pending bit, because this is the only way the GICV
    interface is going to present it to the guest.
  - LR.Pending will stay set as long as the guest has not acked the interrupt.
  - LR.Pending transitions to LR.Active on the guest read of the IAR, as
    expected.
  - On guest EOI, the *physical distributor* active bit gets cleared,
    but the LR.Active is left untouched (set).
  - KVM clears the LR on VM exits when the physical distributor
    active state has been cleared.
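The life cycle above can be sketched as a small state model of one
forwarded interrupt and its LR.  Everything here (struct fwd_irq and the
helper functions) is a hypothetical userspace illustration of the listed
steps, not KVM's implementation; it tracks only the physical active bit
and the LR bits the list mentions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one forwarded interrupt and one LR with HW set. */
struct fwd_irq {
	bool phys_active;	/* active on the physical distributor */
	bool lr_valid;		/* LR currently in use */
	bool lr_pending;	/* LR.Pending */
	bool lr_active;		/* LR.Active */
};

/* Host acks the physical interrupt, which becomes active. */
void host_ack(struct fwd_irq *i) { i->phys_active = true; }

/* KVM queues the interrupt as Pending; this requires the physical
 * interrupt to already be active. */
void kvm_inject(struct fwd_irq *i)
{
	assert(i->phys_active);
	i->lr_valid = true;
	i->lr_pending = true;
	i->lr_active = false;
}

/* Guest reads the IAR: LR.Pending -> LR.Active. */
void guest_ack(struct fwd_irq *i)
{
	if (i->lr_pending) {
		i->lr_pending = false;
		i->lr_active = true;
	}
}

/* Guest EOI: the physical active bit is cleared; LR.Active stays set,
 * per the description above. */
void guest_eoi(struct fwd_irq *i)
{
	if (i->lr_active)
		i->phys_active = false;
}

/* On VM exit, KVM retires the LR once the physical active state is gone. */
void kvm_sync_on_exit(struct fwd_irq *i)
{
	if (i->lr_valid && !i->phys_active) {
		i->lr_valid = false;
		i->lr_pending = false;
		i->lr_active = false;
	}
}
```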

(*): The host handling is slightly more complicated.  For some forwarded
interrupts (shared), KVM directly sets the active state on the physical
distributor before entering the guest, because the interrupt is never actually
handled on the host (see details on the timer as an example below).  For other
forwarded interrupts (non-shared) the host does not deactivate the interrupt
when the host ISR completes, but leaves the interrupt active until the guest
deactivates it.  Leaving the interrupt active is allowed, because Linux
configures the physical GIC with EOIMode=1, in which an EOI operation only
performs a priority drop and deactivation is a separate step.  This allows
the GIC to signal other interrupts of the default priority while the
forwarded interrupt remains active.
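To illustrate why EOIMode=1 makes it safe to leave the interrupt active,
the following hypothetical sketch models a GIC CPU interface in which an
EOI only drops the running priority and deactivation is a separate step.
It keeps a single running priority instead of the architectural priority
stack, and 0xa0 is just an example priority value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IDLE_PRIORITY 0xffu	/* running priority with nothing in service */

/* Hypothetical, single-level model of a CPU interface with EOImode=1. */
struct cpuif {
	uint8_t running_prio;
};

struct phys_irq {
	uint8_t prio;	/* lower value means higher priority */
	bool active;
};

/* An interrupt is signalled only if it is not active and its priority
 * is higher than the current running priority. */
bool can_signal(const struct cpuif *c, const struct phys_irq *i)
{
	return !i->active && i->prio < c->running_prio;
}

void ack(struct cpuif *c, struct phys_irq *i)
{
	assert(can_signal(c, i));
	i->active = true;
	c->running_prio = i->prio;
}

/* EOImode=1: the EOI write only drops the running priority. */
void eoi_priority_drop(struct cpuif *c) { c->running_prio = IDLE_PRIORITY; }

/* Separate deactivation, e.g. by hardware when the guest deactivates the
 * corresponding HW-bit virtual interrupt. */
void deactivate(struct phys_irq *i) { i->active = false; }
```

After eoi_priority_drop(), another interrupt of the same priority can be
signalled, while the forwarded interrupt itself stays blocked until
deactivate() is called.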


Forwarded Edge and Level Triggered PPIs and SPIs
------------------------------------------------
Forwarded physical interrupts should always be active on the physical
distributor when injected into a guest.

Level-triggered interrupts will keep the interrupt line to the GIC
asserted, typically until the guest programs the device to deassert the
line.  This means that the interrupt will remain pending on the physical
distributor until the guest has reprogrammed the device.  Since we
always run the VM with interrupts enabled on the CPU, a pending
interrupt will force a VM exit as soon as we switch into the guest,
preventing the guest from ever making progress as the process repeats
over and over.  Therefore, the active state on the physical distributor
must be set when entering the guest, preventing the GIC from forwarding
the pending interrupt to the CPU.  As soon as the guest deactivates the
interrupt, the physical line is sampled by the hardware again and the host
takes a new interrupt if and only if the physical line is still asserted.
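The level-triggered case can be sketched with a hypothetical model in
which the pending state simply follows the device's line and the
distributor signals the CPU only for an interrupt that is pending and
not active.  The names are illustrative, and the model ignores
priorities and enable bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one level-triggered physical interrupt: for a
 * level-sensitive input, the pending state follows the device's line. */
struct level_irq {
	bool line;	/* device is asserting its interrupt line */
	bool active;	/* active bit on the physical distributor */
};

/* The distributor signals the CPU (forcing a VM exit, since the VM runs
 * with interrupts enabled) only for a pending, non-active interrupt. */
bool forwards_to_cpu(const struct level_irq *i)
{
	return i->line && !i->active;
}

/* Guest deactivation via the HW bit clears the physical active bit, and
 * the line is sampled again.  Returns true if the host takes a new
 * physical interrupt. */
bool guest_deactivate(struct level_irq *i)
{
	i->active = false;
	return forwards_to_cpu(i);
}
```

Without the active bit set on entry, forwards_to_cpu() stays true for as
long as the device holds its line asserted, which is the endless exit
loop described above.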

Edge-triggered interrupts do not exhibit the same problem with
preventing guest execution that level-triggered interrupts do.  One
option is to not use the HW bit at all, and inject edge-triggered interrupts
from a physical device as pure virtual interrupts.  But that would
potentially slow down handling of the interrupt in the guest, because a
physical interrupt occurring in the middle of the guest ISR would
preempt the guest for the host to handle the interrupt.  Additionally,
if you configure the system to handle interrupts on a separate physical
core from that running your VCPU, you still have to interrupt the VCPU
to queue the pending state onto the LR, even though the guest won't use
this information until the guest ISR completes.  Therefore, the HW
bit should always be set for forwarded edge-triggered interrupts.  With
the HW bit set, the virtual interrupt is injected and additional
physical interrupts occurring before the guest deactivates the interrupt
simply mark the state on the physical distributor as Pending+Active.  As
soon as the guest deactivates the interrupt, the host takes another
interrupt if and only if there was a physical interrupt between injecting
the forwarded interrupt to the guest and the guest deactivating the
interrupt.
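For the edge-triggered case with the HW bit set, a similar hypothetical
model shows a second edge being latched as Pending+Active while the
guest handles the first, and the host taking another interrupt only if
such an edge occurred.  The names are illustrative, and priorities are
ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one forwarded edge-triggered interrupt. */
struct edge_irq {
	bool pending;	/* latched by an edge */
	bool active;	/* active bit on the physical distributor */
};

/* A physical edge latches the pending state, even while active. */
void edge(struct edge_irq *i) { i->pending = true; }

/* Host acknowledges: Pending -> Active.  The interrupt then stays active
 * while it is forwarded to the guest with the HW bit set. */
void host_ack(struct edge_irq *i)
{
	assert(i->pending && !i->active);
	i->pending = false;
	i->active = true;
}

bool signals_cpu(const struct edge_irq *i) { return i->pending && !i->active; }

/* Guest deactivation via the HW bit.  Returns true if the host now takes
 * another physical interrupt. */
bool guest_deactivate(struct edge_irq *i)
{
	i->active = false;
	return signals_cpu(i);
}
```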

Consequently, whenever we schedule a VCPU with one or more LRs with the
HW bit set, the interrupt must also be active on the physical
distributor.
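The rule above amounts to a consistency check that could run before
entering the guest.  The sketch below is hypothetical: struct lr_entry
and hw_lrs_consistent() are invented for illustration, and phys_active
stands in for reading the active state of the linked physical interrupt
from the distributor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-LR bookkeeping. */
struct lr_entry {
	bool valid;		/* LR in use */
	bool hw;		/* HW bit set */
	bool phys_active;	/* linked physical interrupt is active */
};

/* Returns true if every valid LR with the HW bit set refers to a
 * physical interrupt that is active on the distributor. */
bool hw_lrs_consistent(const struct lr_entry *lrs, size_t n)
{
	for (size_t k = 0; k < n; k++)
		if (lrs[k].valid && lrs[k].hw && !lrs[k].phys_active)
			return false;
	return true;
}
```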


Forwarded LPIs
--------------
LPIs, introduced in GICv3, are always edge-triggered and do not have an
active state.  They become pending when a device signals them, and as
soon as they are acked by the CPU, they are inactive again.

It therefore doesn't make sense, and is not supported, to set the HW bit
for physical LPIs that are forwarded to a VM as virtual interrupts,
typically virtual SPIs.

For LPIs, there is no other choice than to preempt the VCPU thread if
necessary, and queue the pending state onto the LR.
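A hypothetical helper can capture the forwarding choice by INTID range,
using the architectural ranges for PPIs (16-31), SPIs (32-1019) and LPIs
(8192 and up).  It does not cover the extended PPI/SPI ranges added in
GICv3.1, and the names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PPI_BASE 16u
#define SPI_LAST 1019u
#define LPI_BASE 8192u

enum fwd_method {
	FWD_HW_LR,		/* HW bit set, physical active state handed over */
	FWD_PURE_VIRTUAL,	/* plain virtual interrupt, kick the VCPU if needed */
};

/* PPIs and SPIs have an active state the HW bit can rely on. */
bool hw_bit_allowed(uint32_t intid)
{
	return intid >= PPI_BASE && intid <= SPI_LAST;
}

bool is_lpi(uint32_t intid) { return intid >= LPI_BASE; }

/* LPIs have no active state, so they are always injected as pure
 * virtual interrupts.  Other INTID ranges are outside this sketch. */
enum fwd_method choose_forwarding(uint32_t phys_intid)
{
	assert(hw_bit_allowed(phys_intid) || is_lpi(phys_intid));
	return hw_bit_allowed(phys_intid) ? FWD_HW_LR : FWD_PURE_VIRTUAL;
}
```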


Putting It Together: The Architected Timer
------------------------------------------
The architected timer is a device that signals interrupts with
level-triggered semantics.  The timer hardware is directly accessed by
VCPUs, which program the timer to fire at some point in time.  Each VCPU on a
system programs the timer to fire at different times, and therefore the
hardware is multiplexed between multiple VCPUs.  This is implemented by
context-switching the timer state along with each VCPU thread.

However, this means that a scenario like the following is entirely
possible, and in fact, typical:

1.  KVM runs the VCPU
2.  The guest programs the timer to fire in T+100
3.  The guest is idle and calls WFI (Wait For Interrupt)
4.  The hardware traps to the host
5.  KVM stores the timer state to memory and disables the hardware timer
6.  KVM schedules a soft timer to fire in T+(100 - time since step 2)
7.  KVM puts the VCPU thread to sleep (on a waitqueue)
8.  The soft timer fires, waking up the VCPU thread
9.  KVM reprograms the timer hardware with the VCPU's values
10. KVM marks the timer interrupt as active on the physical distributor
11. KVM injects a forwarded physical interrupt to the guest
12. KVM runs the VCPU

Notice that KVM injects a forwarded physical interrupt in step 11 without
the corresponding interrupt having actually fired on the host.  That is
exactly why we mark the timer interrupt as active in step 10, because
the active state on the physical distributor is part of the state
belonging to the timer hardware, which is context-switched along with
the VCPU thread.
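The idle path can be sketched as follows, with the timer's saved compare
value standing in for T+100.  The struct and function names are
hypothetical; the sketch only shows the soft-timer delay computed in
step 6 and the ordering of steps 9 to 11, where the physical active
state is set before the forwarded interrupt is injected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical saved timer context; times are in counter ticks. */
struct vtimer_ctx {
	uint64_t cval;		/* compare value the guest programmed */
	bool hw_loaded;		/* VCPU's timer state is in the hardware */
	bool phys_active;	/* timer interrupt active on the distributor */
	bool injected;		/* forwarded interrupt queued to an LR */
};

/* Steps 5-6: save and disable, then return the soft-timer delay, i.e.
 * the remaining time until the deadline (0 if already passed). */
uint64_t vcpu_block(struct vtimer_ctx *t, uint64_t now)
{
	t->hw_loaded = false;
	return t->cval > now ? t->cval - now : 0;
}

/* Injecting with the HW bit requires the physical active state. */
void inject_forwarded(struct vtimer_ctx *t)
{
	assert(t->phys_active);
	t->injected = true;
}

/* Steps 9-11, once the soft timer has woken the VCPU thread. */
void vcpu_wake(struct vtimer_ctx *t)
{
	t->hw_loaded = true;	/* step 9: reprogram the timer hardware */
	t->phys_active = true;	/* step 10: mark active on the distributor */
	inject_forwarded(t);	/* step 11: inject the forwarded interrupt */
}
```

For example, a guest that programmed the timer at counter value 1000 to
fire at 1100 and executed WFI at 1030 gets a soft-timer delay of 70.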

If the guest does not idle because it is busy, the flow looks like this
instead:

1.  KVM runs the VCPU
2.  The guest programs the timer to fire in T+100
3.  At T+100 the timer fires and a physical IRQ causes the VM to exit
    (note that this initially only traps to EL2 and does not run the host ISR
    until KVM has returned to the host).
4.  With interrupts still disabled on the CPU coming back from the guest, KVM
    stores the virtual timer state to memory and disables the virtual hardware
    timer.
5.  KVM looks at the timer state (in memory) and injects a forwarded physical
    interrupt because it concludes the timer has expired.
6.  KVM marks the timer interrupt as active on the physical distributor
7.  KVM enables the timer, enables interrupts, and runs the VCPU

Notice that again the forwarded physical interrupt is injected into the
guest without having actually been handled on the host.  In this case,
the host never actually sees the physical interrupt, because the timer
is disabled upon guest return and the virtual forwarded interrupt is
injected on the KVM guest entry path.
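In the busy case, the decision that the timer has expired is made from
the saved state.  The sketch below assumes the architectural
CNTV_CTL_EL0 bits (ENABLE in bit 0, IMASK in bit 1) and treats the timer
as firing when it is enabled, not masked, and the counter has reached
the compare value.  The struct and function names are hypothetical, and
disabling or re-enabling the hardware timer is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CNTV_CTL_EL0 control bits. */
#define CNTV_CTL_ENABLE (1u << 0)
#define CNTV_CTL_IMASK  (1u << 1)

/* Hypothetical in-memory copy of the guest's virtual timer. */
struct saved_vtimer {
	uint32_t ctl;
	uint64_t cval;
};

/* Has the saved timer expired with its interrupt unmasked? */
bool timer_should_fire(const struct saved_vtimer *t, uint64_t now)
{
	if ((t->ctl & (CNTV_CTL_ENABLE | CNTV_CTL_IMASK)) != CNTV_CTL_ENABLE)
		return false;	/* disabled or masked by the guest */
	return t->cval <= now;
}

/* Exit path with interrupts still disabled: decide whether to inject a
 * forwarded interrupt and mark the physical interrupt active. */
void handle_exit(const struct saved_vtimer *t, uint64_t now,
		 bool *inject, bool *phys_active)
{
	assert(inject && phys_active);
	*inject = timer_should_fire(t, now);
	if (*inject)
		*phys_active = true;
}
```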