Commit | Line | Data |
---|---|---|
352f7bae AK |
1 | Most of the text from Keith Owens, hacked by AK |
2 | ||
3 | x86_64 page size (PAGE_SIZE) is 4K. | |
4 | ||
5 | Like all other architectures, x86_64 has a kernel stack for every | |
6 | active thread. These thread stacks are THREAD_SIZE (2*PAGE_SIZE) big. | |
7 | These stacks contain useful data as long as a thread is alive or a | |
8 | zombie. While the thread is in user space the kernel stack is empty | |
9 | except for the thread_info structure at the bottom. | |
10 | ||
11 | In addition to the per thread stacks, there are specialized stacks | |
57d30772 RD |
12 | associated with each CPU. These stacks are only used while the kernel |
13 | is in control on that CPU; when a CPU returns to user space the | |
14 | specialized stacks contain no useful data. The main CPU stacks are: | |
352f7bae AK |
15 | |
16 | * Interrupt stack. IRQSTACKSIZE | |
17 | ||
18 | Used for external hardware interrupts. If this is the first external | |
19 | hardware interrupt (i.e. not a nested hardware interrupt) then the | |
20 | kernel switches from the current task's stack to the interrupt stack. Like | |
7974891d CH |
21 | the split thread and interrupt stacks on i386, this gives more room |
22 | for kernel interrupt processing without having to increase the size | |
23 | of every per thread stack. | |
352f7bae AK |
24 | |
25 | The interrupt stack is also used when processing a softirq. | |
26 | ||
27 | Switching to the kernel interrupt stack is done by software based on a | |
28 | per CPU interrupt nest counter. This is needed because x86_64 "IST" | |
29 | hardware stacks cannot nest without races. | |
30 | ||
31 | x86_64 also has a feature which is not available on i386, the ability | |
32 | to automatically switch to a new stack for designated events such as | |
33 | double fault or NMI, which makes it easier to handle these unusual | |
34 | events on x86_64. This feature is called the Interrupt Stack Table | |
57d30772 RD |
35 | (IST). There can be up to 7 IST entries per CPU. The IST code is an |
36 | index into the Task State Segment (TSS). The IST entries in the TSS | |
37 | point to dedicated stacks; each stack can be a different size. | |
352f7bae | 38 | |
57d30772 | 39 | An IST is selected by a non-zero value in the IST field of an |
352f7bae AK |
40 | interrupt-gate descriptor. When an interrupt occurs and the hardware |
41 | loads such a descriptor, the hardware automatically sets the new stack | |
42 | pointer based on the IST value, then invokes the interrupt handler. If | |
43 | software wants to allow nested IST interrupts then the handler must | |
44 | adjust the IST values on entry to and exit from the interrupt handler. | |
57d30772 | 45 | (This is occasionally done, e.g. for debug exceptions.) |
352f7bae AK |
46 | |
47 | Events with different IST codes (i.e. with different stacks) can be | |
48 | nested. For example, a debug interrupt can safely be interrupted by an | |
49 | NMI. arch/x86_64/kernel/entry.S::paranoidentry adjusts the stack | |
50 | pointers on entry to and exit from all IST events, in theory allowing | |
51 | IST events with the same code to be nested. However, in most cases, the | |
52 | stack size allocated to an IST assumes no nesting for the same code. | |
53 | If that assumption is ever broken then the stacks will become corrupt. | |
54 | ||
55 | The currently assigned IST stacks are: | |
56 | ||
57 | * STACKFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE). | |
58 | ||
59 | Used for interrupt 12 - Stack Fault Exception (#SS). | |
60 | ||
57d30772 | 61 | This allows the CPU to recover from invalid stack segments. This rarely |
352f7bae AK |
62 | happens. |
63 | ||
64 | * DOUBLEFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE). | |
65 | ||
66 | Used for interrupt 8 - Double Fault Exception (#DF). | |
67 | ||
57d30772 RD |
68 | Invoked when handling one exception causes another exception. This happens |
69 | when the kernel is very confused (e.g. kernel stack pointer corrupt). | |
70 | Using a separate stack allows the kernel to recover from it well enough | |
71 | in many cases to still output an oops. | |
352f7bae AK |
72 | |
73 | * NMI_STACK. EXCEPTION_STKSZ (PAGE_SIZE). | |
74 | ||
75 | Used for non-maskable interrupts (NMI). | |
76 | ||
77 | NMI can be delivered at any time, including when the kernel is in the | |
78 | middle of switching stacks. Using IST for NMI events avoids making | |
79 | assumptions about the previous state of the kernel stack. | |
80 | ||
81 | * DEBUG_STACK. DEBUG_STKSZ | |
82 | ||
83 | Used for hardware debug interrupts (interrupt 1) and for software | |
84 | debug interrupts (INT3). | |
85 | ||
86 | When debugging a kernel, debug interrupts (both hardware and | |
87 | software) can occur at any time. Using IST for these interrupts | |
88 | avoids making assumptions about the previous state of the kernel | |
89 | stack. | |
90 | ||
91 | * MCE_STACK. EXCEPTION_STKSZ (PAGE_SIZE). | |
92 | ||
93 | Used for interrupt 18 - Machine Check Exception (#MC). | |
94 | ||
95 | MCE can be delivered at any time, including when the kernel is in the | |
96 | middle of switching stacks. Using IST for MCE events avoids making | |
97 | assumptions about the previous state of the kernel stack. | |
98 | ||
99 | For more details see the Intel IA32 or AMD AMD64 architecture manuals. |
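
The IST selection rule described above can be sketched in C. This is a
minimal model for illustration only, not kernel code: struct tss_model,
struct gate_model and select_stack() are hypothetical names, and the
structures keep only the fields needed to show how a non-zero IST value
in an interrupt-gate descriptor picks one of the (up to 7) dedicated
stacks recorded in the TSS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IST_ENTRIES 7   /* up to 7 IST entries per CPU */

/* Hypothetical, simplified stand-in for the per-CPU TSS: IST slots only. */
struct tss_model {
    uint64_t ist[IST_ENTRIES];   /* ist[0] is the stack pointer for IST code 1 */
};

/* Hypothetical, simplified interrupt-gate descriptor: IST field only. */
struct gate_model {
    uint8_t ist;   /* 0 = no IST switch; 1..7 selects a TSS IST entry */
};

/*
 * Return the stack pointer the handler would start with.
 * current_sp models the stack pointer chosen by the normal (non-IST) rules.
 */
uint64_t select_stack(const struct tss_model *tss,
                      const struct gate_model *gate,
                      uint64_t current_sp)
{
    assert(tss != NULL && gate != NULL);

    uint8_t code = gate->ist & 0x7;   /* the IST field is 3 bits wide */

    if (code == 0)
        return current_sp;            /* no IST: keep the normal stack */
    return tss->ist[code - 1];        /* IST codes are 1-based */
}
```

In this model every event with the same IST code starts at the same
stack pointer, which is why a nested event with the same code would
overwrite the frame of the event it interrupted unless the handler
adjusts the IST value on entry and exit.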