Commit | Line | Data |
---|---|---|
9703d9d7 CM |
1 | Booting AArch64 Linux |
2 | ===================== | |
3 | ||
4 | Author: Will Deacon <will.deacon@arm.com> | |
5 | Date : 07 September 2012 | |
6 | ||
7 | This document is based on the ARM booting document by Russell King and | |
8 | is relevant to all public releases of the AArch64 Linux kernel. | |
9 | ||
10 | The AArch64 exception model is made up of a number of exception levels | |
11 | (EL0 - EL3), with EL0 and EL1 having a secure and a non-secure | |
12 | counterpart. EL2 is the hypervisor level and exists only in non-secure | |
13 | mode. EL3 is the highest priority level and exists only in secure mode. | |
14 | ||
15 | For the purposes of this document, we will use the term `boot loader' | |
16 | simply to define all software that executes on the CPU(s) before control | |
17 | is passed to the Linux kernel. This may include secure monitor and | |
18 | hypervisor code, or it may just be a handful of instructions for | |
19 | preparing a minimal boot environment. | |
20 | ||
21 | Essentially, the boot loader should provide (as a minimum) the | |
22 | following: | |
23 | ||
24 | 1. Set up and initialise the RAM | |
25 | 2. Set up the device tree | |
26 | 3. Decompress the kernel image | |
27 | 4. Call the kernel image | |
28 | ||
29 | ||
30 | 1. Set up and initialise RAM | |
31 | ---------------------------- | |
32 | ||
33 | Requirement: MANDATORY | |
34 | ||
35 | The boot loader is expected to find and initialise all RAM that the | |
36 | kernel will use for volatile data storage in the system. It performs | |
37 | this in a machine dependent manner. (It may use internal algorithms | |
38 | to automatically locate and size all RAM, or it may use knowledge of | |
39 | the RAM in the machine, or any other method the boot loader designer | |
40 | sees fit.) | |
41 | ||
42 | ||
43 | 2. Set up the device tree | |
44 | ------------------------- | |
45 | ||
46 | Requirement: MANDATORY | |
47 | ||
4d5e0b15 MS |
48 | The device tree blob (dtb) must be placed on an 8-byte boundary within |
49 | the first 512 megabytes from the start of the kernel image and must not | |
50 | cross a 2-megabyte boundary. This is to allow the kernel to map the | |
9703d9d7 CM |
51 | blob using a single section mapping in the initial page tables. |
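
The placement rules above can be sketched as a small check. This is an
illustrative sketch only: dtb_placement_ok() and its parameters are
hypothetical names, and it assumes that "within the first 512 megabytes"
means the whole blob lies in the 512MB window that starts at the kernel
image. Address overflow is not handled.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_2M   (UINT64_C(2) << 20)
#define SZ_512M (UINT64_C(512) << 20)

/*
 * Hypothetical helper: returns true if a blob of dtb_size bytes at
 * physical address dtb_addr meets the placement rules relative to the
 * start of the kernel image.
 */
static bool dtb_placement_ok(uint64_t kernel_start, uint64_t dtb_addr,
                             uint64_t dtb_size)
{
    if (dtb_size == 0)
        return false;

    /* The blob must start on an 8-byte boundary. */
    if (dtb_addr & 7)
        return false;

    /* Assumption: the whole blob sits inside the 512MB window. */
    if (dtb_addr < kernel_start ||
        dtb_addr + dtb_size > kernel_start + SZ_512M)
        return false;

    /* First and last byte must fall in the same 2MB block. */
    if (dtb_addr / SZ_2M != (dtb_addr + dtb_size - 1) / SZ_2M)
        return false;

    return true;
}
```

A boot loader could run such a check after choosing a dtb address and
pick another location if it fails.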
52 | ||
53 | ||
54 | 3. Decompress the kernel image | |
55 | ------------------------------ | |
56 | ||
57 | Requirement: OPTIONAL | |
58 | ||
59 | The AArch64 kernel does not currently provide a decompressor and | |
60 | therefore requires decompression (gzip etc.) to be performed by the boot | |
61 | loader if a compressed Image target (e.g. Image.gz) is used. For | |
62 | bootloaders that do not implement this requirement, the uncompressed | |
63 | Image target is available instead. | |
64 | ||
65 | ||
66 | 4. Call the kernel image | |
67 | ------------------------ | |
68 | ||
69 | Requirement: MANDATORY | |
70 | ||
4370eec0 | 71 | The decompressed kernel image contains a 64-byte header as follows: |
9703d9d7 | 72 | |
4370eec0 RF |
73 | u32 code0; /* Executable code */ |
74 | u32 code1; /* Executable code */ | |
a2c1d73b MR |
75 | u64 text_offset; /* Image load offset, little endian */ |
76 | u64 image_size; /* Effective Image size, little endian */ | |
77 | u64 flags; /* kernel flags, little endian */ | |
9703d9d7 | 78 | u64 res2 = 0; /* reserved */ |
4370eec0 RF |
79 | u64 res3 = 0; /* reserved */ |
80 | u64 res4 = 0; /* reserved */ | |
81 | u32 magic = 0x644d5241; /* Magic number, little endian, "ARM\x64" */ | |
a2c1d73b | 82 | u32 res5; /* reserved (used for PE COFF offset) */ |
4370eec0 RF |
83 | |
84 | ||
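For illustration, the layout above might be written as a C structure
with a simple magic-number check. This is a sketch, not the kernel's
own definition: the names arm64_image_header, ARM64_IMAGE_MAGIC and
image_magic_ok() are chosen here for the example, and the size check
assumes a C11 compiler with natural alignment and no padding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative mirror of the 64-byte header described above. */
struct arm64_image_header {
    uint32_t code0;        /* Executable code */
    uint32_t code1;        /* Executable code */
    uint64_t text_offset;  /* Image load offset, little endian */
    uint64_t image_size;   /* Effective Image size, little endian */
    uint64_t flags;        /* Kernel flags, little endian */
    uint64_t res2;         /* Reserved */
    uint64_t res3;         /* Reserved */
    uint64_t res4;         /* Reserved */
    uint32_t magic;        /* "ARM\x64", little endian */
    uint32_t res5;         /* Reserved (used for PE COFF offset) */
};

#define ARM64_IMAGE_MAGIC UINT32_C(0x644d5241)

static_assert(sizeof(struct arm64_image_header) == 64,
              "header must be 64 bytes");
static_assert(offsetof(struct arm64_image_header, magic) == 56,
              "magic must be at byte offset 56");

/*
 * Hypothetical helper: the magic is read byte by byte as little endian
 * so the check gives the same answer on any host endianness.
 */
static bool image_magic_ok(const uint8_t *image, size_t len)
{
    const uint8_t *p;
    uint32_t magic;

    if (len < sizeof(struct arm64_image_header))
        return false;

    p = image + offsetof(struct arm64_image_header, magic);
    magic = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
            ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);

    return magic == ARM64_IMAGE_MAGIC;
}
```
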
85 | Header notes: | |
86 | ||
a2c1d73b MR |
87 | - As of v3.17, all fields are little endian unless stated otherwise. |
88 | ||
4370eec0 | 89 | - code0/code1 are responsible for branching to stext. |
a2c1d73b | 90 | |
cdd78578 MS |
91 | - When booting through EFI, code0/code1 are initially skipped. |
92 | res5 is an offset to the PE header and the PE header has the EFI | |
a2c1d73b | 93 | entry point (efi_stub_entry). When the stub has done its work, it |
cdd78578 | 94 | jumps to code0 to resume the normal boot process. |
9703d9d7 | 95 | |
a2c1d73b MR |
96 | - Prior to v3.17, the endianness of text_offset was not specified. In |
97 | these cases image_size is zero and text_offset is 0x80000 in the | |
98 | endianness of the kernel. Where image_size is non-zero, image_size is | |
99 | little-endian and must be respected. Where image_size is zero, | |
100 | text_offset can be assumed to be 0x80000. | |
101 | ||
102 | - The flags field (introduced in v3.17) is a little-endian 64-bit field | |
103 | composed as follows: | |
104 | Bit 0: Kernel endianness. 1 if BE, 0 if LE. | |
105 | Bits 1-63: Reserved. | |
106 | ||
107 | - When image_size is zero, a bootloader should attempt to keep as much | |
108 | memory as possible free for use by the kernel immediately after the | |
109 | end of the kernel image. The amount of space required will vary | |
110 | depending on selected features, and is effectively unbounded. | |
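
The header notes above could be applied by a boot loader roughly as
follows. This is a hedged sketch: struct image_info, read_le64() and
parse_image_header() are hypothetical names, the byte offsets are taken
from the header layout (text_offset at 8, image_size at 16, flags at
24), and the flags field is treated as meaningful only when image_size
is non-zero, which is an assumption drawn from both fields being
introduced in v3.17.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEFAULT_TEXT_OFFSET UINT64_C(0x80000)
#define IMAGE_FLAG_BE       (UINT64_C(1) << 0)  /* Bit 0: 1 if BE */

/* Hypothetical summary of the fields a boot loader cares about. */
struct image_info {
    uint64_t text_offset;
    uint64_t image_size;   /* 0 means "not provided" */
    bool flags_valid;      /* false for pre-v3.17 style headers */
    bool big_endian;       /* only meaningful if flags_valid */
};

/* Read a 64-bit little-endian value independent of host endianness. */
static uint64_t read_le64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* hdr must point at the 64-byte image header. */
static struct image_info parse_image_header(const uint8_t *hdr)
{
    struct image_info info;

    info.image_size = read_le64(hdr + 16);
    if (info.image_size == 0) {
        /* Endianness of text_offset unspecified: assume 0x80000. */
        info.text_offset = DEFAULT_TEXT_OFFSET;
        info.flags_valid = false;
        info.big_endian = false;
    } else {
        info.text_offset = read_le64(hdr + 8);
        info.flags_valid = true;
        info.big_endian = (read_le64(hdr + 24) & IMAGE_FLAG_BE) != 0;
    }
    return info;
}
```

When image_size comes back as zero, the caller would additionally want
to leave as much memory as possible free after the end of the image, as
the last note describes.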
111 | ||
112 | The Image must be placed text_offset bytes from a 2MB aligned base | |
113 | address near the start of usable system RAM and called there. Memory | |
114 | below that base address is currently unusable by Linux, and therefore it | |
115 | is strongly recommended that this location is the start of system RAM. | |
116 | At least image_size bytes from the start of the image must be free for | |
117 | use by the kernel. | |
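
As a worked example of the placement rule above, the load address can
be computed by rounding the start of usable RAM up to a 2MB boundary
and adding text_offset. The function names and the ram_start/ram_end
parameters are assumptions made for this sketch, and ram_end is taken
to be an exclusive end address.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_2M (UINT64_C(2) << 20)

/*
 * Hypothetical helper: pick the lowest 2MB aligned base at or above
 * ram_start and return the address at which the Image should be placed.
 */
static uint64_t image_load_address(uint64_t ram_start, uint64_t text_offset)
{
    uint64_t base = (ram_start + SZ_2M - 1) & ~(SZ_2M - 1);

    return base + text_offset;
}

/*
 * Hypothetical helper: check that image_size bytes from the load
 * address are available before ram_end (exclusive).
 */
static bool image_fits(uint64_t load_addr, uint64_t image_size,
                       uint64_t ram_end)
{
    return load_addr <= ram_end && image_size <= ram_end - load_addr;
}
```

With RAM starting at 0x80000000 and a text_offset of 0x80000, this
gives a load address of 0x80080000.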
118 | ||
119 | Any memory described to the kernel (even that below the 2MB aligned base | |
120 | address) which is not marked as reserved from the kernel (e.g. with a | |
121 | memreserve region in the device tree) will be considered as available to | |
122 | the kernel. | |
9703d9d7 CM |
123 | |
124 | Before jumping into the kernel, the following conditions must be met: | |
125 | ||
126 | - Quiesce all DMA capable devices so that memory does not get | |
127 | corrupted by bogus network packets or disk data. This will save | |
128 | you many hours of debug. | |
129 | ||
130 | - Primary CPU general-purpose register settings | |
131 | x0 = physical address of device tree blob (dtb) in system RAM. | |
132 | x1 = 0 (reserved for future use) | |
133 | x2 = 0 (reserved for future use) | |
134 | x3 = 0 (reserved for future use) | |
135 | ||
136 | - CPU mode | |
137 | All forms of interrupts must be masked in PSTATE.DAIF (Debug, SError, | |
138 | IRQ and FIQ). | |
139 | The CPU must be in either EL2 (RECOMMENDED in order to have access to | |
140 | the virtualisation extensions) or non-secure EL1. | |
141 | ||
142 | - Caches, MMUs | |
143 | The MMU must be off. | |
144 | Instruction cache may be on or off. | |
c218bca7 CM |
145 | The address range corresponding to the loaded kernel image must be |
146 | cleaned to the PoC. In the presence of a system cache or other | |
147 | coherent masters with caches enabled, this will typically require | |
148 | cache maintenance by VA rather than set/way operations. | |
149 | System caches which respect the architected cache maintenance by VA | |
150 | operations must be configured and may be enabled. | |
151 | System caches which do not respect architected cache maintenance by VA | |
152 | operations (not recommended) must be configured and disabled. | |
9703d9d7 CM |
153 | |
154 | - Architected timers | |
4fcd6e14 MR |
155 | CNTFRQ must be programmed with the timer frequency and CNTVOFF must |
156 | be programmed with a consistent value on all CPUs. If entering the | |
157 | kernel at EL1, CNTHCTL_EL2 must have EL1PCTEN (bit 0) set where | |
158 | available. | |
9703d9d7 CM |
159 | |
160 | - Coherency | |
161 | All CPUs to be booted by the kernel must be part of the same coherency | |
162 | domain on entry to the kernel. This may require IMPLEMENTATION DEFINED | |
163 | initialisation to enable the receiving of maintenance operations on | |
164 | each CPU. | |
165 | ||
166 | - System registers | |
167 | All writable architected system registers at the exception level where | |
168 | the kernel image will be entered must be initialised by software at a | |
169 | higher exception level to prevent execution in an UNKNOWN state. | |
170 | ||
63f8344c MZ |
171 | For systems with a GICv3 interrupt controller: |
172 | - If EL3 is present: | |
173 | ICC_SRE_EL3.Enable (bit 3) must be initialised to 0b1. | |
174 | ICC_SRE_EL3.SRE (bit 0) must be initialised to 0b1. | |
175 | - If the kernel is entered at EL1: | |
176 | ICC_SRE_EL2.Enable (bit 3) must be initialised to 0b1. | |
177 | ICC_SRE_EL2.SRE (bit 0) must be initialised to 0b1. | |
178 | ||
4fcd6e14 MR |
179 | The requirements described above for CPU mode, caches, MMUs, architected |
180 | timers, coherency and system registers apply to all CPUs. All CPUs must | |
181 | enter the kernel in the same exception level. | |
182 | ||
9703d9d7 CM |
183 | The boot loader is expected to enter the kernel on each CPU in the |
184 | following manner: | |
185 | ||
186 | - The primary CPU must jump directly to the first instruction of the | |
187 | kernel image. The device tree blob passed by this CPU must contain | |
4fcd6e14 MR |
188 | an 'enable-method' property for each cpu node. The supported |
189 | enable-methods are described below. | |
9703d9d7 CM |
190 | |
191 | It is expected that the bootloader will generate these device tree | |
192 | properties and insert them into the blob prior to kernel entry. | |
193 | ||
4fcd6e14 MR |
194 | - CPUs with a "spin-table" enable-method must have a 'cpu-release-addr' |
195 | property in their cpu node. This property identifies a | |
196 | naturally-aligned 64-bit zero-initialised memory location. | |
197 | ||
198 | These CPUs should spin outside of the kernel in a reserved area of | |
199 | memory (communicated to the kernel by a /memreserve/ region in the | |
9703d9d7 CM |
200 | device tree) polling their cpu-release-addr location, which must be |
201 | contained in the reserved region. A wfe instruction may be inserted | |
202 | to reduce the overhead of the busy-loop and a sev will be issued by | |
203 | the primary CPU. When a read of the location pointed to by the | |
4fcd6e14 MR |
204 | cpu-release-addr returns a non-zero value, the CPU must jump to this |
205 | value. The value will be written as a single 64-bit little-endian | |
206 | value, so CPUs must convert the read value to their native endianness | |
207 | before jumping to it. | |
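
The spin-table polling described above might look roughly like the
following on a secondary CPU. This is a portable sketch under stated
assumptions: le64_to_cpu() and wait_for_release() are hypothetical
names, the wfe instruction is shown only as a comment because it
requires inline assembly, and the final jump to the returned address is
left to the caller.

```c
#include <stdint.h>
#include <string.h>

/*
 * Convert a value that was stored as little endian into the native
 * byte order of the running CPU, without testing the host endianness.
 */
static uint64_t le64_to_cpu(uint64_t raw)
{
    uint8_t b[8];
    uint64_t v = 0;

    memcpy(b, &raw, sizeof(b));
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}

/*
 * Poll the cpu-release-addr location until it reads non-zero, then
 * return the entry address in native endianness.
 */
static uint64_t wait_for_release(const volatile uint64_t *release_addr)
{
    uint64_t raw;

    while ((raw = *release_addr) == 0) {
        /* On real hardware a wfe could be issued here. */
    }
    return le64_to_cpu(raw);
}
```

The caller would then branch to the returned address with x0 to x3 set
to zero, as required for secondary CPUs.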
208 | ||
209 | - CPUs with a "psci" enable-method should remain outside of | |
210 | the kernel (i.e. outside of the regions of memory described to the | |
211 | kernel in the memory node, or in a reserved area of memory described | |
212 | to the kernel by a /memreserve/ region in the device tree). The | |
213 | kernel will issue CPU_ON calls as described in ARM document number ARM | |
214 | DEN 0022A ("Power State Coordination Interface System Software on ARM | |
215 | processors") to bring CPUs into the kernel. | |
216 | ||
217 | The device tree should contain a 'psci' node, as described in | |
218 | Documentation/devicetree/bindings/arm/psci.txt. | |
9703d9d7 CM |
219 | |
220 | - Secondary CPU general-purpose register settings | |
221 | x0 = 0 (reserved for future use) | |
222 | x1 = 0 (reserved for future use) | |
223 | x2 = 0 (reserved for future use) | |
224 | x3 = 0 (reserved for future use) |