\input texinfo
@setfilename ldint.info

@ifinfo
@format
START-INFO-DIR-ENTRY
* Ld-Internals: (ldint). The GNU linker internals.
END-INFO-DIR-ENTRY
@end format
@end ifinfo

@ifinfo
This file documents the internals of the GNU linker ld.

Copyright (C) 1992, 93, 94, 95, 96, 97, 1998 Free Software Foundation, Inc.
Contributed by Cygnus Support.

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

@ignore
Permission is granted to process this file through Tex and print the
results, provided the printed document carries copying permission
notice identical to this one except for the removal of this paragraph
(this paragraph not being relevant to the printed manual).

@end ignore
Permission is granted to copy or distribute modified versions of this
manual under the terms of the GPL (for which purpose this text may be
regarded as a program in the language TeX).
@end ifinfo

@iftex
@finalout
@setchapternewpage off
@settitle GNU Linker Internals
@titlepage
@title{A guide to the internals of the GNU linker}
@author Per Bothner, Steve Chamberlain, Ian Lance Taylor, DJ Delorie
@author Cygnus Support
@page

@tex
\def\$#1${{#1}} % Kluge: collect RCS revision info without $...$
\xdef\manvers{2.10.91} % For use in headers, footers too
{\parskip=0pt
\hfill Cygnus Support\par
\hfill \manvers\par
\hfill \TeX{}info \texinfoversion\par
}
@end tex

@vskip 0pt plus 1filll
Copyright @copyright{} 1992, 93, 94, 95, 96, 97, 1998
Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

@end titlepage
@end iftex

@node Top
@top

This file documents the internals of the GNU linker @code{ld}. It is a
collection of miscellaneous information with little form at this point.
Mostly, it is a repository into which you can put information about
GNU @code{ld} as you discover it (or as you design changes to @code{ld}).

@menu
* README:: The README File
* Emulations:: How linker emulations are generated
* Emulation Walkthrough:: A Walkthrough of a Typical Emulation
@end menu

@node README
@chapter The @file{README} File

Check the @file{README} file; it often has useful information that does not
appear anywhere else in the directory.

@node Emulations
@chapter How linker emulations are generated

Each linker target has an @dfn{emulation}. The emulation includes the
default linker script, and some emulations also modify certain aspects
of linker behaviour.

Emulations are created during the build process by the shell script
@file{genscripts.sh}.

The @file{genscripts.sh} script starts by reading a file in the
@file{emulparams} directory. This is a shell script which sets various
shell variables used by @file{genscripts.sh} and the other shell scripts
it invokes.

The @file{genscripts.sh} script will invoke a shell script in the
@file{scripttempl} directory in order to create default linker scripts
written in the linker command language. The @file{scripttempl} script
will be invoked 5 (or, in some cases, 6) times, with different
assignments to shell variables, to create different default scripts.
The linker selects one of these scripts at run time based on its
command-line options.

After creating the scripts, @file{genscripts.sh} will invoke yet another
shell script, this time in the @file{emultempl} directory. That shell
script will create the emulation source file, which contains C code.
This C code permits the linker emulation to override various linker
behaviours. Most targets use the generic emulation code, which is in
@file{emultempl/generic.em}.

To summarize, @file{genscripts.sh} reads three shell scripts: an
emulation parameters script in the @file{emulparams} directory, a linker
script generation script in the @file{scripttempl} directory, and an
emulation source file generation script in the @file{emultempl}
directory.

For example, the Sun 4 linker sets up variables in
@file{emulparams/sun4.sh}, creates linker scripts using
@file{scripttempl/aout.sc}, and creates the emulation code using
@file{emultempl/sunos.em}.

Note that the linker can support several emulations simultaneously,
depending upon how it is configured. An emulation can be selected with
the @code{-m} option. The @code{-V} option will list all supported
emulations.
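
As a rough illustration of what @code{-m} and @code{-V} do, the following C sketch keeps a table of compiled-in emulations and looks one up by name. The structure, table, and function names are hypothetical, as is the pairing of emulation names with BFD targets (@samp{example} and @samp{elf32-example} are made up). The real linker works with @code{struct ld_emulation_xfer_struct} from @file{ldemul.h}, and its @code{-V} output format may differ.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, cut-down emulation descriptor. */
struct emul_desc {
  const char *emulation_name; /* the name -m matches against */
  const char *target_name;    /* default BFD target (illustrative) */
};

/* Hypothetical emulations compiled into this linker; the first entry
   plays the role of the default emulation. */
static const struct emul_desc emulations[] = {
  { "sun4", "a.out-sunos-big" },
  { "example", "elf32-example" }, /* made-up entry */
};

static const size_t n_emulations = sizeof emulations / sizeof emulations[0];

/* Model of -m NAME: NULL (no -m given) selects the default emulation;
   an unsupported NAME yields NULL. */
const struct emul_desc *select_emulation(const char *name)
{
  size_t i;

  assert(n_emulations > 0);
  if (name == NULL)
    return &emulations[0];
  for (i = 0; i < n_emulations; i++)
    if (strcmp(emulations[i].emulation_name, name) == 0)
      return &emulations[i];
  return NULL;
}

/* Model of -V: print every supported emulation name. */
void list_emulations(FILE *out)
{
  size_t i;

  fputs("Supported emulations:\n", out);
  for (i = 0; i < n_emulations; i++)
    fprintf(out, "  %s\n", emulations[i].emulation_name);
}
```
@end verbatim

In this model an unknown name yields a null pointer, which the caller must treat as an error.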

@menu
* emulation parameters:: @file{emulparams} scripts
* linker scripts:: @file{scripttempl} scripts
* linker emulations:: @file{emultempl} scripts
@end menu

@node emulation parameters
@section @file{emulparams} scripts

Each target selects a particular file in the @file{emulparams} directory
by setting the shell variable @code{targ_emul} in @file{configure.tgt}.
This shell variable is used by the @file{configure} script to control
building an emulation source file.

Certain conventions are enforced. Suppose the @code{targ_emul} variable
is set to @var{emul} in @file{configure.tgt}. The name of the emulation
shell script will be @file{emulparams/@var{emul}.sh}. The
@file{Makefile} must have a target named @file{e@var{emul}.c}; this
target must depend upon @file{emulparams/@var{emul}.sh}, as well as the
appropriate scripts in the @file{scripttempl} and @file{emultempl}
directories. The @file{Makefile} target must invoke @code{GENSCRIPTS}
with two arguments: @var{emul}, and the value of the make variable
@code{tdir_@var{emul}}. The value of the latter variable will be set by
the @file{configure} script, and is used to set the default target
directory to search.
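
Because these naming conventions are mechanical, they can be written down as a small C function. This is only a sketch: @code{struct emul_names}, @code{derive_emul_names} and the helper @code{fill} are hypothetical, and the fixed 64-byte buffers are an arbitrary choice.

@verbatim
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* File and variable names derived from one targ_emul value. */
struct emul_names {
  char params_script[64]; /* emulparams/EMUL.sh */
  char make_target[64];   /* eEMUL.c, the Makefile target */
  char tdir_var[64];      /* tdir_EMUL, second GENSCRIPTS argument */
};

/* Store FMT with EMUL substituted into BUF; nonzero if it fit. */
static int fill(char *buf, size_t len, const char *fmt, const char *emul)
{
  int n = snprintf(buf, len, fmt, emul);
  return n >= 0 && (size_t)n < len;
}

/* Returns 0 on success, -1 if any name would be truncated. */
int derive_emul_names(const char *emul, struct emul_names *out)
{
  assert(emul != NULL && out != NULL);
  if (!fill(out->params_script, sizeof out->params_script,
            "emulparams/%s.sh", emul)
      || !fill(out->make_target, sizeof out->make_target, "e%s.c", emul)
      || !fill(out->tdir_var, sizeof out->tdir_var, "tdir_%s", emul))
    return -1;
  return 0;
}
```
@end verbatim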

By convention, the @file{emulparams/@var{emul}.sh} shell script should
only set shell variables. It may set shell variables which are to be
interpreted by the @file{scripttempl} and the @file{emultempl} scripts.
Certain shell variables are interpreted directly by the
@file{genscripts.sh} script.

Here is a list of shell variables interpreted by @file{genscripts.sh},
as well as some conventional shell variables interpreted by the
@file{scripttempl} and @file{emultempl} scripts.

@table @code
@item SCRIPT_NAME
This is the name of the @file{scripttempl} script to use. If
@code{SCRIPT_NAME} is set to @var{script}, @file{genscripts.sh} will use
the script @file{scripttempl/@var{script}.sc}.

@item TEMPLATE_NAME
This is the name of the @file{emultempl} script to use. If
@code{TEMPLATE_NAME} is set to @var{template}, @file{genscripts.sh} will
use the script @file{emultempl/@var{template}.em}. If this variable is
not set, the default value is @samp{generic}.

@item GENERATE_SHLIB_SCRIPT
If this is set to a nonempty string, @file{genscripts.sh} will invoke
the @file{scripttempl} script an extra time to create a shared library
script. @xref{linker scripts}.

@item OUTPUT_FORMAT
This is normally set to indicate the BFD output format to use (e.g.,
@samp{"a.out-sunos-big"}). The @file{scripttempl} script will normally
use it in an @code{OUTPUT_FORMAT} expression in the linker script.

@item ARCH
This is normally set to indicate the architecture to use (e.g.,
@samp{sparc}). The @file{scripttempl} script will normally use it in an
@code{OUTPUT_ARCH} expression in the linker script.

@item ENTRY
Some @file{scripttempl} scripts use this to set the entry address, in an
@code{ENTRY} expression in the linker script.

@item TEXT_START_ADDR
Some @file{scripttempl} scripts use this to set the start address of the
@samp{.text} section.

@item NONPAGED_TEXT_START_ADDR
If this is defined, the @file{genscripts.sh} script sets
@code{TEXT_START_ADDR} to its value before running the
@file{scripttempl} script for the @code{-n} and @code{-N} options
(@pxref{linker scripts}).

@item SEGMENT_SIZE
The @file{genscripts.sh} script uses this to set the default value of
@code{DATA_ALIGNMENT} when running the @file{scripttempl} script.

@item TARGET_PAGE_SIZE
If @code{SEGMENT_SIZE} is not defined, the @file{genscripts.sh} script
uses this to define it.

@item ALIGNMENT
Some @file{scripttempl} scripts use this number as the argument to
@code{ALIGN} when setting the required alignment of the @code{end}
symbol.
@end table
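
The defaulting rules in this table can be summarized in C. This is a sketch under assumptions: @code{struct emul_params} and the helper functions are hypothetical, a null pointer stands for an unset shell variable, and the exact text of the default @code{DATA_ALIGNMENT} expression (@samp{ALIGN(@var{segment-size})}) is assumed rather than copied from @file{genscripts.sh}.

@verbatim
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of emulparams settings; NULL means "unset". */
struct emul_params {
  const char *script_name;      /* SCRIPT_NAME */
  const char *template_name;    /* TEMPLATE_NAME */
  const char *segment_size;     /* SEGMENT_SIZE */
  const char *target_page_size; /* TARGET_PAGE_SIZE */
};

/* scripttempl/SCRIPT.sc; this sketch requires SCRIPT_NAME to be set. */
int scripttempl_path(const struct emul_params *p, char *buf, size_t len)
{
  assert(p->script_name != NULL);
  return snprintf(buf, len, "scripttempl/%s.sc", p->script_name);
}

/* emultempl/TEMPLATE.em, with TEMPLATE_NAME defaulting to "generic". */
int emultempl_path(const struct emul_params *p, char *buf, size_t len)
{
  const char *t = p->template_name != NULL ? p->template_name : "generic";
  return snprintf(buf, len, "emultempl/%s.em", t);
}

/* SEGMENT_SIZE, falling back to TARGET_PAGE_SIZE when it is unset. */
const char *effective_segment_size(const struct emul_params *p)
{
  return p->segment_size != NULL ? p->segment_size : p->target_page_size;
}

/* Default DATA_ALIGNMENT: "." for the -N script, otherwise an ALIGN
   expression on the segment size (expression text assumed). */
int data_alignment(const struct emul_params *p, int is_N_script,
                   char *buf, size_t len)
{
  const char *seg;

  if (is_N_script)
    return snprintf(buf, len, ".");
  seg = effective_segment_size(p);
  assert(seg != NULL);
  return snprintf(buf, len, "ALIGN(%s)", seg);
}
```
@end verbatim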

@node linker scripts
@section @file{scripttempl} scripts

Each linker target uses a @file{scripttempl} script to generate the
default linker scripts. The name of the @file{scripttempl} script is
set by the @code{SCRIPT_NAME} variable in the @file{emulparams} script.
If @code{SCRIPT_NAME} is set to @var{script}, @file{genscripts.sh} will
invoke @file{scripttempl/@var{script}.sc}.

The @file{genscripts.sh} script will invoke the @file{scripttempl}
script 5 or 6 times. Each time it will set the shell variable
@code{LD_FLAG} to a different value. When the linker is run, the
options used will direct it to select a particular script. (Script
selection is controlled by the @code{get_script} emulation entry point;
this section describes the conventional behaviour.)

The @file{scripttempl} script should just write a linker script, written
in the linker command language, to standard output. If the emulation
name (the name of the @file{emulparams} file without the @file{.sh}
extension) is @var{emul}, then the output will be directed to
@file{ldscripts/@var{emul}.@var{extension}} in the build directory,
where @var{extension} changes each time the @file{scripttempl} script is
invoked.

Here is the list of values assigned to @code{LD_FLAG}.

@table @code
@item (empty)
The script generated is used by default (when none of the following
cases apply). The output has an extension of @file{.x}.
@item n
The script generated is used when the linker is invoked with the
@code{-n} option. The output has an extension of @file{.xn}.
@item N
The script generated is used when the linker is invoked with the
@code{-N} option. The output has an extension of @file{.xbn}.
@item r
The script generated is used when the linker is invoked with the
@code{-r} option. The output has an extension of @file{.xr}.
@item u
The script generated is used when the linker is invoked with the
@code{-Ur} option. The output has an extension of @file{.xu}.
@item shared
The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
this value if @code{GENERATE_SHLIB_SCRIPT} is defined in the
@file{emulparams} file. The @file{emultempl} script must arrange to use
this script at the appropriate time, normally when the linker is invoked
with the @code{-shared} option. The output has an extension of
@file{.xs}.
@end table
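
The table can be read as a mapping from @code{LD_FLAG} to a file extension, plus a rule for picking a script from the command-line options. The C sketch below encodes both. The names are hypothetical, and the precedence among the options is an assumption: the real choice is made by each emulation's @code{get_script} entry point.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* LD_FLAG values from the table above and their output extensions. */
static const struct {
  const char *ld_flag;
  const char *extension;
} script_kinds[] = {
  { "", "x" },        /* default */
  { "n", "xn" },      /* -n */
  { "N", "xbn" },     /* -N */
  { "r", "xr" },      /* -r */
  { "u", "xu" },      /* -Ur */
  { "shared", "xs" }, /* -shared, only with GENERATE_SHLIB_SCRIPT */
};

/* Extension for LD_FLAG, or NULL for an unknown value. */
const char *script_extension(const char *ld_flag)
{
  size_t i;

  for (i = 0; i < sizeof script_kinds / sizeof script_kinds[0]; i++)
    if (strcmp(script_kinds[i].ld_flag, ld_flag) == 0)
      return script_kinds[i].extension;
  return NULL;
}

/* Hypothetical summary of the relevant command-line options. */
struct link_opts {
  int ur, r, big_n, small_n, shared;
};

/* Conventional choice of LD_FLAG; this precedence is an assumption. */
const char *select_ld_flag(const struct link_opts *o)
{
  assert(o != NULL);
  if (o->ur)
    return "u";
  if (o->r)
    return "r";
  if (o->big_n)
    return "N";
  if (o->small_n)
    return "n";
  if (o->shared)
    return "shared";
  return "";
}
```
@end verbatim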

Besides the shell variables set by the @file{emulparams} script, and the
@code{LD_FLAG} variable, the @file{genscripts.sh} script will set
certain variables for each run of the @file{scripttempl} script.

@table @code
@item RELOCATING
This will be set to a non-empty string when the linker is doing a final
relocation (i.e., for all scripts other than @code{-r} and @code{-Ur}).

@item CONSTRUCTING
This will be set to a non-empty string when the linker is building
global constructor and destructor tables (i.e., for all scripts other
than @code{-r}).

@item DATA_ALIGNMENT
This will be set to an @code{ALIGN} expression when the output should be
page aligned, or to @samp{.} when generating the @code{-N} script.

@item CREATE_SHLIB
This will be set to a non-empty string when generating a @code{-shared}
script.
@end table
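
Put differently, each @code{LD_FLAG} value determines which of these variables are set. A hypothetical C helper encoding the table might look like this; it leaves out @code{DATA_ALIGNMENT}, whose value depends on the segment size.

@verbatim
```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Which per-run variables are set for one LD_FLAG value. */
struct run_vars {
  bool relocating;   /* RELOCATING: final relocation */
  bool constructing; /* CONSTRUCTING: build ctor/dtor tables */
  bool create_shlib; /* CREATE_SHLIB: the -shared script */
};

struct run_vars vars_for_ld_flag(const char *ld_flag)
{
  struct run_vars v;

  assert(ld_flag != NULL);
  /* -r and -Ur are the only non-final links. */
  v.relocating = strcmp(ld_flag, "r") != 0 && strcmp(ld_flag, "u") != 0;
  /* -Ur still builds the constructor tables; plain -r does not. */
  v.constructing = strcmp(ld_flag, "r") != 0;
  v.create_shlib = strcmp(ld_flag, "shared") == 0;
  return v;
}
```
@end verbatim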

The conventional way to write a @file{scripttempl} script is to first
set a few shell variables, and then write out a linker script using
@code{cat} with a here document. The linker script will use variable
substitutions, based on the above variables and those set in the
@file{emulparams} script, to control its behaviour.

When there are parts of the @file{scripttempl} script which should only
be run when doing a final relocation, they should be enclosed within a
variable substitution based on @code{RELOCATING}. For example, on many
targets special symbols such as @code{_end} should be defined when doing
a final link. Naturally, those symbols should not be defined when doing
a relocatable link using @code{-r}. The @file{scripttempl} script
could use a construct like this to define those symbols:
@smallexample
  $@{RELOCATING+ _end = .;@}
@end smallexample
This will do the symbol assignment only if the @code{RELOCATING}
variable is defined.
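
The @code{$@{VAR+word@}} form is standard shell parameter expansion. It yields @var{word} when @var{VAR} is set, even to the empty string, and nothing when @var{VAR} is unset. The C functions below model that behaviour, together with the related @code{$@{VAR:+word@}} form, which also treats an empty value as unset. A null pointer stands for an unset variable, and the function names are hypothetical.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ${VAR+word}: WORD if VAR is set (possibly empty), else "".
   VALUE is NULL when VAR is unset. */
const char *expand_plus(const char *value, const char *word)
{
  assert(word != NULL);
  return value != NULL ? word : "";
}

/* ${VAR:+word}: WORD only if VAR is set and non-empty. */
const char *expand_colon_plus(const char *value, const char *word)
{
  assert(word != NULL);
  return value != NULL && value[0] != '\0' ? word : "";
}
```
@end verbatim

With the @code{+} form, only leaving @code{RELOCATING} unset, rather than setting it to an empty string, suppresses the guarded text.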

The basic job of the linker script is to put the sections in the correct
order, and at the correct memory addresses. For some targets, the
linker script may have to do some other operations.

For example, on most MIPS platforms, the linker is responsible for
defining the special symbol @code{_gp}, used to initialize the
@code{$gp} register. It must be set to the start of the small data
section plus @code{0x8000}. Naturally, it should only be defined when
doing a final relocation. This will typically be done like this:
@smallexample
  $@{RELOCATING+ _gp = ALIGN(16) + 0x8000;@}
@end smallexample
This line would appear just before the sections which compose the small
data section (@samp{.sdata}, @samp{.sbss}). All those sections would be
contiguous in memory.
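
The arithmetic behind the @code{_gp} line can be checked with a few lines of C, assuming 32-bit addresses. MIPS @code{$gp}-relative loads and stores use a signed 16-bit offset. Biasing @code{_gp} by @code{0x8000} therefore makes the whole 64 KiB window starting at the aligned start of the small data section reachable. The function names are hypothetical.

@verbatim
```c
#include <assert.h>
#include <stdint.h>

/* ALIGN(n): round the location counter up to a multiple of n, where
   n must be a power of two. */
uint32_t align_up(uint32_t dot, uint32_t n)
{
  assert(n != 0 && (n & (n - 1)) == 0);
  return (dot + n - 1) & ~(n - 1);
}

/* _gp = ALIGN(16) + 0x8000, given the location counter just before
   the small data sections. */
uint32_t mips_gp(uint32_t dot)
{
  return align_up(dot, 16) + 0x8000;
}

/* A signed 16-bit offset reaches [_gp - 0x8000, _gp + 0x7fff]. */
int gp_reachable(uint32_t gp, uint32_t addr)
{
  int64_t off = (int64_t)addr - (int64_t)gp;
  return off >= -0x8000 && off <= 0x7fff;
}
```
@end verbatim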

Many COFF systems build constructor tables in the linker script. The
compiler will arrange to output the address of each global constructor
in a @samp{.ctors} section, and the address of each global destructor in
a @samp{.dtors} section (this is done by defining
@code{ASM_OUTPUT_CONSTRUCTOR} and @code{ASM_OUTPUT_DESTRUCTOR} in the
@code{gcc} configuration files). The @code{gcc} runtime support
routines expect the constructor table to be named @code{__CTOR_LIST__}.
They expect it to be a list of words, with the first word being the
number of entries. There should be a trailing zero word.
(Actually, the count may be -1 if the trailing word is present, and the
trailing word may be omitted if the count is correct, but, as the
@code{gcc} behaviour has changed slightly over the years, it is safest
to provide both.) Here is a typical way that might be handled in a
@file{scripttempl} file.
@smallexample
  $@{CONSTRUCTING+ __CTOR_LIST__ = .;@}
  $@{CONSTRUCTING+ LONG((__CTOR_END__ - __CTOR_LIST__) / 4 - 2)@}
  $@{CONSTRUCTING+ *(.ctors)@}
  $@{CONSTRUCTING+ LONG(0)@}
  $@{CONSTRUCTING+ __CTOR_END__ = .;@}
  $@{CONSTRUCTING+ __DTOR_LIST__ = .;@}
  $@{CONSTRUCTING+ LONG((__DTOR_END__ - __DTOR_LIST__) / 4 - 2)@}
  $@{CONSTRUCTING+ *(.dtors)@}
  $@{CONSTRUCTING+ LONG(0)@}
  $@{CONSTRUCTING+ __DTOR_END__ = .;@}
@end smallexample
The use of @code{CONSTRUCTING} ensures that these linker script commands
will only appear when the linker is supposed to be building the
constructor and destructor tables. This example is written for a target
which uses 4-byte pointers: the @code{/ 4} converts the table size from
bytes to words, and the @code{- 2} discounts the count word itself and
the trailing zero word.
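
The @code{/ 4 - 2} arithmetic and the two ways a runtime may find the table length can be sketched in C. The function names are hypothetical, and the sketch assumes 4-byte words as in the script above. It only models the counting; the actual @code{gcc} support routines work with function pointers and differ in detail.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value of the LONG(...) header word: the table holds the count word,
   N entries and the zero word, all 4 bytes wide. */
uint32_t ctor_header_word(size_t table_bytes)
{
  assert(table_bytes % 4 == 0 && table_bytes >= 8);
  return (uint32_t)(table_bytes / 4 - 2);
}

/* Read the entry count: trust the first word unless it is -1, in
   which case count entries up to the trailing zero word. */
size_t ctor_entry_count(const uint32_t *list)
{
  size_t n;

  if (list[0] != UINT32_MAX)
    return list[0];
  for (n = 0; list[n + 1] != 0; n++)
    continue;
  return n;
}
```
@end verbatim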

Embedded systems often need to set a stack address. This is normally
best done by using the @code{PROVIDE} construct with a default stack
address. This permits the user to easily override the stack address
using the @code{--defsym} option. Here is an example:
@smallexample
  $@{RELOCATING+ PROVIDE (__stack = 0x80000000);@}
@end smallexample
The value of the symbol @code{__stack} would then be used in the startup
code to initialize the stack pointer.
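
A rough model of why @code{PROVIDE} lets @code{--defsym} override the default: the default applies only when nothing else defines the symbol. The symbol table type and function below are hypothetical. The real linker also acts on a @code{PROVIDE} only when the symbol is referenced, which this sketch does not model.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, minimal symbol table entry. */
struct sym {
  const char *name;
  uint32_t value;
  int defined;
};

/* PROVIDE (NAME = DFLT): keep an existing definition (for example one
   made with --defsym); otherwise fall back to DFLT. */
uint32_t provide(const struct sym *tab, size_t n, const char *name,
                 uint32_t dflt)
{
  size_t i;

  assert(name != NULL);
  for (i = 0; i < n; i++)
    if (tab[i].defined && strcmp(tab[i].name, name) == 0)
      return tab[i].value;
  return dflt;
}
```
@end verbatim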

@node linker emulations
@section @file{emultempl} scripts

Each linker target uses an @file{emultempl} script to generate the
emulation code. The name of the @file{emultempl} script is set by the
@code{TEMPLATE_NAME} variable in the @file{emulparams} script. If the
@code{TEMPLATE_NAME} variable is not set, the default is
@samp{generic}. If the value of @code{TEMPLATE_NAME} is @var{template},
@file{genscripts.sh} will use @file{emultempl/@var{template}.em}.

Most targets use the generic @file{emultempl} script,
@file{emultempl/generic.em}. A different @file{emultempl} script is
only needed if the linker must support unusual actions, such as linking
against shared libraries.

The @file{emultempl} script is normally written as a simple invocation
of @code{cat} with a here document. The document will use a few
variable substitutions. Typically each function name uses a
substitution involving @code{EMULATION_NAME}, for ease of debugging when
the linker supports multiple emulations.

Every function and variable in the emitted file should be static. The
only globally visible object must be named
@code{ld_@var{EMULATION_NAME}_emulation}, where @var{EMULATION_NAME} is
the name of the emulation set in @file{configure.tgt} (this is also the
name of the @file{emulparams} file without the @file{.sh} extension).
The @file{genscripts.sh} script will set the shell variable
@code{EMULATION_NAME} before invoking the @file{emultempl} script.

The @code{ld_@var{EMULATION_NAME}_emulation} variable must be a
@code{struct ld_emulation_xfer_struct}, as defined in @file{ldemul.h}.
It defines a set of function pointers which are invoked by the linker,
as well as strings for the emulation name (normally set from the shell
variable @code{EMULATION_NAME}) and the default BFD target name
(normally set from the shell variable @code{OUTPUT_FORMAT}, which is
normally set by the @file{emulparams} file).
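
To make the shape of the emitted file concrete, here is a C sketch of what an emulation named @samp{example} might emit. The structure is a hypothetical, heavily cut-down stand-in for @code{struct ld_emulation_xfer_struct}, whose real members are listed in @file{ldemul.h}. The @samp{gld} function-name prefix and the BFD target name are illustrative.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct ld_emulation_xfer_struct. */
struct mini_emulation_xfer {
  void (*before_parse)(void);
  void (*after_parse)(void);
  const char *emulation_name; /* from EMULATION_NAME */
  const char *target_name;    /* from OUTPUT_FORMAT */
};

/* Everything below is static, with the emulation name in each
   function name, as the conventions above require. */
static int example_after_parse_calls;

static void gldexample_before_parse(void)
{
  example_after_parse_calls = 0;
}

static void gldexample_after_parse(void)
{
  example_after_parse_calls++;
}

/* The only global object: ld_EMULATION_NAME_emulation. */
struct mini_emulation_xfer ld_example_emulation = {
  gldexample_before_parse,
  gldexample_after_parse,
  "example",
  "elf32-example", /* made-up BFD target name */
};
```
@end verbatim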

The @file{genscripts.sh} script will set the shell variable
@code{COMPILE_IN} when it invokes the @file{emultempl} script for the
default emulation. In this case, the @file{emultempl} script should
include the linker scripts directly, and return them from the
@code{get_script} entry point. When the emulation is not the default,
the @code{get_script} entry point should just return a file name. See
@file{emultempl/generic.em} for an example of how this is done.
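
The two cases can be sketched as a single C function. In a real @file{.em} file the choice is made when the shell script generates the C source, depending on @code{COMPILE_IN`}; here the @var{compiled_in} parameter only lets one function show both behaviours. The function, the shortened script text, and the returned file name are hypothetical. The @var{isfile} out-parameter is an assumption about how the caller tells the two cases apart, and the selection among @code{-n}, @code{-N}, @code{-r} and so on is left out.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical compiled-in default script, shortened. */
static const char example_default_script[] =
  "OUTPUT_FORMAT(\"elf32-example\")\n"
  "SECTIONS { .text : { *(.text) } }\n";

/* Return script text (*ISFILE = 0) or the name of a script file to
   read (*ISFILE = 1). COMPILED_IN models the COMPILE_IN decision. */
static const char *example_get_script(int compiled_in, int *isfile)
{
  assert(isfile != NULL);
  if (compiled_in) {
    *isfile = 0;
    return example_default_script;
  }
  *isfile = 1;
  return "ldscripts/example.x";
}
```
@end verbatim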

At some point, the linker emulation entry points should be documented.

@node Emulation Walkthrough
@chapter A Walkthrough of a Typical Emulation

This chapter is to help people who are new to the way emulations
interact with the linker, or who are suddenly thrust into the position
of having to work with existing emulations. It will discuss the files
you need to be aware of. It will tell you when the given ``hooks'' in
the emulation will be called. It will, hopefully, give you enough
information about when and how things happen that you'll be able to
get by. As always, the source is the definitive reference to this.

The starting point for the linker is in @file{ldmain.c} where
@code{main} is defined. The bulk of the code that's emulation
specific will initially be in @file{emultempl/@var{template}.em} but
will end up in @file{e@var{emulation}.c} when the build is done.
Most of the work to select and interface with emulations is in
@file{ldemul.h} and @file{ldemul.c}. Specifically, @file{ldemul.h}
defines the @code{ld_emulation_xfer_struct} structure your emulation
exports.

Your emulation file exports a symbol
@code{ld_@var{EMULATION_NAME}_emulation}. If your emulation is
selected (it usually is, since usually there's only one),
@file{ldemul.c} sets the variable @code{ld_emulation} to point to it.
@file{ldemul.c} also defines a number of API functions that interface
to your emulation, like @code{ldemul_after_parse} which simply calls
your @code{ld_@var{EMULATION_NAME}_emulation.after_parse} function. For
the rest of this section, the functions will be mentioned, but you
should assume the indirect reference to your emulation also.
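
The indirection through @file{ldemul.c} amounts to a global pointer plus one forwarding wrapper per hook. The C sketch below mimics that pattern with a hypothetical two-hook structure; every name carries a @samp{sketch} or @samp{my} marker because none of it is the real @file{ldemul.c} code.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-hook transfer structure. */
struct xfer {
  void (*after_parse)(void);
  void (*finish)(void);
};

/* A single pointer to the selected emulation ... */
static struct xfer *sketch_ld_emulation;

void sketch_select_emulation(struct xfer *e)
{
  assert(e != NULL);
  sketch_ld_emulation = e;
}

/* ... and thin wrappers that forward each call through it. */
void sketch_ldemul_after_parse(void) { sketch_ld_emulation->after_parse(); }
void sketch_ldemul_finish(void) { sketch_ld_emulation->finish(); }

/* A hypothetical emulation whose hooks just count calls. */
static int calls;
static void my_after_parse(void) { calls += 1; }
static void my_finish(void) { calls += 10; }
static struct xfer ld_myemul_emulation = { my_after_parse, my_finish };
```
@end verbatim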

We will also skip or gloss over parts of the link process that don't
relate to emulations, like setting up internationalization.

After initialization, @code{main} selects an emulation by pre-scanning
the command line arguments. It calls @code{ldemul_choose_target} to
choose a target. If you set @code{choose_target} to
@code{ldemul_default_target}, it picks your @code{target_name} by
default.

@code{main} calls @code{ldemul_before_parse}, then @code{parse_args}.
@code{parse_args} calls @code{ldemul_parse_args} for each argument; if
the emulation recognizes the argument, it must update the @code{getopt}
globals itself. If the emulation doesn't recognize the argument,
@code{parse_args} then tries to handle it.
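
A simplified model of that hand-off: the emulation sees each argument first and, when it claims one, advances the index itself. Everything here is hypothetical. @code{optind_sketch} stands in for the @code{getopt} global @code{optind}, the emulation hook's signature is simplified, and the generic side merely counts the words it would have handled.

@verbatim
```c
#include <assert.h>
#include <string.h>

/* Stand-in for getopt's optind. */
static int optind_sketch = 1;
static int example_flag_seen;

/* Hypothetical emulation hook: claims "--example-flag" and advances
   the index past it, as a claiming emulation must. */
static int emul_parse_args(int argc, char **argv)
{
  if (optind_sketch < argc
      && strcmp(argv[optind_sketch], "--example-flag") == 0) {
    example_flag_seen = 1;
    optind_sketch++;
    return 1; /* recognized */
  }
  return 0; /* left for the generic code */
}

/* The emulation sees every argument first; the generic side only
   counts the words it would have handled. */
int parse_args_sketch(int argc, char **argv)
{
  int generic = 0;

  optind_sketch = 1;
  while (optind_sketch < argc) {
    if (emul_parse_args(argc, argv))
      continue;
    generic++; /* generic option handling would go here */
    optind_sketch++;
  }
  return generic;
}
```
@end verbatim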

Now that the emulation has had access to all its command-line options,
@code{main} calls @code{ldemul_set_symbols}. This can be used for any
initialization that may be affected by options. It is also supposed
to set up any variables needed by the emulation script.

@code{main} now calls @code{ldemul_get_script} to get the emulation
script to use (chosen according to the command-line options,
@pxref{Emulations}) and runs it. While parsing, @code{ldgram.y} may call
@code{ldemul_hll} or @code{ldemul_syslib} to handle the @code{HLL} or
@code{SYSLIB} commands. It may call @code{ldemul_unrecognized_file} if
you asked the linker to link a file it doesn't recognize. It will call
@code{ldemul_recognized_file} for each file it does recognize, in case
the emulation wants to handle some files specially. All the while,
it's loading the files (possibly calling
@code{ldemul_open_dynamic_archive}) and their symbols. After it's
done reading the script, @code{main} calls @code{ldemul_after_parse}.
Use the after-parse hook to set up anything that depends on settings
the script might have made, like the entry point.

@code{main} next calls @code{lang_process} in @file{ldlang.c}. This
appears to be the main core of the linking itself, as far as emulation
hooks are concerned(*). It first opens the output file's BFD, calling
@code{ldemul_set_output_arch}, and calls
@code{ldemul_create_output_section_statements} in case you need to use
other means to find or create object files (e.g., shared libraries
found on a path, or fake stub objects). Despite the name, nobody
creates output sections here.

(*) In most cases, the BFD library does the bulk of the actual
linking, handling symbol tables, symbol resolution, relocations, and
building the final output file. See the BFD reference for all the
details. Your emulation is usually concerned more with managing
things at the file and section level, like ``put this here, add this
section'', etc.

Next, the objects to be linked are opened and BFDs created for them,
and @code{ldemul_after_open} is called. At this point, you have all
the objects and symbols loaded, but none of the data has been placed
yet.

Next comes the Big Linking Thingy (except for the parts BFD does).
All input sections are mapped to output sections according to the
script. If an input section is not mapped by the script,
@code{ldemul_place_orphan} will get called to figure out where it goes.
Next it figures out the offsets for each section, calling
@code{ldemul_before_allocation} before and
@code{ldemul_after_allocation} after deciding where each input section
ends up in the output sections.

The last part of @code{lang_process} is to figure out all the symbols'
values. After assigning final values to the symbols,
@code{ldemul_finish} is called, and after that, any undefined symbols
are turned into fatal errors.

OK, back to @code{main}, which calls @code{ldwrite} in
@file{ldwrite.c}. @code{ldwrite} calls BFD's @code{final_link}, which
does all the relocation fixups and writes the output BFD to disk, and
we're done.

In summary,

@itemize @bullet

@item @code{main()} in @file{ldmain.c}
@item @file{emultempl/@var{template}.em} has your code
@item @code{ldemul_choose_target} (defaults to your @code{target_name})
@item @code{ldemul_before_parse}
@item parse @code{argv}, calling @code{ldemul_parse_args} for each argument
@item @code{ldemul_set_symbols}
@item @code{ldemul_get_script}
@item parse script

@itemize @bullet
@item may call @code{ldemul_hll} or @code{ldemul_syslib}
@item may call @code{ldemul_open_dynamic_archive}
@end itemize

@item @code{ldemul_after_parse}
@item @code{lang_process()} in @file{ldlang.c}

@itemize @bullet
@item create @code{output_bfd}
@item @code{ldemul_set_output_arch}
@item @code{ldemul_create_output_section_statements}
@item read objects, create input BFDs; all symbols exist, but have no values
@item may call @code{ldemul_unrecognized_file}
@item will call @code{ldemul_recognized_file}
@item @code{ldemul_after_open}
@item map input sections to output sections
@item may call @code{ldemul_place_orphan} for remaining sections
@item @code{ldemul_before_allocation}
@item gives input sections offsets into output sections, places output sections
@item @code{ldemul_after_allocation}: section addresses valid
@item assigns values to symbols
@item @code{ldemul_finish}: symbol values valid
@end itemize

@item output BFD is written to disk

@end itemize
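
The order in the summary can be checked with a small C simulation in which every hook records its name. The hook structure, the hook functions, and @code{run_link_sketch} are hypothetical. Hooks that are called conditionally or per item (such as @code{ldemul_parse_args}, @code{ldemul_hll} or @code{ldemul_place_orphan}) are omitted, and unlike the real hooks these take no arguments.

@verbatim
```c
#include <assert.h>
#include <string.h>

static char trace[256];

/* Append NAME and a space to the trace. */
static void note(const char *name)
{
  assert(strlen(trace) + strlen(name) + 2 <= sizeof trace);
  strcat(trace, name);
  strcat(trace, " ");
}

/* Hypothetical hooks that only record their own names. */
#define HOOK(name) static void name(void) { note(#name); }
HOOK(choose_target) HOOK(before_parse) HOOK(set_symbols) HOOK(get_script)
HOOK(after_parse) HOOK(set_output_arch) HOOK(create_output_section_statements)
HOOK(after_open) HOOK(before_allocation) HOOK(after_allocation) HOOK(finish)

struct hooks {
  void (*choose_target)(void), (*before_parse)(void), (*set_symbols)(void),
      (*get_script)(void), (*after_parse)(void), (*set_output_arch)(void),
      (*create_output_section_statements)(void), (*after_open)(void),
      (*before_allocation)(void), (*after_allocation)(void), (*finish)(void);
};

static const struct hooks recording_hooks = {
  choose_target, before_parse, set_symbols, get_script, after_parse,
  set_output_arch, create_output_section_statements, after_open,
  before_allocation, after_allocation, finish
};

/* Call the hooks in the order given in the summary above. */
const char *run_link_sketch(const struct hooks *h)
{
  trace[0] = '\0';
  h->choose_target();
  h->before_parse();      /* argument parsing would follow */
  h->set_symbols();
  h->get_script();        /* the script is parsed here */
  h->after_parse();
  /* lang_process() starts here */
  h->set_output_arch();
  h->create_output_section_statements();
  h->after_open();        /* input sections are then mapped */
  h->before_allocation();
  h->after_allocation();  /* symbols then get their values */
  h->finish();
  /* ldwrite() would now write the output BFD */
  return trace;
}
```
@end verbatim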

@contents
@bye