@c This summary of BFD is shared by the BFD and LD docs.
@c Copyright 2012
@c Free Software Foundation, Inc.

When an object file is opened, BFD subroutines automatically determine
the format of the input object file. They then build a descriptor in
memory with pointers to routines that will be used to access elements of
the object file's data structures.

As different information from the object files is required,
BFD reads from different sections of the file and processes them.
For example, a very common operation for the linker is processing symbol
tables. Each BFD back end provides a routine for converting
between the object file's representation of symbols and an internal
canonical format. When the linker asks for the symbol table of an object
file, it calls through a memory pointer to the routine from the
relevant BFD back end, which reads the table and converts it into
canonical form. The linker then operates upon the canonical form. When the
link is finished and the linker writes the output file's symbol table,
another BFD back end routine is called to take the newly
created symbol table and convert it into the chosen output format.

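The dispatch described above can be pictured with a small C sketch.
It is only an illustration, not BFD's actual interface: the types
@code{canon_sym}, @code{backend} and @code{descriptor}, the two toy
formats, and the functions @code{read_symbols} and @code{write_symbols}
are all hypothetical names chosen for this example.

```c
#include <stddef.h>

/* Hypothetical canonical symbol: a name and a value. */
struct canon_sym { const char *name; unsigned long value; };

/* Two hypothetical external layouts that differ only in field order. */
struct fmt_a_sym { unsigned long value; const char *name; };
struct fmt_b_sym { const char *name; unsigned long value; };

/* Hypothetical back-end vector: one pair of conversion routines
   per external format. */
struct backend {
    const char *format_name;
    size_t (*read_symtab)(const void *raw, size_t n, struct canon_sym *out);
    size_t (*write_symtab)(const struct canon_sym *in, size_t n, void *raw);
};

/* Hypothetical descriptor built when a file is opened. */
struct descriptor {
    const struct backend *be;
    const void *raw_syms;
    size_t nsyms;
};

static size_t a_read(const void *raw, size_t n, struct canon_sym *out)
{
    const struct fmt_a_sym *in = raw;
    for (size_t i = 0; i < n; i++) {
        out[i].name = in[i].name;
        out[i].value = in[i].value;
    }
    return n;
}

static size_t a_write(const struct canon_sym *in, size_t n, void *raw)
{
    struct fmt_a_sym *out = raw;
    for (size_t i = 0; i < n; i++) {
        out[i].name = in[i].name;
        out[i].value = in[i].value;
    }
    return n;
}

static size_t b_read(const void *raw, size_t n, struct canon_sym *out)
{
    const struct fmt_b_sym *in = raw;
    for (size_t i = 0; i < n; i++) {
        out[i].name = in[i].name;
        out[i].value = in[i].value;
    }
    return n;
}

static size_t b_write(const struct canon_sym *in, size_t n, void *raw)
{
    struct fmt_b_sym *out = raw;
    for (size_t i = 0; i < n; i++) {
        out[i].name = in[i].name;
        out[i].value = in[i].value;
    }
    return n;
}

const struct backend backend_a = { "toy-a", a_read, a_write };
const struct backend backend_b = { "toy-b", b_read, b_write };

/* The application never names a format; it calls through the pointers. */
size_t read_symbols(const struct descriptor *d, struct canon_sym *out)
{
    return d->be->read_symtab(d->raw_syms, d->nsyms, out);
}

size_t write_symbols(const struct backend *be, const struct canon_sym *in,
                     size_t n, void *raw)
{
    return be->write_symtab(in, n, raw);
}
```

In this sketch, a caller would fill a @code{descriptor} that points at
@code{backend_a}, call @code{read_symbols} to obtain canonical symbols,
and then pass them to @code{write_symbols} with @code{backend_b} to
produce the other layout.
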
@menu
* BFD information loss::        Information Loss
* Canonical format::            The BFD canonical object-file format
@end menu

@node BFD information loss
@subsection Information Loss

@emph{Information can be lost during output.} The output formats
supported by BFD do not provide identical facilities, and
information that can be described in one format may have nowhere to go in
another. One example of this is alignment information in
@code{b.out}. There is nowhere in an @code{a.out} format file to store
alignment information on the contained data, so when a file is linked
from @code{b.out} and an @code{a.out} image is produced, alignment
information will not propagate to the output file. (The linker will
still use the alignment information internally, so the link is performed
correctly.)

Another example is COFF section names. COFF files may contain an
unlimited number of sections, each one with a textual section name. If
the target of the link is a format which does not have many sections (e.g.,
@code{a.out}) or has sections without names (e.g., the Oasys format), the
link cannot be done simply. You can circumvent this problem by
describing the desired input-to-output section mapping with the linker
command language.

@emph{Information can be lost during canonicalization.} The BFD
internal canonical form of the external formats is not exhaustive; there
are structures in input formats for which there is no direct
representation internally. This means that the BFD back ends
cannot maintain all possible data richness through the transformation
from external to internal and back to external formats.

This limitation is only a problem when an application reads one
format and writes another. Each BFD back end is responsible for
maintaining as much data as possible, and the internal BFD
canonical form has structures which are opaque to the BFD core,
and exported only to the back ends. When a file is read in one format,
the canonical form is generated for BFD and the application. At the
same time, the back end saves away any information which may otherwise
be lost. If the data is then written back in the same format, the back
end routine will be able to use the canonical form provided by the
BFD core as well as the information it prepared earlier. Since
there is a great deal of commonality between back ends,
there is no information lost when
linking or copying big-endian COFF to little-endian COFF, or @code{a.out} to
@code{b.out}. When a mixture of formats is linked, information is
lost only from the files whose format differs from the destination.

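The way a back end can keep private information next to the canonical
form may be easier to see in code. The following C sketch is a
simplified model under stated assumptions, not BFD's real data layout:
the formats ``x'' and ``y'', the @code{vendor_flags} field, and every
type and function name here are hypothetical.

```c
#include <stddef.h>

/* Hypothetical canonical section record. The core understands name
   and size; private_data is opaque to it. */
struct canon_section {
    const char *name;
    unsigned long size;
    const char *origin;        /* tag of the back end that read it */
    const void *private_data;  /* meaningful only to that back end */
};

/* Hypothetical format "x" carries an extra field with no canonical
   equivalent; hypothetical format "y" does not. */
struct x_section { const char *name; unsigned long size; unsigned vendor_flags; };
struct y_section { const char *name; unsigned long size; };

static const char X_FORMAT[] = "x";
static const char Y_FORMAT[] = "y";

/* Read "x": fill the canonical fields and save the extra field away.
   The raw record must outlive the canonical one in this sketch. */
void x_read(const struct x_section *in, struct canon_section *out)
{
    out->name = in->name;
    out->size = in->size;
    out->origin = X_FORMAT;
    out->private_data = &in->vendor_flags;
}

/* Write "x": use the saved data only if this back end produced it. */
void x_write(const struct canon_section *in, struct x_section *out)
{
    out->name = in->name;
    out->size = in->size;
    if (in->origin == X_FORMAT && in->private_data != NULL)
        out->vendor_flags = *(const unsigned *)in->private_data;
    else
        out->vendor_flags = 0;  /* nothing to recover */
}

void y_read(const struct y_section *in, struct canon_section *out)
{
    out->name = in->name;
    out->size = in->size;
    out->origin = Y_FORMAT;
    out->private_data = NULL;
}

/* Write "y": the format has nowhere to put the private data. */
void y_write(const struct canon_section *in, struct y_section *out)
{
    out->name = in->name;
    out->size = in->size;
}
```

In this model, reading an ``x'' record and writing it back as ``x''
keeps @code{vendor_flags}, while a detour through ``y'' returns it as
zero; the name and size survive both paths because the canonical form
represents them.
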
@node Canonical format
@subsection The BFD canonical object-file format

The greatest potential for loss of information occurs when there is the least
overlap between the information provided by the source format, that
stored by the canonical format, and that needed by the
destination format. A brief description of the canonical form may help
you understand which kinds of data you can count on preserving across
conversions.
@cindex BFD canonical format
@cindex internal object-file format

@table @emph
@item files
Information stored on a per-file basis includes target machine
architecture, particular implementation format type, a demand pageable
bit, and a write protected bit. Information like Unix magic numbers is
not stored here; only the meaning of the magic numbers is kept, so a
@code{ZMAGIC} file would have both the demand pageable bit and the write
protected text bit set. The byte order of the target is stored on a per-file
basis, so that big- and little-endian object files may be used with one
another.

@item sections
The canonical record for each section in the input file contains the name
of the section, the section's original address in the object file, size
and alignment information, various flags, and pointers into other BFD data
structures.

@item symbols
Each symbol contains a pointer to the information for the object file
which originally defined it, its name, its value, and various flag
bits. When a BFD back end reads in a symbol table, it relocates all
symbols to make them relative to the base of the section where they were
defined. Doing this ensures that each symbol points to its containing
section. Each symbol also has a varying amount of hidden private data
for the BFD back end. Since the symbol points to the original file, the
private data format for that symbol is accessible. @code{ld} can
operate on a collection of symbols of wildly different formats without
problems.

Normal global and simple local symbols are maintained on output, so an
output file (no matter its format) will retain symbols pointing to
functions and to global, static, and common variables. Some symbol
information is not worth retaining; in @code{a.out}, type information is
stored in the symbol table as long symbol names. This information would
be useless to most COFF debuggers; the linker has command-line switches
that let users discard it.

There is one word of type information within the symbol, so if the
format supports symbol type information within symbols (for example, COFF,
IEEE, Oasys) and the type is simple enough to fit within one word
(nearly everything but aggregates), the information will be preserved.

@item relocation level
Each canonical BFD relocation record contains a pointer to the symbol to
relocate to, the offset of the data to relocate, the section the data
is in, and a pointer to a relocation type descriptor. Relocation is
performed by passing messages through the relocation type
descriptor and the symbol pointer. Therefore, relocations can be performed
on output data using a relocation method that is only available in one of the
input formats. For instance, Oasys provides a byte relocation format.
A relocation record requesting this relocation type would point
indirectly to a routine to perform this, so the relocation may be
performed on a byte being written to a 68k COFF file, even though 68k COFF
has no such relocation type.

@item line numbers
Object formats can contain, for debugging purposes, some form of mapping
between symbols, source line numbers, and addresses in the output file.
These addresses have to be relocated along with the symbol information.
Each symbol with an associated list of line number records points to the
first record of the list. The head of a line number list consists of a
pointer to the symbol, which allows finding out the address of the
function whose line numbers are being described. The rest of the list is
made up of pairs: offsets into the section and line numbers. Any format
which can simply derive this information can pass it successfully
between formats (COFF, IEEE and Oasys).
@end table
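
The relocation item above says that a record reaches its relocation
routine indirectly, through a type descriptor. A minimal C sketch of
that idea follows. It is an assumption-laden illustration rather than
BFD's real relocation machinery: @code{canon_reloc}, @code{reloc_type},
@code{perform_relocs} and the two toy relocation kinds are hypothetical,
and the section pointer is replaced by a plain byte buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical canonical symbol, reduced to what relocation needs. */
struct canon_symbol { const char *name; uint32_t value; };

struct canon_reloc;

/* Hypothetical relocation type descriptor: it carries the routine. */
struct reloc_type {
    const char *name;
    void (*apply)(const struct canon_reloc *r, uint8_t *section_data);
};

/* Hypothetical canonical relocation record. The section is implied by
   the buffer handed to perform_relocs in this sketch. */
struct canon_reloc {
    const struct canon_symbol *sym;  /* symbol to relocate to */
    size_t offset;                   /* offset of the data to relocate */
    const struct reloc_type *type;   /* how to perform the relocation */
};

/* Add the symbol value to a single byte, wrapping modulo 256. */
static void apply_byte(const struct canon_reloc *r, uint8_t *data)
{
    data[r->offset] = (uint8_t)(data[r->offset] + r->sym->value);
}

/* Add the symbol value to a 32-bit big-endian word. */
static void apply_word32_be(const struct canon_reloc *r, uint8_t *data)
{
    uint8_t *p = data + r->offset;
    uint32_t v = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
               | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    v += r->sym->value;
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

const struct reloc_type reloc_byte = { "byte", apply_byte };
const struct reloc_type reloc_word32_be = { "word32_be", apply_word32_be };

/* The writer needs no knowledge of relocation kinds; each record
   brings its own routine through its type descriptor. */
void perform_relocs(const struct canon_reloc *relocs, size_t n,
                    uint8_t *section_data)
{
    for (size_t i = 0; i < n; i++)
        relocs[i].type->apply(&relocs[i], section_data);
}
```

Because @code{perform_relocs} only follows pointers, a byte relocation
that originated in one input format can be applied to data destined for
an output format that has no such relocation kind of its own.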