@c This summary of BFD is shared by the BFD and LD docs.
@c Copyright (C) 2012-2014 Free Software Foundation, Inc.

When an object file is opened, BFD subroutines automatically determine
the format of the input object file. They then build a descriptor in
memory with pointers to routines that will be used to access elements of
the object file's data structures.

As different information from the object files is required,
BFD reads from different sections of the file and processes them.
For example, a very common operation for the linker is processing symbol
tables. Each BFD back end provides a routine for converting
between the object file's representation of symbols and an internal
canonical format. When the linker asks for the symbol table of an object
file, it calls through a memory pointer to the routine from the
relevant BFD back end which reads and converts the table into a canonical
form. The linker then operates upon the canonical form. When the link is
finished and the linker writes the output file's symbol table,
another BFD back end routine is called to take the newly
created symbol table and convert it into the chosen output format.

@menu
* BFD information loss::        Information Loss
* Canonical format::            The BFD canonical object-file format
@end menu

@node BFD information loss
@subsection Information Loss

@emph{Information can be lost during output.} The output formats
supported by BFD do not provide identical facilities, and
information which can be described in one form has nowhere to go in
another format. One example of this is alignment information in
@code{b.out}. There is nowhere in an @code{a.out} format file to store
alignment information on the contained data, so when a file is linked
from @code{b.out} and an @code{a.out} image is produced, alignment
information will not propagate to the output file.
(The linker will
still use the alignment information internally, so the link is performed
correctly.)

Another example is COFF section names. COFF files may contain an
unlimited number of sections, each one with a textual section name. If
the target of the link is a format which does not have many sections (e.g.,
@code{a.out}) or has sections without names (e.g., the Oasys format), the
link cannot be done simply. You can circumvent this problem by
describing the desired input-to-output section mapping with the linker command
language.

@emph{Information can be lost during canonicalization.} The BFD
internal canonical form of the external formats is not exhaustive; there
are structures in input formats for which there is no direct
representation internally. This means that the BFD back ends
cannot maintain all possible data richness through the transformation
from external to internal and back to external formats.

This limitation is only a problem when an application reads one
format and writes another. Each BFD back end is responsible for
maintaining as much data as possible, and the internal BFD
canonical form has structures which are opaque to the BFD core,
and exported only to the back ends. When a file is read in one format,
the canonical form is generated for BFD and the application. At the
same time, the back end saves away any information which may otherwise
be lost. If the data is then written back in the same format, the back
end routine will be able to use the canonical form provided by the
BFD core as well as the information it prepared earlier. Since
there is a great deal of commonality between back ends,
there is no information lost when
linking or copying big-endian COFF to little-endian COFF, or @code{a.out} to
@code{b.out}. When a mixture of formats is linked, the information is
only lost from the files whose format differs from the destination.
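
As a minimal sketch of how a back end might keep format-specific data
next to the canonical form, consider the following C fragment. All names
in it (@code{canon_symbol}, @code{coff_private},
@code{recover_aux_count}, and so on) are hypothetical and are not part of
the BFD interface; the auxiliary-entry count merely stands in for any
detail that has no canonical representation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is the libbfd API. */

enum format { FMT_COFF, FMT_AOUT };

/* Canonical symbol as seen by the core and by the application. */
struct canon_symbol {
    const char *name;
    unsigned long value;
    enum format origin;   /* format of the file that defined the symbol */
    void *private_data;   /* opaque to the core; owned by the origin back end */
};

/* Assumed example of a format-specific detail with no canonical slot. */
struct coff_private {
    int aux_count;
};

/* Reading: fill in the canonical fields and save away the extra detail. */
void coff_read_symbol(struct canon_symbol *sym, const char *name,
                      unsigned long value, struct coff_private *extra)
{
    assert(sym != NULL);
    sym->name = name;
    sym->value = value;
    sym->origin = FMT_COFF;
    sym->private_data = extra;
}

/* Writing: the saved detail is usable only when the destination format
   matches the format the symbol was read from.  Returns -1 when the
   detail cannot be recovered. */
int recover_aux_count(const struct canon_symbol *sym, enum format dest)
{
    assert(sym != NULL);
    if (dest == FMT_COFF && sym->origin == FMT_COFF
        && sym->private_data != NULL)
        return ((const struct coff_private *) sym->private_data)->aux_count;
    return -1;
}
```

With this arrangement, a symbol read from COFF and written back to COFF
yields its saved count, whereas writing the same symbol to @code{a.out}
returns -1, which mirrors the loss described above.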

@node Canonical format
@subsection The BFD canonical object-file format

The greatest potential for loss of information occurs when there is the least
overlap between the information provided by the source format, that
stored by the canonical format, and that needed by the
destination format. A brief description of the canonical form may help
you understand which kinds of data you can count on preserving across
conversions.
@cindex BFD canonical format
@cindex internal object-file format

@table @emph
@item files
Information stored on a per-file basis includes target machine
architecture, particular implementation format type, a demand pageable
bit, and a write protected bit. Information like Unix magic numbers is
not stored here---only the magic numbers' meaning, so a @code{ZMAGIC}
file would have both the demand pageable bit and the write protected
text bit set. The byte order of the target is stored on a per-file
basis, so that big- and little-endian object files may be used with one
another.

@item sections
Each section in the input file contains the name of the section, the
section's original address in the object file, size and alignment
information, various flags, and pointers into other BFD data
structures.

@item symbols
Each symbol contains a pointer to the information for the object file
which originally defined it, its name, its value, and various flag
bits. When a BFD back end reads in a symbol table, it relocates all
symbols to make them relative to the base of the section where they were
defined. Doing this ensures that each symbol points to its containing
section. Each symbol also has a varying amount of hidden private data
for the BFD back end. Since the symbol points to the original file, the
private data format for that symbol is accessible.
@code{ld} can
operate on a collection of symbols of wildly different formats without
problems.

Normal global and simple local symbols are maintained on output, so an
output file (no matter its format) will retain symbols pointing to
functions and to global, static, and common variables. Some symbol
information is not worth retaining; in @code{a.out}, type information is
stored in the symbol table as long symbol names. This information would
be useless to most COFF debuggers; the linker has command line switches
to allow users to throw it away.

There is one word of type information within the symbol, so if the
format supports symbol type information within symbols (for example, COFF,
IEEE, Oasys) and the type is simple enough to fit within one word
(nearly everything but aggregates), the information will be preserved.

@item relocation level
Each canonical BFD relocation record contains a pointer to the symbol to
relocate to, the offset of the data to relocate, the section the data
is in, and a pointer to a relocation type descriptor. Relocation is
performed by passing messages through the relocation type
descriptor and the symbol pointer. Therefore, relocations can be performed
on output data using a relocation method that is only available in one of the
input formats. For instance, Oasys provides a byte relocation format.
A relocation record requesting this relocation type would point
indirectly to a routine to perform this, so the relocation may be
performed on a byte being written to a 68k COFF file, even though 68k COFF
has no such relocation type.

@item line numbers
Object formats can contain, for debugging purposes, some form of mapping
between symbols, source line numbers, and addresses in the output file.
These addresses have to be relocated along with the symbol information.
Each symbol with an associated list of line number records points to the
first record of the list. The head of a line number list consists of a
pointer to the symbol, which allows finding out the address of the
function whose line number is being described. The rest of the list is
made up of pairs: offsets into the section and line numbers. Any format
which can simply derive this information can pass it successfully
between formats (COFF, IEEE and Oasys).
@end table
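
The relocation record described above can be pictured with a small C
sketch. The structure and function names here (@code{reloc},
@code{reloc_type}, @code{perform_reloc}, @code{byte_reloc}) are
hypothetical and do not reproduce the BFD definitions, and the behavior
given to the byte relocation (adding the low eight bits of the symbol's
address to one byte) is an assumption made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not the libbfd definitions. */

struct section {
    const char *name;
    uint32_t base;                  /* address assigned to the section */
};

struct symbol {
    const char *name;
    const struct section *section;  /* containing section */
    uint32_t value;                 /* offset from the base of that section */
};

struct reloc;

/* Relocation type descriptor: carries the routine that applies the fix-up,
   so the core never needs to know which format defined the type. */
struct reloc_type {
    const char *name;
    void (*apply)(const struct reloc *r, uint8_t *data);
};

/* Canonical relocation record with the four fields named in the text. */
struct reloc {
    const struct symbol *symbol;    /* symbol to relocate to */
    size_t offset;                  /* offset of the data to relocate */
    const struct section *section;  /* section the data is in */
    const struct reloc_type *type;  /* relocation type descriptor */
};

/* Symbols are section-relative, so the address is base plus value. */
static uint32_t symbol_address(const struct symbol *s)
{
    return s->section->base + s->value;
}

/* Assumed byte relocation: add the low 8 bits of the address to one byte. */
static void apply_byte(const struct reloc *r, uint8_t *data)
{
    data[r->offset] = (uint8_t) (data[r->offset] + symbol_address(r->symbol));
}

const struct reloc_type byte_reloc = { "byte", apply_byte };

/* The core dispatches through the descriptor without inspecting the type. */
void perform_reloc(const struct reloc *r, uint8_t *data)
{
    assert(r != NULL && r->type != NULL && r->type->apply != NULL);
    r->type->apply(r, data);
}
```

Because the record carries its own descriptor, @code{perform_reloc} can
apply the byte relocation to any output buffer, whether or not the
output format defines such a relocation type itself.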