\input texinfo
@setfilename ldint.info
@c Copyright (C) 1992-2014 Free Software Foundation, Inc.

@ifnottex
@dircategory Software development
@direntry
* Ld-Internals: (ldint).	The GNU linker internals.
@end direntry
@end ifnottex

@copying
This file documents the internals of the GNU linker ld.

Copyright @copyright{} 1992-2014 Free Software Foundation, Inc.
Contributed by Cygnus Support.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``GNU General Public License'' and ``Funding
Free Software'', the Front-Cover texts being (a) (see below), and with
the Back-Cover Texts being (b) (see below).  A copy of the license is
included in the section entitled ``GNU Free Documentation License''.

(a) The FSF's Front-Cover Text is:

     A GNU Manual

(b) The FSF's Back-Cover Text is:

     You have freedom to copy and modify this GNU Manual, like GNU
     software.  Copies published by the Free Software Foundation raise
     funds for GNU development.
@end copying

@iftex
@finalout
@setchapternewpage off
@settitle GNU Linker Internals
@titlepage
@title{A guide to the internals of the GNU linker}
@author Per Bothner, Steve Chamberlain, Ian Lance Taylor, DJ Delorie
@author Cygnus Support
@page

@tex
\def\$#1${{#1}}  % Kluge: collect RCS revision info without $...$
\xdef\manvers{2.10.91}  % For use in headers, footers too
{\parskip=0pt
\hfill Cygnus Support\par
\hfill \manvers\par
\hfill \TeX{}info \texinfoversion\par
}
@end tex

@vskip 0pt plus 1filll
Copyright @copyright{} 1992-2014 Free Software Foundation, Inc.

      Permission is granted to copy, distribute and/or modify this document
      under the terms of the GNU Free Documentation License, Version 1.3
      or any later version published by the Free Software Foundation;
      with no Invariant Sections, with no Front-Cover Texts, and with no
      Back-Cover Texts.  A copy of the license is included in the
      section entitled "GNU Free Documentation License".

@end titlepage
@end iftex

@node Top
@top

This file documents the internals of the GNU linker @code{ld}.  It is a
collection of miscellaneous information with little form at this point.
Mostly, it is a repository into which you can put information about
GNU @code{ld} as you discover it (or as you design changes to @code{ld}).

This document is distributed under the terms of the GNU Free
Documentation License.  A copy of the license is included in the
section entitled ``GNU Free Documentation License''.

@menu
* README::			The README File
* Emulations::			How linker emulations are generated
* Emulation Walkthrough::	A Walkthrough of a Typical Emulation
* Architecture Specific::	Some Architecture Specific Notes
* GNU Free Documentation License::  GNU Free Documentation License
@end menu

@node README
@chapter The @file{README} File

Check the @file{README} file; it often has useful information that does not
appear anywhere else in the directory.

@node Emulations
@chapter How linker emulations are generated

Each linker target has an @dfn{emulation}.  The emulation includes the
default linker script, and some emulations also modify particular
aspects of linker behaviour.

Emulations are created during the build process by the shell script
@file{genscripts.sh}.

The @file{genscripts.sh} script starts by reading a file in the
@file{emulparams} directory.  This is a shell script which sets various
shell variables used by @file{genscripts.sh} and the other shell scripts
it invokes.

The @file{genscripts.sh} script will invoke a shell script in the
@file{scripttempl} directory in order to create default linker scripts
written in the linker command language.  The @file{scripttempl} script
will be invoked several times (between 5 and 9, depending on the
emulation parameters), with different assignments to shell variables,
to create different default scripts.  When the linker runs, it chooses
among these scripts based on its command-line options.

After creating the scripts, @file{genscripts.sh} will invoke yet another
shell script, this time in the @file{emultempl} directory.  That shell
script will create the emulation source file, which contains C code.
This C code permits the linker emulation to override various linker
behaviours.  Most targets use the generic emulation code, which is in
@file{emultempl/generic.em}.

To summarize, @file{genscripts.sh} reads three shell scripts: an
emulation parameters script in the @file{emulparams} directory, a linker
script generation script in the @file{scripttempl} directory, and an
emulation source file generation script in the @file{emultempl}
directory.

For example, the Sun 4 linker sets up variables in
@file{emulparams/sun4.sh}, creates linker scripts using
@file{scripttempl/aout.sc}, and creates the emulation code using
@file{emultempl/sunos.em}.

Note that the linker can support several emulations simultaneously,
depending upon how it is configured.  An emulation can be selected with
the @code{-m} option.  The @code{-V} option will list all supported
emulations.

@menu
* emulation parameters::        @file{emulparams} scripts
* linker scripts::              @file{scripttempl} scripts
* linker emulations::           @file{emultempl} scripts
@end menu

@node emulation parameters
@section @file{emulparams} scripts

Each target selects a particular file in the @file{emulparams} directory
by setting the shell variable @code{targ_emul} in @file{configure.tgt}.
This shell variable is used by the @file{configure} script to control
building an emulation source file.

Certain conventions are enforced.  Suppose the @code{targ_emul} variable
is set to @var{emul} in @file{configure.tgt}.  The name of the emulation
shell script will be @file{emulparams/@var{emul}.sh}.  The
@file{Makefile} must have a target named @file{e@var{emul}.c}; this
target must depend upon @file{emulparams/@var{emul}.sh}, as well as the
appropriate scripts in the @file{scripttempl} and @file{emultempl}
directories.  The @file{Makefile} target must invoke @code{GENSCRIPTS}
with two arguments: @var{emul}, and the value of the make variable
@code{tdir_@var{emul}}.  The value of the latter variable will be set by
the @file{configure} script, and is used to set the default target
directory to search.

By convention, the @file{emulparams/@var{emul}.sh} shell script should
only set shell variables.  It may set shell variables which are to be
interpreted by the @file{scripttempl} and the @file{emultempl} scripts.
Certain shell variables are interpreted directly by the
@file{genscripts.sh} script.

Here is a list of shell variables interpreted by @file{genscripts.sh},
as well as some conventional shell variables interpreted by the
@file{scripttempl} and @file{emultempl} scripts.

@table @code
@item SCRIPT_NAME
This is the name of the @file{scripttempl} script to use.  If
@code{SCRIPT_NAME} is set to @var{script}, @file{genscripts.sh} will use
the script @file{scripttempl/@var{script}.sc}.

@item TEMPLATE_NAME
This is the name of the @file{emultempl} script to use.  If
@code{TEMPLATE_NAME} is set to @var{template}, @file{genscripts.sh} will
use the script @file{emultempl/@var{template}.em}.  If this variable is
not set, the default value is @samp{generic}.

@item GENERATE_SHLIB_SCRIPT
If this is set to a nonempty string, @file{genscripts.sh} will invoke
the @file{scripttempl} script an extra time to create a shared library
script.  @xref{linker scripts}.

@item OUTPUT_FORMAT
This is normally set to indicate the BFD output format to use (e.g.,
@samp{"a.out-sunos-big"}).  The @file{scripttempl} script will normally
use it in an @code{OUTPUT_FORMAT} expression in the linker script.

@item ARCH
This is normally set to indicate the architecture to use (e.g.,
@samp{sparc}).  The @file{scripttempl} script will normally use it in an
@code{OUTPUT_ARCH} expression in the linker script.

@item ENTRY
Some @file{scripttempl} scripts use this to set the entry address, in an
@code{ENTRY} expression in the linker script.

@item TEXT_START_ADDR
Some @file{scripttempl} scripts use this to set the start address of the
@samp{.text} section.

@item SEGMENT_SIZE
The @file{genscripts.sh} script uses this to set the default value of
@code{DATA_ALIGNMENT} when running the @file{scripttempl} script.

@item TARGET_PAGE_SIZE
If @code{SEGMENT_SIZE} is not defined, the @file{genscripts.sh} script
uses this to define it.

@item ALIGNMENT
Some @file{scripttempl} scripts set this to a number to pass to
@code{ALIGN} to set the required alignment for the @code{end} symbol.
@end table

@node linker scripts
@section @file{scripttempl} scripts

Each linker target uses a @file{scripttempl} script to generate the
default linker scripts.  The name of the @file{scripttempl} script is
set by the @code{SCRIPT_NAME} variable in the @file{emulparams} script.
If @code{SCRIPT_NAME} is set to @var{script}, @file{genscripts.sh} will
invoke @file{scripttempl/@var{script}.sc}.

The @file{genscripts.sh} script will invoke the @file{scripttempl}
script 5 to 9 times.  Each time it will set the shell variable
@code{LD_FLAG} to a different value.  When the linker is run, the
options used will direct it to select a particular script.  (Script
selection is controlled by the @code{get_script} emulation entry point;
this describes the conventional behaviour.)

The @file{scripttempl} script should just write a linker script, written
in the linker command language, to standard output.  If the emulation
name (the name of the @file{emulparams} file without the @file{.sh}
extension) is @var{emul}, then the output will be directed to
@file{ldscripts/@var{emul}.@var{extension}} in the build directory,
where @var{extension} changes each time the @file{scripttempl} script is
invoked.

Here is the list of values assigned to @code{LD_FLAG}.

@table @code
@item (empty)
The script generated is used by default (when none of the following
cases apply).  The output has an extension of @file{.x}.
@item n
The script generated is used when the linker is invoked with the
@code{-n} option.  The output has an extension of @file{.xn}.
@item N
The script generated is used when the linker is invoked with the
@code{-N} option.  The output has an extension of @file{.xbn}.
@item r
The script generated is used when the linker is invoked with the
@code{-r} option.  The output has an extension of @file{.xr}.
@item u
The script generated is used when the linker is invoked with the
@code{-Ur} option.  The output has an extension of @file{.xu}.
@item shared
The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
this value if @code{GENERATE_SHLIB_SCRIPT} is defined in the
@file{emulparams} file.  The @file{emultempl} script must arrange to use
this script at the appropriate time, normally when the linker is invoked
with the @code{-shared} option.  The output has an extension of
@file{.xs}.
@item c
The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
this value if @code{GENERATE_COMBRELOC_SCRIPT} is defined in the
@file{emulparams} file or if @code{SCRIPT_NAME} is @code{elf}.  The
@file{emultempl} script must arrange to use this script at the appropriate
time, normally when the linker is invoked with the @code{-z combreloc}
option.  The output has an extension of @file{.xc}.
@item cshared
The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
this value if @code{GENERATE_COMBRELOC_SCRIPT} is defined in the
@file{emulparams} file or if @code{SCRIPT_NAME} is @code{elf} and
@code{GENERATE_SHLIB_SCRIPT} is defined in the @file{emulparams} file.
The @file{emultempl} script must arrange to use this script at the
appropriate time, normally when the linker is invoked with the
@code{-shared -z combreloc} options.  The output has an extension of
@file{.xsc}.
@item auto_import
The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
this value if @code{GENERATE_AUTO_IMPORT_SCRIPT} is defined in the
@file{emulparams} file.  The @file{emultempl} script must arrange to
use this script at the appropriate time, normally when the linker is
invoked with the @code{--enable-auto-import} option.  The output has
an extension of @file{.xa}.
@end table

Besides the shell variables set by the @file{emulparams} script, and the
@code{LD_FLAG} variable, the @file{genscripts.sh} script will set
certain variables for each run of the @file{scripttempl} script.

@table @code
@item RELOCATING
This will be set to a non-empty string when the linker is doing a final
relocation (i.e., for all scripts other than @code{-r} and @code{-Ur}).

@item CONSTRUCTING
This will be set to a non-empty string when the linker is building
global constructor and destructor tables (i.e., for all scripts other
than @code{-r}).

@item DATA_ALIGNMENT
This will be set to an @code{ALIGN} expression when the output should be
page aligned, or to @samp{.} when generating the @code{-N} script.

@item CREATE_SHLIB
This will be set to a non-empty string when generating a @code{-shared}
script.

@item COMBRELOC
When generating @code{-z combreloc} scripts, this will be set to the
name of a temporary file which can be used during script generation.
@end table

The conventional way to write a @file{scripttempl} script is to first
set a few shell variables, and then write out a linker script using
@code{cat} with a here document.  The linker script will use variable
substitutions, based on the above variables and those set in the
@file{emulparams} script, to control its behaviour.

When there are parts of the @file{scripttempl} script which should only
be run when doing a final relocation, they should be enclosed within a
variable substitution based on @code{RELOCATING}.  For example, on many
targets special symbols such as @code{_end} should be defined when doing
a final link.  Naturally, those symbols should not be defined when doing
a relocatable link using @code{-r}.  The @file{scripttempl} script
could use a construct like this to define those symbols:
@smallexample
  $@{RELOCATING+ _end = .;@}
@end smallexample
This will do the symbol assignment only if the @code{RELOCATING}
variable is defined.

The basic job of the linker script is to put the sections in the correct
order, and at the correct memory addresses.  For some targets, the
linker script may have to do some other operations.

For example, on most MIPS platforms, the linker is responsible for
defining the special symbol @code{_gp}, used to initialize the
@code{$gp} register.  It must be set to the start of the small data
section plus @code{0x8000}.  Naturally, it should only be defined when
doing a final relocation.  This will typically be done like this:
@smallexample
  $@{RELOCATING+ _gp = ALIGN(16) + 0x8000;@}
@end smallexample
This line would appear just before the sections which compose the small
data section (@samp{.sdata}, @samp{.sbss}).  All those sections would be
contiguous in memory.

Many COFF systems build constructor tables in the linker script.  The
compiler will arrange to output the address of each global constructor
in a @samp{.ctors} section, and the address of each global destructor in
a @samp{.dtors} section (this is done by defining
@code{ASM_OUTPUT_CONSTRUCTOR} and @code{ASM_OUTPUT_DESTRUCTOR} in the
@code{gcc} configuration files).  The @code{gcc} runtime support
routines expect the constructor table to be named @code{__CTOR_LIST__}.
They expect it to be a list of words, with the first word being the
count of the number of entries.  There should be a trailing zero word.
(Actually, the count may be -1 if the trailing word is present, and the
trailing word may be omitted if the count is correct, but, as the
@code{gcc} behaviour has changed slightly over the years, it is safest
to provide both.)  Here is a typical way that might be handled in a
@file{scripttempl} file.
@smallexample
    $@{CONSTRUCTING+ __CTOR_LIST__ = .;@}
    $@{CONSTRUCTING+ LONG((__CTOR_END__ - __CTOR_LIST__) / 4 - 2)@}
    $@{CONSTRUCTING+ *(.ctors)@}
    $@{CONSTRUCTING+ LONG(0)@}
    $@{CONSTRUCTING+ __CTOR_END__ = .;@}
    $@{CONSTRUCTING+ __DTOR_LIST__ = .;@}
    $@{CONSTRUCTING+ LONG((__DTOR_END__ - __DTOR_LIST__) / 4 - 2)@}
    $@{CONSTRUCTING+ *(.dtors)@}
    $@{CONSTRUCTING+ LONG(0)@}
    $@{CONSTRUCTING+ __DTOR_END__ = .;@}
@end smallexample
The use of @code{CONSTRUCTING} ensures that these linker script commands
will only appear when the linker is supposed to be building the
constructor and destructor tables.  This example is written for a target
which uses 4-byte pointers: dividing the size of the table in bytes by 4
gives its length in words, and subtracting 2 discounts the count word
and the trailing zero word.

Embedded systems often need to set a stack address.  This is normally
best done by using the @code{PROVIDE} construct with a default stack
address.  This permits the user to easily override the stack address
using the @code{--defsym} option.  Here is an example:
@smallexample
  $@{RELOCATING+ PROVIDE (__stack = 0x80000000);@}
@end smallexample
The value of the symbol @code{__stack} would then be used in the startup
code to initialize the stack pointer.

@node linker emulations
@section @file{emultempl} scripts

Each linker target uses an @file{emultempl} script to generate the
emulation code.  The name of the @file{emultempl} script is set by the
@code{TEMPLATE_NAME} variable in the @file{emulparams} script.  If the
@code{TEMPLATE_NAME} variable is not set, the default is
@samp{generic}.  If the value of @code{TEMPLATE_NAME} is @var{template},
@file{genscripts.sh} will use @file{emultempl/@var{template}.em}.

Most targets use the generic @file{emultempl} script,
@file{emultempl/generic.em}.  A different @file{emultempl} script is
only needed if the linker must support unusual actions, such as linking
against shared libraries.

The @file{emultempl} script is normally written as a simple invocation
of @code{cat} with a here document.  The document will use a few
variable substitutions.  Typically each function name uses a
substitution involving @code{EMULATION_NAME}, for ease of debugging when
the linker supports multiple emulations.

Every function and variable in the emitted file should be static.  The
only globally visible object must be named
@code{ld_@var{EMULATION_NAME}_emulation}, where @var{EMULATION_NAME} is
the name of the emulation set in @file{configure.tgt} (this is also the
name of the @file{emulparams} file without the @file{.sh} extension).
The @file{genscripts.sh} script will set the shell variable
@code{EMULATION_NAME} before invoking the @file{emultempl} script.

The @code{ld_@var{EMULATION_NAME}_emulation} variable must be a
@code{struct ld_emulation_xfer_struct}, as defined in @file{ldemul.h}.
It defines a set of function pointers which are invoked by the linker,
as well as strings for the emulation name (normally set from the shell
variable @code{EMULATION_NAME}) and the default BFD target name
(normally set from the shell variable @code{OUTPUT_FORMAT}, which is
normally set by the @file{emulparams} file).

The @file{genscripts.sh} script will set the shell variable
@code{COMPILE_IN} when it invokes the @file{emultempl} script for the
default emulation.  In this case, the @file{emultempl} script should
include the linker scripts directly, and return them from the
@code{get_script} entry point.  When the emulation is not the default,
the @code{get_script} entry point should just return a file name.  See
@file{emultempl/generic.em} for an example of how this is done.

At some point, the linker emulation entry points should be documented.

@node Emulation Walkthrough
@chapter A Walkthrough of a Typical Emulation

This chapter is to help people who are new to the way emulations
interact with the linker, or who are suddenly thrust into the position
of having to work with existing emulations.  It will discuss the files
you need to be aware of.  It will tell you when the given ``hooks'' in
the emulation will be called.  It will, hopefully, give you enough
information about when and how things happen that you'll be able to
get by.  As always, the source is the definitive reference to this.

The starting point for the linker is in @file{ldmain.c} where
@code{main} is defined.  The bulk of the code that's emulation
specific will initially be in @file{emultempl/@var{emulation}.em} but
will end up in @file{e@var{emulation}.c} when the build is done.
Most of the work to select and interface with emulations is in
@file{ldemul.h} and @file{ldemul.c}.  Specifically, @file{ldemul.h}
defines the @code{ld_emulation_xfer_struct} structure your emulation
exports.

Your emulation file exports a symbol
@code{ld_@var{EMULATION_NAME}_emulation}.  If your emulation is
selected (it usually is, since usually there's only one),
@file{ldemul.c} sets the variable @code{ld_emulation} to point to it.
@file{ldemul.c} also defines a number of API functions that interface
to your emulation, like @code{ldemul_after_parse} which simply calls
your @code{ld_@var{EMULATION_NAME}_emulation.after_parse} function.  For
the rest of this chapter, the @code{ldemul_} functions will be
mentioned, but you should read each one as a call through to the
corresponding function in your emulation.

We will also skip or gloss over parts of the link process that don't
relate to emulations, like setting up internationalization.

After initialization, @code{main} selects an emulation by pre-scanning
the command line arguments.  It calls @code{ldemul_choose_target} to
choose a target.  If you set @code{choose_target} to
@code{ldemul_default_target}, it picks your @code{target_name} by
default.

@code{main} calls @code{ldemul_before_parse}, then @code{parse_args}.
@code{parse_args} calls @code{ldemul_parse_args} for each arg, which
must update the @code{getopt} globals if it recognizes the argument.
If the emulation doesn't recognize it, then @code{parse_args} checks to
see if it recognizes it.

Now that the emulation has had access to all its command-line options,
@code{main} calls @code{ldemul_set_symbols}.  This can be used for any
initialization that may be affected by options.  It is also supposed
to set up any variables needed by the emulation script.

@code{main} now calls @code{ldemul_get_script} to get the emulation
script to use (based on arguments, no doubt, @pxref{Emulations}) and
runs it.  While parsing, @file{ldgram.y} may call @code{ldemul_hll} or
@code{ldemul_syslib} to handle the @code{HLL} or @code{SYSLIB}
commands.  It may call @code{ldemul_unrecognized_file} if you asked
the linker to link a file it doesn't recognize.  It will call
@code{ldemul_recognized_file} for each file it does recognize, in case
the emulation wants to handle some files specially.  All the while,
it's loading the files (possibly calling
@code{ldemul_open_dynamic_archive}) and their symbols.  After it's
done reading the script, @code{main} calls @code{ldemul_after_parse}.
Use the after-parse hook to set up anything that depends on settings
the script might have made, like the entry point.

@code{main} next calls @code{lang_process} in @file{ldlang.c}.  This
appears to be the main core of the linking itself, as far as emulation
hooks are concerned (*).  It first opens the output file's BFD, calling
@code{ldemul_set_output_arch}, and calls
@code{ldemul_create_output_section_statements} in case you need to use
other means to find or create object files (e.g., shared libraries
found on a path, or fake stub objects).  Despite the name, nobody
creates output sections here.

(*) In most cases, the BFD library does the bulk of the actual
linking, handling symbol tables, symbol resolution, relocations, and
building the final output file.  See the BFD reference for all the
details.  Your emulation is usually concerned more with managing
things at the file and section level, like ``put this here, add this
section'', etc.

Next, the objects to be linked are opened and BFDs created for them,
and @code{ldemul_after_open} is called.  At this point, you have all
the objects and symbols loaded, but none of the data has been placed
yet.

Next comes the Big Linking Thingy (except for the parts BFD does).
All input sections are mapped to output sections according to the
script.  If a section doesn't get mapped by default,
@code{ldemul_place_orphan} will get called to figure out where it goes.
Next it figures out the offsets for each section, calling
@code{ldemul_before_allocation} before and
@code{ldemul_after_allocation} after deciding where each input section
ends up in the output sections.

The last part of @code{lang_process} is to figure out all the symbols'
values.  After assigning final values to the symbols,
@code{ldemul_finish} is called, and after that, any undefined symbols
are turned into fatal errors.

OK, back to @code{main}, which calls @code{ldwrite} in
@file{ldwrite.c}.  @code{ldwrite} calls BFD's @code{bfd_final_link},
which does all the relocation fixups and writes the output BFD to
disk, and we're done.

In summary,

@itemize @bullet

@item @code{main()} in @file{ldmain.c}
@item @file{emultempl/@var{EMULATION}.em} has your code
@item @code{ldemul_choose_target} (defaults to your @code{target_name})
@item @code{ldemul_before_parse}
@item Parse argv, calls @code{ldemul_parse_args} for each
@item @code{ldemul_set_symbols}
@item @code{ldemul_get_script}
@item parse script

@itemize @bullet
@item may call @code{ldemul_hll} or @code{ldemul_syslib}
@item may call @code{ldemul_open_dynamic_archive}
@end itemize

@item @code{ldemul_after_parse}
@item @code{lang_process()} in @file{ldlang.c}

@itemize @bullet
@item create @code{output_bfd}
@item @code{ldemul_set_output_arch}
@item @code{ldemul_create_output_section_statements}
@item read objects, create input BFDs; all symbols exist, but have no values
@item may call @code{ldemul_unrecognized_file}
@item will call @code{ldemul_recognized_file}
@item @code{ldemul_after_open}
@item map input sections to output sections
@item may call @code{ldemul_place_orphan} for remaining sections
@item @code{ldemul_before_allocation}
@item gives input sections offsets into output sections, places output sections
@item @code{ldemul_after_allocation}: section addresses valid
@item assigns values to symbols
@item @code{ldemul_finish}: symbol values valid
@end itemize

@item output BFD is written to disk

@end itemize
    594 
    595 @node Architecture Specific
    596 @chapter Some Architecture Specific Notes
    597 
    598 This is the place for notes on the behavior of @code{ld} on
    599 specific platforms.  Currently, only Intel x86 is documented (and 
    600 of that, only the auto-import behavior for DLLs).
    601 
    602 @menu
    603 * ix86::                        Intel x86
    604 @end menu
    605 
    606 @node ix86
    607 @section Intel x86
    608 
    609 @table @emph
    610 @code{ld} can create DLLs that operate with various runtimes available
    611 on a common x86 operating system.  These runtimes include native (using 
    612 the mingw "platform"), cygwin, and pw.
    613 
    614 @item auto-import from DLLs 
    615 @enumerate
    616 @item
    617 With this feature on, DLL clients can import variables from DLL 
    618 without any concern from their side (for example, without any source
    619 code modifications).  Auto-import can be enabled using the 
    620 @code{--enable-auto-import} flag, or disabled via the 
    621 @code{--disable-auto-import} flag.  Auto-import is disabled by default.
    622 
    623 @item
    624 This is done completely in bounds of the PE specification (to be fair,
    625 there's a minor violation of the spec at one point, but in practice 
    626 auto-import works on all known variants of that common x86 operating
    627 system)  So, the resulting DLL can be used with any other PE 
    628 compiler/linker.
    629 
    630 @item
    631 Auto-import is fully compatible with standard import method, in which
    632 variables are decorated using attribute modifiers. Libraries of either
    633 type may be mixed together.
    634 
    635 @item
    636 Overhead (space): 8 bytes per imported symbol, plus 20 for each
    637 reference to it; Overhead (load time): negligible; Overhead 
    638 (virtual/physical memory): should be less than effect of DLL 
    639 relocation.
    640 @end enumerate
    641 
    642 Motivation
    643 
    644 The obvious and only way to get rid of dllimport insanity is 
    645 to make client access variable directly in the DLL, bypassing 
    646 the extra dereference imposed by ordinary DLL runtime linking.
    647 I.e., whenever client contains something like
    648 
    649 @code{mov dll_var,%eax,}
    650 
    651 address of dll_var in the command should be relocated to point 
    652 into loaded DLL. The aim is to make OS loader do so, and than 
    653 make ld help with that.  Import section of PE made following 
    654 way: there's a vector of structures each describing imports 
    655 from particular DLL. Each such structure points to two other 
    656 parallel vectors: one holding imported names, and one which 
    657 will hold address of corresponding imported name. So, the 
    658 solution is de-vectorize these structures, making import 
    659 locations be sparse and pointing directly into code.
    660 
    661 Implementation
    662 
    663 For each reference of data symbol to be imported from DLL (to 
    664 set of which belong symbols with name <sym>, if __imp_<sym> is 
    665 found in implib), the import fixup entry is generated. That 
    666 entry is of type IMAGE_IMPORT_DESCRIPTOR and stored in .idata$3 
    667 subsection. Each fixup entry contains pointer to symbol's address 
    668 within .text section (marked with __fuN_<sym> symbol, where N is 
    669 integer), pointer to DLL name (so, DLL name is referenced by 
    670 multiple entries), and pointer to symbol name thunk. Symbol name 
    671 thunk is singleton vector (__nm_th_<symbol>) pointing to 
    672 IMAGE_IMPORT_BY_NAME structure (__nm_<symbol>) directly containing 
    673 imported name. Here comes that "om the edge" problem mentioned above: 
    674 PE specification rambles that name vector (OriginalFirstThunk) should 
    675 run in parallel with addresses vector (FirstThunk), i.e. that they 
    676 should have same number of elements and terminated with zero. We violate
    677 this, since FirstThunk points directly into machine code. But in 
    678 practice, OS loader implemented the sane way: it goes thru 
    679 OriginalFirstThunk and puts addresses to FirstThunk, not something 
    680 else. It once again should be noted that dll and symbol name 
    681 structures are reused across fixup entries and should be there 
    682 anyway to support standard import stuff, so sustained overhead is 
    683 20 bytes per reference. Other question is whether having several 
    684 IMAGE_IMPORT_DESCRIPTORS for the same DLL is possible. Answer is yes, 
    685 it is done even by native compiler/linker (libth32's functions are in 
    686 fact resident in windows9x kernel32.dll, so if you use it, you have 
    687 two IMAGE_IMPORT_DESCRIPTORS for kernel32.dll). Yet other question is 
    688 whether referencing the same PE structures several times is valid. 
    689 The answer is why not, prohibiting that (detecting violation) would 
    690 require more work on behalf of loader than not doing it.
    691 
    692 @end table
    693 
    694 @node GNU Free Documentation License
    695 @chapter GNU Free Documentation License
    696 
    697 @include fdl.texi
    698 
    699 @contents
    700 @bye
    701