      1 \documentstyle[12pt,twoside]{article}
      2 \def\TITLE{IP Command Reference}
      3 \input preamble
      4 \begin{center}
      5 \Large\bf IP Command Reference.
      6 \end{center}
      7 
      8 
      9 \begin{center}
     10 { \large Alexey~N.~Kuznetsov } \\
     11 \em Institute for Nuclear Research, Moscow \\
\verb|kuznet@ms2.inr.ac.ru| \\
     13 \rm April 14, 1999
     14 \end{center}
     15 
     16 \vspace{5mm}
     17 
     18 \tableofcontents
     19 
     20 \newpage
     21 
     22 \section{About this document}
     23 
     24 This document presents a comprehensive description of the \verb|ip| utility
     25 from the \verb|iproute2| package. It is not a tutorial or user's guide.
     26 It is a {\em dictionary\/}, not explaining terms,
     27 but translating them into other terms, which may also be unknown to the reader.
     28 However, the document is self-contained and the reader, provided they have a
     29 basic networking background, will find enough information
     30 and examples to understand and configure Linux-2.2 IP and IPv6
     31 networking.
     32 
This document is split into sections explaining \verb|ip| commands
and options, interpreting \verb|ip| output and giving a few examples.
Longer examples and topics that require more elaborate
discussion are in the appendix.
     37 
     38 The paragraphs beginning with NB contain side notes, warnings about
     39 bugs and design drawbacks. They may be skipped at the first reading.
     40 
     41 \section{{\tt ip} --- command syntax}
     42 
     43 The generic form of an \verb|ip| command is:
     44 \begin{verbatim}
     45 ip [ OPTIONS ] OBJECT [ COMMAND [ ARGUMENTS ]]
     46 \end{verbatim}
     47 where \verb|OPTIONS| is a set of optional modifiers affecting the
     48 general behaviour of the \verb|ip| utility or changing its output. All options
     49 begin with the character \verb|'-'| and may be used in either long or abbreviated 
     50 forms. Currently, the following options are available:
     51 
     52 \begin{itemize}
     53 \item \verb|-V|, \verb|-Version|
     54 
     55 --- print the version of the \verb|ip| utility and exit.
     56 
     57 
     58 \item \verb|-s|, \verb|-stats|, \verb|-statistics|
     59 
     60 --- output more information. If the option
     61 appears twice or more, the amount of information increases.
     62 As a rule, the information is statistics or some time values.
     63 
     64 \item \verb|-d|, \verb|-details|
     65 
     66 --- output more detailed information.
     67 
     68 \item \verb|-f|, \verb|-family| followed by a protocol family
     69 identifier: \verb|inet|, \verb|inet6| or \verb|link|.
     70 
     71 --- enforce the protocol family to use. If the option is not present,
     72 the protocol family is guessed from other arguments. If the rest of the command
     73 line does not give enough information to guess the family, \verb|ip| falls back to the default
     74 one, usually \verb|inet| or \verb|any|. \verb|link| is a special family
     75 identifier meaning that no networking protocol is involved.
     76 
     77 \item \verb|-4|
     78 
     79 --- shortcut for \verb|-family inet|.
     80 
     81 \item \verb|-6|
     82 
     83 --- shortcut for \verb|-family inet6|.
     84 
     85 \item \verb|-0|
     86 
     87 --- shortcut for \verb|-family link|.
     88 
     89 
     90 \item \verb|-o|, \verb|-oneline|
     91 
     92 --- output each record on a single line, replacing line feeds
     93 with the \verb|'\'| character. This is convenient when you want to
     94 count records with \verb|wc| or to \verb|grep| the output. The trivial
     95 script \verb|rtpr| converts the output back into readable form.
     96 
     97 \item \verb|-r|, \verb|-resolve|
     98 
     99 --- use the system's name resolver to print DNS names instead of
    100 host addresses.
    101 
    102 \begin{NB}
    103  Do not use this option when reporting bugs or asking for advice.
    104 \end{NB}
    105 \begin{NB}
    106  \verb|ip| never uses DNS to resolve names to addresses.
    107 \end{NB}
    108 
    109 \item \verb|-b|, \verb|-batch FILE|
    110 
--- read commands from the given file or from standard input and execute them.
The first failure causes \verb|ip| to terminate.
In the batch \verb|FILE| everything following a \verb|#| symbol is
ignored and can be used for comments.
    115 \paragraph{Example:}
    116 \begin{verbatim}
    117 kuznet@kaiser $ cat /tmp/ip_batch.ip
    118 # This is a comment
tuntap add mode tap tap1 # This is another comment
    120 link set up dev tap1
    121 addr add 10.0.0.1/24 dev tap1
    122 kuznet@kaiser $ sudo ip -b /tmp/ip_batch.ip
    123 \end{verbatim}
or from standard input:
    125 \begin{verbatim}
    126 kuznet@kaiser $ cat /tmp/ip_batch.ip | sudo ip -b -
    127 \end{verbatim}
    128 
    129 \item \verb|-force|
    130 
--- do not terminate \verb|ip| on errors in batch mode.
If any errors occurred while executing the commands,
the application return code will be non-zero.
    134 
    135 \item \verb|-l|, \verb|-loops COUNT|
    136 
--- specify the maximum number of loops the \verb|ip addr flush| logic will attempt
before giving up. The default is 10. Zero (0) means loop until all
addresses are removed.
    140 
    141 \end{itemize}
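
\paragraph{Example:} A few illustrative invocations combining the options
above. This is only a sketch: the device name \verb|eth0| is an assumption
and should be replaced with a device present on your system, and the batch
file is the one from the \verb|-batch| example, which needs root privileges.
\begin{verbatim}
# Count the network devices: -oneline prints one record per line
ip -o link ls | wc -l
# Show only IPv4 addresses
ip -4 addr ls
# Show link statistics; repeating -s increases the level of detail
ip -s -s link ls eth0
# Run a batch file without stopping at the first failed command
ip -force -batch /tmp/ip_batch.ip
\end{verbatim}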
    142 
    143 \verb|OBJECT| is the object to manage or to get information about.
    144 The object types currently understood by \verb|ip| are:
    145 
    146 \begin{itemize}
    147 \item \verb|link| --- network device
    148 \item \verb|address| --- protocol (IP or IPv6) address on a device
    149 \item \verb|neighbour| --- ARP or NDISC cache entry
    150 \item \verb|route| --- routing table entry
    151 \item \verb|rule| --- rule in routing policy database
    152 \item \verb|maddress| --- multicast address
    153 \item \verb|mroute| --- multicast routing cache entry
    154 \item \verb|tunnel| --- tunnel over IP
    155 \end{itemize}
    156 
    157 Again, the names of all objects may be written in full or
    158 abbreviated form, f.e.\ \verb|address| is abbreviated as \verb|addr|
    159 or just \verb|a|.
    160 
    161 \verb|COMMAND| specifies the action to perform on the object.
    162 The set of possible actions depends on the object type.
    163 As a rule, it is possible to \verb|add|, \verb|delete| and
    164 \verb|show| (or \verb|list|) objects, but some objects
    165 do not allow all of these operations or have some additional commands.
    166 The \verb|help| command is available for all objects. It prints
    167 out a list of available commands and argument syntax conventions.
    168 
    169 If no command is given, some default command is assumed.
    170 Usually it is \verb|list| or, if the objects of this class
    171 cannot be listed, \verb|help|.
    172 
    173 \verb|ARGUMENTS| is a list of arguments to the command.
    174 The arguments depend on the command and object. There are two types of arguments:
    175 {\em flags\/}, consisting of a single keyword, and {\em parameters\/},
    176 consisting of a keyword followed by a value. For convenience,
    177 each command has some {\em default parameter\/}
    178 which may be omitted. F.e.\ parameter \verb|dev| is the default
    179 for the {\tt ip link} command, so {\tt ip link ls eth0} is equivalent
    180 to {\tt ip link ls dev eth0}.
    181 In the command descriptions below such parameters
    182 are distinguished with the marker: ``(default)''.
    183 
Almost all keywords may be abbreviated to their first few letters (or even
a single letter). The shortcuts are convenient when \verb|ip| is used interactively,
    186 but they are not recommended in scripts or when reporting bugs
    187 or asking for advice. ``Officially'' allowed abbreviations are listed
    188 in the document body.
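
\paragraph{Example:} Assuming a device named \verb|eth0| exists, the
following commands should all be equivalent. They range from the fully
spelled-out form to an abbreviated one that also relies on \verb|dev|
being the default parameter:
\begin{verbatim}
ip address show dev eth0
ip addr show eth0
ip a ls eth0
\end{verbatim}
The first form is the one to prefer in scripts and bug reports.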
    189 
    190 
    191 
    192 \section{{\tt ip} --- error messages}
    193 
    194 \verb|ip| may fail for one of the following reasons:
    195 
    196 \begin{itemize}
    197 \item
A syntax error on the command line: an unknown keyword, an incorrectly formatted
IP address, etc.\ In this case \verb|ip| prints an error message
    200 and exits. As a rule, the error message will contain information
    201 about the reason for the failure. Sometimes it also prints a help page.
    202 
    203 \item
    204 The arguments did not pass verification for self-consistency.
    205 
    206 \item
    207 \verb|ip| failed to compile a kernel request from the arguments
    208 because the user didn't give enough information.
    209 
    210 \item
    211 The kernel returned an error to some syscall. In this case \verb|ip|
    212 prints the error message, as it is output with \verb|perror(3)|,
    213 prefixed with a comment and a syscall identifier.
    214 
    215 \item
    216 The kernel returned an error to some RTNETLINK request.
    217 In this case \verb|ip| prints the error message, as it is output
    218 with \verb|perror(3)| prefixed with ``RTNETLINK answers:''.
    219 
    220 \end{itemize}
    221 
All operations are atomic, i.e.\ 
if the \verb|ip| utility fails, it does not change anything
in the system. One harmful exception is the \verb|ip link| command
(Sec.\ref{IP-LINK}, p.\pageref{IP-LINK}),
which may change only some of the device parameters given
on the command line.
    228 
    229 It is difficult to list all the error messages (especially
    230 syntax errors). However, as a rule, their meaning is clear
    231 from the context of the command.
    232 
    233 The most common mistakes are:
    234 
    235 \begin{enumerate}
    236 \item Netlink is not configured in the kernel. The message is:
    237 \begin{verbatim}
    238 Cannot open netlink socket: Invalid value
    239 \end{verbatim}
    240 
    241 \item RTNETLINK is not configured in the kernel. In this case
    242 one of the following messages may be printed, depending on the command:
    243 \begin{verbatim}
    244 Cannot talk to rtnetlink: Connection refused
    245 Cannot send dump request: Connection refused
    246 \end{verbatim}
    247 
    248 \item The \verb|CONFIG_IP_MULTIPLE_TABLES| option was not selected
    249 when configuring the kernel. In this case any attempt to use the
    250 \verb|ip| \verb|rule| command will fail, f.e.
    251 \begin{verbatim}
    252 kuznet@kaiser $ ip rule list
    253 RTNETLINK error: Invalid argument
    254 dump terminated
    255 \end{verbatim}
    256 
    257 \end{enumerate}
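
\paragraph{Example:} One way to check whether the running kernel was built
with a given option is to look at its configuration. The sketch below
assumes that the configuration is available in one of two common
locations; neither is guaranteed to exist on a given system.
\begin{verbatim}
# Distribution kernels often install their configuration under /boot
grep CONFIG_IP_MULTIPLE_TABLES /boot/config-$(uname -r)
# Kernels built with CONFIG_IKCONFIG_PROC expose it in /proc
zcat /proc/config.gz | grep CONFIG_IP_MULTIPLE_TABLES
\end{verbatim}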
    258 
    259 
    260 \section{{\tt ip link} --- network device configuration}
    261 \label{IP-LINK}
    262 
    263 \paragraph{Object:} A \verb|link| is a network device and the corresponding
    264 commands display and change the state of devices.
    265 
    266 \paragraph{Commands:} \verb|set| and \verb|show| (or \verb|list|).
    267 
    268 \subsection{{\tt ip link set} --- change device attributes}
    269 
    270 \paragraph{Abbreviations:} \verb|set|, \verb|s|.
    271 
    272 \paragraph{Arguments:}
    273 
    274 \begin{itemize}
    275 \item \verb|dev NAME| (default)
    276 
    277 --- \verb|NAME| specifies the network device on which to operate.
    278 
    279 \item \verb|up| and \verb|down|
    280 
    281 --- change the state of the device to \verb|UP| or \verb|DOWN|.
    282 
    283 \item \verb|arp on| or \verb|arp off|
    284 
    285 --- change the \verb|NOARP| flag on the device.
    286 
    287 \begin{NB}
This operation is {\em not allowed\/} if the device is in state \verb|UP|,
though neither the \verb|ip| utility nor the kernel checks for this condition.
You can get unpredictable results by changing this flag while the
device is running.
    292 \end{NB}
    293 
    294 \item \verb|multicast on| or \verb|multicast off|
    295 
    296 --- change the \verb|MULTICAST| flag on the device.
    297 
    298 \item \verb|dynamic on| or \verb|dynamic off|
    299 
    300 --- change the \verb|DYNAMIC| flag on the device.
    301 
    302 \item \verb|name NAME|
    303 
    304 --- change the name of the device. This operation is not
    305 recommended if the device is running or has some addresses
    306 already configured.
    307 
    308 \item \verb|txqueuelen NUMBER| or \verb|txqlen NUMBER|
    309 
    310 --- change the transmit queue length of the device.
    311 
    312 \item \verb|mtu NUMBER|
    313 
    314 --- change the MTU of the device.
    315 
    316 \item \verb|address LLADDRESS|
    317 
    318 --- change the station address of the interface.
    319 
    320 \item \verb|broadcast LLADDRESS|, \verb|brd LLADDRESS| or \verb|peer LLADDRESS|
    321 
    322 --- change the link layer broadcast address or the peer address when
    323 the interface is \verb|POINTOPOINT|.
    324 
    325 \vskip 1mm
    326 \begin{NB}
    327 For most devices (f.e.\ for Ethernet) changing the link layer
    328 broadcast address will break networking.
    329 Do not use it, if you do not understand what this operation really does.
    330 \end{NB}
    331 
    332 \item \verb|netns PID|
    333 
    334 --- move the device to the network namespace associated with the process PID.
    335 
    336 \end{itemize}
    337 
    338 \vskip 1mm
    339 \begin{NB}
    340 The \verb|PROMISC| and \verb|ALLMULTI| flags are considered
    341 obsolete and should not be changed administratively, though
    342 the {\tt ip} utility will allow that.
    343 \end{NB}
    344 
\paragraph{Warning:} If multiple parameter changes are requested,
\verb|ip| aborts immediately after any of the changes has failed.
    347 This is the only case when \verb|ip| can move the system to
    348 an unpredictable state. The solution is to avoid changing
    349 several parameters with one {\tt ip link set} call.
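
A minimal sketch of this advice, using the \verb|dummy| device from the
examples below. The MTU value is arbitrary and chosen only for illustration.
Each call changes a single attribute, so a failure leaves the device in a
known state:
\begin{verbatim}
# One attribute per call; && stops the chain at the first failure
ip link set dev dummy mtu 1400 &&
ip link set dev dummy address 00:00:00:00:00:01 &&
ip link set dev dummy up
\end{verbatim}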
    350 
    351 \paragraph{Examples:}
    352 \begin{itemize}
    353 \item \verb|ip link set dummy address 00:00:00:00:00:01|
    354 
    355 --- change the station address of the interface \verb|dummy|.
    356 
    357 \item \verb|ip link set dummy up|
    358 
    359 --- start the interface \verb|dummy|.
    360 
    361 \end{itemize}
    362 
    363 
    364 \subsection{{\tt ip link show} --- display device attributes}
    365 \label{IP-LINK-SHOW}
    366 
    367 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|lst|, \verb|sh|, \verb|ls|,
    368 \verb|l|.
    369 
    370 \paragraph{Arguments:}
    371 \begin{itemize}
    372 \item \verb|dev NAME| (default)
    373 
    374 --- \verb|NAME| specifies the network device to show.
    375 If this argument is omitted all devices are listed.
    376 
    377 \item \verb|up|
    378 
    379 --- only display running interfaces.
    380 
    381 \end{itemize}
    382 
    383 
    384 \paragraph{Output format:}
    385 
    386 \begin{verbatim}
    387 kuznet@alisa:~ $ ip link ls eth0
    388 3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    389     link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    390 kuznet@alisa:~ $ ip link ls sit0
    391 5: sit0@NONE: <NOARP,UP> mtu 1480 qdisc noqueue
    392     link/sit 0.0.0.0 brd 0.0.0.0
    393 kuznet@alisa:~ $ ip link ls dummy
    394 2: dummy: <BROADCAST,NOARP> mtu 1500 qdisc noop
    395     link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
    396 kuznet@alisa:~ $ 
    397 \end{verbatim}
    398 
    399 
    400 The number before each colon is an {\em interface index\/} or {\em ifindex\/}.
    401 This number uniquely identifies the interface. This is followed by the {\em interface name\/}
    402 (\verb|eth0|, \verb|sit0| etc.). The interface name is also
    403 unique at every given moment. However, the interface may disappear from the
    404 list (f.e.\ when the corresponding driver module is unloaded) and another
    405 one with the same name may be created later. Besides that,
    406 the administrator may change the name of any device with
    407 \verb|ip| \verb|link| \verb|set| \verb|name|
    408 to make it more intelligible.
    409 
    410 The interface name may have another name or \verb|NONE| appended 
    411 after the \verb|@| sign. This means that this device is bound to some other
    412 device,
i.e.\ packets sent through it are encapsulated and sent via the ``master''
    414 device. If the name is \verb|NONE|, the master is unknown.
    415 
    416 Then we see the interface {\em mtu\/} (``maximal transfer unit''). This determines
    417 the maximal size of data which can be sent as a single packet over this interface.
    418 
{\em qdisc\/} (``queuing discipline'') shows the queuing algorithm used
on the interface. In particular, \verb|noqueue| means that this interface
does not queue anything and \verb|noop| means that the interface is in blackhole
mode, i.e.\ all packets sent to it are immediately discarded.
    422 mode i.e.\ all packets sent to it are immediately discarded.
    423 {\em qlen\/} is the default transmit queue length of the device measured
    424 in packets.
    425 
    426 The interface flags are summarized in the angle brackets.
    427 
    428 \begin{itemize}
    429 \item \verb|UP| --- the device is turned on. It is ready to accept
    430 packets for transmission and it may inject into the kernel packets received
    431 from other nodes on the network.
    432 
    433 \item \verb|LOOPBACK| --- the interface does not communicate with other
    434 hosts. All packets sent through it will be returned
    435 and nothing but bounced packets can be received.
    436 
    437 \item \verb|BROADCAST| --- the device has the facility to send packets
    438 to all hosts sharing the same link. A typical example is an Ethernet link.
    439 
    440 \item \verb|POINTOPOINT| --- the link has only two ends with one node
    441 attached to each end. All packets sent to this link will reach the peer
    442 and all packets received by us came from this single peer.
    443 
    444 If neither \verb|LOOPBACK| nor \verb|BROADCAST| nor \verb|POINTOPOINT|
are set, the interface is assumed to be NBMA (Non-Broadcast Multi-Access).
    446 This is the most generic type of device and the most complicated one, because
    447 the host attached to a NBMA link has no means to send to anyone
    448 without additionally configured information.
    449 
    450 \item \verb|MULTICAST| --- is an advisory flag indicating that the interface
    451 is aware of multicasting i.e.\ sending packets to some subset of neighbouring
    452 nodes. Broadcasting is a particular case of multicasting, where the multicast
    453 group consists of all nodes on the link. It is important to emphasize
    454 that software {\em must not\/} interpret the absence of this flag as the inability
    455 to use multicasting on this interface. Any \verb|POINTOPOINT| and
    456 \verb|BROADCAST| link is multicasting by definition, because we have
    457 direct access to all the neighbours and, hence, to any part of them.
    458 Certainly, the use of high bandwidth multicast transfers is not recommended
    459 on broadcast-only links because of high expense, but it is not strictly
    460 prohibited.
    461 
\item \verb|PROMISC| --- the device listens to and feeds to the kernel all
traffic on the link even if it is not destined for us, not broadcast
and not destined for a multicast group of which we are a member. Usually
    465 this mode exists only on broadcast links and is used by bridges and for network
    466 monitoring.
    467 
    468 \item \verb|ALLMULTI| --- the device receives all multicast packets
    469 wandering on the link. This mode is used by multicast routers.
    470 
    471 \item \verb|NOARP| --- this flag is different from the other ones. It has
    472 no invariant value and its interpretation depends on the network protocols
    473 involved. As a rule, it indicates that the device needs no address
    474 resolution and that the software or hardware knows how to deliver packets
    475 without any help from the protocol stacks.
    476 
    477 \item \verb|DYNAMIC| --- is an advisory flag indicating that the interface is
    478 dynamically created and destroyed.
    479 
    480 \item \verb|SLAVE| --- this interface is bonded to some other interfaces
    481 to share link capacities.
    482 
    483 \end{itemize}
    484 
    485 \vskip 1mm
    486 \begin{NB}
    487 There are other flags but they are either obsolete (\verb|NOTRAILERS|)
    488 or not implemented (\verb|DEBUG|) or specific to some devices
    489 (\verb|MASTER|, \verb|AUTOMEDIA| and \verb|PORTSEL|). We do not discuss
    490 them here.
    491 \end{NB}
    492 
    493 
    494 The second line contains information on the link layer addresses
    495 associated with the device. The first word (\verb|ether|, \verb|sit|)
    496 defines the interface hardware type. This type determines the format and semantics
    497 of the addresses and is logically part of the address.
    498 The default format of the station address and the broadcast address
    499 (or the peer address for pointopoint links) is a
    500 sequence of hexadecimal bytes separated by colons, but some link
    501 types may have their natural address format, f.e.\ addresses
    502 of tunnels over IP are printed as dotted-quad IP addresses.
    503 
    504 \vskip 1mm
    505 \begin{NB}
    506   NBMA links have no well-defined broadcast or peer address,
    507   however this field may contain useful information, f.e.\
    508   about the address of broadcast relay or about the address of the ARP server.
    509 \end{NB}
    510 \begin{NB}
    511 Multicast addresses are not shown by this command, see
    512 \verb|ip maddr ls| in~Sec.\ref{IP-MADDR} (p.\pageref{IP-MADDR} of this
    513 document).
    514 \end{NB}
    515 
    516 
    517 \paragraph{Statistics:} With the \verb|-statistics| option, \verb|ip| also
    518 prints interface statistics:
    519 
    520 \begin{verbatim}
    521 kuznet@alisa:~ $ ip -s link ls eth0
    522 3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    523     link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    524     RX: bytes  packets  errors  dropped overrun mcast   
    525     2449949362 2786187  0       0       0       0      
    526     TX: bytes  packets  errors  dropped carrier collsns 
    527     178558497  1783945  332     0       332     35172  
    528 kuznet@alisa:~ $
    529 \end{verbatim}
    530 \verb|RX:| and \verb|TX:| lines summarize receiver and transmitter
    531 statistics. They contain:
    532 \begin{itemize}
    533 \item \verb|bytes| --- the total number of bytes received or transmitted
    534 on the interface. This number wraps when the maximal length of the data type
natural for the architecture is exceeded, so continuous monitoring requires
a user-level daemon sampling it periodically.
    537 \item \verb|packets| --- the total number of packets received or transmitted
    538 on the interface.
    539 \item \verb|errors| --- the total number of receiver or transmitter errors.
    540 \item \verb|dropped| --- the total number of packets dropped due to lack
    541 of resources.
    542 \item \verb|overrun| --- the total number of receiver overruns resulting
    543 in dropped packets. As a rule, if the interface is overrun, it means
    544 serious problems in the kernel or that your machine is too slow
    545 for this interface.
\item \verb|mcast| --- the total number of received multicast packets. This counter
is only supported by a few devices.
\item \verb|carrier| --- the total number of link media failures, f.e.\ because
of lost carrier.
    550 \item \verb|collsns| --- the total number of collision events
    551 on Ethernet-like media. This number may have a different sense on other
    552 link types.
    553 \item \verb|compressed| --- the total number of compressed packets. This is
    554 available only for links using VJ header compression.
    555 \end{itemize}
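
Since the counters may wrap, long-term monitoring has to sample them
periodically. A minimal sketch, assuming the device \verb|eth0|, an
arbitrary interval of 60 seconds and a hypothetical log file
\verb|/tmp/eth0-stats.log|:
\begin{verbatim}
# Append a timestamp and the current counters once a minute
while sleep 60; do
  date
  ip -s link ls eth0
done >> /tmp/eth0-stats.log
\end{verbatim}
A real monitoring daemon would also have to detect a wrap between two
consecutive samples and correct for it; this sketch only records the raw values.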
    556 
    557 
    558 If the \verb|-s| option is entered twice or more,
    559 \verb|ip| prints more detailed statistics on receiver
    560 and transmitter errors.
    561 
    562 \begin{verbatim}
    563 kuznet@alisa:~ $ ip -s -s link ls eth0
    564 3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    565     link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    566     RX: bytes  packets  errors  dropped overrun mcast   
    567     2449949362 2786187  0       0       0       0      
    568     RX errors: length   crc     frame   fifo    missed
    569                0        0       0       0       0      
    570     TX: bytes  packets  errors  dropped carrier collsns 
    571     178558497  1783945  332     0       332     35172  
    572     TX errors: aborted  fifo    window  heartbeat
    573                0        0       0       332    
    574 kuznet@alisa:~ $
    575 \end{verbatim}
These error names are pure Ethernetisms. Other devices
may have non-zero values in these fields, but they may be
interpreted differently.
    579 
    580 
    581 \section{{\tt ip address} --- protocol address management}
    582 
    583 \paragraph{Abbreviations:} \verb|address|, \verb|addr|, \verb|a|.
    584 
    585 \paragraph{Object:} The \verb|address| is a protocol (IP or IPv6) address attached
    586 to a network device. Each device must have at least one address
    587 to use the corresponding protocol. It is possible to have several
    588 different addresses attached to one device. These addresses are not
    589 discriminated, so that the term {\em alias\/} is not quite appropriate
    590 for them and we do not use it in this document.
    591 
    592 The \verb|ip addr| command displays addresses and their properties,
    593 adds new addresses and deletes old ones.
    594 
    595 \paragraph{Commands:} \verb|add|, \verb|delete|, \verb|flush| and \verb|show|
    596 (or \verb|list|).
    597 
    598 
    599 \subsection{{\tt ip address add} --- add a new protocol address}
    600 \label{IP-ADDR-ADD}
    601 
    602 \paragraph{Abbreviations:} \verb|add|, \verb|a|.
    603 
    604 \paragraph{Arguments:}
    605 
    606 \begin{itemize}
    607 \item \verb|dev NAME|
    608 
    609 \noindent--- the name of the device to add the address to.
    610 
    611 \item \verb|local ADDRESS| (default)
    612 
    613 --- the address of the interface. The format of the address depends
    614 on the protocol. It is a dotted quad for IP and a sequence of hexadecimal halfwords
    615 separated by colons for IPv6. The \verb|ADDRESS| may be followed by
    616 a slash and a decimal number which encodes the network prefix length.
    617 
    618 
    619 \item \verb|peer ADDRESS|
    620 
    621 --- the address of the remote endpoint for pointopoint interfaces.
    622 Again, the \verb|ADDRESS| may be followed by a slash and a decimal number,
    623 encoding the network prefix length. If a peer address is specified,
    624 the local address {\em cannot\/} have a prefix length. The network prefix is associated
    625 with the peer rather than with the local address.
    626 
    627 
    628 \item \verb|broadcast ADDRESS|
    629 
    630 --- the broadcast address on the interface.
    631 
    632 It is possible to use the special symbols \verb|'+'| and \verb|'-'|
    633 instead of the broadcast address. In this case, the broadcast address
    634 is derived by setting/resetting the host bits of the interface prefix.
    635 
    636 \vskip 1mm
    637 \begin{NB}
    638 Unlike \verb|ifconfig|, the \verb|ip| utility {\em does not\/} set any broadcast
    639 address unless explicitly requested.
    640 \end{NB}
    641 
    642 
    643 \item \verb|label NAME|
    644 
    645 --- Each address may be tagged with a label string.
    646 In order to preserve compatibility with Linux-2.0 net aliases,
this string must coincide with the name of the device or must be prefixed
with the device name followed by a colon.
    648 with the device name followed by colon.
    649 
    650 
    651 \item \verb|scope SCOPE_VALUE|
    652 
    653 --- the scope of the area where this address is valid.
The available scopes are listed in the file \verb|/etc/iproute2/rt_scopes|.
    655 Predefined scope values are:
    656 
    657  \begin{itemize}
    658 	\item \verb|global| --- the address is globally valid.
    659 	\item \verb|site| --- (IPv6 only) the address is site local,
    660 	i.e.\ it is valid inside this site.
    661 	\item \verb|link| --- the address is link local, i.e.\ 
    662 	it is valid only on this device.
    663 	\item \verb|host| --- the address is valid only inside this host.
    664  \end{itemize}
    665 
    666 Appendix~\ref{ADDR-SEL} (p.\pageref{ADDR-SEL} of this document)
    667 contains more details on address scopes.
    668 
    669 \end{itemize}
    670 
    671 \paragraph{Examples:}
    672 \begin{itemize}
    673 \item \verb|ip addr add 127.0.0.1/8 dev lo brd + scope host|
    674 
    675 --- add the usual loopback address to the loopback device.
    676 
    677 \item \verb|ip addr add 10.0.0.1/24 brd + dev eth0 label eth0:Alias|
    678 
    679 --- add the address 10.0.0.1 with prefix length 24 (i.e.\ netmask
    680 \verb|255.255.255.0|), standard broadcast and label \verb|eth0:Alias|
    681 to the interface \verb|eth0|.
    682 \end{itemize}
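
The \verb|peer| argument can be illustrated in the same way. In the sketch
below the device name \verb|ppp0| and both addresses are assumptions chosen
for illustration. Note that the local address carries no prefix length; the
prefix is attached to the peer address instead:
\begin{verbatim}
# Local end 10.0.0.1, remote end 10.0.0.2 on a pointopoint device
ip addr add 10.0.0.1 peer 10.0.0.2/32 dev ppp0
\end{verbatim}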
    683 
    684 
    685 \subsection{{\tt ip address delete} --- delete a protocol address}
    686 
    687 \paragraph{Abbreviations:} \verb|delete|, \verb|del|, \verb|d|.
    688 
    689 \paragraph{Arguments:} coincide with the arguments of \verb|ip addr add|.
    690 The device name is a required argument. The rest are optional.
    691 If no arguments are given, the first address is deleted.
    692 
    693 \paragraph{Examples:}
    694 \begin{itemize}
    695 \item \verb|ip addr del 127.0.0.1/8 dev lo|
    696 
    697 --- deletes the loopback address from the loopback device.
    698 It would be best not to repeat this experiment.
    699 
    700 \item Disable IP on the interface \verb|eth0|:
    701 \begin{verbatim}
    702   while ip -f inet addr del dev eth0; do
    703     : nothing
    704   done
    705 \end{verbatim}
Another method to disable IP on an interface using {\tt ip addr flush}
may be found in Sec.\ref{IP-ADDR-FLUSH}, p.\pageref{IP-ADDR-FLUSH}.
    708 
    709 \end{itemize}
    710 
    711 
    712 \subsection{{\tt ip address show} --- display protocol addresses}
    713 
    714 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|lst|, \verb|sh|, \verb|ls|,
    715 \verb|l|.
    716 
    717 \paragraph{Arguments:}
    718 
    719 \begin{itemize}
    720 \item \verb|dev NAME| (default)
    721 
    722 --- the name of the device.
    723 
    724 \item \verb|scope SCOPE_VAL|
    725 
    726 --- only list addresses with this scope.
    727 
    728 \item \verb|to PREFIX|
    729 
    730 --- only list addresses matching this prefix.
    731 
    732 \item \verb|label PATTERN|
    733 
    734 --- only list addresses with labels matching the \verb|PATTERN|.
    735 \verb|PATTERN| is a usual shell style pattern.
    736 
    737 
    738 \item \verb|dynamic| and \verb|permanent|
    739 
    740 --- (IPv6 only) only list addresses installed due to stateless
    741 address configuration or only list permanent (not dynamic) addresses.
    742 
    743 \item \verb|tentative|
    744 
    745 --- (IPv6 only) only list addresses which did not pass duplicate
    746 address detection.
    747 
    748 \item \verb|deprecated|
    749 
    750 --- (IPv6 only) only list deprecated addresses.
    751 
    752 
    753 \item  \verb|primary| and \verb|secondary|
    754 
    755 --- only list primary (or secondary) addresses.
    756 
    757 \end{itemize}
    758 
    759 
    760 \paragraph{Output format:}
    761 
    762 \begin{verbatim}
    763 kuznet@alisa:~ $ ip addr ls eth0
    764 3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    765     link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    766     inet 193.233.7.90/24 brd 193.233.7.255 scope global eth0
    767     inet6 3ffe:2400:0:1:2a0:ccff:fe66:1878/64 scope global dynamic 
    768        valid_lft forever preferred_lft 604746sec
    769     inet6 fe80::2a0:ccff:fe66:1878/10 scope link 
    770 kuznet@alisa:~ $ 
    771 \end{verbatim}
    772 
    773 The first two lines coincide with the output of \verb|ip link ls|.
    774 It is natural to interpret link layer addresses
    775 as addresses of the protocol family \verb|AF_PACKET|.
    776 
    777 Then the list of IP and IPv6 addresses follows, accompanied by
    778 additional address attributes: scope value (see Sec.\ref{IP-ADDR-ADD},
    779 p.\pageref{IP-ADDR-ADD} above), flags and the address label.
    780 
    781 Address flags are set by the kernel and cannot be changed
    782 administratively. Currently, the following flags are defined:
    783 
    784 \begin{enumerate}
    785 \item \verb|secondary|
    786 
--- the address is not used when selecting the default source address
of outgoing packets (cf.\ Appendix~\ref{ADDR-SEL}, p.\pageref{ADDR-SEL}).
An IP address becomes secondary if another address with the same
prefix bits already exists. The first address is primary.
It is the leader of the group of all secondary addresses. When the leader
is deleted, all secondaries are purged too.
The setting \verb|/proc/sys/net/ipv4/conf/<dev>/promote_secondaries|
enables promotion of a secondary address when its primary is deleted.
To enable this feature permanently on all devices, add
\verb|net.ipv4.conf.all.promote_secondaries=1| to \verb|/etc/sysctl.conf|.
This setting is available in Linux 2.6.15 and later.
    798 
    799 
    800 \item \verb|dynamic|
    801 
--- the address was created by stateless autoconfiguration~\cite{RFC-ADDRCONF}.
In this case the output also shows how long the address remains valid.
After \verb|preferred_lft| expires the address is
moved to the deprecated state. After \verb|valid_lft| expires the address
is finally invalidated.
    807 
    808 \item \verb|deprecated|
    809 
    810 --- the address is deprecated, i.e.\ it is still valid, but cannot
    811 be used by newly created connections.
    812 
    813 \item \verb|tentative|
    814 
--- the address is not used because duplicate address detection~\cite{RFC-ADDRCONF}
has not yet completed or has failed.
    817 
    818 \end{enumerate}
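
The lifetime rules for \verb|dynamic| addresses can be sketched in Python.
This is only an illustration, not the kernel's implementation: the names
\verb|parse_lft| and \verb|address_state| and the label \verb|preferred|
are assumptions introduced here, and the lifetimes are taken to be the
remaining times in the form printed by \verb|ip addr show|.

```python
def parse_lft(text):
    """Convert 'forever' or '604746sec' to seconds; None stands for forever."""
    if text == "forever":
        return None
    if text.endswith("sec"):
        text = text[:-3]
    return int(text)


def address_state(valid_left, preferred_left):
    """Classify an address from its remaining lifetimes in seconds."""
    # Once the valid lifetime has run out the address is invalidated.
    if valid_left is not None and valid_left <= 0:
        return "invalid"
    # Once the preferred lifetime has run out the address is deprecated:
    # still valid, but not used by newly created connections.
    if preferred_left is not None and preferred_left <= 0:
        return "deprecated"
    return "preferred"


# Lifetimes taken from the sample output earlier in this subsection.
state = address_state(parse_lft("forever"), parse_lft("604746sec"))
```

With the values from the sample output, \verb|state| is \verb|preferred|;
the address would be reported as \verb|deprecated| only after the
preferred lifetime has counted down to zero.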
    819 
    820 
    821 \subsection{{\tt ip address flush} --- flush protocol addresses}
    822 \label{IP-ADDR-FLUSH}
    823 
    824 \paragraph{Abbreviations:} \verb|flush|, \verb|f|.
    825 
    826 \paragraph{Description:}This command flushes the protocol addresses
    827 selected by some criteria.
    828 
    829 \paragraph{Arguments:} This command has the same arguments as \verb|show|.
    830 The difference is that it does not run when no arguments are given.
    831 
    832 \paragraph{Warning:} This command (and other \verb|flush| commands
    833 described below) is pretty dangerous. If you make a mistake, it will
    834 not forgive it, but will cruelly purge all the addresses.
    835 
    836 \paragraph{Statistics:} With the \verb|-statistics| option, the command
    837 becomes verbose. It prints out the number of deleted addresses and the number
    838 of rounds made to flush the address list. If this option is given
    839 twice, \verb|ip addr flush| also dumps all the deleted addresses
    840 in the format described in the previous subsection.
    841 
    842 \paragraph{Example:} Delete all the addresses from the private network
    843 10.0.0.0/8:
    844 \begin{verbatim}
    845 netadm@amber:~ # ip -s -s a f to 10/8
    846 2: dummy    inet 10.7.7.7/16 brd 10.7.255.255 scope global dummy
    847 3: eth0    inet 10.10.7.7/16 brd 10.10.255.255 scope global eth0
    848 4: eth1    inet 10.8.7.7/16 brd 10.8.255.255 scope global eth1
    849 
    850 *** Round 1, deleting 3 addresses ***
    851 *** Flush is complete after 1 round ***
    852 netadm@amber:~ # 
    853 \end{verbatim}
    854 Another instructive example is disabling IP on all the Ethernets:
    855 \begin{verbatim}
    856 netadm@amber:~ # ip -4 addr flush label "eth*"
    857 \end{verbatim}
    858 And the last example shows how to flush all the IPv6 addresses
    859 acquired by the host from stateless address autoconfiguration
    860 after you enabled forwarding or disabled autoconfiguration.
    861 \begin{verbatim}
    862 netadm@amber:~ # ip -6 addr flush dynamic
    863 \end{verbatim}
    864 
    865 
    866 
    867 \section{{\tt ip neighbour} --- neighbour/arp tables management}
    868 
    869 \paragraph{Abbreviations:} \verb|neighbour|, \verb|neighbor|, \verb|neigh|,
    870 \verb|n|.
    871 
    872 \paragraph{Object:} \verb|neighbour| objects establish bindings between protocol
    873 addresses and link layer addresses for hosts sharing the same link.
    874 Neighbour entries are organized into tables. The IPv4 neighbour table
    875 is known by another name --- the ARP table.
    876 
    877 The corresponding commands display neighbour bindings
    878 and their properties, add new neighbour entries and delete old ones.
    879 
    880 \paragraph{Commands:} \verb|add|, \verb|change|, \verb|replace|,
    881 \verb|delete|, \verb|flush| and \verb|show| (or \verb|list|).
    882 
    883 \paragraph{See also:} Appendix~\ref{PROXY-NEIGH}, p.\pageref{PROXY-NEIGH}
    884 describes how to manage proxy ARP/NDISC with the \verb|ip| utility.
    885 
    886 
    887 \subsection{{\tt ip neighbour add} --- add a new neighbour entry\\
    888 	{\tt ip neighbour change} --- change an existing entry\\
    889 	{\tt ip neighbour replace} --- add a new entry or change an existing one}
    890 
    891 \paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|change|, \verb|chg|;
    892 \verb|replace|,	\verb|repl|.
    893 
    894 \paragraph{Description:} These commands create new neighbour records
    895 or update existing ones.
    896 
    897 \paragraph{Arguments:}
    898 
    899 \begin{itemize}
    900 \item \verb|to ADDRESS| (default)
    901 
    902 --- the protocol address of the neighbour. It is either an IPv4 or IPv6 address.
    903 
    904 \item \verb|dev NAME|
    905 
    906 --- the interface to which this neighbour is attached.
    907 
    908 
    909 \item \verb|lladdr LLADDRESS|
    910 
    911 --- the link layer address of the neighbour. \verb|LLADDRESS| can also be
    912 \verb|null|. 
    913 
    914 \item \verb|nud NUD_STATE|
    915 
    916 --- the state of the neighbour entry. \verb|nud| is an abbreviation for ``Neighbour
    917 Unreachability Detection''. The state can take one of the following values:
    918 
    919 \begin{enumerate}
\item \verb|permanent| --- the neighbour entry is valid forever and can only be removed
administratively.
    922 \item \verb|noarp| --- the neighbour entry is valid. No attempts to validate
    923 this entry will be made but it can be removed when its lifetime expires.
    924 \item \verb|reachable| --- the neighbour entry is valid until the reachability
    925 timeout expires.
    926 \item \verb|stale| --- the neighbour entry is valid but suspicious.
    927 This option to \verb|ip neigh| does not change the neighbour state if
    928 it was valid and the address is not changed by this command.
    929 \end{enumerate}
    930 
    931 \end{itemize}
    932 
    933 \paragraph{Examples:}
    934 \begin{itemize}
    935 \item \verb|ip neigh add 10.0.0.3 lladdr 0:0:0:0:0:1 dev eth0 nud perm|
    936 
    937 --- add a permanent ARP entry for the neighbour 10.0.0.3 on the device \verb|eth0|.
    938 
    939 \item \verb|ip neigh chg 10.0.0.3 dev eth0 nud reachable|
    940 
    941 --- change its state to \verb|reachable|.
    942 \end{itemize}
    943 
    944 
    945 \subsection{{\tt ip neighbour delete} --- delete a neighbour entry}
    946 
    947 \paragraph{Abbreviations:} \verb|delete|, \verb|del|, \verb|d|.
    948 
    949 \paragraph{Description:} This command invalidates a neighbour entry.
    950 
    951 \paragraph{Arguments:} The arguments are the same as with \verb|ip neigh add|,
    952 except that \verb|lladdr| and \verb|nud| are ignored.
    953 
    954 
    955 \paragraph{Example:}
    956 \begin{itemize}
    957 \item \verb|ip neigh del 10.0.0.3 dev eth0|
    958 
    959 --- invalidate an ARP entry for the neighbour 10.0.0.3 on the device \verb|eth0|.
    960 
    961 \end{itemize}
    962 
    963 \begin{NB}
    964  The deleted neighbour entry will not disappear from the tables
    965  immediately. If it is in use it cannot be deleted until the last
    966  client releases it. Otherwise it will be destroyed during
    967  the next garbage collection.
    968 \end{NB}
    969 
    970 
    971 \paragraph{Warning:} Attempts to delete or manually change
    972 a \verb|noarp| entry created by the kernel may result in unpredictable behaviour.
    973 Particularly, the kernel may try to resolve this address even
    974 on a \verb|NOARP| interface or if the address is multicast or broadcast.
    975 
    976 
    977 \subsection{{\tt ip neighbour show} --- list neighbour entries}
    978 
    979 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|.
    980 
\paragraph{Description:} This command displays neighbour tables.
    982 
    983 \paragraph{Arguments:}
    984 
    985 \begin{itemize}
    986 
    987 \item \verb|to ADDRESS| (default)
    988 
    989 --- the prefix selecting the neighbours to list.
    990 
    991 \item \verb|dev NAME|
    992 
    993 --- only list the neighbours attached to this device.
    994 
    995 \item \verb|unused|
    996 
    997 --- only list neighbours which are not currently in use.
    998 
    999 \item \verb|nud NUD_STATE|
   1000 
   1001 --- only list neighbour entries in this state. \verb|NUD_STATE| takes
   1002 values listed below or the special value \verb|all| which means all states.
   1003 This option may occur more than once. If this option is absent, \verb|ip|
   1004 lists all entries except for \verb|none| and \verb|noarp|.
   1005 
   1006 \end{itemize}
   1007 
   1008 
   1009 \paragraph{Output format:}
   1010 
   1011 \begin{verbatim}
   1012 kuznet@alisa:~ $ ip neigh ls
   1013 :: dev lo lladdr 00:00:00:00:00:00 nud noarp
   1014 fe80::200:cff:fe76:3f85 dev eth0 lladdr 00:00:0c:76:3f:85 router \
   1015     nud stale
   1016 0.0.0.0 dev lo lladdr 00:00:00:00:00:00 nud noarp
   1017 193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 nud reachable
   1018 193.233.7.85 dev eth0 lladdr 00:e0:1e:63:39:00 nud stale
   1019 kuznet@alisa:~ $ 
   1020 \end{verbatim}
   1021 
   1022 The first word of each line is the protocol address of the neighbour.
   1023 Then the device name follows. The rest of the line describes the contents of
   1024 the neighbour entry identified by the pair (device, address).
   1025 
   1026 \verb|lladdr| is the link layer address of the neighbour.
   1027 
   1028 \verb|nud| is the state of the ``neighbour unreachability detection'' machine
   1029 for this entry. The detailed description of the neighbour
   1030 state machine can be found in~\cite{RFC-NDISC}. Here is the full list
   1031 of the states with short descriptions:
   1032 
   1033 \begin{enumerate}
   1034 \item\verb|none| --- the state of the neighbour is void.
   1035 \item\verb|incomplete| --- the neighbour is in the process of resolution.
   1036 \item\verb|reachable| --- the neighbour is valid and apparently reachable.
   1037 \item\verb|stale| --- the neighbour is valid, but is probably already
   1038 unreachable, so the kernel will try to check it at the first transmission.
   1039 \item\verb|delay| --- a packet has been sent to the stale neighbour and the kernel is waiting
   1040 for confirmation.
   1041 \item\verb|probe| --- the delay timer expired but no confirmation was received.
   1042 The kernel has started to probe the neighbour with ARP/NDISC messages.
   1043 \item\verb|failed| --- resolution has failed.
   1044 \item\verb|noarp| --- the neighbour is valid. No attempts to check the entry
   1045 will be made.
   1046 \item\verb|permanent| --- it is a \verb|noarp| entry, but only the administrator
   1047 may remove the entry from the neighbour table.
   1048 \end{enumerate}
   1049 
   1050 The link layer address is valid in all states except for \verb|none|,
   1051 \verb|failed| and \verb|incomplete|.
   1052 
   1053 IPv6 neighbours can be marked with the additional flag \verb|router|
   1054 which means that the neighbour introduced itself as an IPv6 router~\cite{RFC-NDISC}.
   1055 
\paragraph{Statistics:} The \verb|-statistics| option displays some usage
statistics, for example:
   1058 
   1059 \begin{verbatim}
   1060 kuznet@alisa:~ $ ip -s n ls 193.233.7.254
   1061 193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 ref 5 used 12/13/20 \
   1062     nud reachable
   1063 kuznet@alisa:~ $ 
   1064 \end{verbatim}
   1065 
   1066 Here \verb|ref| is the number of users of this entry
   1067 and \verb|used| is a triplet of time intervals in seconds
   1068 separated by slashes. In this case they show that:
   1069 
   1070 \begin{enumerate}
   1071 \item the entry was used 12 seconds ago.
   1072 \item the entry was confirmed 13 seconds ago.
   1073 \item the entry was updated 20 seconds ago.
   1074 \end{enumerate}
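
A script that consumes this output has to split the \verb|used| triplet
itself. The following Python sketch parses one line in the format shown in
this subsection. The function name \verb|parse_neigh_stats| and the
dictionary layout are assumptions made for illustration, and the output
format of other \verb|ip| versions may differ.

```python
def parse_neigh_stats(line):
    """Parse one `ip -s neigh` line (format as shown above) into a dict."""
    # A trailing backslash only marks a continued line in the listing.
    words = line.replace("\\", " ").split()
    entry = {"address": words[0], "flags": []}
    i = 1
    while i < len(words):
        key = words[i]
        has_value = i + 1 < len(words)
        if key in ("dev", "lladdr", "nud") and has_value:
            entry[key] = words[i + 1]
            i += 2
        elif key == "ref" and has_value:
            entry["ref"] = int(words[i + 1])
            i += 2
        elif key == "used" and has_value:
            # Triplet of seconds: last used / last confirmed / last updated.
            used, confirmed, updated = (int(x) for x in words[i + 1].split("/"))
            entry["used"] = {"used": used, "confirmed": confirmed,
                             "updated": updated}
            i += 2
        else:
            # Anything else is kept as a bare flag, e.g. "router".
            entry["flags"].append(key)
            i += 1
    return entry


sample = ("193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 "
          "ref 5 used 12/13/20 nud reachable")
entry = parse_neigh_stats(sample)
```

For the sample line, \verb|entry["ref"]| is 5 and \verb|entry["used"]|
holds the three intervals 12, 13 and 20 under the keys \verb|used|,
\verb|confirmed| and \verb|updated|.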
   1075 
   1076 \subsection{{\tt ip neighbour flush} --- flush neighbour entries}
   1077 
   1078 \paragraph{Abbreviations:} \verb|flush|, \verb|f|.
   1079 
   1080 \paragraph{Description:}This command flushes neighbour tables, selecting
   1081 entries to flush by some criteria.
   1082 
   1083 \paragraph{Arguments:} This command has the same arguments as \verb|show|.
   1084 The differences are that it does not run when no arguments are given,
   1085 and that the default neighbour states to be flushed do not include
   1086 \verb|permanent| and \verb|noarp|.
   1087 
   1088 
   1089 \paragraph{Statistics:} With the \verb|-statistics| option, the command
   1090 becomes verbose. It prints out the number of deleted neighbours and the number
   1091 of rounds made to flush the neighbour table. If the option is given
   1092 twice, \verb|ip neigh flush| also dumps all the deleted neighbours
   1093 in the format described in the previous subsection.
   1094 
   1095 \paragraph{Example:}
   1096 \begin{verbatim}
   1097 netadm@alisa:~ # ip -s -s n f 193.233.7.254
   1098 193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 ref 5 used 12/13/20 \
   1099     nud reachable
   1100 
   1101 *** Round 1, deleting 1 entries ***
   1102 *** Flush is complete after 1 round ***
   1103 netadm@alisa:~ # 
   1104 \end{verbatim}
   1105 
   1106 
   1107 \section{{\tt ip route} --- routing table management}
   1108 \label{IP-ROUTE}
   1109 
   1110 \paragraph{Abbreviations:} \verb|route|, \verb|ro|, \verb|r|.
   1111 
   1112 \paragraph{Object:} \verb|route| entries in the kernel routing tables keep
   1113 information about paths to other networked nodes.
   1114 
   1115 Each route entry has a {\em key\/} consisting of a {\em prefix\/}
   1116 (i.e.\ a pair containing a network address and the length of its mask) and,
   1117 optionally, the TOS value. An IP packet matches the route if the highest
   1118 bits of its destination address are equal to the route prefix at least
   1119 up to the prefix length and if the TOS of the route is zero or equal to
   1120 the TOS of the packet.
   1121  
   1122 If several routes match the packet, the following pruning rules
   1123 are used to select the best one (see~\cite{RFC1812}):
   1124 \begin{enumerate}
   1125 \item The longest matching prefix is selected. All shorter ones
   1126 are dropped.
   1127 
   1128 \item If the TOS of some route with the longest prefix is equal to the TOS
   1129 of the packet, the routes with different TOS are dropped.
   1130 
If no exact TOS match was found and routes with TOS=0 exist,
the rest of the routes are pruned.
   1133 
   1134 Otherwise, the route lookup fails.
   1135 
   1136 \item If several routes remain after the previous steps, then
   1137 the routes with the best preference values are selected.
   1138 
   1139 \item If we still have several routes, then the {\em first\/} of them
   1140 is selected.
   1141 
   1142 \begin{NB}
   1143  Note the ambiguity of the last step. Unfortunately, Linux
   1144  historically allows such a bizarre situation. The sense of the
   1145 word ``first'' depends on the order of route additions and it is practically
   1146 impossible to maintain a bundle of such routes in this order.
   1147 \end{NB}
   1148 
   1149 For simplicity we will limit ourselves to the case where such a situation
   1150 is impossible and routes are uniquely identified by the triplet
   1151 \{prefix, tos, preference\}. Actually, it is impossible to create
   1152 non-unique routes with \verb|ip| commands described in this section.
   1153 
   1154 One useful exception to this rule is the default route on non-forwarding
   1155 hosts. It is ``officially'' allowed to have several fallback routes
   1156 when several routers are present on directly connected networks.
   1157 In this case, Linux-2.2 makes ``dead gateway detection''~\cite{RFC1122}
   1158 controlled by neighbour unreachability detection and by advice
   1159 from transport protocols to select a working router, so the order
   1160 of the routes is not essential. However, in this case,
   1161 fiddling with default routes manually is not recommended. Use the Router Discovery
   1162 protocol (see Appendix~\ref{EXAMPLE-SETUP}, p.\pageref{EXAMPLE-SETUP})
   1163 instead. Actually, Linux-2.2 IPv6 does not give user level applications
   1164 any access to default routes.
   1165 \end{enumerate}
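
The pruning rules above can be sketched in Python as follows. This is an
illustration of the rules as stated here, not the kernel's lookup code.
The \verb|Route| tuple, its field names and the function
\verb|select_route| are assumptions introduced for the example, and the
sketch assumes that a numerically lower preference value is the better one.

```python
import ipaddress
from collections import namedtuple

# Hypothetical record: prefix string, TOS, preference value, and a label.
Route = namedtuple("Route", "prefix tos metric target")


def select_route(routes, dst, tos=0):
    """Pick a route for dst by the pruning rules; None if nothing matches."""
    dst = ipaddress.ip_address(dst)
    # A route matches if its prefix covers dst and its TOS is zero
    # or equal to the TOS of the packet.
    matching = [(ipaddress.ip_network(r.prefix), r) for r in routes]
    matching = [(net, r) for net, r in matching
                if dst in net and r.tos in (0, tos)]
    if not matching:
        return None
    # Step 1: keep only the longest matching prefix.
    longest = max(net.prefixlen for net, _ in matching)
    candidates = [r for net, r in matching if net.prefixlen == longest]
    # Step 2: prefer an exact TOS match, otherwise fall back to TOS=0.
    # Other TOS values were already rejected by the match test above.
    exact = [r for r in candidates if r.tos == tos]
    candidates = exact or [r for r in candidates if r.tos == 0]
    # Step 3: keep the best preference value (assumed: lowest wins).
    best = min(r.metric for r in candidates)
    candidates = [r for r in candidates if r.metric == best]
    # Step 4: if several routes remain, the first one added is used.
    return candidates[0]


routes = [
    Route("0.0.0.0/0", 0, 0, "gw-default"),
    Route("10.0.0.0/8", 0, 10, "gw-a"),
    Route("10.1.0.0/16", 0, 20, "gw-b"),
    Route("10.1.0.0/16", 0x10, 5, "gw-c"),
    Route("10.1.0.0/16", 0, 5, "gw-d"),
]
```

With this table, a packet to 10.1.2.3 with TOS 0 selects \verb|gw-d|
(longest prefix, then best preference), while the same destination with
TOS 0x10 selects \verb|gw-c| because of the exact TOS match.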
   1166 
Certainly, the steps above are not performed exactly
in this sequence. Instead, the kernel keeps the routing table
in a data structure chosen to achieve the final result
with minimal cost. However, regardless of the particular
routing algorithm implemented in the kernel, we can summarize
the statements above as: a route is identified by the triplet
\{prefix, tos, preference\}. This {\em key\/} lets us locate
the route in the routing table.
   1175 
\paragraph{Route attributes:} Each route key refers to a routing
information record containing
the data required to deliver IP packets (e.g.\ output device and
next hop router) and some optional attributes (e.g.\ the path MTU or
the preferred source address when communicating with this destination).
These attributes are described in the following subsection.
   1182 
   1183 \paragraph{Route types:} \label{IP-ROUTE-TYPES}
Note that the set
of required and optional attributes depends on the route {\em type\/}.
   1186 The most important route type
   1187 is \verb|unicast|. It describes real paths to other hosts.
   1188 As a rule, common routing tables contain only such routes. However,
   1189 there are other types of routes with different semantics. The
   1190 full list of types understood by Linux-2.2 is:
   1191 \begin{itemize}
   1192 \item \verb|unicast| --- the route entry describes real paths to the
   1193 destinations covered by the route prefix.
   1194 \item \verb|unreachable| --- these destinations are unreachable. Packets
   1195 are discarded and the ICMP message {\em host unreachable\/} is generated.
   1196 The local senders get an \verb|EHOSTUNREACH| error.
   1197 \item \verb|blackhole| --- these destinations are unreachable. Packets
   1198 are discarded silently. The local senders get an \verb|EINVAL| error.
   1199 \item \verb|prohibit| --- these destinations are unreachable. Packets
   1200 are discarded and the ICMP message {\em communication administratively
   1201 prohibited\/} is generated. The local senders get an \verb|EACCES| error.
   1202 \item \verb|local| --- the destinations are assigned to this
   1203 host. The packets are looped back and delivered locally.
   1204 \item \verb|broadcast| --- the destinations are broadcast addresses.
   1205 The packets are sent as link broadcasts.
   1206 \item \verb|throw| --- a special control route used together with policy
   1207 rules (see sec.\ref{IP-RULE}, p.\pageref{IP-RULE}). If such a route is selected, lookup
   1208 in this table is terminated pretending that no route was found.
   1209 Without policy routing it is equivalent to the absence of the route in the routing
   1210 table. The packets are dropped and the ICMP message {\em net unreachable\/}
   1211 is generated. The local senders get an \verb|ENETUNREACH| error.
   1212 \item \verb|nat| --- a special NAT route. Destinations covered by the prefix
   1213 are considered to be dummy (or external) addresses which require translation
   1214 to real (or internal) ones before forwarding. The addresses to translate to
   1215 are selected with the attribute \verb|via|. More about NAT is
   1216 in Appendix~\ref{ROUTE-NAT}, p.\pageref{ROUTE-NAT}.
   1217 \item \verb|anycast| --- ({\em not implemented\/}) the destinations are
   1218 {\em anycast\/} addresses assigned to this host. They are mainly equivalent
   1219 to \verb|local| with one difference: such addresses are invalid when used
   1220 as the source address of any packet.
   1221 \item \verb|multicast| --- a special type used for multicast routing.
   1222 It is not present in normal routing tables.
   1223 \end{itemize}
   1224 
   1225 \paragraph{Route tables:} Linux-2.2 can pack routes into several routing
   1226 tables identified by a number in the range from 1 to 255 or by
   1227 name from the file \verb|/etc/iproute2/rt_tables|. By default all normal
   1228 routes are inserted into the \verb|main| table (ID 254) and the kernel only uses
   1229 this table when calculating routes.
   1230 
   1231 Actually, one other table always exists, which is invisible but
   1232 even more important. It is the \verb|local| table (ID 255). This table
   1233 consists of routes for local and broadcast addresses. The kernel maintains
   1234 this table automatically and the administrator usually need not modify it
   1235 or even look at it.
   1236 
   1237 The multiple routing tables enter the game when {\em policy routing\/}
   1238 is used. See sec.\ref{IP-RULE}, p.\pageref{IP-RULE}.
   1239 In this case, the table identifier effectively becomes
   1240 one more parameter, which should be added to the triplet
   1241 \{prefix, tos, preference\} to uniquely identify the route.
   1242 
   1243 
   1244 \subsection{{\tt ip route add} --- add a new route\\
   1245 	{\tt ip route change} --- change a route\\
   1246 	{\tt ip route replace} --- change a route or add a new one}
   1247 \label{IP-ROUTE-ADD}
   1248 
   1249 \paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|change|, \verb|chg|;
   1250 	\verb|replace|, \verb|repl|.
   1251 
   1252 
   1253 \paragraph{Arguments:}
   1254 \begin{itemize}
   1255 \item \verb|to PREFIX| or \verb|to TYPE PREFIX| (default)
   1256 
   1257 --- the destination prefix of the route. If \verb|TYPE| is omitted,
   1258 \verb|ip| assumes type \verb|unicast|. Other values of \verb|TYPE|
   1259 are listed above. \verb|PREFIX| is an IP or IPv6 address optionally followed
   1260 by a slash and the prefix length. If the length of the prefix is missing,
   1261 \verb|ip| assumes a full-length host route. There is also a special
   1262 \verb|PREFIX| --- \verb|default| --- which is equivalent to IP \verb|0/0| or
   1263 to IPv6 \verb|::/0|.
   1264 
   1265 \item \verb|tos TOS| or \verb|dsfield TOS|
   1266 
   1267 --- the Type Of Service (TOS) key. This key has no associated mask and
   1268 the longest match is understood as: First, compare the TOS
   1269 of the route and of the packet. If they are not equal, then the packet
may still match a route with a zero TOS. \verb|TOS| is either an 8-bit hexadecimal
   1271 number or an identifier from {\tt /etc/iproute2/rt\_dsfield}.
   1272 
   1273 
   1274 \item \verb|metric NUMBER| or \verb|preference NUMBER|
   1275 
--- the preference value of the route. \verb|NUMBER| is an arbitrary 32-bit number.
   1277 
   1278 \item \verb|table TABLEID|
   1279 
   1280 --- the table to add this route to.
   1281 \verb|TABLEID| may be a number or a string from the file
   1282 \verb|/etc/iproute2/rt_tables|. If this parameter is omitted,
   1283 \verb|ip| assumes the \verb|main| table, with the exception of
   1284 \verb|local|, \verb|broadcast| and \verb|nat| routes, which are
   1285 put into the \verb|local| table by default.
   1286 
   1287 \item \verb|dev NAME|
   1288 
   1289 --- the output device name.
   1290 
   1291 \item \verb|via ADDRESS|
   1292 
   1293 --- the address of the nexthop router. Actually, the sense of this field depends
   1294 on the route type. For normal \verb|unicast| routes it is either the true nexthop
   1295 router or, if it is a direct route installed in BSD compatibility mode,
   1296 it can be a local address of the interface.
   1297 For NAT routes it is the first address of the block of translated IP destinations.
   1298 
   1299 \item \verb|src ADDRESS|
   1300 
   1301 --- the source address to prefer when sending to the destinations
   1302 covered by the route prefix.
   1303 
   1304 \item \verb|realm REALMID|
   1305 
   1306 --- the realm to which this route is assigned.
   1307 \verb|REALMID| may be a number or a string from the file
   1308 \verb|/etc/iproute2/rt_realms|. Sec.\ref{RT-REALMS} (p.\pageref{RT-REALMS})
   1309 contains more information on realms.
   1310 
   1311 \item \verb|mtu MTU| or \verb|mtu lock MTU|
   1312 
   1313 --- the MTU along the path to the destination. If the modifier \verb|lock| is
   1314 not used, the MTU may be updated by the kernel due to Path MTU Discovery.
If the modifier \verb|lock| is used, no path MTU discovery is attempted:
all packets are sent without the DF bit in the IPv4 case
or fragmented to the MTU for IPv6.
   1318 
   1319 \item \verb|window NUMBER|
   1320 
   1321 --- the maximal window for TCP to advertise to these destinations,
   1322 measured in bytes. It limits maximal data bursts that our TCP
   1323 peers are allowed to send to us.
   1324 
   1325 \item \verb|rtt NUMBER|
   1326 
   1327 --- the initial RTT (``Round Trip Time'') estimate.
   1328 
   1329 
   1330 \item \verb|rttvar NUMBER|
   1331 
   1332 --- \threeonly the initial RTT variance estimate.
   1333 
   1334 
   1335 \item \verb|ssthresh NUMBER|
   1336 
   1337 --- \threeonly an estimate for the initial slow start threshold.
   1338 
   1339 
   1340 \item \verb|cwnd NUMBER|
   1341 
   1342 --- \threeonly the clamp for congestion window. It is ignored if the \verb|lock|
   1343     flag is not used.
   1344 
   1345 
   1346 \item \verb|advmss NUMBER|
   1347 
   1348 --- \threeonly the MSS (``Maximal Segment Size'') to advertise to these
   1349     destinations when establishing TCP connections. If it is not given,
   1350     Linux uses a default value calculated from the first hop device MTU.
   1351 
   1352 \begin{NB}
  If the path to these destinations is asymmetric, this guess may be wrong.
   1354 \end{NB}
   1355 
   1356 \item \verb|reordering NUMBER|
   1357 
   1358 --- \threeonly Maximal reordering on the path to this destination.
   1359     If it is not given, Linux uses the value selected with \verb|sysctl|
   1360     variable \verb|net/ipv4/tcp_reordering|.
   1361 
   1362 \item \verb|hoplimit NUMBER|
   1363 
   1364 --- [2.5.74+ only] Maximum number of hops on the path to this destination.
   1365     The default is the value selected with the \verb|sysctl| variable
   1366     \verb|net/ipv4/ip_default_ttl|.
   1367 
   1368 \item \verb|initcwnd NUMBER|
   1369 --- [2.5.70+ only] Initial congestion window size for connections to
    this destination. The actual window size is this value multiplied by the
    MSS (``Maximal Segment Size'') of the same connection. The default is
   1372     zero, meaning to use the values specified in~\cite{RFC2414}.
   1373 
\item \verb|initrwnd NUMBER|

--- [2.6.33+ only] Initial receive window size for connections to
    this destination. The actual window size is this value multiplied
    by the MSS (``Maximal Segment Size'') of the connection. The default
    value is zero, meaning that the slow start value is used.

   1381 \item \verb|nexthop NEXTHOP|
   1382 
   1383 --- the nexthop of a multipath route. \verb|NEXTHOP| is a complex value
   1384 with its own syntax similar to the top level argument lists:
   1385 \begin{itemize}
   1386 \item \verb|via ADDRESS| is the nexthop router.
   1387 \item \verb|dev NAME| is the output device.
   1388 \item \verb|weight NUMBER| is a weight for this element of a multipath
   1389 route reflecting its relative bandwidth or quality.
   1390 \end{itemize}
   1391 
   1392 \item \verb|scope SCOPE_VAL|
   1393 
   1394 --- the scope of the destinations covered by the route prefix.
   1395 \verb|SCOPE_VAL| may be a number or a string from the file
   1396 \verb|/etc/iproute2/rt_scopes|.
   1397 If this parameter is omitted,
   1398 \verb|ip| assumes scope \verb|global| for all gatewayed \verb|unicast|
   1399 routes, scope \verb|link| for direct \verb|unicast| and \verb|broadcast| routes
   1400 and scope \verb|host| for \verb|local| routes.
   1401 
   1402 \item \verb|protocol RTPROTO|
   1403 
   1404 --- the routing protocol identifier of this route.
   1405 \verb|RTPROTO| may be a number or a string from the file
   1406 \verb|/etc/iproute2/rt_protos|. If the routing protocol ID is
   1407 not given, \verb|ip| assumes protocol \verb|boot| (i.e.\
   1408 it assumes the route was added by someone who doesn't
   1409 understand what they are doing). Several protocol values have a fixed interpretation.
   1410 Namely:
   1411 \begin{itemize}
   1412 \item \verb|redirect| --- the route was installed due to an ICMP redirect.
   1413 \item \verb|kernel| --- the route was installed by the kernel during
   1414 autoconfiguration.
   1415 \item \verb|boot| --- the route was installed during the bootup sequence.
   1416 If a routing daemon starts, it will purge all of them.
\item \verb|static| --- the route was installed by the administrator
to override dynamic routing. Routing daemons will respect such routes
and, probably, even advertise them to their peers.
\item \verb|ra| --- the route was installed by the Router Discovery protocol.
   1421 \end{itemize}
   1422 The rest of the values are not reserved and the administrator is free
   1423 to assign (or not to assign) protocol tags. At least, routing
daemons should take care of setting some unique protocol values,
e.g.\ as they are assigned in \verb|rtnetlink.h| or in the \verb|rt_protos|
database.
   1427 
   1428 
   1429 \item \verb|onlink|
   1430 
   1431 --- pretend that the nexthop is directly attached to this link,
   1432 even if it does not match any interface prefix. One application of this
   1433 option may be found in~\cite{IP-TUNNELS}.
   1434 
   1435 \item \verb|pref PREF|
   1436 
   1437 --- the IPv6 route preference.
\verb|PREF| is a string specifying the route preference as defined in
   1439 RFC4191 for Router Discovery messages. Namely:
   1440 \begin{itemize}
\item \verb|low| --- the route has the lowest priority.
\item \verb|medium| --- the route has the default priority.
\item \verb|high| --- the route has the highest priority.
   1444 \end{itemize}
   1445 
   1446 \end{itemize}
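
The defaults that \verb|ip| applies when the \verb|scope| parameter is omitted
can be restated as a small Python function. This is only a sketch of the rule
as described above: the name \verb|default_scope| and its arguments are
hypothetical and are not part of \verb|iproute2|. Route types for which no
default is described here simply raise an error.
\begin{verbatim}
def default_scope(route_type, gatewayed=False):
    """Scope assumed when `scope` is omitted (illustrative only)."""
    if route_type == "local":
        return "host"
    if route_type == "unicast" and gatewayed:
        return "global"
    if route_type in ("unicast", "broadcast"):
        return "link"          # direct unicast and broadcast routes
    raise ValueError("no default described for type %r" % route_type)

print(default_scope("unicast", gatewayed=True))
print(default_scope("unicast"))
print(default_scope("broadcast"))
print(default_scope("local"))
\end{verbatim}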
   1447 
   1448 
   1449 \begin{NB}
   1450   Actually there are more commands: \verb|prepend| does the same
   1451   thing as classic \verb|route add|, i.e.\ adds a route, even if another
   1452   route to the same destination exists. Its opposite case is \verb|append|,
   1453   which adds the route to the end of the list. Avoid these
   1454   features.
   1455 \end{NB}
   1456 \begin{NB}
   1457   More sad news, IPv6 only understands the \verb|append| command correctly.
   1458   All the others are translated into \verb|append| commands. Certainly,
   1459   this will change in the future.
   1460 \end{NB}
   1461 
   1462 \paragraph{Examples:}
   1463 \begin{itemize}
   1464 \item add a plain route to network 10.0.0/24 via gateway 193.233.7.65
   1465 \begin{verbatim}
   1466   ip route add 10.0.0/24 via 193.233.7.65
   1467 \end{verbatim}
   1468 \item change it to a direct route via the \verb|dummy| device
   1469 \begin{verbatim}
   1470   ip ro chg 10.0.0/24 dev dummy
   1471 \end{verbatim}
   1472 \item add a default multipath route splitting the load between \verb|ppp0|
   1473 and \verb|ppp1|
   1474 \begin{verbatim}
   1475   ip route add default scope global nexthop dev ppp0 \
   1476                                     nexthop dev ppp1
   1477 \end{verbatim}
   1478 Note the scope value. It is not necessary but it informs the kernel
   1479 that this route is gatewayed rather than direct. Actually, if you
   1480 know the addresses of remote endpoints it would be better to use the
   1481 \verb|via| parameter.
   1482 \item announce that the address 192.203.80.144 is not a real one, but
   1483 should be translated to 193.233.7.83 before forwarding
   1484 \begin{verbatim}
   1485   ip route add nat 192.203.80.144 via 193.233.7.83
   1486 \end{verbatim}
Backward translation is set up with policy rules described
   1488 in the following section (sec.\ref{IP-RULE}, p.\pageref{IP-RULE}).
   1489 \end{itemize}
   1490 
   1491 \subsection{{\tt ip route delete} --- delete a route}
   1492 
   1493 \paragraph{Abbreviations:} \verb|delete|, \verb|del|, \verb|d|.
   1494 
   1495 \paragraph{Arguments:} \verb|ip route del| has the same arguments as
   1496 \verb|ip route add|, but their semantics are a bit different.
   1497 
   1498 Key values (\verb|to|, \verb|tos|, \verb|preference| and \verb|table|)
   1499 select the route to delete. If optional attributes are present, \verb|ip|
   1500 verifies that they coincide with the attributes of the route to delete.
   1501 If no route with the given key and attributes was found, \verb|ip route del|
   1502 fails.
   1503 \begin{NB}
   1504 Linux-2.0 had the option to delete a route selected only by prefix address,
   1505 ignoring its length (i.e.\ netmask). This option no longer exists
   1506 because it was ambiguous. However, look at {\tt ip route flush}
   1507 (sec.\ref{IP-ROUTE-FLUSH}, p.\pageref{IP-ROUTE-FLUSH}) which
   1508 provides similar and even richer functionality.
   1509 \end{NB}
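
The way \verb|ip route del| selects a route may be easier to follow as code.
The Python sketch below is a model of the description above, not of the
kernel: routes are plain dictionaries, the helper name \verb|route_del| is
hypothetical, and an omitted key value is assumed to be stored as
\verb|None| on both sides.
\begin{verbatim}
KEY_FIELDS = ("to", "tos", "preference", "table")

def route_del(routes, request):
    """Delete and return the first route selected by `request`."""
    for i, route in enumerate(routes):
        # Key values select the route ...
        key_ok = all(route.get(k) == request.get(k) for k in KEY_FIELDS)
        # ... and optional attributes, if given, must coincide.
        attrs_ok = all(route.get(k) == v for k, v in request.items()
                       if k not in KEY_FIELDS)
        if key_ok and attrs_ok:
            del routes[i]
            return route
    raise LookupError("no route with the given key and attributes")

routes = [{"to": "10.0.0.0/24", "via": "193.233.7.65"}]
try:
    route_del(routes, {"to": "10.0.0.0/24", "via": "193.233.7.1"})
except LookupError as err:
    print("failed:", err)          # attribute `via` does not coincide
print(route_del(routes, {"to": "10.0.0.0/24"}))
\end{verbatim}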
   1510 
   1511 \paragraph{Example:}
   1512 \begin{itemize}
\item delete the multipath route created by the command in the previous subsection
   1514 \begin{verbatim}
   1515   ip route del default scope global nexthop dev ppp0 \
   1516                                     nexthop dev ppp1
   1517 \end{verbatim}
   1518 \end{itemize}
   1519 
   1520 
   1521 
   1522 \subsection{{\tt ip route show} --- list routes}
   1523 
   1524 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
   1525 
   1526 \paragraph{Description:} the command displays the contents of the routing tables
   1527 or the route(s) selected by some criteria.
   1528 
   1529 
   1530 \paragraph{Arguments:}
   1531 \begin{itemize}
   1532 \item \verb|to SELECTOR| (default)
   1533 
   1534 --- only select routes from the given range of destinations. \verb|SELECTOR|
   1535 consists of an optional modifier (\verb|root|, \verb|match| or \verb|exact|)
   1536 and a prefix. \verb|root PREFIX| selects routes with prefixes not shorter
   1537 than \verb|PREFIX|. F.e.\ \verb|root 0/0| selects the entire routing table.
   1538 \verb|match PREFIX| selects routes with prefixes not longer than
   1539 \verb|PREFIX|. F.e.\ \verb|match 10.0/16| selects \verb|10.0/16|,
   1540 \verb|10/8| and \verb|0/0|, but it does not select \verb|10.1/16| and
   1541 \verb|10.0.0/24|. And \verb|exact PREFIX| (or just \verb|PREFIX|)
selects routes with this exact prefix. If none of these options
is present, \verb|ip| assumes \verb|root 0/0|, i.e.\ it lists the entire table.
   1544 
   1545 
   1546 \item \verb|tos TOS| or \verb|dsfield TOS|
   1547 
   1548  --- only select routes with the given TOS.
   1549 
   1550 
   1551 \item \verb|table TABLEID|
   1552 
 --- show the routes from the given table or tables. The default setting is to show
   1554 \verb|table| \verb|main|. \verb|TABLEID| may either be the ID of a real table
   1555 or one of the special values:
   1556   \begin{itemize}
   1557   \item \verb|all| --- list all of the tables.
   1558   \item \verb|cache| --- dump the routing cache.
   1559   \end{itemize}
   1560 \begin{NB}
   1561   IPv6 has a single table. However, splitting it into \verb|main|, \verb|local|
   1562   and \verb|cache| is emulated by the \verb|ip| utility.
   1563 \end{NB}
   1564 
   1565 \item \verb|cloned| or \verb|cached|
   1566 
   1567 --- list cloned routes i.e.\ routes which were dynamically forked from
   1568 other routes because some route attribute (f.e.\ MTU) was updated.
   1569 Actually, it is equivalent to \verb|table cache|.
   1570 
   1571 \item \verb|from SELECTOR|
   1572 
   1573 --- the same syntax as for \verb|to|, but it binds the source address range
   1574 rather than destinations. Note that the \verb|from| option only works with
   1575 cloned routes.
   1576 
   1577 \item \verb|protocol RTPROTO|
   1578 
   1579 --- only list routes of this protocol.
   1580 
   1581 
   1582 \item \verb|scope SCOPE_VAL|
   1583 
   1584 --- only list routes with this scope.
   1585 
   1586 \item \verb|type TYPE|
   1587 
   1588 --- only list routes of this type.
   1589 
   1590 \item \verb|dev NAME|
   1591 
   1592 --- only list routes going via this device.
   1593 
   1594 \item \verb|via PREFIX|
   1595 
   1596 --- only list routes going via the nexthop routers selected by \verb|PREFIX|.
   1597 
   1598 \item \verb|src PREFIX|
   1599 
   1600 --- only list routes with preferred source addresses selected
   1601 by \verb|PREFIX|.
   1602 
   1603 \item \verb|realm REALMID| or \verb|realms FROMREALM/TOREALM|
   1604 
   1605 --- only list routes with these realms.
   1606 
   1607 \end{itemize}
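
The three \verb|SELECTOR| modifiers described under \verb|to| can be
illustrated with the standard Python \verb|ipaddress| module. This is a
sketch of the semantics only: the function \verb|select| is hypothetical,
prefixes must be written in full dotted form (the module does not accept
abbreviations such as \verb|10.0/16|), and \verb|subnet_of| requires
Python 3.7 or later.
\begin{verbatim}
import ipaddress

def select(routes, modifier, prefix):
    """Filter route prefixes the way `to MODIFIER PREFIX` is described."""
    wanted = ipaddress.ip_network(prefix)
    result = []
    for text in routes:
        net = ipaddress.ip_network(text)
        if modifier == "root":
            hit = net.subnet_of(wanted)   # prefix not shorter than PREFIX
        elif modifier == "match":
            hit = wanted.subnet_of(net)   # prefix not longer than PREFIX
        elif modifier == "exact":
            hit = net == wanted
        else:
            raise ValueError(modifier)
        if hit:
            result.append(text)
    return result

table = ["0.0.0.0/0", "10.0.0.0/8", "10.0.0.0/16",
         "10.1.0.0/16", "10.0.0.0/24"]
print(select(table, "match", "10.0.0.0/16"))
print(select(table, "root", "10.0.0.0/16"))
print(select(table, "exact", "10.0.0.0/16"))
\end{verbatim}
With this table, \verb|match| keeps \verb|0.0.0.0/0|, \verb|10.0.0.0/8| and
\verb|10.0.0.0/16|, which agrees with the example given for \verb|match 10.0/16|.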
   1608 
   1609 \paragraph{Examples:} Let us count routes of protocol \verb|gated/bgp|
   1610 on a router:
   1611 \begin{verbatim}
   1612 kuznet@amber:~ $ ip ro ls proto gated/bgp | wc
   1613    1413    9891    79010
   1614 kuznet@amber:~ $
   1615 \end{verbatim}
   1616 To count the size of the routing cache, we have to use the \verb|-o| option
   1617 because cached attributes can take more than one line of output:
   1618 \begin{verbatim}
   1619 kuznet@amber:~ $ ip -o ro ls cloned | wc
   1620    159    2543    18707
   1621 kuznet@amber:~ $
   1622 \end{verbatim}
   1623 
   1624 
   1625 \paragraph{Output format:} The output of this command consists
of per-route records separated by line feeds.
   1627 However, some records may consist
   1628 of more than one line: particularly, this is the case when the route
   1629 is cloned or you requested additional statistics. If the
   1630 \verb|-o| option was given, then line feeds separating lines inside
   1631 records are replaced with the backslash sign.
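
A script that post-processes \verb|ip -o| output can recover the lines of
each record by splitting on the backslash. The Python sketch below assumes
exactly the format described in this paragraph; the function name
\verb|records| is hypothetical and the sample string is constructed by hand
from the examples in this section rather than captured from a real system.
\begin{verbatim}
def records(oneline_output):
    """Split `ip -o` style output into records (lists of lines)."""
    result = []
    for line in oneline_output.splitlines():
        if line.strip():
            # One output line is one record; backslashes mark the
            # places where the record had internal line feeds.
            result.append([part.strip() for part in line.split("\\")])
    return result

sample = (
    "193.233.7.82 dev eth0  src 193.233.7.65 realms inr.ac \\"
    "    cache  mtu 1500 rtt 300\n"
)
for record in records(sample):
    print(len(record), record)
\end{verbatim}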
   1632 
   1633 The output has the same syntax as arguments given to {\tt ip route add},
   1634 so that it can be understood easily. F.e.\
   1635 \begin{verbatim}
   1636 kuznet@amber:~ $ ip ro ls 193.233.7/24
   1637 193.233.7.0/24 dev eth0  proto gated/conn  scope link \
   1638     src 193.233.7.65 realms inr.ac 
   1639 kuznet@amber:~ $
   1640 \end{verbatim}
   1641 
   1642 If you list cloned entries, the output contains other attributes which
   1643 are evaluated during route calculation and updated during route
   1644 lifetime. An example of the output is:
   1645 \begin{verbatim}
   1646 kuznet@amber:~ $ ip ro ls 193.233.7.82 tab cache
   1647 193.233.7.82 from 193.233.7.82 dev eth0  src 193.233.7.65 \
   1648   realms inr.ac/inr.ac 
   1649     cache <src-direct,redirect>  mtu 1500 rtt 300 iif eth0
   1650 193.233.7.82 dev eth0  src 193.233.7.65 realms inr.ac 
   1651     cache  mtu 1500 rtt 300
   1652 kuznet@amber:~ $
   1653 \end{verbatim}
   1654 \begin{NB}
   1655   \label{NB-strange-route}
   1656   The route looks a bit strange, doesn't it? Did you notice that
  it is a path from 193.233.7.82 back to 193.233.7.82? Well, you will
   1658   see in the section on \verb|ip route get| (p.\pageref{NB-nature-of-strangeness})
   1659   how it appeared.
   1660 \end{NB}
   1661 The second line, starting with the word \verb|cache|, shows
   1662 additional attributes which normal routes do not possess.
   1663 Cached flags are summarized in angle brackets:
   1664 \begin{itemize}
   1665 \item \verb|local| --- packets are delivered locally.
   1666 It stands for loopback unicast routes, for broadcast routes
   1667 and for multicast routes, if this host is a member of the corresponding
   1668 group.
   1669 
   1670 \item \verb|reject| --- the path is bad. Any attempt to use it results
   1671 in an error. See attribute \verb|error| below (p.\pageref{IP-ROUTE-GET-error}).
   1672 
   1673 \item \verb|mc| --- the destination is multicast.
   1674 
   1675 \item \verb|brd| --- the destination is broadcast.
   1676 
   1677 \item \verb|src-direct| --- the source is on a directly connected
   1678 interface.
   1679 
   1680 \item \verb|redirected| --- the route was created by an ICMP Redirect.
   1681 
   1682 \item \verb|redirect| --- packets going via this route will 
   1683 trigger an ICMP redirect.
   1684 
   1685 \item \verb|fastroute| --- the route is eligible to be used for fastroute.
   1686 
   1687 \item \verb|equalize| --- make packet by packet randomization
   1688 along this path.
   1689 
   1690 \item \verb|dst-nat| --- the destination address requires translation.
   1691 
   1692 \item \verb|src-nat| --- the source address requires translation.
   1693 
   1694 \item \verb|masq| --- the source address requires masquerading.
This feature disappeared in Linux-2.4.
   1696 
   1697 \item \verb|notify| --- ({\em not implemented}) change/deletion
   1698 of this route will trigger RTNETLINK notification.
   1699 \end{itemize}
   1700 
   1701 Then some optional attributes follow:
   1702 \begin{itemize}
\item \verb|error| --- on \verb|reject| routes it is the error code
   1704 returned to local senders when they try to use this route.
   1705 These error codes are translated into ICMP error codes, sent to remote
   1706 senders, according to the rules described above in the subsection
   1707 devoted to route types (p.\pageref{IP-ROUTE-TYPES}).
   1708 \label{IP-ROUTE-GET-error}
   1709 
   1710 \item \verb|expires| --- this entry will expire after this timeout.
   1711 
   1712 \item \verb|iif| --- the packets for this path are expected to arrive
   1713 on this interface.
   1714 \end{itemize}
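
The layout of the \verb|cache| line (flags in angle brackets followed by
attributes) lends itself to simple parsing. The Python sketch below is an
illustration under one loud assumption: every attribute after the flags is a
single \verb|NAME VALUE| pair, which holds for the unicast examples in this
section but not for lists such as \verb|Oifs:|. The function name
\verb|parse_cache_line| is hypothetical.
\begin{verbatim}
def parse_cache_line(line):
    """Return (flags, attrs) for a line starting with `cache`."""
    words = line.split()
    if not words or words[0] != "cache":
        raise ValueError("not a cache line")
    flags = []
    rest = words[1:]
    if rest and rest[0].startswith("<") and rest[0].endswith(">"):
        flags = [f for f in rest[0][1:-1].split(",") if f]
        rest = rest[1:]
    # Assumption: the remaining words come in NAME VALUE pairs.
    attrs = dict(zip(rest[0::2], rest[1::2]))
    return flags, attrs

line = "    cache <src-direct,redirect>  mtu 1500 rtt 300 iif eth0"
print(parse_cache_line(line))
\end{verbatim}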
   1715 
   1716 \paragraph{Statistics:} With the \verb|-statistics| option, more
   1717 information about this route is shown:
   1718 \begin{itemize}
   1719 \item \verb|users| --- the number of users of this entry.
   1720 \item \verb|age| --- shows when this route was last used.
   1721 \item \verb|used| --- the number of lookups of this route since its creation.
   1722 \end{itemize}
   1723 
\subsection{{\tt ip route save} --- save routing tables}
   1725 \label{IP-ROUTE-SAVE}
   1726 
   1727 \paragraph{Description:} this command saves the contents of the routing
   1728 tables or the route(s) selected by some criteria to standard output.
   1729 
   1730 \paragraph{Arguments:} \verb|ip route save| has the same arguments as
   1731 \verb|ip route show|.
   1732 
   1733 \paragraph{Example:} This saves all the routes to the {\tt saved\_routes}
   1734 file:
   1735 \begin{verbatim}
   1736 dan@caffeine:~ # ip route save > saved_routes
   1737 \end{verbatim}
   1738 
   1739 \paragraph{Output format:} The format of the data stream provided by
   1740 \verb|ip route save| is that of \verb|rtnetlink|.  See
   1741 \verb|rtnetlink(7)| for more information.
   1742 
\subsection{{\tt ip route restore} --- restore routing tables}
   1744 \label{IP-ROUTE-RESTORE}
   1745 
   1746 \paragraph{Description:} this command restores the contents of the routing
   1747 tables according to a data stream as provided by \verb|ip route save| via
   1748 standard input.  Note that any routes already in the table are left unchanged.
   1749 Any routes in the input stream that already exist in the tables are ignored.
   1750 
   1751 \paragraph{Arguments:} This command takes no arguments.
   1752 
   1753 \paragraph{Example:} This restores all routes that were saved to the
   1754 {\tt saved\_routes} file:
   1755 
   1756 \begin{verbatim}
   1757 dan@caffeine:~ # ip route restore < saved_routes
   1758 \end{verbatim}
   1759 
   1760 \subsection{{\tt ip route flush} --- flush routing tables}
   1761 \label{IP-ROUTE-FLUSH}
   1762 
   1763 \paragraph{Abbreviations:} \verb|flush|, \verb|f|.
   1764 
   1765 \paragraph{Description:} this command flushes routes selected
   1766 by some criteria.
   1767 
\paragraph{Arguments:} the arguments have the same syntax and semantics
as the arguments of \verb|ip route show|, except that the selected routes are
purged rather than listed. The only other difference is the default action: \verb|show|
dumps the entire IP main routing table, whereas \verb|flush| prints the help page.
   1772 The reason for this difference does not require any explanation, does it?
   1773 
   1774 
   1775 \paragraph{Statistics:} With the \verb|-statistics| option, the command
   1776 becomes verbose. It prints out the number of deleted routes and the number
   1777 of rounds made to flush the routing table. If the option is given
   1778 twice, \verb|ip route flush| also dumps all the deleted routes
   1779 in the format described in the previous subsection.
   1780 
   1781 \paragraph{Examples:} The first example flushes all the
   1782 gatewayed routes from the main table (f.e.\ after a routing daemon crash).
   1783 \begin{verbatim}
   1784 netadm@amber:~ # ip -4 ro flush scope global type unicast
   1785 \end{verbatim}
   1786 This option deserves to be put into a scriptlet \verb|routef|.
   1787 \begin{NB}
   1788 This option was described in the \verb|route(8)| man page borrowed
   1789 from BSD, but was never implemented in Linux.
   1790 \end{NB}
   1791 
   1792 The second example flushes all IPv6 cloned routes:
   1793 \begin{verbatim}
   1794 netadm@amber:~ # ip -6 -s -s ro flush cache
   1795 3ffe:2400::220:afff:fef4:c5d1 via 3ffe:2400::220:afff:fef4:c5d1 \
   1796   dev eth0  metric 0 
   1797     cache  used 2 age 12sec mtu 1500 rtt 300
   1798 3ffe:2400::280:adff:feb7:8034 via 3ffe:2400::280:adff:feb7:8034 \
   1799   dev eth0  metric 0 
   1800     cache  used 2 age 15sec mtu 1500 rtt 300
   1801 3ffe:2400::280:c8ff:fe59:5bcc via 3ffe:2400::280:c8ff:fe59:5bcc \
   1802   dev eth0  metric 0 
   1803     cache  users 1 used 1 age 23sec mtu 1500 rtt 300
   1804 3ffe:2400:0:1:2a0:ccff:fe66:1878 via 3ffe:2400:0:1:2a0:ccff:fe66:1878 \
   1805   dev eth1  metric 0 
   1806     cache  used 2 age 20sec mtu 1500 rtt 300
   1807 3ffe:2400:0:1:a00:20ff:fe71:fb30 via 3ffe:2400:0:1:a00:20ff:fe71:fb30 \
   1808   dev eth1  metric 0 
   1809     cache  used 2 age 33sec mtu 1500 rtt 300
   1810 ff02::1 via ff02::1 dev eth1  metric 0 
   1811     cache  users 1 used 1 age 45sec mtu 1500 rtt 300
   1812 
   1813 *** Round 1, deleting 6 entries ***
   1814 *** Flush is complete after 1 round ***
   1815 netadm@amber:~ # ip -6 -s -s ro flush cache
   1816 Nothing to flush.
   1817 netadm@amber:~ #
   1818 \end{verbatim}
   1819 
   1820 The third example flushes BGP routing tables after a \verb|gated|
   1821 death.
   1822 \begin{verbatim}
   1823 netadm@amber:~ # ip ro ls proto gated/bgp | wc
   1824    1408    9856    78730
   1825 netadm@amber:~ # ip -s ro f proto gated/bgp
   1826 
   1827 *** Round 1, deleting 1408 entries ***
   1828 *** Flush is complete after 1 round ***
   1829 netadm@amber:~ # ip ro f proto gated/bgp
   1830 Nothing to flush.
   1831 netadm@amber:~ # ip ro ls proto gated/bgp
   1832 netadm@amber:~ #
   1833 \end{verbatim}
   1834 
   1835 
   1836 \subsection{{\tt ip route get} --- get a single route}
   1837 \label{IP-ROUTE-GET}
   1838 
   1839 \paragraph{Abbreviations:} \verb|get|, \verb|g|.
   1840 
   1841 \paragraph{Description:} this command gets a single route to a destination
   1842 and prints its contents exactly as the kernel sees it.
   1843 
   1844 \paragraph{Arguments:} 
   1845 \begin{itemize}
   1846 \item \verb|to ADDRESS| (default)
   1847 
   1848 --- the destination address.
   1849 
   1850 \item \verb|from ADDRESS|
   1851 
   1852 --- the source address.
   1853 
   1854 \item \verb|tos TOS| or \verb|dsfield TOS|
   1855 
   1856 --- the Type Of Service.
   1857 
   1858 \item \verb|iif NAME|
   1859 
   1860 --- the device from which this packet is expected to arrive.
   1861 
   1862 \item \verb|oif NAME|
   1863 
   1864 --- force the output device on which this packet will be routed.
   1865 
   1866 \item \verb|connected|
   1867 
   1868 --- if no source address (option \verb|from|) was given, relookup
   1869 the route with the source set to the preferred address received from the first lookup.
   1870 If policy routing is used, it may be a different route.
   1871 
   1872 \end{itemize}
   1873 
   1874 Note that this operation is not equivalent to \verb|ip route show|.
   1875 \verb|show| shows existing routes. \verb|get| resolves them and
   1876 creates new clones if necessary. Essentially, \verb|get|
   1877 is equivalent to sending a packet along this path.
   1878 If the \verb|iif| argument is not given, the kernel creates a route
   1879 to output packets towards the requested destination.
   1880 This is equivalent to pinging the destination
with a subsequent {\tt ip route ls cache}; however, no packets are
   1882 actually sent. With the \verb|iif| argument, the kernel pretends
   1883 that a packet arrived from this interface and searches for
   1884 a path to forward the packet.
   1885 
   1886 \paragraph{Output format:} This command outputs routes in the same
   1887 format as \verb|ip route ls|.
   1888 
   1889 \paragraph{Examples:} 
   1890 \begin{itemize}
   1891 \item Find a route to output packets to 193.233.7.82:
   1892 \begin{verbatim}
   1893 kuznet@amber:~ $ ip route get 193.233.7.82
   1894 193.233.7.82 dev eth0  src 193.233.7.65 realms inr.ac
   1895     cache  mtu 1500 rtt 300
   1896 kuznet@amber:~ $
   1897 \end{verbatim}
   1898 
   1899 \item Find a route to forward packets arriving on \verb|eth0|
   1900 from 193.233.7.82 and destined for 193.233.7.82:
   1901 \begin{verbatim}
   1902 kuznet@amber:~ $ ip r g 193.233.7.82 from 193.233.7.82 iif eth0
   1903 193.233.7.82 from 193.233.7.82 dev eth0  src 193.233.7.65 \
   1904   realms inr.ac/inr.ac 
   1905     cache <src-direct,redirect>  mtu 1500 rtt 300 iif eth0
   1906 kuznet@amber:~ $
   1907 \end{verbatim}
   1908 \begin{NB}
   1909   \label{NB-nature-of-strangeness}
   1910   This is the command that created the funny route from 193.233.7.82
   1911   looped back to 193.233.7.82 (cf.\ NB on~p.\pageref{NB-strange-route}).
   1912   Note the \verb|redirect| flag on it.
   1913 \end{NB}
   1914 
   1915 \item Find a multicast route for packets arriving on \verb|eth0|
   1916 from host 193.233.7.82 and destined for multicast group 224.2.127.254
   1917 (it is assumed that a multicast routing daemon is running.
   1918 In this case, it is \verb|pimd|)
   1919 \begin{verbatim}
   1920 kuznet@amber:~ $ ip r g 224.2.127.254 from 193.233.7.82 iif eth0
   1921 multicast 224.2.127.254 from 193.233.7.82 dev lo  \
   1922   src 193.233.7.65 realms inr.ac/cosmos 
   1923     cache <mc> iif eth0 Oifs: eth1 pimreg
   1924 kuznet@amber:~ $
   1925 \end{verbatim}
   1926 This route differs from the ones seen before. It contains a ``normal'' part
   1927 and a ``multicast'' part. The normal part is used to deliver (or not to
   1928 deliver) the packet to local IP listeners. In this case the router
   1929 is not a member
   1930 of this group, so that route has no \verb|local| flag and only
   1931 forwards packets. The output device for such entries is always loopback.
   1932 The multicast part consists of an additional \verb|Oifs:| list showing
   1933 the output interfaces.
   1934 \end{itemize}
   1935 
   1936 
   1937 It is time for a more complicated example. Let us add an invalid
   1938 gatewayed route for a destination which is really directly connected:
   1939 \begin{verbatim}
   1940 netadm@alisa:~ # ip route add 193.233.7.98 via 193.233.7.254
   1941 netadm@alisa:~ # ip route get 193.233.7.98
   1942 193.233.7.98 via 193.233.7.254 dev eth0  src 193.233.7.90
   1943     cache  mtu 1500 rtt 3072
   1944 netadm@alisa:~ #
   1945 \end{verbatim}
   1946 and probe it with ping:
   1947 \begin{verbatim}
   1948 netadm@alisa:~ # ping -n 193.233.7.98
   1949 PING 193.233.7.98 (193.233.7.98) from 193.233.7.90 : 56 data bytes
   1950 From 193.233.7.254: Redirect Host(New nexthop: 193.233.7.98)
   1951 64 bytes from 193.233.7.98: icmp_seq=0 ttl=255 time=3.5 ms
   1952 From 193.233.7.254: Redirect Host(New nexthop: 193.233.7.98)
   1953 64 bytes from 193.233.7.98: icmp_seq=1 ttl=255 time=2.2 ms
   1954 64 bytes from 193.233.7.98: icmp_seq=2 ttl=255 time=0.4 ms
   1955 64 bytes from 193.233.7.98: icmp_seq=3 ttl=255 time=0.4 ms
   1956 64 bytes from 193.233.7.98: icmp_seq=4 ttl=255 time=0.4 ms
   1957 ^C
   1958 --- 193.233.7.98 ping statistics ---
   1959 5 packets transmitted, 5 packets received, 0% packet loss
   1960 round-trip min/avg/max = 0.4/1.3/3.5 ms
   1961 netadm@alisa:~ #
   1962 \end{verbatim}
   1963 What happened? Router 193.233.7.254 understood that we have a much
   1964 better path to the destination and sent us an ICMP redirect message.
   1965 We may retry \verb|ip route get| to see what we have in the routing
   1966 tables now:
   1967 \begin{verbatim}
   1968 netadm@alisa:~ # ip route get 193.233.7.98
   1969 193.233.7.98 dev eth0  src 193.233.7.90 
   1970     cache <redirected>  mtu 1500 rtt 3072
   1971 netadm@alisa:~ #
   1972 \end{verbatim}
   1973 
   1974 
   1975 
   1976 \section{{\tt ip rule} --- routing policy database management}
   1977 \label{IP-RULE}
   1978 
   1979 \paragraph{Abbreviations:} \verb|rule|, \verb|ru|.
   1980 
   1981 \paragraph{Object:} \verb|rule|s in the routing policy database control
   1982 the route selection algorithm.
   1983 
   1984 Classic routing algorithms used in the Internet make routing decisions
   1985 based only on the destination address of packets (and in theory,
   1986 but not in practice, on the TOS field). The seminal review of classic
   1987 routing algorithms and their modifications can be found in~\cite{RFC1812}.
   1988 
   1989 In some circumstances we want to route packets differently depending not only
   1990 on destination addresses, but also on other packet fields: source address,
   1991 IP protocol, transport protocol ports or even packet payload.
   1992 This task is called ``policy routing''.
   1993 
   1994 \begin{NB}
   1995   ``policy routing'' $\neq$ ``routing policy''.
   1996 
   1997 \noindent	``policy routing'' $=$ ``cunning routing''.
   1998 
   1999 \noindent	``routing policy'' $=$ ``routing tactics'' or ``routing plan''.
   2000 \end{NB}
   2001 
   2002 To solve this task, the conventional destination based routing table, ordered
   2003 according to the longest match rule, is replaced with a ``routing policy
   2004 database'' (or RPDB), which selects routes
   2005 by executing some set of rules. The rules may have lots of keys of different
   2006 natures and therefore they have no natural ordering, but one imposed
   2007 by the administrator. Linux-2.2 RPDB is a linear list of rules
   2008 ordered by numeric priority value.
   2009 RPDB explicitly allows matching a few packet fields:
   2010 
   2011 \begin{itemize}
   2012 \item packet source address.
   2013 \item packet destination address.
   2014 \item TOS.
   2015 \item incoming interface (which is packet metadata, rather than a packet field).
   2016 \end{itemize}
   2017 
   2018 Matching IP protocols and transport ports is also possible,
   2019 indirectly, via \verb|ipchains|, by exploiting their ability
   2020 to mark some classes of packets with \verb|fwmark|. Therefore,
   2021 \verb|fwmark| is also included in the set of keys checked by rules.
   2022 
   2023 Each policy routing rule consists of a {\em selector\/} and an {\em action\/}
   2024 predicate. The RPDB is scanned in the order of increasing priority. The selector
   2025 of each rule is applied to \{source address, destination address, incoming
   2026 interface, tos, fwmark\} and, if the selector matches the packet,
   2027 the action is performed.  The action predicate may return with success.
In this case, it will either give a route or a failure indication
   2029 and the RPDB lookup is terminated. Otherwise, the RPDB program
   2030 continues on the next rule.
   2031 
   2032 What is the action, semantically? The natural action is to select the
   2033 nexthop and the output device. This is what
   2034 Cisco IOS~\cite{IOS} does. Let us call it ``match \& set''.
   2035 The Linux-2.2 approach is more flexible. The action includes
   2036 lookups in destination-based routing tables and selecting
   2037 a route from these tables according to the classic longest match algorithm.
   2038 The ``match \& set'' approach is the simplest case of the Linux one. It is realized
   2039 when a second level routing table contains a single default route.
   2040 Recall that Linux-2.2 supports multiple tables
   2041 managed with the \verb|ip route| command, described in the previous section.
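
The two-level lookup described above (a priority-ordered scan of rules, each
of which triggers a longest match lookup in a routing table) can be sketched
in Python. This is a simplified model, not the kernel algorithm: only the
source address selector and \verb|unicast| actions are modelled, the names
\verb|table_lookup| and \verb|rpdb_lookup| are hypothetical, and the gateway
addresses and table contents are made up for illustration.
\begin{verbatim}
import ipaddress

def table_lookup(table, dst):
    """Classic longest match lookup; returns a route or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for route in table:
        net = ipaddress.ip_network(route["to"])
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, route)
    return best[1] if best else None

def rpdb_lookup(rules, tables, src, dst):
    """Scan rules by increasing priority; first success terminates."""
    source = ipaddress.ip_address(src)
    for rule in sorted(rules, key=lambda r: r["prio"]):
        if source not in ipaddress.ip_network(rule["from"]):
            continue                  # selector does not match
        route = table_lookup(tables[rule["table"]], dst)
        if route is not None:
            return route              # action succeeded
    return None                       # no rule produced a route

tables = {
    # A table holding a single default route is the "match & set" case.
    "inr.ruhep": [{"to": "0.0.0.0/0", "via": "192.203.80.1"}],
    "main": [{"to": "0.0.0.0/0", "via": "193.233.7.254"},
             {"to": "193.233.7.0/24", "dev": "eth0"}],
}
rules = [
    {"prio": 32766, "from": "0.0.0.0/0", "table": "main"},
    {"prio": 220, "from": "192.203.80.0/24", "table": "inr.ruhep"},
]
print(rpdb_lookup(rules, tables, "192.203.80.7", "198.51.100.9"))
print(rpdb_lookup(rules, tables, "193.233.7.65", "193.233.7.82"))
\end{verbatim}
The first lookup is caught by the rule with priority 220 and leaves through
the single default route of its table; the second falls through to table
\verb|main|, where the longer prefix wins over the default route.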
   2042 
   2043 At startup time the kernel configures the default RPDB consisting of three
   2044 rules:
   2045 
   2046 \begin{enumerate}
   2047 \item Priority: 0, Selector: match anything, Action: lookup routing
   2048 table \verb|local| (ID 255).
   2049 The \verb|local| table is a special routing table containing
   2050 high priority control routes for local and broadcast addresses.
   2051 
   2052 Rule 0 is special. It cannot be deleted or overridden.
   2053 
   2054 
   2055 \item Priority: 32766, Selector: match anything, Action: lookup routing
   2056 table \verb|main| (ID 254).
   2057 The \verb|main| table is the normal routing table containing all non-policy
   2058 routes. This rule may be deleted and/or overridden with other
   2059 ones by the administrator.
   2060 
   2061 \item Priority: 32767, Selector: match anything, Action: lookup routing
   2062 table \verb|default| (ID 253).
   2063 The \verb|default| table is empty. It is reserved for some
   2064 post-processing if no previous default rules selected the packet.
   2065 This rule may also be deleted.
   2066 
   2067 \end{enumerate}
   2068 
   2069 Do not confuse routing tables with rules: rules point to routing tables,
   2070 several rules may refer to one routing table and some routing tables
   2071 may have no rules pointing to them. If the administrator deletes all the rules
   2072 referring to a table, the table is not used, but it still exists
   2073 and will disappear only after all the routes contained in it are deleted.
   2074 
   2075 
   2076 \paragraph{Rule attributes:} Each RPDB entry has additional
   2077 attributes. F.e.\ each rule has a pointer to some routing
table. NAT and masquerading rules have an attribute to select the new IP
address to translate or masquerade to. Besides that, rules have some
optional attributes that routes also have, namely \verb|realms|.
   2081 These values do not override those contained in the routing tables. They
   2082 are only used if the route did not select any attributes.
   2083 
   2084 
   2085 \paragraph{Rule types:} The RPDB may contain rules of the following
   2086 types:
   2087 \begin{itemize}
   2088 \item \verb|unicast| --- the rule prescribes to return the route found
   2089 in the routing table referenced by the rule.
   2090 \item \verb|blackhole| --- the rule prescribes to silently drop the packet.
   2091 \item \verb|unreachable| --- the rule prescribes to generate a ``Network
   2092 is unreachable'' error.
\item \verb|prohibit| --- the rule prescribes to generate a
   2094 ``Communication is administratively prohibited'' error.
   2095 \item \verb|nat| --- the rule prescribes to translate the source address
   2096 of the IP packet into some other value. More about NAT is
   2097 in Appendix~\ref{ROUTE-NAT}, p.\pageref{ROUTE-NAT}.
   2098 \end{itemize}
   2099 
   2100 
   2101 \paragraph{Commands:} \verb|add|, \verb|delete| and \verb|show|
   2102 (or \verb|list|).
   2103 
   2104 \subsection{{\tt ip rule add} --- insert a new rule\\
   2105 	{\tt ip rule delete} --- delete a rule}
   2106 \label{IP-RULE-ADD}
   2107 
   2108 \paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|delete|, \verb|del|,
   2109 	\verb|d|.
   2110 
   2111 \paragraph{Arguments:}
   2112 
   2113 \begin{itemize}
   2114 \item \verb|type TYPE| (default)
   2115 
   2116 --- the type of this rule. The list of valid types was given in the previous
   2117 subsection.
   2118 
   2119 \item \verb|from PREFIX|
   2120 
   2121 --- select the source prefix to match.
   2122 
   2123 \item \verb|to PREFIX|
   2124 
   2125 --- select the destination prefix to match.
   2126 
   2127 \item \verb|iif NAME|
   2128 
   2129 --- select the incoming device to match. If the interface is loopback,
   2130 the rule only matches packets originating from this host. This means that you
   2131 may create separate routing tables for forwarded and local packets and,
   2132 hence, completely segregate them.
   2133 
   2134 \item \verb|tos TOS| or \verb|dsfield TOS|
   2135 
   2136 --- select the TOS value to match.
   2137 
   2138 \item \verb|fwmark MARK|
   2139 
   2140 --- select the \verb|fwmark| value to match.
   2141 
   2142 \item \verb|priority PREFERENCE|
   2143 
   2144 --- the priority of this rule. Each rule should have an explicitly
   2145 set {\em unique\/} priority value.
   2146 \begin{NB}
   2147   Really, for historical reasons \verb|ip rule add| does not require a
   2148   priority value and allows them to be non-unique.
  If the user does not supply a priority, it is selected by the kernel.
   2150   If the user creates a rule with a priority value that
   2151   already exists, the kernel does not reject the request. It adds
   2152   the new rule before all old rules of the same priority.
   2153 
  It is a mistake in design, no more. And it will be fixed one day,
   2155   so do not rely on this feature. Use explicit priorities.
   2156 \end{NB}
   2157 
   2158 
   2159 \item \verb|table TABLEID|
   2160 
--- the routing table identifier to look up if the rule selector matches.
   2162 
   2163 \item \verb|realms FROM/TO|
   2164 
   2165 --- Realms to select if the rule matched and the routing table lookup
   2166 succeeded. Realm \verb|TO| is only used if the route did not select
   2167 any realm.
   2168 
   2169 \item \verb|nat ADDRESS|
   2170 
--- the base of the IP address block to translate (for source addresses).
The \verb|ADDRESS| may be either the start of the block of NAT addresses
(selected by NAT routes) or, in Linux-2.2, a local host address (or even zero).
In the latter case the router does not translate the packets
but masquerades them to this address; this feature disappeared in Linux-2.4.
   2176 More about NAT is in Appendix~\ref{ROUTE-NAT},
   2177 p.\pageref{ROUTE-NAT}.
   2178 
   2179 \end{itemize}
   2180 
   2181 \paragraph{Warning:} Changes to the RPDB made with these commands
   2182 do not become active immediately. It is assumed that after
   2183 a script finishes a batch of updates, it flushes the routing cache
   2184 with \verb|ip route flush cache|.
   2185 
   2186 \paragraph{Examples:}
   2187 \begin{itemize}
   2188 \item Route packets with source addresses from 192.203.80/24
   2189 according to routing table \verb|inr.ruhep|:
   2190 \begin{verbatim}
   2191 ip ru add from 192.203.80.0/24 table inr.ruhep prio 220
   2192 \end{verbatim}
   2193 
   2194 \item Translate packet source address 193.233.7.83 into 192.203.80.144
   2195 and route it according to table \#1 (actually, it is \verb|inr.ruhep|):
   2196 \begin{verbatim}
   2197 ip ru add from 193.233.7.83 nat 192.203.80.144 table 1 prio 320
   2198 \end{verbatim}
   2199 
   2200 \item Delete the unused default rule:
   2201 \begin{verbatim}
   2202 ip ru del prio 32767
   2203 \end{verbatim}
   2204 
   2205 \end{itemize}
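
The selectors described above can be combined to keep locally originated
and marked traffic apart. The following is only a sketch: the table
numbers 10 and 20, the mark value 1 and the priorities are illustrative
assumptions, and the tables are expected to be populated separately
with \verb|ip route|.
\begin{verbatim}
# Packets originated by this host (incoming interface is loopback)
# are looked up in table 10; forwarded packets never match this rule.
ip rule add iif lo table 10 prio 100

# Packets carrying firewall mark 1 are looked up in table 20.
ip rule add fwmark 1 table 20 prio 110

# Flush the routing cache after the batch of updates.
ip route flush cache
\end{verbatim}
Giving every rule an explicit, unique priority keeps the order of
evaluation independent of the order in which the rules were added.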
   2206 
   2207 
   2208 
   2209 \subsection{{\tt ip rule show} --- list rules}
   2210 \label{IP-RULE-SHOW}
   2211 
   2212 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
   2213 
   2214 
   2215 \paragraph{Arguments:} Good news, this is one command that has no arguments.
   2216 
   2217 \paragraph{Output format:}
   2218 
   2219 \begin{verbatim}
   2220 kuznet@amber:~ $ ip ru ls
   2221 0:	from all lookup local 
   2222 200:	from 192.203.80.0/24 to 193.233.7.0/24 lookup main
   2223 210:	from 192.203.80.0/24 to 192.203.80.0/24 lookup main
   2224 220:	from 192.203.80.0/24 lookup inr.ruhep realms inr.ruhep/radio-msu
   2225 300:	from 193.233.7.83 to 193.233.7.0/24 lookup main
   2226 310:	from 193.233.7.83 to 192.203.80.0/24 lookup main
   2227 320:	from 193.233.7.83 lookup inr.ruhep map-to 192.203.80.144
   2228 32766:	from all lookup main 
   2229 kuznet@amber:~ $
   2230 \end{verbatim}
   2231 
   2232 In the first column is the rule priority value followed
   2233 by a colon. Then the selectors follow. Each key is prefixed
   2234 with the same keyword that was used to create the rule.
   2235 
   2236 The keyword \verb|lookup| is followed by a routing table identifier,
   2237 as it is recorded in the file \verb|/etc/iproute2/rt_tables|.
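
As a sketch of how such a name comes into existence, the commands below
register the name \verb|inr.ruhep| for table 1, the pair used in the
examples of this section. The file is assumed to hold one
``number name'' pair per line.
\begin{verbatim}
# Run as root: map table number 1 to the name inr.ruhep.
echo "1 inr.ruhep" >> /etc/iproute2/rt_tables

# Rules referring to table 1 are now listed with the name.
ip rule show
\end{verbatim}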
   2238 
   2239 If the rule does NAT (f.e.\ rule \#320), it is shown by the keyword
   2240 \verb|map-to| followed by the start of the block of addresses to map.
   2241 
The meaning of this example is fairly simple. The prefixes
192.203.80.0/24 and 193.233.7.0/24 form the internal network, but
they are routed differently when the packets leave it.
In addition, the address of the host 193.233.7.83 is translated
so that it appears as 192.203.80.144 when talking
to the outer world.
   2248 
\subsection{{\tt ip rule save} --- save rules tables}
   2250 \label{IP-RULE-SAVE}
   2251 
   2252 \paragraph{Description:} this command saves the contents of the rules
   2253 tables or the rule(s) selected by some criteria to standard output.
   2254 
   2255 \paragraph{Arguments:} \verb|ip rule save| has the same arguments as
   2256 \verb|ip rule show|.
   2257 
   2258 \paragraph{Example:} This saves all the rules to the {\tt saved\_rules}
   2259 file:
   2260 \begin{verbatim}
   2261 dan@caffeine:~ # ip rule save > saved_rules
   2262 \end{verbatim}
   2263 
   2264 \paragraph{Output format:} The format of the data stream provided by
   2265 \verb|ip rule save| is that of \verb|rtnetlink|.  See
   2266 \verb|rtnetlink(7)| for more information.
   2267 
\subsection{{\tt ip rule restore} --- restore rules tables}
   2269 \label{IP-RULE-RESTORE}
   2270 
   2271 \paragraph{Description:} this command restores the contents of the rules
   2272 tables according to a data stream as provided by \verb|ip rule save| via
   2273 standard input.  Note that any rules already in the table are left unchanged,
   2274 and duplicates are not ignored.
   2275 
   2276 \paragraph{Arguments:} This command takes no arguments.
   2277 
   2278 \paragraph{Example:} This restores all rules that were saved to the
   2279 {\tt saved\_rules} file:
   2280 
   2281 \begin{verbatim}
   2282 dan@caffeine:~ # ip rule restore < saved_rules
   2283 \end{verbatim}
   2284 
   2285 
   2286 
   2287 \section{{\tt ip maddress} --- multicast addresses management}
   2288 \label{IP-MADDR}
   2289 
   2290 \paragraph{Object:} \verb|maddress| objects are multicast addresses.
   2291 
   2292 \paragraph{Commands:} \verb|add|, \verb|delete|, \verb|show| (or \verb|list|).
   2293 
   2294 \subsection{{\tt ip maddress show} --- list multicast addresses}
   2295 
   2296 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
   2297 
   2298 \paragraph{Arguments:}
   2299 
   2300 \begin{itemize}
   2301 
   2302 \item \verb|dev NAME| (default)
   2303 
   2304 --- the device name.
   2305 
   2306 \end{itemize}
   2307 
   2308 \paragraph{Output format:}
   2309 
   2310 \begin{verbatim}
   2311 kuznet@alisa:~ $ ip maddr ls dummy
   2312 2:  dummy
   2313     link  33:33:00:00:00:01
   2314     link  01:00:5e:00:00:01
   2315     inet  224.0.0.1 users 2
   2316     inet6 ff02::1
   2317 kuznet@alisa:~ $ 
   2318 \end{verbatim}
   2319 
   2320 The first line of the output shows the interface index and its name.
   2321 Then the multicast address list follows. Each line starts with the
protocol identifier. The word \verb|link| denotes a link layer
multicast address.
   2324 
   2325 If a multicast address has more than one user, the number
   2326 of users is shown after the \verb|users| keyword.
   2327 
   2328 One additional feature not present in the example above
   2329 is the \verb|static| flag, which indicates that the address was joined
   2330 with \verb|ip maddr add|. See the following subsection.
   2331 
   2332 
   2333 
   2334 \subsection{{\tt ip maddress add} --- add a multicast address\\
   2335 	    {\tt ip maddress delete} --- delete a multicast address}
   2336 
   2337 \paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|delete|, \verb|del|, \verb|d|.
   2338 
   2339 \paragraph{Description:} these commands attach/detach
   2340 a static link layer multicast address to listen on the interface.
   2341 Note that it is impossible to join protocol multicast groups
   2342 statically. This command only manages link layer addresses.
   2343 
   2344 
   2345 \paragraph{Arguments:}
   2346 
   2347 \begin{itemize}
   2348 \item \verb|address LLADDRESS| (default)
   2349 
   2350 --- the link layer multicast address.
   2351 
   2352 \item \verb|dev NAME|
   2353 
   2354 --- the device to join/leave this multicast address.
   2355 
   2356 \end{itemize}
   2357 
   2358 
   2359 \paragraph{Example:} Let us continue with the example from the previous subsection.
   2360 
   2361 \begin{verbatim}
   2362 netadm@alisa:~ # ip maddr add 33:33:00:00:00:01 dev dummy
   2363 netadm@alisa:~ # ip -0 maddr ls dummy
   2364 2:  dummy
   2365     link  33:33:00:00:00:01 users 2 static
   2366     link  01:00:5e:00:00:01
   2367 netadm@alisa:~ # ip maddr del 33:33:00:00:00:01 dev dummy
   2368 \end{verbatim}
   2369 
   2370 \begin{NB}
 Neither \verb|ip| nor the kernel checks multicast addresses for validity.
 In particular, this means that you can try to load a unicast address
 instead of a multicast address. Most drivers will ignore such addresses,
 but several (f.e.\ Tulip) will load them into their on-board filter.
 The effects may be strange. Namely, the addresses become additional
 local link addresses and, if you loaded the address of another host
 into the router, expect duplicated packets on the wire.
 This is not a bug, but rather a hole in the API and intra-kernel interfaces.
 This feature is really more useful for traffic monitoring, but when using it
 with Linux-2.2 you {\em have to\/} be sure that the host is not
 a router and, especially, that it is not a transparent proxy or masquerading
 agent.
   2383 \end{NB}
   2384 
   2385 
   2386 
   2387 \section{{\tt ip mroute} --- multicast routing cache management}
   2388 \label{IP-MROUTE}
   2389 
   2390 \paragraph{Abbreviations:} \verb|mroute|, \verb|mr|.
   2391 
   2392 \paragraph{Object:} \verb|mroute| objects are multicast routing cache
   2393 entries created by a user level mrouting daemon
   2394 (f.e.\ \verb|pimd| or \verb|mrouted|).
   2395 
   2396 Due to the limitations of the current interface to the multicast routing
   2397 engine, it is impossible to change \verb|mroute| objects administratively,
   2398 so we may only display them. This limitation will be removed
   2399 in the future.
   2400 
   2401 \paragraph{Commands:} \verb|show| (or \verb|list|).
   2402 
   2403 
   2404 \subsection{{\tt ip mroute show} --- list mroute cache entries}
   2405 
   2406 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
   2407 
   2408 \paragraph{Arguments:}
   2409 
   2410 \begin{itemize}
   2411 \item \verb|to PREFIX| (default)
   2412 
   2413 --- the prefix selecting the destination multicast addresses to list.
   2414 
   2415 
   2416 \item \verb|iif NAME|
   2417 
   2418 --- the interface on which multicast packets are received.
   2419 
   2420 
   2421 \item \verb|from PREFIX|
   2422 
   2423 --- the prefix selecting the IP source addresses of the multicast route.
   2424 
   2425 
   2426 \end{itemize}
   2427 
   2428 \paragraph{Output format:}
   2429 
   2430 \begin{verbatim}
   2431 kuznet@amber:~ $ ip mroute ls
   2432 (193.232.127.6, 224.0.1.39)      Iif: unresolved 
   2433 (193.232.244.34, 224.0.1.40)     Iif: unresolved 
   2434 (193.233.7.65, 224.66.66.66)     Iif: eth0       Oifs: pimreg 
   2435 kuznet@amber:~ $ 
   2436 \end{verbatim}
   2437 
   2438 Each line shows one (S,G) entry in the multicast routing cache,
   2439 where S is the source address and G is the multicast group. \verb|Iif| is
   2440 the interface on which multicast packets are expected to arrive.
If the word \verb|unresolved| appears instead of the interface name,
the routing daemon has not yet resolved this entry.
The keyword \verb|Oifs| is followed by a list of output interfaces, separated
by spaces. If a multicast routing entry is created with a non-trivial
TTL scope, administrative distances are appended to the device names
in the \verb|Oifs| list.
   2447 
   2448 \paragraph{Statistics:} The \verb|-statistics| option also prints the
   2449 number of packets and bytes forwarded along this route and
   2450 the number of packets that arrived on the wrong interface, if this number is not zero.
   2451 
   2452 \begin{verbatim}
   2453 kuznet@amber:~ $ ip -s mr ls 224.66/16
   2454 (193.233.7.65, 224.66.66.66)     Iif: eth0       Oifs: pimreg 
   2455   9383 packets, 300256 bytes
   2456 kuznet@amber:~ $
   2457 \end{verbatim}
   2458 
   2459 
   2460 \section{{\tt ip tunnel} --- tunnel configuration}
   2461 \label{IP-TUNNEL}
   2462 
   2463 \paragraph{Abbreviations:} \verb|tunnel|, \verb|tunl|.
   2464 
   2465 \paragraph{Object:} \verb|tunnel| objects are tunnels, encapsulating
   2466 packets in IPv4 packets and then sending them over the IP infrastructure.
   2467 
   2468 \paragraph{Commands:} \verb|add|, \verb|delete|, \verb|change|, \verb|show|
   2469 (or \verb|list|).
   2470 
   2471 \paragraph{See also:} A more informal discussion of tunneling
   2472 over IP and the \verb|ip tunnel| command can be found in~\cite{IP-TUNNELS}.
   2473 
   2474 \subsection{{\tt ip tunnel add} --- add a new tunnel\\
   2475 	{\tt ip tunnel change} --- change an existing tunnel\\
   2476 	{\tt ip tunnel delete} --- destroy a tunnel}
   2477 
   2478 \paragraph{Abbreviations:} \verb|add|, \verb|a|; \verb|change|, \verb|chg|;
   2479 \verb|delete|, \verb|del|, \verb|d|.
   2480 
   2481 
   2482 \paragraph{Arguments:}
   2483 
   2484 \begin{itemize}
   2485 
   2486 \item \verb|name NAME| (default)
   2487 
   2488 --- select the tunnel device name.
   2489 
   2490 \item \verb|mode MODE|
   2491 
   2492 --- set the tunnel mode. Three modes are currently available:
   2493 	\verb|ipip|, \verb|sit| and \verb|gre|.
   2494 
   2495 \item \verb|remote ADDRESS|
   2496 
   2497 --- set the remote endpoint of the tunnel.
   2498 
   2499 \item \verb|local ADDRESS|
   2500 
   2501 --- set the fixed local address for tunneled packets.
   2502 It must be an address on another interface of this host.
   2503 
   2504 \item \verb|ttl N|
   2505 
   2506 --- set a fixed TTL \verb|N| on tunneled packets.
   2507 	\verb|N| is a number in the range 1--255. 0 is a special value
   2508 	meaning that packets inherit the TTL value. 
   2509 		The default value is: \verb|inherit|.
   2510 
   2511 \item \verb|tos T| or \verb|dsfield T|
   2512 
   2513 --- set a fixed TOS \verb|T| on tunneled packets.
   2514 		The default value is: \verb|inherit|.
   2515 
   2516 
   2517 
   2518 \item \verb|dev NAME| 
   2519 
--- bind the tunnel to the device \verb|NAME| so that
	tunneled packets will only be routed via this device and will
	not be able to escape to another device when the route to the endpoint changes.
   2523 
   2524 \item \verb|nopmtudisc|
   2525 
   2526 --- disable Path MTU Discovery on this tunnel.
	It is enabled by default. Note that a fixed TTL is incompatible
	with this option: tunneling with a fixed TTL always performs Path MTU Discovery.
   2529 
   2530 \item \verb|key K|, \verb|ikey K|, \verb|okey K|
   2531 
   2532 --- (only GRE tunnels) use keyed GRE with key \verb|K|. \verb|K| is
   2533 	either a number or an IP address-like dotted quad.
   2534    The \verb|key| parameter sets the key to use in both directions.
   2535    The \verb|ikey| and \verb|okey| parameters set different keys for input and output.
   2536    
   2537 
   2538 \item \verb|csum|, \verb|icsum|, \verb|ocsum|
   2539 
   2540 --- (only GRE tunnels) generate/require checksums for tunneled packets.
   2541    The \verb|ocsum| flag calculates checksums for outgoing packets.
   2542    The \verb|icsum| flag requires that all input packets have the correct
   2543    checksum. The \verb|csum| flag is equivalent to the combination
   2544   ``\verb|icsum| \verb|ocsum|''.
   2545 
   2546 \item \verb|seq|, \verb|iseq|, \verb|oseq|
   2547 
   2548 --- (only GRE tunnels) serialize packets.
   2549    The \verb|oseq| flag enables sequencing of outgoing packets.
   2550    The \verb|iseq| flag requires that all input packets are serialized.
   2551    The \verb|seq| flag is equivalent to the combination ``\verb|iseq| \verb|oseq|''.
   2552 
   2553 \begin{NB}
   2554  I think this option does not
   2555 	work. At least, I did not test it, did not debug it and
   2556 	do not even understand how it is supposed to work or for what
   2557 	purpose Cisco planned to use it. Do not use it.
   2558 \end{NB}
   2559 
   2560 
   2561 \end{itemize}
   2562 
\paragraph{Example:} Create a point-to-point IPv6 tunnel with a TTL of 32.
   2564 \begin{verbatim}
   2565 netadm@amber:~ # ip tunl add Cisco mode sit remote 192.31.7.104 \
   2566     local 192.203.80.142 ttl 32 
   2567 \end{verbatim}
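
A keyed GRE tunnel with checksums can be sketched in the same way.
The device name \verb|gre1|, the key and the addresses (taken from the
documentation ranges 192.0.2.0/24 and 198.51.100.0/24) are placeholders,
not values from a real setup.
\begin{verbatim}
# Create a GRE tunnel with the same key in both directions and
# checksums generated on output and required on input.
ip tunnel add gre1 mode gre remote 192.0.2.1 \
    local 198.51.100.2 ttl 64 key 42 csum

# Change a parameter of the existing tunnel, then inspect it.
ip tunnel change gre1 ttl 32
ip tunnel show gre1

# Destroy the tunnel when it is no longer needed.
ip tunnel del gre1
\end{verbatim}
Because a fixed TTL is set, \verb|nopmtudisc| is not given here.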
   2568 
   2569 \subsection{{\tt ip tunnel show} --- list tunnels}
   2570 
   2571 \paragraph{Abbreviations:} \verb|show|, \verb|list|, \verb|sh|, \verb|ls|, \verb|l|.
   2572 
   2573 
   2574 \paragraph{Arguments:} None.
   2575 
   2576 \paragraph{Output format:}
   2577 \begin{verbatim}
   2578 kuznet@amber:~ $ ip tunl ls Cisco
   2579 Cisco: ipv6/ip  remote 192.31.7.104  local 192.203.80.142  ttl 32 
   2580 kuznet@amber:~ $ 
   2581 \end{verbatim}
   2582 The line starts with the tunnel device name followed by a colon.
   2583 Then the tunnel mode follows. The parameters of the tunnel are listed
   2584 with the same keywords that were used when creating the tunnel.
   2585 
   2586 \paragraph{Statistics:}
   2587 
   2588 \begin{verbatim}
   2589 kuznet@amber:~ $ ip -s tunl ls Cisco
   2590 Cisco: ipv6/ip  remote 192.31.7.104  local 192.203.80.142  ttl 32 
   2591 RX: Packets    Bytes        Errors CsumErrs OutOfSeq Mcasts
   2592     12566      1707516      0      0        0        0       
   2593 TX: Packets    Bytes        Errors DeadLoop NoRoute  NoBufs
   2594     13445      1879677      0      0        0        0     
   2595 kuznet@amber:~ $ 
   2596 \end{verbatim}
Essentially, these numbers are the same as the numbers
printed with {\tt ip -s link show}
(sec.~\ref{IP-LINK-SHOW}, p.\pageref{IP-LINK-SHOW}), but the tags are different
to reflect that they are tunnel specific.
   2601 \begin{itemize}
   2602 \item \verb|CsumErrs| --- the total number of packets dropped
   2603 because of checksum failures for a GRE tunnel with checksumming enabled.
   2604 \item \verb|OutOfSeq| --- the total number of packets dropped
   2605 because they arrived out of sequence for a GRE tunnel with
   2606 serialization enabled.
   2607 \item \verb|Mcasts| --- the total number of multicast packets
   2608 received on a broadcast GRE tunnel.
   2609 \item \verb|DeadLoop| --- the total number of packets which were not
   2610 transmitted because the tunnel is looped back to itself.
   2611 \item \verb|NoRoute| --- the total number of packets which were not
   2612 transmitted because there is no IP route to the remote endpoint.
   2613 \item \verb|NoBufs| --- the total number of packets which were not
   2614 transmitted because the kernel failed to allocate a buffer.
   2615 \end{itemize}
   2616 
   2617 
   2618 \section{{\tt ip monitor} and {\tt rtmon} --- state monitoring}
   2619 \label{IP-MONITOR}
   2620 
The \verb|ip| utility can monitor the state of devices, addresses
and routes continuously. This command has a slightly different format:
the \verb|monitor| keyword comes first on the command line and
the object list follows it:
   2626 \begin{verbatim}
   2627   ip monitor [ file FILE ] [ all | OBJECT-LIST ] [ label ]
   2628 \end{verbatim}
   2629 \verb|OBJECT-LIST| is the list of object types that we want to
   2630 monitor.  It may contain \verb|link|, \verb|address| and \verb|route|.
   2631 Specifying \verb|label| indicates that output lines should be labelled
   2632 with the type of object being printed --- this happens by default if
   2633 \verb|all| is specified.  If no \verb|file| argument is given,
   2634 \verb|ip| opens RTNETLINK, listens on it and dumps state changes in
   2635 the format described in previous sections.
   2636 
   2637 If a file name is given, it does not listen on RTNETLINK,
   2638 but opens the file containing RTNETLINK messages saved in binary format
   2639 and dumps them. Such a history file can be generated with the
   2640 \verb|rtmon| utility. This utility has a command line syntax similar to
   2641 \verb|ip monitor|.
   2642 Ideally, \verb|rtmon| should be started before
   2643 the first network configuration command is issued. F.e.\ if
   2644 you insert:
   2645 \begin{verbatim}
   2646   rtmon file /var/log/rtmon.log
   2647 \end{verbatim}
   2648 in a startup script, you will be able to view the full history
   2649 later.
   2650 
   2651 Certainly, it is possible to start \verb|rtmon| at any time.
   2652 It prepends the history with the state snapshot dumped at the moment
   2653 of starting.
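
The two modes can be sketched as follows. The log file name repeats the
one used above and is otherwise arbitrary.
\begin{verbatim}
# Watch link and address changes live (interrupt with Ctrl-C).
ip monitor link address

# Replay the history previously recorded by rtmon.
ip monitor file /var/log/rtmon.log
\end{verbatim}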
   2654 
   2655 
   2656 \section{Route realms and policy propagation, {\tt rtacct}}
   2657 \label{RT-REALMS}
   2658 
   2659 On routers using OSPF ASE or, especially, the BGP protocol, routing
   2660 tables may be huge. If we want to classify or to account for the packets
   2661 per route, we will have to keep lots of information. Even worse, if we
   2662 want to distinguish the packets not only by their destination, but
   2663 also by their source, the task gets quadratic complexity and its solution
   2664 is physically impossible.
   2665 
   2666 One approach to propagating the policy from routing protocols
   2667 to the forwarding engine has been proposed in~\cite{IOS-BGP-PP}.
   2668 Essentially, Cisco Policy Propagation via BGP is based on the fact
   2669 that dedicated routers all have the RIB (Routing Information Base)
   2670 close to the forwarding engine, so policy routing rules can
   2671 check all the route attributes, including ASPATH information
   2672 and community strings.
   2673 
   2674 The Linux architecture, splitting the RIB (maintained by a user level
   2675 daemon) and the kernel based FIB (Forwarding Information Base),
   2676 does not allow such a simple approach.
   2677 
This is fortunate, because there is another solution
which allows an even more flexible policy and richer semantics.
   2680 
Namely, routes can be clustered together in user space, based on their
attributes.  F.e.\ a BGP router knows the ASPATH of a route and its community;
an OSPF router knows the route tag or its area. The administrator, when adding
routes manually, also knows their nature. Provided that the number of such
aggregates (we call them {\em realms\/}) is low, the task of full
classification both by source and destination becomes quite manageable.
   2687 
So each route may be assigned to a realm. It is assumed that
this identification is made by a routing daemon, but static routes
can also be handled manually with \verb|ip route| (see sec.~\ref{IP-ROUTE},
p.\pageref{IP-ROUTE}).
   2692 \begin{NB}
  There is a patch to \verb|gated| that allows routes to be classified
  into realms with the full set of policy rules implemented in \verb|gated|:
  by prefix, by ASPATH, by origin, by tag etc.
   2696 \end{NB}
   2697 
   2698 To facilitate the construction (f.e.\ in case the routing
   2699 daemon is not aware of realms), missing realms may be completed
   2700 with routing policy rules, see sec.~\ref{IP-RULE}, p.\pageref{IP-RULE}.
   2701 
   2702 For each packet the kernel calculates a tuple of realms: source realm
   2703 and destination realm, using the following algorithm:
   2704 
   2705 \begin{enumerate}
   2706 \item If the route has a realm, the destination realm of the packet is set to it.
   2707 \item If the rule has a source realm, the source realm of the packet is set to it.
   2708 If the destination realm was not inherited from the route and the rule has a destination realm,
   2709 it is also set.
   2710 \item If at least one of the realms is still unknown, the kernel finds
   2711 the reversed route to the source of the packet.
   2712 \item If the source realm is still unknown, get it from the reversed route.
   2713 \item If one of the realms is still unknown, swap the realms of reversed
   2714 routes and apply step 2 again.
   2715 \end{enumerate}
   2716 
After this procedure is completed we know the realm the packet
arrived from and the realm it is going to.
   2719 If some of the realms are unknown, they are initialized to zero
   2720 (or realm \verb|unknown|).
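
A small worked case may help. The prefixes, the gateway and the numeric
realms 10, 20 and 30 below are illustrative assumptions, and the gateway
is assumed to be directly reachable.
\begin{verbatim}
# The route to 192.0.2.0/24 belongs to realm 10.
ip route add 192.0.2.0/24 via 198.51.100.1 realm 10

# Packets from 203.0.113.0/24 get source realm 20; realm 30 is
# used as the destination realm only if the route has no realm.
ip rule add from 203.0.113.0/24 table main realms 20/30 prio 400
\end{verbatim}
For a packet from 203.0.113.5 to 192.0.2.9 that matches this rule,
step 1 takes the destination realm 10 from the route and step 2 takes
the source realm 20 from the rule. Realm 30 is not used, because the
route already supplied a destination realm.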
   2721 
   2722 The main application of realms is the TC \verb|route| classifier~\cite{TC-CREF},
   2723 where they are used to help assign packets to traffic classes,
   2724 to account, police and schedule them according to this
   2725 classification.
   2726 
   2727 A much simpler but still very useful application is incoming packet
   2728 accounting by realms. The kernel gathers a packet statistics summary
   2729 which can be viewed with the \verb|rtacct| utility.
   2730 \begin{verbatim}
   2731 kuznet@amber:~ $ rtacct russia
   2732 Realm      BytesTo    PktsTo     BytesFrom  PktsFrom   
   2733 russia     20576778   169176     47080168   153805     
   2734 kuznet@amber:~ $
   2735 \end{verbatim}
   2736 This shows that this router received 153805 packets from
   2737 the realm \verb|russia| and forwarded 169176 packets to \verb|russia|.
   2738 The realm \verb|russia| consists of routes with ASPATHs not leaving
   2739 Russia.
   2740 
Note that locally originating packets are not accounted for here;
\verb|rtacct| shows incoming packets only. Using the \verb|route|
   2743 classifier (see~\cite{TC-CREF}) you can get even more detailed
   2744 accounting information about outgoing packets, optionally
   2745 summarizing traffic not only by source or destination, but
   2746 by any pair of source and destination realms.
   2747 
   2748 
   2749 \begin{thebibliography}{99}
   2750 \addcontentsline{toc}{section}{References}
   2751 \bibitem{RFC-NDISC} T.~Narten, E.~Nordmark, W.~Simpson.
   2752 ``Neighbor Discovery for IP Version 6 (IPv6)'', RFC-2461.
   2753 
   2754 \bibitem{RFC-ADDRCONF} S.~Thomson, T.~Narten.
   2755 ``IPv6 Stateless Address Autoconfiguration'', RFC-2462.
   2756 
   2757 \bibitem{RFC1812} F.~Baker.
   2758 ``Requirements for IP Version 4 Routers'', RFC-1812.
   2759 
   2760 \bibitem{RFC1122} R.~T.~Braden.
   2761 ``Requirements for Internet hosts --- communication layers'', RFC-1122.
   2762 
   2763 \bibitem{IOS} ``Cisco IOS Release 12.0 Network Protocols
   2764 Command Reference, Part 1'' and
   2765 ``Cisco IOS Release 12.0 Quality of Service Solutions
   2766 Configuration Guide: Configuring Policy-Based Routing'',\\
   2767 http://www.cisco.com/univercd/cc/td/doc/product/software/ios120.
   2768 
   2769 \bibitem{IP-TUNNELS} A.~N.~Kuznetsov.
   2770 ``Tunnels over IP in Linux-2.2'', \\
   2771 In: {\tt ftp://ftp.inr.ac.ru/ip-routing/iproute2-current.tar.gz}.
   2772 
   2773 \bibitem{TC-CREF} A.~N.~Kuznetsov. ``TC Command Reference'',\\
   2774 In: {\tt ftp://ftp.inr.ac.ru/ip-routing/iproute2-current.tar.gz}.
   2775 
   2776 \bibitem{IOS-BGP-PP} ``Cisco IOS Release 12.0 Quality of Service Solutions
   2777 Configuration Guide: Configuring QoS Policy Propagation via
   2778 Border Gateway Protocol'',\\
   2779 http://www.cisco.com/univercd/cc/td/doc/product/software/ios120.
   2780 
   2781 \bibitem{RFC-DHCP} R.~Droms.
``Dynamic Host Configuration Protocol'', RFC-2131.
   2783 
   2784 \bibitem{RFC2414}  M.~Allman, S.~Floyd, C.~Partridge.
   2785 ``Increasing TCP's Initial Window'', RFC-2414.
   2786 
   2787 \end{thebibliography}
   2788 
   2789 
   2790 
   2791 
   2792 \appendix
   2793 \addcontentsline{toc}{section}{Appendix}
   2794 
   2795 \section{Source address selection}
   2796 \label{ADDR-SEL}
   2797 
   2798 When a host creates an IP packet, it must select some source
   2799 address. Correct source address selection is a critical procedure,
   2800 because it gives the receiver the information needed to deliver a
reply. If the source is selected incorrectly, in the best case
the return path may differ from the forward one, which
is harmful for performance. In the worst case, when the addresses
   2804 are administratively scoped, the reply may be lost entirely.
   2805 
   2806 Linux-2.2 selects source addresses using the following algorithm:
   2807 
   2808 \begin{itemize}
   2809 \item
The application may select a source address explicitly with the \verb|bind(2)|
syscall or by supplying it to \verb|sendmsg(2)| via the ancillary data object
   2812 \verb|IP_PKTINFO|. In this case the kernel only checks the validity
   2813 of the address and never tries to ``improve'' an incorrect user choice,
   2814 generating an error instead.
   2815 \begin{NB}
 Never say ``Never''. The sysctl option \verb|ip_dynaddr| breaks
 this axiom. It was introduced deliberately, with the purpose
 of automatically reselecting the address on hosts with dynamic dial-out interfaces.
 However, this hack {\em must not\/} be used on multihomed hosts
 and especially on routers: it would break them.
   2821 \end{NB}
   2822 
   2823 
   2824 \item Otherwise, IP routing tables can contain an explicit source
   2825 address hint for this destination. The hint is set with the \verb|src| parameter
   2826 to the \verb|ip route| command, sec.\ref{IP-ROUTE}, p.\pageref{IP-ROUTE}.
   2827 
   2828 
   2829 \item Otherwise, the kernel searches through the list of addresses
   2830 attached to the interface through which the packets will be routed.
   2831 The search strategies are different for IP and IPv6. Namely:
   2832 
   2833 \begin{itemize}
   2834 \item IPv6 searches for the first valid, not deprecated address
   2835 with the same scope as the destination.
   2836 
\item IP searches for the first valid address with a scope wider
than the scope of the destination, but it prefers addresses
which fall into the same subnet as the nexthop of the route
to the destination. Unlike IPv6, the scopes of IPv4 destinations
are not encoded in their addresses but are supplied
in routing tables instead (the \verb|scope| parameter to the \verb|ip route| command,
sec.~\ref{IP-ROUTE}, p.\pageref{IP-ROUTE}).
   2844 
   2845 \end{itemize}
   2846 
   2847 
   2848 \item Otherwise, if the scope of the destination is \verb|link| or \verb|host|,
   2849 the algorithm fails and returns a zero source address.
   2850 
   2851 \item Otherwise, all interfaces are scanned to search for an address
   2852 with an appropriate scope. The loopback device \verb|lo| is always the first
   2853 in the search list, so that if an address with global scope (not 127.0.0.1!)
   2854 is configured on loopback, it is always preferred.
   2855 
   2856 \end{itemize}
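
Two of these steps can be influenced from the command line, as in the
sketch below. The addresses are placeholders from the documentation
ranges and are assumed to be configured appropriately on this host.
\begin{verbatim}
# Give an explicit source address hint for one destination prefix.
ip route add 192.0.2.0/24 via 198.51.100.1 src 198.51.100.2

# Ask the kernel which route and source address it would use.
ip route get 192.0.2.9

# A global address on loopback is preferred by the final step.
ip addr add 203.0.113.1/32 dev lo
\end{verbatim}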
   2857 
   2858 
   2859 \section{Proxy ARP/NDISC}
   2860 \label{PROXY-NEIGH}
   2861 
   2862 Routers may answer ARP/NDISC solicitations on behalf of other hosts.
   2863 In Linux-2.2 proxy ARP on an interface may be enabled
   2864 by setting the kernel \verb|sysctl| variable 
   2865 \verb|/proc/sys/net/ipv4/conf/<dev>/proxy_arp| to 1. After this, the router
   2866 starts to answer ARP requests on the interface \verb|<dev>|, provided
   2867 the route to the requested destination does {\em not\/} go back via the same
   2868 device.
   2869 
   2870 The variable \verb|/proc/sys/net/ipv4/conf/all/proxy_arp| enables proxy
   2871 ARP on all the IP devices.
   2872 
   2873 However, this approach fails in the case of IPv6 because the router
   2874 must join the solicited node multicast address to listen for the corresponding
   2875 NDISC queries. It means that proxy NDISC is possible only on a per destination
   2876 basis.
   2877 
   2878 Logically, proxy ARP/NDISC is not a kernel task. It can easily be implemented
   2879 in user space. However, similar functionality was present in BSD kernels
   2880 and in Linux-2.0, so we have to preserve it at least to the extent that
   2881 is standardized in BSD.
   2882 \begin{NB}
   2883   Linux-2.0 ARP had a feature called {\em subnet\/} proxy ARP.
   2884   It is replaced with the sysctl flag in Linux-2.2.
   2885 \end{NB}
   2886 
   2887 
   2888 The \verb|ip| utility provides a way to manage proxy ARP/NDISC
   2889 with the \verb|ip neigh| command, namely:
   2890 \begin{verbatim}
   2891   ip neigh add proxy ADDRESS [ dev NAME ]
   2892 \end{verbatim}
   2893 adds a new proxy ARP/NDISC record and
   2894 \begin{verbatim}
   2895   ip neigh del proxy ADDRESS [ dev NAME ]
   2896 \end{verbatim}
   2897 deletes it.
   2898 
   2899 If the name of the device is not given, the router will answer solicitations
   2900 for address \verb|ADDRESS| on all devices, otherwise it will only serve
   2901 the device \verb|NAME|. Even if the proxy entry is created with
   2902 \verb|ip neigh|, the router {\em will not\/} answer a query if the route
   2903 to the destination goes back via the interface from which the solicitation
   2904 was received.
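
For illustration, a proxy entry might be created and removed as follows.
The addresses and the device name \verb|eth0| are placeholders invented
for this sketch; the commands need root privileges.
\begin{verbatim}
# Answer ARP for 193.233.7.90 on eth0 only (example address).
ip neigh add proxy 193.233.7.90 dev eth0
# The same for an IPv6 address (documentation prefix, example only).
ip neigh add proxy 2001:db8::90 dev eth0
# Remove the IPv4 entry again.
ip neigh del proxy 193.233.7.90 dev eth0
\end{verbatim}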
   2905 
   2906 It is important to emphasize that proxy entries have {\em no\/}
   2907 parameters other than these (IP/IPv6 address and optional device).
   2908 Particularly, the entry does not store any link layer address.
The router always advertises the station address of the interface
on which it sends advertisements (i.e.\ its own station address).
   2911 
   2912 \section{Route NAT status}
   2913 \label{ROUTE-NAT}
   2914 
   2915 NAT (or ``Network Address Translation'') remaps some parts
   2916 of the IP address space into other ones. Linux-2.2 route NAT is supposed
   2917 to be used to facilitate policy routing by rewriting addresses
   2918 to other routing domains or to help while renumbering sites
   2919 to another prefix.
   2920 
   2921 \paragraph{What it is not:}
   2922 It is necessary to emphasize that {\em it is not supposed\/}
   2923 to be used to compress address space or to split load.
   2924 This is not missing functionality but a design principle.
   2925 Route NAT is {\em stateless\/}. It does not hold any state
   2926 about translated sessions. This means that it handles any number
   2927 of sessions flawlessly. But it also means that it is {\em static\/}.
   2928 It cannot detect the moment when the last TCP client stops
   2929 using an address. For the same reason, it will not help to split
   2930 load between several servers.
   2931 \begin{NB}
   2932 It is a pretty commonly held belief that it is useful to split load between
   2933 several servers with NAT. This is a mistake. All you get from this
   2934 is the requirement that the router keep the state of all the TCP connections
   2935 going via it. Well, if the router is so powerful, run apache on it. 8)
   2936 \end{NB}
   2937 
   2938 The second feature: it does not touch packet payload,
   2939 does not try to ``improve'' broken protocols by looking
   2940 through its data and mangling it. It mangles IP addresses,
   2941 only IP addresses and nothing but IP addresses.
This, too, is not missing functionality.
   2943 
To summarize: if you need to compress address space or keep
   2945 active FTP clients happy, your choice is not route NAT but masquerading,
   2946 port forwarding, NAPT etc. 
   2947 \begin{NB}
   2948 By the way, you may also want to look at
\verb|http://www.suse.com/~mha/HyperNews/get/linux-ip-nat.html|
   2950 \end{NB}
   2951 
   2952 
   2953 \paragraph{How it works.}
   2954 Some part of the address space is reserved for dummy addresses
   2955 which will look for all the world like some host addresses
inside your network. No other hosts may use these addresses;
however, other routers may also be configured to translate them.
   2958 \begin{NB}
   2959 A great advantage of route NAT is that it may be used not
   2960 only in stub networks but in environments with arbitrarily complicated
   2961 structure. It does not firewall, it {\em forwards.}
   2962 \end{NB}
   2963 These addresses are selected by the \verb|ip route| command
   2964 (sec.\ref{IP-ROUTE-ADD}, p.\pageref{IP-ROUTE-ADD}). F.e.\
   2965 \begin{verbatim}
   2966   ip route add nat 192.203.80.144 via 193.233.7.83
   2967 \end{verbatim}
   2968 states that the single address 192.203.80.144 is a dummy NAT address.
   2969 For all the world it looks like a host address inside our network.
   2970 For neighbouring hosts and routers it looks like the local address
of the translating router. The router answers ARP for it, advertises
this address as routed via it, etc. When the router
   2973 receives a packet destined for 192.203.80.144, it replaces 
   2974 this address with 193.233.7.83 which is the address of some real
   2975 host and forwards the packet. If you need to remap
   2976 blocks of addresses, you may use a command like:
   2977 \begin{verbatim}
   2978   ip route add nat 192.203.80.192/26 via 193.233.7.64
   2979 \end{verbatim}
This command will map the block of 64 addresses 192.203.80.192-255 to
193.233.7.64-127.
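
The arithmetic behind such a block mapping can be sketched in shell.
The sketch assumes that the offset of an address inside the block is
preserved by the translation, as the ranges above suggest; the helper
names \verb|ip2int|, \verb|int2ip| and \verb|nat_map| are hypothetical
and are not part of \verb|iproute2|.
\begin{verbatim}
# Sketch of the block mapping set up by
#   ip route add nat 192.203.80.192/26 via 193.233.7.64
# Assumption: the offset inside the block is preserved.
ip2int () (          # dotted quad -> integer (runs in a subshell)
  IFS=.; set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)
int2ip () {          # integer -> dotted quad
  printf '%d.%d.%d.%d\n' $(( ($1 >> 24) & 255 )) $(( ($1 >> 16) & 255 )) \
                         $(( ($1 >> 8) & 255 ))  $(( $1 & 255 ))
}
# nat_map ADDRESS NAT-PREFIX LENGTH REAL-PREFIX
nat_map () {
  hostmask=$(( (1 << (32 - $3)) - 1 ))
  addr=$(ip2int "$1"); natpfx=$(ip2int "$2"); real=$(ip2int "$4")
  # addresses outside the NAT block are not translated
  [ $(( addr & ~hostmask )) -eq $(( natpfx & ~hostmask )) ] || return 1
  int2ip $(( (real & ~hostmask) | (addr & hostmask) ))
}

nat_map 192.203.80.192 192.203.80.192 26 193.233.7.64
nat_map 192.203.80.255 192.203.80.192 26 193.233.7.64
\end{verbatim}
The two calls print the first and the last address of the target block,
193.233.7.64 and 193.233.7.127.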
   2982 
   2983 When an internal host (193.233.7.83 in the example above)
   2984 sends something to the outer world and these packets are forwarded
   2985 by our router, it should translate the source address 193.233.7.83
   2986 into 192.203.80.144. This task is solved by setting a special
   2987 policy rule (sec.\ref{IP-RULE-ADD}, p.\pageref{IP-RULE-ADD}):
   2988 \begin{verbatim}
   2989   ip rule add prio 320 from 193.233.7.83 nat 192.203.80.144
   2990 \end{verbatim}
   2991 This rule says that the source address 193.233.7.83
   2992 should be translated into 192.203.80.144 before forwarding.
   2993 It is important that the address after the \verb|nat| keyword
   2994 is some NAT address, declared by {\tt ip route add nat}.
   2995 If it is just a random address the router will not map to it.
   2996 \begin{NB}
The exception is when the address is a local address of this
router (or 0.0.0.0) and masquerading is configured in the Linux-2.2
kernel. In this case the router will masquerade the packets as this address.
If 0.0.0.0 is selected, the result is equivalent to the one
obtained with firewalling rules. Otherwise, this gives you a way
to order Linux to masquerade to this fixed address.
The NAT mechanism used in Linux-2.4 is more flexible than
masquerading, so this feature has lost its meaning and has been disabled.
   3005 \end{NB}
   3006 
   3007 If the network has non-trivial internal structure, it is
   3008 useful and even necessary to add rules disabling translation
   3009 when a packet does not leave this network. Let us return to the
   3010 example from sec.\ref{IP-RULE-SHOW} (p.\pageref{IP-RULE-SHOW}).
   3011 \begin{verbatim}
   3012 300:	from 193.233.7.83 to 193.233.7.0/24 lookup main
   3013 310:	from 193.233.7.83 to 192.203.80.0/24 lookup main
   3014 320:	from 193.233.7.83 lookup inr.ruhep map-to 192.203.80.144
   3015 \end{verbatim}
   3016 This block of rules causes normal forwarding when
   3017 packets from 193.233.7.83 do not leave networks 193.233.7/24
   3018 and 192.203.80/24. Also, if the \verb|inr.ruhep| table does not
   3019 contain a route to the destination (which means that the routing
   3020 domain owning addresses from 192.203.80/24 is dead), no translation
   3021 will occur. Otherwise, the packets are translated.
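
A rule set like the one listed above could be created with commands
along the following lines. This is only a sketch: it assumes that the
table name \verb|inr.ruhep| is defined in \verb|/etc/iproute2/rt_tables|
and that the NAT address has already been declared with
\verb|ip route add nat|.
\begin{verbatim}
# Traffic staying inside the two local networks is not translated.
ip rule add prio 300 from 193.233.7.83 to 193.233.7.0/24 table main
ip rule add prio 310 from 193.233.7.83 to 192.203.80.0/24 table main
# Everything else from this host is routed via inr.ruhep and translated.
ip rule add prio 320 from 193.233.7.83 table inr.ruhep nat 192.203.80.144
# Check the result.
ip rule show
\end{verbatim}
The priorities matter: the two exemptions must be consulted before
the translating rule.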
   3022 
   3023 \paragraph{How to only translate selected ports:}
   3024 If you only want to translate selected ports (f.e.\ http)
   3025 and leave the rest intact, you may use \verb|ipchains|
   3026 to \verb|fwmark| a class of packets.
Suppose you did so, and all the packets from 193.233.7.83
destined for port 80 are marked with the marker 0x1234 in the input firewall chain.
   3029 In this case you may replace rule \#320 with:
   3030 \begin{verbatim}
   3031 320:	from 193.233.7.83 fwmark 1234 lookup main map-to 192.203.80.144
   3032 \end{verbatim}
   3033 and translation will only be enabled for outgoing http requests.
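
The marking and the replacement rule might be set up roughly as follows.
Treat this as a sketch under assumptions: it presumes a Linux-2.2 kernel
with \verb|ipchains|, and it writes the mark as \verb|0x1234| on the
assumption that your version of \verb|ip| parses a 0x-prefixed value as
hexadecimal.
\begin{verbatim}
# Mark HTTP requests coming from the internal host.
ipchains -A input -p tcp -s 193.233.7.83 -d 0.0.0.0/0 80 -m 0x1234
# Replace rule 320 with one that also matches the mark.
ip rule del prio 320
ip rule add prio 320 from 193.233.7.83 fwmark 0x1234 table main \
        nat 192.203.80.144
\end{verbatim}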
   3034 
   3035 \section{Example: minimal host setup}
   3036 \label{EXAMPLE-SETUP}
   3037 
The following script gives an example of a fail-safe
   3039 setup of IP (and IPv6, if it is compiled into the kernel)
   3040 in the common case of a node attached to a single broadcast
   3041 network. A more advanced script, which may be used both on multihomed
   3042 hosts and on routers, is described in the following
   3043 section.
   3044 
   3045 The utilities used in the script may be found in the
   3046 directory ftp://ftp.inr.ac.ru/ip-routing/:
   3047 \begin{enumerate}
   3048 \item \verb|ip| --- package \verb|iproute2|.
   3049 \item \verb|arping| --- package \verb|iputils|.
   3050 \item \verb|rdisc| --- package \verb|iputils|.
   3051 \end{enumerate}
   3052 \begin{NB}
   3053 It also refers to a DHCP client, \verb|dhcpcd|. I should refrain from
   3054 recommending a good DHCP client to use. All that I can
   3055 say is that ISC \verb|dhcp-2.0b1pl6| patched with the patch that
   3056 can be found in the \verb|dhcp.bootp.rarp| subdirectory of
   3057 the same ftp site {\em does\/} work,
   3058 at least on Ethernet and Token Ring.
   3059 \end{NB}
   3060 
   3061 \begin{verbatim}
   3062 #! /bin/bash
   3063 \end{verbatim}
   3064 \begin{flushleft}
   3065 \# {\bf Usage: \verb|ifone ADDRESS[/PREFIX-LENGTH] [DEVICE]|}\\
   3066 \# {\bf Parameters:}\\
   3067 \# \$1 --- Static IP address, optionally followed by prefix length.\\
\# \$2 --- Device name. If it is missing, \verb|eth0| is assumed.\\
   3069 \# F.e. \verb|ifone 193.233.7.90|
   3070 \end{flushleft}
   3071 \begin{verbatim}
   3072 dev=$2
   3073 : ${dev:=eth0}
   3074 ipaddr=
   3075 \end{verbatim}
   3076 \# Parse IP address, splitting prefix length.
   3077 \begin{verbatim}
   3078 if [ "$1" != "" ]; then
   3079   ipaddr=${1%/*}
   3080   if [ "$1" != "$ipaddr" ]; then
   3081     pfxlen=${1#*/}
   3082   fi
   3083   : ${pfxlen:=24}
   3084 fi
   3085 pfx="${ipaddr}/${pfxlen}"
   3086 \end{verbatim}
   3087 
   3088 \begin{flushleft}
   3089 \# {\bf Step 0} --- enable loopback.\\
   3090 \#\\
\# This step is necessary on any networked box before attempting\\
\# to configure any other device.\\
   3093 \end{flushleft}
   3094 \begin{verbatim}
   3095 ip link set up dev lo
   3096 ip addr add 127.0.0.1/8 dev lo brd + scope host
   3097 \end{verbatim}
   3098 \begin{flushleft}
\# IPv6 autoconfigures itself on loopback.\\
\#\\
\# If the user gave loopback as the device, we add the address as an alias and exit.
   3102 \end{flushleft}
   3103 \begin{verbatim}
   3104 if [ "$dev" = "lo" ]; then
   3105   if [ "$ipaddr" != "" -a  "$ipaddr" != "127.0.0.1" ]; then
   3106     ip address add $ipaddr dev $dev
   3107     exit $?
   3108   fi
   3109   exit 0
   3110 fi
   3111 \end{verbatim}
   3112 
   3113 \noindent\# {\bf Step 1} --- enable device \verb|$dev|
   3114 
   3115 \begin{verbatim}
   3116 if ! ip link set up dev $dev ; then
   3117   echo "Cannot enable interface $dev. Aborting." 1>&2
   3118   exit 1
   3119 fi
   3120 \end{verbatim}
   3121 \begin{flushleft}
   3122 \# The interface is \verb|UP|. IPv6 started stateless autoconfiguration itself,\\
   3123 \# and its configuration finishes here. However,\\
   3124 \# IP still needs some static preconfigured address.
   3125 \end{flushleft}
   3126 \begin{verbatim}
   3127 if [ "$ipaddr" = "" ]; then
   3128   echo "No address for $dev is configured, trying DHCP..." 1>&2
   3129   dhcpcd
   3130   exit $?
   3131 fi
   3132 \end{verbatim}
   3133 
   3134 \begin{flushleft}
   3135 \# {\bf Step 2} --- IP Duplicate Address Detection~\cite{RFC-DHCP}.\\
\# Send two probes and wait 3 seconds for the result.\\
\# If the interface comes up more slowly, f.e.\ due to long media detection,\\
\# you may want to increase the timeout.\\
   3139 \end{flushleft}
   3140 \begin{verbatim}
   3141 if ! arping -q -c 2 -w 3 -D -I $dev $ipaddr ; then
   3142   echo "Address $ipaddr is busy, trying DHCP..." 1>&2
   3143   dhcpcd
   3144   exit $?
   3145 fi
   3146 \end{verbatim}
   3147 \begin{flushleft}
   3148 \# OK, the address is unique, we may add it on the interface.\\
   3149 \#\\
   3150 \# {\bf Step 3} --- Configure the address on the interface.
   3151 \end{flushleft}
   3152 
   3153 \begin{verbatim}
   3154 if ! ip address add $pfx brd + dev $dev; then
   3155   echo "Failed to add $pfx on $dev, trying DHCP..." 1>&2
   3156   dhcpcd
   3157   exit $?
   3158 fi
   3159 \end{verbatim}
   3160 
   3161 \noindent\# {\bf Step 4} --- Announce our presence on the link.
   3162 \begin{verbatim}
   3163 arping -A -c 1 -I $dev $ipaddr
   3164 noarp=$?
   3165 ( sleep 2;
   3166   arping -U -c 1 -I $dev $ipaddr ) >& /dev/null </dev/null &
   3167 \end{verbatim}
   3168 
   3169 \begin{flushleft}
   3170 \# {\bf Step 5} (optional) --- Add some control routes.\\
   3171 \#\\
   3172 \# 1. Prohibit link local multicast addresses.\\
   3173 \# 2. Prohibit link local (alias, limited) broadcast.\\
   3174 \# 3. Add default multicast route.
   3175 \end{flushleft}
   3176 \begin{verbatim}
   3177 ip route add unreachable 224.0.0.0/24 
   3178 ip route add unreachable 255.255.255.255
   3179 if [ `ip link ls $dev | grep -c MULTICAST` -ge 1 ]; then
   3180   ip route add 224.0.0.0/4 dev $dev scope global
   3181 fi
   3182 \end{verbatim}
   3183 
   3184 \begin{flushleft}
\# {\bf Step 6} --- Add a fallback default route with a huge metric.\\
\# If a proxy ARP server is present on the interface, we will be\\
\# able to talk to the whole Internet without further configuration.\\
\# It is not so cheap though, and we still hope that this route\\
\# will be overridden by a more correct one supplied by rdisc.\\
\# Skip this step if the device is not ARPable,\\
\# because dead nexthop detection does not work on such devices.
   3192 \end{flushleft}
   3193 \begin{verbatim}
   3194 if [ "$noarp" = "0" ]; then
   3195   ip ro add default dev $dev metric 30000 scope global
   3196 fi
   3197 \end{verbatim}
   3198 
   3199 \begin{flushleft}
   3200 \# {\bf Step 7} --- Restart router discovery and exit.
   3201 \end{flushleft}
   3202 \begin{verbatim}
   3203 killall -HUP rdisc || rdisc -fs
   3204 exit 0
   3205 \end{verbatim}
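
The address parsing at the top of the script relies on two shell
parameter expansions. The sketch below isolates them in a function so
that their effect can be tried without touching any interface;
\verb|split_prefix| is a hypothetical helper and not part of the script.
\begin{verbatim}
# Sketch: the parameter expansions used by the script, in isolation.
split_prefix () {
  ipaddr=${1%/*}               # strip the shortest "/..." suffix
  pfxlen=
  if [ "$1" != "$ipaddr" ]; then
    pfxlen=${1#*/}             # strip the shortest ".../" prefix
  fi
  : ${pfxlen:=24}              # same default as in the script
  echo "$ipaddr/$pfxlen"
}

split_prefix 193.233.7.90
split_prefix 193.233.7.90/27
\end{verbatim}
The first call prints \verb|193.233.7.90/24| because no length was
given; the second keeps the explicit length and prints
\verb|193.233.7.90/27|.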
   3206 
   3207 
   3208 \section{Example: {\protect\tt ifcfg} --- interface address management}
   3209 \label{EXAMPLE-IFCFG}
   3210 
   3211 This is a simplistic script replacing one option of \verb|ifconfig|,
   3212 namely, IP address management. It not only adds
   3213 addresses, but also carries out Duplicate Address Detection~\cite{RFC-DHCP},
   3214 sends unsolicited ARP to update the caches of other hosts sharing
   3215 the interface, adds some control routes and restarts Router Discovery
   3216 when it is necessary.
   3217 
   3218 I strongly recommend using it {\em instead\/} of \verb|ifconfig| both
   3219 on hosts and on routers.
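
Typical invocations, following the usage line of the script below,
might look like this. The device names and addresses are placeholders
chosen for the example.
\begin{verbatim}
# Add an address; "add" is the default command.
ifcfg eth0 193.233.7.90/24
# Add and later delete a labelled alias address.
ifcfg eth0:1 add 193.233.7.91/24
ifcfg eth0:1 del 193.233.7.91/24
# Address with a peer (prefix length must be omitted or 32).
ifcfg ppp0 add 10.0.0.1 10.0.0.2
# Remove all IPv4 addresses from the device.
ifcfg eth0 stop
\end{verbatim}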
   3220 
   3221 \begin{verbatim}
   3222 #! /bin/bash
   3223 \end{verbatim}
   3224 \begin{flushleft}
   3225 \# {\bf Usage: \verb?ifcfg DEVICE[:ALIAS] [add|del] ADDRESS[/LENGTH] [PEER]?}\\
   3226 \# {\bf Parameters:}\\
   3227 \# ---Device name. It may have alias suffix, separated by colon.\\
   3228 \# ---Command: add, delete or stop.\\
   3229 \# ---IP address, optionally followed by prefix length.\\
   3230 \# ---Optional peer address for pointopoint interfaces.\\
   3231 \# F.e. \verb|ifcfg eth0 193.233.7.90/24|
   3232 
\noindent\# This function determines whether the machine is a router or a host.\\
\# It returns 0 if the machine is apparently not a router.
   3235 \end{flushleft}
   3236 \begin{verbatim}
   3237 CheckForwarding () {
   3238   local sbase fwd
   3239   sbase=/proc/sys/net/ipv4/conf
   3240   fwd=0
   3241   if [ -d $sbase ]; then
   3242     for dir in $sbase/*/forwarding; do
      fwd=$(( fwd + $(cat "$dir") ))
   3244     done
   3245   else
   3246     fwd=2
   3247   fi
   3248   return $fwd
   3249 }
   3250 \end{verbatim}
   3251 \begin{flushleft}
   3252 \# This function restarts Router Discovery.\\
   3253 \end{flushleft}
   3254 \begin{verbatim}
   3255 RestartRDISC () {
   3256   killall -HUP rdisc || rdisc -fs
   3257 }
   3258 \end{verbatim}
   3259 \begin{flushleft}
   3260 \# Calculate ABC "natural" mask length\\
   3261 \# Arg: \$1 = dotquad address
   3262 \end{flushleft}
   3263 \begin{verbatim}
   3264 ABCMaskLen () {
   3265   local class;
   3266   class=${1%%.*}
   3267   if [ $class -eq 0 -o $class -ge 224 ]; then return 0
   3268   elif [ $class -ge 192 ]; then return 24
   3269   elif [ $class -ge 128 ]; then return 16
   3270   else  return 8 ; fi
   3271 }
   3272 \end{verbatim}
   3273 
   3274 
   3275 \begin{flushleft}
   3276 \# {\bf MAIN()}\\
   3277 \#\\
   3278 \# Strip alias suffix separated by colon.
   3279 \end{flushleft}
   3280 \begin{verbatim}
   3281 label="label $1"
   3282 ldev=$1
   3283 dev=${1%:*}
   3284 if [ "$dev" = "" -o "$1" = "help" ]; then
  echo "Usage: ifcfg DEV [[add|del] [ADDR[/LEN]] [PEER] | stop]" 1>&2
   3286   echo "       add - add new address" 1>&2
   3287   echo "       del - delete address" 1>&2
   3288   echo "       stop - completely disable IP" 1>&2
   3289   exit 1
   3290 fi
   3291 shift
   3292 
   3293 CheckForwarding
   3294 fwd=$?
   3295 \end{verbatim}
   3296 \begin{flushleft}
   3297 \# Parse command. If it is ``stop'', flush and exit.
   3298 \end{flushleft}
   3299 \begin{verbatim}
   3300 deleting=0
   3301 case "$1" in
   3302 add) shift ;;
   3303 stop)
   3304   if [ "$ldev" != "$dev" ]; then
   3305     echo "Cannot stop alias $ldev" 1>&2
   3306     exit 1;
   3307   fi
   3308   ip -4 addr flush dev $dev $label || exit 1
   3309   if [ $fwd -eq 0 ]; then RestartRDISC; fi
   3310   exit 0 ;;
   3311 del*)
   3312   deleting=1; shift ;;
   3313 *)
   3314 esac
   3315 \end{verbatim}
   3316 \begin{flushleft}
   3317 \# Parse prefix, split prefix length, separated by slash.
   3318 \end{flushleft}
   3319 \begin{verbatim}
   3320 ipaddr=
   3321 pfxlen=
   3322 if [ "$1" != "" ]; then
   3323   ipaddr=${1%/*}
   3324   if [ "$1" != "$ipaddr" ]; then
   3325     pfxlen=${1#*/}
   3326   fi
   3327   if [ "$ipaddr" = "" ]; then
   3328     echo "$1 is bad IP address." 1>&2
   3329     exit 1
   3330   fi
   3331 fi
   3332 shift
   3333 \end{verbatim}
   3334 \begin{flushleft}
   3335 \# If peer address is present, prefix length is 32.\\
   3336 \# Otherwise, if prefix length was not given, guess it.
   3337 \end{flushleft}
   3338 \begin{verbatim}
   3339 peer=$1
   3340 if [ "$peer" != "" ]; then
   3341   if [ "$pfxlen" != "" -a "$pfxlen" != "32" ]; then
   3342     echo "Peer address with non-trivial netmask." 1>&2
   3343     exit 1
   3344   fi
   3345   pfx="$ipaddr peer $peer"
   3346 else
   3347   if [ "$pfxlen" = "" ]; then
   3348     ABCMaskLen $ipaddr
   3349     pfxlen=$?
   3350   fi
   3351   pfx="$ipaddr/$pfxlen"
   3352 fi
   3353 if [ "$ldev" = "$dev" -a "$ipaddr" != "" ]; then
   3354   label=
   3355 fi
   3356 \end{verbatim}
   3357 \begin{flushleft}
   3358 \# If deletion was requested, delete the address and restart RDISC
   3359 \end{flushleft}
   3360 \begin{verbatim}
   3361 if [ $deleting -ne 0 ]; then
   3362   ip addr del $pfx dev $dev $label || exit 1
   3363   if [ $fwd -eq 0 ]; then RestartRDISC; fi
   3364   exit 0
   3365 fi
   3366 \end{verbatim}
   3367 \begin{flushleft}
   3368 \# Start interface initialization.\\
   3369 \#\\
   3370 \# {\bf Step 0} --- enable device \verb|$dev|
   3371 \end{flushleft}
   3372 \begin{verbatim}
   3373 if ! ip link set up dev $dev ; then
   3374   echo "Error: cannot enable interface $dev." 1>&2
   3375   exit 1
   3376 fi
   3377 if [ "$ipaddr" = "" ]; then exit 0; fi
   3378 \end{verbatim}
   3379 \begin{flushleft}
   3380 \# {\bf Step 1} --- IP Duplicate Address Detection~\cite{RFC-DHCP}.\\
\# Send two probes and wait 3 seconds for the result.\\
\# If the interface comes up more slowly, f.e.\ due to long media detection,\\
\# you may want to increase the timeout.\\
   3384 \end{flushleft}
   3385 \begin{verbatim}
   3386 if ! arping -q -c 2 -w 3 -D -I $dev $ipaddr ; then
   3387   echo "Error: some host already uses address $ipaddr on $dev." 1>&2
   3388   exit 1
   3389 fi
   3390 \end{verbatim}
   3391 \begin{flushleft}
   3392 \# OK, the address is unique. We may add it to the interface.\\
   3393 \#\\
   3394 \# {\bf Step 2} --- Configure the address on the interface.
   3395 \end{flushleft}
   3396 \begin{verbatim}
   3397 if ! ip address add $pfx brd + dev $dev $label; then
   3398   echo "Error: failed to add $pfx on $dev." 1>&2
   3399   exit 1
   3400 fi
   3401 \end{verbatim}
   3402 \noindent\# {\bf Step 3} --- Announce our presence on the link
   3403 \begin{verbatim}
   3404 arping -q -A -c 1 -I $dev $ipaddr
   3405 noarp=$?
   3406 ( sleep 2 ;
   3407   arping -q -U -c 1 -I $dev $ipaddr ) >& /dev/null </dev/null &
   3408 \end{verbatim}
   3409 \begin{flushleft}
   3410 \# {\bf Step 4} (optional) --- Add some control routes.\\
   3411 \#\\
   3412 \# 1. Prohibit link local multicast addresses.\\
   3413 \# 2. Prohibit link local (alias, limited) broadcast.\\
   3414 \# 3. Add default multicast route.
   3415 \end{flushleft}
   3416 \begin{verbatim}
   3417 ip route add unreachable 224.0.0.0/24 >& /dev/null 
   3418 ip route add unreachable 255.255.255.255 >& /dev/null
   3419 if [ `ip link ls $dev | grep -c MULTICAST` -ge 1 ]; then
   3420   ip route add 224.0.0.0/4 dev $dev scope global >& /dev/null
   3421 fi
   3422 \end{verbatim}
   3423 \begin{flushleft}
\# {\bf Step 5} --- Add a fallback default route with a huge metric.\\
\# If a proxy ARP server is present on the interface, we will be\\
\# able to talk to the whole Internet without further configuration.\\
\# Skip this step on a router or if the device is not ARPable,\\
\# because dead nexthop detection does not work on them.
   3429 \end{flushleft}
   3430 \begin{verbatim}
   3431 if [ $fwd -eq 0 ]; then
   3432   if [ $noarp -eq 0 ]; then
   3433     ip ro append default dev $dev metric 30000 scope global
   3434   elif [ "$peer" != "" ]; then
   3435     if ping -q -c 2 -w 4 $peer ; then
   3436       ip ro append default via $peer dev $dev metric 30001
   3437     fi
   3438   fi
   3439   RestartRDISC
   3440 fi
   3441 
   3442 exit 0
   3443 \end{verbatim}
   3444 \begin{flushleft}
   3445 \# End of {\bf MAIN()}
   3446 \end{flushleft}
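
One detail of the script deserves a closer look: \verb|ABCMaskLen|
hands back the mask length through its exit status, which the caller
reads from \verb|$?|. The sketch below repeats the function and adds a
hypothetical wrapper, \verb|masklen|, that prints the value instead;
the wrapper is an illustration and not part of the script.
\begin{verbatim}
ABCMaskLen () {      # as in the script above
  local class;
  class=${1%%.*}
  if [ $class -eq 0 -o $class -ge 224 ]; then return 0
  elif [ $class -ge 192 ]; then return 24
  elif [ $class -ge 128 ]; then return 16
  else  return 8 ; fi
}
# masklen is a hypothetical wrapper: it prints the length instead of
# returning it, so a non-zero status never reaches the caller.
masklen () {
  ABCMaskLen "$1" && echo 0 || echo $?
}

for a in 10.1.2.3 172.16.5.4 193.233.7.90 224.0.0.1; do
  echo "$a -> /$(masklen $a)"
done
\end{verbatim}
The loop prints the lengths 8, 16, 24 and 0 for the four sample
addresses. Printing the value rather than returning it may be
preferable when the surrounding script runs with \verb|set -e|, where
a non-zero status would otherwise abort it.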
   3447 
   3448 
   3449 \end{document}
   3450