\documentstyle[12pt,twoside]{article}
\def\TITLE{IPv6 Flow Labels}
\input preamble
\begin{center}
\Large\bf IPv6 Flow Labels in Linux-2.2.
\end{center}


\begin{center}
{ \large Alexey~N.~Kuznetsov } \\
\em Institute for Nuclear Research, Moscow \\
\verb|kuznet@ms2.inr.ac.ru| \\
\rm April 11, 1999
\end{center}

\vspace{5mm}

\tableofcontents

\section{Introduction.}

Every IPv6 packet carries 28 bits of flow information. RFC2460 splits
these bits into two fields: 8 bits of traffic class (or DS field, if you
prefer this term) and 20 bits of flow label. Currently there is
no well-defined API to manage IPv6 flow information. In this document
I describe an attempt to design such an API for the Linux-2.2 IPv6 stack.
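
To make the split concrete, here is a minimal sketch (not part of any kernel
or library API) that packs and unpacks the two fields. It assumes that the
flowinfo word is kept in network byte order, as \verb|sin6_flowinfo| is, and
that the upper four bits are zero. The helper names \verb|flowinfo_tclass|,
\verb|flowinfo_label| and \verb|flowinfo_make| are hypothetical.

\begin{verbatim}
#include <stdint.h>
#include <arpa/inet.h>   /* htonl, ntohl */

/* Host-order masks: 8 bits of traffic class, 20 bits of flow label. */
#define FLOWINFO_TCLASS_MASK 0x0FF00000u
#define FLOWINFO_LABEL_MASK  0x000FFFFFu

/* Traffic class (DS field) of a network-order flowinfo word. */
static inline uint8_t flowinfo_tclass(uint32_t flowinfo_be)
{
    return (uint8_t)((ntohl(flowinfo_be) & FLOWINFO_TCLASS_MASK) >> 20);
}

/* Flow label of a network-order flowinfo word, in host byte order. */
static inline uint32_t flowinfo_label(uint32_t flowinfo_be)
{
    return ntohl(flowinfo_be) & FLOWINFO_LABEL_MASK;
}

/* Combine traffic class and host-order label into a network-order word. */
static inline uint32_t flowinfo_make(uint8_t tclass, uint32_t label)
{
    return htonl(((uint32_t)tclass << 20) | (label & FLOWINFO_LABEL_MASK));
}
\end{verbatim}

For instance, traffic class \verb|0xB8| combined with label \verb|0xA1234|
gives the host-order value \verb|0x0B8A1234|.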

\vskip 1mm

The API must solve the following tasks:

\begin{enumerate}

\item To allow the user to set traffic class bits.

\item To allow the user to read traffic class bits of received packets.
This feature is not as useful as the first one, but it will be
necessary e.g.\ to implement ECN [RFC2481] for datagram-oriented services
or to implement the receiver side of SRP or another end-to-end protocol
using traffic class bits.

\item To assign flow labels to packets sent by the user.

\item To get flow labels of received packets. I do not know
of any applications of this feature, but it is possible that a receiver will
want to use flow labels to distinguish sub-flows.

\item To allocate flow labels in a way compliant with RFC2460. Namely:

\begin{itemize}
\item
Flow labels must be uniformly distributed (pseudo-)random numbers,
so that any subset of the 20 bits can be used as a hash key.

\item
Flows with coinciding source address and flow label must have identical
destination address and non-fragmentable extension headers
(i.e.\ hop-by-hop options and all the headers up to and including
the routing header, if it is present).

\begin{NB}
There is a hole in the specs: some hop-by-hop options can be
defined only on a per-packet basis (e.g.\ the jumbo payload option).
Essentially, it means that such options cannot be present in packets
with flow labels.
\end{NB}
\begin{NB}
NB notes here and below reflect only my personal opinion;
they should be read with a smile or should not be read at all :-).
\end{NB}


\item
Flow labels have a finite lifetime, and the source is not allowed to reuse
a flow label for another flow until the maximal lifetime has expired,
so that intermediate nodes will be able to invalidate flow state before
the label is taken over by another flow.
Flow state, including lifetime, is propagated along the datagram path
by some application-specific method
(e.g.\ in RSVP PATH messages or in some hop-by-hop option).


\end{itemize}

\end{enumerate}
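
The randomness requirement can be illustrated by a small sketch of how an
application might pick a label itself. This is only an illustration: the
function name \verb|random_flow_label| is hypothetical, and \verb|rand()| is
used merely as a placeholder; it may provide fewer than 20 random bits on
some platforms and is not a strong source of randomness.

\begin{verbatim}
#include <stdint.h>
#include <stdlib.h>      /* rand */
#include <arpa/inet.h>   /* htonl */

/* Return a non-zero pseudo-random 20-bit flow label in network byte order.
   rand() is a placeholder; a real application should use a better source. */
static uint32_t random_flow_label(void)
{
    uint32_t label;

    do {
        label = (uint32_t)rand() & 0xFFFFFu;   /* keep the low 20 bits */
    } while (label == 0);                      /* zero means "no flow label" */
    return htonl(label);
}
\end{verbatim}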

\section{Sending/receiving flow information.}

\paragraph{Discussion.}
\addcontentsline{toc}{subsection}{Discussion}
It was proposed (where? I do not remember any explicit statement)
to solve the first four tasks using the
\verb|sin6_flowinfo| field added to \verb|struct| \verb|sockaddr_in6|
(see RFC2553).

\begin{NB}
	It is difficult to consider this method reasonable, because it
	imposes additional overhead on all services, although only
	a very small subset of them (none, to be more exact) really uses it.
	It contradicts both the spirit and the letter of the IETF. Before RFC2553
	one justification existed: IPv6 address alignment left a 4-byte
	hole in \verb|sockaddr_in6| in any case. Now it has no justification.
\end{NB}

We have two problems with this method. The first one is common to all OSes:
if \verb|recvmsg()| initializes \verb|sin6_flowinfo| to the flow info
of the received packet, we lose one very important property of the BSD socket API,
namely, we can no longer use the received address for a reply directly
and have to mangle it, even if we are not interested in flowinfo subtleties.

\begin{NB}
	RFC2553 adds a new requirement: to clear \verb|sin6_flowinfo|.
	Certainly, it is not a solution but rather an attempt to force applications
	to do unnecessary work. Well, as usual, one mistake in design
	is followed by attempts to patch the hole and more mistakes...
\end{NB}

The other problem is Linux-specific. Historically Linux IPv6 did not
initialize \verb|sin6_flowinfo| at all, so that, if the kernel does not
support flow labels, this field is not zero but a random number.
Some applications did not initialize it either.

\begin{NB}
Following RFC2553 such applications can be considered broken,
but I still think that they are right: clearing the whole address
before filling in the known fields is a robust but stupid solution.
Uselessly wasting CPU cycles and
memory bandwidth is not a good idea. Such patches are acceptable
as temporary hacks, but not as a standard for the future.
\end{NB}


\paragraph{Implementation.}
\addcontentsline{toc}{subsection}{Implementation}
By default Linux IPv6 does not read the \verb|sin6_flowinfo| field,
assuming that common applications are not obliged to initialize it
and are permitted to consider it pure alignment padding.
In order to tell the kernel that the application
is aware of this field, it is necessary to set the socket option
\verb|IPV6_FLOWINFO_SEND|.

\begin{verbatim}
  int on = 1;
  setsockopt(sock, SOL_IPV6, IPV6_FLOWINFO_SEND,
             (void*)&on, sizeof(on));
\end{verbatim}

The Linux kernel never fills in the \verb|sin6_flowinfo| field when passing
a message to user space, though kernels which support flow labels
initialize it to zero. If the user wants to get the received flowinfo, he
sets the option \verb|IPV6_FLOWINFO| and after that receives
flowinfo as an ancillary data object of type \verb|IPV6_FLOWINFO|
(cf.\ RFC2292).

\begin{verbatim}
  int on = 1;
  setsockopt(sock, SOL_IPV6, IPV6_FLOWINFO, (void*)&on, sizeof(on));
\end{verbatim}

Flowinfo received and latched by a connected TCP socket may also be fetched
with \verb|getsockopt()| \verb|IPV6_PKTOPTIONS| together with
other optional information.

Besides that, in the spirit of RFC2292 the option \verb|IPV6_FLOWINFO|
may be used as an alternative way to send flowinfo with \verb|sendmsg()| or
to latch it with \verb|IPV6_PKTOPTIONS|.
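
The \verb|sendmsg()| variant can be sketched as follows. This is an
illustrative fragment, not code from the kernel or from any library: the
helper name \verb|attach_flowinfo| is hypothetical, and the fallback
definitions assume the values used by the Linux headers. It only prepares
the ancillary data object; the caller still has to fill in the rest of
\verb|struct| \verb|msghdr| and call \verb|sendmsg()|.

\begin{verbatim}
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef SOL_IPV6
#define SOL_IPV6 41          /* assumed: same value as IPPROTO_IPV6 */
#endif
#ifndef IPV6_FLOWINFO
#define IPV6_FLOWINFO 11     /* assumed: value from <linux/in6.h> */
#endif

/* Put one IPV6_FLOWINFO object into the control buffer of msg.
   cbuf must be suitably aligned and hold at least
   CMSG_SPACE(sizeof(uint32_t)) bytes; flowinfo is in network byte order.
   Returns 0 on success, -1 if the buffer is too small. */
static int attach_flowinfo(struct msghdr *msg, void *cbuf, size_t cbuflen,
                           uint32_t flowinfo)
{
    struct cmsghdr *c;

    if (cbuflen < CMSG_SPACE(sizeof(flowinfo)))
        return -1;
    memset(cbuf, 0, cbuflen);
    msg->msg_control = cbuf;
    msg->msg_controllen = CMSG_SPACE(sizeof(flowinfo));
    c = CMSG_FIRSTHDR(msg);
    c->cmsg_level = SOL_IPV6;
    c->cmsg_type = IPV6_FLOWINFO;
    c->cmsg_len = CMSG_LEN(sizeof(flowinfo));
    memcpy(CMSG_DATA(c), &flowinfo, sizeof(flowinfo));
    return 0;
}
\end{verbatim}

Declaring the buffer as a union of a \verb|char| array and a
\verb|struct| \verb|cmsghdr| is one common way to get the required alignment.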

\paragraph{Note about IPv6 options and destination address.}
\addcontentsline{toc}{subsection}{IPv6 options and destination address}
If \verb|sin6_flowinfo| contains a non-zero flow label, the
destination address in \verb|sin6_addr| and the non-fragmentable
extension headers are ignored. Instead, the kernel uses the values
cached at flow setup (see below). However, for connected sockets
the kernel prefers the values set at connection time.

\paragraph{Example.}
\addcontentsline{toc}{subsection}{Example}
After setting the socket option \verb|IPV6_FLOWINFO|, the
flow label and DS field are received as an ancillary data object
of type \verb|IPV6_FLOWINFO| and level \verb|SOL_IPV6|.
In cases when it is convenient to use \verb|recvfrom(2)|,
it is possible to replace the library variant with your own one,
along these lines:

\begin{verbatim}
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/in6.h>    /* IPV6_FLOWINFO */
#include <linux/types.h>  /* __u32 */

ssize_t recvfrom(int fd, void *buf, size_t len, int flags,
                 struct sockaddr *addr, socklen_t *addrlen)
{
  ssize_t cc;   /* signed, so that the error check below works */
  char cbuf[128];
  struct cmsghdr *c;
  struct iovec iov = { buf, len };
  struct msghdr msg;

  /* Fill the fields by name: do not rely on the member order. */
  memset(&msg, 0, sizeof(msg));
  msg.msg_name = addr;
  msg.msg_namelen = *addrlen;
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = cbuf;
  msg.msg_controllen = sizeof(cbuf);

  cc = recvmsg(fd, &msg, flags);
  if (cc < 0)
    return cc;
  /* Default to zero; overwritten if flowinfo arrived as ancillary data. */
  ((struct sockaddr_in6*)addr)->sin6_flowinfo = 0;
  *addrlen = msg.msg_namelen;
  for (c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NEXTHDR(&msg, c)) {
    if (c->cmsg_level != SOL_IPV6 ||
        c->cmsg_type != IPV6_FLOWINFO)
      continue;
    ((struct sockaddr_in6*)addr)->sin6_flowinfo = *(__u32*)CMSG_DATA(c);
  }
  return cc;
}
\end{verbatim}



\section{Flow label management.}

\paragraph{Discussion.}
\addcontentsline{toc}{subsection}{Discussion}
The requirements of RFC2460 are pretty tough. In particular, lifetimes
that may outlive a reboot require allocated labels to be kept in stable
storage, so that a full implementation necessarily includes a user-space flow
label manager. There are at least three different approaches:

\begin{enumerate}
\item {\bf ``Cooperative''. } We could leave flow label allocation wholly
to user space. When a user needs a label, he asks the manager directly. The approach
is valid, but like any ``cooperative'' approach it suffers from security problems.

\begin{NB}
One idea is to forbid unprivileged users to allocate flow
labels, and instead to pass the socket to the manager via an \verb|SCM_RIGHTS|
control message, so that the manager allocates the label and assigns it to the socket
itself. Hmm... the idea is interesting.
\end{NB}

\item {\bf ``Indirect''.} The kernel redirects requests to a user-level daemon
and does not install the label until the daemon has acknowledged the request.
This approach is the most promising; it is especially pleasant to recognize
the parallel with the IPsec API [RFC2367,Craig]. Actually, it may share an API with
IPsec.

\item {\bf ``Stupid''.} To allocate labels in kernel space. It is the simplest
method, but it suffers from two serious flaws: first,
we cannot lease labels with lifetimes that outlive a reboot; second,
it is sensitive to DoS attacks. The kernel has to remember all the obsolete
labels until their expiration, and a malicious user may quickly eat all the
flow label space.

\end{enumerate}

Certainly, I chose the most ``stupid'' method. It is the cheapest one
for the implementor (i.e.\ me), and taking into account that flow labels
still have no serious applications, it is not useful to work on a more
advanced API, especially since eventually we
will get it for free together with IPsec.


\paragraph{Implementation.}
\addcontentsline{toc}{subsection}{Implementation}
The socket option \verb|IPV6_FLOWLABEL_MGR| allows one to
ask the flow label manager to allocate a new flow label, to reuse
an already allocated one or to delete an old flow label.
Its argument is \verb|struct| \verb|in6_flowlabel_req|:

\begin{verbatim}
struct in6_flowlabel_req
{
        struct in6_addr flr_dst;
        __u32           flr_label;
        __u8            flr_action;
        __u8            flr_share;
        __u16           flr_flags;
        __u16           flr_expires;
        __u16           flr_linger;
        __u32         __flr_reserved;
        /* Options in format of IPV6_PKTOPTIONS */
};
\end{verbatim}

\begin{itemize}

\item \verb|flr_dst| is the IPv6 destination address associated with the label.

\item \verb|flr_label| is the flow label value in network byte order. If it is zero,
the kernel will allocate a new pseudo-random number. Otherwise, the kernel will try
to lease the flow label requested by the user. In this case, it is the user's task
to provide the necessary flow label randomness.

\item \verb|flr_action| is the requested operation. Currently, only three operations
are defined:

\begin{verbatim}
#define IPV6_FL_A_GET   0   /* Get flow label */
#define IPV6_FL_A_PUT   1   /* Release flow label */
#define IPV6_FL_A_RENEW 2   /* Update expire time */
\end{verbatim}

\item \verb|flr_flags| are optional modifiers. Currently
only \verb|IPV6_FL_A_GET| has modifiers:

\begin{verbatim}
#define IPV6_FL_F_CREATE 1   /* Allowed to create new label */
#define IPV6_FL_F_EXCL   2   /* Fail if label already exists */
\end{verbatim}


\item \verb|flr_share| defines who is allowed to reuse the same flow label.

\begin{verbatim}
#define IPV6_FL_S_NONE    0   /* Not defined */
#define IPV6_FL_S_EXCL    1   /* Label is private */
#define IPV6_FL_S_PROCESS 2   /* May be reused by this process */
#define IPV6_FL_S_USER    3   /* May be reused by this user */
#define IPV6_FL_S_ANY     255 /* Anyone may reuse it */
\end{verbatim}

\item \verb|flr_linger| is a time in seconds. After the last user releases the flow
label, it will not be reused with a different destination and options for at least
this time. If \verb|flr_share| is not \verb|IPV6_FL_S_EXCL|, the label
can still be shared by other sockets. The current implementation does not allow
an unprivileged user to set a linger longer than 60 sec.

\item \verb|flr_expires| is a time in seconds. The flow label will be kept for at least
this time, and it will not be destroyed before the user has released it explicitly
or closed all the sockets using it. The current implementation does not allow
an unprivileged user to set a timeout longer than 60 sec. Privileged applications
MAY set longer lifetimes, but in this case they MUST save allocated
labels in stable storage and restore them after reboot before the first
application allocates a new flow.

\end{itemize}
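
The interplay of the two timeouts can be modelled by a small sketch. It
reflects only one reading of the two rules above (a label stays reserved
until both its expiration time and its linger time have passed) and is not
the kernel's actual bookkeeping. The function names
\verb|label_reusable_at| and \verb|unpriv_timeouts_ok| are hypothetical,
and times are plain seconds.

\begin{verbatim}
#include <stdint.h>

#define UNPRIV_MAX_SEC 60   /* limit for unprivileged users, per the text */

/* Earliest time at which a released label may be given to another flow:
   the later of (allocation time + expires) and (release time + linger). */
static long label_reusable_at(long allocated_at, long released_at,
                              uint16_t expires, uint16_t linger)
{
    long by_expires = allocated_at + expires;
    long by_linger  = released_at + linger;

    return by_expires > by_linger ? by_expires : by_linger;
}

/* Non-zero if both timeouts respect the 60 second unprivileged limit. */
static int unpriv_timeouts_ok(uint16_t expires, uint16_t linger)
{
    return expires <= UNPRIV_MAX_SEC && linger <= UNPRIV_MAX_SEC;
}
\end{verbatim}

With \verb|expires| of 30 and \verb|linger| of 6, a label allocated at time 0
and released at time 5 stays reserved until time 30; if it is released at
time 100 instead, it stays reserved until time 106.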

This structure is followed by optional extension headers associated
with this flow label, in the format of \verb|IPV6_PKTOPTIONS|. Only
\verb|IPV6_HOPOPTS|, \verb|IPV6_RTHDR| and, if \verb|IPV6_RTHDR| is present,
\verb|IPV6_DSTOPTS| are allowed.

\paragraph{Example.}
\addcontentsline{toc}{subsection}{Example}
The function \verb|get_flow_label| allocates
a private flow label.

\begin{verbatim}
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>   /* htonl */
#include <linux/in6.h>   /* struct in6_flowlabel_req, IPV6_FL_* */

int get_flow_label(int fd, struct sockaddr_in6 *dst, __u32 fl)
{
        int on = 1;
        struct in6_flowlabel_req freq;

        /* Ask for a new, private label bound to the destination. */
        memset(&freq, 0, sizeof(freq));
        freq.flr_label = htonl(fl);
        freq.flr_action = IPV6_FL_A_GET;
        freq.flr_flags = IPV6_FL_F_CREATE | IPV6_FL_F_EXCL;
        freq.flr_share = IPV6_FL_S_EXCL;
        memcpy(&freq.flr_dst, &dst->sin6_addr, sizeof(freq.flr_dst));
        if (setsockopt(fd, SOL_IPV6, IPV6_FLOWLABEL_MGR,
                       &freq, sizeof(freq)) == -1) {
                perror ("can't lease flowlabel");
                return -1;
        }
        dst->sin6_flowinfo |= freq.flr_label;

        /* Tell the kernel that sin6_flowinfo is meaningful on this socket. */
        if (setsockopt(fd, SOL_IPV6, IPV6_FLOWINFO_SEND,
                       &on, sizeof(on)) == -1) {
                perror ("can't send flowinfo");

                /* Give the label back on failure. */
                freq.flr_action = IPV6_FL_A_PUT;
                setsockopt(fd, SOL_IPV6, IPV6_FLOWLABEL_MGR,
                           &freq, sizeof(freq));
                return -1;
        }
        return 0;
}
\end{verbatim}

A somewhat more complicated example using a routing header can be found
in the \verb|ping6| utility (\verb|iputils| package). The Linux rsvpd backend
contains an example of using the operation \verb|IPV6_FL_A_RENEW|.
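
For orientation only, a renewal request might look like the sketch below.
It is not taken from rsvpd; the function name \verb|renew_flow_label| is
hypothetical, and the sketch assumes that the label was leased earlier on
the same socket and that \verb|<linux/in6.h>| provides the structure and
constants described above.

\begin{verbatim}
#include <string.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/in6.h>   /* struct in6_flowlabel_req, IPV6_FL_A_RENEW */

#ifndef SOL_IPV6
#define SOL_IPV6 41      /* assumed: same value as IPPROTO_IPV6 */
#endif

/* Ask the kernel to extend the lifetime of an already leased label.
   label is in network byte order; expires and linger are in seconds.
   Returns the result of setsockopt(): 0 on success, -1 on error. */
static int renew_flow_label(int fd, uint32_t label,
                            uint16_t expires, uint16_t linger)
{
    struct in6_flowlabel_req freq;

    memset(&freq, 0, sizeof(freq));
    freq.flr_label = label;
    freq.flr_action = IPV6_FL_A_RENEW;
    freq.flr_expires = expires;
    freq.flr_linger = linger;
    return setsockopt(fd, SOL_IPV6, IPV6_FLOWLABEL_MGR,
                      &freq, sizeof(freq));
}
\end{verbatim}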

\paragraph{Listing flow labels.}
\addcontentsline{toc}{subsection}{Listing flow labels}
The list of currently allocated
flow labels may be read from \verb|/proc/net/ip6_flowlabel|.

\begin{verbatim}
Label S Owner Users Linger Expires Dst                              Opt
A1BE5 1 0     0     6      3       3ffe2400000000010a0020fffe71fb30 0
\end{verbatim}

\begin{itemize}
\item \verb|Label| is the hexadecimal flow label value.
\item \verb|S| is the sharing style.
\item \verb|Owner| is the ID of the creator; it is zero, a pid or a uid, depending on
		the sharing style.
\item \verb|Users| is the number of applications using the label now.
\item \verb|Linger| is the \verb|linger| of this label in seconds.
\item \verb|Expires| is the time until expiration of the label in seconds. It may
	be negative if the label is in use.
\item \verb|Dst| is the IPv6 destination address.
\item \verb|Opt| is the length of the options associated with the label. Option
	data are not accessible.
\end{itemize}
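
A line of this listing can be parsed with \verb|sscanf()|, as in the
following sketch. It assumes exactly the eight whitespace-separated columns
shown above; the structure \verb|fl_entry| and the function
\verb|parse_fl_line| are hypothetical names introduced for illustration.

\begin{verbatim}
#include <stdio.h>

struct fl_entry {
    unsigned int label;      /* flow label, parsed from hexadecimal */
    int share, owner, users; /* sharing style, creator ID, user count */
    int linger, expires;     /* seconds; expires may be negative */
    char dst[33];            /* 32 hex digits of the address plus NUL */
    int optlen;              /* length of associated options */
};

/* Parse one data line of the listing.
   Returns 0 on success, -1 if the line does not have all eight fields
   (the header line, for example, is rejected). */
static int parse_fl_line(const char *line, struct fl_entry *e)
{
    int n = sscanf(line, "%x %d %d %d %d %d %32s %d",
                   &e->label, &e->share, &e->owner, &e->users,
                   &e->linger, &e->expires, e->dst, &e->optlen);

    return n == 8 ? 0 : -1;
}
\end{verbatim}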


\paragraph{Flow labels and RSVP.}
\addcontentsline{toc}{subsection}{Flow labels and RSVP}
The RSVP daemon supports IPv6 flow labels
without any modifications to the standard ISI RAPI. The sender must allocate
a flow label, fill in the corresponding sender template and submit it to the local rsvp
daemon. rsvpd will check the label and start to announce it in PATH
messages. The rsvpd on the sender node will renew the flow label, so that it will not
be reused before the path state expires and all the intermediate
routers and the receiver purge the flow state.

The \verb|rtap| utility is modified to parse flow labels. E.g.\ if the user has allocated
flow label \verb|0xA1234|, he may write:

\begin{verbatim}
RTAP> sender 3ffe:2400::1/FL0xA1234 <Tspec>
\end{verbatim}

The receiver makes a reservation with the command:
\begin{verbatim}
RTAP> reserve ff 3ffe:2400::1/FL0xA1234 <Flowspec>
\end{verbatim}

\end{document}
