This document is slightly dated but should give you an idea of how things
work.

What is it?
-----------

An extension to the filtering/classification architecture of Linux Traffic
Control.
Up to 2.6.8 the only action that could be "attached" to a filter was policing,
i.e. you could say something like:

-----
tc filter add dev lo parent ffff: protocol ip prio 10 u32 match ip src \
127.0.0.1/32 flowid 1:1 police mtu 4000 rate 1500kbit burst 90k
-----

which implies "if a packet is seen on the ingress of the lo device with
a source IP address of 127.0.0.1/32, we give it a classification id of 1:1 and
we execute a policing action which rate limits its bandwidth utilization
to 1.5Mbps".
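
The rule above hangs off "parent ffff:", which is the handle of the ingress
qdisc, so that qdisc has to exist before the filter can be added. A minimal
sketch of the two steps together; the TC variable is only a dry-run
convention of this sketch (it is not a tc feature), so by default the
commands are printed rather than executed:

```shell
#!/bin/sh
# Dry run by default: print the commands instead of executing them.
# TC is a convention of this sketch, not something tc itself knows about.
TC="${TC:-echo tc}"

# The ingress qdisc has handle ffff:, so create it before attaching
# a filter to "parent ffff:".
$TC qdisc add dev lo ingress

# Old-style syntax: the policer is attached directly to the filter.
$TC filter add dev lo parent ffff: protocol ip prio 10 u32 \
    match ip src 127.0.0.1/32 flowid 1:1 \
    police mtu 4000 rate 1500kbit burst 90k
```

Run it as root with TC=tc to really install the rule.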

The new extensions allow for more than just policing actions to be added.
They are also fully backward compatible. If you have a kernel that doesn't
understand them, then the effect is null, i.e. if you have a newer tc
but an older kernel, the actions are not installed. Likewise, if you
have a newer kernel but an older tc, the tc will obviously use the old
syntax, which will work fine. Of course, to get the required effect you need
both a newer tc and a newer kernel. If you are reading this you have the
right tc ;->

A side effect is that we can now get stateless firewalling to work with tc.
Essentially this is now an alternative to iptables.
I won't go into details of my dislike for iptables at times, but
scalability is one of the main issues; however, if you need stateful
classification, use netfilter (for now).

This stuff works on both ingress and egress qdiscs.

Features
--------

1) New additional syntax and actions enabled. Note that the old syntax is
still valid.

Essentially this is still the same syntax as tc with a new construct,
"action". The syntax is of the form:
tc filter add <DEVICE> parent 1:0 protocol ip prio 10 <Filter description>
flowid 1:1 action <ACTION description>*

You can have as many actions as you want (within reason).

In the past the only real action was the policer, i.e. you could do something
along the lines of:
tc filter add dev lo parent ffff: protocol ip prio 10 u32 \
match ip src 127.0.0.1/32 flowid 1:1 \
police mtu 4000 rate 1500kbit burst 90k

Although you can still use the same syntax, now you can say:

tc filter add dev lo parent 1:0 protocol ip prio 10 u32 \
match ip src 127.0.0.1/32 flowid 1:1 \
action police mtu 4000 rate 1500kbit burst 90k

"Generic actions" (gact) at the moment are:
{ drop, pass, reclassify, continue }
(If you have others not listed here, give me a reason and we will add them.)
+drop says to drop the packet
+pass and ok (which are equivalent) say to accept it
+reclassify requests reclassification of the packet
+continue requests that the next lookup be tried for a match
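
As a sketch of the stateless firewalling idea, a generic action can be
attached the same way as the policer. The source address 10.0.0.9, prio 6
and flowid 1:16 are made-up values, drop_from is a hypothetical helper, and
the sketch assumes eth0 already has an ingress qdisc. TC is again just a
dry-run switch of the sketch:

```shell
#!/bin/sh
# Dry run by default; set TC=tc (as root) to really install the filter.
TC="${TC:-echo tc}"

# drop_from DEV SRC: drop every IP packet arriving on DEV from address SRC.
drop_from() {
    $TC filter add dev "$1" parent ffff: protocol ip prio 6 u32 \
        match ip src "$2/32" flowid 1:16 action drop
}

drop_from eth0 10.0.0.9
```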

2) In order to take advantage of some of the targets written by the
iptables people, a classifier can have a packet massaged by an
iptables target. I have only tested with mangle targets up to now
(in fact anything that is not in the mangle table is disabled right now).

In terms of hooks:
*ingress is mapped to the pre-routing hook
*egress is mapped to the post-routing hook
I don't see much value in the other hooks; if you do, email me good
reasons. The addition is trivial.

Example syntax for iptables target usage becomes:
tc filter add ..... u32 <u32 syntax> action ipt -j <iptables target syntax>

Example:
tc filter add dev lo parent ffff: protocol ip prio 8 u32 \
match ip dst 127.0.0.8/32 flowid 1:12 \
action ipt -j mark --set-mark 2

NOTE: flowid 1:12 is parsed as flowid 0x1:0x12. If you want a flowid
of decimal 12, use flowid 1:c.
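
Since both halves of a flowid are read as hex, it can help to let the shell
do the conversion from decimal. A small sketch; flowid_hex is a hypothetical
helper name, not part of tc:

```shell
#!/bin/sh
# flowid_hex MAJOR MINOR: print a class id with both decimal parts
# converted to the hex notation that tc expects.
flowid_hex() {
    printf '%x:%x\n' "$1" "$2"
}

flowid_hex 1 12    # prints 1:c
flowid_hex 1 18    # prints 1:12
```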

3) A feature I call pipe
The motivation is derived from the Unix pipe mechanism, but applied to packets.
Essentially, take a matching packet and pass it through
action1 | action2 | action3 etc.
You could do something similar to this with the tc policer and the "continue"
operator, but this rather restricts it to just the policer and requires
multiple rules (and lookups, hence quite inefficient).

As an example -- and please note that this is just an example, _not_ The
Word You've Been Waiting For (yes, I have had problems with examples
which ended up becoming dogma in documents, and with people modifying them
a little to look clever):

I selected the metering rates to be small so that I can better show how
things work.

The script below does the following:
- An incoming packet from 10.0.0.21 is first given a firewall mark of 1.

- It is then metered to make sure it does not exceed its allocated rate of
1Kbps. If it doesn't exceed the rate, this is where we terminate action
execution.

- If it does exceed its rate, its "color" changes to a mark of 2 and it is
then passed through a second meter.

- The second meter is shared across all flows on that device. [I am surprised
that this seems not to be a well-known feature of the policer; Bert was telling
me that someone was writing a qdisc just to do sharing across multiple devices;
it must be the summer heat again; we've had someone doing that every year around
summer. The key to sharing is to use the "index" operator in your policer
rules (for example "index 20"). All your rules have to use the same index to
share.]

- If the second meter is exceeded, the color of the flow changes further to 3.

- We then pass the packet to another meter which is shared across all devices
in the system. If this meter is exceeded we drop the packet.

Note that the mark can be used further up the system to do things like policy
routing, or more interesting things on the egress.
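
Before the full script, here is the sharing trick on its own, as a rough
sketch: two filters for two different senders both name "index 30", so they
draw from one meter. The second sender 10.0.0.23, the prio values and the
helper name metered_filter are invented for illustration, and the sketch
assumes eth0 already has an ingress qdisc. By default the commands are only
printed:

```shell
#!/bin/sh
# Dry run by default; set TC=tc (as root) to really install the filters.
TC="${TC:-echo tc}"

# metered_filter PRIO SRC: police traffic from SRC using the shared
# policer with index 30. Every rule naming index 30 uses the same meter.
metered_filter() {
    $TC filter add dev eth0 parent ffff: protocol ip prio "$1" u32 \
        match ip src "$2/32" flowid 1:15 \
        action police index 30 mtu 5000 rate 1kbit burst 10k drop
}

metered_filter 1 10.0.0.21
metered_filter 2 10.0.0.23
```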

------------------ cut here -------------------------------
#
# Add an ingress qdisc on eth0
tc qdisc add dev eth0 ingress
#
# The filter below is a single command. A comment cannot sit between
# backslash-continued lines, so the steps are described here:
#  1. if you see an incoming packet from 10.0.0.21
#  2. first give it a mark of 1
#  3. then pass it through a policer which allows 1kbps; if the flow
#     doesn't exceed that rate, this is where we stop; if it exceeds it,
#     we pipe the packet to the next action
#  4. which marks the packet fwmark as 2 and pipes
#  5. next, attempt to borrow bandwidth from a meter used across all
#     flows incoming on eth0 ("index 30"); if that is exceeded we pipe
#     to the next action
#  6. mark it as fwmark 3 if exceeded
#  7. and then attempt to borrow from a meter used by all devices in the
#     system ("index 20"). Should this be exceeded, drop the packet on
#     the floor.
tc filter add dev eth0 parent ffff: protocol ip prio 1 \
u32 match ip src 10.0.0.21/32 flowid 1:15 \
action ipt -j mark --set-mark 1 index 2 \
action police rate 1kbit burst 9k pipe \
action ipt -j mark --set-mark 2 \
action police index 30 mtu 5000 rate 1kbit burst 10k pipe \
action ipt -j mark --set-mark 3 \
action police index 20 mtu 5000 rate 1kbit burst 90k drop
---------------------------------

Now let's see the actions installed with
"tc filter show parent ffff: dev eth0"

-------- output -----------
jroot# tc filter show parent ffff: dev eth0
filter protocol ip pref 1 u32
filter protocol ip pref 1 u32 fh 800: ht divisor 1
filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:15

   action order 1: tablename: mangle  hook: NF_IP_PRE_ROUTING
        target MARK set 0x1  index 2

   action order 2: police 1 action pipe rate 1Kbit burst 9Kb mtu 2Kb

   action order 3: tablename: mangle  hook: NF_IP_PRE_ROUTING
        target MARK set 0x2  index 1

   action order 4: police 30 action pipe rate 1Kbit burst 10Kb mtu 5000b

   action order 5: tablename: mangle  hook: NF_IP_PRE_ROUTING
        target MARK set 0x3  index 3

   action order 6: police 20 action drop rate 1Kbit burst 90Kb mtu 5000b

  match 0a000015/ffffffff at 12
-------------------------------
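
The last line of the listing, "match 0a000015/ffffffff at 12", is the u32
key: value/mask compared at byte offset 12 of the IP header, which is where
the source address lives. 0a000015 is simply 10.0.0.21 in hex. A small
sketch to go from one form to the other; ip_to_u32_key is a hypothetical
helper name:

```shell
#!/bin/sh
# ip_to_u32_key A.B.C.D: print a dotted-quad address as the 8 hex digits
# shown in the "match" line of "tc filter show".
ip_to_u32_key() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    IFS=$old_ifs
    printf '%02x%02x%02x%02x\n' "$1" "$2" "$3" "$4"
}

ip_to_u32_key 10.0.0.21    # prints 0a000015
```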
    192 
    193 Note the ordering of the actions is based on the order in which we entered
    194 them. In the future i will add explicit priorities.
    195 
    196 Now lets run a ping -f from 10.0.0.21 to this host; stop the ping after
    197 you see a few lines of dots
    198 
    199 ----
    200 [root@jzny hadi]# ping -f  10.0.0.22
    201 PING 10.0.0.22 (10.0.0.22): 56 data bytes
    202 ....................................................................................................................................................................................................................................................................................................................................................................................................................................................
    203 --- 10.0.0.22 ping statistics ---
    204 2248 packets transmitted, 1811 packets received, 19% packet loss
    205 round-trip min/avg/max = 0.7/9.3/20.1 ms
    206 -----------------------------

Now let's take a look at the stats with "tc -s filter show parent ffff: dev eth0"

--------------
jroot# tc -s filter show parent ffff: dev eth0
filter protocol ip pref 1 u32
filter protocol ip pref 1 u32 fh 800: ht divisor 1
filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:15

   action order 1: tablename: mangle  hook: NF_IP_PRE_ROUTING
        target MARK set 0x1  index 2
         Sent 188832 bytes 2248 pkts (dropped 0, overlimits 0)

   action order 2: police 1 action pipe rate 1Kbit burst 9Kb mtu 2Kb
         Sent 188832 bytes 2248 pkts (dropped 0, overlimits 2122)

   action order 3: tablename: mangle  hook: NF_IP_PRE_ROUTING
        target MARK set 0x2  index 1
         Sent 178248 bytes 2122 pkts (dropped 0, overlimits 0)

   action order 4: police 30 action pipe rate 1Kbit burst 10Kb mtu 5000b
         Sent 178248 bytes 2122 pkts (dropped 0, overlimits 1945)

   action order 5: tablename: mangle  hook: NF_IP_PRE_ROUTING
        target MARK set 0x3  index 3
         Sent 163380 bytes 1945 pkts (dropped 0, overlimits 0)

   action order 6: police 20 action drop rate 1Kbit burst 90Kb mtu 5000b
         Sent 163380 bytes 1945 pkts (dropped 0, overlimits 437)

  match 0a000015/ffffffff at 12
-------------------------------

Neat, eh?
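
The counters in the listing can be checked against the ping statistics with
a bit of shell arithmetic. The numbers are copied from the output above; the
variable names are mine. Each policer's "overlimits" count is the number of
packets piped on to the next stage, so the differences give the packets that
stopped at each meter:

```shell
#!/bin/sh
total=2248    # packets seen by action 1 and the first policer
over1=2122    # overlimits on the per-flow 1kbit policer
over2=1945    # overlimits on the policer with index 30
over3=437     # overlimits on the policer with index 20 (its action is drop)

ok1=$((total - over1))    # stopped at meter 1, carrying mark 1
ok2=$((over1 - over2))    # stopped at meter 2, carrying mark 2
ok3=$((over2 - over3))    # stopped at meter 3, carrying mark 3
passed=$((ok1 + ok2 + ok3))

echo "mark 1: $ok1  mark 2: $ok2  mark 3: $ok3"
echo "passed: $passed  dropped: $over3"
echo "loss: $((over3 * 100 / total))%"
echo "bytes per packet: $((188832 / total))"
```

This prints 126, 177 and 1508 packets for marks 1, 2 and 3, 1811 passed and
437 dropped, which matches the "1811 packets received, 19% packet loss"
reported by ping. The 84 bytes per packet are the 56 data bytes plus an
8-byte ICMP header and a 20-byte IP header.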


Wanna write an action module?
------------------------------
It's easy. Either look at the code or send me email. I will document it at
some point; I will also accept documentation.

TODO
----

Lotsa goodies/features coming. Requests are also being accepted.
At the moment the focus has been on getting the architecture in place.
Expect new things in the spare time I have to work on this
(particularly around the end of the year, when I typically get time off
from work).