. The main "selling" point of HFSC is feature (1), which is achieved by using nonlinear service curves (more about what it actually is later). This is particularly useful in VoIP or games, where not only a guarantee of consistent bandwidth is important, but also limiting the initial delay of a data stream. Note that it matters only for leaf classes (where the actual queues are) - thus class hierarchy is ignored in the realtime case. Feature (2) is well, obvious - any algorithm featuring class hierarchy (such as HTB or CBQ) strives to achieve that. HFSC does that well, although you might end with unusual situations, if you define service curves carelessly - see section CORNER CASES for examples. Feature (3) is mentioned due to the nature of the problem. There may be situations where it's either not possible to guarantee service of all curves at the same time, and/or it's impossible to do so fairly. Both will be explained later. Note that this is mainly related to interior (aka aggregate) classes, as the leafs are already handled by (1). Still, it's perfectly possible to create a leaf class without realtime service, and in such a case the caveats will naturally extend to leaf classes as well.
RT - realtime LS - linkshare UL - upperlimit SC - service curve.
Let's define the criterion as follows:
."(1)" 4For each t1, there must exist t0 in set B, so S(t1-t0)\~<=\~w(t0,t1)
Here 'w' denotes the amount of service received during some time period between t0 and t1. B is a set of all times, where a session becomes active after idling period (further denoted as 'becoming backlogged'). For a clearer picture, imagine two situations: .
Consider (a): if the service received during both periods meets (1), then all is well. But what if it doesn't do so during the 2nd period? If the amount of service received during the 1st period is larger than the service curve, then it might compensate for smaller service during the 2nd period and the gap - if the gap is small enough. If the gap is larger (b) - then it's less likely to happen (unless the excess bandwidth allocated during the 1st part was really large). Still, the larger the gap - the less interesting is what happened in the past (e.g. 10 minutes ago) - what matters is the current traffic that just started. From HFSC's perspective, more interesting is answering the following question: when should we start transferring packets, so a service curve of a class is not violated. Or rephrasing it: How much X() amount of service should a session receive by time t, so the service curve is not violated. Function X() defined as below is the basic building block of HFSC, used in: eligible, deadline, virtual-time and fit-time curves. Of course, X() is based on equation (1) and is defined recursively:
min(X() from previous period ; w(t0)+S(t-t0) for t>=t0),
... where t0 denotes the beginning of the current backlogged period.
.
HFSC uses either linear, or two-piece linear service curves. In case of linear or two-piece linear convex functions (first slope < second slope), min() in X's definition reduces to the 2nd argument. But in case of two-piece concave functions, the 1st argument might quickly become lesser for some t>=t0. Note, that for some backlogged period, X() is defined only from that period's beginning. We also define X^(-1)(w) as smallest t>=t0, for which X(t)\~=\~w. We have to define it this way, as X() is usually not an injection. The above generic X() can be one of the following: .
Assume for a moment, that we only use D() for both finding eligible packets, and choosing the most fitting one, thus eligible time would be computed as D^(-1)(w) and deadline time would be computed as D^(-1)(w+l). If the 2nd class starts sending packets 1 second after the 1st class, it's of course impossible to guarantee 14Mbit, as the interface capability is only 10Mbit. The only workaround in this scenario is to allow the 1st class to send the packets earlier that would normally be allowed. That's where separate E() comes to help. Putting all the math aside (see HFSC paper for details), E() for RT concave service curve is just like D(), but for the RT convex service curve - it's constructed using only RT service curve's 2nd slope (in our example 7Mbit). The effect of such E() - packets will be sent earlier, and at the same time D() will be updated - so the current deadline time calculated from it will be later. Thus, when the 2nd class starts sending packets later, both the 1st and the 2nd class will be eligible, but the 2nd session's deadline time will be smaller and its packets will be sent first. When the 1st class becomes idle at some later point, the 2nd class will be able to "buffer" up again for later active period of the 1st class. A short remark - in a situation, where the total amount of bandwidth available on the interface is larger than the allocated total realtime parts (imagine a 10 Mbit interface, but 1Mbit/2Mbit and 2Mbit/1Mbit classes), the sole speed of the interface could suffice to guarantee the times. Important part of RT criterion is that apart from updating its D() and E(), also V() used by LS criterion is updated. Generally the RT criterion is secondary to LS one, and used only if there's a risk of violating precise realtime requirements. Still, the "participation" in bandwidth distributed by LS criterion is there, so V() has to be updated along the way. LS criterion can than properly compensate for non-ideal fair sharing situation, caused by RT scheduling. If you use UL service curve its F() will be updated as well (UL service curve is an extension to LS one - see UPPERLIMIT CRITERION section). Anyway - careless specification of LS and RT service curves can lead to potentially undesired situations (see CORNER CASES for examples). This wasn't the case in HFSC paper where LS and RT service curves couldn't be specified separately.
They sum up nicely to 10Mbit - the interface's capacity. But if we wanted to only use LS for guarantees and fairness - it simply won't work. In LS context, only V() is used for making decision which class to schedule. If the 2nd class becomes active when the 1st one is in its second slope, the fairness will be preserved - ratio will be 1:1 (7Mbit:7Mbit), but LS itself is of course unable to guarantee the absolute values themselves - as it would have to go beyond of what the interface is capable of.
R (root) - 10Mbit A - 7Mbit, then 3Mbit A1 - 5Mbit, then 2Mbit A2 - 2Mbit, then 1Mbit B - 3Mbit, then 7MbitR arbitrates between left subtree (A) and right (B). Assume that A2 and B are constantly backlogged, and at some later point A1 becomes backlogged (when all other classes are in their 2nd linear part). What happens now? B (choice made by R) will always get 7 Mbit as R is only (obviously) concerned with the ratio between its direct children. Thus A subtree gets 3Mbit, but its children would want (at the point when A1 became backlogged) 5Mbit + 1Mbit. That's of course impossible, as they can only get 3Mbit due to interface limitation. In the left subtree - we have the same situation as previously (fair split between A1 and A2, but violated guarantees), but in the whole tree - there's no fairness (B got 7Mbit, but A1 and A2 have to fit together in 3Mbit) and there's no guarantees for all classes (only B got what it wanted). Even if we violated fairness in the A subtree and set A2's service curve to 0, A1 would still not get the required bandwidth. .
A - ls 5.0Mbit B - ls 2.5Mbit C - ls 2.5Mbit, ul 2.5MbitIf B was idle, while A and C were constantly backlogged, A and C would normally (as far as LS criterion is concerned) divide bandwidth in 2:1 ratio. But due to UL service curve in place, C would get at most 2.5Mbit, and A would get the remaining 7.5Mbit. The longer the backlogged period, the more the virtual times of A and C would drift apart. If B became backlogged at some later point in time, its virtual time would be set to (A's\~vt\~+\~C's\~vt)/2, thus blocking A from sending any traffic until B's virtual time catches up with A. .
A - ls 5.0Mbit, rt 8Mbit B - ls 2.5Mbit C - ls 2.5MbitImagine A and C are constantly backlogged. As B is idle, A and C would divide bandwidth in 2:1 ratio, considering LS service curve (so in theory - 6.66 and 3.33). Alas RT criterion takes priority, so A will get 8Mbit and LS will be able to compensate class C for only 2 Mbit - this will cause discrepancy between virtual times of A and C. Assume this situation lasts for a long time with no idle periods, and suddenly B becomes active. B's virtual time will be updated to (A's\~vt\~+\~C's\~vt)/2, effectively landing in the middle between A's and C's virtual time. The effect - B, having no RT guarantees, will be punished and will not be allowed to transfer until C's virtual time catches up. If the interface had a higher capacity, for example 100Mbit, this example would behave perfectly fine though. Let's look a bit closer at the above example - it "cleverly" invalidates one of the basic things LS criterion tries to achieve - equality of all virtual times across class hierarchy. Leaf classes without RT service curves are literally left to their own fate (governed by messed up virtual times). Also, it doesn't make much sense. Class A will always be guaranteed up to 8Mbit, and this is more than any absolute bandwidth that could happen from its LS criterion (excluding trivial case of only A being active). If the bandwidth taken by A is smaller than absolute value from LS criterion, the unused part will be automatically assigned to other active classes (as A has idling periods in such case). The only "advantage" is, that even in case of low bandwidth on average, bursts would be handled at the speed defined by RT criterion. Still, if extra speed is needed (e.g. due to latency), non linear service curves should be used in such case. In the other words: the LS criterion is meaningless in the above example. You can quickly "workaround" it by making sure each leaf class has RT service curve assigned (thus guaranteeing all of them will get some bandwidth), but it doesn't make it any more valid. Keep in mind - if you use nonlinear curves and irregularities explained above happen only in the first segment, then there's little wrong with "overusing" RT curve a bit:
A - ls 5.0Mbit, rt 9Mbit/30ms, then 1Mbit B - ls 2.5Mbit C - ls 2.5MbitHere, the vt of A will "spike" in the initial period, but then A will never get more than 1Mbit until B & C catch up. Then everything will be back to normal. .
tc qdisc add dev eth0 root handle 1:0 hfsc default 1 tc class add dev eth0 parent 1:0 classid 1:1 hfsc rt m2 10MbitAssuming packet of ~1KB size and HZ=100, that averages to ~0.8Mbit - anything beyond it (e.g. the above example with specified rate over 10x larger) will require appropriate queuing and cause bursts every ~10 ms. As you can imagine, any HFSC's RT guarantees will be seriously invalidated by that. Aforementioned example is mainly important if you deal with old hardware - as is particularly popular for home server chores. Even then, you can easily set HZ=1000 and have very accurate scheduling for typical adsl speeds. Anything modern (apic or even hpet msi based timers + 'tickless system') will provide enough accuracy for superb 1Gbit scheduling. For example, on one of my cheap dual-core AMD boards I have the following settings:
tc qdisc add dev eth0 parent root handle 1:0 hfsc default 1 tc class add dev eth0 parent 1:0 classid 1:1 hfsc rt m2 300mbitAnd a simple:
nc -u dst.host.com 54321 </dev/zero nc -l -p 54321 >/dev/null...will yield the following effects over a period of ~10 seconds (taken from /proc/interrupts):
319: 42124229 0 HPET_MSI-edge hpet2 (before) 319: 42436214 0 HPET_MSI-edge hpet2 (after 10s.)That's roughly 31000/s. Now compare it with HZ=1000 setting. The obvious drawback of it is that cpu load can be rather high with servicing that many timer interrupts. The example with 300Mbit RT service curve on 1Gbit link is particularly ugly, as it requires a lot of throttling with minuscule delays. Also note that it's just an example showing the capabilities of current hardware. The above example (essentially a 300Mbit TBF emulator) is pointless on an internal interface to begin with: you will pretty much always want a regular LS service curve there, and in such a scenario HFSC simply doesn't throttle at all. 300Mbit RT service curve (selected columns from mpstat -P ALL 1):
10:56:43 PM CPU %sys %irq %soft %idle 10:56:44 PM all 20.10 6.53 34.67 37.19 10:56:44 PM 0 35.00 0.00 63.00 0.00 10:56:44 PM 1 4.95 12.87 6.93 73.27So, in the rare case you need those speeds with only a RT service curve, or with a UL service curve: remember the drawbacks. .
tc qdisc add dev eth0 root handle 1:0 hfsc default 0x8002 tc class add dev eth0 parent 1:0 classid 1:999 hfsc rt m2 50Mbit tc class add dev eth0 parent 1:0 classid 1:1 hfsc ls m2 2Mbit ul m2 2Mbit... so "internet" tree under 1:1 and "router itself" as 1:999 .