EPOLL(4)                   Linux Programmer's Manual                  EPOLL(4)

NAME
       epoll - I/O event notification facility

SYNOPSIS
       #include <sys/epoll.h>

DESCRIPTION
       epoll is a variant of poll(2) that can be used either as an Edge
       Triggered or as a Level Triggered interface and scales well to large
       numbers of watched file descriptors. Three system calls are provided
       to set up and control an epoll set: epoll_create(2), epoll_ctl(2),
       and epoll_wait(2).

       An epoll set is connected to a file descriptor created by
       epoll_create(2). Interest in certain file descriptors is then
       registered via epoll_ctl(2). Finally, the actual wait is started by
       epoll_wait(2).

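       The three calls can be strung together as in the following minimal
       sketch. The helper name wait_on_pipe() and the use of a pipe as the
       watched descriptor are assumptions made for illustration only.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: watches the read side of a pipe and returns the
 * number of ready descriptors reported by epoll_wait(2), or -1 on error. */
int wait_on_pipe(void)
{
    int pfd[2], epfd, nfds = -1;
    struct epoll_event ev, out;

    if (pipe(pfd) < 0)
        return -1;

    epfd = epoll_create(1);              /* the size argument is a hint */
    if (epfd >= 0) {
        ev.events = EPOLLIN;             /* Level Triggered by default */
        ev.data.fd = pfd[0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == 0 &&
            write(pfd[1], "x", 1) == 1)
            nfds = epoll_wait(epfd, &out, 1, 1000);   /* 1 s timeout */
        if (nfds == 1 && out.data.fd != pfd[0])
            nfds = -1;                   /* unexpected descriptor */
        close(epfd);
    }
    close(pfd[0]);
    close(pfd[1]);
    return nfds;
}
```

       Because one byte is written before the wait, the call is expected
       to report exactly one ready descriptor.
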
NOTES
       The epoll event distribution interface is able to behave both as Edge
       Triggered (ET) and Level Triggered (LT). The difference between the ET
       and LT event distribution mechanisms can be described as follows.
       Suppose that this scenario happens:

       1      The file descriptor that represents the read side of a pipe
              (RFD) is added to the epoll device.

       2      The pipe writer writes 2 kB of data on the write side of the
              pipe.

       3      A call to epoll_wait(2) is done that will return RFD as a
              ready file descriptor.

       4      The pipe reader reads 1 kB of data from RFD.

       5      A call to epoll_wait(2) is done.

       If the RFD file descriptor has been added to the epoll interface using
       the EPOLLET flag, the call to epoll_wait(2) done in step 5 will
       probably hang despite the available data still present in the file
       input buffer, while the remote peer might be expecting a response
       based on the data it already sent. The reason for this is that Edge
       Triggered event distribution delivers events only when changes occur
       on the monitored file. So, in step 5 the caller might end up waiting
       for some data that is already present inside the input buffer. In the
       above example, an event on RFD will be generated because of the write
       done in 2, and the event is consumed in 3. Since the read operation
       done in 4 does not consume the whole buffer data, the call to
       epoll_wait(2) done in step 5 might block indefinitely.

       The epoll interface, when used with the EPOLLET flag (Edge Triggered),
       should use non-blocking file descriptors to avoid having a blocking
       read or write starve the task that is handling multiple file
       descriptors. The suggested way to use epoll as an Edge Triggered
       (EPOLLET) interface is below, and possible pitfalls to avoid follow.

       i      with non-blocking file descriptors

       ii     by going to wait for an event only after read(2) or write(2)
              return EAGAIN

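       The five-step scenario and the two rules above can be replayed with
       the sketch below. The names drain_fd() and edge_triggered_scenario()
       are hypothetical, and step 5 uses a zero timeout so that the sketch
       returns instead of blocking.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Rule ii: read from a non-blocking fd until read(2) reports EAGAIN.
 * Returns the number of bytes drained, or -1 on any other error. */
long drain_fd(int fd)
{
    char buf[512];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0)
            total += n;
        else if (n == 0)
            return total;                /* end of file */
        else if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;                /* drained: now safe to wait */
        else if (errno != EINTR)
            return -1;
    }
}

/* Replays steps 1-5 with EPOLLET. *second_wait receives the result of the
 * step 5 epoll_wait(2); *leftover receives the bytes still buffered. */
int edge_triggered_scenario(int *second_wait, long *leftover)
{
    int pfd[2], epfd, flags, rc = -1;
    char data[2048], buf[1024];
    struct epoll_event ev, out;

    if (pipe(pfd) < 0)
        return -1;
    epfd = epoll_create(1);
    memset(data, 'x', sizeof(data));
    flags = fcntl(pfd[0], F_GETFL, 0);
    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = pfd[0];

    if (epfd >= 0 && flags >= 0 &&
        fcntl(pfd[0], F_SETFL, flags | O_NONBLOCK) == 0 &&       /* rule i */
        epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == 0 &&      /* step 1 */
        write(pfd[1], data, sizeof(data)) == (ssize_t) sizeof(data) &&
        epoll_wait(epfd, &out, 1, 1000) == 1 &&                  /* step 3 */
        read(pfd[0], buf, sizeof(buf)) == (ssize_t) sizeof(buf)) {
        *second_wait = epoll_wait(epfd, &out, 1, 0);             /* step 5 */
        *leftover = drain_fd(pfd[0]);
        rc = 0;
    }
    if (epfd >= 0)
        close(epfd);
    close(pfd[0]);
    close(pfd[1]);
    return rc;
}
```

       On a current kernel the step 5 wait is expected to report nothing
       even though 1 kB is still buffered, which is the stall described
       above. Draining until EAGAIN before waiting again avoids it.
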
       By contrast, when used as a Level Triggered interface, epoll is by
       all means a faster poll(2), and can be used wherever the latter is
       used since it shares the same semantics. Since even with Edge
       Triggered epoll multiple events can be generated upon receipt of
       multiple chunks of data, the caller has the option to specify the
       EPOLLONESHOT flag, to tell epoll to disable the associated file
       descriptor after the receipt of an event with epoll_wait(2). When the
       EPOLLONESHOT flag is specified, it is the caller's responsibility to
       rearm the file descriptor using epoll_ctl(2) with EPOLL_CTL_MOD.

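       The EPOLLONESHOT behaviour can be observed with a sketch along these
       lines. The function name oneshot_scenario() and the pipe used as the
       event source are illustrative assumptions.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo: results[0] is the first wait, results[1] a second
 * wait while the fd is disabled, results[2] a wait after rearming.
 * Returns 0 on success, -1 on error. */
int oneshot_scenario(int results[3])
{
    int pfd[2], epfd, rc = -1;
    struct epoll_event ev, out;

    if (pipe(pfd) < 0)
        return -1;
    epfd = epoll_create(1);
    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = pfd[0];

    if (epfd >= 0 &&
        epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == 0 &&
        write(pfd[1], "x", 1) == 1) {
        /* The event is delivered once ... */
        results[0] = epoll_wait(epfd, &out, 1, 1000);
        /* ... then the fd is disabled, although the byte is unread. */
        results[1] = epoll_wait(epfd, &out, 1, 0);
        /* Rearm with EPOLL_CTL_MOD; the pending data is reported again. */
        if (epoll_ctl(epfd, EPOLL_CTL_MOD, pfd[0], &ev) == 0) {
            results[2] = epoll_wait(epfd, &out, 1, 0);
            rc = 0;
        }
    }
    if (epfd >= 0)
        close(epfd);
    close(pfd[0]);
    close(pfd[1]);
    return rc;
}
```
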
EXAMPLE FOR SUGGESTED USAGE
       While the usage of epoll when employed as a Level Triggered interface
       has the same semantics as poll(2), Edge Triggered usage requires more
       clarification to avoid stalls in the application event loop. In this
       example, listener is a non-blocking socket on which listen(2) has
       been called. The function do_use_fd() uses the new ready file
       descriptor until EAGAIN is returned by either read(2) or write(2). An
       event driven state machine application should, after having received
       EAGAIN, record its current state so that at the next call to
       do_use_fd() it will continue to read(2) or write(2) from where it
       stopped before.

       struct epoll_event ev, *events;
       struct sockaddr_storage local;
       socklen_t addrlen;
       int nfds, n, client;

       /* kdpfd is the epoll fd; events has room for maxevents entries. */
       for (;;) {
           nfds = epoll_wait(kdpfd, events, maxevents, -1);

           for (n = 0; n < nfds; ++n) {
               if (events[n].data.fd == listener) {
                   /* New connection: accept it and watch it with ET. */
                   addrlen = sizeof(local);
                   client = accept(listener, (struct sockaddr *) &local,
                                   &addrlen);
                   if (client < 0) {
                       perror("accept");
                       continue;
                   }
                   setnonblocking(client);
                   ev.events = EPOLLIN | EPOLLET;
                   ev.data.fd = client;
                   if (epoll_ctl(kdpfd, EPOLL_CTL_ADD, client, &ev) < 0) {
                       fprintf(stderr, "epoll set insertion error: fd=%d\n",
                               client);
                       return -1;
                   }
               } else {
                   /* Existing connection: use it until EAGAIN. */
                   do_use_fd(events[n].data.fd);
               }
           }
       }

       When used as an Edge Triggered interface, for performance reasons, it
       is possible to add the file descriptor to the epoll interface
       (EPOLL_CTL_ADD) once by specifying (EPOLLIN|EPOLLOUT). This allows you
       to avoid continuously switching between EPOLLIN and EPOLLOUT by
       calling epoll_ctl(2) with EPOLL_CTL_MOD.

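       A single registration covering both directions might look like the
       following sketch. The function name register_both_once() and the
       UNIX socket pair standing in for a network connection are
       assumptions.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: registers one end of a socket pair once with
 * EPOLLIN|EPOLLOUT|EPOLLET. seen[0] receives the event mask of the first
 * wait, seen[1] the mask reported after the peer has written a byte.
 * Returns 0 on success, -1 on error. */
int register_both_once(unsigned int seen[2])
{
    int sv[2], epfd, rc = -1;
    struct epoll_event ev, out;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    epfd = epoll_create(1);
    ev.events = EPOLLIN | EPOLLOUT | EPOLLET;   /* registered only once */
    ev.data.fd = sv[0];

    if (epfd >= 0 &&
        epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) == 0 &&
        epoll_wait(epfd, &out, 1, 1000) == 1) {
        seen[0] = out.events;            /* a fresh socket is writable */
        if (write(sv[1], "x", 1) == 1 &&
            epoll_wait(epfd, &out, 1, 1000) == 1) {
            seen[1] = out.events;        /* peer data makes it readable */
            rc = 0;
        }
    }
    if (epfd >= 0)
        close(epfd);
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

       No EPOLL_CTL_MOD call is needed between the two waits: the same
       registration reports writability first and readability later.
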
QUESTIONS AND ANSWERS (from linux-kernel)
       Q1     What happens if you add the same fd to an epoll set twice?

       A1     You will probably get EEXIST. However, it is possible that two
              threads may add the same fd twice. This is a harmless
              condition.

       Q2     Can two epoll sets wait for the same fd? If so, are events
              reported to both epoll set fds?

       A2     Yes, although it is not recommended. Events would be reported
              to both.

       Q3     Is the epoll fd itself poll/epoll/selectable?

       A3     Yes.

       Q4     What happens if the epoll fd is put into its own fd set?

       A4     It will fail. However, you can add an epoll fd inside another
              epoll fd set.

       Q5     Can I send the epoll fd over a unix-socket to another process?

       A5     No.

       Q6     Will the close of an fd cause it to be removed from all epoll
              sets automatically?

       A6     Yes.

       Q7     If more than one event comes in between epoll_wait(2) calls,
              are they combined or reported separately?

       A7     They will be combined.

       Q8     Does an operation on an fd affect the already collected but not
              yet reported events?

       A8     You can do two operations on an existing fd. Remove would be
              meaningless for this case. Modify will re-read available I/O.

       Q9     Do I need to continuously read/write an fd until EAGAIN when
              using the EPOLLET flag (Edge Triggered behaviour)?

       A9     No, you don't. Receiving an event from epoll_wait(2) should
              suggest to you that the file descriptor is ready for the
              requested I/O operation. You simply have to consider it ready
              until you receive the next EAGAIN. When and how you use the
              file descriptor is entirely up to you. Also, the condition
              that the read/write I/O space is exhausted can be detected by
              checking the amount of data read from or written to the target
              file descriptor. For example, if you call read(2) asking for a
              certain amount of data and read(2) returns a lower number of
              bytes, you can be sure to have exhausted the read I/O space
              for that file descriptor. The same is valid when writing using
              the write(2) function.

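       Answers A1, A3 and A4 can be checked with small probes such as the
       ones sketched below. All three function names are hypothetical, and
       the EINVAL result for A4 is what current kernels return, which this
       page does not itself specify.

```c
#include <errno.h>
#include <poll.h>
#include <sys/epoll.h>
#include <unistd.h>

/* A1: returns the errno of a second EPOLL_CTL_ADD for the same fd,
 * 0 if it unexpectedly succeeded, or -1 if the setup failed. */
int add_twice_errno(void)
{
    int pfd[2], epfd, err = -1;
    struct epoll_event ev;

    if (pipe(pfd) < 0)
        return -1;
    epfd = epoll_create(1);
    ev.events = EPOLLIN;
    ev.data.fd = pfd[0];
    if (epfd >= 0 && epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == 0)
        err = epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) < 0 ? errno : 0;
    if (epfd >= 0)
        close(epfd);
    close(pfd[0]);
    close(pfd[1]);
    return err;
}

/* A4: returns the errno of adding an epoll fd to its own set,
 * 0 if it unexpectedly succeeded, or -1 if the setup failed. */
int add_self_errno(void)
{
    int epfd = epoll_create(1), err;
    struct epoll_event ev;

    if (epfd < 0)
        return -1;
    ev.events = EPOLLIN;
    ev.data.fd = epfd;
    err = epoll_ctl(epfd, EPOLL_CTL_ADD, epfd, &ev) < 0 ? errno : 0;
    close(epfd);
    return err;
}

/* A3: returns 1 if poll(2) reports the epoll fd readable while an event
 * is pending in the set, 0 if not, or -1 if the setup failed. */
int epoll_fd_is_pollable(void)
{
    int pfd[2], epfd, res = -1;
    struct epoll_event ev;
    struct pollfd p;

    if (pipe(pfd) < 0)
        return -1;
    epfd = epoll_create(1);
    ev.events = EPOLLIN;
    ev.data.fd = pfd[0];
    if (epfd >= 0 &&
        epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == 0 &&
        write(pfd[1], "x", 1) == 1) {
        p.fd = epfd;
        p.events = POLLIN;
        p.revents = 0;
        if (poll(&p, 1, 1000) >= 0)
            res = (p.revents & POLLIN) != 0;
    }
    if (epfd >= 0)
        close(epfd);
    close(pfd[0]);
    close(pfd[1]);
    return res;
}
```
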
POSSIBLE PITFALLS AND WAYS TO AVOID THEM
       o Starvation (Edge Triggered)

       If there is a large amount of I/O space, it is possible that by trying
       to drain it the other files will not get processed, causing
       starvation. This is not specific to epoll.

       The solution is to maintain a ready list and mark the file descriptor
       as ready in its associated data structure, thereby allowing the
       application to remember which files need to be processed but still
       round robin amongst all the ready files. This also supports ignoring
       subsequent events you receive for fds that are already ready.

       o If using an event cache...

       If you use an event cache or store all the fds returned from
       epoll_wait(2), then make sure to provide a way to mark their closure
       dynamically (i.e. caused by a previous event's processing). Suppose
       you receive 100 events from epoll_wait(2), and in event #47 a
       condition causes event #13 to be closed. If you remove the structure
       and close() the fd for event #13, then your event cache might still
       say there are events waiting for that fd, causing confusion.

       One solution for this is to call, during the processing of event 47,
       epoll_ctl(EPOLL_CTL_DEL) to delete fd 13 and close(), then mark its
       associated data structure as removed and link it to a cleanup list.
       If you find another event for fd 13 in your batch processing, you
       will discover the fd had been previously removed and there will be no
       confusion.

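       The removed mark and the cleanup list could be sketched like this.
       The struct conn layout, its victim field (a stand-in for "a
       condition that causes another descriptor to be closed") and both
       function names are assumptions. Each event is expected to carry a
       heap-allocated struct conn in data.ptr.

```c
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-descriptor record, stored in epoll_event.data.ptr. */
struct conn {
    int          fd;
    int          removed;        /* set once the fd has been closed */
    struct conn *victim;         /* connection to close when this one is
                                    processed, or NULL */
    struct conn *next_cleanup;   /* link in the cleanup list */
};

static struct conn *cleanup_list;

/* Delete the fd from the epoll set, close it, and defer the free(). */
void conn_remove(int epfd, struct conn *c)
{
    struct epoll_event dummy;    /* old kernels reject a NULL pointer */

    if (c->removed)
        return;
    dummy.events = 0;
    epoll_ctl(epfd, EPOLL_CTL_DEL, c->fd, &dummy);
    close(c->fd);
    c->removed = 1;
    c->next_cleanup = cleanup_list;
    cleanup_list = c;
}

/* Process one batch from epoll_wait(2). Events that belong to a record
 * removed earlier in the same batch are skipped. Records are freed only
 * after the whole batch is done. Returns the number of events handled. */
int process_batch(int epfd, struct epoll_event *events, int nfds)
{
    int i, handled = 0;

    for (i = 0; i < nfds; i++) {
        struct conn *c = events[i].data.ptr;

        if (c->removed)
            continue;            /* stale event: fd already closed */
        handled++;               /* real I/O on c->fd would happen here */
        if (c->victim != NULL)
            conn_remove(epfd, c->victim);
    }
    while (cleanup_list != NULL) {
        struct conn *next = cleanup_list->next_cleanup;
        free(cleanup_list);
        cleanup_list = next;
    }
    return handled;
}
```

       Deferring free() until the batch is finished is what keeps the
       stale pointer in a later event safe to inspect.
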
CONFORMING TO
       epoll(4) is a new API introduced in Linux kernel 2.5.44. Its interface
       should be finalized in Linux kernel 2.5.66.

SEE ALSO
       epoll_create(2), epoll_ctl(2), epoll_wait(2)

Linux                           23 October 2002                       EPOLL(4)