# DNS-over-TLS query forwarder design

## Overview

The DNS-over-TLS query forwarder consists of five classes:
 * `DnsTlsDispatcher`
 * `DnsTlsTransport`
 * `DnsTlsQueryMap`
 * `DnsTlsSessionCache`
 * `DnsTlsSocket`

`DnsTlsDispatcher` is a singleton class whose `query` method is the `dns/` directory's
only public interface.  `DnsTlsDispatcher` is just a table holding the
`DnsTlsTransport` for each server (represented by a `DnsTlsServer` struct) and
network.  `DnsTlsDispatcher` also blocks each query thread, waiting on a
`std::future` returned by `DnsTlsTransport` that represents the response.

`DnsTlsTransport` sends each query over a `DnsTlsSocket`, opening a
new one if necessary.  It also has to listen for responses from the
`DnsTlsSocket`, which arrive on a different thread.
`IDnsTlsSocketObserver` is an interface defining how `DnsTlsSocket` returns
responses to `DnsTlsTransport`.

`DnsTlsQueryMap` and `DnsTlsSessionCache` are helper classes owned by `DnsTlsTransport`.
`DnsTlsQueryMap` handles ID renumbering and query-response pairing.
`DnsTlsSessionCache` allows TLS session resumption.

`DnsTlsSocket` interleaves all queries onto a single socket, and reports all
responses to `DnsTlsTransport` (through the `IDnsTlsSocketObserver` interface).  It doesn't
know anything about which queries correspond to which responses, and does not retain
state to indicate whether there is an outstanding query.
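
The table-plus-future pattern described for `DnsTlsDispatcher` can be illustrated with a
minimal sketch.  Everything here is hypothetical: `FakeTransport`, `DispatcherSketch`, the
string-typed server name, and the string payloads are stand-ins chosen for brevity, not the
real classes or signatures.  The fake transport fulfils its promise immediately, whereas a
real transport would presumably do so later from another thread.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for DnsTlsTransport.  It answers immediately; a real
// transport would fulfil the promise once the response arrives.
class FakeTransport {
  public:
    std::future<std::string> query(const std::string& q) {
        std::promise<std::string> promise;
        promise.set_value("response:" + q);
        return promise.get_future();
    }
};

// Hypothetical dispatcher sketch: one transport per (server, network) pair.
class DispatcherSketch {
  public:
    std::string query(const std::string& server, unsigned netId, const std::string& q) {
        assert(!server.empty());
        std::future<std::string> response;
        {
            // Hold the lock only while touching the table, not while waiting.
            std::lock_guard<std::mutex> guard(mLock);
            std::unique_ptr<FakeTransport>& transport =
                    mStore[std::make_pair(server, netId)];
            if (!transport) transport = std::make_unique<FakeTransport>();
            response = transport->query(q);
        }
        return response.get();  // Blocks the calling (query) thread.
    }

    std::size_t transportCount() {
        std::lock_guard<std::mutex> guard(mLock);
        return mStore.size();
    }

  private:
    using Key = std::pair<std::string, unsigned>;
    std::mutex mLock;
    std::map<Key, std::unique_ptr<FakeTransport>> mStore;
};
```

Two queries to the same server on the same network share one table entry, while a
different network ID gets its own.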

## Threading

### Overall patterns

For clarity, each of the five classes in this design is thread-safe and holds one lock.
Classes that spawn a helper thread call `thread::join()` in their destructor to ensure
that it is cleaned up appropriately.

All the classes here make full use of Clang thread annotations (and also null-pointer
annotations) to minimize the likelihood of a latent threading bug.  The unit tests are
also heavily threaded to exercise this functionality.

This code creates O(1) threads per socket, and does not create a new thread for each
query or response.  However, bionic's stub resolver does create a thread for each query.

### Threading in `DnsTlsSocket`

`DnsTlsSocket` can receive queries on any thread, and sends them over a
"reliable datagram pipe" (`socketpair()` in `SOCK_SEQPACKET` mode).
The query method writes a struct (containing a pointer to the query) to the pipe
from its thread, and the loop thread (which owns the SSL socket)
reads off the other end of the pipe.  The pipe doesn't actually have a queue "inside";
instead, any queueing happens by blocking the query thread until the
socket thread can read the datagram off the other end.

We need to pass messages between threads using a pipe, and not a condition variable
or a thread-safe queue, because the socket thread has to be blocked
in `select()` waiting for data from the server, but also has to be woken
up on inputs from the query threads.  Therefore, inputs from the query
threads have to arrive on a socket, so that `select()` can listen for them.
(There can only be a single thread because [you can't use different threads
to read and write in OpenSSL](https://www.openssl.org/blog/blog/2017/02/21/threads/)).
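
The hand-off described above can be sketched on Linux as follows.  This is an
illustration under stated assumptions, not the real `DnsTlsSocket`: the class name
`LoopThreadSketch`, the `Message` struct, and the callback are hypothetical, the TLS
socket is omitted, and the sketch heap-allocates a copy of each query so that the pointer
in the datagram stays valid until the loop thread takes ownership of it.

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <stdexcept>
#include <thread>
#include <utility>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Hypothetical sketch: query threads hand work to one loop thread over a socketpair.
class LoopThreadSketch {
  public:
    explicit LoopThreadSketch(std::function<void(const Bytes&)> onQuery)
        : mOnQuery(std::move(onQuery)) {
        int fds[2];
        if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, fds) != 0) {
            throw std::runtime_error("socketpair() failed");
        }
        mQueryFd = fds[0];
        mLoopFd = fds[1];
        mLoopThread = std::thread(&LoopThreadSketch::loop, this);
    }

    ~LoopThreadSketch() {
        close(mQueryFd);     // The loop thread sees end-of-file and exits.
        mLoopThread.join();  // Join the helper thread in the destructor.
        close(mLoopFd);
    }

    // Callable from any thread.  Sends one datagram holding a pointer to a heap copy.
    bool query(const Bytes& q) {
        auto copy = std::make_unique<Bytes>(q);
        Message msg{copy.get()};
        if (write(mQueryFd, &msg, sizeof(msg)) != static_cast<ssize_t>(sizeof(msg))) {
            return false;  // Nothing was handed over, so `copy` is freed here.
        }
        copy.release();  // Ownership now belongs to the loop thread.
        return true;
    }

  private:
    struct Message {
        Bytes* query;
    };

    void loop() {
        while (true) {
            fd_set readFds;
            FD_ZERO(&readFds);
            FD_SET(mLoopFd, &readFds);  // Real code would also watch the TLS socket.
            if (select(mLoopFd + 1, &readFds, nullptr, nullptr, nullptr) < 0) return;
            Message msg;
            if (read(mLoopFd, &msg, sizeof(msg)) != static_cast<ssize_t>(sizeof(msg))) {
                return;  // End-of-file (query end closed) or an error.
            }
            assert(msg.query != nullptr);
            std::unique_ptr<Bytes> owned(msg.query);
            mOnQuery(*owned);  // Runs on the loop thread.
        }
    }

    std::function<void(const Bytes&)> mOnQuery;
    int mQueryFd = -1;
    int mLoopFd = -1;
    std::thread mLoopThread;
};
```

Because the loop thread blocks in `select()` on a file descriptor, the same call could
wait on a second descriptor for server data, which is the reason the text gives for
preferring a socket over a condition variable.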

## ID renumbering

`DnsTlsDispatcher` accepts queries that have colliding ID numbers and still sends them on
a single socket.  To avoid confusion at the server, `DnsTlsQueryMap` assigns each
query a new ID for transmission, records the mapping from input IDs to sent IDs, and
applies the inverse mapping to responses before returning them to the caller.

`DnsTlsQueryMap` assigns each new query the ID number one greater than the largest
ID number of an outstanding query.  This means that ID numbers are initially sequential
and usually small.  If the largest possible ID number is already in use,
`DnsTlsQueryMap` will scan the ID space to find an available ID, or fail the query
if there are no available IDs.  Queries will not block waiting for an ID number to
become available.
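
The assignment rule above can be sketched as follows.  This is a simplified
illustration, not the real `DnsTlsQueryMap`: the class name `IdRenumberer` and its
methods are hypothetical, locking is omitted, and starting at ID 0 when nothing is
outstanding is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical sketch of ID renumbering; names and the starting ID are assumptions.
class IdRenumberer {
  public:
    // Returns the ID to send on the wire, or nullopt if every 16-bit ID is in use.
    std::optional<uint16_t> assign(uint16_t originalId) {
        std::optional<uint16_t> sentId = nextFreeId();
        if (sentId) mOutstanding[*sentId] = originalId;
        return sentId;
    }

    // Applies the inverse mapping to a response and retires the sent ID.
    std::optional<uint16_t> resolve(uint16_t sentId) {
        auto it = mOutstanding.find(sentId);
        if (it == mOutstanding.end()) return std::nullopt;
        uint16_t originalId = it->second;
        mOutstanding.erase(it);
        return originalId;
    }

  private:
    std::optional<uint16_t> nextFreeId() const {
        if (mOutstanding.empty()) return uint16_t{0};
        // std::map is ordered, so the last key is the largest outstanding ID.
        uint16_t maxId = mOutstanding.rbegin()->first;
        if (maxId < UINT16_MAX) return static_cast<uint16_t>(maxId + 1);
        if (mOutstanding.size() > UINT16_MAX) return std::nullopt;  // All 65536 in use.
        // The largest ID is taken but a gap exists: scan upward for the first one.
        assert(mOutstanding.size() <= UINT16_MAX);
        uint16_t candidate = 0;
        for (const auto& entry : mOutstanding) {
            if (entry.first != candidate) return candidate;
            ++candidate;
        }
        return std::nullopt;  // Not reached while the size check above holds.
    }

    std::map<uint16_t, uint16_t> mOutstanding;  // Sent ID -> original ID.
};
```

Two callers that both use ID 7 receive distinct sent IDs, and each response is mapped
back to 7 before being returned.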

## Time constants

`DnsTlsSocket` imposes a 20-second inactivity timeout.  A socket that has been idle for
20 seconds will be closed.  This sets the limit of tolerance for slow replies,
which could happen as a result of malfunctioning authoritative DNS servers.
If there are any pending queries, `DnsTlsTransport` will retry them.

`DnsTlsQueryMap` imposes a retry limit of 3.  `DnsTlsTransport` will retry the query up
to 3 times before reporting failure to `DnsTlsDispatcher`.
This limit helps to ensure proper functioning in the case of a recursive resolver that
is malfunctioning or is flooded with requests that are stalled due to malfunctioning
authoritative servers.

`DnsTlsDispatcher` maintains a 5-minute timeout.  Any `DnsTlsTransport` that has had no
outstanding queries for 5 minutes will be destroyed at the next query on a different
transport.
This sets the limit on how long session tickets will be preserved during idle periods,
because each `DnsTlsTransport` owns a `DnsTlsSessionCache`.  Imposing this timeout
increases latency on the first query after an idle period, but also helps to avoid
unbounded memory usage.
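
The 5-minute rule can be sketched as a lazy cleanup that runs when a query starts.
The names `IdleTransportTable`, `Entry`, `beginQuery`, and `endQuery` are hypothetical,
the string key stands in for the server and network pair, and the clock is passed in
explicitly so the behaviour can be checked without waiting.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

using Clock = std::chrono::steady_clock;
constexpr std::chrono::minutes kIdleTimeout{5};

// Hypothetical bookkeeping for one transport.
struct Entry {
    int useCount = 0;              // Queries currently outstanding.
    Clock::time_point lastUsed{};  // When the most recent query finished.
};

// Hypothetical sketch: idle entries are dropped when a query starts on another key.
class IdleTransportTable {
  public:
    void beginQuery(const std::string& key, Clock::time_point now) {
        cleanup(key, now);
        ++mStore[key].useCount;
    }

    void endQuery(const std::string& key, Clock::time_point now) {
        auto it = mStore.find(key);
        assert(it != mStore.end() && it->second.useCount > 0);
        --it->second.useCount;
        it->second.lastUsed = now;
    }

    bool contains(const std::string& key) const { return mStore.count(key) > 0; }

  private:
    // Erases every entry except `activeKey` that has no outstanding queries
    // and has been idle for longer than kIdleTimeout.
    void cleanup(const std::string& activeKey, Clock::time_point now) {
        for (auto it = mStore.begin(); it != mStore.end();) {
            const Entry& entry = it->second;
            bool idle = entry.useCount == 0 && now - entry.lastUsed > kIdleTimeout;
            if (idle && it->first != activeKey) {
                it = mStore.erase(it);
            } else {
                ++it;
            }
        }
    }

    std::map<std::string, Entry> mStore;
};
```

No timer thread is involved in this sketch: an idle entry survives until some other
transport is queried, which matches the "at the next query on a different transport"
wording.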

`DnsTlsSessionCache` sets a limit of 5 sessions in each cache, expiring the oldest one
when the limit is reached.  However, because the client code does not currently
reuse sessions more than once, it should not be possible to hit this limit.
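
A bounded cache with oldest-first expiry can be sketched as below.  This is only an
illustration: `BoundedSessionCache` is a hypothetical name, sessions are modelled as
strings rather than libssl session objects, and handing out the newest session first
and removing it on use are assumptions of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical sketch of a session cache capped at five entries.
class BoundedSessionCache {
  public:
    static constexpr std::size_t kMaxSize = 5;

    // Stores a new session, expiring the oldest one if the cache is over the limit.
    void recordSession(std::string session) {
        std::lock_guard<std::mutex> guard(mLock);
        mSessions.push_front(std::move(session));
        if (mSessions.size() > kMaxSize) mSessions.pop_back();
        assert(mSessions.size() <= kMaxSize);
    }

    // Removes and returns the newest session, or nullopt if the cache is empty.
    std::optional<std::string> getSession() {
        std::lock_guard<std::mutex> guard(mLock);
        if (mSessions.empty()) return std::nullopt;
        std::string session = std::move(mSessions.front());
        mSessions.pop_front();
        return session;
    }

    std::size_t size() {
        std::lock_guard<std::mutex> guard(mLock);
        return mSessions.size();
    }

  private:
    std::mutex mLock;
    std::deque<std::string> mSessions;  // Newest at the front, oldest at the back.
};
```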

## Testing

Unit tests are in `../tests/dns_tls_test.cpp`.  They cover all the classes except
`DnsTlsSocket` (which requires `CAP_NET_ADMIN` because it uses `setsockopt(SO_MARK)`) and
`DnsTlsSessionCache` (which requires integration with libssl).  These classes are
exercised by the integration tests in `../tests/netd_test.cpp`.

### Dependency Injection

For unit testing, we would like to be able to mock out `DnsTlsSocket`.  This is
particularly required for unit testing of `DnsTlsDispatcher` and `DnsTlsTransport`.
To make these unit tests possible, this code uses a dependency injection pattern:
`DnsTlsSocket` is produced by a `DnsTlsSocketFactory`, and both of these have a
defined interface.

`DnsTlsDispatcher`'s constructor takes an `IDnsTlsSocketFactory`,
which in production is a `DnsTlsSocketFactory`.  However, in unit tests, we can
substitute a test factory that returns a fake socket, so that the unit tests can
run without actually connecting over TLS to a test server.  (The integration tests
do actual TLS.)
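
The injection pattern can be sketched with deliberately simplified interfaces.
`ISocket`, `ISocketFactory`, `ISocketObserver`, and `TransportSketch` are hypothetical
names with made-up signatures; they only mirror the roles of the real interfaces and
should not be read as their actual declarations.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Hypothetical, simplified interfaces.
class ISocketObserver {
  public:
    virtual ~ISocketObserver() = default;
    virtual void onResponse(Bytes response) = 0;
};

class ISocket {
  public:
    virtual ~ISocket() = default;
    virtual bool query(const Bytes& q) = 0;
};

class ISocketFactory {
  public:
    virtual ~ISocketFactory() = default;
    virtual std::unique_ptr<ISocket> createSocket(ISocketObserver* observer) = 0;
};

// Test double: echoes each query straight back to the observer, with no TLS.
class FakeSocket : public ISocket {
  public:
    explicit FakeSocket(ISocketObserver* observer) : mObserver(observer) {
        assert(observer != nullptr);
    }
    bool query(const Bytes& q) override {
        mObserver->onResponse(q);
        return true;
    }

  private:
    ISocketObserver* mObserver;
};

class FakeSocketFactory : public ISocketFactory {
  public:
    std::unique_ptr<ISocket> createSocket(ISocketObserver* observer) override {
        return std::make_unique<FakeSocket>(observer);
    }
};

// Code under test: depends only on the interfaces, never on a concrete socket.
class TransportSketch : public ISocketObserver {
  public:
    explicit TransportSketch(ISocketFactory* factory)
        : mSocket(factory->createSocket(this)) {}

    bool send(const Bytes& q) { return mSocket->query(q); }
    void onResponse(Bytes response) override { mResponses.push_back(std::move(response)); }
    const std::vector<Bytes>& responses() const { return mResponses; }

  private:
    std::unique_ptr<ISocket> mSocket;
    std::vector<Bytes> mResponses;
};
```

A production build would pass a factory that creates real sockets, while a unit test
passes `FakeSocketFactory` and inspects `responses()`.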

## Logging

This code uses `ALOGV` throughout for low-priority logging, and does not use
`ALOGD`.  `ALOGV` is disabled by default, unless activated by `#define LOG_NDEBUG 0`.
(`ALOGD` is not disabled by default, requiring extra measures to avoid spamming the
system log in production builds.)

## Reference
 * [BoringSSL API docs](https://commondatastorage.googleapis.com/chromium-boringssl-docs/headers.html)