# Using TensorFlow Securely

This document discusses how to safely deal with untrusted programs (models or
model parameters) and input data. Below, we also provide guidelines on how to
report vulnerabilities in TensorFlow.

## TensorFlow models are programs

TensorFlow's runtime system interprets and executes programs. What machine
learning practitioners term
[**models**](https://developers.google.com/machine-learning/glossary/#model) are
expressed as programs that TensorFlow executes. TensorFlow programs are encoded
as computation
[**graphs**](https://developers.google.com/machine-learning/glossary/#graph).
The model's parameters are often stored separately in **checkpoints**.

At runtime, TensorFlow executes the computation graph using the parameters
provided. Note that the behavior of the computation graph may change
depending on the parameters provided. TensorFlow itself is not a sandbox. When
executing the computation graph, TensorFlow may read and write files, send and
receive data over the network, and even spawn additional processes. All these
tasks are performed with the permissions of the TensorFlow process. Allowing
for this flexibility makes for a powerful machine learning platform,
but it has implications for security.

The computation graph may also accept **inputs**. Those inputs are the
data you supply to TensorFlow to train a model, or to use a model to run
inference on the data.

**TensorFlow models are programs, and need to be treated as such from a security
perspective.**

## Running untrusted models

As a general rule: **Always** execute untrusted models inside a sandbox (e.g.,
[nsjail](https://github.com/google/nsjail)).
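Short of a full sandbox, one cheap defense-in-depth measure is to cap the
resources of the process that executes an untrusted model. Below is a minimal
sketch for POSIX systems, assuming the model runs in a child process; the
`run_limited` helper and the chosen limit are illustrative, not TensorFlow API,
and this only limits memory — unlike nsjail, it does not confine filesystem or
network access:

```python
import resource
import subprocess
import sys

def run_limited(cmd, max_address_space=2 << 30):
    """Run cmd in a child process whose address space is capped (POSIX only).

    This is defense in depth, *not* a substitute for a real sandbox such as
    nsjail, which also restricts filesystem and network access.
    """
    def set_limits():
        # Applied in the child between fork() and exec().
        resource.setrlimit(resource.RLIMIT_AS,
                           (max_address_space, max_address_space))
    return subprocess.run(cmd, preexec_fn=set_limits)

# Example: run a (stand-in) model script under the limit.
result = run_limited([sys.executable, "-c", "print('model ran')"])
```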

There are several ways in which a model could become untrusted. Obviously, if an
untrusted party supplies TensorFlow kernels, arbitrary code may be executed.
The same is true if the untrusted party provides Python code, such as the
Python code that generates TensorFlow graphs.

Even if the untrusted party only supplies the serialized computation
graph (in the form of a `GraphDef`, `SavedModel`, or equivalent on-disk format),
the set of computation primitives available to TensorFlow is powerful enough
that you should assume the TensorFlow process effectively executes arbitrary
code. One common solution is to whitelist only a few safe Ops. While this is
possible in theory, we still recommend you sandbox the execution.
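As a sketch of what such a whitelist check could look like: given the op names
collected from the nodes of a serialized graph, reject the graph if any op falls
outside the list. The op set and helper name below are hypothetical, and
passing this check alone does not make execution safe:

```python
# Hypothetical whitelist; a real one must be chosen for your own use case.
ALLOWED_OPS = {"Const", "Placeholder", "Identity", "MatMul", "Add", "Relu"}

def disallowed_ops(op_names):
    """Given op names collected from a graph's nodes, return the set of ops
    not on the whitelist. An empty set means the graph passes this check;
    sandboxed execution is still recommended."""
    return set(op_names) - ALLOWED_OPS

# Example: a graph that tries to read files fails the check.
print(disallowed_ops(["Const", "MatMul", "ReadFile"]))
```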
     49 
     50 It depends on the computation graph whether a user provided checkpoint is safe.
     51 It is easily possible to create computation graphs in which malicious
     52 checkpoints can trigger unsafe behavior. For example, consider a graph that
     53 contains a `tf.cond` depending on the value of a `tf.Variable`. One branch of
     54 the `tf.cond` is harmless, but the other is unsafe. Since the `tf.Variable` is
     55 stored in the checkpoint, whoever provides the checkpoint now has the ability to
     56 trigger unsafe behavior, even though the graph is not under their control.
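A pure-Python analogy of this pattern (deliberately not TensorFlow API): the
"graph" below is fixed and looks harmless, yet a value restored from the
"checkpoint" decides which branch runs, so whoever writes the checkpoint
controls the behavior. All names here are made up for illustration:

```python
import json

def run_graph(checkpoint_path):
    # Analogous to restoring a tf.Variable's value from a checkpoint.
    with open(checkpoint_path) as f:
        params = json.load(f)
    # Analogous to a tf.cond whose predicate is the restored variable.
    if params["use_debug_dump"]:
        # Unsafe branch: writes to whatever path the checkpoint names.
        with open(params["dump_path"], "w") as f:
            f.write(str(params["weights"]))
        return None
    # Harmless branch: plain arithmetic on the restored weights.
    return sum(params["weights"])
```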

In other words, graphs can contain vulnerabilities of their own. To allow users
to provide checkpoints to a model you run on their behalf (e.g., in order to
compare model quality for a fixed model architecture), you must carefully audit
your model, and we recommend you run the TensorFlow process in a sandbox.

## Accepting untrusted inputs

It is possible to write models that are secure in the sense that they can safely
process untrusted inputs, assuming there are no bugs. There are two main reasons
not to rely on this: first, it is easy to write models which must not be exposed
to untrusted inputs, and second, there are bugs in any software system of
sufficient complexity. Letting users control inputs could allow them to trigger
bugs either in TensorFlow or in dependent libraries.

In general, it is good practice to isolate the parts of any system exposed
to untrusted (e.g., user-provided) inputs in a sandbox.

A useful analogy for how any TensorFlow graph is executed is any interpreted
programming language, such as Python. While it is possible to write secure
Python code which can be exposed to user-supplied inputs (by, e.g., carefully
quoting and sanitizing input strings, size-checking input blobs, etc.), it is
very easy to write Python programs which are insecure. Even secure Python code
could be rendered insecure by a bug in the Python interpreter, or by a bug in a
Python library it uses (e.g.,
[this one](https://www.cvedetails.com/cve/CVE-2017-12852/)).
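In that spirit, here is a minimal sketch of the kind of validation the analogy
describes, applied before data ever reaches a graph. The size cap and field
names are invented for illustration:

```python
MAX_BLOB_BYTES = 1 << 20  # hypothetical size cap for one input blob

def validate_input(blob, label):
    """Basic size- and range-checks before feeding data to a model."""
    if not isinstance(blob, bytes) or len(blob) > MAX_BLOB_BYTES:
        raise ValueError("input blob missing or too large")
    if not isinstance(label, int) or not 0 <= label < 1000:
        raise ValueError("label out of range")
    return blob, label
```

Checks like these reduce the attack surface, but as the text notes, they do not
replace sandboxing: a bug below the validation layer can still be triggered.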

## Running a TensorFlow server

TensorFlow is a platform for distributed computing, and as such there is a
TensorFlow server (`tf.train.Server`). **The TensorFlow server is meant for
internal communication only. It is not built for use in an untrusted network.**

For performance reasons, the default TensorFlow server does not include any
authorization protocol and sends messages unencrypted. It accepts connections
from anywhere, and executes the graphs it is sent without performing any checks.
Therefore, if you run a `tf.train.Server` in your network, anybody with
access to the network can execute what you should consider arbitrary code with
the privileges of the process running the `tf.train.Server`.

When running distributed TensorFlow, you must isolate the network in which the
cluster lives. Cloud providers offer instructions for setting up isolated
networks, which are sometimes branded as "virtual private cloud." Refer to the
instructions for
[GCP](https://cloud.google.com/compute/docs/networks-and-firewalls) and
[AWS](https://aws.amazon.com/vpc/) for details.

Note that `tf.train.Server` is different from the server created by
`tensorflow/serving` (the default binary for which is called `ModelServer`).
By default, `ModelServer` also has no built-in mechanism for authentication.
Connecting it to an untrusted network allows anyone on this network to run the
graphs known to the `ModelServer`. This means that an attacker may run
graphs using untrusted inputs as described above, but they would not be able to
execute arbitrary graphs. It is possible to safely expose a `ModelServer`
directly to an untrusted network, **but only if the graphs it is configured to
use have been carefully audited to be safe**.

Similar to best practices for other servers, we recommend running any
`ModelServer` with appropriate privileges (i.e., using a separate user with
reduced permissions). In the spirit of defense in depth, we recommend
authenticating requests to any TensorFlow server connected to an untrusted
network, as well as sandboxing the server to minimize the adverse effects of
any breach.
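Since neither server has authentication built in, one common way to add the
recommended request authentication is a shared-secret HMAC check in a proxy
placed in front of the server. A minimal sketch, with illustrative key and
helper names (not part of TensorFlow or TensorFlow Serving):

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-randomly-generated-secret"  # illustrative only

def sign_request(body):
    """Compute the signature a trusted client attaches to its request."""
    return hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest()

def verify_request(body, signature):
    """Reject requests whose signature does not match. compare_digest
    runs in constant time, avoiding timing side channels."""
    return hmac.compare_digest(sign_request(body), signature)
```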

## Vulnerabilities in TensorFlow

TensorFlow is a large and complex system. It also depends on a large set of
third-party libraries (e.g., `numpy`, `libjpeg-turbo`, PNG parsers, `protobuf`).
It is possible that TensorFlow or its dependent libraries contain
vulnerabilities that would allow triggering unexpected or dangerous behavior
with specially crafted inputs.

### What is a vulnerability?

Given TensorFlow's flexibility, it is possible to specify computation graphs
which exhibit unexpected or unwanted behavior. The fact that TensorFlow models
can perform arbitrary computations means that they may read and write files,
communicate via the network, produce deadlocks and infinite loops, or run out
of memory. It is only when these behaviors are outside the specifications of the
operations involved that such behavior is a vulnerability.

A `FileWriter` writing a file is not unexpected behavior and therefore is not a
vulnerability in TensorFlow. A `MatMul` allowing arbitrary binary code execution
**is** a vulnerability.

This is more subtle from a system perspective. For example, it is easy to cause
a TensorFlow process to try to allocate more memory than available by specifying
a computation graph containing an ill-considered `tf.tile` operation. TensorFlow
should exit cleanly in this case (it would raise an exception in Python, or
return an error `Status` in C++). However, if the surrounding system is not
expecting this possibility, such behavior could be used in a denial of service
attack (or worse). Because TensorFlow behaves correctly, this is not a
vulnerability in TensorFlow (although it would be a vulnerability of this
hypothetical system).
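A sketch of what "expecting this possibility" means for the surrounding system,
using a made-up stand-in for graph execution (`run_model` is a placeholder for
illustration, not TensorFlow API):

```python
def run_model(batch):
    """Stand-in for executing a graph: an oversized request fails to
    allocate, much as an ill-considered tf.tile would."""
    if len(batch) > 1_000_000:
        raise MemoryError("requested allocation too large")
    return [x * 2 for x in batch]

def serve_request(batch):
    """The surrounding system treats allocation failure as a rejected
    request, instead of letting one request take the whole process down."""
    try:
        return run_model(batch)
    except MemoryError:
        return None
```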

As a general rule, it is incorrect behavior for TensorFlow to access memory it
does not own, or to terminate in an unclean way. Bugs in TensorFlow that lead to
such behaviors constitute a vulnerability.

One of the most critical parts of any system is input handling. If malicious
input can trigger side effects or incorrect behavior, this is a bug, and likely
a vulnerability.

### Reporting vulnerabilities

Please email reports about any security-related issues you find to
`security@tensorflow.org`. This mail is delivered to a small security team. Your
email will be acknowledged within one business day, and you'll receive a more
detailed response within 7 days indicating the next steps in
handling your report. For critical problems, you may encrypt your report (see
below).

Please use a descriptive subject line for your report email. After the initial
reply to your report, the security team will endeavor to keep you informed of
the progress being made towards a fix and announcement.

In addition, please include the following information along with your report:

* Your name and affiliation (if any).
* A description of the technical details of the vulnerabilities. It is very
  important to let us know how we can reproduce your findings.
* An explanation of who can exploit this vulnerability, and what they gain when
  doing so -- write an attack scenario. This will help us evaluate your report
  quickly, especially if the issue is complex.
* Whether this vulnerability is public or known to third parties. If it is,
  please provide details.

If you believe that an existing (public) issue is security-related, please send
an email to `security@tensorflow.org`. The email should include the issue ID and
a short description of why it should be handled according to this security
policy.

Once an issue is reported, TensorFlow uses the following disclosure process:

* When a report is received, we confirm the issue and determine its severity.
* If we know of specific third-party services or software based on TensorFlow
  that require mitigation before publication, those projects will be notified.
* An advisory is prepared (but not published) which details the problem and
  steps for mitigation.
* Wherever possible, fixes are prepared for the last minor release of the two
  latest major releases, as well as the master branch. We will attempt to
  commit these fixes as soon as possible, and as close together as
  possible.
* Patch releases are published for all fixed released versions, a
  notification is sent to discuss@tensorflow.org, and the advisory is
  published.

Past security advisories are listed below. We credit reporters for identifying
security issues, although we keep your name confidential if you request it.

#### Encryption key for `security@tensorflow.org`

If your disclosure is extremely sensitive, you may choose to encrypt your
report using the key below. Please only use this for critical security
reports.

```
-----BEGIN PGP PUBLIC KEY BLOCK-----

mQENBFpqdzwBCADTeAHLNEe9Vm77AxhmGP+CdjlY84O6DouOCDSq00zFYdIU/7aI
LjYwhEmDEvLnRCYeFGdIHVtW9YrVktqYE9HXVQC7nULU6U6cvkQbwHCdrjaDaylP
aJUXkNrrxibhx9YYdy465CfusAaZ0aM+T9DpcZg98SmsSml/HAiiY4mbg/yNVdPs
SEp/Ui4zdIBNNs6at2gGZrd4qWhdM0MqGJlehqdeUKRICE/mdedXwsWLM8AfEA0e
OeTVhZ+EtYCypiF4fVl/NsqJ/zhBJpCx/1FBI1Uf/lu2TE4eOS1FgmIqb2j4T+jY
e+4C8kGB405PAC0n50YpOrOs6k7fiQDjYmbNABEBAAG0LVRlbnNvckZsb3cgU2Vj
dXJpdHkgPHNlY3VyaXR5QHRlbnNvcmZsb3cub3JnPokBTgQTAQgAOBYhBEkvXzHm
gOJBnwP4Wxnef3wVoM2yBQJaanc8AhsDBQsJCAcCBhUKCQgLAgQWAgMBAh4BAheA
AAoJEBnef3wVoM2yNlkIAICqetv33MD9W6mPAXH3eon+KJoeHQHYOuwWfYkUF6CC
o+X2dlPqBSqMG3bFuTrrcwjr9w1V8HkNuzzOJvCm1CJVKaxMzPuXhBq5+DeT67+a
T/wK1L2R1bF0gs7Pp40W3np8iAFEh8sgqtxXvLGJLGDZ1Lnfdprg3HciqaVAiTum
HBFwszszZZ1wAnKJs5KVteFN7GSSng3qBcj0E0ql2nPGEqCVh+6RG/TU5C8gEsEf
3DX768M4okmFDKTzLNBm+l08kkBFt+P43rNK8dyC4PXk7yJa93SmS/dlK6DZ16Yw
2FS1StiZSVqygTW59rM5XNwdhKVXy2mf/RtNSr84gSi5AQ0EWmp3PAEIALInfBLR
N6fAUGPFj+K3za3PeD0fWDijlC9f4Ety/icwWPkOBdYVBn0atzI21thPRbfuUxfe
zr76xNNrtRRlbDSAChA1J5T86EflowcQor8dNC6fS+oHFCGeUjfEAm16P6mGTo0p
osdG2XnnTHOOEFbEUeWOwR/zT0QRaGGknoy2pc4doWcJptqJIdTl1K8xyBieik/b
nSoClqQdZJa4XA3H9G+F4NmoZGEguC5GGb2P9NHYAJ3MLHBHywZip8g9oojIwda+
OCLL4UPEZ89cl0EyhXM0nIAmGn3Chdjfu3ebF0SeuToGN8E1goUs3qSE77ZdzIsR
BzZSDFrgmZH+uP0AEQEAAYkBNgQYAQgAIBYhBEkvXzHmgOJBnwP4Wxnef3wVoM2y
BQJaanc8AhsMAAoJEBnef3wVoM2yX4wIALcYZbQhSEzCsTl56UHofze6C3QuFQIH
J4MIKrkTfwiHlCujv7GASGU2Vtis5YEyOoMidUVLlwnebE388MmaJYRm0fhYq6lP
A3vnOCcczy1tbo846bRdv012zdUA+wY+mOITdOoUjAhYulUR0kiA2UdLSfYzbWwy
7Obq96Jb/cPRxk8jKUu2rqC/KDrkFDtAtjdIHh6nbbQhFuaRuWntISZgpIJxd8Bt
Gwi0imUVd9m9wZGuTbDGi6YTNk0GPpX5OMF5hjtM/objzTihSw9UN+65Y/oSQM81
v//Fw6ZeY+HmRDFdirjD7wXtIuER4vqCryIqR6Xe9X8oJXz9L/Jhslc=
=CDME
-----END PGP PUBLIC KEY BLOCK-----
```

### Known Vulnerabilities

For a list of known vulnerabilities and security advisories for TensorFlow,
[click here](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/security/index.md).
    249