# Using TensorFlow Securely

This document discusses how to safely deal with untrusted programs (models or
model parameters) and input data. Below, we also provide guidelines on how to
report vulnerabilities in TensorFlow.

## TensorFlow models are programs

TensorFlow's runtime system interprets and executes programs. What machine
learning practitioners term
[**models**](https://developers.google.com/machine-learning/glossary/#model) are
expressed as programs that TensorFlow executes. TensorFlow programs are encoded
as computation
[**graphs**](https://developers.google.com/machine-learning/glossary/#graph).
The model's parameters are often stored separately in **checkpoints**.

At runtime, TensorFlow executes the computation graph using the parameters
provided. Note that the behavior of the computation graph may change depending
on the parameters provided. TensorFlow itself is not a sandbox. When executing
the computation graph, TensorFlow may read and write files, send and receive
data over the network, and even spawn additional processes. All these tasks are
performed with the permissions of the TensorFlow process. Allowing for this
flexibility makes for a powerful machine learning platform, but it has
implications for security.

The computation graph may also accept **inputs**. Those inputs are the data you
supply to TensorFlow to train a model, or to use a model to run inference on
the data.

**TensorFlow models are programs, and need to be treated as such from a
security perspective.**

## Running untrusted models

As a general rule: **always** execute untrusted models inside a sandbox (e.g.,
[nsjail](https://github.com/google/nsjail)).

There are several ways in which a model could become untrusted. Obviously, if
an untrusted party supplies TensorFlow kernels, arbitrary code may be executed.
The same is true if the untrusted party provides Python code, such as the
Python code that generates TensorFlow graphs.

Even if the untrusted party only supplies the serialized computation graph (in
the form of a `GraphDef`, `SavedModel`, or equivalent on-disk format), the set
of computation primitives available to TensorFlow is powerful enough that you
should assume that the TensorFlow process effectively executes arbitrary code.
One common solution is to whitelist only a few safe Ops. While this is possible
in theory, we still recommend you sandbox the execution.

Whether a user-provided checkpoint is safe depends on the computation graph: it
is easy to create computation graphs in which malicious checkpoints can trigger
unsafe behavior. For example, consider a graph that contains a `tf.cond`
depending on the value of a `tf.Variable`. One branch of the `tf.cond` is
harmless, but the other is unsafe. Since the `tf.Variable` is stored in the
checkpoint, whoever provides the checkpoint now has the ability to trigger
unsafe behavior, even though the graph is not under their control.

In other words, graphs can contain vulnerabilities of their own. To allow users
to provide checkpoints to a model you run on their behalf (e.g., in order to
compare model quality for a fixed model architecture), you must carefully audit
your model, and we recommend you run the TensorFlow process in a sandbox.

## Accepting untrusted inputs

It is possible to write models that are secure in the sense that they can
safely process untrusted inputs, assuming there are no bugs. There are two main
reasons not to rely on this: first, it is easy to write models which must not
be exposed to untrusted inputs, and second, there are bugs in any software
system of sufficient complexity. Letting users control inputs could allow them
to trigger bugs either in TensorFlow or in dependent libraries.
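To make the sandbox recommendation concrete, an invocation of an untrusted
model under nsjail might look like the following sketch. The flags, user and
group IDs, paths, and script names here are illustrative assumptions, not a
vetted configuration; consult the nsjail documentation for settings appropriate
to your system.

```shell
# Hypothetical sketch: run an untrusted model under nsjail.
# All flags, IDs, and paths are illustrative.
nsjail -Mo \
    --chroot / \
    --user 99999 --group 99999 \
    --time_limit 600 \
    -- /usr/bin/python3 run_model.py untrusted_model.pb
```

Running as an unprivileged user with a time limit reduces what a malicious
graph can do, but the exact namespace, filesystem, and network restrictions you
need depend on your deployment.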
In general, it is good practice to isolate parts of any system which are
exposed to untrusted (e.g., user-provided) inputs in a sandbox.

A useful analogy for how a TensorFlow graph is executed is an interpreted
programming language, such as Python. While it is possible to write secure
Python code which can be exposed to user-supplied inputs (by, e.g., carefully
quoting and sanitizing input strings, size-checking input blobs, etc.), it is
very easy to write Python programs which are insecure. Even secure Python code
could be rendered insecure by a bug in the Python interpreter, or by a bug in a
Python library it uses (e.g.,
[this one](https://www.cvedetails.com/cve/CVE-2017-12852/)).

## Running a TensorFlow server

TensorFlow is a platform for distributed computing, and as such there is a
TensorFlow server (`tf.train.Server`). **The TensorFlow server is meant for
internal communication only. It is not built for use in an untrusted network.**

For performance reasons, the default TensorFlow server does not include any
authorization protocol and sends messages unencrypted. It accepts connections
from anywhere, and executes the graphs it is sent without performing any
checks. Therefore, if you run a `tf.train.Server` in your network, anybody with
access to the network can execute what you should consider arbitrary code with
the privileges of the process running the `tf.train.Server`.

When running distributed TensorFlow, you must isolate the network in which the
cluster lives. Cloud providers offer instructions for setting up isolated
networks, which are sometimes branded as "virtual private cloud." Refer to the
instructions for
[GCP](https://cloud.google.com/compute/docs/networks-and-firewalls) and
[AWS](https://aws.amazon.com/vpc/) for details.
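As an illustrative sketch of such an isolated network on GCP, the steps below
create a dedicated VPC network and allow traffic to the TensorFlow server port
only from within the cluster's own address range. All names, the region, the
port, and the IP ranges are hypothetical; adapt them to your deployment.

```shell
# Hypothetical sketch: a dedicated VPC network for a TensorFlow cluster.
# Names, region, port, and ranges are illustrative.
gcloud compute networks create tf-cluster-net --subnet-mode=custom
gcloud compute networks subnets create tf-cluster-subnet \
    --network=tf-cluster-net --region=us-central1 --range=10.128.0.0/20
# Permit the TensorFlow server port only from the cluster's own range.
gcloud compute firewall-rules create tf-cluster-internal \
    --network=tf-cluster-net --allow=tcp:2222 --source-ranges=10.128.0.0/20
```

On GCP, ingress to a custom network is denied unless a firewall rule allows it,
so the single rule above is the only path to the server port; AWS offers
equivalent constructs via security groups inside a VPC.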
Note that `tf.train.Server` is different from the server created by
`tensorflow/serving` (the default binary for which is called `ModelServer`).
By default, `ModelServer` also has no built-in mechanism for authentication.
Connecting it to an untrusted network allows anyone on this network to run the
graphs known to the `ModelServer`. This means that an attacker may run graphs
using untrusted inputs as described above, but they would not be able to
execute arbitrary graphs. It is possible to safely expose a `ModelServer`
directly to an untrusted network, **but only if the graphs it is configured to
use have been carefully audited to be safe**.

Similar to best practices for other servers, we recommend running any
`ModelServer` with appropriate privileges (i.e., using a separate user with
reduced permissions). In the spirit of defense in depth, we recommend
authenticating requests to any TensorFlow server connected to an untrusted
network, as well as sandboxing the server to minimize the adverse effects of
any breach.

## Vulnerabilities in TensorFlow

TensorFlow is a large and complex system. It also depends on a large set of
third-party libraries (e.g., `numpy`, `libjpeg-turbo`, PNG parsers,
`protobuf`). It is possible that TensorFlow or its dependent libraries contain
vulnerabilities that would allow triggering unexpected or dangerous behavior
with specially crafted inputs.

### What is a vulnerability?

Given TensorFlow's flexibility, it is possible to specify computation graphs
which exhibit unexpected or unwanted behavior. The fact that TensorFlow models
can perform arbitrary computations means that they may read and write files,
communicate via the network, produce deadlocks and infinite loops, or run out
of memory.
It is only when these behaviors are outside the specifications of the
operations involved that such behavior is a vulnerability.

A `FileWriter` writing a file is not unexpected behavior and therefore is not a
vulnerability in TensorFlow. A `MatMul` allowing arbitrary binary code
execution **is** a vulnerability.

This is more subtle from a system perspective. For example, it is easy to cause
a TensorFlow process to try to allocate more memory than is available by
specifying a computation graph containing an ill-considered `tf.tile`
operation. TensorFlow should exit cleanly in this case (it would raise an
exception in Python, or return an error `Status` in C++). However, if the
surrounding system does not expect this possibility, such behavior could be
used in a denial-of-service attack (or worse). Because TensorFlow behaves
correctly, this is not a vulnerability in TensorFlow (although it would be a
vulnerability of this hypothetical system).

As a general rule, it is incorrect behavior for TensorFlow to access memory it
does not own, or to terminate in an unclean way. Bugs in TensorFlow that lead
to such behaviors constitute a vulnerability.

One of the most critical parts of any system is input handling. If malicious
input can trigger side effects or incorrect behavior, this is a bug, and likely
a vulnerability.

### Reporting vulnerabilities

Please email reports about any security-related issues you find to
`security@tensorflow.org`. This mail is delivered to a small security team.
Your email will be acknowledged within one business day, and you'll receive a
more detailed response to your email within 7 days indicating the next steps in
handling your report. For critical problems, you may encrypt your report (see
below).

Please use a descriptive subject line for your report email.
After the initial reply to your report, the security team will endeavor to keep
you informed of the progress being made towards a fix and announcement.

In addition, please include the following information along with your report:

* Your name and affiliation (if any).
* A description of the technical details of the vulnerabilities. It is very
  important to let us know how we can reproduce your findings.
* An explanation of who can exploit this vulnerability, and what they gain when
  doing so -- write an attack scenario. This will help us evaluate your report
  quickly, especially if the issue is complex.
* Whether this vulnerability is public or known to third parties. If it is,
  please provide details.

If you believe that an existing (public) issue is security-related, please send
an email to `security@tensorflow.org`. The email should include the issue ID
and a short description of why it should be handled according to this security
policy.

Once an issue is reported, TensorFlow uses the following disclosure process:

* When a report is received, we confirm the issue and determine its severity.
* If we know of specific third-party services or software based on TensorFlow
  that require mitigation before publication, those projects will be notified.
* An advisory is prepared (but not published) which details the problem and
  steps for mitigation.
* Wherever possible, fixes are prepared for the last minor release of the two
  latest major releases, as well as the master branch. We will attempt to
  commit these fixes as soon as possible, and as close together as possible.
* Patch releases are published for all fixed released versions, a notification
  is sent to discuss@tensorflow.org, and the advisory is published.

Past security advisories are listed below.
We credit reporters for identifying security issues, although we keep your name
confidential if you request it.

#### Encryption key for `security@tensorflow.org`

If your disclosure is extremely sensitive, you may choose to encrypt your
report using the key below. Please only use this for critical security
reports.

```
-----BEGIN PGP PUBLIC KEY BLOCK-----

mQENBFpqdzwBCADTeAHLNEe9Vm77AxhmGP+CdjlY84O6DouOCDSq00zFYdIU/7aI
LjYwhEmDEvLnRCYeFGdIHVtW9YrVktqYE9HXVQC7nULU6U6cvkQbwHCdrjaDaylP
aJUXkNrrxibhx9YYdy465CfusAaZ0aM+T9DpcZg98SmsSml/HAiiY4mbg/yNVdPs
SEp/Ui4zdIBNNs6at2gGZrd4qWhdM0MqGJlehqdeUKRICE/mdedXwsWLM8AfEA0e
OeTVhZ+EtYCypiF4fVl/NsqJ/zhBJpCx/1FBI1Uf/lu2TE4eOS1FgmIqb2j4T+jY
e+4C8kGB405PAC0n50YpOrOs6k7fiQDjYmbNABEBAAG0LVRlbnNvckZsb3cgU2Vj
dXJpdHkgPHNlY3VyaXR5QHRlbnNvcmZsb3cub3JnPokBTgQTAQgAOBYhBEkvXzHm
gOJBnwP4Wxnef3wVoM2yBQJaanc8AhsDBQsJCAcCBhUKCQgLAgQWAgMBAh4BAheA
AAoJEBnef3wVoM2yNlkIAICqetv33MD9W6mPAXH3eon+KJoeHQHYOuwWfYkUF6CC
o+X2dlPqBSqMG3bFuTrrcwjr9w1V8HkNuzzOJvCm1CJVKaxMzPuXhBq5+DeT67+a
T/wK1L2R1bF0gs7Pp40W3np8iAFEh8sgqtxXvLGJLGDZ1Lnfdprg3HciqaVAiTum
HBFwszszZZ1wAnKJs5KVteFN7GSSng3qBcj0E0ql2nPGEqCVh+6RG/TU5C8gEsEf
3DX768M4okmFDKTzLNBm+l08kkBFt+P43rNK8dyC4PXk7yJa93SmS/dlK6DZ16Yw
2FS1StiZSVqygTW59rM5XNwdhKVXy2mf/RtNSr84gSi5AQ0EWmp3PAEIALInfBLR
N6fAUGPFj+K3za3PeD0fWDijlC9f4Ety/icwWPkOBdYVBn0atzI21thPRbfuUxfe
zr76xNNrtRRlbDSAChA1J5T86EflowcQor8dNC6fS+oHFCGeUjfEAm16P6mGTo0p
osdG2XnnTHOOEFbEUeWOwR/zT0QRaGGknoy2pc4doWcJptqJIdTl1K8xyBieik/b
nSoClqQdZJa4XA3H9G+F4NmoZGEguC5GGb2P9NHYAJ3MLHBHywZip8g9oojIwda+
OCLL4UPEZ89cl0EyhXM0nIAmGn3Chdjfu3ebF0SeuToGN8E1goUs3qSE77ZdzIsR
BzZSDFrgmZH+uP0AEQEAAYkBNgQYAQgAIBYhBEkvXzHmgOJBnwP4Wxnef3wVoM2y
BQJaanc8AhsMAAoJEBnef3wVoM2yX4wIALcYZbQhSEzCsTl56UHofze6C3QuFQIH
J4MIKrkTfwiHlCujv7GASGU2Vtis5YEyOoMidUVLlwnebE388MmaJYRm0fhYq6lP
A3vnOCcczy1tbo846bRdv012zdUA+wY+mOITdOoUjAhYulUR0kiA2UdLSfYzbWwy
7Obq96Jb/cPRxk8jKUu2rqC/KDrkFDtAtjdIHh6nbbQhFuaRuWntISZgpIJxd8Bt
Gwi0imUVd9m9wZGuTbDGi6YTNk0GPpX5OMF5hjtM/objzTihSw9UN+65Y/oSQM81
v//Fw6ZeY+HmRDFdirjD7wXtIuER4vqCryIqR6Xe9X8oJXz9L/Jhslc=
=CDME
-----END PGP PUBLIC KEY BLOCK-----
```

### Known Vulnerabilities

For a list of known vulnerabilities and security advisories for TensorFlow,
[click here](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/security/index.md).