# TensorFlow Lite for Microcontrollers

This is an experimental port of TensorFlow Lite aimed at microcontrollers and
other devices with only kilobytes of memory. It doesn't require any operating
system support, any standard C or C++ libraries, or dynamic memory allocation,
so it's designed to be portable even to 'bare metal' systems. The core runtime
fits in 16KB on a Cortex M3, and with enough operators to run a speech keyword
detection model, takes up a total of 22KB.

## Table of Contents

-   [Getting Started](#getting-started)

    *   [Getting Started with Portable Reference Code](#getting-started-with-portable-reference-code)
    *   [Building Portable Reference Code using Make](#building-portable-reference-code-using-make)
    *   [Building for the "Blue Pill" STM32F103 using Make](#building-for-the-blue-pill-stm32f103-using-make)
    *   [Building for "Hifive1" SiFive FE310 development board using Make](#building-for-hifive1-sifive-fe310-development-board-using-make)
    *   [Building for Ambiq Micro Apollo3Blue EVB using Make](#building-for-ambiq-micro-apollo3blue-evb-using-make)
        *   [Additional Apollo3 Instructions](#additional-apollo3-instructions)
    *   [Building for the Eta Compute ECM3531 EVB using Make](#building-for-the-eta-compute-ecm3531-evb-using-make)

-   [Goals](#goals)

-   [Generating Project Files](#generating-project-files)

-   [How to Port TensorFlow Lite Micro to a New Platform](#how-to-port-tensorflow-lite-micro-to-a-new-platform)

    *   [Requirements](#requirements)
    *   [Getting Started](#getting-started-1)
    *   [Troubleshooting](#troubleshooting)
    *   [Optimizing for your Platform](#optimizing-for-your-platform)
    *   [Code Module Organization](#code-module-organization)
    *   [Working with Generated Projects](#working-with-generated-projects)
    *   [Supporting a Platform with Makefiles](#supporting-a-platform-with-makefiles)
    *   [Supporting a Platform with Emulation Testing](#supporting-a-platform-with-emulation-testing)
    *   [Implementing More Optimizations](#implementing-more-optimizations)

# Getting Started

One of the challenges of embedded software development is that there are a lot
of different architectures, devices, operating systems, and build systems. We
aim to support as many of the popular combinations as we can, and to make it as
easy as possible to add support for others.

If you're a product developer, we have build instructions or pre-generated
project files that you can download for the following platforms:

Device                                                                                          | Mbed                                                                           | Keil                                                                           | Make/GCC
----------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------ | ------------------------------------------------------------------------------ | --------
[STM32F746G Discovery Board](https://www.st.com/en/evaluation-tools/32f746gdiscovery.html)      | [Download](https://drive.google.com/open?id=1OtgVkytQBrEYIpJPsE8F6GUKHPBS3Xeb) | -                                                                              | [Download](https://drive.google.com/open?id=1u46mTtAMZ7Y1aD-He1u3R8AE4ZyEpnOl)
["Blue Pill" STM32F103-compatible development board](https://github.com/google/stm32_bare_lib)  | -                                                                              | -                                                                              | [Instructions](#building-for-the-blue-pill-stm32f103-using-make)
[Ambiq Micro Apollo3 Blue EVB](https://ambiqmicro.com/apollo-ultra-low-power-mcus/)             | -                                                                              | -                                                                              | [Instructions](#building-for-ambiq-micro-apollo3blue-evb-using-make)
[Generic Keil uVision Projects](http://www2.keil.com/mdk5/uvision/)                             | -                                                                              | [Download](https://drive.google.com/open?id=1Lw9rsdquNKObozClLPoE5CTJLuhfh5mV) | -
[Eta Compute ECM3531 EVB](https://etacompute.com/)                                              | -                                                                              | -                                                                              | [Instructions](#building-for-the-eta-compute-ecm3531-evb-using-make)

If your device is not yet supported, it may not be too hard to add support. You
can learn about that process
[here](#how-to-port-tensorflow-lite-micro-to-a-new-platform). We're looking
forward to getting your help expanding this table!

## Getting Started with Portable Reference Code

If you don't have a particular microcontroller platform in mind yet, or just
want to try out the code before beginning porting, the easiest way to begin is
by
[downloading the platform-agnostic reference code](https://drive.google.com/open?id=1cawEQAkqquK_SO4crReDYqf_v7yAwOY8).
You'll see a series of folders inside the archive, with each one containing just
the source files you need to build one binary. There is a simple Makefile for
each folder, but you should be able to load the files into almost any IDE and
build them. There's also a [Visual Studio Code](https://code.visualstudio.com/)
project file already set up, so you can easily explore the code in a
cross-platform IDE.

## Building Portable Reference Code using Make

It's easy to build portable reference code directly from GitHub using make if
you're on a Linux or OS X machine.

-   Open a terminal
-   Download the TensorFlow source with `git clone
    https://github.com/tensorflow/tensorflow.git`
-   Enter the source root directory by running `cd tensorflow`
-   Download the dependencies by running
    `tensorflow/lite/experimental/micro/tools/make/download_dependencies.sh`.
    This may take a few minutes
-   Build and test the library with `make -f
    tensorflow/lite/experimental/micro/tools/make/Makefile test`

You should see a series of compilation steps, followed by `~~~ALL TESTS
PASSED~~~` for the various tests of the code that it will run. If there's an
error, you should get an informative message from make about what went wrong.

These tests are all built as simple binaries with few dependencies, so you can
run them manually. For example, here's how to run the depthwise convolution
test, and its output:

```
tensorflow/lite/experimental/micro/tools/make/gen/linux_x86_64/bin/tensorflow/lite/experimental/micro/kernels/depthwise_conv_test

Testing SimpleTest
Testing SimpleTestQuantized
Testing SimpleTestRelu
Testing SimpleTestReluQuantized
4/4 tests passed
~~~ALL TESTS PASSED~~~
```

Looking at the
[depthwise_conv_test.cc](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/micro/kernels/depthwise_conv_test.cc)
code, you'll see a sequence that looks like this:

```
...
TF_LITE_MICRO_TESTS_BEGIN

TF_LITE_MICRO_TEST(SimpleTest) {
...
}
...
TF_LITE_MICRO_TESTS_END
```

These macros work a lot like
[the Google test framework](https://github.com/google/googletest), but they
don't require any dependencies and just write results to stderr, rather than
aborting the program. If all the tests pass, then `~~~ALL TESTS PASSED~~~` is
output, and the test harness that runs the binary during the make process knows
that everything ran correctly. If there's an error, the lack of the expected
string lets the harness know that the test failed.
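
To make this concrete, here's a minimal sketch of a test file written against
these macros, assuming the `micro_test.h` header from
`tensorflow/lite/experimental/micro/testing`; the test name and the values
being checked are purely illustrative:

```
#include "tensorflow/lite/experimental/micro/testing/micro_test.h"

TF_LITE_MICRO_TESTS_BEGIN

// Each test logs its name, runs its body, and reports any failures to the
// debug console instead of aborting the program.
TF_LITE_MICRO_TEST(ArithmeticSanityCheck) {
  const int sum = 2 + 2;
  TF_LITE_MICRO_EXPECT_EQ(4, sum);
}

TF_LITE_MICRO_TESTS_END
```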

So, why are we running tests in this complicated way? So far, we've been
building binaries that run locally on the Mac OS or Linux machine you're
building on, but this approach becomes important when we're targeting simple
microcontroller devices.

## Building for the "Blue Pill" STM32F103 using Make

The goal of this library is to enable machine learning on resource-constrained
microcontrollers and DSPs, and as part of that we've targeted the
["Blue Pill" STM32F103-compatible development board](https://github.com/google/stm32_bare_lib)
as a cheap and popular platform. It only has 20KB of RAM and 64KB of flash, so
it's a good device to ensure we can run efficiently on small chips.

It's fairly easy to
[buy and wire up a physical board](https://github.com/google/stm32_bare_lib#wiring-up-your-blue-pill),
but even if you don't have an actual device, the
[Renode project](https://renode.io/) makes it easy to run a faithful emulation
on your desktop machine. You'll need [Docker](https://www.docker.com/)
installed, but once you have that set up, try running the following command:

`make -f tensorflow/lite/experimental/micro/tools/make/Makefile TARGET=bluepill test`

You should see a similar set of outputs as you did in the previous section, with
the addition of some extra Docker logging messages. These appear because we're
using Docker to run the Renode microcontroller emulation tool, and the tests
themselves are being run on a simulated STM32F103 device. The communication
channels between an embedded device and the host are quite limited, so the test
harness looks at the output of the debug log to see if tests have passed, just
as it did in the previous section. This makes it a very flexible way to run
cross-platform tests, even when a platform has no operating system facilities,
as long as it can output debugging text logs.

To understand what's happening here, try running the same depthwise convolution
test, but through the emulated device test harness, with the following command:

```
tensorflow/lite/experimental/micro/testing/test_bluepill_binary.sh \
tensorflow/lite/experimental/micro/tools/make/gen/bluepill_cortex-m3/bin/tensorflow/lite/experimental/micro/kernels/depthwise_conv_test \
'~~~ALL TESTS PASSED~~~'
```

You should see output that looks something like this:

```
Sending build context to Docker daemon   21.5kB
Step 1/2 : FROM antmicro/renode:latest
 ---> 1b670a243e8f
Step 2/2 : LABEL maintainer="Pete Warden <petewarden@google.com>"
 ---> Using cache
 ---> 3afcd410846d
Successfully built 3afcd410846d
Successfully tagged renode_bluepill:latest
LOGS:
...
03:27:32.4340 [INFO] machine-0: Machine started.
03:27:32.4790 [DEBUG] cpu.uartSemihosting: [+0.22s host +0s virt 0s virt from start] Testing SimpleTest
03:27:32.4812 [DEBUG] cpu.uartSemihosting: [+2.21ms host +0s virt 0s virt from start]   Testing SimpleTestQuantized
03:27:32.4833 [DEBUG] cpu.uartSemihosting: [+2.14ms host +0s virt 0s virt from start]   Testing SimpleTestRelu
03:27:32.4834 [DEBUG] cpu.uartSemihosting: [+0.18ms host +0s virt 0s virt from start]   Testing SimpleTestReluQuantized
03:27:32.4838 [DEBUG] cpu.uartSemihosting: [+0.4ms host +0s virt 0s virt from start]   4/4 tests passed
03:27:32.4839 [DEBUG] cpu.uartSemihosting: [+41s host +0s virt 0s virt from start]   ~~~ALL TESTS PASSED~~~
03:27:32.4839 [DEBUG] cpu.uartSemihosting: [+5s host +0s virt 0s virt from start]
...
tensorflow/lite/experimental/micro/tools/make/gen/bluepill_cortex-m3/bin/tensorflow/lite/experimental/micro/kernels/depthwise_conv_test: PASS
```

There's a lot of output here, but you should be able to see that the same tests
that were covered when we ran locally on the development machine show up in the
debug logs here, along with the magic string `~~~ALL TESTS PASSED~~~`. This is
the exact same code as before, just compiled and run on the STM32F103 rather
than your desktop. We hope that the simplicity of this testing approach will
help make adding support for new platforms as easy as possible.

## Building for "Hifive1" SiFive FE310 development board using Make

We've targeted the
["HiFive1" Arduino-compatible development board](https://www.sifive.com/boards/hifive1)
as a test platform for RISC-V MCUs.

Similar to the Blue Pill setup, you will need Docker installed. The binary can
be executed on either a HiFive1 board or emulated using the
[Renode project](https://renode.io/) on your desktop machine.

The following command builds a Docker image and copies the source files it
needs into it: `docker build -t riscv_build -f
{PATH_TO_TENSORFLOW_ROOT_DIR}/tensorflow/lite/experimental/micro/testing/Dockerfile.riscv
{PATH_TO_TENSORFLOW_ROOT_DIR}/tensorflow/lite/experimental/micro/testing/`

You should see output that looks something like this:

```
Sending build context to Docker daemon  28.16kB
Step 1/4 : FROM antmicro/renode:latest
 ---> 19c08590e817
Step 2/4 : LABEL maintainer="Pete Warden <petewarden@google.com>"
 ---> Using cache
 ---> 5a7770d3d3f5
Step 3/4 : RUN apt-get update
 ---> Using cache
 ---> b807ab77eeb1
Step 4/4 : RUN apt-get install -y curl git unzip make g++
 ---> Using cache
 ---> 8da1b2aa2438
Successfully built 8da1b2aa2438
Successfully tagged riscv_build:latest
```

To build the micro_speech_test binary:

-   Launch the Docker image that we just created using: `docker run -it -v
    /tmp/copybara_out:/workspace riscv_build:latest bash`
-   Enter the source root directory by running `cd /workspace`
-   Download the dependencies by running
    `./tensorflow/lite/experimental/micro/tools/make/download_dependencies.sh`.
    This may take a few minutes.
-   Set the path to the RISC-V tools: `export
    PATH=${PATH}:/workspace/tensorflow/lite/experimental/micro/tools/make/downloads/riscv_toolchain/bin/`
-   Build the binary: `make -f
    tensorflow/lite/experimental/micro/tools/make/Makefile TARGET=riscv32_mcu`

Launch Renode to test the binary; currently this setup is not automated.

-   Execute the binary on Renode: `renode -P 5000 --disable-xwt -e 's
    @/workspace/tensorflow/lite/experimental/micro/testing/sifive_fe310.resc'`

You should see the following log with the magic string `~~~ALL TESTS PASSED~~~`:

```
02:25:22.2059 [DEBUG] uart0: [+17.25s host +80ms virt 80ms virt from start] core freq at 0 Hz
02:25:22.2065 [DEBUG] uart0: [+0.61ms host +0s virt 80ms virt from start]   Testing TestInvoke
02:25:22.4243 [DEBUG] uart0: [+0.22s host +0.2s virt 0.28s virt from start]   Ran successfully
02:25:22.4244 [DEBUG] uart0: [+42s host +0s virt 0.28s virt from start]
02:25:22.4245 [DEBUG] uart0: [+0.15ms host +0s virt 0.28s virt from start]   1/1 tests passed
02:25:22.4247 [DEBUG] uart0: [+62s host +0s virt 0.28s virt from start]   ~~~ALL TESTS PASSED~~~
02:25:22.4251 [DEBUG] uart0: [+8s host +0s virt 0.28s virt from start]
02:25:22.4252 [DEBUG] uart0: [+0.39ms host +0s virt 0.28s virt from start]
02:25:22.4253 [DEBUG] uart0: [+0.16ms host +0s virt 0.28s virt from start]   Progam has exited with code:0x00000000
```

## Building for Ambiq Micro Apollo3Blue EVB using Make

Follow these steps to get the pushbutton yes/no example working on Apollo 3:

1.  Make sure to run the "Building Portable Reference Code using Make" section
    before performing the following steps.
2.  The Ambiq Micro SDK is downloaded into
    `tensorflow/lite/experimental/micro/tools/make/downloads` by
    `download_dependencies.sh`.
3.  Compile the project with the following command: `make -f
    tensorflow/lite/experimental/micro/tools/make/Makefile TARGET=apollo3evb
    pushbutton_cmsis_speech_test_bin`
4.  Install the [Segger JLink tools](https://www.segger.com/downloads/jlink/).
5.  Connect the Apollo3 EVB (with the mic shield in slot 3 of the Microbus
    Shield board) to the computer and power it on.
6.  Start the GDB server in a new terminal with the following command:
    `JLinkGDBServer -select USB -device AMA3B1KK-KBR -endian little -if SWD
    -speed 1000 -noir -noLocalhostOnly`
    1.  The command has run successfully if you see the message "Waiting for GDB
        connection".
7.  Back in the original terminal, run the program via the debugger:
    1.  Navigate to
        `tensorflow/lite/experimental/micro/examples/micro_speech/apollo3`.
    2.  Start gdb by entering the following command: `arm-none-eabi-gdb`
    3.  Run the command script by entering the following command: `source
        pushbutton_cmsis_scores.cmd`. This script does the following:
        1.  Load the binary created in step 3
        2.  Set a breakpoint after inference scores have been computed
        3.  Tell the debugger what variables should be printed out at this
            breakpoint
        4.  Begin program execution
        5.  Press Ctrl+c to exit
    4.  Press BTN2. An LED will flash for 1 second. Speak your utterance during
        this one second.
    5.  The debugger will print out four numbers. They are the probabilities
        for:
        1.  no speech
        2.  unknown speech
        3.  yes
        4.  no
    6.  The EVB LEDs will indicate detection:
        1.  LED0 (rightmost LED) - ON when capturing 1sec of audio
        2.  LED1 - ON when detecting silence
        3.  LED2 - ON when detecting UNKNOWN utterance
        4.  LED3 - ON when detecting YES utterance
        5.  LED4 (leftmost LED) - ON when detecting NO utterance

### Additional Apollo3 Instructions

To flash a part with JFlash Lite, do the following:

1.  At the command line: JFlashLiteExe
2.  Device = AMA3B1KK-KBR
3.  Interface = SWD at 1000 kHz
4.  Data file =
    `tensorflow/lite/experimental/micro/tools/make/gen/apollo3evb_cortex-m4/bin/pushbutton_cmsis_speech_test.bin`
5.  Prog Addr = 0x0000C000

## Building for the Eta Compute ECM3531 EVB using Make

1.  Follow the instructions at
    [Tensorflow Micro Speech](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/experimental/micro/examples/micro_speech#getting-started)
    to download the TensorFlow source code and the support libraries (but do
    not run the make command shown there).
2.  Download the Eta Compute SDK, version 0.0.17. Contact info@etacompute.com
3.  You will need the Arm compiler arm-none-eabi-gcc, version 7.3.1 20180622,
    release ARM/embedded-7-branch revision 261907, 7-2018-q2-update. This
    compiler is downloaded when you run the
    `tensorflow/lite/experimental/micro/tools/make/download_dependencies.sh`
    script.
4.  Edit the file
    `tensorflow/lite/experimental/micro/tools/make/targets/ecm3531_makefile.inc`
    so that the variables ETA_SDK and GCC_ARM point to the correct directories.
5.  Compile the code with the command `make -f
    tensorflow/lite/experimental/micro/tools/make/Makefile TARGET=ecm3531
    TAGS="CMSIS" test`. This will produce a set of executables in the
    `tensorflow/lite/experimental/micro/tools/make/gen/ecm3531_cortex-m3/bin`
    directory.
6.  To load an executable into SRAM:
    1.  Start ocd
    2.  `cd tensorflow/lite/experimental/micro/tools/make/targets/ecm3531`
    3.  Run `./load_program name_of_executable`, e.g.
        `./load_program audio_provider_test`
    4.  Start PuTTY (Connection type = Serial, Speed = 115200, Data bits = 8,
        Stop bits = 1, Parity = None). The following output should appear: \
        Testing TestAudioProvider \
        Testing TestTimer \
        2/2 tests passed \
        \~\~\~ALL TESTS PASSED\~\~\~ \
        Execution time (msec) = 7
7.  To load into flash:
    1.  Edit the variable ETA_LDS_FILE in
        `tensorflow/lite/experimental/micro/tools/make/targets/ecm3531_makefile.inc`
        to point to the ecm3531_flash.lds file
    2.  Recompile (`make -f
        tensorflow/lite/experimental/micro/tools/make/Makefile TARGET=ecm3531
        TAGS="CMSIS" test`)
    3.  `cd tensorflow/lite/experimental/micro/tools/make/targets/ecm3531`
    4.  Run `./flash_program executable_name` to load into flash.

## Goals

The design goals are for the framework to be:

-   **Readable**: We want embedded software engineers to be able to understand
    what's required to run ML inference without having to study research
    papers. We've tried to keep the code base small and modular, and to provide
    reference implementations of all operations to help with this.

-   **Easy to modify**: We know that there are a lot of different platforms and
    requirements in the embedded world, and we don't expect to cover all of them
    in one framework. Instead, we're hoping that it can be a good starting point
    for developers to build on top of to meet their own needs. For example, we
    tried to make it easy to replace the implementations of key computational
    operators that are often crucial for performance, without having to touch
    the data flow and other runtime code. We want it to make more sense to use
    our workflow to handle things like model import and less-important
    operations, and customize the parts that matter, rather than having to
    reimplement everything in your own engine.

-   **Well-tested**: If you're modifying code, you need to know if your changes
    are correct. Having an easy way to test lets you develop much faster. To
    help there, we've written tests for all the components, and we've made sure
    that the tests can be run on almost any platform, with no dependencies apart
    from the ability to log text to a debug console somewhere. We also provide
    an easy way to run all the tests on-device as part of an automated test
    framework, and we use qemu/Renode emulation so that tests can be run even
    without physical devices present.

-   **Easy to integrate**: We want to be as open a system as possible, and use
    the best code available for each platform. To do that, we're going to rely
    on projects like
    [CMSIS-NN](https://www.keil.com/pack/doc/CMSIS/NN/html/index.html),
    [uTensor](https://github.com/uTensor/uTensor), and other vendor libraries to
    handle as much performance-critical code as possible. We know that there are
    an increasing number of options to accelerate neural networks on
    microcontrollers, so we're aiming to be a good host for deploying those
    hardware technologies too.

-   **Compatible**: We're using the same file schema, interpreter API, and
    kernel interface as regular TensorFlow Lite, so we leverage the large
    existing set of tools, documentation, and examples for the project. The
    biggest barrier to deploying ML models is getting them from a training
    environment into a form that's easy to run inference on, so we see reusing
    this rich ecosystem as being crucial to being easily usable. We also hope to
    integrate this experimental work back into the main codebase in the future.

To meet those goals, we've made some tradeoffs:

-   **Simple C++**: To help with readability, our code is written in a modern
    version of C++, but we generally treat it as a "better C", rather than
    relying on more complex features such as template meta-programming. As
    mentioned earlier, we avoid any use of dynamic memory allocation
    (new/delete) or the standard C/C++ libraries, so we believe this should
    still be fairly portable. It does mean that some older devices with C-only
    toolchains won't be supported, but we're hoping that the reference operator
    implementations (which are simple C-like functions) can still be useful in
    those cases. The interfaces are also designed to be C-only, so it should be
    possible to integrate the resulting library with pure C projects.

-   **Interpreted**: Code generation is a popular pattern for embedded code,
    because it gives standalone code that's easy to modify and step through, but
    we've chosen to go with an interpreted approach. In our internal
    microcontroller work we've found that using an extremely stripped-down
    interpreter with almost no dependencies gives us a lot of the same
    advantages, but is easier to maintain. For example, when new updates come
    out for the underlying library, you can just merge your local modifications
    in a single step, rather than having to regenerate new code and then patch
    in any changes you subsequently made. The coarse granularity of the
    interpreted primitives means that each operation call typically takes
    hundreds of thousands of instruction cycles at least, so we don't see
    noticeable performance gains from avoiding what's essentially a single
    switch statement at the interpreter level to call each operation. We're
    still working on improving the packaging, though; for example, we're
    considering having the ability to snapshot all the source files and headers
    used for a particular model, being able to compile the code and data
    together as a library, and then access it through a minimal set of C
    interface calls which hide the underlying complexity. (A sketch of the
    current interpreter API follows this list.)

-   **Flatbuffers**: We represent our models using
    [the standard flatbuffer schema used by the rest of TensorFlow Lite](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/schema/schema.fbs),
    with the difference that we always keep it in read-only program memory
    (typically flash) rather than relying on having a file system to read it
    from. This is a good fit because flatbuffer's serialized format is designed
    to be mapped into memory without requiring any extra memory allocations or
    modifications to access it. All of the functions to read model values work
    directly on the serialized bytes, and large sections of data like weights
    are directly accessible as sequential C-style arrays of their data type,
    with no strides or unpacking needed. We do get a lot of value from using
    flatbuffers, but there is a cost in complexity. The flatbuffer library code
    is all inline
    [inside the main headers](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/schema/schema_generated.h),
    but it isn't straightforward to inspect the implementations, and the model
    data structures aren't easy to comprehend from the debugger. The header for
    the schema itself also has to be periodically updated when new information
    is added to the file format, though we try to handle that transparently for
    most developers by checking in a pre-generated version.

-   **Code Duplication**: Some of the code in this prototype largely duplicates
    the logic in other parts of the TensorFlow Lite code base, for example the
    operator wrappers. We've tried to share as much as we can between the
    two interpreters, but there are some assumptions built into the original
    runtime that make this difficult. We'll be working on modularizing the main
    interpreter so that we can move to an entirely shared system.

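Here's a minimal sketch of what driving the interpreter looks like, based on
the classes in this directory (`MicroInterpreter`, `SimpleTensorAllocator`,
`MicroErrorReporter`); the `g_model_data` array and the arena size are
placeholders you'd replace with your own model and memory budget:

```
#include <stdint.h>

#include "tensorflow/lite/experimental/micro/kernels/all_ops_resolver.h"
#include "tensorflow/lite/experimental/micro/micro_error_reporter.h"
#include "tensorflow/lite/experimental/micro/micro_interpreter.h"
#include "tensorflow/lite/experimental/micro/simple_tensor_allocator.h"

// Hypothetical model flatbuffer, compiled into read-only memory.
extern const unsigned char g_model_data[];

int RunInferenceOnce() {
  tflite::MicroErrorReporter micro_error_reporter;
  const tflite::Model* model = ::tflite::GetModel(g_model_data);
  tflite::ops::micro::AllOpsResolver resolver;

  // All working memory comes from this fixed-size arena; no heap is needed.
  static uint8_t tensor_arena[10 * 1024];
  tflite::SimpleTensorAllocator tensor_allocator(tensor_arena,
                                                 sizeof(tensor_arena));

  tflite::MicroInterpreter interpreter(model, resolver, &tensor_allocator,
                                       &micro_error_reporter);
  // Fill interpreter.input(0) with your data here, then run the whole graph;
  // dispatch is essentially a single switch statement per operation.
  return (interpreter.Invoke() == kTfLiteOk) ? 0 : 1;
}
```
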
This initial preview release is designed to get early feedback, and is not
intended to be a final product. It only includes enough operations to run a
simple keyword recognition model, and the implementations are not optimized.
We're hoping this will be a good way to get feedback and collaborate to improve
the framework.

## Generating Project Files

It's not always easy or convenient to use a makefile-based build process,
especially if you're working on a product that uses a different IDE for the rest
of its code. To address that, it's possible to generate standalone project
folders for various popular build systems. These projects are self-contained,
with only the headers and source files needed by a particular binary, and
include project files to make loading them into an IDE easy. These can be
auto-generated for any target you can compile using the main Make system, using
a command like this (making sure you've run `download_dependencies.sh` first):

```
make -f tensorflow/lite/experimental/micro/tools/make/Makefile TARGET=mbed TAGS="CMSIS disco_f746ng" generate_micro_speech_mbed_project
```

This will create a folder in
`tensorflow/lite/experimental/micro/tools/make/gen/mbed_cortex-m4/prj/micro_speech_main_test/mbed`
that contains the source and header files, some Mbed configuration files, and a
README. You should then be able to copy this directory to another machine, and
use it just like any other Mbed project. There's more information about project
files [below](#working-with-generated-projects).

## How to Port TensorFlow Lite Micro to a New Platform

Are you a hardware or operating system provider looking to run machine learning
on your platform? We're keen to help, and we've had experience helping other
teams do the same thing, so here are our recommendations.

### Requirements

Since the core neural network operations are pure arithmetic, and don't require
any I/O or other system-specific functionality, the code doesn't have to have
many dependencies. We've tried to enforce this, so that it's as easy as possible
to get TensorFlow Lite Micro running even on 'bare metal' systems without an OS.
Here are the core requirements that a platform needs to run the framework:

-   A C/C++ compiler capable of C++11 compatibility. This is probably the most
    restrictive of the requirements, since C++11 is not as widely adopted in the
    embedded world as it is elsewhere. We made the decision to require it since
    one of the main goals of TFL Micro is to share as much code as possible with
    the wider TensorFlow codebase, and since that relies on C++11 features, we
    need compatibility to achieve it. We only use a small, sane subset of C++
    though, so don't worry about having to deal with template metaprogramming or
    similar challenges!

-   Debug logging. The core network operations don't need any I/O functions, but
    to be able to run tests and tell if they've worked as expected, the
    framework needs some way to write out a string to some kind of debug
    console. This will vary from system to system; for example, on Linux it
    could just be `fprintf(stderr, debug_string)`, whereas an embedded device
    might write the string out to a specified UART. As long as there's some
    mechanism for outputting debug strings, you should be able to use TFL Micro
    on that platform.

-   Math library. The C standard `libm.a` library is needed to handle some of
    the mathematical operations used to calculate neural network results.

-   Global variable initialization. We do use a pattern of relying on global
    variables being set before `main()` is run in some places, so you'll need
    to make sure your compiler toolchain supports initializing global variables
    before `main()` is called.

And that's it! You may be wondering about some other common requirements that
are needed by a lot of non-embedded software, so here's a brief list of things
that aren't necessary to get started with TFL Micro on a new platform:

-   Operating system. Since the only platform-specific function we need is
    `DebugLog()`, there's no requirement for any kind of Posix or similar
    functionality around files, processes, or threads.

-   C or C++ standard libraries. The framework tries to avoid relying on any
    standard library functions that require linker-time support. This includes
    things like string functions, but still allows us to use headers like
    `stdint.h`, which typically just define constants and typedefs.
    Unfortunately this distinction isn't officially defined by any standard, so
    it's possible that different toolchains may decide to require linked code
    even for the subset we use, but in practice we've found it's usually a
    pretty obvious decision and stable over platforms and toolchains.

-   Dynamic memory allocation. All the TFL Micro code avoids dynamic memory
    allocation, instead relying on local variables on the stack in most cases,
    or global variables for a few situations. These are all fixed-size, which
    can mean some compile-time configuration to ensure there's enough space for
    particular networks, but does avoid any need for a heap and the
    implementation of `malloc`/`new` on a platform.

-   Floating point. Eight-bit integer arithmetic is enough for inference on
    many networks, so if a model sticks to these kinds of quantized operations,
    no floating point instructions should be required or executed by the
    framework.

### Getting Started

We recommend that you start trying to compile and run one of the simplest tests
in the framework as your first step. The full TensorFlow codebase can seem
overwhelming to work with at first, so instead you can begin with a collection
of self-contained project folders that only include the source files needed for
a particular test or executable. You can find a set of pre-generated projects
[here](https://drive.google.com/open?id=1cawEQAkqquK_SO4crReDYqf_v7yAwOY8).

As mentioned above, the one function you will need to implement for a completely
new platform is debug logging. If your device is just a variation on an existing
platform, you may be able to reuse code that's already been written. To
understand what's available, begin with the default reference implementation at
[tensorflow/lite/experimental/micro/debug_log.cc](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/micro/debug_log.cc),
which uses fprintf and stderr. If your platform has this level of support for
the C standard library in its toolchain, then you can just reuse this.
Otherwise, you'll need to do some research into how your platform and device can
communicate logging statements to the outside world. As another example, take a
look at
[the Mbed version of `DebugLog()`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/micro/mbed/debug_log.cc),
which creates a UART object and uses it to output strings to the host's console
if it's connected.
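
On a bare-metal device, a replacement usually just walks the string and hands
each character to whatever output channel exists. Here's a hedged sketch;
`uart_send_byte()` is a hypothetical stand-in for your platform's own serial
API:

```
#include "tensorflow/lite/experimental/micro/debug_log.h"

// Hypothetical platform call that writes one byte to a serial port.
extern "C" void uart_send_byte(char c);

// Replaces the stdio-based reference implementation of DebugLog().
extern "C" void DebugLog(const char* s) {
  while (*s) {
    uart_send_byte(*s++);
  }
}
```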

Begin by navigating to the micro_error_reporter_test folder in the
pre-generated projects you downloaded. Inside here, you'll see a set of folders
containing all the source code you need. If you look through them, you should
find a total of around 60 C or C++ files that, compiled together, will create
the test executable. There's an example makefile in the directory that lists all
of the source files and include paths for the headers. If you're building on a
Linux or MacOS host system, you may just be able to reuse that same makefile to
cross-compile for your system, as long as you swap out the `CC` and `CXX`
variables from their defaults, to point to your cross compiler instead (for
example `arm-none-eabi-gcc` or `riscv64-unknown-elf-gcc`). Otherwise, set up a
project in the build system you are using. It should hopefully be fairly
straightforward, since all of the source files in the folder need to be
compiled, so on many IDEs you can just drag the whole lot in. Then you need to
make sure that C++11 compatibility is turned on, and that the right include
paths (as mentioned in the makefile) have been added.

You'll see the default `DebugLog()` implementation in
`tensorflow/lite/experimental/micro/debug_log.cc` inside the
micro_error_reporter_test folder. Modify that file to add the right
implementation for your platform, and then you should be able to build the set
of files into an executable. Transfer that executable to your target device (for
example by flashing it), and then try running it. You should see output that
looks something like this:

```
Number: 42
Badly-formed format string
Another  badly-formed  format string
~~~ALL TESTS PASSED~~~
```

If not, you'll need to debug what went wrong, but hopefully with this small
starting project it should be manageable.
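
The output above is produced by exercising the error reporter. A sketch of the
kind of calls that generate it, assuming the `MicroErrorReporter` class from
this directory (the exact format strings in the real test may differ), looks
like this:

```
#include "tensorflow/lite/experimental/micro/micro_error_reporter.h"

int main() {
  tflite::MicroErrorReporter micro_error_reporter;
  tflite::ErrorReporter* error_reporter = &micro_error_reporter;
  // Report() understands a small printf-style subset; anything it can't
  // parse is logged as a badly-formed format string rather than crashing.
  error_reporter->Report("Number: %d", 42);
  return 0;
}
```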

### Troubleshooting

When we've been porting to new platforms, it's often been hard to figure out
some of the fundamentals like linker settings and other toolchain setup flags.
If you are having trouble, see if you can find a simple example program for your
platform, like one that just blinks an LED. If you're able to build and run that
successfully, then start to swap in parts of the TF Lite Micro codebase to that
working project, taking it a step at a time and ensuring it's still working
after every change. For example, a first step might be to paste in your
`DebugLog()` implementation and call `DebugLog("Hello World!")` from the main
function.

Another common problem on embedded platforms is the stack size being too small.
Mbed defaults to 4KB for the main thread's stack, which is too small for most
models since TensorFlow Lite allocates buffers and other data structures that
require more memory. The exact size will depend on which model you're running,
but try increasing it if you are running into strange corruption issues that
might be related to stack overwriting.

### Optimizing for your Platform

The default reference implementations in TensorFlow Lite Micro are written to be
portable and easy to understand, not fast, so you'll want to replace
performance-critical parts of the code with versions specifically tailored to
your architecture. The framework has been designed with this in mind, and we
hope the combination of small modules and many tests makes it as straightforward
as possible to swap in your own code a piece at a time, ensuring you have a
working version at every step. To write specialized implementations for a
platform, it's useful to understand how optional components are handled inside
the build system.

### Code Module Organization

We have adopted a system of small modules with platform-specific implementations
to help with portability. Every module is just a standard `.h` header file
containing the interface (either functions or a class), with an accompanying
reference implementation in a `.cc` with the same name. The source file
implements all of the code that's declared in the header. If you have a
specialized implementation, you can create a folder in the same directory as the
header and reference source, name it after your platform, and put your
implementation in a `.cc` file inside that folder. We've already seen one
example of this, where the Mbed and Bluepill versions of `DebugLog()` are inside
[mbed](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/experimental/micro/mbed)
and
[bluepill](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/experimental/micro/bluepill)
folders, children of the
[same directory](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/experimental/micro)
where the stdio-based
[`debug_log.cc`](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/experimental/micro/debug_log.cc)
reference implementation is found.

The advantage of this approach is that we can automatically pick specialized
implementations based on the current build target, without having to manually
edit build files for every new platform. It allows incremental optimizations
from an always-working foundation, without cluttering the reference
implementations with a lot of variants.

To see why we're doing this, it's worth looking at the alternatives. TensorFlow
Lite has traditionally used preprocessor macros to separate out some
platform-specific code within particular files, for example:

```
#ifndef USE_NEON
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
#define USE_NEON
#include <arm_neon.h>
#endif
#endif  // USE_NEON
```

There's also a tradition in gemmlowp of using file suffixes to indicate
platform-specific versions of particular headers, with kernel_neon.h being
included by kernel.h if `USE_NEON` is defined. As a third variation, kernels are
separated out using a directory structure, with
tensorflow/lite/kernels/internal/reference containing portable implementations,
and tensorflow/lite/kernels/internal/optimized holding versions optimized for
NEON on Arm platforms.

These approaches are hard to extend to multiple platforms. Using macros means
that platform-specific code is scattered throughout files in a hard-to-find way,
and can make following the control flow difficult since you need to understand
the macro state to trace it. For example, I temporarily introduced a bug that
disabled NEON optimizations for some kernels when I removed
tensorflow/lite/kernels/internal/common.h from their includes, without realizing
it was where USE_NEON was defined!

It's also tough to port to different build systems, since figuring out the right
combination of macros to use can be hard, especially since some of them are
automatically defined by the compiler, and others are only set by build scripts,
often across multiple rules.

The approach we are using extends the file system approach that we use for
kernel implementations, but with some specific conventions:

-   For each module in TensorFlow Lite, there will be a parent directory that
    contains tests, interface headers used by other modules, and portable
    implementations of each part.
-   Portable means that the code doesn't include code from any libraries except
    flatbuffers, or other TF Lite modules. You can include a limited subset of
    standard C or C++ headers, but you can't use any functions that require
    linking against those libraries, including fprintf, etc. You can link
    against functions in the standard math library, in `<math.h>`.
-   Specialized implementations are held inside subfolders of the parent
    directory, named after the platform or library that they depend on. So, for
    example, if you had my_module/foo.cc, a version that used RISC-V extensions
    would live in my_module/riscv/foo.cc. If you had a version that used the
    CMSIS library, it should be in my_module/cmsis/foo.cc.
-   These specialized implementations should completely replace the top-level
    implementations. If this involves too much code duplication, the top-level
    implementation should be split into smaller files, so only the
    platform-specific code needs to be replaced.
-   There is a convention about how build systems pick the right implementation
    file. There will be an ordered list of 'tags' defining the preferred
    implementations, and to generate the right list of source files, each module
    will be examined in turn. If a subfolder with a tag's name contains a .cc
    file with the same base name as one in the parent folder, then it will
    replace the parent folder's version in the list of build files. If there are
    multiple subfolders with matching tags and file names, then the tag that's
    latest in the ordered list will be chosen. This allows us to express "I'd
    like generically-optimized fixed point if it's available, but I'd prefer
    something using the CMSIS library" using the list 'fixed_point cmsis'. These
    tags are passed in as `TAGS="<foo>"` on the command line when you use the
    main Makefile to build.
-   There is an implicit 'reference' tag at the start of every list, so that
    it's possible to support directory structures like the current
    tensorflow/kernels/internal, where portable implementations are held in a
    'reference' folder that's a sibling to the NEON-optimized folder.
-   The headers for each unit in a module should remain platform-agnostic, and
    be the same for all implementations. Private headers inside a sub-folder can
    be used as needed, but shouldn't be referred to by any portable code at the
    top level.
-   Tests should be at the parent level, with no platform-specific code.
-   No platform-specific macros or #ifdefs should be used in any portable code.

The implementation of these rules is handled inside the Makefile, with a
[`specialize` function](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/micro/tools/make/helper_functions.inc#L42)
that takes a list of reference source file paths as an input, and returns the
equivalent list with specialized versions of those files swapped in if they
exist.
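
In pseudocode terms, the selection for each reference file works like the
following hedged C++ sketch (this illustrates the rule, not the actual Makefile
logic; `file_exists` is a placeholder):

```
#include <string>
#include <vector>

// For each reference source file, later tags in the ordered list win if a
// same-named .cc file exists in that tag's subfolder.
std::string Specialize(const std::string& parent_dir,
                       const std::string& file_name,
                       const std::vector<std::string>& tags,
                       bool (*file_exists)(const std::string& path)) {
  std::string chosen = parent_dir + "/" + file_name;
  for (const std::string& tag : tags) {
    const std::string candidate = parent_dir + "/" + tag + "/" + file_name;
    if (file_exists(candidate)) {
      chosen = candidate;  // The tag latest in the list takes precedence.
    }
  }
  return chosen;
}
```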

### Working with Generated Projects

So far, I've recommended that you use the standalone generated projects for your
system. You might be wondering why you're not just checking out the full
[TensorFlow codebase from GitHub](https://github.com/tensorflow/tensorflow/).
The main reason is that there is a lot more diversity of architectures, IDEs,
support libraries, and operating systems in the embedded world. Many of the
toolchains require their own copy of source files, or a list of sources to be
written to a project file. When a developer working on TensorFlow adds a new
source file or changes its location, we can't expect her to update multiple
different project files, many of which she may not have the right software to
verify the change was correct. That means we have to rely on a central listing
of source files (which in our case is held in the makefile), and then call a
tool to generate other project files from those. We could ask embedded
developers to do this process themselves after downloading the main source, but
running the makefile requires a Linux system which may not be available, takes
time, and involves downloading a lot of dependencies. That is why we've opted to
make regular snapshots of the results of generating these projects for popular
IDEs and platforms, so that embedded developers have a fast and friendly way to
start using TensorFlow Lite for Microcontrollers.

This does have the disadvantage that you're no longer working directly on the
main repository; instead, you have a copy that's outside of source control.
We've tried to make the copy as similar to the main repo as possible, for
example by keeping the paths of all source files the same, and ensuring that
there are no changes between the copied files and the originals, but it still
makes it tougher to sync as the main repository is updated. There are also
multiple copies of the source tree, one for each target, so any change you make
to one copy has to be manually propagated across all the other projects you care
about. This doesn't matter so much if you're just using the projects as they are
to build products, but if you want to support a new platform and have the
changes reflected in the main code base, you'll have to do some extra work.

As an example, think about the `DebugLog()` implementation we discussed adding
for a new platform earlier. At this point, you have a new version of
`debug_log.cc` that does what's required, but how can you share that with the
wider community? The first step is to pick a tag name for your platform. This
can either be the operating system (for example 'mbed'), the name of a device
('bluepill'), or some other text that describes it. This should be a short
string with no spaces or special characters. Log in or create an account on
GitHub, fork the full
[TensorFlow codebase](https://github.com/tensorflow/tensorflow/) using the
'Fork' button on the top left, and then grab your fork by using a command like
`git clone https://github.com/<your user name>/tensorflow`.

You'll either need Linux, MacOS, or Windows with something like Cygwin installed
to run the next steps, since they involve building a makefile. Run the following
commands from a terminal, inside the root of the source folder:

```
tensorflow/lite/experimental/micro/tools/make/download_dependencies.sh
make -f tensorflow/lite/experimental/micro/tools/make/Makefile generate_projects
```

This will take a few minutes, since it has to download some large toolchains for
the dependencies. Once it has finished, you should see some folders created
inside a path like
`tensorflow/lite/experimental/micro/tools/make/gen/linux_x86_64/prj/`. The exact
path depends on your host operating system, but you should be able to figure it
out from all the copy commands. These folders contain the generated project and
source files, with
`tensorflow/lite/experimental/micro/tools/make/gen/linux_x86_64/prj/keil`
containing the Keil uVision targets,
`tensorflow/lite/experimental/micro/tools/make/gen/linux_x86_64/prj/mbed` with
the Mbed versions, and so on.

If you've got this far, you've successfully set up the project generation flow.
Now you need to add your specialized implementation of `DebugLog()`. Start by
creating a folder inside `tensorflow/lite/experimental/micro/` named after the
tag you picked earlier. Put your `debug_log.cc` file inside this folder, and
then run this command, with '<your tag>' replaced by the actual folder name:

```
make -f tensorflow/lite/experimental/micro/tools/make/Makefile TAGS="<your tag>" generate_projects
```

If your tag name actually refers to a whole target architecture, then you'll use
TARGET or TARGET_ARCH instead. For example, here's how a simple RISC-V set of
projects is generated:

```
make -f tensorflow/lite/experimental/micro/tools/make/Makefile TARGET="riscv32_mcu" generate_projects
```

The way it works is the same as with TAGS, though; it just looks for specialized
implementations with the same containing folder name.

If you look inside the projects that have been created, you should see that the
default `DebugLog()` implementation is no longer present at
`tensorflow/lite/experimental/micro/debug_log.cc`, and instead
`tensorflow/lite/experimental/micro/<your tag>/debug_log.cc` is being used. Copy
over the generated project files and try building them in your own IDE. If
everything works, then you're ready to submit your change.

To do this, run something like:

```
git add tensorflow/lite/experimental/micro/<your tag>/debug_log.cc
git commit -a -m "Added DebugLog() support for <your platform>"
git push origin master
```

Then go back to https://github.com/<your account>/tensorflow, and choose "New
Pull Request" near the top. You should then be able to go through the standard
TensorFlow PR process to get your change added to the main repository, and
available to the rest of the community!

### Supporting a Platform with Makefiles

The changes you've made so far will enable other developers using the generated
projects to use your platform, but TensorFlow's continuous integration process
uses makefiles to build frequently and ensure changes haven't broken the build
process for different systems. If you are able to convert your build procedure
into something that can be expressed by a makefile, then we can integrate your
platform into our CI builds and make sure it continues to work.

Fully describing how to do this is beyond the scope of this documentation, but
the biggest needs are:

-   A command-line compiler that can be called for every source file.
-   A list of the arguments to pass into the compiler to build and link all
    files.
-   The correct linker map files and startup assembler to ensure `main()` gets
    called.

### Supporting a Platform with Emulation Testing

Integrating your platform into the makefile process should help us make sure
that it continues to build, but it doesn't guarantee that the results of the
build process will run correctly. Running tests is something we require to be
able to say that TensorFlow officially supports a platform, since otherwise we
can't guarantee that users will have a good experience when they try using it.
Since physically maintaining a full set of all supported hardware devices isn't
feasible, we rely on software emulation to run these tests. A good example is
our
[STM32F103 'Bluepill' support](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/micro/testing/test_bluepill_binary.sh),
which uses [Docker](https://www.docker.com/) and [Renode](https://renode.io/) to
run built binaries in an emulator. You can use whatever technologies you want;
the only requirements are that they capture the debug log output of the tests
being run in the emulator, and parse them for the string that indicates the test
was successful. These scripts need to run on Ubuntu 18.04, in a bash
environment, though Docker is available if you need to install extra software or
have other dependencies.

### Implementing More Optimizations

Clearly, getting debug logging support is only the beginning of the work you'll
need to do on a particular platform. It's very likely that you'll want to
optimize the core deep learning operations that take up the most time when
running models you care about. The good news is that the process for providing
optimized implementations is the same as the one you just went through to
provide your own logging. You'll need to identify the parts of the code that are
bottlenecks, and then add specialized implementations in their own folders.
These don't need to be platform-specific; they can also be broken out by which
library they rely on, for example. [Here's where we do that for the CMSIS
implementation of integer fast Fourier
transforms](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/micro/examples/micro_speech/CMSIS/preprocessor.cc).
This more complex case shows that you can also add helper source files alongside
the main implementation, as long as you
[mention them in the platform-specific makefile](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/experimental/micro/examples/micro_speech/CMSIS/Makefile.inc).
You can also do things like update the list of libraries that need to be linked
in, or add include paths to required headers.