<!--{
	"Title": "Data Race Detector",
	"Template": true
}-->

<h2 id="Introduction">Introduction</h2>

<p>
Data races are among the most common and hardest to debug types of bugs in concurrent systems.
A data race occurs when two goroutines access the same variable concurrently and at least one of the accesses is a write.
See <a href="/ref/mem/">The Go Memory Model</a> for details.
</p>

<p>
Here is an example of a data race that can lead to crashes and memory corruption:
</p>

<pre>
package main

import "fmt"

func main() {
	c := make(chan bool)
	m := make(map[string]string)
	go func() {
		m["1"] = "a" // First conflicting access.
		c &lt;- true
	}()
	m["2"] = "b" // Second conflicting access.
	&lt;-c
	for k, v := range m {
		fmt.Println(k, v)
	}
}
</pre>

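<p>
One way to remove the race in the example above is to guard both map writes with a mutex.
The following is a minimal sketch, not the only possible fix; the helper name <code>fillMap</code> is hypothetical and exists only for illustration.
</p>

```go
package main

import (
	"fmt"
	"sync"
)

// fillMap is a hypothetical helper that performs the same two writes as the
// racy example, but serializes them with a mutex.
func fillMap() map[string]string {
	var mu sync.Mutex
	c := make(chan bool)
	m := make(map[string]string)
	go func() {
		mu.Lock()
		m["1"] = "a" // Guarded by mu.
		mu.Unlock()
		c <- true
	}()
	mu.Lock()
	m["2"] = "b" // Guarded by mu.
	mu.Unlock()
	<-c // The receive orders the goroutine's write before any later reads.
	return m
}

func main() {
	for k, v := range fillMap() {
		fmt.Println(k, v)
	}
}
```

<p>
The reads in <code>main</code> need no lock here because they happen after the channel receive, by which point the goroutine has finished its write.
</p>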
<h2 id="Usage">Usage</h2>

<p>
To help diagnose such bugs, Go includes a built-in data race detector.
To use it, add the <code>-race</code> flag to the go command:
</p>

<pre>
$ go test -race mypkg    // to test the package
$ go run -race mysrc.go  // to run the source file
$ go build -race mycmd   // to build the command
$ go install -race mypkg // to install the package
</pre>

<h2 id="Report_Format">Report Format</h2>

<p>
When the race detector finds a data race in the program, it prints a report.
The report contains stack traces for conflicting accesses, as well as stacks where the involved goroutines were created.
Here is an example:
</p>

<pre>
WARNING: DATA RACE
Read by goroutine 185:
  net.(*pollServer).AddFD()
      src/net/fd_unix.go:89 +0x398
  net.(*pollServer).WaitWrite()
      src/net/fd_unix.go:247 +0x45
  net.(*netFD).Write()
      src/net/fd_unix.go:540 +0x4d4
  net.(*conn).Write()
      src/net/net.go:129 +0x101
  net.func060()
      src/net/timeout_test.go:603 +0xaf

Previous write by goroutine 184:
  net.setWriteDeadline()
      src/net/sockopt_posix.go:135 +0xdf
  net.setDeadline()
      src/net/sockopt_posix.go:144 +0x9c
  net.(*conn).SetDeadline()
      src/net/net.go:161 +0xe3
  net.func061()
      src/net/timeout_test.go:616 +0x3ed

Goroutine 185 (running) created at:
  net.func061()
      src/net/timeout_test.go:609 +0x288

Goroutine 184 (running) created at:
  net.TestProlongTimeout()
      src/net/timeout_test.go:618 +0x298
  testing.tRunner()
      src/testing/testing.go:301 +0xe8
</pre>

<h2 id="Options">Options</h2>

<p>
The <code>GORACE</code> environment variable sets race detector options.
The format is:
</p>

<pre>
GORACE="option1=val1 option2=val2"
</pre>

<p>
The options are:
</p>

<ul>
<li>
<code>log_path</code> (default <code>stderr</code>): The race detector writes
its report to a file named <code>log_path.<em>pid</em></code>.
The special names <code>stdout</code>
and <code>stderr</code> cause reports to be written to standard output and
standard error, respectively.
</li>

<li>
<code>exitcode</code> (default <code>66</code>): The exit status to use when
exiting after a detected race.
</li>

<li>
<code>strip_path_prefix</code> (default <code>""</code>): Strip this prefix
from all reported file paths, to make reports more concise.
</li>

<li>
<code>history_size</code> (default <code>1</code>): The per-goroutine memory
access history is <code>32K * 2**history_size</code> elements.
Increasing this value can avoid a "failed to restore the stack" error in reports, at the
cost of increased memory usage.
</li>

<li>
<code>halt_on_error</code> (default <code>0</code>): Controls whether the program
exits after reporting the first data race.
</li>
</ul>

<p>
Example:
</p>

<pre>
$ GORACE="log_path=/tmp/race/report strip_path_prefix=/my/go/sources/" go test -race
</pre>

<h2 id="Excluding_Tests">Excluding Tests</h2>

<p>
When you build with the <code>-race</code> flag, the <code>go</code> command defines an additional
<a href="/pkg/go/build/#hdr-Build_Constraints">build tag</a>, <code>race</code>.
You can use the tag to exclude some code and tests when running the race detector.
Some examples:
</p>

<pre>
// +build !race

package foo

import "testing"

// The test contains a data race. See issue 123.
func TestFoo(t *testing.T) {
	// ...
}

// The test fails under the race detector due to timeouts.
func TestBar(t *testing.T) {
	// ...
}

// The test takes too long under the race detector.
func TestBaz(t *testing.T) {
	// ...
}
</pre>

<h2 id="How_To_Use">How To Use</h2>

<p>
To start, run your tests using the race detector (<code>go test -race</code>).
The race detector only finds races that happen at runtime, so it can't find
races in code paths that are not executed.
If your tests have incomplete coverage,
you may find more races by running a binary built with <code>-race</code> under a realistic
workload.
</p>

<h2 id="Typical_Data_Races">Typical Data Races</h2>

<p>
Here are some typical data races. All of them can be detected with the race detector.
</p>

<h3 id="Race_on_loop_counter">Race on loop counter</h3>

<pre>
func main() {
	var wg sync.WaitGroup
	wg.Add(5)
	for i := 0; i &lt; 5; i++ {
		go func() {
			fmt.Println(i) // Not the 'i' you are looking for.
			wg.Done()
		}()
	}
	wg.Wait()
}
</pre>

<p>
The variable <code>i</code> in the function literal is the same variable used by the loop, so
the read in the goroutine races with the loop increment.
(This program typically prints 55555, not 01234.)
The program can be fixed by making a copy of the variable:
</p>

<pre>
func main() {
	var wg sync.WaitGroup
	wg.Add(5)
	for i := 0; i &lt; 5; i++ {
		go func(j int) {
			fmt.Println(j) // Good. Read local copy of the loop counter.
			wg.Done()
		}(i)
	}
	wg.Wait()
}
</pre>

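<p>
Another way to make the copy is to declare a new variable inside the loop body that shadows the counter.
The sketch below assumes a hypothetical helper, <code>collect</code>, which stores each goroutine's value in its own slice element so the result can be inspected after all goroutines finish.
</p>

```go
package main

import (
	"fmt"
	"sync"
)

// collect is a hypothetical helper: it starts n goroutines and has each one
// record its own copy of the loop counter.
func collect(n int) []int {
	var wg sync.WaitGroup
	out := make([]int, n)
	wg.Add(n)
	for i := 0; i < n; i++ {
		i := i // New variable for this iteration; shadows the loop counter.
		go func() {
			out[i] = i // Each goroutine writes a distinct element, so no race.
			wg.Done()
		}()
	}
	wg.Wait() // All writes to out complete before it is returned.
	return out
}

func main() {
	fmt.Println(collect(5))
}
```

<p>
Passing the counter as an argument and shadowing it have the same effect: the goroutine no longer reads the variable that the loop increments.
</p>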
<h3 id="Accidentally_shared_variable">Accidentally shared variable</h3>

<pre>
// ParallelWrite writes data to file1 and file2, returns the errors.
func ParallelWrite(data []byte) chan error {
	res := make(chan error, 2)
	f1, err := os.Create("file1")
	if err != nil {
		res &lt;- err
	} else {
		go func() {
			// This err is shared with the main goroutine,
			// so the write races with the write below.
			_, err = f1.Write(data)
			res &lt;- err
			f1.Close()
		}()
	}
	f2, err := os.Create("file2") // The second conflicting write to err.
	if err != nil {
		res &lt;- err
	} else {
		go func() {
			_, err = f2.Write(data)
			res &lt;- err
			f2.Close()
		}()
	}
	return res
}
</pre>

<p>
The fix is to introduce new variables in the goroutines (note the use of <code>:=</code>):
</p>

<pre>
			...
			_, err := f1.Write(data)
			...
			_, err := f2.Write(data)
			...
</pre>

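<p>
For reference, here is a complete sketch of the function with the fix applied.
It makes two assumptions that are not in the original: the file names are passed as parameters (<code>path1</code>, <code>path2</code>) rather than hard-coded, and a hypothetical <code>demo</code> helper writes into a temporary directory so the program can run without leaving files behind.
</p>

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// parallelWrite writes data to path1 and path2 and returns a channel that
// carries one error value (possibly nil) per file.
func parallelWrite(data []byte, path1, path2 string) chan error {
	res := make(chan error, 2)
	f1, err := os.Create(path1)
	if err != nil {
		res <- err
	} else {
		go func() {
			_, err := f1.Write(data) // := declares a goroutine-local err.
			res <- err
			f1.Close()
		}()
	}
	f2, err := os.Create(path2) // No goroutine touches this err any more.
	if err != nil {
		res <- err
	} else {
		go func() {
			_, err := f2.Write(data) // Goroutine-local err again.
			res <- err
			f2.Close()
		}()
	}
	return res
}

// demo is a hypothetical driver: it writes to two files in a temporary
// directory and returns the first error it receives, if any.
func demo() error {
	dir, err := os.MkdirTemp("", "parallelwrite")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	res := parallelWrite([]byte("hello"),
		filepath.Join(dir, "file1"), filepath.Join(dir, "file2"))
	for i := 0; i < 2; i++ {
		if err := <-res; err != nil {
			return err
		}
	}
	return nil
}

func main() {
	if err := demo(); err != nil {
		fmt.Println("write failed:", err)
		os.Exit(1)
	}
	fmt.Println("both writes succeeded")
}
```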
<h3 id="Unprotected_global_variable">Unprotected global variable</h3>

<p>
If the following code is called from several goroutines, it leads to races on the <code>service</code> map.
Concurrent reads and writes of the same map are not safe:
</p>

<pre>
var service map[string]net.Addr

func RegisterService(name string, addr net.Addr) {
	service[name] = addr
}

func LookupService(name string) net.Addr {
	return service[name]
}
</pre>

<p>
To make the code safe, protect the accesses with a mutex:
</p>

<pre>
var (
	service   map[string]net.Addr
	serviceMu sync.Mutex
)

func RegisterService(name string, addr net.Addr) {
	serviceMu.Lock()
	defer serviceMu.Unlock()
	service[name] = addr
}

func LookupService(name string) net.Addr {
	serviceMu.Lock()
	defer serviceMu.Unlock()
	return service[name]
}
</pre>

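<p>
If lookups greatly outnumber registrations, a <code>sync.RWMutex</code> may be a reasonable alternative, since it lets multiple readers proceed at once.
The sketch below is a variation on the fix above, not a replacement for it; the map is initialized with <code>make</code> so that writes do not panic, and <code>registerDefaults</code> is a hypothetical helper used only to populate the map.
</p>

```go
package main

import (
	"fmt"
	"net"
	"sync"
)

var (
	service   = make(map[string]net.Addr) // Initialized so writes do not panic.
	serviceMu sync.RWMutex                // Guards service.
)

// RegisterService takes the write lock, excluding all other accesses.
func RegisterService(name string, addr net.Addr) {
	serviceMu.Lock()
	defer serviceMu.Unlock()
	service[name] = addr
}

// LookupService takes the read lock, so concurrent lookups do not block
// each other.
func LookupService(name string) net.Addr {
	serviceMu.RLock()
	defer serviceMu.RUnlock()
	return service[name]
}

// registerDefaults is a hypothetical helper that registers one sample entry.
func registerDefaults() {
	RegisterService("echo", &net.TCPAddr{IP: net.IPv4(127, 0, 0, 1), Port: 7})
}

func main() {
	registerDefaults()
	fmt.Println(LookupService("echo"))
}
```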
<h3 id="Primitive_unprotected_variable">Primitive unprotected variable</h3>

<p>
Data races can happen on variables of primitive types as well (<code>bool</code>, <code>int</code>, <code>int64</code>, etc.),
as in this example:
</p>

<pre>
type Watchdog struct{ last int64 }

func (w *Watchdog) KeepAlive() {
	w.last = time.Now().UnixNano() // First conflicting access.
}

func (w *Watchdog) Start() {
	go func() {
		for {
			time.Sleep(time.Second)
			// Second conflicting access.
			if w.last &lt; time.Now().Add(-10*time.Second).UnixNano() {
				fmt.Println("No keepalives for 10 seconds. Dying.")
				os.Exit(1)
			}
		}
	}()
}
</pre>

<p>
Even such "innocent" data races can lead to hard-to-debug problems caused by
non-atomicity of the memory accesses,
interference with compiler optimizations,
or reordering issues when accessing processor memory.
</p>

<p>
A typical fix for this race is to use a channel or a mutex.
To preserve the lock-free behavior, one can also use the
<a href="/pkg/sync/atomic/"><code>sync/atomic</code></a> package.
</p>

<pre>
type Watchdog struct{ last int64 }

func (w *Watchdog) KeepAlive() {
	atomic.StoreInt64(&amp;w.last, time.Now().UnixNano())
}

func (w *Watchdog) Start() {
	go func() {
		for {
			time.Sleep(time.Second)
			if atomic.LoadInt64(&amp;w.last) &lt; time.Now().Add(-10*time.Second).UnixNano() {
				fmt.Println("No keepalives for 10 seconds. Dying.")
				os.Exit(1)
			}
		}
	}()
}
</pre>

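<p>
The mutex-based fix mentioned above can be sketched as follows.
This is one possible shape, not the only one: the <code>mu</code> field, the <code>timeout</code> constant, and the <code>expired</code> helper are assumptions introduced for illustration.
</p>

```go
package main

import (
	"fmt"
	"os"
	"sync"
	"time"
)

// timeout is an assumed constant matching the 10 seconds used above.
const timeout = 10 * time.Second

type Watchdog struct {
	mu   sync.Mutex
	last int64 // Time of the last keepalive in nanoseconds; guarded by mu.
}

func (w *Watchdog) KeepAlive() {
	w.mu.Lock()
	w.last = time.Now().UnixNano()
	w.mu.Unlock()
}

// expired is a hypothetical helper that reports whether the last keepalive
// is older than timeout. It reads w.last under the same mutex.
func (w *Watchdog) expired() bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.last < time.Now().Add(-timeout).UnixNano()
}

func (w *Watchdog) Start() {
	go func() {
		for {
			time.Sleep(time.Second)
			if w.expired() {
				fmt.Println("No keepalives for 10 seconds. Dying.")
				os.Exit(1)
			}
		}
	}()
}

func main() {
	w := &Watchdog{}
	w.KeepAlive()
	fmt.Println("expired:", w.expired())
}
```

<p>
Compared with the <code>sync/atomic</code> version, the mutex costs a lock per access but extends naturally if the watchdog later needs to guard more than one field.
</p>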
<h2 id="Supported_Systems">Supported Systems</h2>

<p>
The race detector runs on <code>darwin/amd64</code>, <code>freebsd/amd64</code>,
<code>linux/amd64</code>, and <code>windows/amd64</code>.
</p>

<h2 id="Runtime_Overheads">Runtime Overhead</h2>

<p>
The cost of race detection varies by program, but for a typical program, memory
usage may increase by 5-10x and execution time by 2-20x.
</p>