<!--{
	"Title": "Data Race Detector",
	"Template": true
}-->

<h2 id="Introduction">Introduction</h2>

<p>
Data races are among the most common and hardest to debug types of bugs in concurrent systems.
A data race occurs when two goroutines access the same variable concurrently and at least one of the accesses is a write.
See <a href="/ref/mem/">The Go Memory Model</a> for details.
</p>

<p>
Here is an example of a data race that can lead to crashes and memory corruption:
</p>

<pre>
package main

import "fmt"

func main() {
	c := make(chan bool)
	m := make(map[string]string)
	go func() {
		m["1"] = "a" // First conflicting access.
		c &lt;- true
	}()
	m["2"] = "b" // Second conflicting access.
	&lt;-c
	for k, v := range m {
		fmt.Println(k, v)
	}
}
</pre>

<h2 id="Usage">Usage</h2>

<p>
To help diagnose such bugs, Go includes a built-in data race detector.
To use it, add the <code>-race</code> flag to the go command:
</p>

<pre>
$ go test -race mypkg    // to test the package
$ go run -race mysrc.go  // to run the source file
$ go build -race mycmd   // to build the command
$ go install -race mypkg // to install the package
</pre>

<h2 id="Report_Format">Report Format</h2>

<p>
When the race detector finds a data race in the program, it prints a report.
The report contains stack traces for conflicting accesses, as well as stacks where the involved goroutines were created.
Here is an example:
</p>

<pre>
WARNING: DATA RACE
Read by goroutine 185:
  net.(*pollServer).AddFD()
      src/net/fd_unix.go:89 +0x398
  net.(*pollServer).WaitWrite()
      src/net/fd_unix.go:247 +0x45
  net.(*netFD).Write()
      src/net/fd_unix.go:540 +0x4d4
  net.(*conn).Write()
      src/net/net.go:129 +0x101
  net.func060()
      src/net/timeout_test.go:603 +0xaf

Previous write by goroutine 184:
  net.setWriteDeadline()
      src/net/sockopt_posix.go:135 +0xdf
  net.setDeadline()
      src/net/sockopt_posix.go:144 +0x9c
  net.(*conn).SetDeadline()
      src/net/net.go:161 +0xe3
  net.func061()
      src/net/timeout_test.go:616 +0x3ed

Goroutine 185 (running) created at:
  net.func061()
      src/net/timeout_test.go:609 +0x288

Goroutine 184 (running) created at:
  net.TestProlongTimeout()
      src/net/timeout_test.go:618 +0x298
  testing.tRunner()
      src/testing/testing.go:301 +0xe8
</pre>

<h2 id="Options">Options</h2>

<p>
The <code>GORACE</code> environment variable sets race detector options.
The format is:
</p>

<pre>
GORACE="option1=val1 option2=val2"
</pre>

<p>
The options are:
</p>

<ul>
<li>
<code>log_path</code> (default <code>stderr</code>): The race detector writes
its report to a file named <code>log_path.<em>pid</em></code>.
The special names <code>stdout</code>
and <code>stderr</code> cause reports to be written to standard output and
standard error, respectively.
</li>

<li>
<code>exitcode</code> (default <code>66</code>): The exit status to use when
exiting after a detected race.
</li>

<li>
<code>strip_path_prefix</code> (default <code>""</code>): Strip this prefix
from all reported file paths, to make reports more concise.
</li>

<li>
<code>history_size</code> (default <code>1</code>): The per-goroutine memory
access history is <code>32K * 2**history_size elements</code>.
Increasing this value can avoid a "failed to restore the stack" error in reports, at the
cost of increased memory usage.
</li>

<li>
<code>halt_on_error</code> (default <code>0</code>): Controls whether the program
exits after reporting the first data race.
</li>
</ul>

<p>
Example:
</p>

<pre>
$ GORACE="log_path=/tmp/race/report strip_path_prefix=/my/go/sources/" go test -race
</pre>

<h2 id="Excluding_Tests">Excluding Tests</h2>

<p>
When you build with the <code>-race</code> flag, the <code>go</code> command defines an additional
<a href="/pkg/go/build/#hdr-Build_Constraints">build tag</a>, <code>race</code>.
You can use the tag to exclude some code and tests when running the race detector.
Some examples:
</p>

<pre>
// +build !race

package foo

// The test contains a data race. See issue 123.
func TestFoo(t *testing.T) {
	// ...
}

// The test fails under the race detector due to timeouts.
func TestBar(t *testing.T) {
	// ...
}

// The test takes too long under the race detector.
func TestBaz(t *testing.T) {
	// ...
}
</pre>

<h2 id="How_To_Use">How To Use</h2>

<p>
To start, run your tests using the race detector (<code>go test -race</code>).
The race detector only finds races that happen at runtime, so it can't find
races in code paths that are not executed.
If your tests have incomplete coverage,
you may find more races by running a binary built with <code>-race</code> under a realistic
workload.
</p>

<h2 id="Typical_Data_Races">Typical Data Races</h2>

<p>
Here are some typical data races. All of them can be detected with the race detector.
</p>

<h3 id="Race_on_loop_counter">Race on loop counter</h3>

<pre>
package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	wg.Add(5)
	for i := 0; i &lt; 5; i++ {
		go func() {
			fmt.Println(i) // Not the 'i' you are looking for.
			wg.Done()
		}()
	}
	wg.Wait()
}
</pre>

<p>
The variable <code>i</code> in the function literal is the same variable used by the loop, so
the read in the goroutine races with the loop increment.
(This program typically prints 55555, not 01234.)
The program can be fixed by making a copy of the variable:
</p>

<pre>
package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	wg.Add(5)
	for i := 0; i &lt; 5; i++ {
		go func(j int) {
			fmt.Println(j) // Good. Read local copy of the loop counter.
			wg.Done()
		}(i)
	}
	wg.Wait()
}
</pre>

<h3 id="Accidentally_shared_variable">Accidentally shared variable</h3>

<pre>
// ParallelWrite writes data to file1 and file2, returns the errors.
func ParallelWrite(data []byte) chan error {
	res := make(chan error, 2)
	f1, err := os.Create("file1")
	if err != nil {
		res &lt;- err
	} else {
		go func() {
			// This err is shared with the main goroutine,
			// so the write races with the write below.
			_, err = f1.Write(data)
			res &lt;- err
			f1.Close()
		}()
	}
	f2, err := os.Create("file2") // The second conflicting write to err.
	if err != nil {
		res &lt;- err
	} else {
		go func() {
			_, err = f2.Write(data)
			res &lt;- err
			f2.Close()
		}()
	}
	return res
}
</pre>

<p>
The fix is to introduce new variables in the goroutines (note the use of <code>:=</code>):
</p>

<pre>
			...
			_, err := f1.Write(data)
			...
			_, err := f2.Write(data)
			...
</pre>

<h3 id="Unprotected_global_variable">Unprotected global variable</h3>

<p>
If the following code is called from several goroutines, it leads to races on the <code>service</code> map.
Concurrent reads and writes of the same map are not safe:
</p>

<pre>
var service map[string]net.Addr

func RegisterService(name string, addr net.Addr) {
	service[name] = addr
}

func LookupService(name string) net.Addr {
	return service[name]
}
</pre>

<p>
To make the code safe, protect the accesses with a mutex:
</p>

<pre>
var (
	service   map[string]net.Addr
	serviceMu sync.Mutex
)

func RegisterService(name string, addr net.Addr) {
	serviceMu.Lock()
	defer serviceMu.Unlock()
	service[name] = addr
}

func LookupService(name string) net.Addr {
	serviceMu.Lock()
	defer serviceMu.Unlock()
	return service[name]
}
</pre>

<h3 id="Primitive_unprotected_variable">Primitive unprotected variable</h3>

<p>
Data races can happen on variables of primitive types as well (<code>bool</code>, <code>int</code>, <code>int64</code>, etc.),
as in this example:
</p>

<pre>
type Watchdog struct{ last int64 }

func (w *Watchdog) KeepAlive() {
	w.last = time.Now().UnixNano() // First conflicting access.
}

func (w *Watchdog) Start() {
	go func() {
		for {
			time.Sleep(time.Second)
			// Second conflicting access.
			if w.last &lt; time.Now().Add(-10*time.Second).UnixNano() {
				fmt.Println("No keepalives for 10 seconds. Dying.")
				os.Exit(1)
			}
		}
	}()
}
</pre>

<p>
Even such "innocent" data races can lead to hard-to-debug problems caused by
non-atomicity of the memory accesses,
interference with compiler optimizations,
or reordering issues accessing processor memory.
</p>

<p>
A typical fix for this race is to use a channel or a mutex.
To preserve the lock-free behavior, one can also use the
<a href="/pkg/sync/atomic/"><code>sync/atomic</code></a> package.
</p>

<pre>
type Watchdog struct{ last int64 }

func (w *Watchdog) KeepAlive() {
	atomic.StoreInt64(&amp;w.last, time.Now().UnixNano())
}

func (w *Watchdog) Start() {
	go func() {
		for {
			time.Sleep(time.Second)
			if atomic.LoadInt64(&amp;w.last) &lt; time.Now().Add(-10*time.Second).UnixNano() {
				fmt.Println("No keepalives for 10 seconds. Dying.")
				os.Exit(1)
			}
		}
	}()
}
</pre>

<h2 id="Supported_Systems">Supported Systems</h2>

<p>
The race detector runs on <code>darwin/amd64</code>, <code>freebsd/amd64</code>,
<code>linux/amd64</code>, and <code>windows/amd64</code>.
</p>

<h2 id="Runtime_Overheads">Runtime Overhead</h2>

<p>
The cost of race detection varies by program, but for a typical program, memory
usage may increase by 5-10x and execution time by 2-20x.
</p>