MSMProp2, a simplified but functionally equivalent version of MSMProp1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Julian Seward, OpenWorks Ltd, 19 August 2008 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Note that this file does NOT describe the state machine used in the svn://svn.valgrind.org/branches/YARD version of Helgrind. That state machine is different again from any previously described machine. See the file README_YARD.txt for more details on YARD. ---------------------- In early 2008 Konstantin Serebryany proposed "MSMProp1", a memory state machine for data race detection. It is described at http://code.google.com/p/data-race-test/wiki/MSMProp1 Implementation experiences show MSMProp1 is useful, but difficult to implement efficiently. In particular keeping the memory usage under control is complex and difficult. This note points out a key simplification of MSMProp1, which makes it easier to implement without changing the functionality. The idea ~~~~~~~~ The core of the idea pertains to the "Condition" entry for MSMProp1 state machine rules E5 and E6(r). These are, respectively: HB(SS, currS) and its negation ! HB(SS, currS). Here, SS is a set of segments, and currS is a single segment. Each segment contains a vector timestamp. The expression "HB(SS, currS)" is intended to denote for each segment S in SS . happens_before(S,currS) where happens_before(S,T) means that S's vector timestamp is ordered before-or-equal to T's vector timestamp. In words, the expression for each segment S in SS . happens_before(S,currS) is equivalent to saying that currS has a timestamp which is greater-than-equal to the timestamps of all the segments in SS. The key observation is that this is equivalent to happens_before( JOIN(SS), currS ) where JOIN is the lattice-theoretic "max" or "least upper bound" operation on vector clocks. Given the definition of HB, happens_before and (binary) JOIN, this is easy to prove. The consequences ~~~~~~~~~~~~~~~~ With that observation in place, it is a short step to observe that storing segment sets in MSMProp1 is unnecessary. Instead of storing a segment set in each shadow value, just store and update a single vector timestamp. The following two equivalences hold: MSMProp1 MSMProp2 adding a segment S join-ing S's vector timestamp to the segment-set to the current vector timestamp HB(SS,currS) happens_before( currS's timestamp, current vector timestamp ) Once it is no longer necessary to represent segment sets, it then also becomes unnecessary to represent segments. This constitutes a significant simplication to the implementation. The resulting state machine, MSMProp2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ MSMProp2 is isomorphic to MSMProp1, with the following changes: States are New, Read(VTS,LS), Write(VTS,LS) where LS is a lockset (as before) and VTS is a vector timestamp. For a thread T with current lockset 'currLS' and current VTS 'currVTS' making a memory access, the new rules are Name Old-State Op Guard New-State Race-If E1 New rd True Read(currVTS,currLS) False E2 New wr True Write(currVTS,currLS) False E3 Read(oldVTS,oldLS) rd True Read(newVTS,newLS) False E4 Read(oldVTS,oldLS) wr True Write(newVTS,newLS) #newLS == 0 && !hb(oldVTS,currVTS) E5 Write(oldVTS,oldLS) rd hb(oldVTS, Read(currVTS,currLS) False currVTS) E6r Write(oldVTS,oldLS) rd !hb(oldVTS, Write(newVTS,newLS) #newLS == 0 currVTS) && !hb(oldVTS,currVTS) E6w Write(oldVTS,oldLS) wr True Write(newVTS,newLS) #newLS == 0 && !hb(oldVTS,currVTS) where newVTS = join2(oldVTS,currVTS) newLS = if hb(oldVTS,currVTS) then currLS else intersect(oldLS,currLS) hb(vts1, vts2) = vts1 happens before or is equal to vts2 Interpretation of the states ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ I always found the state names in MSMProp1 confusing. Both MSMProp1 and MSMProp2 are easier to understand if the states Read and Write are renamed, like this: old name new name Read WriteConstraint Write AllConstraint The effect of a state Read(VTS,LS) is to constrain all later-observed writes so that either (1) the writing thread holds at least one lock in common with LS, or (2) those writes must happen-after VTS. If neither of those two conditions hold, a race is reported. Hence a Read state places a constraint on writes. The effect of a state Write(VTS,LS) is similar, but it applies to all later-observed accesses: either (1) the accessing thread holds at least one lock in common with LS, or (2) those accesses must happen-after VTS. If neither of those two conditions hold, a race is reported. Hence a Write state places a constraint on all accesses. If we ignore the LS component of these states, the intuitive interpretation of the VTS component is that it states the earliest vector-time that the next write / access may safely happen.