In the trenches: Debugging GHC evaluation
This will be a collection of notes from my efforts debugging GHC bug #13615, a nasty concurrency issue leading to non-deterministic incorrect evaluation in some parallel programs.
This post will not be an introduction to gdb
, although will rely on it. Moreover, we will assume some knowledge of GHC internals, in particular a working understanding of STG machine. See (???) for details.
A fairly nice reproducer can be found here. In short what we have here is a program which builds a strict HashMap
(from unordered-containers
) from a list of elements evaluated in parallel (using parallel
’s strategies),
type Probably a = [(a, Rational)]
regroup :: (...) => Probably a -> Probably a
regroup xs =
let xs = HM.toList $ HM.fromListWith (+) xs
in ...
winStates :: (...) => Solution state description -> Probably state
winStates =
case s of
...
Node _ _ _ ps -> regroup $ concat $ parMap rdeepseq someFunction ps
It turns out that when run with +RTS -N2
, winStates
will return an incorrect result. Specifically, it appears that some elements of the input list to regroup
are not folded into the HashMap
.
Looking at Data.HashMap.Strict.fromListWith
it becomes quite clear that there is some subtlety here as it uses internal mutation while building the HashMap
. At this point it will likely be helpful to look at the representation of HashMap
,
data Leaf k v = L !k v
deriving (Eq)
data HashMap k v
= Empty
| BitmapIndexed !Bitmap !(A.Array (HashMap k v))
| Leaf !Hash !(Leaf k v)
| Full !(A.Array (HashMap k v))
| Collision !Hash !(A.Array (Leaf k v))
fromListWith
is implemented in terms of unsafeInsertWith
, which, given a key and value to insert, traverses this tree, finds the appropriate node to insert into, and if possible performs an in-place insertion. This is the codepath which
unsafeInsertWith :: (Eq k, Hashable k, NFData k, NFData v, Show k)
=> (v -> v -> v) -> k -> v -> HashMap k v -> ST s (HashMap k v)
unsafeInsertWith f k0 v0 m0 = go h0 k0 v0 0 m0
where
h0 = hash k0
go !h !k x !_ Empty = return $! leaf h k x
go h k x s (Leaf hy l@(L ky y))
| hy == h = if ky == k
then return $! leaf h k (f x y)
else return $! collision h l (L k x)
| otherwise = two s h k x hy ky y
go h k x s t@(BitmapIndexed b ary)
| b .&. m == 0 = do
let ary' = A.insert ary i $! leaf h k x
return $! bitmapIndexedOrFull (b .|. m) ary'
| otherwise = do
st <- A.indexM ary i
st' <- rnf x `seq` go h k x (s+bitsPerSubkey) st
A.unsafeUpdateM ary i st' -- in-place mutation
return t
where m = mask h s
i = sparseIndex b m
go h k x s t@(Full ary) = do
st <- A.indexM ary i
st' <- go h k x (s+bitsPerSubkey) st
A.unsafeUpdateM ary i st' -- in-place mutation
return t
where i = index h s
go h k x s t@(Collision hy v)
| h == hy = return $! Collision h (updateOrSnocWith f k x v)
| otherwise = go h k x s $ BitmapIndexed (mask hy s) (A.singleton t)
We can show that in-place mutation is responsible by changing both the references to A.unsafeUpdateM
to safe copy-and-modify updates. Doing this indeed appears to hide the issue.
So, the question then becomes why this in-place modification results in non-deterministic behavior. Given that parallelism is involved, one concern is that we have multiple threads performing mutable updates on the same array concurrently. There are a few ways we can test this,
Compile
Data.HashMap.Strict
with-feager-blackholing
to ensure that multiple threads aren’t evaluating some thunk whole code mutates, resulting in multiple racing mutable updatesWrap
unsafeInsertWith
with anoDuplicate
, essentially forcing a more localized blackholing check
Indeed both of these options seem to fix the issue. This is somewhat surprising; afterall, no unsafeInsertWith
thunks should be leaking out of fromListWith
, which is essentially a strict fold over the input list producing a result in normal form. Given this, it’s hard to imagine how we are entering such a thunk more than once.
Catching the program in the act
To determine how we end up with multiple entry we’ll want to catch when it occurs. We can then pause the program and poke at its state in gdb
. To catch this, we’ll write some sanity checking logic to wrap unsafeInsertWith
with ensuring that only one thread is in our critical section at once,
data Status = AllOut -- ^ no one is in the critical section
| OneIn -- ^ one thread is currently in the critical section
| TwoIn -- ^ a second thread tried to enter
deriving (Eq)
failOnDuplicate :: IORef Status -> b -> ST s a -> ST s a
failOnDuplicate ref obj action = do
a <- unsafeIOToST $ atomicModifyIORef' ref $ \x -> (OneIn, x)
-- Is someone already in our critical section?
when (a == OneIn) $ unsafeIOToST $ do
writeIORef ref TwoIn
-- if so log some details and hang
logWord 0x22
logThreadId
logWord 0x22
logPointer obj
forever $ return ()
!r <- action
a' <- unsafeIOToST $ atomicModifyIORef' ref $ \x -> (AllOut, x)
-- did a second thread try to enter the
when (a' == TwoIn) $ unsafeIOToST $ do
logWord 0x44
logThreadId
logWord 0x44
logPointer obj
breakpoint
return r
unsafeInsertWith :: (Eq k, Hashable k, NFData k, NFData v, Show k)
=> IORef Status -> (v -> v -> v) -> k -> v -> HashMap k v -> ST s (HashMap k v)
unsafeInsertWith ref f k0 v0 m0 = failOnDuplicate ref m0 $ go h0 k0 v0 0 m0
where
...
Where the log*
and breakpoint
functions come from a utility module that I carry around from times like these. Respectively they call to C to log machine words to a ring-buffer, and fire an int $3
to break into GDB.
Compiling and running the program within gdb
indeed breaks, again confirming our multi-entry hypothesis.
Thread 13 "ghc_worker" received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 0x7fffd17fa700 (LWP 10758)]
0x00000000004569f1 in c_breakpoint () at cbits/breakpoint.c:19
19 __asm__("int $3");
(gdb) info threads
Id Target Id Frame
...
15 Thread 0x7fffd1ffb700 (LWP 10803) "ghc_worker" rxSQ_info () at ./Data/HashMap/Strict.hs:100
...
* 17 Thread 0x7fffd17fa700 (LWP 10805) "ghc_worker" 0x00000000004569f1 in c_breakpoint () at cbits/breakpoint.c:19
...
Here we clearly see the two threads in play: thread 15 is looping at the forever $ return ()
whereas thread 17 fired the breakpoint. Looking at the log ringbuffer sheds a bit more light,
(gdb) x/-8xg word_log + log_idx
0x7de2e0 <word_log>: 0x0000000000000022 0x00007fffd0ff9700
0x7de2f0 <word_log+16>: 0x0000000000000022 0x0000004200cc09c2
0x7de300 <word_log+32>: 0x0000000000000044 0x00007fffd27fc700
0x7de310 <word_log+48>: 0x0000000000000044 0x0000004200cc09c2
Here we see that we have a 44
type message, indicating that a thread (namely thread 15) tried to insert into the map which was already being inserted into. We also have a 44
message, from the thread who was inserting, thread 17. Most importantly we see that both of these calls are acting upon the same HashMap
object, located at 0x0000004200cc09c2
(note that this isn’t always the case; often we seem to get two calls entering the critical section acting on two different HashMap
s. However, we’ll focus on the case where both calls are acting on the same object as I suspect this will be a bit easier to reason about).
Let’s take a moment to examine the HashMap
object.
Examining heap objects in GDB
From the HashMap
pointer we can immediately tell from the tag (contained in the pointer’s bottom three bits, e.g. 2) that it is a BitmapIndexed
constructor. Clearing the tag, we can view the heap object itself,
(gdb) x/3a 0x0000004200cc09c0
0x4200cc09c0: 0x456928 <unorderedzmcontainerszm0zi2zi8zi0zminplace_DataziHashMapziBase_BitmapIndexed_con_info> 0x4200cc0998
0x4200cc09d0: 0x800
Here we see the Bitmap
field (having a value of 0x800
), and a pointer to the node’s Array
. Note how the field order is inverted here: this is due to GHC’s convention of placing pointer fields first, then non-pointers. Next let’s examine the node’s Array
(which follows the layout described by struct StgMutArrPtrs
in includes/rts/storage/Closures.h
),
(gdb) x/4a 0x4200cc0998
0x4200cc0998: 0x55f2b8 <stg_MUT_ARR_PTRS_FROZEN0_info> 0x1
0x4200cc09a8: 0x2 0x4200cc0982
Here we can tell from the info pointer that the array is of the FROZEN0
variety, with length 1 (the 0x2 is the length including the size of the card table, which directly follows the pointer content). The first (and only) element of the array is found at 0x4200cc0982
,
(gdb) x/4a 0x4200cc0980
0x4200cc0980: 0x456928 <unorderedzmcontainerszm0zi2zi8zi0zminplace_DataziHashMapziBase_BitmapIndexed_con_info> 0x4200cc0958
If we link against the debug RTS (with GHC’s -debug
option) we can also use the RTS’s printClosure
function, which makes interpretation much easier,
(gdb) print printClosure(0x4200cc0980)
unordered-containers-0.2.8.0-inplace:Data.HashMap.Base.BitmapIndexed(0x4200cc0998, 0x800#)
Examining the stack
The head of the STG evaluation stack is located at the address held in the $rbp
register on amd64,
(gdb) x/16a $rbp
0x42002f0678: 0x5477e8 <stg_ap_v_info> 0x45e390 <base_GHCziBase_zdfApplicativeIO2_info+64>
0x42002f0688: 0x4200144e00 0x4a8f30 <r1rC_info+64>
0x42002f0698: 0x42001449e8 0x4a8c08 <r1rB_info+64>
0x42002f06a8: 0x42001437b1 0x4a8fd8 <base_GHCziST_runST_info+56>
0x42002f06b8: 0x4124e0 <srlH_info+184> 0x4200142288
0x42002f06c8: 0x7b0708 0x4200142220
0x42002f06d8: 0x4200142240 0x4200142250
0x42002f06e8: 0x42001421e0 0x540da0 <stg_upd_frame_info>
However, this is one area where a little help from the debugger helps immensely,
(gdb) ghc-backtrace -n25
0: RET_SMALL
return=0x5477e8 <stg_ap_v_info> (rts/dist/build/AutoApply.cmm:8)
1: RET_SMALL
return=0x45e390 <base_GHCziBase_zdfApplicativeIO2_info+64> (libraries/base/GHC/Base.hs:1195)
field 0: Ptr 0x4200144e00 : BLACKHOLE
2: RET_SMALL
return=0x4a8f30 <r1rC_info+64> (libraries/base/GHC/ST.hs:66)
field 0: Ptr 0x42001449e8 : THUNK
3: RET_SMALL
return=0x4a8c08 <r1rB_info+64> (libraries/base/GHC/ST.hs:74)
field 0: Ptr 0x42001437b1 : FUN_2_0
4: RET_SMALL
return=0x4a8fd8 <base_GHCziST_runST_info+56> (libraries/base/GHC/ST.hs:136)
5: RET_SMALL
return=0x4124e0 <srlH_info+184> (src/Solver.hs:96)
field 0: Ptr 0x4200142288 : THUNK_1_0
field 1: Ptr 0x7b0708 : off-heap(0x7b0708)
field 2: Ptr 0x4200142220 : THUNK_2_0
field 3: Ptr 0x4200142240 : THUNK
field 4: Ptr 0x4200142250 : THUNK
field 5: Ptr 0x42001421e0 : BLACKHOLE
6: UPDATE_FRAME(0x4200141a98: THUNK_2_0)
7: RET_SMALL
return=0x45f528 <s5BF_info+168> (libraries/base/GHC/Base.hs:860)
field 0: Ptr 0x4200141348 : THUNK_2_0
8: UPDATE_FRAME(0x4200140d60: THUNK_2_0)
9: RET_SMALL
return=0x411ca0 <srkh_info+96> (src/Solver.hs:87)
field 0: Ptr 0x4200140e20 : THUNK
field 1: Ptr 0x4200140ef1 : CONSTR_1_0
field 2: Ptr 0x4200140fb9 : CONSTR_1_0
field 3: Ptr 0x4200140fc9 : FUN
...