git: kernel - Attempt to fix two out-of-order read MP races

Board: DFBSD_commit, posted 2010/09/20 02:01
commit 562273eaa099ba3345f7ce4472d23b8a4da399db
Author: Matthew Dillon <dillon@apollo.backplane.com>
Date:   Sun Sep 19 08:36:16 2010 -0700

    kernel - Attempt to fix two out-of-order read MP races

    * Example of the issue: during buildworld -j 8 loop tests, cc1
      occasionally fails after upwards of a trillion instructions worth
      of testing:

      Sep 18 07:11:58 test29 kernel: seg-fault accessing address 0 rip=0x584ef7 pid=30229 p_comm=cc1

      In this example, gdb'ing cc1 and examining the code revealed an
      impossible crash case: off(%ebx) was deterministically accessed a
      few instructions earlier, then accessed again, and somehow %ebx
      had become zero.

      Unfortunately I could find no smoking gun, but my conjecture is
      that it is an MP race which can occur when the thread migrates
      between cpus and/or a mis-handled IPI.

    * In the LWKT messaging code, move the cpu_mfence() call in the
      sequence from rindex->read_args->MFENCE->call to
      rindex->MFENCE->read_args->call.

    * In the LWKT thread acquisition code (for thread migration between
      cpus), add a cpu_mfence() call after the td_flags check indicates
      success, instead of inside the loop where we are waiting for the
      flags check to indicate success.

    * In both cases the issue seems to be out-of-order reads and/or
      speculative reads. Even though MP writes are well ordered on
      Intel/AMD systems, reads are not.

      In the case of the IPIQ FIFO, the reads of the argument data can
      be ordered ahead of the read of the FIFO rindex and thus wind up
      being stale relative to the other CPU writing the entry. Moving
      the mfence ensures that the args stored in the FIFO are not
      accessed until after the rindex is read.

      For the thread acquisition code, access to and manipulation of the
      thread's td_allq might be based on stale out-of-order reads made
      prior to the determination that the thread completed its move.

      This can be a problem because several mechanisms in DragonFly are
      able to operate without using a locked bus cycle at all; the
      IPIQ, thread migration, and kern/sys_pipe.c are the best examples.
      The natural barrier provided by a locked bus-cycle instruction is
      therefore not necessarily present.

Summary of changes:
 sys/kern/lwkt_ipiq.c   |   21 +++++++++++++--------
 sys/kern/lwkt_thread.c |    4 +++-
 2 files changed, 16 insertions(+), 9 deletions(-)

http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/562273eaa099ba3345f7ce4472d23b8a4da399db

--
DragonFly BSD source repository
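
The reordered receive sequence described in the commit message (index, then fence, then read args, then call) can be illustrated with a small user-space sketch in C11. This is not the code from sys/kern/lwkt_ipiq.c: the struct, the function names (ipiq_sketch, ipiq_push, ipiq_drain, bump) and FIFO_SIZE are hypothetical, and atomic_thread_fence(memory_order_seq_cst) is used only as a portable stand-in for cpu_mfence(). It assumes one sending cpu and one receiving cpu per FIFO.

```c
#include <assert.h>
#include <stdatomic.h>

#define FIFO_SIZE 8                 /* hypothetical; must be a power of two */
#define FIFO_MASK (FIFO_SIZE - 1)

typedef void (*ipi_func_t)(void *arg);

/* Hypothetical single-sender/single-receiver FIFO, not the real IPIQ. */
struct ipiq_sketch {
    atomic_uint windex;             /* advanced only by the sending cpu */
    atomic_uint rindex;             /* advanced only by the receiving cpu */
    ipi_func_t  func[FIFO_SIZE];
    void       *arg[FIFO_SIZE];
};

/* Sender: fill the slot first, then publish it by advancing windex. */
static void
ipiq_push(struct ipiq_sketch *q, ipi_func_t func, void *arg)
{
    unsigned wi = atomic_load_explicit(&q->windex, memory_order_relaxed);
    unsigned ri = atomic_load_explicit(&q->rindex, memory_order_acquire);

    assert(wi - ri < FIFO_SIZE);    /* sketch only: no full-queue handling */
    q->func[wi & FIFO_MASK] = func;
    q->arg[wi & FIFO_MASK] = arg;
    atomic_store_explicit(&q->windex, wi + 1, memory_order_release);
}

/* Receiver: index -> fence -> read args -> call. Returns entries run. */
static int
ipiq_drain(struct ipiq_sketch *q)
{
    int n = 0;
    unsigned ri = atomic_load_explicit(&q->rindex, memory_order_relaxed);

    while (atomic_load_explicit(&q->windex, memory_order_relaxed) != ri) {
        /* Stand-in for cpu_mfence(): the slot reads below may not be
         * satisfied ahead of the index read above. */
        atomic_thread_fence(memory_order_seq_cst);
        ipi_func_t func = q->func[ri & FIFO_MASK];
        void *arg = q->arg[ri & FIFO_MASK];

        func(arg);
        ++ri;
        /* release: the slot reads finish before the sender can reuse it */
        atomic_store_explicit(&q->rindex, ri, memory_order_release);
        ++n;
    }
    return n;
}

/* Example callback: adds one to the int that arg points at. */
static void
bump(void *arg)
{
    ++*(int *)arg;
}
```

Placing the fence after the slot reads instead would leave nothing to stop them from being satisfied before the index read, which is the stale-argument case the commit message describes. The thread acquisition change has the same shape: the fence comes once the flags check has succeeded, before any dependent thread data is read.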