git: kernel - scheduler adjustments for large ncpus / 48-core mo

Board: DFBSD_commit · Posted: 2010/12/18 17:32
commit 2a4189307741dbcfbe11b31d6cc51a4fb39a8cde
Author: Matthew Dillon <dillon@apollo.backplane.com>
Date:   Sat Dec 18 00:42:52 2010 -0800

kernel - scheduler adjustments for large ncpus / 48-core monster

* Change the LWKT scheduler's token spinning algorithm.  It used to
  DELAY a short period of time and then simply retry, creating a lot
  of contention between cpus trying to acquire a token.

  Now the LWKT scheduler uses a FIFO index mechanic to resequence the
  contending cpus into 1uS retry slots using essentially just
  atomic_fetchadd_int(), so it is very cache friendly.  The spin-retry
  thus has a bounded cache management traffic load regardless of the
  number of cpus, and contending cpus will not be tripping over each
  other.

  The new algorithm slightly regresses 4-cpu operation (~5% under
  heavy contention) but significantly improves 48-cpu operation.  It
  is also flexible enough for further work down the road.  The old
  algorithm simply did not scale very well.

  Add three sysctls:

  sysctl lwkt.spin_method=1

      0   Allow a user thread to be scheduled on a cpu while kernel
          threads are contended on a token, using the IPI mechanic to
          interrupt the user thread and reschedule on decontention.
          This can potentially result in excessive IPI traffic.

      1   Allow a user thread to be scheduled on a cpu while kernel
          threads are contended on a token, reschedule on the next
          clock tick (100 Hz typically).  Decontention will NOT
          generate any IPI traffic.  DEFAULT.

      2   Do not allow a user thread to be scheduled on a cpu while
          kernel threads are contended.  Should not be used normally,
          for debugging only.

  sysctl lwkt.spin_delay=1

      Slot time in microseconds, default 1uS.  Recommended values are
      1 or 2 but not longer.

  sysctl lwkt.spin_loops=10

      Number of times the LWKT scheduler loops on contended threads
      before giving up and allowing an idle-thread HLT.  In order to
      wake up from the HLT, decontention will cause an IPI, so you do
      not want to set this value too small.  Values between 10 and 100
      are recommended.

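The FIFO index idea in the first item can be illustrated with a small userspace sketch.  This is not the kernel code from the commit: it uses C11 atomic_fetch_add() in place of atomic_fetchadd_int(), and the names fifo_next, fifo_done, slot_acquire(), slot_delay_us() and slot_release() are hypothetical.  It only shows how one fetch-add per contender can hand out distinct retry slots spaced spin_delay microseconds apart.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical counters for illustration; not the kernel's variables. */
static atomic_uint fifo_next;   /* next slot index to hand out */
static atomic_uint fifo_done;   /* number of slots already retired */

/*
 * A contending cpu takes a slot.  The single atomic fetch-add is the
 * only shared write, so the cache traffic per contender is bounded.
 */
static unsigned slot_acquire(void)
{
    return atomic_fetch_add(&fifo_next, 1);
}

/*
 * Microseconds the holder of 'slot' should wait before retrying:
 * one slot time for every contender still queued ahead of it.
 * Unsigned subtraction keeps this correct across counter wraparound.
 */
static unsigned slot_delay_us(unsigned slot, unsigned spin_delay_us)
{
    assert(spin_delay_us > 0);
    unsigned ahead = slot - atomic_load(&fifo_done);
    return ahead * spin_delay_us;
}

/* Called once a contender has finished its retry attempt. */
static void slot_release(void)
{
    atomic_fetch_add(&fifo_done, 1);
}
```

With three contenders and a 1uS slot time, the sketch spaces the retries 0, 1 and 2 microseconds out, so the cpus do not all hit the token in the same instant.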
* Redo the token decontention algorithm.  Use a new gd_reqflags flag,
  RQF_WAKEUP, coupled with RQF_AST_LWKT_RESCHED in the per-cpu
  globaldata structure to determine which cpus actually need to be
  IPI'd on token decontention (to wake up their idle threads stuck in
  HLT).

  This requires that all gd_reqflags operations use locked atomic
  instructions rather than non-locked instructions.

* Decontention IPIs are a last-gasp effort if the LWKT scheduler has
  spun too many times.  Under normal conditions, even under heavy
  contention, actual IPIing should be minimal.

Summary of changes:
 sys/cpu/i386/include/cpu.h                  |   24 +-
 sys/cpu/x86_64/include/cpu.h                |   19 +-
 sys/kern/lwkt_thread.c                      |  342 +++++++++++++++++++--------
 sys/kern/lwkt_token.c                       |   92 +++++++-
 sys/platform/pc32/i386/trap.c               |    6 +-
 sys/platform/pc32/isa/intr_machdep.c        |    2 +-
 sys/platform/pc32/isa/ipl_funcs.c           |    2 +-
 sys/platform/pc64/isa/intr_machdep.c        |    2 +-
 sys/platform/pc64/x86_64/ipl_funcs.c        |    2 +-
 sys/platform/pc64/x86_64/trap.c             |    6 +-
 sys/platform/vkernel/i386/trap.c            |    2 +-
 sys/platform/vkernel/platform/ipl_funcs.c   |    2 +-
 sys/platform/vkernel/platform/machintr.c    |    8 +-
 sys/platform/vkernel64/platform/ipl_funcs.c |    2 +-
 sys/platform/vkernel64/platform/machintr.c  |    8 +-
 sys/platform/vkernel64/x86_64/trap.c        |    6 +-
 sys/vm/vm_fault.c                           |   26 ++-
 sys/vm/vnode_pager.c                        |    2 +-
 18 files changed, 381 insertions(+), 172 deletions(-)

http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/2a4189307741dbcfbe11b31d6cc51a4fb39a8cde

--
DragonFly BSD source repository
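The RQF_WAKEUP decontention scheme described in the commit message can be pictured with the following userspace sketch.  It is an assumption-laden illustration, not the code from the commit: the flag bit values, NCPU, struct globaldata_sketch, idle_prepare_halt() and decontend() are all made up here, and C11 atomics stand in for the kernel's locked atomic operations.  The point it tries to show is why the flag updates must be atomic once a remote cpu is allowed to modify another cpu's gd_reqflags.

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative bit values only; the kernel's real values may differ. */
#define RQF_AST_LWKT_RESCHED 0x01u
#define RQF_WAKEUP           0x02u
#define NCPU                 4      /* hypothetical cpu count */

/* Stand-in for the per-cpu globaldata structure. */
struct globaldata_sketch {
    atomic_uint gd_reqflags;
};

static struct globaldata_sketch cpus[NCPU];

/* An idle thread marks its cpu as needing a wakeup before it halts. */
static void idle_prepare_halt(int cpu)
{
    assert(cpu >= 0 && cpu < NCPU);
    atomic_fetch_or(&cpus[cpu].gd_reqflags, RQF_WAKEUP);
}

/*
 * On token release, find the cpus that asked for a wakeup.  The
 * atomic fetch-and both tests and clears RQF_WAKEUP, so each halted
 * cpu is selected for an IPI at most once.  Returns a bitmask of the
 * cpus that would be sent an IPI.
 */
static unsigned decontend(void)
{
    unsigned ipi_mask = 0;

    for (int i = 0; i < NCPU; i++) {
        unsigned old = atomic_fetch_and(&cpus[i].gd_reqflags, ~RQF_WAKEUP);
        if (old & RQF_WAKEUP) {
            /* Ask the target cpu to rerun the LWKT scheduler. */
            atomic_fetch_or(&cpus[i].gd_reqflags, RQF_AST_LWKT_RESCHED);
            ipi_mask |= 1u << i;
        }
    }
    return ipi_mask;
}
```

In this sketch, cpus that never set RQF_WAKEUP are skipped entirely, which matches the stated goal that actual IPIing stays minimal under normal conditions.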
Article ID (AID): #1D37-MQx (DFBSD_commit)