Re: cvs commit: src/sys/sys tls.h src/lib/libc/gen tls.c src/lib
:...
:> prefer NOT to do). I did a quick timing test on sys_set_tls_area()
:> and it costs around 339ns on my AMD64 test cube. But this is still
:> going to be far higher performing than having to call __tls_get_addr
:> all the time. The procedure setup cost for figuring out the GOT offset
:> alone is 17ns on the same box.
:
:It's not about calling __tls_get_addr, but
: mov %gs:0, %eax
: mov a@NTPOFF(%eax), %eax
:vs.
: mov %gs:a@NTPOFF, %eax
:
:The difference is one load instruction, possibly with a pipeline stall
:involved here. The difference should be zero once the base register is
:loaded.
:
:Joerg
There's no pipeline stall there. %gs:0 is likely to ALWAYS be in the
L1 cache. The %gs prefix itself can cost time versus a non-prefixed
relative load instruction, so my guess is that it turns out to be a wash.
Also keep in mind that GCC will cache the data loaded from %gs:0, which
makes it even less of an issue (and potentially faster than %gs:OFFSET).
I did a quick test with both the direct and indirect %gs models and
couldn't see any difference in timing.
Matthew Dillon
<dillon@backplane.com>
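
For anyone who wants to repeat the comparison, here is a minimal sketch of such a timing test. It is not the test used in the message above. It assumes GCC or Clang (for __thread and the noinline attribute) and a POSIX clock_gettime(). The names tls_a, tls_b, tls_sum and time_tls_sum are made up for illustration. On i386 the TLS segment register is %gs; on x86-64 Linux it is %fs.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical thread-local variables, for illustration only. */
static __thread int tls_a = 1;
static __thread int tls_b = 2;

/*
 * Reads two TLS variables. In the indirect model the compiler can load
 * the TLS base once and reuse it for both reads; in the direct model
 * each read carries its own segment prefix.
 * noinline keeps the accesses inside the timed call.
 */
__attribute__((noinline)) int tls_sum(void)
{
    return tls_a + tls_b;
}

/*
 * Calls tls_sum() iters times and returns the average cost in
 * nanoseconds per call. The accumulated result is written to *sink so
 * the loop cannot be optimized away.
 */
double time_tls_sum(long iters, unsigned *sink)
{
    struct timespec t0, t1;
    volatile unsigned acc = 0;
    long i;

    if (iters <= 0) {
        *sink = 0;
        return 0.0;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < iters; i++)
        acc += (unsigned)tls_sum();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *sink = acc;
    return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
            (double)(t1.tv_nsec - t0.tv_nsec)) / (double)iters;
}
```

Building this twice with GCC on x86, once with the default -mtls-direct-seg-refs and once with -mno-tls-direct-seg-refs, should give the direct and indirect access sequences respectively. Running gcc -O2 -S on it shows which instructions were emitted. The measured number includes call and loop overhead, so only the difference between the two builds is meaningful.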