Re: Best practice for accepting TCP connections on multicore?

Board: FB_hackers · Posted 2014/06/08 05:01 · Post 16 of 18 in this thread
On 7 June 2014 16:37, Igor Mozolevsky <igor@hybrid-lab.co.uk> wrote:
> On 7 June 2014 21:18, Adrian Chadd <adrian@freebsd.org> wrote:
>> > Not quite - the gist (and the point) of that slide with Rob's story
>> > was that by the time Rob wrote something that could comprehensively
>> > deal with states in an event-driven server, he ended up essentially
>> > re-inventing the wheel.
>>
>> I read the same slides you did. He didn't reinvent the wheel - threads
>> are a different concept - at any point the state can change and you
>> switch to a new thread. Event-driven, asynchronous programming isn't
>> quite like that.
>
> Not quite - unless you're dealing with stateless HTTP, you still need to
> know what the "current" state of the "current" connection is, which is
> the point of that slide.
>
>> > Paul Tyma's presentation posted earlier did conclude with various
>> > models for different types of daemons, which the OP might find at
>> > least interesting.
>>
>> Agreed, but again - it's all Java, it's all Linux, and it's 2008.
>
> Agreed, but threading models are platform-agnostic.
>
>> The current state is that threads and thread context switching are
>> more expensive than you'd like. You really want to (a) avoid locking
>> at all, (b) keep the CPU hot with cached data, and (c) keep it from
>> changing contexts.
>
> Agreed, but uncontested locking should be virtually cost-free (or close
> to that), modern CPUs have plenty of L2/L3 cache to keep enough data
> nearby, there are plenty of cores to keep cycling in the same
> thread-loop, and hyper-threading helps with context switching (or at
> least is supposed to). In any event, the cost of shuttling data between
> RAM and cache (especially with on-die RAM controllers, even if the data
> has to go through QPI/HyperTransport) and the cost of changing contexts
> are tiny compared to that of disk and network IO.

I was doing 40 Gbit/sec testing over 2^16 connections (and was hoping to
get the chance to optimise this stuff to get to 2^17 active streaming
connections, but I ran out of CPU). If you're not careful about keeping
work on a local CPU, you end up blowing your caches and hitting lock
contention pretty quickly. And QPI isn't free: there's a cost going back
and forth with packet data and cache lines, even for uncontested data.

I'm not going to worry about QPI and socket awareness just for now -
that's a bigger problem to solve. I'll first worry about getting RSS
working for a single-socket setup and then convert a couple of drivers
over to be RSS-aware. I'll then worry about multiple-socket awareness and
being aware of whether a NIC is local to a socket.

I'm hoping that with this work and the Verisign TCP locking changes,
we'll be able to handle 40-gig bulk data on single-socket Sandy Bridge
Xeon hardware and/or more than 100,000 TCP sessions a second with plenty
of CPU to spare. Then it's getting to 80 gig on Ivy Bridge class
single-socket hardware. I'm hoping we can aim for much higher (a
million-plus transactions a second) on current-generation hardware, but
that requires a bunch more locking work.

And well, whatever hardware I can play with. All I have at home is a
4-core Ivy Bridge desktop box with igb(4). :-P

-a
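
[Editor's note: for readers following the event-driven-vs-threads exchange
above, here is a minimal sketch of the pattern being argued over: a
single-threaded kqueue(2) accept/echo loop on FreeBSD. It is not code from
the thread; the "struct conn" record, the port number, and the trivial
read/write states are invented for illustration. The point it makes is
Igor's: even without threads, the server still has to carry an explicit
per-connection state record and resume from it on every event.]

    /* Hypothetical sketch: event-driven TCP server with explicit
     * per-connection state, using kqueue(2) on FreeBSD. */
    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <err.h>
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    struct conn {                   /* invented per-connection state */
            int     fd;
            enum { READING_REQUEST, WRITING_REPLY } state;
            char    buf[4096];
            size_t  len;
    };

    int
    main(void)
    {
            struct sockaddr_in sin;
            struct kevent ev, out;
            int kq, lfd, one = 1;

            if ((lfd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
                    err(1, "socket");
            setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
            memset(&sin, 0, sizeof(sin));
            sin.sin_family = AF_INET;
            sin.sin_port = htons(8080);     /* arbitrary example port */
            if (bind(lfd, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
                listen(lfd, 128) == -1)
                    err(1, "bind/listen");

            if ((kq = kqueue()) == -1)
                    err(1, "kqueue");
            EV_SET(&ev, lfd, EVFILT_READ, EV_ADD, 0, 0, NULL);
            if (kevent(kq, &ev, 1, NULL, 0, NULL) == -1)
                    err(1, "kevent");

            for (;;) {
                    if (kevent(kq, NULL, 0, &out, 1, NULL) < 1)
                            continue;
                    if ((int)out.ident == lfd) {
                            /* New connection: allocate its state record. */
                            struct conn *c = calloc(1, sizeof(*c));
                            c->fd = accept(lfd, NULL, NULL);
                            if (c->fd == -1) {
                                    free(c);
                                    continue;
                            }
                            c->state = READING_REQUEST;
                            fcntl(c->fd, F_SETFL, O_NONBLOCK);
                            EV_SET(&ev, c->fd, EVFILT_READ, EV_ADD, 0, 0, c);
                            kevent(kq, &ev, 1, NULL, 0, NULL);
                    } else {
                            /* Existing connection: resume from saved state. */
                            struct conn *c = out.udata;
                            ssize_t n = read(c->fd, c->buf, sizeof(c->buf));
                            if (n <= 0) {
                                    close(c->fd);
                                    free(c);
                                    continue;
                            }
                            c->len = (size_t)n;
                            c->state = WRITING_REPLY;
                            write(c->fd, c->buf, c->len);   /* echo back */
                            c->state = READING_REQUEST;
                    }
            }
    }

A thread-per-connection design would keep the same information implicitly
on each thread's stack; the event-driven version has to reify it, which is
the "re-inventing the wheel" complaint from Rob's story.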
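[Editor's note: the "keep work on a local CPU" point also has a user-space
half that can be sketched today, independent of the kernel RSS work Adrian
describes. The sketch below is an assumption-laden illustration, not
anything from the thread: it spawns one worker thread per online CPU and
pins each to its core with cpuset_setaffinity(2), so whatever
accept/read/write loop the worker runs stays on one core's caches. The
worker body itself is a placeholder; steering a connection's *packets* to
that same core is the kernel-side RSS problem.]

    /* Hypothetical sketch: one worker thread per core, each pinned with
     * cpuset_setaffinity(2) on FreeBSD.  The worker body is a stub. */
    #include <sys/param.h>
    #include <sys/cpuset.h>
    #include <err.h>
    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    static void *
    worker(void *arg)
    {
            int cpu = (int)(intptr_t)arg;
            cpuset_t mask;

            /* Bind the calling thread (id -1 == current thread) to one CPU. */
            CPU_ZERO(&mask);
            CPU_SET(cpu, &mask);
            if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
                sizeof(mask), &mask) != 0)
                    err(1, "cpuset_setaffinity");

            /*
             * From here on, any accept()/read()/write() loop this thread
             * runs executes on "cpu", so its connection state stays in
             * that core's caches.  A real server loop would go here.
             */
            printf("worker pinned to CPU %d\n", cpu);
            pause();
            return (NULL);
    }

    int
    main(void)
    {
            int ncpu = (int)sysconf(_SC_NPROCESSORS_ONLN);
            pthread_t tid;

            for (int cpu = 0; cpu < ncpu; cpu++)
                    if (pthread_create(&tid, NULL, worker,
                        (void *)(intptr_t)cpu) != 0)
                            err(1, "pthread_create");
            pause();
            return (0);
    }

Without the RSS/driver work described above, the NIC may still deliver a
given flow's packets to a different CPU than the one pinned here, which is
exactly the cross-CPU cache and lock traffic Adrian is trying to eliminate.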