Re: Delayed ACK triggered by Header Prediction

Board: DFBSD_kernel, Posted: 2005/03/16 17:32, Pushes: 0
0 comments, 0 participants, thread 2/5
Hmmm. There is something going on, but it doesn't have anything to do with tcpcbackq[]. I will investigate why the acks are getting seriously delayed on Wednesday.

Here is an explanation of how tcpcbackq[] works: DragonFly doesn't send back-to-back acks which would otherwise occur due to normal aggregation of receive packets by the ethernet hardware. So, for example, a GigE interface typically only generates an interrupt once every 8 to 10 received packets (if the packets are coming in at full speed). If all the packets are associated with the same tcp connection, DragonFly will only send one ACK back after processing the whole set rather than sending 4 back-to-back acks. This greatly reduces the return channel bandwidth and the receiver's overhead, AND it reduces the sender's overhead in processing the acks.

A 100BaseT interface typically does NOT aggregate packets... the packets are coming in too slowly (usually) for such aggregation to occur. This means that tcpcbackq[] would not cause the acks to occur less often than every other packet over 100BaseT, not unless the cpu load is very, very high.

It should be noted that DragonFly is *NOT* delaying the ack in the time domain. In fact, the sender will get a more up-to-date ack more quickly, because it won't have to wade through 4 ack packets before it gets the most up-to-date ack (in the GigE case).

I have never noticed any performance degradation from this. I get 10 MBytes/sec over 100BaseT, at least between two DragonFly boxes. Sure, the congestion window might open up a bit more slowly on some senders, but it requires a multi-packet data burst to trigger the effect (i.e. 3 or more packets sent back-to-back), and at that point the congestion window should already be sufficiently open to not affect performance. At those speeds it would only take a few milliseconds at most for the congestion window to open up completely. Unless your link has a lot of packet loss, you shouldn't notice any degradation in performance, and even if your link has packet loss you shouldn't notice much (because the effect doesn't occur until the congestion window is at least 3 packets long).

The business about sending one ack for every second segment is a very old part of the RFC (if I remember correctly). It might have made sense for a 10BaseT connection, but it makes very little sense for a 100BaseT or GigE connection with packet-aggregating interrupt hardware.

--

In any case, I *AM* seeing a performance reduction when I FTP with a DragonFly box as the receiver over 100BaseT. I am seeing 7-8 MBytes/sec instead of 10+ MBytes/sec. It is NOT related to the way delayed acks work or how tcpcbackq[] works, however. It looks like there is an output delay being imposed somewhere, but it is occurring outside the TCP stack.

You can verify this by doing a tcpdump in the middle of a transfer on the sender, and then doing a tcpdump on the receiver. The receiver believes it is sending an ack out every other packet (at 100BaseT speeds), but the sender is seeing those acks globbed together. When I run the same test with a FreeBSD box as the receiver, the sender is NOT seeing the acks globbed together (at least not anywhere near as badly).

Clearly there is something wrong here. I don't yet know what it is, but I am fairly sure from the tcpdump output that the TCP stack is not to blame.

					-Matt
					Matthew Dillon <dillon@backplane.com>
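[Editor's illustration] The deferred-ack mechanism described above can be modeled in a few lines of user-space C. This is only an illustrative sketch, not the actual DragonFly code: struct conn, rx_segment_*() and flush_pending_acks() are invented stand-ins for the real tcpcb / tcpcbackq[] / tcp_willblock() path, and the figure of 9 segments per interrupt is simply assumed from the 8-to-10 packet aggregation example above.

    #include <stdio.h>
    #include <stdbool.h>

    /*
     * Illustrative model of deferred ACK coalescing.  The real DragonFly
     * stack queues the tcpcb on tcpcbackq[] and flushes it from
     * tcp_willblock(); the names below are invented stand-ins.
     */
    struct conn {
        int  segs_rcvd;     /* full-sized segments received */
        int  acks_sent;
        bool ack_pending;   /* connection owes the peer an ACK */
    };

    static void send_ack(struct conn *c)
    {
        c->acks_sent++;
        c->ack_pending = false;
    }

    /* Classic delayed ACK (RFC 1122): one ACK for every second segment. */
    static void rx_segment_classic(struct conn *c)
    {
        if (++c->segs_rcvd % 2 == 0)
            send_ack(c);
    }

    /* Coalesced: only mark the connection; one ACK per interrupt batch. */
    static void rx_segment_coalesced(struct conn *c)
    {
        c->segs_rcvd++;
        c->ack_pending = true;
    }

    static void flush_pending_acks(struct conn *c)
    {
        if (c->ack_pending)
            send_ack(c);
    }

    int main(void)
    {
        /* ~GigE: one interrupt per 8-10 packets; 9 assumed here. */
        const int batches = 100, segs_per_batch = 9;
        struct conn classic = { 0 }, coalesced = { 0 };

        for (int b = 0; b < batches; b++) {
            for (int s = 0; s < segs_per_batch; s++) {
                rx_segment_classic(&classic);
                rx_segment_coalesced(&coalesced);
            }
            /* Protocol thread is about to block: flush the queued ACK
             * (the role tcp_willblock() plays in the real stack). */
            flush_pending_acks(&coalesced);
        }

        printf("classic delayed ACK : %d ACKs for %d segments\n",
               classic.acks_sent, classic.segs_rcvd);
        printf("coalesced per batch : %d ACKs for %d segments\n",
               coalesced.acks_sent, coalesced.segs_rcvd);
        return 0;
    }

With roughly 9 segments per interrupt, the classic policy emits 4 or 5 back-to-back ACKs per batch while the coalesced policy emits exactly one, which is the reduction in return-channel traffic and ack-processing overhead described above.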
:I am using DragonFlyBSD as a TCP receiver since yesterday night in my
:experiments. And I found that the number of ACK segments sent in reply
:to received data segments is less than expected.
:
:Normally, ACK segments are sent for every second full-sized data segment.
:
:  (As many of you know, it is called Delayed ACK and is specified in
:   section 4.2.3.2 of RFC 1122 as follows:
:
:     A TCP SHOULD implement a delayed ACK, but an ACK should not
:     be excessively delayed; in particular, the delay MUST be
:     less than 0.5 seconds, and in a stream of full-sized
:     segments there SHOULD be an ACK for at least every second
:     segment.
:  )
:
:But the Header Prediction code in DragonFlyBSD TCP sends ACK segments
:less frequently. It just queues an output request into tcpcbackq[],
:and tcp_willblock() processes the request later. It seems that
:tcp_willblock() is called less frequently than receiving two
:full-sized data segments in my environment (100Mbps). (I put printf()'s
:in tcp_input(), tcp_output() and tcp_willblock() and found this.)
:That would be the reason why the number of ACK segments is less than
:expected.
:
:In my experiments, since DragonFlyBSD sends fewer ACK segments than
:expected, the congestion window in the sender machine grows slowly
:and the TCP performance becomes poor.
:
:I tried the following:
:
:  1. "sysctl -w net.inet.tcp.avoid_pure_win_update=0"
:     But my problem was not solved.
:
:  2. I replaced the code fragment that inserts an output request in
:     Header Prediction with code that simply calls tcp_output().
:     With this change, the TCP performance becomes normal
:     (compared with the performance when a Linux box is the receiver).
:
:I checked "cvs log". tcpcbackq[] was introduced on Aug 3, 2004 to
:reduce the number of ACK segments across GbE. Unfortunately, it reduces
:the TCP performance on a 100Mbps path when DragonFlyBSD acts as a receiver.
:I think the same phenomenon will occur when DragonFlyBSD acts as a receiver
:across 10GbE.
:
:  What I would like to say here is that when acting as a receiver,
:  if the number of ACK segments sent in reply to data segments is reduced,
:  TCP performance from the peer node will also be reduced because of
:  the standard congestion control algorithm.
:
:So, I think it is better to send an ACK segment for every second
:full-sized data segment even on GbE. But I have not experienced
:DragonFlyBSD on GbE yet. So, I may be wrong. I am sorry in such
:a case.
:
:Regards,
:Noritoshi Demizu
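[Editor's illustration] Demizu's point about the congestion window is easy to see with a rough slow-start model: in standard slow start the sender grows cwnd by at most one MSS per ACK received, so reducing the ACK rate directly stretches the ramp. The sketch below is only a back-of-the-envelope model with invented numbers (1448-byte MSS, 64 KB target window, 2 vs 9 segments acknowledged per ACK); it is not a measurement of either stack.

    #include <stdio.h>

    /*
     * Rough slow-start model: cwnd grows by at most one MSS per ACK
     * received, so fewer ACKs per round trip means a slower ramp.
     * All numbers below are illustrative assumptions.
     */
    int main(void)
    {
        const int mss = 1448;               /* assumed Ethernet MSS */
        const int target = 64 * 1024;       /* assumed window to reach */
        /* segments acknowledged per ACK: 2 = classic delayed ACK,
         * 9 = one ACK per aggregated GigE interrupt batch */
        const int segs_per_ack[] = { 2, 9 };

        for (int i = 0; i < 2; i++) {
            int cwnd = 2 * mss;             /* assumed initial window */
            int rtts = 0;

            while (cwnd < target) {
                int segs_in_flight = cwnd / mss;
                int acks = segs_in_flight / segs_per_ack[i];
                if (acks < 1)
                    acks = 1;
                cwnd += acks * mss;         /* +1 MSS per ACK in slow start */
                rtts++;
            }
            printf("one ACK per %d segment(s): cwnd reaches %d bytes "
                   "after %d RTTs\n", segs_per_ack[i], cwnd, rtts);
        }
        return 0;
    }

The model ignores loss and the receive window entirely; the point is only that fewer ACKs per round trip means more round trips to open the window, which is consistent with the slower ramp Demizu describes at 100Mbps.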
Article ID (AID): #12D_qQ00 (DFBSD_kernel)