Re: RFC: GEOM MULTIPATH rewrite

Board: FB_current · 2012/01/21 02:01 · thread 17/17
On Jan 20, 2012, at 3:38 PM, Alexander Motin wrote:

> On 01/20/12 15:27, Nikolay Denev wrote:
>>
>> On Jan 20, 2012, at 2:31 PM, Alexander Motin wrote:
>>
>>> On 01/20/12 14:13, Nikolay Denev wrote:
>>>> On Jan 20, 2012, at 1:30 PM, Alexander Motin wrote:
>>>>> On 01/20/12 13:08, Nikolay Denev wrote:
>>>>>> On 20.01.2012, at 12:51, Alexander Motin <mav@freebsd.org> wrote:
>>>>>>
>>>>>>> On 01/20/12 10:09, Nikolay Denev wrote:
>>>>>>>> Another thing I've observed is that active/active probably only makes sense if you are accessing a single LUN.
>>>>>>>> In my tests, where I have 24 LUNs that form 4 vdevs in a single zpool, the highest performance was achieved
>>>>>>>> when I split the active paths among the controllers installed in the server importing the pool (basically "gmultipath rotate $LUN" in rc.local for half of the paths).
>>>>>>>> Using active/active in this situation resulted in fluctuating performance.
>>>>>>>
>>>>>>> How big was the fluctuation? Between the speed of one path and of all paths?
>>>>>>>
>>>>>>> Several active/active devices with no knowledge of each other will, with some probability, send part of their requests via the same links, while ZFS itself already does some balancing between vdevs.
>>>>>>
>>>>>> I will test in a bit and post results.
>>>>>>
>>>>>> P.S.: Is there a way to enable/disable active-active on the fly? I'm
>>>>>> currently re-labeling to achieve that.
>>>>>
>>>>> No, there is not now. But for experiments you may achieve the same results by manually marking all paths except one as failed. It is not dangerous: if that link fails, all the others will resurrect automatically.
>>>>
>>>> I had to destroy and relabel anyway, since I was not previously using active-active.
>>>> Here's what I did (maybe a little too verbose):
>>>>
>>>> And now a very naive benchmark:
>>>>
>>>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>>>> 512+0 records in
>>>> 512+0 records out
>>>> 536870912 bytes transferred in 7.282780 secs (73717855 bytes/sec)
>>>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>>>> 512+0 records in
>>>> 512+0 records out
>>>> 536870912 bytes transferred in 38.422724 secs (13972745 bytes/sec)
>>>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>>>> 512+0 records in
>>>> 512+0 records out
>>>> 536870912 bytes transferred in 10.810989 secs (49659740 bytes/sec)
>>>>
>>>> Now deactivate the alternative paths:
>>>> And the benchmark again:
>>>>
>>>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>>>> 512+0 records in
>>>> 512+0 records out
>>>> 536870912 bytes transferred in 1.083226 secs (495622270 bytes/sec)
>>>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>>>> 512+0 records in
>>>> 512+0 records out
>>>> 536870912 bytes transferred in 1.409975 secs (380766249 bytes/sec)
>>>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>>>> 512+0 records in
>>>> 512+0 records out
>>>> 536870912 bytes transferred in 1.136110 secs (472551848 bytes/sec)
>>>>
>>>> P.S.: The server is running 8.2-STABLE with a dual-port isp(4) card, and is directly connected to a 4Gbps Xyratex dual-controller (active-active) storage array.
>>>> All 24 SAS drives are set up as single-disk RAID0 LUNs.
>>>
>>> This difference is too huge to explain by ineffective path utilization alone. Couldn't this storage have some per-LUN port/controller affinity that penalizes concurrent access to the same LUN from different paths? Couldn't it be active/active at the port level, but active/passive for each specific LUN? If there really are two controllers inside, they may need to synchronize their caches or bounce requests, and that may be expensive.
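(Aside: the path-failing experiment discussed in the quoted exchange, using gmultipath fail/restore instead of re-labeling, might look roughly like the sketch below. The multipath device name mp0 and provider name da24 are hypothetical, not from the thread; substitute whatever `gmultipath status` reports. The script is guarded so it is a harmless no-op on systems without gmultipath(8).)

```shell
#!/bin/sh
# Hedged sketch of the "mark paths as failed" experiment suggested above.
# mp0 and da24 are hypothetical example names.
if command -v gmultipath >/dev/null 2>&1; then
    have_gmultipath=1
    gmultipath fail mp0 da24 || :     # take the alternate path out of rotation
    gmultipath status mp0 || :        # verify which provider is now ACTIVE
    # ... run the dd benchmarks here ...
    gmultipath restore mp0 da24 || :  # bring the path back afterwards; a real
                                      # link failure would resurrect failed
                                      # paths automatically anyway
else
    have_gmultipath=0
    echo "gmultipath(8) not present; commands shown for reference only"
fi
```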
>>>
>>> --
>>> Alexander Motin
>>
>> Yes, I think that's what's happening. There are two controllers, each with its own CPU and cache, and cache synchronization is enabled.
>> I will try to test multipath with both paths connected to the same controller (there are two ports on each controller), but that will require remote hands and take some time.
>>
>> In the meantime I've disabled the writeback cache on the array (this also disables cache synchronization), and here are the results:
>>
>> ACTIVE-ACTIVE:
>>
>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>> 512+0 records in
>> 512+0 records out
>> 536870912 bytes transferred in 2.497415 secs (214970639 bytes/sec)
>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>> 512+0 records in
>> 512+0 records out
>> 536870912 bytes transferred in 1.076070 secs (498918172 bytes/sec)
>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>> 512+0 records in
>> 512+0 records out
>> 536870912 bytes transferred in 1.908101 secs (281363979 bytes/sec)
>>
>> ACTIVE-PASSIVE (half of the paths failed the same way as in the previous email):
>>
>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>> 512+0 records in
>> 512+0 records out
>> 536870912 bytes transferred in 0.324483 secs (1654542913 bytes/sec)
>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>> 512+0 records in
>> 512+0 records out
>> 536870912 bytes transferred in 0.795685 secs (674727909 bytes/sec)
>> :~# dd if=/dev/zero of=/tank/TEST bs=1M count=512
>> 512+0 records in
>> 512+0 records out
>> 536870912 bytes transferred in 0.233859 secs (2295702835 bytes/sec)
>>
>> This increased performance in both cases, probably because writeback caching does nothing for large sequential writes.
>> Anyway, ACTIVE-ACTIVE is still slower here, but not by as much.
>
> Thank you for the numbers, but I have some doubts about them. 2295702835 bytes/sec is about 18Gbps.
> If you have 4Gbps links, that would need more than 4 of them, I think.
>
> --
> Alexander Motin

Hmm, that's silly of me. 512M is just too small, so I probably benchmarked the ZFS cache. (I have only two 4Gbps links to the array.)

Here's a run with an 8G file:

ACTIVE-ACTIVE:

# dd if=/dev/zero of=/tank/TEST bs=1M count=8096
8096+0 records in
8096+0 records out
8489271296 bytes transferred in 62.120919 secs (136657207 bytes/sec)
# dd if=/dev/zero of=/tank/TEST bs=1M count=8096
8096+0 records in
8096+0 records out
8489271296 bytes transferred in 65.066861 secs (130469969 bytes/sec)
# dd if=/dev/zero of=/tank/TEST bs=1M count=8096
8096+0 records in
8096+0 records out
8489271296 bytes transferred in 64.011907 secs (132620190 bytes/sec)

ACTIVE-PASSIVE:

# dd if=/dev/zero of=/tank/TEST bs=1M count=8096
8096+0 records in
8096+0 records out
8489271296 bytes transferred in 34.297121 secs (247521398 bytes/sec)
# dd if=/dev/zero of=/tank/TEST bs=1M count=8096
8096+0 records in
8096+0 records out
8489271296 bytes transferred in 31.709855 secs (267717127 bytes/sec)
# dd if=/dev/zero of=/tank/TEST bs=1M count=8096
8096+0 records in
8096+0 records out
8489271296 bytes transferred in 34.111564 secs (248867840 bytes/sec)

_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
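(Aside: Motin's sanity check can be made explicit with a little arithmetic. The throughput figure below is the fastest ACTIVE-PASSIVE run of the 512M test quoted above, and the link count is Nikolay's "two 4Gbps links"; the wire limit shown is the raw signaling rate, ignoring Fibre Channel encoding overhead.)

```shell
#!/bin/sh
# Sanity-check a dd result against link capacity, as done in the thread.
bytes_per_sec=2295702835   # fastest 512M active-passive run from the thread
links=2                    # two 4Gbps links to the array
gbps_per_link=4

# Apparent throughput in Gbps; awk is used because /bin/sh has no floats.
apparent_gbps=$(LC_ALL=C awk -v b="$bytes_per_sec" \
    'BEGIN { printf "%.1f", b * 8 / 1e9 }')
wire_gbps=$((links * gbps_per_link))

echo "apparent: ${apparent_gbps} Gbps, wire limit: ${wire_gbps} Gbps"
# 18.4 Gbps through an 8 Gbps fabric is impossible, so the 512M runs were
# measuring the ZFS cache, not the array -- hence the 8G re-test.
```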
Article ID (AID): #1F6QlZnF (FB_current)