Re: nfs-bug when server for 9-Stable becomes client as well ?

Board: FB_stable, posted 2012/07/07 02:01 (message 3 of 6 in this thread)
Vincent Hoffman <vince@unsane.co.uk> writes:

> On 06/07/2012 14:19, Arno J. Klaassen wrote:
>> Hello,
>>
>> looks like I discovered a probable bug in the nfs-code, very
>> easy to reproduce in my setup :
>>
>>
>> Machine-1 : Today's 9-stable, exporting /files (ufs) and /z2 (zfs)
>>
>> Machine-2 : 8-stable as of April the 10th exporting /raid1
>>
>> On Machine-1 I mount /raid1 (rw,nfsv3,intr,tcp,rsize=32768,wsize=32768)
>> and start a script on this mount looping something like :
>>
>>   dd if=/dev/random of=BIG bs=1048576 count=${SIZE}
>>   cp -fp BIG BIG2
>>   cmp -x BIG BIG2
>>
>> I let this run for 24 hours (from time to time stressing Machine-1 with
>> other scripts, including provoking heavy swapping), no problem at all.
>>
>> However, then I mount /z2 (rw,nfsv3,intr,tcp,rsize=32768,wsize=32768)
>> on Machine-2, and *immediately* the above loop on Machine-1 fails :
>>
>>   Copying file ...cp: BIG: Permission denied
>>
>> No console messages this time, last time I got
>>
>>   kernel: nfs_getpages: error 13
>>   kernel: vm_fault: pager read error, pid 87803 (cmp)
>>
>> on Machine-1.
>>
>> I repeated this scenario by replacing Machine-2 with a good old
>> 6-4-stable one, same outcome.
>>
>> Please tell me what I could do to nail this down a bit more.
>
> It's possible (although not definite) that you have hit a mountd bug
> as documented in PRs
>
> kern/131342
> kern/136865

especially kern/131342 looks similar and quite old; funny I never hit
this before, I have basically run the same tests for 'ages' on each new
box. Could be that a faster network/cpu reveals some race condition; I
notice as well that this server is the first (IIRC) that uses 3
different IRQs for network interrupts (em(4) Intel(R) PRO/1000).

> I've recently asked on -CURRENT about this and had a patch to try from
> Rick. I'm testing it now, but it doesn't seem to fix it for me, just
> improve it, although I'm trying to get enough runs to be a valid sample.
> (see
> http://docs.freebsd.org/cgi/getmsg.cgi?fetch=377627+0+archive/2012/freebsd-current/20120701.freebsd-current
> )
>
> What I did for my production NAS was edit mount.c so it didn't send a
> SIGHUP to mountd, as suggested by Rick, as it was easy to do and
> non-intrusive.

hmm, this means I should patch each fbsd-client, no?
It may be easier to patch mountd to ignore SIGHUP and use some
non-standard signal to force re-init?

Arno

> Vince
>
>>
>> Thanx in advance,
>>
>> Best, Arno

_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
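
The write/copy/compare loop described in the message can be sketched as a small shell function. This is only a minimal sketch under assumptions, not the script Arno actually ran: the function name `stress_loop`, its arguments, the defaults, and the `RANDOM_SOURCE` override are all hypothetical, and the `-x` flag from the original `cmp` call is left out because not every `cmp` accepts it.

```shell
#!/bin/sh
# Hypothetical reconstruction of the stress loop from the message.
# stress_loop DIR [SIZE_MB] [ITERATIONS]
#   DIR        directory on the NFS mount under test
#   SIZE_MB    size of the test file in MiB (assumed default: 1)
#   ITERATIONS number of write/copy/compare rounds (assumed default: 1)
# RANDOM_SOURCE may be set to override /dev/random (an assumption, added
# so the sketch can be tried on systems where /dev/random may block).
stress_loop() {
    dir=$1
    size=${2:-1}
    iterations=${3:-1}
    src=${RANDOM_SOURCE:-/dev/random}
    i=0
    while [ "$i" -lt "$iterations" ]; do
        # Write SIZE_MB blocks of 1 MiB of random data, as in the message.
        dd if="$src" of="$dir/BIG" bs=1048576 count="$size" 2>/dev/null || return 1
        # Copy, preserving attributes; this is the step that failed
        # with "Permission denied" in the report.
        cp -fp "$dir/BIG" "$dir/BIG2" || return 1
        # Byte-for-byte comparison of the original and the copy.
        cmp "$dir/BIG" "$dir/BIG2" || return 1
        i=$((i + 1))
    done
    return 0
}
```

Calling, for example, `stress_loop /raid1 1024 100` from a script would run one hundred rounds with a 1 GiB file; the function returns non-zero as soon as any step fails, which makes it easy to note the moment the other machine mounts the export.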
Article ID (AID): #1FzoVVHr (FB_stable)