Re: SU+J systems do not fsck themselves

Board: FB_current | Posted: 2011/12/28 16:01 | Thread message 8/16
On Dec 28, 2011, at 12:34 AM, David Thiel wrote:

> On Tue, Dec 27, 2011 at 11:54:20PM -0700, Scott Long wrote:
>> The first run of fsck, using the journal, gives results that I would
>> expect.  The second run seems to imply that the fixes made on the
>> first run didn't actually get written to disk.  This is definitely an
>> oddity.  I see that you're using geli, maybe there's some strange
>> side-effect there.  No idea.  Report as a bug, this is definitely
>> undesired behavior.
>
> Not impossible, but I was seeing similar issues on two non-geli systems
> as well, i.e. tons of errors fixed when doing a single-user
> non-journalled fsck, but journalled fsck not fixing stuff.  I'll try to
> replicate on a test machine, as I already lost data on the last
> (non-geli) machine this happened to.
>
>> For the love that is all good and holy, don't ever run fsck on a live
>> filesystem.  It's going to report these kinds of problems!  It's
>> normal; filesystem metadata updates stay cached in memory, and fsck
>> bypasses that cache.
>
> Ok.  I expected fsck would be softupdate-aware in that way, but I
> understand it not doing so.
>
>>> - SU+J and fsck do not work correctly together to fix corruption on
>>> boot, i.e. bgfsck isn't getting run when it should
>>
>> The point of SUJ is to eliminate the need for bgfsck.  Effectively,
>> they are exclusive ideas.
>
> This is surprising to me.  It is my impression that under Linux at least,
> ext3fs is checked against the journal, and gets a full e2fsck if it
> finds it's still dirty.  Additionally, there's a periodic fsck after 180
> days continuous runtime or x number of mounts (see tune2fs -i and -c).
> Is SU+J somehow implemented in such a way that this is unnecessary?  What
> does it do that the ext3fs people have missed?
>

SUJ isn't like ext3 journaling, it doesn't do 100% metadata logging.
Instead, it's an extension of softupdates.  Softupdates (SU) is still
responsible for ordering dependent writes to the disk to maintain
consistency.  What SU can't handle is the Unix/POSIX idiom of unlinking
a file from the namespace but keeping its inode active through
refcounts.  When you have an unclean shutdown, you wind up with stale
blocks allocated to orphaned inodes.  The point of bgfsck was to scan
the filesystem for these allocations and free them, just like fsck does,
but to do it in the background so that the boot could continue.  SUJ is
basically just an intent log for this case; it tells fsck where to find
these allocations so that fsck doesn't have to do the lengthy scan.
FWIW, this problem is present in most any journaling implementation and
is usually solved via the use of intent records in a journal, not unlike
SUJ.

So, there's an assumption with SUJ+fsck that SU is keeping the
filesystem consistent.  Maybe that's a bad assumption, and I'm not
trying to discredit your report.  But the intention with SUJ is to
eliminate the need for anything more than a cursory check of the
superblocks and a processing of the SUJ intent log.  If either of these
fails then fsck reverts to a traditional scan.  In the same vein, ext3
and most other traditional journaling filesystems assume that the
journal is correct and is preserving consistency, and don't do anything
more than a cursory data structure scan and journal replay as well, but
then revert to a full scan if that fails (zfs seems to be an exception
here, with there being no actual fsck available for it).

As for the 180 day forced scan on ext3, I have no public comment.  SU
has matured nicely over the last 10+ years, and I'm happy with the
progress that SUJ has made in the last 2-3 years.  If there are bugs,
they need to be exposed and addressed ASAP.
Scott
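
[Editor's illustration, not part of the original message.] The
unlink-while-open idiom described in the reply can be reproduced from
userland. The sketch below is a minimal Python example, assuming a local
POSIX filesystem for the temporary directory; the function name
unlink_while_open is made up for illustration. It shows the state a crash
would leave behind: an inode with no name in the namespace whose blocks
are still allocated.

```python
import os
import tempfile


def unlink_while_open():
    """Unlink a file while it is still open and report what remains visible.

    Returns (path_still_exists, link_count, data_read_back).
    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"still here")
        # Remove the file's only name. The inode stays alive because the
        # open descriptor still holds a reference to it.
        os.unlink(path)
        # fstat works on the descriptor, so it does not need the name.
        # On a local POSIX filesystem the link count is expected to be 0.
        st = os.fstat(fd)
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 64)
        return os.path.exists(path), st.st_nlink, data
    finally:
        # Closing drops the last reference; only now can the filesystem
        # free the inode and its blocks. A crash before this point leaves
        # the allocation orphaned, which is the case SUJ records.
        os.close(fd)


if __name__ == "__main__":
    print(unlink_while_open())
```

If the machine loses power between os.unlink() and os.close(), nothing in
the directory tree points at those blocks any more. That is the
allocation a full fsck scan or bgfsck has to hunt for, and that the SUJ
intent log is meant to point fsck at directly.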