Re: SU+J systems do not fsck themselves
On Dec 27, 2011, at 10:14 PM, David Thiel wrote:
> On Tue, Dec 27, 2011 at 02:48:22PM -0800, Xin Li wrote:
>>>> - use journalled fsck; - use normal fsck to check if the
>>>> journalled fsck did the right thing.
>
> Ok, here is the log of fsck with and without journal.
>
> http://redundancy.redundancy.org/fscklog3
>
The first run of fsck, using the journal, gives results that I would
expect. The second run seems to imply that the fixes made on the first
run didn't actually get written to disk. This is definitely an oddity.
I see that you're using geli; maybe there's some strange side-effect
there. No idea. Report it as a bug; this is definitely undesired
behavior.
> That was done the very next boot, after a clean shutdown. The errors
> from the previous live fsck aren't there (oddly), but there are still
> apparently some corrections made. The next fsck still complains, but
> doesn't give any salvage prompts.
>
> Here is jsa@'s, done on a live FS with SU+J:
>
> http://redundancy.redundancy.org/fscklog4
>
For the love of all that is good and holy, don't ever run fsck on a live
filesystem. It's going to report these kinds of problems! It's normal;
filesystem metadata updates stay cached in memory, and fsck bypasses
that cache. Also, what you see in your log is a file that has been
unlinked but held open. This is a common Unix idiom, and one that gets
cleaned up by fsck on reboot, whether through the SUJ intent log
processing or through a traditional fsck.
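To make the unlinked-but-held-open idiom concrete, here is a minimal sketch in Python. It is only an illustration: the function name unlink_while_open is made up for this example, and it uses nothing beyond the standard os and tempfile modules. The directory entry is removed while a descriptor is still open, so the inode stays allocated with no name pointing at it, which is the sort of state a check of a live filesystem would likely flag as unreferenced.

```python
import os
import tempfile


def unlink_while_open(payload: bytes = b"still here"):
    """Create a file, unlink it while it is open, and keep using it.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                      # remove the only directory entry
        name_exists = os.path.exists(path)   # the name is gone from the directory
        link_count = os.fstat(fd).st_nlink   # link count as seen through the open descriptor
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))     # the data is still reachable via the descriptor
    finally:
        os.close(fd)                         # last reference dropped; the kernel can free the inode
    return name_exists, link_count, data


if __name__ == "__main__":
    print(unlink_while_open())
```

If the machine crashes before the close, nothing in memory is left to release that inode, which is why the cleanup falls to fsck (journal replay or a full pass) at the next boot.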
> I'm not actually looking to solve my particular problem per se. The
> issue is that almost everyone I've checked with that's running SU+J gets
> unref'd file and other errors when they check their filesystem (with the
> fs live). Unless I'm missing something, a running FS should never have
> those kinds of errors unless you deliberately disabled fsck.
>
Nope, you are completely incorrect here.
> This leaves only a couple options:
>
> - SU+J and fsck do not work correctly together to fix corruption on
> boot, i.e. bgfsck isn't getting run when it should
The point of SUJ is to eliminate the need for bgfsck. Effectively, they
are mutually exclusive ideas. It's possible that there are still problems
with SUJ and how fsck processes and commits the journal entries. However,
bgfsck has nothing to do with this, and I'd also like to know if your
use of geli is complicating the problem.
> - Stuff is getting completely screwed up after boot
Possibly but unlikely
> - fsck is giving incorrect results
Very unlikely
> - I'm completely clueless about how SU+J is supposed to behave or be
> deployed
No comment =-)
>
> I'm pretty certain that the first is the issue here. It would be great
> if others could check their own SU+J filesystems so we could get a few
> more data points.
>
Indeed, more data is needed.
Scott
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"