Re: Description of the Journaling topology
Matthew Dillon wrote:
> :I think that there is a basic synchronisation issue in such a topology.
> :Due to buffering, delays, etc. it is possible that in some cases
> :the filesystem will commit changes to permanent storage before
> :the appropriate journal entry is created, i.e.:
> :
> :1. App executes unlink("foo").
> :2. Kernel sends appropriate VOP to the filesystem and to the journal.
> :3. Filesystem commits metadata update, journal entry still sits
> :somewhere in the buffer.
> :4. App executes open("foo", O_CREAT).
> :5. Kernel sends appropriate VOP to the filesystem and to the journal.
> :6. Journaling system commits unlink() entry to the storage.
> :7. Filesystem commits metadata update, machine crashes before journal
> :entry for open() is committed.
> :
> :On reboot, the kernel tries to replay the journal; as a result, the
> :already created file foo is lost. The same situation may happen for
> :subsequent writes and other operations: because the journal lags
> :behind storage, it is possible that after a failure some data
> :already written to storage is lost.
> :
> :How are you going to address this issue?
> :
> :-Maxim
>
> Solving this issue requires the filesystem to be aware of the journal's
> existence, which I've mentioned in past posts. The filesystem would
> have to buffer related disk operations until it gets positive
> confirmation that the related journal entries have been committed.
> This is similar to what softupdates does, but the implementation
> would not have to be anywhere near as sophisticated.
>
> Barring that, you might not be able to guarantee that an incremental
> playback from the journal would be sufficient to fully recover the
> filesystem. But even in that case a full restore from backups and full
> playback from the journal would be able to fully recover the
> filesystem up to N seconds prior to the crash. It would just take longer.
> So the basic property of being able to restore within N seconds can
> still be guaranteed even without a journal-aware filesystem.
Yes, this is what I am talking about. So you can forget about fast
recovery of the filesystem into a consistent state after a crash, which
is one of the selling points of today's journaled filesystems.
-Maxim
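
P.S. To make the ordering problem concrete, here is a toy model of steps 1-7 above. It is only a sketch under my own assumptions: the disk is a set of names, the journal is a list of entries that reached stable storage, and replay simply re-applies those entries. The names run_until_crash, fs_write, journal_commit and the "held" queue are made up for illustration and are not part of any real kernel interface.

```python
def apply(disk, op):
    """Apply one (kind, name) operation to a set of on-disk names."""
    kind, name = op
    if kind == "unlink":
        disk.discard(name)
    elif kind == "create":
        disk.add(name)
    else:
        raise ValueError(kind)


def run_until_crash(journal_aware):
    """Run steps 1-7 and return (disk at crash, disk after journal replay)."""
    disk = {"foo"}   # the file exists before step 1
    journal = []     # journal entries that reached stable storage
    held = []        # disk ops held back by a journal-aware filesystem

    def fs_write(op):
        # A journal-unaware filesystem commits immediately; an aware one
        # waits until the matching journal entry is confirmed.
        # (Ops are matched by value, which is enough for this toy case.)
        if journal_aware and op not in journal:
            held.append(op)
        else:
            apply(disk, op)

    def journal_commit(op):
        journal.append(op)
        # Release any held disk ops whose journal entry is now durable.
        for h in [h for h in held if h in journal]:
            held.remove(h)
            apply(disk, h)

    unlink_op, create_op = ("unlink", "foo"), ("create", "foo")
    fs_write(unlink_op)        # steps 1-3: disk commit, journal entry buffered
    journal_commit(unlink_op)  # step 6: unlink entry reaches the journal
    fs_write(create_op)        # steps 4, 5, 7: disk commit of the create
    # Crash here: the journal entry for create_op is never committed.

    at_crash = set(disk)
    for op in journal:         # incremental replay on reboot
        apply(disk, op)
    return at_crash, disk


print(run_until_crash(journal_aware=False))
print(run_until_crash(journal_aware=True))
```

In the unaware run, foo is on disk at the moment of the crash and the replay removes it. In the aware run, the create never reaches the disk ahead of its journal entry, so the replay has nothing to undo. The create is lost either way, which matches the "up to N seconds prior to the crash" property, but only the aware variant keeps the disk from getting ahead of the journal.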