Journaling layer update - any really good programmer want to sta

Board: DFBSD_kernel · Posted 2005/03/04 14:32
I'm making good progress with the journaling code. The journaling layer is now writing out most of the required information for most VOPs. It currently:

* Has a working journal creation/deletion/listing mechanism via mountctl (most options not yet implemented). Still very primitive, but good enough for testing purposes.

* Writes out the cred, vattr, security audit info (uid, gid, p_pid, p_comm), timestamps, file write data (but not putpages data yet), creations, deletions, links, softlinks, renames, etc.

Still on the TODO list:

* Writing out the stuff I've forgotten.

* Writing out the UNDO information. UNDO information is what makes a journal reversible, which is one of the big-ticket items I want to support, but it requires writing out the prior contents of data blocks, the prior uid, gid, and time info, and so forth.

* Identification of vnodes in vnode operations, so the journal target knows which file VOP_WRITEs are associated with without having to run the log backwards too much. I'll probably write out the file handle, and also write out the file path if it hasn't been written out in the last N seconds.

* Direct SWAP backing support for the monitor.

* Adding a crc (I have a field for it but I'm not generating it yet).

* Two-way stream transaction id acknowledgement and a journal write failure / restart protocol (this is what is going to make the journal reliable over a network link).

* A utility program which scans the (binary) journal and can generate shell commands to regenerate a filesystem, and/or decode the journal and display the operations in a human-readable format.

NEED HELP! I am doing all the kernel work, but I am looking for someone to help engineer and write the user utility that actually does something real with the generated journal. Anyone interested in doing this? We want a utility that is capable of:

* Extracting a file subhierarchy and generating a mirror of the filesystem.

* Extracting a file subhierarchy and generating human-readable output showing the audit trail of all changes made within the subhierarchy.

* Extracting a file subhierarchy and generating a new raw journal containing only that subhierarchy.

* Extracting deleted files by name ('undelete' realized!).

* Extracting a file subhierarchy and generating a mirror that is 'as of' a particular date in the past.

--

Technical Journal Record Format Details

The journal record format is in sys/journal.h. It is quite straightforward, but it IS a multi-layer recursive record format.

The first layer is a virtual stream layer, needed because multiple entities may be writing out transactions to the journal simultaneously. Virtual streams are typically short-lived entities that represent transactions: one transaction per virtual stream. The virtual stream layer is controlled by the journal_rawrecbeg and journal_rawrecend structures, designed so a utility program can scan the journal forwards or backwards.

The second layer is a recursive record layer controlled by the journal_subrecord structure. Each transaction may contain a hierarchy of subrecords representing all the information required to understand and/or UNDO the transaction. So, for example, a file creation will have a JTYPE_CREATE subrecord which contains a number of other subrecords (JLEAF_PATH1, JLEAF_MODES, JLEAF_UID, JLEAF_GID, etc.), and even other non-leaf nodes (JTYPE_UNDO). All records (with one exception) contain the actual record size, and all records are physically 16-byte aligned.
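For reference, here is a rough C sketch of the three structures described above, as this post characterizes them. Field names and widths beyond what the text mentions are assumptions (marked as such); the authoritative definitions live in sys/journal.h.

    #include <stdint.h>

    #define JREC_ALIGN      16      /* records are physically 16-byte aligned */

    /* Virtual stream layer: begins one stream block. */
    struct journal_rawrecbeg {
        uint16_t begmagic;      /* magic, for recovery scans (assumed) */
        uint16_t streamid;      /* which virtual stream/transaction (assumed) */
        int32_t  recsize;       /* size of this stream block, 16-byte aligned */
        int64_t  transid;       /* transaction id, for two-way acks (assumed) */
        /* stream data follows */
    };

    /* Virtual stream layer: ends one stream block. */
    struct journal_rawrecend {
        uint16_t endmagic;      /* magic, for recovery scans (assumed) */
        uint16_t check;         /* the crc field (not yet generated) */
        int32_t  recsize;       /* duplicated size, so the journal can be
                                 * scanned backwards as well as forwards */
    };

    /* Recursive record layer: one subrecord header. */
    struct journal_subrecord {
        int16_t  rectype;       /* JTYPE_* (non-leaf) or JLEAF_* (leaf) */
        int16_t  reserved;      /* assumed padding / future use */
        int32_t  recsize;       /* mandatory for leaf records; may be 0 for
                                 * a non-leaf record flushed before its
                                 * total size was known */
        /* payload or nested subrecords follow */
    };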
There are three gotchas.

First, the high-level virtual stream may break up the subrecords. The stream transaction block must be reconstructed before the subrecords can be scanned.

Second, a NON-LEAF journal subrecord may have a record size of 0, which means the utility program has to recurse through it to figure out how big it actually is. (The recsize field is mandatory for LEAF subrecords, so you don't have to worry about those.) This occurs because some transactions exceed the size of the memory fifo and must be flushed out before the journaling code knows how large the subrecord is!

The last gotcha is that the high-level stream representing a transaction may be aborted at the stream level, right smack in the middle of the subrecord transaction. Scanning code must understand that the stream block may be 'truncated' relative to the record sizes indicated by the subrecords. This occurs when the journal is in the middle of a transaction and then determines that the operation has, in fact, failed (e.g. due to the VFS op failing).

Yes, it's sophisticated, but the journal must be capable of doing sophisticated things, and it had other requirements as well: multiple processes building transactions at the same time, transactions that are potentially *huge* (if you do a 1GB write(), that's a gigabyte-sized transaction!), having to store UNDO data, and keeping the format extensible, and so forth. I decided not to go for an ultra-compact format because I believe that can be done even better using, e.g., a gzip layer.
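To make the gotchas concrete, here is a hedged sketch of how a scanner might walk one stream transaction block, building on the structure sketch above. It assumes gotcha #1 has already been handled, i.e. the caller has reassembled the complete stream block into buf/len. The post doesn't say how the format terminates a zero-sized non-leaf record, so this sketch simply lets the recursion run to the end of the block; all names here are illustrative, not the real utility.

    #include <stdio.h>
    #include <string.h>

    #define JALIGN(n)   (((n) + JREC_ALIGN - 1) & ~(size_t)(JREC_ALIGN - 1))

    /*
     * Walk the subrecords in buf[0..len), printing one line per record.
     * Returns the number of bytes consumed, which may be less than the
     * subrecords claim if the stream block was truncated (gotcha #3).
     */
    static size_t
    scan_subrecords(const char *buf, size_t len, int depth)
    {
        size_t off = 0;

        while (off + sizeof(struct journal_subrecord) <= len) {
            struct journal_subrecord sub;

            memcpy(&sub, buf + off, sizeof(sub));
            printf("%*srectype=%#x recsize=%d\n", depth * 2, "",
                   (unsigned)sub.rectype, (int)sub.recsize);

            if (sub.recsize == 0) {
                /*
                 * Gotcha #2: a non-leaf record flushed before its size
                 * was known.  Recurse through it to find out how big it
                 * actually is.
                 */
                off += sizeof(sub) +
                       scan_subrecords(buf + off + sizeof(sub),
                                       len - off - sizeof(sub), depth + 1);
            } else if (sub.recsize >= (int32_t)sizeof(sub) &&
                       off + (size_t)sub.recsize <= len) {
                /*
                 * Normal case.  Assumes recsize counts from the start of
                 * the subrecord header; advance by the aligned size.
                 */
                off += JALIGN((size_t)sub.recsize);
            } else {
                /*
                 * Gotcha #3: the sizes claim more data than the stream
                 * block contains, so the transaction was presumably
                 * aborted at the stream level.  Stop parsing.
                 */
                printf("%*s(truncated record: transaction aborted?)\n",
                       depth * 2, "");
                return (len);
            }
        }
        return (off);
    }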
-Matt

Matthew Dillon <dillon@backplane.com>