Description of the Journaling topology

Board: DFBSD_kernel · Posted 2004/12/28 17:01
I'm making good progress on the journaling layer. It will still be a week or two before it is operational, but I've got the protocol pretty much figured out and will be laying it down in the tree as I get it working. The work should have virtually no impact on the system, since the code paths are only exercised when a journal is attached to a mount point, so I will probably be making smaller commits to the tree than I did for the VFS work.

When all is said and done, the journaling mechanism is going to look like this:

               -> [ MEMORY FIFO ] -> [ worker thread  ] -> STREAM ---> target
              /     e.g. 16MB        [secondary spool]                 (e.g. an
    VFSOP -> [journal shim]           e.g. 16GB                        off-site
              \  (transaction id)                                      machine)
               -> [filesystem VFS op]

                  STREAM <----------+ [transid acks going back]

    STREAM = generic file descriptor, e.g. regular file, socket, fifo,
    pipe, whatever.  Half or full duplex.

The STREAM will optionally be two-way, allowing the journaling target to tell the journaling layer when a transaction id has been committed to hard storage. This will also allow the journaling layer to retain portions of the journaling stream in the MEMORY FIFO and SECONDARY SPOOL in case the stream connection breaks and needs to be re-created (as would happen quite often for an off-site journaling stream), without losing any data, and to handle data backups that occur if the STREAM is a slow off-site link or if a glitch occurs.

The MEMORY FIFO will allow us to batch operations, reducing context switches and allowing me to implement the worker thread concept as a very efficient asynchronous design. Since memory is limited, the worker thread will also implement a secondary spooling store for the case where the journaling stream descriptor is lost for a long period of time, or if it is simply a slow link (e.g. a real-time off-site backup).
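The FIFO-with-spool-overflow idea above can be sketched in a few lines of C. This is purely an illustration of the overflow discipline (writes never block the filesystem; data that doesn't fit the memory FIFO goes to the spool, and the worker thread backfills the FIFO from the spool as it drains) — all names, sizes, and the flat-buffer layout are hypothetical and not DragonFly's actual implementation:

```c
#include <assert.h>
#include <string.h>

#define FIFO_SIZE  16   /* stands in for the e.g. 16MB memory FIFO */
#define SPOOL_SIZE 64   /* stands in for the e.g. 16GB secondary spool */

struct jfifo {
    char fifo[FIFO_SIZE];    /* in-memory batch buffer */
    int  fifo_len;
    char spool[SPOOL_SIZE];  /* overflow store for when the stream stalls */
    int  spool_len;
};

/*
 * Called from the journal shim on every VFS op.  Never blocks the
 * filesystem: whatever does not fit in the FIFO spills to the spool.
 */
static int
jfifo_write(struct jfifo *jf, const char *buf, int len)
{
    int n = FIFO_SIZE - jf->fifo_len;
    if (n > len)
        n = len;
    memcpy(jf->fifo + jf->fifo_len, buf, n);
    jf->fifo_len += n;
    buf += n;
    len -= n;
    if (len > 0) {                          /* FIFO full: spill over */
        if (len > SPOOL_SIZE - jf->spool_len)
            return -1;                      /* spool exhausted too */
        memcpy(jf->spool + jf->spool_len, buf, len);
        jf->spool_len += len;
    }
    return 0;
}

/*
 * Called from the worker thread: drain one batch from the FIFO toward
 * the stream, then backfill the FIFO from the spool so spooled data
 * eventually catches up.
 */
static int
jfifo_drain(struct jfifo *jf, char *out, int outlen)
{
    int n = jf->fifo_len < outlen ? jf->fifo_len : outlen;
    memcpy(out, jf->fifo, n);
    memmove(jf->fifo, jf->fifo + n, jf->fifo_len - n);
    jf->fifo_len -= n;

    int m = FIFO_SIZE - jf->fifo_len;       /* backfill from the spool */
    if (m > jf->spool_len)
        m = jf->spool_len;
    memcpy(jf->fifo + jf->fifo_len, jf->spool, m);
    memmove(jf->spool, jf->spool + m, jf->spool_len - m);
    jf->fifo_len += m;
    jf->spool_len -= m;
    return n;
}
```

The point of the two-stage buffer is visible even at this toy scale: a write larger than the FIFO succeeds immediately instead of stalling the caller, and the drain path preserves ordering while emptying the spool.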
In these cases we need a secondary spool to absorb the journaled data and allow the filesystem to continue to operate, instead of just locking it up until the stream is re-created or catches up. The idea here is to allow potentially huge secondary spools to be created that can literally absorb many hours' worth of filesystem activity, giving a system manager plenty of time to fix things if they break, and ensuring that slow off-site links do not throttle normal system activity down to the speed of the link. I consider that extremely important; it makes the whole concept of a real-time off-site backup feasible.

That's the concept in a nutshell. In addition to all of that, the data being journaled will have a number of options... e.g. the journaling data stream could be a simple non-reversible stream (a 'replay' stream), or a fully reversible stream (the ability to 'move' the regenerated filesystem forwards or backwards in time simply by playing the journaling stream forwards or backwards), etc etc. It is going to be a *VERY* powerful mechanism that no other BSD (or even Linux) will have.

Eventually (not in two weeks) the journaling layer will make these acked transaction ids available to any journal-aware VFS filesystem, allowing the filesystem to leverage the kernel's journaling layer for its own use and/or to control the underlying filesystem's own management of commits to physical storage.

I also intend to use the journaling layer, with suitable additional cache coherency protocols, to handle filesystem synchronization in a clustered environment: in particular, an ability to do high-level cache-coherent replication that would be immune to catastrophic corruption, rather than block-device-level replication, which tends to propagate corrupting events.

As you can see, I have *BIG* plans for the journaling layer over the next few years.

-Matt
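The replay-versus-reversible distinction comes down to what each journal record carries. A minimal sketch, assuming a hypothetical record layout (not DragonFly's actual on-wire format): a replay-only record needs just the new data (redo), while a reversible record also captures the old contents (undo), which is exactly what lets the stream be played in either direction:

```c
#include <assert.h>
#include <string.h>

/* Illustrative journal record for a small overwrite within a file. */
struct jrec {
    int  off;        /* byte offset of the change */
    int  len;
    char newdata[8]; /* redo data: sufficient for a 'replay' stream */
    char olddata[8]; /* undo data: what makes the stream reversible */
};

/* Play the stream forward: apply the new contents... */
static void
jrec_redo(char *file, const struct jrec *r)
{
    memcpy(file + r->off, r->newdata, r->len);
}

/* ...or backward: restore the prior contents, moving the regenerated
 * filesystem back in time. */
static void
jrec_undo(char *file, const struct jrec *r)
{
    memcpy(file + r->off, r->olddata, r->len);
}
```

In real use the journal shim would capture `olddata` from the filesystem before applying the write; a non-reversible stream simply omits that field, trading the ability to rewind for a smaller stream.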
Article code (AID): #11qI3L00 (DFBSD_kernel)