Re: approach on getting nullfs to work again

Board: DFBSD_kernel · Posted: 2005/02/10 07:01
:[original private post to matt, but something in my mail system must eat
:mails, so i'm hoping that this one will get through]

Nope, I'm just overloaded.

:What happens if a file or directory in the underlying filesystem is being
:renamed or deleted? Doesn't that mean that I need to adjust the namecache
:for the nullfs layer, too?
:...
:
:We thought of a solution: overlay filesystems must lock their covered (i'll
:call it "shadow") parallel namecache entries, too, if they are being locked.
:Whereas this is not complicated to implement in cache_lock(), there is
:another problem: the namecache doesn't know about overlay filesystems. It
:doesn't know that there exist shadow namecache entries. So there must be
:some way of communication between namecache and vfs, maybe some
:vop_cache_create()?
:
:now this got a rather long mail, thanks for your attention
:hoping for input,
: simon

Ok. We have two problems. The second is solved as you say... the overlay
filesystem itself is aware of the underlying filesystem and must lock the
underlying namecache record. That is fairly straightforward.

The rename-in-underlying-filesystem problem is a cache-coherency issue,
solved by our (not yet existent) cache coherency layer! :-)

So the question becomes: can we construct a minimal cache coherency layer
that can be used to help build nullfs and unionfs but that will not have to
be ripped out when we do the 'real' layer? I think the answer is: yes, we
can. We can create a minimal cache coherency layer based on the vnode's
v_namecache list.

Then it becomes a question of how complex a layer we should try to create.
Taking for example a rename() in the underlying filesystem... do we want to
try to propagate the rename to the overlay, or do we simply want to
invalidate the overlay? I think to begin with we just want to invalidate
the overlay.
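The "lock the shadow entry too" idea from the quoted mail can be sketched roughly as follows. This is a hedged, user-space illustration only: the structure and function names (`ncp`, `sketch_cache_lock`) are hypothetical stand-ins, and the real cache_lock() must of course handle contention and deadlock, which this sketch does not.

```c
#include <assert.h>
#include <stddef.h>

struct ncp {
    int         locked;     /* 1 while an operation holds this entry   */
    struct ncp *shadow;     /* underlying entry covered by the overlay */
};

/* Lock an overlay entry and, transitively, the entry it shadows, so a
 * rename/delete in the underlying filesystem cannot race the overlay.
 * No deadlock avoidance here; the real locking is more involved. */
static void sketch_cache_lock(struct ncp *ncp)
{
    while (ncp != NULL) {
        ncp->locked = 1;
        ncp = ncp->shadow;
    }
}

static void sketch_cache_unlock(struct ncp *ncp)
{
    while (ncp != NULL) {
        ncp->locked = 0;
        ncp = ncp->shadow;
    }
}
```

The point of the sketch is only the shape of the problem the mail raises: the overlay side can walk its own shadow pointer, but the plain namecache code has no such pointer, which is why some communication path between the namecache and the VFS layers is needed.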
When I designed the new namecache topology I considered the possibility of
having to deal with multiple overlayed filesystems and made the vnode's
v_namecache a list of namecache records instead of a pointer to a single
record. The idea being that instead of having nullfs fake-up vnodes (like it
does in FreeBSD) we instead have it return the *actual* vnode and only
fake-up the namecache topology. The system has no problem with multiple
namecache records referencing the same vnode. This greatly reduces the
burden on nullfs to translate VOP calls... it only has to deal with
namecache related translations, it does NOT have to deal with things like
VOP_READ().

The notion of the 'current' directory is now a namecache record in
DragonFly, so we can get away with this without confusing someone CD'd into
a nullfs filesystem. (In FreeBSD the 'current directory' is a vnode and
hence nullfs and unionfs had to fake-up the vnode. In DragonFly it is a
namecache pointer and we do NOT have to fake-up the vnode).

Ok, so once that is dealt with we need to make sure that the cache
invalidation mechanism, our skeleton cache coherency layer, does not
deadlock when it takes a locked namecache record and has to invalidate a
namecache topology elsewhere. This case only occurs when a filesystem
operation on the UNDERLYING filesystem occurs, because the underlying
filesystem is not aware of the overlay. In the case of the nullfs overlay
the nullfs code is aware of the underlying filesystem and will make the
appropriate namecache calls to the underlying filesystem's namecache
topology. For an operation being done directly on the underlying filesystem
the underlying filesystem is not aware of the overlay, but the namecache
code IS aware of the overlay because it sees multiple namecache records
attached to the vnode.
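The topology and the minimal invalidation pass described above can be sketched as below. Again a hedged illustration: the structures and the `sketch_*` functions are simplified stand-ins, not the actual DragonFly structures or cache_inval*() calls, and the list here is a plain singly-linked list rather than the kernel's real list machinery.

```c
#include <assert.h>
#include <stddef.h>

struct vnode;

struct namecache {
    const char       *nc_name;
    int               nc_valid;        /* cleared by invalidation   */
    struct namecache *nc_vnode_next;   /* next record on same vnode */
    struct vnode     *nc_vp;
};

struct vnode {
    struct namecache *v_namecache;     /* list head, one record per layer */
};

/* Attach a namecache record to a vnode.  The vnode does not care how
 * many filesystem layers reference it, so nullfs can add its own
 * record pointing at the *actual* underlying vnode. */
static void sketch_cache_setvp(struct namecache *ncp, struct vnode *vp)
{
    ncp->nc_vp = vp;
    ncp->nc_valid = 1;
    ncp->nc_vnode_next = vp->v_namecache;
    vp->v_namecache = ncp;
}

/* Minimal coherency action: when an operation arrives via one record
 * (typically the underlying filesystem's), invalidate every other
 * record on the vnode -- a stand-in for the cache_inval*() calls. */
static void sketch_cache_inval_others(struct vnode *vp, struct namecache *skip)
{
    struct namecache *ncp;

    for (ncp = vp->v_namecache; ncp != NULL; ncp = ncp->nc_vnode_next) {
        if (ncp != skip)
            ncp->nc_valid = 0;
    }
}
```

The design choice this illustrates is that invalidation only needs the v_namecache list itself; the underlying filesystem never has to know an overlay exists.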
So the namecache code must scan the list of namecache structures associated
with the vnode and issue the appropriate cache_inval*() calls on the
namecache records other than the one it was called with.

I think this is all very doable and, even better, does not represent any
major surgery for systems not using nullfs (which is all of them right now),
so we can keep things stable during the work.

I know there are several people interested in making nullfs work again,
especially Simon. Who has time to actually code? I would be able to help out
but I'd prefer not to do the core coding.

Questions? Interest? Simon, you want to code this up?

					-Matt
					Matthew Dillon
					<dillon@backplane.com>
Article ID (AID): #122fOp00 (DFBSD_kernel)