Re: kernel work week of 3-Feb-2010 HEADS UP

Board: DFBSD_kernel — posted 2010/02/06 02:01 (16 years ago)
0 comments, 0 participants; thread 23/30 (same subject)
On Thu, Feb 4, 2010 at 7:18 PM, Matthew Dillon <dillon@apollo.backplane.com> wrote:
>
> :Is the concern that people would be more inclined to remove an SSD than a
> :regular drive by mistake, or that splitting off the log could lead to an
> :"oops, I forgot that the log was separate" situation when changing out
> :drives?  Or something else?
> :
> :It seems like an odd thing to worry about, to be honest.  If you can't
> :trust users not to start removing important components from their
> :systems...
> :
> :MAgnus
>
>     Well, true enough.  I guess the real issue I have is that one
>     is dedicating a piece of equipment to a really tiny piece of the
>     filesystem.  Though I can't deny the utility of having a fast fsync().
>     If the storage system is big enough then, sure.  If you're talking
>     about going from one physical drive to two, though, it probably isn't
>     worth the added complexity just to get a fast fsync().

This would be a setup similar to the ZFS L2ARC (cache) and SLOG (separate log device).

The cache device is one or more read-optimised (i.e. MLC) SSDs. Any data that would be evicted from the in-memory ARC is instead written to the cache device, and any future reads of that data are served from the cache device instead of from disk. These should be as big and as fast (for reads) as possible; the L2ARC is basically treated as extra "RAM".

The separate log device is a mirrored pair (redundancy is critical for this part) of write-optimised (i.e. SLC) SSDs. Any block writes smaller than 64K go directly into the ZIL and are acknowledged as "written to disk" while also being queued for writing to the pool. If the server crashes, the ZIL is read and any transaction groups that are missing from the pool are replayed from it. If the server never crashes, the data in the ZIL is never actually read back. In most cases, the ZIL only needs to be a few GB in size.
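For reference, the ZFS side of such a setup is configured roughly as follows. This is a sketch, not from the thread: the pool name `tank` and the device names `ada2`, `ada3`, `ada4`, and the vdev name `mirror-1` are placeholders for whatever your system actually uses.

```sh
# Attach a read-optimised (MLC) SSD as an L2ARC (cache) device.
# Cache devices need no redundancy; losing one only loses cached copies,
# never pool data.
zpool add tank cache ada2

# Attach a mirrored pair of write-optimised (SLC) SSDs as the SLOG.
# Redundancy matters here: on older ZFS versions, losing an unmirrored
# log device made the pool unimportable.
zpool add tank log mirror ada3 ada4

# On ZFS versions that support log-device removal, detach the SLOG again;
# the ZIL then moves back into the main pool.
zpool remove tank mirror-1
```

These are administrative commands against a live pool, so they are shown here only as a configuration sketch.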
Until very recent versions of ZFS, removing log devices from a pool was impossible, so if the log device died, the pool was unusable and all data was lost, which is why using mirrored sets was important. One can now remove log devices, which moves the ZIL back into the pool.

This would be similar to the swap cache on MLC SSD, and the UNDO log/FIFO on SLC SSD.

--
Freddie Cash
fjwcash@gmail.com
Article ID (AID): #1BR5pf36 (DFBSD_kernel)