Re: expanding past 1 TB on amd64

Board FB_current · 2013/07/17 05:32
On Tue, Jul 16, 2013 at 7:08 AM, Kurt Lidl <lidl@pix.net> wrote:

>> On Wed, Jun 19, 2013 at 1:32 AM, Chris Torek <chris.torek at gmail.com>
>> wrote:
>>
>>> In src/sys/amd64/include/vmparam.h is this handy map:
>>>
>>> * 0x0000000000000000 - 0x00007fffffffffff user map
>>> * 0x0000800000000000 - 0xffff7fffffffffff does not exist (hole)
>>> * 0xffff800000000000 - 0xffff804020100fff recursive page table (512GB slot)
>>> * 0xffff804020101000 - 0xfffffdffffffffff unused
>>> * 0xfffffe0000000000 - 0xfffffeffffffffff 1TB direct map
>>> * 0xffffff0000000000 - 0xffffff7fffffffff unused
>>> * 0xffffff8000000000 - 0xffffffffffffffff 512GB kernel map
>>>
>>> showing that the system can deal with at most 1 TB of address space
>>> (because of the direct map), using at most half of that for kernel
>>> memory (less, really, due to the inevitable VM fragmentation).
>>>
>>> New boards are coming soonish that will have the ability to go past
>>> that (24 DIMMs of 64 GB each = 1.5 TB).  Or, if some crazy people :-)
>>> might want to use most of a 768 GB board (24 DIMMs of 32 GB each,
>>> possible today although the price is kind of staggering) as
>>> wired-down kernel memory, the 512 GB VM area is already a problem.
>>>
>>> I have not wrapped my head around the amd64 pmap code but figured
>>> I'd ask: what might need to change to support larger spaces?
>>> Obviously NKPML4E in amd64/include/pmap.h, for the kernel start
>>> address; and NDMPML4E for the direct map.  It looks like this would
>>> adjust KERNBASE and the direct map appropriately.  But would that
>>> suffice, or have I missed something?
>>>
>>> For that matter, if these are changed to make space for future
>>> expansion, what would be a good expansion size?  Perhaps multiply
>>> the sizes by 16?  (If memory doubles roughly every 18 months, that
>>> should give room for at least 5 years.)
>>
>> Chris, Neel,
>>
>> The actual data that I've seen shows that DIMMs are doubling in size
>> at about half that pace, about every three years.  For example, see
>> http://users.ece.cmu.edu/~omutlu/pub/mutlu_memory-scaling_imw13_invited-talk.pdf
>> slide #8.  So, I think that a factor of 16 is a lot more than we'll
>> need in the next five years.  I would suggest configuring the kernel
>> virtual address space for 4 TB.  Once you go beyond 512 GB, 4 TB is
>> the next "plateau" in terms of address translation cost.  At 4 TB all
>> of the PML4 entries for the kernel virtual address space will reside
>> in the same L2 cache line, so a page table walk on a TLB miss for an
>> instruction fetch will effectively prefetch the PML4 entry for the
>> kernel heap and vice versa.
>
> The largest commodity motherboards that are shipping today support
> 24 DIMMs, at a max size of 32GB per DIMM.  That's 768GB, right now.
> (So FreeBSD is already "out of bits" in terms of supporting current
> shipping hardware.)

Actually, this scenario with 768 GB of RAM on amd64 as it is today is
analogous to the typical 32-bit i386 machine, where the amount of RAM
has long exceeded the default 1 GB size of the kernel virtual address
space.  In theory, we could currently handle up to 1 TB of RAM, but
the kernel virtual address space would only be 512 GB.

> ....  The Haswell line of CPUs is widely reported to support DIMMs
> twice as large, and it's due in September.  That would make the
> systems of late 2013 hold up to 1536GB of memory.
> Using your figure of doubling in 3 years, we'll see 3072GB systems by
> ~2016.  And in ~2019, we'll see 6TB systems, and need to finally
> expand to using more than a single cache line to hold all the PML4
> entries.

Yes, this is a reasonable prognostication.

Alan

> Of course, that's speculating furiously about two generations out,
> and assumes keeping the current memory architecture / board design
> constraints.
>
> -Kurt
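
For reference, the slot arithmetic behind the quoted map can be checked
with a short standalone program.  This is an illustrative sketch, not
the real sys/amd64/include/pmap.h: only the NKPML4E and NDMPML4E names
come from the thread, while the kvaddr() helper and the round-down step
for the direct map index are assumptions chosen to reproduce the
addresses quoted above.

/*
 * Each PML4 slot covers 2^39 bytes = 512 GB.  The kernel map takes
 * NKPML4E slots at the very top of the address space, and the direct
 * map takes NDMPML4E slots below it, with its starting slot aligned
 * down to a multiple of NDMPML4E -- which is what leaves the unused
 * 512 GB region at 0xffffff0000000000 in the map above.
 */
#include <stdio.h>
#include <stdint.h>

#define	PML4SHIFT	39	/* log2 of bytes mapped by one PML4 slot */
#define	NPML4EPG	512	/* PML4 entries per page table page */
#define	NKPML4E		1	/* kernel map slots: 1 * 512 GB today */
#define	NDMPML4E	2	/* direct map slots: 2 * 512 GB = 1 TB today */

/* Build a canonical (sign-extended) kernel VA from a PML4 index. */
static uint64_t
kvaddr(uint64_t pml4_index)
{
	return ((pml4_index << PML4SHIFT) | 0xffff000000000000ULL);
}

int
main(void)
{
	uint64_t kpml4i = NPML4EPG - 1;		/* top slot */
	uint64_t dmpml4i = (kpml4i - NDMPML4E) / NDMPML4E * NDMPML4E;

	/* Prints 0xffffff8000000000 and 0xfffffe0000000000, matching
	 * the kernel map and direct map lines of the quoted layout. */
	printf("kernel map: %#jx (%d GB)\n",
	    (uintmax_t)kvaddr(kpml4i), NKPML4E * 512);
	printf("direct map: %#jx (%d GB)\n",
	    (uintmax_t)kvaddr(dmpml4i), NDMPML4E * 512);
	return (0);
}

Growing the regions in this scheme means bumping NKPML4E and NDMPML4E,
so the kernel map and direct map expand in 512 GB steps; that is why
the whole thread talks in multiples of PML4 slots.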
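
Alan's 4 TB "plateau" falls directly out of cache geometry: a PML4
entry is an 8-byte PTE, so a 64-byte cache line holds 64 / 8 = 8 of
them, and 8 slots * 512 GB = 4 TB.  A sketch of that arithmetic (the
macro names here are invented for illustration):

#include <stdio.h>
#include <stdint.h>

#define	PML4E_BYTES	8	/* one PML4 entry is a single 64-bit PTE */
#define	LINE_BYTES	64	/* typical x86 cache line size */
#define	GB_PER_SLOT	512	/* each PML4 slot maps 2^39 bytes */

int
main(void)
{
	uint64_t slots_per_line = LINE_BYTES / PML4E_BYTES;	/* 8 */
	uint64_t plateau_gb = slots_per_line * GB_PER_SLOT;	/* 4096 */

	/* Up to 8 kernel PML4 slots share one cache line, so a TLB-miss
	 * walk for one kernel region prefetches the entries for the
	 * others; a 9th slot spills into a second line. */
	printf("plateau: %ju GB = %ju TB\n",
	    (uintmax_t)plateau_gb, (uintmax_t)(plateau_gb / 1024));
	return (0);
}

This also explains Kurt's ~2019 remark above: a 6 TB machine needs 12
PML4 slots to direct-map its RAM, the first point in these projections
where the entries no longer fit in a single cache line.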