Re: ECC memory driver in FreeBSD 10?

Board: FB_current, Posted: 2012/05/02 10:01
Comments: 0, Participants: 0, Thread: 6/6
On Apr 9, 2012, at 6:04 AM, O. Hartmann wrote:

> On 04/08/12 14:53, Miroslav Lachman wrote:
>> Nikolay Denev wrote:
>>> On Apr 6, 2012, at 2:48 PM, O. Hartmann wrote:
>>>
>>>> I'm looking for a way to force FreeBSD 10 to maintain/watch ECC errors
>>>> reported by UEFI (or BIOS).
>>>> Since ECC is said to be essential for server systems both in business
>>>> and science and I do not question this, I was wondering if I can not
>>>> report ECC errors via a watchdog or UEFI (ACPI?) report to syslog
>>>> facility on FreeBSD.
>>>> FreeBSD is supposed to be a server operating system, as far as I know,
>>>> so I believe there must be something which hasn't revealed itself
>>>> to me, yet.
>>
>>>
>>> If the hardware supports it, such errors should be logged as MCEs
>>> (Machine Check Exceptions).
>>> I can say for sure it works pretty well with Dell servers, as I had
>>> one with failing RAM module, and
>>> it reported the corrected ECC errors in dmesg.
>>
>> Memory ECC errors are logged to messages and you can decode them with
>> sysutils/mcelog. I did it in the past on one of our Sun Fire X2100 M2
>> with FreeBSD 8.x.
>>
>> Miroslav Lachman
>
> Seems that I have been blessed with non-faulty memory over the past
> three or four years. Last time I saw errors was around 2000. All of our
> 24/7 servers do have ECC RAM.
>
> So, your replies all imply that if I log the system's messages via syslog
> properly (as we do remotely on a centralized server), then ECC errors
> should be reported by FreeBSD/kernel in a canonical way as the UEFI/BIOS
> reports them?
> Without special drivers/tools, scripts which scan for those errors
> should report occurrences?
>
> Since my (FreeBSD) boxes didn't show up errors of that kind - Linux
> boxes of a colleague did once! - doesn't imply missing capabilities.
> This is nice to hear/read.
>
> Thanks a lot,
>
> Oliver
>

This is what you see in syslog when sys/x86/x86/mca.c detects a memory error:

> Mar 16 12:37:33 hostname kernel: MCA: Bank 8, Status 0x8c0000400001009f
> Mar 16 12:37:33 hostname kernel: MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
> Mar 16 12:37:33 hostname kernel: MCA: Vendor "GenuineIntel", ID 0x206c2, APIC ID 0
> Mar 16 12:37:33 hostname kernel: MCA: CPU 0 COR (1) RD channel ?? memory error
> Mar 16 12:37:33 hostname kernel: MCA: Address 0xb43ca6240
> Mar 16 12:37:33 hostname kernel: MCA: Misc 0x4ac8111000064808

mcelog will help you figure out which DIMM is affected.

Also, if your server includes an IPMI controller, the BIOS should be set up
to log memory errors to the IPMI system event log (SEL). You can look at the
SEL with ipmitool from the ports collection. 'ipmitool sel list' will show
you if any errors have been reported.

-Andrew

--------------------------------------------------
Andrew Boyer aboyer@averesystems.com

_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
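
For the "scripts which scan for those errors" idea raised in the thread, here
is a minimal Python sketch. It is not from the original posters. It assumes
the "MCA: Bank N, Status 0x..." line format shown in the sample above and the
IA32_MCi_STATUS bit layout from Intel's machine-check documentation; the
function names, the regular expression and the dictionary keys are all
hypothetical. It is not a replacement for mcelog, which is still what maps an
error to a DIMM.

```python
import re

# Bit layout of IA32_MCi_STATUS as documented by Intel.
# Assumption: an Intel CPU, as in the sample ("GenuineIntel").
MC_STATUS_VAL = 1 << 63    # register holds a valid error record
MC_STATUS_OVER = 1 << 62   # an earlier record was overwritten
MC_STATUS_UC = 1 << 61     # error was NOT corrected by hardware
MC_STATUS_MISCV = 1 << 59  # the Misc value is valid
MC_STATUS_ADDRV = 1 << 58  # the Address value is valid

# Memory transaction types for memory-controller error codes
# (layout 0000 0000 1MMM CCCC); the mnemonics are illustrative labels.
MEMORY_OPS = {0: "GEN", 1: "RD", 2: "WR", 3: "AC", 4: "MS"}

# Hypothetical pattern, modelled only on the sample lines in this thread.
BANK_RE = re.compile(r"MCA: Bank (\d+), Status (0x[0-9a-fA-F]+)")


def decode_status(status):
    """Split a 64-bit bank status value into its main fields."""
    code = status & 0xFFFF  # architectural MCA error code, bits 15:0
    info = {
        "valid": bool(status & MC_STATUS_VAL),
        "overflow": bool(status & MC_STATUS_OVER),
        "uncorrected": bool(status & MC_STATUS_UC),
        "misc_valid": bool(status & MC_STATUS_MISCV),
        "addr_valid": bool(status & MC_STATUS_ADDRV),
        # Bits 52:38; only meaningful when the CPU reports CMCI support.
        "corrected_count": (status >> 38) & 0x7FFF,
        "mca_error_code": code,
        "memory_op": None,
        "channel": None,
    }
    # Memory-controller errors: bit 7 set, bits 15:13 and 11:8 clear
    # (bit 12 is ignored by the mask).
    if (code & 0xEF80) == 0x0080:
        info["memory_op"] = MEMORY_OPS.get((code >> 4) & 0x7, "??")
        channel = code & 0xF
        info["channel"] = None if channel == 0xF else channel  # 0xF: unspecified
    return info


def scan(lines):
    """Yield (bank, decoded_status) for every MCA bank line found."""
    for line in lines:
        match = BANK_RE.search(line)
        if match:
            yield int(match.group(1)), decode_status(int(match.group(2), 16))


if __name__ == "__main__":
    import sys

    # Usage: python mca_scan.py < logfile
    for bank, info in scan(sys.stdin):
        kind = "uncorrected" if info["uncorrected"] else "corrected"
        print(f"bank {bank}: {kind}, count={info['corrected_count']}, "
              f"op={info['memory_op']}, channel={info['channel']}")
```

Fed the "Bank 8" line from the sample, the decoded fields agree with what the
kernel itself printed ("COR (1) RD channel ??"): a corrected error, count 1,
read operation, channel not specified. On a central log host this could be run
over whatever file the kernel messages end up in.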
Article code (AID): #1Fe9Lihy (FB_current)