Re: SunFire X2200 ilo's bge1 DOWN/UP
On May 27, 2013, at 12:59 AM, Daniel Braniss wrote:
On Fri, May 24, 2013 at 05:31:13PM +0300, Daniel Braniss wrote:
hi, after upgrading to 9.1-stable, this particular hardware - SunFire X2200,
If you're truly running stable/9, and it's up-to-date, you should already have
SVN revisions 248858 and 250650, both of which have a significant impact on
(a) the SunFire X2200 (r248858) and (b) the DOWN/UP problem (r250650).
Show me the dmesg output (bge(4) and brgphy(4) lines only) and the 'ifconfig bge1' output.
bge0: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x009003> mem 0xfdff0000-0xfdffffff,0xfdfe0000-0xfdfeffff irq 17 at device 4.0 on pci6
bge0: CHIP ID 0x00009003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
miibus2: <MII bus> on bge0
brgphy0: <BCM5714 1000BASE-T media interface> PHY 1 on miibus2
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bge0: Ethernet address: 00:1b:24:5d:5b:bd
bge1: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x009003> mem 0xfdfc0000-0xfdfcffff,0xfdfb0000-0xfdfbffff irq 18 at device 4.1 on pci6
bge1: CHIP ID 0x00009003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
miibus3: <MII bus> on bge1
brgphy1: <BCM5714 1000BASE-T media interface> PHY 1 on miibus3
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bge1: Ethernet address: 00:1b:24:5d:5b:be
sf-10> ifconfig bge1
bge1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
	ether 00:1b:24:5d:5b:be
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	media: Ethernet autoselect (100baseTX <full-duplex>)
	status: active
Saw similar things happening over here with a different Broadcom chipset, and the above revisions
helped significantly (URLs below):
http://svnweb.freebsd.org/base?view=revision&revision=248858
http://svnweb.freebsd.org/base?view=revision&revision=250650
is toggling bge1 DOWN/UP every few hours; this port is used by the ILO.
To check, I upgraded another identical host, and the same problem appears.
What is the last known working revision?
I have no idea, but I have older versions, and I'll start from the oldest
(9.1-prerelease), but it will take time, since it takes hours for the problem to appear.
There are ways to shorten the time it takes to reproduce the problem. I tend to flood a server with
TCP, though I've heard of it happening under a UDP flood too.
Here's a nice way to flood a server with TCP (assuming you have SSH access to the
system via keys):
sh -c 'while :;do dd if=/dev/urandom of=/dev/stdout bs=1m count=1024 | ssh HOST2KILL /sbin/md5; done'
Run that about 16 times in separate screen sessions from various other hosts on your network,
taking care to replace "HOST2KILL" with the hostname or IP of the box with the SunFire X2200.
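A minimal sketch of starting several of those loops from one client, in place of separate screen sessions. The function name flood and its STREAMS/ROUNDS arguments are hypothetical, not from the original post; it assumes key-based SSH to the target, FreeBSD on both ends (the bs=1m spelling of dd(1) and /sbin/md5 are carried over from the command above), and that it is called from a non-interactive script so a single Ctrl-C can stop every stream.

```sh
# flood HOST [STREAMS] [ROUNDS]
# Hypothetical helper (not from the original post): runs STREAMS copies of
# the TCP flood loop above against HOST from this one client.
# Assumes key-based SSH to HOST and FreeBSD on both ends (dd bs=1m, /sbin/md5).
# STREAMS defaults to 16; ROUNDS is the number of 1 GiB transfers per
# stream, where 0 (the default) means loop until interrupted.
flood() {
    host=$1
    streams=${2:-16}
    rounds=${3:-0}

    # Background jobs of a non-interactive script ignore SIGINT, so on
    # Ctrl-C (or SIGTERM) drop the traps and signal the whole process group.
    trap 'trap - INT TERM; kill 0' INT TERM

    i=0
    while [ "$i" -lt "$streams" ]; do
        (
            n=0
            while [ "$rounds" -eq 0 ] || [ "$n" -lt "$rounds" ]; do
                # 1024 x 1 MiB of random data, checksummed on the target
                dd if=/dev/urandom bs=1m count=1024 2>/dev/null |
                    ssh "$host" /sbin/md5
                n=$((n + 1))
            done
        ) &
        i=$((i + 1))
    done

    wait                 # until every stream finishes or is killed
    trap - INT TERM
}
```

For example, a small script on each of a few other hosts could define this and call `flood sf-10 16`. Each `( ... ) &` subshell stands in for one screen session, and the `kill 0` in the trap is what lets one Ctrl-C tear all of them down.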
Let that run for a while, and then, when you think you've had a reset (if you weren't standing
there watching for one), run:
grep 'bge.*DOWN' /var/log/messages
On a system that has booted and stayed up-and-running, there shouldn't be any messages like this:
bge0: link state changed to DOWN
When you actually get this message (if your experience is like ours), you'll be down for 90 seconds
while the NIC resets.
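A minimal sketch for telling a full reset (about 90 seconds in our case) apart from a brief flap: it pairs each DOWN message with the next UP for the same host and interface and prints how long the link was down. The function name bge_downtime is hypothetical, and it assumes the classic syslog line format seen in this thread and outages shorter than a day.

```sh
# bge_downtime [LOGFILE]
# Hypothetical helper (not from the original post): pairs each
# "bgeN: link state changed to DOWN" kernel message with the next UP for
# the same host and interface, and prints how long the link was down.
# Assumes syslog lines shaped like
#   "May 24 12:53:52 sf-04 kernel: bge1: link state changed to DOWN"
# and that no single outage lasts a day or longer.
bge_downtime() {
    awk '
    / kernel: bge[0-9]+: link state changed to (DOWN|UP)/ {
        split($3, t, ":")                    # "HH:MM:SS" -> t[1..3]
        secs = t[1] * 3600 + t[2] * 60 + t[3]
        ifname = $6
        sub(/:$/, "", ifname)                # "bge1:" -> "bge1"
        key = $4 " " ifname                  # host + interface
        if ($NF == "DOWN") {
            down[key] = secs
            when[key] = $1 " " $2 " " $3
        } else if (key in down) {
            d = secs - down[key]
            if (d < 0)
                d += 86400                   # outage crossed midnight
            printf "%s %s down at %s for %ds\n", $4, ifname, when[key], d
            delete down[key]
        }
    }' "${1:-/var/log/messages}"
}
```

Run it as `bge_downtime /var/log/messages`. On the sf-04 log excerpt quoted in this thread it would report two 3-second outages, well short of a 90-second NIC reset.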
However, since you say you have some older 9.1 releases... I'd start by first trying to bring the
reproduction time of the problem down by using TCP and/or UDP floods. That way you'll be able to
test whether the problem is resolved as you progress up to stable/9 (where the problem should be fixed
by the aforementioned SVN revisions, which are specific to your hardware).
There is no correlation with time, since they happened at totally different times.
I rebooted both hosts at almost the same time.
One host:
uptime: 5:24PM up 6:15, 0 users, load averages: 0.00, 0.00, 0.00
May 24 12:53:52 sf-04 kernel: bge1: link state changed to DOWN
May 24 12:53:55 sf-04 kernel: bge1: link state changed to UP
May 24 15:34:25 sf-04 kernel: bge1: link state changed to DOWN
May 24 15:34:28 sf-04 kernel: bge1: link state changed to UP
And the other host:
uptime: 5:24PM up 6:14, 0 users, load averages: 0.00, 0.00, 0.00
May 24 16:30:44 sf-10 kernel: bge1: link state changed to DOWN
May 24 16:30:44 sf-10 kernel: bge1: link state changed to UP
This is not serious, the ILO (ssh) connection is OK, but it's annoying: we have more
than 10 of these hosts, and if I upgrade all of them, the logs will fill up
with this :-)
Any ideas?
Well, you say the connection is OK... so it doesn't sound like a full reset as it
was in our case (we have a different chipset).
But I agree that a log full of those would be annoying.
Try getting up to stable/9 in its current state (note: stable/8 also has all the
aforementioned revisions).
--
Devin
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"