[cobalt-users] RaQ 4R Networking Dying & Patches Connection?
- From: "Michelle A. Hoyle" <mahlist@xxxxxxxxxxxxx>
- Date: Mon Oct 28 13:21:02 2002
- List-id: Mailing list for users to share thoughts on Sun Cobalt products. <cobalt-users.list.cobalt.com>
For the last month and a half, I've been experiencing some unusual
behaviour on my RaQ4R. The symptoms are as follows:
- Swatch thinks all services are up
- The various logs can see localhost connections from Swatch
- No external connections are accepted.
- Serial connections from terminal work fine.
- Uptime has dropped to 1-8 days before a soft reset becomes necessary
- Sometimes networking can be brought back up by running
/etc/rc.d/init.d/networking
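Since the services look healthy from localhost, the moment external connections stop being accepted can only be seen from outside. A minimal sketch of such a probe follows, assuming Python is available on a second machine. The function names are illustrative and not part of any Cobalt tooling, and the host and port list are whatever the operator chooses.

```python
import socket
import time


def port_accepts_connections(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def probe(host, ports, timeout=5.0):
    """Check each port once and return the results with a local timestamp."""
    return {
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        "results": {
            port: port_accepts_connections(host, port, timeout)
            for port in ports
        },
    }
```

Run from cron on an outside host, with the output of `probe(...)` appended to a log, this would give timestamps to compare against the server's own logs at the point where outside connections begin to fail.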
Before this problem surfaced, a few things happened:
- Both fans in the machine failed, causing the
internal temp. to soar to 90 degrees C.
- I installed a huge backlog of system patches (dating back
to the kernel patch in March), bringing the system up to date
as of the end of September.
Working with the theory that maybe we had flakey onboard hardware as
a result of the high temperatures, I had the Network Operations
Centre pull the drives/memory out of the server and place them into a
new RaQ4 body. Unfortunately, the problem still persists. I will
have them change the memory, but I'm losing faith in the hardware
problem idea.
The following patches/packages are installed:
Miva Empresa (RaQ3) Release 3.94
Relational Database Server and Client tools by InterBase. Release V6.0
Miva Merchant Package (RaQ3) Release 4.13
Cobalt MySQL Release 3.23.37-1
Cobalt OS Release 6.0
RaQ4-All-CMU Release 2.27
RaQ4-All-Kernel Release 2.0.1-2.2.16C32III
RaQ4-All-Kernel Release 2.0.1-2.2.16C32III
RaQ4-All-Kernel Release 2.0.1-2.2.16C32III
RaQ4-All-Security Release 1.0.2-8762
RaQ4-All-Security Release 2.0.1-13323
RaQ4-All-Security Release 2.0.1-13453
RaQ4-All-Security Release 2.0.1-14559
RaQ4-All-Security Release 2.0.1-14997
RaQ4-All-Security Release 2.0.1-15417
RaQ4-All-Security Release 2.0.1-2-15787
RaQ4-All-System Release 2.0.1-12854
RaQ4-All-System Release 2.0.1-13993
RaQ4-All-System Release 2.0.1-14185
RaQ4-en-OSUpdate Release 2.0
Third Party Disaster Recovery Release 1.0.2-9198
RaQ4_dbm_apache-1.3.12-1C9Release .5
Chili!Soft ASP Interbase upgrade Release 3.5.2.1
Traffic Statistics Light (Mermaid Consulting I/S) Release 1.1-1
OpenSSH Release 3.4p1-PM4
I'm not sure why the Kernel release appears three times in the list.
Before that was installed, another item appeared three times.
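Repeated entries like this are easy to miss by eye in a long list. A small sketch, assuming the package list has been copied into a Python list by hand; `INSTALLED` here is only an excerpt of the list above, and `duplicated` is an illustrative helper name:

```python
from collections import Counter

# Excerpt of the installed-package list, one entry per line as displayed.
INSTALLED = [
    "Cobalt MySQL Release 3.23.37-1",
    "Cobalt OS Release 6.0",
    "RaQ4-All-CMU Release 2.27",
    "RaQ4-All-Kernel Release 2.0.1-2.2.16C32III",
    "RaQ4-All-Kernel Release 2.0.1-2.2.16C32III",
    "RaQ4-All-Kernel Release 2.0.1-2.2.16C32III",
    "RaQ4-All-Security Release 1.0.2-8762",
    "RaQ4-All-Security Release 2.0.1-13323",
    "RaQ4-All-Security Release 2.0.1-13453",
    "RaQ4-All-System Release 2.0.1-12854",
    "RaQ4-All-System Release 2.0.1-13993",
    "RaQ4-All-System Release 2.0.1-14185",
]


def duplicated(entries):
    """Return a dict mapping each entry listed more than once to its count."""
    counts = Counter(entries)
    return {name: n for name, n in counts.items() if n > 1}
```

Calling `duplicated(INSTALLED)` reports only the entries that occur more than once. It says nothing about why an entry is repeated, only that it is.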
RaQ4_dbm_apache-1.3.12-1C9Release .5 is a custom Apache package from
Sun Cobalt incorporating mod_auth_dbm and mod_auth_db for us. Note
that, like many other people on this list, we have custom software
installed on our machine: extra Perl libraries, mrtg watchers, log
checkers, firewall software, portscan monitors, PHP Zend Script
Optimizer, raqbackup, newer versions of OpenSSL for OpenSSH
(compiled over top of the RaQ package), etc.
We have firewall scripts installed, which have been working since
last December. We run nmap, fcheck and chkrootkit every night, and
those haven't reported anything unusual beyond the patch changes
(with respect to fcheck). Our MRTG reporter (updated every 5
minutes) shows no unusual traffic, temperature increases, swap
memory problems, or high process loads.
Timeline of Events:
Sept. 11th: Fans fail, new fans installed
Sept. 13th: RaQ4-All-Security Release 2.0.1-13453 installed (glibc)
            RaQ4-All-Kernel-2.0.1-2.2.16C32III-2.0.1-2.2.16C32III
            installed (kernel)
Sept. 16th: chkrootkit (0.37) installed
            RaQ4-All-System-2.0.1-14185.pkg (pbhp control panel change)
            RaQ4-All-Security Release 2.0.1-13323 (bind, proftp,
            zlib, etc) installed
            gcc permissions changed from -rwxr-xr-x (755) to 700
            (slapper compilation prohibited)
Sept. 17th: Set /etc/saslb to g-r to remove group-readable error
            message in logs after bind/mutt/pine/zlib update last night
Sept. 21st: Outage
Sept. 29th: Outage
Oct. 2nd:   Outage
Oct. 10th:  Outage
Oct. 11th:  RaQ4-All-Security-2.0.1-15417.pkg (Apache) installed
            RaQ4-All-Security-2.0.1-2-15787.pkg (Apache) installed
            RaQ4-All-Security-2.0.1-14997.pkg (cgi-wrap) installed
Oct. 11th:  Outage
Oct. 18th:  Outage
Oct. 23rd:  Outage
            Tried to restart from console using
            /etc/rc.d/init.d/networking -- failed
Oct. 25th:  Outage
Oct. 27th:  Drives/memory pulled and placed into new machine.
Oct. 28th:  Outage
            Tried to restart from console using
            /etc/rc.d/init.d/networking -- success
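To see whether the outages follow any pattern, the gaps between them can be worked out from the dates in the timeline. A minimal sketch, assuming the year is 2002 (taken from the message date); `gaps_in_days` is an illustrative helper name:

```python
from datetime import date

# Outage dates from the timeline; the year is assumed from the message header.
OUTAGES = [
    date(2002, 9, 21),
    date(2002, 9, 29),
    date(2002, 10, 2),
    date(2002, 10, 10),
    date(2002, 10, 11),
    date(2002, 10, 18),
    date(2002, 10, 23),
    date(2002, 10, 25),
    date(2002, 10, 28),
]


def gaps_in_days(dates):
    """Return the number of days between each consecutive pair of dates."""
    ordered = sorted(dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
```

For these dates the gaps range from 1 to 8 days with no obvious fixed period, which matches the uptime figures in the symptom list.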
Is anybody else experiencing problems like this, or does anybody have
an idea what's causing it? Better yet, does anybody have any ideas
about how to fix it?
Thanks,
Michelle A. Hoyle