
[cobalt-users] RaQ 4R Networking Dying & Patches Connection?



For the last month and a half, I've been experiencing some unusual behaviour on my RaQ4R. The symptoms are as follows:
	- Swatch thinks all services are up
	- The various logs can see localhost connections from Swatch
	- No external connections are accepted.
	- Serial connections from terminal work fine.
	- Uptime is down to between 1 and 8 days whenever a soft reset is needed to recover
	- Sometimes networking can be brought back up by running
	/etc/rc.d/init.d/networking
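
Since the local checks keep passing while outside connections fail, one way to pin down exactly when an outage starts is to probe the server from a second machine. The following is only a minimal sketch under that assumption: it is Python 3 meant to run on some other host, not on the RaQ itself, and the function names, the host name and the port list are all hypothetical placeholders rather than anything from our setup.

```python
import socket
from datetime import datetime


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def probe_services(host, ports, timeout=5.0):
    """Map each port to True/False depending on whether it accepts a connection."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}


def format_report(host, results, now=None):
    """Build one timestamped log line, suitable for appending to a file from cron."""
    now = now or datetime.now()
    status = " ".join(
        f"{port}={'up' if ok else 'DOWN'}" for port, ok in sorted(results.items())
    )
    return f"{now:%Y-%m-%d %H:%M:%S} {host} {status}"
```

Called every few minutes from cron as, say, format_report("raq.example.com", probe_services("raq.example.com", [22, 25, 80])), with the placeholder host and ports swapped for real ones, this would leave a log whose first all-DOWN line marks the start of each outage, which can then be lined up against the server's own logs.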


Before this problem surfaced, a few things happened:
	- Both fans in the machine failed, causing the
	internal temp. to soar to 90 degrees C.

	- I installed a huge backlog of system patches (dating back
	to kernel patch in March), bringing the system up to date
	as of the end of September.

Working on the theory that the high temperatures had left us with flaky onboard hardware, I had the Network Operations Centre pull the drives and memory out of the server and place them into a new RaQ4 body. Unfortunately, the problem persists. I will have them change the memory as well, but I'm losing faith in the hardware theory.

The following patches/packages are installed:
Miva Empresa (RaQ3) Release 3.94
Relational Database Server and Client tools by InterBase. Release V6.0
Miva Merchant Package (RaQ3) Release 4.13
Cobalt MySQL Release 3.23.37-1
Cobalt OS Release 6.0
RaQ4-All-CMU Release 2.27
RaQ4-All-Kernel Release 2.0.1-2.2.16C32III
RaQ4-All-Kernel Release 2.0.1-2.2.16C32III
RaQ4-All-Kernel Release 2.0.1-2.2.16C32III
RaQ4-All-Security Release 1.0.2-8762
RaQ4-All-Security Release 2.0.1-13323
RaQ4-All-Security Release 2.0.1-13453
RaQ4-All-Security Release 2.0.1-14559
RaQ4-All-Security Release 2.0.1-14997
RaQ4-All-Security Release 2.0.1-15417
RaQ4-All-Security Release 2.0.1-2-15787
RaQ4-All-System Release 2.0.1-12854
RaQ4-All-System Release 2.0.1-13993
RaQ4-All-System Release 2.0.1-14185
RaQ4-en-OSUpdate Release 2.0
Third Party Disaster Recovery Release 1.0.2-9198
RaQ4_dbm_apache-1.3.12-1C9 Release .5
Chili!Soft ASP Interbase upgrade Release 3.5.2.1
Traffic Statistics Light (Mermaid Consulting I/S) Release 1.1-1
OpenSSH Release 3.4p1-PM4

I'm not sure why the Kernel release appears three times in the list. Before that was installed, another item appeared three times. RaQ4_dbm_apache-1.3.12-1C9 Release .5 is a custom Apache package that Sun Cobalt built for us, incorporating mod_auth_dbm and mod_auth_db. Like many other people on this list, we also have custom software installed on our machine: extra Perl libraries, MRTG watchers, log checkers, firewall software, portscan monitors, PHP Zend Script Optimizer, raqbackup, newer versions of OpenSSL for OpenSSH (compiled over top of the RaQ package), etc.

We have firewall scripts installed, which have been working since last December. We run nmap, fcheck and chkrootkit every night, and none of them has reported anything unusual beyond the file changes that fcheck flagged from the patches. Our MRTG reporter (updated every 5 minutes) shows no unusual traffic, temperature increases, swap memory problems, or high process loads.


Timeline of Events:
Sept. 11th:	Fans fail, new fans installed

Sept. 13th:	RaQ4-All-Security Release 2.0.1-13453 installed (glibc)
		RaQ4-All-Kernel-2.0.1-2.2.16C32III-2.0.1-2.2.16C32III installed (kernel)

Sept. 16th:	chkrootkit (0.37) installed
		RaQ4-All-System-2.0.1-14185.pkg (pbhp control panel change)
		RaQ4-All-Security Release 2.0.1-13323 (bind, proftp, zlib, etc.) installed
		gcc permissions changed to 700 from -rwxr-xr-x (slapper compilation prohibited)

Sept. 17th:	Set /etc/saslb to g-r to remove group-readable error message in
		logs after bind/mutt/pine/zlib update last night

Sept. 21st:	Outage

Sept. 29th:	Outage

Oct. 2nd:	Outage

Oct. 10th:	Outage

Oct. 11th:	RaQ4-All-Security-2.0.1-15417.pkg  (Apache) installed
		RaQ4-All-Security-2.0.1-2-15787.pkg (Apache) installed
		RaQ4-All-Security-2.0.1-14997.pkg (cgi-wrap) installed

Oct. 11th:	Outage

Oct. 18th:	Outage

Oct. 23rd:	Outage
		Tried to restart from console using /etc/rc.d/init.d/networking -- failed

Oct. 25th:	Outage

Oct. 27th:	Drives/memory pulled and placed into new machine.

Oct. 28th:	Outage
		Tried to restart from console using /etc/rc.d/init.d/networking -- success
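
To check the "1 to 8 days" uptime figure against the timeline, the gaps between consecutive outages can be worked out with a few lines of Python. This is just a sketch of the arithmetic: the outage dates are the ones listed above, the year is an assumption (it is not stated in the timeline and does not affect the September/October gaps), and the names OUTAGES and outage_gaps are made up for the example.

```python
from datetime import date

YEAR = 2002  # assumption: the timeline above gives no year

# Outage dates copied from the timeline.
OUTAGES = [
    date(YEAR, 9, 21),
    date(YEAR, 9, 29),
    date(YEAR, 10, 2),
    date(YEAR, 10, 10),
    date(YEAR, 10, 11),
    date(YEAR, 10, 18),
    date(YEAR, 10, 23),
    date(YEAR, 10, 25),
    date(YEAR, 10, 28),
]


def outage_gaps(outages):
    """Return the number of days between each pair of consecutive outages."""
    ordered = sorted(outages)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


gaps = outage_gaps(OUTAGES)
print("gaps in days:", gaps)
print("shortest:", min(gaps), "longest:", max(gaps))
```

The gaps come out as 8, 3, 8, 1, 7, 5, 2 and 3 days, so the machine has never stayed reachable for more than 8 days since the first outage, and there is no obvious fixed period between failures.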



Is anybody else experiencing problems like this, or does anybody have any idea what's causing it? Better yet, does anybody have any ideas about how to fix it?

Thanks,

Michelle A. Hoyle