
[cobalt-developers] bizarre RaQ4 problem



Greetings all,


Sorry for the redundant post if any of you also monitor the
-users list.  Although this isn't strictly a dev question, it
might be a bit more esoteric than what -users covers, so I'm
re-posting here.


I'm wrangling with a RaQ4.  After a seemingly random amount of
uptime (anywhere from a few minutes to several hours), anything
that uses MySQL stops working:

* MySQL apparently isn't running; see below.

* The MySQL unix domain socket is gone.

* The MySQL error log sometimes thinks mysqld _is_ running
  ("Number of processes running now: <some number>") when
  safe_mysqld restarts MySQL.  After that, however, no mysqld
  entries show in ps.

* "netstat -anpt" shows tcp 0.0.0.0:3306 in state LISTEN, but
  with a hyphen instead of the pid/cmdline.

* Observed from the client end, connection attempts to 3306/TCP
  appear to succeed, and the server side shows the new
  connection... but, again, there's no process actually listening
  on the socket.  IOW, the kernel is still accepting connections
  despite the death of the original listener.  Bad bad bad bad
  bad.  Bad.  BAD.  BAD!

* I've found no way to close these "ghost sockets" other than
  rebooting the machine.  The number of active TCP sockets just
  keeps increasing.

* The md5 checksums for ps and netstat appear clean.  None of the
  entries in /proc show a commandline of mysqld.  chkrootkit
  finds naught.  IOW, I'm hesitant to suspect a cracked box; the
  possibility can't be discarded, but I doubt any rootkits are
  present.

* I've increased fs.file-max and fs.inode-max sysctl values, but
  with no apparent effect.  The box is not running out of memory,
  and never eats into swap.
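
For anyone wanting to reproduce the client-side symptom from the
list above, here is a rough sketch of a probe.  It relies on the
fact that a live MySQL server speaks first and sends a greeting
packet as soon as a client connects, so a port that accepts the
connection but then stays silent looks like one of these "ghost
sockets".  The function name, the return labels, and the default
timeout are my own inventions for illustration, and it assumes a
machine with Python 3 to run it from.

```python
import socket

def probe_banner(host, port=3306, timeout=5.0):
    """Classify a TCP service that is expected to send data first.

    Returns one of (labels are illustrative, not standard):
      "unreachable" - the connect itself failed or timed out
      "silent"      - connect succeeded, nothing arrived before timeout
      "closed"      - connect succeeded, peer closed or reset at once
      "greeting"    - connect succeeded and some bytes arrived
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "unreachable"
    with sock:
        sock.settimeout(timeout)
        try:
            data = sock.recv(64)
        except socket.timeout:
            # The kernel completed the handshake, but nobody is talking.
            return "silent"
        except OSError:
            return "closed"
    return "greeting" if data else "closed"

if __name__ == "__main__":
    print(probe_banner("127.0.0.1"))
```

A healthy mysqld should yield "greeting".  A result of "silent"
only says that nothing answered within the timeout, so a badly
overloaded server could look the same; treat it as a hint, not
proof.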

As a final tidbit, inetd seems to suffer the same "ghost socket"
fate as mysqld _after_ mysqld has been dead (between worlds?) for
some amount of time.  Oddly, inetd continues to show up in "ps"
output.
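
In case it helps anyone compare notes, below is a minimal sketch
of the cross-check behind the "hyphen instead of the pid" netstat
observation: list the LISTEN entries in /proc/net/tcp, then see
whether any process still holds the matching socket inode among
its open file descriptors.  It assumes a Linux /proc in which
/proc/net/tcp carries an inode column and /proc/<pid>/fd entries
read as "socket:[inode]", plus Python 3 on the box; I have not
verified either on the 2.2.16C33_III kernel.  The function names
are made up for illustration.  Run it as root, since otherwise
other users' descriptors are unreadable and their listeners will
be wrongly reported as ownerless.

```python
import os

def listening_inodes(proc_net_tcp_text):
    """Map socket inode -> local port for rows in LISTEN state (st == 0A)."""
    result = {}
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 10 or fields[3] != "0A":
            continue
        port = int(fields[1].rsplit(":", 1)[1], 16)  # port is hex
        result[fields[9]] = port                     # column 9 is the inode
    return result

def owned_socket_inodes(proc="/proc"):
    """Collect socket inodes referenced by any process's open descriptors."""
    owned = set()
    for pid in filter(str.isdigit, os.listdir(proc)):
        fd_dir = os.path.join(proc, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or permission denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while we were looking
            if target.startswith("socket:["):
                owned.add(target[len("socket:["):-1])
    return owned

def orphaned_listeners(proc="/proc"):
    """Return sorted local ports in LISTEN state that no process holds open."""
    with open(os.path.join(proc, "net", "tcp")) as f:
        listeners = listening_inodes(f.read())
    owned = owned_socket_inodes(proc)
    return sorted(p for inode, p in listeners.items() if inode not in owned)

if __name__ == "__main__":
    for port in orphaned_listeners():
        print("LISTEN on port %d has no owning process" % port)
```

If inetd's ports turn up here while inetd still shows in ps, that
would suggest the process has lost its listening descriptors
rather than died outright.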

I'm virtually convinced of a kernel bug, but am not really sure
exactly where, or what's tickling it.  Kernel is 2.2.16C33_III,
and I know some changes to TCP and sockets have been made since
2.2.16.

Anyone ever encountered anything similar?

FWIW, I'm toying with a kernel upgrade to 2.4... which, of
course, has a few dependencies I'd need to address.  I'd stick
with 2.2, except the C33_III changes to the kernel do not play
nicely with 2.2.17, and I don't feel like attempting to merge
manually.  I've not tried a newer version yet... maybe a stock
Linux kernel with release date near that of 2.2.16C33_III would
work better.


Eddy
--
Brotsman & Dreger, Inc. - EverQuick Internet Division
Bandwidth, consulting, e-commerce, hosting, and network building
Phone: +1 (785) 865-5885 Lawrence and [inter]national
Phone: +1 (316) 794-8922 Wichita

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Mon, 21 May 2001 11:23:58 +0000 (GMT)
From: A Trap <blacklist@xxxxxxxxx>
To: blacklist@xxxxxxxxx
Subject: Please ignore this portion of my mail signature.

These last few lines are a trap for address-harvesting spambots.
Do NOT send mail to <blacklist@xxxxxxxxx>, or you are likely to
be blocked.