
Re: [cobalt-users] Diagnosing network freezes on Raq4?



> > We have a server that has been relatively imperturbable.  It
> > has stopped chatting on the net twice in two days.  All
> > internal stuff seems to continue, as cron continues
> > processing its charges, internally generated email from
> > logcheck is submitted, the active monitor entries appear in
> > the appropriate logs, and mrtg seems to continue logging its
> > various datapoints. We can find no evidence of mischief afoot
> > and there have been no recent upgrades or installs (or even
> > site additions) on this particular box.
> >
> > Any pointers on how to figure out what is making it stop
> > responding on all ports/services?  Also, is there a data
> > corruption risk when rebooting from the front panel or does
> > it shut down daemons and filesystems properly?  Now if I just
> > had a 700 mile pole to push the button with <g>.
> >
>
> /var/log/messages anything there? What do the various logs show? Maybe
> /var/log/httpd/access or error for the time it shuts down. I'm guessing
> you mean the http stops working.

messages contained the usual bad referrals, responses from unexpected
sources, lame servers, etc., then everything ceases except the cache
releases and stats from named.

auth showed nothing prior to the reboot

kernel showed some hits on port 161 (SNMP) blocked and logged by ipchains,
but that was 10 hours earlier.

httpd/error showed some 'File does not exist' messages 4 hours prior, and
then the shutdown/restart entries.
httpd/access is similar: it just stops, although it does continue to log the
monitor probes for the GUI.

maillog shows POP logins and SMTP activity without errors until the
'freeze', then shows logcheck mails to admin accounts and more monitor
probes.

When this occurs, the system itself may be up, but it will not respond on
http/https, the admin GUI, ssh, pop3, dns, or smtp. So I suspect either a
bad router at the ISP, as was just suggested (but not Interland), or an
internal failure of the NIC or its driver. But how does one tell? The
freezes also do not coincide with any of the cron jobs, and all patches are
current.
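
On the "how does one tell" question, one rough idea (a sketch only, untested
on a RaQ4): since cron keeps running during the freeze, a cron job could
record the interface packet counters from /proc/net/dev to a local file, so
that after the next freeze the log shows whether the NIC was still counting
packets in either direction. Everything named below is an assumption made for
illustration: the function names, the "eth0" interface, the log path, and the
hint labels. It also assumes the usual 16-column /proc/net/dev layout and a
Python interpreter on the box.

```python
import time


def parse_proc_net_dev(text):
    """Parse /proc/net/dev text into {interface: counter dict}.

    Assumes the 16-column layout: 8 receive fields, then 8 transmit fields.
    """
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # unexpected layout; skip rather than guess
        counters[name.strip()] = {
            "rx_packets": int(fields[1]),
            "rx_errs": int(fields[2]),
            "tx_packets": int(fields[9]),
            "tx_errs": int(fields[10]),
        }
    return counters


def classify(before, after):
    """Return a rough hint (hypothetical labels) from two snapshots."""
    rx = after["rx_packets"] - before["rx_packets"]
    tx = after["tx_packets"] - before["tx_packets"]
    errs = (after["rx_errs"] - before["rx_errs"]) + \
           (after["tx_errs"] - before["tx_errs"])
    if errs > 0:
        return "errors"    # error counters rising
    if rx == 0 and tx == 0:
        return "silent"    # nothing moving in either direction
    if rx == 0:
        return "tx-only"   # sending, but nothing arriving
    if tx == 0:
        return "rx-only"   # hearing traffic, but sending nothing
    return "ok"


def log_once(iface="eth0", log_path="/var/log/nicwatch.log",
             proc_path="/proc/net/dev", interval=10):
    """Take two snapshots `interval` seconds apart and append one log line."""
    with open(proc_path) as f:
        before = parse_proc_net_dev(f.read())[iface]
    time.sleep(interval)
    with open(proc_path) as f:
        after = parse_proc_net_dev(f.read())[iface]
    line = "%s %s %s rx=%d tx=%d\n" % (
        time.strftime("%Y-%m-%d %H:%M:%S"), iface,
        classify(before, after), after["rx_packets"], after["tx_packets"])
    with open(log_path, "a") as out:
        out.write(line)
    return line
```

Calling log_once() from cron every few minutes would leave a trail to read
after the next reboot. As a loose guide only: "tx-only" during a freeze would
suggest the box is sending but hearing nothing (upstream, or the receive side
of the NIC), while "errors" would lean more toward the NIC, driver, or
cabling. Neither is conclusive on its own, and counter wraparound is not
handled.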

-- Paul