[cobalt-users] Strange hang problem via FTP and web on Raq4
- Subject: [cobalt-users] Strange hang problem via FTP and web on Raq4
- From: Logan Lewis <logan@xxxxxxxxx>
- Date: Thu Jul 25 16:31:01 2002
- List-id: Mailing list for users to share thoughts on Sun Cobalt products. <cobalt-users.list.cobalt.com>
Recently I have been experiencing strange hang problems on at least one Raq4.
On the server I have diagnosed most carefully, the problem can be reproduced
trying to download files via FTP. The hang does not occur during the actual
file transfer, but usually right after this message: "150 Opening ASCII mode
data connection for file list". Once the files themselves start transferring,
I have never seen it hang. However, the problem has also been
reproduced by spending time browsing around through CGI generated web pages
hosted by the server.
While the server is hanging, any open ssh sessions will not respond. Pinging
the server produces "Destination Host Unreachable" on my Linux workstation
and "Request timed out" on a Windows computer.
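To put numbers on how long each hang lasts, a small probe can record when the
server stops and resumes accepting TCP connections. The following is only a
minimal sketch: the function names, the port, and the polling interval are
illustrative assumptions, and a TCP connect is used here in place of ping
because it needs no special privileges.

```python
import socket
import time


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def outage_intervals(samples):
    """Turn time-ordered (timestamp, ok) samples into (start, end) outages.

    'end' is the timestamp of the first successful probe after a failure,
    or None if the last sample was still failing.
    """
    intervals = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                      # outage begins
        elif ok and start is not None:
            intervals.append((start, ts))   # outage ends
            start = None
    if start is not None:
        intervals.append((start, None))     # still down at the end
    return intervals


def monitor(host, port=22, interval=1.0, count=60):
    """Probe host:port 'count' times (hypothetical defaults) and summarize."""
    samples = []
    for _ in range(count):
        samples.append((time.time(), tcp_reachable(host, port)))
        time.sleep(interval)
    return outage_intervals(samples)
```

Running monitor() from a second machine while reproducing the FTP hang would
give start and end times that can be lined up against the server's logs and
any tcpdump capture.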
Sometimes the hang will last a few seconds or a fraction of a minute.
Sometimes it stays hung until the FTP client is restarted, and sometimes the
network interface needs to be restarted. Sometimes even this does not work,
so the panel-button interface on the Cobalt is needed to restart it. Since
the button interface works, I at least know that the server is responding
locally if nothing else.
The log files show no information during the hang, and the last recorded
message is seemingly normal, the usual "ttloop: read: Broken pipe". I do seem
to be getting the series of messages about "cannot bind [IP address] to server
'ProFTPD', already bound to 'ProFTPD'" fairly often. I know that it means
there are virtual sites sharing the same IP and it only binds proftpd once to
the IP, but I get the whole series of messages multiple times in one hour. I
don't know if this is indicative of ProFTPD restarting itself or if it means
something else.
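One way to see how often that series of messages recurs is to count the
matching log lines per hour. This is a rough sketch that assumes the classic
syslog layout ("Mon DD HH:MM:SS host program: message"); the function name and
the search string are illustrative assumptions, not anything ProFTPD provides.

```python
from collections import Counter


def bind_warnings_per_hour(lines, needle="already bound to"):
    """Count log lines containing 'needle', grouped by (month, day, hour).

    Assumes each line starts with a syslog-style timestamp such as
    'Jul 25 16:31:01'. Lines that do not fit are skipped.
    """
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        parts = line.split(None, 3)   # month, day, HH:MM:SS, rest
        if len(parts) < 4:
            continue
        month, day, clock = parts[0], parts[1], parts[2]
        counts[(month, day, clock[:2])] += 1
    return counts


# Example (the log path is an assumption; adjust for the actual system):
# with open("/var/log/messages") as fh:
#     for key, n in sorted(bind_warnings_per_hour(fh).items()):
#         print(key, n)
```

If the counts arrive in bursts of the same size, each burst is probably one
pass over the configuration, which might point to a restart or reload rather
than to individual connections.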
The only related message I have found in my searches is this one:
http://list.cobalt.com/pipermail/cobalt-users/2001-May/047418.html
It seems to have very similar symptoms, but I contacted him and his solution
was to ensure that DNS and reverse DNS were working for the IP addresses.
Though reverse DNS was not set up for each IP address at first, fixing it did
not help the problem.
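For anyone wanting to repeat the DNS check, the idea is to look up the PTR
name for each IP address and then confirm that the name resolves back to the
same address. A minimal sketch using Python's standard socket module follows;
the function name and the optional resolver parameter are illustrative
assumptions, and the check uses whatever resolver the local machine is
configured with.

```python
import socket


def check_forward_reverse(ip, resolver=socket):
    """Return (ptr_name, forward_ips, consistent) for one IPv4 address.

    'consistent' is True only if the reverse lookup succeeds and the
    returned name resolves back to a list that includes 'ip'.
    """
    try:
        ptr_name, _aliases, _addrs = resolver.gethostbyaddr(ip)
    except (socket.herror, socket.gaierror):
        return None, [], False          # no reverse record found
    try:
        _name, _aliases, forward_ips = resolver.gethostbyname_ex(ptr_name)
    except (socket.herror, socket.gaierror):
        return ptr_name, [], False      # PTR name has no forward record
    return ptr_name, forward_ips, ip in forward_ips


# Example with a placeholder address (replace with the server's real IPs):
# print(check_forward_reverse("192.0.2.10"))
```

Running this for every IP address bound on the server, from both inside and
outside the network, would show whether the two views of DNS agree.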
I have tried using tcpdump to look at the packets being sent during the hang,
and they don't seem unusual. I'd be happy to provide log entries and tcpdump
info if anyone would think it was useful.
The problems seemed to start while two principal changes were taking place: I
updated the software on the Raq 4s for the latest security updates, and I
reformatted our primary DNS server. The DNS server (a Raq 2) broke during the
named update several months back, forcing me to use an older version. So
after formatting I was able to restore the named configuration files first
and run the updates, which worked this time. However, since the only other
instance of this problem was due to DNS, I thought I would mention this.
The server is not running out of memory: I created enough swap files (100 MB
each; I didn't know if Raq4s had the Linux swap file size issue) to have
more than 700 MB of swap space.
Though it hasn't been tested as thoroughly, the problem does not seem to
occur from outside the network. The possible requirement of a manual reboot
makes it more risky to test at a different location. If anyone else has
experienced this problem, or has a suggestion, I would greatly appreciate it.
My apologies for the lengthy post.
Sincerely,
Logan