
RE: [cobalt-users] Do RAID failures just go away????



> -----Original Message-----
> From: Josh Kuperman [mailto:josh@xxxxxxxxxxxxxxxxxx] 
> Sent: Wednesday, September 03, 2003 9:07 AM
> To: cobalt users
> Subject: [cobalt-users] Do RAID failures just go away????
> 
> 
> Last night I ssh'ed in and was checking my email. I couldn't 
> help but notice large amounts of it were missing along with a 
> slew of symlinks and the main web page address returned an 
> empty directory in a browser.  I'm on a RaQ XTR using RAID 0; 
> there was a notice mailed out to admin that one of the disk
> drives had failed. All the directories when I did "ls ../../" 
> weren't there (which of course couldn't be correct since I 
> was in my home directory when I logged in.)
> 
> There were some other oddities. My system load was at 2.00 as 
> opposed to a usual 0.15 and there were two runaway processes, 
> instances of VIM, owned by a user who hadn't logged in for a 
> while. They were using 48% of CPU each according to top. I
> killed those.
> 
> Now when I get in today I see that it all looks OK -- except 
> for some nightly log rotations, vanishing squid log files, 
> and wherever mutt stuck my mail. (I'm assuming a temporary 
> folder someplace). I assume I'll find more problems once I 
> know where to look.
> 
> So I have the following questions:
> 
> 1. Is the disk bad or good - how do I tell?
> 2. Was the runaway VIM a sign of hacking, bad coding, or
> irrelevant?
> 3. Is there any way to tell what would have been
> damaged by temporarily having one disk out of the loop?
> 
I can't give you absolute answers, but here are some of my observations on my RaQ XTR:
2. I have had some runaway VIM processes that take up all the resources. While they were running, I couldn't really do much or trust results from other commands until I killed them. I have also found them hard to kill, i.e. I've had to send the kill signal a couple of times or try a different signal (I can't remember exactly which). Those VIM processes were generally left behind when I had SSHed in and my firewall timed out the connection, leaving SSH and VIM hanging. Once I killed the process(es), everything was fine. I don't remember which user top showed as the owner, but I would assume it was me.
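For the "hard to kill" case, a common escalation is to send SIGTERM first and fall back to SIGKILL only if the process ignores it. (I don't know which signal I actually ended up using back then, so treat this as one reasonable approach.) Below is a minimal sketch, assuming a Linux box with /proc, a Python 3 interpreter (not something the RaQ ships with, as far as I know), and that you run it as root or as the process owner, since otherwise os.kill raises PermissionError. The names `stop_process` and `is_alive` and the 5-second grace period are my own illustrative choices:

```python
import os
import signal
import time


def is_alive(pid):
    """True if `pid` exists and is not a zombie (reads Linux /proc)."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
    except FileNotFoundError:
        return False
    # The line looks like "pid (comm) state ...". The command name may itself
    # contain ")", so split on the LAST ")" to find the state field.
    state = stat.rsplit(")", 1)[1].split()[0]
    return state != "Z"


def stop_process(pid, grace=5.0, poll=0.1):
    """Send SIGTERM, wait up to `grace` seconds, then escalate to SIGKILL.

    Returns the list of signals that were actually sent.
    """
    sent = []
    for sig in (signal.SIGTERM, signal.SIGKILL):
        if not is_alive(pid):
            break
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            break  # exited between the check and the kill
        sent.append(sig)
        # Poll until the process goes away or the grace period runs out.
        deadline = time.monotonic() + grace
        while time.monotonic() < deadline:
            if not is_alive(pid):
                return sent
            time.sleep(poll)
    return sent
```

Usage would be something like `stop_process(12345)`, where 12345 stands for the PID that top shows for the runaway vim. A process that exits on SIGTERM gets the chance to clean up (vim writes its swap file, for instance), and SIGKILL, which cannot be caught, is only used as a last resort.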

1. My server told me a RAID disk (RAID 5) was bad a little while back (about two months ago, I guess). I rebooted, the server rebuilt the RAID, and all has been fine since. You might want to check the archives on this one. I know someone had it happen more often than me, but I honestly can't remember if there was a remedy.

Vidar
> -- 
> Josh Kuperman                       
> josh@xxxxxxxxxxxxxxxxxx
> 
> _____________________________________
> cobalt-users mailing list
> cobalt-users@xxxxxxxxxxxxxxx
> To subscribe/unsubscribe, or to SEARCH THE ARCHIVES, go to: 
> http://list.cobalt.com/mailman/listinfo/cobalt-users
>