
[cobalt-users] Followup: unrecoverable errors on /dev/hda4



Chip wrote:
>> I've had this happen twice now, where this non-production server, only
>> about 3 months old, experiences a power failure and refuses to reboot.

and then Jeff wrote:
>What do you mean by "experiences a power failure"? Do you mean the
>external power to the RaQ is turned off? Or do you mean the RaQ turns
>itself off? Or what?

Well, in this particular case, someone in the office shut off the UPS attached to the Raq without realizing that the Raq was still on, so it was a full external power loss to the Raq. Absent human ineptitude, the Raq itself has been very stable.

>If any Linux/Unix system loses power without a proper shutdown, it may
>have a lot of errors on fsck.

Yes, I've heard all sorts of horror stories from my hardcore Linux buddies :)

>But if the system is crashing or locking up, and then a reboot shows
>these errors, it may be drive replacement time.

Well, the problem now is that I don't seem to be able to get fsck to fix the errors. Each time I reboot, it says something close to "Errors in /dev/hda4. Exiting to shell." I then run fsck manually, but it reports unrecoverable read errors and offers only to ignore them, not to fix them. So, when I reboot again, I get the same message.

So do I have to restore the drive (not a lot is on it), or is there something equivalent to SpinRite or Norton that will fix (i.e., write over) the unreadable sectors of the disk?

And, given these errors, would it be wise to replace the drive anyway, before the Raq goes to a colo NOC that is many miles from where I am?

Thanks again for everyone's help!

Chip