Re: [cobalt-developers] RaQ 550 RAID problems?
- Subject: Re: [cobalt-developers] RaQ 550 RAID problems?
- From: "John Branca" <jbranca@xxxxxxxxxxxxx>
- Date: Tue Feb 4 09:22:02 2003
- List-id: Discussion Forum for developers on Sun Cobalt Networks products <cobalt-developers.list.cobalt.com>
John:
This one sounds vaguely like a problem we recently had with one of our
RaQ550s set up as RAID1.
>>I've been experiencing a fairly severe problem with one of my newer
>>RaQ 550's (I have a number of them), and am looking for feedback to
>>indicate whether this is an isolated issue, or a wider problem.
Check the kernel log for errors. We found DMA timeout errors in the
kernel log on our 550. These led to disk I/O taking 4 to 5 times longer
than normal. The errors in /var/log/kernel looked like this:
Jan 2 15:09:34 Rh kernel: hdc: timeout waiting for DMA
Jan 2 15:09:34 Rh kernel: ide_dmaproc: chipset supported ide_dma_timeout func only: 14
Jan 2 15:09:34 Rh kernel: hdc: status timeout: status=0xd0 { Busy }
Jan 2 15:09:34 Rh kernel: hdc: drive not ready for command
Jan 2 15:10:04 Rh kernel: ide1: reset timed-out, status=0x80
Jan 2 15:10:04 Rh kernel: hdc: status timeout: status=0x80 { Busy }
Jan 2 15:10:04 Rh kernel: hdc: drive not ready for command
Jan 2 15:10:22 Rh kernel: ide1: reset: success
Jan 2 15:10:43 Rh kernel: hda: timeout waiting for DMA
Jan 2 15:10:43 Rh kernel: ide_dmaproc: chipset supported ide_dma_timeout func only: 14
Jan 2 15:10:43 Rh kernel: hda: status error: status=0x58 { DriveReady SeekComplete DataRequest
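
If you want a quick tally of how often each drive is timing out, something
like the following rough sketch may help. It is not a Cobalt tool; the
function name count_dma_timeouts is made up for illustration, the pattern
assumes the log lines look like the excerpt above, and it assumes you run
it with a current Python 3 (for example on a copy of the log pulled off
the RaQ).

```python
import re
import sys
from collections import Counter

# Matches lines such as "Jan 2 15:09:34 Rh kernel: hdc: timeout waiting for DMA".
# The line format is assumed from the excerpt above.
DMA_TIMEOUT = re.compile(r"kernel: (hd[a-z]): timeout waiting for DMA")


def count_dma_timeouts(lines):
    """Return a Counter mapping drive name (e.g. 'hdc') to its DMA timeout count."""
    counts = Counter()
    for line in lines:
        match = DMA_TIMEOUT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Default path is the log file named in this message; pass another path to override.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/kernel"
    with open(path, errors="replace") as log:
        for drive, n in sorted(count_dma_timeouts(log).items()):
            print(f"{drive}: {n} DMA timeout(s)")
```

Timeouts showing up on both hda and hdc, as in our log, would point away
from a single bad drive.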
>>After running perfectly for a month or so, the machine began to
>>exhibit a symptom where it would take an extremely long time to respond
>>to updates made in the admin desktop, i.e. adding a user, or new
>>virtual site, and act generally lethargic from the command line.
Similar to what we saw.
>>After issuing a reboot command, the machine will not come back up, and
>>displays a kernel panic on the front panel LCD, and then cycles for
>>another attempt at a reboot.
Did not see this particular symptom.
>>Swapping the hard drives into another chassis, the problem will follow
>>the drives. Re-building the install from the original CD will cure the
>>problem, but at the expense of dumping all the old data. One may
>>conclude that this may indicate a problem with the drive(s), but I've
>>had the same machine exhibit this behavior 3 times over a 3 month period
>>with 3 different sets of drives installed, while the original drives
>>continue to work flawlessly in different RaQ 550 chassis.
Yes, we also saw that the OSRCD would "fix" the problem, but only
temporarily. It came back after a week or two of use. Our solution was
to ship the unit back for warranty repair, upon which Solectron
(Sun's repair vendor) replaced the motherboard.
>>Do I ship this machine back?? I'm at a loss to explain this bizarre
>>behavior.
>>John Kraft
>>CIO / Wild Promotions, Inc.
Regards.
--------------------------------------------
John C. Branca
Manufacturing
Imprivata, Inc
10 Maguire Road, Suite 210
Lexington, MA 02421-3120
(781) 674-2738 p
(781) 674-2760 f
jbranca@xxxxxxxxxxxxx
http://www.imprivata.com