
Re: [cobalt-developers] RaQ 550 RAID problems?



John:

This one sounds vaguely like a problem we recently had with one of our
RaQ550s set up as RAID1.

>> I've been experiencing a fairly severe problem with one of my newer
>> RaQ 550's (I have a number of them), and am looking for feedback to
>> indicate whether this is an isolated issue, or a wider problem.

   Check the kernel log for errors.  We found DMA timeout errors in this
log on our 550.  These made disk I/O take 4 to 5 times longer than
normal.  The errors appeared in the /var/log/kernel file as follows:

	Jan 2 15:09:34 Rh kernel: hdc: timeout waiting for DMA
	Jan 2 15:09:34 Rh kernel: ide_dmaproc: chipset supported ide_dma_timeout func only: 14
	Jan 2 15:09:34 Rh kernel: hdc: status timeout: status=0xd0 { Busy }
	Jan 2 15:09:34 Rh kernel: hdc: drive not ready for command
	Jan 2 15:10:04 Rh kernel: ide1: reset timed-out, status=0x80
	Jan 2 15:10:04 Rh kernel: hdc: status timeout: status=0x80 { Busy }
	Jan 2 15:10:04 Rh kernel: hdc: drive not ready for command
	Jan 2 15:10:22 Rh kernel: ide1: reset: success
	Jan 2 15:10:43 Rh kernel: hda: timeout waiting for DMA
	Jan 2 15:10:43 Rh kernel: ide_dmaproc: chipset supported ide_dma_timeout func only: 14
	Jan 2 15:10:43 Rh kernel: hda: status error: status=0x58 { DriveReady SeekComplete DataRequest
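
   If you want a quick tally of how often each drive is hitting this, a
small script along these lines might help.  It is only a sketch: it
assumes the log lines look like the ones above, and the function name
count_dma_timeouts is just something made up for this example.

```python
import re
import sys
from collections import Counter

# Matches lines such as:
#   Jan 2 15:09:34 Rh kernel: hdc: timeout waiting for DMA
# Assumption: IDE drives are named hda, hdb, ... as in the excerpt above.
DMA_TIMEOUT = re.compile(r"kernel: (hd[a-z]): timeout waiting for DMA")


def count_dma_timeouts(lines):
    """Return a Counter mapping drive name (e.g. 'hdc') to its DMA timeout count."""
    counts = Counter()
    for line in lines:
        match = DMA_TIMEOUT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the log path explicitly, e.g. /var/log/kernel
    with open(sys.argv[1], errors="replace") as log:
        for drive, n in sorted(count_dma_timeouts(log).items()):
            print(f"{drive}: {n} DMA timeout(s)")
```

   Run it with the log path as its only argument.  Timeouts on both hda
and hdc, as in our case, would suggest the problem is not confined to a
single drive.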

>> After running perfectly for a month or so, the machine began to
>> exhibit a symptom where it would take an extremely long time to respond
>> to updates made in the admin desktop, i.e. adding a user, or new
>> virtual site, and act generally lethargic from the command line.

   Similar to what we saw.

>> After issuing a reboot command, the machine will not come back up, and
>> displays a kernel panic on the front panel LCD, and then cycles for
>> another attempt at a reboot.

   Did not see this particular symptom.

>> Swapping the hard drives into another chassis, the problem will follow
>> the drives.  Re-building the install from the original CD will cure the
>> problem, but at the expense of dumping all the old data.  One may
>> conclude that this may indicate a problem with the drive(s), but I've
>> had the same machine exhibit this behavior 3 times over a 3 month period
>> with 3 different sets of drives installed, while the original drives
>> continue to work flawlessly in different RaQ 550 chassis.

   Yes, we also saw that the OSRCD would "fix" the problem, but only
temporarily.  It came back after a week or two of use.  Our solution was
to ship the unit back for warranty repair, upon which Solectron
(Sun's repair vendor) replaced the motherboard.

>> Do I ship this machine back??  I'm at a loss to explain this bizarre
>> behavior.

 

>> John Kraft
>> CIO / Wild Promotions, Inc.


Regards.
--------------------------------------------
John C. Branca
Manufacturing
Imprivata, Inc
10 Maguire Road, Suite 210
Lexington, MA 02421-3120
(781) 674-2738 p
(781) 674-2760 f
jbranca@xxxxxxxxxxxxx
http://www.imprivata.com