
[cobalt-developers] RaQ XTR Raid/Memory Failure = Lemon?



Over the past month we've acquired 3 XTRs directly from Cobalt.
3/4 have suffered catastrophic RAID-5 and RAID-1 failures...

The first machine would not even run, and was RMA'd (this was
later attributed to one of the first production runs, which as
it turns out, was susceptible to electrical interference).

The second had 3 separate RAID-5 failures in the course of a
single week (all on different drives), and finally suffered a
twin drive failure while rebuilding the RAID-5 array.

We also discovered that the XTR will not recognize any drives
beyond the factory configuration, unless you perform a clean
OS restore with all drives present.

After performing a clean OS restore using a RAID-1 array, both
drives failed upon startup and subsequent rebuilds. This unit
was also RMA'd back to Cobalt.

The third machine (an RMA for the first) has frozen on 2
separate occasions without any prior warning. After the 1st
freeze, the RAID-5 array rebuilt fine. After the 2nd, the
RAID-5 rebuild failed (citing a drive failure).

Here's a segment from /var/log/kernel:

  May  9 00:25:36 ns1 kernel: md: md6: sync done.
  May  9 00:25:36 ns1 kernel: md: syncing RAID array md4
  May  9 00:25:36 ns1 kernel: md: minimum _guaranteed_ reconstruction speed: 100 KB/sec.
  May  9 00:25:36 ns1 kernel: md: using maximum available idle IO bandwith for reconstruction.
  May  9 00:25:36 ns1 kernel: md: using 384k window.
  May  9 00:25:36 ns1 kernel: md: serializing resync, md3 has overlapping physical units with md4!
  May  9 00:25:36 ns1 kernel: md: serializing resync, md1 has overlapping physical units with md4!
  May  9 00:32:28 ns1 kernel: hdi: dma_intr: bad DMA status
  May  9 00:32:28 ns1 kernel: hdi: dma_intr: status=0x50 { DriveReady SeekComplete }
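
For anyone who wants to check their own logs for the same pattern, here is a rough sketch (not anything supplied by Cobalt) that counts dma_intr lines per drive. The regular expression and the function name count_dma_errors are my own assumptions, based only on the line format shown in the segment above; other kernels may format these messages differently.

```python
import re
from collections import Counter

# Assumed line format, taken from the log segment above, e.g.:
#   May  9 00:32:28 ns1 kernel: hdi: dma_intr: bad DMA status
DMA_ERROR = re.compile(r"kernel: (hd[a-z]+): dma_intr: ")


def count_dma_errors(lines):
    """Return a Counter mapping drive name (e.g. 'hdi') to its dma_intr line count."""
    counts = Counter()
    for line in lines:
        match = DMA_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from the log segment above.
sample = [
    "May  9 00:25:36 ns1 kernel: md: syncing RAID array md4",
    "May  9 00:32:28 ns1 kernel: hdi: dma_intr: bad DMA status",
    "May  9 00:32:28 ns1 kernel: hdi: dma_intr: status=0x50 { DriveReady SeekComplete }",
]

for drive, hits in sorted(count_dma_errors(sample).items()):
    print(drive, hits)
```

To run it against a real log, pass an open file instead of the sample list, for example open("/var/log/kernel"). Note that each error here produces two dma_intr lines, so the count is of log lines, not of distinct failures.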

We're now on the fourth machine, and basically praying...

It has also now been confirmed that there is a problem with the
memory modules on the XTR: only 2 of the 4 can be used at any
given time, effectively limiting RAM to 1 gigabyte.

We've been repeatedly told by Cobalt that these are "isolated"
incidents, and that we're the only customer who's experienced
these problems.

We'd be very interested to hear from any other XTR users who
have shared a similar fate.
---

Jay Tingley, BlackSun
cobalt@xxxxxxxxxxx