08-19-2019 10:19 AM - edited 08-19-2019 10:20 AM
Our X3650 M4 recently had an issue where a hard disk failed and was reported "missing" by the system. This prevented the system from booting. Lenovo was prepared to replace the backplane, but replacing the erratically flashing hard disk (the one reported as missing) seemed to do the trick. Now, as we restore, we are seeing a significant number of Event 55 file system corruption errors. Can we do anything while the server is "live" to clean up this file system? Does this sound like anything more ominous than file corruption from a restoration?
08-21-2019 06:39 AM
We are running 16 300GB drives in RAID 10 with two partitions. This machine is notorious for the cache-clearing issue where we have to take the server down and unplug the power to clear cache.
Its days are numbered!
I am not involved with the restore so I'm not certain whether it involves an incremental restore or a complete rewrite. There were eventually a dozen folders reporting errors after three days of restoring and testing. The problem was getting worse. We ran Check Disk with repair and that solved the issue.
We're working with warranty support, as we would still like to provide them with an analysis file to review. I'd like to know why the restore was so rocky. We've not had such a difficult time before.
08-21-2019 08:34 AM
If a drive in a RAID volume fails (or goes missing), a new drive (or the same one) needs to be rebuilt into the array. It cannot be forced into the array, because its data would be out of sync with the other drives in the array (especially if it is a new drive), and that could corrupt the data.
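To illustrate why a rebuild is needed rather than a forced online, here is a minimal Python sketch of a single mirrored pair. It is a toy model only: each "drive" is a dictionary of block numbers to data, and the names `write`, `rebuild`, and `in_sync` are hypothetical helpers, not part of any RAID controller's software.

```python
# Toy model of one mirrored pair: each "drive" is a dict of block -> data.
# Purely illustrative; a real controller tracks sync state in its own metadata.

def write(block, data, online_drives):
    """Write a block to every drive that is currently online."""
    for drive in online_drives:
        drive[block] = data

def rebuild(source, target):
    """Copy every block from the healthy drive so the target matches it."""
    target.clear()
    target.update(source)

def in_sync(a, b):
    """Two mirror members are in sync when they hold identical blocks."""
    return a == b

drive_a, drive_b = {}, {}
write(0, "v1", [drive_a, drive_b])  # both members online
write(0, "v2", [drive_a])           # drive_b is missing, so the array is degraded

# Forcing drive_b online at this point would expose its stale copy of block 0.
forced_stale = not in_sync(drive_a, drive_b)

rebuild(drive_a, drive_b)           # a rebuild brings drive_b back in line
after_rebuild = in_sync(drive_a, drive_b)
```

In this sketch the missing drive still holds the old data for block 0 until the rebuild copies the current data over, which is the out-of-sync condition described above.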
A bad drive in a RAID 10 volume should not prevent access to the volume. If the system halted at boot, it may have been to warn about the missing drive, particularly if the drive was good at shutdown and bad when powering back up. There should have been an option to continue the boot process.
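As a rough illustration of that point, the Python sketch below checks whether a RAID 10 set stays accessible for a given set of failed drives. It assumes drives are mirrored in adjacent pairs (0 with 1, 2 with 3, and so on), which is a hypothetical layout chosen for the example; the actual pairing depends on how the controller built the array.

```python
def raid10_accessible(failed, num_drives=16):
    """Return True if every mirror pair still has at least one working drive.

    Assumes drives 2k and 2k+1 form a mirror pair (hypothetical layout).
    """
    failed = set(failed)
    # The volume is lost only if both members of some pair have failed.
    return all(not ({a, a + 1} <= failed) for a in range(0, num_drives, 2))

single_failure = raid10_accessible([3])        # one drive lost, its mirror survives
pair_failure = raid10_accessible([2, 3])       # both members of one pair lost
spread_failures = raid10_accessible([0, 2, 4]) # one drive lost in each of three pairs
```

Under this model a single failed drive leaves the volume degraded but readable. Data is lost only when both drives of the same mirror pair fail.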
My comment about the restore was with respect to the RAID volume and how the problem drive was replaced. That operation should be transparent at the data level, unless the drive was forced online. Data restore and check disk are higher-level operations and should not be impacted by a degraded or rebuilding RAID volume.
I am not aware of a ‘cache-clearing issue’, but with only the server model and no detailed information about the hardware and software being used, it is difficult to evaluate. If you are working with warranty, support will be able to gather all logs from the system to better understand the situation.
08-21-2019 09:08 AM
The issue began when the hard drive failed in such a way that it appeared missing to the controller. That caused the error message in the preboot, which in turn prevented any access to the preboot utilities and BIOS. The hard drive still had a green light, although it was blinking erratically. When the drive was replaced, the mirror did not rebuild. The drive had to be declared a hot spare in the RAID management software before the system put the drive to work. The restore did not go well. We chose to run check disk, which fixed the bad sectors. Today's DSA gave the server a clean bill of health. We will proceed by updating any and all firmware as soon as we can. As far as I'm concerned, the issue is resolved.