  • Posts: 3
  • Registered: ‎03-30-2018
  • Location: US
  • Views: 56
  • Message 1 of 2

Array offline: following arrays have missing required members and cannot be configured

2018-03-30, 13:11 PM

Looking for insight/advice on recovering an array on an IBM-branded System X3650. Machine type is 7979-AC1 and the RAID controller is a ServeRAID 8k-l.

 

I believe the data on the drives is intact but it seems the array configuration is not being recognized properly or something along those lines.

 

I'll give the backstory also but here is the current state.

 

On boot, I get a message that "following arrays have missing required members and cannot be configured."

 

Array#254-RAID-1

Array#255-RAID-1

 

The array is a RAID 10 composed of 3 RAID 1s. In the RAID controller BIOS / configuration utility, it shows the array as offline.

 

Two of the RAID 1's show as being intact with both members online. Beneath Array#254 and Array#255, it says "Missing Member."

 

Additionally, the two drives that comprise the missing RAID 1 are visible. If I go into the "create array" section of the configuration utility, they appear as being available to add to a new array whereas the other 4 drives comprising the other 2 RAID 1s are greyed out.

 

There has never been any visible indication of any drive failures--no amber fault lights, etc.

 

I have booted off the ServeRAID support CD and performed a verify on one of the two drives that comprise the "missing" RAID 1 and no errors were reported. I will verify the other but I believe it is functional and intact based on the fact that, as mentioned, there have been no failure indicators.

 

In the ServeRAID user's reference guide, there is a section beginning at the bottom of page 120 and continuing on page 121 that speaks of recreating an array from scratch without building/initializing it in order to recover data. This seems very promising given our situation. The part that I don't see explained is a situation like ours where the array configuration is still present and a person needs to add two drives back into the array vs. build it from scratch. If I were to follow the procedure described here and create a new logical drive containing the two "missing" RAID 1 members using the "recover data" option, I believe they would be seen as an independent RAID 1 logical drive / array and would not be seen as a RAID 1 in the RAID 10. Wondering if there is a way to remove the configuration (while making note of the array size and stripe size which are required to execute this procedure) and then perform the documented procedure to rebuild the array from scratch without re-initializing the drives in order to recover the data.

 

It seems I should be able to do this somehow as there is no indication of a physical drive failure among the six drives that comprise the RAID 10 array. I really don't understand what has occurred here and how these "missing" members of the RAID 10 array can be visible / available to add to a new array and yet seen as missing from the current configuration.

 

There is also a documented option to force the array online but my instinct is that this would be a bad thing to do given that the RAID controller sees one of the RAID 1's that comprise the RAID 10 as completely missing.

 

Here's the backstory...

 

The system went down unexpectedly several days ago. We were alerted by our network monitoring system that it was down. I do not know what error was seen on the screen immediately after the system crashed as I was not present.

 

Upon rebooting the system, the RAID controller reported "no logical drives found." We went into the RAID controller BIOS / configuration utility and in the manage arrays section, there were no arrays seen. We went into the create array section, and it showed 2 drives as available to add to a new array and 4 as greyed out. We exited and made no changes.

 

At this point, our hardware vendor replaced the RAID controller. No change was seen after the replacement. (If logs are stored on the controller, I do have the original controller so I could potentially pull logs from it for additional insight into what errors, if any, were recorded prior to the system crash.)

 

Next, I upgraded the RAID controller firmware via the ServeRAID support CD to version 5.2.0.17003. Previously, we were running a VERY old version--possibly the 2nd revision ever released for the 8k-l controller.

 

Upgrading the controller improved the situation in that now the array was at least detected. Upon booting the system after upgrading the controller firmware, I got the message about "missing required members." I accepted the configuration and that led me to where I am now.

 

I feel there should be hope for recovery as the drives that comprise the "missing" RAID 1 seem to be intact. The procedure for rebuilding the array from scratch without re-initializing sounds very promising but I don't know how to do that with the array still seen, as the procedure only covers a scenario where you are recreating the array from scratch. It doesn't cover our scenario where the array configuration is there but drives that are present and part of the array are seen as missing. I don't see a documented way to add those to the existing RAID 10 configuration using the "recover data" option described.

 

What I don't want to do is anything that would destroy the data and preclude us from being able to use a recovery service should we decide to do so.

 

Another note that could be relevant though I rather doubt it... There were no fault indicators present on the system when the crash occurred. AFTER I upgraded the firmware on the RAID controller, I noticed the BRD fault light was lit on the light path diagnostics panel. I know this was not lit when the system first crashed. I don't really think this is relevant because the "missing" members are seen by the RAID controller in both the RAID controller BIOS and ServeRAID Manager. One of the missing members has, as stated, been verified already.

 

I can post additional screen shots from ServeRAID Manager if that would be helpful.

 

Any insight would be appreciated. I would also be open to engaging Lenovo support but they don't have any per-incident phone support--only per-incident on-site service and I just don't see any evidence of an actual hardware issue.

 

EDIT: Just adding that I am DEDUCING that the RAID 10 was comprised of 3 RAID 1's based on the size of the array and the fact that there are 6 drives in the system and members missing. I can't see how this could be wrong but am just noting that I did not have knowledge of the exact array configuration prior to the system crash having occurred so am going off what is seen in the controller configuration utility / ServeRAID Manager currently. As I mentioned, I do still have the original RAID controller so could potentially pull logs from it IF they are stored on the controller to confirm any configuration details or errors recorded around the time that the logical drive went offline and the system crashed.
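The deduction above can be sanity-checked with simple arithmetic. The sketch below is only an illustration: the drive and array sizes are hypothetical (the post does not state them), and it assumes a striped set of two-drive mirrors whose usable capacity is roughly one drive's worth per mirrored pair. A tolerance is included because controllers usually reserve some space for metadata.

```python
def implied_mirror_pairs(array_gb: float, drive_gb: float, tolerance: float = 0.05) -> int:
    """Estimate how many two-drive mirrors a striped-mirror array of this size implies.

    Assumes each mirrored pair contributes about one drive of usable space.
    """
    ratio = array_gb / drive_gb
    pairs = round(ratio)
    # Reject sizes that are not close to a whole number of pairs.
    if pairs < 2 or abs(ratio - pairs) > tolerance * pairs:
        raise ValueError("size does not look like a whole number of mirrored pairs")
    return pairs


# Hypothetical numbers: 146 GB drives and an array reported as 438 GB.
pairs = implied_mirror_pairs(438, 146)
print(pairs, "mirrored pairs, so", pairs * 2, "drives")
```

If the implied drive count matches the number of drives physically in the system, that supports the assumed layout, though the controller logs remain the more reliable source.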

 

 

 

  • Posts: 3
  • Registered: ‎03-30-2018
  • Location: US
  • Views: 56
  • Message 2 of 2

Re: Array offline: following arrays have missing required members and cannot be configured

2018-04-09, 14:14 PM

I was able to get the array back online and all of the data seems to be intact. I'll post what I did here but I would NOT encourage folks to blindly take the same action without proper research, understanding the implications, etc.

 

I tried putting the 6 drives comprising the array in another x3650 with a ServeRAID 8k (vs. 8k-l) controller just to rule out a hardware issue on the server since the BRD light was on. This second 3650 system saw the two sub-logical drives (the two RAID 1s in the RAID 10 that were seen as having their members present and being intact) as foreign arrays when the controller spun up during POST but, when the identified configuration was accepted and I went into the controller BIOS / configuration utility, it did not see any arrays and all 6 drives were seen as ready/available to add to a new array. When I put the drives back into the original system, the array configuration was also not seen at this point. I had previously captured the controller logs off of the original system using the ServeRAID boot CD so I knew the exact details of the array configuration.

 

I decided to test the recovery option described in the ServeRAID User's Reference on the x3650 with the 8k controller as it was no longer used and there was no data of any significance on the drives--just an ESXi install. I deleted the array, accepting the warning that all data would be deleted, and then I recreated the array using the exact configuration parameters (of particular importance per the ServeRAID guide are the array size and stripe size) EXCEPT that I chose the "recover data" option for the initialization method. I was still able to boot from the ESXi install previously installed on the array, proving that the method described in the ServeRAID guide worked and that the data on the drives was not destroyed.

 

At this point, I was confident from my testing that I would not do any harm by simply recreating the array on the original system using the "recover data" initialization method, since I knew the exact array configuration that existed previously. Since the controller in the original system no longer saw any array configuration, there was no array to delete. (Had the array configuration still been visible with the one RAID 1 seen as missing, I think I would have had to delete the array first in order to recreate/restore the array configuration.)

 

I recreated the RAID 10 array using the same array size and stripe size as was used when it was originally created but I used the "recover data" initialization method. I had all of the same drives in the same slots they were in when the system failed. I rebooted the system after recreating the array and it loaded Windows and came right up. I have not found any evidence of data corruption yet so everything looks good.
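To illustrate why matching the stripe size matters when recreating an array without initializing it, here is a minimal sketch of a textbook RAID 10 address mapping. It is an assumption for illustration only and not necessarily how the ServeRAID 8k-l actually lays data out on disk; the function name and the block counts are hypothetical.

```python
def raid10_locate(logical_block: int, stripe_blocks: int, mirror_pairs: int) -> tuple[int, int]:
    """Map a logical block to (mirror pair index, block offset on each member of that pair).

    Textbook layout: stripes are dealt round-robin across the mirrored pairs,
    and both drives in a pair hold identical data.
    """
    stripe_index, within = divmod(logical_block, stripe_blocks)
    row, pair = divmod(stripe_index, mirror_pairs)
    return pair, row * stripe_blocks + within


# Same logical block, three mirrored pairs, two different stripe sizes.
print(raid10_locate(700, 128, 3))  # stripe of 128 blocks
print(raid10_locate(700, 512, 3))  # stripe of 512 blocks
```

With a stripe of 128 blocks, logical block 700 lands on pair 2; with a stripe of 512 blocks the same block lands on pair 1. The data on the drives is unchanged, but a controller configured with the wrong stripe size (or with the pairs in a different order) would look for each block in the wrong place, so the volume would appear corrupted even though nothing was overwritten.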

 

I still have no explanation for what transpired. I did see in the controller log that, right around the time the system went down when the issue occurred, an "Unknown error" was recorded for all 6 drives (by serial number) at the exact same time.

 

So far as I can tell, the array configuration was the only thing that got corrupted and, as I mentioned, I was able to perform a verify on both of the drives that were seen as missing members but still visible to the controller after this occurred.

 

Like I said, I would encourage folks to do their own research, testing, etc. if they encounter a similar situation before attempting any of the processes/procedures that I used but this worked for me in this particular case where there were no drive failures but the array configuration was seemingly corrupted.
