SITUATION: Server had what appears to be a raid controller failure complete with it blowing away the data on the drives. The only backup available is corrupt.
I began by mounting one of the drives on my local machine. Windows Disk Manager would not recognize the device as having a partition table. I then ran TestDisk. It was unable to recover a partition table. TestDisk also comes with a nice utility called PhotoRec. Photorec looks at the raw data on the device attempting to extract files. PhotoRec can bea bit of a pain when recovering a lot of files because it cannot recover filenames. Fortunately, this particular server was a virtual host, and the only files I cared about were the 10Gb+ VHD virtual disk files.
ADVENTURES WITH PhotoRec
I initially just ran PhotoRec just to see if and or what it could even find. After running for several hours, I was happy to see it found thousands of files. Upon closer inspection of the files I found that they were actually files that were inside the actual VHD’s. This makes sense because of how PhotoRec searches for files. I needed to create a custom PhotoRec file signature extension to handle the VHD files. www.cgsecurity.org/wiki/Add_your_own_ext…
Using a hex editor I opened several VHD files from my archive and noted thay all started with “conectix”. I then created photorec.sig in the PhotoRec directory with the following inside:
I then re-ran PhotoRec, this time using the options to only select my custom signature extension. It then began recovering VHD files. It found around 6 header signatures, and after some time had dumped them all out. I was able to mount one using diskmgmt.msc on my Windows7 laptop. The other VHDs were all unable to mount due to corruption or other factors.
Back to TestDisk
I decided to try TestDisk again, this time though when it asked what kind of partition, I selected none, then had it scan. It found the NTFS partition! I was able to navigate into the drive’s structure and instruct TestDisk to copy the VHD I needed to my local machine…
…After some hours, the dump froze. I was monitoring the file copy progress and it was making good progress (over 100G recovered) but then abruptly restarted. Or appeared to do so. Before I started this process I notices TestDisk was indicating that there were multiple VHD files with the same name. I am guessing it completed, then moved on the the next, overwriting the existing recovered file since no sane file system allows two files of the same name in the same directory. Eventually it froze as mentioned.
This should of been one if the first things I did, but I did it now since I wanted to simply attempt to repair/rebuild the NTFS partition table which should then give me direct access to the files rather than trying to extract them thru TestDisk. Obviously this should be the first thing done, but as I was confident in the physical drive’s health, I bypassed this at first.
I did the backup thru TestDisk but I believe it just used dd emulated thru cygwin anyway so it’s the same thing.
Rebuilding the NTFS Boot Sector On An NTFS Partition
I dumped the MBR and sadly it was all zeros thanks to the RAID controller nuking it.
Both NTFS boot sectors (the primary and the backup) are corrupted so we need to rebuild the NTFS boot sector.
TestDisk searches the MFT (Master File Table: $MFT) and its backup ($MFTMirr). It reads the MFT record size, it computes the cluster size, and it reads the size of the Index Allocation Entry in the root directory index. Using all these values, TestDisk can provide a new boot sector.
Finally it lets the user list the files before writing.
Update: That failed. The sector count is different now too. I suspect the data got hosed. I can easily recover with my dd command:
Loaded up a copy of my dd into TestDisk, Created a new log file and selected the drive. For partition type I selected “None”. Then select Analyse. It located the NTFS partition and the I was able to hit (f or p — not sure typing this from memory) to recover files. I attempted this exact method previously, but tried to restore the entire directory. This time I simply had it restore the individual VHD I was interested in. Set the destination path (make sure there is space!!) and a few hours later I had a VHD… working flawlessly.
11-17-2012 Note: This was mentioned by the CG Security authors here: www.cgsecurity.org/mw/index.php?title=In…