a2i Communications (rahul.net)

Demo 2

Repairing a Damaged Boot Block

This is a log of how you might repair a damaged boot block and partition table on your TQMbox server, without leaving the comfort of your home or office.

TQMbox is a service mark of A2I COMMUNICATIONS.

We have tested this procedure on the RedHat 9 and Fedora Core 2 platforms and it worked to our satisfaction.

However, recovering a damaged boot block or partition table is a delicate operation. The success or failure of this operation depends on many factors, including your Linux skills, the specific operating system revision and distribution in use, the specific circumstances under which the sector 0 damage occurred, and most of all, luck. Nobody can guarantee success.

The method below uses a very clever program called testdisk. This program searches your server’s disk drive for data that looks like a partition, and tries to identify what type of partition it might be. In many cases, but not all, it can correctly identify all partitions and rebuild the partition table accordingly.

Below is a log of how we actually did this on a real machine running Fedora Core 2 that was using the grub boot loader. If your machine is using a different boot loader, for example, lilo, some of the steps will be substantially different from those shown below.

Summary

We begin with a fully functional machine. Then we deliberately wipe out its sector 0. This erases both the boot block and the partition table, and the machine becomes unbootable. Then we invoke testdisk and manage to rebuild the partition table on sector 0. Then we re-install grub, the boot loader. The machine then boots normally.

A Running Machine

We begin with a machine that is operating normally. Below is what the filesystems initially look like. Lines beginning with a semicolon are comments that we have added. The ‘#’ character at the beginning of a line is the prompt from a root shell.

; List the partition table.

# fdisk -l /dev/hda
Disk /dev/hda: 40.0 GB, 40000000000 bytes
255 heads, 63 sectors/track, 4863 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1          13      104391   83  Linux
/dev/hda2              14        1415    11261565   83  Linux
/dev/hda3            1416        4609    25655805   83  Linux
/dev/hda4            4610        4863     2040255    f  W95 Ext'd (LBA)
/dev/hda5            4610        4863     2040223+  82  Linux swap
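
The Blocks column can be cross-checked against the cylinder ranges. The sketch below is our own illustration, not fdisk’s code. It assumes the 255-head, 63-sector geometry printed above, 1 KiB blocks, and that a partition at the start of the disk (or of an extended partition) begins one track, or 63 sectors, in. The helper name blocks_for is hypothetical.

```shell
# Sectors per cylinder for the geometry fdisk reports: 255 heads x 63 sectors/track.
sectors_per_cyl=$((255 * 63))

blocks_for() {
    # $1 = first cylinder, $2 = last cylinder, $3 = leading sectors to skip (default 0)
    start=$1
    end=$2
    offset=${3:-0}
    # Two 512-byte sectors make one 1 KiB block; integer division drops a half block.
    echo $(( ((end - start + 1) * sectors_per_cyl - offset) / 2 ))
}

blocks_for 1 13 63        # /dev/hda1 -> 104391
blocks_for 14 1415        # /dev/hda2 -> 11261565
blocks_for 4610 4863 63   # /dev/hda5 -> 2040223 (fdisk prints 2040223+ for the odd sector)
```

Numbers that agree like this after a recovery are a hint, not proof, that the rebuilt table matches the original.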

Destroy Sector 0

We will now deliberately destroy sector 0 of the disk, thus completely destroying both the boot code and the partition table. This can be efficiently done by copying 512 null bytes to sector 0 of the disk, as shown below.

# dd if=/dev/zero of=/dev/hda bs=512 count=1
1+0 records in
1+0 records out

Let’s check to see if the boot block and partition table are really gone:

# fdisk -l /dev/hda

Disk /dev/hda: 40.0 GB, 40000000000 bytes
255 heads, 63 sectors/track, 4863 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/hda doesn't contain a valid partition table

It does look like the partition table is gone. Will the machine now boot? When we try to reboot, here is what we finally see:

  Booting 'Boot hd0'

rootnoverify (hd0)
chainloader +1

Error 13: Invalid or unsupported executable format

Press any key to continue...

As expected, the machine cannot boot, because there is no executable code in the boot block (it is all zeroes). We have an unbootable machine.
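
A valid PC boot sector normally ends with the two signature bytes 0x55 0xAA at offsets 510 and 511, and a zeroed sector lacks them. The sketch below is our own illustration of how one might test for that signature. It rehearses on a scratch file instead of a real disk, and the function name has_boot_signature is hypothetical.

```shell
# has_boot_signature FILE: succeed if bytes 510-511 of FILE are 0x55 0xAA.
has_boot_signature() {
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "55aa" ]
}

# Rehearse on a scratch file rather than a real disk.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=512 count=1 2>/dev/null    # an all-zero "sector 0"
if has_boot_signature "$img"; then echo "signature present"; else echo "signature missing"; fi

# Write 0x55 0xAA (octal 125 252) at offset 510 and check again.
printf '\125\252' | dd of="$img" bs=1 seek=510 conv=notrunc 2>/dev/null
if has_boot_signature "$img"; then echo "signature present"; else echo "signature missing"; fi
rm -f "$img"
```

On the scratch file this prints “signature missing” and then “signature present”. On a real server the argument would be the disk device, for example /dev/hda, read as root.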

If this were a typical dedicated server, this would be a good time to panic.

But our TQMbox server gives us another choice.

PLD Rescue CD

At the “Press any key to continue...” prompt above, we hit the ENTER key. Then, from the menu, we select “Utilities Menu”. From the next menu we select “PLD Rescue CD”. The PLD Rescue CD then boots, giving us various boot-time messages. After a minute or two, we finally see a login prompt. (See also Detailed PLD Rescue CD Boot Log.)

We log in as root. No password is needed for logins on the serial console. At this point, our dedicated server has booted into the PLD Rescue CD environment, which is running entirely in the server’s main memory. We can now use the tools on this Rescue CD to attempt to rebuild our missing sector 0 on the hard disk.

Recover Partitions

The tool we will now use is the program testdisk, which is included on the PLD Rescue CD. Since testdisk requires at least a 25-line screen, which we need to set explicitly, let’s do that first, and then invoke the program and ask it to rebuild the partition table. The steps that we follow are shown below in abbreviated form.

# stty rows 30

; We will now invoke testdisk and try to rebuild the partition table
# testdisk /dev/hda
> [Analyse ]
>   (hit ENTER a few times, then:)
> [ Write  ]
>  (enter confirmation)
> [ Quit ]
#

Above we have shown only the essential steps and abbreviated screen output. (See the official testdisk documentation for detailed screens.)

Testdisk has supposedly rebuilt the partition table. With fdisk, let’s check to see if the partition table is back:

# fdisk -l /dev/hda

   Device Boot      Start         End      Blocks   Id  System
...
/dev/hda1   *           1          13      104391   83  Linux

The output (excerpts shown above) shows that the partition table is indeed back; and, most importantly, the partition beginning on cylinder 1, which is the /boot partition, is listed too. This is the partition that holds our kernel and our boot loader configuration.

Install Boot Loader

The partition table is back, but we still need to install the boot code into sector 0. Before doing so, we must make the kernel re-read the partition table, so that the in-memory copy matches the disk copy. A simple way is to have fdisk read the partition table from sector 0 and write it back unchanged. After writing the partition table to disk, fdisk automatically makes a system call that causes the kernel to re-read the partition table. The following line suffices:

# echo w | fdisk /dev/hda
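
To see which partitions the kernel currently holds in memory, one can read /proc/partitions. The sketch below is our own illustration and was not part of the logged session. It assumes the rescue environment provides /proc and awk, and the function name kernel_partitions is hypothetical.

```shell
# kernel_partitions DISK [FILE]: print the partitions of DISK that the kernel
# currently knows about, read from FILE (default /proc/partitions).
kernel_partitions() {
    disk=$1
    file=${2:-/proc/partitions}
    # Rows look like "major minor #blocks name"; keep names such as hda1, hda2.
    awk -v d="$disk" '$4 ~ ("^" d "[0-9]+$") { print $4 }' "$file"
}

# On the rescued machine one might run: kernel_partitions hda
```

If the re-read worked, the list should name the same partitions that fdisk -l reports.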

We will now mount the first partition as /tmp/boot. After doing so, we will do ls in it, to check to make sure it really is the /boot partition that is being mounted. If it is, it should contain kernels and initrds.

# mkdir -p /tmp/boot
# mount /dev/hda1 /tmp/boot
# ls /tmp/boot
System.map-2.6.8-1.521  grub                    os2_d.b
boot.b                  initrd-2.6.8-1.521.img  vmlinuz-2.6.8-1.521
chain.b                 lost+found
config-2.6.8-1.521      memtest86+-1.11

It does look like /dev/hda1 contains our original /boot partition (mounted here as /tmp/boot), based on the ls output above. This should give us some confidence that testdisk managed to find the right partition.
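
The same sanity check can be scripted. The sketch below is our own illustration. It assumes that a /boot partition contains a grub directory and at least one vmlinuz-* kernel image, as in the listing above, and the function name looks_like_boot is hypothetical.

```shell
# looks_like_boot DIR: succeed if DIR has a grub directory and at least one kernel image.
looks_like_boot() {
    dir=$1
    [ -d "$dir/grub" ] || return 1
    for kernel in "$dir"/vmlinuz-*; do
        # If the glob matched nothing, the pattern is left as-is and -e fails.
        [ -e "$kernel" ] && return 0
    done
    return 1
}

# On the rescue system one might run:
#   looks_like_boot /tmp/boot && echo "looks like /boot"
```

A passing check only shows that the expected files are present. It does not prove that they are intact.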

Installing grub, the boot loader, is now done with a single command:

# grub-install --recheck --root-directory=/tmp /dev/hda

That’s it!

Back to Normal!

We can now reboot the machine:

# reboot

When we did this, our machine rebooted back to normal operation.

(Your mileage may vary.)

We should emphasize at this point that recovering a damaged sector 0 is a risky and error-prone process, with no guarantee of success. Our dedicated servers give you a shot at it, but do not assure you of success. Try not to let your server’s boot block or partition table be damaged in the first place.
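
One inexpensive precaution is to keep a copy of sector 0 somewhere off the disk while the machine is still healthy. The sketch below is our own illustration and rehearses the idea on a scratch file. The function names save_sector0 and restore_sector0 are hypothetical. Note that sector 0 holds only the primary partition entries. Logical partitions such as /dev/hda5 are described inside the extended partition, so this backup alone does not cover them.

```shell
# save_sector0 DISK BACKUP: copy the first 512 bytes of DISK into BACKUP.
save_sector0() {
    dd if="$1" of="$2" bs=512 count=1 2>/dev/null
}

# restore_sector0 BACKUP DISK: write BACKUP over the first 512 bytes of DISK.
# conv=notrunc matters when DISK is a regular file (as in this rehearsal);
# without it dd would truncate the file to 512 bytes.
restore_sector0() {
    dd if="$1" of="$2" bs=512 count=1 conv=notrunc 2>/dev/null
}

# Rehearsal on a scratch image, never on a real disk.
disk=$(mktemp)
backup=$(mktemp)
original=$(mktemp)
dd if=/dev/urandom of="$disk" bs=512 count=4 2>/dev/null
cp "$disk" "$original"
save_sector0 "$disk" "$backup"
dd if=/dev/zero of="$disk" bs=512 count=1 conv=notrunc 2>/dev/null   # simulate the damage
restore_sector0 "$backup" "$disk"
cmp -s "$disk" "$original" && echo "sector 0 restored"
rm -f "$disk" "$backup" "$original"
```

On a real server the first argument to save_sector0 would be the disk device, for example /dev/hda, and the backup file should live on another machine.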

Also note that in the example above, grub has rebuilt /boot/grub/device.map using device names in effect while the PLD Rescue CD is active. These may not match the device names used by Fedora Core 2. We should manually check and correct these later. One way of doing so would be to simply re-install grub yet again after the machine is up for normal operation. The command in that case would be:

# grub-install --recheck --root-directory=/boot /dev/hda
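
GRUB keeps its mapping from drive names such as (hd0) to operating system devices in a device map file, conventionally /boot/grub/device.map, with one “(hd0) /dev/hda” style pair per line. The sketch below is our own illustration of how one might look up an entry after a normal boot. The function name grub_device is hypothetical, and the default path is an assumption about this installation.

```shell
# grub_device DRIVE [MAPFILE]: print the OS device that MAPFILE assigns to a
# GRUB drive such as "(hd0)". Lines are expected to look like "(hd0) /dev/hda".
grub_device() {
    drive=$1
    map=${2:-/boot/grub/device.map}
    awk -v d="$drive" '$1 == d { print $2 }' "$map"
}

# After a normal boot one might run: grub_device "(hd0)"
```

If the device printed is not the disk the machine actually boots from, the map needs correcting.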

At this point, our recovery effort is complete, and the machine should now be back to normal operation.

Appendix: Detailed Log

Detailed PLD Rescue CD Boot Log

When the PLD Rescue CD boots, messages similar to the following are printed to the serial console. (Note: We are really booting a local disk image of the PLD Rescue CD, not an actual CD.)

  Booting 'PLD Rescue CD from http://rescuecd.pld-linux.org/'


dhcp
Address: 192.160.13.39
Netmask: 255.255.255.0
Server: 192.160.13.6
Gateway: 192.160.13.21
root (nd)
 Filesystem type is tftp, using whole disk
kernel /pld-vmlinuz root=/dev/ram0 init=/linuxrc ramdisk_size=54000
  CONF="`/dev /fd0:/rescue`;;;;;;;;;;;"
  noapic quiet console=ttyS0,9600n81
   [Linux-bzImage, setup=0x1400, size=0xd4000]
initrd /pld-rescue.sqf
   [Linux-initrd @ 0x1cf03000, 0x30cd000 bytes]
INIT: version 2.85 booting
Started device management daemon v1.3.25 for /dev
Autodetecting PCI hardware
Eth0:e1000 Eth1:e1000

                     Powered by PLD Linux Distribution
                   Press I to enter interactive startup
Setting clock (local) [ DONE ]
Today`s date: Wed Oct 13 22:50:34 GMT 2004 [ DONE ]
Activating swap partitions [ DONE ]
Host: rescue [ DONE ]
Remounting root filesystem in rw mode [ DONE ]
INIT: Entering runlevel: 3 [
Entering non-interactive startup
Resource Manager: Entering runlevel number...............[ 3 ]
Setting network parameters............................[ DONE ]
Bringing up interface eth0............................[ DONE ]
Determining IP information for eth0 (dhcpcd)..........[ DONE ]
Bringing up interface eth1............................[ FAIL ]
Determining IP information for eth1 (dhcpcd)..........[ FAIL ]
Initializing random number generator..................[ DONE ]
Starting syslog-ng service............................[ DONE ]
Starting OpenSSH service..............................[ DONE ]
Loading console font and map..........................[ DONE ]
Enabling SAK sequence.................................[ DONE ]
cannot (un)set powersave mode
Allowing users to login...............................[ DONE ]
Resource Manager: Runlevel has been reached..............[ 3 ]

sunrise.rahul.net login:

Notice

This document is part of a web site and should not be read in isolation. All information is subject to change without notice. Not everything described here applies equally to every possible combination of hardware and software. All products and services are provided on an “as is” basis; “with all faults” and “as available.” THERE ARE NO WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THOSE OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. We are not responsible for the content of external web sites.


$Id: demo2.html,v 1.50 2005/03/12 20:17:45 ldhesi Exp $