Re: [slackware-sparcdevel] The UFS problem

From: Phil Howard (phil@ipal.net)
Date: Fri Jan 12 2001 - 04:35:03 PST


David Cantrell wrote:

> OK, I think I know what's causing this problem now. The ramdisk driver on
> my system has gone nuts. I currently can't make any ramdisks and I
> believe that was causing the problem on the mini ISO. When I run the
> script and the ramdisk screws up, it kicks out a 4k gzipped image. I
> never check for that particular error, so it was just getting thrown on
> the mini ISO.
>
> Now I'm working on figuring out why I can't make ramdisks... sigh.

There was, and may still be, a kernel bug which causes ramdisk corruption.
The ramdisk is actually implemented to store its data in cache. There's no
actual backing store involved (if there were, it would be storing data
redundantly, taking more RAM space). For example, when a request for a
page actually reaches the driver, the driver just hands back a page of
all binary zeros, since such a request should never arrive unless the
page was not found in cache.

The problem comes from another cache issue which I believe is still in
the 2.2 series: data written with different blocksizes goes into cache
separately. Not only does this result in redundant data, but it also
leads to inconsistency in finding the cached data, which the ramdisk is
fully dependent on. This is the issue that shows up when the kernel is
compiled with the newer gcc. Compiling with the older gcc cleared it up.

A possible scenario might be:

The ramdisk is initialized with 4K writes of binary zero. Then the
ramdisk is formatted with a blocksize of 1K. The writes of 1K will
match the 4K blocks for each 1st 1K of 4K, but miss on the 2nd, 3rd,
and 4th 1K. This results in 3 new 1K cache entries with data, and
a 4K cache entry with 1K of data and 3K of binary zeros. All the
filesystem I/O is then done in 1K blocks, and appears correct since
the same cache entries will be found for the filesystem as for the
formatting. Once the ramdisk is unmounted, it is then read in 4K
sizes (probably directly by gzip which will use 4K or larger to
read the device). This will get the 4K cache entries only, giving
it 1K of data and 3K of zeros each time.
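
If the scenario above is right, reading the same ramdisk with 1K reads
and with 4K reads should return different data. A minimal sketch of
that check follows. The function names are made up for illustration,
and /dev/ram2 in the comment is only an example device. On a healthy
device or a regular file the two checksums should match.

```shell
#!/bin/sh
# Checksum a device or file as read with a given block size.
# dd prints its statistics on stderr, so they are discarded here.
read_sum() {
    dd if="$1" bs="$2" 2>/dev/null | cksum
}

# Succeed only if 1K reads and 4K reads return identical data.
same_at_both_sizes() {
    [ "$(read_sum "$1" 1024)" = "$(read_sum "$1" 4096)" ]
}

# Example use (the device name is an assumption):
#   same_at_both_sizes /dev/ram2 || echo "1K and 4K reads differ" >&2
```

A mismatch would be a hint that the image is about to be compressed
from the wrong cache entries. A match does not prove the bug is absent.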

Workarounds:

1. Reboot. Initialize the ramdisk using dd and a blocksize of 1K.
    Format with 1K as usual. Mount, populate, and unmount as usual.
    Then read using dd to force a 1K read, piping to gzip, like:
        dd if=/dev/ram2 ibs=1024 | gzip -9 > initrd.gz

2. Compile the kernel used to build the ramdisk images with gcc version
    2.7.2.3 or egcs version 1.1.2.

3. Use loopback to mount a file, instead of ramdisks, since there is
    a "backing store" (the file). This is how I build both my initrd
    and syslinux for Intel. I still do the 1K thing from #1
    so I don't know what problems may still be lingering here.

4. Use kernel 2.4 where I believe the problem has been fixed due to new
    code not triggering the compiler bug (at least I haven't seen it in
    2.4 yet, but then, I now do both #3 and #1 above, as well).
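
Workaround #3 combined with the 1K reads from #1 could be sketched
roughly as below. This is only an outline under assumptions: the file
name, image size, and mount point are hypothetical defaults, and
mke2fs is assumed as the formatter (use whatever filesystem your
initrd actually needs). The format and mount step needs root.

```shell
#!/bin/sh
# Hypothetical defaults; override them in the environment as needed.
IMG=${IMG:-initrd.img}
SIZE_KB=${SIZE_KB:-4096}
MNT=${MNT:-/mnt/initrd}

# Create the backing file using 1K writes of binary zeros.
make_image() {
    dd if=/dev/zero of="$IMG" bs=1024 count="$SIZE_KB" 2>/dev/null
}

# Format with 1K blocks and mount through the loop device (needs root).
# mke2fs is an assumption; the original post does not name a formatter.
format_and_mount() {
    mke2fs -F -q -b 1024 "$IMG" &&
    mkdir -p "$MNT" &&
    mount -o loop "$IMG" "$MNT"
}

# After populating and unmounting, compress using 1K reads as in #1.
compress_image() {
    dd if="$IMG" ibs=1024 2>/dev/null | gzip -9 > "$IMG.gz"
}
```

The order would be make_image, format_and_mount, populate the mount
point, umount, then compress_image. Keeping the steps as separate
functions makes it easy to check the exit status of each one, so a
failed step does not silently end up in the final image.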

Note that this bug can also be experienced with other devices, where it
usually results in strange I/O errors: the cache does get written, but
requests will get cache entries with the wrong size. With a ramdisk it
is worse because the data stays in cache and becomes duplicated.

-- 
-----------------------------------------------------------------
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| phil-nospam@ipal.net | Texas, USA | http://phil.ipal.org/     |
-----------------------------------------------------------------


