Kernel customization: initrd (miniroot) starts acting funny



mack
08-15-2003, 12:38 AM
I have customized with new kernels in the past and never had a problem with it. Today, however, I tried two new kernel versions and they act weird. I must be missing something stupid, so I hope someone here has run into this problem before and can tell me what the cause is.

I boot the new kernel with the new initrd (same miniroot but with newly-compiled modules). It boots till it gets to the initrd part, extracts it, prints a message about mounting it as ext2 (ro), but panics on lack of init.

Now, init shouldn't even be called at this point; linuxrc from the initrd should. For some reason it fails to run, and the system just skips the initrd phase and looks for init. If I pass the param init=/linuxrc, the linuxrc script from my initrd runs, so it's not like the kernel can't run this script or the required shell (knoppix-ash). Why can't it run it as linuxrc if it can later run it as init?!

Another symptom is that the initrd ext2 fs is mounted read-only, so cp commands in linuxrc fail. A remount as rw right after the cloop image is mounted solved this problem.

When the script exits, the system panics because it thinks its init died; the script is not treated as linuxrc. I fixed that by adding 'exec /etc/init' at the end of linuxrc.

Now the whole thing works but certainly not the way kernel designers intended.

Has anyone had a problem running linuxrc from miniroot.gz before?

garyng
08-15-2003, 12:47 AM
I came across this problem once but unfortunately forgot how I fixed it (I tried the same init=linuxrc to temporarily work around it, and also root=/dev/ram0). I remember that I somehow had to re-create the initrd tree (or fs) from scratch, and the problem disappeared.

Maybe this is a stupid question, but have you checked the permissions of linuxrc? I remember I needed to change them to 0755.

mack
08-15-2003, 03:11 AM
Of course permission of linuxrc is ok - otherwise it wouldn't have worked as init=/linuxrc.

However, I found the problem and I'm curious why it doesn't ALWAYS happen. The workaround is trivial. I'll explain it here although I'm answering my own question :lol:

After digging in the kernel's initrd code for a while, I realized that it checks whether root=/dev/ram0. If it is, ram0 is processed as a real rootfs rather than initrd, which would be ok if the linuxrc script was called init and written differently. :wink:

When the kernel gets Knoppix's initrd, it checks for the default rootdev (unless specified in cmdline). If it happens to be /dev/ram0, knoppix won't work and you get a panic due to "No init found". I saw questions about such message in the list before, so this should answer them as well.
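
To make that concrete, here is a rough model of the decision as I understand it. This is only an illustrative Python sketch, not the kernel's C code; the function and constant names are made up, and it ignores everything except the root device check.

```python
# Simplified model of the 2.4-era initrd decision described above.
# Names are illustrative only; they are not kernel identifiers.
RAM0 = 0x0100  # major 1, minor 0 -> /dev/ram0


def initrd_boot_path(real_root_dev):
    """Return which program the kernel ends up running from the initrd."""
    if real_root_dev == RAM0:
        # The initrd is treated as the final root fs, so /linuxrc is
        # skipped and the kernel goes straight to looking for init.
        return "init"
    # The initrd is only a temporary root: /linuxrc runs first.
    return "linuxrc"


print(initrd_boot_path(0x0100))  # default root is ram0    -> init
print(initrd_boot_path(0x0301))  # e.g. root=/dev/hda1     -> linuxrc
```

With a Knoppix-style initrd there is no init in the expected place, so the first case ends in the panic.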

:idea: The workaround: just specify root=/dev/hda1 or some other non-ram0 device in cmdline. Doesn't matter which, since linuxrc will set it to /dev/ram0 later anyway.

I think knoppix's default cmdline should include root=something, just in case.

Maintainers, care to comment?

garyng
08-15-2003, 03:57 AM
Now I am puzzled (as usual).

Isn't the default root device 'burnt in' to the kernel? I read somewhere that it is by default, so there shouldn't be any need for a root= parameter.

mack
08-15-2003, 11:14 AM
The default can be "burned" into the kernel image by using rdev(8). However, it seems the default for some compilations was 256 (major 1, minor 0, i.e. /dev/ram0). It's set during the build by arch/i386/boot/tools/build.c, but here's the default:

#
# ROOT_DEV specifies the default root-device when making the image.
# This can be either FLOPPY, CURRENT, /dev/xxxx or empty, in which case
# the default of FLOPPY is used by 'build'.
# This is i386 specific.
#

export ROOT_DEV = CURRENT

which totally explains the problem. Normally, when building a kernel, rootdev != 0x100 because normal distros don't use that. However, I built these kernels under knoppix, and with knoppix rootdev == 0x100 (ram0).
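
For anyone wondering how 256, 0x100 and (1,0) are the same thing: the old-style device number is 16 bits, major in the high byte and minor in the low byte. A small Python sketch (the helper name is mine, purely for illustration):

```python
def split_dev(dev):
    """Split an old-style 16-bit device number into (major, minor)."""
    return (dev >> 8) & 0xFF, dev & 0xFF


print(split_dev(256))     # (1, 0)  -> /dev/ram0
print(split_dev(0x0301))  # (3, 1)  -> /dev/hda1
```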

To sum it up, a kernel built using knoppix can't be used as a knoppix kernel unless you rdev the image later, or add root=something to your boot cmdline. It's kinda funny. I guess the maintainers don't build knoppix using knoppix :lol:
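
If you want to check an image without rdev at hand, the root device sits in a 2-byte little-endian field at offset 508 (0x1FC) of the image's boot sector. A minimal Python sketch that only reads the field; the function name is hypothetical, and this is just a read-only check, not a replacement for rdev:

```python
import struct

# Offset of the 2-byte little-endian root_dev field in the boot sector.
ROOT_DEV_OFFSET = 508  # 0x1FC


def read_root_dev(image_path):
    """Return the default root device number stored in a kernel image."""
    with open(image_path, "rb") as f:
        f.seek(ROOT_DEV_OFFSET)
        raw = f.read(2)
    if len(raw) != 2:
        raise ValueError("image too short to contain a root_dev field")
    (dev,) = struct.unpack("<H", raw)
    return dev
```

Calling read_root_dev() on the freshly built image and getting 0x100 back would mean you have the problem described above.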

garyng
08-15-2003, 11:27 AM
Ah, that explains why I encountered it once and then it mysteriously disappeared. I must have used KNOPPIX to build at first and later switched to my own distro, which made the problem disappear.

mack
08-15-2003, 11:47 AM
Makes sense. I guess most people don't encounter it because they hdinstall and work from there rather than remastering from the live-cd directly, as I did.

Yet another mystery solved. :lol:

Now I'm down to the last one on my list: the cloop hangs on high-load with recent kernels.

garyng
08-15-2003, 12:28 PM
I have seen something related, but in a different situation. I am working on a "broken man's install" that loop-mounts the complete ISO and then cloop-mounts the compressed file system, and I saw this iffy behaviour too. Sometimes it just hangs when it tries to start X, sometimes it works. That is with the ISO on a FAT partition.

When I put the ISO file on an NTFS partition, I was surprised to see that this works even with the 1.x NTFS driver, but in this situation it hangs every time I start X, while it seems to work normally if I just stay at the command line (so I believe it is also load related).

Surprisingly, if I extract the clooped file and put it on an NTFS partition, I cannot cloop-mount it directly (segmentation fault), but if I go through the ISO 9660 filesystem -> cloop, I can mount it, though not very reliably.

It seems there are changes in the kernel that cause all of this.

mack
08-16-2003, 03:09 AM
The problem seems to have existed in cloop even with older kernels, but it rarely showed up under normal conditions. Normal operation, or even heavy (but single-threaded) reading, doesn't seem to trigger it. What triggers it fastest is a lot of simultaneous reading tasks. Try running a few find(1) processes which exec something like wc or sum on many files in cloop and you're likely to trigger it. Starting OpenOffice while doing some other disk ops is also a good way :lol:
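
If you'd rather script it than juggle several find(1) runs, something along these lines should produce the same kind of load. It is only a sketch in Python: the function names and the worker count are arbitrary choices of mine, and it uses threads rather than separate processes. Note that a hang will not be reported; the script will simply stop making progress.

```python
import os
import zlib
from concurrent.futures import ThreadPoolExecutor


def checksum_file(path):
    """Read a whole file in chunks and return its CRC32 (None on error)."""
    crc = 0
    try:
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 16), b""):
                crc = zlib.crc32(chunk, crc)
    except OSError:
        return None  # unreadable file: skip it, we only care about the load
    return crc


def hammer(root, workers=8):
    """Checksum every regular file under root using several readers at once."""
    paths = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full):  # skip devices, fifos, broken symlinks
                paths.append(full)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(paths, pool.map(checksum_file, paths)))
```

Point hammer() at wherever your cloop image is mounted (on my setup that would be something like /KNOPPIX, but treat that path as an assumption) and raise workers if nothing happens.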

The problem became much worse with recent (unofficial) kernels. With 2.4.22-pre7 and beyond (last I tested was 2.4.22-rc2), I get stuck while still in linuxrc, unless I place some sleeps and syncs there. Kernel 2.4.22-pre2 works pretty well with it, so some change between pre2 and pre7 makes the problem appear much faster than before. I'll check the diffs...

Since it happens only when several tasks are reading simultaneously, I expect it to be a deadlock, probably outside the cloop code. It seems to happen faster when I use more block-dev related drivers at the same time (such as usb-storage + sd).

I'll dig into it soon and see what I can find. I wish Linux had a good mutex-analysis kernel-level tool. (Or maybe there is already one and I'm not aware of it).

At the moment, it's only a problem for those of us who like to use bleeding-edge kernels, but I'm afraid once the official 2.4.22 is out, it'll become everybody's problem.

garyng
08-16-2003, 04:18 AM
I have been unable to reproduce it so far, except during startup (init) when it starts X. This seems to echo your experience with linuxrc: if the kernel is given some time to 'rest', it is less likely to trigger the bug.

I agree with you, though, that there is definitely something in cloop that will become a problem when the mainstream moves to 2.4.22+ or, later, 2.6.

I hope Klaus can find some time to stress-test it. I have seen that he has upgraded the cloop driver to support multiple devices, so at least it is still under active development (and maintenance).

mack
08-16-2003, 11:09 AM
I asked Klaus about that. Understandably, he has no time to deal with unofficial kernels, but if/when the next official kernel has a problem, he'll look into it. I hope to find it and send him a patch before it gets to that.

In the meantime, I downgraded my knoppix kernel to 2.4.22-pre2, which is the first kernel that has the stuff I need, and is relatively stable with cloop-1.0.