
improper shutdown - root unionfs and power off



kl522
04-24-2010, 06:59 AM
I have customized Knoppix myself, and I am not sure whether it was originally like this or whether my customization caused it, but my customized Knoppix does not shut down properly.

I have a knoppix-data.img which is unioned with the read-only compressed image (the root file system). The problem is that knoppix-data.img is not unmounted properly when I shut down, so it requires an fsck on every reboot.

Does anyone else have the same problem?

In addition, my system sometimes does not power off properly. When it executes /etc/init.d/knoppix-halt, it gives an 'input/output error' at a certain line of the script, and the machine does not power off. This happens only intermittently, not always.

kl522
04-26-2010, 09:07 AM
After checking the web, it seems that the failure to power off is due to a bug in kernel 2.6.32.11.

kl522
04-26-2010, 01:44 PM
Sorry for giving confusing information, but I think I have finally fixed the "intermittent" improper power-off problem.

Here is the relevant portion of the original /etc/init.d/knoppix-halt:

case "$0" in *halt|*poweroff) poweroff -f ;;
*) reboot -f ;;
esac

Somehow, at that stage of execution, file system access can fail, which gives the 'input/output error'.
So I replaced it with:

case "$0" in *halt|*poweroff) echo o > /proc/sysrq-trigger ;;
*) echo b > /proc/sysrq-trigger ;;
esac

After that it seems to power off the notebook consistently.
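
A minimal sketch of the same idea, not the actual Knoppix script. The `o` and `b` SysRq keys act immediately, without syncing or unmounting anything, so it may be safer to ask the kernel to sync (`s`) and remount read-only (`u`) first. The function names and the `SYSRQ_TRIGGER` and `SYSRQ_SETTLE` variables are made up for this example; `SYSRQ_TRIGGER` exists only so the logic can be tried against an ordinary file.

```shell
#!/bin/sh
# Hypothetical helper, not part of knoppix-halt.
# SYSRQ_TRIGGER can be pointed at a scratch file for a dry run.
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

# Map the script name ($0 in the original) to a SysRq key:
# o = power off, b = reboot.
sysrq_key_for() {
    case "$1" in
        *halt|*poweroff) echo o ;;
        *) echo b ;;
    esac
}

# Flush and remount read-only before the final, immediate action.
final_action() {
    sync
    echo s > "$SYSRQ_TRIGGER"     # emergency sync
    echo u > "$SYSRQ_TRIGGER"     # emergency remount read-only
    # Assumed grace period: the s and u requests may complete asynchronously.
    sleep "${SYSRQ_SETTLE:-2}"
    sysrq_key_for "$1" > "$SYSRQ_TRIGGER"
}

# Usage at the very end of a halt script (this really powers off or reboots):
#   final_action "$0"
```

The key mapping mirrors the `case "$0"` statement above; only the sync and remount steps are additions.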

kl522
04-27-2010, 02:18 AM
What remains unanswered:

1. knoppix-data.img, which is joined by aufs2 with the read-only compressed loop root file system, remains
mounted, so its file system is not clean when the system halts.

2. Why is there a file system error at the point when the system tries to run
'/sbin/poweroff'? (Running 'auplink' at that point also failed.)

krishna.murphy
04-29-2010, 01:23 AM
Did you run a self-test on the disk? Bad burns (common) and bad downloads (less common) can cause weird problems like this. Try one of these (as fits your disk) at boot time:
knoppix testcd
knoppix testdvd

Cheers!
Krishna

kl522
04-30-2010, 02:12 AM
By the way, this is a hard disk install, i.e. the (remastered) read-only compressed rootfs sits on the NTFS
hard disk.

I have only myself to blame if it is my (remaster/customization) fault. But if I remember correctly, since day one of
using vanilla Knoppix 6.2, knoppix-data.img was never unmounted properly, which resulted in the need
to fsck at boot (the fsck is coded in minirt.gz).

krishna.murphy
04-30-2010, 12:12 PM
I don't have a file knoppix-data.img in my setup; knoppix-data.aes is in my KNOPPIX folder (which comes from the "poor man's install" only, I think), an encrypted "persistent store" that is merged with the KNOPPIX file in that folder to become the effective filesystem. The difference is most likely due to the hard-disk-install nature of your setup.

Cheers!
Krishna:)

kl522
05-06-2010, 07:33 AM
I was able to reproduce this problem with vanilla Knoppix 6.2. I haven't worked out the cause yet, but this is how I reproduce it; some of the details given here might be spurious or unimportant:

1. Install Knoppix onto a flash drive using the Knoppix 6.2 Live CD.
2. The flash drive is > 1 GB, so Knoppix creates two partitions on it.
knoppix-data.img is stored on the first partition (VFAT),
in the directory /mnt-system/KNOPPIX.
3. I chose the maximum size for knoppix-data.img, which
I think is roughly 300 MB.
4. Test boot a few times: knoppix-data.img was cleanly unmounted.

Moving on:
5. I re-created the second partition as NTFS, copied the
entire KNOPPIX directory (together with knoppix-data.img) onto it,
and renamed the existing KNOPPIX directory to KNOPPIX.bak.
(All of this was done offline.)
6. Then I rebooted the system; now /mnt-system sits on /dev/sda2.
I checked knoppix-data.img to see whether it was unmounted cleanly,
and it was.

Moving on:
7. I increased the size of knoppix-data.img to something much bigger (2.5 GB)
and copied a huge /home directory (about 800 MB) into the knoppix-data.img
file system. Again, all of this was done offline.
8. Now on test boot/reboot, knoppix-data.img is never cleanly unmounted
(even though I run e2fsck on it before each test).

Now I am not sure whether the huge knoppix-data.img is causing the problem,
or whether there is something in /home which prevents it from unmounting
properly.
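
One way to check offline whether the image was cleanly unmounted, sketched under the assumption that knoppix-data.img holds an ext2/ext3 file system and that `dumpe2fs` from e2fsprogs is available. The superblock header printed by `dumpe2fs -h` includes a `Filesystem state` field. The function names and the example path are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting knoppix-data.img while it is NOT mounted.

# Read `dumpe2fs -h` output on stdin and print the "Filesystem state" value.
fs_state() {
    sed -n 's/^Filesystem state:[[:space:]]*//p'
}

# Print the state of the ext2/ext3 file system inside an image file.
image_state() {
    dumpe2fs -h "$1" 2>/dev/null | fs_state
}

# Usage (the path is only an example):
#   image_state /mnt-system/KNOPPIX/knoppix-data.img
```

On ext2 the state reads `not clean` after an unclean shutdown. On a journaled ext3 image the state may still read `clean`; there, as far as I know, the `needs_recovery` flag in the `Filesystem features` line is the thing to look for.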

krishna.murphy
05-06-2010, 05:00 PM
> I was able to reproduce this problem with vanilla Knoppix 6.2. I haven't worked out the cause yet, but this is how I reproduce it; some of the details given here might be spurious or unimportant:
>
> 1. Install Knoppix onto a flash drive using the Knoppix 6.2 Live CD.

How did you do this - using the built-in flash install utility?

> 2. The flash drive is > 1 GB, so Knoppix creates two partitions on it.

My flash is 8GB, and the file system seems to be VFAT with but one partition, FWIW.

> knoppix-data.img is stored on the first partition (VFAT),
> in the directory /mnt-system/KNOPPIX.
> 3. I chose the maximum size for knoppix-data.img, which
> I think is roughly 300 MB.
> 4. Test boot a few times: knoppix-data.img was cleanly unmounted.

Great - problem solved.

> Moving on:
> 5. I re-created the second partition as NTFS, copied the
> entire KNOPPIX directory (together with knoppix-data.img) onto it,
> and renamed the existing KNOPPIX directory to KNOPPIX.bak.
> (All of this was done offline.)
> 6. Then I rebooted the system; now /mnt-system sits on /dev/sda2.
> I checked knoppix-data.img to see whether it was unmounted cleanly,
> and it was.
>
> Moving on:
> 7. I increased the size of knoppix-data.img to something much bigger (2.5 GB)
> and copied a huge /home directory (about 800 MB) into the knoppix-data.img
> file system. Again, all of this was done offline.
> 8. Now on test boot/reboot, knoppix-data.img is never cleanly unmounted
> (even though I run e2fsck on it before each test).
>
> Now I am not sure whether the huge knoppix-data.img is causing the problem,
> or whether there is something in /home which prevents it from unmounting
> properly.

Maybe it would be worth trying ext2 or ext3 filesystems instead of the always warned-against NTFS.

kl522
05-07-2010, 12:05 AM
For the purpose of testing and getting to the bottom of things: yes, I tested using ext2 on partition 2, and yes, now it is unmounted cleanly.

Testing aside, in my actual setup my flash is a slower device than the hard disk, which is why I chose to 'host' knoppix-data.img on an existing NTFS hard disk partition (and I want to retain my original NTFS). It's a lot faster than doing it on flash. But this testing also revealed that there is indeed a problem with hosting knoppix-data.img on NTFS.

krishna.murphy
05-07-2010, 01:01 AM
Thanks for sharing the results of all your testing and glad it's working for you! I use NTFS on my hd, too, and it always unmounts cleanly unless the system crashes. In fact it's the same setup as my flash: ~4GB for the KNOPPIX OS files, 2GB for my knoppix-data.aes, and the rest for Windows/shared files. FWIW, I looked around a lot before I got the flash, and I'm pretty happy with it - a very fast and reliable Verbatim TUFF-'N'-TINY 8GB (http://www.amazon.com/o/ASIN/B001UHTDS2/sfrevu05), though it's not quite as fast as a hard drive.

Cheers!
Krishna

kl522
05-08-2010, 12:09 AM
Your setup is similar but not exactly the same. The key difference is that you are hosting knoppix-data.xxx on VFAT, whereas my knoppix-data.xxx is hosted on NTFS. It would be interesting if you could copy your /mnt-system/KNOPPIX directory onto the NTFS partition and rename your existing KNOPPIX directory on the VFAT partition to KNOPPIX.bak (you would have to do it offline), just to see if it is repeatable.

Anyway, I will look into /etc/init.d/knoppix-halt in greater detail to see if I can umount the rootfs which is aufs2-ed onto the NTFS.

Regards.

kl522
05-08-2010, 04:14 AM
I found the problem but not a fix. Somewhere in /etc/init.d/knoppix-halt, after the modules are freed, the ntfs-3g process is still around. Everything is still OK at this point.

But right after /bin/umount -t ....., ntfs-3g is gone and /mnt-system is no longer mounted, while /KNOPPIX-DATA is still mounted. In other words, the NTFS is unmounted properly but knoppix-data.img is not. This will cause corruption in knoppix-data.img.

It looks like hosting knoppix-data.img on NTFS is not a supported configuration.
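
A check like this can be dropped into a halt script at different points to see what is still mounted. This is a sketch only: it uses nothing but shell built-ins, on the assumption that external binaries may already be unreachable at that stage. The function names and the `MOUNTS` variable are hypothetical; `MOUNTS` exists so the logic can be tried on a saved copy of /proc/mounts.

```shell
#!/bin/sh
# Hypothetical debugging helper for use inside a halt script.
MOUNTS="${MOUNTS:-/proc/mounts}"

# Succeed if the given mount point appears in the mount table.
# Only built-ins are used, so no binary has to be loaded from the union.
is_mounted() {
    while read -r _dev _mnt _rest; do
        [ "$_mnt" = "$1" ] && return 0
    done < "$MOUNTS"
    return 1
}

# Report the state of the mount points discussed in this thread.
report_mounts() {
    for target in /mnt-system /KNOPPIX-DATA /UNIONFS; do
        if is_mounted "$target"; then
            echo "$target: still mounted"
        else
            echo "$target: not mounted"
        fi
    done
}
```

Calling `report_mounts` just before and just after the `/bin/umount -t .....` line would show the situation described above, with /mnt-system gone while /KNOPPIX-DATA is still listed.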

krishna.murphy
05-08-2010, 03:03 PM
I think that's some pretty good detective work! For some unknown reason, with your hardware, the shutdown script isn't working quite properly when dealing with NTFS. I wonder if it might have something to do with sound, as playing the shutdown sound gives running programs more time to halt; do you have sound? I do, and as I stated previously, my NTFS (used on the HD, not on the flash) unmounts cleanly.

Cheers!
Krishna

kl522
05-08-2010, 09:50 PM
> and as I stated previously, my NTFS (used on the HD, not on the flash) unmounts cleanly.

I keep telling you, but I am not sure you got my point: I am not saying that NTFS is not unmounted cleanly. I am saying that knoppix-data.xxx, a loop-mounted ext2/ext3 file system sitting on NTFS, is not unmounted cleanly.

And you won't even notice it! Unless you have some means of checking it offline, the only symptom is a slower boot, because the Knoppix minirt.gz contains code which automatically runs e2fsck on knoppix-data.xxx!

(As a side note, even if NTFS is not cleanly unmounted, you will not know that either. I have seen many cases where I powered off with NTFS mounted in Linux and the system still came back without needing a file system check.)

kl522
05-09-2010, 07:07 AM
Sorry to bore you with more information. As I did more and more testing, I found that this problem is not limited to hosting knoppix-data.img on NTFS. It happens with almost every kind of file system hosting the loop-mounted knoppix-data.img, i.e. it also happens with ext2/ext3 and VFAT. The conditions under which it tends to happen are:

1. knoppix-data.xxx must be big (in my case it is 2.5 GB).
2. There was some file system activity prior to shutdown (for example, you saved some files to the Knoppix home directory).
3. Maybe the machine needs to have a lot of memory (> 1 GB).

You will notice that after you shut down or power off the system, knoppix-data.img (i.e. the /KNOPPIX-DATA directory) was not unmounted cleanly.

Most of you do not notice this because, as I mentioned before, the Knoppix minirt.gz has e2fsck inside it. But as with any uncleanly detached file system, there might be corruption or unsaved data.

krishna.murphy
05-09-2010, 11:48 PM
> I keep telling you, but I am not sure you got my point: I am not saying that NTFS is not unmounted cleanly. I am saying that knoppix-data.xxx, a loop-mounted ext2/ext3 file system sitting on NTFS, is not unmounted cleanly.

Yes, I get it - what I was saying is, I have the same filesystems, and it doesn't appear to have difficulty unmounting cleanly on shutdown. If I have to kill it, like when Hulu or any Flash video is maxing out the processor to do the NTFS filesystem stuff (complicated to obfuscate, thanks so much Mr. Gates!), then I notice it takes a little longer to boot up afterwards.

> You will notice that after you shut down or power off the system, knoppix-data.img (i.e. the /KNOPPIX-DATA directory) was not unmounted cleanly.

What's the best way to see that? I imagine re-booting without mounting knoppix-data might be necessary, but what else do you do?

> Most of you do not notice this because, as I mentioned before, the Knoppix minirt.gz has e2fsck inside it. But as with any uncleanly detached file system, there might be corruption or unsaved data.

Thanks for doing all that testing!

Cheers!

kl522
05-10-2010, 07:18 AM
First of all, I would like to thank you for taking the effort to follow this thread.

> Yes, I get it - what I was saying is, I have the same filesystems, and it doesn't appear to have difficulty unmounting cleanly on shutdown. If I have to kill it, like when Hulu or any Flash video is maxing out the processor to do the NTFS filesystem stuff (complicated to obfuscate, thanks so much Mr. Gates!), then I notice it takes a little longer to boot up afterwards.

This is not how Knoppix works. According to /etc/init.d/knoppix-halt, it tries a few things to unmount the file systems cleanly, but if they fail it eventually just goes ahead and powers off the system. :)

> What's the best way to see that? I imagine re-booting without mounting knoppix-data might be necessary, but what else do you do?

That's one way to do it. Another way is to boot off the CD-ROM/DVD and examine it. But since I have to do this so often, what I eventually did was run Knoppix 6.2 in qemu (it's almost as fast as native if you use the 'kvm' module). That gives me a very convenient and fast way to examine the qemu file systems, as they appear as normal files on the host operating system.

kl522
05-15-2010, 10:53 AM
I would like to post an update to this issue.

Finally I managed to umount /KNOPPIX-DATA cleanly every time I shut down the system. Unfortunately the fix is elaborate, and it seems no one using Knoppix is at all bothered by this. :) Maybe I am just too paranoid about it: the /etc/init.d/knoppix-halt script already performs a sync before bringing the system to a total halt/poweroff. Nevertheless, the file systems were not unmounted cleanly.

The fix is to carefully arrange the programs so that during the halting stage nothing is occupying the unionfs or /KNOPPIX-DATA. I will just broadly outline the changes needed:

1. Change the initramfs minirt.gz:
# mv bin bin.static
# mkdir bin
# cd bin
# ln -s /bin.static/busybox busybox
# ln -s /bin.static/sh sh

2. Compile a static version of '/sbin/init' and keep it in the directory 'bin.static'.

3. Edit the shell script 'init' inside minirt.gz:
a) change ntfs-3g to /bin.static/ntfs-3g
b) change exec /sbin/init to exec /bin.static/init

4. Make various changes to /etc/init.d/knoppix-halt so that it runs /bin.static/busybox and unmounts /home, /UNIONFS and /KNOPPIX-DATA.
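
Step 1 could be carried out roughly as follows. This is a sketch that assumes minirt.gz is a gzip-compressed cpio archive in newc format, as Linux initramfs images usually are; that was not verified against the Knoppix 6.2 image. The function names and the `work` directory are made up for the example.

```shell
#!/bin/sh
# Hypothetical helpers; assumes minirt.gz is a gzip-compressed cpio (newc) archive.

# Unpack an initramfs image into a directory.
unpack_initrd() {   # usage: unpack_initrd minirt.gz workdir
    img=$(cd "$(dirname "$1")" && pwd)/$(basename "$1")
    mkdir -p "$2" && (cd "$2" && gzip -dc "$img" | cpio -id)
}

# Step 1 from the outline: move bin aside and leave symlinks behind.
relocate_bin() {    # usage: relocate_bin workdir
    (cd "$1" &&
     mv bin bin.static &&
     mkdir bin &&
     ln -s /bin.static/busybox bin/busybox &&
     ln -s /bin.static/sh bin/sh)
}

# Pack a directory back into an initramfs image.
repack_initrd() {   # usage: repack_initrd workdir minirt.gz.new
    out=$(cd "$(dirname "$2")" && pwd)/$(basename "$2")
    (cd "$1" && find . | cpio -o -H newc | gzip -9 > "$out")
}

# Usage (run as root so that file ownership is preserved):
#   unpack_initrd minirt.gz work
#   relocate_bin work
#   repack_initrd work minirt.gz.new
```

Writing to minirt.gz.new rather than over the original keeps a fallback in case the rebuilt image does not boot.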

Have fun.

krishna.murphy
05-15-2010, 01:13 PM
Thanks for being persistent! Please post a copy of the changed scripts; I, for one, wouldn't mind faster startup, and it might even have something to do with other issues (like the inability to hibernate, for instance.)

Yours Truly!
Krishna

kl522
05-16-2010, 01:18 AM
I think I will wait a while before posting further. Thinking about it, I tested everything using Knoppix 6.2 as the base, so maybe these things have changed in 6.2.1 or 6.3, in which case I would be shooting in the air. :)

Unless someone could post the minirt.gz and /etc/init.d/knoppix-halt from 6.3 in this forum, so that I could at least confirm it .....

kl522
05-18-2010, 07:47 AM
More.

As it's not just a matter of pure script changes (it includes a static version of /sbin/init inside minirt.gz), I tried to upload my fixes here. They consist of 3 files: minirt.gz, /etc/init.d/knoppix-halt and /etc/init.d/knoppix-halt.actual (all in all slightly more than 1 MB). Unfortunately the forum limits the maximum file size I can post here, so I have to abandon the idea of putting the fix forward here.

In any case, I am pretty contented with my changes, as they allow me to shut down cleanly and umount everything, including /KNOPPIX-DATA, /UNIONFS, /KNOPPIX and /mnt-system. Yes, ***EVERY*** single file system is cleanly detached before the system powers off or reboots. Hibernating Knoppix is almost unnecessary, as Knoppix boots up pretty fast, more so if knoppix-data.img resides on a hard disk.
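
The order of the unmounts matters, since a file system can only be detached once nothing mounted on top of it is still in use. The following is a guess at the general shape, not the poster's actual script: the order shown (top of the stack first, hosting partition last) is an assumption based on the mount points named in this thread, and `BB` and `DRY_RUN` are hypothetical variables.

```shell
#!/bin/sh
# Hypothetical sketch of the unmount order; not the poster's actual script.
BB="${BB:-/bin.static/busybox}"   # static busybox living outside the union

# Unmount from the top of the stack down to the hosting partition.
# With DRY_RUN set, only print what would be run.
unmount_all() {
    for mp in /home /UNIONFS /KNOPPIX-DATA /KNOPPIX /mnt-system; do
        if [ -n "${DRY_RUN:-}" ]; then
            echo "$BB umount $mp"
        else
            "$BB" umount "$mp" || echo "could not unmount $mp" >&2
        fi
    done
}

# Usage:
#   DRY_RUN=1 unmount_all    # print the commands only
```

Running the commands through a static busybox is what keeps the union itself free of open files while it is being unmounted.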

krishna.murphy
05-18-2010, 02:57 PM
I suggest creating a google.com account for the purpose and using the relatively new "upload any document" feature to post files on the web. Once you share them with the world:) you can get the URL and post THAT on the forum (which would be most appreciated.)

Krishna

kl522
05-19-2010, 02:48 AM
OK, here are the links:

http://docs.google.com/leaf?id=0B2_elGstDZXqZGUzZGMyODMtNTY4NC00MGNjLWJmNTgtNGQzMWEwYWZlNGUy&hl=en
http://docs.google.com/leaf?id=0B2_elGstDZXqYzhkYTlhZGYtMTI0NS00ZGM2LThhOGUtMWQ3M2JiNjNiMzk5&hl=en
http://docs.google.com/leaf?id=0B2_elGstDZXqYjUzZTFhYjUtYzkwZi00YWMyLTgyMzItYTNlYzdlODI0NTli&hl=en

The file 'minirt.gz' replaces the existing minirt.gz.
The file 'knoppix-halt' replaces the existing /etc/init.d/knoppix-halt.
The file 'knoppix-halt.actual' is a new file that goes in /etc/init.d; make sure it is executable by root,
i.e. chmod u+rx /etc/init.d/knoppix-halt.actual.
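
The installation steps could be scripted along these lines. This is a sketch only: the location of the existing minirt.gz depends on the install and is therefore a parameter, the `.orig` backups are an addition not mentioned above, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical install helper; backs up the originals before replacing them.
# usage: install_fix <dir with downloaded files> <path of existing minirt.gz> [root prefix]
install_fix() {
    src=$1 minirt=$2 root=${3:-}
    initd="$root/etc/init.d"

    # Keep copies of the files that are about to be replaced.
    cp -p "$minirt" "$minirt.orig" &&
    cp -p "$initd/knoppix-halt" "$initd/knoppix-halt.orig" &&
    # Put the three downloaded files in place.
    cp "$src/minirt.gz" "$minirt" &&
    cp "$src/knoppix-halt" "$initd/knoppix-halt" &&
    cp "$src/knoppix-halt.actual" "$initd/knoppix-halt.actual" &&
    chmod u+rx "$initd/knoppix-halt" "$initd/knoppix-halt.actual"
}
```

The optional root prefix makes it possible to try the function on a scratch directory tree before touching a real system.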

Everyone can use the software, but there is no warranty whatsoever, and I am certainly not responsible for any data loss or corruption it might cause. Nevertheless, if you have difficulties, you can post here for community-based support. :)

If for whatever reason you don't trust the included binaries, you can compile them yourselves and still use the included scripts.

kl522
05-19-2010, 03:05 AM
Please use this for knoppix-halt.actual, as I found that a remnant of debugging code was still there:

http://docs.google.com/leaf?id=0B2_elGstDZXqYzhkYTlhZGYtMTI0NS00ZGM2LThhOGUtMWQ3M2JiNjNiMzk5&hl=en

kl522
05-19-2010, 03:21 AM
Sorry, I made a big blunder; the GUI confused me :(

http://docs.google.com/leaf?id=0B2_elGstDZXqMGI0NzZmYzAtZTg0Mi00ODVmLTg0N2YtZmQ3NDhjNzZlYTQ1&sort=name&layout=list&num=50

Whatever the case, inside the script, the line '/bin.static/busybox sh' below the line '#debug' should be commented out.
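
For anyone who already downloaded the earlier copy, the debug line could be commented out with something like the following. This is a sketch that assumes the script contains a line with `#debug` immediately followed by the `/bin.static/busybox sh` line, as described above; the function name is made up.

```shell
#!/bin/sh
# Hypothetical one-off fix: comment out the debug shell that follows "#debug".
# Prints the corrected script on stdout; the input file is left untouched.
disable_debug_shell() {
    sed '/#debug/{
n
s|^\([[:space:]]*\)\(/bin\.static/busybox sh\)|\1#\2|
}' "$1"
}

# Usage:
#   disable_debug_shell knoppix-halt.actual > knoppix-halt.actual.fixed
```

The result goes to a new file so that it can be compared with the original before it is installed.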