pestie
07-28-2005, 05:53 PM
Like several other users here, I have a Toshiba laptop that refuses to fully boot Knoppix versions greater than 3.4 (or maybe 3.6 - I can't remember for sure). It hangs when trying to scan the hard drive partitions and create fstab. Booting with the nofstab cheatcode allows the system to boot, but any attempts to access the hard drive lock up.
As documented in this thread (http://www.knoppix.net/forum/viewtopic.php?t=18245), someone used strace to determine that a call to the read() system call was hanging upon access to the hard drive. I tried the same thing and let it sit for quite some time to allow for any possible timeouts and associated error messages. Indeed, I got a few kernel messages about IRQ problems.
This problem has been drivin' me nuts, as I know this laptop is perfectly capable of running Linux. I've run it successfully with Mandrake 10.1 and Ubuntu 5.04. I've run Mandrake's stock 2.6.8 kernel, Ubuntu's stock 2.6.10 kernel and a custom-compiled 2.6.12 I built from kernel.org sources. Yet no combination of cheat codes or kernel parameters has allowed me to access the hard drive with Knoppix.
Last night I decided to test another theory - that it was somehow the hard drive itself rather than the machine that was causing the problem. More specifically, it's something about how the Knoppix kernel is compiled that conflicts with some characteristic of the hard drive, as other kernels run just fine. So I swapped the 20G Toshiba hard drive for a spare 3.5G Hitachi I had kicking around. Lo and behold, it worked fine! Knoppix booted flawlessly without any cheat codes, and was able to read and write the hard drive.
Here's what I know so far:
It's most likely a kernel problem, as the call to read() never returns.
It's unlikely to be a bug in the kernel source itself, as earlier and later kernels work fine. I haven't tried 2.6.11 yet, but read on for more details on that.
It's definitely something about the hard drive itself that causes the problem.
It's not any of the obvious stuff - turning off ACPI doesn't help, pci=bios and/or pci=biosirq don't help. Failsafe mode still exhibits the same problem.
So now my new mission in life seems to be to figure this out and solve the problem. To that end, the next thing I'll do is download the 2.6.11 sources from kernel.org and use the Config-2.6.11 file from Knoppix to build a test kernel on my working Ubuntu installation, boot with it, and see what happens. Whether it still locks up or not (and, actually, I hope it does), it'll give me one more clue as to where the problem lies. If it locks up like I think it will, I'll start playing with various build parameters, comparing them to the ones I used to build my working 2.6.12 kernel, and see if I can find the culprit. I'll start with parameters in the IDE section of the config and, if I have no luck, will branch out and take a look at the PCI stuff. It could take a while if it's not in one of the more obvious places, but sooner or later I should find it, especially if I throw together a perl script to diff the config files. I also might try custom-building a 2.6.11 kernel from scratch just to see if that works or not. If it's a bug in the IDE driver, maybe a diff between 2.6.11 and 2.6.12 will show something, but I'm no kernel hacker, so if that's it I might just have to wait for Knoppix 4.0 (which I assume has a 2.6.12 kernel).
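For anyone who wants to try the same config comparison, here's a rough sketch of the kind of diff script I have in mind, written in Python rather than Perl. It assumes both files use the usual kernel .config layout ("CONFIG_FOO=value" lines and "# CONFIG_FOO is not set" comments). The function names, the "<absent>" marker and the prefix filter are my own inventions for illustration, not anything that ships with Knoppix or the kernel.

```python
import re
import sys

# Lines such as "CONFIG_FOO=y" and "# CONFIG_FOO is not set".
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map each CONFIG_ option to its value; explicitly unset options map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_configs(a, b, prefix=""):
    """Return sorted (option, value_in_a, value_in_b) tuples for options that differ.

    An option missing from one file is reported as '<absent>', since it may
    simply not exist in that kernel version.
    """
    changes = []
    for key in sorted(set(a) | set(b)):
        if not key.startswith(prefix):
            continue
        value_a = a.get(key, "<absent>")
        value_b = b.get(key, "<absent>")
        if value_a != value_b:
            changes.append((key, value_a, value_b))
    return changes


# Usage: python diffconfig.py CONFIG_A CONFIG_B
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as first, open(sys.argv[2]) as second:
        config_a = parse_config(first.read())
        config_b = parse_config(second.read())
    for option, value_a, value_b in diff_configs(config_a, config_b):
        print(f"{option}: {value_a} -> {value_b}")
```

Run it with the Knoppix config as the first argument and the config of the working 2.6.12 kernel as the second. Passing a prefix to diff_configs narrows the output to one family of options, though the IDE and PCI options do not share a single prefix, so the full diff probably still needs reading by eye.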
I apologize in advance for the sparse details, but I'm at work right now and don't have the hardware in question in front of me to play with. I'll add more specifics later.