
View Full Version : The infamous Toshiba laptop problem - getting closer!



pestie
07-28-2005, 05:53 PM
Like several other users here, I have a Toshiba laptop that refuses to fully boot Knoppix versions greater than 3.4 (or maybe 3.6 - I can't remember for sure). It hangs when trying to scan the hard drive partitions and create fstab. Booting with the nofstab cheatcode allows the system to boot, but any attempt to access the hard drive locks up the system.

As documented in this thread (http://www.knoppix.net/forum/viewtopic.php?t=18245), someone used strace to determine that a call to the read() system call was hanging upon access to the hard drive. I tried the same thing and let it sit for quite some time to allow for any possible timeouts and associated error messages. Indeed, I got a few kernel messages about IRQ problems.

This problem has been drivin' me nuts, as I know this laptop is perfectly capable of running Linux. I've run it successfully with Mandrake 10.1 and Ubuntu 5.04. I've run Mandrake's stock 2.6.8 kernel, Ubuntu's stock 2.6.10 kernel and a custom-compiled 2.6.12 I built from kernel.org sources. Yet no combination of cheat codes or kernel parameters has allowed me to access the hard drive with Knoppix.

Last night I decided to test another theory - that it was somehow the hard drive itself rather than the machine that was causing the problem. More specifically, it's something about how the Knoppix kernel is compiled that conflicts with some characteristic of the hard drive, as other kernels run just fine. So I swapped the 20G Toshiba hard drive for a spare 3.5G Hitachi I had kicking around. Lo and behold, it worked fine! Knoppix booted flawlessly without any cheat codes, and was able to read and write the hard drive.

Here's what I know so far:

It's most likely a kernel problem, as the call to read() never returns.
It's unlikely to be a kernel bug, as earlier and later kernels work fine. I haven't tried 2.6.11 yet, but read on for more details on that.
It's definitely something about the hard drive itself that causes the problem.
It's not any of the obvious stuff - turning off ACPI doesn't help, pci=bios and/or pci=biosirq don't help. Failsafe mode still exhibits the same problem.

So now my new mission in life seems to be to figure this out and solve the problem. To that end, the next thing I'll do is download the 2.6.11 sources from kernel.org and use the Config-2.6.11 file from Knoppix to build a test kernel on my working Ubuntu installation, boot with it, and see what happens. Whether it still locks up or not (and, actually, I hope it does), it'll give me one more clue as to where the problem lies.

If it locks up like I think it will, I'll start playing with various build parameters, comparing them to the ones I used to build my working 2.6.12 kernel, and see if I can find the culprit. I'll start with parameters in the IDE section of the config and, if I have no luck, will branch out and take a look at the PCI stuff. It could take a while if it's not in one of the more obvious places, but sooner or later I should find it, especially if I throw together a Perl script to diff the config files.

I also might try custom-building a 2.6.11 kernel from scratch just to see if that works or not. If it's a bug in the IDE driver, maybe a diff between 2.6.11 and 2.6.12 will show something, but I'm no kernel hacker, so if that's it I might just have to wait for Knoppix 4.0 (which I assume has a 2.6.12 kernel).
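For anyone who wants to try the same config comparison, here is a minimal sketch in Python rather than Perl. It is not the script used in this thread; the function names and the default prefix filter (CONFIG_IDE, CONFIG_BLK_DEV) are my own assumptions about what "the IDE section" covers, and the sample config fragments are made up for illustration.

```python
import re

# Kernel .config lines look like "CONFIG_FOO=y" or "# CONFIG_FOO is not set".
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return {option: value}; options marked 'is not set' map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_configs(old, new, prefixes=("CONFIG_IDE", "CONFIG_BLK_DEV")):
    """Return sorted (option, old_value, new_value) tuples that differ.

    An option missing from one config is treated as 'n'.
    Pass prefixes=() to compare every option instead of the IDE-ish ones.
    """
    changes = []
    for name in sorted(set(old) | set(new)):
        if prefixes and not name.startswith(tuple(prefixes)):
            continue
        a, b = old.get(name, "n"), new.get(name, "n")
        if a != b:
            changes.append((name, a, b))
    return changes


# Made-up fragments standing in for the two real config files.
failing = parse_config("# CONFIG_IDEDMA_PCI_AUTO is not set\nCONFIG_ACPI=y\n")
working = parse_config("CONFIG_IDEDMA_PCI_AUTO=y\n")
for name, a, b in diff_configs(failing, working):
    print(f"{name}: {a} -> {b}")
```

To use it on real files, read each config with open(path).read() and pass the text to parse_config. Starting with the narrow prefix filter and widening it only if nothing turns up matches the "IDE first, then PCI" plan above.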

I apologize in advance for the sparse details, but I'm at work right now and don't have the hardware in question in front of me to play with. I'll add more specifics later.

pestie
08-02-2005, 03:35 AM
I found it!!

The short version: The ALI15X3 IDE chipset driver requires that the kernel be compiled with DMA as default. It says so right in the help for that kernel option, in fact. A few revisions back, Knoppix changed the kernel to default to no-DMA. As a result, any attempt to access certain hard drives connected to an ALi 15X3 IDE chipset fails.

The details: I took the config file from the /boot directory of Knoppix and built a 2.6.11 kernel with it on my Toshiba laptop running Ubuntu. Attempting to boot from a kernel compiled that way resulted in "hda: unknown partition table" errors and a kernel panic (as it was trying to boot a root filesystem on a partition it couldn't even see). I changed that one option - CONFIG_IDEDMA_PCI_AUTO - from disabled to enabled, recompiled, and the system booted flawlessly.

The help for the ALI15X3 chipset support option says explicitly, "If you say Y here, you also need to say Y to 'Use DMA by default when available', above. Please read the comments at the top of <file:drivers/ide/pci/ali15x3.c>". However, I looked at that file and there are no comments regarding DMA anywhere.
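The dependency described in that help text can be checked mechanically before building or remastering. The following is only a sketch: CONFIG_IDEDMA_PCI_AUTO is the option named in this thread, while CONFIG_BLK_DEV_ALI15X3 is my assumption for the ALi driver's option name, so confirm it against your own config file. The function names are hypothetical.

```python
def option_enabled(config_text, option):
    """True if the config contains 'OPTION=y' (built in) or 'OPTION=m' (module)."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(option + "="):
            return line.split("=", 1)[1] in ("y", "m")
    # Absent, or present only as "# OPTION is not set".
    return False


def ali_dma_problem(config_text,
                    driver="CONFIG_BLK_DEV_ALI15X3",      # assumed option name
                    dma_default="CONFIG_IDEDMA_PCI_AUTO"):  # named in this thread
    """True when the ALi driver is enabled but DMA-by-default is not."""
    return (option_enabled(config_text, driver)
            and not option_enabled(config_text, dma_default))


# Made-up fragment resembling the failing configuration described above.
sample = "CONFIG_BLK_DEV_ALI15X3=y\n# CONFIG_IDEDMA_PCI_AUTO is not set\n"
if ali_dma_problem(sample):
    print("ALi driver enabled without DMA-by-default")
```

Feeding it the text of the config file from /boot should flag a kernel built the way the failing one was, assuming the option names match.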

So, my next thought was to try booting with the dma cheatcode. It must be the one cheatcode I never tried. I was so busy trying various cheatcodes to disable various features that it never occurred to me to try one that enabled something. Lo and behold, it did help! Generating /etc/fstab no longer broke the startup scripts and I got as far as a command prompt (I booted with "knoppix 2")! I was all excited until I realized that, at boot time, the ALI15X3 driver was still unable to see the partitions on the drive, which meant that the kernel was also blinded. fdisk no longer failed, and /dev/hda was readable, but any attempt to mount /dev/hda[123] resulted in an error. Yes, it's strange - the kernel sees and reads /dev/hda fine, but refuses to see/read /dev/hda[123]! This was not the case when I compiled my own kernel with DMA on by default, so there must be something "magical" about that kernel config option as far as the ALI15X3 driver is concerned.

So, if you're one of the unlucky ones who has a laptop with this chipset, your options seem to be:

Recompile the 2.6.11 kernel with DMA-by-default and remaster Knoppix. There's ample documentation on how to do that in the wiki and on the forums.
Swap hard drives. For some reason, swapping my 20G Toshiba drive with an old 3.5G Hitachi drive made the problem go away. Maybe the old Hitachi drive didn't support DMA, but that's just a guess. I don't promise that everything will work flawlessly in that case, either - I only tried it long enough to mount a partition and see everything work.
Convince Klaus Knopper to go back to the old DMA-by-default way of compiling the kernel.

I suppose there's a slim chance that using the dma cheatcode along with some other cheatcodes will magically make things work, but I doubt it. At this point I think I'm pretty much done messing with it. I have two other laptops that run Knoppix just fine (albeit very slowly, as they're quite old) and my troublesome Toshiba is running Ubuntu quite nicely, so unless someone comes up with something new (and please post in this thread if you do!) I think I'm just going to back away from this mess. But at least this particular mystery has been solved, and hopefully others who are having this problem will find this thread in the future.

brim146
10-16-2005, 07:59 PM
I've been wondering why I can't get knoppix to load up anymore! It sure would be nice if a kernel with dma could be included! Knoppix is great for everything other than my laptop anyway.

brim146
10-16-2005, 08:19 PM
Just got knoppix 4.0.2 to load up with:
boot: knoppix dma noswap
(I have the noswap option there because I'm intending to run partimage later, you'd probably want to leave that out)

So now you can boot, but your partitions won't be recognized correctly (i.e. if you fire up partimage, not a single partition will be listed). In this topic (http://www.knoppix.net/forum/viewtopic.php?t=6314&highlight=hdparm) I found that running hdparm -z /dev/hda will get your partitions recognized. After that, partimage should list the partitions.
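One way to confirm that the kernel actually picked up the partitions after hdparm -z, without starting partimage, is to look at /proc/partitions. Here is a small sketch that parses that file's text. The function name is hypothetical, and the sample below uses invented block counts rather than output from the laptop in this thread.

```python
def partitions_of(proc_partitions_text, disk="hda"):
    """Return the partition names the kernel lists for `disk`, e.g. ['hda1']."""
    names = []
    for line in proc_partitions_text.splitlines():
        fields = line.split()
        # Data rows have four fields: major, minor, #blocks, name.
        # The header row and blank lines are skipped by this check.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        name = fields[3]
        # Keep "hda1", "hda2", ... but not the whole-disk entry "hda".
        if name.startswith(disk) and name[len(disk):].isdigit():
            names.append(name)
    return names


# Invented sample in the layout of /proc/partitions.
sample = """major minor  #blocks  name

   3     0   19535040 hda
   3     1    1000000 hda1
   3     2   18000000 hda2
"""
print(partitions_of(sample))
```

On a live system, pass open("/proc/partitions").read() instead of the sample. An empty list before running hdparm -z and a non-empty one afterwards would match the behaviour described above.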

Hope this helps somebody :)