Knoppix 6.4.4 sg(x) performance



willbrown
03-22-2011, 06:29 PM
I realise this is not a Knoppix problem; I am just hoping for friendly pointers and direction.

I have updated to Knoppix 6.4.4 with kernel 2.6.37 and have discovered a big problem when testing SCSI hard drives using the sg(x) driver, compared to Knoppix 6.0 with a 2.6.28.4 kernel.
One of the many ways I use Knoppix is as a SCSI hard drive test environment. I have been using Knoppix with kernel 2.6.28.4 very successfully for a number of years.

I commonly test 14 SCSI drives simultaneously in an external drive bay using, for example, the SCSI write-same "erase" command. Although we use a product from santools, smartmon-ux, sg_write_same is available from scsitools etc. The performance of the new kernel makes the test operation impractical.

Although I can test one drive at a time under 2.6.37 with the same test time as under 2.6.28.4, as soon as I load additional drives the test times soar from 18m to 134m. To be clear, this does not happen under 2.6.28.4, and the write-same command has very little I/O overhead.

I have tried googling and come across I/O throttling. Am I on the right lines?
I have attempted to research cgroups but the info is mainly about limiting throughput.
I have posted on the Linux kernel mailing list, but those guys are very, very busy.
Any ideas would be appreciated, and I am happy to run any tests etc. There are many reasons why I would like to keep 2.6.37.

Regards,

Will

Forester
03-23-2011, 02:58 PM
Hi willbrown,

What makes you think this is not a Knoppix problem ? Knoppix is a mixture of packages from different Debian repositories. Anything not 'desktop' is likely to have had less test coverage. You may just be using a bunch of packages that are not totally compatible but I guess you know that already.

You're using scsitools. That is a user space package with no kernel module dependencies ? (unlike open-scsi for example). It's not on my 6.4.4 so I guess you installed it yourself. Did you get the 0.10.2.1 version from the squeeze repository or the 0.12.02 version from the sid repository ?

Have you tried using time to figure out whether the excessive time is kernel space CPU, user space CPU or wait time ?
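
A minimal sketch of that measurement, assuming Python 3 is on the test machine. The placeholder command (a child that just sleeps) stands in for the real drive test, and the names time_command and summarise are made up for illustration. It reports the same three numbers as time, plus the part of the elapsed time that was not spent on the CPU at all:

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv and return (wall, user, system) seconds for the child."""
    before = os.times()
    start = time.monotonic()
    subprocess.run(argv, check=False)
    wall = time.monotonic() - start
    after = os.times()
    # children_* only covers child processes that have been waited for,
    # which subprocess.run has done by the time it returns.
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return wall, user, system


def summarise(wall, user, system):
    """Elapsed time not spent on the CPU was spent waiting (I/O, locks, sleep)."""
    waiting = max(wall - user - system, 0.0)
    return {"wall": wall, "user": user, "system": system, "waiting": waiting}


if __name__ == "__main__":
    # Placeholder command (an assumption): substitute the real test invocation.
    command = [sys.executable, "-c", "import time; time.sleep(0.2)"]
    for name, seconds in summarise(*time_command(command)).items():
        print(f"{name:8s}{seconds:8.2f} s")
```

If the run is slow but user and system both stay small, the time is going on waiting rather than on computation.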

Have you checked the Debian BTS to see if your problem is (was) a known problem (that has been fixed) ? Do you know if there is an upstream ? I can't find one.

Posting a question on the Linux kernel mailing list for what, at first glance, is not a kernel issue will just get you ignored. Why not try a Debian mailing list ? debian-knoppix@lists.debian.org perhaps. Keep your question short and specific else you risk a vague reply after a long pause.

Left-of-centre suggestion: have you tried running the 64-bit kernel ?

willbrown
03-23-2011, 06:31 PM
Hello Forester,

Thanks for your input.

You are quite correct; I don't know that it's a kernel problem. My own attempts to find a resolution merely pointed me in this direction. Hence my post here, to avoid this potential dead end. Probably my poor wording.

For example:

With htop & top I can see the following while running write-same * 14 drives:
               2.6.28.4   2.6.37
Cpu (user)     4.6%       0.7%
Cpu (system)   8.5%       1.5%
Load Avg       0.68       12.97
All dynamic figures of course.

Something appears to be stopping the CPUs from doing any work, yet the overall load is up!
Htop - 2.6.37 shows cpu1 @ approx 1 to 2% & cpu2 is doing nothing, fixed at a solid 0.0%.
Htop - 2.6.28.4 cpu1 @ 13.6% cpu2 @ 11.8%
All memory stats appear the same between the two kernels

I indeed added scsitools to my remastered image and it is 0.10-2.1, squeeze.

I have just tried 64-bit but it does not see any SCSI drives, only IDE; I guess there are no 64-bit drivers on the build.

Will try time and the mailing list tips you have suggested.

Regards

Will

Forester
03-23-2011, 08:16 PM
Hmm ... I've heard the 64-bit support is a bit limited on the CD edition. I have the MAXI (DVD) edition and it appears to have a full set of drivers.

I'd be interested in what top reports on the third row for wa, hi, si. All are normally 0 but I expect you've got a large number for wa (waiting on I/O). A large number in either of the other two would be bad news indeed.

The calculation of loadavg is odd. According to the Wikipedia entry for Load Average:


An idle computer has a load number of 0 and each process using or waiting for CPU (the ready queue or run queue) increments the load number by 1. Most UNIX systems count only processes in the running (on CPU) or runnable (waiting for CPU) states. However, Linux also includes processes in uninterruptible sleep states (usually waiting for disk activity), which can lead to markedly different results if many processes remain blocked in I/O due to a busy or stalled I/O system. This, for example, includes processes blocking due to an NFS server failure or to slow media (e.g., USB 1.x storage devices). Such circumstances can result in an elevated load average, which does not reflect an actual increase in CPU use (but still gives an idea on how long users have to wait).

This sounds like your problem.
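
One way to check this directly is to count how many processes are runnable and how many are in uninterruptible sleep while the test runs. The following is a rough sketch, assuming Linux with /proc mounted and Python 3; the function names are made up for illustration. It counts processes rather than individual threads, so it can undercount a multi-threaded program:

```python
import glob


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def state_from_stat(text):
    """Return the one-letter state from the text of a /proc/<pid>/stat file.

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so split after the last ')'.
    """
    return text.rsplit(")", 1)[1].split()[0]


def count_states():
    """Count live processes by state: 'R' runnable, 'D' uninterruptible sleep."""
    counts = {}
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as handle:
                state = state_from_stat(handle.read())
        except (OSError, IndexError):
            continue  # the process exited while we were reading it
        counts[state] = counts.get(state, 0) + 1
    return counts


if __name__ == "__main__":
    try:
        with open("/proc/loadavg") as handle:
            print("load average:", parse_loadavg(handle.read()))
    except OSError:
        print("no /proc/loadavg here; this sketch assumes Linux")
    counts = count_states()
    print("runnable (R):", counts.get("R", 0))
    print("uninterruptible sleep (D):", counts.get("D", 0))
```

A load average near 12 with a dozen or so processes sitting in D and almost nothing in R would fit the Wikipedia description above.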

Here's another left-of-centre suggestion. Do you have an IDE drive or a flash disk drive in your system at all ? With the older 2.6.28.4 kernel, do such drives still have the hda, hdb designation ? At some point in the last few years, folks have switched so all disc drives, not just SCSI drives, use the sda, sdb, ... designation. An old script that assumed sdx was SCSI might start asking a non-SCSI disk questions the drive can't answer, and perhaps your problem is low-level software waiting for responses that never come. I remember when 'untu first switched there was a cock-up that meant some laptops (including mine) never got a response, adding an unwelcome 90 second delay at boot time.

willbrown
03-29-2011, 03:12 PM
Hi Forester,

Sorry for the delay.

Thanks for the reply. As an aside, due to your input, I have built a 64-bit knoppix644 CD. I just didn't realise the remaining 64-bit drivers were only on the DVD version. So thanks for that.

Back to the scsi problem, the wa is firmly fixed at 0.0%.

2.6.37 - Via top I get these figures:
Cpu(s)
between 0.5 to 2.1%us
between 0.8 to 1.3%sy
0.0%ni
between 96.7 to 98.3%id
0.0%wa
0.0%hi
0.0%si
0.0%st
Avg Load varies but is made up of 3 figures, typically they would be:
12.06 10.48 5.89

#########

And to confirm, 2.6.28.4 - Via top I get these figures:
Cpu(s)
between 1.5 to 2.1%us
between 2.6 to 4.6%sy
0.0%ni
between 89.4.7 to 98.2%id
0.0%wa
0.0%hi
0.0%si
0.0%st
Avg Load , typically they would be:
0.23 0.27 0.12

Very Strange...

I have no ide devices installed but with this kernel they are seen as sda not hda etc.

Regards,

Will

Forester
03-29-2011, 08:57 PM
Hi WillBrown,

The figures show the kernel isn't blocked and your program isn't waiting on disk i/o.

If you look up the definitions of load average it is, roughly, a measure of how many processes are queued up waiting for the CPU but on some systems (and Knoppix/Linux may be one of these), it is the number of threads waiting to use the CPU. If sg_write_same creates a thread for each of the 14 disks and then each thread takes some semaphore around some critical operation, then 13 threads will be waiting. Looks like each thread takes the semaphore for a long time, instead of a short one. Looks like a bug in sg_write_same but that is, of course, speculation.
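
To illustrate the kind of serialisation I am speculating about (and only that; this is not how sg_write_same is known to work), here is a toy sketch in Python 3. The name run_workers is made up, and the sleep stands in for one drive's command. Fourteen workers that each hold one shared lock for the whole operation take roughly fourteen times as long as fourteen independent workers:

```python
import threading
import time


def run_workers(workers, hold_seconds, shared_lock=None):
    """Start `workers` threads that each 'work' for hold_seconds.

    With shared_lock set, every thread does its work while holding that lock,
    so only one can make progress at a time. Returns elapsed wall time.
    """
    def work():
        if shared_lock is None:
            time.sleep(hold_seconds)  # stands in for one drive's command
        else:
            with shared_lock:
                time.sleep(hold_seconds)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    start = time.monotonic()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.monotonic() - start


if __name__ == "__main__":
    independent = run_workers(14, 0.05)
    serialised = run_workers(14, 0.05, threading.Lock())
    print(f"independent: {independent:.2f} s")
    print(f"serialised:  {serialised:.2f} s")
```

Note that the sleeping threads here are idle rather than in uninterruptible sleep, so this toy shows only the effect on elapsed time with idle CPUs, not the raised load average.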

On Knoppix 6.4.4, the packages of interest are:



sg3-utils 1.29-1
libsgutils-2-2 1.29-1

BTW, scsitools is a different package - you had me confused for a while there.

Can you have a look to see what versions of these you have with your older Knoppix ? I suggest open a terminal and use grep to filter the output of dpkg -l.
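
If you would rather script the check so that it runs identically on both Knoppix versions, here is a small sketch assuming Python 3.7 or later on a Debian-based system. It uses dpkg-query -W in place of dpkg -l piped through grep, simply because the output is easier to parse; the function name and the name prefixes are my own choices. Packages that were removed but left their config files behind may also show up:

```python
import shutil
import subprocess


def sg_package_versions(listing, prefixes=("sg3-utils", "libsgutils")):
    """Map package name -> version for names starting with one of `prefixes`.

    `listing` is text with one "name<TAB>version" pair per line, as printed by
    dpkg-query -W with the format string used below.
    """
    versions = {}
    for line in listing.splitlines():
        name, _, version = line.partition("\t")
        if version and name.startswith(prefixes):
            versions[name] = version
    return versions


if __name__ == "__main__":
    if shutil.which("dpkg-query") is None:
        print("dpkg-query not found; this sketch assumes a Debian-based system")
    else:
        listing = subprocess.run(
            ["dpkg-query", "-W", "-f", "${Package}\t${Version}\n"],
            capture_output=True, text=True, check=False,
        ).stdout
        for name, version in sorted(sg_package_versions(listing).items()):
            print(name, version)
```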

There are no bug reports against either package in either the Debian or Ubuntu bug trackers.
The project home page is here:


http://freshmeat.net/projects/sg3_utils/

There's no bug list and to report a problem you need to register. There is also:


http://sg.danny.cz/sg/index.html

that looks like the author's home page but it too is hosted on freshmeat.net (?) and the author is Douglas Gilbert, which doesn't sound terribly Czech. There is a mailto: for him there. On the project site, my guess is he is dpgilbert.

The only other thing I can suggest is that one or other of these sites has .deb files for both packages for versions old and new. You might try some of these (on old and new Knoppix) to see if you can find out when the problem was introduced.

Either way, I think you are going to have to contact the author. Sorry I could not help you solve this on your own.