-
small script to re-hardlink *ALL* equal files
Have fun. This is not well tested: it will probably work fine with read-only CD images, but it will certainly have problems with hard-disk installs, since there are identical files which *should not* be hardlinked! Take care.
One could probably add filters to the 'find' command to exclude such files. I did this as a fast hack for CD remastering, so I have no need for better selection. I'm working on a much more advanced file unifier for the vserver project. When that is finished I will announce it here too (ETA: mid-February; I'm on holiday for the next while).
Results (compressed image):
691M Jan 2 13:49 KNOPPIX.new
664M Jan 2 17:25 KNOPPIX.new.unified
PS: Klaus ... if you use this for the main Knoppix, the free space you get is dedicated to a recent version of 'distcc', thanks.
Code:
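#!/bin/sh
# Hardlink all files below $1 that have the same MD5 checksum.
find "$1" -type f -exec md5sum {} \; |
sort |
{
    read -r sum file
    while [ -n "$sum" ]; do
        # Link every following file with the same checksum to the first one.
        while read -r sum2 file2 && [ "$sum" = "$sum2" ]; do
            ln -f "$file" "$file2"
        done
        sum=$sum2
        file=$file2
    done
}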
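Before letting the script loose, it can help to see which files it would treat as equal. The following is only a sketch of such a dry run, not part of the original script: `list_duplicates` is a made-up helper name, and it assumes GNU `uniq`, since `-w` and `--all-repeated` are GNU extensions.

```shell
#!/bin/sh
# Dry run: print groups of files with identical MD5 sums, with a blank
# line between groups. Nothing is linked or modified.
# Assumption: GNU uniq (-w32 compares only the 32-character checksum).
list_duplicates() {
    find "$1" -type f -exec md5sum {} + |
    sort |
    uniq -w32 --all-repeated=separate
}

# Illustrative call (the path is a placeholder):
#   list_duplicates Path/To/KNOPPIX
```

Each printed line has the checksum followed by the file name, so the groups can be reviewed by hand before anything is hardlinked.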
-
Re: small script to re-hardlink *ALL* equal files
In reply to the first post above, originally posted by cehteh:
This looks a little dangerous. Do you realize that this makes all empty (0 bytes) files hardlinks to each other, even locks and logfiles? Also, some files with the same checksum may NOT actually be the same file, or may not be meant to keep the same contents.
I use a harddisk-installed Debian system that should have all hardlinks intact. If you copied your system from CD, this may not always be the case. You should use rsync -Ha for copying to preserve the hardlinks.
-KK
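One rough way to check whether a copy kept its hardlinks is to count the multiply-linked files on both sides. This is a sketch under assumptions: `count_hardlinked` is a hypothetical helper, the paths in the comments are placeholders, and the counts are only comparable if all links point to files inside the copied tree.

```shell
#!/bin/sh
# Count regular files below a directory that have more than one link.
count_hardlinked() {
    find "$1" -type f -links +1 | wc -l | tr -d ' '
}

# Illustrative usage (placeholder paths):
#   rsync -Ha /mnt/source/ /mnt/target/
#   count_hardlinked /mnt/source
#   count_hardlinked /mnt/target
```

If the second number is much smaller than the first, the copy probably turned hardlinks into separate files.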
-
Sure, I know it is a fast hack. The next tool (the vserver unifier) will be much better and will resolve *all* the problems you pointed out.
>This looks a little dangerous. Do you realize that this makes all empty (0 bytes) files hardlinks to each other,
use '! -size 0' within the find
> locks and logfiles
There are no locks when mastering,
and logs are not important when they are kept only in RAM.
! -path '*/var/log*' will fix it anyway.
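Put together, the two filters could look roughly like this. It is a sketch only: `list_candidates` is a made-up name, and the pattern `'*/var/log/*'` is just one possible way to write the exclusion.

```shell
#!/bin/sh
# List hardlink candidates below a directory, skipping empty files
# and everything underneath any var/log directory.
list_candidates() {
    find "$1" -type f ! -size 0 ! -path '*/var/log/*'
}

# Illustrative call (the path is a placeholder):
#   list_candidates Path/To/KNOPPIX
```

The same expressions can be placed in front of `-exec md5sum {} \;` in the script's find command.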
>Also, some files with the same checksum may NOT be actually the same file or have desirably the same contents.
With md5sum the chance of accidental collisions is microscopic (even less), but you are right: for the vserver unifier I don't use checksumming.
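For anyone who does not want to rely on the checksum alone, a byte-for-byte comparison before linking is cheap to add. The sketch below is not the vserver unifier's method (the thread does not show it); `link_if_identical` is a hypothetical helper name.

```shell
#!/bin/sh
# Replace $2 by a hardlink to $1, but only if both files have
# byte-for-byte identical contents. cmp -s prints nothing and reports
# the result through its exit status.
link_if_identical() {
    if cmp -s "$1" "$2"; then
        ln -f "$1" "$2"
    else
        return 1
    fi
}

# Illustrative call (file names are placeholders):
#   link_if_identical keep.txt duplicate.txt
```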
Remember, this is meant as a fast hack and to show a 'general' concept for reducing the disk usage of Knoppix.
cheers :)
-
Last update:
- compares file contents, not just checksums
- only hardlinks files which are not hardlinked yet
- permissions and uid/gid must be equal
- size 0 files are excluded
- can be called with multiple arguments, which are passed directly to find
- writes a script which can be used to undo all hardlinks
I will not do any more updates on this and will focus on the vserver unifier now (it will get a small control language and tons of more sophisticated options).
example usage:
./knoppix.unify Path/To/KNOPPIX ! -path "*/var/log/*" >undo_hardlinks.sh
File knoppix.unify:
Code:
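#!/bin/bash
# Usage: knoppix.unify <find arguments...> >undo_hardlinks.sh
# Hardlinks identical files and writes an undo script to stdout.

# Mode, uid and gid of a file (needs GNU stat).
meta() { stat -c '%a %u %g' "$1"; }

find "$@" -type f ! -size 0 -exec md5sum {} \; |
sort |
{
    read -r sum file
    while [ -n "$sum" ]; do
        # Link file2 to file only if the checksums match, the two are not
        # already the same inode, mode/uid/gid agree and the contents are equal.
        # cmp -s stays silent, so nothing but undo commands reaches stdout.
        while read -r sum2 file2 &&
              [ "$sum" = "$sum2" ] &&
              ! [ "$file" -ef "$file2" ] &&
              [ "$(meta "$file")" = "$(meta "$file2")" ] &&
              cmp -s "$file" "$file2"
        do
            ln -f "$file" "$file2" &&
            printf 'rm -f %q; cp -pf %q %q\n' "$file2" "$file" "$file2"
        done
        sum=$sum2
        file=$file2
    done
}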
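Each line of the generated undo script removes one link and copies the content back as a separate file. The sketch below replays that step on a single pair so the effect can be checked by hand; `unlink_copy` and `link_count` are hypothetical helper names, not part of knoppix.unify.

```shell
#!/bin/sh
# Undo one hardlink the same way a line of the undo script does:
# remove the link, then copy the content back as an independent file.
unlink_copy() {
    # $1 = file that stays, $2 = link to turn back into a separate copy
    rm -f "$2" && cp -pf "$1" "$2"
}

# Number of links of a file, taken from the second field of ls -ld.
link_count() {
    ls -ld "$1" | awk '{print $2}'
}

# Illustrative usage (file names are placeholders):
#   link_count a.txt        # more than 1 while a.txt is still hardlinked
#   unlink_copy a.txt b.txt
#   link_count a.txt        # one less than before
```

To undo everything at once, the saved file is simply run as a script, for example `bash undo_hardlinks.sh`.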