Thread: small script to re-hardlink *ALL* equal files

  1. #1
    Junior Member
    Join Date
    Jan 2003
    Posts
    3

    small script to re-hardlink *ALL* equal files

    Have fun. This is not well tested. It will probably work fine with read-only CD images, but it will certainly cause problems with HD installs, since there are equal files which *should not* be hardlinked! Take care.
    One could probably add filters to the 'find' command to exclude such files. I did this as a fast hack for CD remastering, so I have no need for better selections. I'm working on a much more advanced file unifier for the vserver project. When that is finished I will announce it here too (ETA: mid February, I'm on holiday in the meantime).

    Results (compressed image):
    691M Jan 2 13:49 KNOPPIX.new
    664M Jan 2 17:25 KNOPPIX.new.unified



    PS: Klaus ... if you use this for the main Knoppix, the free space you get is dedicated for a recent version of 'distcc', thanks.

    Code:
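    #!/bin/sh
    # Hardlink every file under $1 whose MD5 sum equals that of an earlier file.
    
    find "$1" -type f -exec md5sum {} \; |
    sort |
    {
            read -r sum file
            while [ -n "$sum" ]; do
                    # Link each following file with the same checksum to "$file".
                    while read -r sum2 file2 && [ "$sum" = "$sum2" ]; do
                            ln -f "$file" "$file2"
                    done
                    # First line with a different checksum starts the next group.
                    sum=$sum2
                    file=$file2
            done
    }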

  2. #2
    Junior Member registered user
    Join Date
    Dec 2002
    Location
    Germany
    Posts
    22

    Re: small script to re-hardlink *ALL* equal files

    Quote Originally Posted by cehteh
    Have fun. This is not well tested. It will probably work fine with read-only CD images, but it will certainly cause problems with HD installs, since there are equal files which *should not* be hardlinked! Take care.
    One could probably add filters to the 'find' command to exclude such files. I did this as a fast hack for CD remastering, so I have no need for better selections. I'm working on a much more advanced file unifier for the vserver project. When that is finished I will announce it here too (ETA: mid February, I'm on holiday in the meantime).

    This looks a little dangerous. Do you realize that this makes all empty (0-byte) files hardlinks to each other, even lock files and log files? Also, some files with the same checksum may NOT actually be the same file, and files that happen to have the same contents are not always meant to stay the same.

    I use a harddisk-installed Debian system that should have all hardlinks intact. If you copied your system from CD, this may not always be the case. You should use rsync -Ha for copying to preserve the hardlinks.
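    To make the risk concrete, here is a minimal sketch of what happens after two equal files are hardlinked on a writable system. The directory and file names are throwaway examples created with mktemp, not anything from a real install.

```shell
#!/bin/sh
# Demo: two names hardlinked together share one inode, so a write through
# either name shows up under both. All paths are throwaway examples.
demo=$(mktemp -d)
echo "same" > "$demo/a.conf"
echo "same" > "$demo/b.conf"

ln -f "$demo/a.conf" "$demo/b.conf"     # what the unify script does
echo "changed" >> "$demo/a.conf"        # edit only a.conf ...

cat "$demo/b.conf"                      # ... and b.conf shows the new line too

# Count the files that now have more than one name.
links=$(find "$demo" -type f -links +1 | wc -l)
echo "files with more than one link: $links"
```

    On a read-only CD image nothing can write through either name, which is why the hack is much safer there. Remove the demo directory with rm -rf "$demo" when done.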

    -KK

  3. #3
    Junior Member
    Join Date
    Jan 2003
    Posts
    3
    Sure, I know it is a fast hack. The next tool (the vserver unifier) will be much better and will resolve *all* the problems you pointed out.

    >This looks a little dangerous. Do you realize that this makes all empty (0 bytes) files hardlinks to each other,

    use '! -size 0' within the find

    > locks and logfiles
    there are no locks when mastering,
    logs are not important when kept only in ram ..
    ! -path '*/var/log*' will fix it anyways
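    A small sketch of how the two filters above behave when combined. The tree under "$root" is a hypothetical one built just for this demo; on a real remaster you would point find at the unpacked image instead.

```shell
#!/bin/sh
# Sketch: list only the files the unifier should consider.
# "$root" is a hypothetical tree built here for the demo.
root=$(mktemp -d)
mkdir -p "$root/var/log" "$root/usr/share"
: > "$root/usr/share/empty"                 # zero-byte file, skipped by ! -size 0
echo "log line" > "$root/var/log/messages"  # log file, skipped by ! -path
echo "data" > "$root/usr/share/data"        # ordinary candidate

candidates=$(find "$root" -type f ! -size 0 ! -path '*/var/log*')
echo "$candidates"
```

    Only usr/share/data is listed. The same two tests can be put in front of -exec md5sum in the script.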

    >Also, some files with the same checksum may NOT be actually the same file or have desirably the same contents.
    With md5sum the chance of accidental duplicates is microscopic (even less), but you are right: for the vserver unifier I don't use checksumming.

    Remember, this is meant as a fast hack and to show a 'general' concept for reducing the disk usage of Knoppix.

    cheers :)

  4. #4
    Junior Member
    Join Date
    Jan 2003
    Posts
    3

    last update

    - compares files
    - only hardlink files which are not hardlinked yet
    - permission and uid/gid must be equal
    - size 0 files are excluded
    - can be called with multiple arguments which are used directly by find
    - writes a script which can be used to undo all hardlinks
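    The per-pair checks in the list above can be sketched roughly as follows. This assumes GNU stat (the -c option), and should_link is a hypothetical helper name used only for illustration; it is not part of knoppix.unify.

```shell
#!/bin/sh
# Sketch of the per-pair checks, assuming GNU stat is available.
# should_link is a hypothetical helper, not part of knoppix.unify.
should_link() {
        [ -s "$1" ] && [ -s "$2" ] &&                       # size-0 files are excluded
        [ "$(stat -c %i "$1")" != "$(stat -c %i "$2")" ] && # not hardlinked to each other yet
        [ "$(stat -c '%a %u %g' "$1")" = "$(stat -c '%a %u %g' "$2")" ] && # same mode, uid, gid
        cmp -s "$1" "$2"                                    # contents are byte-for-byte equal
}

demo=$(mktemp -d)
echo "data" > "$demo/one"
echo "data" > "$demo/two"
chmod 644 "$demo/one" "$demo/two"

if should_link "$demo/one" "$demo/two"; then first=yes; else first=no; fi
ln -f "$demo/one" "$demo/two"
if should_link "$demo/one" "$demo/two"; then second=yes; else second=no; fi
echo "before linking: $first, after linking: $second"
```

    The second call answers "no" because both names already point at the same inode. Comparing inode numbers alone is only meaningful inside one filesystem, which is also the only place a hardlink can be made.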

    I will not do any more updates on this and will focus on the vserver unifier now (it will get a small control language and many more sophisticated options).


    example usage:
    ./knoppix.unify Path/To/KNOPPIX ! -path "*/var/log/*" >undo_hardlinks.sh

    File knoppix.unify:
    Code:
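    #!/bin/bash
    # stdout carries only the undo script (run it with bash).
    
    find "$@" -type f ! -size 0 -exec md5sum {} \; |
    sort |
    {
            read -r sum file
            while [ -n "$sum" ]; do
                    # Link only if checksums match, the inodes differ, mode/uid/gid
                    # agree and the contents are identical. stat -c (GNU) replaces
                    # the original ls/cut parsing; cmp -s stays silent on stdout.
                    while read -r sum2 file2 &&
                            [ "$sum" = "$sum2" ] &&
                            [ "$(stat -c %i "$file")" != "$(stat -c %i "$file2")" ] &&
                            [ "$(stat -c '%a %u %g' "$file")" = "$(stat -c '%a %u %g' "$file2")" ] &&
                            cmp -s "$file" "$file2"
                    do
                            # %q quotes file names so the undo line survives odd characters.
                            ln -f "$file" "$file2" &&
                            printf 'rm -f %q; cp -pf %q %q\n' "$file2" "$file" "$file2"
                    done
                    sum=$sum2
                    file=$file2
            done
    }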
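
    For illustration, a rough sketch of what running the generated undo script does to one linked pair. The files are throwaway examples, the undo line is written by hand in the same rm/cp shape the script prints, and stat -c assumes GNU coreutils.

```shell
#!/bin/sh
# Sketch: undo one hardlink with a line shaped like the script's output.
# All paths are throwaway examples; stat -c assumes GNU coreutils.
demo=$(mktemp -d)
echo "data" > "$demo/one"
echo "data" > "$demo/two"
ln -f "$demo/one" "$demo/two"
linked=$(stat -c %h "$demo/one")        # link count while unified

# One undo line, as it would appear in undo_hardlinks.sh.
echo "rm -f \"$demo/two\"; cp -pf \"$demo/one\" \"$demo/two\"" > "$demo/undo_hardlinks.sh"
sh "$demo/undo_hardlinks.sh"

restored=$(stat -c %h "$demo/one")      # link count after the undo
echo "links before undo: $linked, after undo: $restored"
```

    After the undo both names exist again as separate copies with the same mode and timestamps, because cp -p preserves them.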
