Commit 73d2764

wolfospealain authored and chadnetzer committed
Fix clustering problem.
The problem arises when two identical files each already have hardlinks of their own. hardlink.py can then keep moving names from one group of links to the other without ever saving space, because the data behind a group is only freed once its last link is gone. This can happen on large disks holding backup files that have been stored and occasionally hardlinked over the years. The solution is to always use the file with the most hardlinks as the source: if stat_info.st_nlink > temp_stat_info.st_nlink, filename is the source; otherwise temp_filename is.
1 parent 5f28a71 commit 73d2764

1 file changed: +5 -1 lines changed

hardlink.py

Lines changed: 5 additions & 1 deletion
@@ -247,7 +247,11 @@ def hardlink_identical_files(filename, stat_info, options):
                 for (temp_filename, temp_stat_info) in file_hashes[file_hash]:
                     if are_files_hardlinkable(work_file_info, (temp_filename, temp_stat_info),
                                               options):
-                        hardlink_files(temp_filename, filename, temp_stat_info, options)
+                        # Always use the file with the most hardlinks as the source
+                        if (stat_info.st_nlink > temp_stat_info.st_nlink):
+                            hardlink_files(filename, temp_filename, temp_stat_info, options)
+                        else:
+                            hardlink_files(temp_filename, filename, temp_stat_info, options)
                         break
                 else:
                     # The file should NOT be hardlinked to any of the other
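
The selection rule from the commit message can be tried in isolation with the rough sketch below. It is not code from hardlink.py: pick_link_direction and relink are hypothetical helpers, the ".tmp_link" scratch suffix is an assumption, and the sketch assumes both paths hold identical content on the same filesystem. It only illustrates why keeping the file with the larger st_nlink as the source lets the smaller group of links drain until its data is freed.

```python
import os


def pick_link_direction(filename, temp_filename):
    """Return (source, destination), keeping the path with more hardlinks.

    Hypothetical helper for illustration; not part of hardlink.py.
    """
    nlink = os.stat(filename).st_nlink
    temp_nlink = os.stat(temp_filename).st_nlink
    # Same comparison as the commit: `filename` becomes the source only when
    # it has strictly more links; on a tie the previously seen file stays
    # the source.
    if nlink > temp_nlink:
        return filename, temp_filename
    return temp_filename, filename


def relink(source, destination):
    """Replace `destination` with a hardlink to `source` (simplified)."""
    scratch = destination + ".tmp_link"  # assumed scratch name
    os.link(source, scratch)             # new name for the source's inode
    os.replace(scratch, destination)     # atomically swap it into place
```

Calling relink(*pick_link_direction(a, b)) for each name in the smaller group moves every name onto the larger group's inode, so the smaller group's link count reaches zero and its blocks are released.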
