diff(1) performance
Edd Barrett
2016-02-13 18:26:46 UTC
Hey,

I've not looked into this at all, but looks like diff(1) could be
optimised:

# With GNU diff:
$ time gdiff -u file1 file2 > out-gdiff
gdiff -u file1 file2 > out-gdiff 0.16s user 0.13s system 101% cpu 0.286 total

# With OpenBSD diff:
$ time diff -u file1 file2 > out-bdiff
diff -u file1 file2 > out-bdiff 1005.24s user 0.33s system 99% cpu 16:46.44 total

Admittedly the files are big (~10MB each), but still.

Good news is, the outcomes are the same bar the diff header syntax:

---8<---
$ gdiff -u out-bdiff out-gdiff
--- out-bdiff 2016-02-13 18:15:42.991267362 +0000
+++ out-gdiff 2016-02-13 17:58:47.153912520 +0000
@@ -1,5 +1,5 @@
---- file1 Wed Feb 4 13:20:14 2015
-+++ file2 Wed Feb 4 13:20:14 2015
+--- file1 2015-02-04 13:20:14.899249552 +0000
++++ file2 2015-02-04 13:20:14.859249471 +0000
 @@ -62,7 +62,7 @@
 TaskState::packetPending:
 HandlerTask::fn:
--->8---
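
For anyone wanting to repeat that check without eyeballing the header lines, here is a minimal Python sketch that drops the "---"/"+++" file header pair from each unified diff and compares what is left. The names strip_file_headers and same_hunks are made up for illustration, and it assumes each diff covers a single pair of files, as in the output above.

```python
def strip_file_headers(diff_text):
    """Return the lines of a unified diff without its '---'/'+++' file headers.

    Assumption: the diff covers one pair of files, so the headers only
    appear before the first '@@' hunk marker.
    """
    body = []
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            in_hunk = True
        # Inside a hunk, '--- ' or '+++ ' may be real content, so keep it.
        if not in_hunk and line.startswith(("--- ", "+++ ")):
            continue
        body.append(line)
    return body


def same_hunks(diff_a, diff_b):
    """True if two unified diffs agree on everything except the file headers."""
    return strip_file_headers(diff_a) == strip_file_headers(diff_b)
```

Something like same_hunks(open("out-bdiff").read(), open("out-gdiff").read()) should then return True if the only differences are the timestamp formats in the headers.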

Tarred up files: http://theunixzoo.co.uk/random/diff_files.tgz
--
Best Regards
Edd Barrett

http://www.theunixzoo.co.uk
Michal Mazurek
2016-02-13 19:52:42 UTC
Post by Edd Barrett
Hey,
I've not looked into this at all, but looks like diff(1) could be
It looks like both NetBSD and FreeBSD use GNU diff.
--
Michal Mazurek
Todd C. Miller
2016-02-13 20:27:01 UTC
GNU diff uses a superior (more modern) algorithm. Changing that
means rewriting the guts of diff(1). The GNU diff code includes
references to papers describing the algorithm.

- todd
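
For reference, the main paper cited in the GNU diff source is, as far as I know, Eugene Myers' "An O(ND) Difference Algorithm and Its Variations" (1986). Its running time grows with the input size N times the size of the difference D, so two large files that differ in only a few places are cheap to compare. Below is a minimal Python sketch of the basic greedy form of that algorithm, computing only the length of the shortest edit script. It is an illustration, not GNU diff's actual code, which uses a more elaborate variation plus heuristics; the name myers_distance is hypothetical.

```python
def myers_distance(a, b):
    """Return the minimum number of insertions plus deletions turning a into b.

    a and b are sequences (for diff, lists of lines). This is the basic
    greedy search from Myers' paper; it does not recover the edit script.
    """
    n, m = len(a), len(b)
    # v[k] is the furthest x reached so far on diagonal k, where k = x - y.
    v = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]        # move down: take an element from b
            else:
                x = v[k - 1] + 1    # move right: drop an element from a
            y = x - k
            # Follow the run of matching elements (a "snake") at no cost.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m
```

With this approach the outer loop runs only D + 1 times, which is why a small diff between two ~10MB files need not take long.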
Edd Barrett
2016-02-14 11:20:34 UTC
Post by Todd C. Miller
GNU diff uses a superior (more modern) algorithm. Changing that
means rewriting the guts of diff(1). The GNU diff code includes
references to papers describing the algorithm.
Right. A task for a rainy day then.

Cheers
--
Best Regards
Edd Barrett

http://www.theunixzoo.co.uk