Discussion:
Int overflow in diff(1), rcs(1), cvs(1)
Michael McConville
2016-02-09 20:13:22 UTC
Permalink
It looks like a few tools in base rely on two's complement integer
overflow for the hashing algorithm in readhash(). Overflow can easily be
observed using a manual check or a dynamic undefined behavior tool. This
function is also present in rcs(1) and cvs(1). Some code locations of
these overflows are:

/usr/src/usr.bin/diff/diffreg.c:1196
/usr/src/usr.bin/rcs/diff.c:1099
/usr/src/usr.bin/cvs/diff_internals.c:1169

This poses a bit of an issue because (at least in diff(1)) the value
field of struct line is represented with an int and is used in many
places. Changing the type of line.value to something unsigned could have
unintended consequences.

Thoughts? I haven't worked with these tools' code previously so I'm not
sure what the best/safest way of approaching this is.

Michael
Otto Moerbeek
2016-02-10 15:13:29 UTC
Permalink
Post by Michael McConville
It looks like a few tools in base rely on two's complement integer
overflow for the hashing algorithm in readhash(). Overflow can easily be
observed using a manual check or a dynamic undefined behavior tool. This
function is also present in rcs(1) and cvs(1). Some code locations of
/usr/src/usr.bin/diff/diffreg.c:1196
/usr/src/usr.bin/rcs/diff.c:1099
/usr/src/usr.bin/cvs/diff_internals.c:1169
This poses a bit of an issue because (at least in diff(1)) the value
field of struct line is represented with an int and is used in many
places. Changing the type of line.value to something unsigned could have
unintended consequences.
Thoughts? I haven't worked with these tools' code previously so I'm not
sure what the best/safest way of approaching this is.
Michael
I don't think there's a problem in practice.

-Ott
Todd C. Miller
2016-02-10 15:36:42 UTC
Permalink
Post by Michael McConville
It looks like a few tools in base rely on two's complement integer
overflow for the hashing algorithm in readhash(). Overflow can easily be
observed using a manual check or a dynamic undefined behavior tool. This
function is also present in rcs(1) and cvs(1). Some code locations of
/usr/src/usr.bin/diff/diffreg.c:1196
/usr/src/usr.bin/rcs/diff.c:1099
/usr/src/usr.bin/cvs/diff_internals.c:1169
This poses a bit of an issue because (at least in diff(1)) the value
field of struct line is represented with an int and is used in many
places. Changing the type of line.value to something unsigned could have
unintended consequences.
Making those values unsigned int should be sufficient but it is a
fairly intrusive change.

- todd
Todd C. Miller
2016-02-10 16:29:26 UTC
Permalink
Post by Todd C. Miller
Making those values unsigned int should be sufficient but it is a
fairly intrusive change.
Actually, this is trickier than it appears since diff uses the sign
bit in a clever way to save space.

- todd
Martin Natano
2016-02-10 16:56:00 UTC
Permalink
Post by Todd C. Miller
Post by Todd C. Miller
Making those values unsigned int should be sufficient but it is a
fairly intrusive change.
Actually, this is trickier than it appears since diff uses the sign
bit in a clever way to save space.
What do you mean by it being used in a clever way? I see the sign bit
being used as part of the hash itself, like all the other bits in the
sum. That shouldn't change when using unsigned instead. Am I missing
something?

For what it's worth, converting the hash to unsigned at least doesn't
seem to break anything obvious for me. See the diff below (generated
with the patched binary).

--- diffreg.c.orig Mon Feb 8 02:48:35 2016
+++ diffreg.c Mon Feb 8 03:48:44 2016
@@ -160,8 +160,8 @@
};

struct line {
- int serial;
- int value;
+ int serial;
+ unsigned int value;
} *file[2];

/*
@@ -200,7 +200,7 @@
static int skipline(FILE *);
static int isqrt(int);
static int stone(int *, int, int *, int *, int);
-static int readhash(FILE *, int);
+static unsigned int readhash(FILE *, int);
static int files_differ(FILE *, FILE *, int);
static char *match_function(const long *, int, FILE *);
static char *preadline(int, size_t, off_t);
@@ -495,7 +495,8 @@
prepare(int i, FILE *fd, off_t filesize, int flags)
{
struct line *p;
- int j, h;
+ int j;
+ unsigned int h;
size_t sz;

rewind(fd);
@@ -1168,11 +1169,11 @@
/*
* Hash function taken from Robert Sedgewick, Algorithms in C, 3d ed., p 578.
*/
-static int
+static unsigned int
readhash(FILE *f, int flags)
{
int i, t, space;
- int sum;
+ unsigned int sum;

sum = 1;
space = 0;

natano
Todd C. Miller
2016-02-10 17:03:42 UTC
Permalink
Post by Martin Natano
What do you mean by it being used in a clever way? I see the sign bit
being used as part of the hash itself, like all the other bits in the
sum. That shouldn't change when using unsigned instead. Am I missing
something?
I was trying to go a bit further and fix other sign compare issues
and fell down the rabbit hole. See the comment about equiv at the
top of the file.
Post by Martin Natano
For what it's worth, converting the hash to unsigned at least doesn't
seem to break anything obvious for me. See the diff below (generated
with the patched binary).
Just changing the hash value to unsigned is probably OK as it is
only used to compare to otherh hash values.

- todd

Loading...