Discussion:
bug in fputwc(3) error reporting
Ingo Schwarze
2015-12-24 21:24:01 UTC
Permalink
Third file, third bug.
SO FAR, EACH AND EVERY BLOODY PLACE I LOOKED AT WAS BROKEN.

When fputwc(3) encounters an encoding error, it neglects to set the
error indicator, just like fgetwc(3) did before i fixed it today.
Setting the error indicator is required by the manual and by the
standard.

Instead, it overrides errno again. That's pointless because
wcrtomb(3) already did that, and is required to do so by the standard.
It is even slightly dangerous because it might hide internal coding
errors in libc. For example, if libc would ever neglect to initialize
the state object correctly, wcrtomb(3) might fail with EINVAL.
That should NOT be plastered over by setting errno to EILSEQ.

Consider this test program:


#include <err.h>
#include <stdio.h>
#include <wchar.h>

int
main(void)
{
wchar_t ws[2];
int irc;

ws[0] = 0xdc00; /* UTF-16 surrogate, invalid code point. */
ws[1] = L'\0';
irc = fputws(ws, stdout);
printf("fputws returned %d\n", irc);
printf("error status %d\n", ferror(stdout));
err(1, "fputws");
}


Output before:

$ ./putws
fputws returned -1
error status 0
putws: fputws: Illegal byte sequence

Output with the patch below:

$ ./putws
fputws returned -1
error status 1
putws: fputws: Illegal byte sequence


OK?
Ingo


Index: fputwc.c
===================================================================
RCS file: /cvs/src/lib/libc/stdio/fputwc.c,v
retrieving revision 1.6
diff -u -p -r1.6 fputwc.c
--- fputwc.c 1 Oct 2015 02:32:07 -0000 1.6
+++ fputwc.c 24 Dec 2015 21:01:10 -0000
@@ -62,7 +62,7 @@ __fputwc_unlock(wchar_t wc, FILE *fp)

size = wcrtomb(buf, wc, st);
if (size == (size_t)-1) {
- errno = EILSEQ;
+ fp->_flags |= __SERR;
return WEOF;
}
Jérémie Courrèges-Anglas
2015-12-26 22:13:13 UTC
Permalink
Ingo Schwarze <***@usta.de> writes:

[...]
Post by Ingo Schwarze
When fputwc(3) encounters an encoding error, it neglects to set the
error indicator, just like fgetwc(3) did before i fixed it today.
Setting the error indicator is required by the manual and by the
standard.
Hmm, the C standard and POSIX have slightly different texts regarding
this.

Quoting POSIX-2013:
-->8--
RETURN VALUE

Upon successful completion, fputwc() shall return wc. Otherwise, it
shall return WEOF, the error indicator for the stream shall be set,
[CX] and errno shall be set to indicate the error.
--<8--

Quoting the C99 draft, section 7.24.3.3:
--8<--
Returns

3 The fputwc function returns the wide character written. If a write
error occurs, the error indicator for the stream is set and fputwc
returns WEOF. If an encoding error occurs, the value of the macro EILSEQ
is stored in errno and fputwc returns WEOF.
-->8--

Note that the C11 draft has the same text.

So, the C standard doesn't say that the error indicator should be set on
the FILE in case of an encoding error, it only speaks about errno being
set to EILSEQ. I'd say that we should follow the C standard here -
after all "This volume of POSIX.1-2008 defers to the ISO C standard".

The same can be said about the fgetwc(3) diff that was recently
committed, and for which I gave you an ok.
Post by Ingo Schwarze
Instead, it overrides errno again. That's pointless because
wcrtomb(3) already did that, and is required to do so by the standard.
It is even slightly dangerous because it might hide internal coding
errors in libc. For example, if libc would ever neglect to initialize
the state object correctly, wcrtomb(3) might fail with EINVAL.
That should NOT be plastered over by setting errno to EILSEQ.
I agree that errno doesn't need to be overriden, but please leave _flags
alone.

[...]
Post by Ingo Schwarze
Index: fputwc.c
===================================================================
RCS file: /cvs/src/lib/libc/stdio/fputwc.c,v
retrieving revision 1.6
diff -u -p -r1.6 fputwc.c
--- fputwc.c 1 Oct 2015 02:32:07 -0000 1.6
+++ fputwc.c 24 Dec 2015 21:01:10 -0000
@@ -62,7 +62,7 @@ __fputwc_unlock(wchar_t wc, FILE *fp)
size = wcrtomb(buf, wc, st);
if (size == (size_t)-1) {
- errno = EILSEQ;
+ fp->_flags |= __SERR;
return WEOF;
}
--
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF DDCC 0DFA 74AE 1524 E7EE
Anthony J. Bentley
2015-12-26 23:17:24 UTC
Permalink
Hi Jérémie,
Post by Jérémie Courrèges-Anglas
Hmm, the C standard and POSIX have slightly different texts regarding
this.
-->8--
RETURN VALUE
Upon successful completion, fputwc() shall return wc. Otherwise, it
shall return WEOF, the error indicator for the stream shall be set,
[CX] and errno shall be set to indicate the error.
...
Post by Jérémie Courrèges-Anglas
So, the C standard doesn't say that the error indicator should be set on
the FILE in case of an encoding error, it only speaks about errno being
set to EILSEQ. I'd say that we should follow the C standard here -
after all "This volume of POSIX.1-2008 defers to the ISO C standard".
The [CX] there means it's an intentional extension of ISO C. This is
more obvious in the HTML copy of POSIX,
http://pubs.opengroup.org/onlinepubs/9699919799/functions/fputwc.html

Quoting more POSIX (the [CX] defintion),

The functionality described is an extension to the ISO C standard.
Application developers may make use of an extension as it is
supported on all POSIX.1-2008-conforming systems.

With each function or header from the ISO C standard, a statement to
the effect that "any conflict is unintentional" is included. That is
intended to refer to a direct conflict. POSIX.1-2008 acts in part as
a profile of the ISO C standard, and it may choose to further
constrain behaviors allowed to vary by the ISO C standard.
--
Anthony J. Bentley
Jérémie Courrèges-Anglas
2015-12-26 23:42:52 UTC
Permalink
Post by Anthony J. Bentley
Hi Jérémie,
Hi,
Post by Anthony J. Bentley
Post by Jérémie Courrèges-Anglas
Hmm, the C standard and POSIX have slightly different texts regarding
this.
-->8--
RETURN VALUE
Upon successful completion, fputwc() shall return wc. Otherwise, it
shall return WEOF, the error indicator for the stream shall be set,
[CX] and errno shall be set to indicate the error.
...
Post by Jérémie Courrèges-Anglas
So, the C standard doesn't say that the error indicator should be set on
the FILE in case of an encoding error, it only speaks about errno being
set to EILSEQ. I'd say that we should follow the C standard here -
after all "This volume of POSIX.1-2008 defers to the ISO C standard".
The [CX] there means it's an intentional extension of ISO C.
The way I read it, [CX] here only affects the last part of the sentence,

"and errno shall be set to indicate the error."

(because the C standard doesn't require errno to be set in all error
cases.)

Thus [CX] here doesn't apply to the part that speaks about the error
indicator.
Post by Anthony J. Bentley
This is
more obvious in the HTML copy of POSIX,
http://pubs.opengroup.org/onlinepubs/9699919799/functions/fputwc.html
Quoting more POSIX (the [CX] defintion),
The functionality described is an extension to the ISO C standard.
Application developers may make use of an extension as it is
supported on all POSIX.1-2008-conforming systems.
With each function or header from the ISO C standard, a statement to
the effect that "any conflict is unintentional" is included. That is
intended to refer to a direct conflict. POSIX.1-2008 acts in part as
a profile of the ISO C standard, and it may choose to further
constrain behaviors allowed to vary by the ISO C standard.
--
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF DDCC 0DFA 74AE 1524 E7EE
Todd C. Miller
2015-12-29 22:34:26 UTC
Permalink
On Sun, 27 Dec 2015 00:42:52 +0100, =?utf-8?Q?J=C3=A9r=C3=A9mie_Courr=C3=A8ges-
Post by Jérémie Courrèges-Anglas
The way I read it, [CX] here only affects the last part of the sentence,
"and errno shall be set to indicate the error."
(because the C standard doesn't require errno to be set in all error
cases.)
Thus [CX] here doesn't apply to the part that speaks about the error
indicator.
Correct, it is the setting of errno that is the extension as can
be seen in the html version where that phrase is bracketed by the
[Option Start] and [Option End] glyphs (see the alt text).

Since POSIX defers to ISO C we should be following the ISO C standard
with respect to behavior when an encoding error occurs. As such,
I've changed my mind and now believe we should not be setting the
error flag in those cases.

- todd
Philip Guenther
2015-12-30 02:45:47 UTC
Permalink
On Tue, Dec 29, 2015 at 2:34 PM, Todd C. Miller
<***@courtesan.com> wrote:
...
Post by Todd C. Miller
Since POSIX defers to ISO C we should be following the ISO C standard
with respect to behavior when an encoding error occurs. As such,
I've changed my mind and now believe we should not be setting the
error flag in those cases.
That sounds like it should be reported to austin-group so they can fix
the misleading wording.


Philip Guenther
Todd C. Miller
2015-12-28 17:28:23 UTC
Permalink
OK ***@.

I believe there are also instances of the same problem in the
non-wide code. In fact, we don't even document that the error
indicator is set in putc(3)! Someone else may wish to tackle that
one :-)

- todd
Ted Unangst
2015-12-30 06:22:17 UTC
Permalink
Post by Philip Guenther
On Tue, Dec 29, 2015 at 2:34 PM, Todd C. Miller
...
Post by Todd C. Miller
Since POSIX defers to ISO C we should be following the ISO C standard
with respect to behavior when an encoding error occurs. As such,
I've changed my mind and now believe we should not be setting the
error flag in those cases.
That sounds like it should be reported to austin-group so they can fix
the misleading wording.
What do solaris and glibc do? I'd expect the wording to be clarified in the
direction of existing practice.
Ingo Schwarze
2016-01-10 18:55:53 UTC
Permalink
Hi,
Post by Philip Guenther
Post by Todd C. Miller
Since POSIX defers to ISO C we should be following the ISO C standard
with respect to behavior when an encoding error occurs. As such,
I've changed my mind and now believe we should not be setting the
error flag in those cases.
That sounds like it should be reported to austin-group so they can fix
the misleading wording.
What do solaris and glibc do? I'd expect the wording to be clarified
in the direction of existing practice.
fgetwc(3):
FreeBSD sets the error flag since Oct 16, 2002 (rev. 105234).
NetBSD sets the error flag since July 3, 2006 (rev. 1.5), referencing SUSv3.
Dragonfly sets the error flag since April 21, 2009,
(e0f95098eeba0176864b9cafe6d69b5b7bc0e73f), sync from FreeBSD.

fputwc(3):
FreeBSD sets the error flag since Oct 16, 2002 (rev. 105234).
NetBSD does NOT set the error flag.
Dragonfly sets the error flag since Sep 29, 2013,
(0d5acd7467c4e95f792ef49fceb3ab8e917ce86b), sync from FreeBSD.


In don't understand the illumos code at all, and the glibc code
looks even worse, but i tested on these systems, and they set the
error flag for encoding errors in fgetwc(3):

SunOS unstable11s 5.11 11.2 sun4u sparc SUNW,SPARC-Enterprise
SunOS unstable10s 5.10 Generic_150400-17 sun4v sparc SUNW,SPARC-Enterprise-T5220
Linux donnerwolke.usta.de 3.16.0-4-686-pae #1 \
SMP Debian 3.16.7-ckt11-1+deb8u3 (2015-08-04) i686 GNU/Linux

This one does not set the error flag:

SunOS unstable9s 5.9 Generic_Virtual sun4u sparc SUNW,SPARC-Enterprise-T5220

Solaris and Linux tests of fputwc(3) are inconclusive, i don't
understand how to set up invalid wchar_t values on Solaris and
Linux.


So, my conclusion is that it's the C standard that is carelessly
worded, not POSIX. I don't think the C standard intends to say
that fgetwc(3) and fputwc(3) are not allowed to set the error
indicator on an encoding error, it just doesn't clearly say that
they are required to. POSIX then adds the additional requirement,
merely failing to make it clear that it's resolving a careless
wording in the C standard.

The interpretation that setting the error indicator is deliberately
forbidden is not reasonable because that would make detection of
encoding errors fragile, complicated, and error-prone.

Assuming the POSIX specification, fgetwc(3) is easy and safe to
use:

if (fgetwc(...) == WEOF) { /* EOF or error */
if (ferror(...)) /* error */
inspect errno for the reason
else /* EOF */
...
}

Assuming setting the error indicator is forbidden, there is one way
of using the interface that is counter-intutive and fragile:

if (fgetwc(...) == WEOF) { /* EOF or error */
if (feof(...)) /* EOF */
...
else /* error, no matter what ferror(...) */
inspect errno for the reason
}

There is another way of using the error indicator that is safe,
but so contorted that we should not recommend it:

int save_errno = errno;
errno = 0;
if (fgetwc(...) == WEOF) { /* EOF or error */
if (errno == EILSEQ) /* encoding error */
...
else if (ferror(...)) /* read error */
inspect errno for the reason
else /* EOF */
...
}
errno = save_errno


I think we should NOT revert the patch i committed that sets the
error indicator in fgetwc(3). It's reasonable and agrees with what
everybody else does. I think we should also commit the following
patch to fputwc(3), for the same reasons, and for consistency.

OK?
Ingo


Index: fputwc.c
===================================================================
RCS file: /cvs/src/lib/libc/stdio/fputwc.c,v
retrieving revision 1.6
diff -u -p -r1.6 fputwc.c
--- fputwc.c 1 Oct 2015 02:32:07 -0000 1.6
+++ fputwc.c 10 Jan 2016 18:49:37 -0000
@@ -62,7 +62,7 @@ __fputwc_unlock(wchar_t wc, FILE *fp)

size = wcrtomb(buf, wc, st);
if (size == (size_t)-1) {
- errno = EILSEQ;
+ fp->_flags |= __SERR;
return WEOF;
}
Todd C. Miller
2016-01-10 19:33:07 UTC
Permalink
Post by Ingo Schwarze
So, my conclusion is that it's the C standard that is carelessly
worded, not POSIX. I don't think the C standard intends to say
that fgetwc(3) and fputwc(3) are not allowed to set the error
indicator on an encoding error, it just doesn't clearly say that
they are required to. POSIX then adds the additional requirement,
merely failing to make it clear that it's resolving a careless
wording in the C standard.
POSIX marks its extensions to ISO C, but this behavior is not
documented as an extension. I suggest you contant the Austin group
for clarification.

- todd
Ingo Schwarze
2016-01-10 21:22:28 UTC
Permalink
Hi Todd,
Post by Todd C. Miller
Post by Ingo Schwarze
So, my conclusion is that it's the C standard that is carelessly
worded, not POSIX. I don't think the C standard intends to say
that fgetwc(3) and fputwc(3) are not allowed to set the error
indicator on an encoding error, it just doesn't clearly say that
they are required to. POSIX then adds the additional requirement,
merely failing to make it clear that it's resolving a careless
wording in the C standard.
POSIX marks its extensions to ISO C, but this behavior is not
documented as an extension. I suggest you contant the Austin group
for clarification.
Done:

http://austingroupbugs.net/view.php?id=1022

Let's wait for feedback, then take that into consideration for our
decision.

Yours,
Ingo
Ingo Schwarze
2016-01-23 20:14:24 UTC
Permalink
Hi,
Post by Ingo Schwarze
Post by Todd C. Miller
Post by Ingo Schwarze
So, my conclusion is that it's the C standard that is carelessly
worded, not POSIX. I don't think the C standard intends to say
that fgetwc(3) and fputwc(3) are not allowed to set the error
indicator on an encoding error, it just doesn't clearly say that
they are required to. POSIX then adds the additional requirement,
merely failing to make it clear that it's resolving a careless
wording in the C standard.
POSIX marks its extensions to ISO C, but this behavior is not
documented as an extension. I suggest you contant the Austin group
for clarification.
http://austingroupbugs.net/view.php?id=1022
Let's wait for feedback, then take that into consideration for our
decision.
So, here is the result. The only answer that can be taken
:: ----------------------------------------------------------------------
:: (0003030) schwarze (reporter) - 2016-01-22 02:02
:: http://austingroupbugs.net/view.php?id=1022#c3030
:: ----------------------------------------------------------------------
:: Re: 0003027
::
:: Thanks for your feedback, geoffclare. To make really sure that i
:: understand correctly what you are saying: Your intention is that POSIX
:: should not be changed, that all the operating systems i listed should
:: continue what they are already doing, and that the C standard should be
:: amended to also require setting the error flag in case of an encoding
:: error. Right?

: I imagine that's what we would propose to the C commitee. Whether
: they accept our recommendation is another matter.

In conclusion, i would consider it a very bad idea to change all the
operating systems to not set the error indicator, violating POSIX
and making correct coding harder, just to please the not very nice
C standard, then possibly change them all a second time once the C
standard gets fixed.

Consequently, i propose to not revert our fgetwc(3) patch and to
commit this fputwc(3) patch, too, making us agree with FreeBSD,
NetBSD, Dragonfly, SunOS, glibc, and POSIX, even tough nominally
violating the C standard (but in a way that seems less dangerous
than the inverse violation of POSIX).

By the way, i'm running with this ever since and didn't notice
any adverse effects.

OK?
Ingo


Index: fputwc.c
===================================================================
RCS file: /cvs/src/lib/libc/stdio/fputwc.c,v
retrieving revision 1.6
diff -u -p -r1.6 fputwc.c
--- fputwc.c 1 Oct 2015 02:32:07 -0000 1.6
+++ fputwc.c 23 Jan 2016 20:00:59 -0000
@@ -62,7 +62,7 @@ __fputwc_unlock(wchar_t wc, FILE *fp)

size = wcrtomb(buf, wc, st);
if (size == (size_t)-1) {
- errno = EILSEQ;
+ fp->_flags |= __SERR;
return WEOF;
}
Todd C. Miller
2016-01-23 20:27:49 UTC
Permalink
Post by Ingo Schwarze
In conclusion, i would consider it a very bad idea to change all the
operating systems to not set the error indicator, violating POSIX
and making correct coding harder, just to please the not very nice
C standard, then possibly change them all a second time once the C
standard gets fixed.
Consequently, i propose to not revert our fgetwc(3) patch and to
commit this fputwc(3) patch, too, making us agree with FreeBSD,
NetBSD, Dragonfly, SunOS, glibc, and POSIX, even tough nominally
violating the C standard (but in a way that seems less dangerous
than the inverse violation of POSIX).
OK ***@.

- todd
Ted Unangst
2016-01-23 22:19:58 UTC
Permalink
Post by Ingo Schwarze
Consequently, i propose to not revert our fgetwc(3) patch and to
commit this fputwc(3) patch, too, making us agree with FreeBSD,
NetBSD, Dragonfly, SunOS, glibc, and POSIX, even tough nominally
violating the C standard (but in a way that seems less dangerous
than the inverse violation of POSIX).
when in doubt, do what the cool kids do. :)

Loading...