Tuesday, August 4, 2009

That Ambiguous Comma Operator

The ternary 'conditional operator' can be tricky sometimes. If you've ever looked at The 12 Bugs of Christmas and been frightened off by them, congratulations on your humanity.

After seeing a particularly confusing piece of code today, of the form:

func(a, b ? c : d, e);

I began to ponder whether func takes 2 arguments or 3, and what the heck I should assume is going on here. It also (for the first time) made the question "does d, e evaluate to d, or to e?" relevant to me. So I wrote a test program!

First, let's see what happens when we use commas in the ternary operator during assignment:

num = v0 ? v1 : v2, v3; /* each vn is an int variable holding the value n */

After this line, num holds 2. Simple enough, let's move on.

What happens within 'func(0 ? 1 : 2, 3);'?

This case ends up being straightforward. The comma is taken as part of the argument list, not as part of the conditional operator, regardless of the function's prototype. I'd started out thinking the compiler might do whatever works (treat the comma as an argument separator if func took 2 ints, or as a comma operator inside the conditional if func took 1 int), but it does not. In an argument list the comma is strictly a separator. Neat, but I'm still curious what would happen if it were used as a comma operator here. Should be the same as before, right? Let's force it to be an operator with some parentheses and find out.

func((0 ? 1 : 2, 3), -1);

This is the call I tested. The second argument (-1) isn't necessary for anything other than illustration, and it meant I didn't have to rewrite the function. Now it's worth specifying what func does: it takes two ints and prints them in the left-to-right order in which it received them. This is the output:

one, two: 3, -1

Wait a minute. This time, the comma operator's right-hand value was treated as the expression's value! That's not the same as before! What the heck? Let's check the spec*:

3.3.17 Comma operator

Syntax

expression:
        assignment-expression
        expression , assignment-expression

Semantics

The left operand of a comma operator is evaluated as a void
expression; there is a sequence point after its evaluation. Then the
right operand is evaluated; the result has its type and value./43/

Example

As indicated by the syntax, in contexts where a comma is a
punctuator (in lists of arguments to functions and lists of
initializers) the comma operator as described in this section cannot
appear. On the other hand, it can be used within a parenthesized
expression or within the second expression of a conditional operator
in such contexts. In the function call

f(a, (t=3, t+2), c)

the function has three arguments, the second of which has the value 5.

Forward references: initialization (§3.5.7).
The key sentence is the last one in the 'Semantics' paragraph: the result has the type and value of the right operand. Clearly the right-hand operand is the one which should provide the value. What happened in the assignment, then, which caused this to be reversed?

My first guess was that the answer lay in the 'Example' paragraph, which says that
in contexts where a comma is a
punctuator (in lists of arguments to functions and lists of
initializers) the comma operator as described in this section cannot
appear.
But a plain assignment statement is neither an argument list nor an initializer list, so that rule doesn't apply here. The real answer is precedence. The comma operator has the lowest precedence of any C operator, lower even than assignment, so the line

num = v0 ? v1 : v2, v3;

is read as

(num = (v0 ? v1 : v2)), v3;

The assignment happens first and stores 2 in num. Then v3 is evaluated, and its value, which is the value of the whole comma expression, is simply thrown away. Nothing was reversed: the comma operator behaved exactly as the 'Semantics' paragraph says. It just isn't the operator whose result lands in num. In the function call, the parentheses made the comma expression itself the argument, so its value (3) is what func received.

Crazy cool.



* You may have noticed that this is not coming from an official ANSI or ISO standards site, and if you're particularly astute, you may have also noticed that this is merely a draft of the C90 spec. That's because I found it first and feel like C90 is good enough. But I checked the C99 spec as well (again just a draft), and the only modification is an explicit statement that a comma operator does not yield an lvalue. I only looked at drafts because ISO and ANSI are mean and don't like people to see specifications without paying money. I'm cheap, so our learning is lessened. :P

** What? A double star? That's right, folks, you get a bonus note! During the writing of this post, I realized that my test program was being rather silly. I made several variables, 'v0', 'v1', etc., to hold simple int values. It was only while writing that I recognized I could have simply used '0', '1', etc., and that this would be less confusing to read about. I originally wrote

num = 0 ? 1 : 2, 3;

as the first example in this post, which now uses the variables instead. I didn't think this could possibly make any difference, and I was wrong! When I later wrote a second program to do some different tests, I used my newfound sensibility and skipped creating the variables. I was given the following unfriendly error message:

newterntest.c:5: error: syntax error before numeric constant

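It took just a moment to recognize that the comma was to blame. As a plain assignment statement, num = 0 ? 1 : 2, 3; is legal C (it assigns 2 and discards the 3), so a syntax error means the compiler was not reading the line as an expression. The most likely explanation is that the line was a declaration with an initializer, like int num = 0 ? 1 : 2, 3;. In a declaration the comma is a punctuator that separates declarators, so the compiler expects another variable name after it and chokes on the integer '3'. That on its own isn't too crazy. The crazy part is that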

num = v0 ? v1 : v2, v3;

doesn't even throw a warning under -Wall -pedantic (with or without -ansi as well), while the literal integer actually causes a syntax error. Every other parenthesized use of comma operators in this situation gives at least a "warning: left-hand operand of comma expression has no effect", but our code draws a blank!

General weirdness.

edit: removed the <pre> tags from the blockquotes. They were there by default and made the quoted text appear in a plaintext-ish font, which I felt added to the post. But in the published version, they rendered underneath the right-hand column. So they're gone.
