From the first ISO C++ standard (C++98), this is described in 2.5 Alternative tokens [lex.digraph]:
- Alternative token representations are provided for some operators and punctuators.
- In all respects of the language, each alternative token behaves the same, respectively, as its primary token, except for its spelling. The set of alternative tokens is defined in Table 2.
Table 2 - Alternative tokens

alternative  primary | alternative  primary | alternative  primary
---------------------+----------------------+---------------------
<%           {       | and          &&      | and_eq       &=
%>           }       | bitor        |       | or_eq        |=
<:           [       | or           ||      | xor_eq       ^=
:>           ]       | xor          ^       | not          !
%:           #       | compl        ~       | not_eq       !=
%:%:         ##      | bitand       &       |
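
To make the table concrete, here is a small sketch that writes the same two functions twice, once with alternative tokens and once with primary tokens. The function names are made up for illustration, and it assumes a compiler running in a standards-conforming mode (some compilers only accept the keyword-style tokens such as `and` when a conformance option is switched on).

```cpp
#include <cassert>

// Written with alternative tokens: <% %> for braces, and / not for && / !.
bool in_range_alt(int x, int lo, int hi) <%
    return x >= lo and not (x > hi);
%>

// The same function spelled with primary tokens.
bool in_range_primary(int x, int lo, int hi) {
    return x >= lo && !(x > hi);
}

// <: :> stand in for [ ], and bitand / bitor / xor / compl for & | ^ ~.
unsigned mask_alt(unsigned a, unsigned b) <%
    unsigned table<:2:> = <% a bitand b, a bitor b %>;
    return table<:0:> xor compl table<:1:>;
%>

// The same function spelled with primary tokens.
unsigned mask_primary(unsigned a, unsigned b) {
    unsigned table[2] = { a & b, a | b };
    return table[0] ^ ~table[1];
}

// Both spellings should behave identically for every input tried here.
void check_equivalence() {
    for (int x = -2; x <= 12; ++x) {
        assert(in_range_alt(x, 0, 9) == in_range_primary(x, 0, 9));
    }
    assert(mask_alt(0xF0u, 0x3Cu) == mask_primary(0xF0u, 0x3Cu));
}
```

Calling `check_equivalence()` from `main` should pass silently, since each alternative token differs from its primary token only in spelling.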
So it's been around since the earliest days of the C++ standardisation process. The reason so few people are aware of it is likely that the main use case was for people operating in environments where the full character set wasn't necessarily available. For example (and this is stretching my memory), the baseline EBCDIC character set on the IBM mainframes did not have the square bracket characters [ and ].
This question (about the closely related digraphs) has the answer.
It boils down to the fact that the ISO 646 character set doesn't have all the characters of the C syntax, so there are some systems with keyboards and displays that can't deal with the characters (though I imagine that these are quite rare nowadays).
In general, you don't need to use them, but you need to know about them for exactly the problem you ran into. Trigraphs are the reason the '?' character has an escape sequence:
'\?'
So a couple of ways you can avoid your example problem are:
printf( "What?\?!\n" );
printf( "What?" "?!\n" );
But you have to remember when you're typing the two '?' characters that you might be starting a trigraph (and it's certainly never something I'm thinking about).
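
As a rough check that the two workarounds really produce the same text, here is a small sketch (the function names are hypothetical). Neither literal has two adjacent question marks in the source, so there is nothing for trigraph replacement to turn `??!` into `|`, whether or not the compiler has trigraphs enabled.

```cpp
#include <cassert>
#include <cstring>

// Workaround 1: escape the second question mark so the two are
// never adjacent in the source text.
const char* escaped_form() { return "What?\?!\n"; }

// Workaround 2: split the literal between the question marks.
// Adjacent literals are joined in a later translation phase than
// trigraph replacement, so no trigraph is ever seen.
const char* split_form() { return "What?" "?!\n"; }

// Both forms should yield the same 8-character string:
// W h a t, two question marks, an exclamation mark, a newline.
bool forms_agree() {
    return std::strcmp(escaped_form(), split_form()) == 0
        && std::strlen(escaped_form()) == 8;
}

void check_forms() {
    assert(forms_agree());
    assert(escaped_form()[4] == '?');
    assert(escaped_form()[5] == '?');
}
```

Calling `check_forms()` from `main` should pass silently on a conforming compiler.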
In practice, trigraphs and digraphs are something I don't worry about at all on a day-to-day basis. But you should be aware of them because once every couple of years you'll run into a bug related to them (and you'll spend the rest of the day cursing their existence). It would be nice if compilers could be configured to warn (or error) when they come across a trigraph or digraph, so I could know I've got something I should knowingly deal with.
And just for completeness, digraphs are much less dangerous since they get processed as tokens, so a digraph inside a string literal won't get interpreted as a digraph.
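
A minimal sketch of that difference, with made-up function names: the digraphs `<:` and `:>` act as brackets when they appear as tokens, but the same characters inside a string literal stay exactly as typed.

```cpp
#include <cassert>
#include <cstring>

// Digraphs used as tokens: this declares and indexes an ordinary array.
int third_element() {
    int values<:3:> = <% 10, 20, 30 %>;
    return values<:2:>;
}

// Inside a string literal the same spelling is just five characters:
// '<', ':', '3', ':', '>'.
const char* literal_with_digraph_spelling() { return "<:3:>"; }

void check_digraphs() {
    assert(third_element() == 30);
    assert(std::strlen(literal_with_digraph_spelling()) == 5);
    assert(literal_with_digraph_spelling()[0] == '<');
    assert(literal_with_digraph_spelling()[4] == '>');
}
```

Calling `check_digraphs()` from `main` should pass silently, because digraphs are recognised during tokenisation and string contents are never re-tokenised.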
For a nice education on various fun with punctuation in C/C++ programs (including a trigraph bug that would definitely have me pulling my hair out), take a look at Herb Sutter's GOTW #86 article.
Addendum:
It looks like GCC will not process (and will warn about) trigraphs by default. Some other compilers have options to turn off trigraph support (IBM's for example). Microsoft started supporting a warning (C4837) in VS2008 that must be explicitly enabled (using -Wall or something).
Best Answer
Digraphs were created for programmers whose keyboards lacked some of the characters outside the ISO 646 invariant character set.
http://en.wikipedia.org/wiki/C_trigraph