Let's de-obfuscate it.
Indenting:
main(_) {
    _^448 && main(-~_);
    putchar(--_%64
        ? 32 | -~7[__TIME__-_/8%8][">'txiZ^(~z?"-48] >> ";;;====~$::199"[_*2&8|_/64]/(_&2?1:8)%8&1
        : 10);
}
Introducing variables to untangle this mess:
main(int i) {
    if(i^448)
        main(-~i);
    if(--i % 64) {
        char a = -~7[__TIME__-i/8%8][">'txiZ^(~z?"-48];
        char b = a >> ";;;====~$::199"[i*2&8|i/64]/(i&2?1:8)%8;
        putchar(32 | (b & 1));
    } else {
        putchar(10); // newline
    }
}
Note that -~i == i+1 because of two's complement. Therefore, we have
main(int i) {
    if(i != 448)
        main(i+1);
    i--;
    if(i % 64 == 0) {
        putchar('\n');
    } else {
        char a = -~7[__TIME__-i/8%8][">'txiZ^(~z?"-48];
        char b = a >> ";;;====~$::199"[i*2&8|i/64]/(i&2?1:8)%8;
        putchar(32 | (b & 1));
    }
}
Now, note that a[b] is the same as b[a], and apply the -~ == 1+ change again:
main(int i) {
    if(i != 448)
        main(i+1);
    i--;
    if(i % 64 == 0) {
        putchar('\n');
    } else {
        char a = (">'txiZ^(~z?"-48)[(__TIME__-i/8%8)[7]] + 1;
        char b = a >> ";;;====~$::199"[(i*2&8)|i/64]/(i&2?1:8)%8;
        putchar(32 | (b & 1));
    }
}
Converting the recursion to a loop and sneaking in a bit more simplification:
// please don't pass any command-line arguments
main() {
    int i;
    for(i=447; i>=0; i--) {
        if(i % 64 == 0) {
            putchar('\n');
        } else {
            char t = __TIME__[7 - i/8%8];
            char a = ">'txiZ^(~z?"[t - 48] + 1;
            int shift = ";;;====~$::199"[(i*2&8) | (i/64)];
            if((i & 2) == 0)
                shift /= 8;
            shift = shift % 8;
            char b = a >> shift;
            putchar(32 | (b & 1));
        }
    }
}
This outputs one character per iteration. Every 64th character, it outputs a newline. Otherwise, it uses a pair of data tables to figure out what to output, and puts either character 32 (a space) or character 33 (a !). The first table (">'txiZ^(~z?") is a set of 11 bitmaps describing the appearance of each character (the digits 0-9 and the colon), and the second table (";;;====~$::199") selects the appropriate bit to display from the bitmap.
The second table
Let's start by examining the second table, int shift = ";;;====~$::199"[(i*2&8) | (i/64)]; here i/64 is the line number (6 to 0) and i*2&8 is 8 iff i is 4, 5, 6 or 7 mod 8.
The following if((i & 2) == 0) shift /= 8; shift = shift % 8 selects either the high octal digit (for i%8 = 0,1,4,5) or the low octal digit (for i%8 = 2,3,6,7) of the table value. The shift table ends up looking like this:
row col val
6 6-7 0
6 4-5 0
6 2-3 5
6 0-1 7
5 6-7 1
5 4-5 7
5 2-3 5
5 0-1 7
4 6-7 1
4 4-5 7
4 2-3 5
4 0-1 7
3 6-7 1
3 4-5 6
3 2-3 5
3 0-1 7
2 6-7 2
2 4-5 7
2 2-3 3
2 0-1 7
1 6-7 2
1 4-5 7
1 2-3 3
1 0-1 7
0 6-7 4
0 4-5 4
0 2-3 3
0 0-1 7
or in tabular form
00005577
11775577
11775577
11665577
22773377
22773377
44443377
Note that the author used the null terminator for the first two table entries (sneaky!).
This is designed after a seven-segment display, with 7s as blanks. So, the entries in the first table must define the segments that get lit up.
The first table
__TIME__ is a special macro defined by the preprocessor. It expands to a string constant containing the time at which the preprocessor was run, in the form "HH:MM:SS". Observe that it contains exactly 8 characters. Note that 0-9 have ASCII values 48 through 57 and : has ASCII value 58. The output is 64 characters per line, so that leaves 8 characters per character of __TIME__.
7 - i/8%8 is thus the index of __TIME__ that is presently being output (the 7- is needed because we are iterating i downwards). So, t is the character of __TIME__ being output.
a ends up equalling the following in binary, depending on the input t:
0 00111111
1 00101000
2 01110101
3 01111001
4 01101010
5 01011011
6 01011111
7 00101001
8 01111111
9 01111011
: 01000000
Each number is a bitmap describing the segments that are lit up in our seven-segment display. Since the characters are all 7-bit ASCII, the high bit is always cleared. Thus, 7 in the segment table always prints as a blank. The second table looks like this with the 7s as blanks:
000055
11  55
11  55
116655
22  33
22  33
444433
So, for example, 4 is 01101010 (bits 1, 3, 5, and 6 set), which prints as
----!!--
!!--!!--
!!--!!--
!!!!!!--
----!!--
----!!--
----!!--
To show we really understand the code, let's adjust the output a bit with this table:
  00
11  55
11  55
  66
22  33
22  33
  44
This is encoded as "?;;?==? '::799\x07". For artistic purposes, we'll add 64 to a few of the characters (since only the low 6 bits are used, this won't affect the output); this gives "?{{?}}?gg::799G" (note that the 8th character is unused, so we can actually make it whatever we want). Putting our new table in the original code:
main(_){_^448&&main(-~_);putchar(--_%64?32|-~7[__TIME__-_/8%8][">'txiZ^(~z?"-48]>>"?{{?}}?gg::799G"[_*2&8|_/64]/(_&2?1:8)%8&1:10);}
we get
!! !! !!
!! !! !! !! !! !! !! !! !!
!! !! !! !! !! !! !! !! !!
!! !! !! !!
!! !! !! !! !! !! !! !! !!
!! !! !! !! !! !! !! !! !!
!! !! !!
just as we expected. It's not as solid-looking as the original, which explains why the author chose to use the table he did.
Just about every modern operating system will recover all the allocated memory space after a program exits. The only exception I can think of might be something like Palm OS where the program's static storage and runtime memory are pretty much the same thing, so not freeing might cause the program to take up more storage. (I'm only speculating here.)
So generally, there's no harm in it, except the runtime cost of having more storage than you need. Certainly in the example you give, you want to keep the memory for a variable that might be used until it's cleared.
However, it's considered good style to free memory as soon as you don't need it any more, and to free anything you still have around on program exit. It's more of an exercise in knowing what memory you're using, and thinking about whether you still need it. If you don't keep track, you might have memory leaks.
On the other hand, the similar admonition to close your files on exit has a much more concrete result: if you don't, the data you wrote to them might not get flushed, or if they're temp files, they might not get deleted when you're done. Also, database handles should have their transactions committed and then be closed when you're done with them. Similarly, if you're using an object-oriented language like C++ or Objective-C, not freeing an object when you're done with it means the destructor never gets called, and any resources the class is responsible for might not get cleaned up.
Best Answer
So, the file you have indeed does have two different single-byte character encodings on each line. That's quite the technical feat to have managed with any regular text editor! :)
Let's take the hören line, 68 3f 72 65 6e 20 2d 20 f1 eb f3 f8 e0 ec, as an example, but I'm going to modify it a bit because the hex dump you're showing is already broken; the byte 3F is the question mark, not what would be ö in ISO-8859-1 (F6).
I'm going to use Python to illustrate the problems you'll face because it's good at dealing with various encodings.
If we just decode the hexadecimal encoding of those bytes into a bytestring, we can see its Python repr (representation), where all of the printable 7-bit ASCII bytes are shown as themselves, but everything else is shown as an escape sequence. Don't be fooled, this is not human-readable text, it's just a sequence of bytes that partially looks readable.
Alright, so let's try to decode this into text as ISO-8859-1 (aka latin-1), which is close to the CP1252 codepage.
We can see that the ö for hören was decoded well, but the Cyrillic is unreadable mojibake.
Let's do it the other way, then:
The German turns out a bit unfortunate, because the byte \xf6 is interpreted as ц in CP1251, but the Russian checks out (according to Google Translate anyway).
So, if we were using Python, we'd decode this by splitting it and decoding each half:
(and this indeed prints out just fine on my Mac's terminal, and would also do so in Python UTF-8 Mode on Windows).
Now, back to C land: the issue is that fgets() and friends don't give a darn about encodings; they're all just bytes (though fgets() knows that the byte 0x0a (10 in decimal) is the newline character in ASCII encoding, and stops reading there).
When you read those bytes, you get exactly those bytes, and it's up to your app to interpret them. When you output those bytes using printf() on your regular Windows terminal, it will use the current console output codepage to translate the bytes into glyphs.
Technically, you could output these files correctly in your Windows terminal with something like
print the first half of the line (after SetConsoleOutputCP(1252);)
print the second half of the line (after SetConsoleOutputCP(1251);)
... rinse and repeat.
Another option would be to convert your input to a Unicode encoding, e.g. UTF-8 or UTF-16. You'd still have to interpret each half of the lines differently, and UTF-8 in particular is a variable-width encoding, so you can't trust strlen() to give you the actual human-eyes length of a string anymore, but at least your playing ground would be level enough so you could use some of the answers in Properly print utf8 characters in windows console.