
promotion of literals when compared to an object of type char

 
jeremiah



Joined: 20 Jul 2010
Posts: 1319


promotion of literals when compared to an object of type char
Posted: Tue Jun 21, 2011 7:40 am

I was curious about the method the CCS compiler is supposed to use when comparing a variable to a #define'd literal.

Take for example:
Code:

/*******************************************************************************
* FILENAME:  wf.c
* DESC:      This is the definition file for the main program.  It contains the
*            main() entry point and interrupt routines (as well as some utility
*            functions).
* COMPILER:  CCS PCWHD Revision 4.121
*
* REVISION:
*    2011/06/21 - JHB - Revision 1.0.0.0
*   
*******************************************************************************/
#case
#include <24FJ256GA106.h>

//16 bit pointers
#device *=16

//Fuses for this project
#FUSES NOWDT                    //No Watch Dog Timer
#FUSES NOJTAG                   //JTAG disabled
#FUSES NOPROTECT                //Code not protected from reading
#FUSES NOWRT                    //Program memory not write protected
#FUSES NODEBUG                  //No Debug mode for ICD
#FUSES ICSP1                    //ICD uses PGC1/PGD1 pins
#FUSES NOIOL1WAY                //Allows multiple reconfigurations of peripheral pins
#FUSES WINDIS                   //Watch Dog Timer in non-Window mode
#FUSES WDT128                   //Watch Dog Timer PreScalar 1:128
#FUSES WPOSTS16                 //Watch Dog Timer PostScalar 1:32768
#FUSES NOIESO                   //Internal External Switch Over mode disabled
#FUSES PR                       //Primary oscillator enabled
#FUSES NOCKSFSM                 //Clock switching is disabled, fail-safe clock monitor is disabled
#FUSES NOOSCIO                  //OSC2 is clock output
#FUSES HS   

#define TEST_CHAR 0xB5

//Pin selects for reprogrammable pins
#pin_select U1TX = PIN_B6   //output serial 
#pin_select U1RX = PIN_B7                   

//#use statements for uart
#use delay(clock=22118400)
#use rs232(UART1,baud=9600,parity=N,bits=8,DISABLE_INTS,ERRORS)

////////////////////////////////////////////////////////////////////////////////
// NAME:  main()
// DESC:  The entry point into this program.  Devices and interrupts are setup.
// IN:    NONE
// OUT:   NONE
////////////////////////////////////////////////////////////////////////////////
void main(){

   char char1 = 0xB5;
   unsigned char char2 = 0xB5;

   rs232_errors = 0;  //removes warning
   
   //Power up and see reason for power up
   printf("\r\n ******** START ******** \r\n");
   
   if(TEST_CHAR == char1){
      printf("char1 match\r\n");
   }else{
      printf("char1 no match\r\n");
   }
   if(TEST_CHAR == char2){
      printf("char2 match\r\n");
   }else{
      printf("char2 no match\r\n");
   }
   
   while(TRUE){}

}



In compiler revision 4.114, this prints a match for both checks.

I recently upgraded to 4.121 and it changed the behavior of this type of check such that the first one does not show a match but the second one shows a match.

Looking at the LST output, it appears to promote both the char and the unsigned char to 16-bit values to compare with the #define value. 4.114 treated the char as unsigned, but 4.121 uses sign extension on the char (I believe I remember a change note about this). This results in a comparison of 0xFFB5 to 0x00B5 (char to #define).

Code:

....................    if(TEST_CHAR == char1){
002FE:  MOV.B   802,W0L
00300:  SE      W0,W0
00302:  MOV     #B5,W4
00304:  CP      W4,W0
00306:  BRA     NZ,326


Code:

....................    if(TEST_CHAR == char2){
00340:  MOV     804,W4
00342:  XOR     #B5,W4
00344:  BRA     NZ,364


4.114 uses the 2nd method for both.

I am guessing a compiler upgrade changed how chars are handled by default (based on the change notes), but it surprised me that it promoted everything to 16 bits when both operands were supplied as 8 bits (though honestly, I don't know how #define's are handled).
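For reference, here is a minimal sketch of the sign-extension effect described above, written in standard C with <stdint.h> types standing in for the PIC24's 8- and 16-bit sizes (an illustration only, not CCS output):

Code:

#include <stdint.h>
#include <stdio.h>

int main(void) {
   int8_t  c = (int8_t)0xB5;   // stored as -75 in a signed 8-bit char
   int16_t w = c;              // sign-extended to 16 bits: 0xFFB5
   // the literal 0xB5 widens to 0x00B5, so the compare fails:
   printf("%04X vs %04X -> %s\r\n",
          (unsigned)(uint16_t)w, 0xB5u, (w == 0xB5) ? "match" : "no match");
   return 0;
}
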

In 4.121, I can change the #define to:
#define TEST_CHAR '\xB5'

and it does the BYTE check against char1, though surprisingly not against char2. I would have expected both to be the same:

Code:

....................    if(TEST_CHAR == char1){
002FE:  MOV     802,W4
00300:  XOR.B   #B5,W4L
00302:  BRA     NZ,322


Code:

....................    if(TEST_CHAR == char2){
0033C:  MOV     804,W4
0033E:  XOR     #B5,W4
00340:  BRA     NZ,360


It just seems odd that:
1. It promotes to 16 bits even if the value is only supplied as 8 bits, but I guess it could just be easier to assume a default.

2. Even with a character supplied as the #define, it treats one of them as an 8-bit comparison and the other as a 16-bit comparison (a cast-based workaround sketch follows below).
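As an untested workaround sketch: casting both sides to the same 8-bit unsigned type should force a byte compare and sidestep the 16-bit promotion entirely:

Code:

#define TEST_CHAR 0xB5

char char1 = 0xB5;

if ((unsigned int8)TEST_CHAR == (unsigned int8)char1) {
   printf("char1 match\r\n");   // byte compare regardless of char signedness
}
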

EDIT: Fixed the type for char2. See following post for updated LST entries.


Last edited by jeremiah on Tue Jun 21, 2011 9:09 am; edited 1 time in total
RF_Developer



Joined: 07 Feb 2011
Posts: 839


Posted: Tue Jun 21, 2011 8:41 am

Maybe a pertinent question is why it even compiles?!

char2 has no type, only the type modifier unsigned. That should throw up a syntax error, but doesn't. So the compiler must be assuming a type, as it does indeed compile. What then is that assumed type? Int, as it's the most basic of all the types? No, that doesn't seem likely given what it's doing... int16? Who knows? Unsigned yes, hence no sign extension, but what precisely?

The fact is that this code shouldn't actually compile at all. That it generates inconsistent code when it does (and my PCWH 4.107 generated different code from both examples, using subtraction for the comparison, though it did treat both types the same) perhaps shouldn't come as that much of a surprise.

I had issues with byte variables declared via a typedef being handled differently from otherwise similar variables declared directly. That too involved some sort of confusion between two bytes and one.
jeremiah



Joined: 20 Jul 2010
Posts: 1319


Posted: Tue Jun 21, 2011 9:07 am

That should have been "unsigned char", but it compiles because I believe bare unsigned is accepted as the unsigned version of the compiler's default type. That explains why the 2nd one got promoted to 16 bits even after setting the #define to '\xB5'. I goofed on the type. However, there is still an oddity.

I fixed it to be unsigned char char2 and made the #define be 0xB5 (the original case) and noticed the following:

Code:

....................    if(TEST_CHAR == char1){
002FE:  MOV.B   802,W0L
00300:  SE      W0,W0
00302:  MOV     #B5,W4
00304:  CP      W4,W0
00306:  BRA     NZ,326


Code:

....................    if(TEST_CHAR == char2){
00340:  MOV     802,W4
00342:  LSR     W4,#8,W4
00344:  XOR.B   #B5,W4L
00346:  BRA     NZ,366


So with the #define at 0xB5, the "char" is treated as a 16-bit value and the "unsigned char" is treated as a byte.



I then switched the define to '\xB5' and got:
Code:

....................    if(TEST_CHAR == char1){
002FE:  MOV     802,W4
00300:  XOR.B   #B5,W4L
00302:  BRA     NZ,322


Code:

....................    if(TEST_CHAR == char2){
0033C:  MOV     802,W4
0033E:  LSR     W4,#8,W4
00340:  XOR.B   #B5,W4L
00342:  BRA     NZ,362


So with the define at '\xB5' it treats both as bytes.

I wonder why with the define set at 0xB5 it treats the "unsigned char" as a byte but the "char" as a word...odd.
RF_Developer



Joined: 07 Feb 2011
Posts: 839


Odd indeed
Posted: Thu Jun 23, 2011 3:21 am

Yes, odd indeed... however the CCS compilers are known to do this sort of thing occasionally :-(( As I wrote, I found another related anomaly, which in that case has since been fixed. Code generation is also dependent on the processor, which in your case is a 24F while I am using 18Fs. Is the size of int 8 or 16 bits in the 24F version of CCS C?

On the technical aspects, which I feel do have some significance here, there is no "default type" in C. "unsigned" is not a type, it's a type *modifier* and cannot syntactically be used without a type, such as char or int etc., to modify. I repeat, the compiler should be flagging up the use of unsigned without a type as a syntax error... but doesn't in this case. Yes, it does appear to be assuming a "default" type, and it appears in your case to be something 16 bits, but what that type is is unknown, and we cannot rely on it. It shouldn't be doing this at all. In any case, the meaning of char has changed over the evolution of C. K & R defined it as the smallest addressable unit of the machine. That generally meant 8 bits, and that characters were ASCII coded, as PDP-11s were the commonest hosts for C. C was developed as part of the migration of Unix to the PDP-11 - 16 bit, byte addressed - from the PDP-7 - an 18-bitter, probably with 6-bit characters.

It's perhaps a somewhat arcane comment, but in terms of characters, what is the meaning and relevance of "signed" and "unsigned"? It's only when a character is treated as an abstract unit of memory, one on which arithmetic computations can be performed, that signedness comes into play. That abstraction allows code such as the familiar int digit = Number_Ch - '0'; to work, but what, exactly, is a 'negative character'? Ints are another matter, and were defined as the most natural memory unit of the machine. CCS C follows the K & R definition and makes int 8 bits... it then makes it *unsigned*, which is odd in any sense. Odd as in not K & R nor ANSI. I know I'll offend someone by saying it, but ANSI C attempts to force the x86 model on to C. Or at least makes it the default and the expected. Many people think an ANSI int is always, and must be, 32-bit signed, but it needn't be and sometimes is not. How would ANSI C map on to the PDP-7, or any 18- or 36-bitter?

All that however is only peripheral to what's going on here. What's going on is that CCS C sometimes generates inconsistent code in some circumstances. Yes, it does, unfortunately. What we need to work out is how to avoid falling foul of that inconsistent behaviour. I suggest keeping types simple and direct, such as avoiding using unsigned with char. If you want to deal with 8-bit numbers, then use int8 or signed int8; if you want to deal with characters, then use char and forget about signing. Confine arithmetic to the cases where it makes sense and be careful when mixing types. Use casts to force type conversions, even when not strictly syntactically and semantically necessary. Debug carefully, single stepping to make sure the generated code is really doing what you expect. I'm pretty sure you already do all this, as that's most likely how you found this compiler bug in the first place.
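To make that concrete, a small sketch of the style I mean (CCS fixed-width types assumed, untested):

Code:

// 8-bit numbers: explicit width and signedness
unsigned int8 status = 0xB5;

// cast the literal too, even where it looks redundant
if (status == (unsigned int8)0xB5) {
   printf("status match\r\n");
}

// characters: plain char, with arithmetic confined to the familiar idiom
char ch = '7';
int8 digit = ch - '0';   // 7
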

Another possibly relevant technical point to bear in mind is that in C character literals, including '\xB5', are not of type char; they are ints. In CCS 18F C this should have no effect as int and char are both 8 bits. Also bear in mind that CCS C is NOT ANSI compliant, nor pretends to be. Nor is it fully compliant even if we tell it to be; it's just rather more so. Anyway, this technicality may have some relevance when it comes to 0xB5, a hex literal and therefore an *untyped* representation of an integer, and '\xB5', which is an int character literal. We might well expect both to fit nicely in a char, as I'm sure you and I do, but it's not all that straightforward. Should we expect 0x00B5 to fit in a char? Well yes, we do, but SHOULD we? These are just some of the issues that language designers and compiler writers face. I'm reminded of C#, where chars are Unicode and definitely NOT 8 bits. Indeed, explicit conversion is required to deal with ASCII coded byte streams, such as sent and received by our PICs. It came as a surprise to me that conversion was required, and not assumed, but actually it makes sense when I consider the implications of how C# strings are defined and implemented. So we must be careful about assuming a simple one-to-one correspondence between 8-bit bytes and chars, or between chars and any particular form of int.
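A quick standard-C demonstration of that point, for a hosted ANSI compiler (CCS, per the observations in this thread, treats the literal as 8 bits instead):

Code:

#include <stdio.h>

int main(void) {
   // In ANSI C a character constant has type int, so a typical 32-bit
   // hosted compiler prints 4 and 1 here (a C++ compiler would print 1 and 1):
   printf("sizeof('\\xB5') = %u, sizeof(char) = %u\n",
          (unsigned)sizeof('\xB5'), (unsigned)sizeof(char));
   return 0;
}
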

Happy coding!
Ttelmah



Joined: 11 Mar 2010
Posts: 19245


Posted: Thu Jun 23, 2011 9:28 am

Yes.
There are some comments though. If you read the #type manual, you find the comment:
"The #TYPE directive allows the keywords UNSIGNED and SIGNED to set the default data type."

Depending on how you read this....., it appears that this may also adjust the default 'size' allocated to a signed or unsigned variable.

If you try:

unsigned fred;

The compiler does not complain.

However add the #type declaration:

#type signed

and try the 'unsigned' declaration again, and now the compiler complains.

So it appears at first as if CCS is defaulting to a particular size for 'unsigned', unless you change this yourself....

Now, the comments about data sizes are why the 'int8', 'int16', etc. types were introduced, so that there _was_ a fixed size associated with a type, and why these are really the much safer forms to use.


A character literal _in single inverted commas_, _is_ defined as a character. Quote from K&R:
"A character constant is a sequence of one or more characters enclosed in single quotes. The value of a character constant with only one character is the numeric value of the character in the machine's character set at execution time. The value of a multi-character constant is implementation defined" Then it goes on to describe the multi-character constants like \n, and the octal and hex escapes.

So by enclosing the hex value in single inverted commas, the statement is effectively being converted to:

#define TEST_CHAR (char)(0xB5)

Now, it would be interesting to try using 'BYTE' rather than 'char'. BYTE is #defined as an unsigned int8, giving a forced size. Char defaults to being the same type as an int on CCS, but this changes on the PIC24s, where int increases to int16 and char remains int8. It appears as if the change making the signed directive also affect char is also making char default to int16.....
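As an untested sketch, that would look like:

Code:

#define TEST_CHAR ((BYTE)0xB5)   // BYTE: CCS's #define for unsigned int8

BYTE char1 = 0xB5;

if (TEST_CHAR == char1) {
   printf("match\r\n");   // both sides now have a forced 8-bit unsigned size
}
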

You could also use a forced type statement, to make the sizes stay the way you want.

This sort of problem is why it is posted again and again here to 'treat new CCS releases as beta releases at best', and why the download page also offers 4.099. Basically, unless you 'need' the latest because support is added for a processor you want to use, or there is a specific bugfix for a fault you know you have, it is much easier to stick with slightly older releases.....

I'd report the change, and see if it is fixed in a few releases time.

Best Wishes
jeremiah



Joined: 20 Jul 2010
Posts: 1319


Posted: Mon Jun 27, 2011 6:51 am

@RF_Developer
The default int type for mine is 16 bits at the moment, but characters are treated as 8 bits. The promotion to 16 bits wouldn't really surprise me if it happened for both the unsigned char and signed char types, which have the same size, just different representations (well, other than that it didn't used to promote and now it does, but that is fine now that I know).

@Ttelmah
Thanks for the response! I really appreciate it.

Oddly enough, I myself tend to use only int8, int16, etc. One of our other programmers wrote the code that was affected. I just put it into a test program to supply to CCS and here.

I definitely reported the issue to CCS. Mostly concerning the promotion to 16 bits when using a signed char type versus not promoting to 16 bits when using an unsigned char type, which seems inconsistent. I would have figured promotion in both cases or no promotion in both cases since a char is 8 bits whether it is signed or not.

I agree with and understand the idea of only using a stable release. I was more posting here because I wanted to make sure it sounded like an actual problem rather than me misunderstanding how it is supposed to work. You guys/gals have worked with this compiler a lot longer than I have. We actually keep a stable release on everyone else's copy. I am trying to figure out if the new releases will work for us and also reporting any bugs I may or may not find in the process. Honestly, 4.121 (haven't tried 4.122 yet) has fixed a lot of bugs we have seen (mostly in how the compiler handles bit manipulation in structures with bit fields).

I kind of wish they kept an active list of known bugs so we could kind of work around that.
miketwo



Joined: 04 Aug 2010
Posts: 24


Posted: Wed Aug 10, 2011 2:01 pm

I'm not much of an assembly guy, but I think I've run across the same problem. When I try to compare a char to the hex literal 0xff, it comes back false when it should be true. Ditto for anything with a 1 in the top bit (for example, 0xfe).

A signed int8 variable will fail the same way. An unsigned int8 works. This is all on a PIC24F, compiler version 4.122.

The workaround is to replace all my chars with unsigned int8's...I guess... but that makes me nervous. We have several thousand lines of code.

Any comments? Suggestions? Can you guys duplicate the problem? Does anyone know if CCS is working on this? Is it an error on my end (to assume the char is unsigned when, it seems, it isn't)?

Here is my test code:
Code:


//==============================================
//              Device Inits
//==============================================
#include <24FJ256GA110.h>
#device PASS_STRINGS=IN_RAM      
#fuses NOPROTECT               // Code not protected from reading
#fuses NOWDT                  // No automatic WDT -- it must be enabled by software.
#fuses XT                     // Primary Clock Select
#fuses FRC_PLL                  // Internal Fast RC Oscillator with Phase Lock Loop gives 32 MHz
#fuses WPOSTS12                  // Watchdog Postscaler. 
                           // At current processor settings: WPOSTS12 = ~10 seconds
                           //                          WPOSTS13 = ~20 seconds
                           //                          default  = ~120 seconds
#fuses IESO                     // Internal-External Switchover
#fuses IOL1WAY
#fuses WRT
#fuses CKSFSM                  // Clock fail-safe monitor   
#pragma case                  // Makes all code case-sensitive

#use delay(clock=32MHZ,internal=8M)  // Tells compiler what the clock speed is

// Define UART
#pin_select U1TX = PIN_F2
#pin_select U1RX = PIN_F4 
#use rs232(baud=115200, UART1, BITS=8, STREAM=COM_A, ERRORS)

//==============================================
//              Dependencies
//==============================================
#include "string.h"                           // Standard string lib
#include "stdlib.h"                           // Standard lib

//==============================================
//              The Test
//==============================================
// Comparison with Hex test
void Test12()
{
   char a;            // According to manual, should be equivalent to "unsigned int8"
   signed int8 b;
   unsigned int8 c;
   signed d;         // <--- why is there no compile error here?
   unsigned e;         // <--- why is there no compile error here?
   
   // Assign all to 0xff
   a=0xff;
   b=0xff;
   c=0xff;
   d=0xff;
   e=0xff;
   
   fprintf(COM_A,"\r\n Comparison with Hex Test");
   fprintf(COM_A,"\r\n");
   fprintf(COM_A,"\r\n Testing %02x==%02x?  ",a,0xff);   
   if(a!=0xff) fprintf(COM_A,"FAIL."); else fprintf(COM_A,"PASS.");   // FAILS (char)
   fprintf(COM_A,"\r\n Testing %02x==%02x?  ",b,0xff);   
   if(b!=0xff) fprintf(COM_A,"FAIL."); else fprintf(COM_A,"PASS.");   // FAILS (signed int8)
   fprintf(COM_A,"\r\n Testing %02x==%02x?  ",c,0xff);            
   if(c!=0xff) fprintf(COM_A,"FAIL."); else fprintf(COM_A,"PASS.");   // PASSES (unsigned int8)
   fprintf(COM_A,"\r\n Testing %02x==%02x?  ",d,0xff);   
   if(d!=0xff) fprintf(COM_A,"FAIL."); else fprintf(COM_A,"PASS.");   // PASSES (signed ...?)
   fprintf(COM_A,"\r\n Testing %02x==%02x?  ",e,0xff);   
   if(e!=0xff) fprintf(COM_A,"FAIL."); else fprintf(COM_A,"PASS.");   // PASSES (unsigned ...?)

}

//==============================================
//              MAIN
//==============================================
void main()
{
   // Setup Port
   setup_uart(115200, COM_A);

   // Run test
   Test12();

}


Thanks for any help guys.
FvM



Joined: 27 Aug 2008
Posts: 2337
Location: Germany


Posted: Thu Aug 11, 2011 3:33 am

With recent PCD versions, your code compiles as shown below:
Code:
CCS PCD C Compiler, Version 4.121, 52480               11-Aug-11 10:24

....................    if(a!=0xff) fprintf(COM_A,"FAIL."); else fprintf(COM_A,"PASS.");   // FAILS (char)
00336:  MOV.B   808,W0L
00338:  SE      W0,W0
0033A:  MOV     #FF,W4
0033C:  CP      W4,W0
0033E:  BRA     Z,35E

....................    if(b!=0xff) fprintf(COM_A,"FAIL."); else fprintf(COM_A,"PASS.");   // FAILS (signed int8)
003CC:  MOV.B   809,W0L
003CE:  SE      W0,W0
003D0:  MOV     #FF,W4
003D2:  CP      W4,W0
003D4:  BRA     Z,3F4

....................    if(c!=0xff) fprintf(COM_A,"FAIL."); else fprintf(COM_A,"PASS.");   // PASSES (unsigned int8)
00462:  MOV     80A,W4
00464:  XOR.B   #FF,W4L
00466:  BRA     Z,486


Does this imply a PCD bug? I don't think so. The behaviour is different from the documentation, but basically correct, I think. There are two problems involved.

1. CCS has apparently changed the char type definition from unsigned to signed int8, which is not reflected in the help file. According to the C standard, the signedness of the plain char type is implementation-defined. As far as I'm aware, signed int8 is the most commonly used variant. Personally, I write unsigned char or BYTE when I mean unsigned.
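One quick way to check what a given compiler does, in standard C (assuming a limits.h is available, which may not be the case with CCS):

Code:

#include <stdio.h>
#include <limits.h>

int main(void) {
   // CHAR_MIN is negative exactly when plain char is signed
   printf("char is %s (CHAR_MIN = %d)\n",
          (CHAR_MIN < 0) ? "signed" : "unsigned", (int)CHAR_MIN);
   return 0;
}
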

2. 0xFF is a positive constant. To get a constant within the range of signed char, write (signed char)0xff or -1.
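As a sketch, the fix looks like this:

Code:

signed char b = 0xff;            // stored as -1

if (b == (signed char)0xff) {    // or equivalently:  if (b == -1)
   printf("match\r\n");          // both sides are -1 after promotion
}
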

The bad thing, as in many other cases, is the silent modification of CCS C behaviour, which is hardly acceptable for a professional tool. If you started your project with e.g. PCD V4.098, you'll notice that 0xff previously worked for a signed int8 compare. In PCD V4.112, char was still unsigned int8.
RF_Developer



Joined: 07 Feb 2011
Posts: 839


A footnote
Posted: Thu Aug 11, 2011 6:57 am

Technically in C all integer literals are positive. Decimal, hex, octal and binary are simply different ways of expressing positive literals. Putting '-' in front of any literal adds the unary negation operator rather than changing the literal itself, so -1 is -(1) and -0xFF is -(255). What C compilers are meant to do (and clearly CCS in its various forms isn't doing it...) is to barf when presented with:

signed int8 a = 0xFF;

as 0xFF is 255 and won't fit into a signed int8. The largest hex constant we ought to be able to fit in is 0x7F, after that we have to go negative. -1 in hex would be -0x01. Looks odd, but that's technically correct.

Now then, in comparisons such as:

if (a == 0xFF)...

what should happen is that as 0xFF is outside the range of a (it's a signed int8), both should be promoted to the next size up that they will both fit in before the comparison is made. That would be signed int16. a would still be -1, but the literal would now be 0x00FF, because 0xFF is NOT signed; literals are always positive. The comparison therefore should fail. If chars are signed then this will happen with all literals greater than 0x7F, as indeed someone's posted. If chars are unsigned int8s then no promotion is required on either side.
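A worked illustration of that rule (assuming a signed 8-bit char and a 16-bit int, as on the PIC24 target):

Code:

char a = 0xFF;   // a holds -1 (bit pattern 0xFF)

// In (a == 0xFF) both operands are converted to the 16-bit int:
//    a    -> 0xFFFF  (-1, sign-extended)
//    0xFF -> 0x00FF  (255; literals are always positive)
// so the comparison is -1 == 255, which fails.
if (a == 0xFF) { /* not reached when char is signed */ }
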

I've tried this with my 4.123 PCWH and, for what it's worth, all three cases (char, signed int8 and int8) generate the same code.

So CCS is compiling code that it shouldn't (it is syntactically incorrect), generating technically incorrect but intuitively "right" code in some versions and technically correct but non-intuitive promotions in others. What each compiler version does is unpredictable...

So, everything perfectly normal...

RF_Developer.
FvM



Joined: 27 Aug 2008
Posts: 2337
Location: Germany


Posted: Thu Aug 11, 2011 10:42 am

Quote:
So CCS is compiling code that it shouldn't (it is syntactically incorrect), generating technically incorrect but intuitively "right" code in some versions and technically correct but non-intuitive promotions in others. What each compiler version does is unpredictable...

So, everything perfectly normal...

Yes. I didn't want to promote the assumption that everything is O.K. with present CCS C behaviour. I particularly agree with the unpredictable point. You can also ask why it's O.K. to assign 0xFF to a signed char but you can't retrieve it on compare. A good standard set by other compilers is to give a warning if a constant exceeds the numeric range of a variable on assignment or compare.