Implementation

Implementation vs. Specification

A frequent cause of trouble for new C programmers is confusion between the implementation of C on their computer and the ISO C Standard. To understand the difference, one must first understand what an implementation is and what a specification is. The C standards committee must accommodate the large number of systems on which C is implemented. These systems differ in how their hardware works and in how types are represented. This is why, for instance, the C Standard does not say that an int is 32 bits and a double is 64: on some system an int may be 128 bits and a double may be 512 bits. The standard instead states the minimum range of values that an int and a double must be able to represent, and imposes no upper bound on the ranges an implementation may provide. The C standard actually defines "implementation":

 a particular set of software, running in a particular translation environment under
 particular control options, that performs translation of programs for, and supports
 execution of functions in, a particular execution environment 

This may seem cryptic to someone not familiar with the language of the standard (indeed, even parts of the standard seem cryptic to those who are familiar with it). The statement says, more or less, that an implementation is software which turns C code into an executable in a specific environment. That may or may not be any clearer than the standard's definition.

Now, the C standard states various things that an implementation of C must abide by and certain things the implementation may decide on its own. For example, the C standard states that the range of values an int must be able to hold is at least -32767 to +32767. If Joe Shmoe makes a compiler or interpreter for C whose ints have the range -100 to +100, Joe Shmoe does not have a valid implementation of C. On the same note, if Joe Shmoe's compiler supports a range of -100000 to +100000, that is valid, but you cannot depend on any other implementation supporting that range, so it should not be relied on. On the other hand, the C standard says that an implementation may decide whether a char is signed or unsigned. So if one implementation decides a char should be signed and another decides a char should be unsigned, both are valid implementations. Finally, the standard says that dereferencing a NULL pointer is undefined. This means that your implementation could cause anything to happen: nothing at all, or your computer catching fire.
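
One way to see what a particular implementation chose is to print the limits it documents in <limits.h>. A minimal sketch (the values printed will, of course, vary from implementation to implementation):

 #include <limits.h>
 #include <stdio.h>
 
 int main(void)
 {
     /* These values are implementation-defined; the standard only
      * guarantees minimum magnitudes (e.g. INT_MAX >= 32767). */
     printf("int is %d bits wide, range %d to %d\n",
            (int)(sizeof(int) * CHAR_BIT), INT_MIN, INT_MAX);
 
     /* CHAR_MIN is 0 if plain char is unsigned, negative if signed. */
     printf("char is %s\n", CHAR_MIN < 0 ? "signed" : "unsigned");
     return 0;
 }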

Many people reading this may by now be thinking, "Who cares? The same thing happens on all computers I've used and tested my program on, so that is how things are." That is a comforting thought, and the world might be a better place if that were how things worked, but it is unfortunately not true. Confusing the limited number of implementations of C one has used with the set of all implementations of C will more than likely lead to software which breaks under unpredictable conditions.

fflush is a simple demonstration of the differences between implementations. According to the final draft of the ISO C99 Standard, what fflush does is undefined if it is used on anything other than an output stream (or an update stream whose most recent operation was not input):

 If stream points to an output stream or an update stream in which the most
 recent operation was not input, the fflush function causes any unwritten
 data for that stream to be delivered to the host environment to be written
 to the file; otherwise, the behavior is undefined. 

The stream referred to in the quote is the one passed to fflush. As noted above, undefined behavior means anything could happen. So if stdin were passed to fflush, anything could happen. In Microsoft's implementation of fflush, all of the data in stdin is removed, emptying the buffer. Most UNIX implementations, however, do not define what happens in this situation at all. This seemingly small difference directly affects your code: if fflush cannot empty stdin, it alters how you must handle reading and removing data from stdin. For instance, when scanf cannot convert a value, it leaves the input in the buffer. How will you remove this invalid data from stdin if you cannot fflush it?
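
One portable answer is to read and discard the characters yourself. The sketch below does this with a helper (discard_line is an illustrative name, not a standard function) after a failed scanf conversion:

 #include <stdio.h>
 
 /* Portable alternative to the non-portable fflush(stdin):
  * read and discard the rest of the current input line. */
 static void discard_line(FILE *in)
 {
     int c;
     while ((c = getc(in)) != '\n' && c != EOF)
         ;
 }
 
 int main(void)
 {
     int n;
     printf("Enter a number: ");
     while (scanf("%d", &n) != 1) {
         if (feof(stdin))
             return 1;            /* input ended; give up */
         discard_line(stdin);     /* remove the data scanf left behind */
         printf("Try again: ");
     }
     printf("Read %d\n", n);
     return 0;
 }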

A common argument in #C is that pointers and integers are interchangeable. People often attempt to pass a pointer to a function which expects an integer, or to print the value of a pointer with an integer format specifier. They are surprised to hear that, in C, pointers and integers are not interchangeable. The closest the standard comes to relating pointers and integers is:

 An integer may be converted to any pointer type. The result is
 implementation-defined, might not be properly aligned, and might not point
 to an entity of the referenced type. Any pointer type may be converted to
 an integer type; the result is implementation defined. If the result
 cannot be represented in the integer type, the behavior is undefined.
 The result need not be in the range of values of any integer type. 

So the standard says a pointer may be converted to an integer, or an integer to a pointer, but the result is implementation-defined and not guaranteed to work. If you write code requiring a pointer to be converted to an integer, you cannot assume it will work everywhere. One can respond by saying "every implementation works in a fashion so that pointers and integers are the same", but the answer is no, not all do. On various architectures, pointers do not have integer-like attributes. (Someone should add a list of places this might not work.) Then one can respond by saying "all implementations I care about do". If that really is true, then nobody can stop that person from building a project on this assumption. But they need to ask themselves whether it really is true and whether it will stay true. Project goals often grow during the lifetime of a project; can one be sure that every implementation the project will ever run on satisfies this assumption? A similar line of thinking applies to most implementation-defined or undefined behavior that one takes for granted.
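
When such a conversion is genuinely needed, C99 provides the optional types intptr_t and uintptr_t in <stdint.h>: integer types (where the implementation offers them at all) guaranteed to hold a pointer to void and convert back to an equal pointer. A minimal sketch, assuming a C99 implementation that provides uintptr_t:

 #include <stdint.h>
 #include <stdio.h>
 
 int main(void)
 {
     int x = 42;
     int *p = &x;
 
     /* uintptr_t is optional: an implementation whose pointers do not
      * fit in any integer type simply does not provide it. */
     uintptr_t u = (uintptr_t)(void *)p;
 
     /* %p with a pointer cast to void * is the portable way to
      * print a pointer's value; %ju prints a uintmax_t (C99). */
     printf("%p\n", (void *)p);
     printf("%ju\n", (uintmax_t)u);
     return 0;
 }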

Another instance where implementation versus specification plays a role is in discussions of language speed. A reason people cite for not using certain languages is "language X is too slow". To people with more than a passing interest in programming, this statement sends shivers down their spine. A language is defined by its specification; it is just an idea and has no speed associated with it. Implementations, on the other hand, do. Along a similar line of thinking, many people state that they do not want to use language X because it is interpreted. Again, it is implementations of a language which are fast or slow, compiled or interpreted. To many people these issues sound like silly points, and perhaps they are, but one should at least be mildly aware that the distinction exists. There are, after all, Software#C-like_Interpreters. It would be more correct to say that most implementations of C are faster than most implementations of language X.
