Why won't it link

$Id: why_wont_it_link.txt,v 1.3 2009/06/10 17:15:58 db Exp db $

I have been struck by the utter confusion that linking and external variables cause newcomers to 'C'. Apparently it is not taught anywhere, or the books out there are not covering it. If you are confused about

1) "undefined reference to a variable n"

2) "multiply defined" errors

3) an "undefined reference" to something that is a standard function

this document is for you.

I'll try my very best to simplify what is going on; the purists can complain privately to me.

The term compilation unit comes up from time to time. To most newcomers to 'C' this will mean a C program, which usually has the .c extension. It could mean an assembler program as well, or even Fortran, but let's pretend all compilation units are C programs for now.

When a compilation unit is compiled, it is simply translated from the C programming language into the machine code equivalent, whether that is x86, PowerPC, or even some virtual machine (byte code interpreter). Of course, if I were talking about assembler compilation units as well, this translation would be much less work for the computer to do.

Compiling a compilation unit produces a pile of data in a file called an object module, which often has the .o extension (Windows uses .obj instead). This object module holds the translation of your code into the machine language (ML) of your target machine. However, it is not in the final form needed to actually run on your target machine. It simply has a copy of the machine code translated from the C source, plus the information needed by what is called a "linking loader", or "linker" for short, in order to link your object module with the other machine code needed to run your program on your target.

Take a simple compilation unit, say a "Hello world" program:

 #include <stdio.h>
 int main()
 {
       printf("Hello world!\n");
 }

I'll use CC to denote your particular C compiler; there are many compilers.

 CC -o hello hello.c


The compiler compiles your translation unit into machine code somewhere, in this case into a temporary .o file.

The compiler then links that module with the necessary other object code. Something like (where LD is your linking loader):

 LD -o hello startup.o tmp_hello.o somelibrary_code

Things like printf(), which is declared in stdio.h, are defined in libc.

Just for fun, here is what the process looks like on a NetBSD system.

 % cc -v -o hello hello.c
 Using built-in specs.
 Configured with: /usr/src/tools/gcc/../../gnu/dist/gcc4/configure --enable-long-long --disable-multilib --enable-threads --disable-symvers --build=x86_64-unknown-netbsd4.99.72 --host=i386--netbsdelf --target=i386--netbsdelf --enable-__cxa_atexit
 Thread model: posix
 gcc version 4.1.3 20080704 prerelease (NetBSD nb2 20081120)
  /usr/libexec/cc1 -quiet -v hello.c -quiet -dumpbase hello.c -auxbase hello -version -o /var/tmp//ccw60sKT.s
 #include "..." search starts here:
 #include <...> search starts here:
  /usr/include
 End of search list.
 GNU C version 4.1.3 20080704 prerelease (NetBSD nb2 20081120) (i386--netbsdelf)
         compiled by GNU C version 4.1.3 20080704 (prerelease) (NetBSD nb2 20081120).
 GGC heuristics: --param ggc-min-expand=40 --param ggc-min-heapsize=20407
 Compiler executable checksum: 82df582ae77d27892425f205fa010350
  as -o /var/tmp//cctLsTUU.o /var/tmp//ccw60sKT.s
  ld -dc -dp -e __start -dynamic-linker /usr/libexec/ld.elf_so -o hello /usr/lib/crt0.o /usr/lib/crti.o /usr/lib/crtbegin.o /var/tmp//cctLsTUU.o -lgcc -lgcc_eh -lc -lgcc -lgcc_eh /usr/lib/crtend.o /usr/lib/crtn.o
 % ./hello
 Hello world!

Phew! A lot of work for a simple Hello World.

Notice how the compilation unit, hello.c got translated to assembler code /var/tmp//ccw60sKT.s then to machine object code /var/tmp//cctLsTUU.o in this line: "as -o /var/tmp//cctLsTUU.o /var/tmp//ccw60sKT.s"

Leaving out all the complicated stuff in the LD link, you can see that the hello world temporary object file gets linked with startup code provided by the linker and various other startup object code, then finally linked with libc (-lc).

Using the C compiler on my NetBSD system, I have created a hello world object module, using the commands specific to this compiler. Let's have a quick look inside.

 % nm -na hello.o | more
          U puts
 00000000 b .bss
 00000000 n .comment
 00000000 d .data
 00000000 r .rodata
 00000000 t .text
 00000000 a hello.c
 00000000 T main

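nm on Unix-like systems gives a listing of the dictionary (the symbol table) inside an object module. I am going to gloss over the sections for now; that's homework for the reader (man nm). Notice, however, the T for main. That's Unix talk for program text: main is program code and lives in the text section. It's a capital T because it is a global symbol, visible to the linker. Notice also puts: the compiler has turned the simple printf() call into a call to puts(), which this object module uses but does not define, so it is marked with a U.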

Finally! I can explain 1) "undefined reference to a variable n"

Let's modify the simple hello world program.

 #include <stdio.h>
 extern int n;
 int main()
 {
       printf("Hello world!\n");
       n = 0;
 }
 % cc -c hello.c
 % cc -o hello hello.o
 hello.o: In function `main':
 hello.c:(.text+0x23): undefined reference to `n'

And now

 % nm -na hello.o
          U n
          U puts
 00000000 b .bss
 00000000 n .comment
 00000000 d .data
 00000000 r .rodata
 00000000 t .text
 00000000 a hello.c
 00000000 T main

Notice that the puts global undefined was satisfied by the libc library, but who satisfied the global undefined n? No one did!

Let's satisfy it.

 % more n.c
 int n;
 % cc -c n.c
 % cc -o hello hello.o n.o
 % ./hello
 % nm -na n.o
 00000000 b .bss
 00000000 n .comment
 00000000 d .data
 00000000 t .text
 00000000 a n.c
 00000004 C n

Assigning 0 to n does nothing useful in this program at all, but at least the undefined reference is fixed.

What about multiple definitions? Let's modify our poor hello world program yet again.

 #include <stdio.h>
 extern int foo(void);
 int main()
 {
       printf("Hello world!\n");
       foo();
 }
 % cc -c hello.c
 % cc -o hello hello.o
 hello.o: In function `main':
 hello.c:(.text+0x22): undefined reference to `foo'

Which is what I would expect.

 % nm -na hello.o
          U foo
          U puts
 00000000 b .bss
 00000000 n .comment
 00000000 d .data
 00000000 r .rodata
 00000000 t .text
 00000000 a hello.c
 00000000 T main

Notice I have an undefined Text reference to puts, which I know will be satisfied by libc and an undefined Text reference to foo. No one has satisfied the foo reference, so let me do that.

 % more foo.c
 #include <stdio.h>
 int foo(void)
 {
       printf("foo!\n");
       return 0;
 }
 % cc -c foo.c
 % cc -o hello hello.o foo.o
 % ./hello
 Hello world!
 foo!

Easy!

Now look what happens when I have a multiple definition of foo.

 % cp foo.c goo.c
 % cc -c goo.c
 % cc -o hello hello.o foo.o goo.o
 goo.o: In function `foo':
 goo.c:(.text+0x0): multiple definition of `foo'
 foo.o:foo.c:(.text+0x0): first defined here

"Ok but I still have an undefined reference problem with something I saw in a man page."

This time I'll compile on a FreeBSD machine.

 % more l.c
 #include <stdio.h>
 #include <math.h>
 int main()
 {
       printf("%g\n", log(1));
 }
 % cc -c l.c
 % cc -o l l.o
 /var/tmp//cc70k4KI.o(.text+0x17): In function `main':
 : undefined reference to `log'

cc -o l l.c would result in the same error, but now let me look at the nm for l.o

 % nm -na l.o
          U log
          U printf
 00000000 d 
 00000000 b 
 00000000 n 
 00000000 r 
 00000000 t 
 00000000 a l.c
 00000000 T main

Notice the global undefines for log and printf. Remember I mentioned libc earlier: the C compiler knows enough to always automatically link anything it is given with libc. If you bypassed your C compiler and used the linking loader directly, the work would be considerably more.

In this case, the linker is complaining about `log'. I thought I had defined it with #include <math.h>, so where did it go? The answer is simple.

The #include only tells the compiler that log exists and what its parameters should look like; you still need to link with the object module that contains it, which is in the math library.

 % cc -o l l.o -lm

-l is the option that tells the compiler and linker to load object modules from a library; in this case, the math library is linked in using -lm.

 % ./l
 0

I have glossed over the term library until now. A library is nothing more than a collection of object modules all stuffed together into one file for convenience.

 % nm -na /usr/lib/libc.a
 ...
 strndup.o:
          U malloc
 00000000 T strndup
 00000000 a strndup.c
 ...
 getw.o:
          U fread
 00000000 t 
 00000000 d 
 00000000 n 
 00000000 b 
 00000000 T getw
 00000000 a getw.c
 ...

The library libc.a has many smaller object modules inside; here I have shown strndup and getw. Notice that the dictionary is listed for each object module inside the library.

For further information, refer to your man pages on ar.
