We know that the operating system treats code as read-only and keeps it separate from data. It seems logical, then, that since programs cannot modify code, large amounts of common code should be shared between many executables rather than replicated in every one.
With virtual memory this is easy to do. The physical pages of memory that the library code is loaded into can be referenced by any number of virtual pages in any number of address spaces. So while there is only one physical copy of the library code in system memory, every process can access that library code at whatever virtual address it likes.
Thus people quickly came up with the idea of a shared library which, as the name suggests, is shared by multiple executables. Each executable contains a reference that essentially says "I need library foo". When the program is loaded, it is up to the system to check whether some other program has already loaded the code for library foo into memory. If so, the system shares it by mapping those physical pages into the new program's address space; otherwise it loads the library into memory for the program.
This process is called dynamic linking because it does part of the linking process "on the fly" as programs are executed in the system.
Libraries are very much like a program that never gets started. They have code and data sections (functions and variables) just like every executable, but nowhere to start running. They just provide a library of functions for developers to call.
Thus ELF can represent a dynamic library just as it does an executable. There are some fundamental differences (for example, a library typically has no meaningful entry point at which execution starts), but shared libraries are ELF objects just like executables.
The e_type field of the ELF header marks a file as either an executable (ET_EXEC) or a shared object file (ET_DYN); since the field holds a single value, the two are mutually exclusive.
When you compile a program that uses a dynamic library, the object files are left with references to the library functions, just as for any other external reference.
You need to include the library's header so that the compiler knows the types of the functions you are calling. Note that the compiler only needs to know the types associated with a function (for example, that it takes an int and returns a char *) so that it can correctly set up the call, passing the arguments and handling the return value.[1]
Even though the dynamic linker does a lot of the work for shared libraries, the traditional linker still has a role to play in creating the executable.
The traditional linker needs to leave a pointer in the executable so that the dynamic linker knows what library will satisfy the dependencies at runtime.
The dynamic section of the executable contains a NEEDED entry for each shared library that the executable depends on.
Again, we can inspect these fields with the readelf program. Below we look at a very standard binary, /bin/ls.
Example 9-1. Specifying Dynamic Libraries
$ readelf --dynamic /bin/ls
Dynamic segment at offset 0x22f78 contains 27 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [librt.so.1]
0x0000000000000001 (NEEDED) Shared library: [libacl.so.1]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6.1]
0x000000000000000c (INIT) 0x4000000000001e30
... snip ...
You can see that it specifies three libraries. The most common library, shared by most if not all programs on the system, is libc. The program also needs some other libraries to run correctly.
Reading the ELF file directly is sometimes useful, but the usual way to inspect a dynamically linked executable is with ldd. ldd "walks" the library dependencies for you; that is, if a library depends on another library, ldd will show that too.
Example 9-2. Looking at dynamic libraries
$ ldd /bin/ls
librt.so.1 => /lib/tls/librt.so.1 (0x2000000000058000)
libacl.so.1 => /lib/libacl.so.1 (0x2000000000078000)
libc.so.6.1 => /lib/tls/libc.so.6.1 (0x2000000000098000)
libpthread.so.0 => /lib/tls/libpthread.so.0 (0x20000000002e0000)
/lib/ld-linux-ia64.so.2 => /lib/ld-linux-ia64.so.2 (0x2000000000000000)
libattr.so.1 => /lib/libattr.so.1 (0x2000000000310000)
$ readelf --dynamic /lib/librt.so.1
Dynamic segment at offset 0xd600 contains 30 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libc.so.6.1]
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
... snip ...
Notice that ldd reports libpthread even though /bin/ls has no NEEDED entry for it. A little digging with readelf shows that the requirement comes from librt.
[1] This has not always been the case with the C standard. Previously, compilers would assume that any function they did not know about returned an int. On a 32-bit system a pointer is the same size as an int, so there was no problem. However, on a 64-bit system a pointer is generally twice the size of an int, so if the function actually returns a pointer, part of its value will be lost. This is clearly not acceptable, as the truncated pointer will generally not point to valid memory. The C99 standard removed these implicit declarations, so a function must be declared before it is called.