Libraries may contain many functions, and a program may end up including many libraries to get its work done. A program may only use one or two functions from each library of the many available, and depending on the run-time path through the code may use some functions and not others.
As we have seen, the process of dynamic linking is a fairly computationally intensive one, since it involves looking up and searching through many tables. Anything that can be done to reduce the overheads will increase performance.
The Procedure Lookup Table (PLT) facilitates what is called lazy binding in programs. Binding is synonymous with the fix-up process described above for variables located in the GOT. When an entry has been "fixed-up" it is said to be "bound" to its real address.
As we mentioned, sometimes a program will include a function from a library but never actually call that function, depending on user input. The process of binding this function is quite intensive, involving loading code, searching through tables and writing memory. To go through the process of binding a function that is not used is simply a waste of time.
Lazy binding defers this expense until the actual function is called by using a PLT.
Each library function has an entry in the PLT, which initially points to some special dummy code. When the program calls the function, it actually calls the PLT entry (in the same was as variables are referenced through the GOT).
This dummy function will load a few parameters that need to be passed to the dynamic linker for it to resolve the function and then call into a special lookup function of the dynamic linker. The dynamic linker finds the real address of the function, and writes that location into the calling binary over the top of the dummy function call.
Thus, the next time the function is called the address can be loaded without having to go back into the dynamic loader again. If a function is never called, then the PLT entry will never be modified but there will be no runtime overhead.
Things start to get a bit hairy here! If nothing else, you should begin to appreciate that there is a fair bit of work in resolving a dynamic symbol!
Let us consider the simple "hello World" application. This will only make one library call to printf to output the string to the user.
Example 9-8. Hello World PLT example
$ cat hello.c #include <stdio.h> int main(void) 5 { printf("Hello, World!\n"); return 0; } 10 $ gcc -o hello hello.c $ readelf --relocs ./hello Relocation section '.rela.dyn' at offset 0x3f0 contains 2 entries: 15 Offset Info Type Sym. Value Sym. Name + Addend 6000000000000ed8 000700000047 R_IA64_FPTR64LSB 0000000000000000 _Jv_RegisterClasses + 0 6000000000000ee0 000900000047 R_IA64_FPTR64LSB 0000000000000000 __gmon_start__ + 0 Relocation section '.rela.IA_64.pltoff' at offset 0x420 contains 3 entries: 20 Offset Info Type Sym. Value Sym. Name + Addend 6000000000000f10 000200000081 R_IA64_IPLTLSB 0000000000000000 printf + 0 6000000000000f20 000800000081 R_IA64_IPLTLSB 0000000000000000 __libc_start_main + 0 6000000000000f30 000900000081 R_IA64_IPLTLSB 0000000000000000 __gmon_start__ + 0;
We can see above that we have a R_IA64_IPLTLSB relocation for our printf symbol. This is saying "put the address of symbol printf into memory address 0x6000000000000f10". We have to start digging deeper to find the exact procedure that gets us the function.
Below we have a look at the disassembly of the main() function of the program.
Example 9-9. Hello world main()
4000000000000790 <main>: 4000000000000790: 00 08 15 08 80 05 [MII] alloc r33=ar.pfs,5,4,0 4000000000000796: 20 02 30 00 42 60 mov r34=r12 400000000000079c: 04 08 00 84 mov r35=r1 5 40000000000007a0: 01 00 00 00 01 00 [MII] nop.m 0x0 40000000000007a6: 00 02 00 62 00 c0 mov r32=b0 40000000000007ac: 81 0c 00 90 addl r14=72,r1;; 40000000000007b0: 1c 20 01 1c 18 10 [MFB] ld8 r36=[r14] 40000000000007b6: 00 00 00 02 00 00 nop.f 0x0 10 40000000000007bc: 78 fd ff 58 br.call.sptk.many b0=4000000000000520 <_init+0xb0> 40000000000007c0: 02 08 00 46 00 21 [MII] mov r1=r35 40000000000007c6: e0 00 00 00 42 00 mov r14=r0;; 40000000000007cc: 01 70 00 84 mov r8=r14 40000000000007d0: 00 00 00 00 01 00 [MII] nop.m 0x0 15 40000000000007d6: 00 08 01 55 00 00 mov.i ar.pfs=r33 40000000000007dc: 00 0a 00 07 mov b0=r32 40000000000007e0: 1d 60 00 44 00 21 [MFB] mov r12=r34 40000000000007e6: 00 00 00 02 00 80 nop.f 0x0 40000000000007ec: 08 00 84 00 br.ret.sptk.many b0;;;
The call to 0x4000000000000520 must be us calling the printf function. We can find out where this is by looking at the sections with readelf.
Example 9-10. Hello world sections
$ readelf --sections ./hello There are 40 section headers, starting at offset 0x25c0: Section Headers: 5 [Nr] Name Type Address Offset Size EntSize Flags Link Info Align [ 0] NULL 0000000000000000 00000000 0000000000000000 0000000000000000 0 0 0 ... 10 [11] .plt PROGBITS 40000000000004c0 000004c0 00000000000000c0 0000000000000000 AX 0 0 32 [12] .text PROGBITS 4000000000000580 00000580 00000000000004a0 0000000000000000 AX 0 0 32 [13] .fini PROGBITS 4000000000000a20 00000a20 15 0000000000000040 0000000000000000 AX 0 0 16 [14] .rodata PROGBITS 4000000000000a60 00000a60 000000000000000f 0000000000000000 A 0 0 8 [15] .opd PROGBITS 4000000000000a70 00000a70 0000000000000070 0000000000000000 A 0 0 16 20 [16] .IA_64.unwind_inf PROGBITS 4000000000000ae0 00000ae0 00000000000000f0 0000000000000000 A 0 0 8 [17] .IA_64.unwind IA_64_UNWIND 4000000000000bd0 00000bd0 00000000000000c0 0000000000000000 AL 12 c 8 [18] .init_array INIT_ARRAY 6000000000000c90 00000c90 25 0000000000000018 0000000000000000 WA 0 0 8 [19] .fini_array FINI_ARRAY 6000000000000ca8 00000ca8 0000000000000008 0000000000000000 WA 0 0 8 [20] .data PROGBITS 6000000000000cb0 00000cb0 0000000000000004 0000000000000000 WA 0 0 4 30 [21] .dynamic DYNAMIC 6000000000000cb8 00000cb8 00000000000001e0 0000000000000010 WA 5 0 8 [22] .ctors PROGBITS 6000000000000e98 00000e98 0000000000000010 0000000000000000 WA 0 0 8 [23] .dtors PROGBITS 6000000000000ea8 00000ea8 35 0000000000000010 0000000000000000 WA 0 0 8 [24] .jcr PROGBITS 6000000000000eb8 00000eb8 0000000000000008 0000000000000000 WA 0 0 8 [25] .got PROGBITS 6000000000000ec0 00000ec0 0000000000000050 0000000000000000 WAp 0 0 8 40 [26] .IA_64.pltoff PROGBITS 6000000000000f10 00000f10 0000000000000030 0000000000000000 WAp 0 0 16 [27] .sdata PROGBITS 6000000000000f40 00000f40 0000000000000010 0000000000000000 WAp 0 0 8 [28] .sbss NOBITS 6000000000000f50 00000f50 45 0000000000000008 0000000000000000 WA 0 0 8 [29] .bss NOBITS 6000000000000f58 00000f50 0000000000000008 0000000000000000 WA 0 0 8 [30] .comment PROGBITS 0000000000000000 00000f50 00000000000000b9 0000000000000000 0 0 1 50 [31] .debug_aranges PROGBITS 0000000000000000 00001010 0000000000000090 0000000000000000 0 0 16 [32] .debug_pubnames PROGBITS 0000000000000000 000010a0 0000000000000025 0000000000000000 0 0 1 [33] .debug_info PROGBITS 0000000000000000 000010c5 55 00000000000009c4 0000000000000000 0 0 1 [34] .debug_abbrev PROGBITS 0000000000000000 00001a89 0000000000000124 0000000000000000 0 0 1 [35] .debug_line PROGBITS 0000000000000000 00001bad 00000000000001fe 0000000000000000 0 0 1 60 [36] .debug_str PROGBITS 0000000000000000 00001dab 00000000000006a1 0000000000000001 MS 0 0 1 [37] .shstrtab STRTAB 0000000000000000 0000244c 000000000000016f 0000000000000000 0 0 1 [38] .symtab SYMTAB 0000000000000000 00002fc0 65 0000000000000b58 0000000000000018 39 60 8 [39] .strtab STRTAB 0000000000000000 00003b18 0000000000000479 0000000000000000 0 0 1 Key to Flags: W (write), A (alloc), X (execute), M (merge), S (strings) 70 I (info), L (link order), G (group), x (unknown) O (extra OS processing required) o (OS specific), p (processor specific);
That address is (unsurprisingly) in the .plt section. So there we have our call into the PLT! But we're not satisfied with that, let's keep digging further to see what we can uncover. We disassemble the .plt section to see what that call actually does.
Example 9-11. Hello world PLT
40000000000004c0 <.plt>: 40000000000004c0: 0b 10 00 1c 00 21 [MMI] mov r2=r14;; 40000000000004c6: e0 00 08 00 48 00 addl r14=0,r2 40000000000004cc: 00 00 04 00 nop.i 0x0;; 5 40000000000004d0: 0b 80 20 1c 18 14 [MMI] ld8 r16=[r14],8;; 40000000000004d6: 10 41 38 30 28 00 ld8 r17=[r14],8 40000000000004dc: 00 00 04 00 nop.i 0x0;; 40000000000004e0: 11 08 00 1c 18 10 [MIB] ld8 r1=[r14] 40000000000004e6: 60 88 04 80 03 00 mov b6=r17 10 40000000000004ec: 60 00 80 00 br.few b6;; 40000000000004f0: 11 78 00 00 00 24 [MIB] mov r15=0 40000000000004f6: 00 00 00 02 00 00 nop.i 0x0 40000000000004fc: d0 ff ff 48 br.few 40000000000004c0 <_init+0x50>;; 4000000000000500: 11 78 04 00 00 24 [MIB] mov r15=1 15 4000000000000506: 00 00 00 02 00 00 nop.i 0x0 400000000000050c: c0 ff ff 48 br.few 40000000000004c0 <_init+0x50>;; 4000000000000510: 11 78 08 00 00 24 [MIB] mov r15=2 4000000000000516: 00 00 00 02 00 00 nop.i 0x0 400000000000051c: b0 ff ff 48 br.few 40000000000004c0 <_init+0x50>;; 20 4000000000000520: 0b 78 40 03 00 24 [MMI] addl r15=80,r1;; 4000000000000526: 00 41 3c 70 29 c0 ld8.acq r16=[r15],8 400000000000052c: 01 08 00 84 mov r14=r1;; 4000000000000530: 11 08 00 1e 18 10 [MIB] ld8 r1=[r15] 4000000000000536: 60 80 04 80 03 00 mov b6=r16 25 400000000000053c: 60 00 80 00 br.few b6;; 4000000000000540: 0b 78 80 03 00 24 [MMI] addl r15=96,r1;; 4000000000000546: 00 41 3c 70 29 c0 ld8.acq r16=[r15],8 400000000000054c: 01 08 00 84 mov r14=r1;; 4000000000000550: 11 08 00 1e 18 10 [MIB] ld8 r1=[r15] 30 4000000000000556: 60 80 04 80 03 00 mov b6=r16 400000000000055c: 60 00 80 00 br.few b6;; 4000000000000560: 0b 78 c0 03 00 24 [MMI] addl r15=112,r1;; 4000000000000566: 00 41 3c 70 29 c0 ld8.acq r16=[r15],8 400000000000056c: 01 08 00 84 mov r14=r1;; 35 4000000000000570: 11 08 00 1e 18 10 [MIB] ld8 r1=[r15] 4000000000000576: 60 80 04 80 03 00 mov b6=r16 400000000000057c: 60 00 80 00 br.few b6;;;
Let us step through the instructions. Firstly, we add 80 to the value in r1, storing it in r15. We know from before that r1 will be pointing to the GOT, so this is saying "store in r15 80 bytes into the GOT". The next thing we do is load into r16 the value stored in this location in the GOT, and post increment the value in r15 by 8 bytes. We then store r1 (the location of the GOT) in r14 and set r1 to be the value in the next 8 bytes after r15. Then we branch to r16.
In the previous chapter we discussed how functions are actually called through a function descriptor which contains the function address and the address of the global pointer. Here we can see that the PLT entry is first loading the function value, moving on 8 bytes to the second part of the function descriptor and then loading that value into the gp register before calling the function.
But what exactly are we loading? We know that r1 will be pointing to the GOT. We go 80 bytes past the got (0x50)
Example 9-12. Hello world GOT
$ objdump --disassemble-all ./hello Disassembly of section .got: 6000000000000ec0 <.got>: 5 ... 6000000000000ee8: 80 0a 00 00 00 00 data8 0x02a000000 6000000000000eee: 00 40 90 0a dep r0=r0,r0,63,1 6000000000000ef2: 00 00 00 00 00 40 [MIB] (p20) break.m 0x1 6000000000000ef8: a0 0a 00 00 00 00 data8 0x02a810000 10 6000000000000efe: 00 40 50 0f br.few 6000000000000ef0 <_GLOBAL_OFFSET_TABLE_+0x30> 6000000000000f02: 00 00 00 00 00 60 [MIB] (p58) break.m 0x1 6000000000000f08: 60 0a 00 00 00 00 data8 0x029818000 6000000000000f0e: 00 40 90 06 br.few 6000000000000f00 <_GLOBAL_OFFSET_TABLE_+0x40> Disassembly of section .IA_64.pltoff: 15 6000000000000f10 <.IA_64.pltoff>: 6000000000000f10: f0 04 00 00 00 00 [MIB] (p39) break.m 0x0 6000000000000f16: 00 40 c0 0e 00 00 data8 0x03b010000 6000000000000f1c: 00 00 00 60 data8 0xc000000000 20 6000000000000f20: 00 05 00 00 00 00 [MII] (p40) break.m 0x0 6000000000000f26: 00 40 c0 0e 00 00 data8 0x03b010000 6000000000000f2c: 00 00 00 60 data8 0xc000000000 6000000000000f30: 10 05 00 00 00 00 [MIB] (p40) break.m 0x0 6000000000000f36: 00 40 c0 0e 00 00 data8 0x03b010000 25 6000000000000f3c: 00 00 00 60 data8 0xc000000000;
0x6000000000000ec0 + 0x50 = 0x6000000000000f10, or the .IA_64.pltoff section. Now we're starting to get somewhere!
We can decode the objdump output so we can see exactly what is being loaded here. Swapping the byte order of the first 8 bytes f0 04 00 00 00 00 00 40 we end up with 0x4000000000004f0. Now that address looks familiar! Looking back up at the assemble output of the PLT we see that address.
The code at 0x4000000000004f0 firstly puts a zero value into r15, and then branches back to 0x40000000000004c0. Wait a minute! That's the start of our PLT section.
We can trace this code through too. Firstly we save the value of the global pointer (r2) then we load three 8 byte values into r16, r17 and finally, r1. We then branch to the address in r17. What we are seeing here is the actual call into the dynamic linker!
We need to delve into the ABI to understand exactly what is being loaded at this point. The ABI says two things -- dynamically linked programs must have a special section (called the DT_IA_64_PLT_RESERVE section) that can hold three 8 byte values. There is a pointer where this reserved area in the dynamic segment of the binary.
Example 9-13. Dynamic Segment
Dynamic segment at offset 0xcb8 contains 25 entries: Tag Type Name/Value 0x0000000000000001 (NEEDED) Shared library: [libc.so.6.1] 0x000000000000000c (INIT) 0x4000000000000470 5 0x000000000000000d (FINI) 0x4000000000000a20 0x0000000000000019 (INIT_ARRAY) 0x6000000000000c90 0x000000000000001b (INIT_ARRAYSZ) 24 (bytes) 0x000000000000001a (FINI_ARRAY) 0x6000000000000ca8 0x000000000000001c (FINI_ARRAYSZ) 8 (bytes) 10 0x0000000000000004 (HASH) 0x4000000000000200 0x0000000000000005 (STRTAB) 0x4000000000000330 0x0000000000000006 (SYMTAB) 0x4000000000000240 0x000000000000000a (STRSZ) 138 (bytes) 0x000000000000000b (SYMENT) 24 (bytes) 15 0x0000000000000015 (DEBUG) 0x0 0x0000000070000000 (IA_64_PLT_RESERVE) 0x6000000000000ec0 -- 0x6000000000000ed8 0x0000000000000003 (PLTGOT) 0x6000000000000ec0 0x0000000000000002 (PLTRELSZ) 72 (bytes) 0x0000000000000014 (PLTREL) RELA 20 0x0000000000000017 (JMPREL) 0x4000000000000420 0x0000000000000007 (RELA) 0x40000000000003f0 0x0000000000000008 (RELASZ) 48 (bytes) 0x0000000000000009 (RELAENT) 24 (bytes) 0x000000006ffffffe (VERNEED) 0x40000000000003d0 25 0x000000006fffffff (VERNEEDNUM) 1 0x000000006ffffff0 (VERSYM) 0x40000000000003ba 0x0000000000000000 (NULL) 0x0;
Do you notice anything about it? It's the same value as the GOT. This means that the first three 8 byte entries in the GOT are actually the reserved area; thus will always be pointed to by the global pointer.
When the dynamic linker starts it is its duty to fill these values in. The ABI says that the first value will be filled in by the dynamic linker giving this module a unique ID. The second value is the global pointer value for the dynamic linker, and the third value is the address of the function that finds and fixes up the symbol.
Example 9-14. Code in the dynamic linker for setting up special values (from libc sysdeps/ia64/dl-machine.h)
/* Set up the loaded object described by L so its unrelocated PLT entries will jump to the on-demand fixup code in dl-runtime.c. */ static inline int __attribute__ ((unused, always_inline)) 5 elf_machine_runtime_setup (struct link_map *l, int lazy, int profile) { extern void _dl_runtime_resolve (void); extern void _dl_runtime_profile (void); 10 if (lazy) { register Elf64_Addr gp __asm__ ("gp"); Elf64_Addr *reserve, doit; 15 /* * Careful with the typecast here or it will try to add l-l_addr * pointer elements */ reserve = ((Elf64_Addr *) 20 (l->l_info[DT_IA_64 (PLT_RESERVE)]->d_un.d_ptr + l->l_addr)); /* Identify this shared object. */ reserve[0] = (Elf64_Addr) l; /* This function will be called to perform the relocation. */ 25 if (!profile) doit = (Elf64_Addr) ((struct fdesc *) &_dl_runtime_resolve)->ip; else { if (GLRO(dl_profile) != NULL 30 && _dl_name_match_p (GLRO(dl_profile), l)) { /* This is the object we are looking for. Say that we really want profiling and the timers are started. */ GL(dl_profile_map) = l; 35 } doit = (Elf64_Addr) ((struct fdesc *) &_dl_runtime_profile)->ip; } reserve[1] = doit; 40 reserve[2] = gp; } return lazy; };
We can see how this gets setup by the dynamic linker by looking at the function that does this for the binary. The reserve variable is set from the PLT_RESERVE section pointer in the binary. The unique value (put into reserve[0]) is the address of the link map for this object. Link maps are the internal representation within glibc for shared objects. We then put in the address of _dl_runtime_resolve to the second value (assuming we are not using profiling). reserve[2] is finally set to gp, which has been found from r2 with the __asm__ call.
Looking back at the ABI, we see that the relocation index for the entry must be placed in r15 and the unique identifier must be passed in r16.
r15 has previously been set in the stub code, before we jumped back to the start of the PLT. Have a look down the entries, and notice how each PLT entry loads r15 with an incremented value? It should come as no surprise if you look at the relocations the printf relocation is number zero.
r16 we load up from the values that have been initialised by the dynamic linker, as previously discussed. Once that is ready, we can load the function address and global pointer and branch into the function.
What happens at this point is the dynamic linker function _dl_runtime_resolve is run. It finds the relocation; remember how the relocation specified the name of the symbol? It uses this name to find the right function; this might involve loading the library from disk if it is not already in memory, or otherwise sharing the code.
The relocation record provides the dynamic linker with the address it needs to "fix up"; remember it was in the GOT and loaded by the initial PLT stub? This means that after the first time the function is called, the second time it is loaded it will get the direct address of the function; short circuiting the dynamic linker.
You've seen the exact mechanism behind the PLT, and consequently the inner workings of the dynamic linker. The important points to remember are
Library calls in your program actually call a stub of code in the PLT of the binary.
That stub code loads an address and jumps to it.
Initially, that address points to a function in the dynamic linker which is capable of looking up the "real" function, given the information in the relocation entry for that function.
The dynamic linker re-writes the address that the stub code reads, so that the next time the function is called it will go straight to the right address.