Libraries

The Procedure Lookup Table

Libraries may contain many functions, and a program may end up including many libraries to get its work done. A program may only use one or two functions from each library of the many available, and depending on the run-time path through the code may use some functions and not others.

As we have seen, the process of dynamic linking is a fairly computationally intensive one, since it involves looking up and searching through many tables. Anything that can be done to reduce the overheads will increase performance.

The Procedure Lookup Table (PLT), which the ELF specifications call the Procedure Linkage Table, facilitates what is called lazy binding in programs. Binding is synonymous with the fix-up process described above for variables located in the GOT. When an entry has been "fixed-up" it is said to be "bound" to its real address.

As we mentioned, sometimes a program will include a function from a library but never actually call that function, depending on user input. The process of binding this function is quite intensive, involving loading code, searching through tables and writing memory. To go through the process of binding a function that is not used is simply a waste of time.

Lazy binding defers this expense until the actual function is called by using a PLT.

Each library function has an entry in the PLT, which initially points to some special dummy code. When the program calls the function, it actually calls the PLT entry (in the same way as variables are referenced through the GOT).

This dummy function will load a few parameters that need to be passed to the dynamic linker for it to resolve the function and then call into a special lookup function of the dynamic linker. The dynamic linker finds the real address of the function, and writes that location into the calling binary over the top of the dummy function call.

Thus, the next time the function is called the address can be loaded without having to go back into the dynamic linker again. If a function is never called, its PLT entry is never modified and the cost of binding it is never paid.

The PLT in action

Things start to get a bit hairy here! If nothing else, you should begin to appreciate that there is a fair bit of work in resolving a dynamic symbol!

Let us consider the simple "Hello World" application. This will only make one library call to printf to output the string to the user.

Example 9-8. Hello World PLT example

    $ cat hello.c
    #include <stdio.h>
    
    int main(void)
    {
            printf("Hello, World!\n");
            return 0;
    }
    
    $ gcc -o hello hello.c
    
    $ readelf --relocs ./hello
    
    Relocation section '.rela.dyn' at offset 0x3f0 contains 2 entries:
      Offset          Info           Type           Sym. Value    Sym. Name + Addend
    6000000000000ed8  000700000047 R_IA64_FPTR64LSB  0000000000000000 _Jv_RegisterClasses + 0
    6000000000000ee0  000900000047 R_IA64_FPTR64LSB  0000000000000000 __gmon_start__ + 0
    
    Relocation section '.rela.IA_64.pltoff' at offset 0x420 contains 3 entries:
      Offset          Info           Type           Sym. Value    Sym. Name + Addend
    6000000000000f10  000200000081 R_IA64_IPLTLSB    0000000000000000 printf + 0
    6000000000000f20  000800000081 R_IA64_IPLTLSB    0000000000000000 __libc_start_main + 0
    6000000000000f30  000900000081 R_IA64_IPLTLSB    0000000000000000 __gmon_start__ + 0

We can see above that we have an R_IA64_IPLTLSB relocation for our printf symbol. This is saying "put the address of symbol printf into memory address 0x6000000000000f10". We have to start digging deeper to find the exact procedure that gets us the function.

Below we have a look at the disassembly of the main() function of the program.

Example 9-9. Hello world main()

    4000000000000790 <main>:
    4000000000000790:       00 08 15 08 80 05       [MII]       alloc r33=ar.pfs,5,4,0
    4000000000000796:       20 02 30 00 42 60                   mov r34=r12
    400000000000079c:       04 08 00 84                         mov r35=r1
    40000000000007a0:       01 00 00 00 01 00       [MII]       nop.m 0x0
    40000000000007a6:       00 02 00 62 00 c0                   mov r32=b0
    40000000000007ac:       81 0c 00 90                         addl r14=72,r1;;
    40000000000007b0:       1c 20 01 1c 18 10       [MFB]       ld8 r36=[r14]
    40000000000007b6:       00 00 00 02 00 00                   nop.f 0x0
    40000000000007bc:       78 fd ff 58                         br.call.sptk.many b0=4000000000000520 <_init+0xb0>
    40000000000007c0:       02 08 00 46 00 21       [MII]       mov r1=r35
    40000000000007c6:       e0 00 00 00 42 00                   mov r14=r0;;
    40000000000007cc:       01 70 00 84                         mov r8=r14
    40000000000007d0:       00 00 00 00 01 00       [MII]       nop.m 0x0
    40000000000007d6:       00 08 01 55 00 00                   mov.i ar.pfs=r33
    40000000000007dc:       00 0a 00 07                         mov b0=r32
    40000000000007e0:       1d 60 00 44 00 21       [MFB]       mov r12=r34
    40000000000007e6:       00 00 00 02 00 80                   nop.f 0x0
    40000000000007ec:       08 00 84 00                         br.ret.sptk.many b0;;

The call to 0x4000000000000520 must be us calling the printf function. We can find out where this is by looking at the sections with readelf.

Example 9-10. Hello world sections

    $ readelf --sections ./hello
    There are 40 section headers, starting at offset 0x25c0:
    
    Section Headers:
      [Nr] Name              Type             Address           Offset
           Size              EntSize          Flags  Link  Info  Align
      [ 0]                   NULL             0000000000000000  00000000
           0000000000000000  0000000000000000           0     0     0
    ...
      [11] .plt              PROGBITS         40000000000004c0  000004c0
           00000000000000c0  0000000000000000  AX       0     0     32
      [12] .text             PROGBITS         4000000000000580  00000580
           00000000000004a0  0000000000000000  AX       0     0     32
      [13] .fini             PROGBITS         4000000000000a20  00000a20
           0000000000000040  0000000000000000  AX       0     0     16
      [14] .rodata           PROGBITS         4000000000000a60  00000a60
           000000000000000f  0000000000000000   A       0     0     8
      [15] .opd              PROGBITS         4000000000000a70  00000a70
           0000000000000070  0000000000000000   A       0     0     16
      [16] .IA_64.unwind_inf PROGBITS         4000000000000ae0  00000ae0
           00000000000000f0  0000000000000000   A       0     0     8
      [17] .IA_64.unwind     IA_64_UNWIND     4000000000000bd0  00000bd0
           00000000000000c0  0000000000000000  AL      12     c     8
      [18] .init_array       INIT_ARRAY       6000000000000c90  00000c90
           0000000000000018  0000000000000000  WA       0     0     8
      [19] .fini_array       FINI_ARRAY       6000000000000ca8  00000ca8
           0000000000000008  0000000000000000  WA       0     0     8
      [20] .data             PROGBITS         6000000000000cb0  00000cb0
           0000000000000004  0000000000000000  WA       0     0     4
      [21] .dynamic          DYNAMIC          6000000000000cb8  00000cb8
           00000000000001e0  0000000000000010  WA       5     0     8
      [22] .ctors            PROGBITS         6000000000000e98  00000e98
           0000000000000010  0000000000000000  WA       0     0     8
      [23] .dtors            PROGBITS         6000000000000ea8  00000ea8
           0000000000000010  0000000000000000  WA       0     0     8
      [24] .jcr              PROGBITS         6000000000000eb8  00000eb8
           0000000000000008  0000000000000000  WA       0     0     8
      [25] .got              PROGBITS         6000000000000ec0  00000ec0
           0000000000000050  0000000000000000 WAp       0     0     8
      [26] .IA_64.pltoff     PROGBITS         6000000000000f10  00000f10
           0000000000000030  0000000000000000 WAp       0     0     16
      [27] .sdata            PROGBITS         6000000000000f40  00000f40
           0000000000000010  0000000000000000 WAp       0     0     8
      [28] .sbss             NOBITS           6000000000000f50  00000f50
           0000000000000008  0000000000000000  WA       0     0     8
      [29] .bss              NOBITS           6000000000000f58  00000f50
           0000000000000008  0000000000000000  WA       0     0     8
      [30] .comment          PROGBITS         0000000000000000  00000f50
           00000000000000b9  0000000000000000           0     0     1
      [31] .debug_aranges    PROGBITS         0000000000000000  00001010
           0000000000000090  0000000000000000           0     0     16
      [32] .debug_pubnames   PROGBITS         0000000000000000  000010a0
           0000000000000025  0000000000000000           0     0     1
      [33] .debug_info       PROGBITS         0000000000000000  000010c5
           00000000000009c4  0000000000000000           0     0     1
      [34] .debug_abbrev     PROGBITS         0000000000000000  00001a89
           0000000000000124  0000000000000000           0     0     1
      [35] .debug_line       PROGBITS         0000000000000000  00001bad
           00000000000001fe  0000000000000000           0     0     1
      [36] .debug_str        PROGBITS         0000000000000000  00001dab
           00000000000006a1  0000000000000001  MS       0     0     1
      [37] .shstrtab         STRTAB           0000000000000000  0000244c
           000000000000016f  0000000000000000           0     0     1
      [38] .symtab           SYMTAB           0000000000000000  00002fc0
           0000000000000b58  0000000000000018          39    60     8
      [39] .strtab           STRTAB           0000000000000000  00003b18
           0000000000000479  0000000000000000           0     0     1
    Key to Flags:
      W (write), A (alloc), X (execute), M (merge), S (strings)
      I (info), L (link order), G (group), x (unknown)
      O (extra OS processing required) o (OS specific), p (processor specific)

That address is (unsurprisingly) in the .plt section. So there we have our call into the PLT! But we're not satisfied with that; let's keep digging further to see what we can uncover. We disassemble the .plt section to see what that call actually does.

Example 9-11. Hello world PLT

    40000000000004c0 <.plt>:
    40000000000004c0:       0b 10 00 1c 00 21       [MMI]       mov r2=r14;;
    40000000000004c6:       e0 00 08 00 48 00                   addl r14=0,r2
    40000000000004cc:       00 00 04 00                         nop.i 0x0;;
    40000000000004d0:       0b 80 20 1c 18 14       [MMI]       ld8 r16=[r14],8;;
    40000000000004d6:       10 41 38 30 28 00                   ld8 r17=[r14],8
    40000000000004dc:       00 00 04 00                         nop.i 0x0;;
    40000000000004e0:       11 08 00 1c 18 10       [MIB]       ld8 r1=[r14]
    40000000000004e6:       60 88 04 80 03 00                   mov b6=r17
    40000000000004ec:       60 00 80 00                         br.few b6;;
    40000000000004f0:       11 78 00 00 00 24       [MIB]       mov r15=0
    40000000000004f6:       00 00 00 02 00 00                   nop.i 0x0
    40000000000004fc:       d0 ff ff 48                         br.few 40000000000004c0 <_init+0x50>;;
    4000000000000500:       11 78 04 00 00 24       [MIB]       mov r15=1
    4000000000000506:       00 00 00 02 00 00                   nop.i 0x0
    400000000000050c:       c0 ff ff 48                         br.few 40000000000004c0 <_init+0x50>;;
    4000000000000510:       11 78 08 00 00 24       [MIB]       mov r15=2
    4000000000000516:       00 00 00 02 00 00                   nop.i 0x0
    400000000000051c:       b0 ff ff 48                         br.few 40000000000004c0 <_init+0x50>;;
    4000000000000520:       0b 78 40 03 00 24       [MMI]       addl r15=80,r1;;
    4000000000000526:       00 41 3c 70 29 c0                   ld8.acq r16=[r15],8
    400000000000052c:       01 08 00 84                         mov r14=r1;;
    4000000000000530:       11 08 00 1e 18 10       [MIB]       ld8 r1=[r15]
    4000000000000536:       60 80 04 80 03 00                   mov b6=r16
    400000000000053c:       60 00 80 00                         br.few b6;;
    4000000000000540:       0b 78 80 03 00 24       [MMI]       addl r15=96,r1;;
    4000000000000546:       00 41 3c 70 29 c0                   ld8.acq r16=[r15],8
    400000000000054c:       01 08 00 84                         mov r14=r1;;
    4000000000000550:       11 08 00 1e 18 10       [MIB]       ld8 r1=[r15]
    4000000000000556:       60 80 04 80 03 00                   mov b6=r16
    400000000000055c:       60 00 80 00                         br.few b6;;
    4000000000000560:       0b 78 c0 03 00 24       [MMI]       addl r15=112,r1;;
    4000000000000566:       00 41 3c 70 29 c0                   ld8.acq r16=[r15],8
    400000000000056c:       01 08 00 84                         mov r14=r1;;
    4000000000000570:       11 08 00 1e 18 10       [MIB]       ld8 r1=[r15]
    4000000000000576:       60 80 04 80 03 00                   mov b6=r16
    400000000000057c:       60 00 80 00                         br.few b6;;

Let us step through the instructions, starting at the address main() called, 0x4000000000000520. Firstly, we add 80 to the value in r1, storing it in r15. We know from before that r1 will be pointing to the GOT, so this is saying "store in r15 the address 80 bytes into the GOT". The next thing we do is load into r16 the value stored in this location in the GOT, and post increment the value in r15 by 8 bytes. We then store r1 (the location of the GOT) in r14 and set r1 to be the value in the next 8 bytes after r15. Then we branch to r16.

In the previous chapter we discussed how functions are actually called through a function descriptor which contains the function address and the address of the global pointer. Here we can see that the PLT entry is first loading the function value, moving on 8 bytes to the second part of the function descriptor and then loading that value into the gp register before calling the function.

But what exactly are we loading? We know that r1 will be pointing to the GOT. We go 80 bytes (0x50) past the start of the GOT.

Example 9-12. Hello world GOT

    $ objdump --disassemble-all ./hello 
    Disassembly of section .got:
    
    6000000000000ec0 <.got>:
            ...
    6000000000000ee8:       80 0a 00 00 00 00                   data8 0x02a000000
    6000000000000eee:       00 40 90 0a                         dep r0=r0,r0,63,1
    6000000000000ef2:       00 00 00 00 00 40       [MIB] (p20) break.m 0x1
    6000000000000ef8:       a0 0a 00 00 00 00                   data8 0x02a810000
    6000000000000efe:       00 40 50 0f                         br.few 6000000000000ef0 <_GLOBAL_OFFSET_TABLE_+0x30>
    6000000000000f02:       00 00 00 00 00 60       [MIB] (p58) break.m 0x1
    6000000000000f08:       60 0a 00 00 00 00                   data8 0x029818000
    6000000000000f0e:       00 40 90 06                         br.few 6000000000000f00 <_GLOBAL_OFFSET_TABLE_+0x40>
    Disassembly of section .IA_64.pltoff:

    6000000000000f10 <.IA_64.pltoff>:
    6000000000000f10:       f0 04 00 00 00 00       [MIB] (p39) break.m 0x0
    6000000000000f16:       00 40 c0 0e 00 00                   data8 0x03b010000
    6000000000000f1c:       00 00 00 60                         data8 0xc000000000
    6000000000000f20:       00 05 00 00 00 00       [MII] (p40) break.m 0x0
    6000000000000f26:       00 40 c0 0e 00 00                   data8 0x03b010000
    6000000000000f2c:       00 00 00 60                         data8 0xc000000000
    6000000000000f30:       10 05 00 00 00 00       [MIB] (p40) break.m 0x0
    6000000000000f36:       00 40 c0 0e 00 00                   data8 0x03b010000
    6000000000000f3c:       00 00 00 60                         data8 0xc000000000

0x6000000000000ec0 + 0x50 = 0x6000000000000f10, or the .IA_64.pltoff section. Now we're starting to get somewhere!

We can decode the objdump output so we can see exactly what is being loaded here. Swapping the byte order of the first 8 bytes f0 04 00 00 00 00 00 40 we end up with 0x40000000000004f0. Now that address looks familiar! Looking back up at the disassembly of the PLT we see that address.

The code at 0x40000000000004f0 firstly puts a zero value into r15, and then branches back to 0x40000000000004c0. Wait a minute! That's the start of our PLT section.

We can trace this code through too. Firstly we save the caller's global pointer, which the stub left in r14, into r2. Then, starting from the address that global pointer refers to, we load three 8 byte values into r16, r17 and finally r1. We then branch to the address in r17. What we are seeing here is the actual call into the dynamic linker!

We need to delve into the ABI to understand exactly what is being loaded at this point. The ABI says two things: dynamically linked programs must have a special reserved area that can hold three 8 byte values, and the dynamic segment of the binary must contain a pointer to this reserved area (the DT_IA_64_PLT_RESERVE entry).

Example 9-13. Dynamic Segment

    Dynamic segment at offset 0xcb8 contains 25 entries:
      Tag        Type                         Name/Value
     0x0000000000000001 (NEEDED)             Shared library: [libc.so.6.1]
     0x000000000000000c (INIT)               0x4000000000000470
     0x000000000000000d (FINI)               0x4000000000000a20
     0x0000000000000019 (INIT_ARRAY)         0x6000000000000c90
     0x000000000000001b (INIT_ARRAYSZ)       24 (bytes)
     0x000000000000001a (FINI_ARRAY)         0x6000000000000ca8
     0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
     0x0000000000000004 (HASH)               0x4000000000000200
     0x0000000000000005 (STRTAB)             0x4000000000000330
     0x0000000000000006 (SYMTAB)             0x4000000000000240
     0x000000000000000a (STRSZ)              138 (bytes)
     0x000000000000000b (SYMENT)             24 (bytes)
     0x0000000000000015 (DEBUG)              0x0
     0x0000000070000000 (IA_64_PLT_RESERVE)  0x6000000000000ec0 -- 0x6000000000000ed8
     0x0000000000000003 (PLTGOT)             0x6000000000000ec0
     0x0000000000000002 (PLTRELSZ)           72 (bytes)
     0x0000000000000014 (PLTREL)             RELA
     0x0000000000000017 (JMPREL)             0x4000000000000420
     0x0000000000000007 (RELA)               0x40000000000003f0
     0x0000000000000008 (RELASZ)             48 (bytes)
     0x0000000000000009 (RELAENT)            24 (bytes)
     0x000000006ffffffe (VERNEED)            0x40000000000003d0
     0x000000006fffffff (VERNEEDNUM)         1
     0x000000006ffffff0 (VERSYM)             0x40000000000003ba
     0x0000000000000000 (NULL)               0x0

Do you notice anything about the IA_64_PLT_RESERVE entry? It starts at 0x6000000000000ec0, the same address as the GOT (the PLTGOT entry). This means that the first three 8 byte entries in the GOT are actually the reserved area, and so they will always be pointed to by the global pointer.

When the dynamic linker starts it is its duty to fill these values in. The ABI says that the first value will be filled in by the dynamic linker giving this module a unique ID. The second value is the address of the function that finds and fixes up the symbol, and the third value is the global pointer value for the dynamic linker; this is the order in which the code at the start of the PLT loads them into r16, r17 and r1.

Example 9-14. Code in the dynamic linker for setting up special values (from libc sysdeps/ia64/dl-machine.h)

    /* Set up the loaded object described by L so its unrelocated PLT
       entries will jump to the on-demand fixup code in dl-runtime.c.  */
    
    static inline int __attribute__ ((unused, always_inline))
    elf_machine_runtime_setup (struct link_map *l, int lazy, int profile)
    {
      extern void _dl_runtime_resolve (void);
      extern void _dl_runtime_profile (void);
    
      if (lazy)
        {
          register Elf64_Addr gp __asm__ ("gp");
          Elf64_Addr *reserve, doit;
    
          /*
           * Careful with the typecast here or it will try to add l-l_addr
           * pointer elements
           */
          reserve = ((Elf64_Addr *)
                     (l->l_info[DT_IA_64 (PLT_RESERVE)]->d_un.d_ptr + l->l_addr));
          /* Identify this shared object.  */
          reserve[0] = (Elf64_Addr) l;
    
          /* This function will be called to perform the relocation.  */
          if (!profile)
            doit = (Elf64_Addr) ((struct fdesc *) &_dl_runtime_resolve)->ip;
          else
            {
              if (GLRO(dl_profile) != NULL
                  && _dl_name_match_p (GLRO(dl_profile), l))
                {
                  /* This is the object we are looking for.  Say that we really
                     want profiling and the timers are started.  */
                  GL(dl_profile_map) = l;
                }
              doit = (Elf64_Addr) ((struct fdesc *) &_dl_runtime_profile)->ip;
            }
    
          reserve[1] = doit;
          reserve[2] = gp;
        }
    
      return lazy;
    }

We can see how this gets set up by the dynamic linker by looking at the function that does this for the binary. The reserve variable is set from the PLT_RESERVE pointer in the dynamic segment of the binary. The unique value (put into reserve[0]) is the address of the link map for this object. Link maps are the internal representation within glibc for shared objects. We then put the address of _dl_runtime_resolve into the second value (assuming we are not using profiling). reserve[2] is finally set to gp, a variable bound to the global pointer register (r1) by the __asm__ ("gp") register declaration.

Looking back at the ABI, we see that the relocation index for the entry must be placed in r15 and the unique identifier must be passed in r16.

r15 has previously been set in the stub code, before we jumped back to the start of the PLT. Have a look down the entries, and notice how each one loads r15 with an incremented value. If you look back at the relocations, it should come as no surprise that the printf relocation is number zero.

r16 we load up from the values that have been initialised by the dynamic linker, as previously discussed. Once that is ready, we can load the function address and global pointer and branch into the function.

What happens at this point is the dynamic linker function _dl_runtime_resolve is run. It finds the relocation; remember how the relocation specified the name of the symbol? It uses this name to find the right function; this might involve loading the library from disk if it is not already in memory, or otherwise sharing the code.

The relocation record provides the dynamic linker with the address it needs to "fix up"; remember it was in the GOT and loaded by the initial PLT stub? This means that once the function has been called for the first time, every later call loads the real address of the function directly, short-circuiting the dynamic linker.

Summary

You've seen the exact mechanism behind the PLT, and consequently the inner workings of the dynamic linker. The important points to remember are:

  • Library calls in your program actually call a stub of code in the PLT of the binary.

  • That stub code loads an address and jumps to it.

  • Initially, that address points to a function in the dynamic linker which is capable of looking up the "real" function, given the information in the relocation entry for that function.

  • The dynamic linker re-writes the address that the stub code reads, so that the next time the function is called it will go straight to the right address.