PE File Analysis - Walking a PE File to Find the Import Address Table

Synopsis

From gleaning information about a PE file to recognizing when code is loading a PE file into memory (by identifying constants), knowing the PE file format can be invaluable when reversing binaries. There are a number of tutorials online that step through this process, but I wanted to walk the file myself to gain a better understanding of the PE file format and improve my RE skill set. Today my goal is to find the Import Address Table in the PE file.

Pre-Exercise ReadMe

In this post, I'll be making use of radare2 to walk the PE file. Here is a quick review of the commands being used:

  • psz print zero terminated string [1]
  • s Change current seek position.[2]
    • Can take a mathematical argument; which we will take advantage of
    • When performing math, the result is calculated from the beginning of the file rather than from the current position. Use $$ to start from the current position.
    • You'll notice that when I take advantage of the math functionality, I strive to break it up so the length of the corresponding segments or offsets is represented individually to help add clarity.
  • px or x "gives a user-friendly output showing 16 pairs of numbers per row with offsets and raw representations"[3]

Pay attention to the length value in the commands. Some are decimal based and others are hexadecimal based.

  • x 4
    • Shows the first 4 bytes from the current position; the length 4 is interpreted as a decimal number
  • x 80h
    • Shows the first 128 bytes or the first 0x80 bytes. The h means that the number is to be interpreted as a hexadecimal number.

Also, only one of the seek (s) commands shown is dependent on the current position. The rest are all calculated from the starting position of the file.
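
To double check the seek arithmetic used throughout this post, here is a rough Python sketch that evaluates expressions written in the same notation. Python is my choice for these side checks, and seek_target is a hypothetical helper: it only mimics the h suffix, the $$ placeholder, and +, - and * as they appear in this post, not radare2's full expression parser.

```python
import ast
import operator
import re

# Only the operators that appear in this post's seek expressions.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def seek_target(expr: str, current: int = 0) -> int:
    """Evaluate an expression such as 'e8h + 4 + 20' or '$$+80h'."""
    # '$$' stands for the current position; 'NNh' is a hexadecimal literal.
    text = expr.replace("$$", str(current))
    text = re.sub(r"\b([0-9a-fA-F]+)h\b", r"0x\1", text).strip()

    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")

    return walk(ast.parse(text, mode="eval").body)

print(hex(seek_target("3ch")))                     # absolute seek
print(hex(seek_target("$$+80h", current=0xE8)))    # relative to the current position
```

Numbers without the h suffix are treated as decimal, which matches how the lengths and offsets are written in the commands below.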

Make sure to pay attention to the endianness and the conversions that need to be done when following offsets.
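
As a small illustration of that conversion, the sketch below decodes four bytes exactly as a hex dump prints them. It is only a helper I use for checking values; dword_le is a hypothetical name, and the two inputs are dwords that show up in the dumps later in this post.

```python
import struct

def dword_le(dump_text: str) -> int:
    """Decode 4 bytes, written as in a hex dump, as a little-endian dword."""
    raw = bytes.fromhex(dump_text)        # spaces between byte pairs are ignored
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    (value,) = struct.unpack("<I", raw)   # "<I" = little-endian unsigned 32-bit
    return value

# The dump shows the least significant byte first.
print(hex(dword_le("e800 0000")))   # 0xe8
print(hex(dword_le("1c5f 0100")))   # 0x15f1c
```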

If you want to follow along, the file being walked here is from the third Flare-On Challenge of 2016[4] and is a 32-bit binary.

Finally, the bulk of the information used in this post came from Goppit, "Portable Executable Format - a Reverse Engineering View", v1(2), Code Breakers Magazine, January 2006.[5]


Steps to Locate the Import Address Table

Ok, take a deep breath, here we go.

  1. Load file into radare2
  • r2 unknown

  • Go to PE header pointer e_lfanew in the DOS Header. In a PE file it is always located at offset 0x3c from the start of the file

      [0x00000000]> s 3ch
      [0x0000003c]> x 4
      - offset -   0 1  2 3
      0x0000003c  e800 0000
    
  • Remember the value is stored in little endian, so use 0x000000e8 for the next step

  • Move to the beginning of the PE header section IMAGE_NT_HEADERS

      [0x0000003c]> s e8h
      [0x000000e8]> x 4
      - offset -   0 1  2 3
      0x000000e8  5045 0000
    
    • The first dword contains the PE signature
  • Next we need to find the Import Directory, which is located in the Data Directory (an array of IMAGE_DATA_DIRECTORY structures, 8 bytes each). The Data Directory occupies the last 128 bytes of the Optional Header (224 bytes), and the Import Directory is the second entry in the array.

  • So to locate the Import Directory sum the following:

    • PE signature (4 bytes)

    • File Header (20 bytes)

    • Offset into the Optional Header (224 - 128 + 8 = 104 bytes)

    • This equals 4 + 20 + 104 = 128, or 0x80.

           [0x000000e8]> s $$+80h
           [0x00000168]> x 8h  
           - offset -   0 1  2 3  4 5  6 7
           0x00000168  1c5f 0100 2800 0000
      
  • The IMAGE_DATA_DIRECTORY structure is made up of two fields; each a dword in size.

    • The first dword is the Relative Virtual Address of the location of the data structure
      • In this case the Import Table location is at 0x15f1c
    • The second dword is the size of the data structure
      • In this case the Import Table size is 0x28
  • Now we need to know which section the Import Table location resides in.

  • First, we need to get the number of sections from the File Header which comes immediately after the PE signature.

         [0x00000168]> s e8h + 4h + 2h
         [0x000000ee]> x 2h
         - offset -   0 1
         0x000000ee  0400 
    
    • This PE file has four (0x4) sections
  • The Section Headers start right after the not so optional Optional Header, and each is 40 bytes long

         [0x000001e0]> s e8h + 4 + 20 + 224
         [0x000001e0]> x (40 * 4)
         - offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
         0x000001e0  2e74 6578 7400 0000 7af2 0000 0010 0000  .text...z.......
         0x000001f0  00f4 0000 0004 0000 0000 0000 0000 0000  ................
         0x00000200  0000 0000 2000 0060 2e72 6461 7461 0000  .... ..`.rdata..
         0x00000210  b854 0000 0010 0100 0056 0000 00f8 0000  .T.......V......
         0x00000220  0000 0000 0000 0000 0000 0000 4000 0040  ............@..@
         0x00000230  2e64 6174 6100 0000 2032 0000 0070 0100  .data... 2...p..
         0x00000240  0014 0000 004e 0100 0000 0000 0000 0000  .....N..........
         0x00000250  0000 0000 4000 00c0 2e72 7372 6300 0000  ....@....rsrc...
         0x00000260  e001 0000 00b0 0100 0002 0000 0062 0100  .............b..
         0x00000270  0000 0000 0000 0000 0000 0000 4000 0040  ............@..@
    
  • We need to compare the virtual address of each section to the relative virtual address of the Import Table location. The virtual address is located at offset 12 from the start of each Section Header, after the Name and VirtualSize fields:

    • Name - 8 Bytes

    • VirtualSize - dword

    • VirtualAddress - dword

           [0x000001ec]> s e8h + 4 + 20 + 224 + (40 * 0) + 8 + 4
           [0x000001ec]> x 4h
           - offset -   0 1  2 3
           0x000001ec  0010 0000
      
           [0x000001ec]> s e8h + 4 + 20 + 224 + (40 * 1) + 8 + 4
           [0x00000214]> x 4h
           - offset -   0 1  2 3  
           0x00000214  0010 0100
      
           [0x00000214]> s e8h + 4 + 20 + 224 + (40 * 2) + 8 + 4
           [0x0000023c]> x 4h
           - offset -   0 1  2 3
           0x0000023c  0070 0100
      
           [0x0000023c]> s e8h + 4 + 20 + 224 + (40 * 3) + 8 + 4
           [0x00000264]> x 4h
           - offset -   0 1  2 3
           0x00000264  00b0 0100
      
      • Summarizing the Virtual Address Space:
        • .text : 0x1000
        • .rdata: 0x11000
        • .data : 0x17000
        • .rsrc : 0x1b000
  • Based upon the information above, the Import Table resides in the .rdata section, because 0x15f1c falls between the start of .rdata (0x11000) and the start of .data (0x17000).

  • Now we need to get the location in the raw file where the Import Table resides. Remember, we know its Relative Virtual Address, but we need to find where the Import Table sits in the physical file; for that we need to do some math.

    • Formula: Import Table RVA - Section RVA + Section Raw Offset[6]

    • We have everything but the section's raw offset (PointerToRawData). To find it, we go back to the .rdata Section Header, where it sits at offset 20

          [0x00000214]> s e8h + 4 + 20 + 224 + (40 * 1) + 8 + 4 + 4 + 4
          [0x0000021c]> x 4
          - offset -   0 1  2 3
          0x0000021c  00f8 0000
      
      • With this we can fill in the formula: 0x15f1c - 0x11000 + 0xf800

             [0x0001af50]> s (15f1ch - 11000h + f800h)
             [0x0001471c]> x 28h
             - offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
             0x0001471c  445f 0100 0000 0000 0000 0000 e460 0100  D_...........`..
             0x0001472c  0010 0100 0000 0000 0000 0000 0000 0000  ................
             0x0001473c  0000 0000 0000 0000 
        
        • Each entry is 20 bytes
        • Remember 0x28 (40 bytes) was the size of the Import Table read from the Import Directory entry earlier, which leaves room for two 20-byte entries. The last entry is all zeros, so in this case there is only one real entry.
  • We are interested in the following three fields (the IMAGE_IMPORT_DESCRIPTOR has more fields than this). Each is a dword in size.

    • OriginalFirstThunk @ offset 0x0 from the start of the structure.
      • 0x15f44
    • Name1 @ offset 0xc (declared as Name in the winnt.h listing in the Appendix)
      • 0x160e4
    • FirstThunk @ offset 0x10
      • 0x11000
  • Let's start with the name of the library being imported. The Name1 field holds the RVA of a null-terminated ASCII string, so another lookup is required.

     [0x0001471c]> s (160e4h - 11000h + f800h)
     [0x000148e4]> x
     - offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
     0x000148e4  4b45 524e 454c 3332 2e64 6c6c 0000 0003  KERNEL32.dll....
    
  • KERNEL32.dll

  • The OriginalFirstThunk and the FirstThunk are made up of the same set of structures. Let's walk the FirstThunk. FirstThunk points to an array of IMAGE_THUNK_DATA structures, each a dword in size, and the end of the array is marked by a null dword.

      [0x0001471c]> s (11000h - 11000h + f800h)
      [0x0000f800]> x
      - offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
      0x0000f800  4060 0100 4e60 0100 5a60 0100 6660 0100  @`..N`..Z`..f`..
      0x0000f810  7860 0100 a060 0100 b060 0100 bc60 0100  x`...`...`...`..
      0x0000f820  cc60 0100 9864 0100 8664 0100 f260 0100  .`...d...d...`..
      0x0000f830  0661 0100 1c61 0100 3461 0100 4c61 0100  .a...a..4a..La..
      0x0000f840  5c61 0100 6e61 0100 8a61 0100 9861 0100  \a..na...a...a..
      0x0000f850  ae61 0100 c061 0100 d661 0100 ec61 0100  .a...a...a...a..
      0x0000f860  fc61 0100 0862 0100 1e62 0100 2a62 0100  .a...b...b..*b..
      0x0000f870  3662 0100 4862 0100 5862 0100 6662 0100  6b..Hb..Xb..fb..
      0x0000f880  7662 0100 8c62 0100 9a62 0100 ac62 0100  vb...b...b...b..
      0x0000f890  c662 0100 dc62 0100 f662 0100 1063 0100  .b...b...b...c..
      0x0000f8a0  2a63 0100 4663 0100 6463 0100 6c63 0100  *c..Fc..dc..lc..
      0x0000f8b0  8063 0100 9463 0100 a063 0100 ae63 0100  .c...c...c...c..
      0x0000f8c0  bc63 0100 c663 0100 da63 0100 ec63 0100  .c...c...c...c..
      0x0000f8d0  fe63 0100 0864 0100 1464 0100 2064 0100  .c...d...d.. d..
      0x0000f8e0  2e64 0100 3e64 0100 5264 0100 6664 0100  .d..>d..Rd..fd..
      0x0000f8f0  7664 0100 a864 0100 0000 0000 0000 0000  vd...d..........
    
  • The array ending null dword starts @ 0xf8f8

  • Each of these dwords is a union that can be read as one of the following four members

    • ForwarderString
    • Function
    • Ordinal
    • AddressOfData
  • Most of the time we will see either Ordinal or AddressOfData

    • AddressOfData is a pointer to yet another struct called IMAGE_IMPORT_BY_NAME, which contains two members:
      • Hint : word
      • Name1 : variable length, null terminated string
    • We want to read Name1 for each entry to get the list of imports from the IAT.
  • Almost there! For our final trick, we need to resolve each of the pointers to IMAGE_IMPORT_BY_NAME to get the list of imported functions

  • We could do the lookups one at a time, but where is the fun in that ;-). Let's create a file that has the formula to look up each entry (one per line)

        (16040h + 2 - 11000h + f800h)
        (1604eh + 2 - 11000h + f800h)
        (1605ah + 2 - 11000h + f800h)
        (16066h + 2 - 11000h + f800h)
        (16078h + 2 - 11000h + f800h)
        (160a0h + 2 - 11000h + f800h)
        (160b0h + 2 - 11000h + f800h)
        (160bch + 2 - 11000h + f800h)
        (160cch + 2 - 11000h + f800h)
        (16498h + 2 - 11000h + f800h)
        (16486h + 2 - 11000h + f800h)
        (160f2h + 2 - 11000h + f800h)
        (16106h + 2 - 11000h + f800h)
        (1611ch + 2 - 11000h + f800h)
        (16134h + 2 - 11000h + f800h)
        (1614ch + 2 - 11000h + f800h)
        (1615ch + 2 - 11000h + f800h)
        (1616eh + 2 - 11000h + f800h)
        (1618ah + 2 - 11000h + f800h)
        (16198h + 2 - 11000h + f800h)
        (161aeh + 2 - 11000h + f800h)
        (161c0h + 2 - 11000h + f800h)
        (161d6h + 2 - 11000h + f800h)
        (161ech + 2 - 11000h + f800h)
        (161fch + 2 - 11000h + f800h)
        (16208h + 2 - 11000h + f800h)
        (1621eh + 2 - 11000h + f800h)
        (1622ah + 2 - 11000h + f800h)
        (16236h + 2 - 11000h + f800h)
        (16248h + 2 - 11000h + f800h)
        (16258h + 2 - 11000h + f800h)
        (16266h + 2 - 11000h + f800h)
        (16276h + 2 - 11000h + f800h)
        (1628ch + 2 - 11000h + f800h)
        (1629ah + 2 - 11000h + f800h)
        (162ach + 2 - 11000h + f800h)
        (162c6h + 2 - 11000h + f800h)
        (162dch + 2 - 11000h + f800h)
        (162f6h + 2 - 11000h + f800h)
        (16310h + 2 - 11000h + f800h)
        (1632ah + 2 - 11000h + f800h)
        (16346h + 2 - 11000h + f800h)
        (16364h + 2 - 11000h + f800h)
        (1636ch + 2 - 11000h + f800h)
        (16380h + 2 - 11000h + f800h)
        (16394h + 2 - 11000h + f800h)
        (163a0h + 2 - 11000h + f800h)
        (163aeh + 2 - 11000h + f800h)
        (163bch + 2 - 11000h + f800h)
        (163c6h + 2 - 11000h + f800h)
        (163dah + 2 - 11000h + f800h)
        (163ech + 2 - 11000h + f800h)
        (163feh + 2 - 11000h + f800h)
        (16408h + 2 - 11000h + f800h)
        (16414h + 2 - 11000h + f800h)
        (16420h + 2 - 11000h + f800h)
        (1642eh + 2 - 11000h + f800h)
        (1643eh + 2 - 11000h + f800h)
        (16452h + 2 - 11000h + f800h)
        (16466h + 2 - 11000h + f800h)
        (16476h + 2 - 11000h + f800h)
        (164a8h + 2 - 11000h + f800h)
    
    • We add 2 to the formula to account for the size of the hint

    • Save this into a file and then have radare2 run psz at every offset listed in it, like so:

        [0x0000f900]> psz @@.unknown_image_import_by_name.txt
        0x00014842: psz
        ...
        0x00014caa: psz
        HeapReAlloc
        HeapAlloc
        HeapFree
        GetProcessHeap
        InitializeCriticalSectionAndSpinCount
        GetLastError
        HeapSize
        DecodePointer
        DeleteCriticalSection
        SetEndOfFile
        GetStringTypeW
        IsDebuggerPresent
        OutputDebugStringW
        EnterCriticalSection
        LeaveCriticalSection
        EncodePointer
        GetCommandLineW
        IsProcessorFeaturePresent
        ExitProcess
        GetModuleHandleExW
        GetProcAddress
        MultiByteToWideChar
        WideCharToMultiByte
        GetStdHandle
        WriteFile
        GetModuleFileNameW
        RtlUnwind
        ReadFile
        GetConsoleMode
        ReadConsoleW
        CloseHandle
        SetLastError
        GetCurrentThreadId
        GetFileType
        GetStartupInfoW
        QueryPerformanceCounter
        GetCurrentProcessId
        GetSystemTimeAsFileTime
        GetEnvironmentStringsW
        FreeEnvironmentStringsW
        UnhandledExceptionFilter
        SetUnhandledExceptionFilter
        Sleep
        GetCurrentProcess
        TerminateProcess
        TlsAlloc
        TlsGetValue
        TlsSetValue
        TlsFree
        GetModuleHandleW
        LoadLibraryExW
        IsValidCodePage
        GetACP
        GetOEMCP
        GetCPInfo
        CreateFileW
        SetStdHandle
        SetFilePointerEx
        FlushFileBuffers
        GetConsoleCP
        LCMapStringW
        WriteConsoleW
      

And there we go, the list of imports found in the IAT!
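
For anyone who wants to cross-check the section lookup and the RVA-to-file-offset math outside of radare2, here is a minimal Python sketch. It parses the 160 bytes of section headers copied from the dump at 0x1e0 above and maps the three RVAs from the import descriptor. The helper names (parse_sections, rva_to_offset) are hypothetical, and bounding each section by its VirtualSize is my own assumption; the walk above only compared the starting virtual addresses.

```python
import struct

# The 4 * 40 section-header bytes, copied from the hex dump at 0x1e0.
SECTION_TABLE = bytes.fromhex("""
    2e74 6578 7400 0000 7af2 0000 0010 0000
    00f4 0000 0004 0000 0000 0000 0000 0000
    0000 0000 2000 0060 2e72 6461 7461 0000
    b854 0000 0010 0100 0056 0000 00f8 0000
    0000 0000 0000 0000 0000 0000 4000 0040
    2e64 6174 6100 0000 2032 0000 0070 0100
    0014 0000 004e 0100 0000 0000 0000 0000
    0000 0000 4000 00c0 2e72 7372 6300 0000
    e001 0000 00b0 0100 0002 0000 0062 0100
    0000 0000 0000 0000 0000 0000 4000 0040
""")

# IMAGE_SECTION_HEADER: 8-byte name, six dwords, two words, one dword (40 bytes).
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")

def parse_sections(table):
    """Return (name, VirtualAddress, VirtualSize, PointerToRawData) tuples."""
    sections = []
    for off in range(0, len(table), SECTION_HEADER.size):
        name, vsize, vaddr, _raw_size, raw_ptr, *_ = SECTION_HEADER.unpack_from(table, off)
        sections.append((name.rstrip(b"\0").decode("ascii"), vaddr, vsize, raw_ptr))
    return sections

def rva_to_offset(rva, sections):
    """RVA - section RVA + section raw offset, for the section containing rva."""
    for name, vaddr, vsize, raw_ptr in sections:
        if vaddr <= rva < vaddr + vsize:   # assumption: VirtualSize bounds the section
            return name, rva - vaddr + raw_ptr
    raise ValueError(f"RVA {rva:#x} is not inside any section")

SECTIONS = parse_sections(SECTION_TABLE)
for label, rva in [("import table", 0x15F1C), ("DLL name", 0x160E4), ("IAT", 0x11000)]:
    name, offset = rva_to_offset(rva, SECTIONS)
    print(f"{label}: RVA {rva:#x} -> {name}, file offset {offset:#x}")
```

All three RVAs land in .rdata, and the offsets should match the seek positions used above: 0x1471c, 0x148e4 and 0xf800.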

Additional Info

The astute reader probably noticed that there are two arrays of IMAGE_THUNK_DATA structures, one pointed to by OriginalFirstThunk and one by FirstThunk. FirstThunk points to the array commonly referred to as the Import Address Table (IAT), while OriginalFirstThunk points to what is known as the Import Name Table or the Import Lookup Table. So why are there two arrays of the same thing? The Import Name Table is a static structure: once written when the executable is built, it does not change. It may be used as a fallback to identify which imports to populate the IAT with, but it is not uncommon to see it set to null. The IAT, on the other hand, has its values overwritten by the loader with the address of the corresponding function. Thus each pointer to an IMAGE_IMPORT_BY_NAME is replaced with the address of the function. When dumping an executable, the IAT is normally what needs to be rebuilt for the application to run correctly; but that is a discussion for another time.
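
To make the on-disk state of the IAT a bit more concrete, here is a small Python sketch that decodes the first and last rows of the FirstThunk dump from earlier, before any loader has touched them. One thing I did not cover in the walk: when the high bit of a thunk is set (IMAGE_ORDINAL_FLAG32, 0x80000000 in winnt.h), the entry is an ordinal rather than an RVA. None of the entries in this file use it, so that branch is only exercised with a made-up value. The function names are hypothetical.

```python
import struct

IMAGE_ORDINAL_FLAG32 = 0x80000000   # high bit set means "import by ordinal"

# First and last rows of the FirstThunk dump (file offsets 0xf800 and 0xf8f0).
FIRST_ROW = bytes.fromhex("4060 0100 4e60 0100 5a60 0100 6660 0100")
LAST_ROW = bytes.fromhex("7664 0100 a864 0100 0000 0000 0000 0000")

def decode_thunks(data):
    """Return ('name', rva) or ('ordinal', n) tuples up to the null dword."""
    entries = []
    for (value,) in struct.iter_unpack("<I", data):
        if value == 0:                        # a null dword ends the array
            break
        if value & IMAGE_ORDINAL_FLAG32:
            entries.append(("ordinal", value & 0xFFFF))
        else:
            entries.append(("name", value))   # RVA of an IMAGE_IMPORT_BY_NAME
    return entries

def name_file_offset(rva, section_rva=0x11000, section_raw=0xF800):
    """Same formula as the lookup file above; +2 skips the Hint word."""
    return rva + 2 - section_rva + section_raw

for kind, value in decode_thunks(FIRST_ROW) + decode_thunks(LAST_ROW):
    if kind == "name":
        print(f"RVA {value:#x} -> name string at file offset {name_file_offset(value):#x}")
```

Once the loader has bound the imports in memory, these same slots hold function addresses, so this decoding only makes sense against the file on disk (or against the OriginalFirstThunk array).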


Phew! Thanks for sticking with me. Hopefully the next time you see the list of imports in IDA or a tool like CFF Explorer, you'll have a better idea of how they extracted this data and a deeper understanding of what it means.

If you spot any errors or see any improvements, please leave a comment below.

Appendix

Structs
  1. IMAGE_DOS_HEADER[7]

     typedef struct _IMAGE_DOS_HEADER {
         WORD  e_magic;      /* 00: MZ Header signature */
         WORD  e_cblp;       /* 02: Bytes on last page of file */
         WORD  e_cp;         /* 04: Pages in file */
         WORD  e_crlc;       /* 06: Relocations */
         WORD  e_cparhdr;    /* 08: Size of header in paragraphs */
         WORD  e_minalloc;   /* 0a: Minimum extra paragraphs needed */
         WORD  e_maxalloc;   /* 0c: Maximum extra paragraphs needed */
         WORD  e_ss;         /* 0e: Initial (relative) SS value */
         WORD  e_sp;         /* 10: Initial SP value */
         WORD  e_csum;       /* 12: Checksum */
         WORD  e_ip;         /* 14: Initial IP value */
         WORD  e_cs;         /* 16: Initial (relative) CS value */
         WORD  e_lfarlc;     /* 18: File address of relocation table */
         WORD  e_ovno;       /* 1a: Overlay number */
         WORD  e_res[4];     /* 1c: Reserved words */
         WORD  e_oemid;      /* 24: OEM identifier (for e_oeminfo) */
         WORD  e_oeminfo;    /* 26: OEM information; e_oemid specific */
         WORD  e_res2[10];   /* 28: Reserved words */
         DWORD e_lfanew;     /* 3c: Offset to extended header */
     } IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;
    
  • IMAGE_OPTIONAL_HEADER[8]

      typedef struct _IMAGE_OPTIONAL_HEADER {
    
        /* Standard fields */
    
        WORD  Magic; /* 0x10b or 0x107 */	/* 0x00 */
        BYTE  MajorLinkerVersion;
        BYTE  MinorLinkerVersion;
        DWORD SizeOfCode;
        DWORD SizeOfInitializedData;
        DWORD SizeOfUninitializedData;
        DWORD AddressOfEntryPoint;		/* 0x10 */
        DWORD BaseOfCode;
        DWORD BaseOfData;
    
        /* NT additional fields */
    
        DWORD ImageBase;
        DWORD SectionAlignment;		/* 0x20 */
        DWORD FileAlignment;
        WORD  MajorOperatingSystemVersion;
        WORD  MinorOperatingSystemVersion;
        WORD  MajorImageVersion;
        WORD  MinorImageVersion;
        WORD  MajorSubsystemVersion;		/* 0x30 */
        WORD  MinorSubsystemVersion;
        DWORD Win32VersionValue;
        DWORD SizeOfImage;
        DWORD SizeOfHeaders;
        DWORD CheckSum;			/* 0x40 */
        WORD  Subsystem;
        WORD  DllCharacteristics;
        DWORD SizeOfStackReserve;
        DWORD SizeOfStackCommit;
        DWORD SizeOfHeapReserve;		/* 0x50 */
        DWORD SizeOfHeapCommit;
        DWORD LoaderFlags;
        DWORD NumberOfRvaAndSizes;
        IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES]; /* 0x60 */
        /* 0xE0 */
      } IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;
    
  • IMAGE_NT_HEADERS[9]

      typedef struct _IMAGE_NT_HEADERS {
        DWORD Signature; /* "PE"\0\0 */	/* 0x00 */
        IMAGE_FILE_HEADER FileHeader;		/* 0x04 */
        IMAGE_OPTIONAL_HEADER32 OptionalHeader;	/* 0x18 */
      } IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
    
  • IMAGE_FILE_HEADER[10]

      typedef struct _IMAGE_FILE_HEADER {
        WORD  Machine;
        WORD  NumberOfSections;
        DWORD TimeDateStamp;
        DWORD PointerToSymbolTable;
        DWORD NumberOfSymbols;
        WORD  SizeOfOptionalHeader;
        WORD  Characteristics;
      } IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;
    
  • IMAGE_DATA_DIRECTORY[11]

      typedef struct _IMAGE_DATA_DIRECTORY {
        DWORD VirtualAddress;
        DWORD Size;
      } IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;
    
  • IMAGE_SECTION_HEADER[12]

      #define IMAGE_SIZEOF_SHORT_NAME 8
    
      typedef struct _IMAGE_SECTION_HEADER {
        BYTE  Name[IMAGE_SIZEOF_SHORT_NAME];
        union {
          DWORD PhysicalAddress;
          DWORD VirtualSize;
        } Misc;
        DWORD VirtualAddress;
        DWORD SizeOfRawData;
        DWORD PointerToRawData;
        DWORD PointerToRelocations;
        DWORD PointerToLinenumbers;
        WORD  NumberOfRelocations;
        WORD  NumberOfLinenumbers;
        DWORD Characteristics;
      } IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;
    
      #define	IMAGE_SIZEOF_SECTION_HEADER 40
    
  • IMAGE_IMPORT_DESCRIPTOR[13]

      typedef struct _IMAGE_IMPORT_DESCRIPTOR {
      	union {
      		DWORD	Characteristics; /* 0 for terminating null import descriptor  */
      		DWORD	OriginalFirstThunk;	/* RVA to original unbound IAT */
      	} DUMMYUNIONNAME;
      	DWORD	TimeDateStamp;	/* 0 if not bound,
      				 * -1 if bound, and real date\time stamp
      				 *    in IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT
      				 * (new BIND)
      				 * otherwise date/time stamp of DLL bound to
      				 * (Old BIND)
      				 */
      	DWORD	ForwarderChain;	/* -1 if no forwarders */
      	DWORD	Name;
      	/* RVA to IAT (if bound this IAT has actual addresses) */
      	DWORD	FirstThunk;
      } IMAGE_IMPORT_DESCRIPTOR,*PIMAGE_IMPORT_DESCRIPTOR;
    
  • IMAGE_THUNK_DATA32[14]

      typedef struct _IMAGE_THUNK_DATA32 {
      	union {
      		DWORD ForwarderString;
      		DWORD Function;
      		DWORD Ordinal;
      		DWORD AddressOfData;
      	} u1;
      } IMAGE_THUNK_DATA32,*PIMAGE_THUNK_DATA32;
    
  • IMAGE_IMPORT_BY_NAME[15]

      typedef struct _IMAGE_IMPORT_BY_NAME {
      	WORD	Hint;
      	BYTE	Name[1];
      } IMAGE_IMPORT_BY_NAME,*PIMAGE_IMPORT_BY_NAME;
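
As a sanity check on the sizes and offsets used throughout the walk, the structs above can be mirrored as Python struct format strings. This is only a sketch: the format strings are my transcription of the listings, and the constant names are hypothetical. The value 16 for IMAGE_NUMBEROF_DIRECTORY_ENTRIES follows from the 128-byte Data Directory made of 8-byte entries.

```python
import struct

# "<" = little-endian without padding, B = BYTE, H = WORD, I = DWORD.
DOS_HEADER = struct.Struct("<30HI")               # 30 WORDs, then e_lfanew
FILE_HEADER = struct.Struct("<2H3I2H")
OPTIONAL_FIXED = struct.Struct("<H2B9I6H4I2H6I")  # everything before DataDirectory
DATA_DIRECTORY = struct.Struct("<2I")
SECTION_HEADER = struct.Struct("<8s6I2HI")
IMPORT_DESCRIPTOR = struct.Struct("<5I")

NUM_DIRECTORIES = 16   # IMAGE_NUMBEROF_DIRECTORY_ENTRIES
IMPORT_INDEX = 1       # the Import Directory is the second entry
SIGNATURE_SIZE = 4     # "PE\0\0"

optional_size = OPTIONAL_FIXED.size + NUM_DIRECTORIES * DATA_DIRECTORY.size
e_lfanew_offset = DOS_HEADER.size - 4
import_dir_offset = (SIGNATURE_SIZE + FILE_HEADER.size + OPTIONAL_FIXED.size
                     + IMPORT_INDEX * DATA_DIRECTORY.size)
section_table_offset = SIGNATURE_SIZE + FILE_HEADER.size + optional_size

print(f"e_lfanew at {e_lfanew_offset:#x}")
print(f"optional header: {optional_size} bytes")
print(f"import directory entry at NT headers + {import_dir_offset:#x}")
print(f"section table at NT headers + {section_table_offset}")
```

With the NT headers at 0xe8 in this file, these offsets should line up with the seeks to 0x168 (Import Directory entry) and 0x1e0 (first Section Header) in the walk.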
    

References

  1. Radare - Print Modes

  2. Radare - Seeking

  3. Radare - Seeking

  4. Flare-On

  5. Goppit, "Portable Executable Format - a Reverse Engineering View", v1(2), Code Breakers Magazine, January 2006.

  6. Rva2Offset function

  7. winnt.h

  8. winnt.h

  9. winnt.h

  10. winnt.h

  11. winnt.h

  12. winnt.h

  13. winnt.h

  14. winnt.h

  15. winnt.h
