October 27, 2014

How Big Are Your Functions?

A couple of weeks back, I needed to find out the code size of the functions in a program I had written. I was just curious as to which functions were the biggest space hogs. By 'code size of a function', I mean the number of bytes the function's machine code occupies in the .text section of the binary file (or in memory when loaded). This does not include the stack/heap space used by the function. This exercise comes in handy when you need to reduce your program's memory footprint; in other words, when you want to optimize for space. Of course, optimizing for space includes stack/heap optimization along with code size reduction. Keep in mind that code size can contribute noticeably to a program's memory footprint if you use a lot of macros and inline functions, since each expansion repeats the code at the point of use.

Disassembler to the rescue!

The first thing that came to my mind was to use dumpbin to disassemble the binary and work out a function's code size from its start and end addresses. This works fine but is tedious if you want to measure the size of multiple (or all) functions in a program.

?DestroyDirInfo_NoHash@@YAXPAU_DirectoryInfo@@@Z:
  00413560: 55                 push    ebp
  00413561: 8B EC              mov     ebp,esp
  00413563: 81 EC D8 00 00 00  sub     esp,0D8h
  ...
  ...
  00413662: 8B E5              mov     esp,ebp
  00413664: 5D                 pop     ebp
  00413665: C3                 ret

This is a snippet from the disassembly of DestroyDirInfo_NoHash, one of the functions in the FDiffDelete program. So we can calculate the code size as:
0x00413665 - 0x00413560 + 1 = 0x106 bytes (or 262 bytes)
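
The `+ 1` is needed because 0x00413665 is the address where the last instruction starts, and that instruction (`ret`, opcode C3) is one byte long. A minimal sketch of the same arithmetic follows; `codeSize` and its parameter names are hypothetical helpers made up for illustration, not part of dumpbin or any other tool:

```cpp
#include <cstdint>

// Size in bytes of a function's machine code, given:
//   firstInstr   - address of the function's first instruction
//   lastInstr    - address where its last instruction starts
//   lastInstrLen - length of that last instruction in bytes
// (Hypothetical helper, named here only for illustration.)
constexpr std::uint32_t codeSize(std::uint32_t firstInstr,
                                 std::uint32_t lastInstr,
                                 std::uint32_t lastInstrLen) {
    return lastInstr - firstInstr + lastInstrLen;
}

// DestroyDirInfo_NoHash from the listing above: push ebp at 0x00413560,
// one-byte ret at 0x00413665.
static_assert(codeSize(0x00413560, 0x00413665, 1) == 0x106,
              "matches the manual calculation");
static_assert(0x106 == 262, "0x106 is 262 in decimal");
```

If the function ended in a longer instruction (for example a three-byte `ret 8`), the last argument would change accordingly.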


DIA, I've got my eye on you

Some time back, a colleague mentioned that the Debug Interface Access (DIA) SDK can be used for the same purpose. I didn't get more details from him that day, and the topic slipped my mind until this weekend. I started digging around MSDN to find out how to use the DIA SDK to get the functions' code sizes.

Reading through the articles, I learned that DIA is an abstraction layer over PDB files and is implemented as an in-proc COM server. A PDB file is the Program DataBase file that holds debug information for a binary file: things like local variable addresses, function addresses, etc. So I thought, 'Yes, this could give me the code size of functions'. Digging further into the articles, I found that what you need is:
  • Path to the PDB file corresponding to the exe/dll file whose functions you want to analyze.
  • (OR) Path to the exe/dll file and, optionally, a search path in which to look for the corresponding PDB file.
With the above information in hand, the process of enumerating all functions in the binary and finding out their sizes is quite straightforward. The only hurdle is the COM-specific plumbing if you aren't familiar with it. So the basic outline of using DIA for our 'code sizing' purpose is shown below:

[Figure: Control Flow]
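
In code, that outline could look roughly like the sketch below. It is not taken from CodeSizer; it is a minimal illustration that assumes Visual C++ with the DIA SDK headers (`dia2.h`) available and the msdia DLL registered on the machine. The function name `listFunctionSizes` is hypothetical, and error handling is kept to a bare minimum.

```cpp
// Windows-only sketch: DIA is a COM component that ships with Visual Studio.
#include <cstdio>
#ifdef _WIN32
#include <windows.h>
#include <dia2.h>

// Prints "<size in bytes>  <name>" for every function symbol in the PDB.
// Returns the number of functions printed, or -1 on failure.
// (Hypothetical helper; the name and return convention are assumptions.)
long listFunctionSizes(const wchar_t* pdbPath) {
    if (FAILED(CoInitialize(nullptr))) return -1;

    long count = -1;
    IDiaDataSource*  source    = nullptr;
    IDiaSession*     session   = nullptr;
    IDiaSymbol*      global    = nullptr;
    IDiaEnumSymbols* functions = nullptr;

    // 1. Create the DIA data source (the in-proc COM server).
    // 2. Load the PDB; loadDataForExe() is the alternative entry point
    //    when you start from the exe/dll plus an optional search path.
    // 3. Open a session and fetch the global scope symbol.
    // 4. Ask the global scope for all children tagged as functions.
    if (SUCCEEDED(CoCreateInstance(__uuidof(DiaSource), nullptr,
                                   CLSCTX_INPROC_SERVER,
                                   __uuidof(IDiaDataSource),
                                   reinterpret_cast<void**>(&source))) &&
        SUCCEEDED(source->loadDataFromPdb(pdbPath)) &&
        SUCCEEDED(source->openSession(&session)) &&
        SUCCEEDED(session->get_globalScope(&global)) &&
        SUCCEEDED(global->findChildren(SymTagFunction, nullptr, nsNone,
                                       &functions))) {
        count = 0;
        IDiaSymbol* fn = nullptr;
        ULONG fetched = 0;
        // 5. Walk the enumeration; get_length() reports the code size.
        while (SUCCEEDED(functions->Next(1, &fn, &fetched)) && fetched == 1) {
            BSTR name = nullptr;
            ULONGLONG length = 0;
            if (fn->get_length(&length) == S_OK &&
                fn->get_name(&name) == S_OK) {
                wprintf(L"%10llu  %ls\n", length, name);
                SysFreeString(name);
                ++count;
            }
            fn->Release();
        }
    }

    // Release whatever was acquired, in reverse order.
    if (functions) functions->Release();
    if (global)    global->Release();
    if (session)   session->Release();
    if (source)    source->Release();
    CoUninitialize();
    return count;
}
#endif
```

A call such as `listFunctionSizes(L"FDiffDelete.pdb")` (the path is a placeholder) would print one line per function. Raw interface pointers are used here to keep the COM calls visible; smart pointers such as `CComPtr` would remove the manual `Release()` calls.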

I've uploaded this code to GitHub here: CodeSizer. It shows the undecorated function names and their sizes in bytes, in descending order of size. It also has options to show all functions, specific functions, or functions whose names contain a specified substring.
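
The filtering and ordering step is independent of DIA and easy to picture in isolation. The sketch below is not CodeSizer's actual code; `FunctionSize` and `selectAndSort` are hypothetical names, and the case-sensitive substring match is an assumption:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One row of the report: a function name and its code size.
// (Hypothetical type, for illustration only.)
struct FunctionSize {
    std::string name;
    unsigned long long bytes;
};

// Keeps the entries whose name contains `needle` (an empty needle keeps
// everything) and returns them ordered by size, largest first.
// Entries of equal size keep their original relative order.
std::vector<FunctionSize> selectAndSort(std::vector<FunctionSize> all,
                                        const std::string& needle) {
    all.erase(std::remove_if(all.begin(), all.end(),
                             [&needle](const FunctionSize& f) {
                                 return f.name.find(needle) == std::string::npos;
                             }),
              all.end());
    std::stable_sort(all.begin(), all.end(),
                     [](const FunctionSize& a, const FunctionSize& b) {
                         return a.bytes > b.bytes;
                     });
    return all;
}
```

Passing an empty string as `needle` corresponds to the "show all functions" case, since every name contains the empty substring.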

Going back to our earlier example of calculating the size of the function DestroyDirInfo_NoHash, the output from CodeSizer confirms that the size is 262 bytes:
> CodeSizer.exe /f ..\..\FDiffDelete\Source\Debug\FDiffDelete.exe /s DestroyDirInfo_NoHash
Function Name                  Size In Bytes
-------------                  -------------
DestroyDirInfo_NoHash          262
Enumerated #functions = 1

Sizes of all functions that have 'Build' as a sub-string:
> CodeSizer.exe /f ..\..\FDiffDelete\Source\Debug\FDiffDelete.exe /r Build
Function Name                  Size In Bytes
-------------                  -------------
BuildFilesInDir_NoHash         2620
BuildFilesInDir_Hash           2350
BuildDirTree_Hash              938
BuildDirTree_NoHash            938
BuildDirInfo                   889
BuildFilesInDir                107
BuildDirTree                   99
Enumerated #functions = 7

Thus, a manual task becomes an easy automation that can be reused frequently in the future. This is what we programmers are good at, isn't it? We are lazy, and that's a good thing, in a way!

Peeking Into cryptbase.dll

I was curious to look at the sizes of functions in some system DLLs and chose the AES functions in cryptbase.dll. The strange thing I noticed is that I couldn't get function information from this DLL (or from ntdll.dll either) using the same technique as for my own binaries. I had to enumerate the public symbols in the PDB's global scope and then keep only those that are functions. My guess is that the PDBs available for system DLLs are stripped of private symbol information, so only public symbols remain. Here are the top 10 largest AES functions from cryptbase.dll:

Function Name                  Size In Bytes
-------------                  -------------
AesExpandKey                   864
AesCbcDecrypt                  832
AesCbcEncrypt                  752
AesDecrypt                     656
AesEncrypt                     640
AesCtrRng_Generate             448
AesCtrRng_Instantiate          292
AesCtrRng_Update               292
AesCtrRng_Reseed               216
AesCtr_safe_startup            172

What's Next?

The DIA SDK, although very powerful, is quite tedious to use because it takes a lot of function calls to get information out. Building an abstraction layer on top of it, as in the CodeSizer project, makes it simpler to use. Such an abstraction could serve as the DIA front-end in a debugger, since a debugger makes heavy use of PDB files to show debug information.
