Disassembler to the rescue!
The first thing that came to mind was to use dumpbin to disassemble the binary and work out a function's code size by subtracting its start address from its end address. This works fine but is tedious if you want to measure the size of multiple (or all) functions within a program.
?DestroyDirInfo_NoHash@@YAXPAU_DirectoryInfo@@@Z:
00413560: 55 push ebp
00413561: 8B EC mov ebp,esp
00413563: 81 EC D8 00 00 00 sub esp,0D8h
...
...
00413662: 8B E5 mov esp,ebp
00413664: 5D pop ebp
00413665: C3 ret
This is a snippet from the disassembly of DestroyDirInfo_NoHash, one of the functions in the FDiffDelete program. The last instruction, the ret at 0x00413665, is one byte long, so we can calculate the code size as:
0x00413665 - 0x00413560 + 1 = 0x106 bytes (or 262 bytes)
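The same subtraction can be scripted. The sketch below is only an illustration and is not part of any tool mentioned in this post. It assumes a dumpbin-style listing where each instruction line has an 8-digit hex address, a colon, the instruction bytes as upper-case hex pairs, and then a lower-case mnemonic. The names function_size and LISTING are made up for this example. Counting the bytes on the last instruction line replaces the hard-coded "+ 1", so it should also cope with a final instruction longer than one byte.

```python
import re

# Address, colon, then one or more upper-case hex byte pairs.
# Assumption: mnemonics are lower-case, so they never match as bytes.
LINE_RE = re.compile(r"^\s*([0-9A-Fa-f]{8}):\s+((?:[0-9A-F]{2}\s+)+)")

LISTING = """\
?DestroyDirInfo_NoHash@@YAXPAU_DirectoryInfo@@@Z:
00413560: 55 push ebp
00413561: 8B EC mov ebp,esp
00413563: 81 EC D8 00 00 00 sub esp,0D8h
...
...
00413662: 8B E5 mov esp,ebp
00413664: 5D pop ebp
00413665: C3 ret
"""


def function_size(listing):
    """Return the size in bytes of the function covered by the listing."""
    start = None
    end = None
    for line in listing.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # label lines and "..." elisions carry no address
        address = int(match.group(1), 16)
        byte_count = len(match.group(2).split())
        if start is None:
            start = address  # first instruction of the function
        end = address + byte_count  # one past the last byte seen so far
    if start is None:
        raise ValueError("no instruction lines found")
    return end - start


print(function_size(LISTING))  # 262
```

This still needs one listing per function, which is why it does not scale well to a whole program.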
DIA, I've got my eye on you
Some time back, a colleague mentioned that the Debug Interface Access (DIA) SDK can be used for the same purpose. I didn't get more details from him that day and the topic just slipped my mind until this weekend. I started digging around MSDN to find out how I could use the DIA SDK to get the functions' code sizes.
Reading through the articles, I learned that DIA is an abstraction layer over PDB files and is implemented as an in-proc COM server. A PDB (program database) file holds debug information for a binary file - stuff like the addresses of local variables and functions. So I thought - 'Yes, this could give me the code size of functions'. Digging further into the articles, I found that what you need is:
- Path to the PDB file corresponding to the exe/dll file whose functions you want to analyze.
- (OR) Path to the exe/dll file and optionally a search path in which to look for the corresponding PDB file.
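With the above information in hand, enumerating all functions in the binary and finding their sizes is quite straightforward. The only hurdle is the COM-specific plumbing if you aren't familiar with it. The basic outline of using DIA for our 'code sizing' purpose is: create the DIA data source (IDiaDataSource), load the debug data with loadDataFromPdb (or loadDataForExe when starting from the exe/dll), open a session, get the global scope symbol, enumerate its children of type SymTagFunction, and read each function's name and length.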
I've uploaded this code to GitHub as CodeSizer. It shows the undecorated function names and their sizes in bytes, in descending order of size. It also has options to show all functions, specific functions, or functions that match a specified sub-string.
Going back to our earlier example of calculating the size of the function DestroyDirInfo_NoHash, the output from CodeSizer confirms that the size is 262 bytes:
> CodeSizer.exe
> /f ..\..\FDiffDelete\Source\Debug\FDiffDelete.exe
> /s DestroyDirInfo_NoHash
Function Name Size In Bytes
------------- -------------
DestroyDirInfo_NoHash 262
Enumerated #functions = 1
Sizes of all functions that have 'Build' as a sub-string:
> CodeSizer.exe
> /f ..\..\FDiffDelete\Source\Debug\FDiffDelete.exe
> /r Build
Function Name Size In Bytes
------------- -------------
BuildFilesInDir_NoHash 2620
BuildFilesInDir_Hash 2350
BuildDirTree_Hash 938
BuildDirTree_NoHash 938
BuildDirInfo 889
BuildFilesInDir 107
BuildDirTree 99
Enumerated #functions = 7
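Once the names and sizes are in hand, the two lookups above come down to a filter and a sort. The sketch below is a rough model of that reporting step and is not CodeSizer's actual code. The helper names select_functions and print_report are made up, the sizes are copied from the outputs above, and treating the sub-string match as case-sensitive is my assumption.

```python
# (name, size in bytes) pairs copied from the CodeSizer outputs above,
# deliberately listed out of order.
SIZES = [
    ("BuildDirTree", 99),
    ("BuildDirInfo", 889),
    ("DestroyDirInfo_NoHash", 262),
    ("BuildFilesInDir_Hash", 2350),
    ("BuildDirTree_Hash", 938),
    ("BuildFilesInDir", 107),
    ("BuildFilesInDir_NoHash", 2620),
    ("BuildDirTree_NoHash", 938),
]


def select_functions(sizes, exact=None, substring=None):
    """Filter (name, size) pairs and return them largest first."""
    rows = [
        (name, size)
        for name, size in sizes
        if (exact is None or name == exact)
        and (substring is None or substring in name)
    ]
    # sorted() is stable, so functions of equal size keep their input order.
    return sorted(rows, key=lambda row: row[1], reverse=True)


def print_report(rows):
    """Print the rows in a layout similar to the outputs above."""
    print(f"{'Function Name':<28}Size In Bytes")
    print(f"{'-------------':<28}-------------")
    for name, size in rows:
        print(f"{name:<28}{size}")
    print(f"Enumerated #functions = {len(rows)}")


print_report(select_functions(SIZES, substring="Build"))
```

Passing exact="DestroyDirInfo_NoHash" instead of substring="Build" corresponds to the single-function lookup.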
Thus, a manual task becomes an easy automation that can be used frequently in the future. This is what we programmers are good at, isn't it? We are lazy, and that's a good thing, in a way!
Peeking Into cryptbase.dll
I was curious about function sizes in some system DLLs and chose to look at the AES functions in cryptbase.dll. The strange thing I noticed was that I couldn't get function information from this DLL (or from ntdll.dll either) using the same technique as for my own DLLs. I had to look for public symbols in the PDB's global scope and then keep only the ones that are functions. My guess is that the PDBs available for system DLLs are stripped and carry only public symbols rather than full per-function debug records. Here are the 10 largest AES functions from cryptbase.dll:
Function Name Size In Bytes
------------- -------------
AesExpandKey 864
AesCbcDecrypt 832
AesCbcEncrypt 752
AesDecrypt 656
AesEncrypt 640
AesCtrRng_Generate 448
AesCtrRng_Instantiate 292
AesCtrRng_Update 292
AesCtrRng_Reseed 216
AesCtr_safe_startup 172
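The filtering described above (take the public symbols, keep only the functions, then pick the largest) can be modelled in a few lines. This is a sketch over plain in-memory records rather than real DIA calls. The PublicSymbol type, the largest_functions helper and the AesExampleTable entry are all hypothetical. The is_function flag stands in for what DIA reports through IDiaSymbol::get_function on a public symbol, and the sizes are copied from the table above.

```python
import heapq
from typing import NamedTuple


class PublicSymbol(NamedTuple):
    name: str
    is_function: bool  # stand-in for the flag DIA exposes via get_function
    length: int  # size in bytes


SYMBOLS = [
    PublicSymbol("AesCbcEncrypt", True, 752),
    PublicSymbol("AesExpandKey", True, 864),
    PublicSymbol("AesExampleTable", False, 1024),  # hypothetical data symbol
    PublicSymbol("AesDecrypt", True, 656),
    PublicSymbol("AesCbcDecrypt", True, 832),
]


def largest_functions(symbols, prefix, count=10):
    """Return up to `count` function symbols whose names start with `prefix`."""
    candidates = [
        sym for sym in symbols
        if sym.is_function and sym.name.startswith(prefix)
    ]
    # nlargest returns the results sorted from largest to smallest.
    return heapq.nlargest(count, candidates, key=lambda sym: sym.length)


for sym in largest_functions(SYMBOLS, "Aes", count=3):
    print(f"{sym.name:<28}{sym.length}")
```

The data symbol is dropped even though it is the largest entry, which is the point of checking the function flag before sorting.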
What's Next?
The DIA SDK, although very powerful, is quite tedious to use because getting anything out of it takes a lot of function calls. Building an abstraction layer on top of it, as the CodeSizer project does, makes it simpler to use. Such an abstraction could also serve as the DIA front-end in a debugger, since a debugger relies heavily on PDB files to show debug information.