Jon Bentley talks about timing code in his book Programming Pearls. I was reading it recently and thought I'd time my hashtable implementation (source). Windows has two APIs that can help with this: QueryPerformanceFrequency and QueryPerformanceCounter. There are other ways to measure time, but I chose these for their high resolution. These APIs read values from the system's high-performance counter, if one is present. This counter is essentially a variable that increments many times per second; the frequency of increments depends on the specific system. MSDN notes that it can sometimes be the cycle rate of the processor's clock.
QueryPerformanceFrequency returns this frequency, and the value does not change while the system is up. My system, for example, has a counter that ticks 2143623 times per second, which translates to a resolution of 466.499940 nanoseconds (1/2143623 seconds per tick).
QueryPerformanceCounter returns the value of the counter at the moment it is called. By calling it immediately before and after a block of code, we can calculate the number of increments that took place while the code was running and convert that to a time value. Both APIs take a pointer to a LARGE_INTEGER variable. If we look in WinNT.h, we see that this is an 8-byte union: a QuadPart member holds the full 64-bit value, and LowPart/HighPart members give access to each 4-byte half. 64 bits gives plenty of room to hold the counter value.
Coming to the timing of the hashtable implementation: I timed only the insertion operation, using a DWORD for both key and value. I ran the tests with a different number of entries each time: 50000, 100000, 700000, 8388608 (1&lt;&lt;23), and 16777216 (1&lt;&lt;24). I initially included a run of 33554432 (1&lt;&lt;25), but it failed with an out-of-memory condition! By default, each 32-bit process is limited to 2GB of user-mode address space. Part of the scaffolding code I used for this is here:
LARGE_INTEGER freq = {0};
LARGE_INTEGER counterStart = {0};
LARGE_INTEGER counterEnd = {0};
LONGLONG elapsedCounts = 0;
double elapsedTime[10] = {0};
int numRuns = 0;
int sizes[MAX_RUNS] = { 50000, 100000, 700000, 1<<23, 1<<24 };
DWORD dwError = 0;
BOOL testSuccess[MAX_RUNS] = {TRUE};

// This retrieves the counts per second
if(!QueryPerformanceFrequency(&freq))
{
    dwError = GetLastError();
    wprintf(L"testHashtable(): QueryPerformanceFrequency() failed %d\n", dwError);
    return;
}

while(numRuns < MAX_RUNS)
{
    wprintf(L"Beginning run %d\n", numRuns);

    // Begin counter
    if(!QueryPerformanceCounter(&counterStart))
    {
        dwError = GetLastError();
        wprintf(L"testHashtable(): QueryPerformanceCounter() failed %d\n", dwError);
        return;
    }

    //
    // Code to test
    //
    testSuccess[numRuns] = testHashtable(HT_KEY_DWORD, HT_VAL_DWORD, sizes[numRuns], FALSE);

    // End counter
    if(!QueryPerformanceCounter(&counterEnd))
    {
        dwError = GetLastError();
        wprintf(L"testHashtable(): QueryPerformanceCounter() failed %d\n", dwError);
        return;
    }

    // Get the elapsed time
    elapsedCounts = counterEnd.QuadPart - counterStart.QuadPart;
    elapsedTime[numRuns] = (double)(elapsedCounts / (double)freq.QuadPart);

    ++numRuns;
}

wprintf(L"Performance counter ticks %I64u times per second\n", freq.QuadPart);
wprintf(L"Resolution is %lf nanoseconds\n", (1.0/(double)freq.QuadPart)*1e9);
wprintf(L"%16s %13s %19s %s\n-----------------------------------------------\n",
        L"RunSize", L"TimeSeconds", L"TimeMicro", L"Result");
for(numRuns = 0; numRuns < MAX_RUNS; ++numRuns)
{
    wprintf(L"%16d %5.8lf %16.3lf %s\n",
            sizes[numRuns], elapsedTime[numRuns], elapsedTime[numRuns] * 1e6,
            testSuccess[numRuns] ? L"PASSED" : L"FAILED");
}
This code gave the following output:
--[ Hashtable insertion operation ]--

** Release mode, running inside Visual Studio **
         RunSize   TimeSeconds           TimeMicro Result
---------------------------------------------------------------
           50000    3.25054825         3250548.254 PASSED
          100000    6.17252521         6172525.206 PASSED
          700000   45.90315601        45903156.012 PASSED
         8388608  648.03144909       648031449.093 PASSED
        16777216 1480.01800177      1480018001.766 PASSED
---------------------------------------------------------------

** Release mode, running outside in a command window **
         RunSize   TimeSeconds           TimeMicro Result
---------------------------------------------------------------
           50000    0.07000345           70003.447 PASSED
          100000    0.12546702          125467.025 PASSED
          700000    0.98070789          980707.895 PASSED
         8388608   11.25297219        11252972.188 PASSED
        16777216   19.48300751        19483007.506 PASSED
As you can see from the results above, the interesting lesson is how much processing time increases when running under a debugger. In these tests, running under the debugger was roughly 45x to 75x slower than running standalone. To see whether this holds even for simple code, I timed a system API, GetCurrentThread(), using the same technique. To get consistent times I executed the API multiple times in a for loop. Here are the results; we see a similar (though smaller, roughly 4x) increase in execution time. RunSize is the number of times GetCurrentThread() was executed in TimeSeconds seconds. Note that this API took only ~336 microseconds to execute 100,000 times on my Core i7 powered laptop. That is blazing fast!
--[ Multiple Calls to GetCurrentThread() ]--

** Release mode, inside Visual Studio **
         RunSize   TimeSeconds       TimeMicro
---------------------------------------------------------
             500    0.00000746           7.464
           50000    0.00062698         626.976
          100000    0.00125349        1253.485

** Release mode, running outside in a command window **
         RunSize   TimeSeconds       TimeMicro
---------------------------------------------------------
             500    0.00000187           1.866
           50000    0.00016747         167.473
          100000    0.00033635         336.346
This brings us to another interesting idea, one that is probably already in use: a program can apply this technique to check whether it is running under a debugger. I recall reading somewhere that IsDebuggerPresent() is not a fool-proof way of determining this, so timing checks may well be better. However, we must account for the fact that processing times vary widely depending on the system hardware and the system load at the time. What can be done is to draw up a table of expected processing times for a particular operation across various hardware configurations and system loads. Then, when our program is running, we time the same operation and compare the result against that table. The program must be able to determine the hardware configuration (the CPUID instruction comes to mind) and the system load (by querying Windows for the number of processes, memory used, and so on) to make a good comparison.