Monday, June 10, 2013

SEH: Subtle Exception Handling!

Malware is a never-ending source of obfuscation tricks: some are carefully crafted, whereas others just happen to be there. Sometimes a trick depends on the compiler itself: how it handles optimizations, how it translates language-specific constructs, and so on. In this post we are going to discuss how SEH is implemented in Visual C++. After a short low-level explanation, I will propose a procedure that exploits this mechanism to obfuscate code, and I will also provide the source code of a working POC.


SEH in Visual Studio: how does it work?

Let's begin with a brief description of a little trick that I found while analyzing a malware sample detected by ESET as Win32/Rootkit.Avatar. For more information about it you can check this detailed article.

In particular, the article mentions a specific behaviour of the malware: "the malware raises an exception to pass control to an installed exception-handler". That is:

...
.text:0040235B                 push    offset sub_402B26
.text:00402360                 mov     eax, large fs:0
.text:00402366                 push    eax
.text:00402367                 mov     large fs:0, esp
...


This is the standard way an exception handler is installed, and it corresponds to a try-catch statement in C++.
However, if we keep analyzing the code we'll notice that this isn't the only exception handler being installed. In fact, we are going to see how the malware takes advantage of a second one in an attempt to hide a common debugger check (the PEB one) inside it.
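The four instructions in the listing above build a two-field record on the stack (the previous list head and the handler address) and make fs:[0] point to it. Here is a minimal, portable model of that linked list. Everything in it is hypothetical: "chain_head" stands in for fs:[0], and the handlers simply return non-zero when they accept an exception, which is a simplification of the real return codes.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*handler_fn)(unsigned code);

/* Model of the record built by "push offset handler" / "push eax". */
typedef struct registration {
    struct registration *prev;    /* old value of fs:[0] */
    handler_fn           handler; /* e.g. sub_402B26 in the listing */
} registration;

/* Stand-in for fs:[0]; on real x86 Windows the head is per thread. */
static registration *chain_head = NULL;

/* mov eax, fs:0 / push eax / mov fs:0, esp */
void install(registration *rec, handler_fn fn)
{
    rec->handler = fn;
    rec->prev = chain_head;
    chain_head = rec;
}

/* What a function epilogue does: restore the saved head. */
void uninstall(registration *rec)
{
    chain_head = rec->prev;
}

/* Walk from the newest record; the first handler that returns
   non-zero takes the exception. Returns 0 if nobody did. */
int dispatch(unsigned code)
{
    for (registration *r = chain_head; r != NULL; r = r->prev)
        if (r->handler(code))
            return 1;
    return 0;
}

int outer_handler(unsigned code) { return code == 0xC0000005u; }
int inner_handler(unsigned code) { (void)code; return 0; }

void check_chain(void)
{
    registration outer, inner;
    install(&outer, outer_handler);
    install(&inner, inner_handler);
    assert(chain_head == &inner);          /* newest record is the head */
    assert(dispatch(0xC0000005u) == 1);    /* inner declines, outer accepts */
    assert(dispatch(0x80000003u) == 0);    /* nobody accepts */
    uninstall(&inner);
    uninstall(&outer);
    assert(chain_head == NULL);
}
```

Calling check_chain() from a main() exercises the whole install, dispatch and uninstall cycle.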

Exception handlers have already been extensively documented in the past, but this one is a little bit trickier because it relies on a Visual Studio-specific construct: the try-except statement. Here is how it is implemented in the malware:

...
.text:00401CC5                 push    offset dword_4044C8
.text:00401CCA                 push    offset __except_handler3
.text:00401CCF                 mov     eax, large fs:0
.text:00401CD5                 push    eax
.text:00401CD6                 mov     large fs:0, esp
...


This is the installation code for an SEH frame in Visual Studio, and there are two substantial differences with respect to the previous code. The first is that it installs a standard library routine ("__except_handler3") as the exception handler, which doesn't look suspicious. The second is a little confusing if you haven't read about the implementation before.
However, a closer look will reveal the trick. In fact another value is being pushed, that is:

.text:00401CC5                 push    offset dword_4044C8


We usually wouldn't expect to see this additional "push", and we would think that it isn't related to the SEH, but... it actually is!
In particular, this "push" is putting on the stack the address of a data structure named "scopetable entry", documented by Matt Pietrek, which has the following definition:

 typedef struct _SCOPETABLE
 {
     DWORD       previousTryLevel;
     DWORD       lpfnFilter;
     DWORD       lpfnHandler;
 } SCOPETABLE, *PSCOPETABLE;


It specifies the addresses of the code blocks to be executed for the filter expression ("lpfnFilter") and for the except body ("lpfnHandler"):

__try {
   ... code
}
__except(filter expression) {
   ... except body
}


The library routine "__except_handler3" uses this information first to call the code for the filter expression, which will decide if the exception is handled or not, and then to dispatch execution to the except body (in case it's handled). So, actually, the real exception handler installed by the malware is not the library one, but it is the one inside the except body. We can see this structure in the malware:

.rdata:004044C8 dword_4044C8    dd 0FFFFFFFFh
.rdata:004044CC                 dd offset filter
.rdata:004044D0                 dd offset except_body


and the related code:

.text:00401CFA                 mov     [ecx], al   ; trigger exception!
.text:00401CFC                 jmp     short loc_401D13
.text:00401CFE ; ----------------------------------------------
.text:00401CFE
.text:00401CFE filter:
.text:00401CFE                 mov     eax, 1
.text:00401D03                 retn
.text:00401D04 ; ----------------------------------------------
.text:00401D04
.text:00401D04 except_body:
.text:00401D04                 mov     esp, [ebp+var_18]
.text:00401D07                 mov     eax, large fs:30h
.text:00401D0D                 mov     al, [eax+2]
.text:00401D10                 mov     [ebp+var_1C], eax
.text:00401D13
.text:00401D13 loc_401D13:
.text:00401D13                 mov     [ebp+var_4], 0FFFFFFFFh
.text:00401D1A                 mov     al, byte ptr [ebp+var_1C]
.text:00401D1D                 mov     ecx, [ebp+var_10]
.text:00401D20                 mov     large fs:0, ecx
.text:00401D27                 pop     edi
.text:00401D28                 pop     esi
.text:00401D29                 pop     ebx
.text:00401D2A                 mov     esp, ebp
.text:00401D2C                 pop     ebp
.text:00401D2D                 retn


From this listing you can see that the filter code always returns 1, which means that the except body is always executed when an exception occurs (and the code triggers one on purpose at address 00401CFA). On execution, the except body checks PEB.BeingDebugged in order to detect a debugger attached to the process, and the function returns true or false depending on the result. Later, the caller checks this flag and terminates execution if a debugger was detected.
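To make the role of the scopetable more concrete, here is a simplified, portable model of the dispatch described above. It is only a sketch and not the real "__except_handler3": it ignores unwinding and the exception record, and all the names except the three SCOPETABLE fields are hypothetical. A filter result of 1 runs the except body, 0 moves on to the enclosing block, and a negative result resumes the faulting code.

```c
#include <assert.h>
#include <stddef.h>

#define TRYLEVEL_NONE 0xFFFFFFFFu  /* the 0FFFFFFFFh seen in dword_4044C8 */

typedef int  (*filter_fn)(void);
typedef void (*body_fn)(void);

/* Same three fields as SCOPETABLE, with typed pointers instead of DWORDs. */
typedef struct scope_entry {
    unsigned  previousTryLevel;
    filter_fn lpfnFilter;    /* NULL would denote a __finally entry */
    body_fn   lpfnHandler;
} scope_entry;

static int body_runs = 0;
int  always_handle(void) { return 1; }   /* mov eax, 1 / retn */
int  never_handle(void)  { return 0; }
void except_body(void)   { body_runs++; }

/* Returns 1 if an except body ran, -1 if a filter asked to resume,
   0 if every filter in this frame declined. */
int model_dispatch(const scope_entry *table, unsigned trylevel)
{
    while (trylevel != TRYLEVEL_NONE) {
        const scope_entry *e = &table[trylevel];
        if (e->lpfnFilter != NULL) {
            int verdict = e->lpfnFilter();
            if (verdict < 0) return -1;
            if (verdict > 0) { e->lpfnHandler(); return 1; }
        }
        trylevel = e->previousTryLevel;   /* try the enclosing block */
    }
    return 0;
}

void check_dispatch(void)
{
    /* One entry, like the malware's table: no enclosing block. */
    const scope_entry malware_like[] = {
        { TRYLEVEL_NONE, always_handle, except_body },
    };
    /* Two nested blocks: the inner filter declines, the outer accepts. */
    const scope_entry nested[] = {
        { TRYLEVEL_NONE, always_handle, except_body },  /* level 0, outer */
        { 0,             never_handle,  except_body },  /* level 1, inner */
    };
    const scope_entry declining[] = {
        { TRYLEVEL_NONE, never_handle, except_body },
    };
    body_runs = 0;
    assert(model_dispatch(malware_like, 0) == 1);
    assert(model_dispatch(nested, 1) == 1);
    assert(model_dispatch(declining, 0) == 0);
    assert(body_runs == 2);
}
```

The last case is the interesting one for what follows: when the filter declines, the decision not to run the except body is taken inside the dispatcher, not in the caller's code.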


A better way to exploit the SEH implementation.

So, all this trouble just to hide the debugger check inside a try-except statement and make it a bit more difficult to trace. As it stands, though, the trick is not really effective. Is it possible to do better?

Well, if we put the debugger check inside the filter code rather than in the except body, we can make the filter return false when a debugger is detected, which means the library handler "__except_handler3" won't call the except body and the exception will go unhandled. This confuses things, because the decision on whether to terminate execution is taken inside a library routine rather than in the malware code itself. Anyone debugging the malware will find that execution always terminates while running the standard Visual Studio exception handler code, and will have to dig into it to understand what's happening.

It would look like this:

__try
{
   //...
   RaiseException(0, 0, 0, 0);
}
__except(!IsDebuggerPresent())
{
   //...
}


Briefly: the code guarded in the try block will cause an exception; the filter routine is the check implemented via the IsDebuggerPresent API, which returns true if the debugger is attached and false otherwise. So, in case a debugger is detected, the filter returns zero, and the except block is never called, causing the process to simply crash.
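As a small illustration of that decision, the sketch below models the filter verdict without calling any Windows API. The "fake_peb" structure is a hypothetical stand-in for the first bytes of the 32-bit PEB (the BeingDebugged byte sits at offset 2, matching the "mov al, [eax+2]" seen earlier), and the two constants mirror the values 0 and 1 a filter expression can evaluate to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the start of the PEB. */
typedef struct fake_peb {
    unsigned char InheritedAddressSpace;
    unsigned char ReadImageFileExecOptions;
    unsigned char BeingDebugged;
} fake_peb;

enum { CONTINUE_SEARCH = 0, EXECUTE_HANDLER = 1 };

/* What "__except(!IsDebuggerPresent())" evaluates to in this model. */
int filter_verdict(const fake_peb *peb)
{
    return !peb->BeingDebugged ? EXECUTE_HANDLER : CONTINUE_SEARCH;
}

void check_filter(void)
{
    fake_peb clean    = {0, 0, 0};
    fake_peb debugged = {0, 0, 1};
    assert(offsetof(fake_peb, BeingDebugged) == 2);
    assert(filter_verdict(&clean) == EXECUTE_HANDLER);     /* except body runs */
    assert(filter_verdict(&debugged) == CONTINUE_SEARCH);  /* nobody handles it */
}
```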





Of course, you can obfuscate the code in the filter routine to make it less obvious, which will leave the analyst puzzling over why the code is crashing inside a standard Visual Studio library routine :).


"__except_handler4"?!

"__except_handler3" is the standard library code, but its data was susceptible to corruption by stack-based buffer overflows, and this caused security problems. So, in newer versions of Visual Studio, the function was replaced by "__except_handler4", which is essentially the same routine with additional features.

In particular, it uses canaries to protect the SEH data, in order to make sure that the pointers to the exception handlers have not been overwritten: 

.text:004010C5 @__security_check_cookie@4 proc near    ; DATA XREF: __except_handler4+11 o
.text:004010C5                 cmp     ecx, ___security_cookie
.text:004010CB                 jnz     short loc_4010CF
.text:004010CD                 rep retn
.text:004010CF
.text:004010CF loc_4010CF:                             ; CODE XREF: __security_check_cookie(x)+6 j
.text:004010CF                 jmp     ___report_gsfailure
.text:004010CF @__security_check_cookie@4 endp
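The listing only shows the comparison against the cookie. As far as I understand it (this is an assumption, not something visible in the listing), frames guarded by "__except_handler4" also keep the scopetable pointer XORed with the cookie, so a blindly overwritten pointer decodes to garbage. A minimal sketch of both ideas, with a made-up cookie value and hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* Arbitrary value for the sketch; the real cookie is set at startup. */
static const uintptr_t cookie = (uintptr_t)0xBB40E64Eu;

uintptr_t encode_ptr(uintptr_t p)      { return p ^ cookie; }
uintptr_t decode_ptr(uintptr_t stored) { return stored ^ cookie; }

/* Mirrors the cmp/jnz pair: 1 if the value still equals the cookie. */
int cookie_ok(uintptr_t value) { return value == cookie; }

void check_cookie(void)
{
    uintptr_t scopetable = (uintptr_t)0x004044C8u;
    uintptr_t stored = encode_ptr(scopetable);

    assert(stored != scopetable);              /* not stored in the clear */
    assert(decode_ptr(stored) == scopetable);  /* round trip works */

    /* A raw pointer written by an attacker decodes to something else. */
    assert(decode_ptr((uintptr_t)0x41414141u) != (uintptr_t)0x41414141u);

    assert(cookie_ok(cookie));
    assert(!cookie_ok(cookie ^ 1u));           /* any change is detected */
}
```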


Furthermore, the old "__except_handler3" was library code linked into the user executable, whereas "__except_handler4" is only a small wrapper around "_except_handler4_common", a function exported by the Visual C++ runtime DLL (module msvcr*.dll):

.text:00401799                 mov     edi, edi
.text:0040179B                 push    ebp
.text:0040179C                 mov     ebp, esp
.text:0040179E                 push    [ebp+arg_C]
.text:004017A1                 push    [ebp+arg_8]
.text:004017A4                 push    [ebp+arg_4]
.text:004017A7                 push    [ebp+arg_0]
.text:004017AA                 push    offset @__security_check_cookie@4 ; __security_check_cookie(x)
.text:004017AF                 push    offset ___security_cookie
.text:004017B4                 call    _except_handler4_common
.text:004017B9                 add     esp, 18h
.text:004017BC                 pop     ebp
.text:004017BD                 retn


Obfuscating algorithms.

Now that we know all the details related to the SEH implementation in Visual Studio, I would like to propose a simple yet powerful idea to obfuscate algorithms.

Briefly, you can:

  • Create a set of basic virtualized opcodes, each one represented by a different function.
  • Use these opcodes to write an algorithm, encoding it in a data structure (each opcode is associated with a particular "id number").
  • Execute each instruction of the program through a different filter expression. This means that if your algorithm consists of "n" opcodes, you will have "n" try-except blocks (that is, "n" filter expressions) and you will have to generate "n" exceptions as well.
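Before the Windows-specific POC, the three steps above can be modelled without SEH at all. In the sketch below (hypothetical names throughout) "raise_model" stands in for RaiseException plus the nested filter expressions: every filter is offered the instruction, and the first one that recognises its opcode id executes it. The two ids are the same values the POC uses for OPC_ADD and OPC_HLT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OPC_HLT = 0x16, OPC_ADD = 0x18 };
#define IP_HALT ((size_t)-1)

static size_t ip;   /* virtual instruction pointer, in array slots */

typedef int (*filter_fn)(uintptr_t code, uintptr_t a, uintptr_t b);

int f_add(uintptr_t code, uintptr_t a, uintptr_t b)
{
    if ((code & 0xFF) != OPC_ADD) return 0;   /* not mine: keep searching */
    *(int *)a += (int)b;
    return 1;
}

int f_hlt(uintptr_t code, uintptr_t a, uintptr_t b)
{
    (void)a; (void)b;
    if ((code & 0xFF) != OPC_HLT) return 0;
    ip = IP_HALT - 3;                         /* the loop adds 3 afterwards */
    return 1;
}

static const filter_fn filters[] = { f_add, f_hlt };

/* Offer one 3-slot instruction to each filter in turn. */
int raise_model(const uintptr_t *insn)
{
    for (size_t k = 0; k < sizeof filters / sizeof filters[0]; k++)
        if (filters[k](insn[0], insn[1], insn[2]))
            return 1;
    return 0;   /* in the real scheme: an unhandled exception */
}

/* Returns 1 if the program ran until a halt, 0 on an unknown opcode. */
int run(const uintptr_t *program)
{
    for (ip = 0; ip != IP_HALT; ip += 3)
        if (!raise_model(&program[ip]))
            return 0;
    return 1;
}

void check_vm(void)
{
    int x = 0;
    const uintptr_t program[] = {
        OPC_ADD, (uintptr_t)&x, 5,
        OPC_ADD, (uintptr_t)&x, 7,
        OPC_HLT, 0, 0,
    };
    assert(run(program));
    assert(x == 12);
}
```

The control flow never calls f_add or f_hlt by name from the program loop, which is the property the SEH-based version pushes one step further by hiding the calls behind the exception dispatcher.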



Here is the source code of a working POC that implements the RC4 algorithm:

 #include <windows.h>  
 #include <stdio.h>  
   
 // globals used to keep the jl flags and the ip  
 int flags, eip;  
   
 // opcodes  
 #define    OPC_MOD    0x11  
 #define    OPC_XOR    0x12  
 #define    OPC_CMP    0x13  
 #define    OPC_JL     0x14  
 #define    OPC_JMP    0x15  
 #define    OPC_HLT    0x16  
 #define    OPC_MOV    0x17  
 #define    OPC_ADD    0x18  
   
 // operand types  
 #define    OP_V       1 // variable  
 #define    OP_C       2 // constant  
 #define    OP_P       3 // pointer  
   
 // sizes  
 #define    OP_BYTE    1  
 #define    OP_DWORD   2  
   
 // opcode characterization  
 typedef struct _OPCODE  
 {  
   BYTE  opcode;  
   BYTE  type_op1;  
   BYTE  type_op2;  
   BYTE  size;  
 } OPCODE;  
   
 // macro to fill the opcode arrays quickly  
 #define    MAKE_OPC(__opc, __op1, __op2, __size, __param1, __param2)    \  
           (__opc | (__op1 << 8) | (__op2 << 16) | (__size << 24)),   \  
           (DWORD)__param1,                       \  
           (DWORD)__param2  
   
 // EXC_RUN to execute the opcodes arrays "rc4_init_op" and "rc4_crypt_op"  
 #define    EXC_RUN(__myprogram)    \  
           eip = 0; flags = 0;  \  
           while(eip != EIP_HALT){ EXC_TRY() EXC_EXCEPTION(__myprogram) EXC_USED_OPCODES() eip += 3;}  
   
 #define    EIP_HALT  0xFFFFFFFF  
   
 #define    EXC_TRY()        \  
           __try{ __try{ __try{ __try{ __try{ __try{ __try{ __try{  
 #define    EXC_EXCEPTION(__program) RaiseException(__program[eip], 0, 2, (ULONG_PTR*)(&__program[eip+1]));  
 #define    EXC_INSTR(__opc) }__except(__opc(GetExceptionCode(), GetExceptionInformation())){}  
 #define    EXC_USED_OPCODES()  \  
           EXC_INSTR(cmp) EXC_INSTR(mov) EXC_INSTR(add) EXC_INSTR(hlt)  \  
           EXC_INSTR(jmp) EXC_INSTR(jl) EXC_INSTR(mod) EXC_INSTR(xor)  
   
 // flags values after cmp  
 #define    GT       0  
 #define    LT       1  
 #define    EQ       2  
   
 // checks the opcode and extracts its operands  
 BOOL chckopc_extr(BYTE opcode, BYTE opc, DWORD **op1, DWORD **op2, struct _EXCEPTION_POINTERS *ep)  
 {  
     
   EXCEPTION_RECORD *er;  
   
   if(opcode != opc) return false;    
   
   er = ep->ExceptionRecord;  
   *op1 = (DWORD*)(er->ExceptionInformation[0]);  
   *op2 = (DWORD*)(er->ExceptionInformation[1]);  
     
   return true;  
 }  
   
 // reads an operand given its type and size  
 DWORD readop(DWORD *op, BYTE type, BYTE size)  
 {  
   switch(type)  
   {  
     case OP_V:  
       if(size == OP_BYTE)  
         return *((BYTE*)op);  
       else  
         return *op;  
   
     case OP_C:  
       return (DWORD)op;  
         
     case OP_P:  
       if(size == OP_BYTE)  
         return *((BYTE*)(*op));  
       else  
         return *((DWORD*)(*op));  
   }  
   
   return 0;  
 }  
   
 // assigns data to an operand given its type and size  
 void assignop(DWORD *op, BYTE type, BYTE size, DWORD data)  
 {  
   switch(type)  
   {  
     case OP_V:  
       if(size == OP_BYTE)  
         *((BYTE*)op) = (BYTE)data;  
       else  
         *op = data;  
       break;  
   
     case OP_C:  
         *op = data;  
       break;  
   
     case OP_P:  
       if(size == OP_BYTE)  
         *((BYTE*)(*op)) = (BYTE)data;  
       else  
         *((DWORD*)(*op)) = data;  
       break;  
   }  
 }  
   
 // -----------------------------------------------------------------  
   
 // Opcodes  
   
 // x = x % y  
 int mod(unsigned int code, struct _EXCEPTION_POINTERS *ep)  
 {  
   DWORD *op1, *op2;  
   if(!chckopc_extr((((OPCODE*)&code)->opcode), OPC_MOD, &op1, &op2, ep))  
     return false;  
   *op1 = *op1 % *op2;  
   return true;  
 }  
   
 // x = x ^ y  
 int xor(unsigned int code, struct _EXCEPTION_POINTERS *ep)  
 {  
   DWORD *op1, *op2;  
   if(!chckopc_extr((((OPCODE*)&code)->opcode), OPC_XOR, &op1, &op2, ep))  
     return false;  
   *op1 = *op1 ^ *op2;  
   return true;  
 }  
   
 // unsigned compare  
 int cmp(unsigned int code, struct _EXCEPTION_POINTERS *ep)  
 {  
   DWORD src1, src2;  
   DWORD *op1, *op2;  
   
   if(!chckopc_extr((((OPCODE*)&code)->opcode), OPC_CMP, &op1, &op2, ep))  
     return false;  
   
   src1 = readop(op1, ((OPCODE*)&code)->type_op1, ((OPCODE*)&code)->size);  
   src2 = readop(op2, ((OPCODE*)&code)->type_op2, ((OPCODE*)&code)->size);  
   (src1 > src2) ? flags = GT : ((src1 < src2) ? flags = LT : flags = EQ);  
     
   return true;  
 }  
   
 // eip = x IFF flags == LT  
 int jl(unsigned int code, struct _EXCEPTION_POINTERS *ep)  
 {  
   DWORD *op1, *op2;  
   
   if(!chckopc_extr((((OPCODE*)&code)->opcode), OPC_JL, &op1, &op2, ep))  
     return false;  
   
   if(flags == LT)  
     eip = ((DWORD)op1 * 3) - 3;  
   return true;  
 }  
   
 // eip = x  
 int jmp(unsigned int code, struct _EXCEPTION_POINTERS *ep)  
 {  
   DWORD *op1, *op2;  
   
   if(!chckopc_extr((((OPCODE*)&code)->opcode), OPC_JMP, &op1, &op2, ep))  
     return false;  
   
   eip = ((DWORD)op1 * 3) - 3;  
   return true;  
 }  
   
 // eip = EIP_HALT  
 int hlt(unsigned int code, struct _EXCEPTION_POINTERS *ep)  
 {  
   DWORD *op1, *op2;  
   
   if(!chckopc_extr((((OPCODE*)&code)->opcode), OPC_HLT, &op1, &op2, ep))  
     return false;  
   
   eip = EIP_HALT - 3;  
   return true;  
 }  
   
 // move data   
 int mov(unsigned int code, struct _EXCEPTION_POINTERS *ep)  
 {  
   DWORD src2;  
   DWORD *op1, *op2;  
   
   if(!chckopc_extr((((OPCODE*)&code)->opcode), OPC_MOV, &op1, &op2, ep))  
     return false;  
   
   src2 = readop(op2, ((OPCODE*)&code)->type_op2, ((OPCODE*)&code)->size);  
   assignop(op1, ((OPCODE*)&code)->type_op1, ((OPCODE*)&code)->size, src2);  
   
   return true;  
 }  
   
 // add data   
 int add(unsigned int code, struct _EXCEPTION_POINTERS *ep)  
 {  
   DWORD src1, src2;  
   DWORD *op1, *op2;  
   
   if(!chckopc_extr((((OPCODE*)&code)->opcode), OPC_ADD, &op1, &op2, ep))  
     return false;  
   
   src1 = readop(op1, ((OPCODE*)&code)->type_op1, ((OPCODE*)&code)->size);  
   src2 = readop(op2, ((OPCODE*)&code)->type_op2, ((OPCODE*)&code)->size);  
   src2 += src1;  
   assignop(op1, ((OPCODE*)&code)->type_op1, ((OPCODE*)&code)->size, src2);  
   
   return true;  
 }  
   
 // -----------------------------------------------------------------  
   
 void main(void)  
 {  
   // test vector:  
   // hex key:        0123456789abcdef  
   // hex plaintext:  0000000000000000  
   // hex ciphertext: 7494c2e7104b0879  
   
   BYTE *temp_perm, *temp_perm2, *temp_key, *temp_plain, *temp_cipher;  
   BYTE perm_byte, swap_byte;  
   DWORD j, index1, index2, key_index, key_byte;  
   int i, keylen = 8, plainlen = 8;  
   BYTE perm[256];  
   BYTE key[8] = {0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef};  
   BYTE plaintext[8] = {0, 0, 0, 0, 0, 0, 0, 0};  
   BYTE ciphertext[8];  
   
   temp_perm = perm;  
   
   DWORD rc4_init_op[] = {  
     /* 000 */ MAKE_OPC(OPC_MOV, OP_V, OP_C, OP_DWORD, &i, 0),    // init permutation box  
     /* 001 */ MAKE_OPC(OPC_MOV, OP_P, OP_V, OP_BYTE, &temp_perm, &i),  
     /* 002 */ MAKE_OPC(OPC_ADD, OP_V, OP_C, OP_DWORD, &temp_perm, 1),  
     /* 003 */ MAKE_OPC(OPC_ADD, OP_V, OP_C, OP_DWORD, &i, 1),  
     /* 004 */ MAKE_OPC(OPC_CMP, OP_V, OP_C, OP_DWORD, &i, 256),  
     /* 005 */ MAKE_OPC(OPC_JL, 0, 0, 0, 1, 0),  
     /* 006 */ MAKE_OPC(OPC_MOV, OP_V, OP_C, OP_BYTE, &index1, 0),  
     /* 007 */ MAKE_OPC(OPC_MOV, OP_V, OP_C, OP_BYTE, &index2, 0),  
   
     /* 008 */ MAKE_OPC(OPC_MOV, OP_V, OP_C, OP_DWORD, &j, 0),      // apply the key to the permutation box  
     /* 009 */ MAKE_OPC(OPC_MOV, OP_V, OP_C, OP_DWORD, &i, 0),  
     /* 010 */ MAKE_OPC(OPC_MOV, OP_V, OP_V, OP_DWORD, &key_index, &i),  
     /* 011 */ MAKE_OPC(OPC_MOD, 0, 0, 0, &key_index, &keylen),  
     /* 012 */ MAKE_OPC(OPC_MOV, OP_V, OP_C, OP_DWORD, &temp_key, key),  
     /* 013 */ MAKE_OPC(OPC_ADD, OP_V, OP_V, OP_DWORD, &temp_key, &key_index),  
     /* 014 */ MAKE_OPC(OPC_MOV, OP_V, OP_P, OP_BYTE, &key_byte, &temp_key),  
     /* 015 */ MAKE_OPC(OPC_MOV, OP_V, OP_C, OP_DWORD, &temp_perm, perm),  
     /* 016 */ MAKE_OPC(OPC_ADD, OP_V, OP_V, OP_DWORD, &temp_perm, &i),  
     /* 017 */ MAKE_OPC(OPC_MOV, OP_V, OP_P, OP_BYTE, &perm_byte, &temp_perm),  
     /* 018 */ MAKE_OPC(OPC_ADD, OP_V, OP_V, OP_BYTE, &j, &perm_byte),  
     /* 019 */ MAKE_OPC(OPC_ADD, OP_V, OP_V, OP_BYTE, &j, &key_byte),  
   
     /* 020 */ MAKE_OPC(OPC_MOV, OP_V, OP_C, OP_DWORD, &temp_perm, perm),    // swap bytes  
     /* 021 */ MAKE_OPC(OPC_ADD, OP_V, OP_V, OP_DWORD, &temp_perm, &j),  
     /* 022 */ MAKE_OPC(OPC_MOV, OP_V, OP_P, OP_BYTE, &swap_byte, &temp_perm),  
     /* 023 */ MAKE_OPC(OPC_MOV, OP_P, OP_V, OP_BYTE, &temp_perm, &perm_byte),  
     /* 024 */ MAKE_OPC(OPC_MOV, OP_V, OP_C, OP_DWORD, &temp_perm, perm),  
     /* 025 */ MAKE_OPC(OPC_ADD, OP_V, OP_V, OP_DWORD, &temp_perm, &i),  
     /* 026 */ MAKE_OPC(OPC_MOV, OP_P, OP_V, OP_BYTE, &temp_perm, &swap_byte),  
   
     /* 027 */ MAKE_OPC(OPC_ADD, OP_V, OP_C, OP_DWORD, &i, 1),  
     /* 028 */ MAKE_OPC(OPC_CMP, OP_V, OP_C, OP_DWORD, &i, 256),  
     /* 029 */ MAKE_OPC(OPC_JL, 0, 0, 0, 10, 0),  
   
     /* 030 */ MAKE_OPC(OPC_HLT, 0, 0, 0, 0, 0)  
   };  
   
   DWORD rc4_crypt_op[] = {  
     /* 000 */ MAKE_OPC(OPC_MOV, OP_V, OP_C, OP_DWORD, &i, 0),  
     /* 001 */ MAKE_OPC(OPC_MOV, OP_V, OP_C, OP_DWORD, &index1, 0),  
     /* 002 */ MAKE_OPC(OPC_MOV, OP_V, OP_C, OP_DWORD, &index2, 0),  
     /* 003 */ MAKE_OPC(OPC_MOV, OP_V, OP_C, OP_DWORD, &j, 0),  
   
     /* 004 */ MAKE_OPC(OPC_ADD, OP_V, OP_C, OP_BYTE, &index1, 1),        // update indices  
     /* 005 */ MAKE_OPC(OPC_MOV, OP_V, OP_C, OP_DWORD, &temp_perm, perm),  
     /* 006 */ MAKE_OPC(OPC_ADD, OP_V, OP_V, OP_DWORD, &temp_perm, &index1),  
     /* 007 */ MAKE_OPC(OPC_ADD, OP_V, OP_P, OP_BYTE, &index2, &temp_perm),  
   
     /* 008 */ MAKE_OPC(OPC_MOV, OP_V, OP_P, OP_BYTE, &swap_byte, &temp_perm),  // swap bytes  
     /* 009 */ MAKE_OPC(OPC_MOV, OP_V, OP_C, OP_DWORD, &temp_perm2, perm),  
     /* 010 */ MAKE_OPC(OPC_ADD, OP_V, OP_V, OP_DWORD, &temp_perm2, &index2),  
     /* 011 */ MAKE_OPC(OPC_MOV, OP_V, OP_P, OP_BYTE, &perm_byte, &temp_perm2),  
     /* 012 */ MAKE_OPC(OPC_MOV, OP_P, OP_V, OP_BYTE, &temp_perm2, &swap_byte),  
     /* 013 */ MAKE_OPC(OPC_MOV, OP_P, OP_V, OP_BYTE, &temp_perm, &perm_byte),  
   
     /* 014 */ MAKE_OPC(OPC_MOV, OP_V, OP_C, OP_DWORD, &temp_perm, perm),    // xor  
     /* 015 */ MAKE_OPC(OPC_ADD, OP_V, OP_V, OP_DWORD, &temp_perm, &index1),  
     /* 016 */ MAKE_OPC(OPC_MOV, OP_V, OP_P, OP_BYTE, &j, &temp_perm),  
     /* 017 */ MAKE_OPC(OPC_MOV, OP_V, OP_C, OP_DWORD, &temp_perm, perm),  
     /* 018 */ MAKE_OPC(OPC_ADD, OP_V, OP_V, OP_DWORD, &temp_perm, &index2),  
     /* 019 */ MAKE_OPC(OPC_MOV, OP_V, OP_P, OP_BYTE, &perm_byte, &temp_perm),  
     /* 020 */ MAKE_OPC(OPC_ADD, OP_V, OP_V, OP_BYTE, &j, &perm_byte),  
   
     /* 021 */ MAKE_OPC(OPC_MOV, OP_V, OP_C, OP_DWORD, &temp_plain, plaintext),  
     /* 022 */ MAKE_OPC(OPC_ADD, OP_V, OP_V, OP_DWORD, &temp_plain, &i),  
     /* 023 */ MAKE_OPC(OPC_MOV, OP_V, OP_P, OP_BYTE, &perm_byte, &temp_plain),  
     /* 024 */ MAKE_OPC(OPC_MOV, OP_V, OP_C, OP_DWORD, &temp_perm, perm),  
     /* 025 */ MAKE_OPC(OPC_ADD, OP_V, OP_V, OP_DWORD, &temp_perm, &j),  
     /* 026 */ MAKE_OPC(OPC_MOV, OP_V, OP_P, OP_BYTE, &swap_byte, &temp_perm),  
     /* 027 */ MAKE_OPC(OPC_XOR, 0, 0, 0, &swap_byte, &perm_byte),  
     /* 028 */ MAKE_OPC(OPC_MOV, OP_V, OP_C, OP_DWORD, &temp_cipher, ciphertext),  
     /* 029 */ MAKE_OPC(OPC_ADD, OP_V, OP_V, OP_DWORD, &temp_cipher, &i),  
     /* 030 */ MAKE_OPC(OPC_MOV, OP_P, OP_V, OP_BYTE, &temp_cipher, &swap_byte),  
   
     /* 031 */ MAKE_OPC(OPC_ADD, OP_V, OP_C, OP_DWORD, &i, 1),  
     /* 032 */ MAKE_OPC(OPC_CMP, OP_V, OP_V, OP_DWORD, &i, &plainlen),  
     /* 033 */ MAKE_OPC(OPC_JL, 0, 0, 0, 4, 0),  
   
     /* 034 */ MAKE_OPC(OPC_HLT, 0, 0, 0, 0, 0)  
   };  
   
   EXC_RUN(rc4_init_op);  
   
   EXC_RUN(rc4_crypt_op);  
   
   printf("cipher: %02X %02X %02X %02X %02X %02X %02X %02X\n",  
     ciphertext[0], ciphertext[1], ciphertext[2], ciphertext[3],  
     ciphertext[4], ciphertext[5], ciphertext[6], ciphertext[7]);  
 }  
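For comparison, here is a plain, non-obfuscated RC4 in portable C that can be used to cross-check the output of the virtualized version against the same test vector. It is a straightforward reference sketch, not part of the POC, and the function names are mine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Key scheduling followed by keystream generation and XOR. */
void rc4(const unsigned char *key, size_t keylen,
         const unsigned char *in, unsigned char *out, size_t len)
{
    unsigned char S[256], t;
    unsigned i, j = 0;

    for (i = 0; i < 256; i++)              /* init permutation box */
        S[i] = (unsigned char)i;

    for (i = 0; i < 256; i++) {            /* apply the key */
        j = (j + S[i] + key[i % keylen]) & 0xFF;
        t = S[i]; S[i] = S[j]; S[j] = t;
    }

    i = j = 0;
    for (size_t n = 0; n < len; n++) {     /* encrypt or decrypt */
        i = (i + 1) & 0xFF;
        j = (j + S[i]) & 0xFF;
        t = S[i]; S[i] = S[j]; S[j] = t;
        out[n] = in[n] ^ S[(S[i] + S[j]) & 0xFF];
    }
}

void check_rc4(void)
{
    const unsigned char key[8] =
        {0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef};
    const unsigned char plain[8] = {0};
    const unsigned char expected[8] =
        {0x74, 0x94, 0xc2, 0xe7, 0x10, 0x4b, 0x08, 0x79};
    unsigned char cipher[8], back[8];

    rc4(key, sizeof key, plain, cipher, sizeof plain);
    assert(memcmp(cipher, expected, sizeof expected) == 0);

    /* RC4 is symmetric: running it again restores the plaintext. */
    rc4(key, sizeof key, cipher, back, sizeof cipher);
    assert(memcmp(back, plain, sizeof plain) == 0);
}
```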


Following this idea, you can easily implement any other algorithm, and this has several advantages in terms of obfuscation. In fact, in order to understand the code, you have to analyze the array containing all the opcodes, which is generated dynamically at run time:

...
.text:00401678                 mov     [ebp+var_160], ebx
.text:0040167E                 mov     [ebp+var_15C], 1
.text:00401688                 mov     [ebp+var_158], 2020113h
.text:00401692                 mov     [ebp+var_154], ebx
.text:00401698                 mov     [ebp+var_150], 100h
.text:004016A2                 mov     [ebp+var_14C], 14h
.text:004016AC                 mov     [ebp+var_148], 0Ah
.text:004016B6                 xor     ebx, ebx
.text:004016B8                 mov     [ebp+var_144], ebx
.text:004016BE                 mov     [ebp+var_140], 16h
.text:004016C8                 mov     [ebp+var_13C], ebx
.text:004016CE                 mov     [ebp+var_138], ebx
.text:004016D4                 mov     [ebp+var_44C], eax
.text:004016DA                 lea     ebx, [ebp+var_458]
.text:004016E0                 mov     [ebp+var_448], ebx
.text:004016E6                 mov     [ebp+var_444], 0
.text:004016F0                 mov     [ebp+var_440], eax
.text:004016F6                 lea     ebx, [ebp+var_464]
.text:004016FC                 mov     [ebp+var_43C], ebx
.text:00401702                 mov     [ebp+var_438], 0
.text:0040170C                 mov     [ebp+var_434], eax
.text:00401712                 lea     ebx, [ebp+var_460]
.text:00401718                 mov     [ebp+var_430], ebx
.text:0040171E                 mov     [ebp+var_42C], 0
.text:00401728                 mov     [ebp+var_428], eax
...
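The constants in this listing can be read back by inverting the MAKE_OPC packing. For instance, 2020113h followed by 100h presumably corresponds to the "CMP i, 256" entry of rc4_init_op, while 14h and 16h are the bare OPC_JL and OPC_HLT ids. A small decoder sketch (the struct and function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

typedef struct decoded_opc {
    uint8_t opcode, type_op1, type_op2, size;
} decoded_opc;

/* Inverse of: opcode | (op1 << 8) | (op2 << 16) | (size << 24). */
decoded_opc decode_opc(uint32_t packed)
{
    decoded_opc d;
    d.opcode   = (uint8_t)(packed & 0xFF);
    d.type_op1 = (uint8_t)((packed >> 8) & 0xFF);
    d.type_op2 = (uint8_t)((packed >> 16) & 0xFF);
    d.size     = (uint8_t)(packed >> 24);
    return d;
}

void check_decode(void)
{
    decoded_opc d = decode_opc(0x02020113u);   /* 2020113h in the listing */
    assert(d.opcode   == 0x13);                /* OPC_CMP  */
    assert(d.type_op1 == 1);                   /* OP_V     */
    assert(d.type_op2 == 2);                   /* OP_C     */
    assert(d.size     == 2);                   /* OP_DWORD */

    assert(decode_opc(0x14u).opcode == 0x14);  /* OPC_JL, no operand types */
    assert(decode_opc(0x16u).opcode == 0x16);  /* OPC_HLT */
}
```

An analyst would still have to recover the operand addresses (the values loaded through ebx and eax), which is where most of the manual work lies.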


Moreover, the opcodes aren't referenced by any direct call, because they are executed only due to the "RaiseException" API, which is guarded within various nested try-except blocks. This results in a chain of filter expressions and except bodies (which constitute an additional layer above the opcode routines) that are triggered by the scopetable mechanism:

; while(eip != EIP_HALT)
.text:00401ACE loc_401ACE:                             ; CODE XREF: _main+A92 j
.text:00401ACE                 mov     dword_404370, eax
.text:00401AD3                 cmp     eax, 0FFFFFFFFh
.text:00401AD6                 jz      loc_401D87
...

; this is the code guarded inside the nested try/excepts
...
.text:00401B11                 lea     edx, [ebp+eax*4+Arguments]
.text:00401B18                 push    edx             ; lpArguments
.text:00401B19                 push    2               ; nNumberOfArguments
.text:00401B1B                 push    ecx             ; dwExceptionFlags
.text:00401B1C                 mov     eax, [ebp+eax*4+dwExceptionCode]
.text:00401B23                 push    eax             ; dwExceptionCode
.text:00401B24                 call    ds:RaiseException
...
.text:00401B57                 jmp     loc_401D71
...

; a couple of filter expressions and except bodies
...
.text:00401B5C loc_401B5C:                             ; DATA XREF: .rdata:00403290 o
.text:00401B5C                 mov     eax, [ebp+var_14]
.text:00401B5F                 mov     ecx, [eax]
.text:00401B61                 mov     edx, [ecx]
.text:00401B63                 mov     [ebp+var_4F0], edx
.text:00401B69                 call    sub_401050
.text:00401B6E                 retn
.text:00401B6F ; ---------------------------------------------------------------------------
.text:00401B6F
.text:00401B6F loc_401B6F:                             ; DATA XREF: .rdata:00403294 o
.text:00401B6F                 mov     esp, [ebp+var_18]
.text:00401B72                 mov     [ebp+var_4], 6
.text:00401B79                 mov     [ebp+var_4], 5
.text:00401B80                 mov     [ebp+var_4], 4
.text:00401B87                 mov     [ebp+var_4], 3
.text:00401B8E                 mov     [ebp+var_4], 2
.text:00401B95                 mov     [ebp+var_4], 1
.text:00401B9C                 mov     [ebp+var_4], 0
.text:00401BA3                 jmp     loc_401D71
.text:00401BA8 ; ---------------------------------------------------------------------------
.text:00401BA8
.text:00401BA8 loc_401BA8:                             ; DATA XREF: .rdata:00403284 o
.text:00401BA8                 mov     eax, [ebp+var_14]
.text:00401BAB                 mov     edx, [eax]
.text:00401BAD                 mov     ecx, [edx]
.text:00401BAF                 mov     [ebp+var_4D4], ecx
.text:00401BB5                 call    sub_401170
.text:00401BBA                 retn
.text:00401BBB ; ---------------------------------------------------------------------------
.text:00401BBB
.text:00401BBB loc_401BBB:                             ; DATA XREF: .rdata:00403288 o
.text:00401BBB                 mov     esp, [ebp+var_18]
.text:00401BBE                 mov     [ebp+var_4], 5
.text:00401BC5                 mov     [ebp+var_4], 4
.text:00401BCC                 mov     [ebp+var_4], 3
.text:00401BD3                 mov     [ebp+var_4], 2
.text:00401BDA                 mov     [ebp+var_4], 1
.text:00401BE1                 mov     [ebp+var_4], 0
.text:00401BE8                 jmp     loc_401D71
...

; outside the nested try/except blocks there is the code that increments the virtual EIP
...
.text:00401D71 loc_401D71:                             ; CODE XREF: _main+867 j
.text:00401D71                                         ; _main+8B3 j 
...

As you can see, the algorithm is thoroughly fragmented: it's not easy to figure out what the code is attempting to do, nor is it easy to automate the detection of specific routines.

Thursday, March 21, 2013

Binary Instrumentation for Exploit Analysis Purposes (part 2)

Introduction.

This is the second part of the article about binary instrumentation for exploit analysis purposes, and this time we will discuss a real PDF exploit: a stack-based buffer overflow in CoolType.dll (CVE-2010-2883). You can retrieve it from the Metasploit module exploit/windows/fileformat/adobe_cooltype_sing.

In order to bypass DEP, this exploit makes use of heap spraying to run its ROP shellcode. Our goal, on the other hand, is to get as close as possible to the point where the vulnerability is triggered, so one clever thing to do is to use a Pintool to detect the ROP itself.

To do that, we can simply check whether the instruction executed after a RET is located right after a CALL, but be aware that this test alone can produce false positives. A better test would be to require the anomaly (a return that does not land after a CALL) to occur three times in a row, but this runs into some problems with Pin that we will discuss later.
Another method to detect ROP is to monitor the ESP register and look for the "0c0c0c0c" value, but inspecting the register with Pin is very slow and will degrade the performance of your Pintool, so we won't implement it.
Finally, one last check is to log the "pop ESP" instruction, which is a common ROP gadget employed right before the ROP shellcode itself.
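The first two checks can be sketched in portable C, independently of Pin. This is only an illustration under simplifying assumptions: it recognises three CALL encodings (E8 rel32, FF 15 disp32 and FF D0..D7 for "call reg"), it works on a byte buffer instead of live memory, and all the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 1 if the bytes just before offset ret_off look like a CALL. */
int call_precedes(const uint8_t *code, size_t ret_off)
{
    if (ret_off >= 5 && code[ret_off - 5] == 0xE8)
        return 1;                                   /* call rel32 */
    if (ret_off >= 6 && code[ret_off - 6] == 0xFF &&
        code[ret_off - 5] == 0x15)
        return 1;                                   /* call [disp32] */
    if (ret_off >= 2 && code[ret_off - 2] == 0xFF &&
        (code[ret_off - 1] & 0xF8) == 0xD0)
        return 1;                                   /* call reg */
    return 0;
}

/* Raise an alarm only after `threshold` suspicious returns in a row. */
typedef struct rop_counter { unsigned streak, threshold; } rop_counter;

int rop_observe(rop_counter *c, int call_preceded)
{
    c->streak = call_preceded ? 0 : c->streak + 1;
    return c->streak >= c->threshold;
}

void check_rop(void)
{
    /* call rel32 ; nop ; nop ; call eax ; nop */
    const uint8_t code[] = {0xE8, 0, 0, 0, 0, 0x90, 0x90, 0xFF, 0xD0, 0x90};
    rop_counter c = {0, 3};

    assert(call_precedes(code, 5) == 1);   /* right after E8 rel32 */
    assert(call_precedes(code, 9) == 1);   /* right after FF D0    */
    assert(call_precedes(code, 7) == 0);   /* after two NOPs       */

    assert(rop_observe(&c, 0) == 0);
    assert(rop_observe(&c, 0) == 0);
    assert(rop_observe(&c, 0) == 1);       /* third anomaly in a row */
    assert(rop_observe(&c, 1) == 0);       /* a normal return resets */
}
```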


Detecting the ROP with a Pintool.

Here is the function to detect the ROP:

#define LAST_EXECUTED 1000

ADDRINT LastExecutedBuf[LAST_EXECUTED];
UINT32 LastExecutedPos = 0;
UINT32 PreviousOpcode;
char TempString[12];

#define     PREV_OPCODE(__dist) (((UINT16*)(AddrEip - __dist))[0])

typedef struct _OPC_CHECK  
{
 UINT8 Delta;
 UINT16 Opcode;
} OPC_CHECK;

OPC_CHECK OpcCheck[] = 
{
 6, 0x15ff, 2, 0x12ff, 2, 0x11ff, 2, 0x13ff, 2, 0x17ff, 2, 0x16ff, 2, 0x10ff, 
 3, 0x55ff, 3, 0x50ff, 3, 0x51ff, 3, 0x52ff, 3, 0x53ff, 4, 0x54ff, 3, 0x55ff, 
 3, 0x56ff, 3, 0x57ff, 3, 0x59ff, 6, 0x95ff, 6, 0x97ff, 6, 0x76ff, 6, 0x96ff, 
 6, 0x94ff, 6, 0x93ff, 6, 0x92ff, 6, 0x91ff, 6, 0x90ff, 7, 0x14ff, 7, 0x94ff, 
 3, 0x14ff, 4, 0x54ff, 2, 0xd0ff, 2, 0xd1ff, 2, 0xd2ff, 2, 0xd3ff, 2, 0xd4ff, 
 2, 0xd5ff, 2, 0xd6ff, 2, 0xd7ff, 0, 0
};

char* QuickDwordToString(char *String, UINT32 Value)
{
 int i;
 UINT32 TempVal = Value;
 UINT8 TempByte;

 for(i = 0; i < 8; i++)
 {
  TempByte = (TempVal & 0xF) + 0x30;
  if(TempByte > 0x39) TempByte += 7;
  String[7-i] = TempByte;
  TempVal >>= 4;
 }

 return String;
}

VOID DetectPopEsp(ADDRINT AddrEip, UINT32 Opcode) 
{
 UINT32 i, k;

 if(PreviousOpcode == 557 &&   // int for RET
  AddrEip < 0x70000000 &&
  ((UINT8*)(AddrEip-5))[0] != 0xE8)
 {
  k = 0;
  while(OpcCheck[k].Delta != 0)
  {
   if( PREV_OPCODE(OpcCheck[k].Delta) == OpcCheck[k].Opcode)
    break;

   k++;
  }

  if(OpcCheck[k].Delta == 0)
  {
   fprintf(OutTrace, "%s RETurned here, but not after call\n", QuickDwordToString(TempString, AddrEip));
  }
 }

 if(Opcode == 486)   // int for POP
 {
  if(((UINT8*)AddrEip)[0] == 0x5C)
  {
   fprintf(OutTrace, "%s  POP ESP DETECTED!!\n", QuickDwordToString(TempString, AddrEip)); 
   fprintf(OutTrace,"Dumping list of previously executed EIPs \n");
   // dump last executed buffer on file
   for(i = LastExecutedPos; i < LAST_EXECUTED; i++)
   {
    fprintf(OutTrace, "%s\n", QuickDwordToString(TempString, LastExecutedBuf[i])); 
   }
   for(i = 0; i < LastExecutedPos; i++)
   {
    fprintf(OutTrace, "%s\n", QuickDwordToString(TempString, LastExecutedBuf[i])); 
   }
   fprintf(OutTrace, "%s\n", QuickDwordToString(TempString, AddrEip)); 
   fflush(OutTrace);
  }
 }

 LastExecutedBuf[LastExecutedPos] = AddrEip;
 LastExecutedPos++;
 if(LastExecutedPos >= LAST_EXECUTED)
 {
  // circular logging
  LastExecutedPos = 0;
 }

 PreviousOpcode = Opcode;
}

Include it in the source code of the basic Pintool provided in the first part of the article and use the following line:

INS_InsertCall(Ins, IPOINT_BEFORE, (AFUNPTR)DetectEip, IARG_INST_PTR, IARG_UINT32, INS_Opcode(Ins), IARG_END);

in the "Instruction()" function to call the "DetectEip()" function before every instruction is executed.

Also, add these lines:

UINT32 Opcode;

va_list VaList;
va_start( VaList, AddrEip);

Opcode = va_arg(VaList, UINT32);

va_end(VaList);

DetectPopEsp(AddrEip, Opcode);


in the "DetectEip()" function (where specified by the comments).

Now a brief description of what the code does. Basically, this Pintool looks for two opcodes: the one corresponding to RET (Pin code 557) and the one corresponding to POP (Pin code 486).

If a RET is encountered, the Pintool follows it and checks if the previous opcode is a CALL, looking for the E8 opcode or the ones provided in the "OpcCheck[].Opcode" array (the list may not be complete, but while testing it was reasonably accurate). In case it's not, it notifies the user with the message: "*Address* RETurned here, but not after call".

If a POP is encountered, the Pintool checks whether it is a "POP ESP" and, if so, notifies the user by printing "*Address* POP ESP DETECTED!!" and dumps the last executed instructions to the log file.
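Both byte-level checks can be reproduced on a plain byte buffer. The sketch below is a simplified illustration and not the Pintool's code: the function names are hypothetical, and only two CALL encodings are recognised ("E8 rel32" and the two-byte "CALL reg" form), whereas the Pintool relies on its own "OpcCheck[]" table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// True if the byte at `pos` is the one-byte "POP ESP" encoding.
// POP r32 is encoded as 0x58 + register number, and ESP is register 4.
bool IsPopEsp(const std::vector<uint8_t>& code, size_t pos)
{
    assert(pos < code.size());
    return code[pos] == 0x58 + 4;    // 0x5C
}

// True if the bytes right before `retTarget` look like a CALL.
// Simplified on purpose: only "E8 rel32" (5 bytes) and "FF D0..D7"
// (CALL reg, 2 bytes) are recognised.
bool LooksLikeReturnAfterCall(const std::vector<uint8_t>& code, size_t retTarget)
{
    assert(retTarget <= code.size());

    if (retTarget >= 5 && code[retTarget - 5] == 0xE8)
        return true;

    if (retTarget >= 2 && code[retTarget - 2] == 0xFF &&
        (code[retTarget - 1] & 0xF8) == 0xD0)
        return true;

    return false;
}
```

Since this is a heuristic on raw bytes, it can be wrong in both directions: an unrelated 0xE8 byte five bytes before the target hides a suspicious return, and a CALL encoding that is not in the list produces a false alarm.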

That's it. You are finally ready to compile the Pintool and run it within Adobe Acrobat Reader to analyse the PDF exploit.


Analyzing the output

Here is an excerpt from the output produced by the Pintool:

Exception handler address: 7C91EAEC 
Starting Pintool
Loading module C:\Programmi\Adobe\Reader 9.0\Reader\AcroRd32.exe 
Main exe Base: 00400000  End: 00453FFF
Loading module C:\WINDOWS\system32\kernel32.dll 
Module Base: 7C800000 
Module end: 7C8FEFFF 
Loading module C:\WINDOWS\system32\ntdll.dll 
Module Base: 7C910000 
Module end: 7C9C5FFF 
Starting thread 0
...
0D6D8192 RETurned here, but not after call
02D43FA5 RETurned here, but not after call
22326DB0 RETurned here, but not after call
5B18174F RETurned here, but not after call
08171CF0 RETurned here, but not after call
08171D47 RETurned here, but not after call
06066EED RETurned here, but not after call
0633DE6B RETurned here, but not after call
...
4A82A714 RETurned here, but not after call
4A82A714  POP ESP DETECTED!!
Dumping list of previously executed EIPs 
0803DDC6
0803DDCA
0803DDCC
0803DDCD
...
0808B304
0808B305
0808B307
0808B308
4A80CB38
4A80CB3E
4A80CB3F
4A82A714


From the log above we can see all the modules being loaded and threads being created. Then, we notice some false positives: these are legitimate RETs, which don't return to an instruction after a CALL.
Finally, we get to the part where both checks are detected: the code returns to an instruction not located after a call and a "POP ESP" instruction is executed.

In particular, the last logged EIPs correspond to the following ROP gadgets:

 4A80CB38   81C5 94070000    ADD EBP,794
 4A80CB3E   C9               LEAVE
 4A80CB3F   C3               RETN

 4A82A714   5C               POP ESP
(4A82A715   C3               RETN)


So we have located where the exploit occurs (i.e. the address "0808B308"): not bad!

Note that the last instruction reported here (the RETN between parentheses) is not logged by the Pintool because a crash happened right after its execution... but...


...Why???

As I said before, this exploit makes use of Heap Spraying. In particular, we can see it by debugging Adobe Acrobat Reader while Pin is not instrumenting it and setting a breakpoint on address "0808B308". Now, if we open the PDF exploit and leave the debugger running, we can inspect the memory when the code hits the breakpoint:





This is exactly what we were expecting: you can notice the ROP shellcode at "0c0c0c0c" and the Heap Spraying all around. On the other hand, if we debug the Adobe Acrobat Reader while Pin is instrumenting it, we obtain:




So... neither ROP nor Heap Spraying... but the blocks of memory are still allocated. Who allocated them?
To get the answer we need to look inside the code window:



... It's Pin itself!
Pin allocates a lot of memory to perform binary instrumentation, occupying also the addresses usually employed by the Heap Spraying. This means that when the ROP shellcode is executed, it's not located where it is supposed to be and this will result in Adobe Acrobat Reader crashing.


Another problem I ran into is that, even after I modified the Pintool to force the exploit to work with the shellcode placed at an address other than 0x0C0C0C0C, the exploit still crashed.
This time I could see it run all the ROP shellcode, which allocates a block of executable memory, copies itself to it and then jumps to it.

However, this executable shellcode (not ROP) tried to decrypt (and therefore overwrite) itself causing a memory access violation and making the instrumented shellcode crash. 

I haven't investigated the problem yet, but it seems that the instrumented shellcode is placed in a read-only area, so the self-decryption fails when it writes the decrypted bytes back to the shellcode memory.

Sunday, March 10, 2013

Binary Instrumentation for Exploit Analysis Purposes (part 1)

Introduction.

This article is about binary instrumentation applied to various exploit scenarios. In particular, we are going to use Pin, a framework developed by Intel, to show how this approach can help with the analysis.

Pin is employed to create dynamic program analysis tools, the so called "Pintools". Once executed, a Pintool acts almost like a virtual machine that runs the code from a target executable image and rebuilds it by adding the code you need to perform your own analysis. For example, you can: install a callback that is invoked every time a single instruction is executed; inspect registers; alter the context and so on.

Note: I've tested the whole work using Windows XP 32 bit and Visual Studio 2010.


How to compile and execute a Pintool.


The simplest way to compile a Pintool is to use the Visual Studio project provided by Intel, located in the Pin folder at: \source\tools\MyPinTool .

To run it, simply type: pin -t <your_pintool.dll> -- <application_path>.
In this way your Pintool will be executed within the application you want to test.


How to code a Pintool: a (very) short description.

A Pintool begins with a standard initialization of the Pin engine by using the "PIN_Init()" function; then, you need to register the callbacks for the events you want to handle. 
For instance, you can use:
  • "INS_AddInstrumentFunction()" to register a callback that is invoked every time Pin encounters a new instruction, so that you can instrument it;
  • "IMG_AddInstrumentFunction()" to register a callback that notifies you every time an executable module is loaded;
  • "PIN_AddThreadStartFunction()" and "PIN_AddThreadFiniFunction()" to handle thread creation and ending.

In particular, if you register a callback with "INS_AddInstrumentFunction()", you can then use the "INS_InsertCall()" function from it and register other callbacks.
These callbacks have a special property: they can be invoked before or after an instruction is executed. Also, you can pass to them any kind of data, including the value of specific registers (the instruction pointer, for instance), memory addresses and so on.

Finally, you'll have to use "PIN_AddFiniFunction()" to register the callback that is invoked when the application quits.

Once all the callbacks are registered, you can start the instrumented program by calling "PIN_StartProgram()".
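The two kinds of callbacks are easy to confuse, so here is a toy model of the idea in plain C++. None of these names belong to the Pin API: "ToyEngine" is a hypothetical class that calls an instrumentation callback the first time an address is seen, and the analysis routine that callback returns every time the address is "executed".

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <vector>

// Toy model only: it mimics the split between instrumentation and analysis,
// it does not execute or rewrite any real code.
class ToyEngine
{
public:
    using Analysis = std::function<void(uint32_t addr)>;
    // Receives the address of a newly encountered instruction and returns
    // the analysis routine to attach to it.
    using Instrument = std::function<Analysis(uint32_t addr)>;

    explicit ToyEngine(Instrument instrument) : instrument_(instrument)
    {
        assert(static_cast<bool>(instrument_));
    }

    // "Executes" a sequence of instruction addresses.
    void Run(const std::vector<uint32_t>& executed)
    {
        for (uint32_t addr : executed)
        {
            auto it = cache_.find(addr);
            if (it == cache_.end())
                it = cache_.emplace(addr, instrument_(addr)).first;  // first encounter only
            if (it->second)
                it->second(addr);                                    // every execution
        }
    }

private:
    Instrument instrument_;
    std::map<uint32_t, Analysis> cache_;
};
```

In a loop that executes two instructions several times, the instrumentation callback of this model runs twice, while the analysis routine runs once per executed instruction. This is why expensive work is better done at instrumentation time whenever possible.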

Your Pintool can filter specific conditions with an incredibly accurate resolution, but bear in mind that performance may degrade badly depending on the actions you choose to perform.

As an example, let's consider "INS_AddInstrumentFunction()" again, and suppose that we register a callback that logs every executed instruction to a file: if you are not careful, you might generate a file I/O operation for every single instruction, which is very inefficient. Another operation that will reduce your Pintool's performance, if called frequently, is the disassembler functionality.
So be careful: your instrumented application can run at almost real-time speed if your Pintool is well written, but a bad implementation may slow it down to the point where it takes minutes to run.
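One common way to limit the cost of logging is to collect the addresses in memory and write them out in batches. The class below is a minimal sketch of that idea, not part of the Pintool: "BufferedTrace" and its methods are hypothetical names, and the batch size is an arbitrary parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical helper: keeps addresses in memory and writes them to the
// file only when the buffer is full or when Flush() is called explicitly.
class BufferedTrace
{
public:
    BufferedTrace(FILE* out, size_t capacity) : out_(out), capacity_(capacity)
    {
        assert(out_ != nullptr && capacity_ > 0);
        pending_.reserve(capacity_);
    }

    void Log(uint32_t addr)
    {
        pending_.push_back(addr);
        if (pending_.size() >= capacity_)
            Flush();
    }

    void Flush()
    {
        for (uint32_t addr : pending_)
            fprintf(out_, "%08x\n", static_cast<unsigned>(addr));
        fflush(out_);
        pending_.clear();
        ++flushCount_;
    }

    // Number of times the pending entries were written out.
    size_t FlushCount() const { return flushCount_; }

private:
    FILE* out_;
    size_t capacity_;
    std::vector<uint32_t> pending_;
    size_t flushCount_ = 0;
};
```

The trade-off is that entries still sitting in memory are lost if the instrumented process dies abruptly, so an explicit "Flush()" is needed right before any point where a crash is expected.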


A basic Pintool.

Here is a very basic Pintool to which we will add more specific functions later.

 #include <stdio.h>  
 #include "pin.H"   
   
 namespace WINDOWS  
 {  
     #include <windows.h>  
 }  
   
 FILE * OutTrace;  
 ADDRINT ExceptionDispatcher = 0;
   
 /* ===================================================================== */  
 /* Instrumentation functions                                             */  
 /* ===================================================================== */  
   
 VOID DetectEip(ADDRINT AddrEip, ...)   
 {  
     if(AddrEip == ExceptionDispatcher)  
     {  
         fprintf(OutTrace, "%08x Exception occurred!\n", AddrEip);   
     } 

     // Here you can call the functions that we will add
     //(you should also remove the next line to avoid tracing every instruction being executed)
   
     fprintf(OutTrace, "%08x \n", AddrEip);  
 }  
   
 // Pin calls this function every time a new instruction is encountered  
 VOID Instruction(INS Ins, VOID *v)  
 {  
     // Insert a call to DetectEip before every instruction, and pass it the IP  
     INS_InsertCall(Ins, IPOINT_BEFORE, (AFUNPTR)DetectEip, IARG_INST_PTR, IARG_END);  
 }  
   
 VOID ImageLoad(IMG Img, VOID *v)  
 {  
     fprintf(OutTrace, "Loading module %s \n", IMG_Name(Img).c_str());  
     fprintf(OutTrace, "Module Base: %08x \n", IMG_LowAddress(Img));  
     fprintf(OutTrace, "Module end: %08x \n", IMG_HighAddress(Img));  
     fflush(OutTrace);  
 }  
   
 /* ===================================================================== */  
 /* Finalization function                                                 */  
 /* ===================================================================== */  
   
 // This function is called when the application exits  
 VOID Fini(INT32 code, VOID *v)  
 {  
     fprintf(OutTrace, "Terminating execution\n");  
     fflush(OutTrace);  
     fclose(OutTrace);  
 }  
   
 /* ===================================================================== */  
 /* Print Help Message                                                    */  
 /* ===================================================================== */  
   
 INT32 Usage()  
 {  
     PIN_ERROR("Init error\n");  
     return -1;  
 }  
   
 /* ===================================================================== */  
 /* Main                                                                  */  
 /* ===================================================================== */  
   
 int main(int argc, char * argv[])  
 {  
     OutTrace = fopen("itrace.txt", "wb");  
   
     WINDOWS::HMODULE hNtdll;  
     hNtdll = WINDOWS::LoadLibrary("ntdll");  
     ExceptionDispatcher = (ADDRINT)WINDOWS::GetProcAddress(hNtdll, "KiUserExceptionDispatcher");  
     fprintf(OutTrace, "Exception handler address: %08x \n", ExceptionDispatcher);  
     WINDOWS::FreeLibrary(hNtdll);  
   
     // Initialize pin  
     if (PIN_Init(argc, argv))   
     {  
         // Stop here if Pin could not be initialized  
         return Usage();  
     }  
   
     // Register Instruction to be called to instrument instructions  
     INS_AddInstrumentFunction(Instruction, 0);  
   
     // Register ImageLoad to be called at every module load  
     IMG_AddInstrumentFunction(ImageLoad, 0);  
   
     // Register Fini to be called when the application exits  
     PIN_AddFiniFunction(Fini, 0);  
     
     // Start the program, never returns  
     fprintf(OutTrace, "Starting Pintool\n");   
     PIN_StartProgram();  
   
     return 0;  
 }    

It basically logs to a file: the address of each instruction being executed; every exception that occurs; the name of each module being loaded, including its base and end addresses.

I have also put a comment in the "DetectEip()" function, to specify where you can call the functions we will add later.


First exploit scenario: stack overflow.

As a first case study, we are going to consider a specially crafted sample:

 #include <stdio.h>  
 #include <string.h>  
   
 unsigned char Var[2] = {0xFF, 0xE4};  
   
 void GetPassword(){  
  char Password[12];  
   
  memset(Password, 0, sizeof(Password));  
  printf("Insert your password (max 12 chars):\n");  
   
  int i = -1;  
  do{  
    i++;  
    Password[i] = getchar();  
  } while (Password[i] != 0x0D && Password[i] != 0x0A);  
  Password[i] = 0;  
   
  printf("Your password is: %s \n", Password);  
 }  
   
 void main(void){  
  GetPassword();  
 }  

Before compiling and linking it (I used Visual Studio 2010), be sure to disable all the security options (stack canaries, DEP, ASLR) and to set the Base Address to 0x41410000.
I know it might sound a little unrealistic, and in fact... it is! But don't worry: this is just the simplest example that crossed my mind, and we are going to use it as a first test. The methodology I'm proposing is nonetheless effective, and we will see a real case study later.

First, we need to "exploit" this little test: I'll be quick. We can open the executable with Ollydbg and debug it until we reach the "getchar" function, which reads the input string. Then, we enter the following (in my case at least; you should check the parameters explained below if you want to be 100% sure!): "123456789abcAAAA 0AABBBBBBBBBBBBBBBBBBB" (without the quotation marks).

What's the meaning of it? We are going to fill all the 12 required bytes, and because of the lack of control over the size of the input, we also type:

  • "AAAA", that is the padding added by the compiler;
  • " 0AA", that corresponds to the 0x41413020 address (= "AA0 ", because of the endianness) where the "JMP ESP" instruction (= "0xFF 0xE4" as an opcode) is located --- this will overwrite the return address of the "main" function;
  • a bunch of "B", each of which corresponds to the "INC EDX" instruction --- this is where you would usually put the shellcode, but as a test any valid instruction will be fine!
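The layout of the input string can also be written down as code. This is a minimal sketch with hypothetical helper names; the buffer size and the padding are the values observed in my build and may differ with another compiler or other settings, and the number of filler bytes is left as a parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Packs a 32-bit value into 4 bytes, least significant byte first (x86 order).
std::string PackLittleEndian32(uint32_t value)
{
    std::string bytes;
    for (int shift = 0; shift < 32; shift += 8)
        bytes.push_back(static_cast<char>((value >> shift) & 0xFF));
    return bytes;
}

// Hypothetical helper that assembles the input described in the list above.
std::string BuildInput(uint32_t jmpEspAddr, size_t fillerCount)
{
    assert(fillerCount > 0);

    std::string input = "123456789abc";        // fills the 12-byte Password buffer
    input += "AAAA";                           // padding added by the compiler
    input += PackLittleEndian32(jmpEspAddr);   // overwrites the return address
    input += std::string(fillerCount, 'B');    // 0x42 = INC EDX, stands in for the shellcode
    return input;
}
```

For instance, "PackLittleEndian32(0x41413020)" yields the four characters " 0AA", which is exactly the part of the string that lands on the return address. Note that this only works because every byte of the address is a printable character other than CR and LF, which is the reason for the unusual Base Address.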

Now that you have tested that the string I provided works also in your case, or you have built your own valid string, we are ready to analyze our first exploit scenario: a simple stack overflow. How can we detect that?
The most natural idea is to perform a check over EIP to see whether its value corresponds to a non-executable area (the stack in this case).

The Pintool maintains two variables containing the base and end address of the module being executed.
If the value of EIP isn't in the range specified by these two addresses, the Pintool walks the list of modules maintained by Pin, looking for the executable module in which the value of EIP resides (for instance, after an API call). When such a module is found, the variables containing the base and end address are updated (making it the current module).
If the value of EIP isn't located within any of the modules, the Pintool reports it as suspicious and logs the list of the last 1000 executed values of EIP.

Here is the code to do that:

 #define LAST_EXECUTED 1000  
 ADDRINT LastExecutedBuf[LAST_EXECUTED];  
 UINT32 LastExecutedPos;
 ADDRINT CurrentModuleBase, CurrentModuleEnd;  
   
 bool IsModuleFound(ADDRINT Addr)  
 {  
     for(IMG Img = APP_ImgHead(); IMG_Valid(Img); Img = IMG_Next(Img))  
     {  
         if(Addr >= IMG_LowAddress(Img) &&  
             Addr <= IMG_HighAddress(Img))    // <=, not <  
         {  
             CurrentModuleBase = IMG_LowAddress(Img);  
             CurrentModuleEnd = IMG_HighAddress(Img);  
             return true;  
         }  
     }  
   
     return false;  
 }  
   
 void CheckEipModule(ADDRINT AddrEip)  
 {  
     UINT32 i;  
     if(! (AddrEip >= CurrentModuleBase && AddrEip <= CurrentModuleEnd) )    // <=: the end address is inclusive  
     {  
         if(!IsModuleFound(AddrEip))  
         {  
             // EIP is not within an executable image!  
             fprintf(OutTrace, "EIP detected not within an executable module: %08x \n", AddrEip);  
             fprintf(OutTrace,"Dumping list of previously executed EIPs \n");  
             for(i = LastExecutedPos; i < LAST_EXECUTED; i++)  
             {  
                 fprintf(OutTrace, "%08x \n", LastExecutedBuf[i]);   
             }  
             for(i = 0; i < LastExecutedPos; i++)  
             {  
                 fprintf(OutTrace, "%08x \n", LastExecutedBuf[i]);   
             }  
             fprintf(OutTrace, "%08x \n --- END ---", AddrEip);   
             fflush(OutTrace);  
             WINDOWS::ExitProcess(0);  
         }  
     }  
   
     LastExecutedBuf[LastExecutedPos] = AddrEip;  
     LastExecutedPos++;  
     if(LastExecutedPos >= LAST_EXECUTED)  
     {  
         // circular logging  
         LastExecutedPos = 0;  
     }  
 }  

You can simply copy it into the provided basic Pintool, but remember to also add the line:

CheckEipModule(AddrEip);

in the "DetectEip()" function (where specified by the comment).
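The circular logging used by "CheckEipModule()" can be isolated in a small class. The following is a sketch with hypothetical names, not a drop-in replacement; compared to the code above it also keeps a count of the stored entries, so that slots that were never written are not dumped when fewer than LAST_EXECUTED instructions have been executed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-size history of the most recently executed addresses.
class AddressHistory
{
public:
    explicit AddressHistory(size_t capacity) : slots_(capacity), pos_(0), count_(0)
    {
        assert(capacity > 0);
    }

    void Push(uint32_t addr)
    {
        slots_[pos_] = addr;
        pos_ = (pos_ + 1) % slots_.size();    // circular logging
        if (count_ < slots_.size())
            ++count_;
    }

    // Returns the stored addresses from the oldest to the newest.
    std::vector<uint32_t> Snapshot() const
    {
        std::vector<uint32_t> out;
        size_t start = (count_ < slots_.size()) ? 0 : pos_;
        for (size_t i = 0; i < count_; ++i)
            out.push_back(slots_[(start + i) % slots_.size()]);
        return out;
    }

private:
    std::vector<uint32_t> slots_;
    size_t pos_;      // next slot to overwrite
    size_t count_;    // number of valid entries, at most slots_.size()
};
```

Once the buffer has wrapped around, the oldest entry is the one at the current write position, which is why the dump in "CheckEipModule()" starts from "LastExecutedPos" and then continues from index 0.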

Compile/link the Pintool and execute it.

Once executed, it will generate a log (I've cut some lines!) like the following:

Exception handler address: 7c91eaec 
Starting Pintool
Loading module C:\...\StackBof.exe 
Module Base: 41410000 
Module end: 41414fff 
Loading module C:\WINDOWS\system32\kernel32.dll 
Module Base: 7c800000 
Module end: 7c8fefff 
Loading module C:\WINDOWS\system32\ntdll.dll 
Module Base: 7c910000 
Module end: 7c9c5fff 
Loading module C:\WINDOWS\system32\MSVCR100.dll 
Module Base: 78aa0000 
Module end: 78b5dfff 
EIP detected not within an executable module: 0012ff84 
Dumping list of previously executed EIPs 
78ac005f 
78ac0061 
78ac0062 
78ac0063 
78ac0069 
...
78ab0cd7 
78ab0cd8 
78b05747 
4141104f 
41411052 
41411053 
41411054 
41411056 
41411057 
41411059 
4141105a 
41413020 
0012ff84 
 --- END ---

It's very simple to understand what happened just by reading the log:

  • the RET instruction is located at the address "0x4141105A";
  • it jumps to the overwritten return address, that is the address "0x41413020", where a "JMP ESP" is located;
  • the Pintool successfully detects that we are trying to execute code outside any executable module (that is, at the "0x0012FF84" address, which belongs to the stack).
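The same reasoning can be checked offline against the module ranges printed in the log. This is a small standalone sketch (the "ModuleRange" type and the function names are hypothetical); the ranges are copied from the log above, and the end address is treated as inclusive.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct ModuleRange
{
    std::string name;
    uint32_t low;     // first byte of the module
    uint32_t high;    // last byte of the module (inclusive)
};

// Returns the module containing `addr`, or nullptr if there is none.
const ModuleRange* FindModule(const std::vector<ModuleRange>& modules, uint32_t addr)
{
    for (const ModuleRange& m : modules)
    {
        assert(m.low <= m.high);
        if (addr >= m.low && addr <= m.high)
            return &m;
    }
    return nullptr;
}

// Ranges copied from the log above.
std::vector<ModuleRange> ModulesFromLog()
{
    return {
        {"StackBof.exe", 0x41410000, 0x41414fff},
        {"kernel32.dll", 0x7c800000, 0x7c8fefff},
        {"ntdll.dll",    0x7c910000, 0x7c9c5fff},
        {"MSVCR100.dll", 0x78aa0000, 0x78b5dfff},
    };
}
```

With these ranges, "0x4141105a" and "0x41413020" both resolve to StackBof.exe, while "0x0012ff84" resolves to no module at all, which is the condition the Pintool reports.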


Conclusions

This was an introductory article on binary instrumentation for exploit analysis purposes and I really hope you liked it! See you for the second part in a few days, where I will discuss another scenario: a real PDF exploit that makes use of ROP and Heap Spraying.