Practical Reverse Engineering Solutions – Page 11

My go at exercise 1 on page 11


This blog post presents my solutions to exercises from the book Practical Reverse Engineering by Bruce Dang, Alexandre Gazet and Elias Bachaalany (ISBN: 1118787315). The book is my first contact with reverse engineering, so take my statements with a grain of salt. All code snippets are on GitHub. For an overview of my solutions consult this progress page.

Problem Statement

This function uses a combination SCAS and STOS to do its work. First, explain what is the type of the [EBP+8] and [EBP+C] in line 1 and 8, respectively. Next, explain what this snippet does:


01: 8B 7D 08    mov edi, [ebp+8]
02: 8B D7       mov edx, edi
03: 33 C0       xor eax, eax
04: 83 C9 FF    or ecx, 0FFFFFFFFh
05: F2 AE       repne scasb
06: 83 C1 02    add ecx, 2
07: F7 D9       neg ecx
08: 8A 45 0C    mov al, [ebp+0Ch]
09: 8B FA       mov edi, edx
10: F3 AA       rep stosb
11: 8B C2       mov eax, edx

Context of the Snippet

The function snippet probably gets its parameters in C style (cdecl). This convention places the function parameters on the stack before the call is made. The parameters are pushed in reverse order of the function prototype, i.e., the last parameter is pushed first. The CALL instruction then pushes the return address (the value of EIP after the call) on the stack. Finally, the standard function prologue pushes the base pointer on the stack and sets EBP to the stack pointer ESP. This leads to the following stack image before line 1 of the exercise snippet is executed (see left-hand side):
[Figure: stack layout before line 1 (left) and with the example arguments (right)]

  • In the following analysis we see that [EBP+8] (the first function parameter) is of type char *, i.e., a pointer to a sequence of bytes. The snippet requires that the sequence is terminated by a zero byte, so it is probably a null-terminated string.
  • The value at [EBP+0Ch] (the second function parameter) is of type char, i.e., a single byte such as a letter.

I’m using the string “The pool on the roof must have a leak.” (with null byte at the end) as the first parameter at [EBP+8] and the character 'x' as the second parameter at [EBP+0Ch]. See the right stack in the above figure. Note that while 'x' itself is placed at EBP+0Ch, the slot at EBP+8 contains a memory address pointing to the first letter of the string.

To check my guesses of what the code snippet does, I put the function prologue and epilogue around it and added a caller to get a fully functional assembly code (GitHub link):

I compiled the code on a 64-bit machine with:

and started debugging with:

The caller first pushes the second function parameter 'x' on the stack:

Then it pushes the first parameter "The pool on the roof must have a leak.":

In contrast to the second parameter, the stack value is a pointer to the string in memory. The command x/xw $esp gives the value in memory referenced by ESP:

So the string is stored at 0x080490c0:

The next three instructions call the function and run the function prologue:

After that we enter the snippet, which is analyzed step by step in the next section.

Walk-Through

► Line 1: mov edi, [ebp+8]

As discussed before, [ebp+8] is a value on the stack representing the first function parameter (see right-hand side of the stack image). This instruction copies the parameter, a pointer to the string, to register EDI. Now EDI references our string:

► Line 2: mov edx, edi

This simply makes a copy of EDI. The reason for that will become clear in line 5. For reference, EDI and EDX contain the double word 0x80490c0:

► Line 3: xor eax, eax

This sets the value of EAX to zero:

Again, the purpose of this will be clear in line 5.

► Line 4: or ecx, 0FFFFFFFFh

This sets the value of ECX to 0xFFFFFFFF:

Interpreted as a signed integer, ECX is -1:

The register ECX is used in the next instruction.

► Line 5: repne scasb

Line 5 is where a lot of the magic happens. The instruction scasb compares the byte in AL (the lowest byte of EAX) with the byte at the address in EDI and then increases EDI by one (assuming the direction flag is clear). The repne prefix repeats this, decreasing ECX by one after each comparison, until a match is found or ECX reaches zero.

In our example, we search for the null byte (in AL) in the null-terminated string “The pool on the roof must have a leak.” (referenced by EDI). The counter ECX starts from -1. The following image illustrates the registers before and after repne scasb:

[Figure: registers before and after repne scasb]

The string has 38 characters, so together with the null byte 39 bytes are compared, and ECX ends up being -1 - 39 = -40:

The value of EDI changes too, which is why we made a copy of it in line 2:

(the start of the string is at 0x80490c0).

► Line 6: add ecx, 2

Add 2 to ECX so ECX becomes -38:

This is the negated length of the string. Adding two compensates for two things: the count started at -1 rather than 0, and the null byte was counted as well.

► Line 7: neg ecx

This simply negates the value of ECX, so now it actually corresponds to the string length:

To summarize: Up to and including line 7, the snippet actually calculates the length of the string passed at [EBP+8].

► Line 8: mov al, [ebp+0Ch]

Starting with line 8, we enter the second part of the snippet. This instruction copies the byte at stack location [EBP+0Ch], i.e., the second function parameter, to register AL. Since the second parameter is of type char (only one byte in size), the value fits in the lower 8 bits of the EAX register. AL now holds the character 'x':

► Line 9: mov edi, edx

The instruction following in line 10 again operates on EDI. Since line 5 modified the value and it no longer points to the start of the string, we restore it from the backup in EDX that we created in line 2. After that, EDI should once again point to the string:

(compare 0x80490c0 to the output in line 2).

► Line 10: rep stosb

This is again a very powerful instruction. Each stosb stores the byte in AL (in our case the character 'x') at the address in EDI and increases EDI by one; the rep prefix repeats this exactly ECX times, starting at the beginning of our string “The pool on the roof must have a leak.”. Since ECX holds the string length, every character is overwritten while the terminating null byte stays intact. In other words, this instruction does a memset, effectively overwriting the entire string with a single character. After the instruction, the content of our string is blacked out by 'x's:

(The instruction again modifies EDI, so you have to use EDX to reference the string.)

► Line 11: mov eax, edx

This copies the address of the string to EAX. EAX holds the return value of the function, so the snippet returns a pointer to the modified string.

C-Code

The walk-through demonstrated that the function overwrites every character of the string passed as the first parameter with the character passed as the second parameter. Here’s working C code, where the function black_out corresponds to the snippet in this exercise:

The function can be simplified by using the strlen and memset functions:

One Comment

  • eric says:

    This is such a beautifully formatted and organized writeup. I scribbled my answers on a napkin while reading this book in my car at lunch. Your rigorous approach has made me realize that I should at least invest in a notebook :)

    But seriously, great work. This is the standard that all reversing write-ups should be held to. I will certainly be referring back to this for inspiration in any of my work.
