Directory traversal is a vulnerability in which the web server displays listings of files and directories. Often this leads to unexpected disclosures of the inner workings of the application: source code, or data files that influence the application's execution, might be exposed. We want to traverse the site, given known valid URLs, look for directories implied by those URLs, and then confirm that requests for those directory URLs are denied.
Before you conduct the test, you need a list of directories or paths to try. You might get the list of URLs by spidering your website. You might also consider what you know about your application and any particular paths that it protects with access control.
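One way to build that list is to derive candidate directories from URLs you already know are valid. The sketch below is one possible approach, not part of the recipe's solution: it strips the last path component from each known URL and de-duplicates the results. The inline sample input stands in for a real spidered URL list.

```shell
#!/bin/bash
# Sketch: derive candidate directory URLs from known page URLs.
# The input here is a small inline sample; in practice you would use
# the list of URLs you collected by spidering your site.
cat > spidered-urls.txt <<'EOF'
http://www.example.com/images/logo.png
http://www.example.com/css/site.css
EOF
# Strip the last path component from each URL, so
# .../images/logo.png becomes .../images/, then de-duplicate.
sed -e 's|/[^/]*$|/|' spidered-urls.txt | sort -u > pages.txt
```

The resulting pages.txt can then be fed directly to the test script in Example 1. URLs that already end in a slash pass through unchanged, so it is safe to mix page URLs and directory URLs in the input.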
You need to create two files: a shell script, as shown in Example 1, and a plain-text file of URLs, similar to what is shown in Example 2.
Example 1. Testing directory traversal with cURL
#!/bin/bash
CURL=/sw/bin/curl
# a file with known pages, one URL per line
URLFILE=pages.txt
# file descriptor 3 is our URLs
exec 3<"${URLFILE}"
typeset -i FAILED
FAILED=0
# for each URL in the URLFILE
while read -u 3 URL
do
    # call curl to fetch the page. Get the headers, too. We're
    # interested in the first line that gives the status
    RESPONSE=$(${CURL} -D - -s "${URL}" | head -1)
    OIFS="$IFS"
    set -- ${RESPONSE}
    result=$2
    IFS="$OIFS"
    # If we got something in the 200 series, it's probably a failure
    if [ $result -lt 300 ]
    then
        echo "FAIL: $result ${URL}"
        FAILED=FAILED+1
    else
        # response in the 300 series is a redirect. Need to check manually
        if [ $result -lt 400 ]
        then
            echo "CHECK: $result ${URL}"
            FAILED=FAILED+1
        else
            # response in the 400 series is some kind of
            # denial. That's generally considered "success"
            if [ $result -lt 500 ]
            then
                echo "PASS: $result ${URL}"
            else
                # response in the 500 series means server
                # failure. Anything we haven't already accounted for
                # will be called a failure.
                echo "FAIL: $result ${URL}"
                FAILED=FAILED+1
            fi
        fi
    fi
done
Example 2. Example pages.txt
http://www.example.com/images
http://www.example.com/images/
http://www.example.com/css/
http://www.example.com/js/
The script bases its pass/fail decision on whether it was denied access to the directory. That is, an HTTP 200 response code (which normally indicates success) is considered a failure, because it means we actually saw something we shouldn't have. If the request is denied (e.g., with an HTTP 400-series code), that counts as a pass, because we assume we were not shown the directory's contents. Unfortunately, there are several reasons why this simplistic approach might return false results.
Some applications are configured to respond with HTTP 200 on virtually every request, regardless of whether or not it was an error. In this case, the text of the page might say “object not found,” but the HTTP response code gives our script no clue. It will be reported as a failure, when it should technically pass.
Likewise, some applications redirect to an error page when there is an error. An attempt to access a protected resource might receive an HTTP 302 (or similar) response that redirects the browser to the login page. The solution in this recipe will flag that with “CHECK,” but it might turn out that every URL you try ends up being a “CHECK.”
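When a URL comes back flagged "CHECK," it helps to see where the redirect actually points: if every protected URL redirects to the same login page, access control is probably doing its job. One possible helper, not part of the recipe's script, is sketched below; it reads raw HTTP headers and prints the value of the Location header.

```shell
#!/bin/bash
# Sketch: show where a redirect response points. show_redirect reads
# raw HTTP response headers on stdin and prints the Location header's
# value, stripping the trailing carriage return.
show_redirect() {
    grep -i '^Location:' | awk '{print $2}' | tr -d '\r'
}
# Example usage (the URL is a placeholder):
#   curl -D - -s -o /dev/null "http://www.example.com/private/" | show_redirect
```

Piping cURL's header output (the same -D - output the main script uses) through this helper for each "CHECK" result lets you triage redirects quickly instead of opening each one in a browser.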
The input to this script is the key to its success, and only a human can produce good input. That is, someone has to know which URLs should be retrievable and which should not. For example, the site's main page (http://www.example.com/) should definitely respond with HTTP 200, but that is not an error. In many cases, the main page will respond with HTTP 302 or 304, but that's normal and okay as well. It is not (normally) an instance of directory traversal. Likewise, some sites use pretty URLs like http://www.example.com/news/, which will return HTTP 200, but again that is not an error. A person must sit down with some of the directories in the filesystem and/or use clues in the HTML source and come up with examples like those shown in the example pages.txt file. The directories have to be chosen so that if the server responds with an HTTP 200, it is a failure.
Lastly, applications that respond consistently with a 200 or 302 response, regardless of input, can still be tested this way. You have to combine the existing solution with some of the techniques of Recipe 1. Remove -i from the command line so you fetch the page (instead of the headers) to a temporary file, and then grep for the correct string. The correct string might be <title>Access Denied</title> or something similar, but make sure it corresponds to your actual application.
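That content-based check might be sketched as follows. This is an assumption-laden illustration, not a drop-in replacement for Example 1: the denial-marker string and the URLs are placeholders you must adapt to your own application.

```shell
#!/bin/bash
# Sketch: decide pass/fail from the page body instead of the status code,
# for applications that answer HTTP 200 to everything. The marker string
# is an assumption; use whatever your application's denial page contains.
DENIED='<title>Access Denied</title>'

# check_body URL: fetch the page and report PASS if it shows the
# denial marker, FAIL otherwise.
check_body() {
    if curl -s "$1" | grep -q "${DENIED}"
    then
        echo "PASS: $1"
    else
        echo "FAIL: $1"
    fi
}
```

You would call check_body once per line of pages.txt, just as the main script does with its status-code check, and the same caveat applies: the marker must be a string that appears only on the denial page.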
Note
This solution flags all server responses of 500 and above as errors. The HTTP standard reserves those codes for server errors, and that behavior is consistent across web platforms. If your web server hands out a 500-series response, something seriously wrong has probably occurred, either in the server itself or in your software. If you do modify this solution, we strongly recommend that you keep the check for HTTP 500 intact.