Web Security : Automating with LibWWWPerl - Editing a Page Programmatically

11/17/2013 8:33:58 PM

1. Problem

You want to fetch a page from your application, read it, modify part of it, and send the modified version back in a follow-up request. For our example, we will modify a page on Wikipedia.

2. Solution

See Example 1.

Example 1. Editing a Wikipedia page with Perl

#!/usr/bin/perl
use LWP::UserAgent;
use HTTP::Request::Common qw(GET POST);
use HTML::Parser;
use URI;
use HTML::Entities;

use constant MAINPAGE =>
  'http://en.wikipedia.org/wiki/Wikipedia:Tutorial_%28Keep_in_mind%29/sandbox';
use constant EDITPAGE => 'http://en.wikipedia.org/w/index.php'
  . '?title=Wikipedia:Tutorial_%28Keep_in_mind%29/sandbox';

# These are form inputs we care about on the edit page
my @wpTags = qw(wpEditToken wpAutoSummary wpStarttime wpEdittime wpSave );

sub findPageData {
    my ( $self, $tag, $attr ) = @_;
    # skip tags that carry no name attribute at all
    my $name = $attr->{name};
    return unless defined $name;
    # signal to the endHandler handler if we find the text
    if ( $name eq "wpTextbox1" ) {
        $main::wpTextboxFound = 1;
        return;
    }
    elsif ( grep { $_ eq $name } @wpTags ) {
        # if it's one of the form parameters we care about,
        # record the parameter's value for use in our submission later.
        # (exact string comparison, so partial names cannot match)
        $main::parms{$name} = $attr->{value};
        return;
    }
}

# This is called on closing tags like </textarea>
sub endHandler {
    my ( $self, $tag, $attr, $skipped ) = @_;
    # nothing to do unless the start handler flagged our textarea
    return unless $main::wpTextboxFound;
    if ( $tag eq "textarea" ) {
        $main::parms{"wpTextbox1"} = $skipped;
        undef $main::wpTextboxFound;
    }
}

sub checkError {
    my $resp = shift;
    if ( ( $resp->code() < 200 ) || ( $resp->code() >= 400 ) ) {
        print "Error: " . $resp->status_line . "\n";
        exit 1;
    }
}

###
### MAIN
###

# First, fetch the main Wikipedia sandbox page. This just confirms
# our connectivity and makes sure it really works.
my $UA   = LWP::UserAgent->new();
my $req  = HTTP::Request->new( GET => MAINPAGE );
my $resp = $UA->request($req);

checkError($resp);

# Now fetch the edit version of that page
$req->uri( EDITPAGE . '&action=edit' );
$resp = $UA->request($req);

checkError($resp);

# Build a parser to parse the edit page and find the text on it.
my $p = HTML::Parser->new(
    api_version   => 3,
    start_h       => [ \&findPageData, "self,tagname,attr" ],
    end_h         => [ \&endHandler, "self,tagname,attr,skipped_text" ],
    unbroken_text => 1,
    attr_encoded  => 0,
    report_tags   => [qw(textarea input)]
);
$p->parse( $resp->content );
$p->eof;

# The text will have entities encoded (e.g., &lt; instead of <)
# We have to decode them and submit raw characters.
$main::parms{wpTextbox1} = decode_entities($main::parms{wpTextbox1});

# make our trivial edit. append text to whatever was already there.
$main::parms{wpTextbox1} .= "\r\n\r\n===Test 1===\r\n\r\n"
  . "ISBN: 9780596514839\r\n\r\nThis is a test.\r\n\r\n";

# POST our edit
$req = HTTP::Request::Common::POST(
    EDITPAGE,
    Content_Type => 'form-data',
    Content      => \%main::parms
);
$req->uri( EDITPAGE . '&action=submit' );

$resp = $UA->request($req);
checkError($resp);
# We expect a 302 redirection if it is successful.
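
The script ends with checkError, which accepts any status from 200 through 399, so it never actually confirms the 302 it expects. The fragment below is a minimal sketch of such a check, not part of the original recipe. The redirectTarget helper and the wiki.example.com response are hypothetical, built by hand so the fragment runs without a network connection. It assumes the default LWP::UserAgent behavior of not following redirects on POST, so the 302 is what the script would see.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: return the Location header of a 3xx response,
# or undef if the response is not a redirect.
sub redirectTarget {
    my $resp = shift;
    return unless $resp->is_redirect;
    return scalar $resp->header('Location');
}

# Hand-built stand-in for the response to the edit POST (an assumption;
# a real run would pass the object returned by $UA->request).
my $resp = HTTP::Response->new( 302, 'Found',
    [ Location => 'http://wiki.example.com/wiki/Sandbox' ] );

my $target = redirectTarget($resp);
if ( defined $target ) {
    print "Edit accepted, redirected to $target\n";
}
else {
    print "No redirect: " . $resp->status_line . "\n";
}
```

In the real script, a 200 response to the POST would more likely mean the edit form was shown again (for example, because of a stale token) than that the edit was saved.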

3. Discussion

This kind of test is most applicable in web applications that change a lot between requests. Perhaps it is a blog, forum, or document management system where multiple users may simultaneously be introducing changes to the application’s state. If you have to find parameters before you can modify them and send them back, this is the recipe for you.

The script in Example 1 is pretty complex. The main reason for that complexity is the way <textarea> elements are handled in HTML::Parser. Many form elements are self-contained (i.e., the value is inside the element itself) like <input type="hidden" name="date" value="20080101">. In an element like that, you just find the one named “date” and look at its value. In a text area, we have a start tag, an end tag, and the text we care about in between. Our parser, therefore, has a “start” handler and an “end” handler. When the start handler sees the start of a textarea, it checks to see if it’s the one we want (the one named wpTextbox1). If it is, it sets a signal variable to tell the end handler that we just passed the text we want. The end handler scoops up the “skipped” text from the parser and we’re done. The skipped text has HTML entities (like <) encoded (like &lt;). We have to decode those because Wikipedia expects raw input (i.e., it wants the real, raw < character). Once we know what we originally received, we will simply append our demonstration text to it.
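
The start/end handler handshake is easier to see in isolation. The following is a minimal sketch, assuming a hand-written HTML fragment rather than a real Wikipedia edit page. The handler names onStart and onEnd, the sample token value, and the lexical variables used in place of the $main:: globals are all illustrative choices, not part of Example 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;
use HTML::Entities qw(decode_entities);

# Hypothetical stand-in for an edit page; not real Wikipedia markup.
my $html = <<'HTML';
<form>
<input type="hidden" name="wpEditToken" value="abc123">
<textarea name="wpTextbox1">1 &lt; 2 &amp; 3 &gt; 2</textarea>
</form>
HTML

my %parms;
my $inTextbox = 0;

# Start handler: record the hidden input, flag the textarea we want.
sub onStart {
    my ( $tag, $attr ) = @_;
    my $name = $attr->{name};
    return unless defined $name;
    if ( $tag eq 'textarea' && $name eq 'wpTextbox1' ) {
        $inTextbox = 1;
    }
    elsif ( $tag eq 'input' && $name eq 'wpEditToken' ) {
        $parms{$name} = $attr->{value};
    }
}

# End handler: the text skipped since the flagged start tag is the
# textarea body; decode its entities before storing it.
sub onEnd {
    my ( $tag, $skipped ) = @_;
    return unless $inTextbox && $tag eq 'textarea';
    $parms{wpTextbox1} = decode_entities($skipped);
    $inTextbox = 0;
}

my $p = HTML::Parser->new(
    api_version   => 3,
    start_h       => [ \&onStart, 'tagname,attr' ],
    end_h         => [ \&onEnd,   'tagname,skipped_text' ],
    unbroken_text => 1,
    report_tags   => [qw(textarea input)],
);
$p->parse($html);
$p->eof;

print "token: $parms{wpEditToken}\n";
print "text:  $parms{wpTextbox1}\n";
```

Because no text handler is registered, the parser never reports the textarea body as an event of its own. It only surfaces as skipped_text on the closing tag, which is why the end handler has to do the collecting.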

There’s another bit of special handling we’re doing that relates to the URLs we are GETting and POSTing. We append the action to the URL using concatenation instead of just embedding it in the EDITPAGE constant. That is, we set the URL using $req->uri(EDITPAGE . '&action=edit'). If the ampersand is in the original URL that is passed to HTTP::Request::Common::POST, then the ampersand will be encoded as %26, which won’t be parsed by Wikipedia correctly.
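
The URL handling can be tried offline, because building a request does not send it. This is a minimal sketch, assuming a hypothetical wiki.example.com base URL and made-up form values. It follows the same two steps as Example 1: build the POST against the base URL, then replace the request URI with the concatenated version.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Hypothetical base URL and form fields, for illustration only.
use constant EDITPAGE => 'http://wiki.example.com/w/index.php?title=Sandbox';
my %parms = ( wpTextbox1 => 'This is a test.', wpEditToken => 'abc123' );

# Step 1: build the multipart POST against the base URL.
my $req = POST(
    EDITPAGE,
    Content_Type => 'form-data',
    Content      => [%parms],
);

# Step 2: swap in the full URL, with the action appended by concatenation.
$req->uri( EDITPAGE . '&action=submit' );

# Inspect the result; nothing is sent over the network here.
print $req->method, ' ', $req->uri, "\n";
my %query = $req->uri->query_form;
print "$_ = $query{$_}\n" for sort keys %query;
```

Printing the parsed query is a quick way to confirm that title and action arrive as two separate parameters before pointing the script at a live server.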
