programming4us

Web Security : Automating with LibWWWPerl - Editing a Page Programmatically

11/17/2013 8:33:58 PM
1. Problem

You want to fetch a page from your application, read it, and then modify part of it to send back in your response. For our example, we will modify a page on Wikipedia.

2. Solution

See Example 1.

Example 1. Editing a Wikipedia page with Perl
#!/usr/bin/perl
use LWP::UserAgent;
use HTTP::Request::Common qw(GET POST);
use HTML::Parser;
use URI;
use HTML::Entities;

use constant MAINPAGE =>
    'http://en.wikipedia.org/wiki/Wikipedia:Tutorial_%28Keep_in_mind%29/sandbox';
use constant EDITPAGE => 'http://en.wikipedia.org/w/index.php'
    . '?title=Wikipedia:Tutorial_%28Keep_in_mind%29/sandbox';

# These are form inputs we care about on the edit page
my @wpTags = qw(wpEditToken wpAutoSummary wpStarttime wpEdittime wpSave );

sub findPageData {
    my ( $self, $tag, $attr ) = @_;
    my $name = $attr->{name};
    return unless defined $name;    # skip elements that have no name attribute

    # signal to the end handler if we find the textarea holding the page text
    if ( $name eq "wpTextbox1" ) {
        $main::wpTextboxFound = 1;
        return;
    }
    elsif ( grep { $_ eq $name } @wpTags ) {
        # if it's one of the form parameters we care about (exact name match),
        # record the parameter's value for use in our submission later.
        $main::parms{$name} = $attr->{value};
        return;
    }
}

# This is called on closing tags like </textarea>
sub endHandler {
    # leave early unless the start handler flagged the textarea we want
    return unless $main::wpTextboxFound;
    my ( $self, $tag, $attr, $skipped ) = @_;
    if ( $tag eq "textarea" ) {
        $main::parms{"wpTextbox1"} = $skipped;
        undef $main::wpTextboxFound;
    }
}

sub checkError {
    my $resp = shift;
    if ( ( $resp->code() < 200 ) || ( $resp->code() >= 400 ) ) {
        print "Error: " . $resp->status_line . "\n";
        exit 1;
    }
}

###
### MAIN
###

# First, fetch the main Wikipedia sandbox page. This just confirms
# our connectivity and makes sure it really works.
my $UA   = LWP::UserAgent->new();
my $req  = HTTP::Request->new( GET => MAINPAGE );
my $resp = $UA->request($req);

checkError($resp);

# Now fetch the edit version of that page
$req->uri( EDITPAGE . '&action=edit' );
$resp = $UA->request($req);

checkError($resp);

# Build a parser to parse the edit page and find the text on it.
my $p = HTML::Parser->new(
    api_version   => 3,
    start_h       => [ \&findPageData, "self,tagname,attr" ],
    end_h         => [ \&endHandler, "self,tagname,attr,skipped_text" ],
    unbroken_text => 1,
    attr_encoded  => 0,
    report_tags   => [qw(textarea input)]
);
$p->parse( $resp->content );
$p->eof;

# The text will have entities encoded (e.g., &lt; instead of <)
# We have to decode them and submit raw characters.
$main::parms{wpTextbox1} = decode_entities($main::parms{wpTextbox1});

# make our trivial edit. append text to whatever was already there.
$main::parms{wpTextbox1} .= "\r\n\r\n===Test 1===\r\n\r\n"
    . "ISBN: 9780596514839\r\n\r\nThis is a test.\r\n\r\n";

# POST our edit
$req = HTTP::Request::Common::POST(
    EDITPAGE,
    Content_Type => 'form-data',
    Content      => \%main::parms
);
$req->uri( EDITPAGE . '&action=submit' );

$resp = $UA->request($req);
checkError($resp);
# We expect a 302 redirection if it is successful.
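
The closing comment in Example 1 says a successful edit should come back as a 302, but checkError accepts anything from 200 through 399. If you want the test to confirm the redirect itself, a stricter check along the following lines could be added. This is only a sketch: the helper name isExpectedRedirect and the example.com URL are made up for illustration, and the responses are built by hand so that the snippet runs without touching the network.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: true only for a 302 that carries a Location header.
sub isExpectedRedirect {
    my $resp = shift;
    return ( $resp->code() == 302 && defined $resp->header('Location') ) ? 1 : 0;
}

# Hand-built responses stand in for what $UA->request($req) would return.
my $redirect = HTTP::Response->new( 302, 'Found',
    [ Location => 'http://example.com/wiki/Sandbox' ] );
my $plain = HTTP::Response->new( 200, 'OK' );

print "302 with Location: ", isExpectedRedirect($redirect), "\n";
print "200 without:       ", isExpectedRedirect($plain),    "\n";
```

In the real script you would pass the $resp from the final POST to this helper after checkError has run.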


3. Discussion

This kind of test is most applicable to web applications that change a lot between requests. Perhaps it is a blog, forum, or document management system where multiple users may simultaneously be introducing changes to the application’s state. If you have to find parameters before you can modify them and send them back, this is the recipe for you.

The script in Example 1 is pretty complex. The main reason for that complexity is the way <textarea> elements are handled in HTML::Parser. Many form elements are self-contained (i.e., the value is inside the element itself) like <input type="hidden" name="date" value="20080101">. In an element like that, you just find the one named “date” and look at its value. In a text area, we have a start tag, an end tag, and the text we care about in between. Our parser, therefore, has a “start” handler and an “end” handler. If the start handler sees the start of a textarea, it checks to see if it’s the one we want (the one named wpTextbox1). If it is the textarea we want, it sets a signal variable to tell the end handler that we just passed the text we want. The end handler scoops up the “skipped” text from the parser and we’re done.

The skipped text has HTML entities (like <) encoded (like &lt;). We have to decode those because Wikipedia expects raw input (i.e., it wants the real, raw < character). Once we know what we originally received, we simply append our demonstration text to it.
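
To watch the start handler, end handler, and entity decoding cooperate without contacting Wikipedia, the same idea can be run against a small inline document. The sketch below is a cut-down illustration, not the recipe itself: the HTML fragment, the token value abc123, and the handler names onStart and onEnd are invented for the example, and it uses lexical variables instead of the $main:: globals in Example 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;
use HTML::Entities;

# Made-up stand-in for an edit page; real Wikipedia markup is far larger.
my $html = <<'HTML';
<form>
<input type="hidden" name="wpEditToken" value="abc123">
<textarea name="wpTextbox1">a &lt; b &amp; c</textarea>
</form>
HTML

my %parms;
my $inTextbox = 0;

# Start handler: remember hidden inputs, flag the textarea we want.
sub onStart {
    my ( $tag, $attr ) = @_;
    my $name = $attr->{name};
    return unless defined $name;
    if    ( $tag eq 'textarea' && $name eq 'wpTextbox1' ) { $inTextbox = 1; }
    elsif ( $tag eq 'input' ) { $parms{$name} = $attr->{value}; }
}

# End handler: the text skipped since the start tag is the textarea body.
sub onEnd {
    my ( $tag, $skipped ) = @_;
    return unless $inTextbox && $tag eq 'textarea';
    $parms{wpTextbox1} = decode_entities($skipped);
    $inTextbox = 0;
}

my $p = HTML::Parser->new(
    api_version   => 3,
    start_h       => [ \&onStart, 'tagname,attr' ],
    end_h         => [ \&onEnd,   'tagname,skipped_text' ],
    unbroken_text => 1,
    report_tags   => [qw(textarea input)],
);
$p->parse($html);
$p->eof;

print "$_ = $parms{$_}\n" for sort keys %parms;
```

Because no text handler is registered, the textarea body is never reported as an event of its own. It only shows up as skipped_text when the closing tag arrives, which is why the end handler is where the value gets captured.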

There’s another bit of special handling we’re doing that relates to the URLs we are GETting and POSTing. We append the action to the URL using concatenation instead of just embedding it in the EDITPAGE constant. That is, we set the URL using $req->uri(EDITPAGE . '&action=edit'). If the ampersand is in the original URL that is passed to HTTP::Request::Common::POST, then the ampersand will be encoded as %26, which won’t be parsed by Wikipedia correctly.
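
The concatenation step can be tried on its own, again without sending anything. The following sketch builds the POST the same way Example 1 does, then replaces the request URI and prints the result so that you can see where the ampersand ends up. The %parms contents are placeholders, not real form values from an edit page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

use constant EDITPAGE => 'http://en.wikipedia.org/w/index.php'
    . '?title=Wikipedia:Tutorial_%28Keep_in_mind%29/sandbox';

# Placeholder form data; the real script fills this from the edit page.
my %parms = ( wpTextbox1 => 'This is a test.' );

# Build the multipart POST against the base URL first...
my $req = POST(
    EDITPAGE,
    Content_Type => 'form-data',
    Content      => \%parms,
);

# ...then append the action by setting the URI afterwards.
$req->uri( EDITPAGE . '&action=submit' );

print $req->method, ' ', $req->uri, "\n";
print 'Content-Type: ', $req->header('Content-Type'), "\n";
```

Printing the request before sending it is a cheap way to confirm that the query string still contains a literal &action=submit.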