Tuesday 11 September 2007

The Beast Remade: Enter WWW::Mechanize

I had recently seen something called WWW::Mechanize mentioned in Google Hacks, 2nd edition. So I determined to have a play around with it, to see if there was something to be gained. An initial look seemed promising. In fairly short order, I had the web interface yelling at me to use a civilized browser, not this scary WWW-Mechanize thingy.

use strict;
use warnings;
use WWW::Mechanize;

my $url = "http://spamproxy:8888/login.php";

# autocheck => 1: die on any failed request instead of carrying on
my $mech = WWW::Mechanize->new(autocheck => 1);

$mech->get($url);
print $mech->content;

A quick adjustment to $mech->agent took care of that ill temper. Adding calls to $mech->save_content() let me see what it expected of me, which was simple, at first. It just wanted a password in the single visible form element:

my $password = "foobarword";
$mech->submit_form(fields => { password => $password });
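Reading saved pages by eye works fine. A complementary approach is to ask Mechanize which form fields it actually parsed, through the HTML::Form objects returned by $mech->forms. The sketch below runs without the appliance: it writes a made-up login page (the hidden "token" field and the page layout are hypothetical) to a temporary directory and loads it through a file: URL.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use WWW::Mechanize;

# Hypothetical stand-in for the login page, written to a temp directory
my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/login.html" or die "cannot write login.html: $!";
print {$fh} <<'HTML';
<html><body>
<form action="login.php" method="post">
  <input type="hidden" name="token" value="abc">
  <input type="password" name="password">
  <input type="submit" value="Log in">
</form>
</body></html>
HTML
close $fh or die "cannot close login.html: $!";

my $mech = WWW::Mechanize->new(autocheck => 1);
$mech->get("file://$dir/login.html");

# List type and name of every input in every form on the page
my @seen;
for my $form ($mech->forms) {
    for my $input ($form->inputs) {
        my $name = defined $input->name ? $input->name : '(unnamed)';
        push @seen, $input->type . " $name";
        print $input->type, "\t$name\n";
    }
}
```

Against the real interface, point get() at the login URL instead; hidden fields show up in the listing even though the browser never displays them.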

A $mech->save_content() later, I had won a small victory. The next target had to be the web application's menu structure. The story so far:

use strict;
use warnings;
use WWW::Mechanize;

my $url = "http://spamproxy:8888/login.php";
my $password = "foobarword";

# autocheck => 1: die on any failed request
my $mech = WWW::Mechanize->new(autocheck => 1);

# Yeah, sure, this is a tested and approved browser
$mech->agent("Mozilla/5.0 (X11; U; Linux i686; "
    . "en-US; rv:1.7.13) Gecko/20060418 Firefox/1.0.8");

# Get login page
$mech->get($url);

# Try to log in
$mech->submit_form(fields => { password => $password });

# Save the response page for a look around
$mech->save_content("loggedin.html");

Starting out, I worried that extensive Javascript use would make this hard or impossible. As perldoc WWW::Mechanize::FAQ states, Javascript is not supported and not likely to be until someone looks seriously into it. It turned out that this particular web interface had followed web accessibility guidelines pretty well and was navigable without Javascript support. If your local friendly web interface is less accommodating, well, the FAQ has some tips.

The menu structure turned out to be a mixed bag. Link text was generated by Javascript, so I could not rely on it. Link URLs, however, were static and thus usable. A few calls to $mech->follow_link() later, I had the quarantine main menu dumped to an HTML file for my perusal. It all looked very parsable and scriptable, and a sneak peek at the quarantine listings seemed to confirm my gut feeling.
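Matching on URLs rather than link text can be sketched with follow_link()'s url_regex parameter. To keep it runnable without the appliance, the sketch below builds a hypothetical menu page whose anchors are empty until Javascript fills them in (which Mechanize never runs); the page names quarantine.html and settings.html are invented for illustration, standing in for the interface's real URLs.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use WWW::Mechanize;

# Helper for this sketch only: write one page into the temp directory
sub write_page {
    my ($dir, $name, $html) = @_;
    open my $fh, '>', "$dir/$name" or die "cannot write $name: $!";
    print {$fh} $html;
    close $fh or die "cannot close $name: $!";
}

my $dir = tempdir(CLEANUP => 1);

# Hypothetical menu: anchors start out empty, Javascript supplies the text
write_page($dir, 'menu.html', <<'HTML');
<html><body>
<a id="m1" href="quarantine.html"></a>
<a id="m2" href="settings.html"></a>
<script>
document.getElementById('m1').innerHTML = 'Quarantine';
document.getElementById('m2').innerHTML = 'Settings';
</script>
</body></html>
HTML
write_page($dir, 'quarantine.html', "<html><body><h1>Quarantine</h1></body></html>\n");
write_page($dir, 'settings.html',   "<html><body><h1>Settings</h1></body></html>\n");

my $mech = WWW::Mechanize->new(autocheck => 1);
$mech->get("file://$dir/menu.html");

# Mechanize sees empty link text here, so select the link by its static URL
$mech->follow_link(url_regex => qr/quarantine\.html$/);

# Dump the resulting page to disk for perusal
$mech->save_content("$dir/quarantine-menu.html");
print "Now at: ", $mech->uri, "\n";
```

The relative href is resolved against the current page's URL, so the same call works unchanged whether the menu was fetched over HTTP or from disk.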

2 comments:

Harald Korneliussen said...

It may be nice for sysadmin scripting, but the last time someone asked me for a tool like this, it was a 13-year-old boy who wanted to automate one of those "click here all day round" web games :-)

I recommended Twill, a Python DSL, to him. I had never heard of this one, but perhaps it's best to start him off with Python rather than Perl anyway?

Also, I hear IBM are experimenting with an allegedly super-friendly web form scripting tool called Coscripter. It sounds fairly promising; I didn't manage to download it, though, so that's just an impression.

Erik Inge Bolsø said...

HttpUnit is the Java equivalent, and is even said to support JavaScript, which could come in handy.