Monday, 17 September 2007

The Beast Remade: The lay of the land

HTML::TableExtract emerged as my poison of choice for doing the grunt work of HTML parsing. Of the three or four Perl modules I tried in the course of a frustrating morning, it was the easiest to improvise extraction criteria with.

The easiest way to pick out the table of interest from the quarantine main menu, for example, turned out to be:

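use HTML::TableExtract;

# Select the table by its width attribute; keep_html preserves the links inside cells.
my $te = HTML::TableExtract->new(
    attribs => { width => 525 }, keep_html => 1 );
$te->parse($mech->content);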

...since the attribute "width=525" was its only distinguishing feature. Once the table was parsed, it was the work of moments to build a "cd" command for entering the quarantine section of interest. An "ls" command followed shortly afterwards, with only momentary difficulties.
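
For the curious, here is a rough sketch of what can happen after the parse: walking the rows of the matched table and pulling out the links a "cd" command could follow. The sample HTML, the section names, the link-matching regex and the @sections list are all my own stand-ins for illustration, not the real quarantine menu, and the inline string takes the place of $mech->content.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical stand-in for $mech->content: two tables, only one with width=525.
my $html = <<'HTML';
<table width="400"><tr><td>navigation</td></tr></table>
<table width="525">
  <tr><td><a href="/q/spam">Spam</a></td><td>12</td></tr>
  <tr><td><a href="/q/virus">Virus</a></td><td>3</td></tr>
</table>
HTML

my $te = HTML::TableExtract->new(
    attribs   => { width => 525 },
    keep_html => 1,    # keep the <a> tags so the hrefs survive
);
$te->parse($html);

# Collect a name and link from the first cell of every row that has one.
my @sections;
for my $table ($te->tables) {
    for my $row ($table->rows) {
        my $cell = $row->[0];
        next unless defined $cell;
        if ($cell =~ m{<a\s+href="([^"]+)"[^>]*>([^<]+)</a>}i) {
            push @sections, { name => $2, href => $1 };
        }
    }
}

# A crude "ls": list each section with the link a "cd" would follow.
printf "%-10s %s\n", $_->{name}, $_->{href} for @sections;
```

Without keep_html the cells come back as visible text only, which is fine for an "ls" but leaves nothing for a "cd" to follow.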
