
Struggling all day with Gutenberg. Someone (not naming them as I don't have permission) sent me code to let me use Redland for my RDF parsing and it looks lovely. Too bad Redland doesn't compile for anyone. Didn't compile for me, either.
I put this aside for a bit and tried parsing result pages.
Tried to use the Web::Scraper module to at least pull results from Web pages, but I'm too stupid to figure out its syntax. Learning a new API and CSS selectors while battling strange "don't know what to do with undef" errors proved too much. Embarrassing.
I thought to use HTML::TableParser for some stuff, but that doesn't seem to let me at the attributes I need.
I thought XPath would be good, but the result pages aren't well-formed XML. Someone mentioned to me that there might be an XPath module which might have an option which might let you parse malformed XML. I didn't follow up on that.
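For what it's worth, one CPAN module that fits that description is HTML::TreeBuilder::XPath, which builds a tree from tag-soup HTML and then answers XPath queries against it. I can't say whether it's the module that was meant. Here is a minimal sketch, assuming a made-up fragment of sloppy markup (the real Gutenberg pages will look different):

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical, deliberately malformed markup: unclosed <li> and <p> tags.
my $html = <<'HTML';
<ul>
  <li><a href="/etext/1">First Book</a>
  <li><a href="/etext/2">Second Book</a>
</ul>
<p>trailing junk
HTML

# HTML::TreeBuilder repairs the structure while parsing,
# so XPath queries work even though this is not valid XML.
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($html);

my @books;
for my $link ( $tree->findnodes('//li/a') ) {
    push @books, {
        href  => $link->attr('href'),
        title => $link->as_text,
    };
}
$tree->delete;    # free the tree explicitly

print "$_->{href}\t$_->{title}\n" for @books;
```

The XPath expression and the markup are illustrative assumptions only; the point is that attributes stay reachable through `attr`.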
I finally switched to my HTML::TokeParser::Simple module for this. It's not a good fit for this problem. No, scratch that. It's a bad fit for this problem, but it worked. Then I turned back to search. For this, I used WWW::Mechanize. Notice anything, um, crap about these damned results?
sub search {
    my $self = shift;
    my $mech = WWW::Mechanize->new(
        agent     => 'App::Gutenberg (perl)',
        autocheck => 1,
    );
    $mech->get( App::Gutenberg->search_url );
    $mech->submit_form(
        form_number => 1,
        fields      => {
            'author' => ( $self->author || '' ),
            'title'  => ( $self->title  || '' ),
        }
    );
    my $uri = $mech->uri;
    if ( $uri =~ /#([[:word:]]+)\z/ ) {
        # you have got to
    }
    else {
        # be kidding me
    }
}
If that URL matches, you're indexing into a list of <li> elements. Otherwise, you're parsing a table. Either way, it's a right pain to get the data you want. Oh, and the two formats carry subtly different sets of data, and it's unclear what determines which type of result you get.
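To make that branch a little less painful, the URL check can be pulled out into a small helper that says which parser to reach for. This is only a sketch: `result_layout` is a hypothetical name, the URLs are made up, and the captured fragment is handed back untouched because I'm not claiming to know exactly what it identifies in the list.

```perl
use strict;
use warnings;

# Hypothetical helper: guess the result layout from the final URL.
# Returns ('list', $fragment) when the URL ends in a #fragment,
# and ('table') otherwise.
sub result_layout {
    my ($uri) = @_;
    if ( "$uri" =~ /#([[:word:]]+)\z/ ) {
        return ( 'list', $1 );
    }
    return ('table');
}

# Made-up URLs, purely for illustration.
for my $uri (
    'http://example.com/results#a123',
    'http://example.com/results?title=foo',
) {
    my ( $layout, $fragment ) = result_layout($uri);
    printf "%s => %s%s\n", $uri, $layout,
        defined $fragment ? " (fragment: $fragment)" : '';
}
```

Stringifying `$uri` means the helper accepts either a plain string or the URI object that `$mech->uri` returns. It doesn't make the scraping any less fragile, but it keeps the guesswork in one place.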
This is why I want to see REST for just about anything today. It's simple. It's straightforward. It doesn't make me cry. Now I know why you don't see Perl command line clients for Gutenberg. Everything I'm writing is so damned fragile it will break if you look at it funny. *sniff*
Update: it looks like any search with an author will return a list, but all other searches (I only tested the basic form) return tables.
