Hatena::Group mobilehacker

ミンナニハナイショダヨ


2009-02-05

Scraping "UserAgent of DoCoMo" with Web::Scraper 12:29

d:id:tokuhirom's WWW::MobileCarrierJP already exists, but I tried writing my own version as an experiment.

#!/usr/bin/perl

use utf8;
use strict;
use warnings;
use URI;
use YAML;
use Data::Dumper;
use Perl6::Say;
use Web::Scraper;

my $u = URI->new('http://www.nttdocomo.co.jp/service/imode/make/content/spec/useragent/index.html');

my $s = scraper {
    process '//table[@summary]', 'versions[]' => scraper {
        process '//self::table', 'version' => sub {
            $_->attr('summary') =~ m|(HTML\d\.\d)|;
            return $1;
        };

        my $version = result 'version';
        my $series;

        process '//tr/td[not(contains(@class, "brownLight"))]/parent::tr', 'models[]' => scraper {
            process '//td[position()=1 and contains(@class, "acenter")]', 'series' => 'TEXT';
            $series = (result 'series') ? result 'series' : $series;
            process '//td[not(contains(@class, "acenter"))]/span', 'rows[]' => sub {
                $_->content_array_ref->[0];
            };

            my $rows = result 'rows';

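            # The first cell holds the model name; strip its leading whitespace.
            my $model = shift @$rows;
            $model =~ s/^\s+//;
            # Collect every user-agent string found in the remaining cells.
            # The "." in the version numbers is escaped so it matches literally, and the
            # /cNN cache-size suffix is accepted with or without a trailing /TB, /TD, or /TJ.
            my $ua = [ map { (m{(DoCoMo/1\.0/\w+(?:/c\d+(?:/(?:TB|TD|TJ))?)?|DoCoMo/2\.0 \w+\(.*?\))}g); } grep { $_ } @$rows ];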

            return +{
                series => $series,
                model => $model,
                ua => $ua,
            };
        };

        return +{
            version => $version,
            models => result 'models',
        };
    };
    result 'versions';
};

my $res = $s->scrape($u);
say Dump($res);

It took quite a bit of trial and error to get this far. Web::Scraper is seriously impressive.
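The user-agent pattern is the fiddly part, so here is a minimal sketch for trying it on its own, without hitting the DoCoMo page. The helper name extract_docomo_ua and the sample strings are made up for illustration. Treating the /cNN cache size and the /TB, /TD, /TJ status as independently optional is my assumption about the format, not something the page guarantees.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# DoCoMo/1.0/<model>, optionally followed by /c<cache size>,
# optionally followed by a status of /TB, /TD, or /TJ.
my $ua_v1 = qr{DoCoMo/1\.0/\w+(?:/c\d+(?:/(?:TB|TD|TJ))?)?};

# DoCoMo/2.0 <model>(...) with a non-greedy match up to the first ")".
my $ua_v2 = qr{DoCoMo/2\.0 \w+\(.*?\)};

# Hypothetical helper: return every user-agent string found in $text.
sub extract_docomo_ua {
    my ($text) = @_;
    my @found = $text =~ m{($ua_v1|$ua_v2)}g;
    return @found;
}

# Sample input is invented, not copied from the DoCoMo page.
my @ua = extract_docomo_ua(
    'DoCoMo/1.0/D501i DoCoMo/1.0/F504i/c10/TB DoCoMo/2.0 N900i(c100;TB;W24H12)'
);
print "$_\n" for @ua;
```

Keeping the two patterns in separate qr{} objects makes it easier to adjust one format without touching the other.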
