Scraping with Goutte: Code Snippets

A few code snippets for scraping with Goutte, the PHP web scraping library.

Get a tag that has two classes

html : <div class="class1 class2">
php : $crawler->filter('div.class1.class2');

Get a tag that has an id

html : <div id="hello">
php : $crawler->filter('div#hello');

Get the src of an img tag

html : <img src="http://reafo.net/image.png">
php : $crawler->filter('img')->attr('src');
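Note that attr() reads the attribute from the first matched node only. When a page has several images, one way to collect every src is extract() or each(). The following is a minimal sketch, assuming symfony/dom-crawler and symfony/css-selector are installed through Composer (Goutte depends on both). The HTML string and the second image URL are made up for illustration.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes a Composer project

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup; only the first URL comes from the snippet above.
$html = '<p><img src="http://reafo.net/image.png">'
      . '<img src="http://reafo.net/other.png"></p>';
$crawler = new Crawler($html);

// attr() looks at the first matched <img> only.
$first = $crawler->filter('img')->attr('src');

// extract() returns the attribute value for every matched node.
$all = $crawler->filter('img')->extract(['src']);

// each() gives the same result through a callback, which is handy
// when each node needs extra logic.
$viaEach = $crawler->filter('img')->each(function (Crawler $img) {
    return $img->attr('src');
});

print_r($all);
```

extract() is the shorter choice when only attribute values are needed; each() is more flexible because the callback receives a full Crawler for every node.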

Get the content of a tag, including inner HTML

html : <div class="catchMeIfYouCan"><span id="hello">Hello</span>world</div>
php : $crawler->filter('.catchMeIfYouCan')->html();
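To make the difference between html() and text() concrete, here is a small sketch that can be run offline. It assumes a Composer project with symfony/dom-crawler and symfony/css-selector available, and it builds the crawler from a local string instead of a live request. The `.doesNotExist` selector is hypothetical and only there to show a guard for empty results.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes a Composer project

use Symfony\Component\DomCrawler\Crawler;

// With Goutte, $crawler would normally come from:
//   $client = new \Goutte\Client();
//   $crawler = $client->request('GET', $url);
// Here it is built from a local string instead.
$html = '<div class="catchMeIfYouCan"><span id="hello">Hello</span>world</div>';
$crawler = new Crawler($html);

$node = $crawler->filter('.catchMeIfYouCan');

// html() keeps the child tags, text() strips them.
$inner = $node->html();
$plain = $node->text();

// filter() never returns null. Check count() before reading from a
// node list, because html() and text() throw on an empty list.
$missing = $crawler->filter('.doesNotExist');
$fallback = $missing->count() > 0 ? $missing->text() : '';

var_dump($inner, $plain, $fallback);
```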

Tips for reading table rows

html : <table><tr><th>title title</th><td>string string</td></tr><tr><th>title 2</th><td>hello hello</td></tr></table>
php :
$crawler->filter('table tr')->each(function ($element) {
    // Each row holds a label in <th> and a value in <td>.
    $th = $element->filter('th')->text();
    $td = $element->filter('td')->text();

    // Compare with ===; a single = would assign instead of compare.
    if ($th === '') { /* … */ }
});

When the site does not use th tags

html : <table><tr><td>title1</td><td>String</td></tr><tr><td>title2</td><td>hello hello</td></tr></table>
php :
$crawler->filter('table tr')->each(function ($element) {
    // Without <th>, take the first <td> as the label and the second as the value.
    $th = $element->filter('td')->eq(0)->text();
    $td = $element->filter('td')->eq(1)->text();

    // Compare with ===; a single = would assign instead of compare.
    if ($th === '') { /* … */ }
});
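The two table cases can also be handled in one pass by selecting th and td together and collecting the result into an array. This is only a sketch under the same assumptions as above (symfony/dom-crawler and symfony/css-selector installed through Composer). The mixed table and the `$data` array are made up for illustration.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumes a Composer project

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical table: the first row uses <th>, the second only <td>.
$html = '<table><tr><th>title title</th><td>string string</td></tr>'
      . '<tr><td>title2</td><td>hello hello</td></tr></table>';
$crawler = new Crawler($html);

// each() returns an array with one entry per row.
$pairs = $crawler->filter('table tr')->each(function (Crawler $row) {
    // All cells of the row, <th> or <td>, in document order.
    $cells = $row->filter('th, td');
    if ($cells->count() < 2) {
        return null; // skip rows without a label/value pair
    }
    return [$cells->eq(0)->text(), $cells->eq(1)->text()];
});

// Turn the [label, value] pairs into an associative array.
$data = [];
foreach (array_filter($pairs) as [$label, $value]) {
    $data[$label] = $value;
}

print_r($data);
```

Checking count() first avoids the exception that eq(1)->text() would raise on a row with a single cell, such as a header row spanning both columns.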
