Code snippets for Goutte.
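Every snippet here works on a `$crawler` variable. As a minimal sketch of where it can come from, assuming Goutte was installed with Composer and using a placeholder URL that is not from this post:

```php
<?php
// Assumes Goutte was installed with Composer (composer require fabpot/goutte).
require __DIR__ . '/vendor/autoload.php';

use Goutte\Client;

$client = new Client();

// https://example.com/ is a placeholder URL chosen for illustration.
// request() fetches the page and returns a Symfony DomCrawler Crawler for it.
$crawler = $client->request('GET', 'https://example.com/');

// Quick check that the page was parsed: print the text of its <title> tag.
echo $crawler->filter('title')->text(), PHP_EOL;
```

The returned object is a `Symfony\Component\DomCrawler\Crawler`, so `filter()`, `attr()`, `text()`, `html()`, `eq()` and `each()` are all DomCrawler methods.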
Get a tag that has two classes
html : <div class="class1 class2">
php : $crawler->filter('div.class1.class2');
Get a tag by its id
html : <div id="hello">
php : $crawler->filter('div#hello');
Get the src of an img tag
html : <img src="http://reafo.net/image.png">
php : $crawler->filter('img')->attr('src');
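Note that `attr()` reads the attribute from the first matched node only. When a page has several images, one possible approach is to collect every src with `each()`. The sketch below builds the Crawler from an HTML string so it can be run offline; the second image URL is made up for illustration.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup with two images; other.png is an invented file name.
$html = '<p><img src="http://reafo.net/image.png">'
      . '<img src="http://reafo.net/other.png"></p>';
$crawler = new Crawler($html);

// each() runs the closure once per matched node and returns the results as an array.
$sources = $crawler->filter('img')->each(function (Crawler $img) {
    return $img->attr('src');
});

print_r($sources);
```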
Get the contents of a tag, including inner HTML
html : <div class="catchMeIfYouCan"><span id="hello">Hello</span>world</div>
php : $crawler->filter('.catchMeIfYouCan')->html(); // <span id="hello">Hello</span>world
Tips for reading table rows
html : <table><tr><th>title title</th><td>string string</td></tr><tr><th>title 2</th><td>hello hello</td></tr></table>
php :
$crawler->filter('table tr')->each(function ($element) {
    // Each row holds a heading cell (th) and a value cell (td).
    $th = $element->filter('th')->text();
    $td = $element->filter('td')->text();
    if ($th === '') { /* … */ }
});
When the site does not use th tags
html : <table><tr><td>title1</td><td>String</td></tr><tr><td>title2</td><td>hello hello</td></tr></table>
php :
$crawler->filter('table tr')->each(function ($element) {
    // Without th tags, take the first td as the heading and the second as the value.
    $th = $element->filter('td')->eq(0)->text();
    $td = $element->filter('td')->eq(1)->text();
    if ($th === '') { /* … */ }
});
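As a rough sketch of what the row loop can be used for, the example below collects a td-only table into an associative array keyed by the first cell. It is an illustration rather than the only way to do it: the `$rows` variable and the skip rule for short rows are assumptions, and the Crawler is built from an HTML string so the code runs without a network request.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html = '<table><tr><td>title1</td><td>String</td></tr>'
      . '<tr><td>title2</td><td>hello hello</td></tr></table>';
$crawler = new Crawler($html);

// $rows is a hypothetical result array: heading text => value text.
$rows = [];
$crawler->filter('table tr')->each(function (Crawler $row) use (&$rows) {
    $cells = $row->filter('td');
    if ($cells->count() < 2) {
        return; // skip rows that lack a heading/value pair
    }
    $rows[$cells->eq(0)->text()] = $cells->eq(1)->text();
});

print_r($rows);
```

Checking `count()` before calling `text()` matters because `text()` throws an exception when the selection is empty.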
Reference
・Playing with the web scraping library Goutte (WebスクレイピングライブラリGoutteで遊んでみる)
・How to parse HTML using Goutte (Goutteを使用してHTMLを解析する方法)