florence / scrapa
A simple php web scraper package
Requires
- satooshi/php-coveralls: ^1.0
Requires (Dev)
- mockery/mockery: dev-master
- phpunit/phpunit: ^4.8
This package is not auto-updated.
Last update: 2020-08-19 05:41:21 UTC
README
#WebScrapa
###DISCLAIMER This package was put together for learning purposes only.
A simple PHP web scraper package -
Most websites do not offer the functionality to save a copy of the data which they display to your computer. The only option then is to manually copy and paste the data displayed by the website in your browser to a local file in your computer - a very tedious job which can take many hours or sometimes days to complete. Web Scraping is the technique of automating this process.
WebScrapa is a simple web scraper package written with PHP. It uses cURL to request and download a webpage. The downloaded webpage is converted to XML DOM object and XPath is used to navigate through elements in the XML DOM object.
##Installation
composer require "florence/scrapa: v1.0"
##Usage
-
Create an instance of the Scrap class:
$url = 'https://www.youtubecom/JustinBieber/about'; $query = '//ul[@class="about-custom-links"]//a[@class="about-channel-link "]/@href'; $scrap = new Scrap($url, $query);
Learn about XPath and how to scrape the elements based on their tags and attributes, such as CSS classes and IDs. https://goo.gl/Gjd3R3
-
Use the toArrayScrapDOM method to get the results of your query in array format
print_r($scrap->toArrayScrapDOM());
-
Use the toStringScrapDOM method to get the results of your query in string format
print_r($scrap->toStringScrapDOM())
Run the example file
clone the repo
git clone https://github.com/andela-fokosun/webscrapa
run
composer install
from your terminal, run:
php example.php
run tests
vendor/bin/phpunit