spekulatius/spatie-crawler-toolkit-for-laravel

Handy classes for Spatie's crawler when using it with Laravel.

0.5.0 2022-09-28 09:01 UTC

This package is auto-updated.

Last update: 2024-12-28 14:21:50 UTC


Laravel 9 should work, but is not extensively tested. Please report any issues you might find!

A set of classes for using Spatie's crawler with Laravel. The aim is to simplify building crawler applications or adding a crawler to an existing Laravel project. It can be conveniently integrated into PHP Scraper, for example. At the moment, the following helper classes are implemented:

Cache Crawl Queue

The CacheCrawlQueue lets you use Laravel's pre-configured cache to store the crawl queue. Every action performed on the queue is written to the cache immediately, so you don't need to store the queue manually. You can add it directly to your crawler:

Crawler::create()
    ->setCrawlQueue(new \Spekulatius\SpatieCrawlerToolkit\Queues\CacheCrawlQueue($url))
    ->startCrawling($url);

With this, you can stop the crawl and restart it at any time. This requires a cache driver to be configured in your .env file.
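
As a rough illustration, assuming a Laravel 6 to 9 application with the default config/cache.php, the cache store is selected through the CACHE_DRIVER variable. The file driver shown here is only one possible choice, not a recommendation from this package. An in-memory store such as array would not keep the queue between runs, so a persistent store is needed for resuming a crawl:

```dotenv
# .env (assumed example; pick whichever persistent store your app already uses)
CACHE_DRIVER=file
```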

Crawl Logger

The Crawl Logger is an observer you can add to your crawler to enable logging of crawl events:

Crawler::create()
    ->setCrawlObserver(new \Spekulatius\SpatieCrawlerToolkit\Observers\CrawlLogger)
    ->startCrawling($url);

You can export the configuration (see below) to tweak which events are logged.

Crawl Events

The toolkit contains an observer that dispatches Laravel events, allowing you to react to crawl events.

By default, no events are emitted. To enable events, you will need to add the event observer to your crawler:

$eventObserver = new \Spekulatius\SpatieCrawlerToolkit\Observers\CrawlEvents;

Crawler::create()
    ->setCrawlObserver($eventObserver)
    ->startCrawling($url);

An optional identifier can be passed to the observer to distinguish between different crawls:

$eventObserver = new \Spekulatius\SpatieCrawlerToolkit\Observers\CrawlEvents('my-crawl');
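
The helpers can also be combined in a single crawler. The following is a minimal sketch, not taken from the package documentation: it assumes the installed version of Spatie's crawler provides addCrawlObserver() for attaching more than one observer, and that the code runs inside a Laravel application with cache and logging configured. The start URL and the 'my-crawl' identifier are placeholders.

```php
<?php

use Spatie\Crawler\Crawler;
use Spekulatius\SpatieCrawlerToolkit\Observers\CrawlEvents;
use Spekulatius\SpatieCrawlerToolkit\Observers\CrawlLogger;
use Spekulatius\SpatieCrawlerToolkit\Queues\CacheCrawlQueue;

// Placeholder start URL, for illustration only.
$url = 'https://example.com';

Crawler::create()
    // Keep the queue in Laravel's cache so the crawl can be stopped and resumed.
    ->setCrawlQueue(new CacheCrawlQueue($url))
    // Log crawl events through Laravel's logger.
    ->addCrawlObserver(new CrawlLogger)
    // Dispatch Laravel events, tagged with an identifier for this crawl.
    ->addCrawlObserver(new CrawlEvents('my-crawl'))
    ->startCrawling($url);
```

Running the same snippet again with the same URL should continue from the cached queue, provided the cache has not been cleared in the meantime.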

Planned functionality

  • Batched crawling using Laravel Queues.

For any suggestions on how to enhance this, please raise an issue.

Requirements & Install

Requirements

  • Laravel 6, 7, 8, or 9. Laravel 9 support is still being tested; please report any issues.
  • Cache and Log configured in Laravel.

Installation

composer require spekulatius/spatie-crawler-toolkit-for-laravel

Optionally, you can publish the configuration file:

php artisan vendor:publish --tag=crawler-toolkit-config

Contributing

Please raise a PR or issue.

License

Released under the MIT license. Please see License File for more information.