nzo/grabber-bundle

NzoGrabberBundle is a Symfony bundle that crawls a website and grabs all types of links and tags (img, js, css) from it.

Type:symfony-bundle

v3.0.0 2019-04-23 09:27 UTC

This package is auto-updated.

Last update: 2024-12-23 21:28:08 UTC


README

NzoGrabberBundle is a Symfony bundle that crawls any website and grabs all types of links, URLs, and tags (img, js, css) from it.

Features include:

  • Compatible with Symfony 3 and 4
  • URL grabber/crawler for HTTP and HTTPS
  • URL grabber/crawler for HREF, SRC, and IMG types
  • Exclude any type of file by extension
  • Exclude specified URLs from grabbing
  • Compatible with PHP 5 and 7

Installation

Through Composer:

Install the bundle:

$ composer require nzo/grabber-bundle

Register the bundle in app/AppKernel.php (Symfony V3):

// app/AppKernel.php

public function registerBundles()
{
    return array(
        // ...
        new Nzo\GrabberBundle\NzoGrabberBundle(),
    );
}
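
The feature list mentions Symfony 4, but only the Symfony 3 registration is shown above. As a hedged sketch: a Symfony 4 project using the standard Flex directory layout registers bundles in config/bundles.php rather than in AppKernel.php. The entry below assumes that layout. It is not taken from the bundle's own documentation, and Flex may already have added the entry during installation.

```php
// config/bundles.php (assumed standard Symfony 4 layout)

return [
    // ...
    Nzo\GrabberBundle\NzoGrabberBundle::class => ['all' => true],
];
```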

Usage

In the controller use the Grabber service and specify the options needed:

Get all URLs:

    public function indexAction($url)
    {
        $tableOfUrls = $this->get('nzo_grabber.grabber')->grabUrls($url);

        //....
    }

Or get all URLs without recursion:

    public function indexAction($url)
    {
        $tableOfUrls = $this->get('nzo_grabber.grabber')->grabUrlsNoRecursive($url);

        //....
    }
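
The difference between the recursive and non-recursive calls can be pictured with a small stand-alone sketch. Nothing here is the bundle's implementation: the in-memory `$site` map and the `linksOnPage()` and `crawl()` helpers are hypothetical names used only to illustrate the idea that a non-recursive grab stops at the links found on the starting page, while a recursive grab follows each discovered link once.

```php
<?php
// Hypothetical in-memory "site": each page maps to the links found on it.
$site = [
    'http://www.exemple.com/'            => ['http://www.exemple.com/about', 'http://www.exemple.com/blog'],
    'http://www.exemple.com/about'       => ['http://www.exemple.com/'],
    'http://www.exemple.com/blog'        => ['http://www.exemple.com/blog/post-1'],
    'http://www.exemple.com/blog/post-1' => [],
];

// Non-recursive: only the links present on the given page.
function linksOnPage(array $site, $url)
{
    return isset($site[$url]) ? $site[$url] : [];
}

// Recursive: follow every discovered link, visiting each page only once.
function crawl(array $site, $start)
{
    $seen = [$start => true];
    $queue = [$start];

    while ($queue) {
        $page = array_shift($queue);
        foreach (linksOnPage($site, $page) as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true; // mark before queueing to avoid loops
                $queue[] = $link;
            }
        }
    }

    return array_keys($seen);
}

$direct = linksOnPage($site, 'http://www.exemple.com/');
$all    = crawl($site, 'http://www.exemple.com/');
```

The `$seen` map is what keeps the traversal from looping when pages link back to each other, as the "about" page does here.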

Or get all URLs that are not in the exclude array:

    public function indexAction($url)
    {
        $notScannedUrlsTab = ['http://www.exemple.com/about'];
        $tableOfUrls = $this->get('nzo_grabber.grabber')->grabUrls($url, $notScannedUrlsTab);

        //....
    }

Or exclude URLs that contain a specified text and also select by file extension:

    public function indexAction($url)
    {
        $exclude = 'someText_to_exclude';
        $tableOfUrls = $this->get('nzo_grabber.grabber')->grabUrls($url, null, $exclude, array('png', 'pdf'));

        //....
    }

Or get all URLs selected by file extension:

    public function indexAction($url)
    {
        $tableOfUrls = $this->get('nzo_grabber.grabber')->grabUrls($url, null, null, array('png', 'pdf'));

        //....
    }

Or get all image files from the specified URL:

    public function indexAction($url)
    {
        $img = $this->get('nzo_grabber.grabber')->grabImg($url);

        //....
    }

Or get all JS files from the specified URL:

    public function indexAction($url)
    {
        $js = $this->get('nzo_grabber.grabber')->grabJs($url);

        //....
    }

Or get all CSS files from the specified URL:

    public function indexAction($url)
    {
        $css = $this->get('nzo_grabber.grabber')->grabCss($url);

        //....
    }

Or get all CSS, image, and JS files from the specified URL:

    public function indexAction($url)
    {
        $extrat = $this->get('nzo_grabber.grabber')->grabExtrat($url);

        //....
    }
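
To make concrete what grabbing img, js, and css references from a page involves, here is a minimal stand-alone sketch using PHP's built-in DOM extension. It is not the bundle's code, and the return format of grabExtrat() is not documented here. The `extractAssets()` helper and the shape of its result are assumptions made purely for illustration.

```php
<?php
// Hypothetical helper, not part of NzoGrabberBundle.
function extractAssets($html)
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them,
    // since real-world HTML is rarely perfectly valid.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $assets = ['img' => [], 'js' => [], 'css' => []];

    // <img src="...">
    foreach ($doc->getElementsByTagName('img') as $node) {
        if ($node->hasAttribute('src')) {
            $assets['img'][] = $node->getAttribute('src');
        }
    }
    // <script src="..."> (inline scripts have no src and are skipped)
    foreach ($doc->getElementsByTagName('script') as $node) {
        if ($node->hasAttribute('src')) {
            $assets['js'][] = $node->getAttribute('src');
        }
    }
    // <link rel="stylesheet" href="...">
    foreach ($doc->getElementsByTagName('link') as $node) {
        if (strtolower($node->getAttribute('rel')) === 'stylesheet') {
            $assets['css'][] = $node->getAttribute('href');
        }
    }

    return $assets;
}

$html = '<html><head><link rel="stylesheet" href="/main.css">'
      . '<script src="/app.js"></script></head>'
      . '<body><img src="/logo.png"></body></html>';

$assets = extractAssets($html);
```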

License

This bundle is released under the MIT license. The complete license is in the bundle:

Resources/doc/LICENSE