memeddev / mysql-to-google-bigquery
MySQL to Google BigQuery Sync Tool
Installs: 775
Dependents: 0
Suggesters: 0
Security: 0
Stars: 62
Watchers: 8
Forks: 27
Open Issues: 6
Requires
- doctrine/dbal: ^2.5
- facile-it/doctrine-mysql-come-back: ^1.6
- google/cloud: ^0.11.1
- php-di/php-di: ^5.4
- symfony/console: ^3.1
- vlucas/phpdotenv: ^2.4
This package is auto-updated.
Last update: 2024-11-19 22:12:49 UTC
README
MySQL to Google BigQuery Sync Tool
Table of Contents
How it works
Steps when no order column has been supplied:
- Count MySQL table rows
- Count BigQuery table rows
- MySQL rows > BigQuery rows?
- Get the rows diff, split in batches of XXXXX rows/batch
Steps when order column has been supplied:
- Get max value for order column from MySQL table
- Get max value for order column from BigQuery table
- Max value MySQL > Max value BigQuery?
- Delete all rows with order column value = max value BigQuery to make sure no duplicate records are being created in BigQuery
- Get max value for order column from BigQuery table
- Get the rows diff based on new max value BigQuery, split in batches of XXXXX rows/batch
Final three steps:
- Dump MySQL rows to a JSON
- Send JSON to BigQuery
- Repeat until all batches are sent
Tip: Create a cron job for keep syncing the tables using an interval like 15 minutes (respect the Load Jobs quota policy)
Requirements
The following PHP versions are supported:
- PHP 7
- HHVM
- PDO Extension with MySQL driver
Usage
Download the library using composer:
$ composer require memeddev/mysql-to-google-bigquery
Now, define some environment variables or create a .env
file on the root of the project, replacing the values:
BQ_PROJECT_ID=bigquery-project-id
BQ_KEY_FILE=google-service-account-json-key-file.json
BQ_DATASET=bigquery-dataset-name
DB_DATABASE_NAME=mysql-database-name
DB_USERNAME=mysql_username
DB_PASSWORD=mysql_password
DB_HOST=mysql-host
DB_PORT=3306
IGNORE_COLUMNS=password,hidden_column,another_column
PS: To create the Google Service Account JSON Key File
, access https://console.cloud.google.com/apis/credentials/serviceaccountkey
Run:
vendor/bin/console sync table-name
If you want to auto create the table on BigQuery:
vendor/bin/console sync table-name --create-table
If you want to delete (and create) the table on BigQuery for a full dump:
vendor/bin/console sync table-name --delete-table
Credits
❤️ Memed SA (memed.com.br)
License
MIT license, see LICENSE