MySQL to BigQuery Transfer (CDC)
- Updated on 07 Dec 2022
This is a machine-translated version of the original Japanese article.
Please understand that some of the information contained on this page may be inaccurate.
Summary
Using the CDC (Change Data Capture) method based on the MySQL binlog, only changed data is identified and transferred to BigQuery.
This reduces the transfer volume and speeds up transfers.
On the first run, all records must be transferred. After that, only the differences are transferred using the CDC method.
Prerequisites
Supported MySQL databases

| | Master | Read Replica | Supported Versions |
|---|---|---|---|
| Amazon Aurora MySQL RDS | ○ | x | 5.6.2+ |
| Amazon MySQL RDS | ○ | ○ | 5.6.2+ |
| Google CloudSQL MySQL | ○ | x | 5.6.2+ |
| MariaDB | ○ | ○ | 5.6.2+ |
| MySQL | ○ | ○ | 5.6.2+ |
MySQL parameter settings

```
[mysqld]
binlog_format=ROW
binlog_row_image=FULL
expire_logs_days=7
binlog_expire_logs_seconds=604800
log_bin=mysql-binlog
log_slave_updates=1
```
Supplementary notes on each parameter follow.
You may need to restart the DB for the binlog settings to take effect.

binlog_expire_logs_seconds
Set this for MySQL 8.0.1 or later.
We specify 7 days (604800 seconds) here, but it can be shortened.
It should be sufficiently longer than the interval between transfers.

expire_logs_days
Set this for MySQL 8.0.0 or earlier.
We specify 7 days here, but it can be shortened.
It should be sufficiently longer than the interval between transfers.

log_bin
You are free to change the binlog file name.

log_slave_updates
Set this when retrieving the binlog from a Read Replica.

RDS MySQL
expire_logs_days must be set using the mysql.rds_set_configuration procedure.
Reference: https://docs.aws.amazon.com/ja_jp/AmazonRDS/latest/UserGuide/USER_LogAccess.Concepts.MySQL.html

Cloud SQL
Please enable point-in-time recovery.
Reference: https://cloud.google.com/sql/docs/mysql/backup-recovery/backups?authuser=2#tips-pitr
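As a quick sanity check, the required `[mysqld]` values can be validated by parsing a my.cnf fragment. This is a minimal illustrative sketch (the function and its rules are ours, not part of TROCCO):

```python
# Minimal sketch: validate the [mysqld] binlog settings required for CDC.
# The required values follow this article; option names are standard MySQL.
import configparser

REQUIRED = {"binlog_format": "ROW", "binlog_row_image": "FULL"}

def check_binlog_config(my_cnf_text: str) -> list:
    """Return a list of problems found in a my.cnf fragment."""
    parser = configparser.ConfigParser(allow_no_value=True)
    parser.read_string(my_cnf_text)
    section = parser["mysqld"]
    problems = []
    for key, expected in REQUIRED.items():
        actual = section.get(key)
        if actual is None or actual.upper() != expected:
            problems.append(f"{key} should be {expected}, got {actual}")
    if "log_bin" not in section:
        problems.append("log_bin is not set; the binlog is disabled")
    return problems
```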
Permissions for the connecting user
The following three permissions are required.

SELECT
- Retrieving data
- Referencing table schema information
Example:
GRANT SELECT ON `<database_name>`.`<table_name>` TO `<username>`@'%';

REPLICATION CLIENT, REPLICATION SLAVE
Example:
GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO `<username>`@'%';
Primary Key of the source table
The source table must have a Primary Key defined.
Composite Primary Keys (multiple columns) are also supported.
Conversion function during transfer
CDC transfer from MySQL to BigQuery supports the following conversions.
Not all of TROCCO's standard transfer conversion functions are supported; support will be expanded over time.
- Dropping columns
- Column data type conversion
- Renaming a column

Column data type conversions are performed with CAST in BigQuery, so they follow BigQuery's CAST rules.
See below for details.
https://cloud.google.com/bigquery/docs/reference/standard-sql/conversion_rules?hl=ja
Metadata
A column named _trocco_fetched_at is added and transferred to the DWH.
This column holds the time at which the transfer was initiated by the TROCCO transfer job.
Because this column is added, the string _trocco_fetched_at cannot be used in column names.
Supported source MySQL SQL
- INSERT
- UPDATE
- DELETE
- Deleted data is soft-deleted on the DWH side.
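As a rough illustration of how these row-level events map onto a DWH-side table with soft deletes (the _deleted flag and all names are our assumptions for illustration, not TROCCO's actual schema):

```python
# Rough sketch: applying INSERT/UPDATE/DELETE binlog events to an in-memory
# table keyed by Primary Key. The _deleted flag is an assumption used to
# illustrate soft deletes; it is not TROCCO's actual schema.
def apply_event(table: dict, event: dict) -> None:
    """Apply one row-level binlog event to the table."""
    pk = event["pk"]
    if event["op"] in ("INSERT", "UPDATE"):
        table[pk] = {**event["row"], "_deleted": False}
    elif event["op"] == "DELETE":
        # Soft delete: the row is flagged, not physically removed.
        if pk in table:
            table[pk]["_deleted"] = True

table = {}
apply_event(table, {"op": "INSERT", "pk": 1, "row": {"id": 1, "name": "a"}})
apply_event(table, {"op": "UPDATE", "pk": 1, "row": {"id": 1, "name": "b"}})
apply_event(table, {"op": "DELETE", "pk": 1})
# The row for id=1 keeps its last values but is now flagged as deleted.
```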
Partially supported SQL

ALTER TABLE ... RENAME COLUMN old_col_name TO new_col_name
If schema tracking is enabled, the data is transferred under the new name.
Columns that have already been transferred are not deleted; their values become NULL.

ALTER TABLE ... ADD col_name data_type
Automatically transferred only when schema tracking is enabled.
When adding a column, configure it on the management screen before adding the column in MySQL.
If the column is added after data has already been transferred, data may be missing.

ALTER TABLE ... DROP col_name
Data already transferred to the DWH is not deleted; NULL is entered after the deletion.

ALTER TABLE ... MODIFY col_name column_definition
If the new type is convertible, transfers can continue.
If an error occurs, a full transfer of all records is required.
Unsupported SQL
The SQL statements below cannot be captured from the binlog, so the data is not transferred correctly to the DWH.
- TRUNCATE
- DROP
- LOAD
- RENAME
- CASCADING DELETES
- CASCADING UPDATES
- ALTER TABLE ... SET DEFAULT
- If the data has already been transferred as NULL, it remains NULL and is not updated.
MySQL and BigQuery schema types
With the data type conversion function, convertible data types can be converted to a specified type.
By default, the following mappings apply. Binary data is converted to hexadecimal and treated as STRING.

| MySQL data types | BigQuery data types |
|---|---|
| BINARY BLOB LONGBLOB MEDIUMBLOB TINYBLOB VARBINARY CHAR ENUM LONGTEXT MEDIUMTEXT SET TIME TINYTEXT VARCHAR | STRING |
| BIT TINYINT | BOOLEAN |
| BIGINT INT MEDIUMINT SMALLINT YEAR | INTEGER |
| DECIMAL NUMERIC DOUBLE FLOAT | FLOAT |
| DATE | DATE |
| TIMESTAMP DATETIME | TIMESTAMP |
| GEOMETRY GEOMETRYCOLLECTION JSON LINESTRING MULTILINESTRING MULTIPOINT MULTIPOLYGON POINT POLYGON | STRING (STRUCT is not currently supported.) |
| CURVE MULTICURVE MULTISURFACE SURFACE | Not supported |
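For reference, the default mapping in the table above can be restated as a lookup table. The structure and function name below are ours for illustration, not TROCCO's API:

```python
# Sketch of the default MySQL -> BigQuery type mapping from the table above.
# Unsupported and unknown types are not handled; this is illustrative only.
_STRING = ["BINARY", "BLOB", "LONGBLOB", "MEDIUMBLOB", "TINYBLOB", "VARBINARY",
           "CHAR", "ENUM", "LONGTEXT", "MEDIUMTEXT", "SET", "TIME", "TINYTEXT",
           "VARCHAR", "GEOMETRY", "GEOMETRYCOLLECTION", "JSON", "LINESTRING",
           "MULTILINESTRING", "MULTIPOINT", "MULTIPOLYGON", "POINT", "POLYGON"]
_BOOLEAN = ["BIT", "TINYINT"]
_INTEGER = ["BIGINT", "INT", "MEDIUMINT", "SMALLINT", "YEAR"]
_FLOAT = ["DECIMAL", "NUMERIC", "DOUBLE", "FLOAT"]
_TIMESTAMP = ["TIMESTAMP", "DATETIME"]

MYSQL_TO_BQ = {t: "STRING" for t in _STRING}
MYSQL_TO_BQ.update({t: "BOOLEAN" for t in _BOOLEAN})
MYSQL_TO_BQ.update({t: "INTEGER" for t in _INTEGER})
MYSQL_TO_BQ.update({t: "FLOAT" for t in _FLOAT})
MYSQL_TO_BQ["DATE"] = "DATE"
MYSQL_TO_BQ.update({t: "TIMESTAMP" for t in _TIMESTAMP})

def bq_type(mysql_type: str) -> str:
    """Look up the default BigQuery type for a MySQL column type."""
    return MYSQL_TO_BQ[mysql_type.upper()]
```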
Setup flow
1. Create the transfer settings
Select the source MySQL binlog and the destination BigQuery, then create the transfer settings.
* In "STEP2 Data Preview / Advanced Settings", be sure to press "Save BigQuery column settings" to save the settings.
2. Initial full transfer
For the first run only, all records in the MySQL table are transferred to BigQuery.
On the New Job Execution screen, check "Transfer all records" and run the job.
3. Schedule incremental transfers
Register a schedule on the schedule settings screen.
Jobs start at the configured frequency and capture the deltas.
When a full transfer is required
A full transfer is required in the following cases. If you encounter any other errors, please contact TROCCO support.
- The first transfer
- A type conversion runtime error occurs on BigQuery
- You want to delete the data completely instead of soft-deleting it
- Delete the transferred table before running the job.
- When executing SQL described in "Supported source MySQL SQL"
Technical Overview
target table: the table that will eventually contain the data
1. TROCCO reads the MySQL binlog and transfers the binlog of the configured tables to BigQuery.
2. Because the binlog data contains duplicates, a set of the latest record per Primary Key is created.
3. The table created in step 2 is merged with the already-transferred table.
4. The merged table from step 3 replaces the target table.
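The steps above can be sketched with in-memory dicts standing in for the BigQuery tables (the seq field stands in for binlog ordering; all names are illustrative, not TROCCO's implementation):

```python
# Sketch of steps 2-4: dedupe binlog rows to the latest per Primary Key,
# then merge into the already-transferred table and replace the target.
def latest_per_pk(binlog_rows: list) -> dict:
    """Step 2: keep only the most recent row for each Primary Key."""
    latest = {}
    for row in sorted(binlog_rows, key=lambda r: r["seq"]):
        latest[row["pk"]] = row  # later rows overwrite earlier ones
    return latest

def merge_and_replace(target: dict, latest: dict) -> dict:
    """Steps 3-4: merge deduped rows into the existing table; changed rows win."""
    return {**target, **latest}

target = {1: {"pk": 1, "seq": 0, "name": "old"}}
binlog = [{"pk": 1, "seq": 1, "name": "mid"},
          {"pk": 1, "seq": 2, "name": "new"},
          {"pk": 2, "seq": 3, "name": "added"}]
target = merge_and_replace(target, latest_per_pk(binlog))
```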