MySQL to BigQuery Transfer (CDC)
  • 17 Jul 2024
  • PDF

MySQL to BigQuery Transfer (CDC)

  • PDF

Article summary

summary

Using the Change Data Capture (CDC) method from MySQL's binlog, only data that has changed can be identified and transferred to BigQuery.
This will result in reduced transfer volume and faster transfers.
All cases must be transferred on the first run. Then, in the transfer, only the differences are linked using the CDC method.

essential condition

MySQL Support Table

masterRead ReplicaSupported Versions
Amazon Aurora MySQL RDSsymbol used as a placeholder (either because a number of other words could be used in that position or because of censorship)an unknown5.6.2+.
Amazon MySQL RDSsymbol used as a placeholder (either because a number of other words could be used in that position or because of censorship)symbol used as a placeholder (either because a number of other words could be used in that position or because of censorship)5.6.2+.
Google CloudSQL MySQLsymbol used as a placeholder (either because a number of other words could be used in that position or because of censorship)an unknown5.6.2+.
MariaDBsymbol used as a placeholder (either because a number of other words could be used in that position or because of censorship)symbol used as a placeholder (either because a number of other words could be used in that position or because of censorship)5.6.2+.
MySQLsymbol used as a placeholder (either because a number of other words could be used in that position or because of censorship)symbol used as a placeholder (either because a number of other words could be used in that position or because of censorship)5.6.2+.

MySQL Parameter Settings

[mysqld]
binlog_format=ROW
binlog_row_image=FULL
expire_logs_days=7
binlog_expire_logs_seconds=604800
log_bin=mysql-binlog
log_slave_updates=1

The supplemental explanation is as follows
* DB may need to be restarted for binlog settings to take effect.
* Column: binlog_expire_logs_seconds
MySQL 8.0.1 or later must be specified.
Seven days is specified, but can be shortened.
It must be sufficiently longer than the frequency of each transfer.
* Column: expire_logs_days
Set if MySQL 8.0.0 or lower.
Seven days is specified, but can be shortened.
It must be sufficiently longer than the frequency of each transfer.
* Column: log_bin
You are free to change it as you wish.
* Column: log_slave_updates
Set when retrieving binlog from Read Replica.

Authority of connecting users

The following three authorizations are required

SELECT

  • Get Data
  • Table schema information reference

Example: The

GRANT SELECT ON `<database_name>`. `<table_name>` to `<username>``@'%'`;

REPLICATION CLIENT, REPLICATION SLAVE

Example: The

GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *. * TO `<username>`@'%';

Primary Key of the source table

Primary Key must be set for the source table.
Multiple Primary Keys (Composite Primary Key) are also supported.

Conversion function during transfer

MySQL to BigQuery CDC transfer supports the following
* Drop columns
* Column data conversion
* Change column name

Data transformation of columns is CAST in BigQuery. So we follow BigQuery's CAST rules.
For more information, please see below.
https://cloud.google.com/bigquery/docs/reference/standard-sql/conversion_rules?hl=ja

metadata

A column named _trocco_fetched_at is added and linked to the DWH.
This column has as its value the time the transfer was initiated by TROCCO's transfer job.
The string _trocco_fetched_at cannot be used as the column name due to the convenience of adding this column.

SQL support for source MySQL

  • INSERT
  • UPDATE.
  • DELETE
    • Deleted data will be logically deleted on the DWH side.

SQL with limitations

  • ALTER TABLE ... RENAME COLUMN old_col_name TO new_col_name
    If schema tracking is enabled, data will be linked with the changed name.
    Columns that are already linked will not be deleted and will have a value of NULL.
  • ALTER TABLE ... ADD col_name date_type
    Only when schema tracking is enabled, the transfer is automatic.
    When adding columns, please set them in the admin panel before adding MySQL columns.
    If the columns are already linked after they are added, data may be missing.
  • ALTER TABLE ... DROP col_name
    Data linked to the DWH will not be deleted and will be NULL after deletion.
  • ALTER TABLE ... MODIFY col_name column_definition
    If the modified type is type-changeable, it can continue to be transferred.
    If an error occurs, the entire case must be transferred.

Unsupported SQL

The following SQL cannot be caught by binlog, so the data will not be transferred correctly to the DWH.

  • TRUNCATE
  • DROP
  • LOAD
  • RENAME
  • CASCADING DELETES
  • CASCADING UPDATES
  • ALTER TABLE ... SET DEFAULT
    If the data is already linked and is NULL, it will remain NULL and will not be updated.

MySQL and BigQuery schema types

The data type conversion function allows you to convert data types that can be changed to the type you specify.
The default response is as follows Binary data is converted to hexadecimal and treated as a STRING.

MySQL Data TypesBigQuery Data Types
BINARY
BLOB
LONGBLOB
MEDIUMBLOB
TINYBLOB.
VARBINARY
CHAR
ENUM
LONGTEXT
MEDIUMTEXT
SET
TIME
TINYTEXT
VARCHAR
STRING
BIT
TINYINT
BOOLEAN
BIGINT
INT
MEDIUMINT
SMALLINT
YEAR
INTEGER
DECIMAL
NUMERIC
DOUBLE
FLOAT
FLOAT
DATEDATE
TIMESTAMP
DATETIME
TIMESTAMP
GEOMETRY
GEOMETRYCOLLECTION
JSON
LINESTRING
multilinestring
MULTIPOINT
Multipolygon
POINT
POLYGON
STRING (STRUCT is currently not supported.)
CURVE
multicurve
MULTISURFACE
SURFACE Not supported.

Configuration Flow

1. create transfer settings

Select the source MySQL binlog and destination BigQuery and create the transfer settings.
image.png

*Be sure to save the settings by pressing "Save BigQuery Column Settings" in "STEP2 Data Preview and Detailed Settings".

2. first time all transfers

For the first time only, all transfers are performed and all current MySQL tables are transferred to BigQuery.
On the "Execute New Job" screen, check the " Transfer All " checkbox and execute the job.
image.png

3. setting up a schedule for differential transfer

Register a schedule from the Schedule Setup screen.
The job is started at the set frequency and the differences are captured.

If you need to forward all cases

All transfers are required in the following cases

  • Initial Transfer
  • Runtime error on type conversion on BigQuery.
  • I want to erase all data, not logical deletion.
    Delete the transferred table before executing.
  • If you execute the query described in "SQL support for source MySQL

technical overview

Overview of the technical aspects of CDC
target table: the table where the final data will be placed.
TROCCO reads the MySQL binlog and transfers the binlog of the configured table onto the BQ.
image.png

  1. With binlog data, there are duplicates, so we create a set of the most recent records for each record.
    image.png

  2. Merge the tables created in step 2 with the tables already linked.
    image.png

  3. Replace the table merged in 3 with the target table.
    image.png


Was this article helpful?