MySQL to BigQuery Transfer (CDC)
  • 07 Dec 2022

Note

This is a machine-translated version of the original Japanese article.
Please understand that some of the information contained on this page may be inaccurate.

Summary

Using the CDC (Change Data Capture) method based on the MySQL binlog, only changed data is identified and transferred to BigQuery.
This reduces the transfer volume and shortens transfer time.
The first run must transfer all records; after that, each transfer links only the differences using the CDC method.

Prerequisites

Supported MySQL databases

| Database | Master | Read Replica | Supported Versions |
|---|---|---|---|
| Amazon Aurora MySQL RDS | ✓ | ✗ | 5.6.2+ |
| Amazon MySQL RDS | ✓ | ✓ | 5.6.2+ |
| Google CloudSQL MySQL | ✓ | ✗ | 5.6.2+ |
| MariaDB | ✓ | ✓ | 5.6.2+ |
| MySQL | ✓ | ✓ | 5.6.2+ |

MySQL parameter settings

[mysqld]
binlog_format=ROW
binlog_row_image=FULL
expire_logs_days=7
binlog_expire_logs_seconds=604800
log_bin=mysql-binlog
log_slave_updates=1

Notes on each parameter:

  • binlog_format=ROW — CDC requires row-based logging.
  • binlog_row_image=FULL — every column of a changed row must be recorded in the binlog.
  • expire_logs_days=7 / binlog_expire_logs_seconds=604800 — retain the binlog long enough (here, 7 days) to cover the interval between transfer jobs. binlog_expire_logs_seconds replaces expire_logs_days in MySQL 8.0 and later.
  • log_bin=mysql-binlog — enables binary logging; the value is the base name of the binlog files.
  • log_slave_updates=1 — required when reading the binlog from a read replica, so that changes replicated from the master are also written to the replica's binlog.
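To confirm that the settings are in effect, the corresponding server variables can be checked, for example:

```sql
-- Check that binary logging is enabled and configured for row-based CDC
SHOW VARIABLES LIKE 'log_bin';             -- expect ON
SHOW VARIABLES LIKE 'binlog_format';       -- expect ROW
SHOW VARIABLES LIKE 'binlog_row_image';    -- expect FULL
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';
```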

Permissions for the connecting user

The following three permissions are required.

SELECT

  • Reading data
  • Referencing table schema information

Example:

GRANT SELECT ON `<database_name>`.`<table_name>` TO `<username>`@'%';

REPLICATION CLIENT, REPLICATION SLAVE

Example:

GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO `<username>`@'%';
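After granting, the privileges held by the connecting user can be verified (substitute the actual user name):

```sql
-- List the privileges granted to the connecting user
SHOW GRANTS FOR `<username>`@'%';
```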

Primary Key of the source table

The transfer source table must have a primary key.
Composite (multi-column) primary keys are also supported.
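One way to find tables that lack a primary key is to query information_schema on the source MySQL (a sketch; replace <database_name> with the actual schema name):

```sql
-- Tables in the target database that have no PRIMARY KEY
SELECT t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
  ON c.table_schema = t.table_schema
 AND c.table_name = t.table_name
 AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_schema = '<database_name>'
  AND t.table_type = 'BASE TABLE'
  AND c.constraint_name IS NULL;
```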

Conversion function during transfer

CDC transfer from MySQL to BigQuery supports the following conversion functions.
Not all of TROCCO's standard transfer conversion functions are available; support will be expanded over time.

  • Dropping Columns
  • Column data conversion
  • Rename a column

Column data conversions are performed with CAST in BigQuery, so they follow BigQuery's CAST rules.
See the following for details.
https://cloud.google.com/bigquery/docs/reference/standard-sql/conversion_rules?hl=ja
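For reference, BigQuery's CAST rules behave as follows (illustrative queries, not TROCCO-specific):

```sql
-- Valid: STRING to INT64 when the string is a well-formed integer
SELECT CAST('123' AS INT64);        -- 123
-- An invalid CAST raises a runtime error; SAFE_CAST returns NULL instead
SELECT SAFE_CAST('abc' AS INT64);   -- NULL
```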

Metadata

A column named _trocco_fetched_at is added to the data linked to the DWH.
It holds the time at which the transfer was initiated by the TROCCO transfer job.
Because this column is added automatically, the string _trocco_fetched_at cannot be used as a column name in the source table.
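On the BigQuery side, this metadata column can be used, for example, to inspect the rows picked up by the most recent transfer (a sketch; the dataset and table names are placeholders):

```sql
-- Rows written by the most recent transfer job
SELECT *
FROM `<dataset>.<table>`
WHERE _trocco_fetched_at = (
  SELECT MAX(_trocco_fetched_at) FROM `<dataset>.<table>`
);
```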

Supported SQL on the source MySQL

  • INSERT
  • UPDATE
  • DELETE
    • Deleted data is soft-deleted on the DWH side.

Limited SQL

  • ALTER TABLE ... RENAME COLUMN old_col_name TO new_col_name
    If schema tracking is enabled, the data is linked under the new name.
    Columns that have already been linked are not deleted; their values become NULL.
  • ALTER TABLE ... ADD col_name data_type
    Transferred automatically only when schema tracking is enabled.
    When adding a column, configure it on the management screen before adding it in MySQL.
    If data is written to the column after it is added but before it is configured, some data may be missed.
  • ALTER TABLE ... DROP col_name
    Data already linked to the DWH is not deleted; after the drop, NULL is entered.
  • ALTER TABLE ... MODIFY col_name column_definition
    If the new type is one that can be converted, transfers continue.
    If an error occurs, a full transfer of all records is required.

Unsupported SQL

The SQL statements below cannot be captured from the binlog, so data changed by them is not transferred correctly to the DWH.

  • TRUNCATE
  • DROP
  • LOAD
  • RENAME
  • Cascading DELETEs
  • Cascading UPDATEs
  • ALTER TABLE ... SET DEFAULT
    If already-linked data is NULL, it remains NULL and is not updated.
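Where possible, statements that generate row events can be used in place of the unsupported ones. For example, TRUNCATE is logged as a statement rather than as row events even in ROW format, while DELETE produces one row event per deleted row:

```sql
-- Unlike TRUNCATE, this DELETE is captured in the row-based binlog,
-- so the deletions are propagated to the DWH as soft deletes.
DELETE FROM `<database_name>`.`<table_name>`;
```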

MySQL and BigQuery schema types

With the data type conversion function, convertible data types can be cast to a specified type.
By default, the following mappings apply. Binary data is converted to hexadecimal and treated as STRING.

| MySQL data types | BigQuery data type |
|---|---|
| BINARY, BLOB, LONGBLOB, MEDIUMBLOB, TINYBLOB, VARBINARY, CHAR, ENUM, LONGTEXT, MEDIUMTEXT, SET, TIME, TINYTEXT, VARCHAR | STRING |
| BIT, TINYINT | BOOLEAN |
| BIGINT, INT, MEDIUMINT, SMALLINT, YEAR | INTEGER |
| DECIMAL | NUMERIC |
| DOUBLE, FLOAT | FLOAT |
| DATE | DATE |
| TIMESTAMP, DATETIME | TIMESTAMP |
| GEOMETRY, GEOMETRYCOLLECTION, JSON, LINESTRING, MULTILINESTRING, MULTIPOINT, MULTIPOLYGON, POINT, POLYGON | STRING (STRUCT is not currently supported.) |

CURVE, MULTICURVE, MULTISURFACE, and SURFACE are not supported.

Setup flow

1. Create forwarding settings

Select MySQL binlog as the source and BigQuery as the destination, and create the transfer settings.

* In "STEP 2: Data Preview / Advanced Settings", be sure to press "Save BigQuery column settings" to save the settings.

2. Initial full transfer

For the first run only, all records in the MySQL table are transferred to BigQuery.
On the Execute New Job screen, check "Transfer all records" and run the job.

3. Schedule incremental transfers

Register a schedule from the schedule settings screen.
Jobs then start at the configured frequency and capture the deltas.

When a full transfer is required

A full transfer of all records is required in the following cases. If you encounter any other errors, please contact TROCCO support.

  • The first transfer
  • A type conversion runtime error occurs on BigQuery
  • You want to erase all data instead of soft-deleting it
    Delete the transferred table before running.
  • After executing one of the statements listed under "Unsupported SQL"

Technical Overview

CDC Technical Overview
Target table: the table that will eventually contain the data

  1. TROCCO reads the MySQL binlog and forwards the binlog entries of the configured table to BigQuery.

  2. Because the binlog data contains duplicate records, a set holding only the latest version of each record is created.

  3. The table created in step 2 is merged with the already-linked table.

  4. The target table is replaced with the merged table from step 3.
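Steps 2 and 3 can be sketched in BigQuery SQL. This is a minimal illustration, not TROCCO's actual implementation: the staging and dataset names, the id primary-key column, the col1 data column, and the binlog_position ordering column are all assumed for the example.

```sql
-- Step 2: keep only the latest binlog record per primary key
CREATE OR REPLACE TABLE staging.latest AS
SELECT * EXCEPT (rn)
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY id
           ORDER BY binlog_position DESC
         ) AS rn
  FROM staging.binlog_events
)
WHERE rn = 1;

-- Step 3: merge the deduplicated changes into the already-linked table
MERGE dataset.target t
USING staging.latest s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET col1 = s.col1
WHEN NOT MATCHED THEN INSERT (id, col1) VALUES (s.id, s.col1);
```

A real pipeline would additionally carry a deletion flag from the binlog events in order to implement the soft delete described above.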

