schema tracking
  • 17 Jul 2024
  • PDF

schema tracking

  • PDF

Article summary

summary

Schema tracking is a function that automatically corrects the schema of the destination connector table to resolve schema differences between the "data to be transferred" and the "destination connector table" when such differences occur.

Cases where schema differences occur

The following is one typical example of a schema diff.

  1. Create a forwarding configuration
  2. Transfer job is executed and the table is transferred to Google BigQuery side.
  3. The schema of the data on the source side is changed (e.g., columns are added or deleted, column names or types are changed, etc.)
  4. Edit transfer settings and set automatic data settings when transitioning to STEP 2.
    -> The schema of the data on the source side is reflected in the column definitions in STEP 2 of the transfer setup.

If a schema difference has occurred due to the above example or other reasons, the schema of the table on the destination connector side must be corrected before executing the transfer job again. (*Fixing is not required in some transfer modes.)
However, by using schema tracking, schema differences can be resolved automatically.

Difference detected by schema tracking

The following differences are detected
- Adding columns
- Deleting columns
- Change column type

Target connector

Schema tracking is available at the following destination connectors

  • Destination Google BigQuery
  • Forwarding Destination Snowflake

The source connector can be used without any special restrictions.

Usage Procedure

In the following, the forwarding destination Google BigQuery is used as an example.
The assumption is that the procedure starts with "Edit the existing transfer settings and the column definitions in Transfer Settings STEP2 have been changed.

  1. Click Save and Apply in Transfer Settings STEP 3.
    The following screen will appear
    schema-sync_001.png

  2. Specify the behavior of each column that is subject to change, with BigQuery also selected to reflect and update.

Behavior of each column that can be specified

For each added column, you can specify the value to be filled in the existing record from the following
- Fill with NULL
- Fill with default values (literals)
- Fill with literal values (numeric, string, TRUE/FALSE, etc. ) for each type.
- Fill with default value (expression)
- SQL expressions can be written and filled in.
- You can also use functions such as CURRENT_TIMESTAMP() or use values from other columns by specifying other column names in the expression.

Note that schema tracking will fail if a value that is not supported by the column type is specified. Please note

  1. Click Save and Apply.
    Moves to the detailed screen for transfer settings. Schema tracking is performed automatically.
    schema-sync_002.png
if schema tracking fails

You can re-execute the transfer on the Transfer Settings Details screen.
However, schema tracking will not succeed in the cases described in the Usage Restrictions below. Please understand that we are not responsible for any loss or damage to your property.

Usage Restrictions

Restrictions common to each destination connector

  • Schema tracking cannot be used if custom variables are embedded in table names, etc. in the destination settings.
  • Columns added, deleted, or changed by the following settings are not subject to schema tracking.
  • If the schema of a table in the destination connector is changed directly without going through TROCCO, schema tracking may not work properly.
if column name is changed

In schema tracking, if a column name is changed, it is considered "a column is deleted and another column is added.
Therefore, the values contained in the existing columns will be overwritten with the values specified in the transfer setup STEP 3 modal. Please note

Regarding the settings that schema tracking takes over

When schema tracking is performed, the settings that can be configured in TROCCO are carried over to the tracked table.
For example, the following settings, which can be configured in STEP2>Output Options of the destination Google BigQuery, will be inherited.
- Column Description
- Partitioning Table
- Duration of partition

On the other hand, if various settings that cannot be set in TROCCO are set directly on the destination connector side, the settings may be lost during schema tracking. Please note

Forwarding destination Google BigQuery

  • Schema tracking is available only when the following transfer modes are selected
    • Postscript (APPEND )
    • Postscript (APPEND DIRECT )
    • All cases washed (DELETE IN ADVANCE )
  • Schema tracking cannot be used if any of the following apply
    • If you are using a table that refers to the schema information in the Transfer Settings STEP2 Output Options****as a template
    • If the destination table contains a column of type RECORD

For forwarding destination Snowflake

  • Schema tracking is available only when the following transfer modes are selected
    • Append (INSERT )
    • Postscript (INSERT DIRECT)
    • UPSERT``(MERGE )
スキーマ追従後に転送ジョブに失敗する場合

At the destination Snowflake, if the column order does not match between the "table definition on the Snowflake side" and the "column definition in TROCCO's Transfer Settings STEP2", the transfer job will fail.

In addition, columns newly added by schema tracking or columns whose names or column types have been changed will have their ORDINAL order changed to the last column in the table definition on the Snowflake side.

Therefore, if a transfer job error occurs after executing schema tracking, please refer to the table definition on the Snowflake side and then replace the order of each column in TROCCO's column definition.

schema-sync_003.png

supplementary information

  • Schema tracking detects schema differences based on information from TROCCO's transfer settings.
  • Schema tracking is available regardless of whether or not features such ascolumn definition reloading andschema change detection are used.
  • Schema tracking does not detect changes in column Descriptions or output options for transfer settings (except for changes in data types).
  • For the forwarded Google BigQuery, when schema tracking is performed, the full amount of SELECT statement queries will be executed. Please be aware that this will cause scanning on the Google BigQuery side.

Was this article helpful?