Data Destination - Google BigQuery


Summary

This help page describes the Data Destination settings for ETL Configurations that transfer data to Google Cloud's BigQuery.

About Google BigQuery and TROCCO: application case studies

TROCCO's product website provides examples of customer implementations.
Please refer to that page together with this help page.

Constraints

Setting items

STEP1 Basic settings

| Item | Required | Default value | Description |
| --- | --- | --- | --- |
| Google BigQuery Connection Configuration | Yes | - | From the pre-registered Google BigQuery Connection Configurations, select one that has the permissions required by this ETL Configuration. |
| Dataset | Yes | - | Select the name of the Data Destination dataset. |
| Table | Yes | - | Select the name of the Data Destination table. If the target table does not exist in the Data Destination dataset, it will be created automatically. |
| Dataset location | Yes | US (United States) | Specifies the location of the Data Destination dataset. To specify the Tokyo region, enter asia-northeast1. For the locations that can be specified, see the official BigQuery documentation - BigQuery Locations. |
| Automatic dataset creation option | Yes | Not created | If the specified dataset does not exist under the target project, it will be created automatically. |
| Transfer mode | Yes | Append (APPEND) | Select the transfer mode. For details, see About transfer mode below. |
Naming constraints on datasets, tables, and columns

Datasets, tables, and columns each have their own naming constraints.
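When the automatic dataset creation option is enabled, the dataset is created in the specified location if it does not already exist. The following is a minimal sketch of the equivalent operation using the google-cloud-bigquery Python client; the project and dataset names are hypothetical, and this is not TROCCO's internal implementation.

```python
from google.cloud import bigquery

# Hypothetical project and dataset names.
client = bigquery.Client(project="my-project")

dataset = bigquery.Dataset("my-project.my_dataset")
dataset.location = "asia-northeast1"  # Tokyo region; the default is US

# exists_ok=True creates the dataset only if it is missing,
# mirroring the automatic dataset creation option above.
client.create_dataset(dataset, exists_ok=True)
```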

STEP2 Output Option

| Item | Description |
| --- | --- |
| Column Setting | See Column Setting below. |
| Table that references schema information as a template | When creating a new table in BigQuery, the schema of a table that already exists in BigQuery can be reused. In this case, enter the name of a table under the dataset specified in ETL Configuration STEP1 (see the sketch after this table). |
| Partitioning | See Partitioning and clustering settings below. |
| Clustering | See Partitioning and clustering settings below. |
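As a point of reference, reusing an existing table's schema corresponds to reading the template table and creating the new table with the same schema. A minimal sketch with the google-cloud-bigquery Python client, using hypothetical table names:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical names; the template table lives under the STEP1 dataset.
template = client.get_table("my_project.my_dataset.template_table")
new_table = bigquery.Table("my_project.my_dataset.new_table",
                           schema=template.schema)
client.create_table(new_table)
```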
Conditions under which schema-related settings apply

The following settings in the STEP2 Output Option apply only when a new table is created.

  • Column Setting
  • Table that references schema information as a template
  • Partitioning
  • Clustering

Specifically, these settings are applied when a job runs under either of the following conditions:

  • The target table does not exist in the Data Destination
  • Full Data Transfer (REPLACE) is selected as the transfer mode
    • In this case, the schema of the Data Destination table is recreated on every transfer, so the above settings are applied each time.

Supplementary information

About transfer mode

All of the following explanations assume that the target table already exists in the Data Destination.
If the target table does not exist, a new table is created regardless of the selected mode.

| Transfer mode | Schema change | Transaction | Behavior |
| --- | --- | --- | --- |
| Append (APPEND) | No | Yes | Data is appended to the table as follows: 1. Create a temporary table and load the data into it. 2. WRITE_APPEND the temporary table's data to the target table. |
| Append (APPEND DIRECT) | No | No | Data is appended to the table as follows: 1. WRITE_APPEND the data directly to the target table. Processing time is shorter than in Append (APPEND) mode, but no transaction is used, so if a transfer fails midway, rows added up to that point may remain in the target table. |
| Full Data Transfer (DELETE IN ADVANCE) | No | No | The table is rewritten as follows: 1. DELETE the existing data in the target table. 2. WRITE_APPEND the data to the target table. Transactions are not used. |
| Full Data Transfer (REPLACE) | Yes | Yes | The table is rewritten as follows: 1. Create a temporary table and load the data into it. 2. WRITE_TRUNCATE the temporary table's data to the target table. The existing table's schema is dropped and replaced with that of the new table. Running a job from a partitioned ETL Configuration against a non-partitioned target table results in a job error. |
| UPSERT (MERGE) | No | Yes | An UPSERT is performed on the table as follows: 1. Create a temporary table and insert the data into it. 2. MERGE the temporary table's data into the target table. Rows in the target table that match the temporary table's merge key values are updated; non-matching rows are inserted. |
Difference between WRITE_APPEND and WRITE_TRUNCATE

See the BigQuery API Reference - the Job resource's writeDisposition field.
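As a concrete illustration, the sketch below shows how the two write dispositions and an UPSERT-style MERGE look with the google-cloud-bigquery Python client. The table names, merge key, and columns are hypothetical, and this is not TROCCO's internal implementation.

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my_project.my_dataset.target_table"  # hypothetical target

# WRITE_APPEND: loaded rows are added to the existing table contents.
append_config = bigquery.LoadJobConfig(
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

# WRITE_TRUNCATE: existing rows and schema are replaced by the loaded data.
replace_config = bigquery.LoadJobConfig(
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

# Example load (the source URI is hypothetical):
# client.load_table_from_uri("gs://bucket/data.csv", table_id,
#                            job_config=append_config).result()

# UPSERT (MERGE): update rows matching the merge key, insert the rest.
merge_sql = """
MERGE `my_project.my_dataset.target_table` AS t
USING `my_project.my_dataset.tmp_table` AS s
ON t.id = s.id  -- hypothetical merge key
WHEN MATCHED THEN UPDATE SET value = s.value
WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value)
"""
client.query(merge_sql).result()
```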

Column Setting

You can configure detailed settings for each column.
For more information, see the official BigQuery documentation - Specifying a Schema.

| Item | Description |
| --- | --- |
| Column name | Select from the list of column names in ETL Configuration STEP2 Column Setting. |
| Data type | Set this if you want to explicitly specify the column's data type. |
| Mode | Set this if you want to explicitly specify the column's mode. |
| Date format | If a column defined as timestamp type in ETL Configuration STEP2 Column Setting is transferred as STRING type, you can enter the date format to use during conversion. |
| Time zone | If a column defined as timestamp type in ETL Configuration STEP2 Column Setting is transferred as STRING type or DATETIME type, you can enter the time zone to use during conversion. |
| Description | You can enter a column description. Even when a table is referenced as a schema template, the value entered here takes precedence. |
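For reference, these per-column settings map naturally onto BigQuery schema fields. A minimal sketch with the google-cloud-bigquery Python client, using hypothetical column names:

```python
from google.cloud import bigquery

# Each entry corresponds to one row of the Column Setting table above.
schema = [
    bigquery.SchemaField(
        "created_at",       # column name (hypothetical)
        "TIMESTAMP",        # data type
        mode="NULLABLE",    # mode: NULLABLE, REQUIRED, or REPEATED
        description="Record creation time",  # column description
    ),
    bigquery.SchemaField("id", "INTEGER", mode="REQUIRED"),
]
table = bigquery.Table("my_project.my_dataset.new_table", schema=schema)
```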

Partitioning and clustering settings

For more information on partitioning and clustering, see Overview of Partitioned Tables and Overview of Clustered Tables, respectively.

| Item | Required | Description |
| --- | --- | --- |
| Partitioning | | Choose one of the following. Partition by ingestion time: partitioning is based on the job execution time in TROCCO. Partition by field: partitioning is based on a reference column. |
| Partition field | | Enter this when Partition by field is selected. Enter the name of a column of type DATE, TIMESTAMP, or DATETIME. |
| Partition type | | Select this when either partitioning method is selected. Choose the partition granularity: hourly, daily, monthly, or yearly. |
| Clustering | - | Set this if you want to create a clustered table. Entering a column name in the clustering field clusters the table on that column. Up to four clustering columns can be specified. |
| Partition expiration | - | You can specify an expiration period for partitions. For more information, see Setting Partition Expiration Dates. |
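For reference, here is a minimal sketch of how the same partitioning, clustering, and expiration settings look with the google-cloud-bigquery Python client; the table and column names are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()

table = bigquery.Table("my_project.my_dataset.events")  # hypothetical
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,  # HOUR, DAY, MONTH, or YEAR
    field="created_at",            # omit for ingestion-time partitioning
    expiration_ms=90 * 24 * 60 * 60 * 1000,  # partition expiration: 90 days
)
table.clustering_fields = ["customer_id", "country"]  # up to four columns
client.create_table(table)
```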
Conditions for applying partitioning and clustering settings

Partitioning and clustering settings take effect only when a table is newly created.
If a table already exists at the output destination, the job runs with the existing table's settings, not the contents of these settings.

Partition boundaries with Partition by ingestion time

Due to Google BigQuery specifications, partition boundaries are based on UTC time. Please keep this in mind.
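For example, with daily partitions, a job that runs early in the morning JST lands in the previous day's partition, because the UTC date is still the day before. A quick check in Python (the timestamp is hypothetical):

```python
from datetime import datetime, timezone, timedelta

jst = timezone(timedelta(hours=9))
run_at = datetime(2024, 4, 1, 8, 0, tzinfo=jst)  # job runs 08:00 JST, Apr 1

# Daily partitions are keyed by the UTC date: 2024-03-31, not 2024-04-01.
print(run_at.astimezone(timezone.utc).date())  # -> 2024-03-31
```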

Integer range partitioning

TROCCO does not support integer range partitioning, which partitions a table based on the value of a particular INTEGER column.

Required permissions

The following permissions are required to use this Data Destination.

  • bigquery.datasets.create
  • bigquery.datasets.get
  • bigquery.jobs.create
  • bigquery.tables.create
  • bigquery.tables.delete
  • bigquery.tables.export
  • bigquery.tables.get
  • bigquery.tables.getData
  • bigquery.tables.list
  • bigquery.tables.update
  • bigquery.tables.updateData