Data Destination - Google BigQuery


Summary

This help page describes the Data Destination settings for ETL Configurations that transfer data to Google Cloud's BigQuery.

About Google BigQuery and TROCCO: application case studies

TROCCO's product website provides examples of customer implementations.
Please refer to that page together with this help page.

Constraints

Setting items

STEP1 Basic settings

| Item | Required | Default value | Description |
| --- | --- | --- | --- |
| Google BigQuery Connection Configuration | Yes | - | From the pre-registered Google BigQuery Connection Configurations, select one that has the permissions required by this ETL Configuration. |
| Dataset | Yes | - | Select the name of the Data Destination dataset. |
| Table | Yes | - | Select the name of the Data Destination table. If the target table does not exist in the Data Destination dataset, it will be created automatically. |
| Dataset location | Yes | US (United States) | Specifies the location of the Data Destination dataset. To specify the Tokyo region, enter asia-northeast1. For the locations that can be specified, see the official BigQuery documentation - BigQuery Locations. |
| Automatic dataset creation option | Yes | Not created | If the specified dataset does not exist under the target project, it will be created automatically. |
| Transfer mode | Yes | Append (APPEND) | Select the transfer mode. For details, see About transfer mode below. |
Naming constraints on datasets, tables, and columns

Datasets, tables, and columns each have their own naming constraints.
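When the automatic dataset creation option is enabled, the dataset is created in the specified location if it does not already exist. The following is a minimal sketch of the equivalent operation using the google-cloud-bigquery Python client; the project and dataset names are hypothetical, and this is not TROCCO's internal implementation.

```python
from google.cloud import bigquery

# Hypothetical project and dataset names.
client = bigquery.Client(project="my-project")

dataset = bigquery.Dataset("my-project.my_dataset")
dataset.location = "asia-northeast1"  # Tokyo region; the default is US

# exists_ok=True creates the dataset only if it is missing,
# mirroring the automatic dataset creation option above.
client.create_dataset(dataset, exists_ok=True)
```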

STEP2 Output Option

| Item | Description |
| --- | --- |
| Column Setting | See Column Setting below. |
| Table that references schema information as a template | When creating a new table in BigQuery, the schema of a table that already exists in BigQuery can be reused. In this case, enter the name of a table under the dataset specified in ETL Configuration STEP1 (see the sketch after this table). |
| Partitioning | See Partitioning and clustering settings below. |
| Clustering | See Partitioning and clustering settings below. |
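As a point of reference, reusing an existing table's schema corresponds to reading the template table and creating the new table with the same schema. A minimal sketch with the google-cloud-bigquery Python client, using hypothetical table names:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical names; the template table lives under the STEP1 dataset.
template = client.get_table("my_project.my_dataset.template_table")
new_table = bigquery.Table("my_project.my_dataset.new_table",
                           schema=template.schema)
client.create_table(new_table)
```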
Conditions under which schema-related settings apply

The following settings in the STEP2 Output Option apply only when a new table is created.

  • Column Setting
  • Table that references schema information as a template
  • Partitioning
  • Clustering

Specifically, these settings are applied when a job runs under either of the following conditions:

  • The target table does not exist in the Data Destination
  • Full Data Transfer (REPLACE) is selected as the transfer mode
    • In this case, the schema of the Data Destination table is recreated on every transfer, so the above settings are applied each time.

Supplementary information

About transfer mode

All of the following explanations assume that the target table already exists in the Data Destination.
If the target table does not exist, a new table is created regardless of the selected mode.

| Transfer mode | Schema change | Transaction | Behavior |
| --- | --- | --- | --- |
| Append (APPEND) | No | Yes | Data is appended to the table as follows: 1. Create a temporary table and load the data into it. 2. WRITE_APPEND the temporary table's data to the target table. |
| Append (APPEND DIRECT) | No | No | Data is appended to the table as follows: 1. WRITE_APPEND the data directly to the target table. Processing time is shorter than in Append (APPEND) mode, but no transaction is used, so if a transfer fails midway, rows added up to that point may remain in the target table. |
| Full Data Transfer (DELETE IN ADVANCE) | No | No | The table is rewritten as follows: 1. DELETE the existing data in the target table. 2. WRITE_APPEND the data to the target table. Transactions are not used. |
| Full Data Transfer (REPLACE) | Yes | Yes | The table is rewritten as follows: 1. Create a temporary table and load the data into it. 2. WRITE_TRUNCATE the temporary table's data to the target table. The existing table's schema is dropped and replaced with that of the new table. Running a job from a partitioned ETL Configuration against a non-partitioned target table results in a job error. |
| UPSERT (MERGE) | No | Yes | An UPSERT is performed on the table as follows: 1. Create a temporary table and insert the data into it. 2. MERGE the temporary table's data into the target table. Rows in the target table that match the temporary table's merge key values are updated; non-matching rows are inserted. |
Difference between WRITE_APPEND and WRITE_TRUNCATE

See the BigQuery API Reference - the Job resource's writeDisposition field.
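As a concrete illustration, the sketch below shows how the two write dispositions and an UPSERT-style MERGE look with the google-cloud-bigquery Python client. The table names, merge key, and columns are hypothetical, and this is not TROCCO's internal implementation.

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my_project.my_dataset.target_table"  # hypothetical target

# WRITE_APPEND: loaded rows are added to the existing table contents.
append_config = bigquery.LoadJobConfig(
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

# WRITE_TRUNCATE: existing rows and schema are replaced by the loaded data.
replace_config = bigquery.LoadJobConfig(
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

# Example load (the source URI is hypothetical):
# client.load_table_from_uri("gs://bucket/data.csv", table_id,
#                            job_config=append_config).result()

# UPSERT (MERGE): update rows matching the merge key, insert the rest.
merge_sql = """
MERGE `my_project.my_dataset.target_table` AS t
USING `my_project.my_dataset.tmp_table` AS s
ON t.id = s.id  -- hypothetical merge key
WHEN MATCHED THEN UPDATE SET value = s.value
WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value)
"""
client.query(merge_sql).result()
```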

Column Setting

You can configure detailed settings for each column.
For more information, see the official BigQuery documentation - Specifying a Schema.

| Item | Description |
| --- | --- |
| Column name | Select from the list of column names in ETL Configuration STEP2 Column Setting. |
| Data type | Set this if you want to explicitly specify the column's data type. |
| Mode | Set this if you want to explicitly specify the column's mode. |
| Date format | If a column defined as timestamp type in ETL Configuration STEP2 Column Setting is transferred as STRING type, you can enter the date format to use during conversion. |
| Time zone | If a column defined as timestamp type in ETL Configuration STEP2 Column Setting is transferred as STRING type or DATETIME type, you can enter the time zone to use during conversion. |
| Description | You can enter a column description. Even when a table is referenced as a schema template, the value entered here takes precedence. |
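For reference, these per-column settings map naturally onto BigQuery schema fields. A minimal sketch with the google-cloud-bigquery Python client, using hypothetical column names:

```python
from google.cloud import bigquery

# Each entry corresponds to one row of the Column Setting table above.
schema = [
    bigquery.SchemaField(
        "created_at",       # column name (hypothetical)
        "TIMESTAMP",        # data type
        mode="NULLABLE",    # mode: NULLABLE, REQUIRED, or REPEATED
        description="Record creation time",  # column description
    ),
    bigquery.SchemaField("id", "INTEGER", mode="REQUIRED"),
]
table = bigquery.Table("my_project.my_dataset.new_table", schema=schema)
```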

Partitioning and clustering settings

For more information on partitioning and clustering, see Overview of Partitioned Tables and Overview of Clustered Tables, respectively.

| Item | Required | Description |
| --- | --- | --- |
| Partitioning | | Choose one of the following. Partition by ingestion time: partitioning is based on the job execution time in TROCCO. Partition by field: partitioning is based on a reference column. |
| Partition field | | Enter this when Partition by field is selected. Enter the name of a column of type DATE, TIMESTAMP, or DATETIME. |
| Partition type | | Select this when either partitioning method is selected. Choose the partition granularity: hourly, daily, monthly, or yearly. |
| Clustering | - | Set this if you want to create a clustered table. Entering a column name in the clustering field clusters the table on that column. Up to four clustering columns can be specified. |
| Partition expiration | - | You can specify an expiration period for partitions. For more information, see Setting Partition Expiration Dates. |
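For reference, here is a minimal sketch of how the same partitioning, clustering, and expiration settings look with the google-cloud-bigquery Python client; the table and column names are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()

table = bigquery.Table("my_project.my_dataset.events")  # hypothetical
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,  # HOUR, DAY, MONTH, or YEAR
    field="created_at",            # omit for ingestion-time partitioning
    expiration_ms=90 * 24 * 60 * 60 * 1000,  # partition expiration: 90 days
)
table.clustering_fields = ["customer_id", "country"]  # up to four columns
client.create_table(table)
```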
Conditions for applying partitioning and clustering settings

Partitioning and clustering settings take effect only when a table is newly created.
If a table already exists at the output destination, the job runs with the existing table's settings, not the contents of these settings.

Partition boundaries with Partition by ingestion time

Due to Google BigQuery specifications, partition boundaries are based on UTC time. Please keep this in mind.
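For example, with daily partitions, a job that runs early in the morning JST lands in the previous day's partition, because the UTC date is still the day before. A quick check in Python (the timestamp is hypothetical):

```python
from datetime import datetime, timezone, timedelta

jst = timezone(timedelta(hours=9))
run_at = datetime(2024, 4, 1, 8, 0, tzinfo=jst)  # job runs 08:00 JST, Apr 1

# Daily partitions are keyed by the UTC date: 2024-03-31, not 2024-04-01.
print(run_at.astimezone(timezone.utc).date())  # -> 2024-03-31
```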

Integer range partitioning

TROCCO does not support integer range partitioning, which partitions a table based on the value of a particular INTEGER column.

Required permissions

The following permissions are required to use this Data Destination.

  • bigquery.datasets.create
  • bigquery.datasets.get
  • bigquery.jobs.create
  • bigquery.tables.create
  • bigquery.tables.delete
  • bigquery.tables.export
  • bigquery.tables.get
  • bigquery.tables.getData
  • bigquery.tables.list
  • bigquery.tables.update
  • bigquery.tables.updateData