Data Destination - Google BigQuery
Summary
This page explains the data destination settings for ETL Configurations that transfer data to Google Cloud's BigQuery.
TROCCO's product website provides examples of customer case studies.
Please refer to it together with this help page.
Constraints
- Limitations on the Google BigQuery side
- There are limits on the size of files that can be transferred, among other quotas.
- For details, see the official BigQuery documentation - Quotas and Limits.
Setting items
STEP1 Basic settings
Item | Required | Default | Description |
---|---|---|---|
Google BigQuery Connection Configuration | Yes | - | From the pre-registered Google BigQuery Connection Configurations, select one that has the permissions required for this ETL Configuration. |
Dataset | Yes | - | Select the name of the Data Destination dataset. |
Table | Yes | - | Select the name of the Data Destination table. If the target table does not exist in the Data Destination dataset, it will be created automatically. |
Dataset location | Yes | US (United States) | Specifies the location of the Data Destination dataset. To specify the Tokyo region, enter asia-northeast1. For details on the locations that can be specified, see the official BigQuery documentation - BigQuery Locations. |
Automatic dataset creation option | Yes | Do not create | If the specified dataset does not exist under the target project, it will be created automatically. |
Transfer mode | Yes | Append (APPEND) | Select the transfer mode. For details, see About transfer modes below. |
Dataset names, table names, and column names each have their own naming constraints.
- Dataset name (specified in STEP1)
- It must consist only of letters, numbers, and underscores.
- Spaces and special characters are not allowed.
- For details, see the official BigQuery documentation - Creating datasets.
- Table name (specified in STEP1)
- It can contain a wide range of characters (including numbers, underscores, hyphens, spaces, etc.).
- For details, see the official BigQuery documentation - Creating and using tables.
- Column name (specified in STEP2 Column Setting)
- It must consist only of letters, numbers, and underscores.
- Column names must begin with a letter or an underscore.
- Spaces and special characters are not allowed.
- For details, see the official BigQuery documentation - Specifying a schema.
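The column-name rules above can be expressed as a single regular-expression check. A minimal sketch, assuming the constraints as stated on this page (the function name is illustrative, not part of TROCCO or BigQuery):

```python
import re

# Column names: letters, digits, and underscores only,
# and the first character must be a letter or an underscore.
_COLUMN_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def is_valid_column_name(name: str) -> bool:
    """Return True if `name` satisfies the column-name rules above."""
    return bool(_COLUMN_NAME_RE.match(name))

print(is_valid_column_name("user_id"))      # True
print(is_valid_column_name("_created_at"))  # True
print(is_valid_column_name("1st_column"))   # False: starts with a digit
print(is_valid_column_name("full name"))    # False: contains a space
```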
STEP2 Output Option
Item | Description |
---|---|
Column Setting | See Column Setting below. |
Table that references schema information as a template | When creating a new table in BigQuery, the schema of a table that already exists in BigQuery can be reused. In that case, enter the name of a table under the dataset specified in ETL Configuration STEP1. |
Partitioning | See Partitioning and clustering settings below. |
Clustering | See Partitioning and clustering settings below. |
The following settings in the STEP2 Output Option apply only when a new table is created.
- Column Setting
- Table that references schema information as a template
- Partitioning
- Clustering
Specifically, they are applied when a job is executed under either of the following conditions:
- The target table does not exist in the Data Destination.
- Full Data Transfer (REPLACE) is selected as the transfer mode. In this case, the schema of the Data Destination table is recreated on each transfer, so the above settings are applied every time.
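The two conditions can be reduced to a small predicate. A minimal sketch, assuming only what is stated above (function and parameter names are illustrative, not part of TROCCO):

```python
def new_table_settings_apply(table_exists: bool, transfer_mode: str) -> bool:
    """Return True when the STEP2 new-table output options take effect."""
    # Settings apply when the table will be created (it does not exist yet)
    # or recreated (REPLACE drops and rebuilds the table on every run).
    return (not table_exists) or transfer_mode == "REPLACE"

print(new_table_settings_apply(False, "APPEND"))   # True: table is created
print(new_table_settings_apply(True, "REPLACE"))   # True: table is recreated
print(new_table_settings_apply(True, "APPEND"))    # False: existing table kept
```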
Supplementary information
About transfer modes
All of the following explanations assume that the target table already exists in the Data Destination.
If the target table does not exist in the Data Destination, a new table is created regardless of the mode.
Transfer mode | Schema change | Transaction | Behavior | Notes |
---|---|---|---|---|
Append (APPEND) | No | Yes | Records are appended to the table. | |
Append (APPEND DIRECT) | No | No | Records are appended to the table. | Same as Append (APPEND) mode, but transactions are not guaranteed. |
Full Refresh (DELETE IN ADVANCE) | No | No | Existing records are deleted, then the new data is loaded. | |
Full Refresh (REPLACE) | Yes | Yes | The table is replaced entirely, including its schema. | |
The Data Destination Google BigQuery does not offer an UPSERT (MERGE) transfer mode.
An alternative to UPSERT (MERGE) is explained in Efficiently Updating Master Tables in the DWH without Duplication.
Please refer to that page.
See also the writeDisposition field in the BigQuery API Reference - Job.
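The append and full-refresh modes correspond roughly to BigQuery's writeDisposition values. A sketch of that correspondence, as an assumption based on the behavior described above (the mapping is illustrative; TROCCO's internal implementation is not documented here):

```python
# Approximate correspondence between transfer modes and BigQuery's
# writeDisposition values: WRITE_APPEND adds rows to an existing table,
# WRITE_TRUNCATE overwrites the table contents.
TRANSFER_MODE_TO_WRITE_DISPOSITION = {
    "APPEND": "WRITE_APPEND",
    "APPEND DIRECT": "WRITE_APPEND",
    "DELETE IN ADVANCE": "WRITE_APPEND",  # existing rows deleted first, then loaded
    "REPLACE": "WRITE_TRUNCATE",
}

def write_disposition_for(mode: str) -> str:
    """Look up the writeDisposition matching a transfer mode."""
    return TRANSFER_MODE_TO_WRITE_DISPOSITION[mode]

print(write_disposition_for("REPLACE"))  # WRITE_TRUNCATE
```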
Column Setting
You can configure detailed settings regarding columns.
For more information, see BigQuery Official Documentation - Specifying Schema.
Item | Description |
---|---|
Column name | Select from the list of column names in ETL Configuration STEP2 Column Setting. |
Data type | Select this if you want to explicitly specify the data type of a column. |
Mode | Select this if you want to explicitly specify the mode of a column. |
Date format | If you want to transfer a column defined as timestamp type in ETL Configuration STEP2 Column Setting as STRING type, you can enter the date format to be used during conversion. |
Time zone | If you want to transfer a column defined as timestamp type in ETL Configuration STEP2 Column Setting as STRING type or DATETIME type, you can enter the time zone to be used during conversion. |
Description | You can enter a column description. Even when using a table that references schema information as a template, the information entered here takes precedence. |
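To illustrate what the date format and time zone settings control, converting a timestamp value to a STRING with an explicit format and zone might look like the following plain-Python sketch (this is not TROCCO's implementation; the format string and zone are example values):

```python
from datetime import datetime, timezone, timedelta

# A UTC timestamp as it might arrive from the data source.
ts = datetime(2024, 1, 1, 0, 30, 0, tzinfo=timezone.utc)

# "Time zone" setting: render the value in Asia/Tokyo (UTC+9).
jst = timezone(timedelta(hours=9))

# "Date format" setting: the pattern used for the STRING column.
formatted = ts.astimezone(jst).strftime("%Y-%m-%d %H:%M:%S")
print(formatted)  # 2024-01-01 09:30:00
```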
Partitioning and clustering settings
For details on partitioning and clustering, see Overview of Partitioned Tables and Overview of Clustered Tables, respectively.
Item | Required | Description |
---|---|---|
Partitioning | ✓ | Select one of the available partitioning methods. |
Partition field | ✓ | Enter this when partitioning by field is selected. Enter a column name of type DATE, TIMESTAMP, or DATETIME. |
Partition type | ✓ | Select this when any partitioning method is selected. Select the granularity of table partitioning from the available options. |
Clustering | - | Set this if you want to create a clustered table. By entering column names in the Clustering field, the table is clustered on those columns. Up to four clustering columns can be specified. |
Partition expiration | - | You can specify the expiration of partitions. For details, see Setting partition expiration. |
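The four-column limit on clustering can be checked before a job is configured. A minimal sketch, assuming only the limit stated above (the function name is illustrative):

```python
MAX_CLUSTERING_COLUMNS = 4  # BigQuery allows at most four clustering columns

def validate_clustering_columns(columns: list[str]) -> None:
    """Raise ValueError if more clustering columns are given than allowed."""
    if len(columns) > MAX_CLUSTERING_COLUMNS:
        raise ValueError(
            f"Expected at most {MAX_CLUSTERING_COLUMNS} clustering columns, "
            f"got {len(columns)}"
        )

validate_clustering_columns(["customer_id", "region"])  # OK: within the limit
```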
Partitioning and clustering settings take effect only when a table is newly created.
If a table already exists at the output destination, the job is executed according to the settings of the existing table, not the contents of this configuration.
Note that, due to Google BigQuery specifications, partition boundaries are based on UTC time.
TROCCO does not support integer-range partitioning, which partitions a table based on the values of a particular INTEGER column.
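The UTC boundary matters when source data is produced in another time zone: a row's wall-clock date and its partition date can differ. A pure-Python sketch of how a daily partition ID is derived (for illustration only):

```python
from datetime import datetime, timezone, timedelta

jst = timezone(timedelta(hours=9))  # Asia/Tokyo, UTC+9

# 2024-01-02 08:00 JST is still 2024-01-01 in UTC, so with daily
# partitioning this row lands in the previous day's partition.
ts = datetime(2024, 1, 2, 8, 0, tzinfo=jst)
partition_id = ts.astimezone(timezone.utc).strftime("%Y%m%d")
print(partition_id)  # 20240101
```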
Required permissions
The following permissions are required to use this Data Destination.
- bigquery.datasets.create
- bigquery.datasets.get
- bigquery.jobs.create
- bigquery.tables.create
- bigquery.tables.delete
- bigquery.tables.export
- bigquery.tables.get
- bigquery.tables.getData
- bigquery.tables.list
- bigquery.tables.update
- bigquery.tables.updateData
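Before configuring a connection, the service account's grants can be checked against this list. A minimal plain-Python sketch (the `granted` set in the example is hypothetical; in practice you would obtain it from your IAM configuration):

```python
# The permissions listed above, as a set for comparison.
REQUIRED_PERMISSIONS = {
    "bigquery.datasets.create",
    "bigquery.datasets.get",
    "bigquery.jobs.create",
    "bigquery.tables.create",
    "bigquery.tables.delete",
    "bigquery.tables.export",
    "bigquery.tables.get",
    "bigquery.tables.getData",
    "bigquery.tables.list",
    "bigquery.tables.update",
    "bigquery.tables.updateData",
}

def missing_permissions(granted: set) -> set:
    """Return the required permissions absent from `granted`."""
    return REQUIRED_PERMISSIONS - granted

# Example: an account holding only the dataset-level permissions.
granted = {"bigquery.datasets.create", "bigquery.datasets.get"}
print(sorted(missing_permissions(granted)))
```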