Data Destination - BigQuery
  • 07 Dec 2022

Note

This is a machine-translated version of the original Japanese article.
Please understand that some of the information contained on this page may be inaccurate.

Summary

This is a help page for setting up data transfer to BigQuery on Google Cloud Platform.

Supported Protocols

  • Data Transfer (Embulk)
    Using embulk-output-bigquery

Constraints

  • Nothing in particular
  • API restrictions (reference)

Setting items

STEP 1: Basic settings

Field | Required | Default value | Description
BigQuery connection information | Yes | - | Refer to the BigQuery connection settings.
Dataset | Yes | - | Specifies the destination dataset name. The dataset name must consist only of letters, numbers, and underscores (reference).
Table | Yes | - | Specifies the destination table name. The table name must consist only of letters, numbers, and underscores (reference).
Dataset location | Yes | US | Specifies the location of the destination dataset. To specify the Tokyo region, enter asia-northeast1. Refer to the official documentation for the locations that can be specified.
Automatic dataset creation option | Yes | Do not create | Whether to automatically create the specified dataset if it does not exist. * The destination table is created automatically if it does not exist.
Transfer mode | Yes | append | Selects the transfer mode. For details on each mode, see About Transfer Modes below.
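
As a point of reference, the dataset naming and location rules above can be checked outside of trocco with the BigQuery client library. The following is a minimal sketch, assuming the google-cloud-bigquery Python package and application default credentials; the dataset name my_dataset is hypothetical.

    from google.cloud import bigquery

    client = bigquery.Client()  # uses application default credentials

    # Dataset names may contain only letters, numbers, and underscores.
    dataset = bigquery.Dataset(f"{client.project}.my_dataset")
    dataset.location = "asia-northeast1"  # Tokyo region; "US" is the default

    # exists_ok=True mirrors the "create the dataset if it does not exist" option.
    dataset = client.create_dataset(dataset, exists_ok=True)
    print(dataset.dataset_id, dataset.location)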

Data settings

If the destination table is created automatically, it is created using the column definitions in the data settings.
In that case, if a column name in the column definitions contains characters other than letters, numbers, and underscores (for example, Japanese), an error occurs during the transfer, so rename such columns beforehand.
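
If you want to check the column names before running a transfer, a simple pattern check is enough. The helper below is a hypothetical example (not part of trocco), using only Python's standard re module.

    import re

    # BigQuery column names must consist of letters, numbers, and underscores,
    # and must not start with a number.
    VALID_COLUMN_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

    def invalid_columns(column_names):
        """Return the column names that would cause the transfer to fail."""
        return [name for name in column_names if not VALID_COLUMN_NAME.match(name)]

    print(invalid_columns(["user_id", "購入日", "price(JPY)"]))
    # -> ['購入日', 'price(JPY)']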

About Transfer Modes

Mode | Details
append | Appends data to the table. A temporary table is created and loaded first, and its contents are then copied into the destination table as an append. As a result, if the transfer fails partway through, no partial data is left in the destination table.
append_direct | Appends data to the table. Data is loaded directly into the destination table without using a temporary table. As a result, if the transfer fails partway through, partial data may be left in the destination table.
replace | Replaces (overwrites) the table. A temporary table is created and loaded first, and the destination table is then overwritten with its contents.
delete_in_advance | Replaces the table. If the destination table already exists, it is deleted first; a new destination table is then created and loaded with the data.

For a more detailed explanation, please refer to the official documentation.
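
The behavior of the four modes can be approximated with the BigQuery client library. The sketch below only illustrates the semantics described above, assuming the google-cloud-bigquery Python package and hypothetical table names; trocco itself performs the transfer through embulk-output-bigquery, so this is not its actual implementation.

    from google.cloud import bigquery

    client = bigquery.Client()
    destination = "my_project.my_dataset.my_table"
    temporary = "my_project.my_dataset.my_table_tmp"
    rows = [{"id": 1, "name": "example"}]

    # append / replace: load into a temporary table first ...
    client.load_table_from_json(
        rows, temporary, job_config=bigquery.LoadJobConfig(autodetect=True)
    ).result()

    # ... then copy it to the destination, so a failed load leaves no partial data.
    copy_config = bigquery.CopyJobConfig(
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,     # append
        # write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE  # replace
    )
    client.copy_table(temporary, destination, job_config=copy_config).result()
    client.delete_table(temporary)

    # append_direct: load straight into the destination table, so partial data
    # may remain if the job fails partway through. (delete_in_advance instead
    # deletes and recreates the destination table before loading.)
    direct_config = bigquery.LoadJobConfig(
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    client.load_table_from_json(rows, destination, job_config=direct_config).result()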

Output options

Item | Description
Column settings | You can edit the column definitions.
  Column name: the name of the table column
  Data type: the data type of the table column
  Mode: the mode of the table column
  Date format: the format used when converting a string column to the Timestamp type
  Time zone: the time zone used when converting a string column to the Timestamp type
  Description: the description of the column
  If the table already exists, the append and append_direct modes do not update the column settings of the existing table, except for the description. The description of a column with a matching name is updated when the transfer succeeds.
  Even when a table with column descriptions is specified as the schema template (see the next item), the column descriptions set in the column settings take precedence.
  For a more detailed explanation of the columns, please check the official documentation.
The name of the table that references schema information as a template | Schema information is generated from the specified table when data is loaded into BigQuery.
Type of partitioned table | Specifies the type of partitioned table (see the sketch below).
  • Partition by ingestion time: partitions are created based on the trocco job execution time.
  • Partition by time-unit column: partitions are created based on a reference column. Select a column of type DATE, DATETIME, or TIMESTAMP as the reference column.

    You can select one of the following four partitioning granularities.
  • Every 1 hour
  • Every 1 day
  • Every 1 month
  • Every 1 year

    For more information about partitioned tables, refer to the Overview of partitioned tables.
Integer Range Partitioning

Trocco does not support integer range partitioning, which partitions a table based on the values of a specific INTEGER column.
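
As a rough illustration of what the column settings and partitioning options correspond to on the BigQuery side, here is a minimal sketch assuming the google-cloud-bigquery Python package; the table id, column names, and descriptions are hypothetical, and trocco configures the equivalent through the settings described above.

    from google.cloud import bigquery

    client = bigquery.Client()

    # Column settings: column name, data type, mode, and description.
    schema = [
        bigquery.SchemaField("id", "INTEGER", mode="REQUIRED",
                             description="Primary key"),
        bigquery.SchemaField("ordered_at", "TIMESTAMP", mode="NULLABLE",
                             description="Order time (converted from a string)"),
    ]
    table = bigquery.Table("my_project.my_dataset.orders", schema=schema)

    # Partition by time-unit column: reference a DATE / DATETIME / TIMESTAMP column.
    # For partitioning by ingestion time, omit the field argument instead.
    table.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,  # HOUR, DAY, MONTH, or YEAR
        field="ordered_at",
    )

    table = client.create_table(table, exists_ok=True)
    print(table.time_partitioning)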

Required Permissions

The permissions required to use this service are as follows.

  • bigquery.datasets.create
  • bigquery.datasets.get
  • bigquery.jobs.create
  • bigquery.tables.create
  • bigquery.tables.delete
  • bigquery.tables.export
  • bigquery.tables.get
  • bigquery.tables.getData
  • bigquery.tables.list
  • bigquery.tables.update
  • bigquery.tables.updateData
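
To check in advance that the connecting service account actually holds these permissions, the Resource Manager testIamPermissions API can be used. The sketch below assumes the google-cloud-resource-manager Python package and a hypothetical project id; it is a convenience check, not something trocco requires.

    from google.cloud import resourcemanager_v3

    REQUIRED = [
        "bigquery.datasets.create",
        "bigquery.datasets.get",
        "bigquery.jobs.create",
        "bigquery.tables.create",
        "bigquery.tables.delete",
        "bigquery.tables.export",
        "bigquery.tables.get",
        "bigquery.tables.getData",
        "bigquery.tables.list",
        "bigquery.tables.update",
        "bigquery.tables.updateData",
    ]

    client = resourcemanager_v3.ProjectsClient()
    response = client.test_iam_permissions(
        request={"resource": "projects/my-project", "permissions": REQUIRED}
    )
    missing = sorted(set(REQUIRED) - set(response.permissions))
    print("Missing permissions:", missing or "none")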
