Data Destination - Google BigQuery
    This page explains the Data Destination settings for an ETL Configuration that transfers data to Google Cloud's BigQuery.

    Google BigQuery and TROCCO case studies

    TROCCO's product website provides examples of customer implementations.
    Please refer to them alongside this help page.

    Constraints

    Setting items

    STEP1 Basic settings

    Google BigQuery Connection Configuration (required)
    From the pre-registered Google BigQuery Connection Configurations, select one that has the permissions required for this ETL Configuration.

    Dataset (required)
    Enter the name of the Data Destination dataset.

    Table (required)
    Enter the name of the Data Destination table.
    If the target table does not exist in the Data Destination dataset, it is created automatically.

    Dataset Location (required; default: US (United States))
    Specifies the location of the Data Destination dataset.
    To specify the Tokyo region, enter asia-northeast1.
    For more information on the locations that can be specified, refer to the official documentation: BigQuery Locations.

    Automatic dataset creation (required; default: Do not create)
    If the specified dataset does not exist under the target project, it is created automatically.

    Transfer Mode (required; default: Append (APPEND))
    Select the transfer mode.
    For more information, see About Transfer Mode below.
    Naming constraints on datasets, tables, and columns

    Datasets, tables, and columns each have their own naming constraints.

    STEP2 Output Option

    Column Settings
    See Column Settings below.

    Table that references schema information as a template
    When creating a new table in BigQuery, the schema of a table that already exists in BigQuery can be reused.
    In that case, enter the name of a table under the dataset specified in ETL Configuration STEP1.

    Partitioning
    See Partitioning and Clustering Settings below.

    Clustering
    See Partitioning and Clustering Settings below.
    Conditions under which schema-related settings apply

    The following settings in the STEP2 Output Option apply only when a new table is created.

    • Column Settings
    • Table that references schema information as a template
    • Partitioning
    • Clustering

    Specifically, these settings are applied when a job runs under either of the following conditions:

    • The target table does not exist in the Data Destination
    • Full Data Transfer (REPLACE) is selected as the transfer mode
      • In this case, the schema of the Data Destination table is recreated on every transfer, so the above settings are applied each time.

    Supplementary information

    About transfer mode

    The explanations below assume that the target table already exists in the Data Destination.
    If the target table does not exist, a new table is created regardless of the selected mode.

    Append (APPEND)
    Schema change: No / Transaction: Yes
    Behavior:
    1. Create a temporary table and load the data into it.
    2. WRITE_APPEND the data from the temporary table to the target table.

    Append (APPEND DIRECT)
    Schema change: No / Transaction: No
    Behavior:
    1. WRITE_APPEND the data directly to the target table.
    • Processing time is shorter than in Append (APPEND) mode, but the transfer is not transactional.
    • Therefore, if a transfer fails midway, rows added up to that point may remain in the target table.

    Full Data Transfer (DELETE IN ADVANCE)
    Schema change: No / Transaction: No
    Behavior:
    1. DELETE the existing data in the target table.
    2. WRITE_APPEND the data to the target table.
    • The transfer is not transactional.

    Full Data Transfer (REPLACE)
    Schema change: Yes / Transaction: Yes
    Behavior:
    1. Create a temporary table and load the data into it.
    2. WRITE_TRUNCATE the data from the temporary table to the target table.
    • The schema of the existing table is discarded and replaced with that of the new table.
    • Running a job from an ETL Configuration with partitioning configured while the target table is a non-partitioned table results in a job error.

    If you want to do UPSERT

    Data Destination Google BigQuery does not offer an UPSERT (MERGE) transfer mode.
    An alternative to UPSERT (MERGE) is explained in Efficiently Updating Master Tables in the DWH without Duplication.
    Please refer to that page.
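    As a rough illustration of the usual workaround, data is first appended to a staging table and a MERGE statement is then run in BigQuery to fold it into the target table. The sketch below only builds such a MERGE statement as a string (all table and column names are hypothetical); it is not TROCCO's implementation.

    ```python
    def build_upsert_merge(target, staging, key_cols, update_cols):
        """Build a BigQuery MERGE statement that emulates UPSERT:
        rows from `staging` update matching rows in `target` (matched
        on `key_cols`) and are inserted when no match exists."""
        on = " AND ".join(f"T.{c} = S.{c}" for c in key_cols)
        set_clause = ", ".join(f"T.{c} = S.{c}" for c in update_cols)
        all_cols = key_cols + update_cols
        cols = ", ".join(all_cols)
        vals = ", ".join(f"S.{c}" for c in all_cols)
        return (
            f"MERGE `{target}` T\n"
            f"USING `{staging}` S\n"
            f"ON {on}\n"
            f"WHEN MATCHED THEN UPDATE SET {set_clause}\n"
            f"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({vals})"
        )

    # Hypothetical tables: append new rows to users_staging, then MERGE.
    sql = build_upsert_merge(
        "my_dataset.users", "my_dataset.users_staging",
        key_cols=["id"], update_cols=["name", "updated_at"],
    )
    ```

    The generated statement can be run as an ordinary BigQuery query after the APPEND transfer completes.
    
    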

    Difference between WRITE_APPEND and WRITE_TRUNCATE

    See the writeDisposition field of the Job resource in the BigQuery API Reference.
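    For orientation, here is a minimal sketch of where writeDisposition sits in a BigQuery load job payload (the jobs.insert REST API); the dataset, table, and GCS URI are hypothetical. WRITE_APPEND adds rows to the existing table, while WRITE_TRUNCATE overwrites the table's data and schema.

    ```python
    def load_job_config(dataset, table, gcs_uri, write_disposition):
        """Return a minimal jobs.insert load configuration showing
        where writeDisposition is specified."""
        assert write_disposition in ("WRITE_APPEND", "WRITE_TRUNCATE", "WRITE_EMPTY")
        return {
            "configuration": {
                "load": {
                    "destinationTable": {"datasetId": dataset, "tableId": table},
                    "sourceUris": [gcs_uri],
                    "writeDisposition": write_disposition,
                }
            }
        }

    # Hypothetical destination and source; only writeDisposition differs.
    append_cfg = load_job_config("my_dataset", "my_table", "gs://bucket/data.csv", "WRITE_APPEND")
    truncate_cfg = load_job_config("my_dataset", "my_table", "gs://bucket/data.csv", "WRITE_TRUNCATE")
    ```
    
    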

    Column Setting

    You can configure detailed settings regarding columns.
    For more information, see BigQuery Official Documentation - Specifying Schema.

    Column name
    Select from the list of column names in ETL Configuration STEP2 Column Settings.

    Data type
    Select this if you want to explicitly specify the data type of a column.

    Mode
    Select this if you want to explicitly specify the mode of a column.

    Date format
    If a column defined as timestamp type in ETL Configuration STEP2 Column Settings is transferred as STRING type, you can enter the date format to use for the conversion.

    Time zone
    If a column defined as timestamp type in ETL Configuration STEP2 Column Settings is transferred as STRING type or DATETIME type, you can enter the time zone to use for the conversion.

    Description
    You can enter a column description.
    Even when using a table that references schema information as a template, the value entered here takes precedence.
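    To illustrate what the Date format and Time zone settings control, the sketch below converts a timestamp value into a STRING using a format pattern and a time zone. The function name is illustrative and the pattern syntax is Python's strftime, shown here only to convey the idea.

    ```python
    from datetime import datetime, timezone
    from zoneinfo import ZoneInfo

    def expand_timestamp(ts, date_format, tz_name):
        """Render a timestamp as a STRING, applying a date format and
        time zone, analogous to the Date format / Time zone settings."""
        return ts.astimezone(ZoneInfo(tz_name)).strftime(date_format)

    # 2024-01-01 00:00:00 UTC rendered in the Asia/Tokyo time zone (UTC+9).
    ts = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
    rendered = expand_timestamp(ts, "%Y-%m-%d %H:%M:%S", "Asia/Tokyo")  # "2024-01-01 09:00:00"
    ```
    
    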

    Partitioning and clustering settings

    For more information on partitioning and clustering, see Overview of Partitioned Tables and Overview of Clustered Tables, respectively.

    Partitioning
    You can choose one of the following:
    • Partition by ingestion time: partitioning is based on the job execution time in TROCCO.
    • Partition by field: partitioning is based on a reference column.

    Partition field
    Enter this when Partition by field is selected.
    Enter a column name of type DATE, TIMESTAMP, or DATETIME.

    Partition type
    Select this when either partitioning method is selected.
    Select the granularity of table partitioning from the following:
    • Hourly
    • Daily
    • Monthly
    • Yearly

    Clustering
    Set this if you want to create a clustered table.
    By entering column names in the Clustering field, the table is clustered based on those columns.
    Up to four clustering columns can be specified.

    Partition expiration
    You can specify when partitions expire.
    For more information, see Setting Partition Expiration Dates.
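    For reference, the settings above correspond to the timePartitioning and clustering fields of the BigQuery Tables resource. The sketch below builds such a resource payload as a plain dict; the exact mapping from TROCCO's form fields is an assumption, and the dataset, table, and column names are hypothetical.

    ```python
    def table_resource(dataset, table, partition_field, partition_type,
                       clustering_fields, expiration_ms=None):
        """Build a minimal BigQuery Tables-resource payload with
        time partitioning and clustering configured."""
        assert partition_type in ("HOUR", "DAY", "MONTH", "YEAR")
        assert len(clustering_fields) <= 4  # at most four clustering columns
        time_partitioning = {"type": partition_type, "field": partition_field}
        if expiration_ms is not None:
            # Partition expiration is expressed in milliseconds, as a string.
            time_partitioning["expirationMs"] = str(expiration_ms)
        return {
            "tableReference": {"datasetId": dataset, "tableId": table},
            "timePartitioning": time_partitioning,
            "clustering": {"fields": clustering_fields},
        }

    # Hypothetical table: daily partitions on created_at, clustered on two columns.
    resource = table_resource("my_dataset", "events", "created_at", "DAY",
                              ["user_id", "country"])
    ```
    
    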
    Conditions for applying partitioning and clustering settings

    Partitioning and clustering settings take effect only when a table is newly created.
    If the table already exists at the output destination, the job runs with the settings of the existing table, not the contents of these settings.

    Partition boundaries when partitioning by ingestion time

    Due to Google BigQuery specifications, partition boundaries are based on UTC. Please keep this in mind.

    Integer range partitioning

    TROCCO does not support integer range partitioning, which partitions a table based on the value of a particular INTEGER column.

    Required Permissions

    The following permissions are required to use this service.

    • bigquery.datasets.create
    • bigquery.datasets.get
    • bigquery.jobs.create
    • bigquery.tables.create
    • bigquery.tables.delete
    • bigquery.tables.export
    • bigquery.tables.get
    • bigquery.tables.getData
    • bigquery.tables.list
    • bigquery.tables.update
    • bigquery.tables.updateData
