Data Mart - Google BigQuery
  • 17 Jul 2024
  • PDF

Data Mart - Google BigQuery

  • PDF

Article summary

summary

Help page for setting up data mart definitions using Google Cloud Platform's BigQuery.

Setting items

basic setting

item nameindispensableContents
Google BigQuery connection informationFrom the pre-registered Google BigQuery connection information, select the one that has the necessary permissions for this data mart definition.
custom variable-Custom variables set here can be embedded in queries, dataset names, table names, etc.
See About Custom Variables for more information.

Query settings

item nameindispensableContents
query execution modeSelect one of the following modes
  • Data Transfer Mode
    Simply specify the query and the destination table, and you can easily perform rewashing and appending to the table.
  • free write mode
    You can write and execute any DML/DDL statement (``INSERT/DELETE, CREATE/DROP, etc. ) to the DWH to which you are connecting.
  • queryEnter a query.
    When specifying table names in a query, use the following format.
  • In data transfer mode: dataset_name.table_name
  • In free description mode: project_name.dataset_name.table_name

  • If you have selected the data transfer mode, you can also check the execution result by clicking Preview Execution.

    When data transfer mode is selected

    item nameindispensableContents
    output-data-setEnter the name of the data set to which the data will be output.
    For more information on dataset naming conventions, please refer to BigQuery Official Documentation - Naming Data Sets.
    output tableEnter the name of the table to which the data will be output.
    For more information on table naming conventions, see BigQuery Official Documentation - Table Naming.
    write modeSelect one of the following modes
    • PS
      • The result of the query execution is appended after the records of the existing table.
    • All cases washed out
      • Records in the existing table are deleted and replaced with the results of the query execution.

    Partitioning and clustering settings

    This can be set when data transfer mode is selected.
    For more information on partitioning and clustering, see Overview ofPartitioned Tables and Overview of Clustered Tables, respectively.

    Conditions for applying partitioning and clustering settings

    Partitioning and clustering settings are valid only when a table is newly created.
    If a table already exists at the output destination, the job will be executed using the settings of the existing table, not the contents of this setting.

    Boundaries of partitions in "Split by fetch time"

    Due to Google BigQuery specifications, partition boundaries are based on UTC time. Please note

    Integer range partitioning

    TROCCO does not support integer range partitioning, which partitions a table based on the value of a particular INTEGER column.

    item nameindispensableContents
    partitioning-You can choose one of the following
  • Partitioning based on acquisition time: Partitions are cut based on TROCCO's job execution time.
  • Partition by field: Partitioning is done based on the reference column.
  • partition fieldEnter when division is selected by field.
    Enter a column name of type DATE, TIMESTAMP, or DATETIME.
    Partition TypeSelect this option when either partitioning method is selected for partitioning.
    Please select the granularity of table partitioning from the following
  • Hourly
  • per day
  • Every 1 month
  • Every 1 year
  • clustering-This can be set if you wish to create a clustered table.
    By entering a column name in the Clustering column, the table will be clustered based on the column in question.
    Up to four clustered columns can be specified.

    When free description mode is selected

    item nameindispensableContents
    Data Processing Location-Specify the Google BigQuery location where the query will be executed.
    Please specify if you want to specify a resource that is not tied to a location in the query.
    If not specified, the location is automatically determined by Google BigQuery.
    For more information, please refer to the official BigQuery documentation - Specifying Locations.

    Job startup settings

    item nameindispensabledefault valueContents
    Parallel execution of jobsNo parallel job execution.Select whether or not to run a job if another job with the same data mart definition is running at the time the job is run.
  • No parallel job execution: jobs are not executed and skipped.
  • Allow jobs to run in parallel: Jobs will run.
  • Required Authority

    The following permissions are required to use this service.

    • bigquery.datasets.get
    • bigquery.jobs.create
    • bigquery.tables.create
    • bigquery.tables.delete
    • bigquery.tables.get
    • bigquery.tables.getData
    • bigquery.tables.list
    • bigquery.tables.update
    • bigquery.tables.updateData

    Was this article helpful?