Data Source - Google BigQuery

    Summary

    This page describes the data source settings for ETL Configurations that transfer data from Google Cloud Platform's BigQuery.

    Constraints

    Setting items

    STEP1 Basic settings

    Each item is listed with whether it is required and its default value.

    BigQuery Connection Configuration (required)
    Refer to BigQuery's Connection Configuration.

    GCS URI to which data is exported (required)
    Specify the GCS bucket/folder to which BigQuery data is temporarily exported. (The bucket must be created in advance.)
    A bucket name alone cannot be specified.
    If the URI contains a wildcard (*), the data is output as multiple files; otherwise it is output as a single file (up to 1 GB).
    Refer to the official BigQuery documentation for details.

    SQL (required)
    Enter the SQL that retrieves the transfer data from BigQuery.
    Custom Variables can also be used to determine setting values dynamically during TROCCO's ETL Configuration.
    Note that the FROM clause must not include the project name; write the query in the form FROM dataset_name.table_name.
    The dataset referenced here must be in the same region as the dataset in which the temporary table is created, and that region must match the Dataset Location setting.

    Dataset Location (required; default: US)
    Specify the region of the dataset referenced in the SQL and of the dataset in which the temporary table is created.
    (Both must be created in the same region.)

    Temporary table destination dataset (required)
    During the transfer process, TROCCO stores the SQL results in a temporary table; specify the dataset in which this temporary table should be created.
    TROCCO does not create this dataset automatically, so specify a dataset that already exists.
    The temporary table is deleted after the ETL Job completes, but it may remain if the ETL Job ends with an error.
    The dataset specified here must be in the same region as the dataset referenced in the SQL, and that region must match the Dataset Location setting.

    Temporary data file format (required; default: CSV)
    Select the file format of the temporary data exported to Google Cloud Storage.
    Be sure to run Automatic Data Setting again whenever you change this value.

    Temporary Data Deletion Policy (required; default: Delete)
    Determines whether the temporary data exported to GCS is deleted after the job completes.
    To review the temporary data after the job completes, select Do not delete.
    For how to check the temporary data, refer to How to Check Temporary Data on GCS.

    SQL Type (required; default: SQL)
    Standard SQL and legacy SQL are supported.
    Refer to the official BigQuery documentation for the differences.
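    As an illustration of the settings above, the following sketch shows example values. The bucket, dataset, and table names are placeholders, not values from this document.

```shell
# Hypothetical example values for the STEP1 basic settings.
# Bucket, dataset, and table names are placeholders.

# Single-file export (up to 1 GB):
GCS_URI="gs://example-trocco-temp/export/data.csv"

# Wildcard URI: the export is split into multiple files:
GCS_URI_WILDCARD="gs://example-trocco-temp/export/data-*.csv"

# SQL: the FROM clause uses dataset_name.table_name, without a project name.
SQL="SELECT id, name FROM example_dataset.example_table"

echo "$SQL"
```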

    STEP1 detailed settings

    Each item is listed with its default value.

    Job wait timeout (sec) (default: 600)
    Specify, in seconds, how long to wait for a job to run.

    Use query cache (default: Enabled)
    Select whether to use BigQuery's query cache feature.
    Refer to the official BigQuery documentation for more information on query caching.
    Example of using job wait timeout

    When many queries are running in BigQuery, slot limits may cause jobs to wait before they are executed.
    If this wait time reaches the specified timeout, a "Not found: Table" error occurs and the corresponding ETL Job fails.
    Increasing the job wait timeout in such cases can prevent the ETL Job from failing.
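    Conceptually, the job wait timeout behaves like the following polling loop. This is a sketch, not TROCCO's actual code; check_job_state is a hypothetical stand-in for querying the BigQuery job status.

```shell
# Conceptual sketch of the job wait timeout (not TROCCO's actual code).
TIMEOUT=600   # the "Job wait timeout (sec)" setting
ELAPSED=0

check_job_state() {   # hypothetical stand-in for a BigQuery job-status query
  echo "DONE"
}

# Poll the job state until it is DONE, or fail once the timeout elapses.
while [ "$(check_job_state)" != "DONE" ]; do
  if [ "$ELAPSED" -ge "$TIMEOUT" ]; then
    echo "job wait timed out after ${TIMEOUT}s" >&2
    exit 1
  fi
  sleep 1
  ELAPSED=$((ELAPSED + 1))
done
echo "job completed within the timeout"
```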

    Required Permissions

    The following permissions are required to use this service.

    • bigquery.datasets.get
    • bigquery.jobs.create
    • bigquery.tables.create
    • bigquery.tables.delete
    • bigquery.tables.export
    • bigquery.tables.get
    • bigquery.tables.getData
    • bigquery.tables.updateData
    • storage.buckets.create
    • storage.buckets.delete
    • storage.buckets.get
    • storage.buckets.list
    • storage.objects.create
    • storage.objects.delete
    • storage.objects.get
    • storage.objects.list
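    One way to grant these permissions together is a custom IAM role. The sketch below bundles the list into a single comma-separated value; the commented-out gcloud call is illustrative only, and the project and role names are placeholders.

```shell
# Sketch: bundle the permissions above into one comma-separated list
# for a custom IAM role. Project and role names are placeholders.
PERMISSIONS="bigquery.datasets.get,bigquery.jobs.create,\
bigquery.tables.create,bigquery.tables.delete,bigquery.tables.export,\
bigquery.tables.get,bigquery.tables.getData,bigquery.tables.updateData,\
storage.buckets.create,storage.buckets.delete,storage.buckets.get,\
storage.buckets.list,storage.objects.create,storage.objects.delete,\
storage.objects.get,storage.objects.list"

# Hypothetical invocation (requires the gcloud CLI and a real project):
# gcloud iam roles create troccoBigQuerySource \
#   --project=example-project --permissions="$PERMISSIONS"

# Count the permissions in the list.
echo "$PERMISSIONS" | tr ',' '\n' | wc -l
```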

    How to check temporary data on GCS

    As mentioned above, the transferred data is temporarily stored in the specified GCS bucket/folder.
    If the Temporary Data Deletion Policy is set to Do not delete, the temporary data remaining on GCS can be reviewed after the ETL Job completes.
    Since the temporary data is compressed, it must be decompressed before its contents can be reviewed.
    Change the file extension of the temporary data to .gz, decompress the file with gzip, and check its contents.
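    The steps above can be sketched as follows. The file name and sample contents are placeholders; the printf line fabricates a stand-in for the file you would actually download from the GCS bucket.

```shell
# Stand-in for a temporary data file downloaded from the GCS bucket;
# the temporary data is gzip-compressed but lacks a .gz extension.
printf 'id,name\n1,alice\n' | gzip -c > temp_data_0

mv temp_data_0 temp_data_0.gz   # add the .gz extension
gzip -d temp_data_0.gz          # decompress; produces temp_data_0
head temp_data_0                # review the file contents
```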

