Data Source - Google BigQuery
Summary
This is the help page for the ETL Configuration data source settings for Google Cloud Platform's BigQuery.
Constraints
- Limited transfer capacity
- Queries cannot be executed against tables that use Google Sheets (Spreadsheets) as an external table.
Setting items
STEP1 Basic settings
Item name | Required | Default value | Contents |
---|---|---|---|
BigQuery Connection Configuration | Yes | - | Please refer to BigQuery's Connection Configuration. |
GCS URI to which data is exported | Yes | - | Specify the GCS bucket/folder to which BigQuery data is temporarily exported. (The bucket must be created in advance.) The bucket name alone cannot be specified. If a URI containing a wildcard (*) is specified, the output is split into multiple files; otherwise, the output is a single file (up to 1 GB). Please refer to the official BigQuery documentation for details. |
SQL | Yes | - | Enter the SQL that retrieves the transfer data from BigQuery. Custom Variables can also be used to determine setting values dynamically during ETL Configuration. Note that the FROM clause should not include the project (account) name; write the query in the format FROM dataset_name.table_name (see the sketch after this table). The dataset specified here must be in the same region as the dataset in which the temporary table is created, and must match the Dataset Location setting. |
Dataset Location | Yes | US | Specify the region of the dataset referenced in SQL and of the dataset in which the temporary table is created. (Both must be in the same region.) |
Temporary table destination dataset | Yes | - | During the transfer process, TROCCO stores the results of the SQL execution in a temporary table; specify the dataset in which this temporary table should be created. TROCCO does not create this dataset automatically, so specify a dataset that already exists. The temporary table is deleted after the ETL Job completes, but if the ETL Job ends with an error, the temporary table may remain. The dataset specified here must be in the same region as the dataset referenced in SQL and must match the Dataset Location setting. |
Temporary data file format | Yes | CSV | Select the file format of the temporary data exported to Google Cloud Storage. Be sure to perform Automatic Data Setting again whenever you change this value. |
Temporary Data Deletion Policy | Yes | Delete | Setting for deleting the temporary data exported to GCS after the job completes. If you wish to review the temporary data after the job completes, select Do not delete. For how to review the temporary data, please refer to How to check temporary data on GCS below. |
SQL Type | Yes | SQL | Standard and legacy SQL are supported. Please refer to the official BigQuery documentation for the differences. |
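For reference, the flow these settings describe (run the SQL into a temporary table, export the table to the GCS URI, then delete the temporary table) can be sketched with the google-cloud-bigquery client library. This is a minimal illustration, not TROCCO's actual implementation; the project, dataset, table, and bucket names are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical names for illustration only.
sql = "SELECT id, name FROM my_dataset.my_table"      # FROM dataset.table, no project prefix
temp_table = "my-project.tmp_dataset.trocco_tmp_001"  # temporary table destination dataset
gcs_uri = "gs://my-bucket/tmp/export-*.csv.gz"        # wildcard URI -> multiple output files

# 1. Run the SQL and store the result in a temporary table.
query_config = bigquery.QueryJobConfig(destination=temp_table, use_query_cache=True)
client.query(sql, job_config=query_config, location="US").result()

# 2. Export the temporary table to GCS as gzip-compressed CSV.
extract_config = bigquery.ExtractJobConfig(destination_format="CSV", compression="GZIP")
client.extract_table(temp_table, gcs_uri, job_config=extract_config, location="US").result()

# 3. Delete the temporary table once the export has finished.
client.delete_table(temp_table, not_found_ok=True)
```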
STEP1 Detailed settings
Item name | Default value | Contents |
---|---|---|
Job wait timeout (sec) | 600 | You can specify a timeout in seconds to wait for a job to run. |
Use query cache | Enabled | Select whether to use BigQuery's query cache feature. Please refer to the official BigQuery documentation for more information on query caching. |
When many queries are running in BigQuery, slot limits may cause jobs to wait before they are executed.
When this wait time reaches the specified timeout, an error such as "Not found: Table" occurs and the corresponding ETL Job fails.
Increasing the job wait timeout in such cases can avoid the ETL Job failure.
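On the client side, this wait corresponds to a timeout when polling for job completion. A minimal sketch, assuming the google-cloud-bigquery library and a hypothetical table name:

```python
import concurrent.futures

from google.cloud import bigquery

client = bigquery.Client()
job = client.query("SELECT COUNT(*) FROM my_dataset.my_table")  # hypothetical table

try:
    # Wait up to 600 seconds for the job to finish, analogous to the
    # "Job wait timeout (sec)" setting above.
    rows = job.result(timeout=600)
    print(list(rows))
except concurrent.futures.TimeoutError:
    # The job was still queued or running when the timeout elapsed.
    print("Job did not finish within the wait timeout")
```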
Required permissions
The following permissions are required to use this service. (A sketch for verifying them follows the list.)
- bigquery.datasets.get
- bigquery.jobs.create
- bigquery.tables.create
- bigquery.tables.delete
- bigquery.tables.export
- bigquery.tables.get
- bigquery.tables.getData
- bigquery.tables.updateData
- storage.buckets.create
- storage.buckets.delete
- storage.buckets.get
- storage.buckets.list
- storage.objects.create
- storage.objects.delete
- storage.objects.get
- storage.objects.list
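As one way to verify in advance that the service account holds these permissions, the Cloud Resource Manager testIamPermissions API returns the subset of requested permissions that the caller actually has. A hedged sketch, assuming the google-cloud-resourcemanager library, a hypothetical project ID, and that the code runs as (or impersonates) the service account:

```python
from google.cloud import resourcemanager_v3

# The permissions listed above.
REQUIRED = [
    "bigquery.datasets.get",
    "bigquery.jobs.create",
    "bigquery.tables.create",
    "bigquery.tables.delete",
    "bigquery.tables.export",
    "bigquery.tables.get",
    "bigquery.tables.getData",
    "bigquery.tables.updateData",
    "storage.buckets.create",
    "storage.buckets.delete",
    "storage.buckets.get",
    "storage.buckets.list",
    "storage.objects.create",
    "storage.objects.delete",
    "storage.objects.get",
    "storage.objects.list",
]

client = resourcemanager_v3.ProjectsClient()
# Returns only the permissions the caller actually holds on the project.
response = client.test_iam_permissions(
    request={"resource": "projects/my-project", "permissions": REQUIRED}
)
missing = sorted(set(REQUIRED) - set(response.permissions))
if missing:
    print("Missing permissions:", missing)
```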
How to check temporary data on GCS
As mentioned above, the transferred data is temporarily stored in the specified GCS bucket/folder.
If the "Temporary Data Deletion Policy" is set to "Do Not Delete," temporary data remaining in the GCS can be viewed after ETL Configuration.
Since the temporary data is compressed, it must be decompressed before its contents can be reviewed.
Change the extension of the temporary data file to .gz, decompress the file with gzip, and check its contents.
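For example, the remaining file could be downloaded and read with the google-cloud-storage library and Python's gzip module. A minimal sketch with hypothetical bucket and object names:

```python
import gzip

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")            # bucket from the GCS URI setting
blob = bucket.blob("tmp/export-000000000000")  # temporary file left on GCS

# Save locally with a .gz extension, then read it through gzip.
blob.download_to_filename("export.csv.gz")
with gzip.open("export.csv.gz", mode="rt") as f:
    for line in f:
        print(line, end="")
```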