Data Source - BigQuery
- Updated on 24 Jan 2023
This is a machine-translated version of the original Japanese article. Some of the information on this page may therefore be inaccurate.
Overview
This page explains how to set up data transfer from BigQuery on Google Cloud Platform.
Supported Protocols
- Data transfer (Embulk)
  - Uses embulk-input-bigquery_extract_files (file aggregation)
Limitations
- Limited transfer capacity
- You cannot query tables that use Google Sheets (Spreadsheets) as an external data source.
Settings
STEP 1: General Settings
Field Name | Required | Default Value | Description |
---|---|---|---|
BigQuery connection information | Yes | - | Select the BigQuery connection information to use. |
Data export destination GCS URI | Yes | - | Specify the GCS bucket folder to which BigQuery data is temporarily exported. (The bucket must be created in advance.) You cannot specify only the bucket name. If the URI contains a wildcard (*), the output is split into multiple files; otherwise it is output as a single file (up to 1 GB). Refer to the official BigQuery documentation for details. |
SQL | Yes | - | Enter the SQL that retrieves the data to transfer from BigQuery. Custom variables can also be used to set values dynamically at transfer time. Do not include the project name in the FROM clause; write the query in the form FROM dataset_name.table_name. The dataset referenced here must be in the same region as the dataset where the temporary table is created, and that region must match the specified dataset location. |
Dataset location | Yes | US | Specify the region of the dataset referenced in the SQL and of the dataset where the temporary table is created. (Both must be in the same region.) |
Temporary table destination dataset | Yes | - | During the transfer, trocco saves the SQL execution result in a temporary table; specify the dataset in which this temporary table is created. The dataset is not created automatically by trocco, so specify a dataset created in advance. Temporary tables are deleted after the transfer completes, but if the transfer job ends with an error, they may remain. The dataset specified here must be in the same region as the dataset referenced in the SQL, and that region must match the specified dataset location. |
Temporary data deletion policy | Yes | Delete | Set whether to delete the temporary data exported to GCS after the job completes. To inspect the temporary data afterwards, select "Do not delete". For how to inspect it, refer to "How to check temporary data in GCS" below. |
SQL Type | Yes | Standard SQL | Standard SQL and legacy SQL are supported. Refer to the official BigQuery documentation for the differences. |
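The rules for the export destination GCS URI can be summarized in code. The following is a hypothetical helper (not part of trocco; the function name is illustrative) that applies the two constraints described above: a bucket-only URI is rejected, and a wildcard switches the export from a single file to multiple files.

```python
def check_export_uri(uri: str) -> str:
    """Classify a GCS export destination URI.

    Returns "multiple files" if the URI contains a wildcard (*),
    "single file" otherwise (single-file exports are limited to 1 GB).
    Raises ValueError if the URI names only a bucket with no folder
    or object path, which the export rejects.
    """
    if not uri.startswith("gs://"):
        raise ValueError("must be a gs:// URI")
    bucket, _, path = uri[len("gs://"):].partition("/")
    if not bucket or not path:
        raise ValueError("specify a folder path, not just the bucket name")
    return "multiple files" if "*" in path else "single file"

print(check_export_uri("gs://my-bucket/tmp/export-*.csv"))  # multiple files
print(check_export_uri("gs://my-bucket/tmp/export.csv"))    # single file
```

If the exported result may exceed 1 GB, prefer a wildcard URI so the export is split across files.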
STEP 2: Advanced Settings
Field Name | Default Value | Description |
---|---|---|
Use the query cache | Enabled | Choose whether to use BigQuery's query caching feature. For more information on query caching, refer to the official BigQuery documentation. |
Required Permissions
The permissions required to use this service are as follows.
- bigquery.datasets.get
- bigquery.jobs.create
- bigquery.tables.create
- bigquery.tables.delete
- bigquery.tables.export
- bigquery.tables.get
- bigquery.tables.getData
- bigquery.tables.updateData
- storage.buckets.create
- storage.buckets.delete
- storage.buckets.get
- storage.buckets.list
- storage.objects.create
- storage.objects.delete
- storage.objects.get
- storage.objects.list
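One way to grant exactly these permissions is a custom IAM role. The following is a sketch of a role definition file, assuming you manage roles with gcloud; the role title is illustrative, and the permission list is copied from the list above.

```yaml
# role.yaml - hypothetical custom role bundling the required permissions
title: trocco BigQuery Transfer
description: Permissions for trocco BigQuery data source transfers
stage: GA
includedPermissions:
  - bigquery.datasets.get
  - bigquery.jobs.create
  - bigquery.tables.create
  - bigquery.tables.delete
  - bigquery.tables.export
  - bigquery.tables.get
  - bigquery.tables.getData
  - bigquery.tables.updateData
  - storage.buckets.create
  - storage.buckets.delete
  - storage.buckets.get
  - storage.buckets.list
  - storage.objects.create
  - storage.objects.delete
  - storage.objects.get
  - storage.objects.list
```

Such a file can then be passed to `gcloud iam roles create` with the `--file` flag and granted to the service account used for the connection.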
How to check temporary data in GCS
As mentioned above, the transferred data is temporarily stored in the designated GCS bucket folder.
If the temporary data deletion policy is set to "Do not delete", you can inspect the temporary data remaining in GCS after the transfer.
The temporary data is gzip-compressed, so you must decompress it to see the contents: add a .gz extension to the file name, decompress it with gzip, and check the contents.
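The decompression step above can be done with any gzip tool; as a sketch, here is how to read such a file in Python (the file name is illustrative, and the sample file is created locally so the snippet is self-contained):

```python
import gzip
import os

path = "export-000000000000"  # illustrative name of a downloaded temporary file

# Create a sample gzip-compressed file so the example is self-contained;
# in practice this is the file you downloaded from the GCS bucket folder.
with gzip.open(path + ".gz", "wt", encoding="utf-8") as f:
    f.write("id,name\n1,alice\n")

# Rename the downloaded file to add a .gz extension if it lacks one
# (already present here), then decompress and inspect the contents.
with gzip.open(path + ".gz", "rt", encoding="utf-8") as f:
    print(f.read())

os.remove(path + ".gz")  # clean up the sample file
```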