- Print
- PDF
About input file format settings
- Print
- PDF
summary
In the Data Source File/Storage Connector, the input file format can be set in STEP 1 of ETL Configuration, and detailed settings for the input file can be configured in Input Option in STEP 2.
This page describes the input file formats and Connectors supported by TROCCO, as well as the various Input Option setting items.
Input file format
In STEP 1 of ETL Configuration editing, you can choose from the following
- CSV/TSV
- JSON Lines
- JSONPath
- LTSV
- Microsoft Excel (xlsx or xls file)
- XML (beta version)
The XML file format transfer function is available as a beta version.
Specifications are subject to change when the official version is released.
Connectors covered in this article
- Data Source - Amazon S3
- Data Source - App Store Connect API
- Data Source - Azure Blob Storage
- Data Source - Box
- Data Source - FTP/FTPS
- Data Source - Google Cloud Storage
- Data Source - Google Drive
- Data Source - HTTP/HTTPS
- Data Source - KARTE Datahub
- Data Source - SFTP
- Data Source - Local Files
Setting items
Setting items vary depending on the input file format.
CSV/TSV
Embulk's csv-parser-plugin is used.
STEP2 Input Option
item name | default value | Description. |
---|---|---|
delimiter | , | CSV Data Setting |
quotation marks | " | You can set quotation marks. |
escape character | \ | Escape characters can be set. |
String to be replaced by NULL | Not set | You can set the string to be replaced by NULL. If you select "Set", you can enter the string to be replaced. |
Number of header rows to skip | 0 | Sets the number of rows to skip. For example, if the first line is the item name and you do not want the item name line to be included in the ETL Configuration Data Setting, set 1. |
Remove white space from value if no quotation marks are present | No, sir. | You can set whether or not whitespace should be removed from the value if it is not quoted. |
Processing method for irregular quarts | ACCEPT_ONLY_RFC4180_ESCAPED | Sets how to handle the presence of irregular quotes in quoted fields. See quotes_in_quoted_fields in csv-parser-pluginfor details. |
comment line marker | - | If the character set here is at the beginning of a line, skip that line. |
Processing rows with fewer columns | Treat as invalid record | If you choose to process as invalid records, when there are records with insufficient columns, the processing of the corresponding rows will be skipped. If NULL completion of missing columns is selected, when a record with missing columns exists, NULL values are completed and processing continues. |
Processing rows with more columns | Treat as invalid record | If you choose to process as invalid records, when a record with an excess number of columns exists, the processing of the corresponding row is skipped. If you choose to ignore columns, when there is a record with an excess number of columns, the process continues ignoring the overflowing columns. |
Maximum amount of data that can be enclosed in quotation marks | 131072 | Maximum amount of Data Setting (in bytes) that can be quoted. If there is a larger amount of data than the value set here, the processing of that row is skipped. |
Whether or not to abort the transfer if an invalid record is found. | Abort transfer | If you choose to abort the transfer, the transfer will be aborted when an invalid record is found. If you choose to continue processing, when an invalid record is found, the NULL value is completed and processing continues. |
Default time zone | UTC | Time zone can be set to be used in a timestamp type column when the read Data itself has no information about time zone. |
Date Initialization | 1970-01-01 | In the date column, you can set a default value if the date is not recognized. |
new line | CRLF | Set the rules for line breaks among CRLF, LF, and CR. |
character encoding | - | You can set the character encoding method. If not entered, it will be automatically guessed during Automatic Data Setting. |
JSON Lines
embulk-parser-jsonl
.
STEP2 Input Option
item name | default value | Description. |
---|---|---|
Whether or not to abort the transfer if an invalid record is found. | Abort transfer | If you choose to abort the transfer, the transfer will be aborted when an invalid record is found. If you choose to continue processing, when an invalid record is found, the NULL value is completed and processing continues. |
Default time zone | UTC | Time zone can be set to be used in a timestamp type column when the read Data itself has no information about time zone. |
new line | CRLF | You can set rules for line breaks among CRLF, LF, and CR. |
character encoding | - | You can set the character encoding method. If not entered, it will be automatically guessed during Automatic Data Setting. |
JSONPath
STEP1 Basic settings
item name | default value | Description. |
---|---|---|
JSONPath | - | See Operators for information on how to write JSONPath. *To specify all, specify $. *If you want to specify all, specify $. |
STEP2 Input Option
item name | default value | Description. |
---|---|---|
root | - | This is the same setting as "JSONPath" in STEP1 Basic Settings. |
Default time zone | UTC | Time zone can be set to be used in a timestamp type column when the read Data itself has no information about time zone. |
LTSV
STEP2 Input Option
item name | default value | Description. |
---|---|---|
new line | CRLF | You can set rules for line breaks among CRLF, LF, and CR. |
character encoding | - | You can set the character encoding method. If not entered, it will be automatically guessed during Automatic Data Setting. |
Microsoft Excel
STEP1 Basic settings
item name | default value | Description. |
---|---|---|
Sheet name | - | Enter the name of the sheet to be transferred. |
Number of header rows to skip | 1 | Sets the number of rows to skip. For example, if the first line is the item name and you do not want the item name line to be included in the ETL Configuration Data Setting, set 1. |
Column Setting | - | Set the column name, column type, and method of retrieving values. How to get the value,
|
Time zone for date and time columns | Asia/Tokyo | You can set the time zone to be used for timestamp type columns. |
If you select "Use Cache" in the column configuration, ETL Configuration will use the values at the time the Excel Files are saved locally for ETL Configuration.
Therefore, functions that change the result of calculation each time, such as date/time functions (e.g., TODAY function) and random number generation functions (e.g., RAND function), are not recalculated when the transfer is executed. Please note
Select Recalculate on Transfer if the formulas in the cells need to be recalculated when the transfer is executed.
XML (beta version)
embulk-parser-xml
.
STEP1 Basic settings
item name | default value | Description. |
---|---|---|
Path of the root (XPath format) | - | Enter the root path to the node on which the Xpath search will be based. If you want all nodes in the XML file to be searched, enter / . |
Column Setting | - | Enter the path of the node you wish to retrieve in Xpath format. For XPath notation, see XPath Syntax. The column type can be selected from the following string long timestamp double boolean json timestamp is selected, also enter the format and time zone. |
You cannot specify a path in combination with a prefix defined in the XML namespace.