- 17 Jul 2024
- Print
- PDF
About input file format settings
- Updated on 17 Jul 2024
- Print
- PDF
summary
In the source file/storage connector, the input file format can be set in STEP 1 of the transfer settings, and detailed settings for the input file can be configured in the input options in STEP 2.
This page describes the input file formats and connectors supported by TROCCO, as well as various input option setting items.
Input file format
In STEP 1 of the Edit Transfer Settings, you can choose from the following
* CSV/TSV
* JSON Lines
* JSONPath
* LTSV
* Microsoft Excel (xlsx or xls file)
* XML (beta version)
The XML file format transfer function is available as a beta version.
Specifications are subject to change when the official version is released.
Connectors covered in this article
- Transfer source - Amazon S3
- Forwarding Source - App Store Connect API
- Transfer from - Azure Blob Storage
- Transfer source - Box
- Transfer source - FTP/FTPS
- From - Google Cloud Storage
- From - Google Drive
- Transfer source - HTTP/HTTPS
- Forwarding source - KARTE Datahub
- Source - SFTP
- Source - Local files
Setting items
Setting items vary depending on the input file format.
CSV/TSV
Embulk's csv-parser-plugin is used.
STEP2 Input Options
item name | default value | Description. |
---|---|---|
delimiter | , | Sets the delimiter character for CSV data. |
quotation marks | " | You can set quotation marks. |
escape character | \blow | Escape characters can be set. |
String to be replaced by NULL | Not set | You can set the string to be replaced by NULL. If you select " Set", you can enter the string to be replaced. |
Number of header lines to skip | 0 | Sets the number of rows to skip. For example, if the first line is the item name and you do not want the item name line to be included in the transfer data, set 1. |
Remove white space from value if no quotation marks are present | No, sir. | You can set whether or not whitespace should be removed from the value if it is not quoted. |
Processing method for irregular quarts | ACCEPT_ONLY_RFC4180_ESCAPED | Sets how to handle the presence of irregular quotes in quoted fields. See quotes_in_quoted_fields in csv-parser-plugin for details. |
comment line marker | - | If the character set here is at the beginning of a line, skip that line. |
Processing rows with fewer columns | Treat as invalid record | If you choose to process as invalid records, when there are records with insufficient columns, the processing of the corresponding rows will be skipped. If NULL completion of missing columns is selected, when a record with missing columns exists, NULL values are completed and processing continues. |
Processing rows with more columns | Treat as invalid record | If you choose to process as invalid records, when a record exists that has an excess number of columns, the processing of the corresponding row is skipped. If you choose to ignore columns, when there is a record with an excess number of columns, the process continues ignoring the overflowing columns. |
Maximum amount of data that can be enclosed in quotation marks | 131072 | Sets the maximum amount of data (in bytes) that can be quoted. If there is a larger amount of data than the value set here, the processing of that row is skipped. |
Whether or not to abort the transfer if an invalid record is found. | Abort transfer | If you choose to abort the transfer, the transfer will be aborted when an invalid record is found. If you choose to continue processing, when an invalid record is found, the NULL value is completed and processing continues. |
Default time zone | UTC | You can set the time zone to be used in a timestamp-type column when the read data itself has no information about the time zone. |
Date Initialization | 1970-01-01 | In the date column, you can set a default value if the date is not recognized. |
new line | CRLF | Set the rules for line breaks among CRLF, LF, and CR. |
character encoding | - | You can set the character encoding method. If not entered, it will be automatically guessed during automatic data setup. |
JSON Lines
embulk-parser-jsonl
.
STEP2 Input Options
item name | default value | Description. |
---|---|---|
Whether or not to abort the transfer if an invalid record is found. | Abort transfer | If you choose to abort the transfer, the transfer will be aborted when an invalid record is found. If you choose to continue processing, when an invalid record is found, the NULL value is completed and processing continues. |
Default time zone | UTC | You can set the time zone to be used in a timestamp-type column when the read data itself has no information about the time zone. |
new line | CRLF | You can set rules for line breaks among CRLF, LF, and CR. |
character encoding | - | You can set the character encoding method. If not entered, it will be automatically guessed during automatic data setup. |
JSONPath
STEP1 Basic settings
item name | default value | Description. |
---|---|---|
JSONPath | - | See Operators for information on how to write JSONPath. *To specify all, specify $. *If you want to specify all, specify $. |
STEP2 Input Options
item name | default value | Description. |
---|---|---|
root | - | This is the same setting as "JSONPath" in STEP1 Basic Settings. |
Default time zone | UTC | You can set the time zone to be used in a timestamp-type column when the read data itself has no information about the time zone. |
LTSV
STEP2 Input Options
item name | default value | Description. |
---|---|---|
new line | CRLF | You can set rules for line breaks among CRLF, LF, and CR. |
character encoding | - | You can set the character encoding method. If not entered, it will be automatically guessed during automatic data setup. |
Microsoft Excel
STEP1 Basic settings
item name | default value | Description. |
---|---|---|
Sheet name | - | Enter the name of the sheet to be transferred. |
Number of header lines to skip | 1 | Sets the number of rows to skip. For example, if the first line is the item name and you do not want the item name line to be included in the transfer data, set 1. |
Column Setting | - | Set the column name, column type, and method of retrieving values. How to get the value,
|
Time zone for date and time columns | Asia/Tokyo | You can set the time zone to be used for timestamp type columns. |
If " Use cache" is selected in the column settings, the values at the time the Excel file is saved locally will be used for the transfer.
Therefore, functions that change the result of calculation each time, such as date/time functions (e.g., TODAY function) and random number generation functions (e.g., RAND function), are not recalculated when the transfer is executed. Please note
Select Recalculate on Transfer if the formulas in the cells need to be recalculated when the transfer is executed.
XML (beta version)
embulk-parser-xml
.
STEP1 Basic settings
item name | default value | Description. |
---|---|---|
Path of the root (XPath format) | - | Enter the root path to the node on which the Xpath search will be based. If you want all nodes in the XML file to be searched, enter / . |
Column Setting | - | Enter the path of the node you wish to retrieve in Xpath format. For XPath notation, see XPath Syntax. The column type can be selected from the following string long timestamp double boolean json timestamp is selected, also enter the format and time zone. |
You cannot specify a path in combination with a prefix defined in the XML namespace.