Configuring the Input File Format
- Updated on 07 Dec 2022
This is a machine-translated version of the original Japanese article.
Please understand that some of the information contained on this page may be inaccurate.
Summary
This help page describes the "Input file format" setting, which can be specified for the transfer source in the transfer settings.
If the transfer source is file storage such as S3 or SFTP, you need to specify the format of the files to be transferred.
Input file format
You can choose from the following five formats.
- CSV/TSV
- JSON Lines
- JSONPath
- LTSV
- Microsoft Excel (xlsx or xls file)
File Storage Connectors
- Transfer source - Amazon S3
- Transfer source - Azure Blob Storage
- Transfer source - Box
- Transfer source - FTP/FTPS
- Transfer source - Google Cloud Storage
- Transfer source - Google Drive
- Transfer source - HTTP(S)
- Transfer source - SFTP
- Transfer source - Local Files
Setting items
The following settings are available for each input file format and are applied when the file is imported.
CSV/TSV
For details, see the reference site.
STEP2 Advanced Settings Input Options
Item name | Default value | Description |
---|---|---|
Delimiter | , | The delimiter used in the CSV data. |
Quotation mark | " | The character used to quote values. |
Escape character | \ | The character used to escape special characters. |
String to convert to NULL | Do not set | If you select Set, you can specify the string that is converted to NULL. |
Number of header rows to skip | 0 | The number of lines to skip at the start of the file. For example, if the first line contains the item names and you do not want to include it in the transferred data, set 1. |
Remove whitespace from values when quotes are missing | no | Whether to remove whitespace from a value that is not enclosed in quotes. |
Irregular quote processing method | ACCEPT_ONLY_RFC4180_ESCAPED | How to process irregular quotes that appear inside a quoted field. Please see here for details. |
Comment line marker | - | Lines that begin with the character set here are skipped. |
Handling rows with fewer columns | Treat as an invalid record | If you select Treat as an invalid record, rows that do not have enough columns are skipped. If you select NULL completion for missing columns, the missing columns are filled with NULL and processing continues. |
Handling rows with extra columns | Treat as an invalid record | If you select Treat as an invalid record, rows that have more columns than expected are skipped. If you select Ignore columns, the extra columns are ignored and processing continues. |
Maximum amount of data that can be enclosed in quotation marks | 131072 | The maximum amount of data allowed inside a quoted value. If a value is larger than this limit, the row is skipped. |
Whether to abort the transfer if an invalid record exists | Abort transfer | If you select Abort transfer, the transfer is aborted when an invalid record exists. If you select Continue processing, invalid records are filled with NULL values and processing continues. |
Default time zone | UTC | The time zone applied to timestamp columns when the imported data itself does not contain time zone information. |
Default date | 1970-01-01 | The default value used in a date column when the date is not recognized. |
Newline | CRLF | The line break rule. Choose from CRLF, LF, and CR. |
Character encoding | - | The character encoding of the file. If left blank, it is guessed automatically during automatic data setting. |
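As a rough illustration of how several of these options interact, the sketch below uses Python's standard csv module. It is not the service's actual parser. The function name parse_csv and all of its parameter names are hypothetical stand-ins for the settings in the table above.

```python
import csv
import io


def parse_csv(text, delimiter=",", quotechar='"', escapechar="\\",
              skip_header_lines=0, null_string=None, comment_marker=None,
              expected_columns=None, fill_missing_with_null=False):
    """Parse CSV text into rows, imitating a few of the settings above.

    Simplification: lines are split before parsing, so quoted values that
    span several lines are not supported in this sketch.
    """
    # "Number of header rows to skip"
    lines = text.splitlines(keepends=True)[skip_header_lines:]
    # "Comment line marker": drop lines that begin with the marker
    if comment_marker:
        lines = [ln for ln in lines if not ln.startswith(comment_marker)]

    reader = csv.reader(io.StringIO("".join(lines)), delimiter=delimiter,
                        quotechar=quotechar, escapechar=escapechar)
    rows = []
    for row in reader:
        # "Handling rows with fewer columns"
        if expected_columns is not None and len(row) < expected_columns:
            if not fill_missing_with_null:
                continue  # treated as an invalid record: skip the row
            row = row + [None] * (expected_columns - len(row))
        # "String to convert to NULL"
        rows.append([None if null_string is not None and v == null_string else v
                     for v in row])
    return rows


sample = "id,name,note\n# a comment\n1,alice,NULL\n2,bob\n"
print(parse_csv(sample, skip_header_lines=1, null_string="NULL",
                comment_marker="#", expected_columns=3,
                fill_missing_with_null=True))
```

With fill_missing_with_null left at False, the short row for "bob" would be skipped instead of being padded with NULL.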
JSON Lines
This format uses embulk-parser-jsonl.
STEP2 Advanced Settings Input Options
Item name | Default value | Description |
---|---|---|
Whether to abort the transfer if an invalid record exists | Abort transfer | If you select Abort transfer, the transfer is aborted when an invalid record exists. If you select Continue processing, invalid records are filled with NULL values and processing continues. |
Default time zone | UTC | The time zone applied to timestamp columns when the imported data itself does not contain time zone information. |
Newline | CRLF | The line break rule. Choose from CRLF, LF, and CR. |
Character encoding | - | The character encoding of the file. If left blank, it is guessed automatically during automatic data setting. |
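The difference between aborting and continuing on an invalid record can be sketched as follows. This is a minimal illustration in plain Python, not the behavior of embulk-parser-jsonl itself. The function name parse_json_lines and the flag stop_on_invalid_record are hypothetical, and representing an invalid record as a single None is an assumption made for brevity.

```python
import json


def parse_json_lines(text, stop_on_invalid_record=True):
    """Parse one JSON value per line.

    stop_on_invalid_record=True  -> raise on the first invalid line ("Abort transfer")
    stop_on_invalid_record=False -> keep going and store None ("Continue processing")
    """
    records = []
    # splitlines() accepts CRLF, LF, and CR line breaks alike
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            if stop_on_invalid_record:
                raise ValueError(f"invalid record on line {line_number}") from exc
            records.append(None)  # assumption: invalid record becomes NULL
    return records


data = '{"id": 1}\nnot json\n{"id": 2}\n'
print(parse_json_lines(data, stop_on_invalid_record=False))
```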
JSONPath
This format uses embulk-parser-jsonpath.
STEP1 Basic settings
Item name | Default value | Description |
---|---|---|
JSONPath | - | The JSONPath expression that selects the data to transfer. See the linked page for how to write JSONPath. To select everything, specify $.* |
STEP2 Advanced Settings Input Options
Item name | Default value | Description |
---|---|---|
Root | - | Same as "JSONPath" in STEP1 Basic settings. |
Default time zone | UTC | The time zone applied to timestamp columns when the imported data itself does not contain time zone information. |
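To give a feel for what an expression such as $.* selects, here is a deliberately tiny evaluator that understands only "$", ".key", and ".*". It is an illustrative sketch and does not reproduce the full JSONPath syntax or the exact behavior of embulk-parser-jsonpath. The function name select is hypothetical.

```python
import json


def select(document, path):
    """Evaluate a very small JSONPath subset: '$', '.key' and '.*' only."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    nodes = [document]
    # "$.users.*" -> steps ["users", "*"]; a bare "$" yields no steps
    for step in filter(None, path[1:].split(".")):
        next_nodes = []
        for node in nodes:
            if step == "*":
                # wildcard: every value of an object, or every array element
                if isinstance(node, dict):
                    next_nodes.extend(node.values())
                elif isinstance(node, list):
                    next_nodes.extend(node)
            elif isinstance(node, dict) and step in node:
                next_nodes.append(node[step])
        nodes = next_nodes
    return nodes


doc = json.loads('{"users": [{"id": 1}, {"id": 2}], "total": 2}')
print(select(doc, "$.*"))        # every top-level value
print(select(doc, "$.users.*"))  # each element of the "users" array
```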
LTSV
STEP2 Advanced Settings Input Options
Item name | Default value | Description |
---|---|---|
Newline | CRLF | The line break rule. Choose from CRLF, LF, and CR. |
Character encoding | - | The character encoding of the file. If left blank, it is guessed automatically during automatic data setting. |
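For readers unfamiliar with LTSV (Labeled Tab-Separated Values), each line holds tab-separated fields of the form label:value. The sketch below shows the general idea in plain Python. It is not the parser used by the service, and the function name parse_ltsv is hypothetical.

```python
def parse_ltsv(text):
    """Parse LTSV text into a list of dicts, one per non-empty line."""
    records = []
    # splitlines() accepts CRLF, LF, and CR line breaks alike
    for line in text.splitlines():
        if not line:
            continue
        record = {}
        for field in line.split("\t"):
            # split on the first ':' only, so values may contain colons
            label, _, value = field.partition(":")
            record[label] = value
        records.append(record)
    return records


log = "host:127.0.0.1\tstatus:200\ttime:12:00:00\r\nhost:10.0.0.1\tstatus:404\ttime:12:00:05\r\n"
for record in parse_ltsv(log):
    print(record)
```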
Microsoft Excel
STEP1 Basic settings
Item name | Default value | Description |
---|---|---|
Sheet name | - | The name of the sheet to be transferred. |
Number of header rows to skip | 1 | The number of rows to skip at the top of the sheet. For example, if the first row contains the item names and you do not want to include it in the transferred data, set 1. |
Column settings | - | The name and type of each column. |
Time zone for datetime columns | Asia/Tokyo | The time zone applied to columns in timestamp format. |
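The time zone settings in this article apply when a timestamp value carries no time zone of its own. The sketch below illustrates that idea with Python's datetime module. The function name with_default_time_zone is hypothetical, and a fixed +09:00 offset stands in for Asia/Tokyo as a simplification (Japan does not observe daylight saving time).

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +09:00 offset is used in place of the Asia/Tokyo zone.
JST = timezone(timedelta(hours=9), "Asia/Tokyo")


def with_default_time_zone(value, default_tz=JST):
    """Parse an ISO 8601 timestamp; attach default_tz only if none is present."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=default_tz)
    return parsed


# No zone in the data: the default time zone is applied.
print(with_default_time_zone("2022-12-07 09:00:00"))
# Zone already present in the data: it is kept as-is.
print(with_default_time_zone("2022-12-07T09:00:00+00:00"))
```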