Configuring the Input File Format
  • 07 Dec 2022
  • PDF

Configuring the Input File Format

  • PDF

Article summary

Note

This is a machine-translated version of the original Japanese article.
Please understand that some of the information contained on this page may be inaccurate.

summary

This is a help page for "Input file format" that can be specified by the transfer source in the transfer settings.
If the source is file storage such as S3 or SFTP, you need to specify the format of the file to be transferred.

Input file format

You can choose from all five types below.

  • CSV/TSV
  • JSON Lines
  • JSONPath
  • LTSV
  • Microsoft Excel (xlsx or xls file)

File Storage Connectors

Setting items

You can configure settings for each input file format when the file is imported.

.CSV

Here is the site I used as a reference.

STEP2 Advanced Settings Input Options

Item namedefault valuedescription
Delimiter,You can set the delimiter for CSV data.
quotation mark"You can set quotation marks.
Escape characters\You can set escape characters.
Setting the character to be converted to NULLDo not setYou can set the characters to be converted to NULL.
If you select Set, you can set the character to be converted to NULL.
Number of header rows to skip0Set how many lines you want to skip.
For example, if the first line is the item name and you do not want to include it in the transferred data, set 1.
Remove whitespace from values when quotes are missingnoSet whether to remove whitespace from the value when there are no quotes.
Irregular quote processing methodACCEPT_ONLY_RFC4180_ESCAPEDIn the field with quotes, set the method of processing when irregular quotes exist.
Please see here for details.
Comment Line Markers-If the character you set here appears at the beginning of a line, skip that line.
Handling rows with fewer columnsTreat as an invalid recordIf you select Treat as invalid record, the row processing is skipped when there are records that do not have enough columns.
If you select NULL completion for missing columns, the process continues by completing NULL values when there are records with insufficient columns.
Processing rows with an increasing number of columnsTreat as an invalid recordIf you select Treat as invalid record, the processing of the row is skipped when there are records that exceed the number of columns.
If you choose to ignore columns, if there are records that have exceeded the number of columns, the overflowing columns are ignored and processing continues.
Maximum amount of data that can be enclosed in quotation marks131072You can set the maximum amount of data that can be enclosed in quotation marks.
If there is a larger amount of data than the value set here, the row is skipped.
Whether to abort the transfer if an incorrect record existsAbort a transferIf you select Abort Transfer, the transfer is aborted when an incorrect record exists.
If you select Continue Processing, if an invalid record exists, the NULL value is completed and processing continues.
Default time zoneUTCIf the imported data itself does not have information about the time zone, you can set the time zone used for the timestamp type column.
Date Preferences1970-01-01In a date column, you can set a default value if the date is not recognized.
newlineCRLFYou can set the rules for line breaks from CRLF, LF, and CR.
Character encoding-You can set the character encoding you want to use for encoding. If it is not entered, it will be automatically guessed during automatic data setting.

JSON Lines

Using embulk-parser-jsonl

STEP2 Advanced Settings Input Options

Item namedefault valuedescription
Whether to abort the transfer if an incorrect record existsAbort a transferIf you select Abort Transfer, the transfer is aborted when an incorrect record exists.
If you select Continue Processing, if an invalid record exists, the NULL value is completed and processing continues.
Default time zoneUTCIf the imported data itself does not have information about the time zone, you can set the time zone used for the timestamp type column.
newlineCRLFYou can set the rules for line breaks from CRLF, LF, and CR.
Character encoding-You can set the character encoding you want to use for encoding. If it is not entered, it will be automatically guessed during automatic data setting.

JSONPath

Using embulk-parser-jsonpath

STEP1 Basic settings

Item namedefault valuedescription
JSONPath-Learn how to write JSONPath.
* Please$.* specify when specifying all.

STEP2 Advanced Settings Input Options

Item namedefault valuedescription
route-STEP1 Same as "JSONPath" in the basic settings.
Default time zoneUTCIf the imported data itself does not have information about the time zone, you can set the time zone used for the timestamp type column.

LTSV

STEP2 Advanced Settings Input Options

Item namedefault valuedescription
newlineCRLFYou can set the rules for line breaks from CRLF, LF, and CR.
Character encoding-You can set the character encoding you want to use for encoding. If it is not entered, it will be automatically guessed during automatic data setting.

Microsoft Excel

STEP1 Basic settings

Item namedefault valuedescription
Sheet Name-Please enter the name of the sheet to be transferred.
Number of header rows to skip1Set how many lines you want to skip.
For example, if the first line is the item name and you do not want to include it in the transferred data, set 1.
Column settings-Specify the column name and column type.
Time zone for datetime columnsAsia/TokyoIf there is a column in timestamp format, you can specify the time zone from this item.

Was this article helpful?