What to do when OutOfMemoryError occurs
Summary
In TROCCO, an error due to insufficient memory (OutOfMemoryError) may occur during the STEP 2 preview or during job execution.
This page describes the causes of OutOfMemoryError and how to resolve it.
Example of error message
If an error occurs due to insufficient memory, one of the following error messages will appear in the log:
- OutOfMemoryError: GC overhead limit exceeded
- OutOfMemoryError: Java heap space
Possible Causes
An OutOfMemoryError occurs when the amount of data transferred in a single ETL Job exceeds TROCCO's processing capacity and exhausts the memory of TROCCO's job execution container.
More specifically, this occurs in the following cases:
- When acquiring data: The amount of data retrieved is enormous, depending on the ETL Configuration's Data Source settings.
- When loading data: The processing volume is enormous, depending on the Data Destination Connector settings, such as the number of simultaneous connections.
Solutions
There are several possible solutions.
Split transfer data
If the ETL Source Connector Configuration causes a huge amount of data to be retrieved, reduce the amount retrieved at one time.
For example, adjust the following parts of the ETL Source Connector Configuration to reduce the amount of data acquired:
- Data Source File/Storage System: Path Prefix
- Specify a deeper hierarchy to reduce the number of files to be retrieved at one time.
- Custom Variables can also be embedded to retrieve one file at a time.
- Data Source Database System: Query
- Reduce the number of records to be retrieved at one time by writing a WHERE clause.
- Custom Variables can be embedded in the WHERE clause to dynamically specify which records to retrieve for each run.
- Data Source Cloud Application and Advertising System: Data Acquisition Period
- Reduce the number of records retrieved at one time by narrowing the time period.
- Custom Variables can be embedded in the start and end dates of data retrieval, respectively.
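As an illustration of splitting by WHERE clause, the sketch below assumes a table named `events` with a `created_at` column, and custom variables named `$start_date$` and `$end_date$` (the table, column, and variable names are all hypothetical; adapt them to your own ETL Configuration):

```sql
-- Hypothetical query for the ETL Source "Query" field.
-- $start_date$ and $end_date$ are custom variables, so each run
-- retrieves only the records for one date range instead of the
-- entire table, keeping memory usage within limits.
SELECT id, user_id, created_at
FROM events
WHERE created_at >= '$start_date$'
  AND created_at <  '$end_date$'
```

By shifting the two dates on each run (for example via scheduled jobs or loop execution), the full table can be transferred in smaller, memory-safe chunks.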
In addition to the above, you can also narrow down the columns to be retrieved, or use **Custom Variable Loop Execution** to split the job itself.
Review Job Settings
Running multiple jobs simultaneously in TROCCO results in a large number of processing requests to the Data Destination service.
Adjust the Schedule Settings and the number of parallel workflow executions so that too many jobs do not run at the same time.
Lower the number of rows to retrieve in a single request
Some Data Source Connectors have a configuration item that specifies the number of rows retrieved in a single request (e.g., Data Source - Google Analytics 4).
If such a setting exists, lower its value accordingly.
Lower batch size
Some Data Destination Connectors have a configuration item that specifies the batch size (e.g., Data Destination - Snowflake).
If such a setting exists, lower its value accordingly.
Adjust Data Destination Configuration
If the Data Destination service allows you to specify memory allocation or a limit on the number of simultaneous connections, adjust those values.
For example, with Data Destination Snowflake, limiting the number of concurrent queries can improve per-query performance.
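As a concrete sketch of limiting concurrency on the Snowflake side, the statement below uses Snowflake's `MAX_CONCURRENCY_LEVEL` warehouse parameter; the warehouse name `TROCCO_WH` and the value `4` are assumptions you should adapt to your environment:

```sql
-- Hypothetical warehouse name; replace with your own.
-- MAX_CONCURRENCY_LEVEL caps how many queries run concurrently on
-- the warehouse, so each query gets a larger share of its resources.
ALTER WAREHOUSE TROCCO_WH SET MAX_CONCURRENCY_LEVEL = 4;
```

Lowering this value reduces parallel load on the warehouse at the cost of queuing, which can help when TROCCO jobs overwhelm the Data Destination.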