Workflow - Overview
- Updated on 31 Jul 2023
Overview
This page is a help guide for Workflow configurations in trocco.
Configurations created in trocco (ETL, Data Mart, etc.) are called Tasks in the Workflow feature.
In a Workflow, you can schedule your pipeline jobs, set dependencies between them, and orchestrate their execution.
Job Execution Settings
Maximum Concurrency
Determines the maximum number of tasks that can be run concurrently in your Workflow.
You can set up to 10 tasks to run concurrently.
Raising the Maximum Concurrency value increases the number of tasks that can run at the same time in a single Workflow, which can shorten the Workflow's overall execution time.
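To illustrate why a higher concurrency value can shorten a Workflow, here is a minimal sketch. It is not trocco code: the function name is hypothetical, and it assumes that all tasks are independent, take roughly the same time, and that the minimum setting is 1 (this page only states the maximum of 10).

```python
import math

# Upper bound stated on this page.
MAX_CONCURRENCY_LIMIT = 10


def estimated_waves(task_count: int, max_concurrency: int) -> int:
    """Return how many sequential "waves" are needed to run all tasks.

    Hypothetical model: every task is independent and takes equal time,
    so each wave runs up to `max_concurrency` tasks at once.
    """
    # Assumption: the lowest allowed value is 1.
    if not 1 <= max_concurrency <= MAX_CONCURRENCY_LIMIT:
        raise ValueError("max_concurrency must be between 1 and 10")
    if task_count < 0:
        raise ValueError("task_count must not be negative")
    return math.ceil(task_count / max_concurrency)
```

Under these assumptions, eight equal tasks need eight waves at a concurrency of 1 but only four waves at a concurrency of 2. Real Workflows with dependencies between tasks will see a smaller benefit.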
Timeout Settings
If Disabled, the Workflow job does not time out, regardless of how long it runs. Tasks run until the configured Workflow completes.
If Enabled, the Workflow job is canceled once a specified amount of time has elapsed since the job started.
If a job is canceled due to a timeout, you can either restart it from where it stopped or abort the workflow job entirely.
If a timeout setting is applied to a specific task (for example, an ETL Configuration), it takes precedence over the timeout setting specified in the Workflow.
This feature is useful when your data pipeline has delivery requirements, for example when the processed data feeds a BI tool that must be updated within a specific timeframe. If the Workflow has not completed after a certain amount of time, you can have it time out automatically.
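The Disabled and Enabled behavior described above can be sketched as a simple elapsed-time check. This is an illustrative model, not trocco's implementation: the function name is hypothetical, `None` stands for a disabled timeout, and canceling exactly at the limit (rather than just after it) is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_cancel(started_at: datetime, now: datetime,
                  timeout_minutes: Optional[int]) -> bool:
    """Return True if a Workflow job should be canceled due to a timeout.

    `timeout_minutes=None` models the Disabled setting: never cancel.
    """
    if timeout_minutes is None:
        return False
    # Enabled: cancel once the configured time has elapsed since job start.
    return now - started_at >= timedelta(minutes=timeout_minutes)
```

For example, with a 60-minute timeout, a job that started at 09:00 would still be running at 09:59 and would be canceled at 10:00 in this model.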
Retry Count
Sets the number of automatic retries performed when a Workflow fails, and the time interval before each retry is executed.
Note that automatic retries are not performed in the following cases:
- When the retry count is set to 0
- If all tasks are successful
- If the Workflow job is canceled
The product of the retry count and the retry interval cannot exceed 60 minutes.
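The retry rules above can be written out as two small checks. This is a sketch for illustration only: the function and parameter names are hypothetical, and the interval is assumed to be expressed in minutes.

```python
# Limit stated on this page: retry count x retry interval <= 60 minutes.
MAX_TOTAL_RETRY_MINUTES = 60


def retry_settings_are_valid(retry_count: int,
                             retry_interval_minutes: int) -> bool:
    """Check that retry count x retry interval does not exceed 60 minutes."""
    if retry_count < 0 or retry_interval_minutes < 0:
        raise ValueError("retry settings must not be negative")
    return retry_count * retry_interval_minutes <= MAX_TOTAL_RETRY_MINUTES


def should_auto_retry(retry_count: int, retries_done: int,
                      all_tasks_succeeded: bool, job_canceled: bool) -> bool:
    """Decide whether another automatic retry would be performed."""
    # No retry when the count is 0, everything succeeded, or the job
    # was canceled (the three cases listed on this page).
    if retry_count == 0 or all_tasks_succeeded or job_canceled:
        return False
    # Otherwise retry until the configured count is used up.
    return retries_done < retry_count
```

For example, 3 retries at a 20-minute interval (60 minutes in total) would be accepted, while 4 retries at 20 minutes (80 minutes) would not.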
Parallel Execution
If a Workflow job is still running when the next scheduled job for the same Workflow is due to start, you can choose whether to skip the scheduled job or run it in parallel.
If you choose to skip, the data is less fresh, but you avoid the risk of duplicate records in the destination. If Workflow jobs run in parallel, there is a chance of duplication, but the data is as fresh as the schedule intends.
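The choice between skipping and running in parallel can be sketched as a small decision function. The names below (`OverlapPolicy`, `should_start_scheduled_job`) are hypothetical and only model the behavior described on this page.

```python
from enum import Enum


class OverlapPolicy(Enum):
    """Hypothetical names for the two Parallel Execution choices."""
    SKIP = "skip"
    PARALLEL = "parallel"


def should_start_scheduled_job(previous_job_running: bool,
                               policy: OverlapPolicy) -> bool:
    """Return True if the newly scheduled Workflow job should start."""
    if not previous_job_running:
        # Nothing overlaps, so the scheduled job always starts.
        return True
    # A job is still running: start only if parallel execution is chosen.
    return policy is OverlapPolicy.PARALLEL
```

With `OverlapPolicy.SKIP`, an overlapping scheduled run is dropped (less fresh data, no duplicate risk). With `OverlapPolicy.PARALLEL`, it starts alongside the running job.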
Task Error Handling
Select whether to execute subsequent tasks when a preceding task fails.
If set to OFF, the Workflow stops as soon as any task fails, and subsequent tasks are not executed.
If set to ON, subsequent tasks will continue to run, even if a task fails.
By disabling Task Error Handling in Workflows with multiple task dependencies, you can prevent unintended changes to your data.
If your Workflow consists of tasks with no dependencies, you can enable Task Error Handling to continue running jobs despite another task failing.
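The difference between the ON and OFF settings can be sketched with a simplified model in which tasks run one after another. The function name, task names, and the `outcomes` mapping are hypothetical. A real Workflow follows the dependencies drawn in the flow rather than a single linear order.

```python
from typing import Dict, List


def run_tasks(tasks: List[str], outcomes: Dict[str, bool],
              continue_on_error: bool) -> List[str]:
    """Return the names of the tasks that were executed, in order.

    `outcomes[name]` is True if the task succeeds and False if it fails.
    `continue_on_error` models Task Error Handling: True = ON, False = OFF.
    """
    executed = []
    for name in tasks:
        executed.append(name)
        if not outcomes[name] and not continue_on_error:
            # OFF: stop the Workflow at the first failed task.
            break
    return executed


tasks = ["etl_a", "etl_b", "data_mart"]
outcomes = {"etl_a": True, "etl_b": False, "data_mart": True}
stopped = run_tasks(tasks, outcomes, continue_on_error=False)
continued = run_tasks(tasks, outcomes, continue_on_error=True)
```

In this model, `stopped` ends at `etl_b`, while `continued` also includes `data_mart` even though `etl_b` failed.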
Schedule and Notification Settings
You can set and manage schedules for running your Workflow job.
trocco supports hourly, daily, weekly, and monthly schedule frequencies.
If you have a schedule set up for an individual task in the Workflow (for example, an ETL or Data Mart Configuration) as well as a schedule for the Workflow itself, both schedules will run. We recommend that you set your schedule in either the Workflow or the configuration for the respective task, not both.
Set up Slack or Email notifications and manage alerts.
You can set multiple notifications for a single Workflow, such as when a Workflow completes or when errors occur.
Editing a Flow
- To create a Flow, from the Edit Flow page, add tasks such as ETL Jobs and Data Mart Syncs.
- By selecting a range within the flowchart, you can simultaneously select multiple tasks and draw lines from a single starting point to set their run order.
- You can set parent and child Workflow relationships.
If a parent Workflow stops because a task in a child Workflow failed, rerunning the parent Workflow resumes from the job that caused the error in the child Workflow.
Behavior when a Workflow job fails
When a Workflow job is rerun from a failed state, the rerun starts from the task where it previously stopped.
Examples:
- The flow contains two ETL jobs that run concurrently, a Data Mart job, and a notification.
- Up to two tasks are allowed to be executed concurrently.
- If one of the ETL jobs fails:
  - If Task Error Handling is enabled, the remaining ETL, Data Mart, and notification tasks continue to be executed.
  - If Task Error Handling is disabled, the Workflow job stops when the error is detected.
- If you modify the configuration of the failed ETL job and then rerun the job, the rerun starts from the modified job, and subsequent jobs are also run.
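The rerun behavior in this example can be sketched as follows. This is a simplified model, not trocco code: the function and task names are hypothetical, and the tasks are treated as a single linear sequence, whereas the two ETL jobs in the example run concurrently.

```python
from typing import List


def tasks_for_rerun(ordered_tasks: List[str], failed_task: str) -> List[str]:
    """Return the tasks a rerun would execute.

    Model: the rerun starts from the task that failed and then runs
    every subsequent task in the flow.
    """
    start = ordered_tasks.index(failed_task)  # raises ValueError if unknown
    return ordered_tasks[start:]


flow = ["etl_a", "etl_b", "data_mart", "notification"]
rerun = tasks_for_rerun(flow, "etl_b")
```

Here `rerun` contains `etl_b`, `data_mart`, and `notification`. The already successful `etl_a` is not executed again in this model.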