Workflow - Overview
  • 31 Jul 2023

Overview

This page is a help guide for Workflow configurations in trocco.

Note

Configurations created in trocco (ETL, Data Mart, etc.) are called Tasks in the Workflow feature.

In a Workflow, you can schedule your pipeline jobs, set dependencies between them, and orchestrate their execution.

Job Execution Settings

Maximum Concurrency

Determines the maximum number of tasks that can be run concurrently in your Workflow.
You can set up to 10 tasks to run concurrently.

Raising the maximum concurrency increases the number of tasks that can run at the same time in a single Workflow, which can reduce the overall execution time of the Workflow.
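
As a rough illustration of why a higher limit can shorten a run, the following Python sketch estimates the wall-clock time for a set of independent tasks under a given concurrency limit. The function name estimated_runtime, the task durations, and the greedy slot model are assumptions made for this example. They are not part of trocco and do not describe how trocco actually schedules tasks. Only the upper limit of 10 comes from this page.

```python
import heapq


def estimated_runtime(durations, max_concurrency):
    """Estimate wall-clock time for independent tasks started in list order.

    Hypothetical model: each task takes the next free slot.
    """
    # The upper bound of 10 mirrors the limit stated on this page.
    if not 1 <= max_concurrency <= 10:
        raise ValueError("max_concurrency must be between 1 and 10")

    # Each heap entry is the time at which one slot becomes free.
    slots = [0] * max_concurrency
    heapq.heapify(slots)

    for duration in durations:
        start = heapq.heappop(slots)            # earliest free slot
        heapq.heappush(slots, start + duration)

    return max(slots)


# Four hypothetical tasks of 10 minutes each.
durations = [10, 10, 10, 10]
print(estimated_runtime(durations, 1))  # 40
print(estimated_runtime(durations, 2))  # 20
print(estimated_runtime(durations, 4))  # 10
```

In this simplified model, doubling the limit halves the estimated run time only because the tasks are independent and equally long. Tasks with dependencies between them would benefit less.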

Timeout Settings

If Disabled, the Workflow job does not time out, regardless of how long it runs. Tasks continue to run until the Workflow completes.

If Enabled, the Workflow job is canceled once the specified amount of time has elapsed since the job started.
If a job is canceled due to a timeout, you can either restart it from where it stopped or abort the workflow job entirely.
If a timeout setting is applied to a specific task, for example, an ETL Configuration, it will take precedence over the timeout setting specified in the Workflow.

This setting is useful when your data pipeline has delivery requirements, for example when the processed data feeds a BI tool that must be updated within a specific timeframe. If the Workflow has not completed after a certain amount of time, it can time out automatically.

Retry Count

Sets the number of automatic retries to perform when a Workflow fails, and the time interval to wait before each retry.
Note that automatic retries are not performed in the following cases:

  • When the retry count is set to 0
  • If all tasks are successful
  • If the Workflow job is canceled

Retry interval limitations

The product of the retry count and the retry interval cannot exceed 60 minutes.
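
The retry rules above can be sketched as two small checks in Python. The function names, the parameter names, and the assumption that the interval is expressed in whole minutes are hypothetical and chosen for this example only. trocco enforces these rules through its settings screen, not through this code.

```python
MAX_TOTAL_RETRY_MINUTES = 60  # limit stated on this page


def is_valid_retry_setting(retry_count, interval_minutes):
    """Return True if retry_count * interval_minutes is within the limit."""
    if retry_count < 0 or interval_minutes < 0:
        return False
    return retry_count * interval_minutes <= MAX_TOTAL_RETRY_MINUTES


def should_auto_retry(retry_count, all_tasks_succeeded, canceled):
    """Mirror the three cases in which no automatic retry is performed."""
    if retry_count == 0:
        return False
    if all_tasks_succeeded:
        return False
    if canceled:
        return False
    return True


print(is_valid_retry_setting(3, 20))  # True: 3 * 20 = 60
print(is_valid_retry_setting(4, 20))  # False: 4 * 20 = 80
print(should_auto_retry(3, all_tasks_succeeded=False, canceled=False))  # True
print(should_auto_retry(3, all_tasks_succeeded=False, canceled=True))   # False
```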

Parallel Execution

If a Workflow job is still running when the next scheduled job for the same Workflow is due to start, you can choose whether to skip the scheduled job or run it in parallel.
Skipping the job means the data will be less fresh, but it avoids the risk of duplicate records in the destination. Running the jobs in parallel keeps the data as fresh as the schedule intends, but there is a chance of duplicate records.
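
The choice described above amounts to a small decision rule, sketched below in Python. The names OverlapPolicy and on_schedule_trigger, and the returned strings, are hypothetical and used only for illustration. They do not correspond to any trocco API.

```python
from enum import Enum


class OverlapPolicy(Enum):
    """Hypothetical names for the two options described above."""
    SKIP = "skip"
    PARALLEL = "parallel"


def on_schedule_trigger(previous_job_running, policy):
    """Decide what happens when a scheduled start time arrives."""
    if previous_job_running and policy is OverlapPolicy.SKIP:
        # Avoids possible duplicate records, at the cost of fresher data.
        return "skipped"
    # Either nothing is running, or parallel execution is allowed.
    return "started"


print(on_schedule_trigger(True, OverlapPolicy.SKIP))      # skipped
print(on_schedule_trigger(True, OverlapPolicy.PARALLEL))  # started
print(on_schedule_trigger(False, OverlapPolicy.SKIP))     # started
```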

Task Error Handling

Select whether or not to execute subsequent tasks when a preceding task fails.

If set to OFF, the Workflow stops as soon as any task fails, and subsequent tasks are not executed.

If set to ON, subsequent tasks continue to run even if a task fails.

By disabling Task Error Handling in Workflows with multiple task dependencies, you can prevent unintended changes to your data.
If your Workflow consists of tasks with no dependencies, you can enable Task Error Handling to continue running jobs despite another task failing.
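
The difference between the two settings can be illustrated with a minimal Python sketch. It assumes a simplified Workflow in which tasks run one after another in a fixed order. The function run_workflow, the continue_on_error flag, and the task names are hypothetical and are not part of trocco.

```python
def run_workflow(tasks, continue_on_error):
    """Run (name, action) pairs in order and record each outcome.

    continue_on_error=True stands in for Task Error Handling set to ON.
    Tasks that are never reached do not appear in the result.
    """
    results = {}
    for name, action in tasks:
        succeeded = action()
        results[name] = "succeeded" if succeeded else "failed"
        if not succeeded and not continue_on_error:
            break  # OFF: stop without executing subsequent tasks
    return results


# Hypothetical tasks: the second one fails.
tasks = [
    ("etl_a", lambda: True),
    ("etl_b", lambda: False),
    ("data_mart", lambda: True),
]

print(run_workflow(tasks, continue_on_error=False))
# {'etl_a': 'succeeded', 'etl_b': 'failed'}
print(run_workflow(tasks, continue_on_error=True))
# {'etl_a': 'succeeded', 'etl_b': 'failed', 'data_mart': 'succeeded'}
```

With the setting OFF, data_mart never runs against the incomplete output of etl_b, which is the kind of unintended change the recommendation above is meant to prevent.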

Schedule and Notification Settings

You can set and manage schedules for running your Workflow job.
trocco supports hourly, daily, weekly, and monthly schedule frequencies.

Note

If a schedule is set both for an individual task in the Workflow (for example, an ETL or Data Mart Configuration) and for the Workflow itself, both schedules run independently. We recommend setting the schedule in either the Workflow or the configuration of the respective task, not both.

You can set up Slack or Email notifications and manage alerts.
Multiple notifications can be set for a single Workflow, for example when the Workflow completes or when an error occurs.

Editing a Flow

  • To create a Flow, from the Edit Flow page, add tasks such as ETL Jobs and Data Mart Syncs.
  • By selecting a range within the flowchart, you can simultaneously select multiple tasks and draw lines from a single starting point to set their run order.
  • You can set parent and child Workflow relationships.
    If a parent Workflow stops because a task in a child Workflow failed, rerunning the parent Workflow resumes from the job that caused the error in the child Workflow.

Behavior when a Workflow job fails

When a Workflow job is rerun from a failed state, the rerun starts from the task where it previously stopped.

Example:

  • The Flow contains two ETL jobs that run concurrently, a Data Mart job, and a notification.
  • Up to two tasks are allowed to run concurrently.
  • Suppose one of the ETL jobs fails.
  • If Task Error Handling is enabled, the remaining ETL job, the Data Mart job, and the notification task are still executed.
  • If Task Error Handling is disabled, the Workflow job stops when the error is detected.
  • If you modify the configuration of the failed ETL job and then rerun the Workflow job, the rerun starts from the modified job, and subsequent jobs are also run.
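
The rerun behavior can be sketched in Python as follows. This is a simplified model that assumes tasks are arranged in a single linear order, which the concurrent ETL jobs in the example above are not. The function tasks_to_rerun, the status strings, and the task names are hypothetical and do not reflect trocco internals.

```python
def tasks_to_rerun(task_order, statuses):
    """Return the tasks a rerun would execute in this simplified model.

    The rerun starts at the first task that did not succeed and
    includes every task after it.
    """
    for index, name in enumerate(task_order):
        if statuses.get(name) != "succeeded":
            return task_order[index:]
    return []  # nothing failed, so nothing needs to be rerun


# Hypothetical linear order for the tasks in the example.
order = ["etl_a", "etl_b", "data_mart", "notify"]
statuses = {"etl_a": "succeeded", "etl_b": "failed"}

print(tasks_to_rerun(order, statuses))
# ['etl_b', 'data_mart', 'notify']
```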
