Data Mart - Azure Synapse Analytics


Summary

This page describes how to set up a Data Mart Configuration with Azure Synapse Analytics.

Setting items

STEP1 Basic settings

Item name | Required | Default value | Contents
Azure Synapse Analytics Connection Configuration | Yes | - | Select a preregistered Azure Synapse Analytics Connection Configuration that has the permissions required by this Data Mart Configuration.
Custom Variables | No | - | Custom Variables set here can be used in the database name, query, schema name, and table name.
Database | Yes | - | Specify the name of the database to be used for processing.

Query settings

Item name | Required | Default value | Contents
Query execution mode | Yes | Data transfer mode | Choose one of the following two modes:
  • Data transfer mode
    • Replace (full refresh) or append to a table simply by specifying the SQL and the destination table.
  • Free description mode
    • Execute any query (DDL, DELETE, INSERT, etc.) against the connected DWH.
Query | Yes | - | Enter the SQL that retrieves the transfer data from Azure Synapse Analytics. Custom Variables can also be used to determine setting values dynamically at job execution time. An example query is shown below.
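
For example, in Data transfer mode the query might look like the following. This is an illustrative sketch: the table and column names are hypothetical, and it assumes a Custom Variable defined as $target_date$, which TROCCO expands at job execution time.

    -- Aggregate daily sales for the date held in the custom variable
    -- $target_date$ (expanded by TROCCO when the job runs).
    -- dbo.orders and its columns are hypothetical.
    SELECT
        order_date,
        SUM(amount) AS total_amount
    FROM dbo.orders
    WHERE order_date = '$target_date$'
    GROUP BY order_date;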

In addition, the following settings can be specified only when the query execution mode is Data transfer mode.

Data transfer mode settings

Item name | Required | Default value | Contents
Destination schema | Yes | - | Specify the name of the schema to which the data will be output. The schema must already exist.
Destination table | Yes | - | Specify the name of the table to which the data will be output. If the table does not exist, it is created at transfer time.
Table structure of the destination table | Yes | Heap (HEAP) | Select the table structure. For details, see About Table Structure below.
Index key columns of the destination table | Yes | - | Enter when CLUSTERED INDEX is selected as the table structure. Specify the index key column names and their order.
Table distribution of the destination table | Yes | Round robin distribution (ROUND_ROBIN) | Select the table distribution. For details, see About Table Distributions below.
Distribution column of the destination table | Yes | - | Enter when Hash distribution (HASH) is selected as the table distribution. Specify the name of the distribution column used as the hash key.
Write settings for the destination table | Yes | - | Select one of the following write methods (a T-SQL sketch of both follows the note below):
  • Append
    • The query results are appended after the records of the existing table.
  • Replace all (full refresh)
    • The records of the existing table are deleted and replaced with the query results.
If the table already exists in Azure Synapse Analytics

When appending to a table that already exists in Azure Synapse Analytics, the following TROCCO-side settings are not applied; the settings of the existing table are retained.

  • Table structure of the destination table
  • Table distribution of the destination table
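
Conceptually, the two write methods behave like the following T-SQL against the destination table. This is a sketch only; the destination table dbo.daily_sales and the query are hypothetical, and the SQL that TROCCO actually issues may differ.

    -- Append: the query results are added after the existing records.
    INSERT INTO dbo.daily_sales (order_date, total_amount)
    SELECT order_date, SUM(amount)
    FROM dbo.orders
    GROUP BY order_date;

    -- Replace all: the existing records are deleted first, then the
    -- query results are inserted.
    DELETE FROM dbo.daily_sales;
    INSERT INTO dbo.daily_sales (order_date, total_amount)
    SELECT order_date, SUM(amount)
    FROM dbo.orders
    GROUP BY order_date;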

About Table Structure

For more information on table structures, please refer to the official Azure Synapse Analytics documentation - Table Structure Options.

Table structure | Contents
Heap (HEAP) | Stores the table as a heap.
CLUSTERED COLUMNSTORE INDEX | Stores the table as a clustered columnstore index.
CLUSTERED INDEX | Stores the table as a clustered index containing one or more key columns.
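
In Azure Synapse Analytics DDL, the table structure is set in the WITH clause of CREATE TABLE. The following sketch uses hypothetical tables to show how each option maps to DDL; TROCCO generates the actual statement from the settings above.

    -- Heap: no index; rows are stored in arrival order.
    CREATE TABLE dbo.sales_heap (id INT, amount DECIMAL(18,2))
    WITH (HEAP, DISTRIBUTION = ROUND_ROBIN);

    -- Clustered columnstore index: column-oriented storage.
    CREATE TABLE dbo.sales_cci (id INT, amount DECIMAL(18,2))
    WITH (CLUSTERED COLUMNSTORE INDEX, DISTRIBUTION = ROUND_ROBIN);

    -- Clustered index: requires one or more key columns.
    CREATE TABLE dbo.sales_ci (id INT, amount DECIMAL(18,2))
    WITH (CLUSTERED INDEX (id), DISTRIBUTION = ROUND_ROBIN);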

About Table Distributions

For more information on table distributions, please refer to the official Azure Synapse Analytics documentation - Table Distribution Options.

Table distribution | Contents
Hash distribution (HASH) | Distributes rows based on the hash values of up to eight columns.
Round robin distribution (ROUND_ROBIN) | Distributes rows evenly across all distributions in round-robin fashion.
REPLICATE | Stores a full copy of the table on each compute node.
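
In DDL terms, the distribution is likewise set in the WITH clause of CREATE TABLE. A sketch with hypothetical tables:

    -- Hash distribution: rows are placed by the hash of the distribution column.
    CREATE TABLE dbo.orders_hash (customer_id INT, amount DECIMAL(18,2))
    WITH (CLUSTERED COLUMNSTORE INDEX, DISTRIBUTION = HASH (customer_id));

    -- Round robin: rows are spread evenly across all distributions.
    CREATE TABLE dbo.orders_rr (customer_id INT, amount DECIMAL(18,2))
    WITH (CLUSTERED COLUMNSTORE INDEX, DISTRIBUTION = ROUND_ROBIN);

    -- Replicate: a full copy of the table is kept on each compute node;
    -- typically used for small dimension tables.
    CREATE TABLE dbo.dim_region (region_id INT, region_name NVARCHAR(50))
    WITH (CLUSTERED COLUMNSTORE INDEX, DISTRIBUTION = REPLICATE);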

Job settings

Item name | Required | Default value | Contents
Parallel execution of jobs | Yes | No parallel job execution | Select whether to run a job when another job of the same Data Mart Configuration is running at the time the job is started.
  • No parallel job execution: the job is skipped and not executed.
  • Allow parallel job execution: the job is executed.