Azure Data Factory - Multiple activities in Pipeline execution order
I have 2 blob files that I want to copy into Azure SQL tables. My pipeline has these 2 activities:
{"name": "nutrientdatablobtoazuresqlpipeline",
"properties": {
"description": "copy nutrient data azure blob azure sql",
"activities": [
{
"type": "copy",
"typeproperties": {
"source": {
"type": "blobsource"
},
"sink": {
"type": "sqlsink",
"writebatchsize": 10000,
"writebatchtimeout": "60.00:00:00"
}
},
"inputs": [
{
"name": "foodgroupdescriptionsazureblob"
}
],
"outputs": [
{
"name": "foodgroupdescriptionssqlazure"
}
],
"policy": {
"timeout": "01:00:00",
"concurrency": 1,
"executionpriorityorder": "newestfirst"
},
"scheduler": {
"frequency": "minute",
"interval": 15
},
"name": "foodgroupdescriptions",
"description": "#1 bulk import foodgroupdescriptions"
},
{
"type": "copy",
"typeproperties": {
"source": {
"type": "blobsource"
},
"sink": {
"type": "sqlsink",
"writebatchsize": 10000,
"writebatchtimeout": "60.00:00:00"
}
},
"inputs": [
{
"name": "fooddescriptionsazureblob"
}
],
"outputs": [
{
"name": "fooddescriptionssqlazure"
}
],
"policy": {
"timeout": "01:00:00",
"concurrency": 1,
"executionpriorityorder": "newestfirst"
},
"scheduler": {
"frequency": "minute",
"interval": 15
},
"name": "fooddescriptions",
"description": "#2 bulk import fooddescriptions"
}
],
"start": "2015-07-14t00:00:00z",
"end": "2015-07-14t00:00:00z",
"ispaused": false,
"hubname": "gymappdatafactory_hub",
"pipelinemode": "scheduled"
}
}
As I understand it, once the first activity is done, the second one starts. How can I execute this pipeline so that the two activities for the 2 blobs run one after another (there are constraints in the database), instead of going into the dataset slices and running them manually? Also, how can I set pipelineMode to OneTime only, instead of Scheduled?
Hi Amel,
The following should help:
Ordered copy
It is possible to run multiple copy operations one after another in a sequential/ordered manner. Say you have two copy activities in the pipeline, CA1 and CA2, with the following input and output datasets:
CA1: input: Dataset1, output: Dataset2
CA2: input: Dataset2, output: Dataset4
CA2 will run only if CA1 has run successfully and Dataset2 is available.
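A minimal sketch of that chained case (hypothetical activity and dataset names, with the source/sink typeProperties omitted for brevity) could look like this:

    "activities": [
        {
            "name": "CA1",
            "type": "Copy",
            "inputs": [ { "name": "Dataset1" } ],
            "outputs": [ { "name": "Dataset2" } ]
        },
        {
            "name": "CA2",
            "type": "Copy",
            "inputs": [ { "name": "Dataset2" } ],
            "outputs": [ { "name": "Dataset4" } ]
        }
    ]

Because CA2's input is CA1's output, Data Factory will not schedule CA2's slice until CA1 has produced Dataset2.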
In the above example, CA2 can have a different input, say Dataset3, but you would still need to specify Dataset2 as an additional input to CA2 so that CA2 does not run until CA1 completes. For example:
CA1: input: Dataset1, output: Dataset2
CA2: inputs: Dataset3, Dataset2, output: Dataset4
When multiple inputs are specified, only the first input dataset is used for copying data; the other datasets are used as scheduling dependencies. CA2 would start executing only when the following conditions are met:
- CA1 has completed successfully and Dataset2 is available. This dataset is not used when copying data to Dataset4; it only acts as a scheduling dependency for CA2.
- Dataset3 is available. This dataset represents the data that is copied to the destination.
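A sketch of CA2 for this multiple-inputs case (again hypothetical names, other activity properties omitted):

    {
        "name": "CA2",
        "type": "Copy",
        "inputs": [
            { "name": "Dataset3" },
            { "name": "Dataset2" }
        ],
        "outputs": [ { "name": "Dataset4" } ]
    }

Here Dataset3, being first in the list, is the dataset actually copied into Dataset4, while Dataset2 only delays CA2 until CA1 has produced it.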
The datasets in your pipeline would have to be chained in this way. Currently the two activities use different datasets and do not share any dependency, hence they are scheduled in parallel.
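Applied to your pipeline, one way to force FoodDescriptions to wait for FoodGroupDescriptions would be to add the first activity's output dataset as an extra input of the second activity. A sketch using the dataset names from your JSON (only the second activity's inputs/outputs shown, assuming FoodGroupDescriptions should complete first):

    {
        "name": "FoodDescriptions",
        "type": "Copy",
        "inputs": [
            { "name": "FoodDescriptionsAzureBlob" },
            { "name": "FoodGroupDescriptionsSQLAzure" }
        ],
        "outputs": [ { "name": "FoodDescriptionsSQLAzure" } ]
    }

FoodDescriptionsAzureBlob stays the source that is actually copied; FoodGroupDescriptionsSQLAzure acts only as the scheduling dependency.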
Thanks, Harish