Azure Data Factory - Multiple activities in Pipeline execution order
I have 2 blob files that I want to copy into Azure SQL tables. My pipeline has these 2 activities:
{"name": "nutrientdatablobtoazuresqlpipeline",
"properties": {
"description": "copy nutrient data azure blob azure sql",
"activities": [
{
"type": "copy",
"typeproperties": {
"source": {
"type": "blobsource"
},
"sink": {
"type": "sqlsink",
"writebatchsize": 10000,
"writebatchtimeout": "60.00:00:00"
}
},
"inputs": [
{
"name": "foodgroupdescriptionsazureblob"
}
],
"outputs": [
{
"name": "foodgroupdescriptionssqlazure"
}
],
"policy": {
"timeout": "01:00:00",
"concurrency": 1,
"executionpriorityorder": "newestfirst"
},
"scheduler": {
"frequency": "minute",
"interval": 15
},
"name": "foodgroupdescriptions",
"description": "#1 bulk import foodgroupdescriptions"
},
{
"type": "copy",
"typeproperties": {
"source": {
"type": "blobsource"
},
"sink": {
"type": "sqlsink",
"writebatchsize": 10000,
"writebatchtimeout": "60.00:00:00"
}
},
"inputs": [
{
"name": "fooddescriptionsazureblob"
}
],
"outputs": [
{
"name": "fooddescriptionssqlazure"
}
],
"policy": {
"timeout": "01:00:00",
"concurrency": 1,
"executionpriorityorder": "newestfirst"
},
"scheduler": {
"frequency": "minute",
"interval": 15
},
"name": "fooddescriptions",
"description": "#2 bulk import fooddescriptions"
}
],
"start": "2015-07-14t00:00:00z",
"end": "2015-07-14t00:00:00z",
"ispaused": false,
"hubname": "gymappdatafactory_hub",
"pipelinemode": "scheduled"
}
}
As I understand it, once the first activity is done, the second one starts. How can I execute this pipeline so that the two activities for the 2 blobs run one after another (there are constraints in the database), instead of going into the dataset slices and running them manually? Also, how can I set pipelineMode to OneTime only, instead of Scheduled?
Hi Amel,
The following should help:
Ordered copy
It is possible to run multiple copy operations one after another in a sequential/ordered manner. Say you have two copy activities in the pipeline, CA1 and CA2, with the following input and output datasets:
CA1: input: Dataset1, output: Dataset2
CA2: input: Dataset2, output: Dataset4
CA2 will run only if CA1 has run successfully and Dataset2 is available.
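A minimal sketch of that chained case (hypothetical activity and dataset names, with the source/sink typeProperties omitted for brevity) could look like this:

    "activities": [
        {
            "name": "CA1",
            "type": "Copy",
            "inputs": [ { "name": "Dataset1" } ],
            "outputs": [ { "name": "Dataset2" } ]
        },
        {
            "name": "CA2",
            "type": "Copy",
            "inputs": [ { "name": "Dataset2" } ],
            "outputs": [ { "name": "Dataset4" } ]
        }
    ]

Because CA2's input is CA1's output, Data Factory will not schedule CA2's slice until CA1 has produced Dataset2.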
In the above example, CA2 can have a different input, say Dataset3, but you would still need to specify Dataset2 as an additional input to CA2 so that CA2 does not run until CA1 completes. For example:
CA1: input: Dataset1, output: Dataset2
CA2: inputs: Dataset3, Dataset2, output: Dataset4
When multiple inputs are specified, only the first input dataset is used for copying data; the other datasets are used as scheduling dependencies. CA2 would start executing only when the following conditions are met:
- CA1 has completed successfully and Dataset2 is available. This dataset is not used when copying data to Dataset4; it only acts as a scheduling dependency for CA2.
- Dataset3 is available. This dataset represents the data that is copied to the destination.
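A sketch of CA2 for this multiple-inputs case (again hypothetical names, other activity properties omitted):

    {
        "name": "CA2",
        "type": "Copy",
        "inputs": [
            { "name": "Dataset3" },
            { "name": "Dataset2" }
        ],
        "outputs": [ { "name": "Dataset4" } ]
    }

Here Dataset3, being first in the list, is the dataset actually copied into Dataset4, while Dataset2 only delays CA2 until CA1 has produced it.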
The datasets in your pipeline would have to be chained in this way. Currently the two activities use different datasets and do not share any dependency, hence they are scheduled in parallel.
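Applied to your pipeline, one way to force FoodDescriptions to wait for FoodGroupDescriptions would be to add the first activity's output dataset as an extra input of the second activity. A sketch using the dataset names from your JSON (only the second activity's inputs/outputs shown, assuming FoodGroupDescriptions should complete first):

    {
        "name": "FoodDescriptions",
        "type": "Copy",
        "inputs": [
            { "name": "FoodDescriptionsAzureBlob" },
            { "name": "FoodGroupDescriptionsSQLAzure" }
        ],
        "outputs": [ { "name": "FoodDescriptionsSQLAzure" } ]
    }

FoodDescriptionsAzureBlob stays the source that is actually copied; FoodGroupDescriptionsSQLAzure acts only as the scheduling dependency.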
Thanks, Harish