Azure Data Factory - Multiple activities in Pipeline execution order


I have 2 blob files to copy into Azure SQL tables. The pipeline has 2 activities:

{
    "name": "nutrientdatablobtoazuresqlpipeline",
    "properties": {
        "description": "copy nutrient data azure blob azure sql",
        "activities": [
            {
                "type": "copy",
                "typeproperties": {
                    "source": {
                        "type": "blobsource"
                    },
                    "sink": {
                        "type": "sqlsink",
                        "writebatchsize": 10000,
                        "writebatchtimeout": "60.00:00:00"
                    }
                },
                "inputs": [
                    {
                        "name": "foodgroupdescriptionsazureblob"
                    }
                ],
                "outputs": [
                    {
                        "name": "foodgroupdescriptionssqlazure"
                    }
                ],
                "policy": {
                    "timeout": "01:00:00",
                    "concurrency": 1,
                    "executionpriorityorder": "newestfirst"
                },
                "scheduler": {
                    "frequency": "minute",
                    "interval": 15
                },
                "name": "foodgroupdescriptions",
                "description": "#1 bulk import foodgroupdescriptions"
            },
            {
                "type": "copy",
                "typeproperties": {
                    "source": {
                        "type": "blobsource"
                    },
                    "sink": {
                        "type": "sqlsink",
                        "writebatchsize": 10000,
                        "writebatchtimeout": "60.00:00:00"
                    }
                },
                "inputs": [
                    {
                        "name": "fooddescriptionsazureblob"
                    }
                ],
                "outputs": [
                    {
                        "name": "fooddescriptionssqlazure"
                    }
                ],
                "policy": {
                    "timeout": "01:00:00",
                    "concurrency": 1,
                    "executionpriorityorder": "newestfirst"
                },
                "scheduler": {
                    "frequency": "minute",
                    "interval": 15
                },
                "name": "fooddescriptions",
                "description": "#2 bulk import fooddescriptions"
            }
        ],
        "start": "2015-07-14t00:00:00z",
        "end": "2015-07-14t00:00:00z",
        "ispaused": false,
        "hubname": "gymappdatafactory_hub",
        "pipelinemode": "scheduled"
    }
}

As I understand it, once the first activity is done, the second one starts. How can I make the pipeline execute both activities one after another for the two blobs (there are constraints in the database), instead of going by dataset slices or running them manually? Also, how can I set pipelineMode to OneTime only, instead of Scheduled?

Hi Amel,

The following should help:

Ordered copy

It is possible to run multiple copy operations one after another in a sequential/ordered manner. Say you have two copy activities in a pipeline, CA1 and CA2, with the following input and output datasets.

CA1: input: Dataset1, output: Dataset2

CA2: input: Dataset2, output: Dataset4

CA2 will run only if CA1 has run and Dataset2 is available.

In the above example, CA2 can have a different input, Dataset3, but you still need to specify Dataset2 as an additional input to CA2 so that CA2 does not run until CA1 completes. For example:

CA1: input: Dataset1, output: Dataset2

CA2: inputs: Dataset3, Dataset2; output: Dataset4

When multiple inputs are specified, the first input dataset is used for copying data and the other datasets are used only as dependencies. CA2 will start executing only when the following conditions are met:

  • CA1 has completed and Dataset2 is available. This dataset is not used when copying data into Dataset4; it acts only as a scheduling dependency for CA2.
  • Dataset3 is available. This dataset represents the data that is copied to the destination.

The datasets would have to be shared in this way for the ordering to take effect. Your pipeline uses different datasets in the two activities and they do not seem to share a dependency, hence they are scheduled in parallel.
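Applied to your pipeline, a minimal sketch of this (reusing your own dataset names, and assuming both activities keep the same 15-minute schedule) would be to add the output dataset of activity #1, foodgroupdescriptionssqlazure, as a second input to activity #2. The blob dataset stays first so it remains the copy source; the SQL dataset acts purely as a scheduling dependency:

{
    "type": "Copy",
    "typeProperties": {
        "source": { "type": "BlobSource" },
        "sink": {
            "type": "SqlSink",
            "writeBatchSize": 10000,
            "writeBatchTimeout": "60.00:00:00"
        }
    },
    "inputs": [
        { "name": "fooddescriptionsazureblob" },
        { "name": "foodgroupdescriptionssqlazure" }
    ],
    "outputs": [
        { "name": "fooddescriptionssqlazure" }
    ],
    "policy": {
        "timeout": "01:00:00",
        "concurrency": 1,
        "executionPriorityOrder": "NewestFirst"
    },
    "scheduler": {
        "frequency": "Minute",
        "interval": 15
    },
    "name": "fooddescriptions",
    "description": "#2 bulk import fooddescriptions"
}

With this change, the fooddescriptions activity will not start until the corresponding slice of foodgroupdescriptionssqlazure has been produced by the foodgroupdescriptions activity.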

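As for pipelineMode: a minimal sketch, assuming the one-time pipeline support in Data Factory, is to set "pipelineMode" to "OneTime" and drop the start, end, and isPaused properties, which are not needed for a one-time run (the activities section stays as it is):

{
    "name": "nutrientdatablobtoazuresqlpipeline",
    "properties": {
        "description": "copy nutrient data from azure blob to azure sql",
        "activities": [ ... ],
        "hubName": "gymappdatafactory_hub",
        "pipelineMode": "OneTime"
    }
}

Check the Data Factory documentation for the restrictions that apply to one-time pipelines before switching.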
Thanks, Harish



