Azure Data Factory - Multiple activities in Pipeline execution order


I have 2 blob files to copy to Azure SQL tables. My pipeline has 2 activities:

{
    "name": "nutrientdatablobtoazuresqlpipeline",
    "properties": {
        "description": "Copy nutrient data from Azure Blob to Azure SQL",
        "activities": [
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "BlobSource"
                    },
                    "sink": {
                        "type": "SqlSink",
                        "writeBatchSize": 10000,
                        "writeBatchTimeout": "60.00:00:00"
                    }
                },
                "inputs": [
                    {
                        "name": "foodgroupdescriptionsazureblob"
                    }
                ],
                "outputs": [
                    {
                        "name": "foodgroupdescriptionssqlazure"
                    }
                ],
                "policy": {
                    "timeout": "01:00:00",
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst"
                },
                "scheduler": {
                    "frequency": "Minute",
                    "interval": 15
                },
                "name": "foodgroupdescriptions",
                "description": "#1 Bulk import foodgroupdescriptions"
            },
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "BlobSource"
                    },
                    "sink": {
                        "type": "SqlSink",
                        "writeBatchSize": 10000,
                        "writeBatchTimeout": "60.00:00:00"
                    }
                },
                "inputs": [
                    {
                        "name": "fooddescriptionsazureblob"
                    }
                ],
                "outputs": [
                    {
                        "name": "fooddescriptionssqlazure"
                    }
                ],
                "policy": {
                    "timeout": "01:00:00",
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst"
                },
                "scheduler": {
                    "frequency": "Minute",
                    "interval": 15
                },
                "name": "fooddescriptions",
                "description": "#2 Bulk import fooddescriptions"
            }
        ],
        "start": "2015-07-14T00:00:00Z",
        "end": "2015-07-14T00:00:00Z",
        "isPaused": false,
        "hubName": "gymappdatafactory_hub",
        "pipelineMode": "Scheduled"
    }
}

As I understand it, once the first activity is done, the second starts. How can I execute this pipeline so that both activities copy the 2 blobs one after another (because of constraints in the database), instead of going to the dataset slices and running them manually? Also, how can I set pipelineMode to one-time only, instead of Scheduled?

Hi Amel,

The following should help:

Ordered copy

It is possible to run multiple copy operations one after another in a sequential/ordered manner. Say you have 2 copy activities in a pipeline, CA1 and CA2, with the following input and output datasets:

CA1: Input: Dataset1, Output: Dataset2

CA2: Input: Dataset2, Output: Dataset4

CA2 runs only if CA1 has run successfully and Dataset2 is available.
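To make the rule concrete, here is a small Python model of it. This is an illustration only, not ADF code: each hypothetical activity lists its input and output datasets, and an activity is treated as runnable once all of its inputs are available. Activities that become runnable at the same time form one "wave" and would run in parallel.

```python
def schedule_waves(activities, initially_available):
    """Simplified model of dataset-driven scheduling (an assumption, not ADF's
    real scheduler). activities maps name -> (inputs, outputs). Returns a list
    of waves; each wave lists activities that could run in parallel."""
    available = set(initially_available)
    pending = dict(activities)
    waves = []
    while pending:
        # An activity is ready when every one of its input datasets exists.
        ready = sorted(name for name, (inputs, _) in pending.items()
                       if set(inputs) <= available)
        if not ready:
            raise ValueError(f"inputs never become available for: {sorted(pending)}")
        # Finished activities make their output datasets available.
        for name in ready:
            available.update(pending.pop(name)[1])
        waves.append(ready)
    return waves


chained = {
    "CA1": (["Dataset1"], ["Dataset2"]),
    "CA2": (["Dataset2"], ["Dataset4"]),
}
print(schedule_waves(chained, ["Dataset1"]))  # [['CA1'], ['CA2']]

# No shared dataset between the activities: both land in the same wave.
independent = {
    "CA1": (["Dataset1"], ["Dataset2"]),
    "CA2": (["Dataset3"], ["Dataset4"]),
}
print(schedule_waves(independent, ["Dataset1", "Dataset3"]))  # [['CA1', 'CA2']]
```

The second call mirrors a pipeline where the two activities share no dataset, so nothing forces one to wait for the other.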

In the above example, CA2 can have a different input, say Dataset3, but you then need to specify Dataset2 as an input to CA2 as well, so that CA2 does not run until CA1 completes. For example:

CA1: Input: Dataset1, Output: Dataset2

CA2: Inputs: Dataset3, Dataset2, Output: Dataset4

When multiple inputs are specified, only the first input dataset is used for copying data; the other datasets are used as dependencies. CA2 starts executing only when the following conditions are met:

  • CA1 has completed successfully and Dataset2 is available. This dataset is not used when copying data to Dataset4; it only acts as a scheduling dependency for CA2.
  • Dataset3 is available. This dataset represents the data that is copied to the destination.
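The role of each input can be sketched the same way. The Python helpers below are hypothetical (`split_inputs` and `is_ready` are illustrative names, not part of any ADF API) and simply encode the rule described above: the first input is the copy source, and every input, including the dependency-only ones, must be available before the activity starts.

```python
def split_inputs(inputs):
    """Return (copy_source, dependencies) for a copy activity's input list,
    assuming only the first input is actually copied."""
    if not inputs:
        raise ValueError("a copy activity needs at least one input")
    return inputs[0], inputs[1:]


def is_ready(inputs, available):
    """The activity can start only when all of its inputs are available."""
    return all(name in available for name in inputs)


ca2_inputs = ["Dataset3", "Dataset2"]
source, deps = split_inputs(ca2_inputs)
print(source, deps)                                   # Dataset3 ['Dataset2']
print(is_ready(ca2_inputs, {"Dataset3"}))             # False: CA1 has not produced Dataset2 yet
print(is_ready(ca2_inputs, {"Dataset3", "Dataset2"})) # True
```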

The datasets in your pipeline have to be shared. You use different datasets in the 2 activities, and they do not seem to share a dependency, hence they are scheduled in parallel.
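Applied to the pipeline in the question, one way to do this is to give the second activity the first activity's output dataset (`foodgroupdescriptionssqlazure`) as an extra input, placed after its own blob input so that the blob remains the copy source. The Python sketch below makes that edit on a trimmed, hypothetical copy of the pipeline JSON; `chain_activities` is an illustrative helper, and the approach assumes the two datasets' availability schedules line up so their slices match.

```python
import json

# Trimmed copy of the pipeline from the question: only the fields that
# matter for ordering are kept.
pipeline = {
    "name": "nutrientdatablobtoazuresqlpipeline",
    "properties": {
        "activities": [
            {"name": "foodgroupdescriptions",
             "inputs": [{"name": "foodgroupdescriptionsazureblob"}],
             "outputs": [{"name": "foodgroupdescriptionssqlazure"}]},
            {"name": "fooddescriptions",
             "inputs": [{"name": "fooddescriptionsazureblob"}],
             "outputs": [{"name": "fooddescriptionssqlazure"}]},
        ]
    },
}


def chain_activities(pipeline):
    """Append each activity's first output dataset to the next activity's
    inputs (after the existing inputs, so the copy source stays first).
    Running it twice does not add duplicate dependencies."""
    activities = pipeline["properties"]["activities"]
    for prev, nxt in zip(activities, activities[1:]):
        dependency = {"name": prev["outputs"][0]["name"]}
        if dependency not in nxt["inputs"]:
            nxt["inputs"].append(dependency)
    return pipeline


chain_activities(pipeline)
print(json.dumps(pipeline["properties"]["activities"][1]["inputs"], indent=2))
```

After the edit, the second activity's inputs are `fooddescriptionsazureblob` followed by `foodgroupdescriptionssqlazure`, so it should wait for the first copy to finish; the same change can be made by hand in the pipeline JSON.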

Thanks, Harish
