SQL Server SSIS Tasks Running before a prior Task Completes

9nvpjoqh  于 2023-03-07  发布在  其他
关注(0)|答案(1)|浏览(131)

Let's say I have an SSIS Project developed on VS2019 against a SQL2019 database. It has 6 tasks (see image)

  1. The first, Data Flow Task - Product Rollup In, takes 2 minutes to complete
  2. The second, Data Flow Task - Provider Group In, takes 45 minutes to complete
  3. Data Flow Task - Provider In and of the rest of the tasks take less than 2 minutes each

How do I make sure that tasks 3, 4, 5, and 6 do not run prior to the completion of Step 2 above? Would changing "DelayValidation" to "true" take care of all that?

I ask because I'm just finishing a mid to large SSIS project including 31 tasks.

While running the package from within Visual Studio 2019, I believe I've seen just that, shorter running tasks running before the 45-minute task completes.

hkmswyz6

hkmswyz61#

You've not seen that in SSIS as @larnu calls out in their comment-that-should-have-been-an-answer "The flow already dictates that the prior task must be successfully complete before the prior is; that's what the green lines means." ;)

Here you can see the description of the Precedence Constraint Editor, aka the green line.
A precedence constraint defines the workflow between two executables. The precedence constraint ca be based on a combination of the execution results and the evaluation of the expressions.

In this sequence of Task 1 to Task 2, when the package begins, the package will validate any constraints associated to Task 1 and Task 2. This is before any actual work happens because it's designed to fail fast. There's no reason to run 30 steps if you know the 31st step will not succeed because the data for that final step does not exist. Stated another way, tasks are executed in serial when connected by precedent constraint(s). Tasks without precedent constraints are executed in parallel up to MaxConcurrentExecutables value and/or cpu threads and/or fuzzy formulas I don't remember. At one point, I think 5 was default number.

Once validation is complete, then Task 1 begins and only when it has completed, could Task 2 begin execution.

The DelayValidation property comes into play if step 30 generates a file that step 31 is dependent on. Setting Delay Validation to True on step 31, assuming it's something like a Data Flow Task, means that validation is delayed until that task receives the signal to start processing. But it still validates that the metadata is as expected before the Execution phase begins.

You can design packages to use both parallel and serial execution. Thinking about data warehouse loads, say you've got 20 dimension: 19 are type 1 and 1 is a type 2. Maybe you can rip through the 19 in the same amount of time it takes for the 1 type 2. In a case like that, and assuming no dependencies between the dimensions, you can stack it like this

All the stuff in the "fast" container runs sequentially and the one item in "Slow" also runs and they both get the signal to start processing at the same time (because there is no parent precedent constraint leading into "Task 1 Fast" or "Task 1 Slow". Only when all of the workers inside the sequence container "Task 1" have completed, would Task 2 get the signal to start processing. I like this pattern because when DFT 18 gets to be slow and needs dedicated processing, I just add another "Task 1 Slow 2" and move the slow task into that container and now I have 3 approximately equal sized workers running. And they all must complete before we move on to the Task 2 bucket.

For extra points, use this concept with a simple Work Pile pattern and dynamic package execution and you can rebalance a package with just table value changes instead of package edits.

相关问题