SQL Server SSIS ForEach Loop Not Iterating Over All Items in a Folder

slsn1g29  于 2023-10-15  发布在  其他
关注(0)|答案(1)|浏览(148)

I have 4 excel files in a single folder. 2 have single sheets with the same name and the other 2 have 2 sheets each (the sheet i want data from have same names).

  • I used a ForEach loop in SSIS to iterate over all the files in the folder.
  • I assigned the initial path to a variable i.e. ExcelFilePath.
  • I configured this variable in my loop container so it can hold the path to all the files in the folder.
  • I've passed this variable as an expresson in Excel File Path property in my Excel Connection Manager string.

This is the design of my project:

However, when I execute the package, only the first data flow object i.e. for Branches gets executed successfully. I figure this is because its path is hardcoded in my variable. All other data flow objects fail.

Execution Result:

Logs:

Error: 0xC004701C at COUNTRY_INFO, SSIS.Pipeline: OLE DB
Destination.Inputs[OLE DB Destination Input].Columns[COUNTRY_CODE] has
lineage ID 255 that was not previously used in the Data Flow task.
Error: 0xC004706B at MG_SEND, SSIS.Pipeline: "Excel Source" failed
validation and returned validation status "VS_ISBROKEN". 
Error: 0xC004700C at MG_SEND, SSIS.Pipeline: One or more component failed
validation. Error: 0xC0024107 at MG_SEND: There were errors during
task validation.
Error: 0xC004706B at COUNTRY_INFO, SSIS.Pipeline: "OLE DB Destination"
failed validation and returned validation status "VS_NEEDSNEWMETADATA".

I've set DelayValidation to True for each component. RetainSameConnection is False in the Connection string Properties.

EDIT: Tried adding a sequence container with Precedence Constraint arrows so the path is only redirect to the respective DFT. Delay Validation is set to True for all components.

Design:

Execution Result is as follows:

Log:

Error: 0xC001000E at Package1: The connection "" is not found. This error is thrown by Connections collection when the specific connection element is not found.
Error: 0xC02020EA at Package1, Log provider "SSIS log provider for SQL Server": The connection manager "" is not found. A component failed to find the connection manager in the Connections collection.
Error: 0xC001000E at Package1: The connection "" is not found. This error is thrown by Connections collection when the specific connection element is not found.
Error: 0xC02020EA at Package1, Log provider "SSIS log provider for SQL Server": The connection manager "" is not found. A component failed to find the connection manager in the Connections collection.
Warning: 0x80019002 at Package1: SSIS Warning Code DTS_W_MAXIMUMERRORCOUNTREACHED.  The Execution method succeeded, but the number of errors raised (2) reached the maximum allowed (1); resulting in failure. This occurs when the number of errors reaches the number specified in MaximumErrorCount. Change the MaximumErrorCount or fix the errors.
Information: 0x4004300A at BRANCHES, SSIS.Pipeline: Validation phase is beginning.
Warning: 0x800470C8 at BRANCHES, Excel Source [26]: The external columns for Excel Source are out of synchronization with the data source columns. The column "BRANCH_CODE" needs to be added to the external columns.
The column "BRANCH_NAME" needs to be added to the external columns.
The column "AGENT_ID" needs to be added to the external columns.
The column "MAKER" needs to be added to the external columns.
The column "CHECKER" needs to be added to the external columns.
The column "CREATE_DATE" needs to be added to the external columns.
The column "BRANCH_COUNTRY" needs to be added to the external columns.
The column "CREATE_STATUS" needs to be added to the external columns.
The Excel Source.Outputs[Excel Source Output].ExternalColumns[IS_RECEIVE_ACTIVE] needs to be removed from the external columns.
The Excel Source.Outputs[Excel Source Output].ExternalColumns[IS_SEND_ACTIVE] needs to be removed from the external columns.
The Excel Source.Outputs[Excel Source Output].ExternalColumns[BASE_RECEIVE_CURRENCY] needs to be removed from the external columns.
The Excel Source.Outputs[Excel Source Output].ExternalColumns[COUNTRY_NAME] needs to be removed from the external columns.
The Excel Source.Outputs[Excel Source Output].ExternalColumns[COUNTRY_LEGACY_CODE] needs to be removed from the external columns.
The Excel Source.Outputs[Excel Source Output].ExternalColumns[COUNTRY_CODE] needs to be removed from the external columns.
Error: 0xC004706B at BRANCHES, SSIS.Pipeline: "Excel Source" failed validation and returned validation status "VS_NEEDSNEWMETADATA".
Error: 0xC004700C at BRANCHES, SSIS.Pipeline: One or more component failed validation.
Error: 0xC0024107 at BRANCHES: There were errors during task validation.

Please note that ExcelFilePath variable set for BRANCHES path initially. Also, I've tested the container through a Script Task and it is indeed looping over all the paths in the folder.

ctzwtxfj

ctzwtxfj1#

A data flow works based on the design-time premise that the source data which exactly conforms to the design-time value will be loaded super fast and shoved into this destination which also match the design-time value.

SSIS cannot be a generic source to destination engine. It's not how the product is designed.

A step that happens in every package execution every time is Validation. The goal of the validation step is to fail early if something does not meet the contract upon which data flows were built.

You can change the validation to occur on package start to on task start. That way, if you are pulling data from a source that is created in a precursor step, the package will run validation only when the task begins.

What is happening here is that the one Excel connection manager you have is valid for the first Data Flow Task but invalid for the remainder. To get around this, you need to instruct SSIS to wait to validate.

With all of that said, to make your design work as intended:

  • Update the Excel Connection Manager Property of DelayValidation to True
  • Update the 4 Data Flow Tasks to have a property of DelayValidation to True

Now, those 5 changes are not enough to make your process work. Because when the Branches file is found, the COUNTRY_INFO data flow is still going to fire and the metadata does not match.

To rectify this, you need to have a precursor task that does nothing but simply serves as an anchor point. I find Sequence Container to work nicely here

Notice that the green connector has a little f(x) between the sequence container and the data flow tasks. That's because I have changed the Evaluation operation to "Expression and Constraint".

The constraint is the final piece of the puzzle. You need to make this condition true when the name matches whatever the data flow is.

Let's assume that the words in your data flow exist in the file name which is surfaced in the Foreach Enumerator into variable @[User::ExcelFilePath]

The expression for BRANCHES would be something like

FINDSTRING(UPPER(@[User::ExcelFilePath]), "BRANCHES" , 1 ) > 0

That is: Cast the value of Excel File Path to all upper case. If the also-all-uppercase word BRANCHES is in the file path, the FindString method will return a number starting at 1 or greater. If the number from FindString is non-zero, then TRUE, otherwise, it's false

Repeat this pattern for each branch.

What happens is that the path will only light up when the expected file is enumerated (and then validate). Wrong file paths won't light up and therefore won't fail validation.

Edit to address the updated question

The Design data flow shows you have broken components I would not expect execution to succeed.

相关问题