I am building an SSIS package to pull data from a BMC Remedy database (via the AR System ODBC driver) into a table in my SQL Server database. Note that there are a few peculiar limitations when dealing with this data source.
The issue I'm having is that one of the columns I need to pull in shows up as DT_NTEXT (Unicode text stream). The field's data type (when viewed in TOAD) is "longvarchar", and it can contain strings of thousands of characters.
I can import all of the other fields into my SQL Server table fine as long as I omit this one column. If I include this, the package hangs (in BIDS and in production on the server) indefinitely. I have let it run for over 12 hours, and it never makes any progress. Without this one column, it takes under a minute. In BIDS I can see it is hanging on the "Source" step of the Data Flow task, on "Execute phase is beginning". It hangs there regardless of the destination (same result dumping to text file).
I don't need all of the data from this field. The first 200 characters would actually suffice. However, I don't have the option to change my source SQL statement, as no functions (e.g. SUBSTRING) are allowed (one of the aforementioned limitations). I tried opening the advanced editor on the source and changing the output properties for that column to a Unicode string with length 200. It didn't cause errors, but the result is the same (hangs). I assume that means the data still has to be "brought in" and THEN truncated to 200, which does me no good.
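If scripting outside SSIS were an option, the "truncate before it hits memory" idea could be done client-side with no SQL functions at all: stream the rows over ODBC and trim each value as it arrives. A minimal sketch, assuming a pyodbc-style cursor (the DSN, query, and column position are placeholders, not anything from the actual AR System setup):

```python
# Sketch: client-side truncation of a long ODBC column.
# The DSN, query, and column index below are illustrative assumptions.

def truncate_long_columns(rows, max_len=200, long_cols=(2,)):
    """Trim the listed column positions to max_len characters per row."""
    for row in rows:
        yield tuple(
            (v[:max_len] if i in long_cols and isinstance(v, str) else v)
            for i, v in enumerate(row)
        )

# With a real connection this would look something like:
#   import pyodbc
#   conn = pyodbc.connect("DSN=ARSystem")      # hypothetical DSN name
#   cur = conn.cursor()
#   cur.execute("SELECT ... FROM ...")         # plain SELECT, no SUBSTRING
#   for row in truncate_long_columns(cur, long_cols=(2,)):
#       ...  # bulk-insert into SQL Server

# Self-contained demo with fake rows standing in for the cursor:
sample = [(1, "A", "x" * 5000), (2, "B", "short")]
out = list(truncate_long_columns(sample, max_len=200, long_cols=(2,)))
print(len(out[0][2]))   # 200
print(out[1][2])        # short
```

This only illustrates the principle (truncate as you stream, so the full LOB never sits in a pipeline buffer); it doesn't solve the in-package problem the question is about.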
The data is not THAT big...I can run the query in TOAD and return all the rows in under a minute, without smoke pouring out of my machine. As such, I feel like this is some sort of SSIS optimizing issue.
The way I see it, I need one of two things: 1) to truncate the data before it comes into memory (without doing so in my SELECT statement), or 2) to make some configuration changes to my package (buffer size/rows?) that will allow it to run in a reasonable amount of time. I'm not sure how to achieve either of these. Any guidance would be greatly appreciated.
Thanks, Eric
2 Answers
fsi0uk1n1#
In the end, I wasn't able to resolve this within the Visual Studio / BIDS 2008/2012 designer. However, we are in the process of migrating to SQL 2016. As part of that process, we installed SSDT 2015, which has an "ODBC Source" object (as opposed to the ADO NET source previously available). Once I switched to the ODBC Source, the long column no longer caused the package to hang.
However, it is worth noting that I am using a Data Conversion step in my Data Flow to convert the Unicode text stream into a string, and I'm truncating the output of the conversion as follows:

1. Right-click the Data Conversion step and select "Show Advanced Editor".
2. Go to Input and Output Properties, then Data Conversion Output -> Output Columns -> Copy of Work_Log (Work_Log is my problematic column name).
3. Change TruncationRowDisposition to RD_IgnoreFailure.
4. Change DataType to string [DT_STR] and change Length to 500.
5. Connect the Data Conversion step to the destination and map the "Copy of" column, not the original column.

I'm not 100% sure the truncation was necessary to avoid the error I was getting, but my destination was already set up for only 500 characters. In my case I know the data I need will be within the first couple hundred characters, but I thought it was worth mentioning anyway.
I also noticed something interesting along the way. My ODBC Source task has an exclamation point (warning) showing on it. When I hover over it, I see the warning, "Row by Row fetch method is enforced because the table has LOB column(s). Column Work_Log is LOB." I did not see this (or anything similar) in the 2008/2012 designer, so apparently the new ODBC Source is smart enough to handle wide columns more gracefully (row by row?). I'm not sure if there is a downside / performance cost to operating in this mode, but it is working fine in my case.
siv3szwd2#
In my case the source is a FileMaker ODBC connection, which also treats long text as a LOB data type. My package used to hang for a long time because of the extreme performance penalty of the enforced row-by-row fetch ("Row by Row fetch method is enforced because the table has LOB column(s)"). When deployed, it would run for a long time, eventually time out, and fail.
I am sharing the actual solution that worked like a charm for me. One day's worth of data (over 30k rows of LOB-type data) took approximately 10 minutes to pull:
Lower DefaultBufferMaxRows down to 1 and increase DefaultBufferSize to its maximum, i.e. 100 MB. Then change the source ODBC DSN by checking the option "treat text as long varchar". Finally, map the data types as-is from source to target (without any changes in the source's advanced editor).
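The intuition behind those two settings can be sketched numerically: SSIS packs roughly min(DefaultBufferMaxRows, DefaultBufferSize / estimated row width) rows into each pipeline buffer, so setting max rows to 1 with a large buffer guarantees every buffer holds one row with plenty of headroom for a wide LOB value. A simplified sketch of that relationship (real SSIS buffer accounting has additional per-buffer overhead, and oversized values can still spool to disk):

```python
# Simplified model of SSIS pipeline buffer sizing. This is an illustration
# of the min(max-rows, size/row-width) relationship, not SSIS's exact math.

def rows_per_buffer(default_buffer_size, default_buffer_max_rows, est_row_bytes):
    """Approximate number of rows SSIS packs into one pipeline buffer."""
    return max(1, min(default_buffer_max_rows,
                      default_buffer_size // est_row_bytes))

MB = 1024 * 1024

# The answer's settings: 1 row per buffer with 100 MB buffers, so even a
# multi-megabyte LOB row fits comfortably in a single in-memory buffer.
print(rows_per_buffer(100 * MB, 1, 50_000))      # 1

# Default-ish settings (10 MB buffers, 10000 max rows) with ~5 MB LOB rows:
# only a couple of rows fit per buffer, forcing constant buffer churn.
print(rows_per_buffer(10 * MB, 10_000, 5 * MB))  # 2
```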