SSIS C# - Extracting CSV that contains commas from CDATA

v9tzhpje posted on 2023-06-19 in C#

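We have an SSIS package that sends a REST API request as an XML document and receives the response as an XML document. We use RestSharp in an SSIS Script Component (inside a Data Flow Task) to send the request and return the data. This part works fine.
The problem is that the returned XML stores the data we need as CSV inside a CDATA section of the XML response. Although the CSV values are wrapped in double quotes, the part of our code that splits them and assigns them to output columns splits some values incorrectly because of commas inside the data, even though those values are quoted.
This is the code I am currently using to parse the CSV into columns, which are then sent to the Script Component output and on to a destination table in SQL Server: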

XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(response.Content);

var string1 = xmlDoc.SelectSingleNode("response").InnerText;

string[] splitStart = string1.Split('\n');

int counter1;
int counter2;

for (counter1 = 0; counter1 < splitStart.Length; counter1++)
{
  string[] splitEnd = splitStart[counter1].Split(',');
  counter2 = 0;

  MMABuffer.AddRow();
  MMABuffer.Col01 = splitStart[counter1];
  MMABuffer.Col02 = splitEnd[counter2]; counter2++;
  MMABuffer.Col03 = splitEnd[counter2]; counter2++;
  MMABuffer.Col04 = splitEnd[counter2]; counter2++;
  MMABuffer.Col05 = splitEnd[counter2]; counter2++;
  MMABuffer.Col06 = splitEnd[counter2]; counter2++;
  MMABuffer.Col07 = splitEnd[counter2]; counter2++;
  MMABuffer.Col08 = splitEnd[counter2]; counter2++;
  MMABuffer.Col09 = splitEnd[counter2]; counter2++;
  MMABuffer.Col10 = splitEnd[counter2]; counter2++;
  MMABuffer.Col11 = splitEnd[counter2]; counter2++;
  MMABuffer.Col12 = splitEnd[counter2]; counter2++;
  MMABuffer.Col13 = splitEnd[counter2]; counter2++;
  MMABuffer.Col14 = splitEnd[counter2]; counter2++;
  MMABuffer.Col15 = splitEnd[counter2]; counter2++;
  MMABuffer.Col16 = splitEnd[counter2]; counter2++;
  MMABuffer.Col17 = splitEnd[counter2];

}

The input, i.e. the value of string1, looks like this:

"Col01","Col02","Col03","Col04","Col05","Col06","Col07","Col08","Col09","Col10","Col11","Col12","Col13","Col14","Col15","Col16"
"S736243","StackOverflow","900780","Blah Blarg,ePTFE Graft,standardwall, reinforced,l: 80cm, d: 8mm","102010","FE","ePTFE","6790","","FG","Blah Graft","Standard Blah Grafts","Other Blah/Grafts","ePTFE","HCH",""
"S736244","StackOverflow","900782","Blah Blarg,ePTFE Graft,thinwall, reinforced,l: 10cm, d: 5mm","102010","FE","ePTFE","6790","","FG","Blah Graft","Standard Blah Grafts","Other Blah/Grafts","ePTFE","HCH",""
"S736245","StackOverflow","900783","Blah Blarg,ePTFE Graft,thinwall, reinforced,l:40cm, d: 5mm","102010","FE","ePTFE","6790","","FG","Blah Graft","Standard Blah Grafts","Other Blah/Grafts","ePTFE","HCH",""
"S736246","StackOverflow","900784","Blah Blarg,ePTFE Graft,thinwall, reinforced,l: 70cm, d: 5mm","102010","FE","ePTFE","6790","","FG","Blah Graft","Standard Blah Grafts","Other Blah/Grafts","ePTFE","HCH",""
"S736291","StackOverflow","901401","E-vita Blarg 3GØprox: 40mmØdist: 36mmLcovered: 170mm","201010","FE","E-vita Blarg 3G","6720","","FG","Blarg Grafts","Standard Blarg Grafts","Blarg Grafts","Blarg","HCH",""
"S736292","StackOverflow","901402","E-vita Blarg 3GØprox: 40mmØdist: 40mmLcovered: 170mm","201010","FE","E-vita Blarg 3G","6720","","FG","Blarg Grafts","Standard Blarg Grafts","Blarg Grafts","Blarg","HCH",""
"S736293","StackOverflow","901403","E-vita Blarg 3GØprox: 44mmØdist: 40mmLcovered: 170mm","201010","FE","E-vita Blarg 3G","6720","","FG","Blarg Grafts","Standard Blarg Grafts","Blarg Grafts","Blarg","HCH",""
"S736294","StackOverflow","901406","E-vita Blarg 3GØprox: 40mmØdist: 36mmLcovered: 130mm","201010","FE","E-vita Blarg 3G","6720","","FG","Blarg Grafts","Standard Blarg Grafts","Blarg Grafts","Blarg","HCH",""

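The first four data rows do not import correctly with the code above because the Col04 value contains ',' characters, even though that value is enclosed in double quotes.
The last four rows import correctly because their Col04 value contains no ','.
I am sure the problem is in how I split the input in the for {} block above, but I am not sure how to correct it. I based this code on a package for another API call, but that data had no ',' in its input, so the problem never came up.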
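To make the failure concrete, here is a minimal sketch of what a plain `Split(',')` does to a quoted value. The class and method names (`NaiveSplitDemo`, `CountNaiveFields`) are hypothetical and exist only for illustration; they are not part of the package described above.

```csharp
using System;

public static class NaiveSplitDemo
{
    // Hypothetical helper: counts the pieces produced by a plain comma split.
    // String.Split knows nothing about double quotes, so a comma inside a
    // quoted value is treated exactly like a field separator.
    public static int CountNaiveFields(string line)
    {
        return line.Split(',').Length;
    }
}
```

For a line with three quoted fields where the middle one contains a comma, such as `"a","b,c","d"`, this returns 4 instead of 3, so every column after the affected value shifts to the right.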


aydmsdu9 #1

In the Script Component, add a reference to Microsoft.VisualBasic; it is required in order to use TextFieldParser.

using System.IO;   // required for MemoryStream
using System.Text; // required for Encoding.UTF8
using Microsoft.VisualBasic.FileIO; // required for TextFieldParser

public override void CreateNewOutputRows()
{
    // your code extracting the CDATA string into string1

    // create a MemoryStream from the bytes of string1; fall back to an empty string just in case
    using (var ms = new MemoryStream(Encoding.UTF8.GetBytes(string1 ?? "")))
    {
        // create a parser on the stream
        using (var parser = new TextFieldParser(ms))
        {
            // set the field delimiter(s)
            parser.Delimiters = new string[] { "," };

            // the CSV has text in quotes
            parser.HasFieldsEnclosedInQuotes = true;

            // to skip the column header row, read one line before starting to iterate
            parser.ReadLine();

            while (!parser.EndOfData)
            {
                // get all fields from the current line
                string[] currentrow = parser.ReadFields();

                // ... and add them to the output buffer
                MMABuffer.AddRow();

                MMABuffer.Col01 = currentrow[0];
                MMABuffer.Col02 = currentrow[1];
                // ... etc.
            }
        }
    }
}

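You may have to adjust the encoding used to create the MemoryStream to match the encoding of the API response.
To check the values that are read, set a breakpoint in the Script Component and step through the code with F10. Hover over currentrow to see the array of fields and their values.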
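Since string1 is already a decoded .NET string, one possible way to sidestep the encoding question is to hand TextFieldParser a StringReader instead of a MemoryStream. The following is only a sketch under that assumption; the class and method names (`CsvCdataParser`, `ParseRows`) and the `skipHeader` parameter are hypothetical, and the rows are returned as a list so the parsing can be checked outside SSIS before wiring it to MMABuffer.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvCdataParser
{
    // Hypothetical helper: parses quoted, comma-delimited text into rows of fields.
    public static List<string[]> ParseRows(string csvText, bool skipHeader)
    {
        var rows = new List<string[]>();

        // StringReader works on the string directly, so no byte encoding is involved.
        using (var reader = new StringReader(csvText ?? ""))
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            // Optionally discard the column header line.
            if (skipHeader && !parser.EndOfData)
            {
                parser.ReadLine();
            }

            // Each ReadFields call returns one row, with the quotes removed and
            // commas inside quoted values kept as part of the value.
            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }

        return rows;
    }
}
```

Inside CreateNewOutputRows, a call such as `CsvCdataParser.ParseRows(string1, true)` could then be looped over, calling MMABuffer.AddRow() once per row and assigning the array elements to the output columns by index.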


7lrncoxx #2

Use backticks (``) instead of double quotes ("").
