csv TSQL展平拆分制表符分隔列

ovfsdjhp  于 2023-05-26  发布在  其他
关注(0)|答案(1)|浏览(312)

我正在将错误日志平面文件导入到SQL Server中,需要将制表符分隔的列解析为多个列。从这个问题(SQL Split Tab Delimited Column)中大量借用,特别是@Lobo的答案,我想完成几件事:

  • 我捕获在[Column 0]上使用STRING_SPLIT()创建的最大列数,以便在列数超过当前已知的列数时创建动态枢轴
  • 正确地合并解析的[列0]列与其余的

记录
第一个目标(动态列计数)我可以暂时不考虑,但它是我遇到问题的第二个目标。

DECLARE @SAMPLE_TABLE table(
    [Column 0] nvarchar(4000),
    [Filename] nvarchar(260),
    FileExtention varchar(255),
    DateTimeStamp datetime,
    CustomerNumber varchar(255),
    FileType varchar(255),
    ImportSetNumber varchar(255)
)

表中的一些示例数据:

[Column 0]                                                                                                                                | [Filename]                                                       | FileExtention | DateTimeStamp           | CustomerNumber | FileType | ImportSetNumber
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1<tab>Import Set No (A): 03300001: Contact ID (G): Invalid contact ID for this customer and company....Taker (I): Invalid taker.<tab><tab>| E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001
1<tab>Import Set No (A): 03300001: General Error: This Record and its related Records failed validation.<tab>0<tab>218                    | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001
1<tab>Import Set No (A): 04040186: General Error: This Record and its related Records failed validation.<tab>0<tab>17                     | E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err           | 2023-04-04 08:49:26.000 | 18120          | OHF      | 04040186

游戏开始

;WITH
CTE_Columns AS(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) 'MyRowID',
            [Filename],
            FileExtention,
            DateTimeStamp,
            CustomerNumber,
            FileType,
            ImportSetNumber,
            A.ColID 'ColumnNumber',
            A.Cols 'ColumnValue'
    FROM @SAMPLE_TABLE
    CROSS APPLY (
        SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS ColID,
                value [Cols]
                FROM STRING_SPLIT([Column 0], CHAR(9))  -- split by tab character
    )A
)

SELECT MyRowID,
        [Filename],
        FileExtention,
        DateTimeStamp,
        CustomerNumber,
        FileType,
        ImportSetNumber,
        NULLIF(TRIM([1]), '') 'FirstColumn',
        NULLIF(TRIM([2]), '') 'SecondColumn',
        NULLIF(TRIM([3]), '') 'ThirdColumn',
        NULLIF(TRIM([4]), '') 'FourthColumn'
FROM (
        SELECT MyRowID,
                [Filename],
                FileExtention,
                DateTimeStamp,
                CustomerNumber,
                FileType,
                ImportSetNumber,
                ColumnNumber,
                ColumnValue
        FROM CTE_Columns
)Q
PIVOT(MAX(Q.ColumnValue) FOR ColumnNumber IN([1], [2], [3], [4])) PIV
ORDER BY CustomerNumber,
            ImportSetNumber

此查询生成以下结果集:

MyRowID | Filename                                                         | FileExtention | DateTimeStamp           | CustomerNumber | FileType | ImportSetNumber | FirstColumn | SecondColumn                                                                                                               | ThirdColumn | FourthColumn
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1       | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | 1           | NULL                                                                                                                       | NULL        | NULL
2       | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | NULL        | Import Set No (A): 03300001: Contact ID (G): Invalid contact ID for this customer and company....Taker (I): Invalid taker. | NULL        | NULL
3       | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | NULL        | NULL                                                                                                                       | NULL        | NULL
4       | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | NULL        | NULL                                                                                                                       | NULL        | NULL
5       | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | 1           | NULL                                                                                                                       | NULL        | NULL
6       | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | NULL        | Import Set No (A): 03300001: General Error: This Record and its related Records failed validation.                         | NULL        | NULL
7       | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | NULL        | NULL                                                                                                                       | 0           | NULL
8       | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | NULL        | NULL                                                                                                                       | NULL        | 218
9       | E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err           | 2023-04-04 08:49:26.000 | 18120          | OHF      | 04040186        | 1           | NULL                                                                                                                       | NULL        | NULL
10      | E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err           | 2023-04-04 08:49:26.000 | 18120          | OHF      | 04040186        | NULL        | Import Set No (A): 04040186: General Error: This Record and its related Records failed validation.                         | NULL        | NULL
11      | E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err           | 2023-04-04 08:49:26.000 | 18120          | OHF      | 04040186        | NULL        | NULL                                                                                                                       | 0           | NULL
12      | E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err           | 2023-04-04 08:49:26.000 | 18120          | OHF      | 04040186        | NULL        | NULL                                                                                                                       | NULL        | 17

根据上面的结果集,第1- 4行应该是一条记录,第5 - 8行应该是第二条记录,第9 - 12行应该是第三条记录,这样我就得到了我想要的最终状态:

Filename                                                         | FileExtention | DateTimeStamp           | CustomerNumber | FileType | ImportSetNumber | FirstColumn | SecondColumn                                                                                                               | ThirdColumn | FourthColumn
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | 1           | Import Set No (A): 03300001: Contact ID (G): Invalid contact ID for this customer and company....Taker (I): Invalid taker. | NULL        | NULL
E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | 1           | Import Set No (A): 03300001: General Error: This Record and its related Records failed validation.                         | 0           | 218
E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err           | 2023-04-04 08:49:26.000 | 18120          | OHF      | 04040186        | 1           | Import Set No (A): 04040186: General Error: This Record and its related Records failed validation.                         | 0           | 17

我相信这只是一个正确分组的简单问题,但是我不确定分组在什么上面,或者在哪里正确地放置分组(是分区还是标准的GROUPBY子句)

yhxst69z

yhxst69z1#

您需要以正确的方式重新分组行,并通过添加一个可以充当分组标识符的新列来合并解析的列:

;WITH CTE_Columns AS (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS MyRowID,
        [Filename],
        FileExtention,
        DateTimeStamp,
        CustomerNumber,
        FileType,
        ImportSetNumber,
        A.ColID AS ColumnNumber,
        A.Cols AS ColumnValue
    FROM @SAMPLE_TABLE
    CROSS APPLY (
        SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS ColID,
            value AS Cols
            FROM STRING_SPLIT([Column 0], CHAR(9)) -- split by tab character
    ) A
), CTE_Grouping AS (
    SELECT MyRowID,
        [Filename],
        FileExtention,
        DateTimeStamp,
        CustomerNumber,
        FileType,
        ImportSetNumber,
        MAX(CASE WHEN ColumnNumber = 1 THEN ColumnValue END) AS FirstColumn,
        MAX(CASE WHEN ColumnNumber = 2 THEN ColumnValue END) AS SecondColumn,
        MAX(CASE WHEN ColumnNumber = 3 THEN ColumnValue END) AS ThirdColumn,
        MAX(CASE WHEN ColumnNumber = 4 THEN ColumnValue END) AS FourthColumn
    FROM CTE_Columns
    GROUP BY MyRowID, [Filename], FileExtention, DateTimeStamp, CustomerNumber, FileType, ImportSetNumber
)
SELECT [Filename],
    FileExtention,
    DateTimeStamp,
    CustomerNumber,
    FileType,
    ImportSetNumber,
    FirstColumn,
    SecondColumn,
    ThirdColumn,
    FourthColumn
FROM CTE_Grouping
ORDER BY CustomerNumber, ImportSetNumber

CTE_Grouping内部,MAX函数与条件语句一起使用,以根据相应的列编号ColumnNumber组合解析的列ColumnValue。这将确保每个分组只有一行包含组合值。
外部查询从CTE_Grouping中选择所需的列,并按CustomerNumberImportSetNumber对结果进行排序。

相关问题