我正在将错误日志平面文件导入到SQL Server中,需要将制表符分隔的列解析为多个列。从这个问题(SQL Split Tab Delimited Column)中大量借用,特别是@Lobo的答案,我想完成几件事:
- 我捕获在[Column 0]上使用STRING_SPLIT()创建的最大列数,以便在列数超过当前已知的列数时创建动态枢轴
- 正确地合并解析的[列0]列与其余的
记录
第一个目标(动态列计数)我可以暂时不考虑,但它是我遇到问题的第二个目标。
DECLARE @SAMPLE_TABLE table(
[Column 0] nvarchar(4000),
[Filename] nvarchar(260),
FileExtention varchar(255),
DateTimeStamp datetime,
CustomerNumber varchar(255),
FileType varchar(255),
ImportSetNumber varchar(255)
)
表中的一些示例数据:
[Column 0] | [Filename] | FileExtention | DateTimeStamp | CustomerNumber | FileType | ImportSetNumber
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1<tab>Import Set No (A): 03300001: Contact ID (G): Invalid contact ID for this customer and company....Taker (I): Invalid taker.<tab><tab>| E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err | 2023-03-30 11:36:36.000 | 10047 | OHF | 03300001
1<tab>Import Set No (A): 03300001: General Error: This Record and its related Records failed validation.<tab>0<tab>218 | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err | 2023-03-30 11:36:36.000 | 10047 | OHF | 03300001
1<tab>Import Set No (A): 04040186: General Error: This Record and its related Records failed validation.<tab>0<tab>17 | E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err | 2023-04-04 08:49:26.000 | 18120 | OHF | 04040186
游戏开始
;WITH
CTE_Columns AS(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) 'MyRowID',
[Filename],
FileExtention,
DateTimeStamp,
CustomerNumber,
FileType,
ImportSetNumber,
A.ColID 'ColumnNumber',
A.Cols 'ColumnValue'
FROM @SAMPLE_TABLE
CROSS APPLY (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS ColID,
value [Cols]
FROM STRING_SPLIT([Column 0], CHAR(9)) -- split by tab character
)A
)
SELECT MyRowID,
[Filename],
FileExtention,
DateTimeStamp,
CustomerNumber,
FileType,
ImportSetNumber,
NULLIF(TRIM([1]), '') 'FirstColumn',
NULLIF(TRIM([2]), '') 'SecondColumn',
NULLIF(TRIM([3]), '') 'ThirdColumn',
NULLIF(TRIM([4]), '') 'FourthColumn'
FROM (
SELECT MyRowID,
[Filename],
FileExtention,
DateTimeStamp,
CustomerNumber,
FileType,
ImportSetNumber,
ColumnNumber,
ColumnValue
FROM CTE_Columns
)Q
PIVOT(MAX(Q.ColumnValue) FOR ColumnNumber IN([1], [2], [3], [4])) PIV
ORDER BY CustomerNumber,
ImportSetNumber
此查询生成以下结果集:
MyRowID | Filename | FileExtention | DateTimeStamp | CustomerNumber | FileType | ImportSetNumber | FirstColumn | SecondColumn | ThirdColumn | FourthColumn
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err | 2023-03-30 11:36:36.000 | 10047 | OHF | 03300001 | 1 | NULL | NULL | NULL
2 | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err | 2023-03-30 11:36:36.000 | 10047 | OHF | 03300001 | NULL | Import Set No (A): 03300001: Contact ID (G): Invalid contact ID for this customer and company....Taker (I): Invalid taker. | NULL | NULL
3 | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err | 2023-03-30 11:36:36.000 | 10047 | OHF | 03300001 | NULL | NULL | NULL | NULL
4 | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err | 2023-03-30 11:36:36.000 | 10047 | OHF | 03300001 | NULL | NULL | NULL | NULL
5 | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err | 2023-03-30 11:36:36.000 | 10047 | OHF | 03300001 | 1 | NULL | NULL | NULL
6 | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err | 2023-03-30 11:36:36.000 | 10047 | OHF | 03300001 | NULL | Import Set No (A): 03300001: General Error: This Record and its related Records failed validation. | NULL | NULL
7 | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err | 2023-03-30 11:36:36.000 | 10047 | OHF | 03300001 | NULL | NULL | 0 | NULL
8 | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err | 2023-03-30 11:36:36.000 | 10047 | OHF | 03300001 | NULL | NULL | NULL | 218
9 | E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err | 2023-04-04 08:49:26.000 | 18120 | OHF | 04040186 | 1 | NULL | NULL | NULL
10 | E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err | 2023-04-04 08:49:26.000 | 18120 | OHF | 04040186 | NULL | Import Set No (A): 04040186: General Error: This Record and its related Records failed validation. | NULL | NULL
11 | E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err | 2023-04-04 08:49:26.000 | 18120 | OHF | 04040186 | NULL | NULL | 0 | NULL
12 | E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err | 2023-04-04 08:49:26.000 | 18120 | OHF | 04040186 | NULL | NULL | NULL | 17
根据上面的结果集,第1- 4行应该是一条记录,第5 - 8行应该是第二条记录,第9 - 12行应该是第三条记录,这样我就得到了我想要的最终状态:
Filename | FileExtention | DateTimeStamp | CustomerNumber | FileType | ImportSetNumber | FirstColumn | SecondColumn | ThirdColumn | FourthColumn
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err | 2023-03-30 11:36:36.000 | 10047 | OHF | 03300001 | 1 | Import Set No (A): 03300001: Contact ID (G): Invalid contact ID for this customer and company....Taker (I): Invalid taker. | NULL | NULL
E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err | 2023-03-30 11:36:36.000 | 10047 | OHF | 03300001 | 1 | Import Set No (A): 03300001: General Error: This Record and its related Records failed validation. | 0 | 218
E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err | 2023-04-04 08:49:26.000 | 18120 | OHF | 04040186 | 1 | Import Set No (A): 04040186: General Error: This Record and its related Records failed validation. | 0 | 17
我相信这只是一个正确分组的简单问题,但是我不确定分组在什么上面,或者在哪里正确地放置分组(是分区还是标准的GROUPBY子句)
1条答案
按热度按时间yhxst69z1#
您需要以正确的方式重新分组行,并通过添加一个可以充当分组标识符的新列来合并解析的列:
在
CTE_Grouping
内部,MAX
函数与条件语句一起使用,以根据相应的列编号ColumnNumber
组合解析的列ColumnValue
。这将确保每个分组只有一行包含组合值。外部查询从
CTE_Grouping
中选择所需的列,并按CustomerNumber
和ImportSetNumber
对结果进行排序。