using Sylvan.Data.Csv;
var data =
"""
a,b,c
1,2,3
d,e,f,g
4,5,6,7
""";
// MultiResult mode starts a new "table" whenever the number of columns changes.
// Any empty lines between tables are skipped.
var opts = new CsvDataReaderOptions { ResultSetMode = ResultSetMode.MultiResult };
// CsvDataReader can be used as a DbDataReader, so it can be fed directly to SqlBulkCopy.
System.Data.Common.DbDataReader csv = CsvDataReader.Create(new StringReader(data), opts);
do
{
    Console.WriteLine(csv.GetName(0));
    while (csv.Read())
    {
        Console.WriteLine(csv.GetString(0));
    }
    Console.WriteLine("---");
} while (csv.NextResult());
// outputs:
// a
// 1
// ---
// d
// 4
// ---
using Sylvan.Data;
using Sylvan.Data.Csv;
var data =
"""
a,b,c
1,2,3
d,e,f
4,5,6
""";
var csv = CsvDataReader.Create(new StringReader(data));
// batchReader wraps the csv reader and yields rows for as long as the
// TakeWhile predicate is true; here, until it finds a row starting with "d".
// Customize this predicate to identify where the next table starts in your data.
System.Data.Common.DbDataReader batchReader = csv.TakeWhile(r => r.GetString(0) != "d");
// process the table using the standard DbDataReader APIs.
while (batchReader.Read())
{
    Console.WriteLine(csv.GetName(0));
    Console.WriteLine(batchReader.GetString(0));
}
Console.WriteLine("---");
// The csv reader is now positioned at the start of the next table.
// Calling Initialize re-initializes the CsvDataReader with the current row,
// which causes the "d,e,f" headers to be loaded.
csv.Initialize();
// consume the rest of the CSV data.
// Or, you might need to use another TakeWhile.
while (csv.Read())
{
    Console.WriteLine(csv.GetName(0));
    Console.WriteLine(csv.GetString(0));
}
Console.WriteLine("---");
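I maintain a library that might be useful to you: Sylvan.Data.Csv, which supports CSV files containing multiple tables. For this feature to work, either each table must have a different number of columns, or the tables must be separated by an empty line.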
The first example shows how to use the MultiResult mode.
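The MultiResult example prints only the first column of each table. As a rough sketch of how the same loop could cover every column, the version below uses only the standard DbDataReader members (FieldCount, GetName, GetString, Read, NextResult) together with the Sylvan calls already shown in this answer. WriteAllTables is a hypothetical helper name chosen for illustration; it is not part of the library.

```csharp
using System;
using System.Data.Common;
using System.IO;
using Sylvan.Data.Csv;

var data =
"""
a,b,c
1,2,3
d,e,f,g
4,5,6,7
""";

var opts = new CsvDataReaderOptions { ResultSetMode = ResultSetMode.MultiResult };
using DbDataReader csv = CsvDataReader.Create(new StringReader(data), opts);
WriteAllTables(csv, Console.Out);

// Hypothetical helper (not a Sylvan API): writes the header and every row
// of each result set exposed by the reader.
static void WriteAllTables(DbDataReader reader, TextWriter output)
{
    var table = 0;
    do
    {
        table++;
        // FieldCount reflects the column count of the current table.
        var names = new string[reader.FieldCount];
        for (var i = 0; i < names.Length; i++)
        {
            names[i] = reader.GetName(i);
        }
        output.WriteLine($"table {table}: {string.Join(",", names)}");

        while (reader.Read())
        {
            var values = new string[reader.FieldCount];
            for (var i = 0; i < values.Length; i++)
            {
                values[i] = reader.GetString(i);
            }
            output.WriteLine("  " + string.Join(",", values));
        }
    } while (reader.NextResult());
}
```

Because the helper depends only on DbDataReader, it should work with any reader that exposes multiple result sets, not just the CSV reader.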
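However, this will not work if two consecutive tables have the same number of columns and no empty line separates them, because the reader treats the next table as a continuation of the first.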
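To make the limitation concrete, here is a library-free sketch of the rule as described in this answer: a new table starts on an empty line or when the column count changes. This is only an illustration, not Sylvan's actual implementation. SplitTables is a hypothetical helper, and it counts fields with a naive comma split, so it assumes no quoted fields containing commas.

```csharp
using System;
using System.Collections.Generic;

// Different column counts (3, then 4): two tables are detected.
Console.WriteLine(SplitTables("a,b,c\n1,2,3\nd,e,f,g\n4,5,6,7").Count); // 2

// Same column count and no empty line: everything looks like one table.
Console.WriteLine(SplitTables("a,b,c\n1,2,3\nd,e,f\n4,5,6").Count); // 1

// Hypothetical helper, not part of Sylvan: groups lines into tables,
// starting a new table after an empty line or when the column count changes.
static List<List<string>> SplitTables(string text)
{
    var tables = new List<List<string>>();
    List<string>? current = null;
    var width = -1;
    foreach (var raw in text.Split('\n'))
    {
        var line = raw.TrimEnd('\r');
        if (line.Length == 0)
        {
            // An empty line closes the current table.
            current = null;
            continue;
        }
        // Naive field count; assumes no quoted commas.
        var count = line.Split(',').Length;
        if (current == null || count != width)
        {
            current = new List<string>();
            tables.Add(current);
            width = count;
        }
        current.Add(line);
    }
    return tables;
}
```

In the second call the "d,e,f" header row is indistinguishable from a data row by column count alone, which is why some other signal is needed to find the table boundary.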
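In that case, a feature of the Sylvan.Data library can be used to identify where the next table starts.
The second example uses the TakeWhile extension method to identify when the next table starts.
The second example produces the same output as the first.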