Converting a tab-separated string to a comma-separated (CSV) string in Java

3pvhb19x  posted on 2023-02-01 in Java

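I have the contents of a tab-separated file as a string. The example below has 2 rows. In the first row the values are separated by tabs, and one value also contains line breaks. The second row contains no line breaks.
The original data also has headers. I want to read the headers and values from the string and convert them to a CSV string. When I read this data line by line with CSVParser, it does not give the correct values, because some columns are also split by \n (newline).
However, every "row" ends with the same string, namely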

"test2222"

First row

"abc"   "cde"   "fhg"   "ijk"   "17/01/23 10:09:50 am"  "test111"   "test2" "Individual"    "Enclosure of Work Areas"       "Highlight aluminium personnel lanyarded into the Haulotte boom lift with a spotter. All tools observed to be lanyarded including protection gear. 
Blue glue asset card observed to be attached to the machinery, 10 year inspection of plant not required due to it being only 3yrs old. Last annual inspection august 2022 and logbook was subsequently observed. 
Plant registration was all observed and the weight loads were all abided by."   "test2222"

Second row

"abc"   "cde"   "fhg"   "ijk"   "17/01/23 10:09:50 am"  "test111"   "test2" "Individual"    "Enclosure of Work Areas"       "1" "0" "Level 79"  "16/01/23 11:12:50 pm"  "Logistics - Construction Personnel & Material Lifts"                   "Schindler lift cages were observed to be free of any loose debris or material that may pose a risk of falling into the lift shaft below. L80 and L79 were observed to be compliant on both sides of the shaft."    "test2222"

The data as a string

String test = "\"abc\"\t\"cde\"\t\"fhg\"\t\"ijk\"\t\"17/01/23 10:09:50 am\"\t\"test111\"\t\"test2\"\t\"Individual\"\t\"Enclosure of Work Areas\"\t\t\"Highlight aluminium personnel lanyarded into the Haulotte boom lift with a spotter. All tools observed to be lanyarded including protection gear. \n" +
            "Blue glue asset card observed to be attached to the machinery, 10 year inspection of plant not required due to it being only 3yrs old. Last annual inspection august 2022 and logbook was subsequently observed. \n" +
            "Plant registration was all observed and the weight loads were all abided by.\"\t\"test2222\"\n" +
            "\"abc\"\t\"cde\"\t\"fhg\"\t\"ijk\"\t\"17/01/23 10:09:50 am\"\t\"test111\"\t\"test2\"\t\"Individual\"\t\"Enclosure of Work Areas\"\t\t\"1\"\t\"0\"\t\"Level 79\"\t\"16/01/23 11:12:50 pm\"\t\"Logistics - Construction Personnel & Material Lifts\"\t\t\t\t\t\"Schindler lift cages were observed to be free of any loose debris or material that may pose a risk of falling into the lift shaft below. L80 and L79 were observed to be compliant on both sides of the shaft.\"\t\"test2222\"";

Can anyone help me solve this? Thanks in advance!

1tu0hz3e


You can use a Scanner to read the file. Set its delimiter to the string that terminates each "row" in the file. The code below demonstrates this.

import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Scanner;

public class Main {

    public static void main(String[] args) {
        Path source = Paths.get("sampldat.txt");
        // try-with-resources closes the Scanner (and the underlying file) automatically
        try (Scanner reader = new Scanner(source)) {
            // Use the string that ends every "row" as the delimiter
            reader.useDelimiter("\"test2222\"");
            int counter = 0;
            while (reader.hasNext()) {
                String line = reader.next();
                // Split on any run of whitespace: spaces, tabs and newlines alike
                String[] fields = line.split("\\s+");
                System.out.println("Row: " + ++counter);
                // Join the pieces into one comma-separated line
                System.out.println(String.join(",", fields));
            }
        }
        catch (IOException xIo) {
            xIo.printStackTrace();
        }
    }
}

The file sampldat.txt contains the two sample "rows" from your question, i.e.

"abc"   "cde"   "fhg"   "ijk"   "17/01/23 10:09:50 am"  "test111"   "test2" "Individual"    "Enclosure of Work Areas"       "Highlight aluminium personnel lanyarded into the Haulotte boom lift with a spotter. All tools observed to be lanyarded including protection gear. 
Blue glue asset card observed to be attached to the machinery, 10 year inspection of plant not required due to it being only 3yrs old. Last annual inspection august 2022 and logbook was subsequently observed. 
Plant registration was all observed and the weight loads were all abided by."   "test2222"
"abc"   "cde"   "fhg"   "ijk"   "17/01/23 10:09:50 am"  "test111"   "test2" "Individual"    "Enclosure of Work Areas"       "1" "0" "Level 79"  "16/01/23 11:12:50 pm"  "Logistics - Construction Personnel & Material Lifts"                   "Schindler lift cages were observed to be free of any loose debris or material that may pose a risk of falling into the lift shaft below. L80 and L79 were observed to be compliant on both sides of the shaft."    "test2222"

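I use try-with-resources to ensure the file is closed.
The method next reads up to the next occurrence of the delimiter, i.e. the string "test2222".
Then I split the value returned by next on [any] whitespace (i.e. spaces, tabs, newlines, etc.). Note that this also splits on the spaces inside a field.
Then I call the [static] method join (of class java.lang.String) to create a comma-separated list.
I added a header to the printout that indicates which "row" is being printed.
This is the output I get: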

Row: 1
"abc","cde","fhg","ijk","17/01/23,10:09:50,am","test111","test2","Individual","Enclosure,of,Work,Areas","Highlight,aluminium,personnel,lanyarded,into,the,Haulotte,boom,lift,with,a,spotter.,All,tools,observed,to,be,lanyarded,including,protection,gear.,Blue,glue,asset,card,observed,to,be,attached,to,the,machinery,,10,year,inspection,of,plant,not,required,due,to,it,being,only,3yrs,old.,Last,annual,inspection,august,2022,and,logbook,was,subsequently,observed.,Plant,registration,was,all,observed,and,the,weight,loads,were,all,abided,by."
Row: 2
,"abc","cde","fhg","ijk","17/01/23,10:09:50,am","test111","test2","Individual","Enclosure,of,Work,Areas","1","0","Level,79","16/01/23,11:12:50,pm","Logistics,-,Construction,Personnel,&,Material,Lifts","Schindler,lift,cages,were,observed,to,be,free,of,any,loose,debris,or,material,that,may,pose,a,risk,of,falling,into,the,lift,shaft,below.,L80,and,L79,were,observed,to,be,compliant,on,both,sides,of,the,shaft."
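
In the output above, multi-word fields are broken apart at every space, the empty columns disappear, and row 2 starts with an empty field left over from the line break after the terminator. If that is not what you want, the following is a minimal sketch of one possible variation, not the code from this answer: it works on the in-memory string from the question with String.split instead of a Scanner, and it splits each record on tabs only. The class name TsvToCsv and the method convert are hypothetical. The sketch assumes that every non-empty field is already wrapped in double quotes (so commas inside a field stay valid CSV), that a line break inside a field may be collapsed into a single space, and that the terminator should be kept as the last column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class TsvToCsv {

    // Hypothetical helper: returns one CSV line per record.
    public static List<String> convert(String data, String terminator) {
        // A record ends with: optional tab, the terminator field, optional line break.
        Pattern recordEnd = Pattern.compile("\\t?" + Pattern.quote(terminator) + "\\R?");
        List<String> rows = new ArrayList<>();
        for (String record : recordEnd.split(data)) {
            if (record.isEmpty()) {
                continue; // nothing precedes this terminator
            }
            // Split on tabs only; the -1 limit keeps trailing empty columns.
            String[] fields = record.split("\t", -1);
            for (int i = 0; i < fields.length; i++) {
                // Assumption: a line break inside a field can become a single space.
                fields[i] = fields[i].replaceAll("\\s*\\R\\s*", " ");
            }
            // Assumption: the terminator is real data, so re-append it as the last column.
            rows.add(String.join(",", fields) + "," + terminator);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Small stand-in for the data in the question, with "END" as the terminator.
        String sample = "\"a\"\t\"b c\"\t\t\"x \ny\"\t\"END\"\n\"d\"\t\"END\"";
        convert(sample, "\"END\"").forEach(System.out::println);
        // prints:
        // "a","b c",,"x y","END"
        // "d","END"
    }
}
```

For the data in the question you would call convert(test, "\"test2222\""). Embedded double quotes inside a field are not escaped by this sketch; if your data can contain them, a CSV library is probably the safer choice.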
