CSV: replace line breaks inside quotes [closed]

laik7k3q  posted 12 months ago  in  Other

Closed. This question needs to be more focused. It is not currently accepting answers.

Closed 29 days ago.
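I have a TSV file with multiple columns, but it does not seem to be aligned correctly. One column (the "Examples" column in the sample) has line breaks between the quotes. I want all the example sentences (inside the quotes) to sit under the same column. How can I solve this with Python (and JavaScript)?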

example.tsv - actual output

ID   Data   Fruit   Examples
1   August   Apple   "I have an apple. (Ana) 
I give her an apple. (Tomas) 
There are apples (Lisa)"
2   July   Melon   "I have a melon. (Ana) 
I give him a melon. (Tomas) 
There are melons (Lisa)"
3   May   Lemon   "I have a lemon. (Ana) 
I give him a lemon. (Tomas) 
There are lemons (Lisa)"
...
example.tsv - ideal output

ID   Data   Fruit   Examples
1   August   Apple   "I have an apple. (Ana) I give her an apple. (Tomas) There are apples (Lisa)"
2   July   Melon   "I have a melon. (Ana) I give him a melon. (Tomas) There are melons (Lisa)"
3   May   Lemon   "I have a lemon. (Ana) I give him a lemon. (Tomas) There are lemons (Lisa)"

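Edit: Thanks for the suggestions, and sorry for the confusion. Right now I need Python code, but I also plan to do this in JavaScript. This is what I have so far in Python using a regular expression, but it does not merge the sentences.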

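import re

df = 'example.tsv'
with open(df, 'r+', encoding='utf-8') as file:
    content = file.read()
    # Collapses every line break into a space, including the ones between rows
    content_replaced = re.sub(r'[^\S\r\n]*[\n\r]\s*', " ", content)
    # Prints the original text, not content_replaced
    print(content)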

rseugnpd #1

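You can use a regular expression: remove every line break that is preceded by a quote plus non-quote characters and followed by non-quote characters plus a quote: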

const src = `ID   Data   Fruit   Examples
1   August   Apple   "I have an apple. (Ana) 
I give her an apple. (Tomas) 
There are apples (Lisa)"
2   July   Melon   "I have a melon. (Ana) 
I give him a melon. (Tomas) 
There are melons (Lisa)"
3   May   Lemon   "I have a lemon. (Ana) 
I give him a lemon. (Tomas) 
There are lemons (Lisa)"`

// The snippet's HTML pane contains: <pre id="$pre"></pre>
// Browsers expose that element as the global variable $pre.
$pre.textContent = src.replace(/(?<="[^"]+)\n(?=[^"]+")/g, '');
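
Since the question also asks for Python, here is a rough Python counterpart of the same idea. Python's `re` module does not accept a variable-width lookbehind like the one above, so this sketch swaps in a different technique: it matches each quoted span and collapses the line breaks inside it with a callback. The sample string and the helper name `join_quoted_lines` are made up for illustration, and the code assumes quotes are balanced and that cells contain no escaped `""` quotes.

```python
import re

# Hypothetical sample; assumes the real file uses tab separators.
src = (
    'ID\tData\tFruit\tExamples\n'
    '1\tAugust\tApple\t"I have an apple. (Ana) \n'
    'I give her an apple. (Tomas) \n'
    'There are apples (Lisa)"\n'
    '2\tJuly\tMelon\t"I have a melon. (Ana) \n'
    'I give him a melon. (Tomas) \n'
    'There are melons (Lisa)"\n'
)


def join_quoted_lines(text):
    """Collapse line breaks (and the whitespace around them) inside "..." spans."""
    def fix(match):
        # Only the text of one quoted span is rewritten here.
        return re.sub(r'\s*[\r\n]+\s*', ' ', match.group(0))

    # Assumption: quotes are balanced and never escaped as "" inside a cell.
    return re.sub(r'"[^"]*"', fix, text)


result = join_quoted_lines(src)
print(result)
```

Line breaks between rows are left alone because they sit outside any quoted span. For files with escaped quotes, a real CSV parser is likely the safer choice.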

ttvkxqim #2

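This is already a valid TSV file: because the line breaks are inside quotes, they become part of a single cell and do not start a new row. Nothing says a machine format like TSV has to look pretty. But if you do want to consolidate, and you are using Python, you can read the file with a CSV parser, change the cell, and write it back out: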

import csv

with open("test.tsv", newline="") as infile, open("testout.tsv", "w", newline="") as outfile:
    reader = csv.reader(infile, delimiter="\t")
    writer = csv.writer(outfile, delimiter="\t")
    for row in reader:
        row[3] = " ".join(row[3].split())
        writer.writerow(row)

Note that you lose information: the line breaks were what separated the sentences.
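
To illustrate both points, here is a small sketch (not part of the original answer) that first checks that the parser sees the multi-line cell as a single field, and then joins the sentences with a visible separator so the sentence boundaries survive. The in-memory sample and the `" | "` separator are assumptions; pick a separator that never occurs in your data.

```python
import csv
import io

# Hypothetical one-record sample with real tab separators.
sample = (
    'ID\tData\tFruit\tExamples\n'
    '1\tAugust\tApple\t"I have an apple. (Ana) \n'
    'I give her an apple. (Tomas) \n'
    'There are apples (Lisa)"\n'
)

# newline="" leaves line breaks untouched so the csv module can handle them.
rows = list(csv.reader(io.StringIO(sample, newline=""), delimiter="\t"))
print(len(rows))  # header plus one record, despite four physical lines

SEP = " | "  # assumed separator, chosen only for this illustration

out = io.StringIO(newline="")
writer = csv.writer(out, delimiter="\t")
writer.writerow(rows[0])
for row in rows[1:]:
    # Split the cell on its line breaks and drop stray whitespace.
    sentences = [line.strip() for line in row[3].splitlines() if line.strip()]
    row[3] = SEP.join(sentences)
    writer.writerow(row)

print(out.getvalue())
```

With the default quoting, the writer no longer wraps the cell in quotes once the line breaks are gone; passing `quoting=csv.QUOTE_ALL` to `csv.writer` would quote every field if the quotes matter downstream.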


ibps3vxo #3

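This is hard to answer; you need to include some code and tell us what you have already tried.
That said, here is an example of how to do this in JavaScript, assuming you already have the text of that column in a variable.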

// 'text' defined like in your example above. 
// With quote marks at the start and end and with line breaks as \n
const text = '"I have an apple. (Ana)\nI give her an apple. (Tomas)\nThere are apples(Lisa)"';

// To replace the linebreaks with a space character using regular expression:
const regexText = text.replace( /\n/g, " ");

// To replace the linebreaks with a space character using split/join
const splitJoinText = text.split("\n").join(" ");
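
For the Python side of the question, the same two approaches might look like the sketch below. It makes the same assumption as the JavaScript above: the cell text is already in a variable.

```python
# Same sample cell as in the JavaScript above, as a Python string.
text = '"I have an apple. (Ana)\nI give her an apple. (Tomas)\nThere are apples(Lisa)"'

# str.replace handles a plain "\n" line break.
replaced = text.replace("\n", " ")

# splitlines() also splits on "\r\n" and "\r", which helps with Windows files.
joined = " ".join(text.splitlines())

print(joined)
```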
