I have loaded a CSV file into an RDD; its contents look like this:
['"3331/587","Sub,Metro","1235","1000"',
'"1234/232","City","8479","2000"',
'"5987/215","Sub,Metro","1111","Unknown"',
'"8794/215","Sub,Metro","1112","1000"',
'"1254/951","City","6598","XXXX"',
'"1584/951","City","1548","Unknown"',
'"1833/331","Sub,Metro","1009","2000"',
'"2213/987","City","1197", ']
What I ultimately want to get is:
[["3331/587","Sub,Metro","1235","1000"],
["1234/232","City","8479","2000"],
["5987/215","Sub,Metro","1111","Unknown"],
["8794/215","Sub,Metro","1112","1000"],
["1254/951","City","6598","XXXX"],
["1584/951","City","1548","Unknown"],
["1833/331","Sub,Metro","1009","2000"],
["2213/987","City","1197", ]]
If I use this code:
sc.textFile(file).map(lambda l: l.replace(r'"', '').split(','))
it also splits on the comma inside quoted values such as "Sub,Metro".
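To see why the one-liner fails, here is the same logic as a plain Python function (no Spark needed), run on two of the sample lines. The name `naive_split` is only for illustration.

```python
# Two sample lines copied from the question
lines = [
    '"3331/587","Sub,Metro","1235","1000"',
    '"1234/232","City","8479","2000"',
]


def naive_split(line):
    # Same logic as the lambda in the question: strip every double quote,
    # then split on every comma, including the one inside "Sub,Metro"
    return line.replace('"', '').split(',')


for line in lines:
    print(naive_split(line))
# ['3331/587', 'Sub', 'Metro', '1235', '1000']
# ['1234/232', 'City', '8479', '2000']
```

The row containing `Sub,Metro` comes out with five fields instead of four: the quotes are removed before splitting, so nothing can tell which commas were inside a quoted value anymore.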
When splitting on commas, how can I automatically ignore all commas that appear between double quotes?
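One way to get the desired result (a suggestion independent of the answer that follows) is to let Python's standard `csv` module do the splitting, since it already treats commas inside double quotes as part of the field. A minimal sketch, where `parse_line` is a hypothetical helper name:

```python
import csv

# Sample lines copied from the question
lines = [
    '"3331/587","Sub,Metro","1235","1000"',
    '"1254/951","City","6598","XXXX"',
    '"2213/987","City","1197", ',
]


def parse_line(line):
    # csv.reader accepts any iterable of strings, so wrap the single line
    # in a list and take the first (and only) parsed row. Commas inside
    # double-quoted fields stay part of the field, and the quotes are removed.
    return next(csv.reader([line]))


for line in lines:
    print(parse_line(line))
# ['3331/587', 'Sub,Metro', '1235', '1000']
# ['1254/951', 'City', '6598', 'XXXX']
# ['2213/987', 'City', '1197', ' ']
```

Assuming `sc` and `file` as in the question, this would be applied as `sc.textFile(file).map(parse_line)`. An alternative is `sc.textFile(file).mapPartitions(csv.reader)`, which builds one reader per partition rather than one per line. Both assume no quoted field spans several lines, because `textFile` splits the input on newlines before the parser sees it. Note that the last sample row keeps its trailing space as the fourth field; apply `.strip()` to each field if you want it empty.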
1 answer
laximzn51
Below is my example using a regular expression.
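The answer's regular expression itself is not included here. As one possible regex-based approach (an assumption, not necessarily the answerer's pattern), you can split only on commas that are followed by an even number of double quotes up to the end of the line, which are exactly the commas outside quoted fields, and then strip the surrounding quotes. `regex_split` and `SPLIT_OUTSIDE_QUOTES` are hypothetical names:

```python
import re

# A comma counts as a separator only if an even number of '"' characters
# follows it up to end of line, i.e. it is not inside a quoted field.
# Assumes fields contain no escaped ("") quotes.
SPLIT_OUTSIDE_QUOTES = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')


def regex_split(line):
    # Split on separator commas, then remove the surrounding quotes
    return [field.strip('"') for field in SPLIT_OUTSIDE_QUOTES.split(line)]


print(regex_split('"3331/587","Sub,Metro","1235","1000"'))
# ['3331/587', 'Sub,Metro', '1235', '1000']
```

In Spark this would be used the same way, e.g. `sc.textFile(file).map(regex_split)`. The lookahead rescans the rest of the line for every comma, so for long lines or escaped quotes the `csv` module is the more robust choice.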