PySpark: AttributeError: 'PipelinedRDD' object has no attribute '_get_object_id'

Posted by d4so4syb on 2021-06-27 in Hive


I am trying to find a specific string in a file and replace it with another specific string. I am using a Zeppelin notebook. This is my code so far:

%pyspark
import fileinput
import sys
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
hivectx = HiveContext(sc)
file = sc.textFile('PATH/my_query.sql')
file1 = sc.textFile('PATH/my_query1sql')
phrase = "(Month|| '-' || '5' || '-' || year)"
replace ="('5' || '/' || month || '/' || year)"

read = file.collect()

# for i in read:
#     print i     ---> this successfully prints out my_query.sql file

for i in read:
    file1 = file1.map(lambda x: x.replace(phrase, replace))
    file1.saveAsTextFile(file1)   # I'm trying to save it as the empty file "PATH/my_query.sql", also known as file1.

However, I get the following error:

AttributeError: 'PipelinedRDD' object has no attribute '_get_object_id'

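I can't find any documentation online about this error, which names '_get_object_id'. Similar errors suggest that it is a version issue. Is that the case? Are there obvious mistakes in my code? Sorry, I am not familiar with this language.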

Hive python pyspark

Source: https://stackoverflow.com/questions/52899337/pyspark-attributeerror-pipelinedrdd-object-has-no-attribute-get-object-id

1 Answer


mqkwyuun1#

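If you want to replace a text pattern in a file, you can try the following approach without using Spark, which is likely to be more efficient for a small file such as a SQL query.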

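# Read the whole file into a list of lines.
with open('PATH/file.sql', 'r') as f:
    lines = f.readlines()

phrase = "(Month|| '-' || '5' || '-' || year)"
replace = "('5' || '/' || month || '/' || year)"

# Replace the phrase on every line, then join the lines back into one string.
new_lines = ''.join([i.replace(phrase, replace) for i in lines])

print(new_lines)

# Write the result to the output file.
with open('text.sql', 'w') as f:
    f.write(new_lines)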

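Here the file is read and stored in a list, then the replace function is applied to every line of the file and the results are joined into a single string. Finally, that string is written to the file you want to save.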
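The same read, replace, write steps can also be wrapped in a small helper. This is only a sketch: the function name `replace_in_file`, the sample query text, and the temporary file names are assumptions made for illustration and do not come from the question or the answer. It uses `pathlib` to read and write the file in one call each, and reports how many occurrences were replaced so you can confirm that the phrase was actually found.

```python
import tempfile
from pathlib import Path


def replace_in_file(src, dst, old, new):
    """Copy src to dst, replacing every literal occurrence of old with new.

    Hypothetical helper; returns the number of occurrences replaced.
    """
    text = Path(src).read_text(encoding="utf-8")
    count = text.count(old)  # non-overlapping matches, same as str.replace
    Path(dst).write_text(text.replace(old, new), encoding="utf-8")
    return count


phrase = "(Month|| '-' || '5' || '-' || year)"
replace = "('5' || '/' || month || '/' || year)"

# Demo on a throwaway file so the sketch runs anywhere.
# The query text and file names below are made up for the example.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "my_query.sql"
    dst = Path(tmp) / "my_query1.sql"
    src.write_text(f"SELECT {phrase} AS d FROM t\n", encoding="utf-8")

    n = replace_in_file(src, dst, phrase, replace)
    result = dst.read_text(encoding="utf-8")

print(n)
print(result)
```

If the returned count is 0, the phrase did not match exactly (the match is literal and case-sensitive, including spaces). As a side note, the error in the question most likely comes from `file1.saveAsTextFile(file1)`, which passes the RDD itself where `saveAsTextFile` expects an output path string.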

2021-06-27
