我有一些带有字符串ID的列。其中一个读对了,但另一个不管我做什么都不能是红色的。我设置了限制(20),但仍然没有结果。我的Parquet文件是相当大的事实上,但为什么我不能读只有一列,没有它一切顺利。
java --version
openjdk 11.0.7 2020-04-14
OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-2ubuntu218.04)
OpenJDK 64-Bit Server VM (build 11.0.7+10-post-Ubuntu-2ubuntu218.04, mixed mode, sharing)
pyspark --version
20/06/19 15:56:20 WARN Utils: Your hostname, anton-G5-5587 resolves to a loopback address: 127.0.1.1; using 192.168.0.25 instead (on interface wlp0s20f3)
20/06/19 15:56:20 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/home/anton/Documents/libs/spark/jars/spark-unsafe_2.12-3.0.0.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.0.0
/_/
Using Scala version 2.12.10, OpenJDK 64-Bit Server VM, 11.0.7
Branch HEAD
Compiled by user ubuntu on 2020-06-06T13:05:28Z
我使用为Apache3.2和更高版本预先构建的spark
spark = SparkSession \
.builder \
.appName("Python Spark SQL basic example") \
.config("spark.some.config.option", "some-value") \
.getOrCreate()
df = spark.read.load("data/purchases.parquet")
df.select('client_id', 'product_id').limit(20).write.csv('c_id_to_p_id.csv')
在我运行了上面的代码之后,我得到了以下信息(完整的消息太长,无法发布)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/anton/.local/lib/python3.6/site-packages/py4j/java_gateway.py", line 1115, in start
self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:46655)
Traceback (most recent call last):
File "/home/anton/.local/lib/python3.6/site-packages/py4j/java_gateway.py", line 977, in _get_connection
connection = self.deque.pop()
IndexError: pop from an empty deque
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/anton/.local/lib/python3.6/site-packages/py4j/java_gateway.py", line 1115, in start
self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
我的Parquet文件架构如下所示
StructType(
List(
StructField(client_id,StringType,true),
...
StructField(product_id,StringType,true),
...
))
有人遇到过这样的问题吗?谢谢你的帮助
暂无答案!
目前还没有任何答案,快来回答吧!