How to write a txt file to an S3 bucket with Spark using the write() method

Posted by deyfvvtc on 2023-01-05 in Apache


I am trying to write a Dataset to an S3 bucket in txt format using Spark,
but I get the following error:

Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:64)
Caused by: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified by setting the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties (respectively).

My code:

override fun write(input: Dataset<String>) =
    input.coalesce(NUMBER_PARTITIONS).write().text(S3_BUCKET_PATH)
        .also {
            LOGGER.logInfo(
                LOG_MESSAGE_TEMPLATE,
                READ_DATA_METHOD,
                WRITE_MESSAGE
            )
        }

My Spark configuration:

object SparkConfiguration {
    private const val SPARK_MASTER_NAME = "spark.master"
    private const val SPARK_APP_NAME_CONFIG = "spark.app.name"
    fun buildSparkSession(config: Config): SparkSession {
        return SparkSession.builder()
            .config(buildSparkConfig(config))
            .orCreate
    }
    fun buildSparkConfig(config: Config): SparkConf = SparkConf()
        .setMaster(config.getString(SPARK_MASTER_NAME))
        .setAppName(config.getString(SPARK_APP_NAME_CONFIG))
}


apache-spark

Source: https://stackoverflow.com/questions/75006210/how-to-write-txt-file-in-s3-bucket-with-spark-using-write-method

1 answer


6ojccjat1#

This happens because the Spark job runs without the credentials it needs to access S3:

Caused by: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified by setting the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties (respectively).

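Make sure the job has permission to access S3.
Ideally, set the credentials in conf/core-site.xml. The property prefix has to match the URI scheme of the output path: the error above asks for the fs.s3.* properties, which apply to s3:// paths, while the fs.s3n.* properties shown here apply to s3n:// paths: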

<configuration>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>XXXXXX</value>
  </property>

  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>XXXXXX</value>
  </property>
</configuration>
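
The same properties can also be supplied from code instead of core-site.xml. The following is a minimal sketch, not the original author's method: `s3CredentialProperties` is a hypothetical helper, and reading the keys from the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables is an assumption. It relies on Spark copying every property that starts with `spark.hadoop.` into the Hadoop configuration.

```kotlin
// Hypothetical helper (not part of the original post): builds the Spark
// properties that carry S3 credentials for a given URI scheme.
fun s3CredentialProperties(
    scheme: String,
    accessKey: String,
    secretKey: String
): Map<String, String> {
    // Spark forwards "spark.hadoop.*" properties to the Hadoop configuration.
    val prefix = "spark.hadoop.fs.$scheme"
    return when (scheme) {
        // Legacy connectors use the property names quoted in the error message.
        "s3", "s3n" -> mapOf(
            "$prefix.awsAccessKeyId" to accessKey,
            "$prefix.awsSecretAccessKey" to secretKey
        )
        // The s3a connector uses different property names.
        "s3a" -> mapOf(
            "$prefix.access.key" to accessKey,
            "$prefix.secret.key" to secretKey
        )
        else -> throw IllegalArgumentException("Unsupported S3 scheme: $scheme")
    }
}

fun main() {
    // Assumption: credentials are exported as environment variables;
    // the "XXXXXX" fallback is only a placeholder, as in the XML above.
    val accessKey = System.getenv("AWS_ACCESS_KEY_ID") ?: "XXXXXX"
    val secretKey = System.getenv("AWS_SECRET_ACCESS_KEY") ?: "XXXXXX"

    // Print only the property names so that no secret ends up in the logs.
    s3CredentialProperties("s3", accessKey, secretKey).keys.forEach(::println)
}
```

In the question's `buildSparkConfig`, each entry could then be applied with `SparkConf.set(key, value)` before the session is built. Pass the scheme that the output path in `S3_BUCKET_PATH` actually uses, and avoid hard-coding real keys in source files.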

Alternatively, reinstall awscli on your machine:

pip install awscli

and then run:

aws configure

Answered on 2023-01-05
