How to write a txt file to an S3 bucket with Spark using the write() method

deyfvvtc, posted on 2023-01-05 in Apache

I am trying to write a Dataset to an S3 bucket in txt format using Spark,
but I get the following error:

Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:64)
Caused by: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified by setting the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties (respectively).

My code:

override fun write(input: Dataset<String>) =
        input.coalesce(NUMBER_PARTITIONS).write().text(S3_BUCKET_PATH)
            .also {
                LOGGER.logInfo(
                    LOG_MESSAGE_TEMPLATE,
                    READ_DATA_METHOD,
                    WRITE_MESSAGE
                )
            }

My Spark configuration:

object SparkConfiguration {
    private const val SPARK_MASTER_NAME = "spark.master"
    private const val SPARK_APP_NAME_CONFIG = "spark.app.name"
    fun buildSparkSession(config: Config): SparkSession {
        return SparkSession.builder()
            .config(buildSparkConfig(config))
            .orCreate
    }
    fun buildSparkConfig(config: Config): SparkConf = SparkConf()
        .setMaster(config.getString(SPARK_MASTER_NAME))
        .setAppName(config.getString(SPARK_APP_NAME_CONFIG))
}


6ojccjat #1

This happens because the Spark job runs without AWS credentials it can use to access S3:

Caused by: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified by setting the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties (respectively).

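Make sure the job has credentials that are allowed to write to S3.
Ideally, set the credentials in conf/core-site.xml. The property prefix has to match the URI scheme of the output path: the example uses fs.s3n.* for s3n:// paths, while the error above asks for fs.s3.*, which applies to s3:// paths.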

<configuration>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>XXXXXX</value>
  </property>

  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>XXXXXX</value>
  </property>
</configuration>
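
The same properties can also be supplied through the Spark configuration instead of core-site.xml, because Spark copies every `spark.hadoop.<key>` entry into the Hadoop configuration as `<key>`. Below is a minimal sketch of that idea, not the original poster's code: `s3CredentialProperties` is a hypothetical helper, and it assumes the legacy `s3` or `s3n` connectors named in the error message (the newer `s3a` connector uses `fs.s3a.access.key` and `fs.s3a.secret.key` instead).

```kotlin
// Hypothetical helper (not from the original post): builds the Spark
// properties that carry S3 credentials into the Hadoop configuration.
// Spark forwards each "spark.hadoop.<key>" entry to Hadoop as "<key>".
fun s3CredentialProperties(
    scheme: String,     // URI scheme of the output path: "s3" or "s3n"
    accessKey: String,  // AWS access key ID
    secretKey: String   // AWS secret access key
): Map<String, String> {
    // Only the legacy connectors use the awsAccessKeyId/awsSecretAccessKey names.
    require(scheme == "s3" || scheme == "s3n") { "unsupported scheme: $scheme" }
    return mapOf(
        "spark.hadoop.fs.${scheme}.awsAccessKeyId" to accessKey,
        "spark.hadoop.fs.${scheme}.awsSecretAccessKey" to secretKey
    )
}
```

In `buildSparkConfig`, each returned entry could be applied with `SparkConf.set(key, value)` before the session is created. Reading the two values from environment variables or the application's `Config`, rather than hard-coding them, keeps the secrets out of source control.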

Alternatively, reinstall awscli on your machine:

pip install awscli

Then run:

aws configure
