I wrote the script below to run a Glue job:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.sql.functions import *
from awsglue.dynamicframe import DynamicFrame
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
source_data = glueContext.create_dynamic_frame.from_catalog(database = "source_db", table_name = "source_table")
source_data.toDF().createOrReplaceTempView("data")
query = "SELECT id, date_created FROM data"
data_df = spark.sql(query)
data_dynamicframe = DynamicFrame.fromDF(data_df.repartition(1), glueContext, "data_dynamicframe")
target_data = glueContext.write_dynamic_frame.from_catalog(frame = data_dynamicframe, database = "target", table_name = "target_table", transformation_ctx = "target_data")
job.commit()
I see this message in the logs:
Thread-4 INFO Log4j appears to be running in a Servlet environment, but there's no log4j-web module available. If you want better web container support, please add the log4j-web JAR to your web archive or server lib directory.
Has anyone run into the same thing? Is there something wrong with the script? Thanks!
1 Answer
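It turned out there was a typo!
The script runs fine now, but I still get the same Log4j message quoted in the question.
I think that is worth looking into at some point.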
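For what it's worth, the line quoted in the question is tagged INFO, so it reads as informational rather than as a failure. Below is a minimal sketch for checking the level of a log line. It assumes the log has been saved as plain text and that the level appears as a standalone token, as it does in the quoted line. The helper names `log_level` and `is_problem` are hypothetical and not part of Glue or Log4j.

```python
# Levels that Log4j commonly prints; anything else is treated as unknown.
LEVELS = {"TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"}


def log_level(line):
    """Return the first Log4j level token found in the line, or None."""
    for token in line.split():
        if token in LEVELS:
            return token
    return None


def is_problem(line):
    """Treat only ERROR and FATAL lines as something that needs attention."""
    return log_level(line) in {"ERROR", "FATAL"}


# The message from the question, shortened to its first sentence.
message = (
    "Thread-4 INFO Log4j appears to be running in a Servlet environment, "
    "but there's no log4j-web module available."
)

print(log_level(message))
print(is_problem(message))
```

This only classifies lines by their level. It does not silence the message or change how Glue logs.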