scala error: object apache is not a member of package org

rqqzpn5f · asked on 2023-04-06 · tagged Scala

I am learning Scala on Docker. The image has no sbt or Maven, and I ran into this error. Every solution I found online involves sbt or Maven, so I would like to know whether this can be solved without sbt or Maven.
I am trying to create a jar with:
scalac problem1.scala -d problem1.jar
Error:
problem1.scala:3: error: object apache is not a member of package org
import org.apache.spark.SparkContext
Code:

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.log4j.{Logger,Level}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StructType, StructField,  LongType, StringType}
//import org.apache.parquet.format.StringType

object problem1 {
  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.OFF)
    //Create conf object
    val conf = new SparkConf().setMaster("local[2]").setAppName("loadData")
    //create spark context object
    val sc = new SparkContext(conf)

    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Define the schema of the input file
    val table_schema = StructType(Seq(
      StructField("TransID", LongType, true),
      StructField("CustID", LongType, true),
      StructField("TransTotal", LongType, true),
      StructField("TransNumItems", LongType, true),
      StructField("TransDesc", StringType, true)
    ))
    // Read the CSV file into a DataFrame
    val T = sqlContext.read
      .format("csv")
      .schema(table_schema)
      .option("header","false")
      .option("nullValue","NA")
      .option("delimiter",",")
      .load(args(0))
    //    T.show(5)

    val T1 = T.filter($"TransTotal" >= 200)
    //    T1.show(5)
    val T2 = T1.groupBy("TransNumItems").agg(sum("TransTotal"), avg("TransTotal"),
      min("TransTotal"), max("TransTotal"))
    //    T2.show(500)
    T2.show()
    val T3 =  T1.groupBy("CustID").agg(count("TransID").as("number_of_transactions_T3"))
    //    T3.show(50)
    val T4 = T.filter($"TransTotal" >= 600)
    //   T4.show(5)
    val T5 = T4.groupBy("CustID").agg(count("TransID").as("number_of_transactions_T5"))
    //    T5.show(50)
    val temp = T3.as("T3").join(T5.as("T5"), ($"T3.CustID" === $"T5.CustID") )
    //    T6.show(5)
    //    print(T6.count())
    val T6 = temp.where(($"number_of_transactions_T5")*5 < $"number_of_transactions_T3")
    //    T6.show(5)
    T6.show()
    sc.stop
  }
}
Answer by niwlg2el:

  • Why not pick a Docker image that comes with sbt?
  • In any case, yes, you can of course create a jar from the command line with plain Scala and no sbt. You need the dependency jars (spark-core, spark-catalyst, spark-sql, log4j, and possibly a few others) and specify the classpath manually:
scalac -cp /path/to/spark-core_2.13-3.3.1.jar:/path/to/spark-catalyst_2.13/3.3.1/spark-catalyst_2.13-3.3.1.jar:/path/to/spark-sql_2.13/3.3.1/spark-sql_2.13-3.3.1.jar:/path/to/log4j-1.2-api-2.17.2.jar -d problem1.jar problem1.scala

For example, with the path/to prefixes filled in, it looks like this:

scalac -cp /home/dmitin/.cache/coursier/v1/https/repo1.maven.org/maven2/org/apache/spark/spark-core_2.13/3.3.1/spark-core_2.13-3.3.1.jar:/home/dmitin/.cache/coursier/v1/https/repo1.maven.org/maven2/org/apache/spark/spark-catalyst_2.13/3.3.1/spark-catalyst_2.13-3.3.1.jar:/home/dmitin/.cache/coursier/v1/https/repo1.maven.org/maven2/org/apache/spark/spark-sql_2.13/3.3.1/spark-sql_2.13-3.3.1.jar:/home/dmitin/.cache/coursier/v1/https/repo1.maven.org/maven2/org/apache/logging/log4j/log4j-1.2-api/2.17.2/log4j-1.2-api-2.17.2.jar -d problem1.jar problem1.scala
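If the standalone coursier launcher cs happens to be available in the image (it is a single binary and needs neither sbt nor Maven), it can download these jars and print a ready-made classpath. The invocation below is a sketch under that assumption; spark-sql pulls in spark-core and spark-catalyst transitively, so one coordinate plus log4j is usually enough for compiling:

scalac -cp "$(cs fetch -p org.apache.spark:spark-sql_2.13:3.3.1 org.apache.logging.log4j:log4j-1.2-api:2.17.2)" -d problem1.jar problem1.scala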
  • Alternatively, on a machine that does have sbt, you can build a fat jar (with sbt assembly) containing all the dependencies (or even the application together with all its dependencies) and use that (a sketch of running the result follows the link below):
scalac -cp fat-jar.jar -d problem1.jar problem1.scala

https://github.com/sbt/sbt-assembly
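
Note that compiling is only half of it: to actually run the resulting problem1.jar, the same dependencies must be on the runtime classpath as well. A minimal sketch, assuming the fat jar really bundles Spark with all of its transitive dependencies and that a Scala 2.13 runner matching those jars is installed (the file names here are placeholders):

scala -cp fat-jar.jar:problem1.jar problem1 /path/to/input.csv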

  • Another option is to create an sbt launcher for your application:

https://www.scala-sbt.org/1.x/docs/Sbt-Launcher.html
SBT gives java.lang.NullPointerException when trying to run spark
An sbt launcher makes it possible to run an application in an environment where only Java is installed.

  • Yet another option is to manage the dependencies programmatically with Coursier (a rough sketch follows the links below):

Can you import a separate version of the same dependency into one build file for test?
How to compile and execute scala code at run-time in Scala3?
How can I run generated code during script runtime?
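
As a rough sketch of the programmatic route, the snippet below uses coursier's high-level Scala Fetch API, roughly following its documentation. It assumes the coursier library itself is already on the classpath, and the exact coordinates and API details are assumptions worth double-checking:

import coursier._

object FetchSparkDeps {
  def main(args: Array[String]): Unit = {
    // Resolve spark-sql 3.3.1 for Scala 2.13; spark-core and spark-catalyst
    // come in as transitive dependencies, so one coordinate is enough.
    val files = Fetch()
      .addDependencies(dep"org.apache.spark:spark-sql_2.13:3.3.1")
      .run()

    // Print a classpath string that can be passed to scalac -cp or scala -cp.
    println(files.map(_.getAbsolutePath).mkString(java.io.File.pathSeparator))
  }
}

The obvious chicken-and-egg problem (you need coursier on the classpath before you can fetch anything) is why the standalone cs launcher, as in the classpath example further up, is usually the simpler choice when neither sbt nor Maven is installed.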
