How to create a graph from an RDD/DataFrame? Scala Spark

beq87vna posted on 2021-05-27 in Spark

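My RDD contains biological data: protein names and the similarity between them. I want to create a graph in which the vertices are proteins and the edges carry the similarity values. This is what my RDD looks like: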

+-------------+------------+------------+
|   Protein1  |  Protein2  | Similarity |
+-------------+------------+------------+
|    P28469   |   Q70UP5   | 0.11111111 |
|    O45687   |   P00325   |    1.0     |
|    A7ME43   |   Q5HG16   |    0.6     |
|    A4VJT7   |   Q9LD43   |    1.0     |
|    P31937   |   Q64415   | 0.07692308 |
|    A1VAA0   |   Q9L298   |    1.0     |
|    B8DG74   |   Q6MT35   |    1.0     |
+-------------+------------+------------+

Thank you!

edqdpe6u #1

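This is not the same data, but you need to do something like the following (reading from a file, of course) and adapt the approach to your data: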

import org.graphframes.GraphFrame

// Vertex DataFrame
val v = sqlContext.createDataFrame(List(
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30),
  ("d", "David", 29),
  ("e", "Esther", 32),
  ("f", "Fanny", 36),
  ("g", "Gabby", 60)
)).toDF("id", "name", "age")
// Edge DataFrame
val e = sqlContext.createDataFrame(List(
  ("a", "b", "friend"),
  ("b", "c", "follow"),
  ("c", "b", "follow"),
  ("f", "c", "follow"),
  ("e", "f", "follow"),
  ("e", "d", "friend"),
  ("d", "a", "friend"),
  ("a", "e", "friend")
)).toDF("src", "dst", "relationship")

val g = GraphFrame(v, e)

In your case:

// I remember your question on distinct, but I am not sure whether we need distinct here or not
// You mention an RDD, but it looks like a DataFrame; let us assume an RDD

// RDD of tuples, simulating data read from a file
val rdd = sc.parallelize(Array(("p1", "p2", 1), 
                               ("p1", "p3", 2), 
                               ("p2", "p4", 3), 
                               ("p5", "p6", 4)))
// GraphFrame requires the vertex column to be named "id"
val v = rdd.map(x => x._1).union(rdd.map(x => x._2)).distinct.toDF("id")
// ... and the edge endpoint columns to be named "src" and "dst"
val e = rdd.toDF("src", "dst", "similarity")

v.show(false)
e.show(false)

val g = GraphFrame(v, e)
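
Since the data in the question looks like a DataFrame rather than an RDD, the same thing can be done without leaving the DataFrame API. What follows is only a minimal sketch under a few assumptions: the column names `Protein1`, `Protein2` and `Similarity` are taken from the table in the question, the three sample rows are copied from it, the names `df`, `edges`, `vertices` and `graph` are made up for illustration, and the graphframes package is assumed to be on the classpath. It is written to be pasted into spark-shell or a Scala worksheet.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.graphframes.GraphFrame

// Local session for illustration only; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("protein-graph")
  .getOrCreate()
import spark.implicits._

// A few rows copied from the question's table (assumed column names).
val df = Seq(
  ("P28469", "Q70UP5", 0.11111111),
  ("O45687", "P00325", 1.0),
  ("A7ME43", "Q5HG16", 0.6)
).toDF("Protein1", "Protein2", "Similarity")

// Edges: GraphFrames expects the endpoint columns to be named "src" and "dst".
val edges = df.select(
  col("Protein1").as("src"),
  col("Protein2").as("dst"),
  col("Similarity").as("similarity"))

// Vertices: every distinct protein from either side, in a column named "id".
val vertices = edges.select(col("src").as("id"))
  .union(edges.select(col("dst").as("id")))
  .distinct()

val graph = GraphFrame(vertices, edges)
graph.vertices.show(false)
graph.edges.show(false)
```

If the vertices carry no attributes of their own, `GraphFrame.fromEdges(edges)` is a shorter route, since it derives the vertex set from the `src` and `dst` columns. Note that GraphFrames edges are directed, so a symmetric similarity stored once per pair is only traversed from `src` to `dst` unless the reversed edges are added as well.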
