How do I run GraphX with Python / PySpark?

nnt7mjpx  posted on 2021-06-03 in Hadoop

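I am trying to run Spark GraphX from Python using PySpark. My installation appears to be correct, since I can run both the PySpark tutorials and the (Java) GraphX tutorials without problems. Presumably, since GraphX is part of Spark, PySpark should be able to interface with it, correct?
Here are the PySpark tutorials: http://spark.apache.org/docs/0.9.0/quick-start.html and http://spark.apache.org/docs/0.9.0/python-programming-guide.html
Here are the GraphX examples: http://spark.apache.org/docs/0.9.0/graphx-programming-guide.html and http://ampcamp.berkeley.edu/big-data-mini-course/graph-analytics-with-graphx.html
Can anyone convert a GraphX tutorial to Python?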

lstz6jyr  #1

GraphX 0.9.0 does not have a Python API yet. One is expected in an upcoming release.

hvvq6cgz  #2

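It looks like the Python bindings for GraphX have been delayed past Spark 1.4 and then 1.5, and now have no target release (∞). They are waiting behind the Java API.
You can track the status in SPARK-3789, "Python bindings for GraphX", on the ASF JIRA.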

1qczuiv0  #3

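You should take a look at GraphFrames (https://github.com/graphframes/graphframes), which wraps the GraphX algorithms under the DataFrames API and provides a Python interface.
Here is a quick example from https://graphframes.github.io/graphframes/docs/_site/quick-start.html, slightly modified so that it works.
First, start pyspark with the graphframes package loaded:
pyspark --packages graphframes:graphframes:0.1.0-spark1.6
Python code: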

from graphframes import *

# Create a Vertex DataFrame with unique ID column "id"
v = sqlContext.createDataFrame([
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30),
], ["id", "name", "age"])

# Create an Edge DataFrame with "src" and "dst" columns
e = sqlContext.createDataFrame([
  ("a", "b", "friend"),
  ("b", "c", "follow"),
  ("c", "b", "follow"),
], ["src", "dst", "relationship"])

# Create a GraphFrame
g = GraphFrame(v, e)

# Query: Get in-degree of each vertex.
g.inDegrees.show()

# Query: Count the number of "follow" connections in the graph.
g.edges.filter("relationship = 'follow'").count()

# Run PageRank algorithm, and show results.
results = g.pageRank(resetProbability=0.01, maxIter=20)
results.vertices.select("id", "pagerank").show()
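
If you want to sanity-check what those three queries compute without starting Spark, here is a minimal pure-Python sketch on the same toy graph. It is not GraphFrames or GraphX code: `pagerank_sketch` is a hypothetical helper I wrote for illustration, and it assumes the common un-normalized recurrence rank = reset + (1 - reset) * sum(incoming rank / out-degree), starting every vertex at 1.0. The numbers it prints may therefore differ from what `g.pageRank` returns.

```python
from collections import Counter

# Same toy edge list as the GraphFrames example: (src, dst, relationship).
edges = [
    ("a", "b", "friend"),
    ("b", "c", "follow"),
    ("c", "b", "follow"),
]
vertices = ["a", "b", "c"]

# In-degree: number of edges pointing at each vertex.
in_degrees = Counter(dst for _, dst, _ in edges)

# Number of "follow" edges in the graph.
follow_count = sum(1 for _, _, rel in edges if rel == "follow")


def pagerank_sketch(vertices, edges, reset_probability=0.01, max_iter=20):
    """Hypothetical helper: textbook-style PageRank, NOT the GraphX implementation."""
    out_degree = Counter(src for src, _, _ in edges)
    ranks = {v: 1.0 for v in vertices}  # assumed starting value
    for _ in range(max_iter):
        # Each vertex splits its current rank evenly across its out-edges.
        incoming = {v: 0.0 for v in vertices}
        for src, dst, _ in edges:
            incoming[dst] += ranks[src] / out_degree[src]
        ranks = {
            v: reset_probability + (1.0 - reset_probability) * incoming[v]
            for v in vertices
        }
    return ranks


ranks = pagerank_sketch(vertices, edges)
print(dict(in_degrees))
print(follow_count)
print(ranks)
```

For comparison, `g.inDegrees` should list only `b` and `c`, since `a` has no incoming edges, and the "follow" filter should count two edges.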
