Spark GraphX issue

zzlelutf, posted on 2021-05-29 in Spark

I am trying to work through this example: https://docs.databricks.com/spark/latest/graph-analysis/graphframes/user-guide-python.html
However, when I change some of the criteria, the results are not what I expect. Please see the steps below.

from functools import reduce
from pyspark.sql.functions import col, lit, when
from graphframes import *

vertices = sqlContext.createDataFrame([
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30),
  ("d", "David", 29),
  ("e", "Esther", 32),
  ("f", "Fanny", 36),
  ("g", "Gabby", 60)], ["id", "name", "age"])

edges = sqlContext.createDataFrame([
  ("a", "b", "follow"),
  ("b", "c", "follow"),
  ("c", "b", "follow"),
  ("f", "c", "follow"),
  ("e", "f", "follow"),
  ("e", "d", "follow"),
  ("d", "a", "follow"),
  ("a", "e", "follow")
], ["src", "dst", "relationship"])

g = GraphFrame(vertices, edges)

I made one change to the "relationship" column: every value is "follow" rather than "friend".
The following query runs as expected:

g.bfs(fromExpr ="name = 'Alice'",toExpr = "age < 32", edgeFilter ="relationship != 'friend'" , maxPathLength = 10).show()

+--------------+--------------+---------------+--------------+----------------+
|          from|            e0|             v1|            e1|              to|
+--------------+--------------+---------------+--------------+----------------+
|[a, Alice, 34]|[a, e, follow]|[e, Esther, 32]|[e, d, follow]|  [d, David, 29]|
|[a, Alice, 34]|[a, b, follow]|   [b, Bob, 36]|[b, c, follow]|[c, Charlie, 30]|
+--------------+--------------+---------------+--------------+----------------+

But if I change the threshold in the filter condition from 32 to 35, I get the wrong result:

>>> g.bfs(fromExpr ="name = 'Alice'",toExpr = "age < 35", edgeFilter ="relationship != 'friend'" , maxPathLength = 10).show()
+--------------+--------------+
|          from|            to|
+--------------+--------------+
|[a, Alice, 34]|[a, Alice, 34]|
+--------------+--------------+

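Ideally it should return results similar to the first query, since the filter condition is still satisfied by all of those rows.
Is there an explanation for this?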

k5hmc34c #1

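bfs() searches for the first result that satisfies the predicate, that is, the shortest path from a vertex matching fromExpr to a vertex matching toExpr. Alice is 34, which satisfies toExpr = "age < 35", so you get a zero-length path that starts and ends at Alice. Change toExpr to something more specific. For example, toExpr = "name = 'David' or name = 'Charlie'" should give exactly the same result as the first query.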
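To make the "first result" behaviour concrete, here is a minimal plain-Python sketch of a level-by-level breadth-first search over the same toy graph. It is only a model of the semantics described above, not GraphFrames' actual implementation: the helper name shortest_paths is hypothetical, and edgeFilter and maxPathLength are left out because every edge in the question is "follow" and the paths are short.

```python
# Same toy graph as in the question: id -> (name, age).
vertices = {
    "a": ("Alice", 34), "b": ("Bob", 36), "c": ("Charlie", 30),
    "d": ("David", 29), "e": ("Esther", 32), "f": ("Fanny", 36),
    "g": ("Gabby", 60),
}
# Directed (src, dst) edges; the relationship column is dropped here
# because every edge in the question has the same value ("follow").
edges = [("a", "b"), ("b", "c"), ("c", "b"), ("f", "c"),
         ("e", "f"), ("e", "d"), ("d", "a"), ("a", "e")]


def shortest_paths(start, to_pred):
    """Hypothetical helper: return every shortest path (list of vertex ids)
    from `start` to a vertex whose (name, age) satisfies `to_pred`."""
    frontier = [[start]]  # all paths of the current length
    seen = {start}        # vertices reached at an earlier level
    while frontier:
        # Check the current level first; the start vertex itself is level 0.
        hits = [p for p in frontier if to_pred(*vertices[p[-1]])]
        if hits:
            return hits   # stop at the first level that has any match
        next_frontier = []
        for path in frontier:
            for src, dst in edges:
                if src == path[-1] and dst not in seen:
                    next_frontier.append(path + [dst])
        seen.update(p[-1] for p in next_frontier)
        frontier = next_frontier
    return []


# age < 32: Alice (34) does not match, so the search expands two hops.
print(shortest_paths("a", lambda name, age: age < 32))
# [['a', 'b', 'c'], ['a', 'e', 'd']]

# age < 35: Alice herself matches, so the zero-length path is returned.
print(shortest_paths("a", lambda name, age: age < 35))
# [['a']]

# The more specific predicate from the answer recovers the two-hop paths.
print(shortest_paths("a", lambda name, age: name in ("David", "Charlie")))
# [['a', 'b', 'c'], ['a', 'e', 'd']]
```

Note that in this model simply excluding Alice (age < 35 and name != 'Alice') would not reproduce the first query either: Esther (32) then matches one hop away, so the search would stop at the path a -> e.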
