This article collects code examples of the Java method org.apache.spark.rdd.RDD.getNumPartitions and shows how RDD.getNumPartitions is used in practice. The examples were extracted from selected projects on platforms such as GitHub, Stack Overflow, and Maven, so they should serve as useful references. Details of the RDD.getNumPartitions method:
Package path: org.apache.spark.rdd.RDD
Class name: RDD
Method name: getNumPartitions
Description: returns the number of partitions of this RDD.
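Before the collected examples, here is a minimal, self-contained sketch of calling getNumPartitions (the class name and local[2] setup are illustrative, not taken from any of the projects below):

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class GetNumPartitionsExample {
    public static void main(String[] args) {
        final SparkConf conf = new SparkConf().setAppName("getNumPartitions-demo").setMaster("local[2]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Ask for 4 partitions explicitly when parallelizing the collection.
            final JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6), 4);
            // JavaRDD.getNumPartitions() delegates to the underlying RDD.getNumPartitions().
            System.out.println(rdd.getNumPartitions()); // prints 4
        }
    }
}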
Code example source: uber/marmaray
private int calculateHiveNumPartitions(@NonNull final Dataset<Row> data) {
    /*
     * For now we just return the number of partitions in the underlying RDD, but in the future we can define
     * the type of strategy in the configuration and heuristically calculate the number of partitions.
     *
     * todo: T923425 to actually do the heuristic calculation to optimize num partitions
     */
    return data.rdd().getNumPartitions();
}
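As a usage sketch, the count computed above would typically bound the number of output files; the write below is hypothetical and not part of the marmaray snippet:

// Hypothetical caller: coalesce to the computed count before writing to Hive-backed storage.
final int numPartitions = calculateHiveNumPartitions(data);
data.coalesce(numPartitions).write().format("orc").save(outputPath);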
Code example source: org.apache.pig/pig
public static int getParallelism(List<RDD<Tuple>> predecessors,
                                 PhysicalOperator physicalOperator) {
    if (defaultParallelism != null) {
        return getDefaultParallelism();
    }
    int parallelism = physicalOperator.getRequestedParallelism();
    if (parallelism <= 0) {
        // Spark automatically sets the number of "map" tasks to run on each file according to its size
        // (though you can control it through optional parameters to SparkContext.textFile, etc.), and for
        // distributed "reduce" operations, such as groupByKey and reduceByKey, it uses the largest parent
        // RDD's number of partitions.
        int maxParallelism = 0;
        for (int i = 0; i < predecessors.size(); i++) {
            int tmpParallelism = predecessors.get(i).getNumPartitions();
            if (tmpParallelism > maxParallelism) {
                maxParallelism = tmpParallelism;
            }
        }
        parallelism = maxParallelism;
    }
    return parallelism;
}
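The default that the comment describes is easy to observe with plain Spark: with spark.default.parallelism unset, a shuffle such as join inherits the largest parent RDD's partition count. A minimal sketch (the class name and partition counts are illustrative):

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class LargestParentPartitionsDemo {
    public static void main(String[] args) {
        final SparkConf conf = new SparkConf().setAppName("largest-parent-demo").setMaster("local[2]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            final JavaPairRDD<String, Integer> a =
                    sc.parallelizePairs(Arrays.asList(new Tuple2<>("k", 1)), 3);
            final JavaPairRDD<String, Integer> b =
                    sc.parallelizePairs(Arrays.asList(new Tuple2<>("k", 2)), 8);
            // The shuffled join result takes the larger parent's partition count.
            System.out.println(a.join(b).getNumPartitions()); // prints 8
        }
    }
}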
Code example source: apache/incubator-nemo
/**
 * Static method to create a JavaRDD object from a text file.
 *
 * @param sparkContext  the spark context containing configurations.
 * @param minPartitions the minimum number of partitions.
 * @param inputPath     the path of the input text file.
 * @return the new JavaRDD object.
 */
public static JavaRDD<String> of(final SparkContext sparkContext,
                                 final int minPartitions,
                                 final String inputPath) {
    final DAGBuilder<IRVertex, IREdge> builder = new DAGBuilder<>();
    final org.apache.spark.rdd.RDD<String> textRdd = sparkContext.textFile(inputPath, minPartitions);
    final int numPartitions = textRdd.getNumPartitions();
    final IRVertex textSourceVertex = new SparkTextFileBoundedSourceVertex(sparkContext, inputPath, numPartitions);
    textSourceVertex.setProperty(ParallelismProperty.of(numPartitions));
    builder.addVertex(textSourceVertex);
    return new JavaRDD<>(textRdd, sparkContext, builder.buildWithoutSourceSinkCheck(), textSourceVertex);
}
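A hypothetical call site for this factory (the path and the minimum-partition hint are illustrative):

// Partition count of the resulting JavaRDD follows the text-file splits reported by getNumPartitions().
final JavaRDD<String> lines = JavaRDD.of(sparkContext, 2, "/path/to/input.txt");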
Code example source: apache/incubator-nemo
/**
 * Static method to create a JavaRDD object from a Dataset.
 *
 * @param sparkSession spark session containing configurations.
 * @param dataset      dataset to read initial data from.
 * @param <T>          type of the resulting object.
 * @return the new JavaRDD object.
 */
public static <T> JavaRDD<T> of(final SparkSession sparkSession,
                                final Dataset<T> dataset) {
    final DAGBuilder<IRVertex, IREdge> builder = new DAGBuilder<>();
    final IRVertex sparkBoundedSourceVertex = new SparkDatasetBoundedSourceVertex<>(sparkSession, dataset);
    final org.apache.spark.rdd.RDD<T> sparkRDD = dataset.sparkRDD();
    sparkBoundedSourceVertex.setProperty(ParallelismProperty.of(sparkRDD.getNumPartitions()));
    builder.addVertex(sparkBoundedSourceVertex);
    return new JavaRDD<>(
        sparkRDD, sparkSession.sparkContext(), builder.buildWithoutSourceSinkCheck(), sparkBoundedSourceVertex);
}