This article collects Java code examples of the org.apache.spark.sql.DataFrame.schema() method, showing how DataFrame.schema() is used in practice. The examples are extracted from selected projects on platforms such as GitHub, Stack Overflow, and Maven, so they make good references and should be of some help. The details of the DataFrame.schema() method are as follows:
Package path: org.apache.spark.sql.DataFrame
Class name: DataFrame
Method name: schema
Description: none available
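Before the collected examples, here is a minimal sketch of what schema() returns: a StructType describing each column's name, data type, and nullability. The variable names below are illustrative and not taken from any of the cited projects:
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.types.StructField;

DataFrame df; // assume an existing, initialized DataFrame
// Print the name, type, and nullability of every column.
for (StructField field : df.schema().fields()) {
    System.out.println(field.name() + ": " + field.dataType() + ", nullable=" + field.nullable());
}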
Code example source: stackoverflow.com
DataFrame df;
SQLContext sqlContext;
Long start;
Long end;
// Pair each row with a stable index, keep the rows in [start, end),
// then rebuild a DataFrame with the original schema.
JavaPairRDD<Row, Long> indexedRDD = df.toJavaRDD().zipWithIndex();
JavaRDD<Row> filteredRDD = indexedRDD
        .filter((Tuple2<Row, Long> v1) -> v1._2 >= start && v1._2 < end)
        .map(v1 -> v1._1);
DataFrame filteredDataFrame = sqlContext.createDataFrame(filteredRDD, df.schema());
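This pattern is a common way to slice a DataFrame by row position in Spark 1.x: zipWithIndex assigns a stable index to every row, the filter keeps the rows whose index falls in [start, end), and df.schema() lets createDataFrame rebuild a DataFrame with the original column structure.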
Code example source: org.wso2.carbon.analytics/org.wso2.carbon.analytics.spark.core
public static List<Record> dataFrameToRecordsList(int tenantId, String tableName,
                                                  DataFrame dataFrame) {
    Row[] rows = dataFrame.collect();
    List<Record> records = new ArrayList<>();
    // Read the schema once and reuse it to convert every row.
    StructType schema = dataFrame.schema();
    for (Row row : rows) {
        records.add(new Record(tenantId, tableName, convertRowAndSchemaToValuesMap(row, schema)));
    }
    return records;
}
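The helper convertRowAndSchemaToValuesMap is not part of this excerpt. Below is a minimal sketch of what such a schema-driven conversion could look like, assuming it simply maps field names to row values by position; this reimplementation is an assumption, not the WSO2 original:
import java.util.HashMap;
import java.util.Map;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.StructType;

// Hypothetical helper: pair each field name from the schema with the
// value at the same position in the row.
private static Map<String, Object> convertRowAndSchemaToValuesMap(Row row, StructType schema) {
    Map<String, Object> values = new HashMap<>();
    String[] fieldNames = schema.fieldNames();
    for (int i = 0; i < fieldNames.length; i++) {
        values.put(fieldNames[i], row.get(i));
    }
    return values;
}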
Code example source: org.wso2.carbon.analytics/org.wso2.carbon.analytics.spark.core
private AnalyticsQueryResult toResult(DataFrame dataFrame)
        throws AnalyticsExecutionException {
    // Cap the number of rows collected to the driver if a limit is configured.
    int resultsLimit = this.sparkConf.getInt("carbon.spark.results.limit", -1);
    if (resultsLimit != -1) {
        return new AnalyticsQueryResult(dataFrame.schema().fieldNames(),
                convertRowsToObjects(dataFrame.limit(resultsLimit).collect()));
    } else {
        return new AnalyticsQueryResult(dataFrame.schema().fieldNames(),
                convertRowsToObjects(dataFrame.collect()));
    }
}
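Note how schema().fieldNames() supplies the column headers of the result while collect() supplies the rows; applying limit() before collect() when a limit is configured keeps the driver from materializing an unbounded result set.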
Code example source: flipkart-incubator/spark-transformers
@Override
public OneHotEncoderModelInfo getModelInfo(final OneHotEncoder from, DataFrame df) {
    OneHotEncoderModelInfo modelInfo = new OneHotEncoderModelInfo();
    String inputColumn = from.getInputCol();
    // Ugly, but the only way to deal with Spark here: read the number of
    // categories from the ML attribute metadata on the input column's field.
    int numTypes = -1;
    Attribute attribute = Attribute.fromStructField(df.schema().apply(inputColumn));
    if (attribute.attrType() == AttributeType.Nominal()) {
        numTypes = ((NominalAttribute) attribute).values().get().length;
    } else if (attribute.attrType() == AttributeType.Binary()) {
        numTypes = ((BinaryAttribute) attribute).values().get().length;
    }
    // TODO: Since dropLast is not accessible here, we deliberately set numTypes.
    // This is the reason we should use CustomOneHotEncoder.
    modelInfo.setNumTypes(numTypes - 1);
    Set<String> inputKeys = new LinkedHashSet<String>();
    inputKeys.add(from.getInputCol());
    modelInfo.setInputKeys(inputKeys);
    Set<String> outputKeys = new LinkedHashSet<String>();
    outputKeys.add(from.getOutputCol());
    modelInfo.setOutputKeys(outputKeys);
    return modelInfo;
}
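Here df.schema().apply(inputColumn) returns the StructField of the input column, and Attribute.fromStructField reads the ML attribute metadata attached to that field; that metadata is what tells the encoder how many distinct categories (nominal or binary) the column carries.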
Code example source: org.wso2.carbon.analytics/org.wso2.carbon.analytics.spark.core
private void writeDataFrameToDAL(DataFrame data) {
    if (this.preserveOrder) {
        logDebug("Inserting data with order preserved! Each partition will be written using separate jobs.");
        // Run one job per partition so the partitions are written in order.
        for (int i = 0; i < data.rdd().partitions().length; i++) {
            data.sqlContext().sparkContext().runJob(data.rdd(),
                    new AnalyticsWritingFunction(this.tenantId, this.tableName, data.schema(),
                            this.globalTenantAccess, this.schemaString, this.primaryKeys, this.mergeFlag,
                            this.recordStore, this.recordBatchSize), CarbonScalaUtils.getNumberSeq(i, i + 1),
                    false, ClassTag$.MODULE$.Unit());
        }
    } else {
        // Order does not matter: write all partitions in parallel in one job.
        data.foreachPartition(new AnalyticsWritingFunction(this.tenantId, this.tableName, data.schema(),
                this.globalTenantAccess, this.schemaString, this.primaryKeys, this.mergeFlag,
                this.recordStore, this.recordBatchSize));
    }
}
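In both branches the schema is captured once on the driver via data.schema() and shipped to the executors inside AnalyticsWritingFunction, so every partition writer knows the table's column layout without another round trip.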