Converting ORC to JSON in Java

t5fffqht  posted on 2021-06-24  in Hive

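I am trying to convert an output ORC file to JSON in Java, inside a unit test. I have been reading the ORC project's own unit tests and took inspiration from this: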

PrintStream origOut = System.out;
String outputFilename = "orc-file-dump.json";
String tmpFileLocationJson = createTempFileJson();
FileOutputStream myOut = new FileOutputStream(tmpFileLocationJson);

// replace stdout and run command
System.setOut(new PrintStream(myOut, true, StandardCharsets.UTF_8.toString()));
FileDump.main(new String[]{"data", tmpFileLocationJson});
System.out.flush();
System.setOut(origOut);
System.out.println("done");

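Something like that. The problem is that I am not sure how to make this code equivalent to the command-line usage of the Java tools: java -jar orc-tools-1.5.5-uber.jar data output-1595448128191.orc which, for example, prints the following JSON dump.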

{"integerExample":1,"nestedExample":{"sub1":"value1","sub2":42},"dateExample":"2018-01-04"}

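So I want to convert the ORC file to JSON so that I can cross-reference its contents in my unit test.
Edit: the relevant code may be package-private :( https://github.com/apache/orc/blob/b9e82b3d7b473201bdcf46011c3b2fda10ef897f/java/tools/src/java/org/apache/orc/tools/printdata.java#l227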
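
The stdout swap in the snippet above can be isolated into a small helper that restores System.out even when the wrapped call fails. The following is a minimal sketch using only the JDK; the class name StdoutCapture and the ThrowingAction interface are hypothetical names introduced for illustration. In a real test, the action would be whatever call prints the rows (for example the tool's main method with the arguments data and the path of the .orc file), which is left out here because it needs the ORC tools jar on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

class StdoutCapture {
    /** Hypothetical callback type so that actions throwing checked exceptions can be wrapped. */
    interface ThrowingAction {
        void run() throws Exception;
    }

    /** Runs the action with System.out redirected into memory and returns what it printed. */
    static String capture(ThrowingAction action) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the replacement stream (end of the try block) flushes it into the buffer.
        try (PrintStream replacement = new PrintStream(buffer, true, "UTF-8")) {
            System.setOut(replacement);
            action.run();
        } catch (Exception e) {
            throw new IllegalStateException("action failed while stdout was captured", e);
        } finally {
            // Always restore the real stdout, even if the action threw.
            System.setOut(original);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for the call that prints the JSON dump.
        String printed = capture(() -> System.out.println("{\"integerExample\":1}"));
        System.out.println("captured: " + printed.trim());
    }
}
```

Capturing into memory avoids the temporary file, at the cost of holding the whole dump in a String, which is usually acceptable for the small files used in unit tests.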

rkue9o1l  #1

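OK, I ended up taking the code from Hive, swapping the OutputStream writer for a FileWriter, and redirecting the output to a file so that the test can read it back.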

static void printJsonData(String fileName, PrintStream printStream,
      Reader reader) throws IOException, JSONException, org.codehaus.jettison.json.JSONException {
    // The code this was taken from wrote to printStream:
//    OutputStreamWriter out = new OutputStreamWriter(printStream, "UTF-8");
    RecordReader rows = reader.rows();
    // Write to <fileName>.json instead; try-with-resources closes the writer when done.
    try (BufferedWriter out = new BufferedWriter(new FileWriter(fileName.concat(".json")))) {
      TypeDescription schema = reader.getSchema();
      VectorizedRowBatch batch = schema.createRowBatch();
      while (rows.nextBatch(batch)) {
        for (int r = 0; r < batch.size; ++r) {
          // One JSON document per row, one row per line.
          // printRow is the row-printing helper copied along with this method.
          JSONWriter writer = new JSONWriter(out);
          printRow(writer, batch, schema, r);
          out.write("\n");
          out.flush();
          if (printStream.checkError()) {
            throw new IOException("Error encountered when writing to stdout.");
          }
        }
      }
    } finally {
      rows.close();
    }
  }
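
Reading the file back in the test could look roughly like the sketch below. It uses only the JDK and compares rows as plain strings, which assumes that the dump always writes fields in the same order; the class name JsonLinesCheck and its methods are hypothetical names for illustration, and reading the file as UTF-8 is also an assumption, since FileWriter uses the platform default charset.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class JsonLinesCheck {
    /** Splits text holding one JSON document per line into rows, skipping blank lines. */
    static List<String> splitRows(String content) {
        List<String> rows = new ArrayList<>();
        for (String line : content.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                rows.add(trimmed);
            }
        }
        return rows;
    }

    /** Reads the rows of a JSON-lines file (assumed here to be UTF-8 encoded). */
    static List<String> readRows(Path jsonFile) {
        try {
            return splitRows(new String(Files.readAllBytes(jsonFile), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns null when both lists match, otherwise a message describing the first mismatch. */
    static String firstDifference(List<String> expected, List<String> actual) {
        int shared = Math.min(expected.size(), actual.size());
        for (int i = 0; i < shared; i++) {
            if (!expected.get(i).equals(actual.get(i))) {
                return "row " + i + ": expected " + expected.get(i) + " but was " + actual.get(i);
            }
        }
        if (expected.size() != actual.size()) {
            return "expected " + expected.size() + " rows but found " + actual.size();
        }
        return null;
    }
}
```

In a test, the result of firstDifference(expectedRows, JsonLinesCheck.readRows(path)) can be asserted to be null, so that a failure message points at the first row that differs. If field order is not guaranteed, parsing each row with a JSON library and comparing the parsed values would be more robust than string equality.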
