How do I parse multi-line records with opencsv?

v8wbuo2f  posted on 2021-06-27 in Java

I am trying to use opencsv to parse a file like this:

CUST,Warren,Q,Darrow,8272 4th Street,New York,IL,76091
TRANS,1165965,2011-01-22 00:13:29,51.43
CUST,Erica,I,Jobs,8875 Farnam Street,Aurora,IL,36314
TRANS,8116369,2011-01-21 20:40:52,-14.83
TRANS,8116369,2011-01-21 15:50:17,-45.45
TRANS,8116369,2011-01-21 16:52:46,-74.6
TRANS,8116369,2011-01-22 13:51:05,48.55
TRANS,8116369,2011-01-21 16:51:59,98.53

I will read the records that start with "CUST" into a Customer object. The Customer object will contain a list of transactions.

public class Customer {
      private String firstName;
      private String middleInitial;
      private String lastName;
      private String address;
      private String city;
      private String state;
      private String zipCode;
      List<Transaction> transactions;
      ...
}

I will read the records that start with "TRANS" into a Transaction object.

public class Transaction {
    private String accountNumber;
    private Date transactionDate;
    private Double amount;
    ...
}

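A customer can have one or more transactions. I can already do this by reading the rows manually with CSVReader. Can I achieve the same result with annotations?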

vlju58qv #1

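CSV files are lists, right? Well, some people like lists within lists.
Judging from the documentation, opencsv seems to handle sublists only inside a single "physical" CSV record, and nothing in it seems to cover your case. However, if the incoming CSV document can be parsed record by record, you can organize the parsing as group parsing: collect the records of one group, and deserialize the group yourself once it is complete.
For example,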

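import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Assumption: @Nullable and @WillClose are the JSR-305 annotations.
import javax.annotation.Nullable;
import javax.annotation.WillClose;

import com.opencsv.CSVReader;

// Declared in a utility class named Csv (see the Csv.readGroups call in the test further down).
public static Stream<List<String[]>> readGroups(@WillClose final CSVReader csvReader, final Predicate<? super String[]> isGroupStart,
        final Predicate<? super String[]> isGroupSpan) {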
    final Spliterator<List<String[]>> spliterator = new Spliterators.AbstractSpliterator<List<String[]>>(Long.MAX_VALUE, Spliterator.IMMUTABLE | Spliterator.NONNULL | Spliterator.ORDERED) {
        @Override
        public boolean tryAdvance(final Consumer<? super List<String[]>> action) {
            try {
                final String[] head = csvReader.readNextSilently();
                if ( head == null ) {
                    return false; // end of input: no more groups
                }
                if ( !isGroupStart.test(head) ) {
                    throw new IOException("First record must delimit a group start");
                }
                final List<String[]> buffer = new ArrayList<>();
                buffer.add(head);
                @Nullable
                String[] peeked;
                // collect the records that follow, up to the next group start or the end of input
                while ( (peeked = csvReader.peek()) != null && !isGroupStart.test(peeked) ) {
                    if ( !isGroupSpan.test(peeked) ) {
                        throw new IOException("Not a group span");
                    }
                    csvReader.readNextSilently(); // discard the "peeked" state
                    buffer.add(peeked);
                }
                action.accept(buffer);
                return true; // tryAdvance must return true whenever it passed an element to the action
            } catch ( final IOException ex ) {
                throw new UncheckedIOException(ex);
            }
        }
    };
    return StreamSupport.stream(spliterator, false)
            .onClose(() -> {
                try {
                    csvReader.close();
                } catch ( final IOException ex ) {
                    throw new UncheckedIOException(ex);
                }
            });
}
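
To make the grouping rule itself easier to follow, here is a minimal sketch of the same idea on rows that are already in memory, using only the JDK. The class name GroupingSketch and the method groupRows are hypothetical names chosen for illustration; they are not part of opencsv, and the sketch hard-codes the "CUST" and "TRANS" tags instead of taking predicates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of opencsv.
final class GroupingSketch {

    // Splits already-parsed rows into groups: each "CUST" row opens a new group,
    // and every following "TRANS" row is appended to the group that is currently open.
    static List<List<String[]>> groupRows(final List<String[]> rows) {
        final List<List<String[]>> groups = new ArrayList<>();
        List<String[]> current = null;
        for (final String[] row : rows) {
            if (row.length > 0 && row[0].equals("CUST")) {
                current = new ArrayList<>();
                current.add(row);
                groups.add(current);
            } else if (row.length > 0 && row[0].equals("TRANS")) {
                if (current == null) {
                    throw new IllegalArgumentException("First record must delimit a group start");
                }
                current.add(row);
            } else {
                throw new IllegalArgumentException("Not a group span");
            }
        }
        return groups;
    }

    public static void main(final String[] args) {
        final List<String[]> rows = List.of(
                new String[] {"CUST", "Warren", "Q", "Darrow"},
                new String[] {"TRANS", "1165965", "2011-01-22 00:13:29", "51.43"},
                new String[] {"CUST", "Erica", "I", "Jobs"},
                new String[] {"TRANS", "8116369", "2011-01-21 20:40:52", "-14.83"},
                new String[] {"TRANS", "8116369", "2011-01-21 15:50:17", "-45.45"});
        for (final List<String[]> group : groupRows(rows)) {
            // the first row of a group is the customer, the rest are its transactions
            System.out.println(group.get(0)[1] + ": " + (group.size() - 1) + " transaction(s)");
        }
    }
}
```

Unlike the streaming version, this sketch needs all rows in memory at once, so it is meant only to show the grouping logic, not to replace the spliterator for large files.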

For the following CSV, readGroups produces two lists of string arrays, one group per CUST record:

CUST,Warren,Q,Darrow,8272 4th Street,New York,IL,76091
TRANS,1165965,2011-01-22 00:13:29,51.43
CUST,Erica,I,Jobs,8875 Farnam Street,Aurora,IL,36314
TRANS,8116369,2011-01-21 20:40:52,-14.83
TRANS,8116369,2011-01-21 15:50:17,-45.45
TRANS,8116369,2011-01-21 16:52:46,-74.6
TRANS,8116369,2011-01-22 13:51:05,48.55
TRANS,8116369,2011-01-21 16:51:59,98.53

With these two groups in hand, each group can be deserialized into an instance of Customer:

@AllArgsConstructor
@EqualsAndHashCode
@ToString
final class Customer {

    final String firstName;
    final String middleInitial;
    final String lastName;
    final String address;
    final String city;
    final String state;
    final String zipCode;
    final List<Transaction> transactions;

}

@AllArgsConstructor
@EqualsAndHashCode
@ToString
final class Transaction {

    final String accountNumber;
    final LocalDateTime transactionDate;
    final BigDecimal amount;

}

public final class CsvTest {

    private static final DateTimeFormatter dateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    @Test
    public void testRead() {
        try ( final Stream<List<String[]>> rawStream = Csv.readGroups(
                new CSVReader(new InputStreamReader(CsvTest.class.getResourceAsStream("customers.csv"))),
                CsvTest::isGroupStart,
                CsvTest::isGroupSpan) ) {
            rawStream
                    .map(CsvTest::parseCustomer)
                    .forEachOrdered(System.out::println);
        }
    }

    private static boolean isGroupStart(final String[] row) {
        return row.length > 0 && row[0].equals("CUST");
    }

    private static boolean isGroupSpan(final String[] row) {
        return row.length > 0 && row[0].equals("TRANS");
    }

    private static Customer parseCustomer(final List<String[]> group) {
        final List<Transaction> transactions = group.subList(1, group.size())
                .stream()
                .map(rawTransaction -> {
                    final String accountNumber = rawTransaction[1];
                    final LocalDateTime transactionDate = LocalDateTime.parse(rawTransaction[2], dateTimeFormatter);
                    final BigDecimal amount = new BigDecimal(rawTransaction[3]);
                    return new Transaction(accountNumber, transactionDate, amount);
                })
                .collect(Collectors.collectingAndThen(Collectors.toList(), Collections::unmodifiableList));
        final String[] rawCustomer = group.get(0);
        final String firstName = rawCustomer[1];
        final String middleInitial = rawCustomer[2];
        final String lastName = rawCustomer[3];
        final String address = rawCustomer[4];
        final String city = rawCustomer[5];
        final String state = rawCustomer[6];
        final String zipCode = rawCustomer[7];
        return new Customer(firstName, middleInitial, lastName, address, city, state, zipCode, transactions);
    }

}

This prints the following output to the terminal:

Customer(firstName=Warren, middleInitial=Q, lastName=Darrow, address=8272 4th Street, city=New York, state=IL, zipCode=76091, transactions=[Transaction(accountNumber=1165965, transactionDate=2011-01-22T00:13:29, amount=51.43)])
Customer(firstName=Erica, middleInitial=I, lastName=Jobs, address=8875 Farnam Street, city=Aurora, state=IL, zipCode=36314, transactions=[Transaction(accountNumber=8116369, transactionDate=2011-01-21T20:40:52, amount=-14.83), Transaction(accountNumber=8116369, transactionDate=2011-01-21T15:50:17, amount=-45.45), Transaction(accountNumber=8116369, transactionDate=2011-01-21T16:52:46, amount=-74.6), Transaction(accountNumber=8116369, transactionDate=2011-01-22T13:51:05, amount=48.55), Transaction(accountNumber=8116369, transactionDate=2011-01-21T16:51:59, amount=98.53)])

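I guess this should even be a bit faster than the deserialization built into opencsv (and it is simply more flexible, although tedious to write). However, I have not yet figured out how to improve the code above so that it supports CSV headers instead of hard-coded column positions.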
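
One possible direction for getting rid of the hard-coded positions, sketched with the JDK only: the sample file has no header row, so this sketch assumes you declare the column names per record type yourself and then look values up by name. The class NamedColumnsSketch, the COLUMNS map, and the column names in it are all assumptions made for illustration; none of this is an opencsv feature.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of opencsv.
final class NamedColumnsSketch {

    // Assumed column names per record type, keyed by the tag in column 0.
    // The sample file has no header row, so these names are declared by hand.
    static final Map<String, List<String>> COLUMNS = Map.of(
            "CUST", List.of("type", "firstName", "middleInitial", "lastName",
                    "address", "city", "state", "zipCode"),
            "TRANS", List.of("type", "accountNumber", "transactionDate", "amount"));

    // Turns a raw row into a name -> value map using the layout declared for its record type.
    static Map<String, String> toNamedRow(final String[] row) {
        final List<String> names = row.length == 0 ? null : COLUMNS.get(row[0]);
        if (names == null || names.size() != row.length) {
            throw new IllegalArgumentException("Unexpected record: " + Arrays.toString(row));
        }
        final Map<String, String> named = new LinkedHashMap<>();
        for (int i = 0; i < row.length; i++) {
            named.put(names.get(i), row[i]);
        }
        return named;
    }

    public static void main(final String[] args) {
        final Map<String, String> transaction = toNamedRow(
                new String[] {"TRANS", "1165965", "2011-01-22 00:13:29", "51.43"});
        // values are now looked up by name instead of by position
        System.out.println(transaction.get("accountNumber") + " " + transaction.get("amount"));
    }
}
```

With such a map, parseCustomer could read rawTransaction values by name, and a changed column order would only require editing the declared layout.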
