How to do an HBase partial scan?

Posted by cqoc49vn on 2021-06-02 in Hadoop


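I have an HBase table that contains about 10 million records. I have three questions about HBase:
1. How long will it take to scan 10 million records?
2. Should I use the Hive-HBase integration?
3. If I just add an identifier (such as FL01) as a prefix to each row key, how can I perform a partial range scan?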
4294970043|1  column=cf:segmentmultiplefundbdescription, timestamp=1478316937790, value=
4294970043|1  column=cf:segmentmultiplefundbdescription|u languageid, timestamp=1478316937790, value=505074
4294970043|1  column=cf:statementtypecode, timestamp=1478316937790, value=ftn
4294970929|1  column=cf:ffaction, timestamp=1478316937790, value=i
4294970929|1  column=cf:filename, timestamp=1478316937790, value=basic.financiallineitem.financiallineitem.thirdpartyprivate.ftn.1.2016-07-15-2108.full
4294970929|1  column=cf:filepartition, timestamp=1478316937790, value=thirdpartyprivate
4294970929|1  column=cf:filepartitionlocation, timestamp=1478316937790, value=ftn
4294970929|1  column=cf:financialconceptcodeglobalsecondary, timestamp=1478316937790, value=
4294970929|1  column=cf:financialconceptcodeglobalsecondaryid, timestamp=1478316937790, value=
4294970929|1  column=cf:financialconceptglobal, timestamp=1478316937790, value=metl
4294970929|1  column=cf:financialconceptglobalid, timestamp=1478316937790, value=3015071

hadoop Hive hbase

Source: https://stackoverflow.com/questions/40446338/how-to-do-hbase-partial-scan

2 Answers


igetnqfo #1

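HBase will perform a full table scan (FTS) unless you provide start and stop row keys. So if the identifier is part of the row key and the row key is fixed, you can set the start and stop row keys, or try a fuzzy filter (FuzzyRowFilter). Otherwise, if the identifier is not part of the row key, HBase will do a full table scan.
How long a scan takes really depends on various factors, such as the size of the row keys, the number of column families, the number of column qualifiers, and so on.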
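
One way to apply the start/stop row advice to a key prefix such as FL01 is to derive the stop row from the prefix itself. The sketch below is plain Java with no HBase dependency; the names PrefixRange and stopRowForPrefix are made up for illustration. It assumes the identifier sits at the very start of the row key, and it relies on HBase ordering row keys as unsigned bytes and treating the stop row as exclusive.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PrefixRange {
    /**
     * Returns the smallest row key that sorts after every key starting with
     * the given prefix. An empty array means "no upper bound".
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so drop them first.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            // Empty prefix or all 0xFF: every remaining key matches.
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // next possible value of the last kept byte
        return stop;
    }

    public static void main(String[] args) {
        byte[] start = "FL01".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(start);
        System.out.println(new String(stop, StandardCharsets.UTF_8)); // prints FL02
    }
}
```

The prefix bytes and the computed bytes would then serve as the start row and stop row of a Scan. An empty stop row is HBase's convention for scanning to the end of the table.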

Posted 2021-06-03

vlju58qv #2

Assuming your keys are strings and the rows are returned as a list of maps, your range scan should look something like the code below.

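import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// oTbl is assumed to be an already opened HBase table handle held by the enclosing class.
// "columnFamilyName" is a placeholder for your own column family.
public List<Map<String,byte[]>> rangeFetch(String valueFrom, String valueTo, String[] columns, int maxrows) throws IOException {
    byte[] family = Bytes.toBytes("columnFamilyName");
    List<Map<String,byte[]>> rst = new ArrayList<Map<String,byte[]>>();
    Scan scn = new Scan();
    // The start row is inclusive and the stop row is exclusive.
    // On HBase 2.x clients these setters are deprecated in favor of withStartRow/withStopRow.
    scn.setStartRow(Bytes.toBytes(valueFrom));
    scn.setStopRow(Bytes.toBytes(valueTo));
    for (String colName : columns) {
        // Restrict the scan to the requested qualifiers of the family
        scn.addColumn(family, Bytes.toBytes(colName));
    }
    ResultScanner rsc = null;
    int rowCount = 0;
    try {
        rsc = oTbl.getScanner(scn);
        for (Result res = rsc.next(); res != null && rowCount < maxrows; res = rsc.next()) {
            Map<String,byte[]> row = new HashMap<String,byte[]>();
            for (String colName : columns) {
                // getValue returns the latest version of the cell, or null if the column is absent
                byte[] val = res.getValue(family, Bytes.toBytes(colName));
                if (val != null) {
                    row.put(colName, val);
                }
            }
            rst.add(row);
            rowCount++; // count rows so that the maxrows limit takes effect
        }
    } finally {
        if (rsc != null) rsc.close();
    }
    return rst;
}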

Then call it like this:

List<Map<String,byte[]>> results = yourObj.rangeFetch("FL01"+"000000", "FL01"+"999999", new String[]{"column1","column2","column3"}, 10000);
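
To see which keys a call like this would cover, the row-key ordering can be imitated with a sorted set of strings. This is only a model, not HBase code: RangeScanDemo and rangeScan are hypothetical names, the keys are invented sample values, and it assumes ASCII keys so that Java string order matches HBase's byte order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class RangeScanDemo {
    /**
     * Returns the keys k with from <= k < to, mimicking an inclusive
     * start row and an exclusive stop row.
     */
    static List<String> rangeScan(TreeSet<String> sortedKeys, String from, String to) {
        return new ArrayList<>(sortedKeys.subSet(from, true, to, false));
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>();
        keys.add("FL00999999");
        keys.add("FL01000000");
        keys.add("FL01000042");
        keys.add("FL01999999");
        keys.add("FL02000000");

        // Bounds taken from the call above: the last FL01 key is the stop row itself.
        System.out.println(rangeScan(keys, "FL01000000", "FL01999999"));
        // prints [FL01000000, FL01000042]

        // Bounding by the prefix and the next prefix instead.
        System.out.println(rangeScan(keys, "FL01", "FL02"));
        // prints [FL01000000, FL01000042, FL01999999]
    }
}
```

Under these assumptions the first range leaves out FL01999999 because the stop row is exclusive, while bounding the scan by the prefix and the next prefix covers every FL01 key.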

Posted 2021-06-02
