How to filter by _id in MongoDB using Pig

hxzsmxv2  posted on 2021-05-29  in Hadoop

I have a Mongo document like this:

db.activity_days.findOne()
{
    "_id" : ObjectId("54b4ee617acf9ce0440a3185"),
    "aca" : 0,
    "ca" : 0,
    "cbdw" : true,
    "day" : ISODate("2014-12-10T00:00:00Z"),
    "dm" : 0,
    "fbc" : 0,
    "go" : 2500,
    "gs" : [ ],
    "its" : [
        {
            "_id" : ObjectId("551ac8d44f9f322e2b055d3a"),
            "at" : 2000,
            "atn" : "Running",
            "cas" : 386.514909469507,
            "dis" : 2.788989730832084,
            "du" : 1472,
            "ibr" : false,
            "ide" : false,
            "lcs" : false,
            "pt" : 0,
            "rpt" : 0,
            "src" : 1001,
            "stp" : 0,
            "tcs" : [ ],
            "ts" : 1418257729,
            "u_at" : ISODate("2015-01-13T00:32:10.954Z")
        }
    ],
    "po" : 0,
    "se" : 0,
    "st" : 0,
    "tap3c" : [ ],
    "tzo" : -21600,
    "u_at" : ISODate("2015-01-13T00:32:10.952Z"),
    "uid" : ObjectId("545eb753ae9237b1df115649")
}

I want to use Pig to select a specific range of _id values. In the mongo shell I can write the query like this:

db.activity_day.find({_id: {$gt: ObjectId("54a48e000000000000000000"), $lt: ObjectId("54cd6c800000000000000000")}})

But I don't know how to write this in Pig. Does anyone know?

xwbd5t1u  #1

You can try the mongo-hadoop connector for Pig; see mongo-hadoop: Usage with Pig.
Once you REGISTER the JARs (core, pig, and the Java driver), e.g. REGISTER /path-to/mongo-hadoop-pig-<version>.jar; you can run the following from grunt:

SET mongo.input.query '{"_id":{"\$gt":{"\$oid":"54a48e000000000000000000"},"\$lt":{"\$oid":"54cd6c800000000000000000"}}}';
rangeActivityDay = LOAD 'mongodb://localhost:27017/database.collection' USING com.mongodb.hadoop.pig.MongoLoader();
DUMP rangeActivityDay;

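You may want to apply LIMIT before dumping the data (for example, dump the result of LIMIT rangeActivityDay 10 rather than the whole relation), since DUMP prints every tuple to the console.
The above was tested with mongo-java-driver-3.0.0-rc1.jar, mongo-hadoop-pig-1.4.0.jar, mongo-hadoop-core-1.4.0.jar, and MongoDB v3.0.9.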
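
A note on where the two bounds in the query come from: the first 4 bytes of an ObjectId hold its creation time in seconds since the Unix epoch, so 54a48e000000000000000000 corresponds to 2015-01-01T00:00:00Z and 54cd6c800000000000000000 to 2015-02-01T00:00:00Z. The following is a minimal JavaScript sketch (runnable in Node.js) for deriving such bounds from dates and producing the JSON for mongo.input.query. The function names objectIdFloor and buildInputQuery are hypothetical helpers made up for illustration; they are not part of mongo-hadoop or the MongoDB driver.

```javascript
// Hypothetical helper: smallest ObjectId hex string for a given instant.
// Relies on the ObjectId layout: the first 4 bytes are the creation time
// in seconds since the Unix epoch, stored big-endian.
function objectIdFloor(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  // 8 hex digits for the timestamp, then 16 zero digits for the other 8 bytes.
  return seconds.toString(16).padStart(8, "0") + "0".repeat(16);
}

// Hypothetical helper: extended-JSON query selecting _id strictly between
// the two boundary ObjectIds, in the same shape as the answer's query.
function buildInputQuery(from, to) {
  return JSON.stringify({
    _id: {
      $gt: { $oid: objectIdFloor(from) },
      $lt: { $oid: objectIdFloor(to) },
    },
  });
}

const from = new Date(Date.UTC(2015, 0, 1)); // 2015-01-01T00:00:00Z
const to = new Date(Date.UTC(2015, 1, 1));   // 2015-02-01T00:00:00Z

console.log(objectIdFloor(from)); // 54a48e000000000000000000
console.log(objectIdFloor(to));   // 54cd6c800000000000000000
console.log(buildInputQuery(from, to));
```

The printed JSON uses plain $gt, $lt and $oid. The answer's SET statement writes each $ as \$, presumably so that Pig does not treat it as a parameter reference, so the same escaping would need to be applied before pasting the string into grunt.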