Solr: CloudSearch delete by query

Posted by tgabmvqs on 2022-11-05 in Solr

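The official Solr Java API has a deleteByQuery operation that deletes every document matching a query. The AWS CloudSearch SDK does not seem to have matching functionality. Am I just not seeing the equivalent of deleteByQuery, or do we need to roll our own?
Something like this: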

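import com.amazonaws.services.cloudsearchdomain.model.*;
import org.json.JSONArray;
import org.json.JSONObject;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Search for the documents that match the query.
SearchRequest searchRequest = new SearchRequest();
searchRequest.setQuery(queryString);
searchRequest.setReturn("id,version");
SearchResult searchResult = awsCloudSearch.search(searchRequest);

// Build one delete operation per hit.
JSONArray docs = new JSONArray();
for (Hit hit : searchResult.getHits().getHit()) {
    JSONObject doc = new JSONObject();
    doc.put("id", hit.getId());
    // is version necessary?
    doc.put("version", hit.getFields().get("version").get(0));
    doc.put("type", "delete");
    docs.put(doc);
}

// Upload the delete batch as JSON, stating its content type and length.
byte[] payload = docs.toString().getBytes(StandardCharsets.UTF_8);
UploadDocumentsRequest uploadDocumentsRequest = new UploadDocumentsRequest();
uploadDocumentsRequest.setDocuments(new ByteArrayInputStream(payload));
uploadDocumentsRequest.setContentType("application/json");
uploadDocumentsRequest.setContentLength((long) payload.length);
UploadDocumentsResult uploadResult = awsCloudSearch.uploadDocuments(uploadDocumentsRequest);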

Is this reasonable? Is there a simpler way?

k10s72fa #1

You're right: CloudSearch has no equivalent of deleteByQuery, and your approach looks like the next best thing.
No, version is not necessary. It was removed in the 2013-01-01 CloudSearch API (aka v2).
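
To illustrate the point about version, here is a minimal sketch of a delete batch that carries only `type` and `id` for each operation. It uses the Java standard library only; the class name `DeleteBatch` and the method `toDeleteBatch` are hypothetical and not part of any AWS SDK.

```java
import java.util.List;

/** Hypothetical helper: builds a CloudSearch delete batch from document ids. */
final class DeleteBatch {
    private DeleteBatch() {}

    /** Returns a JSON array with one {"type":"delete","id":...} object per id. */
    static String toDeleteBatch(List<String> ids) {
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) {
                json.append(',');
            }
            json.append("{\"type\":\"delete\",\"id\":\"")
                .append(escape(ids.get(i)))
                .append("\"}");
        }
        return json.append(']').toString();
    }

    /** Escapes quotes, backslashes and control characters for a JSON string. */
    private static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

The resulting string could be passed as the documents payload of the upload request in the question, in place of the JSONArray that includes version.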

kuarbcqp #2

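CloudSearch does not offer delete by query. It supports deletion in a slightly different way: you build JSON objects that contain only the ID of each document to delete, with the operation type set to delete. These JSON objects can be batched together, but the batch must be smaller than 5 MB.
The class below supports this; you only need to pass an array of the IDs to delete to its deleteDocs method: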

use Aws\CloudSearch\CloudSearchClient;

class AWS_CS
{
    protected $client;

    // Creates a domain client for the given CloudSearch domain.
    function connect($domain)
    {
        try {
            $csClient = CloudSearchClient::factory(array(
                'key'    => 'YOUR_KEY',
                'secret' => 'YOUR_SECRET',
                'region' => 'us-east-1'
            ));

            $this->client = $csClient->getDomainClient(
                $domain,
                array(
                    'credentials' => $csClient->getCredentials(),
                    'scheme'      => 'HTTPS'
                )
            );
        } catch (Exception $ex) {
            echo "Exception: ";
            echo $ex->getMessage();
        }
    }

    // Runs a Lucene query against the domain and returns the hits as JSON.
    function search($queryStr, $domain)
    {
        $this->connect($domain);

        $result = $this->client->search(array(
            'query'       => $queryStr,
            'queryParser' => 'lucene',
            'size'        => 100,
            'return'      => '_score,_all_fields'
        ))->toArray();

        return json_encode($result['hits']);
    }

    // Uploads one operation per id. Call connect() (or search()) first so
    // that $this->client is set. Returns the length of the uploaded JSON,
    // or 0 if the upload did not succeed.
    function deleteDocs($idArray, $operation = 'delete')
    {
        $batch = array();

        foreach ($idArray as $id) {
            $batch[] = array(
                'type' => $operation,
                'id'   => $id
            );
        }
        $batch = array_filter($batch);
        $jsonObj = json_encode($batch, JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_AMP);

        // Keep the response so that its status can be checked below.
        $result = $this->client->uploadDocuments(array(
            'documents'   => $jsonObj,
            'contentType' => 'application/json'
        ));
        print_r($result);

        return $result['status'] == 'success' ? mb_strlen($jsonObj) : 0;
    }
}
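
The 5 MB batch limit mentioned above means a long list of IDs may have to be split across several uploads. The following is a rough sketch of one way to do that, in Java to match the question. The class `DeleteBatcher` and its `split` method are hypothetical, and the sketch assumes the IDs contain no characters that need JSON escaping.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: groups delete operations into size-limited JSON batches. */
final class DeleteBatcher {
    private DeleteBatcher() {}

    /**
     * Splits ids into JSON delete batches of at most maxBytes bytes (UTF-8).
     * A single operation larger than maxBytes still gets a batch of its own.
     * Assumption: ids need no JSON escaping.
     */
    static List<String> split(List<String> ids, int maxBytes) {
        List<String> batches = new ArrayList<>();
        StringBuilder current = new StringBuilder("[");
        int currentBytes = 2; // the opening and closing brackets

        for (String id : ids) {
            String op = "{\"type\":\"delete\",\"id\":\"" + id + "\"}";
            int opBytes = op.getBytes(StandardCharsets.UTF_8).length;
            boolean empty = current.length() == 1;

            // Flush the current batch if adding ",op" would exceed the limit.
            if (!empty && currentBytes + 1 + opBytes > maxBytes) {
                batches.add(current.append(']').toString());
                current = new StringBuilder("[");
                currentBytes = 2;
                empty = true;
            }
            if (!empty) {
                current.append(',');
                currentBytes += 1;
            }
            current.append(op);
            currentBytes += opBytes;
        }
        if (current.length() > 1) {
            batches.add(current.append(']').toString());
        }
        return batches;
    }
}
```

Each returned string would then be uploaded as its own batch. Passing a limit somewhat below 5 MB (for example `5 * 1024 * 1024 - 1024`) leaves some headroom.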

eqzww0vc #3

Adapted for C#: deleting uploaded documents in CloudSearch

using System.Configuration;
using System.IO;
using System.Text;
using Amazon.CloudSearchDomain;
using Amazon.CloudSearchDomain.Model;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public void DeleteUploadedDocuments(string location)
{
    // Note: the query value 'filepath' is kept as in the original answer;
    // the location parameter is not used.
    SearchRequest searchRequest = new SearchRequest { Query = "resourcename:'filepath'", QueryParser = QueryParser.Lucene, Size = 10000 };
    AmazonCloudSearchDomainClient searchClient = new AmazonCloudSearchDomainClient(
        ConfigurationManager.AppSettings["awsAccessKeyId"],
        ConfigurationManager.AppSettings["awsSecretAccessKey"],
        new AmazonCloudSearchDomainConfig { ServiceURL = ConfigurationManager.AppSettings["CloudSearchEndPoint"] });

    SearchResponse searchResponse = searchClient.Search(searchRequest);

    // Build one delete operation per hit.
    JArray docs = new JArray();
    foreach (Hit hit in searchResponse.Hits.Hit)
    {
        JObject doc = new JObject();
        doc.Add("id", hit.Id);
        doc.Add("type", "delete");
        docs.Add(doc);
    }

    UpdateIndexDocument<JArray>(docs, ConfigurationManager.AppSettings["CloudSearchEndPoint"]);
}

public void UpdateIndexDocument<T>(T document, string DocumentUrl)
{
    AmazonCloudSearchDomainConfig config = new AmazonCloudSearchDomainConfig { ServiceURL = DocumentUrl };
    AmazonCloudSearchDomainClient searchClient = new AmazonCloudSearchDomainClient(
        ConfigurationManager.AppSettings["awsAccessKeyId"],
        ConfigurationManager.AppSettings["awsSecretAccessKey"],
        config);

    // Serialize the batch to JSON and upload it as a stream.
    using (Stream stream = new MemoryStream(Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(document))))
    {
        UploadDocumentsRequest upload = new UploadDocumentsRequest()
        {
            ContentType = "application/json",
            Documents = stream
        };
        searchClient.UploadDocuments(upload);
    }
}
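
A single search with Size = 10000 only reaches the first 10,000 matches. If more documents can match, the search has to be repeated, for example by paging with a cursor (CloudSearch starts a cursor with the value `initial` and returns the next cursor with each page of hits). The sketch below shows only the loop, in Java to match the question. `PagedDelete`, `Page`, and the two callbacks are hypothetical stand-ins for the real search and upload calls.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical helper: pages through search results and deletes each page. */
final class PagedDelete {
    private PagedDelete() {}

    /** One page of results: the matching ids and the cursor for the next page. */
    static final class Page {
        final List<String> ids;
        final String nextCursor;

        Page(List<String> ids, String nextCursor) {
            this.ids = ids;
            this.nextCursor = nextCursor;
        }
    }

    /**
     * Calls search with a cursor (starting at "initial"), hands each page of
     * ids to delete, and stops at the first empty page.
     * Returns the number of ids passed to delete.
     */
    static int deleteAll(Function<String, Page> search, Consumer<List<String>> delete) {
        int total = 0;
        String cursor = "initial";
        while (true) {
            Page page = search.apply(cursor);
            if (page.ids.isEmpty()) {
                break;
            }
            delete.accept(page.ids);
            total += page.ids.size();
            cursor = page.nextCursor;
        }
        return total;
    }
}
```

In real use, `search` would wrap the SDK search request with the cursor set, and `delete` would build and upload a delete batch for the given IDs.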
