Solr BlockJoinQuery returns false positives

5m1hhzi4  posted on 2022-10-21  in  Solr

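We are trying to query indexed nested child documents in Solr. When we query, for example, for the parent documents of child documents with event_id: order-1, the result contains parent documents whose child documents have event_id: order-5.
We set up a fresh Solr instance with Solr's example data, and queries against it returned the correct results. Our first idea was that something in solrconfig.xml was responsible, but after removing settings or resetting them to their defaults the results were still incorrect.
We are currently checking schema.xml to see whether the results can be corrected from there.
Our current solrconfig.xml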

<config>
  <luceneMatchVersion>8.11.2</luceneMatchVersion>
  <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}" />
  <schemaFactory class="ClassicIndexSchemaFactory"/>

  <indexConfig>
    <lockType>single</lockType>

    <ramBufferSizeMB>256</ramBufferSizeMB>

    <mergePolicyFactory class="org.apache.solr.index.SortingMergePolicyFactory">
      <str name="sort">id asc</str>
      <str name="wrapped.prefix">inner</str>
      <str name="inner.class">org.apache.solr.index.TieredMergePolicyFactory</str>
      <int name="inner.maxMergeAtOnce">10</int>
      <int name="inner.segmentsPerTier">10</int>
      <int name="inner.deletesPctAllowed">20</int>
    </mergePolicyFactory>

  </indexConfig>

  <updateHandler class="solr.DirectUpdateHandler2">

    <autoCommit>
      <maxDocs>1000000</maxDocs>
      <maxSize>2g</maxSize>
      <openSearcher>false</openSearcher>
    </autoCommit>

    <updateLog>
      <str name="dir">${solr.data.dir:}</str>
    </updateLog>
  </updateHandler>
  <query>
    <maxBooleanClauses>102400</maxBooleanClauses>

    <filterCache class="solr.CaffeineCache" maxRamMB="750" initialSize="0" autowarmCount="0" />
    <queryResultCache class="solr.CaffeineCache" size="512" initialSize="0" autowarmCount="0" />
    <fieldValueCache class="solr.CaffeineCache" size="1" initialSize="0" autowarmCount="0" />
    <enableLazyFieldLoading>true</enableLazyFieldLoading>
    <queryResultWindowSize>0</queryResultWindowSize>
    <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
    <useColdSearcher>false</useColdSearcher>
    <maxWarmingSearchers>2</maxWarmingSearchers>
  </query>

  <requestDispatcher handleSelect="false">
    <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048000" />
    <httpCaching never304="true" />
  </requestDispatcher>
  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="df">text</str>
    </lst>
  </requestHandler>

  <requestHandler name="/update" class="solr.UpdateRequestHandler"></requestHandler>
</config>

Our current schema.xml

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="default-config" version="1.6">
    <fieldType name="_nest_path_" class="solr.NestPathField" />

    <!-- The StrField type is not analyzed, but indexed/stored verbatim. -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true" />
    <fieldType name="strings" class="solr.StrField" sortMissingLast="true" multiValued="true" docValues="true" />

    <!-- boolean type: "true" or "false" -->
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" />
    <fieldType name="booleans" class="solr.BoolField" sortMissingLast="true" multiValued="true" />

    <!-- Numeric field types that index values using KD-trees. Point fields don't support FieldCache, so they must have docValues="true" 
        if needed for sorting, faceting, functions, etc. -->
    <fieldType name="pint" class="solr.IntPointField" docValues="true" />
    <fieldType name="pfloat" class="solr.FloatPointField" docValues="true" />
    <fieldType name="plong" class="solr.LongPointField" docValues="true" />
    <fieldType name="pdouble" class="solr.DoublePointField" docValues="true" />

    <fieldType name="pints" class="solr.IntPointField" docValues="true" multiValued="true" />
    <fieldType name="pfloats" class="solr.FloatPointField" docValues="true" multiValued="true" />
    <fieldType name="plongs" class="solr.LongPointField" docValues="true" multiValued="true" />
    <fieldType name="pdoubles" class="solr.DoublePointField" docValues="true" multiValued="true" />

    <!-- KD-tree versions of date fields -->
    <fieldType name="pdate" class="solr.DatePointField" docValues="true" />
    <fieldType name="pdates" class="solr.DatePointField" docValues="true" multiValued="true" />

    <uniqueKey>id</uniqueKey>

    <!-- Solr automatically populates this with the value of the top/parent ID. E.g. the profile ID. It is required. -->
    <field name="_root_" type="string" indexed="true" stored="false" docValues="false" />

    <!-- Is populated by Solr automatically with the path of the document in the hierarchy for non-root documents. -->
    <field name="_nest_path_" type="_nest_path_" />

    <!-- Is populated by Solr automatically to store the ID of each document’s parent document (if there is one). -->
    <field name="_nest_parent_" type="string" indexed="true" stored="true"/>

    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

    <!-- docValues are enabled by default for long type so we don't need to index the version field -->
    <field name="_version_" type="plong" indexed="false" stored="false" />

    <field name="_indexversion_" type="pint" indexed="true" stored="false" multiValued="false" required="true"
        default="4" />

    <field name="timestamp" type="pdate" indexed="true" stored="false" default="NOW" />
    <field name="content_type" type="string" indexed="true" stored="false" />

    <!-- define system values, which are known to be single valued -->
    <field name="creationdate_l" type="plong" indexed="true" stored="false" />
    <field name="lastmodifieddate_l" type="plong" indexed="true" stored="false" />
    <field name="firstvisit_l" type="plong" indexed="true" stored="false" />
    <field name="lastvisit_l" type="plong" indexed="true" stored="false" />

    <!-- behavioral properties -->
    <field name="frequency_bp" type="pint" indexed="true" stored="false" />
    <field name="intensity_bp" type="pint" indexed="true" stored="false" />
    <field name="recent_intensity_bp" type="pfloat" indexed="true" stored="false" />
    <field name="firstvisit_behavior_bp" type="pint" indexed="true" stored="false" />
    <field name="lastvisit_behavior_bp" type="pint" indexed="true" stored="false" />

    <!-- Profile meta data fields only have one value -->
    <field name="propertycount_i" type="pint" indexed="true" stored="false" />
    <field name="totalpropertycount_i" type="pint" indexed="true" stored="false" />
    <field name="totalpropertysize_i" type="pint" indexed="true" stored="false" />

    <field name="maxproperty_s" type="string" indexed="true" stored="false" />
    <field name="maxpropertyvalues_i" type="pint" indexed="true" stored="false" />

    <field name="system_has_property_s" type="strings" indexed="true" stored="false" />
    <field name="sample_id_i" type="pint" indexed="true" stored="false" />

    <field name="event_id" type="string" indexed="true" multiValued="false" stored="true" />
    <field name="event_type_id" type="string" indexed="true" multiValued="false" stored="true" />
    <field name="event_date" type="plong" indexed="true" multiValued="false" stored="true" />
    <field name="event_profile_id" type="string" indexed="true" multiValued="false" stored="true" />

    <dynamicField name="*_ordinal_i" type="pint" indexed="true" stored="false" />
    <dynamicField name="*_i" type="pints" indexed="true" stored="false" />
    <dynamicField name="*_l" type="plongs" indexed="true" stored="false" />
    <dynamicField name="*_f" type="pfloats" indexed="true" stored="false" />
    <dynamicField name="*_s" type="strings" indexed="true" stored="false" />
    <dynamicField name="*_b" type="boolean" indexed="true" stored="false" />

    <dynamicField name="momentum_bp_*" type="pint" indexed="true" stored="false" />
    <dynamicField name="threshold_*" type="plong" indexed="true" multiValued="false" stored="false" />
    <dynamicField name="firsttouch_*" type="plong" indexed="true" multiValued="false" stored="false" />
    <dynamicField name="reentryrestricted_*" type="string" indexed="true" multiValued="false" stored="false"/>
    <dynamicField name="exitentrancerestricted_*" type="string" indexed="true" multiValued="false" stored="false"/>
</schema>

Indexed documents:

{
        "id":"99c75c9a-b083-428d-baa1-6a9662c6eb72",
        "name_s":"Profile 1",
        "description_t":"test description",
        "age_is":[28,
          34],
        "creationdate_l":1658990989645,
        "content_type":"profile",
        "_version_":1739600934763233280,
        "_root_":"99c75c9a-b083-428d-baa1-6a9662c6eb72",
        "timeline_events":
        {
          "id":"dcde9bfd-97ee-4d76-97d8-5297c1b2e87d",
          "event_id":"order-0",
          "event_type_id":"order",
          "event_date":1658990989644,
          "total_revenue_f":865.0,
          "_nest_path_":"/timeline_events#",
          "_nest_parent_":"99c75c9a-b083-428d-baa1-6a9662c6eb72",
          "content_type":"timeline_event",
          "_version_":1739600934763233280,
          "_root_":"99c75c9a-b083-428d-baa1-6a9662c6eb72",
          "product":[
            {
              "id":"9dabaac8-7651-4c56-9fb4-66d56b7175c3",
              "name_s":"product-0",
              "promotion_s":"NO",
              "listprice_f":477.0,
              "quantity_i":22,
              "variant_ss":["handbags",
                "men"],
              "pages_i":1,
              "_nest_path_":"/timeline_events#/product#0",
              "_nest_parent_":"dcde9bfd-97ee-4d76-97d8-5297c1b2e87d",
              "content_type":"order_product",
              "_version_":1739600934763233280,
              "_root_":"99c75c9a-b083-428d-baa1-6a9662c6eb72"}]}},
      {
        "id":"c19483e2-f940-403f-bb24-03adce1bcb02",
        "name_s":"Profile 2",
        "description_t":"test description for profile 2",
        "age_is":[25,
          40],
        "creationdate_l":1658990989653,
        "content_type":"profile",
        "_version_":1739600934766379008,
        "_root_":"c19483e2-f940-403f-bb24-03adce1bcb02",
        "timeline_events":
        {
          "id":"dcde9bfd-97ee-4d76-97d8-5297c1b2e87d",
          "event_id":"order-4",
          "event_type_id":"order",
          "event_date":1658990989649,
          "total_revenue_f":952.0,
          "_nest_path_":"/timeline_events#",
          "_nest_parent_":"c19483e2-f940-403f-bb24-03adce1bcb02",
          "content_type":"timeline_event",
          "_version_":1739600934766379008,
          "_root_":"c19483e2-f940-403f-bb24-03adce1bcb02",
          "product":[
            {
              "id":"7a143554-b5f9-4487-b182-9938b91f76b4",
              "name_s":"product-4",
              "promotion_s":"YES",
              "listprice_f":487.0,
              "quantity_i":25,
              "variant_ss":["junior",
                "watches"],
              "pages_i":1,
              "_nest_path_":"/timeline_events#/product#0",
              "_nest_parent_":"dcde9bfd-97ee-4d76-97d8-5297c1b2e87d",
              "content_type":"order_product",
              "_version_":1739600934766379008,
              "_root_":"c19483e2-f940-403f-bb24-03adce1bcb02"}]}},
      {
        "id":"da88463c-fcca-4405-8656-0371809ccb28",
        "name_s":"Profile 3",
        "description_t":"test description for profile 3",
        "age_is":[34,
          39],
        "creationdate_l":1658990989648,
        "content_type":"profile",
        "_version_":1739600934768476160,
        "_root_":"da88463c-fcca-4405-8656-0371809ccb28",
        "timeline_events":
        {
          "id":"61f47b18-15f4-4a4d-bb93-a4232dd22043",
          "event_id":"order-2",
          "event_type_id":"order",
          "event_date":1658990989647,
          "total_revenue_f":838.0,
          "_nest_path_":"/timeline_events#",
          "_nest_parent_":"da88463c-fcca-4405-8656-0371809ccb28",
          "content_type":"timeline_event",
          "_version_":1739600934768476160,
          "_root_":"da88463c-fcca-4405-8656-0371809ccb28",
          "product":[
            {
              "id":"1fc4616b-2629-4cc4-8a60-7238f97c9aae",
              "name_s":"product-2",
              "promotion_s":"YES",
              "listprice_f":403.0,
              "quantity_i":26,
              "variant_ss":["pants",
                "women"],
              "pages_i":1,
              "_nest_path_":"/timeline_events#/product#0",
              "_nest_parent_":"61f47b18-15f4-4a4d-bb93-a4232dd22043",
              "content_type":"order_product",
              "_version_":1739600934768476160,
              "_root_":"da88463c-fcca-4405-8656-0371809ccb28"}]}}]
  }
}

When we execute the following queries

{!parent which="*:* -_nest_path_:*"}event_id:order-0

OR

{!parent which="content_type:profile"}event_id:order-0

For this example the two queries are equivalent, and both return the same incorrect result.

{
        "id":"da88463c-fcca-4405-8656-0371809ccb28",
        "name_s":"Profile 3",
        "description_t":"test description for profile 3",
        "age_is":[34,
          39],
        "creationdate_l":1658990989648,
        "content_type":"profile",
        "_version_":1739600934768476160,
        "_root_":"da88463c-fcca-4405-8656-0371809ccb28"
}

This is incorrect. The correct response would be

{
        "id":"99c75c9a-b083-428d-baa1-6a9662c6eb72",
        "name_s":"Profile 1",
        "description_t":"test description",
        "age_is":[28,
          34],
        "creationdate_l":1658990989645,
        "content_type":"profile",
        "_version_":1739600934763233280,
        "_root_":"99c75c9a-b083-428d-baa1-6a9662c6eb72"
}
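
The two block join queries above contain braces, quotes and spaces, so they need to be URL-encoded when sent to the /select handler. The following is a minimal sketch of building such a request URL in Python. The host, port and core name (`localhost:8983`, `profiles`) are placeholders assumed for illustration and do not come from the question; only the `q` value is taken from it.

```python
from urllib.parse import urlencode

# Assumed for illustration: host, port and core name are placeholders.
SOLR_SELECT_URL = "http://localhost:8983/solr/profiles/select"


def parent_query_url(parent_filter: str, child_query: str) -> str:
    """Build a /select URL for a {!parent} block join query.

    parent_filter must match ALL parent documents (the "which" parameter);
    child_query selects the child documents whose parents are wanted.
    """
    q = '{!parent which="%s"}%s' % (parent_filter, child_query)
    # urlencode percent-encodes the braces, quotes and spaces in q.
    return SOLR_SELECT_URL + "?" + urlencode({"q": q, "wt": "json"})


url = parent_query_url("content_type:profile", "event_id:order-0")
print(url)
```

The sketch does not escape quotes inside `parent_filter`, so it only suits simple filters such as the ones used in the question.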

Answer #1, by vkc1a9a2

After a lot of trial and error, we found that the problem lies in

<mergePolicyFactory class="org.apache.solr.index.SortingMergePolicyFactory">
      <str name="sort">id asc</str>
      <str name="wrapped.prefix">inner</str>
      <str name="inner.class">org.apache.solr.index.TieredMergePolicyFactory</str>
      <int name="inner.maxMergeAtOnce">10</int>
      <int name="inner.segmentsPerTier">10</int>
      <int name="inner.deletesPctAllowed">20</int>
    </mergePolicyFactory>

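If that section is removed, the results are correct. We are still investigating to determine exactly what goes wrong, and will keep updating this thread as we find more details.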
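
The thread does not identify the root cause. One plausible explanation, offered here as an assumption rather than a confirmed finding, is that the block join relies on child documents being stored contiguously, immediately before their parent, within a segment. A merge that re-sorts documents by `id asc` could break that layout. The toy model below is not Solr or Lucene code: it only imitates the "next parent at or after the matching child" rule, and the short ids (`a1`, `e0`, ...) are hypothetical values chosen so that the sort order differs from the indexing order.

```python
from typing import List, NamedTuple, Optional


class Doc(NamedTuple):
    id: str
    is_parent: bool
    event_id: Optional[str] = None  # set on child documents only
    name: Optional[str] = None      # set on parent documents only


def parents_of_matches(segment: List[Doc], event_id: str) -> List[str]:
    """Toy to-parent join: map each matching child to the first parent
    found at or after the child's position in the segment."""
    found: List[str] = []
    for pos, doc in enumerate(segment):
        if not doc.is_parent and doc.event_id == event_id:
            parent = next((d for d in segment[pos:] if d.is_parent), None)
            if parent is not None and parent.name not in found:
                found.append(parent.name)
    return found


# Blocks as written at index time: each child directly precedes its parent.
indexed = [
    Doc("e0", False, event_id="order-0"), Doc("a1", True, name="Profile 1"),
    Doc("b4", False, event_id="order-4"), Doc("c2", True, name="Profile 2"),
    Doc("d2", False, event_id="order-2"), Doc("f3", True, name="Profile 3"),
]

# The same documents after a merge that sorts by id ascending.
resorted = sorted(indexed, key=lambda d: d.id)

print(parents_of_matches(indexed, "order-0"))   # ['Profile 1']
print(parents_of_matches(resorted, "order-0"))  # ['Profile 3']
```

In this model the sorted layout places the order-0 child next to an unrelated parent, which matches the kind of false positive described in the question. Whether this is what actually happens inside SortingMergePolicyFactory in this setup would still need to be verified.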
