Does Solr Cell in any way limit the number of characters imported into a solr.TextField?

Posted by kninwzqo on 2022-11-05 in Solr


I am indexing a large HTML page with Solr Cell, using the following curl command from the Windows Command Prompt:

curl http://localhost:8987/solr/myexample/update/extract -d @test.html -H 'Content-type:html'

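When I query the field in the Solr admin UI (query?q=*:*&q.op=OR&indent=true), I see that data (text) is missing from the field. Example: I have a bunch of lorem ipsum p tags, but at the end of my HTML page I have another p tag containing Hello world, and that one does not show up in the Solr admin UI.
I found the following on the old wiki.

Large individual fields.

A single record can hold megabytes of text. These fields are clumsy to work with. By default, the number of characters stored is clipped.
It does not go into detail on how to prevent the text from being clipped, assuming that is even what is causing the problem, since I cannot get anywhere near a megabyte of data into the field before it is cut off.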

schema.xml

<field name="main" type="text_general" indexed="true" stored="true"/>
    <field name="div" type="text_general" indexed="true" stored="true"/>
    <field name="doc_id" type="string" uninvertible="true" indexed="true" stored="true"/>
    <field name="date_pub" type="pdate" uninvertible="true" indexed="true" stored="true"/>
    <field name="p" type="text_general" uninvertible="true" indexed="true" stored="true"/>
    <field name="_text_" type="text_general" indexed="true" stored="true" multiValued="true"/>
    <copyField source="*" dest="_text_"/>

solrconfig.xml

<requestHandler name="/update/extract"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      <str name="lowernames">true</str>
      <str name="uprefix">ignored_</str>
      <str name="fmap.content">content</str>
      <str name="capture">div</str>
      <str name="fmap.div">div</str>
      <str name="capture">h1</str>
      <str name="fmap.h1">h1</str>
      <str name="capture">h2</str>
      <str name="fmap.h2">h2_t</str>
      <str name="capture">p</str>
      <str name="fmap.p">p</str>
    </lst>
  </requestHandler>

**Solr version:** 8.10.1


Source: https://stackoverflow.com/questions/70933474/does-solr-cell-in-any-way-limit-the-amount-of-characters-imported-into-a-solr-te

1 Answer

ha5z0ras1#

Solr Cell does not appear to limit the number of characters. Instead (don't ask me why), the culprit was the curl command I was using:

curl http://localhost:8987/solr/myexample/update/extract -d @test.html -H 'Content-type:html'

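**Solution:** The following command extracts all of the text without truncating any of it (replace the paths with wherever post.jar and your HTML file are located):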

java -jar -Dc=myexample -Dauto example\exampledocs\post.jar example\exampledocs\sample.html

Note that these are Windows commands, intended for the Command Prompt.
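
A possible explanation, offered as a guess rather than a confirmed diagnosis: curl's documentation says that `-d @file` strips carriage returns and newlines from the file (`--data-binary` sends it unmodified), and the Windows Command Prompt does not treat single quotes as quoting characters, so the `Content-type` header may not have reached Solr as intended. The sketch below sidesteps both issues by posting the file's raw bytes with an explicit `text/html` content type. The helper name `build_extract_request` is hypothetical; the URL is the one from the question.

```python
import urllib.request
from pathlib import Path


def build_extract_request(solr_url: str, html_path: str) -> urllib.request.Request:
    """Build a POST request whose body is the file's bytes, sent unmodified.

    Hypothetical helper for illustration; it is not part of Solr or curl.
    """
    body = Path(html_path).read_bytes()  # raw bytes, newlines left intact
    return urllib.request.Request(
        solr_url,
        data=body,
        headers={"Content-Type": "text/html"},
        method="POST",
    )


def post_html(solr_url: str, html_path: str) -> int:
    """Send the request and return the HTTP status code."""
    request = build_extract_request(solr_url, html_path)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `post_html("http://localhost:8987/solr/myexample/update/extract", "test.html")` would send the page to the handler from the question. Whether a commit parameter or a `literal.` field for the unique key is also needed depends on the collection's configuration, which the question does not show in full.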
