React: using a regex to highlight text in dangerouslySetInnerHTML works unreliably

mrfwxfqh, posted on 2023-10-22 in React

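The goal is to highlight parts of the text (strings) rendered through dangerouslySetInnerHTML. To do that, I try to match the desired text in the HTML and wrap it in a "span" with the appropriate styling. The code below works without problems for some texts (HTML) but not at all for others. A working and a failing example follow.
My question: why does the regex fail in some cases and work in others, even though the text (the "quote") is present in every case?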

Highlight component (JSX):

import React from "react";


class HighlightQuote extends React.Component {
  render = () => {

    // zitat holds the quotes with any leading or trailing quotation marks and parentheses stripped.
    const zitat = this.props.quotes.map(x => x.replace(/^[“”"’()]+|[“”"’()]+$/g, ""));
    let highlightedHtml;

    if (zitat.length === 0) {
      highlightedHtml = this.props.newcontent;
    }
    else {
      const regex = new RegExp(`(${zitat.join('|')})`, 'g');
      highlightedHtml = this.props.content.replace(
          regex,
          '<span class="hl">$1</span>'
        );
      console.log('highlightedHtml:');
      console.log(highlightedHtml);
    }

    return (
        <div className="reader" ref="test" dangerouslySetInnerHTML={{ __html: highlightedHtml }} />

    );
  };
}

export default HighlightQuote;

Working example (console.log output of the highlighted HTML):

<div class="post" id="post-17660">
  <p class="postcontents">
    <article>
      <div class="post-inside">
        <p>One of the things I have disliked the most about the crypto sector is the idea that people should &#x201C;hodl&#x201D; or &#x201C;hold on for dear life.&#x201D;</p>
        <p>I have written many times here at AVC that one should take profits when they are available and diversify an investment portfolio.</p>
        <p><span class="hl">The idea that an investor should hold on no matter what has always seemed ridiculous to me.</span></p>
        <p>Now, the crypto markets are in the eighth month of a long and painful bear market and we are starting to see some signs of capitulation, particularly in the assets that went up the most last year.</p>
        <p>Whether this is the long-awaited&#xA0;capitulation of the HODL crowd or not, I can&#x2019;t say.</p>
        <p>But capitulation would be a good thing for the crypto markets, releasing assets into the market that until now have been locked up by long-term&#xA0;holders.</p>
        <p><span class="hl">Until then it is hard to get excited about buying anything in crypto.</span></p>
      </div>
    </article>
  </p>
</div>

Quotes highlighted as expected:

"The idea that an investor should hold on no matter what has always seemed ridiculous to me."

"Until then it is hard to get excited about buying anything in crypto."

Failing example (console.log output of the highlighted HTML):

<div><article id="story" class="Story-story--2QyGh css-1j0ipd9"><header class="css-1qcpy3f e345g291"><p class="css-1789nl8 etcg8100"><a class="css-1g7m0tk" href="https://www.nytimes.com/column/new-sentences">New Sentences</a></p><div class="css-30n6iy e345g290"><div class="css-acwcvw"></div></div><figure class="ResponsiveMedia-media--32g1o ResponsiveMedia-sizeSmall--3092U ResponsiveMedia-layoutVertical--1pg1o ResponsiveMedia-sizeSmallNoCaption--n--T0 css-1hzd7ei"><figcaption class="css-pplcdj ResponsiveMedia-caption--1dUVu"></figcaption></figure></header><div class="css-18sbwfn StoryBodyCompanionColumn"><div class="css-1h6whtw"><p class="css-1i0edl6 e2kc3sl0"><em class="css-2fg4z9 ehxkw330">&#x2014; From Keith Gessen&#x2019;s second novel, &#x201C;A Terrible Country&#x201D; (Viking, 2018, Page 4). Gessen is also the author of &#x201C;All the Sad Young Literary Men&#x201D; and a founding editor of the journal n+1.</em></p><p class="css-1i0edl6 e2kc3sl0">All authors have signature sentence structures &#x2014; deep expressive grooves that their minds instinctively find and follow. (That previous sentence is one of mine: a simple declaration that leaps, after the break of a long dash, into an elaborate restatement.)</p><p class="css-1i0edl6 e2kc3sl0">Here is one of Keith Gessen&#x2019;s:</p><p class="css-1i0edl6 e2kc3sl0">&#x201C;As for me, I wasn&#x2019;t really an idiot. But neither was I not an idiot.&#x201D;</p><p class="css-1i0edl6 e2kc3sl0">&#x201C;I hadn&#x2019;t been yelling, I didn&#x2019;t think. 
But I hadn&#x2019;t not been yelling either.&#x201D;</p><p class="css-1i0edl6 e2kc3sl0">&#x201C;Cute cafes were not the problem, but they were also not, as I&#x2019;d once apparently thought, the opposite of the problem.&#x201D;</p></div><aside class="css-14jsv4e"><span></span></aside></div><div class="css-18sbwfn StoryBodyCompanionColumn"><div class="css-1h6whtw"><p class="css-1i0edl6 e2kc3sl0">Sentence structures are not simply sentence structures, of course &#x2014; they are miniature philosophies. Hemingway, with his blunt verbal bullets, is making a huge claim about the nature of the world. So is James Joyce, with his collages and frippery. So are Nikki Giovanni and Samuel Delany and Ursula K. Le Guin and John McPhee and Missy Elliott and Dr. Seuss and anyone else who converts thoughts into prose.</p><p class="css-1i0edl6 e2kc3sl0">Likewise, Keith Gessen&#x2019;s signature sentence structure &#x2014; &#x201C;not X, but also not not X&#x201D; &#x2014; suggests an entire worldview. It is a universe of in-betweenness, in which the most basic facts of life, the things we absolutely expect to understand, spill and scatter like toast crumbs into the gaps between the floorboards. It is a world of embarrassingly trivial category errors. The sentences above come from Gessen&#x2019;s new novel, &#x201C;A Terrible Country,&#x201D; the story of a 30-something American man who goes to Russia to care for his elderly grandmother. He falls into the gaps between huge concepts: youth and age, purpose and purposelessness, progress and stasis. He is not Russian but also not not Russian, not smart but also not not smart, not heroic but also not not heroic. Such is the way of the world. No matter how much we try, none of us is ever only one thing. 
None of us is ever pure.</p></div><aside class="css-14jsv4e"><span></span></aside></div><div class="bottom-of-article"><div class="css-k8fkhk"><p>Sam Anderson is a staff writer for the magazine.</p> <p><i>Sign up for </i><a href="http://www.nytimes.com/newsletters/magazine"><i>our newsletter</i></a><i> to get the best of The New York Times Magazine delivered to your inbox every week.</i></p></div><div class="css-3glrhn">A version of this article appears in print on , on Page 11 of the Sunday Magazine with the headline: From Keith Gessen&#x2019;s &#x2018;A Terrible Country&#x2019;<span>. <a href="http://www.nytreprints.com/">Order Reprints</a> | <a href="http://www.nytimes.com/pages/todayspaper/index.html">Today&#x2019;s Paper</a> | <a href="https://www.nytimes.com/subscriptions/Multiproduct/lp8HYKU.html?campaignId=48JQY">Subscribe</a></span></div></div><span></span></article></div>

Quote that should have been highlighted:

"Sentence structures are not simply sentence structures, of course — they are miniature philosophies"

Answer #1, by pepwfjgg

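The regex fails to match because of HTML entities. Some of the parsed text uses entity references. In the failing example above, the quote contains an em dash (U+2014), which appears in the HTML encoded as &#x2014;, so the literal character in the quote never matches the entity in the markup.
To get rid of the HTML entities, I used the "he" library (https://github.com/mathiasbynens/he), a robust HTML entity encoder/decoder written in JavaScript.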

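// Assumes the "he" package has been imported at the top of the file.
// Decode the HTML entities first so the quote text can match literally.
var contentDecoded = he.decode(this.props.content);

var highlightedHtml = contentDecoded.replace(
  regex,
  '<span class="annotator-hl">$1</span>'
);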
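
The following is a minimal, dependency-free sketch of the same idea, not the code from the answer. The helper names `decodeNumericEntities`, `escapeRegExp`, and `highlightQuotes` are hypothetical. `decodeNumericEntities` is only a stand-in for `he.decode`: it handles numeric references such as &#x2014; but not named entities, and it deliberately leaves markup-significant characters encoded so the HTML structure is not altered. The sketch also escapes regex metacharacters in the quotes, on the assumption that quotes may contain characters such as "." or "?".

```javascript
// Hypothetical stand-in for he.decode: decodes only numeric character
// references (&#x2014; or &#8212;), not named entities like &nbsp;.
function decodeNumericEntities(html) {
  return html.replace(/&#([xX][0-9a-fA-F]+|[0-9]+);/g, (match, code) => {
    const isHex = code[0] === "x" || code[0] === "X";
    const codePoint = isHex ? parseInt(code.slice(1), 16) : parseInt(code, 10);
    if (codePoint > 0x10ffff) return match; // not a valid code point
    const ch = String.fromCodePoint(codePoint);
    // Keep <, >, &, " and ' encoded so decoding cannot change the markup.
    return /[<>&"']/.test(ch) ? match : ch;
  });
}

// Escape regex metacharacters so each quote is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: strips surrounding quotation marks from each quote,
// decodes the HTML, and wraps every literal occurrence in a span.
function highlightQuotes(html, quotes) {
  const cleaned = quotes
    .map((q) => q.replace(/^[“”"’()]+|[“”"’()]+$/g, ""))
    .filter((q) => q.length > 0);
  if (cleaned.length === 0) return html;
  const regex = new RegExp(`(${cleaned.map(escapeRegExp).join("|")})`, "g");
  return decodeNumericEntities(html).replace(regex, '<span class="hl">$1</span>');
}

// Usage: the raw HTML holds the entity, the quote holds the literal character.
const raw =
  "<p>Sentence structures are not simply sentence structures, of course &#x2014; they are miniature philosophies.</p>";
const quote =
  '"Sentence structures are not simply sentence structures, of course \u2014 they are miniature philosophies"';

console.log(raw.includes("course \u2014 they")); // false: the entity blocks the match
console.log(highlightQuotes(raw, [quote]));
```

Note that decoding the whole HTML string with a full decoder also turns entities such as &lt; and &amp; back into literal characters before the result is passed to dangerouslySetInnerHTML, which can change the markup. Skipping those characters, as the sketch does, is one way to avoid that.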
