regex: Split a string of HTML into an array by particular tags

Posted by 6ojccjat on 2023-10-22 in Other

Given this HTML as a string "html", how can I split it into an array where each heading tag (<h1>, <h2>, ...) marks the start of an element?

Begin with:

<h1>A</h1>
<h2>B</h2>
<p>Foobar</p>
<h3>C</h3>

Result:

["<h1>A</h1>", "<h2>B</h2><p>Foobar</p>", "<h3>C</h3>"]

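What I've tried:

I wanted to use String.prototype.split() with a regex, but the result splits each <h into its own element. I need to figure out how to capture from the start of one <h up to the next <h, including the first but excluding the second.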

var html = '<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>';
var foo = html.split(/(<h)/);

Edit: Regex is not a requirement; it is just the only solution I could think of for splitting an HTML string this way.

Source: https://stackoverflow.com/questions/34491459/split-a-string-of-html-into-an-array-by-particular-tags

5 Answers

nwsw7zdq1#

For your example you can use:

/
  <h   // Match literal <h
  (.)  // Match any character and save in a group
  >    // Match literal >
  .*?  // Match any character zero or more times, non-greedy
  <\/h // Match literal </h
  \1   // Match what was previously captured by (.)
  >    // Match literal >
/g

var str = '<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>'
str.match(/<h(.)>.*?<\/h\1>/g); // ["<h1>A</h1>", "<h2>B</h2>", "<h3>C</h3>"]

But please don't parse HTML with regexp; read "RegEx match open tags except XHTML self-contained tags".
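
Note that the match above returns only the headings, so <p>Foobar</p> is dropped. For simple, flat fragments like the one in the question, a minimal sketch that produces the requested array is to split on a zero-width lookahead. The function name splitAtHeadings is made up for illustration, and the approach assumes headings are not nested inside other elements:

```javascript
// Split an HTML string so that every heading tag (<h1> to <h6>) starts a new entry.
// Assumption: the input is a flat, well-formed fragment; this is not an HTML parser.
function splitAtHeadings(html) {
    // (?=...) is a lookahead: it finds the position just before "<h1".."<h6"
    // without consuming any characters, so the tag stays in the next piece.
    // [\s>] allows headings with attributes, e.g. <h2 class="x">.
    return html.split(/(?=<h[1-6][\s>])/i);
}

var parts = splitAtHeadings('<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>');
console.log(parts); // ["<h1>A</h1>", "<h2>B</h2><p>Foobar</p>", "<h3>C</h3>"]
```

Anything that precedes the first heading ends up as the first entry of the array.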

h7appiyu2#

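From the comments on the question, this seems to be the task:
"I'm grabbing dynamic markdown from GitHub. Then I want to render it to HTML, but wrap every title element in a ReactJS <WayPoint> component."
The following is a completely library-agnostic, DOM-API based solution.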

function waypointify(html) {
    var div = document.createElement("div"), nodes;

    // parse HTML and convert into an array (instead of NodeList)
    div.innerHTML = html;
    nodes = [].slice.call(div.childNodes);

    // add <waypoint> elements and distribute nodes by headings
    div.innerHTML = "";
    nodes.forEach(function (node) {
        if (!div.lastChild || /^h[1-6]$/i.test(node.nodeName)) {
            div.appendChild( document.createElement("waypoint") );
        }
        div.lastChild.appendChild(node);
    });

    return div.innerHTML;
}

Doing the same with a modern library in fewer lines of code is entirely possible; consider it a challenge.
This is what it produces with your sample input:

<waypoint><h1>A</h1></waypoint>
<waypoint><h2>B</h2><p>Foobar</p></waypoint>
<waypoint><h3>C</h3></waypoint>
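
If an array is wanted, as in the question, rather than a wrapped HTML string, the same grouping idea can be sketched as follows. The name groupByHeading is hypothetical. The function only relies on the standard nodeName and outerHTML properties, so it accepts real DOM elements or plain stand-in objects:

```javascript
// Group element-like objects so that each heading (h1 to h6) starts a new entry.
// `elements` is any array-like of objects exposing `nodeName` and `outerHTML`.
function groupByHeading(elements) {
    var groups = [];
    Array.from(elements).forEach(function (el) {
        // start a new group for the very first element and for every heading
        if (groups.length === 0 || /^h[1-6]$/i.test(el.nodeName)) {
            groups.push("");
        }
        groups[groups.length - 1] += el.outerHTML;
    });
    return groups;
}

// Browser usage (assumes a DOM is available):
//   var template = document.createElement("template");
//   template.innerHTML = html;
//   var result = groupByHeading(template.content.children);
```

Because children contains elements only, bare text between top-level elements would be skipped in this sketch.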

pkln4tw63#

I'm sure someone could shorten the for loop that puts the angle brackets back in, but this is how I'd do it.

var html = '<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>';

//split on ><
var arr = html.split(/></g);

//split removes the >< so we need to determine where to put them back in.
for(var i = 0; i < arr.length; i++){
  if(arr[i].substring(0, 1) != '<'){
    arr[i] = '<' + arr[i];
  }

  if(arr[i].slice(-1) != '>'){
    arr[i] = arr[i] + '>';
  }
}

Alternatively, we could remove the first and last bracket, do the split, and then add the angle brackets back to every piece.

var html = '<h1>A</h1><h2>B</h2><p>Foobar</p><h3>C</h3>';

//remove first and last characters
html = html.substring(1, html.length-1);

//do the split on ><
var arr = html.split(/></g);

//add the brackets back in
for(var i = 0; i < arr.length; i++){
    arr[i] = '<' + arr[i] + '>';
}

Of course, this fails for elements that have no content.
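
To make that failure concrete: with an input such as <h1>A</h1><p></p><h2>B</h2>, splitting on >< also cuts between <p> and </p>, so the empty paragraph comes back as two pieces. One possible workaround, sketched here under the assumption that elements of the same tag name are not nested inside each other, is to match whole elements with a backreference instead of splitting. The name splitTopLevelElements is made up for this example:

```javascript
// Return each element of a flat HTML fragment as its own string,
// keeping empty elements such as <p></p> intact.
// Assumption: no element contains another element with the same tag name.
function splitTopLevelElements(html) {
    // <(tag)...>  opening tag, tag name captured in group 1
    // [\s\S]*?    any content (including newlines), as little as possible
    // <\/\1>      closing tag with the same name as the opening one
    var matches = html.match(/<([a-z][a-z0-9]*)\b[^>]*>[\s\S]*?<\/\1>/gi);
    return matches || []; // match() returns null when nothing is found
}

console.log(splitTopLevelElements('<h1>A</h1><p></p><h2>B</h2>'));
// ["<h1>A</h1>", "<p></p>", "<h2>B</h2>"]
```

Void elements such as <br> or <img> have no closing tag and are not matched by this pattern.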

4smxwvx54#

I use this function to convert an HTML string into an array of tag strings:

function getArrayTagsHtmlString(str) {
    const arrayElements = []
    // split(">") consumes every ">", so put it back on pieces that contain a tag
    str.split(">").forEach((element) => {
        if (element === "") return // skip the empty piece left after the final ">"
        arrayElements.push(element.includes("<") ? element + ">" : element)
    })
    return arrayElements
}

Happy coding!

dhxwm5r45#

I just came across this question while needing the same thing in one of my projects. I did the following, and it works for all HTML strings.

let splitArray = data.split("><").map((item, index, parts) => {
    // split("><") drops the brackets between adjacent tags; restore them,
    // leaving the string's own leading "<" and trailing ">" untouched
    const prefix = index === 0 ? "" : "<"
    const suffix = index === parts.length - 1 ? "" : ">"

    return prefix + item + suffix
})

console.log(splitArray)

where data is the HTML string.
