jQuery: Extracting LaTeX equation blocks from an HTML page with querySelectorAll

xn1cxnb4, posted on 2023-03-17 in jQuery

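I am trying to extract LaTeX equations from an HTML page (generated with latex2html) so that I can replace the LaTeX equation images with MathJax formulas.
My first idea was the following. Here is an example.
Input: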

<div align="CENTER" class="mathdisplay"><a name="eq402"></a><!-- MATH
 \begin{equation}
\text{d}\,v_{k}=\partial_{j}\,v_{k}\,\dfrac{\text{d}\,y^{j}}{\text{d}\,s}\,\text{d}\,s
\end{equation}
 -->
<table class="equation" cellpadding="0" width="100%" align="CENTER">
<tr valign="MIDDLE">
<td nowrap align="CENTER"><span class="MATH">d<img width="150" height="65" align="MIDDLE" border="0" src="img1919.gif" alt="$\displaystyle \,v_{k}=\partial_{j}\,v_{k}\,\dfrac{\text{d}\,y^{j}}{\text{d}\,s}\,\text{d}\,s$"></span></td>
<td nowrap class="eqno" width="10" align="RIGHT">
(<span class="arabic">5</span>.<span class="arabic">65</span>)</td></tr>
</table></div>

I insert the following JavaScript code at the bottom of the HTML page:

<script type="text/javascript">
function transform() {
        
        [].forEach.call(document.querySelectorAll('table tr img'),function(img) {
                var puretext = img.getAttribute('alt');
                if(!puretext || puretext == 'up' || puretext == 'previous' || puretext == 'next' || puretext == 'contents') return;
                puretext = puretext.replace(/..displaystyle /g,"$");
                var text = document.createTextNode(puretext);
                img.parentNode.insertBefore(text, img);
                img.style.display = 'none';
        });
}
transform();
</script>

This gives the following text on my HTML page, which MathJax renders as a formula:

$\,v_{k}=\partial_{j}\,v_{k}\,\dfrac{\text{d}\,y^{j}}{\text{d}\,s}\,\text{d}\,s$

This would be sufficient, but I noticed that the HTML page sometimes contains an incomplete formula in the "alt" attribute. Here is an example:

<div align="CENTER" class="mathdisplay"><a name="eq407"></a><!-- MATH
 \begin{equation}
\text{d}\,(\mathbf{V}\,\cdot\,\mathbf{n})=\mathbf{V_{M}}(M')\,\cdot\,\mathbf{n}-\mathbf{V}(M)\,\cdot\,\mathbf{n}=[\mathbf{V_{M}}(M')-\mathbf{V}(M)]\,\cdot\,\mathbf{n}=\text{d}\,\mathbf{V}\,\cdot\,\mathbf{n}
\end{equation}
 -->
<table class="equation" cellpadding="0" width="100%" align="CENTER">
<tr valign="MIDDLE">
<td nowrap align="CENTER"><span class="MATH">d<img width="538" height="38" align="MIDDLE" border="0" src="img1929.gif" alt="$\displaystyle \,(\mathbf{V}\,\cdot\,\mathbf{n})=\mathbf{V_{M}}(M')\,\cdot\,\mat...
...V}(M)\,\cdot\,\mathbf{n}=[\mathbf{V_{M}}(M')-\mathbf{V}(M)]\,\cdot\,\mathbf{n}=$">d<img width="56" height="34" align="MIDDLE" border="0" src="img1930.gif" alt="$\displaystyle \,\mathbf{V}\,\cdot\,\mathbf{n}$"></span></td>
<td nowrap class="eqno" width="10" align="RIGHT">
(<span class="arabic">5</span>.<span class="arabic">70</span>)</td></tr>
</table></div>

As you can see, the "alt" attribute of the first <img> tag contains:

$\displaystyle \,(\mathbf{V}\,\cdot\,\mathbf{n})=\mathbf{V_{M}}(M')\,\cdot\,\mat...
...V}(M)\,\cdot\,\mathbf{n}=[\mathbf{V_{M}}(M')-\mathbf{V}(M)]\,\cdot\,\mathbf{n}=$

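latex2html has not written out the whole LaTeX equation (see the "..." characters).
So I cannot always rely on the img alt attribute. I would like to use the \begin{equation} ... \end{equation} block instead, which sits inside the HTML comment (<!-- ... -->).
How can I get this comment block with querySelectorAll? Is there something like document.querySelectorAll('div.mathdisplay a comments'),function(comments) { that would let me extract this comment block?
If I can get this block of text, I will store it in a variable and insert it before the img tag, as in my first idea: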

var text = document.createTextNode(puretext);
img.parentNode.insertBefore(text, img);
img.style.display = 'none';
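
The truncation described above can be detected before deciding whether the alt text is usable. The following is only a sketch: isTruncatedAlt is a hypothetical helper name, and it assumes that a literal "..." never appears in a complete formula (that is, ellipses are written as \ldots or \dots in the LaTeX source).

```javascript
// Hypothetical helper: returns true when an alt text looks like it was
// shortened by latex2html, which marks the cut with a literal "...".
// Assumption: complete formulas never contain three consecutive dots.
function isTruncatedAlt(alt) {
  return typeof alt === 'string' && alt.indexOf('...') !== -1;
}
```

An img whose alt passes this check would need its LaTeX taken from another source, such as the MATH comment.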

tf7tbtn2 #1

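querySelectorAll only matches elements, so it cannot select comment nodes. You can use a TreeWalker instead, which natively supports useful node filters such as NodeFilter.SHOW_COMMENT.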

var walker = document.createTreeWalker(
      document.documentElement, 
      NodeFilter.SHOW_COMMENT
    ),
    frag = document.createDocumentFragment(),
    li, node;

while (node = walker.nextNode()) {
  li = document.createElement('li');
  li.textContent = node.textContent;
  frag.appendChild(li);
}

document.getElementById('comment-list').appendChild(frag);

<!-- This is a comment -->

<div>
    <!-- This is another comment -->
</div>

<ul id="comment-list">
</ul>
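
Applied to the question, the same kind of walker can be limited to each div.mathdisplay, and the equation body can be cut out of the comment with a regular expression. The following is a minimal sketch rather than a tested drop-in: extractEquation and transformDisplays are hypothetical names, and it assumes that each display equation has a span.MATH, that the comment holds a single \begin{equation} ... \end{equation} pair, and that MathJax treats $$ ... $$ as display math and typesets the page after this script has run.

```javascript
// Pull the LaTeX source out of a latex2html "MATH" comment.
// Returns null when the text holds no equation environment.
function extractEquation(commentText) {
  var match = /\\begin\{equation\}([\s\S]*?)\\end\{equation\}/.exec(commentText);
  return match ? match[1].trim() : null;
}

// For every div.mathdisplay, find the first comment that contains an
// equation and replace the content of span.MATH (text and images)
// with the LaTeX source wrapped in $$ ... $$.
function transformDisplays(doc) {
  [].forEach.call(doc.querySelectorAll('div.mathdisplay'), function (div) {
    var walker = doc.createTreeWalker(div, NodeFilter.SHOW_COMMENT);
    var latex = null, node;
    while (latex === null && (node = walker.nextNode())) {
      latex = extractEquation(node.nodeValue);
    }
    var span = div.querySelector('span.MATH');
    if (latex === null || !span) return; // nothing usable, leave the images
    span.textContent = '$$' + latex + '$$';
  });
}

// Only run in a browser; the parsing function can be used anywhere.
if (typeof document !== 'undefined') {
  transformDisplays(document);
}
```

The eqno cell is left untouched, so the equation number generated by latex2html stays next to the formula.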
