How can I use a regular expression in C# to extract the binary image code from an RTF string? [closed]

Posted by vulvrdjw on 2023-11-20 in C#

I have this RTF image string:

{\pict{\*\picprop\shplid1025{\sp{\sn shapeType}{\sv 75}}{\sp{\sn fFlipH}{\sv 0}}{\sp{\sn fFlipV}{\sv 0}}{\sp{\sn fLockRotation}{\sv 0}}{\sp{\sn fLockAspectRatio}{\sv 1}}{\sp{\sn fLockPosition}{\sv 0}}{\sp{\sn fLockAgainstSelect}{\sv 0}}
{\sp{\sn fLockCropping}{\sv 0}}{\sp{\sn fLockVerticies}{\sv 0}}{\sp{\sn fLockAgainstGrouping}{\sv 0}}{\sp{\sn pictureGray}{\sv 0}}{\sp{\sn pictureBiLevel}{\sv 0}}{\sp{\sn fFilled}{\sv 0}}
{\sp{\sn fNoFillHitTest}{\sv 0}}{\sp{\sn fLine}{\sv 0}}{\sp{\sn wzName}{\sv \u1056\'3f\u1080\'3f\u1089\'3f\u1091\'3f\u1085\'3f\u1086\'3f\u1082\'3f 1}}{\sp{\sn dhgt}{\sv 251658240}}{\sp{\sn fHidden}{\sv 0}}{\sp{\sn fLayoutInCell}{\sv 1}}}
\picscalex36\picscaley36\piccropl0\piccropr0\piccropt0\piccropb0\picw6879\pich6964\picwgoal3900\pichgoal3948\pngblip\bliptag-1175992069{\*\blipuid b9e7c8fbb3e14fcb3dc35ca2b0b6a03f}
89504e470d0a1a0a0000000d49484452000001450000014908060000000cb63f26000000017352474200aece1ce90000000467414d410000b18f0bfc61050000
000970485973000012740000127401de661f78000022bf49444154785eeddd0b8c55e5d5fff10741ae8232805ca6828c2297ca188494815186b6d6a216b56049
5b06696a04ad56191a943a686d046da50e9a682ab65adbd1482f83b660064d5404ac9a205651c0a232202072919b5c05deffac93d57f9e9c3efb396bbfef394e
a1df4ff2c4bd76cedefb5cf65938c9f96535fb9f460e00907192fe1700d088a608001e9a220078688a00e0a129028087a608001e9a220078688a00e0a1290280
87a608001e9a220078a2d9e7c99327eb566e478e1c712d5ab4d02ad9ba75eb5cefdebdb58ab33ed67a6db17af56ad7bf7f7fade23efef863d7bd7b77ade2e6ce
9dab5bf993e6fd2fc4f5ad162d5ae49e7efa69ade2d6ae5debfaf4e9a355fee4fbf56fdfbedd5557576b15d7ae5d3b575353a3555c213ed3193366b86ddbb669
15b777ef5ed7be7d7bad926dd9b2c575ebd64dabb834dfa9a6ec13c2f49e4a534c3269d2246998a6d5b66ddbe0feec55565616dc1f5a43870e0dee00101010101000000040000002701ffff030000000000}}}{\rtlch\fcs1 \af1\afs16 \ltrch\fcs0 
\f1\fs16\lang1058\langfe1049\langnp1058\langfenp1049\insrsid13721686 \cell

I need to get this out of it:

89504e470d0a1a0a0000000d49484452000001450000014908060000000cb63f26000000017352474200aece1ce90000000467414d410000b18f0bfc61050000
000970485973000012740000127401de661f78000022bf49444154785eeddd0b8c55e5d5fff10741ae8232805ca6828c2297ca188494815186b6d6a216b56049
5b06696a04ad56191a943a686d046da50e9a682ab65adbd1482f83b660064d5404ac9a205651c0a232202072919b5c05deffac93d57f9e9c3efb396bbfef394e
a1df4ff2c4bd76cedefb5cf65938c9f96535fb9f460e00907192fe1700d088a608001e9a220078688a00e0a129028087a608001e9a220078688a00e0a1290280
87a608001e9a220078a2d9e7c99327eb566e478e1c712d5ab4d02ad9ba75eb5cefdebdb58ab33ed67a6db17af56ad7bf7f7fade23efef863d7bd7b77ade2e6ce
9dab5bf993e6fd2fc4f5ad162d5ae49e7efa69ade2d6ae5debfaf4e9a355fee4fbf56fdfbedd5557576b15d7ae5d3b575353a3555c213ed3193366b86ddbb669
15b777ef5ed7be7d7bad926dd9b2c575ebd64dabb834dfa9a6ec13c2f49e4a534c3269d2246998a6d5b66ddbe0feec55565616dc1f5a43870e0dee00101010101000000040000002701ffff030000000000


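However, images can have different RTF representations, so I need a way to get the binary code out of any image, not just this one.
P.S. The image's binary code above is truncated, because the full data has too many characters.
So it seems I need a regular expression that can extract the hex-encoded binary representation of the image from an RTF \pict group.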

Answer #1, by nukf8bse

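One possible solution is to detect the image by its file magic number (the fixed signature bytes at the start of the file).
In your case, we can see that your image is a .png because its data starts with 89 50 4e 47. So you can write the regex \b(?:89504e47|ffd8ffe0)[a-zA-Z0-9\s]+\b, which works for PNG data and for JPEG data that starts with ffd8ffe0. You can test it here: https://regex101.com/r/r8sS4E/1
Of course, you can adjust the first part of the regex (the alternation of magic numbers) to cover whichever image formats you expect.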
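For completeness, here is a minimal C# sketch of how that idea could be wired up. It is an illustration under a few assumptions, not a full RTF parser. The class and method names (RtfImageExtractor, TryExtractImageHex, HexToBytes) are made up for this example. The pattern is a slightly tightened variant of the one above: it allows only hex digits and whitespace after the magic number, and it uses the shorter prefix ffd8ff so that JPEG data whose fourth byte is not e0 is also matched. It returns only the first image found and assumes the hex data is not interrupted by other RTF control words.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for this example; not part of any library.
public static class RtfImageExtractor
{
    // Magic numbers written as hex text: PNG (89 50 4E 47) and JPEG (FF D8 FF),
    // followed by a run of hex digits and whitespace.
    private static readonly Regex ImageHex = new Regex(
        @"\b(?:89504e47|ffd8ff)[0-9a-f\s]+",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Tries to find the first hex-encoded image in the RTF text.
    // On success, hex holds the digits with all whitespace removed.
    public static bool TryExtractImageHex(string rtf, out string hex)
    {
        Match match = ImageHex.Match(rtf);
        if (!match.Success)
        {
            hex = string.Empty;
            return false;
        }

        // RTF writers wrap the hex data across lines, so strip the line breaks.
        hex = Regex.Replace(match.Value, @"\s+", "");
        return true;
    }

    // Converts a hex string such as "89504e47" into the bytes it encodes.
    public static byte[] HexToBytes(string hex)
    {
        if (hex.Length % 2 != 0)
            throw new FormatException("Hex data must contain an even number of digits.");

        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        }
        return bytes;
    }
}
```

Typical use would be to call TryExtractImageHex on the RTF string, pass the result to HexToBytes, and write the bytes to a file, for example with File.WriteAllBytes. Note that the truncated sample in the question is not a complete image, so the bytes extracted from it will not open as a valid PNG.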
