C# regex: splitting text without including the numbers before or after each split

j1dl9f46  posted on 2023-04-07  in C#

Here is the script I am running in LINQPad:

class Program
{
    static void Main()
    {
        string text = "Why Cruise? 2 ALASKA 4 The Canadian Rockies & Alaska 6 Alaska by Land & Sea 8 Fairmonts of Western Canada & Incredible Alaska 10 Alaska’s Wilderness, Glaciers & Culture 12 THE CARIBBEAN 14 Ultimate Caribbean 16 Orlando & the Caribbean 17 Tall Ship Sailing in the Grenadine Islands 18 Hidden Gems of the Caribbean 19";

        string[] textArr = Regex.Split(text, @"(?<=\s)(?=\d+(?!\d))");

        string re = "(?:^|(?<=\\s))\\d+(?=\\s|$)";

        MatchCollection numArr = Regex.Matches(text, re, RegexOptions.IgnoreCase);

        Console.WriteLine("<?xml version=\"1.0\" encoding=\"UTF-8\" ?><ols>");
        Console.WriteLine("<o><c><![CDATA[Front Page]]></c><p>1</p><l>1</l><ci></ci></o>");
        Console.WriteLine("<o><c><![CDATA[Contents]]></c><p>3</p><l>1</l><ci></ci></o>");
        Console.WriteLine("<o><c><![CDATA[Map]]></c><p>4</p><l>1</l><ci></ci></o>");

        for (int i = 0; i < textArr.Length-1; i++)
        {
            string pageTitle = textArr[i].Trim();
            string pageNumber = numArr[i].Value;
            Console.WriteLine("<o><c><![CDATA["+pageTitle+"]]></c><p>"+PageNo(pageNumber)+"</p><l>1</l><ci></ci></o>");
        }

        Console.WriteLine("</ols>");
    }
    
    static string PageNo(string num) {
        int pgno = int.Parse(num) + 2;
        return pgno.ToString();
    }
}

This is the output I get:

<?xml version="1.0" encoding="UTF-8" ?><ols>
<o><c><![CDATA[Front Page]]></c><p>1</p><l>1</l><ci></ci></o>
<o><c><![CDATA[Contents]]></c><p>3</p><l>1</l><ci></ci></o>
<o><c><![CDATA[Map]]></c><p>4</p><l>1</l><ci></ci></o>
<o><c><![CDATA[Why Cruise?]]></c><p>4</p><l>1</l><ci></ci></o>
<o><c><![CDATA[2 ALASKA]]></c><p>6</p><l>1</l><ci></ci></o>
<o><c><![CDATA[4 The Canadian Rockies & Alaska]]></c><p>8</p><l>1</l><ci></ci></o>
<o><c><![CDATA[6 Alaska by Land & Sea]]></c><p>10</p><l>1</l><ci></ci></o>
<o><c><![CDATA[8 Fairmonts of Western Canada & Incredible Alaska]]></c><p>12</p><l>1</l><ci></ci></o>
<o><c><![CDATA[10 Alaska’s Wilderness, Glaciers & Culture]]></c><p>14</p><l>1</l><ci></ci></o>
<o><c><![CDATA[12 THE CARIBBEAN]]></c><p>16</p><l>1</l><ci></ci></o>
<o><c><![CDATA[14 Ultimate Caribbean]]></c><p>18</p><l>1</l><ci></ci></o>
<o><c><![CDATA[16 Orlando & the Caribbean]]></c><p>19</p><l>1</l><ci></ci></o>
<o><c><![CDATA[17 Tall Ship Sailing in the Grenadine Islands]]></c><p>20</p><l>1</l><ci></ci></o>
<o><c><![CDATA[18 Hidden Gems of the Caribbean]]></c><p>21</p><l>1</l><ci></ci></o>
</ols>

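I don't know a regex that excludes the numbers entirely when splitting the text. I want the titles to come out as "ALASKA", "The Canadian Rockies & Alaska", and so on, so why does the output contain "2 ALASKA", "4 The Canadian Rockies & Alaska", etc.?
Thanks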

tf7tbtn2  #1

\s*          # 0+ whitespaces, followed by
([^\d]+)     # capturing group consisting of 1+ non-digit characters,
\s+          # 1+ whitespaces, then
(\d+)        # capturing group consisting of 1+ digits

Try it on regex101.com
Try it on dotnetfiddle.net
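
As a rough illustration of how the pattern above might be used, here is a minimal sketch that pairs each title with its page number via Regex.Matches instead of Regex.Split. Because the digits are consumed by the second capturing group rather than only looked at, they never end up in the title. The names TocParser and ParseEntries are made up for this example, the sample text is a shortened version of the string from the question, and the sketch assumes that titles themselves contain no digits.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answer.
static class TocParser
{
    // The answer's pattern, kept in commented form; this layout requires
    // RegexOptions.IgnorePatternWhitespace.
    const string Pattern = @"
        \s*        # 0+ whitespace characters, followed by
        ([^\d]+)   # group 1: the title, 1+ non-digit characters
        \s+        # 1+ whitespace characters, then
        (\d+)      # group 2: the page number, 1+ digits";

    public static List<(string Title, int Page)> ParseEntries(string text)
    {
        var entries = new List<(string Title, int Page)>();
        foreach (Match m in Regex.Matches(text, Pattern, RegexOptions.IgnorePatternWhitespace))
        {
            // Group 1 holds the title without the number, group 2 the number alone.
            entries.Add((m.Groups[1].Value.Trim(), int.Parse(m.Groups[2].Value)));
        }
        return entries;
    }
}

class Program
{
    static void Main()
    {
        // Shortened sample of the text from the question.
        string text = "Why Cruise? 2 ALASKA 4 The Canadian Rockies & Alaska 6";

        foreach (var (title, page) in TocParser.ParseEntries(text))
        {
            // Same +2 page offset as PageNo in the question.
            Console.WriteLine($"<o><c><![CDATA[{title}]]></c><p>{page + 2}</p><l>1</l><ci></ci></o>");
        }
    }
}
```

With this approach a single pass yields both values, so the separate textArr and numArr collections from the question are no longer needed. A title that contains a digit would be cut at that digit, so the pattern would need adjusting for such input.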
