Here is the script I am running in LINQPad:
class Program
{
    static void Main()
    {
        string text = "Why Cruise? 2 ALASKA 4 The Canadian Rockies & Alaska 6 Alaska by Land & Sea 8 Fairmonts of Western Canada & Incredible Alaska 10 Alaska’s Wilderness, Glaciers & Culture 12 THE CARIBBEAN 14 Ultimate Caribbean 16 Orlando & the Caribbean 17 Tall Ship Sailing in the Grenadine Islands 18 Hidden Gems of the Caribbean 19";
        string[] textArr = Regex.Split(text, @"(?<=\s)(?=\d+(?!\d))");
        string re = "(?:^|(?<=\\s))\\d+(?=\\s|$)";
        MatchCollection numArr = Regex.Matches(text, re, RegexOptions.IgnoreCase);
        Console.WriteLine("<?xml version=\"1.0\" encoding=\"UTF-8\" ?><ols>");
        Console.WriteLine("<o><c><![CDATA[Front Page]]></c><p>1</p><l>1</l><ci></ci></o>");
        Console.WriteLine("<o><c><![CDATA[Contents]]></c><p>3</p><l>1</l><ci></ci></o>");
        Console.WriteLine("<o><c><![CDATA[Map]]></c><p>4</p><l>1</l><ci></ci></o>");
        for (int i = 0; i < textArr.Length-1; i++)
        {
            string pageTitle = textArr[i].Trim();
            string pageNumber = numArr[i].Value;
            Console.WriteLine("<o><c><![CDATA["+pageTitle+"]]></c><p>"+PageNo(pageNumber)+"</p><l>1</l><ci></ci></o>");
        }
        Console.WriteLine("</ols>");
    }
    static string PageNo(string num) {
        int pgno = int.Parse(num) + 2;
        return pgno.ToString();
    }
}
This is the output I get:
<?xml version="1.0" encoding="UTF-8" ?><ols>
<o><c><![CDATA[Front Page]]></c><p>1</p><l>1</l><ci></ci></o>
<o><c><![CDATA[Contents]]></c><p>3</p><l>1</l><ci></ci></o>
<o><c><![CDATA[Map]]></c><p>4</p><l>1</l><ci></ci></o>
<o><c><![CDATA[Why Cruise?]]></c><p>4</p><l>1</l><ci></ci></o>
<o><c><![CDATA[2 ALASKA]]></c><p>6</p><l>1</l><ci></ci></o>
<o><c><![CDATA[4 The Canadian Rockies & Alaska]]></c><p>8</p><l>1</l><ci></ci></o>
<o><c><![CDATA[6 Alaska by Land & Sea]]></c><p>10</p><l>1</l><ci></ci></o>
<o><c><![CDATA[8 Fairmonts of Western Canada & Incredible Alaska]]></c><p>12</p><l>1</l><ci></ci></o>
<o><c><![CDATA[10 Alaska’s Wilderness, Glaciers & Culture]]></c><p>14</p><l>1</l><ci></ci></o>
<o><c><![CDATA[12 THE CARIBBEAN]]></c><p>16</p><l>1</l><ci></ci></o>
<o><c><![CDATA[14 Ultimate Caribbean]]></c><p>18</p><l>1</l><ci></ci></o>
<o><c><![CDATA[16 Orlando & the Caribbean]]></c><p>19</p><l>1</l><ci></ci></o>
<o><c><![CDATA[17 Tall Ship Sailing in the Grenadine Islands]]></c><p>20</p><l>1</l><ci></ci></o>
<o><c><![CDATA[18 Hidden Gems of the Caribbean]]></c><p>21</p><l>1</l><ci></ci></o>
</ols>
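I don't know a regular expression that excludes the numbers entirely when splitting the text. I want the output to contain "ALASKA", "The Canadian Rockies & Alaska", and so on, so why does it contain "2 ALASKA", "4 The Canadian Rockies & Alaska", etc.?
Thanks.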
1 Answer
Try it on regex101.com.
Try it on dotnetfiddle.net.
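The split pattern in the question, `(?<=\s)(?=\d+(?!\d))`, consists only of lookarounds, which are zero-width. `Regex.Split` therefore cuts just before each number but consumes nothing, so the number stays at the start of the next piece. A minimal sketch of one alternative, assuming every title is followed by a standalone page number: match title and number together with `Regex.Matches` and read them from named groups. `TocSketch`, `ParseEntries`, and the group names `title` and `page` are hypothetical names chosen for illustration, and the sample string is shortened from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TocSketch
{
    // Hypothetical helper: pairs each title with the page number that follows it.
    public static List<(string Title, int Page)> ParseEntries(string text)
    {
        var entries = new List<(string Title, int Page)>();

        // "title" lazily matches from the first non-space character up to the
        // next standalone number; "page" captures that number. Because the
        // number is consumed by the match, it cannot leak into the next title.
        const string pattern = @"(?<title>\S.*?)\s+(?<page>\d+)(?=\s|$)";

        foreach (Match m in Regex.Matches(text, pattern))
        {
            entries.Add((m.Groups["title"].Value, int.Parse(m.Groups["page"].Value)));
        }
        return entries;
    }

    static void Main()
    {
        // Shortened sample taken from the question's input string.
        string text = "Why Cruise? 2 ALASKA 4 The Canadian Rockies & Alaska 6";

        foreach (var (title, page) in ParseEntries(text))
        {
            // "+ 2" mirrors the PageNo offset in the question's script.
            Console.WriteLine("<o><c><![CDATA[" + title + "]]></c><p>" + (page + 2) + "</p><l>1</l><ci></ci></o>");
        }
    }
}
```

With the shortened sample this should print three `<o>` rows titled `Why Cruise?`, `ALASKA`, and `The Canadian Rockies & Alaska`, with pages 4, 6, and 8. A title that itself contains a standalone number would be cut early by the lazy match, so the pattern only fits input shaped like the sample.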