regex: What regular expression should I use to extract stock symbols from a text file?

s2j5cfk0  posted 11 months ago in Other

I want to extract all the symbols (ALLY, AMZN, AXP, AON, etc.) from a text file using a regex.
The text file contains the following data:

Symbol  Holdings    Stake   Mkt. price  Value   Pct of portfolio
Ally Financial Inc  ALLY    29,000,000  9.6%    $25.59  $742,110,000    0.2%
Amazon.com, Inc.    AMZN    10,551,000  0.1%    $137.09 $1,446,436,590  0.4%
American Express Company    AXP 151,610,700 20.8%   $149.79 $22,709,766,753 6.7%
Aon PLC AON 4,335,000   2.1%    $315.64 $1,368,299,400  0.4%
Apple Inc   AAPL    915,560,382 5.9%    $176.54 $161,633,029,838    47.4%
Bank of America Corp    BAC 1,032,852,006   13.0%   $27.25  $28,145,217,164 8.3%
BYD Co. Ltd BYDDF   98,603,142  9.0%    $30.11  $2,968,940,606  0.9%
Capital One Financial Corp. COF 12,471,030  3.3%    $103.99 $1,296,862,410  0.4%
Celanese Corporation    CE  5,358,535   4.9%    $115.37 $618,214,183    0.2%
Charter Communications Inc  CHTR    3,828,941   2.6%    $409.20 $1,566,802,657  0.5%
Chevron Corporation CVX 123,120,120 6.5%    $147.07 $18,107,276,048 5.3%
Citigroup Inc   C   55,244,797  2.9%    $40.83  $2,255,645,062  0.7%
Coca-Cola Co    KO  400,000,000 9.3%    $56.98  $22,792,000,000 6.7%

What regex should I use to extract all the stock symbols from the text file?

uttx8gqw  #1

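Well, this one will work:

\b[A-Z]+?(?=\s+\d+)\b

It looks for a group of uppercase letters followed by whitespace and digits. Be aware that some company names may contain the same pattern.
Explanation: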

- \b assert position at a word boundary: (^\w|\w$|\W\w|\w\W)
- Match a single character present in the list below [A-Z]
    +? matches the previous token between one and unlimited times, as few times as possible, expanding as needed (lazy)
    A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
- Positive Lookahead (?=\s+\d+)
    Assert that the Regex below matches
      \s matches any whitespace character (equivalent to [\r\n\t\f\v ])
      + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
      \d matches a digit (equivalent to [0-9])
      + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
- \b assert position at a word boundary: (^\w|\w$|\W\w|\w\W)
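
The breakdown above can be tried out with a short script. This is a minimal sketch that assumes the pattern is `\b[A-Z]+?(?=\s+\d+)\b`, assembled from the listed tokens; the names `symbolBeforeNumber`, `sample`, and `tricky`, and the "Studio B 54 Corp" row, are hypothetical and only serve to illustrate the caveat about company names.

```javascript
// Assumed pattern, assembled from the token breakdown above:
// word boundary, lazy run of uppercase letters, lookahead for whitespace + digits.
const symbolBeforeNumber = /\b[A-Z]+?(?=\s+\d+)\b/g;

// A few rows copied from the question's data, including the header line.
const sample = [
  "Symbol  Holdings    Stake   Mkt. price  Value   Pct of portfolio",
  "Ally Financial Inc  ALLY    29,000,000  9.6%    $25.59  $742,110,000    0.2%",
  "Aon PLC AON 4,335,000   2.1%    $315.64 $1,368,299,400  0.4%",
  "Citigroup Inc   C   55,244,797  2.9%    $40.83  $2,255,645,062  0.7%",
].join("\n");

// match() returns null when nothing matches, so fall back to an empty array.
const symbols = sample.match(symbolBeforeNumber) || [];
console.log(symbols); // [ 'ALLY', 'AON', 'C' ]

// Hypothetical row: the company name itself contains an uppercase letter
// followed by whitespace and digits, so "B" is reported as a false positive.
const tricky = "Studio B 54 Corp  SBF 1,000  0.1%";
const trickyMatches = tricky.match(symbolBeforeNumber) || [];
console.log(trickyMatches); // [ 'B', 'SBF' ]
```

The second call shows why the answer warns about company names: any capital-letter word directly followed by a number looks the same to this pattern as a ticker followed by the holdings column.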

See it working here: https://regex101.com/r/ruBdVj/1
Diagram from https://www.debuggex.com/

dw1jzc5e  #2

Another option is to match the last group of uppercase letters before each % sign.

[A-Z]+(?=[^A-Z\n%]*%)

A company name is probably less likely to contain a % than to contain digits. The \n is there to ensure that the [A-Z]+ match and the % are on the same line.

const text = `\
Symbol  Holdings    Stake   Mkt. price  Value   Pct of portfolio
Ally Financial Inc  ALLY    29,000,000  9.6%    $25.59  $742,110,000    0.2%
Amazon.com, Inc.    AMZN    10,551,000  0.1%    $137.09 $1,446,436,590  0.4%
American Express Company    AXP 151,610,700 20.8%   $149.79 $22,709,766,753 6.7%
Aon PLC AON 4,335,000   2.1%    $315.64 $1,368,299,400  0.4%
Apple Inc   AAPL    915,560,382 5.9%    $176.54 $161,633,029,838    47.4%
Bank of America Corp    BAC 1,032,852,006   13.0%   $27.25  $28,145,217,164 8.3%
BYD Co. Ltd BYDDF   98,603,142  9.0%    $30.11  $2,968,940,606  0.9%
Capital One Financial Corp. COF 12,471,030  3.3%    $103.99 $1,296,862,410  0.4%
Celanese Corporation    CE  5,358,535   4.9%    $115.37 $618,214,183    0.2%
Charter Communications Inc  CHTR    3,828,941   2.6%    $409.20 $1,566,802,657  0.5%
Chevron Corporation CVX 123,120,120 6.5%    $147.07 $18,107,276,048 5.3%
Citigroup Inc   C   55,244,797  2.9%    $40.83  $2,255,645,062  0.7%
Coca-Cola Co    KO  400,000,000 9.3%    $56.98  $22,792,000,000 6.7%
`;

console.log(...text.match(/[A-Z]+(?=[^A-Z\n%]*%)/g));
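
To see what the `\n` inside the negated class contributes, the sketch below compares the pattern with and without it. The input is hypothetical (it is not part of the question's data): the first line has no % sign and the second line has no uppercase letters before its %. The names `withNewline`, `withoutNewline`, and `twoLines` are illustrative only.

```javascript
// The answer's pattern: the lookahead may not cross a newline.
const withNewline = /[A-Z]+(?=[^A-Z\n%]*%)/g;
// Variant without \n in the negated class: the lookahead can run into the next line.
const withoutNewline = /[A-Z]+(?=[^A-Z%]*%)/g;

// Hypothetical two-line input.
const twoLines = "cash and equivalents USD\n12,000  3.5%";

const sameLineOnly = twoLines.match(withNewline) || [];
const acrossLines = twoLines.match(withoutNewline) || [];

console.log(sameLineOnly); // []
console.log(acrossLines);  // [ 'USD' ]
```

With the `\n` excluded, "USD" is not matched because the only % lies on the following line; without it, the lookahead reaches across the line break and reports "USD".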



3df52oht  #3

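With this regex

^([A-Z].*?)\s+([A-Z]{1,5})(?=\s+\d[\d,]+)

you can parse both the company name and the symbol (use the multiline flag so that ^ matches at the start of every line). Try it.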
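
As a rough sketch of how the two capture groups might be consumed in JavaScript, the example below applies the pattern with the `g` and `m` flags and `String.prototype.matchAll`. The flags, the variable names (`rowPattern`, `rows`, `holdings`), and the choice of sample rows are assumptions made for illustration; the answer itself only gives the pattern.

```javascript
// Group 1: company name (lazy), group 2: symbol of 1 to 5 uppercase letters.
// "g" is required by matchAll; "m" makes ^ match at the start of each line.
const rowPattern = /^([A-Z].*?)\s+([A-Z]{1,5})(?=\s+\d[\d,]+)/gm;

// Header plus three rows copied from the question's data.
const rows = [
  "Symbol  Holdings    Stake   Mkt. price  Value   Pct of portfolio",
  "Amazon.com, Inc.    AMZN    10,551,000  0.1%    $137.09 $1,446,436,590  0.4%",
  "Aon PLC AON 4,335,000   2.1%    $315.64 $1,368,299,400  0.4%",
  "Coca-Cola Co    KO  400,000,000 9.3%    $56.98  $22,792,000,000 6.7%",
].join("\n");

// Turn each match into a { name, symbol } object, skipping the full-match entry.
const holdings = [...rows.matchAll(rowPattern)].map(([, name, symbol]) => ({
  name,
  symbol,
}));

console.log(holdings);
// [
//   { name: 'Amazon.com, Inc.', symbol: 'AMZN' },
//   { name: 'Aon PLC', symbol: 'AON' },
//   { name: 'Coca-Cola Co', symbol: 'KO' }
// ]
```

The header line produces no match because none of its capitalized words is followed by whitespace and a number, so no special handling is needed to skip it here.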
