Regex: grep for two words near each other

iszxjhcz  posted 11 months ago in Other

Suppose a file contains the line "This is perhaps the easiest place to add new functionality.", and I want to grep for two words that are near each other.

grep -ERHn "\beasiest\W+(?:\w+\W+){1,6}?place\b" *

It works and returns the line. But when I run this:

grep -ERHn "\beasiest\W+(?:\w+\W+){1,10}?new\b" *


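it fails, which defeats the whole purpose of {1,10}?. This construct is listed on regular-expression.info and in several regex books. They do not describe it in terms of grep, but that should not matter.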

Update

I put the regex into a Python script. It works, but it lacks grep's -C (context lines) option.

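#!/usr/bin/env python3
# Usage: near.py WORD1 WORD2 MAX_DISTANCE PATH
import os
import re
import sys

word1 = sys.argv[1]
word2 = sys.argv[2]
dist = sys.argv[3]

# Gap between the two words: at most `dist` intervening words (lazy match).
gap = r'\W+(?:\w+\W+){0,' + dist + r'}?'

# Match word1 ... word2 or word2 ... word1.
regex_string = (r'\b(?:'
    + word1 + gap + word2
    + r'|'
    + word2 + gap + word1
    + r')\b')

regex = re.compile(regex_string)

def findmatches(path):
    # Walk the directory tree and print every match found in every file.
    for root, dirs, files in os.walk(path):
        for filename in files:
            fullpath = os.path.join(root, filename)

            # errors='replace' keeps undecodable bytes from aborting the scan.
            with open(fullpath, 'r', errors='replace') as f:
                matches = regex.findall(f.read())
                for m in matches:
                    print("File:", fullpath, "\n\t", m)

if __name__ == "__main__":
    findmatches(sys.argv[4])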


Call it like this:

python near.py charlie winning 6 path/to/charlie/sheen


This works fine for me.
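As a rough sketch of how the missing -C behaviour might be approximated in Python: match line by line and keep a few neighbouring lines around each hit. This assumes per-line matching (the script above searches whole file contents instead), and the names grep_with_context and sample are made up for illustration.

```python
import re

def grep_with_context(lines, pattern, context=2):
    """Return (line_number, text) pairs for matching lines plus `context` lines around each."""
    regex = re.compile(pattern)
    keep = set()
    for i, line in enumerate(lines):
        if regex.search(line):
            # Mark the matching line and its neighbours, clamped to the list bounds.
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    # Line numbers are 1-based, as with grep -n.
    return [(i + 1, lines[i]) for i in sorted(keep)]

# Hypothetical sample input standing in for the lines of a file.
sample = [
    "first line",
    "second line",
    "charlie is always winning",
    "fourth line",
    "fifth line",
]
pattern = r"\bcharlie\W+(?:\w+\W+){0,6}?winning\b"

for number, text in grep_with_context(sample, pattern, context=1):
    print(number, text)
```

Overlapping context windows are merged because the line indexes are collected in a set before printing.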

Answer 1 (j0pj023g)

Do you really need the (?:...) construct (a non-capturing group)? Maybe this is enough:

grep -ERHn "\beasiest\W+(\w+\W+){1,10}new\b" *

This is what I get:

echo "This is perhaps the easiest place to add new functionality." | grep -EHn "\beasiest\W+(\w+\W+){1,10}new\b"


(standard input):1:This is perhaps the easiest place to add new functionality.

Edit

As Camille Goudeseune suggested, for convenience you can add this to your .bashrc:

grepNear() {
 grep -EHn "\b$1\W+(\w+\W+){1,10}$2\b"
}


Then, at the bash prompt: echo "..." | grepNear easiest new
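For comparison, here is a minimal Python sketch of the same idea as grepNear: build the pattern from two words and a maximum gap. The function name near_regex and the max_gap parameter are assumptions for illustration. Unlike the shell function, it escapes the words so that characters such as "." are matched literally.

```python
import re

def near_regex(word1, word2, max_gap=10):
    """Compile a pattern for word1 followed by word2 with 1..max_gap words in between."""
    # re.escape keeps regex metacharacters inside the words literal.
    return re.compile(
        r"\b" + re.escape(word1)
        + r"\W+(?:\w+\W+){1," + str(max_gap) + r"}"
        + re.escape(word2) + r"\b"
    )

line = "This is perhaps the easiest place to add new functionality."

# Three words ("place to add") sit between the two targets.
print(bool(near_regex("easiest", "new").search(line)))     # True
print(bool(near_regex("easiest", "new", 2).search(line)))  # False
```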

Answer 2 (oxosxuxt)

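grep does not support the non-capturing groups of Python regular expressions. When you write something like (?:\w+\W+), you are asking grep to match a question mark ?, followed by a colon :, followed by one or more word characters \w+, followed by one or more non-word characters \W+. The ? is a special character in grep regular expressions, but because it sits at the start of a group it is escaped automatically (the same way the regex [?] matches a literal question mark).
Let's test it. I have the following file: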

$ cat file
This is perhaps the easiest place to add new functionality.

grep finds no match with the expression you used:

$ grep -ERHn "\beasiest\W+(?:\w+\W+){1,10}?new\b" file


Then I created the following file:

$ cat file2
This is perhaps the easiest ?:place ?:to ?:add new functionality.


Note that each word between easiest and new is preceded by ?:. In this case, your expression does match the file:

$ grep -ERHn "\beasiest\W+(?:\w+\W+){1,10}?new\b" file2
file2:1:This is perhaps the easiest ?:place ?:to ?:add new functionality.
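The two readings of the pattern can be compared side by side in Python. This is only an approximation: the second pattern is a hand-written Python spelling of how grep -E appears to read the original text, assuming a literal ?: at the start of each repetition and a trailing ? that makes the whole repeated group optional. The variable names are made up for illustration.

```python
import re

# What the question intended: in Python's re, (?:...) is a non-capturing group.
intended = re.compile(r"\beasiest\W+(?:\w+\W+){1,10}?new\b")

# Approximation of grep -E's reading (an assumption based on the answer):
# each repetition must start with a literal "?:", and the group is optional.
as_read_by_grep = re.compile(r"\beasiest\W+(?:(?:\?:\w+\W+){1,10})?new\b")

file1 = "This is perhaps the easiest place to add new functionality."
file2 = "This is perhaps the easiest ?:place ?:to ?:add new functionality."

for name, text in (("file", file1), ("file2", file2)):
    print(name,
          "intended:", bool(intended.search(text)),
          "as read by grep:", bool(as_read_by_grep.search(text)))
```

Under these assumptions the second pattern rejects file and accepts file2, which mirrors the grep results shown above.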


The solution is to remove the ?: from the expression:

$ grep -ERHn "\beasiest\W+(\w+\W+){1,10}?new\b" file
file:1:This is perhaps the easiest place to add new functionality.


Since you do not actually need a non-capturing group here (at least as far as I can see), dropping it causes no problems.

Bonus point: you can simplify your expression by changing {1,10} to {0,10} and removing the ? that follows it:
$ grep -ERHn "\beasiest\W+(\w+\W+){0,10}new\b" file
file:1:This is perhaps the easiest place to add new functionality.
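One practical difference between the two bounds, sketched in Python with non-capturing groups: {1,10} requires at least one word between the targets, while {0,10} also accepts directly adjacent words. The adjacent sentence below is a made-up example, not taken from the question.

```python
import re

at_least_one = re.compile(r"\beasiest\W+(?:\w+\W+){1,10}new\b")
zero_or_more = re.compile(r"\beasiest\W+(?:\w+\W+){0,10}new\b")

spaced = "This is perhaps the easiest place to add new functionality."
adjacent = "Pick the easiest new feature first."  # hypothetical test line

for label, text in (("spaced", spaced), ("adjacent", adjacent)):
    print(label,
          "{1,10}:", bool(at_least_one.search(text)),
          "{0,10}:", bool(zero_or_more.search(text)))
```

Both patterns match the spaced sentence; only the {0,10} form matches the adjacent one.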

Answer 3 (bq3bfh9z)

Example: find "select .......from" in PHP files:

grep --include \*.php -REHn "\bselect\W+(\w+\W+){1,10}from\b" *
