Regex to find all class names in .css files

gupuwyp2  posted 11 months ago in Other

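I am working on an open-source project. It has more than 800 CSS files, and I need to extract every class name from them and then check all the HTML files for those class names. The goal is to find out which classes are unused in the project and remove them, so the first step is to extract all the class names correctly.
The problem is that I am having trouble constructing a regular expression that captures all the class names, because some of the CSS files are badly formatted: everything is on one line, jumbled together like one big run-on sentence with minimal whitespace. The regexes I have written break on these cases and fail to extract the class names. I need a more complex but more flexible regex, and I am looking for guidance.
Some examples of the formats: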

.class-Name-Here {
}

.class1,.class2,.class3 {
}

.class1 .class2 {
}

However, some of the formatting is really messy, like this (a real example):

.toast-title{font-weight:700}.toast-message{word-wrap:break-word}.toast-message a,.toast-message label{color:#FFF}.toast-message a:hover{color:#CCC;text-decoration:none}.toast-close-button{position:relative;right:-.3em;top:-.3em;float:right;font-size:20px;font-weight:700;color:#FFF;-webkit-text-shadow:0 1px 0 #fff;text-shadow:0 1px 0 #fff;opacity:.8}.toast-top-center,.toast-top-full-width{top:0;right:0;width:100%}.toast-close-button:focus,.toast-close-button:hover{color:#000;text-decoration:none;cursor:pointer;opacity:.4}button.toast-close-button{padding:0;cursor:pointer;background:0 0;border:0;-webkit-appearance:none}.toast-bottom-center{bottom:0;right:0;width:100%}.toast-bottom-full-width{bottom:0;right:0;width:100%}.toast-top-left{top:12px;left:12px}.toast-top-right{top:12px;right:12px}.toast-bottom-right{right:12px;bottom:12px}.toast-bottom-left{bottom:12px;left:12px}#toast-container{position:fixed;z-index:999999}#toast-container *{-moz-box-sizing:border-box;-webkit-box-sizing:border-box;box-sizing:border-box}#toast-container>div{position:relative;overflow:hidden;margin:0 0 6px;padding:15px 15px 15px 50px;width:300px;-moz-border-radius:3px;-webkit-border-radius:3px;border-radius:3px;background-position:15px center;background-repeat:no-repeat;-moz-box-shadow:0 0 12px #999;-webkit-box-shadow:0 0 12px #999;box-shadow:0 0 12px #999;color:#FFF;opacity:.8}#toast-container>:hover{-moz-box-shadow:0 0 12px #000;-webkit-box-shadow:0 0 12px #000;box-shadow:0 0 12px 
#000;opacity:1;cursor:pointer}#toast-container>.toast-info{background-image:url()!important}#toast-container>.toast-error{background-image:url()!important}#toast-container>.toast-success{background-image:url()!important}#toast-container>.toast-warning{background-image:url()!important}#toast-container.toast-bottom-center>div,#toast-container.toast-top-center>div{width:300px;margin:auto}#toast-container.toast-bottom-full-width>div,#toast-container.toast-top-full-width>div{width:96%;margin:auto}.toast{background-color:#030303}.toast-success{background-color:#51A351}.toast-error{background-color:#BD362F}.toast-info{background-color:#2F96B4}.toast-warning{background-color:#F89406}.toast-progress{position:absolute;left:0;bottom:0;height:4px;background-color:#000;opacity:.4}.toast{opacity:1!important}.toast.ng-enter{opacity:0!important;transition:opacity .3s linear}.toast.ng-enter.ng-enter-active{opacity:1!important}.toast.ng-leave{opacity:1;transition:opacity .3s linear}.toast.ng-leave.ng-leave-active{opacity:0!important}@media all and (max-width:240px){#toast-container>div{padding:8px 8px 8px 50px;width:11em}#toast-container .toast-close-button{right:-.2em;top:-.2em}}@media all and (min-width:241px) and (max-width:480px){#toast-container>div{padding:8px 8px 8px 50px;width:18em}#toast-container .toast-close-button{right:-.2em;top:-.2em}}@media all and (min-width:481px) and (max-width:768px){#toast-container>div{padding:15px 15px 15px 50px;width:25em}}/*!

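I need to create a very flexible regex that can pick up all of these class names.
The problem is that it is getting quite complex as I try to build it.
I did some analysis of the class names, and they follow these general rules:
1. They start with a . and can be chained multiple times, e.g. .Interesting-Complex.Class-Name2.Specialty_Class {
2. They use A-Za-z many times and *sometimes* - and _
3. They can end with a newline, a comma, { or :
4. They are sometimes preceded by }
My current expression is: ((?<=}))?\.(\w+(-+?)\w+)+?((?={)|(?=\s)|(?=,)|(?=:))
But it does not work for simple cases such as: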

.glyphicon {
}

because it treats the - as mandatory, so nothing is matched. It also fails on chained classes, e.g. .Interesting-Complex.Class-Name2.Specialty_Class {
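Both failures appear to come from the same place: the group \w+(-+?)\w+ requires at least one hyphen between two runs of word characters, and the chained case additionally fails because . is not among the allowed terminators. Below is a minimal sketch of a looser pattern, assuming class names look like ordinary CSS identifiers (an optional leading -, then a letter or _, then letters, digits, _ or -). The names CLASS_RE, BLOCK_RE, and find_class_names are made up for this illustration. Stripping the innermost { ... } blocks before matching is an added assumption, meant to keep values such as .8 or file.png inside declarations out of the results. It does not understand comments, strings, or url(...) outside of blocks, so it is not a substitute for a real parser.

```python
import re

# A "." followed by something shaped like a CSS identifier.
# A leading digit is rejected, so values like ".8" or ".3em" are skipped.
CLASS_RE = re.compile(r'\.(-?[_a-zA-Z][\w-]*)')

# An innermost block: "{", anything without braces, "}".
BLOCK_RE = re.compile(r'\{[^{}]*\}')


def find_class_names(css):
    """Return the sorted, de-duplicated class names found in selectors."""
    # Empty out the innermost declaration blocks so only selectors
    # (and at-rule preludes) are left to scan.
    selectors_only = BLOCK_RE.sub('{}', css)
    return sorted(set(CLASS_RE.findall(selectors_only)))


if __name__ == '__main__':
    sample = (
        '.glyphicon {\n}\n'
        '.Interesting-Complex.Class-Name2.Specialty_Class {}\n'
        '@media all{.toast-close-button{right:-.2em;top:-.2em}}'
    )
    for name in find_class_names(sample):
        print(name)
```

Because the pattern does not look at what comes before or after the name, it needs no lookbehind for } and no list of terminators, which is what made the original expression brittle.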

Answer #1 (zf2sa74q)

As @JaredSmith suggested, you should not use a regular expression for this but a full-fledged CSS parser, for example tinycss2:

import tinycss2

def get_css_classes(string):
  # A list of QualifiedRules, AtRules and Comments
  rules = tinycss2.parse_stylesheet(string)
  
  for rule in rules:
    if rule.type != 'qualified-rule':
      continue
    
    # 'prelude' means the tokens preceding '{'
    # See https://doc.courtbouillon.org/tinycss2/stable/api_reference.html#tinycss2.ast.QualifiedRule
    prelude_tokens = rule.prelude
    
    for index, token in enumerate(prelude_tokens[1:], 1):
      previous_token = prelude_tokens[index - 1]
      
      if token.type == 'ident' and previous_token == '.':
        yield token.value

For the following input:

@media only screen and (max-width: 1000px) {
  .class-name {
    display: none;
    & > #nested-css { caret-color: #e8403a }
  }
}

.class-Name-Here {
}
.class1,.class2,.class3 {
}

.class1 .class2 {
}.glyphicon {
}

.Interesting-Complex.Class-Name2.Specialty_Class {}

it outputs:

[
  'class-Name-Here', 'class1', 'class2', 'class3', 'class1', 'class2',
  'glyphicon', 'Interesting-Complex', 'Class-Name2', 'Specialty_Class'
]

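As you can see, this function is very crude and does not handle at-rules correctly (.class-name inside the @media block is missing from the output). However, it should give you a good baseline for your actual implementation.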
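One possible way to pick up classes inside at-rules is to re-parse the block of an at-rule as a nested rule list and recurse into it. The sketch below is an extension of the function above, not part of the original answer. It assumes that only @media and @supports blocks contain nested rules (the NESTING_AT_RULES tuple is a choice made for this illustration), and it re-uses tinycss2.parse_stylesheet on the at-rule's content tokens. Rules nested inside a qualified rule, such as the & > #nested-css line in the sample input, are still ignored.

```python
import tinycss2

# At-rules whose block is treated as a list of nested rules.
# This tuple is an assumption for the sketch, not an exhaustive list.
NESTING_AT_RULES = ('media', 'supports')


def iter_classes(rules):
    """Yield class names from qualified rules, descending into at-rules."""
    for rule in rules:
        if rule.type == 'at-rule':
            if rule.lower_at_keyword in NESTING_AT_RULES and rule.content is not None:
                # Re-parse the tokens between "{" and "}" as a rule list.
                nested = tinycss2.parse_stylesheet(
                    rule.content, skip_comments=True, skip_whitespace=True)
                yield from iter_classes(nested)
        elif rule.type == 'qualified-rule':
            prelude = rule.prelude  # the tokens preceding "{"
            for previous, token in zip(prelude, prelude[1:]):
                is_dot = previous.type == 'literal' and previous.value == '.'
                if is_dot and token.type == 'ident':
                    yield token.value


def get_css_classes(css):
    """Return class names in source order, duplicates included."""
    rules = tinycss2.parse_stylesheet(
        css, skip_comments=True, skip_whitespace=True)
    return list(iter_classes(rules))
```

Wrapping the result in set(...) gives the de-duplicated list that the unused-class check needs.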

Answer #2 (ljo96ir5)

Try the following pattern.

(?:(?<=[{},]\.)|(?<=^\.))[^{}/]+?(?=[{,\s])
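  • (?:(?<=[{},]\.)|(?<=^\.)): the match must be preceded by a . that itself follows {, } or a comma, or by a . at the start of the string (at the start of every line only if re.MULTILINE is set)
  • [^{}/]+?: one or more characters other than {, } and /, matched lazily
  • (?=[{,\s]): followed by {, a comma, or whitespace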

Here is an example using re.finditer, where s holds the CSS text.

import re
p = r'(?:(?<=[{},]\.)|(?<=^\.))[^{}/]+?(?=[{,\s])'
for m in re.finditer(p, s):
    print(m.group())

Output

toast-title
toast-message
toast-message
toast-message
toast-message
toast-close-button
toast-top-center
toast-top-full-width
toast-close-button:focus
toast-close-button:hover
toast-bottom-center
toast-bottom-full-width
toast-top-left
toast-top-right
toast-bottom-right
toast-bottom-left
toast
toast-success
toast-error
toast-info
toast-warning
toast-progress
toast
toast.ng-enter
toast.ng-enter.ng-enter-active
toast.ng-leave
toast.ng-leave.ng-leave-active
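The matches above still include pseudo-classes (toast-close-button:focus) and chained classes (toast.ng-enter.ng-enter-active), and the same name appears several times. A post-processing step can reduce them to a unique list of individual class names. The sketch below is one way to do that and is not part of the original answer. The helper names split_match and class_names are invented for this illustration, and the identifier pattern in CLASS_RE assumes class names consist of letters, digits, _ and -, with no leading digit.

```python
import re

# The pattern from the answer, unchanged.
PATTERN = re.compile(r'(?:(?<=[{},]\.)|(?<=^\.))[^{}/]+?(?=[{,\s])')

# A "." followed by something shaped like a CSS identifier.
CLASS_RE = re.compile(r'\.(-?[_a-zA-Z][\w-]*)')


def split_match(raw):
    """Split one raw match into the individual class names it contains."""
    # The answer's pattern drops the leading ".", so put it back and
    # pick out every ".name" part; ":hover", ">" and similar are skipped.
    return CLASS_RE.findall('.' + raw)


def class_names(css):
    """Return the sorted, de-duplicated class names matched in css."""
    names = set()
    for match in PATTERN.finditer(css):
        names.update(split_match(match.group()))
    return sorted(names)


if __name__ == '__main__':
    s = ('.toast.ng-enter.ng-enter-active{opacity:1!important}'
         '.toast-close-button:focus,.toast-close-button:hover{color:#000}')
    for name in class_names(s):
        print(name)
```

Selectors that the original pattern never matches, such as #toast-container>.toast-info, are still missed, since this step only cleans up what was already found.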
