Swift: How to cut text out of a longer text if the initiator and terminator are given as regular expressions?

egmofgnx posted on 2023-02-18 in Swift

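How do I get from this: "{\n \"DNAHeader\": {},\n \"ItemsSaleable\": []\n}\n"
to this: "\"DNAHeader\":{},\"ItemsSaleable\":[]"
I have this regex as the initiator:
"<OWSP>{<OWSP>"
and this one as the terminator:
"<OWSP>}<OWSP>"
where <OWSP> is optional whitespace, the same as \s* in a Swift regex.
I convert them to their Swift equivalents: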

if let group = groupOrItem as? Group,
   let initiator = group.typeSyntax?.initiator?.literal.literalValue?.replacingOccurrences(of: "<OWSP>", with: "\\s*"),
   let terminator = group.typeSyntax?.terminator?.literal.literalValue?.replacingOccurrences(of: "<OWSP>", with: "\\s*")
{
    let captureString = "(.*?)"
    let regexString = initiator + captureString + terminator
    let regexPattern = "#" + regexString + "#"
    // ...
}

The regex pattern looks like this:

(lldb) po regexString
"\\s*{\\s*(.*?)\\s*}\\s*"

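In the pattern above, `{` and `}` are passed through unescaped. ICU's regex documentation (ICU is the engine behind `NSRegularExpression` and `.regularExpression` string searches) lists both among the characters that must be quoted with `\` to be matched literally. One way to build the pattern from the `<OWSP>` delimiters, sketched here with a hypothetical helper `owspPattern(from:)` that is not part of the question's code, is to split on the placeholder, escape each literal piece with `NSRegularExpression.escapedPattern(for:)`, and join the pieces with `\s*`:

```swift
import Foundation

// Hypothetical helper (not from the question): turns a delimiter such as
// "<OWSP>{<OWSP>" into a regex pattern whose literal parts are escaped.
func owspPattern(from delimiter: String) -> String {
    delimiter
        .components(separatedBy: "<OWSP>")                   // e.g. ["", "{", ""]
        .map { NSRegularExpression.escapedPattern(for: $0) }  // escape "{", "}" etc.
        .joined(separator: "\\s*")                            // <OWSP> stands for \s*
}

let initiator = owspPattern(from: "<OWSP>{<OWSP>")
let terminator = owspPattern(from: "<OWSP>}<OWSP>")
print(initiator, terminator)
```

Escaping the pieces before inserting `\s*` matters: running `escapedPattern(for:)` over the already-converted string would also escape the backslash in `\s*`.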
Question: how do I apply it, and how do I cut out the meaningful inner text? I tried this:

var childText = text.replacingOccurrences(of: regexPattern, with: "$1", options: .regularExpression).filter { !$0.isWhitespace }

but it does not remove the initiator/terminator text, such as the {} part here:

(lldb) po text
"{\n    \"DNAHeader\": {},\n    \"ItemsSaleable\": []\n}\n"

(lldb) po childText
"{\"DNAHeader\":{},\"ItemsSaleable\":[]}"

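A likely explanation for the output above, offered as a sketch rather than a definitive diagnosis: a string passed with `options: .regularExpression` is handed to ICU verbatim, so the `#` characters added around the pattern in `regexPattern` must appear literally in the text. The text contains no `#`, nothing matches, and `childText` is simply the original text with its whitespace filtered out, which is exactly what was printed. Two further points: `.` does not match `\n` unless the `(?s)` flag (or `.dotMatchesLineSeparators`) is set, and a lazy `(.*?)` would stop at the first `}`, the one in `"DNAHeader": {}`. The variant below drops the `#`, escapes the braces by hand and uses a greedy capture; it assumes the text holds exactly one outer `{...}` block.

```swift
import Foundation

let text = "{\n    \"DNAHeader\": {},\n    \"ItemsSaleable\": []\n}\n"

// Braces escaped by hand; the greedy (.*) runs to the LAST "}".
let body = "\\s*\\{\\s*(.*)\\s*\\}\\s*"

// Shape of the original attempt: "#" around the pattern. ICU reads the "#"
// characters literally, the text has none, so nothing is replaced.
let unchanged = text.replacingOccurrences(of: "#" + body + "#", with: "$1", options: .regularExpression)

// Without "#"; the inline flag (?s) lets "." match "\n" as well.
let inner = text.replacingOccurrences(of: "(?s)" + body, with: "$1", options: .regularExpression)
let childText = inner.filter { !$0.isWhitespace }

print(unchanged == text)  // true
print(childText)          // "DNAHeader":{},"ItemsSaleable":[]
```

Filtering out every whitespace character, as in the question, also removes spaces inside JSON string values; that is harmless for this sample but not in general.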
Answer #1 by iqxoj9l9

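As said in the comments, what you currently have is JSON (you said not to focus on that, but...), and that makes building a regex rather difficult.
Since I suspect you are reading a stream, I created fake test values, though that may not be your case: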
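For the single-object text from the question, the JSON route mentioned here could look roughly like the sketch below. The helper name `compactInnerJSON(_:)` is hypothetical and only `JSONSerialization` is used: parse the text, re-serialize it without pretty-printing, then drop the outer braces. Unlike filtering out every whitespace character, this leaves spaces inside string values intact. `.sortedKeys` is there only to make the key order deterministic.

```swift
import Foundation

// Hypothetical helper (not from the answer): parses the text as JSON,
// re-serializes it without pretty-printing, then drops the outer "{" and "}".
func compactInnerJSON(_ text: String) -> String? {
    guard let object = try? JSONSerialization.jsonObject(with: Data(text.utf8)),
          object is [String: Any],  // only strip braces from a top-level object
          let data = try? JSONSerialization.data(withJSONObject: object, options: [.sortedKeys]),
          let compact = String(data: data, encoding: .utf8) else {
        return nil
    }
    // compact looks like {"DNAHeader":{},"ItemsSaleable":[]}
    return String(compact.dropFirst().dropLast())
}

let text = "{\n    \"DNAHeader\": {},\n    \"ItemsSaleable\": []\n}\n"
print(compactInnerJSON(text) ?? "not a JSON object")
```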

let startSeparator = "<OWS>{<OWS>"
let endSeparator = "<OWS>}<OWS>"

//Fake structure
struct Object: Codable {
    let id: Int
    let NDAHeader: Header
    let ItemsSaleable: [Saleable]
}
struct Header: Codable {}
struct Saleable: Codable {}

let encoder = JSONEncoder()
encoder.outputFormatting = .prettyPrinted
let str0 = embedStr(codable: Object(id: 0, NDAHeader: Header(), ItemsSaleable: []), with: encoder)
let str1 = embedStr(codable: Object(id: 1, NDAHeader: Header(), ItemsSaleable: []), with: encoder)
let str2 = embedStr(codable: Object(id: 2, NDAHeader: Header(), ItemsSaleable: []), with: encoder)
let str3 = embedStr(codable: Object(id: 3, NDAHeader: Header(), ItemsSaleable: []), with: encoder)

//Replace the opening `{` and closing `}` of the JSON with versions surrounded by <OWS>
func embedStr(codable: Codable, with encoder: JSONEncoder) -> String {
    let jsonData = try! encoder.encode(codable)
    var value = String(data: jsonData, encoding: .utf8)!
    value = startSeparator + String(value.dropFirst())
    value = String(value.dropLast()) + endSeparator
    return value
}

//Create a fake stream, by joining multiple JSON values, and "cut it"
func concate(strs: [String], dropStart: Int, dropEnd: Int) -> String {
    var value = strs.joined()
    value = String(value.dropFirst(dropStart))
    value = String(value.dropLast(dropEnd))
    return value
}

//Fake Streams
let concate0 = concate(strs: [str0], dropStart: 0, dropEnd: 0)
let concate1 = concate(strs: [str0, str1, str2], dropStart: 13, dropEnd: 13)
let concate2 = concate(strs: [str0, str1, str2, str3], dropStart: 20, dropEnd: 13)

The "extract/find" code:

//Here, if it's a stream, you could return the rest of `value`: it might be the start of a message, to be concatenated with the next part of the stream
//Side note: if it's a `Data`, `range(of:options:in:)` can be called on `Data` directly, avoiding a stringification if possible (e.g. going back to JSON to remove the pretty-printed format)
func analyze(str: String, found: ((String) -> Void)) {
    var value = str
    var start = value.range(of: startSeparator)

    //Better coding might be applied, it's more a proof of concept, but you should be able to grasp the logic:
    // Search for START to next END, return that captured part with closure `found`
    // Keep searching for the rest of the string.
    guard start != nil else { return }
    var end = value.range(of: endSeparator, range: start!.upperBound..<value.endIndex)

    while (start != nil && end != nil) {
        let sub = value[start!.upperBound..<end!.lowerBound]
        found("{" + String(sub) + "}") //The braces removed along with the <OWS> tags are added back here (hard-coded)
        value = String(value[end!.upperBound...])
        start = value.range(of: startSeparator)
        if start != nil {
            end = value.range(of: endSeparator, range: start!.upperBound..<value.endIndex)
        } else {
            end = nil
        }
    }
}
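The side note above about searching `Data` directly could look roughly like this. The function name `analyzeData(_:start:end:found:)` and the choice to return the unconsumed tail are assumptions, not part of the answer. It relies on Foundation's `Data.range(of:options:in:)` and hands back the bytes after the last complete message, so a stream reader could prepend them to the next chunk:

```swift
import Foundation

// Hypothetical Data-based variant of analyze(str:found:): searches the raw bytes
// with Data.range(of:options:in:) instead of converting them to a String.
// Returns the bytes after the last complete message (in a stream, possibly the
// beginning of the next one).
func analyzeData(_ data: Data, start: Data, end: Data, found: (Data) -> Void) -> Data {
    var searchFrom = data.startIndex
    // Find START, then the next END after it, and report the bytes in between.
    while searchFrom < data.endIndex,
          let s = data.range(of: start, in: searchFrom..<data.endIndex),
          s.upperBound < data.endIndex,
          let e = data.range(of: end, in: s.upperBound..<data.endIndex) {
        found(data.subdata(in: s.upperBound..<e.lowerBound))
        // Keep searching after this END.
        searchFrom = e.upperBound
    }
    return data.subdata(in: searchFrom..<data.endIndex)
}

let startData = Data("<OWS>{<OWS>".utf8)
let endData = Data("<OWS>}<OWS>".utf8)
let chunk = Data("xx<OWS>{<OWS>a<OWS>}<OWS><OWS>{<OWS>b".utf8)
var messages: [String] = []
let rest = analyzeData(chunk, start: startData, end: endData) { messages.append(String(decoding: $0, as: UTF8.self)) }
print(messages)                               // ["a"]
print(String(decoding: rest, as: UTF8.self))  // <OWS>{<OWS>b
```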

Test:

func test(str: String) {
    print("In \(str.debugDescription)")
    analyze(str: str) { match in
        print("Found \(match.debugDescription)")

        //The next part isn't beautiful, but it's one of the safest ways to get rid of the spaces/\n added by pretty printing
        let withoutPrettyPrintedData = try! JSONSerialization.data(withJSONObject: try! JSONSerialization.jsonObject(with: Data(match.utf8)))
        print("Cleaned: \(String(data: withoutPrettyPrintedData, encoding: .utf8)!.debugDescription)")
    }
    print("")
}

//Test the fake streams
[concate0, concate1, concate2].forEach {
    test(str: $0)
}

I use debugDescription so that the "\n" characters are visible in the console.
Output:

$>In "<OWS>{<OWS>\n  \"id\" : 0,\n  \"ItemsSaleable\" : [\n\n  ],\n  \"NDAHeader\" : {\n\n  }\n<OWS>}<OWS>"
$>Found "{\n  \"id\" : 0,\n  \"ItemsSaleable\" : [\n\n  ],\n  \"NDAHeader\" : {\n\n  }\n}"
$>Cleaned: "{\"id\":0,\"ItemsSaleable\":[],\"NDAHeader\":{}}"

$>In " \"id\" : 0,\n  \"ItemsSaleable\" : [\n\n  ],\n  \"NDAHeader\" : {\n\n  }\n<OWS>}<OWS><OWS>{<OWS>\n  \"id\" : 1,\n  \"ItemsSaleable\" : [\n\n  ],\n  \"NDAHeader\" : {\n\n  }\n<OWS>}<OWS><OWS>{<OWS>\n  \"id\" : 2,\n  \"ItemsSaleable\" : [\n\n  ],\n  \"NDAHeader\" : {\n\n  "
$>Found "{\n  \"id\" : 1,\n  \"ItemsSaleable\" : [\n\n  ],\n  \"NDAHeader\" : {\n\n  }\n}"
$>Cleaned: "{\"id\":1,\"ItemsSaleable\":[],\"NDAHeader\":{}}"

$>In " 0,\n  \"ItemsSaleable\" : [\n\n  ],\n  \"NDAHeader\" : {\n\n  }\n<OWS>}<OWS><OWS>{<OWS>\n  \"id\" : 1,\n  \"ItemsSaleable\" : [\n\n  ],\n  \"NDAHeader\" : {\n\n  }\n<OWS>}<OWS><OWS>{<OWS>\n  \"id\" : 2,\n  \"ItemsSaleable\" : [\n\n  ],\n  \"NDAHeader\" : {\n\n  }\n<OWS>}<OWS><OWS>{<OWS>\n  \"id\" : 3,\n  \"ItemsSaleable\" : [\n\n  ],\n  \"NDAHeader\" : {\n\n  "
$>Found "{\n  \"id\" : 1,\n  \"ItemsSaleable\" : [\n\n  ],\n  \"NDAHeader\" : {\n\n  }\n}"
$>Cleaned: "{\"id\":1,\"ItemsSaleable\":[],\"NDAHeader\":{}}"
$>Found "{\n  \"id\" : 2,\n  \"ItemsSaleable\" : [\n\n  ],\n  \"NDAHeader\" : {\n\n  }\n}"
$>Cleaned: "{\"id\":2,\"ItemsSaleable\":[],\"NDAHeader\":{}}"

Answer #2 by cidc1ykv

If you can use RegexBuilder (available on iOS 16 and macOS 13), the following code works (see the function extracted()):
import RegexBuilder
import SwiftUI

struct MyView: View {
    let text = "{\n    \"DNAHeader\": {},\n    \"ItemsSaleable\": []\n}\n"
    
    var body: some View {
        VStack {
            Text(text)
            Text(extracted(text) ?? "Not found")
        }
    }
    
    private func extracted(_ text: String) -> String? {
        
        let initiator = Regex {
            ZeroOrMore(.horizontalWhitespace)
            "{"
            ZeroOrMore(.horizontalWhitespace)
        }

        let terminator = Regex {
            ZeroOrMore(.horizontalWhitespace)
            "}"
            ZeroOrMore(.horizontalWhitespace)
        }

        // Read the text and capture only what's in between the
        // initiator and terminator
        let searchJSON = Regex {
            initiator
            Capture { OneOrMore(.any) }     // The real contents
            terminator
        }

        // Get both the whole match and the captured string
        if let match = text.firstMatch(of: searchJSON) {
            let (wholeMatch, extractedString) = match.output
            
            print(wholeMatch)
            
            // Remove spaces and line feeds before returning
            return String(extractedString
                .replacing("\n", with: "")
                .replacing(" ", with: ""))
            
        } else {
            return nil
        }

    }
}
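The initiator and terminator above use `.horizontalWhitespace`, which does not match `\n`, whereas `<OWSP>` in the question was defined as `\s*`. A SwiftUI-free variant, sketched with the hypothetical name `extractBody(_:)`, uses `.whitespace` (horizontal and vertical whitespace, like `\s`) and applies the question's `.filter { !$0.isWhitespace }` to the capture. As in the original, the greedy `OneOrMore(.any)` runs to the last `}`, so it assumes one outer block, and the filter also strips spaces inside string values.

```swift
import RegexBuilder

// Hypothetical SwiftUI-free variant of extracted(_:), using .whitespace
// (like \s in the question's <OWSP>) instead of .horizontalWhitespace.
@available(macOS 13.0, iOS 16.0, *)
func extractBody(_ text: String) -> String? {
    let regex = Regex {
        ZeroOrMore(.whitespace)
        "{"
        ZeroOrMore(.whitespace)
        // Greedy: runs to the LAST "}" so nested {} stay inside the capture.
        Capture { OneOrMore(.any) }
        ZeroOrMore(.whitespace)
        "}"
        ZeroOrMore(.whitespace)
    }
    guard let match = text.firstMatch(of: regex) else { return nil }
    // Same clean-up as the question: drop every whitespace character.
    return String(match.output.1.filter { !$0.isWhitespace })
}

if #available(macOS 13.0, iOS 16.0, *) {
    print(extractBody("{\n    \"DNAHeader\": {},\n    \"ItemsSaleable\": []\n}\n") ?? "Not found")
}
```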
