Flink: How to capture the values of a list inside JSON using regex

Posted by ki0zmccv on 2023-11-15 in Apache


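I have a JSON fragment that looks like this:
{"tokenType":"email","tokenList":["token1","token2","token3","token4"]}
I have a Flink job that logs the payload it receives from Kafka, and I need to mask these token values (emails/phone numbers) in my logs. For that we use a utility that masks the first x characters of every capturing group (starting at index 1).
My problem is that I cannot find a regex that captures all of these tokens when the list can be of variable length.
I can write a regex that captures the whole list, but then the entire match is treated as a single captured group, and only its first x characters are masked. I would like my logs to look like this, with x = 3:
{"tokenType":"email","tokenList":["***en1","***en2","***en3","***en4"]}
I need a regex that captures the values of these tokens.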
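To illustrate the difficulty described above, here is a minimal sketch in plain Java (the class and method names are hypothetical, not part of the masking utility). It assumes the obvious attempt of putting a capturing group inside a quantifier. In `java.util.regex`, such a group keeps only the text of its last iteration, so a single match cannot expose one group per token.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Matches the whole list with a repeated group and returns what group 1 holds afterwards.
    static String lastCapture(String json) {
        Pattern p = Pattern.compile("\"tokenList\":\\[(?:\"([^\"]*)\",?)+\\]");
        Matcher m = p.matcher(json);
        // Group 1 is overwritten on every iteration of the quantifier,
        // so only the final list element survives.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String json = "{\"tokenType\":\"email\",\"tokenList\":[\"token1\",\"token2\",\"token3\",\"token4\"]}";
        System.out.println(lastCapture(json)); // prints token4
    }
}
```

Because the number of groups is fixed when the pattern is compiled, a variable-length list needs either repeated `find()` calls or a real JSON parser.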

apache-flink

Source: https://stackoverflow.com/questions/77360856/how-to-capture-the-values-of-a-list-inside-json-using-regex

2 Answers

iqjalb3h1#

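If you are dealing with a one-to-many mapping, you will probably want to handle it with a **flatMap()** function, which supports creating multiple elements from a single one. Since your payload is already a JSON string, you can parse it into a structured JSON object, extract the elements you need, and send them downstream.
For example: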

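// Assumes plain (unshaded) Jackson is on the classpath
import com.fasterxml.jackson.databind.JsonNode
import com.fasterxml.jackson.databind.ObjectMapper
import org.apache.flink.api.common.functions.FlatMapFunction
import org.apache.flink.util.Collector

// Function to take in a single JSON string and output multiple string elements
class TokenMaskingFunction : FlatMapFunction<String, String> {
    override fun flatMap(input: String, collector: Collector<String>) {
        try {
            // Construct your JSON node
            val structuredJson: JsonNode = mapper.readTree(input)

            // Capture your tokens; get() returns null when the field is missing
            val tokens: JsonNode? = structuredJson.get("tokenList")
            if (tokens != null && tokens.isArray) {
                for (token in tokens) {
                    // asText() returns the raw value; toString() would keep the surrounding quotes
                    val maskedToken = mask(token.asText())
                    collector.collect(maskedToken)
                }
            }
        } catch (e: Exception) {
            // Handle accordingly / log / ignore
        }
    }

    private fun mask(token: String): String {
        // Placeholder: masks the first 3 characters as in the question; do your masking here
        val n = minOf(3, token.length)
        return "*".repeat(n) + token.substring(n)
    }

    companion object {
        // Define some static JSON parser here (static is important)
        val mapper = ObjectMapper()
    }
}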

Then you just need to use it after reading from your JSON string source:

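streamEnv
    // fromSource also expects a watermark strategy and a source name; the name here is a placeholder
    .fromSource(yourJsonStringSource, WatermarkStrategy.noWatermarks(), "json-source")
    .flatMap(TokenMaskingFunction())
    .process(...)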


Posted 2023-11-15

xcitsw882#

I put together a rough solution with the Pattern class. It is not very efficient, but it works for the given example.

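import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenMasker {
    public static void main(String[] args) {
        String val = "{\"tokenType\":\"email\",\"tokenList\":[\"token1\",\"token2\",\"token3\",\"token4\"]}";
        // \G(?!^) resumes where the previous match ended, so each find() picks up the next list element
        Pattern p = Pattern.compile("(?:\\G(?!^)|\"tokenList\":\\[)\"(?<elem>[^\"]*)\",?");
        Matcher m = p.matcher(val);
        int n = 3; // number of leading characters to mask
        StringBuilder builder = new StringBuilder();
        while (m.find()) {
            String elem = m.group("elem");
            // Clamp so that tokens shorter than n do not throw StringIndexOutOfBoundsException
            int k = Math.min(n, elem.length());
            String masked = "*".repeat(k).concat(elem.substring(k));
            // quoteReplacement keeps '$' and '\' in a token from being read as group references
            m.appendReplacement(builder, Matcher.quoteReplacement(m.group().replace(elem, masked)));
        }
        m.appendTail(builder);
        System.out.println(builder);
    }
}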
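The `\G` approach above can also be wrapped in a reusable method. The following is only a sketch under a few assumptions: the class and method names are hypothetical, the pattern is a variation that additionally tolerates whitespace around `:`, `[` and `,`, and tokens are assumed not to contain escaped quotes (`\"`). It copies the input around each captured token instead of calling `appendReplacement`, so no replacement-string escaping is needed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenListMasker {
    // First alternative: continue right after the previous token (\G), skipping the comma.
    // Second alternative: start a new chain just after the opening bracket of "tokenList".
    private static final Pattern TOKEN = Pattern.compile(
            "(?:\\G(?!^)\\s*,\\s*|\"tokenList\"\\s*:\\s*\\[\\s*)\"([^\"]*)\"");

    // Masks the first n characters of every element of tokenList in the given JSON text.
    static String maskTokens(String json, int n) {
        Matcher m = TOKEN.matcher(json);
        StringBuilder out = new StringBuilder();
        int lastEnd = 0;
        while (m.find()) {
            String token = m.group(1);
            int k = Math.min(n, token.length()); // short tokens are masked entirely
            // Copy the untouched text before the token, then the masked token
            out.append(json, lastEnd, m.start(1));
            out.append("*".repeat(k)).append(token.substring(k));
            lastEnd = m.end(1);
        }
        out.append(json, lastEnd, json.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String json = "{\"tokenType\":\"email\",\"tokenList\":[\"token1\",\"token2\",\"token3\",\"token4\"]}";
        // prints {"tokenType":"email","tokenList":["***en1","***en2","***en3","***en4"]}
        System.out.println(maskTokens(json, 3));
    }
}
```

Each `find()` call yields exactly one token in group 1, which is what lets a fixed pattern cover a list of any length. `String.repeat` requires Java 11 or later.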


Posted 2023-11-15
