How do I filter a long (dynamic) string with a regex?

cngwdvgl, posted on 2021-07-13 in Java

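I have stored the response from a web application in a string. The string contains multiple URLs, and the number is dynamic: anywhere between 10 and 1000 URLs.
I work in performance engineering, but this time I have to write a plugin in Java, and I am far from an expert programmer.
My problem is that the response string contains a lot of gibberish I don't need, and I don't know how to filter it out. In my print/request I only want to send the URLs.
This is how far I've got: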

String responseData = "http://xxxx-f.akamaihd.net/i/world/open/20150426/1370235-005A/EPISOD-65354-005A-016f1729028090bf_,892,144,252,360,540,1584,2700,.mp4.csmil/segment1_4_av.ts?null=" +
        "#EXTINF:10.000, " +
        "http://xxxxx-f.akamaihd.net/i/world/open/20150426/1370235-005A/EPISOD-65365-005A-016f1729028090bf_,892,144,252,360,540,1584,2700,.mp4.csmil/segment2_4_av.ts?null=" +
        "#EXTINF:fgsgsmoregiberish, " +
        "http://xxxx-f.akamaihd.net/i/world/open/20150426/1370235-005A/EPISOD-6353-005A-016f1729028090bf_,892,144,252,360,540,1584,2700,.mp4.csmil/segment2_4_av.ts?null=";

String pattern = "^(http://.*\\.ts)";

Pattern pr = Pattern.compile(pattern);

Matcher matcher = pr.matcher(responseData);

if (matcher.find()) {
    System.out.println(matcher.group());
    // This prints everything from the response. I only want the URLs
    // (dynamic; the names may differ, but they all start with http and end with .ts).
} else {
    System.out.println("No match");
}

7kjnsjlb1#

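Depending on what your URLs look like, you can use this simple pattern, which works for your example and stops at the ? (written as a Java string literal):

\\bhttps?://[^?\\s]+

To make sure the match ends with .ts, you can change it to:

\\bhttps?://[^?\\s]+\\.ts

or

\\bhttps?://[^?\\s]+\\.ts(?=[\\s?]|\\z)

to check that the end of the path has been reached.
Note that these patterns do not handle URLs that contain spaces between double quotes.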
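
To collect every URL rather than only the first one, the pattern can be applied in a loop with Matcher.find(). The following is a minimal sketch, not a definitive implementation: the class name TsUrlExtractor and the shortened example.com input are assumptions made for illustration and do not come from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
final class TsUrlExtractor {
    // Pattern from the answer above: a URL ending in ".ts" that is followed
    // by whitespace, a '?', or the end of the input.
    private static final Pattern TS_URL =
            Pattern.compile("\\bhttps?://[^?\\s]+\\.ts(?=[\\s?]|\\z)");

    // Returns every match in order of appearance.
    static List<String> extract(String responseData) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = TS_URL.matcher(responseData);
        while (matcher.find()) {          // find() advances to the next match each call
            urls.add(matcher.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the real response (assumed input).
        String sample = "http://example.com/a/segment1_4_av.ts?null=#EXTINF:10.000, "
                + "http://example.com/a/segment2_4_av.ts?null=";
        for (String url : extract(sample)) {
            System.out.println(url);
        }
    }
}
```

The key change from the code in the question is replacing the single if (find()) with a while loop, since each call to find() returns only the next match.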


lf5gs5x22#

Just make your regex lazy ( .*? ) instead of greedy ( .* ), i.e.:

pr = Pattern.compile("(https?.*?\\.ts)");

Regex demo:
https://regex101.com/r/nq5pa7/1
Regex explanation:

(https?.*?\.ts)

Match the regex below and capture its match into backreference number 1 «(https?.*?\.ts)»
   Match the character string “http” literally (case sensitive) «http»
   Match the character “s” literally (case sensitive) «s?»
      Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
   Match any single character that is NOT a line break character (line feed, carriage return, next line, line separator, paragraph separator) «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
   Match the character “.” literally «\.»
   Match the character string “ts” literally (case sensitive) «ts»
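
The difference between the two quantifiers can be seen by running both variants over the same input. This is a small sketch under stated assumptions: the class LazyVsGreedy, the helper findAll, and the shortened example.com input are hypothetical and only stand in for the real response.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, named here only for illustration.
final class LazyVsGreedy {
    // Collects every match of the given regex in the input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the real response (assumed input).
        String input = "http://example.com/one.ts?null=#EXTINF:10.000, "
                + "http://example.com/two.ts?null=";

        // Greedy: .* runs to the last ".ts", so both URLs end up in one match.
        System.out.println(findAll("(https?.*\\.ts)", input).size());   // prints 1

        // Lazy: .*? stops at the first ".ts", so each URL is a separate match.
        System.out.println(findAll("(https?.*?\\.ts)", input).size());  // prints 2
    }
}
```

One caveat of the lazy version: it stops at the first .ts it meets, so a URL that contains .ts earlier in its host or path would be cut short at that point.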

qacovj5a3#

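Use the following regex pattern:

(((http|ftp|https):\/{2})+(([0-9a-z_-]+\.)+([a-z]{2,4})(:[0-9]+)?((\/([~0-9a-zA-Z\#\+\%@\.\/_-]+))?(\?[0-9a-zA-Z\+\%@\/&\[\];=_-]+)?)?))\b

Explanation:
The scheme, one of http, https, or ftp, followed by //: ((http|ftp|https):\/{2}) The + after this group is a quantifier (one or more occurrences); the next part of the pattern then continues in the same string.
A host label followed by a dot, repeated one or more times: ([0-9a-z_-]+\.)+
The top-level domain: ([a-z]{2,4})
An optional port number (here ? means zero or one occurrence): (:[0-9]+)?
The rest of the URL (path and query string), which occurs zero or one time: ((\/([~0-9a-zA-Z\#\+\%@\.\/_-]+))?(\?[0-9a-zA-Z\+\%@\/&\[\];=_-]+)?)?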
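
As a quick check, the pattern can be tried from Java by doubling every backslash in the string literal. The sketch below is only illustrative: the class GenericUrlFinder and the example.com inputs are assumptions, not part of the answer. It also suggests a limitation worth knowing about: the path character class does not include a comma, so URLs like the ones in the question, whose paths contain commas, appear to be cut at the first comma.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, named here only for illustration.
final class GenericUrlFinder {
    // The pattern from this answer, with backslashes doubled for a Java string literal.
    private static final Pattern URL = Pattern.compile(
            "(((http|ftp|https):\\/{2})+(([0-9a-z_-]+\\.)+([a-z]{2,4})(:[0-9]+)?"
            + "((\\/([~0-9a-zA-Z\\#\\+\\%@\\.\\/_-]+))?"
            + "(\\?[0-9a-zA-Z\\+\\%@\\/&\\[\\];=_-]+)?)?))\\b");

    // Collects every match of the pattern in the input.
    static List<String> findAll(String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = URL.matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // A plain URL is matched in full.
        System.out.println(findAll("see http://example.com/path/file.ts now"));
        // A comma in the path ends the match early, because ',' is not in the path class.
        System.out.println(findAll("http://example.com/a,b/seg.ts"));
    }
}
```

If the real URLs contain commas, adding a comma to the path character class, or using one of the simpler patterns from the other answers, may be a better fit.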
