.NET: How do I write a regular expression pattern that validates a URI?

Posted by ruoxqz4g 12 months ago in .NET

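How do I write a regular expression that matches every valid URI string and fails to match every invalid one?
To be specific about what I mean by URI, I have added a link below to the most recent URI RFC standard. It defines the entity I want to validate with a regular expression.
I don't need it to parse the URI; I only need a regular expression for validation.
The .NET regular expression format is preferred (.NET v1.1).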

My current solution:

^([a-zA-Z0-9+.-]+):(//([a-zA-Z0-9-._~!$&'()*+,;=:]*)@)?([a-zA-Z0-9-._~!$&'()*+,;=]+)(:(\\d*))?(/?[a-zA-Z0-9-._~!$&'()*+,;=:/]+)?(\\?[a-zA-Z0-9-._~!$&'()*+,;=:/?@]+)?(#[a-zA-Z0-9-._~!$&'()*+,;=:/?@]+)?$(:(\\d*))?(/?[a-zA-Z0-9-._~!$&'()*+,;=:/]+)?(\?[a-zA-Z0-9-._~!$&'()*+,;=:/?@]+)?(\#[a-zA-Z0-9-._~!$&'()*+,;=:/?@]+)?$


Answer #2 (jpfvwuh4)

The URI specification says:

  "The following line is the regular expression for breaking-down a well-formed URI reference into its components."

  ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
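(I guess this is the same regex as the one in the STD 66 link given in another answer.)
But breaking down is not validating. To validate a URI correctly, you have to translate the BNF for URIs into a regex. Some BNFs cannot be expressed as regular expressions, but I think this one can. It shouldn't be done, though: the result would be a huge mess. It is better to use a library function.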
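
As a rough illustration of the library route, here is a minimal JavaScript sketch. It assumes the built-in URL class is an acceptable stand-in; that class implements the WHATWG URL Standard rather than the RFC 3986 grammar, so it accepts and rejects a somewhat different set of strings. The helper name isParsableUrl is made up for this example. In .NET, the System.Uri class (for example Uri.TryCreate) fills the same role.

```javascript
// Returns true when the platform's URL parser accepts the string.
// Assumption: WHATWG URL parsing is "good enough" for the use case;
// it is not a strict RFC 3986 validator.
function isParsableUrl(candidate) {
  try {
    new URL(candidate); // throws a TypeError when parsing fails
    return true;
  } catch {
    return false;
  }
}

console.log(isParsableUrl('https://example.com/path?query=value#fragment'));
console.log(isParsableUrl('invalid-url'));
```

The second call fails because the string has no scheme and no base URL is supplied.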

Answer #3 (arknldoa)

This site looks promising: http://snipplr.com/view/6889/regular-expressions-for-uri-validationparsing/
They propose the following regular expression:

/^([a-z0-9+.-]+):(?://(?:((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*)@)?((?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*)(?::(\d*))?(/(?:[a-z0-9-._~!$&'()*+,;=:@/]|%[0-9A-F]{2})*)?|(/?(?:[a-z0-9-._~!$&'()*+,;=:@]|%[0-9A-F]{2})+(?:[a-z0-9-._~!$&'()*+,;=:@/]|%[0-9A-F]{2})*)?)(?:\?((?:[a-z0-9-._~!$&'()*+,;=:/?@]|%[0-9A-F]{2})*))?(?:#((?:[a-z0-9-._~!$&'()*+,;=:/?@]|%[0-9A-F]{2})*))?$/i
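
The pattern is written with / as its delimiter and contains unescaped slashes, so it cannot be pasted into JavaScript as a regex literal. Below is a minimal sketch of using it as a pure validator, assuming JavaScript and the RegExp constructor. isValidUri is a hypothetical helper name, and the pattern string is copied from the snippet without its /.../i delimiters.

```javascript
// String.raw keeps backslashes such as \d and \? exactly as written.
const snipplrSource = String.raw`^([a-z0-9+.-]+):(?://(?:((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*)@)?((?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*)(?::(\d*))?(/(?:[a-z0-9-._~!$&'()*+,;=:@/]|%[0-9A-F]{2})*)?|(/?(?:[a-z0-9-._~!$&'()*+,;=:@]|%[0-9A-F]{2})+(?:[a-z0-9-._~!$&'()*+,;=:@/]|%[0-9A-F]{2})*)?)(?:\?((?:[a-z0-9-._~!$&'()*+,;=:/?@]|%[0-9A-F]{2})*))?(?:#((?:[a-z0-9-._~!$&'()*+,;=:/?@]|%[0-9A-F]{2})*))?$`;

// The "i" flag replaces the trailing /i of the original notation.
const snipplrRegex = new RegExp(snipplrSource, 'i');

// Validation only: report whether the whole string matches.
function isValidUri(candidate) {
  return snipplrRegex.test(candidate);
}

console.log(isValidUri('http://example.com/a%20b?x=1#top'));
console.log(isValidUri('http://example.com/a b'));
```

The first string matches; the second is rejected because a raw space is not in any of the character classes.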


Answer #4 (x8diyxa7)

The best regular expression I could come up with based on RFC 3986 (https://www.rfc-editor.org/rfc/rfc3986) is the following:



// named groups
/^(?<scheme>[a-z][a-z0-9+.-]+):(?<authority>\/\/(?<user>[^@]+@)?(?<host>[a-z0-9.\-_~]+)(?<port>:\d+)?)?(?<path>(?:[a-z0-9-._~]|%[a-f0-9]{2}|[!$&'()*+,;=:@])+(?:\/(?:[a-z0-9-._~]|%[a-f0-9]{2}|[!$&'()*+,;=:@])*)*|(?:\/(?:[a-z0-9-._~]|%[a-f0-9]{2}|[!$&'()*+,;=:@])+)*)?(?<query>\?(?:[a-z0-9-._~]|%[a-f0-9]{2}|[!$&'()*+,;=:@]|[/?])+)?(?<fragment>\#(?:[a-z0-9-._~]|%[a-f0-9]{2}|[!$&'()*+,;=:@]|[/?])+)?$/i

// unnamed groups
/^([a-z][a-z0-9+.-]+):(\/\/([^@]+@)?([a-z0-9.\-_~]+)(:\d+)?)?((?:[a-z0-9-._~]|%[a-f0-9]{2}|[!$&'()*+,;=:@])+(?:\/(?:[a-z0-9-._~]|%[a-f0-9]{2}|[!$&'()*+,;=:@])*)*|(?:\/(?:[a-z0-9-._~]|%[a-f0-9]{2}|[!$&'()*+,;=:@])+)*)?(\?(?:[a-z0-9-._~]|%[a-f0-9]{2}|[!$&'()*+,;=:@]|[/?])+)?(\#(?:[a-z0-9-._~]|%[a-f0-9]{2}|[!$&'()*+,;=:@]|[/?])+)?$/i

Capture groups:
1. scheme
2. authority
   3. userinfo (named group "user")
   4. host
   5. port
6. path
7. query
8. fragment

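
A short JavaScript sketch of how the named groups can be read. The pattern is restated from named pieces so that the repeated character classes appear only once; it is meant to be equivalent to the named-group version above, with percent-escapes written as % plus two hex digits as RFC 3986 defines them. pchar, qchar and namedUriRegex are names chosen for this example.

```javascript
// One path character: unreserved, percent-escape, or sub-delims plus ":" and "@".
const pchar = String.raw`(?:[a-z0-9-._~]|%[a-f0-9]{2}|[!$&'()*+,;=:@])`;
// Query and fragment characters additionally allow "/" and "?".
const qchar = String.raw`(?:[a-z0-9-._~]|%[a-f0-9]{2}|[!$&'()*+,;=:@]|[/?])`;

const namedUriRegex = new RegExp(
  String.raw`^(?<scheme>[a-z][a-z0-9+.-]+):` +
  String.raw`(?<authority>//(?<user>[^@]+@)?(?<host>[a-z0-9.\-_~]+)(?<port>:\d+)?)?` +
  String.raw`(?<path>${pchar}+(?:/${pchar}*)*|(?:/${pchar}+)*)?` +
  String.raw`(?<query>\?${qchar}+)?` +
  String.raw`(?<fragment>#${qchar}+)?$`,
  'i'
);

const namedMatch = namedUriRegex.exec('https://user@example.com:8080/a/b?x=1#top');
console.log(namedMatch.groups.scheme, namedMatch.groups.host, namedMatch.groups.path);
```

Note that the user, port, query and fragment groups keep their delimiters (user@, :8080, ?x=1, #top), and the authority group keeps the leading //.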
Answer #5 (wqnecbli)

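The best and most definitive guide I have found is here: http://jmrware.com/articles/2009/uri_regexp/URI_regex.html (to answer your question, see the URI table entry).
Table 2 lists all the rules from RFC 3986 together with a regex implementation of each rule.
There is a JavaScript implementation here: https://github.com/jhermsmeier/uri.regex
For reference, the URI regex is repeated below: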

# RFC-3986 URI component:  URI
[A-Za-z][A-Za-z0-9+\-.]* :                                      # scheme ":"
(?: //                                                          # hier-part
  (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})* @)?
  (?:
    \[
    (?:
      (?:
        (?:                                                    (?:[0-9A-Fa-f]{1,4}:)    {6}
        |                                                   :: (?:[0-9A-Fa-f]{1,4}:)    {5}
        | (?:                            [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:)    {4}
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:)    {3}
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:)    {2}
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}:
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? ::
        ) (?:
            [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4}
          | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3}
                (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
          )
      |   (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}
      |   (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? ::
      )
    | [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+
    )
    \]
  | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
       (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
  | (?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*
  )
  (?: : [0-9]* )?
  (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
| /
  (?:    (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+
    (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
  )?
|        (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+
    (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
|
)
(?:\? (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )?   # [ "?" query ]
(?:\# (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )?   # [ "#" fragment ]
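
The pattern above is written in free-spacing style, with layout whitespace and # comments. .NET can consume it as-is when it is compiled with RegexOptions.IgnorePatternWhitespace; JavaScript has no equivalent flag, so the layout has to be stripped first. The sketch below shows one way to do that, assuming the pattern has no literal space or # inside a character class. compileVerbose is a hypothetical helper name, and only the IPv4 alternative is reproduced to keep the example short. The pattern is also unanchored, so the sketch adds ^ and $ to use it for validation.

```javascript
// Removes layout whitespace and "# ..." comments from a free-spacing pattern.
// Escaped characters such as "\#" or "\." are kept untouched.
// Assumption: no literal space or "#" occurs inside a character class.
function compileVerbose(source, flags = '') {
  const compact = source.replace(
    /(\\.)|\s+|#.*$/gm,
    (match, escaped) => escaped ?? ''
  );
  return new RegExp(compact, flags);
}

// Excerpt: the IPv4 alternative of the pattern above, anchored for validation.
const ipv4Regex = compileVerbose(String.raw`
  ^
  (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}   # three octets, each followed by a dot
     (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)         # final octet
  $
`);

console.log(ipv4Regex.test('192.168.0.1'));
console.log(ipv4Regex.test('256.1.1.1'));
```

The same helper should work for the full pattern, since its only # characters are comments or the escaped \# before the fragment.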


Answer #6 (fruv7luv)

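Are there specific URIs you care about, or are you trying to find a single regex that validates STD 66?
I would point you to this regex for parsing a URI. In theory, you could then check whether all the elements you care about are present.
But I think bdukes's answer is better.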
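
One way to read this suggestion, sketched in JavaScript: split the string with the decomposition regex from RFC 3986, Appendix B (assumed here to be the regex meant), then test the components you care about. splitUriReference and hasSchemeAndAuthority are hypothetical names, and the "scheme plus authority" rule is only an example policy.

```javascript
// Decomposition regex from RFC 3986, Appendix B. Every part is optional,
// so it matches any string; it splits, it does not validate.
const rfc3986Split = /^(([^:/?#]+):)?(\/\/([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

function splitUriReference(reference) {
  const m = rfc3986Split.exec(reference);
  return {
    scheme: m[2],
    authority: m[4],
    path: m[5],
    query: m[7],
    fragment: m[9],
  };
}

// Example policy (an assumption): require both a scheme and an authority.
function hasSchemeAndAuthority(reference) {
  const { scheme, authority } = splitUriReference(reference);
  return scheme !== undefined && authority !== undefined;
}

console.log(splitUriReference('http://www.ics.uci.edu/pub/ietf/uri/#Related'));
console.log(hasSchemeAndAuthority('not a uri'));
```

For 'not a uri' the regex still matches and puts the whole string into path, which is why the component check afterwards is needed.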

Answer #7 (uqcuzwp8)

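For the JS people: if you are happy with the tests, take a look at the lines below. Note that this regex comes from wizard 04.
Happy coding!

// Splits a URI into its components and throws if the string does not match.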

function explodeUri(str) {
    let regexUri = /^([a-z][a-z0-9+.-]*):(?:\/\/((?:(?=((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*))(\3)@)?(?=(\[[0-9A-F:.]{2,}\]|(?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*))\5(?::(?=(\d*))\6)?)(\/(?=((?:[a-z0-9-._~!$&'()*+,;=:@\/]|%[0-9A-F]{2})*))\8)?|(\/?(?!\/)(?=((?:[a-z0-9-._~!$&'()*+,;=:@\/]|%[0-9A-F]{2})*))\10)?)(?:\?(?=((?:[a-z0-9-._~!$&'()*+,;=:@\/?]|%[0-9A-F]{2})*))\11)?(?:#(?=((?:[a-z0-9-._~!$&'()*+,;=:@\/?]|%[0-9A-F]{2})*))\12)?$/i;
    let match = str.match(regexUri);
    if (!match) {
        throw new Error('Invalid URI');
    }

    return {
        scheme: match[1],           // 1 == scheme
        userinfo: match[4],         // 4 == userinfo
        host: match[5],             // 5 == host
        port: match[6],             // 6 == port
        path: match[7] || match[9], // 7 if it has an authority, 9 if it doesn't
        query: match[11],           // 11 == query
        fragment: match[12]         // 12 == fragment
    };

/*    let { // scheme://user:info@host:port/path?query=val#fragment
        scheme, userinfo, host, port,
        path, query, fragment,
    } = explodeUri(u);                                           */
} ///https://snipplr.com/view/6889?codeview= Thanks bro!

function testUriValidation() {
    const validUrls = [
        'https://www.example.com/path/to/resource?query=value#fragment',
        'ftp://username:[email protected]:21/path/to/file',
        'http://subdomain.example.co.uk/page?param=value',
        'mailto:[email protected]',
        'tel:+123456789',
        'file:///path/to/file.txt',
        'data:text/plain;base64,SGVsbG8sIFdvcmxkIQ%3D%3D',
        'https://[2001:db8::1]/path',               // IPv6 address
        'https://[email protected]',                  // IPv4 address with userinfo
        'http://user:pass@[::1]:8080/path',         // IPv6 with userinfo and port
        'http://example.com/a%20b',                 // URL-encoded path
        'https://example.com/%E2%82%AC100',         // URL-encoded path segment
        'http://example.com?name=John%20Doe',       // URL-encoded query parameter
        'https://example.com#section-3.4',          // URL fragment
        'https://example.com:8080',                 // URL with port
        'https://example.com:8080/path?query=value', // URL with path and query
    ];

    const invalidUrls = [
        'invalid-url',                          // Missing scheme
        'http://www.example.com:8080:',          // Invalid port
        'ftp://user:pass@invalid host/',         // Invalid characters in the host
        'http://[::1]',                          // IPv6 without a scheme
        'https://[email protected]:invalid',      // Invalid port
        'http://example.com/ path with space',   // Invalid space in the path
        "https://example.com?invalid=query",     // Invalid character in the query
        'ftp://example.com:8080#invalid-fragment', // Invalid character in the fragment
        "http://example.com:8080/path?query#fragment", // Both query and fragment without '='
        "http://:8080/path",                     // Empty authority
        'ftp://user@:21/path',                   // Empty host in authority
        'http://example.com/path?query#fragment%zz', // Invalid percent encoding
        'schema://user:[email protected]:1234?option=value#rowid'
    ];

    const uri = 'https://username:[email protected]:8080/path/to/resource?query=value#fragment';

    const explodedUri = explodeUri(uri);

    console.log(explodedUri);

    console.log('Valid URLs:');
    validUrls.forEach((url) => {
        console.log("validating: ", url);
        try {
            const explodedUri = explodeUri(url);
            console.log('O:', url, explodedUri);
        } catch (error) {
            console.log('X', url, 'Error:', error.message);
        }
    });

    console.log('\nInvalid URLs:');
    invalidUrls.forEach((url) => {
        try {
            const explodedUri = explodeUri(url);
            console.log('X:', url, 'Exploded URI:', explodedUri);
        } catch (error) {
            console.log('ERROR: ', url, error.message);
        }
    });
}

testUriValidation();

