Regex to get an Instagram picture URL (PHP)

ukdjmx9f posted 12 months ago in PHP

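I'm trying to use a regular expression to check whether a URL is an Instagram picture, and to return only the beginning of the URL plus /p/PICTUREID.
So far, this is what I've come up with: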

^(.*instagram.com\/p\/.*)\/

However, this requires a trailing slash, and I don't want to require one.
Examples (these should match):

https://www.instagram.com/p/BKbwlrfjGHY/?post ->
https://www.instagram.com/p/BKbwlrfjGHY

http://www.instagram.com/p/BKbwlrfjGHY/ ->
http://www.instagram.com/p/BKbwlrfjGHY

instagram.com/p/BKbwlrfjGHY ->
instagram.com/p/BKbwlrfjGHY


How can I stop at the trailing slash, if there is one, and drop everything after it?
Here is my regex101 test:
https://regex101.com/r/JJS2kz/1
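One way to make the trailing slash optional is to stop the ID at the first character that cannot belong to it, instead of requiring a slash after it. A minimal sketch of that idea applied to the regex from the question, assuming the ID never contains /, ? or #, and assuming PHP 7.1+ for the nullable return type. The helper name normalizeInstagramUrl() is hypothetical:

```php
<?php

// Hypothetical helper: returns everything up to and including /p/<ID>,
// or null when the string does not contain instagram.com/p/<ID>.
function normalizeInstagramUrl(string $url): ?string
{
    // [^\/?#]+ consumes the ID and stops before a slash, a query string or a fragment,
    // so a trailing slash is allowed but no longer required.
    if (preg_match('/^(.*instagram\.com\/p\/[^\/?#]+)/', $url, $m)) {
        return $m[1];
    }
    return null;
}

foreach ([
    'https://www.instagram.com/p/BKbwlrfjGHY/?post',
    'http://www.instagram.com/p/BKbwlrfjGHY/',
    'instagram.com/p/BKbwlrfjGHY',
] as $url) {
    echo normalizeInstagramUrl($url), PHP_EOL;
}
```

The dots in instagram.com are escaped here because an unescaped . matches any character.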

Answer #1 (pbossiut)

Solution 1

You can use this regular expression to match all of the examples you provided:

/(https?:\/\/www\.)?instagram\.com(\/p\/\w+\/?)/


Explanation

The first part of the regex looks for http or https, followed by ://www., and makes this whole prefix optional.

(https?:\/\/www\.)?


The second part looks for the string instagram.com (the dot is escaped because an unescaped . matches any character).

instagram\.com


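The third part looks for /p/ followed by one or more word characters (\w, i.e. letters, digits and underscores), plus an optional trailing slash /. Note that this part of the regex is wrapped in parentheses, so you can retrieve it afterwards from the matches returned by preg_match_all.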

(\/p\/\w+\/?)
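Putting the three parts together, here is a short sketch that runs the Solution 1 regex with preg_match() on the question's examples. The full match $m[0] still contains the optional trailing slash, so the sketch strips it with rtrim() to get exactly the requested form. One assumption to check against real data: \w does not match -, which Instagram post IDs may contain.

```php
<?php

// Solution 1 regex: optional "http(s)://www.", then instagram.com,
// then a captured "/p/<word characters>" with an optional trailing slash.
$pattern = '/(https?:\/\/www\.)?instagram\.com(\/p\/\w+\/?)/';

$urls = [
    'https://www.instagram.com/p/BKbwlrfjGHY/?post',
    'http://www.instagram.com/p/BKbwlrfjGHY/',
    'instagram.com/p/BKbwlrfjGHY',
];

$results = [];
foreach ($urls as $url) {
    if (preg_match($pattern, $url, $m)) {
        // $m[0] is the whole match including the optional trailing slash;
        // rtrim() removes that slash so the output matches the question.
        $results[] = rtrim($m[0], '/');
    }
}

print_r($results);
```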

Solution 2

If you also want to support the following patterns (with http/https and without www):

http://instagram.com/p/BkbwlrfjGHY
http://instagram.com/p/BkbwlrfjGHY/
https://instagram.com/p/BkbwlrfjGHY
https://instagram.com/p/BkbwlrfjGHY/


you can use this regex:

/(https?:\/\/(www\.)?)?instagram\.com(\/p\/\w+\/?)/

Example

$string = 'https://www.instagram.com/p/abcd/?post->
           https://www.instagram.com/p/efgh

           http://www.instagram.com/p/iJkL/ ->
           http://www.instagram.com/p/MnNadfoadf

           instagram.com/p/ACDOFfaf ->
           instagram.com/p/AFMDAOF';

preg_match_all('/(https?:\/\/(www\.)?)?instagram\.com(\/p\/\w+\/?)/', $string, $matches);


Then, if you var_dump $matches, you get:

array(4) {
  [0]=>
  array(6) {
    [0]=>
    string(33) "https://www.instagram.com/p/abcd/"
    [1]=>
    string(32) "https://www.instagram.com/p/efgh"
    [2]=>
    string(32) "http://www.instagram.com/p/iJkL/"
    [3]=>
    string(37) "http://www.instagram.com/p/MnNadfoadf"
    [4]=>
    string(24) "instagram.com/p/ACDOFfaf"
    [5]=>
    string(23) "instagram.com/p/AFMDAOF"
  }
  [1]=>
  array(6) {
    [0]=>
    string(12) "https://www."
    [1]=>
    string(12) "https://www."
    [2]=>
    string(11) "http://www."
    [3]=>
    string(11) "http://www."
    [4]=>
    string(0) ""
    [5]=>
    string(0) ""
  }
  [2]=>
  array(6) {
    [0]=>
    string(4) "www."
    [1]=>
    string(4) "www."
    [2]=>
    string(4) "www."
    [3]=>
    string(4) "www."
    [4]=>
    string(0) ""
    [5]=>
    string(0) ""
  }
  [3]=>
  array(6) {
    [0]=>
    string(8) "/p/abcd/"
    [1]=>
    string(7) "/p/efgh"
    [2]=>
    string(8) "/p/iJkL/"
    [3]=>
    string(13) "/p/MnNadfoadf"
    [4]=>
    string(11) "/p/ACDOFfaf"
    [5]=>
    string(10) "/p/AFMDAOF"
  }
}


Now, to retrieve each /p/ID part, you can use a foreach loop:

foreach($matches[3] as $instagramId){
    echo $instagramId . "<br>";
}


The result will be:

/p/abcd/
/p/efgh
/p/iJkL/
/p/MnNadfoadf
/p/ACDOFfaf
/p/AFMDAOF
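If you only need the bare ID, without /p/ and the optional slash, one option is an extra capture group around \w+. A sketch based on the Solution 2 regex; the inner group (\w+) is an addition for illustration, and it becomes group 4 because groups are numbered by their opening parenthesis:

```php
<?php

$string = 'https://www.instagram.com/p/abcd/?post
           http://instagram.com/p/efgh
           instagram.com/p/ACDOFfaf/';

// Solution 2 regex plus an inner group (\w+) that captures only the ID.
preg_match_all('/(https?:\/\/(www\.)?)?instagram\.com(\/p\/(\w+)\/?)/', $string, $matches);

// Group 3 is "/p/ID" (maybe with a slash); group 4 is the bare ID.
$ids = $matches[4];
print_r($ids);
```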

Answer #2 (wribegjk)

Here is a regexp that also works when the username is in the path.
Test cases:
https://instagram.com/p/BryAm8hnjGk
https://www.instagram.com/anettletigre/p/BryAm8hnjGk/
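A sketch of one pattern that accepts an optional username segment before /p/ and handles both test cases. It is an illustration, not necessarily the regex this answer used; the allowed username characters [A-Za-z0-9_.] and ID characters [A-Za-z0-9_-] are assumptions. The ~ delimiter avoids escaping every /.

```php
<?php

// Optional scheme, optional www., instagram.com, an optional "<username>/" segment,
// then "p/" and the post ID.
$pattern = '~^(?:https?://)?(?:www\.)?instagram\.com/(?:([A-Za-z0-9_.]+)/)?p/([A-Za-z0-9_-]+)~';

$tests = [
    'https://instagram.com/p/BryAm8hnjGk',
    'https://www.instagram.com/anettletigre/p/BryAm8hnjGk/',
];

$found = [];
foreach ($tests as $url) {
    if (preg_match($pattern, $url, $m)) {
        // $m[1] is the username ('' when the segment is absent), $m[2] is the post ID.
        $found[] = ['username' => $m[1], 'id' => $m[2]];
    }
}

print_r($found);
```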

Answer #3 (uklbhaso)

Based on the answers above, plus a bit of my own work, I came up with a general pattern for getting the post code:

(?:(?:(?:(?:https?)(?::\/\/))?(?:www\.))?)instagram\.com\/?(?<username>[a-zA-Z0-9_.]{1,30})?\/p\/(?<code>[A-Za-z0-9_\-]+)\/?

Example:

$string = '
    instagram.com/p/code1
    instagram.com/username1/p/code2
    instagram.com/p/code3/
    instagram.com/username2/p/code4/

    http://instagram.com/p/code5
    http://instagram.com/username3/p/code6
    http://instagram.com/p/code7/
    http://instagram.com/username4/p/code8/
    https://instagram.com/p/code9
    https://instagram.com/username5/p/code10
    https://instagram.com/p/code11/
    https://instagram.com/username6/p/code12/

    http://www.instagram.com/p/code13
    http://www.instagram.com/username7/p/code14
    http://www.instagram.com/p/code15/
    http://www.instagram.com/username8/p/code16/

    https://www.instagram.com/p/code17
    https://www.instagram.com/username9/p/code18
    https://www.instagram.com/p/code19/
    https://www.instagram.com/username10/p/code20/

    instagram.com/username11/p/code21?utm_source=...
    instagram.com/username12/p/code22/?utm_source=...
    https://www.instagram.com/p/code23?utm_source=...
    https://www.instagram.com/username13/p/code24/?utm_source=...';

preg_match_all("/(?:(?:(?:(?:https?)(?::\/\/))?(?:www\.))?)instagram\.com\/?(?<username>[a-zA-Z0-9_.]{1,30})?\/p\/(?<code>[A-Za-z0-9_\-]+)\/?/", $string, $matches);

echo "<pre>";
print_r($matches);


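Output (note that for URLs with a scheme but no www., the full match in [0] starts at instagram.com, because this regex only accepts the scheme when it is followed by www.):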

Array
(
    [0] => Array
        (
            [0] => instagram.com/p/code1
            [1] => instagram.com/username1/p/code2
            [2] => instagram.com/p/code3/
            [3] => instagram.com/username2/p/code4/
            [4] => instagram.com/p/code5
            [5] => instagram.com/username3/p/code6
            [6] => instagram.com/p/code7/
            [7] => instagram.com/username4/p/code8/
            [8] => instagram.com/p/code9
            [9] => instagram.com/username5/p/code10
            [10] => instagram.com/p/code11/
            [11] => instagram.com/username6/p/code12/
            [12] => http://www.instagram.com/p/code13
            [13] => http://www.instagram.com/username7/p/code14
            [14] => http://www.instagram.com/p/code15/
            [15] => http://www.instagram.com/username8/p/code16/
            [16] => https://www.instagram.com/p/code17
            [17] => https://www.instagram.com/username9/p/code18
            [18] => https://www.instagram.com/p/code19/
            [19] => https://www.instagram.com/username10/p/code20/
            [20] => instagram.com/username11/p/code21
            [21] => instagram.com/username12/p/code22/
            [22] => https://www.instagram.com/p/code23
            [23] => https://www.instagram.com/username13/p/code24/
        )

    [username] => Array
        (
            [0] => 
            [1] => username1
            [2] => 
            [3] => username2
            [4] => 
            [5] => username3
            [6] => 
            [7] => username4
            [8] => 
            [9] => username5
            [10] => 
            [11] => username6
            [12] => 
            [13] => username7
            [14] => 
            [15] => username8
            [16] => 
            [17] => username9
            [18] => 
            [19] => username10
            [20] => username11
            [21] => username12
            [22] => 
            [23] => username13
        )

    [1] => Array
        (
            [0] => 
            [1] => username1
            [2] => 
            [3] => username2
            [4] => 
            [5] => username3
            [6] => 
            [7] => username4
            [8] => 
            [9] => username5
            [10] => 
            [11] => username6
            [12] => 
            [13] => username7
            [14] => 
            [15] => username8
            [16] => 
            [17] => username9
            [18] => 
            [19] => username10
            [20] => username11
            [21] => username12
            [22] => 
            [23] => username13
        )

    [code] => Array
        (
            [0] => code1
            [1] => code2
            [2] => code3
            [3] => code4
            [4] => code5
            [5] => code6
            [6] => code7
            [7] => code8
            [8] => code9
            [9] => code10
            [10] => code11
            [11] => code12
            [12] => code13
            [13] => code14
            [14] => code15
            [15] => code16
            [16] => code17
            [17] => code18
            [18] => code19
            [19] => code20
            [20] => code21
            [21] => code22
            [22] => code23
            [23] => code24
        )

    [2] => Array
        (
            [0] => code1
            [1] => code2
            [2] => code3
            [3] => code4
            [4] => code5
            [5] => code6
            [6] => code7
            [7] => code8
            [8] => code9
            [9] => code10
            [10] => code11
            [11] => code12
            [12] => code13
            [13] => code14
            [14] => code15
            [15] => code16
            [16] => code17
            [17] => code18
            [18] => code19
            [19] => code20
            [20] => code21
            [21] => code22
            [22] => code23
            [23] => code24
        )

)
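With named groups it is often easier to get one array per match instead of one array per group. A short sketch using the PREG_SET_ORDER flag of preg_match_all() with the same regex on a shortened input:

```php
<?php

$string = 'instagram.com/p/code1
    https://www.instagram.com/username9/p/code18/
    instagram.com/username12/p/code22/?utm_source=x';

$pattern = "/(?:(?:(?:(?:https?)(?::\/\/))?(?:www\.))?)instagram\.com\/?(?<username>[a-zA-Z0-9_.]{1,30})?\/p\/(?<code>[A-Za-z0-9_\-]+)\/?/";

// PREG_SET_ORDER: $matches[0] holds the first match with all its groups, $matches[1] the second, ...
preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $match) {
    // An unmatched username group comes back as '' because a later group (code) matched.
    $username = $match['username'] ?? '';
    $pairs[] = [$username, $match['code']];
    echo ($username !== '' ? $username : '-'), ' => ', $match['code'], PHP_EOL;
}
```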

Answer #4 (djmepvbi)

If instagram.com is the only domain you are looking for, strpos() will generally be faster than a regular expression:

<?php

$test = [
'https://www.instagram.com/p/BKbwlrfjGHY/',
'https://www.instagram.com/p/BKbwlrfjGHY',
'http://www.instagram.com/p/BKbwlrfjGHY/',
'http://www.instagram.com/p/BKbwlrfjGHY',
'instagram.com/p/BKbwlrfjGHY/',
'someother.com/p/asdfads',
'instagram.com/p/BKbwlrfjGHY'];

$target = 'instagram.com';
$offset = strlen($target);
foreach ($test as $url) {
        $p = strpos($url, $target);
        if ($p === false) {
                echo 'Not an instagram URL'.PHP_EOL;
        } else {
                $instagramId = rtrim(substr($url,$p+$offset),'/');

                echo $instagramId.' is an instagram id'.PHP_EOL;
        }
}
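One caveat with the substring approach: anything after the ID, such as a ?query or #fragment, is kept, so https://www.instagram.com/p/BKbwlrfjGHY/?post would yield /p/BKbwlrfjGHY/?post. A minimal sketch that cuts the remainder at the first ? or # with strcspn() before trimming the slash. The helper name instagramPath() is hypothetical, and PHP 7.1+ is assumed for the nullable return type:

```php
<?php

// Hypothetical helper: returns the part after "instagram.com" without a query string,
// fragment or trailing slash, or null when "instagram.com" does not occur in $url.
function instagramPath(string $url): ?string
{
    $target = 'instagram.com';
    $p = strpos($url, $target);
    if ($p === false) {
        return null;
    }
    $rest = substr($url, $p + strlen($target));
    // strcspn() returns the length of the leading run that contains neither '?' nor '#'.
    $rest = substr($rest, 0, strcspn($rest, '?#'));
    return rtrim($rest, '/');
}

echo instagramPath('https://www.instagram.com/p/BKbwlrfjGHY/?post'), PHP_EOL;
```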


Answer #5 (mdfafbf1)

(https?:\/\/www\.)?(?:instagram\.com|instagr\.am)\/p\/([^\/?#]+)\/?
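A quick check of this pattern with preg_match(), using escaped dots (an unescaped . would also accept strings like instagramXcom) and an ID that stops at /, ? or #. It also accepts the short instagr.am domain, and group 2 holds the post ID:

```php
<?php

// Escaped dots; the ID (group 2) stops at '/', '?' or '#'.
$pattern = '/(https?:\/\/www\.)?(?:instagram\.com|instagr\.am)\/p\/([^\/?#]+)\/?/';

$ids = [];
foreach (['https://www.instagram.com/p/BKbwlrfjGHY/?post', 'instagr.am/p/BKbwlrfjGHY'] as $url) {
    if (preg_match($pattern, $url, $m)) {
        $ids[] = $m[2]; // group 2 is the post ID
    }
}

print_r($ids);
```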

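Answer #6 (oxosxuxt)

I know this isn't really an answer to the OP's question, but I want to mention that I often check URLs a different way, with parse_url. It is quite verbose, but I find it clear to read and understand. Here is a quick demo for Instagram: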

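<?php

$parts = parse_url('https://www.instagram.com/p/BKbwlrfjGHY/?post');
$result = null;

// parse_url() finds no host in scheme-less input such as "instagram.com/p/ID"
$host = isset($parts['host']) ? strtolower($parts['host']) : '';

if (in_array($host, ['instagram.com', 'www.instagram.com'], true)) {
    // is an Instagram URL: split the path into its segments
    $path = explode('/', trim($parts['path'] ?? '', '/'));

    if (count($path) > 1 && mb_strtolower($path[0]) === 'p') {
        $post_id = $path[1];
        // build the result only when a post ID was actually found
        $result = $host . '/p/' . $post_id;
    }
}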

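It is straightforward to add the protocol to the URL, do more sanity checks (if needed), or even bundle all of this into a function. I hope this helps you solve the problem.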
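One possible way to bundle the parse_url() approach into a function. The function name instagramPostUrl() and the choice to prepend https:// when the input has no scheme are assumptions made for this sketch (parse_url() does not report a host for input like instagram.com/p/ID); PHP 7.1+ is assumed:

```php
<?php

// Hypothetical helper: returns "host/p/ID" for an Instagram post URL, otherwise null.
function instagramPostUrl(string $url): ?string
{
    // Assumption: scheme-less input should be treated as https.
    if (strpos($url, '://') === false) {
        $url = 'https://' . $url;
    }

    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return null;
    }

    $host = strtolower($parts['host']);
    if (!in_array($host, ['instagram.com', 'www.instagram.com'], true)) {
        return null;
    }

    // "/p/ID/" -> ['p', 'ID']; the query string is not part of 'path'.
    $path = explode('/', trim($parts['path'] ?? '', '/'));
    if (count($path) > 1 && strtolower($path[0]) === 'p' && $path[1] !== '') {
        return $host . '/p/' . $path[1];
    }
    return null;
}

echo instagramPostUrl('https://www.instagram.com/p/BKbwlrfjGHY/?post'), PHP_EOL;
```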
