How to fix malformed JSON in PHP?

Posted by bf1o4zei 12 months ago in PHP

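I receive a data feed in JSON format, which is the only format available. In PHP I decode it with json_decode, but decoding fails: in some places the feed is generated with unescaped double quotes inside a person's nickname. I verified this with http://jsonformatter.curiousconcept.com
I have no control over how the data is created, but I have to handle this broken format whenever it occurs. The parsed data will be inserted into a MySQL table.
For example: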

"contact1": "David "Dave" Letterman",

json_decode returns NULL. If I save the file manually and change the quotes around the nickname Dave to single quotes, everything works.

$json_string = file_get_contents($json_download);
$json_array = json_decode($json_string, true);


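How can I repair the broken JSON in json_string before json_decode processes it? What should be done to pre-process the file: backslash-escape the double quotes around the nickname, or change them to single quotes? And is it a good idea to store double quotes like these in MySQL?
I don't know when this will happen in a given feed, so I don't want to check only contact1 for inner double quotes. Is there a way in PHP to backslash-escape everything after the colon, except the outer double quotes, as in the example above? Thanks!
Here is the code supplied by tftd: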

<?php
// This:
// "contact1": "David "Dave" Letterman",
// Needs to look like this to be decoded by JSON:
// "contact1": "David \"Dave\" Letterman",

$data ='"contact1": "David "Dave" Letterman",';
function replace($match){
    $key = trim($match[1]);
    $val = trim($match[2]);

    if($val[0] == '"')
        $val = '"'.addslashes(substr($val, 1, -1)).'"';
    else if($val[0] == "'")
        $val = "'".addslashes(substr($val, 1, -1))."'";

    return $key.": ".$val;
}
$preg = preg_replace_callback("#([^{:]*):([^,}]*)#i",'replace',$data);
var_dump($preg);
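// $preg is still a bare fragment (no enclosing braces, trailing comma),
// so json_decode() returns NULL here even though the quotes are now escaped.
$json_array = json_decode($preg);
var_dump($json_array);
echo $json_array . "\n"; // NULL prints as an empty line
echo $preg . "\n";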
?>


Here is the output:

string(39) ""contact1": "David \"Dave\" Letterman","
NULL

"contact1": "David \"Dave\" Letterman",

Answer #1 (cyej8jka)

I have my own jsonFixer() function. It works in two steps: garbage removal (to even out inconsistent formatting) and reformatting.

<?php
  function jsonFixer($json){
    $patterns     = [];
    /** garbage removal */
    $patterns[0]  = "/([\s:,\{}\[\]])\s*'([^:,\{}\[\]]*)'\s*([\s:,\{}\[\]])/"; //Find any character except colons, commas, curly and square brackets surrounded or not by spaces preceded and followed by spaces, colons, commas, curly or square brackets...
    $patterns[1]  = '/([^\s:,\{}\[\]]*)\{([^\s:,\{}\[\]]*)/'; //Find any left curly brackets surrounded or not by one or more of any character except spaces, colons, commas, curly and square brackets...
    $patterns[2]  =  "/([^\s:,\{}\[\]]+)}/"; //Find any right curly brackets preceded by one or more of any character except spaces, colons, commas, curly and square brackets...
    $patterns[3]  = "/(}),\s*/"; //JSON.parse() doesn't allow trailing commas
    /** reformatting */
    $patterns[4]  = '/([^\s:,\{}\[\]]+\s*)*[^\s:,\{}\[\]]+/'; //Find or not one or more of any character except spaces, colons, commas, curly and square brackets followed by one or more of any character except spaces, colons, commas, curly and square brackets...
    $patterns[5]  = '/["\']+([^"\':,\{}\[\]]*)["\']+/'; //Find one or more of quotation marks or/and apostrophes surrounding any character except colons, commas, curly and square brackets...
    $patterns[6]  = '/(")([^\s:,\{}\[\]]+)(")(\s+([^\s:,\{}\[\]]+))/'; //Find or not one or more of any character except spaces, colons, commas, curly and square brackets surrounded by quotation marks followed by one or more spaces and  one or more of any character except spaces, colons, commas, curly and square brackets...
    $patterns[7]  = "/(')([^\s:,\{}\[\]]+)(')(\s+([^\s:,\{}\[\]]+))/"; //Find or not one or more of any character except spaces, colons, commas, curly and square brackets surrounded by apostrophes followed by one or more spaces and  one or more of any character except spaces, colons, commas, curly and square brackets...
    $patterns[8]  = '/(})(")/'; //Find any right curly brackets followed by quotation marks...
    $patterns[9]  = '/,\s+(})/'; //Find any comma followed by one or more spaces and a right curly bracket...
    $patterns[10] = '/\s+/'; //Find one or more spaces...
    $patterns[11] = '/^\s+/'; //Find one or more spaces at start of string...

    $replacements     = [];
    /** garbage removal */
    $replacements[0]  = '$1 "$2" $3'; //...and put quotation marks surrounded by spaces between them;
    $replacements[1]  = '$1 { $2'; //...and put spaces between them;
    $replacements[2]  = '$1 }'; //...and put a space between them;
    $replacements[3]  = '$1'; //...so, remove trailing commas of any right curly brackets;
    /** reformatting */
    $replacements[4]  = '"$0"'; //...and put quotation marks surrounding them;
    $replacements[5]  = '"$1"'; //...and replace by single quotation marks;
    $replacements[6]  = '\\$1$2\\$3$4'; //...and add back slashes to its quotation marks;
    $replacements[7]  = '\\$1$2\\$3$4'; //...and add back slashes to its apostrophes;
    $replacements[8]  = '$1, $2'; //...and put a comma followed by a space character between them;
    $replacements[9]  = ' $1'; //...and replace by a space followed by a right curly bracket;
    $replacements[10] = ' '; //...and replace by one space;
    $replacements[11] = ''; //...and remove it.

    $result = preg_replace($patterns, $replacements, $json);

    return $result;
  }
?>

Usage example:

<?php
  // Received badly formatted json:
  // {"contact1": "David "Dave" Letterman", price : 30.00, 'details' : "Greatest 'Hits' Album"}
  $json_string = '{"contact1": "David "Dave" Letterman", price : 30.00, \'details\' : "Greatest \'Hits\' Album"}';
  // Print the repaired string; the original call discarded the return value
  echo jsonFixer($json_string);
?>


This results in:

{"contact1": "David \"Dave\" Letterman", "price" : "30.00", "details" : "Greatest \'Hits\' Album"}


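Note: this has not been tested against every possible badly formatted JSON string, but I have used it on a complex multi-level JSON string and it worked well.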

Answer #2 (cbeh67ev)

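As others have already pointed out, the best option is to tell your client about the problem with the JSON formatting. Have them send a bug report to the original developer or company so it can be fixed at the source. If they cannot fix it, then offer your solution: the strings simply need to be escaped before the JSON is emitted (json_encode does this on its own; addslashes is only needed when the JSON is assembled by hand).
If for some reason you end up having to fix it yourself, here is an approach that might work for you: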

$data = '"contact1": "David "Dave" Letterman", "contact2": "Peter "Robert" Smith",{\'test\': \'working "something"\'}';
function replace($match){
    $key = trim($match[1]);
    $val = trim($match[2]);

    if($val[0] == '"')
        $val = '"'.addslashes(substr($val, 1, -1)).'"';
    else if($val[0] == "'")
        $val = "'".addslashes(substr($val, 1, -1))."'";

    return $key.": ".$val;
}
$preg = preg_replace_callback("#([^{:]*):([^,}]*)#i",'replace',$data);
var_dump($preg);
// string '"contact1": "David \"Dave\" Letterman", "contact2": "Peter \"Robert\" Smith",{'test': 'working \"something\"'}' (length=110)

Keep in mind that this may break if someone messes up the JSON format again.
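Because a regex repair like this can stop working without warning, it may help to decode first, repair only when decoding fails, and fail loudly if the repaired string is still invalid. The following is a minimal sketch under those assumptions; decodeWithRepair() and the $repair closure are hypothetical names, not part of the answer above, and the top-level JSON value is assumed to be an object or array.

```php
<?php
/**
 * Decode JSON, applying a repair callback only when the raw input is invalid.
 * Hypothetical helper: $repair can be any function that takes the raw string
 * and returns a (hopefully) corrected one.
 */
function decodeWithRepair(string $json, callable $repair): array
{
    // First attempt: the feed may be perfectly valid this time.
    $decoded = json_decode($json, true);
    if (json_last_error() === JSON_ERROR_NONE && is_array($decoded)) {
        return $decoded;
    }

    // Second attempt: decode the repaired string.
    $decoded = json_decode($repair($json), true);
    if (json_last_error() !== JSON_ERROR_NONE || !is_array($decoded)) {
        // Surface the failure instead of silently inserting NULL data.
        throw new RuntimeException('JSON still invalid after repair: ' . json_last_error_msg());
    }

    return $decoded;
}

// Illustration only: a deliberately naive repair that knows about one nickname.
$repair = function (string $s): string {
    return str_replace('"Dave"', '\"Dave\"', $s);
};

$data = decodeWithRepair('{"contact1": "David "Dave" Letterman"}', $repair);
echo $data['contact1'], "\n";
```

Valid feeds pass through untouched, so the repair heuristic cannot damage data that was never broken, and an exception marks the moment the upstream format changes again.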

Answer #3 (iqjalb3h)

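As others have said, you can do a search and replace, but the hard part is writing the fuzzy matching rules, because in order to parse the data you have to assume certain things. You would probably need to assume either:
1a) keys do not contain colons
1b) or quotes in keys are properly escaped

2a) values do not contain commas
2b) or quotes in values are properly escaped.
Even then, you may run into cases where the parsing gets confused, and it gets worse if they have comments in the JSON. (Not compliant, but very common.)
Depending on the data, you could use line breaks to decide when you are looking at a new key, but again, that is unreliable and you start making big assumptions.
So, long story short, you either have to make assumptions that may turn out to be wrong at any time, or you need to get them to fix the data.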
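To make the cost of such assumptions concrete, here is a minimal sketch of one possible heuristic. It assumes that a string only ends at a quote followed by optional whitespace and one of `,` `:` `}` `]`, which is close to assumption 2a. The function name escapeInnerQuotes() is hypothetical, and this is an illustration, not a general JSON repair tool.

```php
<?php
/**
 * Escape unescaped double quotes found inside JSON string tokens.
 * Assumption (heuristic): a string token only ends at a quote that is
 * followed by optional whitespace and one of , : } ]
 */
function escapeInnerQuotes(string $json): string
{
    $result = preg_replace_callback(
        '/"(.*?)"(?=\s*[,:}\]])/s',
        function (array $m): string {
            // Add a backslash before every quote that is not already escaped.
            $body = preg_replace('/(?<!\\\\)"/', '\\\\"', $m[1]);
            return '"' . $body . '"';
        },
        $json
    );

    // Fall back to the untouched input if PCRE reports an error.
    return $result ?? $json;
}

// The nickname case from the question is repaired and decodes.
$ok = escapeInnerQuotes('{"contact1": "David "Dave" Letterman", "age": 30}');
var_dump(json_decode($ok, true) !== null);

// A comma right after the nickname breaks the assumption: the heuristic
// closes the string too early and the result is still invalid JSON.
$bad = escapeInnerQuotes('{"contact1": "David "Dave", Letterman"}');
var_dump(json_decode($bad, true) === null);
```

The second input shows the point made above: as soon as a value contains a comma after an inner quote, the rule picks the wrong closing quote and json_decode still fails.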

Answer #4 (rqmkfv5c)

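Tell them to escape their strings before output. You could even offer to fix it for them or provide the code for a solution.
Otherwise, you can use preg_replace with a regular expression.
See "Replacing specified double quotes in text with preg_replace".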

Answer #5 (8qgya5xd)

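Regular expressions are not reliable when values contain commas or [], or when they contain JSON strings themselves; that is where the worries and nightmares start. In "php json_decode fails without quotes on key", it is suggested to use PEAR Services_JSON. Once the class-name code is patched, it gives the safest results and puts an end to invalid JSON: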

<?php include("Services_JSON-1.0.3b/JSON.php"); 
//Patched version https://github.com/pear/Services_JSON/edit/trunk/JSON.php

$json = <<< JSONCODEFROMJS
   { 
      sos:presents, 
      james:'bond', 
      "agent":[0,0,7], 
      secret:"{mission:'impossible',permit: \"tokill\"}",
      go:true 
    }
JSONCODEFROMJS;

function json_fix($json) {
   $sjson = new Services_JSON(SERVICES_JSON_IN_ARR|SERVICES_JSON_USE_TO_JSON| SERVICES_JSON_LOOSE_TYPE);
   $json_array=$sjson->decode($json);
   return json_encode($json_array);
}

$json_array = json_decode(json_fix($json),true);

if(json_last_error() == JSON_ERROR_NONE) {

   $json=json_encode($json_array,JSON_PRETTY_PRINT);
   echo "<pre>";
   echo(htmlentities($json));
   echo "</pre>";
} else {
   die(json_last_error_msg());
}
?>


Answer #6 (jw5wzhpr)

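function fixJson($json) {
  // Collapse whitespace after commas (this also touches commas inside string values)
  $json = preg_replace('/,\s+/', ',', $json);
  // Insert the missing comma between adjacent objects: "}{" or "} {" becomes "},{"
  // (runs of consecutive braces are collapsed as well)
  $json = preg_replace('/}+\s*{+/', '},{', $json);
  // Drop a trailing comma before a closing square bracket
  $json = preg_replace('/,\s*]/', ']', $json);
  // Drop a trailing comma before a closing curly bracket
  $json = preg_replace('/,\s*}/', '}', $json);
  $json = trim($json);

  return $json;
}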
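The answer gives no usage example, so here is a small sketch of what the function does to an input with trailing commas. The function body is copied from the answer so the snippet runs on its own; the sample input is made up for illustration.

```php
<?php
// Copied from the answer above so that this snippet is self-contained.
function fixJson($json) {
  $json = preg_replace('/,\s+/', ',', $json);
  $json = preg_replace('/}+\s*{+/', '},{', $json);
  $json = preg_replace('/,\s*]/', ']', $json);
  $json = preg_replace('/,\s*}/', '}', $json);
  return trim($json);
}

// Hypothetical input: trailing commas in both the array and the object.
$input = '{"a": [1, 2, ], "b": "x,  y", }';
$fixed = fixJson($input);

echo $fixed, "\n";
var_dump(json_decode($fixed, true));
```

The trailing commas are removed and the result decodes, but the whitespace after the comma inside the string value of "b" is removed too. The patterns do not distinguish structural commas from commas inside strings, and the function does nothing about unescaped inner quotes like the ones in the question.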
