json machine下载 - json machine源码下载

对于 PHP >=7.2 的大 JSON 文件或流的低效迭代，非常易于使用和内存高效的直接替换。参见 TL;DR。除了可选的ext-json之外，生产中没有任何依赖项。 README 与代码同步

1.2.0版本中的新功能 - 递归迭代

长话短说
介绍
解析 JSON 文档
- 解析文档
- 解析子树
- 解析数组中的嵌套值
- 解析单个标量值
- 解析多个子树
- 递归迭代
- JSON 指针到底是什么？
选项
解析来自 JSON API 的流响应
- GuzzleHttp
- Symfony HttpClient
跟踪进度
解码器
- 可用解码器
错误处理
- 捕获畸形项目
解析器效率
- 流/文件
- 内存中的 JSON 字符串
故障排除
- “我仍然得到允许的内存大小......耗尽”
- “那没有帮助”
- “我还是运气不好”
安装
发展
- 非集装箱化
- 集装箱化
支持
执照

长话短说

<?php

use JsonMachineItems;

// this often causes Allowed Memory Size Exhausted,
// because it loads all the items in the JSON into memory
- $users = json_decode(file_get_contents('500MB-users.json'));

// this has very small memory footprint no matter the file size
// because it loads items into memory one by one
+ $users = Items::fromFile('500MB-users.json');

foreach ($users as $id => $user) {
    // just process $user as usual
    var_dump($user->name);
}

像$users[42]这样的随机访问尚不可能。使用上面提到的foreach并查找项目或使用 JSON Pointer。

通过iterator_count($users)计算项目数。请记住，它仍然需要在内部迭代整个过程才能获得计数，因此所需的时间与迭代和手动计数大约相同。

如果开箱即用，则需要ext-json但如果使用自定义解码器则不需要。请参阅解码器。

遵循变更日志。

介绍

JSON Machine 是一个高效、易于使用且快速的 JSON 流/拉动/增量/惰性（无论你怎么称呼它）解析器，基于为不可预测的长 JSON 流或文档开发的生成器。主要特点是：

不可预测的大型 JSON 文档的恒定内存占用。
易于使用。只需使用foreach迭代任意大小的 JSON 即可。没有事件和回调。
在文档的任何子树上进行高效迭代，由 JSON Pointer 指定
速度。性能关键代码不包含不必要的函数调用，不包含正则表达式，并且默认使用本机json_decode来解码 JSON 文档项。请参阅解码器。
不仅解析流，还解析任何生成 JSON 块的迭代。
经过彻底测试。超过 200 个测试和 1000 个断言。

解析 JSON 文档

解析文档

假设fruits.json包含这个巨大的 JSON 文档：

 // fruits.json
{
    "apple" : {
        "color" : " red "
    },
    "pear" : {
        "color" : " yellow "
    }
}

可以这样解析：

 <?php

use  JsonMachine  Items ;

$ fruits = Items:: fromFile ( ' fruits.json ' );

foreach ( $ fruits as $ name => $ data ) {
    // 1st iteration: $name === "apple" and $data->color === "red"
    // 2nd iteration: $name === "pear" and $data->color === "yellow"
}

解析 json 数组而不是 json 对象遵循相同的逻辑。 foreach 中的键是项目的数字索引。

如果您希望 JSON Machine 返回数组而不是对象，请使用new ExtJsonDecoder(true)作为解码器。

 <?php

use JsonMachine  JsonDecoder  ExtJsonDecoder ;
use JsonMachine  Items ;

$ objects = Items:: fromFile ( ' path/to.json ' , [ ' decoder ' => new ExtJsonDecoder ( true )]);

解析子树

如果您只想迭代此fruits.json中的results子树：

 // fruits.json
{
    "results" : {
        "apple" : {
            "color" : " red "
        },
        "pear" : {
            "color" : " yellow "
        }
    }
}

使用 JSON Pointer /results作为pointer选项：

 <?php

use  JsonMachine  Items ;

$ fruits = Items:: fromFile ( ' fruits.json ' , [ ' pointer ' => ' /results ' ]);
foreach ( $ fruits as $ name => $ data ) {
    // The same as above, which means:
    // 1st iteration: $name === "apple" and $data->color === "red"
    // 2nd iteration: $name === "pear" and $data->color === "yellow"
}

笔记：
results的值不会立即加载到内存中，而是一次只加载results中的一项。在您当前迭代的级别/子树上，内存中每次始终只有一项。因此，内存消耗是恒定的。

解析数组中的嵌套值

JSON 指针规范还允许使用连字符 ( - ) 而不是特定的数组索引。 JSON Machine 将其解释为匹配任何数组索引（而不是任何对象键）的通配符。这使您能够迭代数组中的嵌套值，而无需加载整个项目。

例子：

 // fruitsArray.json
{
    "results" : [
        {
            "name" : " apple " ,
            "color" : " red "
        },
        {
            "name" : " pear " ,
            "color" : " yellow "
        }
    ]
}

要迭代水果的所有颜色，请使用 JSON 指针"/results/-/color" 。

 <?php

use  JsonMachine  Items ;

$ fruits = Items:: fromFile ( ' fruitsArray.json ' , [ ' pointer ' => ' /results/-/color ' ]);

foreach ( $ fruits as $ key => $ value ) {
    // 1st iteration:
    $ key == ' color ' ;
    $ value == ' red ' ;
    $ fruits -> getMatchedJsonPointer () == ' /results/-/color ' ;
    $ fruits -> getCurrentJsonPointer () == ' /results/0/color ' ;

    // 2nd iteration:
    $ key == ' color ' ;
    $ value == ' yellow ' ;
    $ fruits -> getMatchedJsonPointer () == ' /results/-/color ' ;
    $ fruits -> getCurrentJsonPointer () == ' /results/1/color ' ;
}

解析单个标量值

您可以像解析集合一样解析文档中任何位置的单个标量值。考虑这个例子：

 // fruits.json
{
    "lastModified" : " 2012-12-12 " ,
    "apple" : {
        "color" : " red "
    },
    "pear" : {
        "color" : " yellow "
    },
    // ... gigabytes follow ...
}

获取lastModified键的标量值，如下所示：

 <?php

use  JsonMachine  Items ;

$ fruits = Items:: fromFile ( ' fruits.json ' , [ ' pointer ' => ' /lastModified ' ]);
foreach ( $ fruits as $ key => $ value ) {
    // 1st and final iteration:
    // $key === 'lastModified'
    // $value === '2012-12-12'
}

当解析器找到该值并将其提供给您时，它会停止解析。因此，当单个标量值位于千兆字节大小的文件或流的开头时，它会立即从开头获取该值，并且几乎不消耗任何内存。

明显的捷径是：

 <?php

use  JsonMachine  Items ;

$ fruits = Items:: fromFile ( ' fruits.json ' , [ ' pointer ' => ' /lastModified ' ]);
$ lastModified = iterator_to_array ( $ fruits )[ ' lastModified ' ];

单标量值访问也支持 JSON 指针中的数组索引。

解析多个子树

还可以使用多个 JSON 指针来解析多个子树。考虑这个例子：

 // fruits.json
{
    "lastModified" : " 2012-12-12 " ,
    "berries" : [
        {
          "name" : " strawberry " , // not a berry, but whatever ...
          "color" : " red "
        },
        {
          "name" : " raspberry " , // the same ...
          "color" : " red "
        }
    ],
    "citruses" : [
      {
          "name" : " orange " ,
          "color" : " orange "
      },
      {
          "name" : " lime " ,
          "color" : " green "
      }
    ]
}

要迭代所有浆果和柑橘类水果，请使用 JSON 指针["/berries", "/citrus"] 。指针的顺序并不重要。这些项目将按照文档中出现的顺序进行迭代。

 <?php

use  JsonMachine  Items ;

$ fruits = Items:: fromFile ( ' fruits.json ' , [
    ' pointer ' => [ ' /berries ' , ' /citruses ' ]
]);

foreach ( $ fruits as $ key => $ value ) {
    // 1st iteration:
    $ value == [ " name " => " strawberry " , " color " => " red " ];
    $ fruits -> getCurrentJsonPointer () == ' /berries ' ;

    // 2nd iteration:
    $ value == [ " name " => " raspberry " , " color " => " red " ];
    $ fruits -> getCurrentJsonPointer () == ' /berries ' ;

    // 3rd iteration:
    $ value == [ " name " => " orange " , " color " => " orange " ];
    $ fruits -> getCurrentJsonPointer () == ' /citruses ' ;

    // 4th iteration:
    $ value == [ " name " => " lime " , " color " => " green " ];
    $ fruits -> getCurrentJsonPointer () == ' /citruses ' ;
}

递归迭代

当 JSON 结构很难甚至无法使用Items和 JSON 指针处理，或者您迭代的单个项太大而无法处理时，请使用RecursiveItems而不是Items 。另一方面，它明显比Items慢，所以请记住这一点。

当RecursiveItems遇到 JSON 中的列表或字典时，它会返回自身的一个新实例，然后可以对其进行迭代并重复循环。因此，它从不返回 PHP 数组或对象，而仅返回标量值或RecursiveItems 。 JSON 字典或列表都不会一次完全加载到内存中。

让我们看一个有很多很多用户和很多很多朋友的例子：

 // users.json
[
  {
    "username" : " user " ,
    "e-mail" : " [email protected] " ,
    "friends" : [
      {
        "username" : " friend1 " ,
        "e-mail" : " [email protected] "
      },
      {
        "username" : " friend2 " ,
        "e-mail" : " [email protected] "
      }
    ]
  }
]

 <?php

use JsonMachine  RecursiveItems

$ users = RecursiveItems:: fromFile ( ' users.json ' );
foreach ( $ users as $ user ) {
    /** @var $user RecursiveItems */
    foreach ( $ user as $ field => $ value ) {
        if ( $ field === ' friends ' ) {
            /** @var $value RecursiveItems */
            foreach ( $ value as $ friend ) {
                /** @var $friend RecursiveItems */
                foreach ( $ friend as $ friendField => $ friendValue ) {
                    $ friendField == ' username ' ;
                    $ friendValue == ' friend1 ' ;
                }
            }
        }
    }
}

如果您中断这种惰性更深层次的迭代（即您通过break跳过一些"friends" ）并前进到下一个值（即下一个user ），您将无法稍后对其进行迭代。 JSON Machine 必须在后台迭代它才能读取下一个值。这种尝试将导致关闭生成器异常。

`RecursiveItems`的便捷方法

toArray(): array如果您确定 RecursiveItems 的某个实例指向内存可管理的数据结构（例如 $friend），您可以调用$friend->toArray() ，该项目将具体化为纯 PHP 数组。
advanceToKey(int|string $key): scalar|RecursiveItems在集合中搜索特定键（例如， $user中的'friends' ）时，不需要使用循环和条件来搜索它。相反，您可以简单地调用$user->advanceToKey("friends") 。它将为您迭代并返回该键的值。呼叫可以被链接。它还支持类似数组的语法，用于前进并获取后续索引。因此$user['friends']将是$user->advanceToKey('friends')的别名。呼叫可以被链接。请记住，它只是一个别名 - 在RecursiveItems上直接使用它后，您将无法随机访问以前的索引。它只是一个语法糖。如果您需要随机访问记录/项目上的索引，请使用toArray() 。

因此，前面的示例可以简化如下：

 <?php

use JsonMachine  RecursiveItems

$ users = RecursiveItems:: fromFile ( ' users.json ' );
foreach ( $ users as $ user ) {
    /** @var $user RecursiveItems */
    foreach ( $ user [ ' friends ' ] as $ friend ) { // or $user->advanceToKey('friends')
        /** @var $friend RecursiveItems */
        $ friendArray = $ friend -> toArray ();
        $ friendArray [ ' username ' ] === ' friend1 ' ;
    }
}

链接允许您执行以下操作：

 <?php

use JsonMachine  RecursiveItems

$ users = RecursiveItems:: fromFile ( ' users.json ' );
$ users [ 0 ][ ' friends ' ][ 1 ][ ' username ' ] === ' friend2 ' ;

`RecursiveItems implements RecursiveIterator`

因此，您可以使用 PHP 的内置工具来处理RecursiveIterator ，如下所示：

递归回调过滤迭代器
递归过滤迭代器
递归正则表达式迭代器
递归树迭代器

JSON 指针到底是什么？

这是一种寻址 JSON 文档中的一项的方法。请参阅 JSON Pointer RFC 6901。它非常方便，因为有时 JSON 结构会更深，并且您想要迭代子树，而不是主级别。因此，您只需指定指向要迭代的 JSON 数组或对象（甚至标量值）的指针即可。当解析器命中您指定的集合时，迭代开始。您可以将其作为所有Items::from*函数中的pointer选项传递。如果指定指向文档中不存在的位置的指针，则会引发异常。它也可用于访问标量值。 JSON 指针本身必须是有效的 JSON 字符串。参考标记（斜杠之间的部分）的字面比较是针对 JSON 文档键/成员名称执行的。

一些例子：

JSON 指针值	将迭代
（空字符串 - 默认）	`["this", "array"]`或`{"a": "this", "b": "object"}`将被迭代（主级别）
`/result/items`	`{"result": {"items": ["this", "array", "will", "be", "iterated"]}}`
`/0/items`	`[{"items": ["this", "array", "will", "be", "iterated"]}]` （支持数组索引）
`/results/-/status`	`{"results": [{"status": "iterated"}, {"status": "also iterated"}]}` （连字符作为数组索引通配符）
`/` （陷阱！ - 斜线后跟空字符串，请参阅规范）	`{"":["this","array","will","be","iterated"]}`
`/quotes"`	`{"quotes"": ["this", "array", "will", "be", "iterated"]}`

选项

选项可能会更改 JSON 的解析方式。选项数组是所有Items::from*函数的第二个参数。可用选项有：

pointer - 一个 JSON 指针字符串，告诉您要迭代文档的哪一部分。
decoder - ItemDecoder接口的实例。
debug - true或false以启用或禁用调试模式。启用调试模式后，文档中的行、列和位置等数据在解析期间或异常时可用。保持调试禁用会增加轻微的性能优势。

解析来自 JSON API 的流响应

流 API 响应或任何其他 JSON 流的解析方式与文件完全相同。唯一的区别是，您使用Items::fromStream($streamResource)来实现，其中$streamResource是带有 JSON 文档的流资源。其余的与解析文件相同。以下是支持流响应的流行 http 客户端的一些示例：

GuzzleHttp

Guzzle 使用自己的流，但可以通过调用GuzzleHttpPsr7StreamWrapper::getResource()将它们转换回 PHP 流。将此函数的结果传递给Items::fromStream函数，然后您就完成了设置。请参阅工作 GuzzleHttp 示例。

Symfony HttpClient

Symfony HttpClient 的流响应充当迭代器。而且由于 JSON Machine 基于迭代器，因此与 Symfony HttpClient 的集成非常简单。请参阅 HttpClient 示例。

跟踪进度（启用`debug` ）

大文档可能需要一段时间才能解析。在foreach中调用Items::getPosition()来获取从头开始处理的当前字节数。百分比很容易计算为position / total * 100 。要了解文档的总大小（以字节为单位），您可能需要检查：

strlen($document)如果解析字符串
filesize($file)如果解析文件
如果您解析 http 流响应，则Content-Length http 标头
...你明白了

如果禁用debug ， getPosition()始终返回0 。

 <?php

use JsonMachine  Items ;

$ fileSize = filesize ( ' fruits.json ' );
$ fruits = Items:: fromFile ( ' fruits.json ' , [ ' debug ' => true ]);
foreach ( $ fruits as $ name => $ data ) {
    echo ' Progress: ' . intval ( $ fruits -> getPosition () / $ fileSize * 100 ) . ' % ' ; 
}

解码器

Items::from*函数也接受decoder选项。它必须是JsonMachineJsonDecoderItemDecoder的实例。如果未指定，则默认使用ExtJsonDecoder 。它需要ext-json PHP 扩展存在，因为它使用json_decode 。当json_decode不能满足您的要求时，请实现JsonMachineJsonDecoderItemDecoder并制作您自己的。

可用解码器

ExtJsonDecoder -默认。使用json_decode来解码键和值。构造函数与json_decode具有相同的参数。
PassThruDecoder - 不进行解码。键和值均以纯 JSON 字符串的形式生成。当您想直接在 foreach 中使用其他内容解析 JSON 项并且不想实现JsonMachineJsonDecoderItemDecoder时很有用。从1.0.0开始不使用json_decode 。

例子：

 <?php

use JsonMachine  JsonDecoder  PassThruDecoder ;
use JsonMachine  Items ;

$ items = Items:: fromFile ( ' path/to.json ' , [ ' decoder ' => new PassThruDecoder ]);

ErrorWrappingDecoder - 一个装饰器，它将解码错误包装在DecodingError对象中，从而使您能够跳过格式错误的项目，而不是因SyntaxError异常而死亡。例子：

 <?php

use JsonMachine  Items ;
use JsonMachine  JsonDecoder  DecodingError ;
use JsonMachine  JsonDecoder  ErrorWrappingDecoder ;
use JsonMachine  JsonDecoder  ExtJsonDecoder ;

$ items = Items:: fromFile ( ' path/to.json ' , [ ' decoder ' => new ErrorWrappingDecoder ( new ExtJsonDecoder ())]);
foreach ( $ items as $ key => $ item ) {
    if ( $ key instanceof DecodingError || $ item instanceof DecodingError) {
        // handle error of this malformed json item
        continue ;
    }
    var_dump ( $ key , $ item );
}

错误处理

从 0.4.0 开始，每个异常都会扩展JsonMachineException ，因此您可以捕获它以过滤 JSON Machine 库中的任何错误。

跳过格式错误的项目

如果 json 流中的任何位置出现错误，则会引发SyntaxError异常。这非常不方便，因为如果一个 json 项目中有错误，您将无法解析文档的其余部分，因为其中一个项目格式错误。 ErrorWrappingDecoder是一个解码器装饰器，可以帮助您解决这个问题。用它包装一个解码器，您正在迭代的所有格式错误的项目都将通过DecodingError在 foreach 中提供给您。这样您就可以跳过它们并继续阅读文档。请参阅可用解码器中的示例。不过，迭代项之间的 json 流结构中的语法错误仍然会抛出SyntaxError异常。

解析器效率

时间复杂度始终为O(n)

流/文件

TL;DR：内存复杂度为O(2)

JSON Machine 一次读取流（或文件）1 个 JSON 项，并一次生成相应的 1 个 PHP 项。这是最有效的方法，因为如果您在 JSON 文件中有 10,000 个用户并希望使用json_decode(file_get_contents('big.json'))解析它，那么您将在内存中拥有整个字符串以及所有 10,000 个用户PHP 结构。下表显示了差异：

	一次在内存中存储字符串项	一次在内存中解码 PHP 项目	全部的
`json_decode()`	10000	10000	20000
`Items::from*()`	1	1	2

这意味着，JSON Machine 对于任何大小的已处理 JSON 都始终高效。 100GB没问题。

内存中的 JSON 字符串

TL;DR：内存复杂度为O(n+1)

还有一个方法Items::fromString() 。如果你被迫解析一个大字符串，并且流不可用，JSON Machine 可能比json_decode更好。原因是，与json_decode不同，JSON Machine 仍然一次遍历 JSON 字符串一项，并且不会一次将所有生成的 PHP 结构加载到内存中。

让我们继续使用 10,000 个用户的示例。这次它们都在内存中以字符串形式存在。当使用json_decode解码该字符串时，会在内存中创建 10,000 个数组（对象），然后返回结果。另一方面，JSON Machine 为字符串中找到的每个项目创建单个结构并将其返回给您。当您处理此项并迭代到下一项时，将创建另一个单一结构。这与流/文件的行为相同。下表阐述了这个概念：

	一次在内存中存储字符串项	一次在内存中解码 PHP 项目	全部的
`json_decode()`	10000	10000	20000
`Items::fromString()`	10000	1	10001

现实情况甚至更好。 Items::fromString消耗的内存比json_decode少大约 5 倍。原因是 PHP 结构比其相应的 JSON 表示占用更多的内存。

故障排除

“我仍然得到允许的内存大小......耗尽”

原因之一可能是您想要迭代的项目位于某些子键（例如"results"中，但您忘记指定 JSON 指针。请参阅解析子树。

“那没有帮助”

另一个原因可能是，您迭代的其中一项本身太大，无法立即解码。例如，您迭代用户，其中一个用户中有数千个“朋友”对象。最有效的解决方案是使用递归迭代。

“我还是运气不好”

这可能意味着单个 JSON 标量字符串本身太大，无法放入内存。例如非常大的 base64 编码文件。在这种情况下，在 JSON Machine 支持将标量值生成为 PHP 流之前，您可能仍然会运气不佳。

安装

使用作曲家

composer require halaxa/json-machine

没有作曲家

克隆或下载此存储库并将以下内容添加到您的引导文件中：

 spl_autoload_register ( require ' /path/to/json-machine/src/autoloader.php ' );

发展

克隆这个存储库。该库支持两种开发方法：

非容器化（PHP 和 Composer 已安装在您的计算机上）
容器化（您机器上的 Docker）

非集装箱化

在项目目录中运行composer run -l以查看可用的开发脚本。这样您就可以运行构建过程的某些步骤，例如测试。

集装箱化

安装 Docker 并在主机上的项目目录中运行make以查看可用的开发工具/命令。您可以单独运行构建过程的所有步骤，也可以一次运行整个构建过程。 Make 基本上在后台容器内运行 Composer 开发脚本。

make build ：运行完整的构建。通过 GitHub Actions CI 运行相同的命令。

支持

你喜欢这个图书馆吗？给它加注星标、分享它、展示它:) 非常欢迎提出问题和拉取请求。

执照

阿帕奇2.0

齿轮元素：由 TutsPlus 从 www.flaticon.com 制作的图标已获得 CC 3.0 BY 许可

使用 markdown-toc 生成的目录

展开

json machine