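hardf is a PHP 7.1+ library that lets you handle Linked Data (RDF). It offers:
- a parser and a serializer, both with streaming support.
This library is a port of N3.js to PHP.
In PHP, we use the triple representation ported from the NodeJS N3.js library. Check https://github.com/rdfjs/N3.js/tree/v0.10.0#triple-representation for more information.
We deliberately focused on performance rather than developer friendliness. We therefore implemented this triple representation with associative arrays instead of PHP objects: what is an object in N3.js is an array here. For example: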
<?php
$triple = [
    'subject'   => 'http://example.org/cartoons#Tom',
    'predicate' => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
    'object'    => 'http://example.org/cartoons#Cat',
    'graph'     => 'http://example.org/mycartoon', // optional
];
Literals are encoded as follows (similar to N3.js):
'"Tom"@en-gb' // lowercase language
'"1"^^http://www.w3.org/2001/XMLSchema#integer' // no angular brackets <>
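A minimal, self-contained sketch of building such literal strings by hand. The helper makeLiteral and the cartoons#age predicate are hypothetical, written only for illustration; hardf itself offers Util::createLiteral for this. The sketch applies the two rules above (lowercase language tag, datatype IRI without angular brackets) and assumes, as in N3.js, that the value is stored unescaped between the first and the last double quote.

```php
<?php
// Hypothetical helper (not part of hardf): builds a literal string in the
// representation described above, i.e. "value"@lang or "value"^^datatypeIRI.
function makeLiteral(string $value, ?string $language = null, ?string $datatype = null): string
{
    $quoted = '"' . $value . '"'; // the raw value is kept unescaped
    if ($language !== null) {
        // language tags are stored in lowercase
        return $quoted . '@' . strtolower($language);
    }
    if ($datatype !== null) {
        // datatype IRIs are appended without angular brackets
        return $quoted . '^^' . $datatype;
    }
    return $quoted;
}

// A triple whose object is a typed literal (the age predicate is made up)
$triple = [
    'subject'   => 'http://example.org/cartoons#Tom',
    'predicate' => 'http://example.org/cartoons#age',
    'object'    => makeLiteral('1', null, 'http://www.w3.org/2001/XMLSchema#integer'),
];

echo makeLiteral('Tom', 'en-GB'), "\n"; // "Tom"@en-gb
echo $triple['object'], "\n";          // "1"^^http://www.w3.org/2001/XMLSchema#integer
```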
Install this library using Composer:
composer require pietercolpaert/hardf
use pietercolpaert\hardf\TriGWriter;
A class that should be instantiated and can write TriG, Turtle or N-Quads.
Example use:
$writer = new TriGWriter([
    "prefixes" => [
        "schema" => "http://schema.org/",
        "dct"    => "http://purl.org/dc/terms/",
        "geo"    => "http://www.w3.org/2003/01/geo/wgs84_pos#",
        "rdf"    => "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "rdfs"   => "http://www.w3.org/2000/01/rdf-schema#"
    ],
    "format" => "n-quads" // other possible values: trig or turtle
]);
$writer->addPrefix("ex", "http://example.org/");
$writer->addTriple("schema:Person", "dct:title", '"Person"@en', "http://example.org/#test");
$writer->addTriple("schema:Person", "schema:label", '"Person"@en', "http://example.org/#test");
$writer->addTriple("ex:1", "dct:title", '"Person1"@en', "http://example.org/#test");
$writer->addTriple("ex:1", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "schema:Person", "http://example.org/#test");
$writer->addTriple("ex:2", "dct:title", '"Person2"@en', "http://example.org/#test");
$writer->addTriple("schema:Person", "dct:title", '"Person"@en', "http://example.org/#test2");
echo $writer->end();
// The method names should speak for themselves:
$writer = new TriGWriter(["prefixes" => [ /* ... */ ]]);
$writer->addTriple($subject, $predicate, $object, $graph);
$writer->addTriples($triples);
$writer->addPrefix($prefix, $iri);
$writer->addPrefixes($prefixes);
// Creates a blank node ($predicate and/or $object are optional)
$writer->blank($predicate, $object);
// Creates an rdf:List with $elements
$list = $writer->addList($elements);
// Returns the output it can already produce and clears the internal memory (useful for streaming)
$out .= $writer->read();
// Alternatively, you can listen for new chunks through a callback:
$writer->setReadCallback(function ($output) { echo $output; });
// Call this at the end. The return value is the full triple output, or the rest of the output (such as closing dots and brackets), unless a callback was set.
$out .= $writer->end();
// OR
$writer->end();
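To show what the n-quads output of the array representation looks like, here is a simplified, self-contained sketch. It is not hardf's TriGWriter: termToNQuads and tripleToNQuad are hypothetical names, and the sketch assumes every term is already a full IRI (prefixed names such as schema:Person are not expanded), a blank node label (_:name) or a literal in the encoding above.

```php
<?php
// Simplified illustration, NOT hardf's TriGWriter: serializes one term of the
// array representation into N-Quads syntax.
function termToNQuads(string $term): string
{
    if (substr($term, 0, 2) === '_:') {
        return $term; // blank node label, written as-is
    }
    if ($term !== '' && $term[0] === '"') {
        // literal: the value runs from the first to the LAST double quote
        $end = strrpos($term, '"');
        $value = substr($term, 1, $end - 1);
        $suffix = substr($term, $end + 1); // "", "@lang" or "^^datatype"
        // escape backslashes, quotes and line breaks inside the value
        $escaped = '"' . addcslashes($value, "\\\"\n\r") . '"';
        if (substr($suffix, 0, 2) === '^^') {
            return $escaped . '^^<' . substr($suffix, 2) . '>';
        }
        return $escaped . $suffix; // plain or language-tagged literal
    }
    return '<' . $term . '>'; // IRI
}

// Turns one triple array into a single N-Quads line (graph is optional).
function tripleToNQuad(array $triple): string
{
    $terms = [$triple['subject'], $triple['predicate'], $triple['object']];
    if (isset($triple['graph']) && $triple['graph'] !== '') {
        $terms[] = $triple['graph'];
    }
    return implode(' ', array_map('termToNQuads', $terms)) . " .\n";
}

echo tripleToNQuad([
    'subject'   => 'http://example.org/1',
    'predicate' => 'http://purl.org/dc/terms/title',
    'object'    => '"Person1"@en',
    'graph'     => 'http://example.org/#test',
]);
```

Only datatype IRIs need angular brackets added in the output, which is why the internal encoding can omit them.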
Besides TriG, the TriGParser class also parses Turtle, N-Triples, N-Quads and the W3C Team Submission N3.
$parser = new TriGParser($options, $tripleCallback, $prefixCallback);
$parser->setTripleCallback($function);
$parser->setPrefixCallback($function);
$parser->parse($input, $tripleCallback, $prefixCallback);
$parser->parseChunk($input);
$parser->end();
Use the return value and pass it to a writer:
use pietercolpaert\hardf\TriGParser;
use pietercolpaert\hardf\TriGWriter;
$parser = new TriGParser(["format" => "n-quads"]); // also parses n-triples, n3, turtle and trig; format is optional
$writer = new TriGWriter();
$triples = $parser->parse("<A> <B> <C> <G> .");
$writer->addTriples($triples);
echo $writer->end();
Use a callback and pass it to a writer:
$parser = new TriGParser();
$writer = new TriGWriter(["format" => "trig"]);
$parser->parse("<http://A> <https://B> <http://C> <http://G> . <A2> <https://B2> <http://C2> <http://G3> .", function ($e, $triple) use ($writer) {
    if (isset($e)) {
        echo "Error occurred: " . $e;
    } elseif (isset($triple)) {
        $writer->addTriple($triple);
        echo $writer->read(); // write out what we have so far
    } else {
        // no error and no triple: this flags the end of the input
        echo $writer->end(); // write the end
    }
});
When you need to parse a large file, you only need to parse it in chunks and process each chunk as it arrives. You can do this as follows:
$writer = new TriGWriter(["format" => "n-quads"]);
$tripleCallback = function ($error, $triple) use ($writer) {
    if (isset($error)) {
        throw $error;
    } elseif (isset($triple)) {
        $writer->addTriple($triple);
        echo $writer->read();
    } else {
        // no error and no triple: the end of the input was reached
        echo $writer->end();
    }
};
$prefixCallback = function ($prefix, $iri) use ($writer) {
    $writer->addPrefix($prefix, $iri);
};
$parser = new TriGParser(["format" => "trig"], $tripleCallback, $prefixCallback);
$parser->parseChunk($chunk);
$parser->parseChunk($chunk);
$parser->parseChunk($chunk);
$parser->end(); // needs to be called
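The $chunk values above have to come from somewhere. A minimal sketch of the reading side, using only PHP's stream functions: feedInChunks is a hypothetical helper, and with hardf you would presumably pass [$parser, 'parseChunk'] as the callback and call $parser->end() afterwards. This assumes, as the streaming support implies, that parseChunk buffers tokens that are split across chunk boundaries.

```php
<?php
// Hypothetical helper (not part of hardf): reads a stream in fixed-size chunks
// and hands each non-empty chunk to $onChunk. Returns the number of chunks.
function feedInChunks($handle, callable $onChunk, int $chunkSize = 8192): int
{
    $count = 0;
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            throw new RuntimeException('Could not read from stream');
        }
        if ($chunk !== '') {
            $onChunk($chunk);
            $count++;
        }
    }
    return $count;
}

// Demo with an in-memory stream instead of a real file; a tiny chunk size
// makes the splitting visible (chunks end in the middle of IRIs).
$stream = fopen('php://memory', 'r+');
fwrite($stream, "<http://A> <http://B> <http://C> .\n<http://A> <http://B> <http://D> .\n");
rewind($stream);

$received = '';
$chunks = feedInChunks($stream, function (string $chunk) use (&$received) {
    $received .= $chunk; // with hardf: $parser->parseChunk($chunk);
}, 16);
fclose($stream);

echo "read $chunks chunks\n";
```

For a real file, fopen($path, 'r') replaces the php://memory stream; the memory use then stays bounded by the chunk size plus whatever the parser and writer buffer.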
Parser options:
- format: input format (case-insensitive)
  - turtle - Turtle
  - trig - TriG
  - contains triple, e.g. triple, ntriples, N-Triples - N-Triples
  - contains quad, e.g. quad, nquads, N-Quads - N-Quads
  - contains n3, e.g. n3 - N3
- blankNodePrefix (defaults to b0_): prefix forced on blank node names, e.g. TriGParser(["blankNodePrefix" => 'foo']) will parse _:bar as _:foobar.
- documentIRI: sets the base URI used to resolve relative URIs (not applicable if format indicates n-triples or n-quads)
- lexer: allows the use of your own lexer class. A lexer must provide the following public methods:
  - tokenize(string $input, bool $finalize = true): array<array{'subject': string, 'predicate': string, 'object': string, 'graph': string}>
  - tokenizeChunk(string $input): array<array{'subject': string, 'predicate': string, 'object': string, 'graph': string}>
  - end(): array<array{'subject': string, 'predicate': string, 'object': string, 'graph': string}>
- explicitQuantifiers - [...]

Some Turtle and N3 documents may use IRI syntax relative to the base IRI, e.g.
<> <someProperty> "some value" .
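To parse such documents correctly, the document base IRI must be known. Otherwise we might end up with empty IRIs (e.g. for the subject in the example above).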
Sometimes the base IRI is encoded in the document, e.g.
@base <http://some.base/iri/> .
<> <someProperty> "some value" .
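But sometimes it is missing. In that case the Turtle specification requires us to follow section 5.1 of RFC 3986, which says that if the base IRI is not embedded in the document, it should be assumed to be the document retrieval URI (e.g. the URL you downloaded the document from, or a file path converted to a URL). Unfortunately, the hardf parser cannot guess this, so you have to provide it using the documentIRI parser creation option, e.g.
$parser = new TriGParser(["documentIRI" => "http://some.base/iri/"]);
Long story short: if you run into the "subject/predicate/object on line X can not be parsed without knowing the the document base IRI.(...)" error, initialize the parser with the documentIRI option.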
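To see why the base IRI matters, here is a deliberately simplified resolution sketch. resolveAgainstBase is a hypothetical function, not hardf code, and it is not the complete RFC 3986 section 5.2 algorithm: it skips dot-segment ("../") removal, query handling and network-path (//host) references, and it assumes the base IRI has a path such as http://some.base/iri/.

```php
<?php
// Simplified sketch (NOT full RFC 3986): resolves a relative IRI reference
// against a base IRI for the common cases discussed above.
function resolveAgainstBase(string $reference, string $base): string
{
    // Already absolute (starts with a scheme such as "http:"): nothing to do
    if (preg_match('/^[A-Za-z][A-Za-z0-9+.-]*:/', $reference)) {
        return $reference;
    }
    // A fragment of the base is never inherited
    $hash = strpos($base, '#');
    $baseWithoutFragment = $hash === false ? $base : substr($base, 0, $hash);

    if ($reference === '') {
        return $baseWithoutFragment; // "<>" denotes the document itself
    }
    if ($reference[0] === '#') {
        return $baseWithoutFragment . $reference; // same document, new fragment
    }
    if ($reference[0] === '/') {
        // Absolute path: keep only the scheme and authority of the base
        if (preg_match('#^[A-Za-z][A-Za-z0-9+.-]*://[^/]*#', $baseWithoutFragment, $m)) {
            return $m[0] . $reference;
        }
        return $reference;
    }
    // Relative path: replace everything after the last "/" of the base
    $lastSlash = strrpos($baseWithoutFragment, '/');
    return substr($baseWithoutFragment, 0, $lastSlash + 1) . $reference;
}

echo resolveAgainstBase('', 'http://some.base/iri/'), "\n";
echo resolveAgainstBase('someProperty', 'http://some.base/iri/'), "\n";
```

Without a base, the first call would have nothing to return but an empty IRI, which is exactly the situation the parser error above guards against.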
use pietercolpaert\hardf\Util;
A static class with a couple of helpful functions for handling our specific triple representation. It helps you create and evaluate literals and IRIs, and expand prefixed names.
$bool = Util::isIRI($term);
$bool = Util::isLiteral($term);
$bool = Util::isBlank($term);
$bool = Util::isDefaultGraph($term);
$bool = Util::inDefaultGraph($triple);
$value = Util::getLiteralValue($literal);
$literalType = Util::getLiteralType($literal);
$lang = Util::getLiteralLanguage($literal);
$bool = Util::isPrefixedName($term);
$expanded = Util::expandPrefixedName($prefixedName, $prefixes);
$iri = Util::createIRI($iri);
$literalObject = Util::createLiteral($value, $modifier); // $modifier (language or datatype) is optional
For more information, see the documentation at https://github.com/RubenVerborgh/N3.js#utility.
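Because terms are plain strings, most of these checks boil down to looking at the first characters or the part after the closing quote. The following standalone re-implementations are hypothetical (they are not hardf's code and their names differ on purpose) and only illustrate the idea. literalType follows RDF 1.1: a plain literal has datatype xsd:string and a language-tagged literal rdf:langString.

```php
<?php
// Hypothetical stand-ins for a few Util helpers, NOT hardf's implementation.
function isLiteralTerm(string $term): bool
{
    return $term !== '' && $term[0] === '"'; // literals start with a quote
}

function isBlankTerm(string $term): bool
{
    return substr($term, 0, 2) === '_:'; // blank nodes start with "_:"
}

function literalValue(string $literal): string
{
    // the value sits between the first and the last double quote
    return substr($literal, 1, strrpos($literal, '"') - 1);
}

function literalSuffix(string $literal): string
{
    // everything after the closing quote: "", "@lang" or "^^datatype"
    return substr($literal, strrpos($literal, '"') + 1);
}

function literalLanguage(string $literal): string
{
    $suffix = literalSuffix($literal);
    return substr($suffix, 0, 1) === '@' ? strtolower(substr($suffix, 1)) : '';
}

function literalType(string $literal): string
{
    $suffix = literalSuffix($literal);
    if (substr($suffix, 0, 2) === '^^') {
        return substr($suffix, 2);
    }
    return $suffix === ''
        ? 'http://www.w3.org/2001/XMLSchema#string'
        : 'http://www.w3.org/1999/02/22-rdf-syntax-ns#langString';
}

function expandPrefixed(string $name, array $prefixes): string
{
    $colon = strpos($name, ':');
    if ($colon === false) {
        return $name;
    }
    $prefix = substr($name, 0, $colon);
    // unknown prefixes are left untouched
    return isset($prefixes[$prefix]) ? $prefixes[$prefix] . substr($name, $colon + 1) : $name;
}

echo literalValue('"Tom"@en-gb'), ' / ', literalLanguage('"Tom"@en-gb'), "\n";
echo expandPrefixed('schema:Person', ['schema' => 'http://schema.org/']), "\n";
```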
We also provide 2 simple tools in bin/ as example implementations: a validator and a converter. Try, for example:
curl -H "accept: application/trig" http://fragments.dbpedia.org/2015/en | php bin/validator.php trig
curl -H "accept: application/trig" http://fragments.dbpedia.org/2015/en | php bin/convert.php trig n-triples
We compared the performance on two Turtle files, parsing them with the EasyRDF and ARC2 libraries in PHP, the N3.js library for NodeJS, and hardf. These were the results:
#triples | framework | time (ms) | memory (MB) |
---|---|---|---|
1,866 | hardf without opcache | 27.6 | 0.722 |
1,866 | hardf with opcache | 24.5 | 0.380 |
1,866 | EasyRDF without opcache | 5,166.5 | 2.772 |
1,866 | EasyRDF with opcache | 5,176.2 | 2.421 |
1,866 | ARC2 with opcache | 71.9 | 1.966 |
1,866 | N3.js | 24.0 | 28.xxx |
3,896,560 | hardf without opcache | 40,017.7 | 0.722 |
3,896,560 | hardf with opcache | 33,155.3 | 0.380 |
3,896,560 | N3.js | 7,004.0 | 59.xxx |
3,896,560 | ARC2 with opcache | 203,152.6 | 3,570.808 |
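Reading the large-file rows as throughput makes the gap easier to compare. The snippet is plain arithmetic on numbers copied from the table, throughput = #triples / (time in ms / 1000); triplesPerSecond is simply a name chosen for this example.

```php
<?php
// Converts a (#triples, milliseconds) measurement into triples per second.
function triplesPerSecond(int $triples, float $milliseconds): float
{
    return $triples / ($milliseconds / 1000);
}

// Large-file rows copied from the table above: [#triples, time in ms]
$rows = [
    'hardf with opcache' => [3896560, 33155.3],
    'N3.js'              => [3896560, 7004.0],
    'ARC2 with opcache'  => [3896560, 203152.6],
];

foreach ($rows as $framework => [$triples, $ms]) {
    printf("%-20s %10.0f triples/s\n", $framework, triplesPerSecond($triples, $ms));
}
```

For the 3,896,560-triple file this comes to roughly 117,500 triples/s for hardf with opcache, about 556,000 for N3.js and about 19,200 for ARC2, while hardf keeps by far the smallest memory footprint.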
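The hardf library is copyrighted by Ruben Verborgh and Pieter Colpaert and released under the MIT License.
Contributions are welcome, and bug reports or pull requests are always helpful. If you plan to implement a larger feature, it's best to discuss it first by filing an issue.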