1/* Part of ClioPatria SeRQL and SPARQL server 2 3 Author: Michiel Hildebrand and Jan Wielemaker 4 E-mail: michielh@few.vu.nl 5 WWW: http://www.swi-prolog.org 6 Copyright (c) 2010-2018, VU University Amsterdam 7 All rights reserved. 8 9 Redistribution and use in source and binary forms, with or without 10 modification, are permitted provided that the following conditions 11 are met: 12 13 1. Redistributions of source code must retain the above copyright 14 notice, this list of conditions and the following disclaimer. 15 16 2. Redistributions in binary form must reproduce the above copyright 17 notice, this list of conditions and the following disclaimer in 18 the documentation and/or other materials provided with the 19 distribution. 20 21 THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 22 "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 23 LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 24 FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 25 COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, 26 INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, 27 BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 28 LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 29 CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 30 LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN 31 ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 32 POSSIBILITY OF SUCH DAMAGE. 33*/ 34 35:- module(api_lod, 36 [ lod_api/2 % +Request 37 ]). 38 39:- use_module(library(http/http_dispatch)). 40:- use_module(library(http/http_json)). 41:- use_module(library(http/http_host)). 42:- use_module(library(http/http_request_value)). 43:- use_module(library(http/http_cors)). 44:- use_module(library(semweb/rdf_db)). 45:- use_module(library(semweb/rdf_json)). 46:- use_module(library(semweb/rdf_describe)). 47:- use_module(library(settings)). 48:- use_module(library(option)). 49:- use_module(library(rdf_write)). 50:- use_module(library(semweb/rdf_turtle_write)). 51:- use_module(library(uri)). 52:- use_module(library(debug)). 53:- use_module(library(apply)). 54:- use_module(library(dcg/basics)). 55:- use_module(library(base64)). 56:- use_module(library(utf8)). 57 58 59/** <module> LOD - Linked Open Data server 60 61Linked (Open) Data turns RDF URIs (indentifiers) into URLs (locators). 62Requesting the data behind the URL returns a description of the 63resource. So, if we see a resource http://example.com/employe/bill, we 64get do an HTTP GET request and expect to receive a description of 65_bill_. This module adds LOD facilities to ClioPatria. 66 67---++ Running the LOD server 68 69There are several ways to run the LOD server. 70 71 1. The simplest way to realise LOD is to run ClioPatria there where 72 the authority component of the URL points to (see uri_components/2 73 for decomposing URIs). This implies you must be able to create a 74 DNS binding for the host and be able to run ClioPatria there. 75 76 2. Sometimes the above does not work, because the port is already 77 assigned to another machine, you are not allowed to run ClioPatria 78 on the target host, the target is behind a firewall, etc. In that 79 case, notable if the host runs Apache, you can exploit the Apache 80 module =mod_proxy= and proxy the connections to a location where 81 ClioPatria runs. If you ensure that the path on Apache is the same 82 as the path on ClioPatria, the following Apache configuration rule 83 solves the problem: 84 85 == 86 ProxyPass /rdf/ http://cliopatria-host:3020/rdf/ 87 == 88 89 3. Both above methods require no further configuration. 90 Unfortunately, they require a registered domain control over DNS 91 and administrative rights over certain machines. A solution that 92 doesn't require this is to use www.purl.org. This allows you to 93 redirect URLs within the purl domain to any location you control. 94 The redirection method can be defined with purl. In the semantic 95 web community, we typically use *|See other|* (303). The catch 96 is that if the address arrives at ClioPatria, we no longer know 97 where it came from. This is not a problem in (1), as there was 98 no redirect. It is also not a problem in (2), because Apache 99 adds a header =|x-forwarded-host|=. Unfortunately, there is 100 no way to tell you are activated through a redirect, let alone 101 where the redirect came from. 102 103 To deal with this situation, we use the redirected_from option of 104 lod_api/2. For example, if http://www.purl.org/vocabularies/myvoc/ 105 is redirected to /myvoc/ on ClioPatria, we use: 106 107 == 108 :- http_handler('/myvoc/', 109 lod_api([ redirected_from('http://www.purl.org/vocabularies/myvoc/') 110 ]), 111 [ prefix ]). 112 == 113 114By default, there is no HTTP handler pointing to lod_api/2. The example 115above describes how to deal with redirected URIs. The cases (1) and (2) 116must also be implemented by registering a handler. This can be as blunt 117as registering a handler for the root of the server, but typically one 118would use one or more handlers that deal with sub-trees that act as 119Linked Data repositories. Handler declarations should use absolute 120addresses to guarantee a match with the RDF URIs, even if the server is 121relocated by means of the http:prefix setting. For example: 122 123 == 124 :- http_handler('/rdf/', lod_api([]), [prefix]). 125 == 126 127@see http://linkeddata.org/ 128*/ 129 130:- setting(lod:redirect, boolean, false, 131 'If true, redirect from accept-header to extension'). 132 133%! lod_api(+Options, +Request) 134% 135% Reply to a Linked Data request. The handler is capable of three 136% output formats. It decides on the desired format based on the 137% HTTP =Accept= header-field. If no acceptable format is found, it 138% replies with a human-readable description of the resource using 139% ClioPatria RDF browser-page as defined by list_resource//2. 140% 141% Options: 142% 143% * redirected_from(+URL) 144% This option must be provided when using a purl.org or 145% similar redirect. See overall documentation of this 146% library. 147% 148% * bounded_description(+Type) 149% Description style to use. See rdf_bounded_description/4. 150% The default is =cbd= (Concise Bounded Description) 151 152lod_api(_Options, Request) :- 153 \+ memberchk(path_info(_), Request), 154 !, 155 accepts(Request, AcceptList), 156 preferred_format(AcceptList, Format), 157 ( Format == html 158 -> http_link_to_id(home, [], Redirect) 159 ; http_link_to_id(well_known_void, [], Redirect) 160 ), 161 http_redirect(see_other, Redirect, Request). 162lod_api(_Options, Request) :- 163 memberchk(path_info('/.well-known/void'), Request), 164 !, 165 http_link_to_id(well_known_void, [], Redirect), 166 http_redirect(see_other, Redirect, Request). 167lod_api(Options, Request) :- 168 lod_uri(Request, URI, Options), 169 debug(lod, 'LOD URI: ~q', [URI]), 170 accepts(Request, AcceptList), 171 triple_filter(Request, Filter), 172 cors_enable, 173 lod_request(URI, AcceptList, Request, Filter, Options). 174 175accepts(Request, AcceptList) :- 176 ( memberchk(accept(AcceptHeader), Request) 177 -> ( atom(AcceptHeader) % compatibility 178 -> http_parse_header_value(accept, AcceptHeader, AcceptList) 179 ; AcceptList = AcceptHeader 180 ) 181 ; AcceptList = [] 182 ). 183 184%! triple_filter(+Request, -Filter) is det. 185% 186% Extract Triple-Filter from Request. Ignores the filter if it 187% is invalid. 188 189triple_filter(Request, Filter) :- 190 catch(phrase(triple_filter(Request), Filter), E, 191 (print_message(warning, E),fail)), 192 !. 193triple_filter(_, []). 194 195 196%! triple_filter(+Text)// 197% 198% Translate an RDF triple pattern into a list of rdf(S,P,O) terms. 199 200triple_filter([]) --> 201 []. 202triple_filter([triple_filter(Filter)|T]) --> 203 !, 204 one_triple_filter(Filter), 205 triple_filter(T). 206triple_filter([_|T]) --> 207 triple_filter(T). 208 209one_triple_filter(Encoded) --> 210 { string_codes(Encoded, EncCodes), 211 phrase(base64(UTF8Bytes), EncCodes), 212 phrase(utf8_codes(PlainCodes), UTF8Bytes), 213 string_codes(Filter, PlainCodes), 214 split_string(Filter, "\r\n", "\r\n", Filters), 215 maplist(map_triple_filter, Filters, Triples) 216 }, 217 string(Triples). 218 219map_triple_filter(String, rdf(S,P,O)) :- 220 split_string(String, "\s\t", "\s\t", [SS,SP,SO]), 221 triple_term(SS, S), 222 triple_term(SP, P), 223 triple_term(SO, O). 224 225triple_term("?", _) :- !. 226triple_term(S, N) :- 227 string_codes(S, Codes), 228 phrase(sparql_grammar:graph_term(N), Codes). 229 230%! lod_request(+URI, +AcceptList, +Request, +Filter, +Options) 231% 232% Handle an LOD request. 233 234lod_request(URI, AcceptList, Request, Filter, Options) :- 235 lod_resource(URI), 236 !, 237 preferred_format(AcceptList, Format), 238 debug(lod, 'LOD Format: ~q', [Format]), 239 ( cliopatria:redirect_uri(Format, URI, SeeOther) 240 -> http_redirect(see_other, SeeOther, Request) 241 ; setting(lod:redirect, true), 242 redirect(URI, AcceptList, SeeOther) 243 -> http_redirect(see_other, SeeOther, Request) 244 ; lod_describe(Format, URI, Request, Filter, Options) 245 ). 246lod_request(URL, _AcceptList, Request, Filter, Options) :- 247 format_request(URL, URI, Format), 248 !, 249 lod_describe(Format, URI, Request, Filter, Options). 250lod_request(URI, _AcceptList, _Request, _Filter, _) :- 251 throw(http_reply(not_found(URI))). 252 253 254%! lod_uri(+Request, -URI, +Options) 255% 256% URI is the originally requested URI. This predicate deals with 257% redirections if the HTTP handler was registered using the option 258% redirected_from(URL). Otherwise it resolves the correct global 259% URI using http_current_host/4. 260 261lod_uri(Request, URI, Options) :- 262 memberchk(redirected_from(Org), Options), 263 memberchk(request_uri(ReqURI), Request), 264 handler_location(Request, Location), 265 atom_concat(Location, Rest, ReqURI), 266 atom_concat(Org, Rest, URI). 267lod_uri(Request, URI, _) :- 268 memberchk(request_uri(ReqURI), Request), 269 http_current_host(Request, Host, Port, 270 [ global(true) 271 ]), 272 ( Port == 80 273 -> atomic_list_concat(['http://', Host, ReqURI], URI) 274 ; atomic_list_concat(['http://', Host, :, Port, ReqURI], URI) 275 ). 276 277 278%! handler_location(+Request, -Location) is det. 279% 280% Location is the requested location on the server. This includes 281% the handler location, normally concatenated with the path_info. 282 283handler_location(Request, Location) :- 284 memberchk(path(Path), Request), 285 ( memberchk(path_info(Rest), Request), 286 atom_concat(Location, Rest, Path) 287 -> true 288 ; Location = Path 289 ). 290 291 292%! redirect(+URI, +AcceptList, -RedirectURL) 293% 294% Succeeds if URI is in the store and a RedirectURL is found for 295% it. 296 297redirect(URI, AcceptList, To) :- 298 lod_resource(URI), 299 preferred_format(AcceptList, Format), 300 ( cliopatria:redirect_uri(Format, URI, To) 301 -> true 302 ; uri_components(URI, URIComponents), 303 uri_data(path, URIComponents, Path0), 304 format_suffix(Format, Suffix), 305 file_name_extension(Path0, Suffix, Path), 306 uri_data(path, URIComponents, Path, ToComponents), 307 uri_components(To, ToComponents) 308 ). 309 310 311%! preferred_format(+AcceptList, -Format) is det. 312% 313% Format is the highest ranked mimetype found in the Acceptlist of 314% the request and that we can support. Expects an AcceptList 315% sorted by rank. 316 317preferred_format(AcceptList, Format) :- 318 member(media(MimeType,_,_,_), AcceptList), 319 ground(MimeType), 320 mimetype_format(MimeType, Format), 321 !. 322preferred_format(_, html). 323 324 325%! format_request(+URL, -URI, -Format) is semidet. 326% 327% True if URL contains a suffix that corresponds to a supported 328% output format, and the global URI occurs in the database. 329 330format_request(URL, URI, Format) :- 331 uri_components(URL, URLComponents), 332 uri_data(path, URLComponents, Path), 333 file_name_extension(Base, Ext, Path), 334 ( format_suffix(Format, Ext), 335 mimetype_format(_, Format) 336 -> true 337 ), 338 uri_data(path, URLComponents, Base, PlainComponents), 339 uri_components(URI, PlainComponents), 340 lod_resource(URI). 341 342 343%! lod_describe(+Format, +URI, +Request, +Filter, +Options) is det. 344% 345% Write an HTTP document describing URI to in Format to the 346% current output. Format is defined by mimetype_format/2. 347 348lod_describe(html, URI, Request, _, _) :- 349 !, 350 ( rdf_graph(URI) 351 -> http_link_to_id(list_graph, [graph=URI], Redirect) 352 ; http_link_to_id(list_resource, [r=URI], Redirect) 353 ), 354 http_redirect(see_other, Redirect, Request). 355lod_describe(Format, URI, _Request, Filter, Options) :- 356 lod_description(URI, RDF, Filter, Options), 357 send_graph(Format, RDF). 358 359send_graph(xmlrdf, RDF) :- 360 format('Content-type: application/rdf+xml; charset=UTF-8~n~n'), 361 rdf_write_xml(current_output, RDF). 362send_graph(json, RDF) :- 363 graph_json(RDF, JSON), 364 reply_json(JSON). 365send_graph(turtle, RDF) :- 366 format('Content-type: text/turtle; charset=UTF-8~n~n'), 367 rdf_save_turtle(stream(current_output), 368 [ expand(triple_in(RDF)), 369 only_known_prefixes(true), 370 silent(true) 371 ]). 372 373%! triple_in(+RDF, ?S,?P,?O, ?G) is nondet. 374% 375% Lookup a triple in the graph RDF, represented as a list of 376% rdf(S,P,O). 377% 378% @tbd Describe required indexing from rdf_save_turtle/2 and 379% implement that if the graph is big. 380 381:- public triple_in/5. % called from send_graph/2. 382 383triple_in(RDF, S,P,O,_G) :- 384 member(rdf(S,P,O), RDF). 385 386 387%! lod_description(+URI, -RDF, +Filter, +Options) is det. 388% 389% RDF is a graph represented as a list of rdf(S,P,O) that 390% describes URI. 391% 392% This predicate is hooked by cliopatria:lod_description/2. The 393% default is implemented by resource_CBD/3. 394% 395% @see SPARQL DESCRIBE 396 397lod_description(URI, RDF, _, _) :- 398 cliopatria:lod_description(URI, RDF), 399 !. 400lod_description(URI, RDF, Filter, Options) :- 401 option(bounded_description(Type), Options, cbd), 402 echo_filter(Filter), 403 rdf_bounded_description(rdf, Type, Filter, URI, RDF). 404 405echo_filter([]) :- !. 406echo_filter(Filters) :- 407 copy_term(Filters, Filters1), 408 term_variables(Filters1, Vars), 409 maplist(=(?), Vars), 410 filters_to_ntriples(Filters1, NTriples), 411 split_string(NTriples, "\n", "\n.\s", Strings0), 412 maplist(insert_q, Strings0, Strings), 413 atomics_to_string(Strings, "\n", String), 414 base64(String, Encoded), 415 format('Triple-Filter: ~w\r\n', [Encoded]). 416 417insert_q(String, QString) :- 418 split_string(String, " ", "", [S,P,O|M]), 419 map_q(S, QS), 420 map_q(P, QP), 421 map_q(O, QO), 422 atomics_to_string([QS,QP,QO|M], " ", QString). 423 424map_q("<?>", "?") :- !. 425map_q(S, S). 426 427filters_to_ntriples(Filters, String) :- 428 with_output_to( 429 string(String), 430 rdf_save_ntriples(stream(current_output), 431 [ expand(api_lod:triple_in(Filters))])). 432 433 434%! mimetype_format(?MimeType, ?Format) is nondet. 435% 436% Conversion between mimetypes and formats. 437 438mimetype_format(application/'rdf+xml', xmlrdf). 439mimetype_format(application/json, json). 440mimetype_format(application/'x-turtle', turtle). 441mimetype_format(text/turtle, turtle). 442mimetype_format(text/html, html). 443 444%! format_suffix(?Format, ?Suffix) is nondet. 445% 446% Suffix is the file name extension used for Format. 447 448format_suffix(xmlrdf, rdf). 449format_suffix(json, json). 450format_suffix(html, html). 451format_suffix(turtle, ttl). 452 453 454%! lod_resource(+Resource) is semidet. 455% 456% True if Resource is an existing resource for the LOD server. 457% Typically, this means it appears as a subject, but when 458% considering symmetric bounded descriptions, it should certainly 459% also hold for resources that only appear as object. 460 461lod_resource(Resource) :- 462 ( rdf(Resource, _, _) 463 ; rdf(_, Resource, _) 464 ; rdf(_, _, Resource) 465 ; rdf_graph(Resource) 466 ), 467 !. 468 469 470 /******************************* 471 * HOOKS * 472 *******************************/ 473 474:- multifile 475 cliopatria:redirect_uri/3, 476 cliopatria:lod_description/2. 477 478%! cliopatria:redirect_uri(+Format, +URI, -RedirectURL) 479% 480% Compose a RedirectionURL based on the output Format and the URI 481% that is in our RDF database. For example, this could map the URI 482% http://example.com/employe/bill into Bill's homepage at 483% http://example.com/~bill if Format is =html=. The default is to 484% a format-specific extension to the path component of URI, 485% returning e.g., http://example.com/employe/bill.rdf if the 486% requested format is RDF. 487% 488% @see This hook is used by redirect/3. 489% @param Format is one of =xmlrdf=, =turtle, =json= or =html=. 490 491 492%! cliopatria:lod_description(+URI, -RDF:list(rdf(s,p,o))) 493% 494% RDF is list of triples describing URI. The default is to use the 495% Concise Bounded Description as implemented by resource_CBD/3. 496% 497% @see This hook is used by lod_description/2 498% @see library(semweb/rdf_describe) provides several definitions 499% of bounded descriptions.