SWI-Prolog -- Literal matching and indexing

3.3.8 Literal matching and indexing

Literal values are ordered and indexed using a skip list. This index serves the following aims:

- Unlike hash tables, ordered structures such as binary trees and skip lists allow for efficient prefix and range matching. Prefix matching is useful in interactive applications to provide feedback while typing, such as auto-completion.
- Because the table holds each unique literal only once, creation and destruction events can be generated for literals (see rdf_monitor/2). These events can be used to maintain additional indexing on literals, such as ‘by word’. See library(semweb/litindex).
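
The event mechanism can be sketched as follows. This is a minimal illustration, assuming the new_literal/1 and old_literal/1 events described for rdf_monitor/2. The names seen_literal/1, on_literal_event/1 and start_literal_tracking/0 are hypothetical and not part of the library.

```prolog
:- use_module(library(semweb/rdf_db)).

% seen_literal(?Literal): hypothetical side table kept in sync with the
% literal table through monitor events.
:- dynamic seen_literal/1.

% on_literal_event(+Event): callback for rdf_monitor/2. Events that are
% not about literals are ignored.
on_literal_event(new_literal(Literal)) :-
    !,
    assertz(seen_literal(Literal)).
on_literal_event(old_literal(Literal)) :-
    !,
    retractall(seen_literal(Literal)).
on_literal_event(_).

% start_literal_tracking: subscribe to literal creation and destruction
% only. Literals created before this call are not reported.
start_literal_tracking :-
    rdf_monitor(on_literal_event, [-all, +new_literal, +old_literal]).
```

After calling start_literal_tracking/0, asserting a triple with a previously unseen literal should add a seen_literal/1 fact. A real secondary index, such as the word index in library(semweb/litindex), would tokenize the literal at this point instead of storing it verbatim.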

As string literal matching is most frequently used for searching purposes, the match is executed case-insensitively and after removal of diacritics. Case matching and diacritics removal are based on Unicode character properties and are independent of the current locale. Case conversion is based on the ‘simple uppercase mapping’ defined by Unicode, and diacritic removal on the ‘decomposition type’. The approach is lightweight, but somewhat simple-minded for some languages. The tables are generated for Unicode characters up to 0x7fff. For more information, please check the source code of the mapping-table generator unicode_map.pl, available in the sources of this package.
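
A small sketch of case-insensitive matching through rdf/3 is given below. It assumes the exact(Text) and substring(Text) literal queries of rdf/3. The example.org IRIs and the predicate names add_sample_names/0, name_icase/2 and name_containing/2 are made up for illustration.

```prolog
:- use_module(library(semweb/rdf_db)).

% add_sample_names: store two names that differ only in case.
% The IRIs are hypothetical sample data.
add_sample_names :-
    rdf_assert('http://example.org/p1', 'http://example.org/name',
               literal('Amsterdam')),
    rdf_assert('http://example.org/p2', 'http://example.org/name',
               literal('AMSTERDAM')).

% name_icase(+Text, -Stored): Stored is a stored name that equals Text
% when case is ignored.
name_icase(Text, Stored) :-
    rdf(_, 'http://example.org/name', literal(exact(Text), Stored)).

% name_containing(+Sub, -Stored): Stored is a stored name that contains
% Sub, again ignoring case.
name_containing(Sub, Stored) :-
    rdf(_, 'http://example.org/name', literal(substring(Sub), Stored)).
```

After add_sample_names/0, the query name_icase(amsterdam, X) is expected to enumerate both stored spellings. Following the description above, a literal that differs from the search text only in diacritics should match in the same way.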

Currently the total order of literals is first based on the type of literal, using the ordering numeric < string < term. Numeric values (integer and float) are ordered by value; integers precede floats if they represent the same value. Strings are sorted alphabetically after case-mapping and diacritic removal as described above. If they compare equal, uppercase precedes lowercase and diacritics are ordered on their Unicode value. If they still compare equal, literals without any qualifier precede literals with a type qualifier, which precede literals with a language qualifier. Same qualifiers (both type or both language) are sorted alphabetically.
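
The ordering rules can be modelled in plain Prolog. The sketch below is not the library's implementation: sort_key/2 and order_literals/2 are hypothetical names, upcase_atom/2 stands in for the Unicode case mapping, diacritic removal and type or language qualifiers are left out, and integers are converted to floats for comparison, which loses precision for very large values.

```prolog
:- use_module(library(pairs)).

% sort_key(+Value, -Key): Key sorts in the standard order of terms the
% way the text describes the literal order.
%   numbers: by value, integer before float when the values are equal
%   atoms:   by case-folded text, then uppercase before lowercase
%   other:   any remaining term, after numbers and atoms
sort_key(Value, key(0, Float, Tie)) :-
    number(Value),
    !,
    Float is float(Value),
    (   integer(Value)
    ->  Tie = 0
    ;   Tie = 1
    ).
sort_key(Value, key(1, Folded, Value)) :-
    atom(Value),
    !,
    upcase_atom(Value, Folded).
sort_key(Value, key(2, Value, 0)).

% order_literals(+Values, -Sorted): sort Values by their keys.
order_literals(Values, Sorted) :-
    map_list_to_pairs(sort_key, Values, Pairs),
    keysort(Pairs, SortedPairs),
    pairs_values(SortedPairs, Sorted).
```

For example:

```prolog
?- order_literals([banana, 'Apple', 2.0, apple, 2, 1, f(x)], Sorted).
Sorted = [1, 2, 2.0, 'Apple', apple, banana, f(x)].
```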

The ordered tree is used for indexed execution of literal(prefix(Prefix), Literal) as well as literal(like(Like), Literal) if Like does not start with a ‘*’. Note that results of queries that use the tree index are returned in alphabetical order.
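
The two indexed query forms can be tried with a sketch like the one below. The example.org IRIs and the predicates add_city_labels/0, label_prefix/2, label_like/2 and completions/2 are hypothetical names chosen for this illustration.

```prolog
:- use_module(library(semweb/rdf_db)).

% add_city_labels: hypothetical sample data for the queries below.
add_city_labels :-
    rdf_assert('http://example.org/city/1', 'http://example.org/label',
               literal('Amsterdam')),
    rdf_assert('http://example.org/city/2', 'http://example.org/label',
               literal('Amstelveen')),
    rdf_assert('http://example.org/city/3', 'http://example.org/label',
               literal('Berlin')).

% label_prefix(+Prefix, -Label): Label is a stored label that starts
% with Prefix, ignoring case.
label_prefix(Prefix, Label) :-
    rdf(_, 'http://example.org/label', literal(prefix(Prefix), Label)).

% label_like(+Pattern, -Label): Label matches Pattern, where * matches
% zero or more characters. A pattern that does not start with * can use
% the ordered index.
label_like(Pattern, Label) :-
    rdf(_, 'http://example.org/label', literal(like(Pattern), Label)).

% completions(+Prefix, -Labels): collect all completions for Prefix,
% for example to fill an auto-completion menu.
completions(Prefix, Labels) :-
    findall(Label, label_prefix(Prefix, Label), Labels).
```

After add_city_labels/0, completions(ams, Labels) is expected to collect the two labels that start with "Ams", and label_like('ams*dam', L) the label Amsterdam.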