5.2.2 Predicates that operate on strings
Strings are manipulated using a set of predicates that mirrors the set of predicates used for manipulating atoms. In addition to the list below, string/1 performs the type check for this type and is described in section 4.5.
SWI-Prolog's string primitives are being synchronized with ECLiPSe. We expect the set of predicates documented in this section to be stable, although it might be expanded. In general, SWI-Prolog's text manipulation predicates accept any form of text as input argument; this is referred to as anytext input. anytext comprises:
- atoms
- strings
- lists of character codes
- lists of characters
- number types: integers, floating point numbers and non-integer rationals. Internally, these are first formatted as text according to an internal convention before they are used.
The predicates produce the type indicated by the predicate name as output. This policy simplifies migration and writing programs that can run unmodified or with minor modifications on systems that do not support strings. For clarity, and to facilitate a stricter mode and/or type checking in future releases, code should avoid relying on this permissive input handling as much as possible.
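As a brief illustration of this policy, the queries below mix input text types; the type of the result follows the predicate name. This is a sketch that assumes the default (non-traditional) mode, in which double quoted text is read as a string. Results are shown as comments.

```prolog
% Mixed input types; the output type is determined by the predicate name.
?- string_concat(abc, "def", S).   % S = "abcdef" (a string)
?- atom_concat("abc", def, A).     % A = abcdef (an atom)
?- string_concat(abc, 42, S).      % S = "abc42" (the number is formatted as text first)
?- atom_length("hello", L).        % L = 5
```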
- atom_string(?Atom, ?String)
- Bi-directional conversion between an atom and a string. At least one of
the two arguments must be instantiated. An initially uninstantiated
variable on the “string side” is always instantiated to a
string. An initially uninstantiated variable on the “atom side” is
always instantiated to an atom. If both arguments are instantiated,
their list-of-character representations must match, but the types are
not enforced. The following all succeed:
atom_string("x",'x').
atom_string('x',"x").
atom_string(3.1415,3.1415).
atom_string('3r2',3r2).
atom_string(3r2,'3r2').
atom_string(6r4,3r2).
- number_string(?Number, ?String)
- Bi-directional conversion between a number and a string. At least one of
the two arguments must be instantiated. Besides the type used to
represent the text, this predicate differs in several ways from its ISO
cousin (note that SWI-Prolog's syntax for numbers is not ISO compatible
either):
- If String does not represent a number, the predicate fails rather than throwing a syntax error exception.
- Leading white space and Prolog comments are not allowed.
- Numbers may start with + or -.
- It is not allowed to have white space between a leading + or - and the number.
- Floating point numbers in exponential notation do not require a dot before the exponent, i.e., "1e10" is a valid number.
Unlike other predicates of this family, if instantiated, String cannot be an atom.
The corresponding ‘atom-handling’ predicate is atom_number/2, with reversed argument order.
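The queries below sketch some of the behaviour described above, again assuming the default mode where double quoted text is a string. Expected results are given as comments.

```prolog
?- number_string(N, "42").       % N = 42
?- number_string(N, "-7").       % N = -7
?- number_string(N, "1e10").     % N = 1.0e10 (no dot needed before the exponent)
?- \+ number_string(_, "foo").   % true: fails instead of raising a syntax error
?- number_string(42, S).         % S = "42"
?- atom_number('42', N).         % N = 42 (atom first, number second)
```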
- term_string(?Term, ?String)
- Bi-directional conversion between a term and a string. If String
is instantiated, it is parsed and the result is unified with Term.
Otherwise Term is ‘written’ using the option quoted(true) and the result is converted to String.
- term_string(?Term, ?String, +Options)
- As term_string/2,
passing Options to either read_term/2
or write_term/2.
For example:
?- term_string(Term, 'a(A)', [variable_names(VNames)]).
Term = a(_9674),
VNames = ['A'=_9674].
- string_chars(?String, ?Chars)
- Bi-directional conversion between a string and a list of characters. At
least one of the two arguments must be instantiated.
See also: atom_chars/2.
- string_codes(?String, ?Codes)
- Bi-directional conversion between a string and a list of character codes. At least one of the two arguments must be instantiated.
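For illustration, the following queries convert in both directions with string_chars/2 and string_codes/2. Results are shown as comments.

```prolog
?- string_chars("hi", Cs).        % Cs = [h, i]
?- string_codes("hi", Cs).        % Cs = [104, 105]
?- string_chars(S, [h, i]).       % S = "hi"
?- string_codes(S, [0'h, 0'i]).   % S = "hi"
```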
- string_bytes(?String, ?Bytes, +Encoding)
- True when the (Unicode) String is represented by Bytes
in
Encoding. If String is instantiated it may
represent text as an atom, string, list of character codes or list of
characters.
Bytes is always a list of integers in the range 0 ...
255. At least one of String or Bytes must be
instantiated. This predicate is notably intended as an intermediate step
to perform byte oriented operations on text. Examples are (base64)
encoding, encryption, computing a (secure) hash, etc. Encoding
is typically utf8. All valid stream encodings except for wchar_t
are supported. See section 2.18.1. Note that this translation is only provided for strings. Creating an atom from bytes requires atom_string/2. (Strings are an efficient intermediate and this conversion is needed only in some uncommon scenarios.)
- [det]text_to_string(+Text, -String)
- Converts Text to a string. Text is anytext
excluding the number types. When running in --traditional mode, '[]' is ambiguous and interpreted as an empty string.
- string_length(+String, -Length)
- Unify Length with the number of characters in String. This predicate is functionally equivalent to atom_length/2 and also accepts anytext as its first argument. Numeric types are formatted into strings before the length of their string representation is determined. (This behavior should be considered deprecated.) See also write_length/3.
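A few example queries for text_to_string/2 and string_length/2 may help here; this is a sketch assuming the default (non-traditional) mode, with results shown as comments.

```prolog
?- text_to_string(hello, S).        % S = "hello"
?- text_to_string([h, i], S).       % S = "hi"
?- text_to_string([104, 105], S).   % S = "hi"
?- string_length("hello", L).       % L = 5
?- string_length(hello, L).         % L = 5 (atoms are accepted as well)
```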
- string_code(?Index, +String, ?Code)
- True when Code represents the character at the 1-based Index
position in String. If Index is unbound the string
is scanned from index 1. Raises a domain error if Index is
negative. Fails silently if Index is zero or greater than the
length of String. The mode string_code(-,+,+) is deterministic if the searched-for Code appears only once in String. See also sub_string/5.
- get_string_code(+Index, +String, -Code)
- Semi-deterministic version of string_code/3.
In addition, this version provides strict range checking, throwing a
domain error if Index is less than 1 or greater than the
length of String. ECLiPSe provides this to support String[Index] notation.
- string_concat(?String1, ?String2, ?String3)
- Similar to atom_concat/3, but the unbound argument will be unified with a string object rather than an atom. Also, if both String1 and String2 are unbound and String3 is bound to text, it breaks String3, unifying the start with String1 and the end with String2 as append does with lists. Note that this is not particularly fast on long strings, as for each redo the system has to create two entirely new strings, while the list equivalent only creates a single new list-cell and moves some pointers around.
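The two modes described above can be sketched as follows: the first query concatenates, the second enumerates every way of splitting a bound third argument on backtracking. Results are shown as comments.

```prolog
?- string_concat("ab", "cd", S).
%  S = "abcd"

% Collect all prefix/suffix splits of "ab", in the order they are produced.
?- findall(P-Q, string_concat(P, Q, "ab"), Splits).
%  Splits = [""-"ab", "a"-"b", "ab"-""]
```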
- [det]split_string(+String, +SepChars, +PadChars, -SubStrings)
- Break String into SubStrings. The SepChars
argument provides the characters that act as separators and thus the
length of
SubStrings is one more than the number of separators found if
SepChars and PadChars do not have common
characters. If
SepChars and PadChars are equal, sequences of
adjacent separators act as a single separator. Leading and trailing
characters for each substring that appear in PadChars are
removed from the substring. The input arguments can be either atoms,
strings or char/code lists. Compatible with ECLiPSe. Below are some
examples:
A simple split wherever there is a ‘.’:
?- split_string("a.b.c.d", ".", "", L).
L = ["a", "b", "c", "d"].
Consider sequences of separators as a single one:
?- split_string("/home//jan///nice/path", "/", "/", L).
L = ["home", "jan", "nice", "path"].
Split and remove white space:
?- split_string("SWI-Prolog, 7.0", ",", " ", L).
L = ["SWI-Prolog", "7.0"].
Only remove leading and trailing white space (trim the string):
?- split_string(" SWI-Prolog ", "", "\s\t\n", L).
L = ["SWI-Prolog"].
In the typical use cases, SepChars either does not overlap PadChars or is equal to it, so that multiple adjacent separators (often white space) are handled as a single separator. The behaviour with partially overlapping sets of padding and separators should be considered undefined. See also read_string/5.
- sub_string(+String, ?Before, ?Length, ?After, ?SubString)
- This predicate is functionally equivalent to sub_atom/5,
but operates on strings. Note that this implies the string input
arguments can be either strings or atoms. If SubString is
unbound (output) it is unified with a string. The following example
splits a string of the form
<name>=<value> into the name part (an
atom) and the value (a string).
name_value(String, Name, Value) :-
    sub_string(String, Before, _, After, "="),
    !,
    sub_atom(String, 0, Before, _, Name),
    sub_string(String, _, After, 0, Value).
The next example defines a predicate that inserts a value at a position. See sub_atom/5 for more examples.
string_insert(Str, Val, At, NewStr) :-
    sub_string(Str, 0, At, A1, S1),
    sub_string(Str, At, A1, _, S2),
    atomics_to_string([S1,Val,S2], NewStr).
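To make the roles of the Before, Length and After arguments more concrete, here are a few direct sub_string/5 queries. They are illustrative only; results are shown as comments.

```prolog
% Take the first five characters.
?- sub_string("hello world", 0, 5, _, S).
%  S = "hello"

% Find where a known suffix starts (After = 0 anchors it at the end).
?- sub_string("hello world", B, _, 0, "world").
%  B = 6

% Enumerate all substrings of length 2.
?- findall(S, sub_string("abc", _, 2, _, S), L).
%  L = ["ab", "bc"]
```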
- atomics_to_string(+List, -String)
- List is a list of strings, atoms, or number types. Succeeds
if String can be unified with the concatenated elements of List.
Equivalent to atomics_to_string(List, '', String).
- atomics_to_string(+List, +Separator, -String)
- Creates a string just like atomics_to_string/2,
but inserts
Separator between each pair of inputs. For example:
?- atomics_to_string([gnu, "gnat", 1], ', ', A).
A = "gnu, gnat, 1".
- string_upper(+String, -UpperCase)
- Convert String to upper case and unify the result with UpperCase.
- string_lower(+String, -LowerCase)
- Convert String to lower case and unify the result with LowerCase.
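A short sketch of the two case conversion predicates; the last query assumes that, as with the other predicates in this section, an atom is accepted as input while the output is still a string. Results are shown as comments.

```prolog
?- string_upper("Hello, World", U).   % U = "HELLO, WORLD"
?- string_lower("Hello, World", L).   % L = "hello, world"
?- string_upper(hello, U).            % U = "HELLO" (atom in, string out)
```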
- read_string(+Stream, ?Length, -String)
- Read at most Length characters from Stream and
return them in the string String. If Length is
unbound, Stream is read to the end and Length is
unified with the number of characters read. The number of bytes
read depends on the encoding of Stream (see section
2.18.1). This predicate may be used to read a sequence of bytes when
the stream is in octet encoding. See open/4 and set_stream/2 for controlling the encoding.
- read_string(+Stream, +SepChars, +PadChars, -Sep, -String)
- Read a string from Stream, providing functionality similar to
split_string/4.
The predicate performs the following steps:
- Skip all characters that match PadChars
- Read up to a character that matches SepChars or end of file
- Discard trailing characters that match PadChars from the collected input
- Unify String with a string created from the input and Sep with the code of the separator character read. If input was terminated by the end of the input, Sep is unified with -1.
The predicate read_string/5 called repeatedly on an input until Sep is -1 (end of file) is equivalent to reading the entire file into a string and calling split_string/4, provided that SepChars and PadChars are not partially overlapping. (Behaviour that is fully compatible would require unlimited look-ahead.) Below are some examples:
Read a line:
read_string(Input, "\n", "\r", Sep, String)
Read a line, stripping leading and trailing white space:
read_string(Input, "\n", "\r\t ", Sep, String)
Read up to ‘,’ or ‘)’, unifying Sep with 0', (i.e. Unicode 44) or 0') (i.e. Unicode 41):
read_string(Input, ",)", "\t ", Sep, String)
- open_string(+String, -Stream)
- True when Stream is an input stream that accesses the content
of
String. String can be any text representation,
i.e., string, atom, list of codes or list of characters. The created Stream
has the
reposition
property (see stream_property/2). Note that the internal encoding of the data is either ISO Latin 1 or UTF-8.
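As a sketch of how open_string/2 and read_string/5 can be combined, the code below reads a text line by line until Sep is -1. The predicate names lines_of_text/2 and read_lines/2 are hypothetical helpers introduced for this example only; they are not part of the library. An empty final read (text ending in a newline) is dropped, which is a choice made for this example.

```prolog
% lines_of_text(+Text, -Lines): hypothetical helper, not a library predicate.
% Opens Text as an input stream and collects its lines as strings.
lines_of_text(Text, Lines) :-
    setup_call_cleanup(
        open_string(Text, In),
        read_lines(In, Lines),
        close(In)).

% read_lines(+In, -Lines): read "\n"-separated lines, stripping "\r" padding.
read_lines(In, Lines) :-
    read_string(In, "\n", "\r", Sep, Line),
    (   Sep == -1                 % input ended without a separator
    ->  (   Line == ""
        ->  Lines = []            % nothing left after the last newline
        ;   Lines = [Line]        % last line had no trailing newline
        )
    ;   Lines = [Line|Rest],
        read_lines(In, Rest)
    ).
```

For instance, `lines_of_text("a\nb\r\nc", L)` is expected to give `L = ["a", "b", "c"]`.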