2.20.2 Comparing 32- and 64-bit Prolog
Most of Prolog's memory usage consists of pointers. This indicates the primary drawback: Prolog memory usage almost doubles when using the 64-bit addressing model. Using more memory means copying more data between CPU and main memory, slowing down the system.
What then are the advantages? First of all, SWI-Prolog's addressing of the Prolog stacks does not cover the whole address space due to the use of type tag bits and garbage collection flags. On 32-bit hardware the stacks are limited to 128 MB each. This tends to be too low for demanding applications on modern hardware. On 64-bit hardware the limit is 2^32 times higher, exceeding the addressing capabilities of today's CPUs and operating systems. This implies Prolog can be started with stack sizes that use the full capabilities of your hardware.
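The arithmetic behind these limits can be checked with a short sketch. The 128 MB limit and the factor 2^32 come from the text above. The 48-bit virtual address width is an assumption, chosen as a typical value for current AMD64 implementations, and is not a figure from this manual.

```python
# Illustrative arithmetic only, not SWI-Prolog code.
MB = 2**20
PIB = 2**50

limit_32 = 128 * MB           # per-stack limit on 32-bit hardware (from the text)
limit_64 = limit_32 * 2**32   # "2^32 times higher" on 64-bit hardware

# Assumption: a typical AMD64 CPU implements 48-bit virtual addresses.
assumed_virtual_space = 2**48

print(limit_32 == 2**27)                  # True: 128 MB is 2^27 bytes
print(limit_64 // PIB)                    # 512 (PiB), i.e. 2^59 bytes
print(limit_64 > assumed_virtual_space)   # True: the limit exceeds the address space
```

Under that assumption the 64-bit stack limit is far larger than what the hardware can address, so in practice the hardware and operating system, rather than Prolog, set the bound.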
Multi-threaded applications profit much more because every thread has its own set of stacks. The Prolog stacks start small and are dynamically expanded (see section 2.19.1). The C stack is also dynamically expanded, but the maximum size is reserved when a thread is started. Using 100 threads at the maximum default C stack of 8 MB (Linux) costs 800 MB of virtual memory! (C-recursion over Prolog data structures is removed from most of SWI-Prolog. When removed from all predicates it will often be possible to use lower limits in threads. See http://www.swi-prolog.org/Devel/CStack.html)
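The reservation cost quoted above is a simple product, sketched below. The helper name `reserved_c_stack` is hypothetical, and comparing the result against a full 2^32-byte address space is an illustrative assumption: a 32-bit process usually has less than that available.

```python
# Illustrative arithmetic only, not SWI-Prolog code.
MB = 2**20

def reserved_c_stack(threads, c_stack_limit=8 * MB):
    """Virtual memory reserved for C stacks when each thread reserves its maximum."""
    return threads * c_stack_limit

total = reserved_c_stack(100)
share_of_32_bit_space = total / 2**32   # assumption: the full 4 GiB is addressable

print(total // MB)              # 800 (MB)
print(share_of_32_bit_space)    # 0.1953125
```

Roughly a fifth of a 32-bit address space would be reserved before any Prolog stack grows, which is why lower per-thread limits matter on 32-bit systems and are far less of a concern on 64-bit ones.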
The impact of the theoretical performance loss caused by the extra memory bandwidth that wider pointers require depends on the design of the hardware. We only have data for the popular IA32 vs. AMD64 architectures. Here, the loss appears to be compensated for by an instruction set that has been optimized for modern programming. In particular, AMD64 has more registers and improved relative addressing capabilities. Where we see a 10% performance degradation on IA32 when placing the SWI-Prolog kernel in a Unix shared object, we cannot find a measurable difference on AMD64.