Tarantool User Guide, version 1.4.9-65-gc734b4d


1. Preface
Tarantool: an overview
Conventions
Reporting bugs
2. Getting started
3. Data model and data persistence
Dynamic data model
Data persistence
4. Language reference
Data manipulation
Memcached protocol
Administrative console
Writing stored procedures in Lua
Package box
Package box.tuple
Package box.space
Package box.index
Package box.fiber
Package box.session
Package box.ipc — inter procedure communication
Package box.socket — TCP and UDP sockets
Packages box.cfg, box.info, box.slab and box.stat: server introspection
Limitation of stored procedures
Defining triggers in Lua
Triggers on connect and disconnect
5. Replication
Replication architecture
Setting up the master
Setting up a replica
Recovering from a degraded state
6. Server administration
Server signal handling
System-specific administration notes
Debian GNU/Linux and Ubuntu
Fedora, RHEL, CentOS
FreeBSD
Mac OS X
7. Configuration reference
Command line options
The option file
8. Connectors
C
node.js
Perl
PHP
Python
Ruby
A. Server process titles
B. List of error codes

Chapter 1. Preface

Tarantool: an overview

Tarantool is an in-memory NoSQL database. The code is available for free under the terms of the BSD license. Supported platforms are GNU/Linux, Mac OS and FreeBSD.

The server maintains all its data in random-access memory, and therefore has very low read latency. At the same time, a copy of the data is kept on non-volatile storage (a disk drive), and inserts and updates are performed atomically.

To ensure atomicity, consistency and crash-safety of the persistent copy, a write-ahead log (WAL) is maintained, and each change is recorded in the WAL before it is considered complete. The logging subsystem supports group commit.

If the update and delete rate is high, a constantly growing write-ahead log file (or files) can pose a disk space problem and significantly increase the time needed to restart from disk. A simple solution is employed: the server can be requested to save a concise snapshot of its current data. The underlying operating system's copy-on-write feature is used to take the snapshot in a quick, resource-efficient and non-blocking manner. The copy-on-write technique guarantees that snapshotting has minimal impact on server performance.

Tarantool is lock-free. Instead of the operating system's concurrency primitives, such as threads and mutexes, Tarantool uses a cooperative multitasking environment to simultaneously operate on thousands of connections. A fixed number of independent execution threads within the server do not share state, but exchange data using low overhead message queues. While this approach limits server scalability to a few CPU cores, it removes competition for the memory bus and sets the scalability limit to the top of memory and network throughput. CPU utilization of a typical highly-loaded Tarantool server is under 10%.

Unlike most NoSQL databases, Tarantool supports primary and secondary keys, multi-part keys, and the HASH, TREE and BITSET index types.

The key feature of Tarantool is support for Lua stored procedures, which can access and modify data atomically. Procedures can be created, modified and dropped at runtime.

Use of Lua as an extension language does not end with stored procedures: Lua programs can be used during startup, to define triggers and background tasks, and to interact with networked peers. Unlike popular application development frameworks based on the "reactor" pattern, networking in server-side Lua is sequential, yet very efficient, as it is built on top of the cooperative multitasking environment used by the server itself.

Extended with Lua, Tarantool typically replaces not one but a few existing components with a single well-performing system, changing and simplifying complex multi-tier Web application architectures.

Tarantool supports replication. Replicas may run locally or on a remote host. Tarantool replication is asynchronous and does not block writes to the master. When or if the master becomes unavailable, the replica can be switched to assume the role of the master without server restart.

The software is production-ready. Tarantool has been created and is actively used at Mail.Ru, one of the leading Russian web content providers. At Mail.Ru, the software serves the hottest data, such as online users and their sessions, online application properties, mapping between users and their serving shards, and so on.

Outside Mail.Ru, the software is used by a growing number of projects in the online gaming, digital marketing and social media industries. While product development is sponsored by Mail.Ru, the roadmap, bugs database and the development process are fully open. The software incorporates patches from dozens of community contributors, and most of the programming language drivers are written and maintained by the community.

Conventions

This manual is written in the DocBook 5 XML markup language and uses the standard DocBook XSL formatting conventions:

UNIX shell command input is prefixed with '$ ' and is formatted using a fixed-width font:

$ tarantool_box --background
    

The same formatting style is used for file names: /path/to/var/dir.

Text that represents user input is formatted in boldface:

      $ your input here
    

Within user input, replaceable items are printed in italics:

      $ tarantool_box --option
    

Reporting bugs

Please report bugs in Tarantool at http://bugs.launchpad.net/tarantool. You can contact developers directly on #tarantool IRC channel or via a mailing list, tarantool-developers@lists.launchpad.net.

Caution: To prevent spam, Launchpad mailing list software silently drops all mail sent from non-registered email addresses. Launchpad registration also allows you to report bugs and create feature requests. You can always check whether or not your mail has been delivered to the mailing list in the public list archive, https://lists.launchpad.net/tarantool-developers.

Chapter 2. Getting started

This chapter describes installation procedures and the contents of the binary and source package downloads, and explains how to start and stop the server and how to connect to it with a command line client.

To install the latest stable version of Tarantool, check out instructions on the project download page. For many distributions the server and the command line client are available from the distribution's upstream. Local repositories for popular Linux distributions, as well as a FreeBSD port and a Mac OS X homebrew recipe are also available online. The online archive is automatically refreshed on each push into the stable branch of the server. Please follow distribution-specific instructions to find out how to manage Tarantool instances on your operating system.

The easiest way to try Tarantool without installing it is by downloading a binary or source package tarball. Binary packages use tarantool-<version>-<OS>-<machine>.tar.gz naming scheme. Source packages are named simply tarantool-<version>-src.tar.gz. You can find out the canonical name of your operating system and machine type with uname -o and uname -m respectively. The programs included into the binary tarball are linked statically to not have any external dependencies. Besides the downloaded package, you will need the following software:

  • Python 2.6 or newer, with PyYAML, python-daemon and python-pexpect modules,

    Note

    Python is used to run regression tests. If you do not plan to run tests you may skip this step.

To build Tarantool from source, additionally:

  • CMake 2.6 or newer,

  • GCC 4.4 or newer, with gcc-objc (ObjectiveC) language frontend or Clang 3.1 or newer,

  • libreadline-dev, when compiling the command line client.

After download, unpack the binary package. A new directory will be created:

  $ tar zxvf package-name.tar.gz

To remove the package, simply drop the directory containing the unpacked files.

The binary download contains the subdirectories bin, doc, man, share, var and etc. The server, by default, looks for its configuration file in the current working directory and in etc/. There is a correct and minimalistic tarantool.cfg in directory etc/, thus the server can be started right from the top level package directory:

  $ cd package-name && ./bin/tarantool_box
  ...
  1301424353.416 3459 104/33013/acceptor _ I> I am primary
  1301424353.416 3459 1/sched _ I> initialized

To stop the server, simply press Ctrl+C.

Once the server is started, you can connect to it and issue queries using a command line client:

  $ cd package-name && ./bin/tarantool
  localhost>  show info
  
  ---
  info:
    version: "1.4.5"
    uptime: 548
    pid: 3459
    logger_pid: 3461
    snapshot_pid: 0
    lsn: 1
    recovery_lag: 0.000
    recovery_last_update: 0.000
    status: primary
    config: "/home/kostja/tarantool/etc/tarantool.cfg"
    

Compiling from source

To use a source package, a few additional steps are necessary: configuration and build. The easiest way to configure a source directory with CMake is by requesting an in-source build:

  $ cd package-name && cmake . -DENABLE_CLIENT=true

Upon successful configuration, CMake prints the status of optional features:

  -- *** The following options are on in this configuration: ***
  -- ENABLE_CLIENT: true
  -- ENABLE_GCOV: ON
  -- ENABLE_TRACE: ON
  -- ENABLE_BACKTRACE: ON
  -- Backtrace is with symbol resolve: True
  -- ENABLE_STATIC: OFF
  --
  -- Configuring done
  -- Generating done

Now type 'make' to build Tarantool.

  $ make
  ...
  Linking C executable tarantool_box
  [100%] Built target tarantool_box

Complete instructions for building from source are located in the source tree, in file README.md. There are also specialized build instructions for CentOS, FreeBSD and OS X.

When make is complete, the server can be started right out of the in-source build. Use Tarantool regression testing framework:

$ ./test/run --start-and-exit

It will create necessary files in directory ./test/var/, and start the server with minimal configuration.

The command line client is located in client/tarantool:

$ ./client/tarantool/tarantool

Chapter 3. Data model and data persistence

This chapter describes how Tarantool stores values and what operations with data it supports.

Dynamic data model

Tarantool data is organized in tuples. Tuple length is varying: a tuple can contain any number of fields. A field can be either numeric — 32- or 64- bit unsigned integer, or binary string — a sequence of octets. Tuples are stored and retrieved by means of indexing. An index can cover one or multiple fields, in any order. Fields included into the first index are always assumed to be the identifying (unique) key. The remaining fields make up a value, associated with the key.

Apart from the primary key, it is possible to define secondary indexes on other tuple fields. A secondary index does not have to be unique and can cover multiple fields. The total number of fields in a tuple must be at least equal to the ordinal number of the last field participating in any index.
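
The field-count rule above can be expressed as a small check. The following is a minimal Python sketch, not Tarantool code: the dictionary layout and the function names are hypothetical, and field numbers are assumed to start at 0, as in the k0 and k1 identifiers used later in this chapter.

```python
def min_tuple_length(indexes):
    """Return the smallest number of fields a tuple may have.

    `indexes` maps an index id to the list of 0-based field numbers the
    index covers (a hypothetical layout chosen for this sketch).
    """
    highest = max(field_no for fields in indexes.values() for field_no in fields)
    # Field numbers start at 0, so the ordinal number is field_no + 1.
    return highest + 1


def tuple_fits(tuple_fields, indexes):
    """Check that every indexed field is present in the tuple."""
    return len(tuple_fields) >= min_tuple_length(indexes)


# Primary key on field 0, secondary index on fields 2 and 1 (in that order).
indexes = {0: [0], 1: [2, 1]}
print(min_tuple_length(indexes))                 # 3
print(tuple_fits((1, "a"), indexes))             # False
print(tuple_fits((1, "a", "b", "c"), indexes))   # True
```

Fields beyond the last indexed one are unconstrained, which is why the longer tuple is accepted.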

Supported index types are HASH, TREE and BITSET. The HASH index is the fastest one, with the smallest memory footprint. The TREE index, in addition to key/value lookups, supports partial key lookups, key-part lookups for multipart keys and ordered retrieval. BITSET indexes, while they can serve as a standard unique key, are best suited for bit-pattern lookups, i.e. searches for objects satisfying multiple properties.

Tuple sets together with defined indexes form spaces. The basic server operations are insert, replace, delete, update, which modify a tuple in a space, and select, which retrieves tuples from a space. All operations that modify data require the primary key for look up. Select, however, may use any index.

A Lua stored procedure can combine multiple trivial commands, as well as access data using index iterators. Indeed, the iterators provide full access to the power of indexes, enabling index-type specific access, such as boolean expression evaluation for BITSET indexes, or reverse range retrieval for TREEs.

All operations in Tarantool are atomic and durable: they are either executed and written to the write ahead log or rolled back. A stored procedure, containing a combination of basic operations, holds a consistent view of the database as long as it doesn't incur writes to the write ahead log or to network. In particular, a select followed by an update or delete is atomic.

While the subject of each data changing command is a single tuple, an update may modify one or more tuple fields, as well as add or delete fields, all in one command. It thus provides an alternative way to achieve multi-operation atomicity.

Currently, the entire server schema must be specified in the configuration file. The schema contains all spaces and indexes. A server started with a configuration file that doesn't match the contents of its data directory will most likely crash, but may also behave in an undefined way. It is, however, possible to stop the server, add new spaces and indexes to the schema or temporarily disable existing spaces and indexes, and then restart the server.

Schema objects, such as spaces and indexes, are referred to by a numeric id. For example, to insert a tuple, it is necessary to provide id of the destination space; to select a tuple, one must provide the identifying key, space id and index id of the index used for lookup. Many Tarantool drivers provide a local aliasing scheme, mapping numeric identifiers to names. Use of numeric identifiers on the wire protocol makes it lightweight and easy to parse.

The configuration file shipped with the binary package defines only one space with id 0. It has no keys other than the primary. The primary key numeric id is also 0. Tarantool command line client supports a small subset of SQL, and it'll be used to demonstrate supported data manipulation commands:

  localhost> insert into t0 values (1)
  Insert OK, 1 row affected
  localhost> select * from t0 where k0=1
  Found 1 tuple:
  [1]
  localhost> insert into t0 values ('hello')
  An error occurred: ER_ILLEGAL_PARAMS, 'Illegal parameters'
  localhost> replace into t0 values (1, 'hello')
  Replace OK, 1 row affected
  localhost> select * from t0 where k0=1 
  Found 1 tuple:
  [1, 'hello']
  localhost> update t0 set k1='world' where k0=1
  Update OK, 1 row affected
  localhost> select * from t0 where k0=1
  Found 1 tuple:
  [1, 'world']
  localhost> delete from t0 where k0=1
  Delete OK, 1 row affected
  localhost> select * from t0 where k0=1
  No match

Please observe:

  • Since all object identifiers are numeric, Tarantool SQL subset expects identifiers that end with a number (t0, k0, k1, and so on): this number is used to refer to the actual space or index.

  • All commands actually tell the server which key/value pair to change. In SQL terms, that means that all DML statements must be qualified with the primary key. WHERE clause is, therefore, mandatory.

  • REPLACE replaces data when a tuple with given primary key already exists. Such replace can insert a tuple with a different number of fields.
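
The identifier convention from the first point can be illustrated with a short Python sketch. It is only an approximation of what a client might do: the function name is hypothetical, and the real command line client may parse identifiers differently.

```python
import re


def numeric_id(identifier):
    """Extract the trailing number from an identifier such as t0 or k1."""
    match = re.search(r"(\d+)$", identifier)
    if match is None:
        raise ValueError("identifier must end with a number: %r" % identifier)
    return int(match.group(1))


print(numeric_id("t0"))   # 0, the space id
print(numeric_id("k1"))   # 1, the field or index number
```

An identifier without a trailing number, such as a descriptive table name, is rejected, because there is no numeric object id to send to the server.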

Additional examples of SQL statements can be found in Tarantool regression test suite. A complete grammar of supported SQL is provided in Language reference chapter.

Since not all Tarantool operations can be expressed in SQL, to gain complete access to data manipulation functionality one must use a Perl, Python, Ruby or other programming language connector. The client/server protocol is open and documented: an annotated BNF can be found in the source tree, file doc/box-protocol.txt.

Data persistence

To maintain data persistence, Tarantool writes each data change request (INSERT, UPDATE, DELETE) into a write-ahead log. WAL files have extension .xlog and are stored in wal_dir. A new WAL file is created for every rows_per_wal records. Each INSERT, UPDATE or DELETE gets assigned a continuously growing 64-bit log sequence number. The name of the log file is based on the log sequence number of the first record this file contains.
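
The rotation and naming rule can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the server's implementation: the class is hypothetical, files are kept in memory, and the 20-digit zero-padded file name is an illustrative choice, since the text only states that the name is based on the first LSN in the file.

```python
class WalDirectory:
    """In-memory model of WAL rotation driven by rows_per_wal."""

    def __init__(self, rows_per_wal):
        self.rows_per_wal = rows_per_wal
        self.files = {}      # file name -> list of (lsn, request)
        self.current = None  # name of the file currently written to
        self.lsn = 0         # last assigned log sequence number

    def append(self, request):
        """Assign the next LSN and store the request in the current file."""
        self.lsn += 1
        if self.current is None or len(self.files[self.current]) >= self.rows_per_wal:
            # Assumption: the name is the zero-padded first LSN of the file.
            self.current = "%020d.xlog" % self.lsn
            self.files[self.current] = []
        self.files[self.current].append((self.lsn, request))
        return self.lsn


wal = WalDirectory(rows_per_wal=3)
for number in range(7):
    wal.append("REPLACE %d" % number)
print(sorted(wal.files))  # three files, starting at LSN 1, 4 and 7
```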

Apart from a log sequence number and the data change request (its format is the same as in the binary protocol and is described in doc/box-protocol.txt), each WAL record contains a checksum and a UNIX time stamp.

Tarantool processes requests atomically: a change is either accepted and recorded in the WAL, or discarded completely. Let's clarify how this happens, using REPLACE command as an example:

  1. The server attempts to locate the original tuple by primary key. If found, a reference to the tuple is retained for later use.

  2. The new tuple is then validated. If it violates a unique-key constraint, misses an indexed field, or an index-field type does not match the type of the index, the change is aborted.

  3. The new tuple replaces the old tuple in all existing indexes.

  4. A message is sent to WAL writer running in a separate thread, requesting that the change is recorded in the WAL. The server switches to work on the next request until the write is acknowledged.

  5. On success, a confirmation is sent to the client. Upon failure, a rollback procedure is begun. During the rollback procedure, the transaction processor rolls back the failed change and all changes to the database which were applied after it, from latest to oldest. All rolled back requests are aborted with the ER_WAL_IO error. No new change is applied while rollback is in progress. When the rollback procedure is finished, the server restarts the processing pipeline.
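
The five steps above can be sketched as a small in-memory model in Python. It is an illustration only: the class and method names are hypothetical, there is a single primary index and no secondary indexes, validation is reduced to a type check on the key, and the WAL writer is replaced by an explicit acknowledgement call.

```python
class SpaceModel:
    """Toy model of REPLACE with pipelining and latest-to-oldest rollback."""

    def __init__(self):
        self.primary = {}   # primary key -> tuple
        self.pending = []   # changes applied in memory but not yet logged

    def replace(self, new_tuple):
        key = new_tuple[0]
        old_tuple = self.primary.get(key)             # step 1: locate the original
        if not isinstance(key, int):                  # step 2: validate (simplified)
            raise ValueError("ER_ILLEGAL_PARAMS")
        self.primary[key] = new_tuple                 # step 3: replace in the index
        self.pending.append((key, old_tuple))         # step 4: hand over to the WAL writer

    def wal_ack(self, ok):
        """Step 5: the WAL writer reports on the oldest pending change."""
        if ok:
            self.pending.pop(0)
            return []
        aborted = []
        while self.pending:                           # undo from latest to oldest
            key, old_tuple = self.pending.pop()
            if old_tuple is None:
                del self.primary[key]
            else:
                self.primary[key] = old_tuple
            aborted.append(key)
        return aborted                                # keys of requests failed with ER_WAL_IO


space = SpaceModel()
space.replace((1, "a"))
space.wal_ack(True)
space.replace((1, "b"))
space.replace((1, "c"))
space.replace((2, "x"))
print(space.wal_ack(False))   # [2, 1, 1]
print(space.primary)          # {1: (1, 'a')}
```

Because each pending change remembers the tuple it replaced, several requests on the same key can be in flight at once and still be undone in reverse order.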

One advantage of the described algorithm is that complete request pipelining is achieved, even for requests on the same value of the primary key. As a result, database performance doesn't degrade even if all requests touch upon the same key in the same space.

The transaction processor and the WAL writer threads communicate using asynchronous (yet reliable) messaging; the transaction processor thread, not being blocked on WAL tasks, continues to handle requests quickly even at high volumes of disk I/O. A response to a request is sent as soon as it is ready, even if there were earlier incomplete requests on the same connection. In particular, SELECT performance, even for SELECTs running on a connection packed with UPDATEs and DELETEs, remains unaffected by disk load.

WAL writer employs a number of durability modes, as defined in configuration variable wal_mode. It is possible to turn the write ahead log completely off, by setting wal_mode to none. Even without the write ahead log it's still possible to take a persistent copy of the entire data set with SAVE SNAPSHOT.

Chapter 4. Language reference

This chapter provides a reference of Tarantool data operations and administrative commands.

Digression: data and administrative ports

Unlike many other key/value servers, Tarantool uses different TCP ports and client/server protocols for data manipulation and administrative statements. During start up, the server can listen on up to five TCP ports:

  • Read/write data port, to handle INSERTs, UPDATEs, DELETEs, SELECTs and CALLs. This port speaks the native Tarantool protocol, and provides full data access.

    The default value of the port is 33013, as defined in primary_port configuration option.

  • Read only port, which only accepts SELECTs and CALLs, default port number 33014, as defined in secondary_port configuration option.

  • Administrative port, which defaults to 33015, and is defined in admin_port configuration option.

  • Replication port (see replication_port), by default set to 33016, used to send updates to replicas. Replication is optional, and if this port is not set in the option file, the corresponding server process is not started.

  • Memcached port. Optional, read-write data port that speaks Memcached text protocol. This port is off by default.

In absence of authentication, this approach allows system administrators to restrict access to read/write or administrative ports. The client, however, has to be aware of the separation, and tarantool command line client automatically selects the correct port for you with help of a simple regular expression. SELECTs, UPDATEs, INSERTs, DELETEs and CALLs are sent to the primary port. SHOW, RELOAD, SAVE and other statements are sent to the administrative port.
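
A port selection of this kind can be sketched in Python as follows. The regular expression is an assumption made for illustration and is not taken from the client's source; REPLACE is included because the client accepts it as a data statement, and the port numbers are the defaults listed above.

```python
import re

PRIMARY_PORT = 33013   # default primary_port
ADMIN_PORT = 33015     # default admin_port

# Assumption: a statement is routed by its first keyword only.
_DATA_STATEMENT = re.compile(
    r"^\s*(select|insert|replace|update|delete|call)\b", re.IGNORECASE
)


def port_for(statement):
    """Return the port a statement should be sent to."""
    if _DATA_STATEMENT.match(statement):
        return PRIMARY_PORT
    return ADMIN_PORT


print(port_for("select * from t0 where k0=1"))   # 33013
print(port_for("show info"))                     # 33015
```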

Data manipulation

Five basic request types are supported: INSERT, UPDATE, DELETE, SELECT and CALL. All requests, including INSERT, UPDATE and DELETE may return data. A SELECT can be requested to limit the number of returned tuples. This is useful when searching in a non-unique index or when a special wildcard (zero-length string) value is supplied as search key or a key part.

UPDATE statement supports operations on fields — assignment, arithmetic operations (the field must be numeric), cutting and pasting fragments of a field, — as well as operations on a tuple: push and pop of a field at the tail of a tuple, deletion and insertion of a field. Multiple operations can be combined into a single update, and in this case they are performed atomically. Each operation expects field number as its first argument. When a sequence of changes is present, field identifier in each operation is assumed to be relative to the most recent state of the tuple, i.e. as if all previous operations in a multi-operation update have already been applied. In other words, it's always safe to merge multiple UPDATE statements into a single one, with no change in semantics.
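
The rule that each field number refers to the most recent state of the tuple is easiest to see in a model. The Python sketch below is hypothetical: the operation names and the (field number, operation, argument) layout are chosen for illustration and do not reproduce the wire format.

```python
def apply_update(fields, operations):
    """Apply a list of operations to a copy of the tuple, in order.

    Each operation is (field_no, name, *args); field_no is interpreted
    against the tuple as left by the preceding operations.
    """
    result = list(fields)   # work on a copy: the original stays intact on error
    for field_no, name, *args in operations:
        if name == "assign":
            result[field_no] = args[0]
        elif name == "add":
            result[field_no] += args[0]
        elif name == "delete":
            del result[field_no]
        elif name == "insert":
            result.insert(field_no, args[0])
        else:
            raise ValueError("unknown operation: %r" % name)
    return tuple(result)


# After field 1 is deleted, the former field 2 becomes field 1.
print(apply_update((1, "hello", 10), [(1, "delete"), (1, "add", 5)]))   # (1, 15)
```

Applying the two operations as two separate updates gives the same result, which is the merging property described above.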

The Tarantool protocol was designed with a focus on asynchronous I/O and easy integration with proxies. Each client request starts with a 12-byte binary header, containing three fields: request type, length, and a numeric id.

The mandatory length in the request header simplifies client or proxy I/O. A response to a request is sent to the client as soon as it is ready. It always carries in its header the same type and id as the request. The id makes it possible to match a request to a response, even if the latter arrived out of order.
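
The value of a fixed-size header with an explicit length can be shown with a short Python sketch. It assumes that the three header fields are 32-bit unsigned little-endian integers and that the length counts the bytes following the header; the authoritative layout is the one in doc/box-protocol.txt. The function names are hypothetical, and the payload bytes are placeholders rather than a real request body.

```python
import struct

# Assumption: type, length and id are little-endian 32-bit unsigned integers.
HEADER = struct.Struct("<III")


def pack_request(request_type, request_id, body=b""):
    """Prefix a payload with the 12-byte header."""
    return HEADER.pack(request_type, len(body), request_id) + body


def read_packets(buffer):
    """Split a byte stream into (type, id, body) packets.

    Returns the complete packets and the unconsumed tail, so that a proxy
    can frame traffic without understanding the payload.
    """
    packets = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        packet_type, length, packet_id = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break   # the body has not fully arrived yet
        packets.append((packet_type, packet_id, buffer[offset + HEADER.size:end]))
        offset = end
    return packets, buffer[offset:]


stream = pack_request(22, 7, b"abc") + pack_request(22, 8)
packets, rest = read_packets(stream)
print(HEADER.size)   # 12
print(packets)       # [(22, 7, b'abc'), (22, 8, b'')]
```

A client that keeps a dictionary of outstanding requests keyed by id can pair each packet returned by read_packets with its request, regardless of arrival order.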

Request type defines the format of the payload. INSERTs, UPDATEs and DELETEs can only be made by the primary key, so an index id and a key (possibly multipart) are always present in these requests. SELECTs can use secondary keys. UPDATE only needs to list the fields that are actually changed. With this one exception, all commands operate on whole tuple(s).

Unless implementing a client driver, one needn't concern oneself with the complications of the binary protocol. Language-specific drivers provide a friendly way to store domain language data structures in Tarantool, and the command line client supports a subset of standard SQL. A complete description of both, the binary protocol and the supported SQL, is maintained in annotated Backus-Naur form in the source tree: please see doc/box-protocol.txt and doc/sql.txt respectively.

Memcached protocol

If full access to Tarantool functionality is not needed, or there is no readily available connector for the programming language in use, any existing client driver for Memcached will make do as a Tarantool connector. To enable text Memcached protocol, turn on memcached_port in the option file. Since Memcached has no notion of spaces or secondary indexes, this port only makes it possible to access one dedicated space (see memcached_space) via its primary key. Unless tuple expiration is enabled with memcached_expire, TTL part of the message is stored but ignored.

Note that memcached_space is also accessible using the primary port or Lua. A common use of the Memcached port in Tarantool is when the default Memcached expiration algorithm is insufficient, and a custom Lua expiration procedure is used.

Tarantool does not support the binary protocol of Memcached. If top performance is a must, Tarantool's own binary protocol should be used.
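
Because only the text protocol is spoken on this port, a request is a plain ASCII line followed, for storage commands, by the data block. The Python sketch below builds such requests according to the standard Memcached text protocol; the helper names are hypothetical, and sending the bytes to the configured memcached_port over a TCP socket is left out.

```python
def memcached_set(key, value, exptime=0, flags=0):
    """Build a text-protocol 'set' request: header line plus data block."""
    data = value.encode("utf-8") if isinstance(value, str) else value
    header = "set %s %d %d %d\r\n" % (key, flags, exptime, len(data))
    return header.encode("ascii") + data + b"\r\n"


def memcached_get(key):
    """Build a text-protocol 'get' request for a single key."""
    return ("get %s\r\n" % key).encode("ascii")


print(memcached_set("session:1", "hello", exptime=60))
print(memcached_get("session:1"))
```

The exptime value is the TTL part of the message mentioned above: it is stored with the value but only acted upon when memcached_expire is enabled.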

Administrative console

The administrative console uses a simple text protocol. All commands are case-insensitive. You can connect to the administrative port using any telnet client, or a tool like rlwrap, if access to readline features is desired. Additionally, tarantool, the SQL-capable command line client, understands all administrative statements and automatically directs them to the administrative port. The server response to an administrative command, even though it is always in plain text, can be quite complex. It is encoded using YAML markup to simplify automated parsing.

To learn about all supported administrative commands, you can type help in the administrative console. A reference description also follows below:

save snapshot

Take a snapshot of all data and store it in snap_dir/<latest-lsn>.snap. To take a snapshot, Tarantool forks and quickly munmap(2)s all memory except the area where tuples are stored. Since all modern operating systems support virtual memory copy-on-write, this effectively creates a consistent snapshot of all tuples in the child process, which is then written to disk tuple by tuple. Since a snapshot is written sequentially, you can expect very high write performance (averaging 80MB/second on modern disks), which means an average database instance gets saved in a matter of minutes. Note that as long as concurrent updates change the parent's memory, pages are copied (split) between parent and child, so you need some extra free memory to run this command. 15%-30% of slab_alloc_arena is, on average, sufficient. This statement waits until the snapshot is taken and returns the operation result. For example:

localhost> show info
---
info:
  version: "1.4.6"
  lsn: 843301
...
localhost> save snapshot
---
ok
...
localhost> save snapshot
---
fail: can't save snapshot, errno 17 (File exists)
...
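
For capacity planning, the figures quoted above (about 80MB/second of sequential write throughput and 15%-30% of slab_alloc_arena as extra memory) can be turned into a rough estimate. The Python sketch below is back-of-the-envelope arithmetic only; the function name is hypothetical, both sizes are taken in megabytes, and real numbers depend on the disk and on the update rate during the snapshot.

```python
def snapshot_estimate(data_mb, arena_mb, write_mb_per_s=80.0):
    """Return (seconds to write, low and high extra-memory estimates in MB)."""
    seconds = data_mb / write_mb_per_s
    extra_low_mb = 0.15 * arena_mb    # lower bound quoted in the text
    extra_high_mb = 0.30 * arena_mb   # upper bound quoted in the text
    return seconds, extra_low_mb, extra_high_mb


seconds, low, high = snapshot_estimate(data_mb=16384, arena_mb=20480)
print("about %.0f s, %.0f-%.0f MB of extra memory" % (seconds, low, high))
```

For 16 GB of tuples in a 20 GB arena this gives a little under three and a half minutes of writing and roughly 3 to 6 GB of headroom.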

Taking a snapshot does not cause the server to start a new write ahead log. Once a snapshot is taken, old WALs can be deleted as long as all replicas are up to date. But the WAL which was current at the time save snapshot started must be kept for recovery, since it still contains log records written after the start of save snapshot.
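
The retention rule can be sketched as a function that decides which WAL files are safe to remove. This Python model is deliberately conservative and rests on assumptions: each WAL is identified by the LSN of its first record, the list is sorted in ascending order, and all replicas are already up to date. The function name is hypothetical.

```python
import bisect


def removable_wals(wal_first_lsns, snapshot_lsn):
    """Return the first-LSNs of WAL files older than the one that was
    current when the snapshot with `snapshot_lsn` was started."""
    # The current WAL is the last one whose first LSN is <= snapshot_lsn.
    current = bisect.bisect_right(wal_first_lsns, snapshot_lsn) - 1
    # Everything strictly before the current WAL may go; the rest is kept.
    return wal_first_lsns[:max(current, 0)]


print(removable_wals([1, 50001, 100001], snapshot_lsn=75000))   # [1]
```

In the example, the file starting at LSN 50001 was current when the snapshot began, so it is kept together with every newer file.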

An alternative way to save a snapshot is to send the server SIGUSR1 UNIX signal. While this approach could be handy, it is not recommended for use in automation: a signal provides no way to find out whether the snapshot was taken successfully or not.

reload configuration

Re-read the configuration file. If the file contains changes to dynamic parameters, update the runtime settings. If configuration syntax is incorrect, or a read-only parameter is changed, produce an error and do nothing.

show configuration

Show the current settings. Displays all settings, including those that have default values and thus are not necessarily present in the configuration file.

show info

localhost> show info
---
info:
  version: "1.4.5-128-ga91962c"
  uptime: 441524
  pid: 12315
  logger_pid: 12316
  lsn: 15481913304
  recovery_lag: 0.000
  recovery_last_update: 1306964594.980
  status: primary
  config: "/usr/local/etc/tarantool.cfg"

recovery_lag holds the difference (in seconds) between the current time on the machine (wall clock time) and the time stamp of the last applied record. In replication setup, this difference can indicate the delay taking place before a change is applied to a replica.

recovery_last_update is the wall clock time of the last change recorded in the write ahead log. To convert it to human-readable time, you can use date -d@1306964594.980.

status is either "primary" or "replica/<hostname>".
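
When the output of show info is processed by a monitoring script, the two timestamps can be handled as in the following Python sketch. The function names are hypothetical. Unlike date -d, which prints local time, the conversion here is pinned to UTC so that the result does not depend on the machine's time zone.

```python
from datetime import datetime, timezone


def to_utc(timestamp):
    """Convert a UNIX timestamp from show info to an aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


def recovery_lag(now, last_applied_timestamp):
    """Seconds between wall clock time and the last applied record."""
    return now - last_applied_timestamp


print(to_utc(1306964594.980).strftime("%Y-%m-%d %H:%M:%S UTC"))
# 2011-06-01 21:43:14 UTC
```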

show stat

Show the average number of requests per second, and the total number of requests since startup, broken down by request type: INSERT, SELECT, UPDATE or DELETE.

localhost> show stat
---
statistics:
  INSERT:        { rps:  139  , total:  48207694    }
  SELECT_LIMIT:  { rps:  0    , total:  0           }
  SELECT:        { rps:  1246 , total:  388322317   }
  UPDATE_FIELDS: { rps:  1874 , total:  743350520   }
  DELETE:        { rps:  147  , total:  48902544    }
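
Since the console output is YAML, it can be consumed by scripts. A YAML library would parse it directly; the Python sketch below uses only a regular expression so that it has no dependencies, and it assumes every counter line has exactly the rps/total shape shown in the sample above. The function name is hypothetical.

```python
import re

# Matches lines such as:  INSERT:  { rps:  139  , total:  48207694  }
_STAT_LINE = re.compile(
    r"^\s*(\w+):\s*\{\s*rps:\s*(\d+)\s*,\s*total:\s*(\d+)\s*\}\s*$"
)


def parse_show_stat(text):
    """Turn the output of show stat into {request type: {rps, total}}."""
    stats = {}
    for line in text.splitlines():
        match = _STAT_LINE.match(line)
        if match:
            name, rps, total = match.groups()
            stats[name] = {"rps": int(rps), "total": int(total)}
    return stats


sample = """---
statistics:
  INSERT:        { rps:  139  , total:  48207694    }
  SELECT:        { rps:  1246 , total:  388322317   }
"""
print(parse_show_stat(sample)["SELECT"]["rps"])   # 1246
```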

show slab

Show the statistics of the slab allocator. The slab allocator is the main allocator used to store tuples. This can be used to monitor the total memory use and memory fragmentation.

items_used contains the % of slab_alloc_arena already used to store tuples.

arena_used contains the % of slab_alloc_arena that is already distributed to the slab allocator.

show palloc

A pool allocator is used for temporary memory, when serving client requests. Every fiber has its own temporary pool. Shows the current state of pools of all fibers.

save coredump

Fork and dump a core. Since Tarantool stores all tuples in memory, it can take some time. Mainly useful for debugging.

show fiber

Show all running fibers, with their stack. Mainly useful for debugging.

lua ...

Execute a chunk of Lua code. This can be used to define, invoke, debug and drop stored procedures, inspect the server environment, and perform automated administrative tasks.

Writing stored procedures in Lua

Lua is a light-weight, multi-paradigm, embeddable language. Stored procedures in Lua can be used to implement data manipulation patterns or data structures. A server-side procedure written in Lua can select and modify data, access configuration and perform administrative tasks. It is possible to dynamically define, invoke, alter and drop Lua procedures. Lua procedures can run in the background and perform administrative tasks, such as data expiration or re-sharding.

Tarantool uses the LuaJIT just-in-time Lua compiler and virtual machine. Apart from increased performance, this provides such features as bitwise operations and 64-bit integer arithmetic.

Procedures can be invoked from the administrative console and using the binary protocol, for example:

localhost> lua function f1() return 'hello' end
---
...
localhost> call f1()
Found 1 tuple:
['hello']

In the language of the administrative console, LUA ... evaluates an arbitrary Lua chunk. CALL is a standard SQL statement, so its syntax was adopted by the Tarantool command line client to invoke the CALL command of the binary protocol.

In the example above, a Lua procedure is first defined using the text protocol of the administrative port, and then invoked using the Tarantool client-side SQL parser plus the binary protocol on the primary_port. Since it's possible to execute any Lua chunk in the administrative console, the newly created function f1() can be called there too:

localhost> lua f1()
---
 - hello
...
localhost> lua 1+2
---
 - 3
...
localhost> lua "hello".." world"
---
 - hello world
...

Lua procedures can also be called at the time of initialization using a dedicated init.lua script, located in work_dir. An example of such a script is given below:

    
-- Importing expirationd module
dofile("expirationd.lua")

function is_expired(args, tuple)
   if tuple == nil then
       return true
   end

   if #tuple <= args.field_no then
       return true
   end

   local field = tuple[args.field_no]
   if field == nil or #field ~= 4 then
       return true
   end

   local current_time = os.time()
   local tuple_ts = box.unpack("i", field)
   return current_time >= tuple_ts + args.ttl
end
function purge(args, tuple)
    box.space[0]:delete(tuple[0])
end

-- Run task
expirationd.run_task("exprd space 0", 0, is_expired, purge,
                    { field_no = 1, ttl = 30 * 60 })

    

The initialization script can select and modify data. However, if the server is a running replica, data change requests from the start script fail in the same way they would fail if they were sent from a remote client.

Another common task to perform in the initialization script is to start background fibers for data expiration, re-sharding, or communication with networked peers.

Finally, the script can be used to define Lua triggers invoked on various events within the system.

There is a single global instance of the Lua interpreter, which is shared across all connections. Anything prefixed with lua on the administrative console is sent directly to this interpreter. Any change of the interpreter state is immediately available to all client connections.

Each connection, however, uses its own Lua coroutine, a mechanism akin to Tarantool fibers. A coroutine has its own execution stack and its own Lua closure: a set of local variables and definitions.

The interpreter environment is not restricted when init.lua is loaded. But before the server starts accepting requests, the standard Lua APIs, such as those for file I/O, process control and module management, are unset to prevent trivial security attacks.

In the binary protocol, it's only possible to invoke existing procedures, but not define or alter them. A CALL request packet contains the CALL command code (22), the name of the procedure to be called, and a tuple of procedure arguments. Currently, Tarantool tuples are type-agnostic, thus each field of the tuple is passed into the procedure as an argument of type string. For example:

kostja@atlas:~$ cat arg.lua
function f1(a)
    local s = a
    if type(a) == 'string' then
        s = ''
        for i=1, #a, 1 do
            s = s..string.format('0x%x ', string.byte(a, i))
        end
    end
    return type(a), s
end
kostja@atlas:~$ tarantool
localhost> lua dofile('arg.lua')
---
...
localhost> lua f1('1234')
---
 - string
 - 0x31 0x32 0x33 0x34
...
localhost> call f1('1234')
Call OK, 2 rows affected
['string']
['0x31 0x32 0x33 0x34 ']
localhost> lua f1(1234)
---
 - number
 - 1234
...
localhost> call f1(1234)
Call OK, 2 rows affected
['string']
['0xd2 0x4 0x0 0x0 ']

In the above example, the way the procedure receives its argument is identical in both protocols when the argument is a string. A numeric field, however, when submitted via the binary protocol, is seen by the procedure as a 4-byte blob, not as a Lua number type.
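
The 4-byte blob is the number stored low byte first, so 0xd2 0x4 0x0 0x0 is 1234. The following is a minimal sketch in plain Lua that decodes such a blob outside the server; le32_to_number is a hypothetical helper name, not part of the box library. Inside a stored procedure, box.unpack('i', field) serves the same purpose.

```lua
-- Plain Lua sketch: decode a 4-byte little-endian string into a number.
-- le32_to_number is a hypothetical helper, not part of the box library.
local function le32_to_number(blob)
    assert(type(blob) == 'string' and #blob == 4, 'expected a 4-byte string')
    local b0, b1, b2, b3 = string.byte(blob, 1, 4)
    -- low byte first, as in the 0xd2 0x4 0x0 0x0 output above
    return b0 + b1 * 256 + b2 * 65536 + b3 * 16777216
end

-- '\210\4\0\0' is the byte sequence 0xd2 0x04 0x00 0x00
print(le32_to_number('\210\4\0\0'))  --> 1234
```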

In addition to conventional method invocation, Lua provides object-oriented syntax. Access to the latter is available on the administrative console only:

localhost> lua box.space[0]:truncate()
---
...
localhost> call box.space[0]:truncate()
error: 1:15 expected '('

Since it's impossible to invoke object methods from the binary protocol, the object-oriented syntax is often used to restrict certain operations to be used by a system administrator only.

Every value returned from a stored function by a return statement is converted to a Tarantool tuple. Tuples are returned as such, in binary form; a Lua scalar, such as a string or an integer, is converted to a tuple with only one field. When the returned value is a Lua table, the resulting tuple contains only the table values, not the keys.

When a function in Lua terminates with an error, the error is sent to the client as ER_PROC_LUA return code, with the original error message preserved. Similarly, an error which has occurred inside Tarantool (observed on the client as an error code), when it happens during execution of a Lua procedure, produces a genuine Lua error:

localhost> lua function f1() error("oops") end
---
...
localhost> call f1()
Call ERROR, Lua error: [string "function f1() error("oops") end"]:1: oops (ER_PROC_LUA)
localhost> call box.insert('99', 1, 'test')
Call ERROR, Space 99 is disabled (ER_SPACE_DISABLED)
localhost> lua pcall(box.insert, 99, 1, 'test')
---
 - false
 - Space 99 is disabled
...
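
Because server errors surface as ordinary Lua errors, a procedure can trap them with pcall() and decide what to return. The sketch below uses plain Lua only; risky() and safe_call() are hypothetical names, and risky() stands in for any call that may raise, such as box.insert() on a disabled space.

```lua
-- Plain Lua sketch: trap an error raised inside a procedure.
-- risky() and safe_call() are hypothetical names used for illustration.
local function risky(x)
    if type(x) ~= 'number' then
        error('oops: expected a number')
    end
    return x * 2
end

local function safe_call(f, ...)
    local ok, result = pcall(f, ...)
    if not ok then
        -- result holds the error message; return it instead of raising
        return nil, result
    end
    return result
end

print(safe_call(risky, 21))      --> 42
print(safe_call(risky, 'abc'))   -- prints nil and the error message
```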

It's possible not only to invoke trivial Lua code, but also to call into Tarantool storage functionality, using the box Lua library. The contents of the library can be inspected at runtime:

localhost> lua for k, v in pairs(box) do print(k, ": ", type(v)) end
---
fiber: table
space: table
cfg: table
on_reload_configuration: function
update: function
process: function
delete: function
insert: function
select: function
index: table
unpack: function
replace: function
select_range: function
pack: function
...

As shown in the listing, the box package ships:

  • high-level functions, such as process(), update(), select(), select_range(), insert(), replace(), delete(), to manipulate tuples and access spaces from Lua.

  • libraries, such as cfg, space, fiber, index, tuple, to access server configuration, create, resume and interrupt fibers, inspect contents of spaces, indexes and tuples, send and receive data over network.

Global Lua names added by Tarantool

tonumber64(value)

Convert a given string or a Lua number to a 64-bit integer. The returned value supports all arithmetic operations, but uses 64-bit integer arithmetic rather than the floating-point arithmetic of the built-in number type.

Example

localhost> lua tonumber64('123456789'), tonumber64(123456789)
---
 - 123456789
 - 123456789
...
localhost> lua i=tonumber64(1)
---
...
localhost> lua type(i), type(i*2),  type(i/2), i, i*2, i/2
---
 - cdata
 - cdata
 - cdata
 - 1
 - 2
 - 0
...

Package box

box.process(op, request)

Process a request passed in as a binary string. This is an entry point into the server request processor. It can be used to insert, update, select and delete tuples from within a Lua procedure.

This is a low-level API, and it expects all arguments to be packed in accordance with the binary protocol (iproto header excluded). Normally, there is no need to use box.process() directly: box.select(), box.update() and other convenience wrappers invoke box.process() with correctly packed arguments.

Parameters

op — number, any Tarantool command code, except 22 (CALL). See doc/box-protocol.txt.
request — command arguments packed in binary format.

Returns

This function returns zero or more tuples. In Lua, a tuple is represented by a userdata object of type box.tuple. If a Lua procedure is called from the administrative console, returned tuples are printed out in YAML format. When called from the binary protocol, the binary format is used.

Errors

Any server error produced by the executed command.

Note that since all requests from Lua enter the core through box.process(), all checks and triggers run by the core automatically apply. For example, if the server is in read-only mode, an update or delete fails. Analogously, if a system-wide "instead of" trigger is defined, it is run.

box.select(space_no, index_no, ...)

Search for a tuple or tuples in the given space. A wrapper around box.process().

Parameters

space_no — space id,
index_no — index number in the space, to be used for match
...— index key, possibly multipart.

Returns

Returns zero or more tuples.

Errors

Same as in box.process(). Any error results in a Lua exception.

Example

localhost> call box.insert(0, 'test', 'my first tuple')
Call OK, 1 rows affected
['test', 'my first tuple']
localhost> call box.select(0, 0, 'test')
Call OK, 1 rows affected
['test', 'my first tuple']
localhost> lua box.insert(5, 'testtest', 'firstname', 'lastname')
---
 - 'testtest': {'firstname', 'lastname'}
...
localhost> lua box.select(5, 1, 'firstname', 'lastname')
---
 - 'testtest': {'firstname', 'lastname'}
...

box.select(space_no, index_no, offset, limit, ...)

Search for tuples in the given space. This is a full version of the built-in SELECT command, in which one can specify offset and limit in a multi-tuple return. The server may return multiple tuples when the index is non-unique or a partial key is used for search.

box.insert(space_no, ...)
box.replace(space_no, ...)

Insert a tuple into a space. Tuple fields follow space_no. If a tuple with the same primary key already exists, box.insert() returns an error, while box.replace() replaces the existing tuple with a new one. These functions are wrappers around box.process().

Returns

Returns the inserted tuple.

box.update(space_no, key, format, ...)

Update a tuple identified by a primary key. If a key is multipart, it is passed in as a Lua table. Update arguments follow, described by format. The format and arguments are passed to box.pack() and the result is sent to box.process(). A correct format is a sequence of pairs: update operation, operation arguments. A single character of format describes either an operation which needs to take place or operation argument. A format specifier also works as a placeholder for the number of a field, which needs to be updated, or for an argument value. For example:

+p=p — add a value to one field and assign another,
:p — splice a field: start at offset, cut length bytes, and add a string.
#p — delete a field.
!p — insert a field (before the one specified).

Possible format specifiers are: = for assignment, + for addition, - for subtraction, & for bitwise AND, | for bitwise OR, ^ for bitwise exclusive OR (XOR), : for string splice, # for field deletion, ! for field insertion, and p for an operation argument.

Returns

Returns the updated tuple.

Example

localhost> lua box.insert(0, 0, 'hello world')
---
 - 0: {'hello world'}
...
localhost> lua box.update(0, 0, '+p', 1, 1) -- add value 1 to field #1
---
error: 'Illegal parameters, numeric operation on a field with length != 4'
...
localhost> lua box.update(0, 0, '=p', 1, 1) -- assign field #1 to value 1
---
 - 0: {1}
...
localhost> lua box.update(0, 0, '+p', 1, 1)
---
 - 0: {2}
...
localhost> lua box.update(0, 2, '!p', 1, 'Bienvenue tout le monde!')
---
 - 2: {'Bienvenue tout le monde!', 'Hello world!'}
...
localhost> lua box.update(0, 2, '#p', 2, 'Bienvenue tout le monde!')
---
 - 2: {'Bienvenue tout le monde!'}
...
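
To make the pairing of format characters and arguments explicit, here is a small sketch in plain Lua. It only lists the operations a format describes and does not talk to the server; describe_update is a hypothetical helper name. It assumes, as in the examples above, that every operation character is followed by p and consumes two arguments: the field number and the value.

```lua
-- Plain Lua sketch: list the operations described by an update format.
-- describe_update is a hypothetical helper; it does not call the server.
local function describe_update(format, ...)
    local ops = {}
    local argn = 1
    -- the format is a sequence of pairs: operation character, then 'p'
    for i = 1, #format, 2 do
        local op = string.sub(format, i, i)
        local placeholder = string.sub(format, i + 1, i + 1)
        assert(placeholder == 'p', 'expected p after operation ' .. op)
        -- each pair consumes two arguments: field number and value
        local field_no, value = select(argn, ...)
        ops[#ops + 1] = { op = op, field_no = field_no, value = value }
        argn = argn + 2
    end
    return ops
end

-- '+p=p': add 10 to field 1, then assign 'abc' to field 2
local ops = describe_update('+p=p', 1, 10, 2, 'abc')
for _, o in ipairs(ops) do
    print(o.op, o.field_no, o.value)
end
```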

box.delete(space_no, ...)

Delete a tuple identified by a primary key.

Returns

Returns the deleted tuple.

Example

localhost> call box.delete(0, 'test')
Call OK, 1 rows affected
['test', 'my first tuple']
localhost> call box.delete(0, 'test')
Call OK, 0 rows affected
localhost> call box.delete(0, 'tes')
Call ERROR, Illegal parameters, key is not u32 (ER_ILLEGAL_PARAMS)

box.select_range(space_no, index_no, limit, key, ...)

Select a range of tuples, starting from the position specified by key. The key can be multipart. At most limit tuples are returned. If key is nil or unspecified, the selection starts from the first key in the index.

For TREE indexes, this returns tuples in sorted order. For HASH indexes, the order of tuples is unspecified, and can change significantly if data is inserted or deleted between two calls to box.select_range(). This is a simple wrapper around box.space[space_no]:select_range(index_no, ...). BITSET indexes do not support this call.

Example

localhost> show configuration
---
...
  space[4].cardinality: "-1"
  space[4].estimated_rows: "0"
  space[4].index[0].type: "HASH"
  space[4].index[0].unique: "true"
  space[4].index[0].key_field[0].fieldno: "0"
  space[4].index[0].key_field[0].type: "STR"
  space[4].index[1].type: "TREE"
  space[4].index[1].unique: "false"
  space[4].index[1].key_field[0].fieldno: "1"
  space[4].index[1].key_field[0].type: "STR"
...
localhost> insert into t4 values ('0', '0')
Insert OK, 1 rows affected
localhost> insert into t4 values ('1', '1')
Insert OK, 1 rows affected
localhost> insert into t4 values ('2', '2')
Insert OK, 1 rows affected
localhost> insert into t4 values ('3', '3')
Insert OK, 1 rows affected
localhost> lua box.select_range(4, 0, 10)
---
 - '3': {'3'}
 - '0': {'0'}
 - '1': {'1'}
 - '2': {'2'}
...
localhost> lua box.select_range(4, 1, 10)
---
 - '0': {'0'}
 - '1': {'1'}
 - '2': {'2'}
 - '3': {'3'}
...
localhost> lua box.select_range(4, 1, 2)
---
 - '0': {'0'}
 - '1': {'1'}
...
localhost> lua box.select_range(4, 1, 2, '1')
---
 - '1': {'1'}
 - '2': {'2'}
...

box.select_reverse_range(space_no, index_no, limit, key, ...)

Select a reverse range of tuples, starting from the position specified by key. The key can be multipart. At most limit tuples are returned. If key is nil or unspecified, the selection starts from the last key in the index.

For TREE indexes, this returns tuples in descending order. Other index types do not support this call.

Example

localhost> show configuration
---
...
  space[4].cardinality: "-1"
  space[4].estimated_rows: "0"
  space[4].index[0].type: "HASH"
  space[4].index[0].unique: "true"
  space[4].index[0].key_field[0].fieldno: "0"
  space[4].index[0].key_field[0].type: "STR"
  space[4].index[1].type: "TREE"
  space[4].index[1].unique: "false"
  space[4].index[1].key_field[0].fieldno: "1"
  space[4].index[1].key_field[0].type: "STR"
...
localhost> insert into t4 values ('0', '0')
Insert OK, 1 rows affected
localhost> insert into t4 values ('1', '1')
Insert OK, 1 rows affected
localhost> insert into t4 values ('2', '2')
Insert OK, 1 rows affected
localhost> insert into t4 values ('3', '3')
Insert OK, 1 rows affected
localhost> lua box.select_reverse_range(4, 0, 10)
---
error: 'Illegal parameters, hash iterator is forward only'
...
localhost> lua box.select_reverse_range(4, 1, 10)
---
 - '3': {'3'}
 - '2': {'2'}
 - '1': {'1'}
 - '0': {'0'}
...
localhost> lua box.select_reverse_range(4, 1, 2)
---
 - '3': {'3'}
 - '2': {'2'}
...
localhost> lua box.select_reverse_range(4, 1, 2, '1')
---
 - '1': {'1'}
 - '0': {'0'}
...

box.pack(format, ...)

To use Tarantool binary protocol primitives from Lua, it's necessary to convert Lua variables to binary format. This helper function is prototyped after Perl 'pack'. It takes a format and a list of arguments, and returns a binary string with all arguments packed according to the format.

Format specifiers

b — converts Lua variable to a 1-byte integer, and stores the integer in the resulting string
s — converts Lua variable to a 2-byte integer, and stores the integer in the resulting string, low byte first,
i — converts Lua variable to a 4-byte integer, and stores the integer in the resulting string, low byte first,
l — converts Lua variable to a 8-byte integer, and stores the integer in the resulting string, low byte first,
w — converts Lua integer to a BER-encoded integer,
p — stores the length of the argument as a BER-encoded integer followed by the argument itself (a 4-bytes for integers (LE order) and a binary blob for other types),
=, +, &, |, ^, : — stores the corresponding Tarantool UPDATE operation code: field assignment, addition, conjunction, disjunction, exclusive disjunction, splice (from Perl SPLICE function). Expects field number to update as an argument. These format specifiers only store the corresponding operation code and field number to update, but do not describe operation arguments.

Errors

Unknown format specifier.

Example

localhost> lua box.insert(0, 0, 'hello world')
---
 - 0: {'hello world'}
...
localhost> lua box.update(0, 0, "=p", 1, 'bye world')
---
 - 0: {'bye world'}
...
localhost> lua box.update(0, 0, ":p", 1, box.pack('ppp', 0, 3, 'hello'))
---
 - 0: {'hello world'}
...
localhost> lua box.update(0, 0, "=p", 1, 4)
---
 - 0: {4}
...
localhost> lua box.update(0, 0, "+p", 1, 4)
---
 - 0: {8}
...
localhost> lua box.update(0, 0, "^p", 1, 4)
---
 - 0: {12}
...
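
The fixed-width and BER encodings from the specifier list can be illustrated in plain Lua. This is only a sketch that runs outside the server: pack_le, pack_ber and hex are hypothetical helper names, and the BER layout is assumed to be the one Perl's pack uses for 'w' (base-128 digits, most significant first), since the function is modeled on Perl.

```lua
-- Plain Lua sketch of two encodings named in the format specifier list.
-- pack_le, pack_ber and hex are hypothetical helpers, not box library calls.

-- Fixed-width integer, low byte first ('s' = 2 bytes, 'i' = 4 bytes).
local function pack_le(value, width)
    local bytes = {}
    for i = 1, width do
        bytes[i] = string.char(value % 256)
        value = math.floor(value / 256)
    end
    return table.concat(bytes)
end

-- BER-compressed integer as in Perl's pack('w'): base-128 digits,
-- most significant digit first, high bit set on all but the last byte.
local function pack_ber(value)
    local bytes = { string.char(value % 128) }
    value = math.floor(value / 128)
    while value > 0 do
        table.insert(bytes, 1, string.char(value % 128 + 128))
        value = math.floor(value / 128)
    end
    return table.concat(bytes)
end

-- Render a binary string as space-separated hex bytes.
local function hex(s)
    local out = {}
    for i = 1, #s do
        out[i] = string.format('%02x', string.byte(s, i))
    end
    return table.concat(out, ' ')
end

print(hex(pack_le(1234, 4)))  --> d2 04 00 00
print(hex(pack_ber(300)))     --> 82 2c
```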

box.unpack(format, binary)

Counterpart to box.pack().

Example

localhost> lua tuple=box.replace(2, 0)
---
...
localhost> lua string.len(tuple[0])
---
 - 4
...
localhost> lua box.unpack('i', tuple[0])
---
 - 0
...
localhost> lua box.unpack('bsil', box.pack('bsil', 255, 65535, 4294967295, tonumber64('18446744073709551615')))
---
 - 255
 - 65535
 - 4294967295
 - 18446744073709551615
...
localhost> lua num, str, num64 = box.unpack('ppp', box.pack('ppp', 666, 'string', tonumber64('666666666666666')))
---
...
localhost> lua print(box.unpack('i', num));
---
666
...
localhost> lua print(str);
---
string
...
localhost> lua print(box.unpack('l', num64))
---
666666666666666
...
localhost> lua box.unpack('=p', box.pack('=p', 1, '666'))
---
 - 1
 - 666
...

box.print(...)

Redefines Lua print() built-in to print either to the log file (when Lua is used from the binary port) or back to the user (for the administrative console).

When printing to the log file, INFO log level is used. When printing to the administrative console, all output is sent directly to the socket.

Note: the administrative console output must be YAML-compatible.

box.dostring(s, ...)

Evaluates an arbitrary chunk of Lua code passed in s. If there is a compilation error, it's raised as a Lua error. In case of compilation success, all arguments which follow s are passed to the compiled chunk and the chunk is invoked.

This function is mainly useful to define and run an arbitrary piece of Lua code, without having to introduce changes to the global Lua environment.

Example

lua box.dostring('abc')
---
error: '[string "abc"]:1: ''='' expected near ''<eof>'''
...
lua box.dostring('return 1')
---
 - 1
...
lua box.dostring('return ...', 'hello', 'world')
---
 - hello
 - world
...
lua box.dostring('local f = function(key) t=box.select(0, 0, key); if t ~= nil then return t[0] else return nil end end return f(...)', 0)
---
 - nil
...

box.time()

Returns current system time (in seconds) as a Lua number. The time is taken from the event loop clock, which makes this call very cheap, but still useful for constructing artificial tuple keys.

box.time64()

Returns current system time (in seconds) as a 64-bit integer. The time is taken from the event loop clock.

box.uuid()

Generate a 128-bit (16-byte) unique id. The id is returned in binary form.

Requires the libuuid library to be installed. The library is loaded at runtime, and if the library is not available, this function returns an error.

box.uuid_hex()

Generate a 128-bit (16-byte) unique id and return it as a 32-character hexadecimal string.

Example

lua box.uuid_hex()
---
 - a4f29fa0eb6d11e19f7737696d7fa8ff
...
            
box.raise(errcode, errtext)

Raises a client error. The difference between this function and the built-in error() function in Lua is that when the error reaches the client, its error code is preserved, whereas every Lua error is presented to the client as ER_PROC_LUA. This function makes it possible to emulate any kind of native exception, such as a unique constraint violation, no such space/index, etc. A complete list of errors is present in the errcode.h file in the source tree. Lua constants which correspond to Tarantool errors are defined in the box.error module. The error message can be arbitrary.

Example

lua box.raise(box.error.ER_WAL_IO, 'Wal I/O error')
---
error: 'Wal I/O error'
...
            
box.auto_increment(space_no, ...)

Insert values into space designated by space_no, using an auto-increment primary key. The space must have a NUM or NUM64 primary key index of type TREE.

Example
localhost> lua box.auto_increment(0, "I am a duplicate")
---
 - 1: {'I am a duplicate'}
...
localhost> lua box.auto_increment(0, "I am a duplicate")
---
 - 2: {'I am a duplicate'}
...
            
box.counter.inc(space_no, key)

Increments a counter identified by the key. The key can be multi-part, but there must be an index covering all fields of the key. If there is no tuple identified by the given key, creates a new one with initial counter value set to 1. Returns the new counter value.

Example
localhost> lua box.counter.inc(0, 'top.mail.ru')
---
 - 1
...
localhost> lua box.counter.inc(0, 'top.mail.ru')
---
 - 2
...
box.counter.dec(space_no, key)

Decrements a counter identified by the given key. If the key is not found, this is a no-op. When the counter value drops to 0, the tuple is deleted.

Example
localhost> lua box.counter.dec(0, 'top.mail.ru')
---
 - 1
...
localhost> lua box.counter.dec(0, 'top.mail.ru')
---
 - 0
...
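
The counter semantics described above can be modelled in plain Lua. In this sketch a Lua table stands in for the space; counter_inc and counter_dec are hypothetical names, and returning nil for a missing key is a choice of the sketch, not documented server behavior.

```lua
-- Plain Lua model of the counter semantics described above.
-- A table stands in for the space; this is not the server implementation.
local counters = {}

local function counter_inc(key)
    -- a missing key starts at 1
    counters[key] = (counters[key] or 0) + 1
    return counters[key]
end

local function counter_dec(key)
    local value = counters[key]
    if value == nil then
        return nil          -- key not found: no-op (assumed return value)
    end
    value = value - 1
    if value == 0 then
        counters[key] = nil -- the "tuple" is deleted at zero
    else
        counters[key] = value
    end
    return value
end

print(counter_inc('top.mail.ru'))  --> 1
print(counter_inc('top.mail.ru'))  --> 2
print(counter_dec('top.mail.ru'))  --> 1
print(counter_dec('top.mail.ru'))  --> 0
print(counters['top.mail.ru'])     --> nil
```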

Package box.tuple

This package implements the box.tuple userdata type. It is possible to access individual tuple fields using an index, select a range of fields, iterate over all fields in a tuple or convert a tuple to a Lua table. Tuples are immutable.

Example

localhost> lua t=box.insert(0, 1, 'abc', 'cde', 'efg', 'ghq', 'qkl')
---
...
localhost> lua #t
---
 - 6
...
localhost> lua t[1], t[5]
---
 - abc
 - qkl
...
localhost> lua t[6]
---
error: 'Lua error: [string "return t[6]"]:1: box.tuple: index 6 is out of bounds (0..5)'
...
localhost> lua for k,v in t:pairs() do print(v) end
---

abc
cde
efg
ghq
qkl
...
localhost> lua t:unpack()
---
 -
 - abc
 - cde
 - efg
 - ghq
 - qkl
...
localhost> lua t:slice(1, 2)
---
 - abc
...
localhost> lua t:slice(1, 3)
---
 - abc
 - cde
...
localhost> lua t:slice(1, -1)
---
 - abc
 - cde
 - efg
 - ghq
...
localhost> lua t:transform(1, 3)
---
 - 1: {'ghq', 'qkl'}
...
localhost> lua t:transform(0, 1, 'zyx')
---
 - 'zyx': {'abc', 'cde', 'efg', 'ghq', 'qkl'}
...
localhost> lua t:transform(-1, 1, 'zyx')
---
 - 1: {'abc', 'cde', 'efg', 'ghq', 'zyx'}
...
localhost> lua t=box.insert(0, 'abc', 'def', 'abc')
---
...
localhost> lua t:find('abc')
---
 - 0
...
localhost> lua t:findall('abc')
---
 - 0
 - 2
...
localhost> lua t:find(1, 'abc')
---
 - 2
...

box.tuple.new(...)

Construct a new tuple from a Lua table or a scalar.

Example
localhost> lua box.tuple.new({tonumber64('18446744073709551615'), 'string', 1})
---
 - 18446744073709551615: {'string', 1}
...

Package box.space

This package is a container for all configured spaces. A space object provides access to space attributes, such as id, whether or not a space is enabled, space cardinality, estimated number of rows. It also contains object-oriented versions of box functions. For example, instead of box.insert(0, ...) one can write box.space[0]:insert(...). Package source code is available in the file src/box/box.lua.

A list of all space members follows.

space.n
Ordinal space number, box.space[i].n == i
space.enabled
Whether or not this space is enabled in the configuration file.
space.cardinality
A limit on tuple field count for tuples in this space. This limit can be set in the configuration file. Value 0 stands for unlimited.
space.index[]
A container for all defined indexes. An index is a Lua object of type box.index with methods to search tuples and iterate over them in predefined order.
space:select(index_no, ...)
space:select_range(index_no, limit, key)

Select a range of tuples, starting from the position specified by key. The key can be multipart. At most limit tuples are returned. If key is nil or unspecified, the selection starts from the first key in the index.

For TREE indexes, this returns tuples in sorted order. For other indexes, the order of tuples is unspecified, and can change significantly if data is inserted or deleted between two calls to select_range().

space:select_reverse_range(index_no, limit, key)
Select a reverse range of tuples, limited by limit, starting from key. The key can be multipart. TREE index returns tuples in descending order. Other index types do not support this call.
space:insert(...)
space:replace(...)
space:delete(key)
space:update(key, format, ...)
Object-oriented forms of respective box methods.
space:len()
Returns number of tuples in the space.
space:truncate()
Deletes all tuples.
space:pairs()

A helper function to iterate over all space tuples, Lua style.

Example
localhost> lua for k,v in box.space[0]:pairs() do print(v) end
---
1: {'hello'}
2: {'my     '}
3: {'Lua    '}
4: {'world'}
...

Package box.index

This package implements methods of type box.index. Indexes are contained in box.space[i].index[] array within each space object. They provide an API for ordered iteration over tuples. This API is a direct binding to corresponding methods of index objects in the storage engine.

index.unique
Boolean, true if the index is unique.
index.type
A string for index type: one of 'TREE', 'HASH' or 'BITSET'.
index.key_field[]
An array describing index key fields.
index.idx
The underlying userdata which does all the magic.
index:iterator(type, ...)

This method provides iteration support within an index. Parameter type is used to identify the semantics of iteration. Different index types support different iterators. The remaining arguments of the function are varying and depend on the iteration type. For example, TREE index maintains a strict order of keys and can return all tuples in ascending or descending order, starting from the specified key. Other index types, however, do not support ordering.

To understand the consistency of tuples returned by an iterator, it's essential to know the principles of the Tarantool transaction processing subsystem. An iterator in Tarantool does not own a consistent read view. Instead, each procedure is granted exclusive access to all tuples and spaces until it encounters a "context switch", caused by a write to disk or to the network, or by an explicit call to box.fiber.yield(). When the execution flow returns to the yielded procedure, the data set could have changed significantly. Iteration, resumed after a yield point, does not preserve the read view, but continues with the new content of the database.
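
The effect of a yield on an iteration in progress can be imitated in plain Lua. In this sketch a coroutine stands in for the procedure and a Lua table stands in for the index; none of the names belong to the box library. The reader continues after each yield with whatever the table contains at that moment.

```lua
-- Plain Lua sketch: an iteration that yields sees later changes to the data.
-- A coroutine stands in for a fiber; the table stands in for an index.
local data = { 'a', 'b', 'c' }
local seen = {}

local reader = coroutine.create(function()
    local i = 1
    while data[i] ~= nil do
        seen[#seen + 1] = data[i]
        i = i + 1
        coroutine.yield()   -- "context switch": others may now modify data
    end
end)

coroutine.resume(reader)    -- reads 'a', then yields
data[2] = 'B'               -- another procedure changes the data set
data[4] = 'd'
while coroutine.status(reader) ~= 'dead' do
    coroutine.resume(reader)
end

print(table.concat(seen, ' '))  --> a B c d
```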

Parameters

type — iteration strategy as defined in tables below.

Returns

This method returns an iterator closure, i.e. a function which can be used to get the next value on each invocation.

Errors

Selected iteration type is not supported in the subject index type or supplied parameters do not match iteration type.

Table 4.1. Common iterator types

box.index.ALL
Arguments: none. HASH: yes, TREE: yes, BITSET: yes.
Iterate over all tuples in an index. When iterating over a TREE index, tuples are returned in ascending order of the key. When iterating over a HASH or BITSET index, tuples are returned in physical order or, in other words, unordered.

box.index.EQ
Arguments: key. HASH: yes, TREE: yes, BITSET: yes.
Equality iterator: iterate over all tuples matching the key. Parts of a multipart key need to be separated by commas.

Semantics of the match depend on the index. A HASH index supports only exact match: all parts of a key participating in the index must be provided. A TREE index also accepts a key prefix, that is, the first few parts of a multipart key. In this case, all tuples with the same prefix or matching key parts are considered matching the search criteria.

A non-unique HASH index returns tuples in unspecified order. When a TREE index is not unique, or only part of a key is given as a search criterion, matching tuples are returned in ascending order. BITSET indexes are always unique.

box.index.GT
Arguments: key. HASH: yes (*), TREE: yes, BITSET: no.
Iterate over tuples strictly greater than the search key. For TREE indexes, a key prefix or key part can be sufficient. If the key is nil, iteration starts from the smallest key in the index. The tuples are returned in ascending order of the key. (*) A HASH index also supports this iterator type, but returns tuples in unspecified order. However, if the server does not receive updates, this iterator can be used to retrieve all tuples via a HASH index piece by piece, by supplying the last key from the previous range as the start key for an iterator over the next range. BITSET indexes do not support this iteration type yet.


Table 4.2. TREE iterator types

box.index.REQ
Arguments: key or key part.
Reverse equality iterator. Equivalent to box.index.EQ, except that tuples are returned in descending rather than ascending order.

box.index.GE
Arguments: key or key part.
Iterate over all tuples for which the corresponding fields are greater than or equal to the search key. The tuples are returned in ascending order. Similarly to box.index.EQ, a key prefix or key part can be used to seed the iterator. If the key is nil, iteration starts from the smallest key in the index.

box.index.LT
Arguments: key or key part.
Similar to box.index.GT, but returns all tuples which are strictly less than the search key. The tuples are returned in descending order of the key. A nil key can be used to start from the end of the index range.

box.index.LE
Arguments: key or key part.
Similar to box.index.GE, but returns all tuples which are less than or equal to the search key or key prefix, and returns tuples in descending order, from biggest to smallest. If the key is nil, iteration starts from the end of the index range.


Table 4.3. BITSET iterator types

box.index.BITS_ALL_SET
Arguments: bit mask.
Matches tuples in which all specified bits are set.

box.index.BITS_ANY_SET
Arguments: bit mask.
Matches tuples in which any of the specified bits is set.

box.index.BITS_ALL_NOT_SET
Arguments: bit mask.
Matches tuples in which none of the specified bits is set.
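
The three bit mask conditions can be written down as plain Lua predicates. This is a sketch of the matching rules only, not of the index implementation: band, bits_all_set, bits_any_set and bits_all_not_set are hypothetical names, and band is an arithmetic AND written without a bit library so that it runs on stock Lua 5.1.

```lua
-- Plain Lua sketch of the three bit mask conditions for BITSET iterators.
-- band is a hypothetical arithmetic AND for non-negative integers.
local function band(a, b)
    local result, bit = 0, 1
    while a > 0 and b > 0 do
        if a % 2 == 1 and b % 2 == 1 then
            result = result + bit
        end
        a = math.floor(a / 2)
        b = math.floor(b / 2)
        bit = bit * 2
    end
    return result
end

local function bits_all_set(value, mask)     return band(value, mask) == mask end
local function bits_any_set(value, mask)     return band(value, mask) ~= 0 end
local function bits_all_not_set(value, mask) return band(value, mask) == 0 end

-- value 6 is binary 110, mask 3 is binary 011: only one mask bit is set
print(bits_all_set(6, 3))      --> false
print(bits_any_set(6, 3))      --> true
print(bits_all_not_set(6, 3))  --> false
```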


Examples

localhost> show configuration
---
...
  space[0].enabled: "true"
  space[0].index[0].type: "HASH"
  space[0].index[0].unique: "true"
  space[0].index[0].key_field[0].fieldno: "0"
  space[0].index[0].key_field[0].type: "NUM"
  space[0].index[1].type: "TREE"
  space[0].index[1].unique: "false"
  space[0].index[1].key_field[0].fieldno: "1"
  space[0].index[1].key_field[0].type: "NUM"
  space[0].index[1].key_field[1].fieldno: "2"
  space[0].index[1].key_field[1].type: "NUM"
...
localhost> INSERT INTO t0 VALUES (1, 1, 0)
Insert OK, 1 rows affected
localhost> INSERT INTO t0 VALUES (2, 1, 1)
Insert OK, 1 rows affected
localhost> INSERT INTO t0 VALUES (3, 1, 2)
Insert OK, 1 rows affected
localhost> INSERT INTO t0 VALUES (4, 2, 0)
Insert OK, 1 rows affected
localhost> INSERT INTO t0 VALUES (5, 2, 1)
Insert OK, 1 rows affected
localhost> INSERT INTO t0 VALUES (6, 2, 2)
Insert OK, 1 rows affected
localhost> lua it = box.space[0].index[1]:iterator(box.index.EQ, 1); print(it(), " ", it(), " ", it());
---
1: {1, 0} 2: {1, 1} 3: {1, 2}
...
localhost> lua it = box.space[0].index[1]:iterator(box.index.EQ, 1, 2); print(it(), " ", it(), " ", it());
---
3: {1, 2} nil nil
...
localhost> lua it = box.space[0].index[1]:iterator(box.index.GE, 2, 1);  print(it(), " ", it(), " ", it());
---
5: {2, 1} 6: {2, 2} nil
...
localhost> lua for v in box.space[0].index[1]:iterator(box.index.ALL) do print(v) end
---
1: {1, 0}
2: {1, 1}
3: {1, 2}
4: {2, 0}
5: {2, 1}
6: {2, 2}
...
localhost> lua i = box.space[0].index[0]:iterator(box.index.LT, 1);
---
error: 'Iterator type is not supported'
...
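
The notion of an iterator closure used in these examples can be shown in plain Lua. The sketch below models GE-style iteration over a sorted array of keys; make_ge_iterator is a hypothetical helper and does not touch a real index. The returned function remembers its position between calls and returns nil when the keys are exhausted, which is what lets it drive a for loop.

```lua
-- Plain Lua sketch of an iterator closure over sorted keys.
-- make_ge_iterator is a hypothetical helper modelling GE-style iteration.
local function make_ge_iterator(sorted_keys, start_key)
    local pos = 1
    -- skip keys smaller than the start key; nil starts from the smallest
    if start_key ~= nil then
        while sorted_keys[pos] ~= nil and sorted_keys[pos] < start_key do
            pos = pos + 1
        end
    end
    -- the closure remembers pos between calls
    return function()
        local key = sorted_keys[pos]
        pos = pos + 1
        return key
    end
end

local out = {}
for key in make_ge_iterator({ 1, 2, 3, 5, 8 }, 3) do
    out[#out + 1] = key
end
print(table.concat(out, ' '))  --> 3 5 8
```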

index:min()
The smallest value in the index. Available only for indexes of type 'TREE'.
index:max()
The biggest value in the index. Available only for indexes of type 'TREE'.
index:random(randint)
Return a random value from an index. A random non-negative integer must be supplied as input, and a value is selected accordingly in index-specific fashion. This method is useful when it's important to get insight into data distribution in an index without having to iterate over the entire data set.
index:count()
Iterate over an index and count the number of tuples which match the provided search criteria. The argument can be a tuple, a key, or one or more key parts. Returns the number of matched tuples.

Package box.fiber

Functions in this package allow you to create, run and manage existing fibers.

A fiber is an independent execution thread implemented using a mechanism of cooperative multitasking. A fiber has three possible states: running, suspended or dead. When a fiber is created with box.fiber.create(), it is suspended. When a fiber is started with box.fiber.resume(), it is running. When a fiber's control is yielded back to the caller with box.fiber.yield(), it is suspended. When a fiber ends (due to return or by reaching the end of the fiber function), it is dead.
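
Lua coroutines go through a comparable life cycle, and they can be observed without a running server. The sketch below is an analogy only, assuming nothing about the box.fiber implementation: it records coroutine.status() at each stage.

```lua
-- Plain Lua sketch: coroutine states as an analogue of fiber states.
-- Coroutines are used because they run outside the server.
local states = {}

local co
co = coroutine.create(function()
    states[#states + 1] = coroutine.status(co)  -- while executing
    coroutine.yield()
end)

states[#states + 1] = coroutine.status(co)      -- just created
coroutine.resume(co)                            -- runs until the yield
states[#states + 1] = coroutine.status(co)      -- after the yield
coroutine.resume(co)                            -- function returns
states[#states + 1] = coroutine.status(co)      -- finished

print(table.concat(states, ' '))
--> suspended running suspended dead
```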

A fiber can also be attached or detached. An attached fiber is a child of the creator, and is running only if the creator has called box.fiber.resume(). A detached fiber is a child of Tarantool internal sched fiber, and gets scheduled only if there is a libev event associated with it.

To detach, a running fiber must invoke box.fiber.detach(). A detached fiber loses connection with its parent forever.

All fibers are part of the fiber registry, box.fiber. This registry can be searched (box.fiber.find()) either by fiber id (fid), which is numeric, or by fiber name, which is a string. If there is more than one fiber with the given name, the first fiber that matches is returned.

Once the fiber function finishes or returns, the fiber is considered dead. Its carcass is put into a fiber pool, and can be reused when another fiber is created.

A runaway fiber can be stopped with box.fiber.cancel(). box.fiber.cancel(), however, is advisory — it works only if the runaway fiber is calling box.fiber.testcancel() once in a while. Most box.* hooks, such as box.delete() or box.update(), are calling box.fiber.testcancel(). box.select() doesn't.

In practice, a runaway fiber can only become unresponsive if it does a lot of computations and doesn't check whether it's been canceled.

The other potential problem comes from detached fibers which never get scheduled, because they are not subscribed to any events, or because no relevant events occur. Such fibers can be killed with box.fiber.cancel() at any time, since box.fiber.cancel() sends an asynchronous wakeup event to the fiber, and box.fiber.testcancel() is checked whenever such an event occurs.

Like all Lua objects, dead fibers are garbage collected: the garbage collector frees pool allocator memory owned by the fiber, resets all fiber data, and returns the fiber to the fiber pool.

box.fiber.id(fiber)
Return a numeric id of the fiber.
box.fiber.self()
Return box.fiber userdata object for the currently scheduled fiber.
box.fiber.find(id)
Locate a fiber userdata object by id.
box.fiber.create(function)

Create a fiber for function.

Errors

Can hit a recursion limit.

box.fiber.resume(fiber, ...)
Resume a created or suspended fiber.
box.fiber.yield(...)

Yield control to the calling fiber, if the fiber is attached, or to sched otherwise.

If the fiber is attached, whatever arguments are passed to this call, are passed on to the calling fiber. If the fiber is detached, box.fiber.yield() returns back everything passed into it.

box.fiber.detach()
Detach the current fiber. This is a cancellation point. This is a yield point.
box.fiber.sleep(time)
Yield to the sched fiber and sleep time seconds. Only the current fiber can be made to sleep.
box.fiber.cancel(fiber)
Cancel a fiber. Running and suspended fibers can be canceled. Returns an error if the subject fiber does not permit cancel.
box.fiber.testcancel()
Check if the current fiber has been canceled and throw an exception if this is the case.
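
As an illustration of the calls above, the following sketch starts a detached background fiber that checks for cancellation on every iteration. The names worker_loop, start_worker and stop_worker are hypothetical, and the sketch assumes that box.fiber.find() returns nil when no fiber has the given id.

```lua
-- Body of the background fiber (hypothetical name).
function worker_loop(iterations, pause)
    -- Detach from the creator: from now on the fiber is scheduled by sched.
    box.fiber.detach()
    for i = 1, iterations do
        -- Raises an error if box.fiber.cancel() was called on this fiber.
        box.fiber.testcancel()
        -- ... do a bounded amount of work here ...
        box.fiber.sleep(pause)      -- yield to sched for `pause` seconds
    end
end

-- Create the fiber, start it, and return its numeric id.
function start_worker(iterations, pause)
    local fiber = box.fiber.create(worker_loop)
    box.fiber.resume(fiber, iterations, pause)
    return box.fiber.id(fiber)
end

-- Ask a worker to stop. Cancellation is advisory: it takes effect at the
-- worker's next cancellation point.
function stop_worker(fid)
    local fiber = box.fiber.find(fid)
    if fiber ~= nil then            -- assumption: nil means "no such fiber"
        box.fiber.cancel(fiber)
        return true
    end
    return false
end
```

Keeping only the numeric id, rather than the fiber object, lets the caller look the worker up again later through the fiber registry.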

Package box.session

Learn session state, set on-connect and on-disconnect triggers.

A session is an object associated with each client connection. Through this module, it's possible to query session state, as well as set a Lua chunk executed on connect or disconnect event.

box.session.id()
Return a unique monotonic identifier of the current session. The identifier can be used to check whether or not a session is alive. 0 means there is no session (e.g. a procedure is running in a detached fiber).
box.session.fd(id)
Return an integer file descriptor associated with the connected client.
box.session.exists(id)
Return true if a session is alive, false otherwise.

This module also makes it possible to define triggers on connect and disconnect events. Please see the triggers chapter for details.
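
A minimal sketch of how a stored procedure might use these calls is shown below. The function names are hypothetical; only box.session.id(), box.session.fd() and box.session.exists() come from this package.

```lua
-- Describe the session on whose behalf the current procedure runs.
-- Returns nil when there is no session (id 0), e.g. in a detached fiber.
function session_info()
    local sid = box.session.id()
    if sid == 0 then
        return nil
    end
    return { id = sid, fd = box.session.fd(sid) }
end

-- Check later whether a remembered session is still connected.
function session_is_alive(sid)
    return sid ~= 0 and box.session.exists(sid)
end
```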

Package box.ipc — inter procedure communication

box.ipc.channel(capacity)

Create a new communication channel with predefined capacity. The channel can be used to synchronously exchange messages between stored procedures. The channel is garbage collected when no one is using it, just like any other Lua object. Channels can be worked with using functional or object-oriented syntax. For example, the following two lines are equivalent:

    channel:put(message)
    box.ipc.channel.put(channel, message)
box.ipc.channel.put(channel, message, timeout)
Send a message using a channel. If the channel is full, box.ipc.channel.put() blocks until there is a free slot in the channel. If timeout is provided, and no slot becomes free for the duration of the timeout, box.ipc.channel.put() returns false. Otherwise it returns true.
box.ipc.channel.get(channel, timeout)
Fetch a message from a channel. If the channel is empty, box.ipc.channel.get() blocks until there is a message. If timeout is provided, and there are no new messages for the duration of the timeout, box.ipc.channel.get() returns nil.
box.ipc.channel.broadcast(channel, message, timeout)
If the channel is empty, is equivalent to box.ipc.channel.put(). Otherwise sends the message to all readers of the channel.
box.ipc.channel.is_empty(channel)
Check if the channel is empty (has no messages).
box.ipc.channel.is_full(channel)
Check if the channel is full (has no room for a new message).
box.ipc.channel.has_readers(channel)
Check if the channel is empty and has readers waiting for a message.
box.ipc.channel.has_writers(channel)

Check if the channel is full and has writers waiting for empty room.

Example
local channel = box.ipc.channel(10)
function consumer_fiber()
    while true do
        local task = channel:get()
        -- ...
    end
end

function consumer2_fiber()
    while true do
        local task = channel:get(10)        -- 10 seconds
        if task ~= nil then
            -- ...
        else
            print("timeout!")
        end
    end
end

-- The arguments of the producers are passed through to box.select().
function producer_fiber(...)
    while true do
        local task = box.select(...)
        -- ...
        if channel:is_empty() then
            -- channel is empty
        end

        if channel:is_full() then
            -- channel is full
        end

        -- ...
        if channel:has_readers() then
            -- there are fibers waiting for data
        end
        -- ...

        if channel:has_writers() then
            -- there are fibers waiting for readers
        end
        channel:put(task)
    end
end

function producer2_fiber(...)
    while true do
        local task = box.select(...)

        if channel:put(task, 10) then       -- 10 seconds
            -- ...
        else
            print("timeout!")
        end
    end
end

Package box.socket — TCP and UDP sockets

BSD sockets are a mechanism to exchange data with a local or remote host in connection-oriented (TCP) or datagram-oriented (UDP) mode. The semantics of the calls in the box.socket API closely follow the semantics of the corresponding POSIX calls. Function names and signatures are mostly compatible with luasocket.

Similarly to luasocket, box.socket doesn't throw exceptions on errors. On success, most calls return a socket object. On error, a multiple return of nil, status, errno, errstr is produced. Status can be one of "error", "timeout", "eof" or "limit". On success, status is always nil. A call which returns data (recv(), recvfrom(), readline()) on success returns a Lua string of the requested size and nil status. On error or timeout, an empty string is followed by the corresponding status, error number and message. A call which sends data (send(), sendto()) on success returns the number of bytes sent, and the status is, again, nil. On error or timeout 0 is returned, followed by status, error number and message.

The last error can be retrieved from the socket using socket:error(). Any call except error() clears the last error first (but may set a new one).

Calls which in POSIX expect a socket address as struct sockaddr_in simply accept a host name and port as additional arguments in box.socket. Name resolution is done automatically. If it fails, status is set to "error", errno is set to -1 and error string is set to "Host name resolution failed".

All calls that can take time block the calling fiber and can get it preempted. The implementation, however, uses non-blocking cooperative I/O, so Tarantool continues processing queries while a call is blocked. A timeout can be provided for any socket call which can take a long time.

Like all other box libraries, the API can be used in procedural style (e.g. box.socket.close(socket)) as well as in object-oriented style (socket:close()).

A closed socket should not be used any more. If a socket is not closed explicitly, it is closed when its userdata is garbage collected by Lua.
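
The return conventions described above can be illustrated with a small TCP client. This is only a sketch: request_line is a hypothetical helper, the 1024-byte limit is arbitrary, and all three readline() arguments are passed explicitly to avoid relying on how optional arguments are told apart.

```lua
-- Send one request line and read one reply line (hypothetical helper).
-- Returns the reply on success, or nil plus a status and a message.
function request_line(host, port, request, timeout)
    local s = box.socket.tcp()
    if s == nil then
        return nil, "error", "cannot create a socket"
    end
    -- connect() returns the socket on success, nil and a status otherwise.
    local ok, status, errno, errstr = s:connect(host, port, timeout)
    if ok == nil then
        s:close()
        return nil, status, errstr
    end
    -- send() returns the byte count; status is nil only on success.
    local sent
    sent, status, errno, errstr = s:send(request .. "\n", timeout)
    if status ~= nil then
        s:close()
        return nil, status, errstr
    end
    -- Read at most 1024 bytes, stopping at a newline.
    local line
    line, status, errno, errstr = s:readline(1024, { "\n" }, timeout)
    s:close()
    if status ~= nil then
        -- status is one of "timeout", "error", "limit" or "eof"
        return nil, status, errstr
    end
    return line
end
```

Because no call throws, every step checks the status explicitly and closes the socket before returning.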

box.socket.tcp()

Create a new TCP socket.

Returns

A new socket or nil.

box.socket.udp()

Create a new UDP socket.

Returns

A new socket or nil.

socket:connect(host, port, [timeout])

Connect a socket to a remote host. Can be used with IPv6 and IPv4 addresses, as well as domain names. If multiple addresses correspond to a domain, tries them all until successfully connected.

Returns

Returns a connected socket on success, nil, status, errno, errstr on error or timeout.

socket:send(data, [timeout])

Send data over a connected socket.

Returns

The number of bytes sent. On success, this is exactly the length of data. In case of error or timeout, returns the number of bytes sent before error, followed by status, errno, errstr.

socket:recv(size, [timeout])

Read size bytes from a connected socket. An internal read-ahead buffer is used to reduce the cost of this call.

Returns

A string of the requested length on success. On error or timeout, returns an empty string, followed by status, errno, errstr. If there was some data read before a timeout occurred, it will be available on the next call. In case the writing side has closed its end, returns the remainder read from the socket (possibly an empty string), followed by "eof" status.

socket:readline([limit], [separator list], [timeout])

Read a line from a connected socket.

socket:readline() with no arguments reads data from a socket until '\n' or eof. If a limit is set, the call reads data until a separator is found, or the limit is reached. By default, there is no limit. Instead of the default separator, a Lua table can be used with one or multiple separators. Then the data is read until the first matching separator is found.

Returns

A Lua string with data in case of success or an empty string in case of error. When multiple separators were provided in a separator table, the matched separator is returned as the third return value.

Table 4.4. readline() returns

Return values                        Outcome
data, nil, separator                 success
"", "timeout", ETIMEDOUT, errstr     timeout
"", "error", errno, errstr           error
data, "limit"                        limit
data, "eof"                          eof


socket:bind(host, port[, timeout])

Bind a socket to the given host/port. A UDP socket after binding can be used to receive data (see recvfrom()). A TCP socket can be used to accept new connections, after it's been put in listen mode. The timeout is used for name resolution only. If host name is an IP address, the call never yields and the timeout is unused.

Returns

Socket object on success, nil, status, errno, errstr on error.

socket:listen()

Start listening for incoming connections. The listen backlog, on Linux, is taken from /proc/sys/net/core/somaxconn, whereas on BSD is set to SOMAXCONN.

Returns

Socket on success, nil, "error", errno, errstr on error.

socket:accept([timeout])

Wait for a new client connection and create a connected socket.

Returns

peer_socket, nil, peer_host, peer_port on success. nil, status, errno, errstr on error.

socket:sendto(data, host, port, [timeout])

Send a message on a UDP socket to a specified host.

Returns

The number of bytes sent on success, 0, status, errno, errstr on error or timeout.

socket:recvfrom(limit[, timeout])

Receive a message on a UDP socket.

Returns

Message, nil, client address, client port on success, "", status, errno, errstr on error or timeout.

socket:shutdown(how)

Shut down the reading end, the writing end, or both ends of a socket. Accepts box.socket.SHUT_RD, box.socket.SHUT_WR and box.socket.SHUT_RDWR.

Returns

Socket on success, nil, "error", errno, errstr on error.

socket:close()

Close (destroy) a socket. A closed socket should not be used any more.

socket:error()

Retrieve the last error that occurred on a socket.

Returns

errno, errstr. 0, "Success" if there is no error.
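
The server-side calls (bind(), listen(), accept()) can be combined as in the sketch below, which accepts a single client and echoes one line back. The helper name echo_once and the 1024-byte limit are hypothetical; the return values are handled as documented above.

```lua
-- Accept a single client and echo one line back (hypothetical helper).
-- Returns the number of bytes echoed, or nil plus a status and a message.
function echo_once(host, port, timeout)
    local server = box.socket.tcp()
    if server == nil then
        return nil, "error", "cannot create a socket"
    end
    -- bind() and listen() return the socket on success, nil otherwise.
    local ok, status, errno, errstr = server:bind(host, port)
    if ok ~= nil then
        ok, status, errno, errstr = server:listen()
    end
    if ok == nil then
        server:close()
        return nil, status, errstr
    end
    -- accept() returns peer_socket, nil, peer_host, peer_port on success
    -- and nil, status, errno, errstr on error or timeout.
    local peer, astatus, _, aerrstr = server:accept(timeout)
    if peer == nil then
        server:close()
        return nil, astatus, aerrstr
    end
    local sent = 0
    local line, rstatus = peer:readline(1024, { "\n" }, timeout)
    if rstatus == nil then
        sent = peer:send(line, timeout)
    end
    peer:close()
    server:close()
    return sent
end
```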

Packages box.cfg, box.info, box.slab and box.stat: server introspection

Package box.cfg

This package provides read-only access to all server configuration parameters.

box.cfg
Example
localhost> lua for k, v in pairs(box.cfg) do print(k, " = ", v) end
---
io_collect_interval = 0
pid_file = box.pid
panic_on_wal_error = false
slab_alloc_factor = 2
slab_alloc_minimal = 64
admin_port = 33015
logger = cat - >> tarantool.log
...

Package box.info

This package provides access to information about server variables: pid, uptime, version and such. Its contents are identical to the output of SHOW INFO.

box.info()

Since the contents of box.info are dynamic, it's not possible to iterate over keys with the Lua pairs() function. For this purpose, box.info() builds and returns a Lua table with all keys and values provided in the package.

Example
localhost> lua for k,v in pairs(box.info()) do print(k, ": ", v) end
---
version: 1.4.7-92-g4ba95ca
status: primary
pid: 1747
lsn: 1712
recovery_last_update: 1306964594.980
recovery_lag: 0.000
uptime: 39
build: table: 0x419cb880
logger_pid: 1748
config: /home/unera/work/tarantool/test/box/tarantool_good.cfg
...
box.info.status, box.info.pid, box.info.lsn, ...
Example
localhost> lua box.info.pid
---
 - 1747
...
localhost> lua box.info.logger_pid
---
 - 1748
...
localhost> lua box.info.version
---
 - 1.4.7-92-g4ba95ca
...
localhost> lua box.info.config
---
 - /home/unera/work/tarantool/test/box/tarantool_good.cfg
...
localhost> lua box.info.uptime
---
 - 3672
...
localhost> lua box.info.lsn
---
 - 1712
...
localhost> lua box.info.status
---
 - primary
...
localhost> lua box.info.recovery_lag
---
 - 0.000
...
localhost> lua box.info.recovery_last_update
---
 - 1306964594.980
...
localhost> lua box.info.snapshot_pid
---
 - 0
...
localhost> lua for k, v in pairs(box.info.build) do print(k .. ': ', v) end
---
flags:  -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -DCORO_ASM -fno-omit-frame-pointer -fno-stack-protector -fexceptions -funwind-tables -fgnu89-inline -pthread  -Wno-sign-compare -Wno-strict-aliasing -std=gnu99 -Wall -Wextra -Werror
target: Linux-x86_64-Debug
compiler: /usr/bin/gcc
options: cmake . -DCMAKE_INSTALL_PREFIX=/usr/local -DENABLE_STATIC=OFF -DENABLE_GCOV=OFF -DENABLE_TRACE=ON -DENABLE_BACKTRACE=ON -DENABLE_CLIENT=OFF
...

Package box.slab

This package provides access to slab allocator statistics.

box.slab
Example
localhost> lua box.slab.arena_used
---
 - 4194304
...
localhost> lua box.slab.arena_size
---
 - 104857600
...
localhost> lua for k, v in pairs(box.slab.slabs) do print(k) end
---
64
128
...
localhost> lua for k, v in pairs(box.slab.slabs[64]) do print(k, ':', v) end
---
items:1
bytes_used:160
item_size:64
slabs:1
bytes_free:4194144
...
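
The two arena counters shown above can be combined into a usage ratio, for example to raise a warning before the arena fills up. This is a sketch: arena_usage and arena_is_nearly_full are hypothetical names, and tonumber() is applied on the assumption that the counters may not be plain Lua numbers.

```lua
-- Fraction of the slab arena currently in use, between 0 and 1.
function arena_usage()
    return tonumber(box.slab.arena_used) / tonumber(box.slab.arena_size)
end

-- True when usage exceeds the given fraction, e.g. 0.9 for 90%.
function arena_is_nearly_full(threshold)
    return arena_usage() > threshold
end
```

With the values from the example above, 4194304 bytes used out of 104857600, the ratio is 0.04.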

Package box.stat

This package provides access to request statistics.

box.stat
Example
localhost> lua box.stat -- a virtual table
---
 - table: 0x41a07a08
...
localhost> lua box.stat() -- a full table (the same)
---
 - table: 0x41a0ebb0
...
localhost> lua for k, v in pairs(box.stat()) do print(k) end
---
DELETE
SELECT
REPLACE
CALL
UPDATE
DELETE_1_3
...
localhost> lua for k, v in pairs(box.stat().DELETE) do print(k, ': ', v) end
---
total: 23210
rps: 22
...
localhost> lua for k, v in pairs(box.stat.DELETE) do print(k, ': ', v) end -- the same
---
total: 23210
rps: 22
...
localhost> lua for k, v in pairs(box.stat.SELECT) do print(k, ': ', v) end
---
total: 34553330
rps: 23
...
localhost>
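
Since box.stat() returns an ordinary table, the counters can also be aggregated in a stored procedure. The sketch below sums the total field over all request types; total_requests is a hypothetical name.

```lua
-- Sum the "total" counters of all request types reported by box.stat().
function total_requests()
    local sum = 0
    for name, counters in pairs(box.stat()) do
        sum = sum + tonumber(counters.total)
    end
    return sum
end
```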

Additional examples can be found in the open source Lua stored procedures repository and in the server test suite.

Limitation of stored procedures

There are two limitations in stored procedures support one should be aware of: execution atomicity and lack of typing.

Cooperative multitasking environment

Tarantool core is built around cooperative multi-tasking paradigm: unless a running fiber deliberately yields control to some other fiber, it is not preempted. Yield points are built into all calls from Tarantool core to the operating system. Any system call which can block is performed in asynchronous manner and the fiber waiting on the system call is preempted with a fiber ready to run. This model makes all programmatic locks unnecessary: cooperative multitasking ensures that there is no concurrency around a resource, no race conditions and no memory consistency issues.

When requests are small, e.g. simple UPDATE, INSERT, DELETE, SELECT, fiber scheduling is fair: it takes only a little time to process the request, schedule a disk write, and yield to a fiber serving the next client.

A stored procedure, however, can perform complex computations, or be written in such a way that control is not given away for a long time. This can lead to unfair scheduling, when a single client throttles the rest of the system, or to apparent stalls in request processing. Avoiding this situation is the responsibility of the stored procedure author. Most box calls, such as box.insert(), box.update() and box.delete(), are yield points; box.select() and box.select_range(), however, are not.

It should also be noted that, in the absence of transactions, any yield in a stored procedure is a potential change in the database state. Effectively, it's only possible to have CAS (compare-and-swap)-like atomic stored procedures, i.e. procedures which select and then modify a record. Multiple data change requests always run through a built-in yield point.
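
One way for a long-running procedure to stay cooperative is to yield explicitly at regular intervals. The sketch below assumes that box.fiber.sleep(0) acts as a plain yield to the sched fiber; cooperative_sum and the batch parameter are hypothetical.

```lua
-- Sum a large array without monopolizing the server: yield every
-- `batch` elements so that other fibers get a chance to run.
function cooperative_sum(values, batch)
    local sum = 0
    for i, v in ipairs(values) do
        sum = sum + v
        if i % batch == 0 then
            -- Yield point: the database state may change while this
            -- fiber is suspended.
            box.fiber.sleep(0)
        end
    end
    return sum
end
```

The batch size trades latency for other clients against the overhead of switching fibers.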

Lack of field types

When invoking a stored procedure from the binary protocol, it's not possible to convey types of arguments. Tuples are type-agnostic. The conventional workaround is to use strings to pass all (textual and numeric) data.
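
A minimal sketch of this workaround: the procedure below accepts its arguments as strings and converts them itself. The name add_numbers is hypothetical.

```lua
-- Arguments arrive without type information, so this procedure expects
-- strings and converts them to numbers explicitly.
function add_numbers(a, b)
    local x, y = tonumber(a), tonumber(b)
    if x == nil or y == nil then
        error("add_numbers: both arguments must be numeric strings")
    end
    -- Return a string as well, so the client gets a textual result.
    return tostring(x + y)
end
```

Validating the conversion inside the procedure turns a malformed argument into an explicit error instead of a silent miscalculation.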

Defining triggers in Lua

Triggers are Lua scripts invoked by the system upon a certain event. Tarantool currently only supports system-wide triggers, run when a new connection is established or dropped. Since the trigger body is a Lua script, it is external to the server, and a trigger must be set up on each server start. This is most commonly done in the initialization script. Once a trigger for an event exists, it is automatically invoked whenever the event occurs. The performance overhead of triggers, as long as they are not defined, is minimal: merely a pointer dereference and check. If a trigger is defined, its overhead is equivalent to the overhead of calling a stored procedure.

Triggers on connect and disconnect

box.session.on_connect(chunk)

Set a callback (trigger) invoked on each connected session. The callback doesn't get any arguments, but is the first thing invoked in the scope of the newly created session. If the trigger fails by raising an error, the error is sent to the client and the connection is shut down. Returns the old value of the trigger.

Warning

If a trigger always results in an error, it may become impossible to connect to the server to reset it.

box.session.on_disconnect(chunk)
Set a trigger invoked after a client has disconnected. Returns the old value of the trigger. If the trigger raises an error, the error is logged but otherwise is ignored. The trigger is invoked while the session associated with the client still exists and can access session properties, such as id.
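
As a sketch of how the two triggers can work together, the following code keeps a count of connected clients. It assumes that the chunk argument can be a Lua function; active_sessions and install_session_triggers are hypothetical names.

```lua
-- Number of currently connected clients, maintained by the two triggers.
active_sessions = 0

-- Meant to be called once, e.g. from the initialization script.
function install_session_triggers()
    box.session.on_connect(function()
        active_sessions = active_sessions + 1
    end)
    box.session.on_disconnect(function()
        active_sessions = active_sessions - 1
    end)
end
```

Neither callback raises an error, so the on-connect trigger cannot lock clients out of the server.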

Chapter 5. Replication

To set up replication, it's necessary to prepare the master, configure a replica, and establish procedures for recovery from a degraded state.

Replication architecture

A replica gets all updates from the master by continuously fetching and applying its write ahead log (WAL). Each record in the WAL represents a single Tarantool command, such as INSERT, UPDATE, DELETE and is assigned a monotonically growing log sequence number (LSN). In essence, Tarantool replication is row-based: all data change commands are fully deterministic and operate on a single record.

A stored program invocation does not enter the Write Ahead Log. Instead, log events for actual UPDATEs and DELETEs, performed by the Lua code, are written to the log. This ensures that possible non-determinism of Lua does not cause replication to go out of sync.

For replication to work correctly, the latest LSN on the replica must match or fall behind the latest LSN on the master. If the replica has its own updates, this leads to it getting out of sync, since updates from the master having identical LSNs are not applied. Indeed, if replication is ON, Tarantool does not accept updates, even on its primary_port.

Setting up the master

To prepare the master for connections from replica, it's only necessary to enable replication_port in the configuration file. An example configuration file can be found in test/box_replication/cfg/master.cfg. A master with enabled replication_port can accept connections from as many replicas as necessary on that port. Each replica has its own replication state.

Setting up a replica

The server, master or replica, always requires a valid snapshot file to boot from. For a master, it's usually prepared with the --init-storage option; for replicas it's usually copied from the master.

To start replication, configure replication_source. Other parameters can also be changed, but existing spaces and their primary keys on the replica must be identical to ones on the master.

Once connected to the master, the replica requests all changes that happened after the latest local LSN. It is therefore necessary to keep WAL files on the master host as long as there are replicas that haven't applied them yet. An example configuration can be found in test/box_replication/cfg/replica.cfg.

In absence of required WALs, a replica can be "re-seeded" at any time with a newer snapshot file, manually copied from the master.

Note

Replication parameters are "dynamic", which allows the replica to become a master and vice versa with the help of the RELOAD CONFIGURATION statement.

Recovering from a degraded state

"Degraded state" is a situation when the master becomes unavailable -- either due to hardware or network failure, or a programming bug. There is no reliable way for a replica to detect that the master is gone for good, since sources of failure and replication environments vary significantly.

A separate monitoring script (or scripts, if a decision-making quorum is desirable) is necessary to detect a master failure. Such a script would typically try to update a tuple in an auxiliary space on the master, and raise an alarm if a network or disk error persists longer than is acceptable.
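
The decision logic of such a script might look like the sketch below. It is not part of Tarantool: make_master_monitor is a hypothetical helper, probe stands for whatever code performs the test update on the master, and max_outage is the tolerated outage in seconds.

```lua
-- Create a failure detector (hypothetical helper). `probe` is a function
-- that returns true when a test update on the master succeeds.
function make_master_monitor(probe, max_outage)
    local first_failure = nil   -- time of the first failure in a row
    -- Call the returned function periodically with the current time.
    -- It returns true when the alarm should be raised.
    return function(now)
        local ok, alive = pcall(probe)
        if ok and alive then
            first_failure = nil     -- the master answered, reset the timer
            return false
        end
        first_failure = first_failure or now
        return now - first_failure > max_outage
    end
end
```

A single failed probe does not raise the alarm; only failures that persist longer than max_outage do, which filters out transient network errors.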

When a master failure is detected, the following needs to be done:

  • First and foremost, make sure that the master does not accept updates. This is necessary to prevent the situation when, should the master failure end up being transient, some updates still go to the master, while others already end up on the replica.

    If the master is available, the easiest way to turn on read-only mode is to turn Tarantool into a replica of itself. This can be done by setting master's replication_source to point to self.

    If the master is not available, the best bet is to log into the machine and kill the server, or change the machine's network configuration (DNS, IP address).

    If the machine is not available, it's perhaps prudent to power it off.

  • Record the replica's LSN, by issuing SHOW INFO. This LSN may prove useful if there are updates on the master that never reached the replica.

  • Promote the replica to become a master. This is done by setting replication_source on the replica to an empty string.

  • Change the application configuration to point to the new master. This can be done either by changing the application's internal routing table, or by setting up old master's IP address on the new master's machine, or using some other approach.

  • Recover the old master. If there are updates that didn't make it to the new master, they have to be applied manually. You can use --cat option to read server logs.

Chapter 6. Server administration

Typical server administration tasks include starting and stopping the server, reloading configuration, taking snapshots, and rotating logs.

Server signal handling

The server is configured to shut down gracefully on SIGTERM, SIGINT (keyboard interrupt) or SIGHUP. SIGUSR1 can be used to save a snapshot. All other signals are blocked or ignored. The signals are processed in the main event loop. Thus, if the control flow never reaches the event loop (because of a runaway stored procedure), the server stops responding to any signal, and can only be killed with SIGKILL (this signal cannot be ignored).

System-specific administration notes

This chapter provides a cheatsheet for most common server management routines on every supported operating system.

Debian GNU/Linux and Ubuntu

Setting up an instance: ln -s /etc/tarantool/instances.available/instance-name.cfg /etc/tarantool/instances.enabled/

Starting all instances: service tarantool start

Stopping all instances: service tarantool stop

Starting/stopping one instance: service tarantool-instance-name start/stop

Fedora, RHEL, CentOS

tba

FreeBSD

tba

Mac OS X

tba

Chapter 7. Configuration reference

This chapter provides a reference of options which can be set in the command line or tarantool.cfg configuration file.

Tarantool splits its configuration parameters between command line options and a configuration file. Command line flags are provided for the most basic properties only: the rest must be set in the configuration file. At runtime, this makes it possible to disambiguate the source of a configuration setting: it unequivocally comes either from the command line, or from the configuration file, but never from both.

Command line options

Tarantool follows the GNU standard for its command line interface: long options start with a double dash (--option), their short counterparts use a single one (-o). For phrases, both dashes and underscores can be used as word separators (--cfg-get and --cfg_get both work). If an option requires an argument, you can either separate it with a space or equals sign (--cfg-get=pid_file and --cfg-get pid_file both work).

  • --help, -h

    Print an annotated list of all available options and exit.

  • --version, -V

    Print product name and version, for example:

    $  ./tarantool_box --version
    Tarantool 1.4.0-69-g45551dd
            

    In this example:

    Tarantool is the name of the reusable asynchronous networking programming framework.
    Box is the name of the storage back-end.
    The 3-number version follows the standard <major>.<minor>.<patch> scheme, in which <major> number is changed only rarely, <minor> is incremented for each new milestone and indicates possible incompatible changes, and <patch> stands for the number of bug fix releases made after the start of the milestone. The optional commit number and commit SHA1 are output for non-released versions only, and indicate how much this particular build has diverged from the last release.

    Note

    Tarantool uses git describe to produce its version id, and this id can be used at any time to check out the corresponding source from our git repository.

  • --config=/path/to/config.file, -c

    Tarantool does not start without a configuration file. By default, the server looks for file named tarantool.cfg in the current working directory. An alternative location can be provided using this option.

  • --check-config

    Check the configuration file for errors. This option is normally used on the command line before reload configuration is issued on the administrative port, to ensure that the new configuration is valid. When configuration is indeed correct, the program produces no output and returns 0. Otherwise, information about discovered error is printed out and the program terminates with a non-zero value.

  • --cfg-get=option_name

    Given option name, print option value. If the option does not exist, or the configuration file is incorrect, an error is returned. If the option is not explicitly specified, its default value is used instead. Example:

    $ ./tarantool_box --cfg-get=admin_port
    33015   

  • --init-storage

    Initialize the directory specified in the vardir configuration option by creating an empty snapshot file in it. If vardir doesn't contain at least one snapshot, the server does not start. The server deliberately does not initialize vardir automatically on boot, so that potential system errors are more noticeable. For example, if the operating system reboots and fails to mount the partition on which vardir is expected to reside, the rc.d or service script responsible for server restart will also fail, thanks to this design.

The only two options which have effect on a running server are:

  • --verbose, -v

    Increase verbosity level in log messages. This option currently has no effect.

  • --background, -b

    Detach from the controlling terminal and run in background.

    Caution

    Tarantool uses stdout and stderr for debug and error log output. When starting the server with option --background, make sure to either redirect its standard out and standard error streams, or provide logger option in the configuration file, since otherwise all logging information will be lost.
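
The version id printed by --version (and available as box.info.version) can be taken apart with a Lua pattern, for instance to compare builds in a deployment script. This is a sketch based only on the format described above; parse_version is a hypothetical helper.

```lua
-- Split a version id such as "1.4.0-69-g45551dd" into its components.
-- Returns nil if the string does not follow the expected format.
function parse_version(s)
    local major, minor, patch, rest = s:match("^(%d+)%.(%d+)%.(%d+)(.*)$")
    if major == nil then
        return nil
    end
    local v = {
        major = tonumber(major),
        minor = tonumber(minor),
        patch = tonumber(patch),
    }
    -- Non-released builds carry a commit count and an abbreviated SHA1.
    local commits, sha = rest:match("^%-(%d+)%-g(%x+)$")
    if commits ~= nil then
        v.commits = tonumber(commits)
        v.sha = sha
    elseif rest ~= "" then
        return nil
    end
    return v
end
```

For a released version such as "1.4.9" the commits and sha fields are simply absent.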

The option file

All advanced configuration parameters must be specified in a configuration file, which is required for server start. If no path to the configuration file is specified on the command line (see --config), the server looks for a file named tarantool.cfg in the current working directory.

To facilitate centralized and automated configuration management, runtime configuration modifications are supported solely through RELOAD CONFIGURATION administrative statement. Thus, the procedure to change Tarantool configuration at runtime is to edit the configuration file. This ensures that, should the server get killed or restart, no unexpected changes to configuration can occur.

Not all configuration file settings are changeable at runtime: such settings will be highlighted in this reference. If the same setting is given more than once, the latest occurrence takes effect. You can always invoke SHOW CONFIGURATION from the administrative console to show the current configuration.

Tarantool maintains a set of all allowed configuration parameters in two template files, which are easy to maintain and extend: cfg/core_cfg.cfg_tmpl, src/box/box_cfg.cfg_tmpl. These files can always be used as a reference for any parameter in this manual.

In addition, two working examples can be found in the source tree: test/box/tarantool.cfg, test/box_big/tarantool.cfg.

Table 7.1. Basic parameters

Name | Type | Default | Required? | Dynamic? | Description
username | string | "" | no | no | UNIX user name to switch to after start.
work_dir | string | "" | no | no | A directory to switch to with chdir(2) after start. Can be relative to the starting directory. If not specified, the current working directory of the server is the same as starting directory.
wal_dir | string | "" | no | no | A directory to store the write ahead log files (WAL) in. Can be relative to work_dir. You may choose to separate your snapshots and logs and store them on separate disks. This is how this parameter is most commonly used. If not specified, defaults to work_dir.
snap_dir | string | "" | no | no | A directory to store snapshots in. Can be relative to work_dir. If not specified, defaults to work_dir. See also wal_dir.
bind_ipaddr | string | "INADDR_ANY" | no | no | The network interface to bind to. By default, the server binds to all available addresses. Applies to all ports opened by the server.
primary_port | integer | none | yes | no | The read/write data port. Has no default value, so must be specified in the configuration file. Normally set to 33013. Note: a replica also binds to this port, accepts connections, but these connections can only serve reads until the replica becomes a master.
secondary_port | integer | none | no | no | Additional, read-only port. Normally set to 33014. Not used unless set.
admin_port | integer | none | no | no | The TCP port to listen on for administrative connections. Has no default value. Not used unless assigned a value. Normally set to 33015.
pid_file | string | tarantool.pid | no | no | Store the process id in this file. Can be relative to work_dir.
custom_proc_title | string | "" | no | no | Inject the given string into server process title (what's shown in COMMAND column of ps and top commands). For example, an unmodified Tarantool process group looks like:

kostja@shmita:~$ ps -a -o command | grep box
tarantool_box: primary pri:33013 sec:33014 adm:33015

After "sessions" custom_proc_title is injected it looks like:

kostja@shmita:~$ ps -a -o command | grep box
tarantool_box: primary@sessions pri:33013 sec:33014 adm:33015

Table 7.2. Configuring the storage

Name | Type | Default | Required? | Dynamic? | Description
slab_alloc_arena | float | 1.0 | no | no | How much memory Tarantool allocates to actually store tuples, in gigabytes. When the limit is reached, INSERT or UPDATE requests begin failing with error ER_MEMORY_ISSUE. While the server does not go beyond the defined limit to allocate tuples, there is additional memory used to store indexes and connection information. Depending on actual configuration and workload, this overhead can amount to an additional 20-40% on top of the limit set here.
slab_alloc_minimal | integer | 64 | no | no | Size of the smallest allocation unit. It can be tuned down if most of the tuples are very small.
slab_alloc_factor | float | 2.0 | no | no | Use slab_alloc_factor as the multiplier for computing the sizes of memory chunks that tuples are stored in. A lower value may result in less wasted memory depending on the total amount of memory available and the distribution of item sizes.
space | array of objects | none | yes | no | This is the main Tarantool parameter, describing the data structure that users get access to via client/server protocol. It holds an array of entries, and each entry represents a tuple set served by the server. Every entry is a composite object, best seen as a C programming language "struct" [a].
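
The effect of slab_alloc_minimal and slab_alloc_factor on wasted memory can be illustrated with a small sketch. It is not the server's allocator: it assumes that chunk sizes are obtained by repeatedly multiplying the smallest size by the factor, that sizes are in bytes, and it uses a hypothetical cut-off (largest) only to end the illustration. The real allocator may round or align sizes differently.

```python
def slab_size_classes(minimal=64, factor=2.0, largest=4096):
    """Chunk sizes obtained by repeatedly multiplying `minimal` by `factor`.

    `largest` is a hypothetical cut-off used only to end the illustration.
    """
    if minimal <= 0 or factor <= 1.0:
        raise ValueError("minimal must be positive and factor greater than 1")
    sizes = []
    size = float(minimal)
    while size <= largest:
        sizes.append(int(size))
        size *= factor
    return sizes


def wasted_bytes(tuple_size, sizes):
    """Bytes left unused when a tuple goes into the smallest chunk that fits."""
    for chunk in sizes:
        if chunk >= tuple_size:
            return chunk - tuple_size
    raise ValueError("tuple does not fit in any listed chunk size")
```

Under these assumptions, a 130-byte tuple stored with a factor of 2.0 lands in a 256-byte chunk, while with a factor of 1.5 it lands in a 144-byte chunk, which is the kind of saving a lower factor may bring.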

[a]

Space settings explained

Space is a composite parameter, i.e. it has properties.

/*
 * Each tuple consists of fields. Three field types are
 * supported.
 */

enum field_type { STR, NUM, NUM64 };

/*
 * Tarantool is interested in field types only inasmuch as
 * it needs to build indexes on fields. An index
 * can cover one or more fields.
 */

struct index_field_t {
  unsigned int fieldno;
  enum field_type type;
};

/*
 * HASH and TREE index types are supported.
 */

enum index_type { HASH, TREE };

struct index_t {
  struct index_field_t key_field[];
  enum index_type type;
  /* Secondary index may be non-unique */
  bool unique;
};

struct space_t
{
  /* A space can be quickly disabled and re-enabled at run time. */
  bool enabled;
  /*
   * If given, each tuple in the space must have exactly
   * this many fields.
   */
  unsigned int cardinality;
  /* Only used for HASH indexes, to preallocate memory. */
  unsigned int estimated_rows;
  struct index_t index[];
};

The way a space is defined in a configuration file is similar to how you would initialize a C structure in a program. For example, a minimal storage configuration looks like this:

space[0].enabled = 1
space[0].index[0].type = HASH
space[0].index[0].unique = 1
space[0].index[0].key_field[0].fieldno = 0
space[0].index[0].key_field[0].type = NUM64

The parameters listed above are mandatory. Other space properties are set in the same way. An alternative syntax, mainly useful when defining large spaces, exists:

space[0] = {
    enabled = 1,
    index = [
        {
            type = HASH,
            unique = 1,
            key_field = [
                {
                    fieldno = 0,
                    type = NUM64
                }
            ]
        }
    ]
}

When defining a space, please be aware of these restrictions:

  • at least one space must be configured,
  • each configured space needs at least one unique index,
  • "unique" property doesn't have a default, and must be set explicitly,
  • space configuration cannot be changed dynamically; currently you need to restart the server even to disable or enable a space,
  • HASH indexes may cover only one field and must be unique.
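
The restrictions above can be checked mechanically before a configuration is deployed. The following is a minimal sketch, not part of Tarantool: it assumes the space definition has already been loaded into Python lists and dictionaries that mirror the configuration layout, and the function name validate_spaces is hypothetical.

```python
def validate_spaces(spaces):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if not spaces:
        problems.append("at least one space must be configured")
    for space_no, space in enumerate(spaces):
        has_unique = False
        for index_no, index in enumerate(space.get("index", [])):
            where = "space[%d].index[%d]" % (space_no, index_no)
            if "unique" not in index:
                # "unique" has no default and must be given explicitly
                problems.append(where + ": unique must be set explicitly")
            elif index["unique"]:
                has_unique = True
            if index.get("type") == "HASH":
                if len(index.get("key_field", [])) != 1:
                    problems.append(where + ": HASH index may cover only one field")
                if "unique" in index and not index["unique"]:
                    problems.append(where + ": HASH index must be unique")
        if not has_unique:
            problems.append("space[%d]: needs at least one unique index" % space_no)
    return problems
```

For the minimal storage configuration shown earlier, the function returns an empty list.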


Table 7.3. Binary logging and snapshots

Name | Type | Default | Required? | Dynamic? | Description
panic_on_snap_error | boolean | true | no | no | If there is an error reading the snapshot (at server start), abort.
panic_on_wal_error | boolean | false | no | no | If there is an error reading from a write ahead log (at server start), abort.
rows_per_wal | integer | 500000 | no | no | How many log records to store in a single write ahead log file. When this limit is reached, Tarantool creates another WAL file named <first-lsn-in-wal>.wal. This can be useful for simple rsync-based backups.
snap_io_rate_limit | float | 0.0 | no | yes | Reduce the throttling effect of SAVE SNAPSHOT on the INSERT/UPDATE/DELETE performance by setting a limit on how many megabytes per second it can write to disk. The same can be achieved by splitting wal_dir and snap_dir locations and moving snapshots to a separate disk.
wal_fsync_delay | float | 0 | no | yes | Do not flush the write ahead log to disk more often than once in wal_fsync_delay seconds. By default the delay is zero, that is, the write ahead log is flushed after every write. Setting the delay may be necessary to increase write throughput, but may lead to several last updates being lost in case of a power failure. Such a failure, however, does not lead to data corruption: all WAL records have a checksum, and only complete records are processed during recovery.
wal_mode | string | "fsync_delay" | no | yes | Specify fiber-WAL-disk synchronization mode as: none: write ahead log is not maintained; write: fibers wait for their data to be written to the write ahead log (no fsync(2)); fsync: fibers wait for their data, fsync(2) follows each write(2); fsync_delay: fibers wait for their data, fsync(2) is called every N=wal_fsync_delay seconds (N=0.0 means no fsync(2), equivalent to wal_mode = "write").
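
The interplay of wal_mode and wal_fsync_delay can be summarized in a few lines of code. This sketch only restates the wal_mode description above as a lookup; the function name and the returned dictionary keys are hypothetical and do not correspond to any server API.

```python
def describe_wal_sync(wal_mode="fsync_delay", wal_fsync_delay=0.0):
    """Summarize how a fiber's write is synchronized, per the wal_mode row."""
    if wal_mode == "none":
        # no write ahead log at all
        return {"wal_maintained": False, "fiber_waits": False, "fsync": "never"}
    if wal_mode == "write":
        return {"wal_maintained": True, "fiber_waits": True, "fsync": "never"}
    if wal_mode == "fsync":
        return {"wal_maintained": True, "fiber_waits": True,
                "fsync": "after every write"}
    if wal_mode == "fsync_delay":
        if wal_fsync_delay == 0.0:
            # documented as equivalent to wal_mode = "write"
            return describe_wal_sync("write")
        return {"wal_maintained": True, "fiber_waits": True,
                "fsync": "every %g seconds" % wal_fsync_delay}
    raise ValueError("unknown wal_mode: %r" % wal_mode)
```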

Table 7.4. Replication

Name | Type | Default | Required? | Dynamic? | Description
replication_port | integer | 0 | no | no | Replication port. If non-zero, Tarantool listens on the given port for incoming connections from replicas. See also replication_source, which complements this setting on the replica side.
replication_source | string | NULL | no | yes | Pair ip:port describing the master. If not empty, replication is on, and Tarantool does not accept updates on primary_port. This parameter is dynamic, that is, to enter master mode, simply set the value to an empty string and issue RELOAD CONFIGURATION.

Table 7.5. Networking

Name | Type | Default | Required? | Dynamic? | Description
io_collect_interval | float | 0.0 | no | yes | If non-zero, a sleep of the given duration is injected between iterations of the event loop. Can be used to reduce CPU load in deployments in which the number of client connections is large, but requests are not so frequent (for example, each connection issuing just a handful of requests per second).
readahead | integer | 16384 | no | no | The size of the read-ahead buffer associated with a client connection. The larger the buffer, the more memory an active connection consumes and the more requests can be read from the operating system buffer in a single system call. The rule of thumb is to make sure the buffer can contain at least a few dozen requests. Therefore, if a typical tuple in a request is large, e.g. a few kilobytes or even megabytes, the readahead buffer should be increased. If batched request processing is not used, it's prudent to leave this setting at its default.
backlog | integer | 1024 | no | no | The size of the listen backlog.
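
The rule of thumb for readahead is simple arithmetic, sketched below. The function name is hypothetical, and the figure of 24 requests is only an assumed reading of "a few dozen"; pick a number that matches your own batching pattern.

```python
def suggested_readahead(typical_request_bytes, requests_per_read=24,
                        default_readahead=16384):
    """Buffer size able to hold `requests_per_read` typical requests.

    Never suggests less than the default, since the text advises
    leaving the setting alone when requests are small.
    """
    needed = typical_request_bytes * requests_per_read
    return max(default_readahead, needed)
```

With 100-byte requests the default of 16384 is already sufficient, while 4-kilobyte requests call for a buffer of roughly 96 kilobytes under this assumption.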

Table 7.6. Logging

Name | Type | Default | Required? | Dynamic? | Description
log_level | integer | 4 | no | yes | How verbose the logging is. There are 5 log verbosity classes: 1 -- ERROR, 2 -- CRITICAL, 3 -- WARNING, 4 -- INFO, 5 -- DEBUG. By setting log_level, you can enable logging of all classes below or equal to the given level. Tarantool prints its logs to the standard error stream by default, but this can be changed with the "logger" configuration parameter.
logger | string | "" | no | no | By default, the log is sent to the standard error stream (stderr). If this option is given, Tarantool creates a child process, executes the given command in it, and pipes its standard output to the standard input of the created process. Example setting: tee --append tarantool.log (this will duplicate log output to stdout and a log file).
logger_nonblock | integer | 0 | no | no | If this option is given, Tarantool does not block on the log file descriptor when it's not ready for write, and drops the message instead. If log_level is high, and a lot of messages go to the log file, setting this option to 1 may improve logging performance at the cost of some log messages getting lost.
too_long_threshold | float | 0.5 | no | yes | If processing a request takes longer than the given value (in seconds), warn about it in the log. Has effect only if log_level is no less than 3 (WARNING).
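
How log_level selects verbosity classes, and how it interacts with too_long_threshold, can be expressed as a short sketch. The class numbers and defaults come from the table above; the function names are hypothetical and are not part of the server.

```python
LOG_CLASSES = {1: "ERROR", 2: "CRITICAL", 3: "WARNING", 4: "INFO", 5: "DEBUG"}


def enabled_classes(log_level):
    """Names of all verbosity classes below or equal to log_level."""
    return [name for level, name in sorted(LOG_CLASSES.items())
            if level <= log_level]


def slow_request_logged(elapsed_seconds, log_level=4, too_long_threshold=0.5):
    """True if a slow-request warning would reach the log."""
    # the warning is a WARNING-class message, so log_level must be >= 3
    return log_level >= 3 and elapsed_seconds > too_long_threshold
```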

Table 7.7. Memcached protocol support

Name | Type | Default | Required? | Dynamic? | Description
memcached_port | integer | none | no | no | Turn on Memcached protocol support on the given port. All requests on this port are directed to a dedicated space, set in memcached_space. Memcached-style flags are supported and stored along with the value. The expiration time can also be set and is persistent, but is ignored unless memcached_expire is turned on. Unlike Memcached, all data still goes to the binary log and to the replica, if one is set up, which means that a power outage does not lead to loss of all data. Thanks to data persistence, cache warm-up time is also very short.
memcached_space | integer | 23 | no | no | Space id to store memcached data in. The format of a tuple is [key, metadata, value], with a HASH index based on the key. Since the space format is defined by the Memcached data model, it must not be previously configured.
memcached_expire | boolean | false | no | no | Turn on tuple time-to-live support in memcached_space. This effectively turns Tarantool into a persistent, replicated and scriptable implementation of Memcached.
memcached_expire_per_loop | integer | 1024 | no | yes | How many records to consider per iteration of the expiration loop. Tuple expiration is performed in a separate green thread within our cooperative multitasking framework and this setting effectively limits how long the expiration loop stays on CPU uninterrupted.
memcached_expire_full_sweep | float | 3600 | no | yes | Try to make sure that every tuple is considered for expiration within this time frame (in seconds). Together with memcached_expire_per_loop this defines how often the expiration green thread is scheduled on CPU.
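
To get a feeling for how memcached_expire_per_loop and memcached_expire_full_sweep combine, consider the following sketch. It assumes the iterations are spread evenly over the sweep period, which is one plausible reading of the description above and not necessarily the formula the server uses; the function name is hypothetical.

```python
import math


def expire_iteration_delay(tuple_count, per_loop=1024, full_sweep=3600.0):
    """Seconds between expiration iterations if spread evenly over a sweep."""
    if tuple_count <= 0:
        # nothing to expire: one idle pass per sweep period
        return full_sweep
    iterations = math.ceil(tuple_count / per_loop)
    return full_sweep / iterations
```

Under this assumption, a space holding 10240 tuples needs 10 iterations per sweep with the default settings, so the expiration green thread would wake up about every 360 seconds.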

Chapter 8. Connectors

This chapter documents APIs for various programming languages.

Apart from the native Tarantool client driver, you can always use a Memcached driver of your choice, after enabling Memcached protocol in the configuration file.

C

Please see connector/c in the source tree.

Perl

Please refer to CPAN module DR::Tarantool.

PHP

Please see the tarantool-php project on GitHub.

Ruby

You need Ruby 1.9 or later to use this connector. Connector sources are located at http://github.com/mailru/tarantool-ruby.

Appendix A. Server process titles

Linux and FreeBSD operating systems allow a running process to modify its title, which otherwise contains the program name. Tarantool uses this feature to help with system administration tasks, such as figuring out what services are running on a host, which TCP/IP ports are in use, et cetera.

A Tarantool process title follows this naming scheme: program_name: role[@custom_proc_title] [ports in use]

program_name is typically tarantool_box. The role can be one of the following:

  • primary -- the master node,

  • replica/IP:port -- a replication node,

  • wal_writer -- a write ahead log management process (always pairs up with the main process, be it primary or replica),

  • replication_server -- runs only if replication_port is set, accepts connections on this port and creates a replication_relay for each of them,

  • replication_relay -- a process that serves a single replication connection.

Possible port names are: pri for primary_port, sec for secondary_port, adm for admin_port and memcached for memcached_port.

For example:

  • tarantool_box: primary pri:50000 sec:50001 adm:50002

  • tarantool_box: primary@infobox pri:15013 sec:15523 adm:10012

  • tarantool_box: wal_writer
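
A monitoring script can take such titles apart with a regular expression. The sketch below follows the naming scheme given above; the function name is hypothetical, and it assumes that neither the role nor the custom title contains spaces.

```python
import re

# program_name: role[@custom_proc_title] [ports in use]
TITLE_RE = re.compile(
    r"^(?P<program>[^:]+): "
    r"(?P<role>[^@\s]+)"
    r"(?:@(?P<custom>\S+))?"
    r"(?: (?P<ports>.+))?$"
)


def parse_process_title(title):
    """Split a process title into program, role, custom title and ports."""
    match = TITLE_RE.match(title.strip())
    if match is None:
        raise ValueError("unrecognized process title: %r" % title)
    ports = {}
    for item in (match.group("ports") or "").split():
        # each item looks like pri:33013
        name, _, number = item.partition(":")
        ports[name] = int(number)
    return {
        "program": match.group("program"),
        "role": match.group("role"),
        "custom": match.group("custom"),
        "ports": ports,
    }
```

Applied to the second example above, the function reports the role primary, the custom title infobox and the three ports keyed by pri, sec and adm.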

Appendix B. List of error codes

In the current version of the binary protocol, the error message, which is normally more descriptive than the error code, is not present in the server response. The actual message may contain a file name, a detailed reason or an operating system error code. All such messages, however, are logged in the error log. When using the Memcached protocol, the error message is sent to the client along with the code. Only general descriptions of some common codes are given below. A complete list of errors can be found in file errcode.h in the source tree.

List of error codes

ER_NONMASTER

Attempt to execute an update on a running replica.

ER_ILLEGAL_PARAMS

Illegal parameters. Malformed protocol message.

ER_MEMORY_ISSUE

Out of memory: slab_alloc_arena limit is reached.

ER_WAL_IO

Failed to record the change in the write ahead log. Some sort of disk error.

ER_INDEX_VIOLATION

A unique index constraint violation: a tuple with the same key is already present in the index.

ER_KEY_PART_COUNT

Key part count is greater than index part count.

ER_NO_SUCH_SPACE

Attempt to access a space that is not configured (doesn't exist).

ER_NO_SUCH_INDEX

No index with the given id exists.

ER_PROC_LUA

An error inside a Lua procedure.