Tarantool is an in-memory NoSQL database. The code is available for free under the terms of the BSD license. Supported platforms are GNU/Linux, Mac OS and FreeBSD.
The server maintains all its data in random-access memory, and therefore has very low read latency. At the same time, a copy of the data is kept on non-volatile storage (a disk drive), and inserts and updates are performed atomically.
To ensure atomicity, consistency and crash-safety of the persistent copy, a write-ahead log (WAL) is maintained, and each change is recorded in the WAL before it is considered complete. The logging subsystem supports group commit.
If the update and delete rate is high, a constantly growing write-ahead log file (or files) can pose a disk space problem and significantly increase the time needed to restart from disk. A simple solution is employed: the server can be requested to save a concise snapshot of its current data. The underlying operating system's “copy-on-write” feature is used to take the snapshot in a quick, resource-efficient and non-blocking manner, which keeps the impact of snapshotting on server performance minimal.
Tarantool is lock-free. Instead of the operating system's concurrency primitives, such as threads and mutexes, Tarantool uses a cooperative multitasking environment to operate on thousands of connections simultaneously. A fixed number of independent execution threads within the server do not share state, but exchange data using low-overhead message queues. While this approach limits server scalability to a few CPU cores, it removes competition for the memory bus and puts the scalability limit at the maximum memory and network throughput. CPU utilization of a typical highly loaded Tarantool server is under 10%.
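The cooperative model can be illustrated with a toy scheduler. The following is a minimal Python sketch, not Tarantool's implementation: each "fiber" is modelled as a generator that runs until it voluntarily yields, and a single thread resumes the fibers round-robin, so the shared log needs no lock.

```python
from collections import deque

def fiber(name, steps, log):
    """A toy fiber: does one unit of work, then yields control voluntarily."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # cooperative switch point (e.g. waiting for network or disk)

def run(fibers):
    """Round-robin scheduler: one thread, no locks."""
    ready = deque(fibers)
    while ready:
        f = ready.popleft()
        try:
            next(f)
            ready.append(f)  # still alive: reschedule at the back of the queue
        except StopIteration:
            pass  # fiber finished

log = []
run([fiber("a", 2, log), fiber("b", 3, log)])
print(log)
```

Because a switch can happen only at an explicit yield, the code between two yields runs without interruption, which is why a cooperative design can do without mutexes.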
Unlike most NoSQL databases, Tarantool supports primary and secondary keys, multi-part keys, and HASH, TREE and BITSET index types.
The key feature of Tarantool is support for Lua stored procedures, which can access and modify data atomically. Procedures can be created, modified and dropped at runtime.
Use of Lua as an extension language does not end with stored procedures: Lua programs can be run during startup to define triggers and background tasks, and to interact with networked peers. Unlike popular application development frameworks based on the “reactor” pattern, networking in server-side Lua is sequential, yet very efficient, as it is built on top of the cooperative multitasking environment used by the server itself.
Extended with Lua, Tarantool typically replaces not one but a few existing components with a single well-performing system, changing and simplifying complex multi-tier Web application architectures.
Tarantool supports replication. Replicas may run locally or on a remote host. Tarantool replication is asynchronous and does not block writes to the master. If the master becomes unavailable, the replica can be switched to assume the role of the master without a server restart.
The software is production-ready. Tarantool was created at, and is actively used by, Mail.Ru, one of the leading Russian web content providers. At Mail.Ru, the software serves the “hottest” data, such as online users and their sessions, online application properties, mapping between users and their serving shards, and so on.
Outside Mail.Ru, the software is used by a growing number of projects in the online gaming, digital marketing and social media industries. While product development is sponsored by Mail.Ru, the roadmap, the bug database and the development process are fully open. The software incorporates patches from dozens of community contributors, and most of the programming language drivers are written and maintained by the community.
This manual is written in the DocBook 5 XML markup language and uses the standard DocBook XSL formatting conventions:
UNIX shell command input is prefixed with '$ ' and is formatted using a fixed-width font:
$ tarantool_box --background
The same formatting style is used for file names: /path/to/var/dir.
Text that represents user input is formatted in boldface:
$ your input here
Within user input, replaceable items are printed in italics:
$ tarantool_box --option
Please report bugs in Tarantool at http://bugs.launchpad.net/tarantool. You can contact developers directly on the #tarantool IRC channel or via the mailing list, tarantool-developers@lists.launchpad.net.
Caution: To prevent spam, Launchpad mailing list software silently drops all mail sent from non-registered email addresses. Launchpad registration also allows you to report bugs and create feature requests. You can always check whether or not your mail has been delivered to the mailing list in the public list archive, https://lists.launchpad.net/tarantool-developers.
This chapter describes installation procedures and the contents of the binary and source package downloads, and explains how to start and stop the server or connect to it with the command line client.
To install the latest stable version of Tarantool, see the instructions on the project download page. For many distributions, the server and the command line client are available from the distribution's upstream repositories. Local repositories for popular Linux distributions, as well as a FreeBSD port and a Mac OS X “homebrew” recipe, are also available online. The online archive is automatically refreshed on each push into the stable branch of the server. Please follow distribution-specific instructions to find out how to manage Tarantool instances on your operating system.
The easiest way to try Tarantool without installing it is by downloading a binary or source package tarball. Binary packages use the tarantool-<version>-<OS>-<machine>.tar.gz naming scheme. Source packages are named simply tarantool-<version>-src.tar.gz.
You can find out the canonical name of your operating system and machine type with uname -o and uname -m respectively. The programs included in the binary tarball are linked statically, so that they have no external dependencies.
Besides the downloaded package, you will need the following software:
Python 2.6 or newer, with the PyYAML, python-daemon and python-pexpect modules.
Python is used to run regression tests. If you do not plan to run tests, you may skip this step.
To build Tarantool from source, you will additionally need:
CMake 2.6 or newer,
GCC 4.4 or newer with the gcc-objc (Objective-C) language frontend, or Clang 3.1 or newer,
libreadline-dev, when compiling the command line client.
After downloading, unpack the binary package; a new directory will be created:
$ tar zxvf package-name.tar.gz
To remove the package, simply drop the directory containing the unpacked files.
The binary download contains the subdirectories bin, doc, man, share, var and etc.
The server, by default, looks for its configuration file in the current working directory and in etc/.
There is a correct, minimal tarantool.cfg in the etc/ directory, so the server can be started right from the top-level package directory:
$ cd package-name && ./bin/tarantool_box
...
1301424353.416 3459 104/33013/acceptor _ I> I am primary
1301424353.416 3459 1/sched _ I> initialized
To stop the server, simply press Ctrl+C.
Once the server is started, you can connect to it and issue queries using the command line client:
$ cd package-name && ./bin/tarantool
localhost> show info
---
info:
  version: "1.4.5"
  uptime: 548
  pid: 3459
  logger_pid: 3461
  snapshot_pid: 0
  lsn: 1
  recovery_lag: 0.000
  recovery_last_update: 0.000
  status: primary
  config: "/home/kostja/tarantool/etc/tarantool.cfg"
To use a source package, a few additional steps are necessary: configuration and build. The easiest way to configure a source directory with CMake is by requesting an in-source build:
$ cd package-name && cmake . -DENABLE_CLIENT=true
Upon successful configuration, CMake prints the status of optional features:
-- *** The following options are on in this configuration: ***
-- ENABLE_CLIENT: true
-- ENABLE_GCOV: ON
-- ENABLE_TRACE: ON
-- ENABLE_BACKTRACE: ON
-- Backtrace is with symbol resolve: True
-- ENABLE_STATIC: OFF
--
-- Configuring done
-- Generating done
Now type 'make' to build Tarantool.
$ make
...
Linking C executable tarantool_box
[100%] Built target tarantool_box
Complete instructions for building from source are located in the source tree, in the file README.md. There are also specialized build instructions for CentOS, FreeBSD and OS X.
When make is complete, the server can be started right out of the in-source build. Use the Tarantool regression testing framework:
$ ./test/run --start-and-exit
It will create the necessary files in the ./test/var/ directory and start the server with a minimal configuration.
The command line client is located in client/tarantool:
$ ./client/tarantool/tarantool
This chapter describes how Tarantool stores values and which data operations it supports.
Tarantool data is organized in tuples. Tuple length varies: a tuple can contain any number of fields. A field can be either numeric (a 32- or 64-bit unsigned integer) or a binary string (a sequence of octets). Tuples are stored and retrieved by means of indexing. An index can cover one or multiple fields, in any order. Fields included in the first index are always assumed to be the identifying (unique) key. The remaining fields make up a value associated with the key.
Apart from the primary key, it is possible to define secondary indexes on other tuple fields. A secondary index does not have to be unique and can cover multiple fields. The total number of fields in a tuple must be at least equal to the ordinal number of the last field participating in any index.
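A toy model of these rules, written for illustration only: a space with a unique primary key on field 0 and a non-unique secondary index on field 1. Field numbers are 0-based here, as in the SQL examples later in this manual (k0, k1), so the minimum tuple length is the highest indexed field number plus one. The class and method names are hypothetical and are not a Tarantool API.

```python
class Space:
    """Toy space: unique primary key on one field, non-unique secondary key on another."""

    def __init__(self, pk_field=0, sk_field=1):
        self.pk_field = pk_field
        self.sk_field = sk_field
        self.primary = {}    # primary key -> tuple (unique)
        self.secondary = {}  # secondary key -> set of primary keys (non-unique)
        # Every tuple needs at least as many fields as the highest indexed
        # field number plus one (field numbers are 0-based here).
        self.min_fields = max(pk_field, sk_field) + 1

    def insert(self, tup):
        if len(tup) < self.min_fields:
            raise ValueError("tuple is missing an indexed field")
        pk = tup[self.pk_field]
        if pk in self.primary:
            raise KeyError(f"duplicate primary key {pk!r}")
        self.primary[pk] = tup
        self.secondary.setdefault(tup[self.sk_field], set()).add(pk)

    def select(self, key, index=0):
        """Look up by primary key (index 0) or by secondary key (index 1)."""
        if index == 0:
            return [self.primary[key]] if key in self.primary else []
        return [self.primary[pk] for pk in sorted(self.secondary.get(key, ()))]
```

Extra fields beyond the indexed ones are allowed, so tuples in the same space may have different lengths.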
Supported index types are HASH, TREE and BITSET. The HASH index is the fastest, with the smallest memory footprint. The TREE index, in addition to key/value lookups, supports partial key lookups, key-part lookups for multipart keys and ordered retrieval. BITSET indexes, while they can serve as a standard unique key, are best suited for bit-pattern lookups, i.e. searches for objects satisfying multiple properties.
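To show what a bit-pattern lookup means, here is a minimal Python sketch, assuming each object's properties are encoded as bits of an integer. It keeps, for every bit position, the set of keys with that bit set, and answers "all of these bits are set" queries by intersecting those sets. This illustrates the idea only; it is not Tarantool's BITSET implementation, and the property flags are hypothetical.

```python
from collections import defaultdict

class BitsetIndex:
    """Toy bitset index: for every bit position, the set of keys with that bit set."""

    def __init__(self):
        self.bits = defaultdict(set)

    def insert(self, key, value):
        bit = 0
        while value:
            if value & 1:
                self.bits[bit].add(key)
            value >>= 1
            bit += 1

    def all_set(self, mask):
        """Keys whose value has every bit of `mask` set (mask must be non-zero)."""
        result = None
        bit = 0
        while mask:
            if mask & 1:
                keys = self.bits.get(bit, set())
                result = set(keys) if result is None else result & keys
            mask >>= 1
            bit += 1
        return result if result is not None else set()

# Hypothetical properties, one bit each.
ONLINE, PREMIUM, MOBILE = 0b001, 0b010, 0b100

idx = BitsetIndex()
idx.insert("alice", ONLINE | PREMIUM)
idx.insert("bob", ONLINE)
idx.insert("carol", ONLINE | PREMIUM | MOBILE)
print(sorted(idx.all_set(ONLINE | PREMIUM)))  # ['alice', 'carol']
```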
Tuple sets together with defined indexes form spaces. The basic server operations are insert, replace, delete and update, which modify a tuple in a space, and select, which retrieves tuples from a space. All operations that modify data require the primary key for lookup. Select, however, may use any index.
A Lua stored procedure can combine multiple trivial commands, as well as access data using index iterators. Indeed, the iterators provide full access to the power of indexes, enabling index-type-specific access, such as boolean expression evaluation for BITSET indexes, or reverse range retrieval for TREEs.
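Reverse range retrieval on an ordered index amounts to locating the upper bound of the range and walking backwards. A minimal sketch using Python's bisect module over a sorted list of keys; it models the idea only and does not show the TREE index or its Lua iterator API.

```python
import bisect

class TreeIndex:
    """Toy ordered index: keys kept in a sorted list."""

    def __init__(self, keys=()):
        self.keys = sorted(keys)

    def insert(self, key):
        bisect.insort(self.keys, key)

    def range_desc(self, high, low):
        """Yield keys k with low <= k <= high, from highest to lowest."""
        i = bisect.bisect_right(self.keys, high)  # first position past `high`
        while i > 0 and self.keys[i - 1] >= low:
            i -= 1
            yield self.keys[i]

t = TreeIndex([5, 1, 9, 3, 7])
print(list(t.range_desc(8, 3)))  # [7, 5, 3]
```

A HASH index cannot answer this kind of query without scanning every key, which is why ordered retrieval is specific to TREE indexes.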
All operations in Tarantool are atomic and durable: they are either executed and written to the write-ahead log, or rolled back. A stored procedure containing a combination of basic operations holds a consistent view of the database as long as it does not incur writes to the write-ahead log or to the network. In particular, a select followed by an update or delete is atomic.
While the subject of each data changing command is a single tuple, an update may modify one or more tuple fields, as well as add or delete fields, all in one command. It thus provides an alternative way to achieve multi-operation atomicity.
Currently, the entire server schema must be specified in the configuration file. The schema contains all spaces and indexes. A server started with a configuration file that doesn't match the contents of its data directory will most likely crash, but may also behave in an undefined way. It is, however, possible to stop the server, add new spaces and indexes to the schema or temporarily disable existing spaces and indexes, and then restart the server.
Schema objects, such as spaces and indexes, are referred to by a numeric id. For example, to insert a tuple, it is necessary to provide the id of the destination space; to select a tuple, one must provide the identifying key, the space id and the id of the index used for lookup. Many Tarantool drivers provide a local aliasing scheme that maps numeric identifiers to names. Using numeric identifiers in the wire protocol makes it lightweight and easy to parse.
The configuration file shipped with the binary package defines only one space, with id 0. It has no keys other than the primary key, whose numeric id is also 0. The Tarantool command line client supports a small subset of SQL, which is used here to demonstrate the supported data manipulation commands:
localhost> insert into t0 values (1)
Insert OK, 1 row affected
localhost> select * from t0 where k0=1
Found 1 tuple:
[1]
localhost> insert into t0 values ('hello')
An error occurred: ER_ILLEGAL_PARAMS, 'Illegal parameters'
localhost> replace into t0 values (1, 'hello')
Replace OK, 1 row affected
localhost> select * from t0 where k0=1
Found 1 tuple:
[1, 'hello']
localhost> update t0 set k1='world' where k0=1
Update OK, 1 row affected
localhost> select * from t0 where k0=1
Found 1 tuple:
[1, 'world']
localhost> delete from t0 where k0=1
Delete OK, 1 row affected
localhost> select * from t0 where k0=1
No match
Please observe:
Since all object identifiers are numeric, the Tarantool SQL subset expects identifiers that end with a number (t0, k0, k1, and so on): this number is used to refer to the actual space or index.
All commands actually tell the server which key/value pair to change. In SQL terms, this means that all DML statements must be qualified with the primary key. For UPDATE and DELETE, a WHERE clause on the primary key is therefore mandatory; INSERT and REPLACE carry the key in the inserted values.
REPLACE replaces data when a tuple with the given primary key already exists. Such a replace can insert a tuple with a different number of fields.
Additional examples of SQL statements can be found in the Tarantool regression test suite. A complete grammar of the supported SQL is provided in the Language reference chapter.
Since not all Tarantool operations can be expressed in SQL, to gain complete access to data manipulation functionality one must use a Perl, Python, Ruby or other programming language connector. The client/server protocol is open and documented: an annotated BNF can be found in the source tree, file doc/protocol.txt.
To maintain data persistence, Tarantool writes each data change request (INSERT, UPDATE, DELETE) into a write-ahead log. WAL files have the extension .xlog and are stored in wal_dir. A new WAL file is created for every rows_per_wal records. Each INSERT, UPDATE or DELETE is assigned a continuously growing 64-bit log sequence number. The name of the log file is based on the log sequence number of the first record the file contains.
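A minimal sketch of the naming rule, under assumptions not stated above: LSNs start at 1, the server is never restarted (a restart also opens a new file, which this sketch ignores), and a file name is its first LSN zero-padded to 20 digits.

```python
def wal_file_name(lsn, rows_per_wal):
    """Name of the .xlog file holding the record with the given LSN.

    Assumes LSNs start at 1, a new file is opened every `rows_per_wal`
    records, and names are the first LSN zero-padded to 20 digits.
    """
    if lsn < 1:
        raise ValueError("LSNs start at 1")
    first_lsn = (lsn - 1) // rows_per_wal * rows_per_wal + 1
    return f"{first_lsn:020d}.xlog"

print(wal_file_name(51, 50))
```

Since the name encodes the first LSN, finding the file that holds a given record only requires listing wal_dir and picking the last file whose name is not greater than the LSN.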
Apart from a log sequence number and the data change request (its format is the same as in the binary protocol and is described in doc/box-protocol.txt), each WAL record contains a checksum and a UNIX time stamp.
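To make the record contents concrete, the following sketch packs an LSN, a UNIX time stamp and a request body into a record protected by a checksum, and verifies the checksum on reading. The byte layout and the use of zlib's CRC-32 are hypothetical choices for illustration; the actual on-disk format is defined by the Tarantool source.

```python
import struct
import time
import zlib

# Hypothetical layout: checksum (uint32), LSN (int64), time stamp (double), body length (uint32).
HEADER = struct.Struct("<IqdI")

def pack_record(lsn, body, tm=None):
    tm = time.time() if tm is None else tm
    payload = struct.pack("<qdI", lsn, tm, len(body)) + body
    return struct.pack("<I", zlib.crc32(payload)) + payload

def unpack_record(record):
    crc, lsn, tm, length = HEADER.unpack_from(record)
    if zlib.crc32(record[4:]) != crc:
        raise ValueError("checksum mismatch: corrupted record")
    body = record[HEADER.size:HEADER.size + length]
    return lsn, tm, body
```

The checksum lets recovery detect a torn or corrupted record at the end of a log instead of replaying garbage.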
Tarantool processes requests atomically: a change is either accepted and recorded in the WAL, or discarded completely. Let's clarify how this happens, using the REPLACE command as an example:
The server attempts to locate the original tuple by primary key. If found, a reference to the tuple is retained for later use.
The new tuple is then validated. If it violates a unique-key constraint, misses an indexed field, or an index-field type does not match the type of the index, the change is aborted.
The new tuple replaces the old tuple in all existing indexes.
A message is sent to the WAL writer, running in a separate thread, requesting that the change be recorded in the WAL. The server moves on to the next request while it waits for the write to be acknowledged.
On success, a confirmation is sent to the client. Upon failure, a rollback procedure begins. During the rollback procedure, the transaction processor rolls back all changes to the database that occurred after the first failed change, from latest to oldest, up to and including the first failed change. All rolled-back requests are aborted with the ER_WAL_IO error. No new change is applied while rollback is in progress. When the rollback procedure is finished, the server restarts the processing pipeline.
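The steps above can be modelled in a few lines. This is a simplified Python sketch, not the server's code: it assumes WAL writes are acknowledged in submission order, keeps the old value of every pending change so it can be undone, and on failure undoes the failed change and every later one from latest to oldest.

```python
_MISSING = object()  # marks "no tuple existed under this key"

class TxProcessor:
    """Toy model of pipelined REPLACEs with rollback on WAL failure."""

    def __init__(self):
        self.data = {}     # in-memory space: key -> value
        self.pending = []  # (request_id, key, old_value), oldest first
        self.results = {}  # request_id -> "OK" or "ER_WAL_IO"

    def replace(self, request_id, key, value):
        """Apply to memory at once and queue the change for the WAL writer."""
        self.pending.append((request_id, key, self.data.get(key, _MISSING)))
        self.data[key] = value  # later requests already see the new value

    def wal_ack(self, ok):
        """Result of writing the oldest pending change (writes complete in order)."""
        if ok:
            request_id, _, _ = self.pending.pop(0)
            self.results[request_id] = "OK"
            return
        # Failure: undo the failed change and everything queued after it,
        # from latest to oldest, and abort all of them.
        while self.pending:
            request_id, key, old = self.pending.pop()
            if old is _MISSING:
                del self.data[key]
            else:
                self.data[key] = old
            self.results[request_id] = "ER_WAL_IO"
```

Because undo records are popped from the end of the queue, several pipelined changes to the same key are reverted correctly, back to the last value that reached the WAL.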
One advantage of the described algorithm is that complete request pipelining is achieved, even for requests on the same value of the primary key. As a result, database performance doesn't degrade even if all requests touch upon the same key in the same space.
The transaction processor and the WAL writer threads communicate using asynchronous (yet reliable) messaging; the transaction processor thread, not being blocked on WAL tasks, continues to handle requests quickly even at high volumes of disk I/O. A response to a request is sent as soon as it is ready, even if there were earlier incomplete requests on the same connection. In particular, SELECT performance, even for SELECTs running on a connection packed with UPDATEs and DELETEs, remains unaffected by disk load.
The WAL writer supports a number of durability modes, as defined in the configuration variable wal_mode. It is possible to turn the write-ahead log off completely by setting wal_mode to none. Even without the write-ahead log, it's still possible to take a persistent copy of the entire data set with SAVE SNAPSHOT.
This chapter provides a reference of Tarantool data operations and administrative commands.
Unlike many other key/value servers, Tarantool uses different TCP ports and client/server protocols for data manipulation and administrative statements. During startup, the server can listen on up to five TCP ports:
Read/write data port, to handle INSERTs, UPDATEs, DELETEs, SELECTs and CALLs. This port speaks the native Tarantool protocol and provides full data access. The default port number is 33013, as defined in the primary_port configuration option.
Read-only port, which only accepts SELECTs and CALLs; the default port number is 33014, as defined in the secondary_port configuration option.
Administrative port, which defaults to 33015 and is defined in the admin_port configuration option.
Replication port (see replication_port), by default set to 33016, used to send updates to replicas. Replication is optional, and if this port is not set in the option file, the corresponding server process is not started.
Memcached port: an optional read-write data port that speaks the Memcached text protocol. This port is off by default.
In the absence of authentication, this approach allows system administrators to restrict access to read/write or administrative ports. The client, however, has to be aware of the separation; the tarantool command line client automatically selects the correct port with the help of a simple regular expression. SELECTs, UPDATEs, INSERTs, DELETEs and CALLs are sent to the primary port. SHOW, RELOAD, SAVE and other statements are sent to the administrative port.
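The port selection described above can be sketched as follows. The regular expression is written for this example and is not the one the command line client actually uses; the port numbers are the defaults listed above, and REPLACE is included as a data statement because the client accepts it on the data port in the SQL examples.

```python
import re

PRIMARY_PORT = 33013  # default primary_port
ADMIN_PORT = 33015    # default admin_port

# Data statements go to the primary port; everything else is administrative.
DATA_STATEMENT = re.compile(
    r"^\s*(select|insert|replace|update|delete|call)\b", re.IGNORECASE)

def port_for(statement):
    """Pick the port a statement should be sent to."""
    return PRIMARY_PORT if DATA_STATEMENT.match(statement) else ADMIN_PORT

print(port_for("select * from t0 where k0=1"), port_for("show info"))
```

Matching only the leading keyword keeps the client simple: it never needs to parse administrative statements, it just forwards them as text.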
Five basic request types are supported: INSERT, UPDATE, DELETE, SELECT and CALL. All requests, including INSERT, UPDATE and DELETE, may return data. A SELECT can be requested to limit the number of returned tuples. This is useful when searching in a non-unique index or when a special “wildcard” (zero-length string) value is supplied as a search key or key part.
The UPDATE statement supports operations on fields (assignment; arithmetic operations, which require a numeric field; cutting and pasting fragments of a field) as well as operations on a tuple: push and pop of a field at the tail of a tuple, and deletion and insertion of a field. Multiple operations can be combined into a single update, and in this case they are performed atomically. Each operation expects a field number as its first argument. When a sequence of changes is present, the field identifier in each operation is relative to the most recent state of the tuple, i.e. as if all previous operations in a multi-operation update had already been applied. In other words, it's always safe to merge multiple UPDATE statements into a single one, with no change in semantics.
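The rule that each operation's field number refers to the tuple as left by the previous operations can be illustrated with a small Python model. The operation names used here ('=', '+', '#', '!') are hypothetical labels for this sketch, not the protocol's operation codes, and only four operations are modelled; fields are numbered from 0, as in the SQL examples (k0, k1).

```python
def apply_update(tup, ops):
    """Apply update operations to a tuple (a list) all-or-nothing.

    Each op is (name, field_no, *args); field_no is interpreted against the
    tuple as modified by all preceding ops. Works on a copy, so a failing op
    leaves the original tuple untouched.
    """
    t = list(tup)
    for name, field_no, *args in ops:
        if name == "=":      # assignment
            t[field_no] = args[0]
        elif name == "+":    # arithmetic: the field must be numeric
            if not isinstance(t[field_no], int):
                raise TypeError(f"field {field_no} is not numeric")
            t[field_no] += args[0]
        elif name == "#":    # delete a field; later fields shift left
            del t[field_no]
        elif name == "!":    # insert a field before field_no
            t.insert(field_no, args[0])
        else:
            raise ValueError(f"unknown operation {name!r}")
    return t

# After deleting field 1, the old field 2 (value 10) becomes field 1.
print(apply_update([1, "a", 10, "b"], [("#", 1), ("+", 1, 5)]))
```

Since each operation sees the result of the previous one, applying two op lists one after another gives the same tuple as applying their concatenation, which is the merging property stated above.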
The Tarantool protocol was designed with a focus on asynchronous I/O and easy integration with proxies. Each client request starts with a 12-byte binary header containing three fields: request type, length and a numeric id.
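A sketch of building and parsing the 12-byte header with Python's struct module. It assumes each of the three fields is a 32-bit unsigned integer in little-endian byte order, in the order type, length, id; doc/box-protocol.txt is the authoritative description. The CALL code 22 is the value this manual gives for CALL requests.

```python
import struct

HEADER = struct.Struct("<III")  # type, body length, request id (assumed little-endian uint32)
CALL = 22

def pack_request(req_type, request_id, body):
    """Prefix a request body with the 12-byte header."""
    return HEADER.pack(req_type, len(body), request_id) + body

def unpack_header(data):
    """Return (type, body_length, request_id) from the first 12 bytes."""
    return HEADER.unpack_from(data)

packet = pack_request(CALL, 7, b"xyz")
print(unpack_header(packet))  # (22, 3, 7)
```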
The mandatory length field in the request header simplifies client and proxy I/O. A response to a request is sent to the client as soon as it is ready. It always carries in its header the same type and id as the request. The id makes it possible to match a request to its response, even if responses arrive out of order.
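How the length and id fields help a client can be shown with a short sketch. It reuses the header layout assumed above (three little-endian uint32 fields: type, length, id), splits a received byte stream into complete messages using the length field, keeps any partial tail for the next read, and pairs responses with requests by id regardless of arrival order. The function names are hypothetical.

```python
import struct

HEADER = struct.Struct("<III")  # type, body length, request id (assumed layout)

def split_messages(buffer):
    """Split a byte buffer into complete (type, id, body) messages.

    Returns the messages and the unconsumed tail (a partially received message).
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        req_type, length, request_id = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # body not fully received yet
        messages.append((req_type, request_id, buffer[offset + HEADER.size:end]))
        offset = end
    return messages, buffer[offset:]

def dispatch(pending, messages):
    """Pair each response with its request by id, whatever the arrival order."""
    done = []
    for _, request_id, body in messages:
        request = pending.pop(request_id)  # KeyError would mean an unknown id
        done.append((request, body))
    return done
```

Without the length field, a proxy would have to understand every request type to know where one message ends; with it, framing is independent of the payload.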
Request type defines the format of the payload. INSERTs, UPDATEs and DELETEs can only be made by the primary key, so an index id and a key (possibly multipart) are always present in these requests. SELECTs can use secondary keys. UPDATE only needs to list the fields that are actually changed. With this one exception, all commands operate on whole tuple(s).
Unless implementing a client driver, one needn't
concern oneself with the complications of the binary
protocol. Language-specific
drivers provide a friendly way to store domain
language data structures in Tarantool, and the command line
client supports a subset of standard SQL.
A complete description of both the binary protocol and the supported SQL is maintained in annotated Backus-Naur form in the source tree: please see doc/box-protocol.txt and doc/sql.txt respectively.
If full access to Tarantool functionality is not needed, or there is no readily available connector for the programming language in use, any existing client driver for Memcached will do as a Tarantool connector. To enable the Memcached text protocol, turn on memcached_port in the option file. Since Memcached has no notion of spaces or secondary indexes, this port only makes it possible to access one dedicated space (see memcached_space) via its primary key. Unless tuple expiration is enabled with memcached_expire, the TTL part of the message is stored but ignored.
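As a sketch of what a Memcached client sends to memcached_port, the following builds a standard Memcached text-protocol "set" command and sends it over a plain socket. The host and port passed to memcached_set are up to the caller; memcached_port has no default, so any value used in practice is whatever the option file sets.

```python
import socket

def memcached_set_command(key, value, exptime=0, flags=0):
    """Build a text-protocol 'set': key, flags, TTL (exptime), byte count, then the data."""
    data = value.encode() if isinstance(value, str) else value
    return b"set %s %d %d %d\r\n%s\r\n" % (key.encode(), flags, exptime, len(data), data)

def memcached_set(host, port, key, value, exptime=0):
    """Send a 'set' to a Memcached-compatible port and return the reply line."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(memcached_set_command(key, value, exptime))
        return sock.makefile("rb").readline().strip()  # e.g. b"STORED"

print(memcached_set_command("greeting", "hello"))
```

The exptime value travels in every command, but, as noted above, Tarantool only acts on it when memcached_expire is enabled.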
Note that memcached_space is also accessible using the primary port or Lua. A common use of the Memcached port in Tarantool is when the default Memcached expiration algorithm is insufficient and a custom Lua expiration procedure is used.
Tarantool does not support the binary protocol of Memcached. If top performance is a must, Tarantool's own binary protocol should be used.
The administrative console uses a simple text protocol. All commands are case-insensitive. You can connect to the administrative port using any telnet client, or a tool like rlwrap, if access to readline features is desired. Additionally, tarantool, the SQL-capable command line client, understands all administrative statements and automatically directs them to the administrative port. The server response to an administrative command, even though it is always in plain text, can be quite complex. It is encoded using YAML markup to simplify automated parsing.
To learn about all supported administrative commands, you can type help in the administrative console. A reference description also follows below:
Take a snapshot of all data and store it in snap_dir/<latest-lsn>.snap.
To take a snapshot, Tarantool forks and quickly munmap(2)s all memory except the area where tuples are stored. Since all modern operating systems support virtual memory copy-on-write, this effectively creates a consistent snapshot of all tuples in the child process, which is then written to disk tuple by tuple. Since a snapshot is written sequentially, you can expect very high write performance (averaging 80MB/second on modern disks), which means an average database instance gets saved in a matter of minutes. Note that as long as concurrent updates change the parent's memory, the operating system has to copy the modified pages, and therefore you need some extra free memory to run this command. 15%-30% of slab_alloc_arena is, on average, sufficient. This statement waits until a snapshot is taken and returns the operation result. For example:
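localhost> show info
---
info:
  version: "1.4.6"
  lsn: 843301
...
localhost> save snapshot
---
ok
...
localhost> save snapshot
---
fail: can't save snapshot, errno 17 (File exists)
...
The second attempt fails because the LSN has not changed since the first snapshot, so snap_dir/<latest-lsn>.snap already exists.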
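The figures quoted above allow a quick back-of-the-envelope estimate. A sketch, assuming the whole arena is written out at the quoted 80MB/second average and taking the 15%-30% guideline at face value; the arena size is given in gigabytes here, and actual rates depend on the disk.

```python
def snapshot_estimate(arena_gb, write_mb_per_s=80.0):
    """Rough snapshot duration (seconds) and extra-memory range (GB).

    Assumes the full arena is written at the given average rate and that
    15%-30% of the arena is needed as free memory for copy-on-write pages.
    """
    seconds = arena_gb * 1024 / write_mb_per_s
    extra_gb = (0.15 * arena_gb, 0.30 * arena_gb)
    return seconds, extra_gb

print(snapshot_estimate(10))
```

For a 10 GB arena this gives about 128 seconds of writing and 1.5 to 3 GB of extra memory.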
Taking a snapshot does not cause the server to start a new write ahead log. Once a snapshot is taken, old WALs can be deleted as long as all replicas are up to date. But the WAL which was current at the time save snapshot started must be kept for recovery, since it still contains log records written after the start of save snapshot.
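Because WAL file names carry the LSN of their first record, deciding which old WALs a snapshot makes redundant is simple arithmetic. A minimal sketch, assuming the file list comes from the names in wal_dir; it ignores replicas, which, as noted above, must also be up to date before anything is deleted.

```python
def deletable_wals(wal_first_lsns, snapshot_lsn):
    """First LSNs of WAL files whose records are all contained in a snapshot.

    A file can go only if the next file starts at or before snapshot_lsn + 1,
    i.e. its last record is not newer than the snapshot. The newest file is
    always kept, as is the file that was current when the snapshot started.
    Replica progress is not taken into account here.
    """
    firsts = sorted(wal_first_lsns)
    return [first for first, nxt in zip(firsts, firsts[1:]) if nxt <= snapshot_lsn + 1]

print(deletable_wals([1, 51, 101, 151], 120))  # [1, 51]
```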
An alternative way to save a snapshot is to send the server SIGUSR1 UNIX signal. While this approach could be handy, it is not recommended for use in automation: a signal provides no way to find out whether the snapshot was taken successfully or not.
Re-read the configuration file. If the file contains changes to dynamic parameters, update the runtime settings. If configuration syntax is incorrect, or a read-only parameter is changed, produce an error and do nothing.
Show the current settings. Displays all settings, including those that have default values and thus are not necessarily present in the configuration file.
localhost> show info
---
info:
  version: "1.4.5-128-ga91962c"
  uptime: 441524
  pid: 12315
  logger_pid: 12316
  lsn: 15481913304
  recovery_lag: 0.000
  recovery_last_update: 1306964594.980
  status: primary
  config: "/usr/local/etc/tarantool.cfg"
recovery_lag holds the difference (in seconds) between the current time on the machine (wall clock time) and the time stamp of the last applied record. In replication setup, this difference can indicate the delay taking place before a change is applied to a replica.
recovery_last_update is the wall clock time of the last change recorded in the write-ahead log. To convert it to human-readable time, you can use date -d@1306964594.980.
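The same conversion can be done from a monitoring script. A minimal Python equivalent of date -u -d@<timestamp>, reporting the time in UTC:

```python
from datetime import datetime, timezone

def to_utc(unix_time):
    """Seconds since the UNIX epoch -> timezone-aware UTC datetime."""
    return datetime.fromtimestamp(unix_time, tz=timezone.utc)

print(to_utc(1306964594.980).isoformat())
```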
status is either "primary" or "replica/<hostname>".
Show the average number of requests per second, and the total number of requests since startup, broken down by request type: INSERT, SELECT, UPDATE or DELETE.
localhost> show stat
---
statistics:
  INSERT: { rps: 139 , total: 48207694 }
  SELECT_LIMIT: { rps: 0 , total: 0 }
  SELECT: { rps: 1246 , total: 388322317 }
  UPDATE_FIELDS: { rps: 1874 , total: 743350520 }
  DELETE: { rps: 147 , total: 48902544 }
Show the statistics of the slab allocator. The slab allocator is the main allocator used to store tuples. This can be used to monitor the total memory use and memory fragmentation.
items_used contains the % of slab_alloc_arena already used to store tuples.
arena_used contains the % of slab_alloc_arena that is already distributed to the slab allocator.
A pool allocator is used for temporary memory when serving client requests. Every fiber has its own temporary pool. This command shows the current state of the pools of all fibers.
Fork and dump a core. Since Tarantool stores all tuples in memory, it can take some time. Mainly useful for debugging.
Show all running fibers, with their stack. Mainly useful for debugging.
Execute a chunk of Lua code. This can be used to define, invoke, debug and drop stored procedures, inspect server environment, perform automated administrative tasks.
Lua is a light-weight, multi-paradigm, embeddable language. Stored procedures in Lua can be used to implement data manipulation patterns or data structures. A server-side procedure written in Lua can select and modify data, access configuration and perform administrative tasks. It is possible to dynamically define, invoke, alter and drop Lua procedures. Lua procedures can run in the background and perform administrative tasks, such as data expiration or re-sharding.
Tarantool uses the LuaJIT just-in-time Lua compiler and virtual machine. Apart from increased performance, this provides such features as bitwise operations and 64-bit integer arithmetic.
Procedures can be invoked from the administrative console and using the binary protocol, for example:
localhost> lua function f1() return 'hello' end
---
...
localhost> call f1()
Found 1 tuple:
['hello']
In the language of the administrative console, LUA ... evaluates an arbitrary Lua chunk. CALL is the SQL standard statement, so its syntax was adopted by the Tarantool command line client to invoke the CALL command of the binary protocol.
In the example above, a Lua procedure is first defined
using the text protocol of the administrative port,
and then invoked using the Tarantool client-side SQL
parser plus the binary protocol on the primary_port.
Since it's possible to execute any Lua chunk in the
administrative console, the newly created function f1()
can be called there too:
localhost> lua f1()
---
- hello
...
localhost> lua 1+2
---
- 3
...
localhost> lua "hello".." world"
---
- hello world
...
Lua procedures can also be called at initialization time from a dedicated init.lua script, located in work_dir. An example of such a script is given below:
-- Importing expirationd module
dofile("expirationd.lua")

-- A tuple counts as expired if field args.field_no holds a 4-byte
-- time stamp older than args.ttl seconds; malformed tuples are expired too
function is_expired(args, tuple)
    if tuple == nil then
        return true
    end
    if #tuple <= args.field_no then
        return true
    end
    local field = tuple[args.field_no]
    if field == nil or #field ~= 4 then
        return true
    end
    local current_time = os.time()
    local tuple_ts = box.unpack("i", field)
    return current_time >= tuple_ts + args.ttl
end

-- Delete an expired tuple by its primary key (field 0)
function purge(args, tuple)
    box.space[0]:delete(tuple[0])
end

-- Run task
expirationd.run_task("exprd space 0", 0, is_expired, purge,
                     { field_no = 1, ttl = 30 * 60 })
The initialization script can select and modify data. However, if the server is a running replica, data change requests from the start script fail in just the same way as they would if they were sent from a remote client.
Another common task to perform in the initialization script is to start background fibers for data expiration, re-sharding, or communication with networked peers.
Finally, the script can be used to define Lua triggers invoked on various events within the system.
There is a single global instance of the Lua interpreter, which is shared across all connections. Anything prefixed with lua on the administrative console is sent directly to this interpreter. Any change of the interpreter state is immediately available to all client connections.
Each connection, however, uses its own Lua coroutine, a mechanism akin to Tarantool fibers. A coroutine has its own execution stack and a Lua closure, that is, a set of local variables and definitions.
The interpreter environment is not restricted when init.lua is loaded. But before the server starts accepting requests, the standard Lua APIs, such as those for file I/O, process control and module management, are unset to prevent trivial security attacks.
In the binary protocol, it's only possible to invoke existing procedures, not to define or alter them. A CALL request packet contains the CALL command code (22), the name of the procedure to be called, and a tuple of procedure arguments. Currently, Tarantool tuples are type-agnostic, so each field of the tuple is passed into the procedure as an argument of type “string”. For example:
kostja@atlas:~$ cat arg.lua
function f1(a)
    local s = a
    if type(a) == 'string' then
        s = ''
        for i=1, #a, 1 do
            s = s..string.format('0x%x ', string.byte(a, i))
        end
    end
    return type(a), s
end
kostja@atlas:~$ tarantool
localhost> lua dofile('arg.lua')
---
...
localhost> lua f1('1234')
---
- string
- 0x31 0x32 0x33 0x34
...
localhost> call f1('1234')
Call OK, 2 rows affected
['string']
['0x31 0x32 0x33 0x34 ']
localhost> lua f1(1234)
---
- number
- 1234
...
localhost> call f1(1234)
Call OK, 2 rows affected
['string']
['0xd2 0x4 0x0 0x0 ']
In the above example, the procedure receives its argument identically through both protocols when the argument is a string. A numeric field, however, when submitted via the binary protocol, is seen by the procedure as a 4-byte blob, not as a Lua “number” type.
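The four bytes the procedure sees are the number 1234 encoded as a 32-bit little-endian unsigned integer, apparently how the client encoded the numeric SQL literal. Python's struct module reproduces them and shows that decoding restores the value; on the Lua side, box.unpack("i", field), as used in the init.lua example, serves the same purpose.

```python
import struct

blob = struct.pack("<I", 1234)  # 1234 as a 4-byte little-endian field
print(" ".join(f"0x{b:x}" for b in blob))  # 0xd2 0x4 0x0 0x0
print(struct.unpack("<I", blob)[0])        # 1234
```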
In addition to conventional method invocation, Lua provides object-oriented syntax. Access to the latter is available on the administrative console only:
localhost> lua box.space[0]:truncate()
---
...
localhost> call box.space[0]:truncate()
error: 1:15 expected '('
Since it's impossible to invoke object methods from the binary protocol, the object-oriented syntax is often used to restrict certain operations to be used by a system administrator only.
Every value returned from a stored function by means of the return clause is converted to a Tarantool tuple. Tuples are returned as such, in binary form; a Lua scalar, such as a string or an integer, is converted to a tuple with only one field. When the returned value is a Lua table, the resulting tuple contains only the table values, not the keys.
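Here is a minimal sketch of that conversion rule in plain Lua, for illustration only. to_tuple_fields is a hypothetical name, and box.tuple objects are not modeled. The sketch walks tables with pairs(), so the order of values taken from a table that is not a plain sequence is unspecified here.

```lua
-- Hypothetical helper illustrating the conversion rule for returned values.
local function to_tuple_fields(value)
  if type(value) ~= 'table' then
    return { value }              -- a scalar becomes a one-field tuple
  end
  local fields = {}
  for _, v in pairs(value) do     -- keys are dropped, only values are kept
    fields[#fields + 1] = v
  end
  return fields
end
```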
When a function in Lua terminates with an error, the error is sent to the client as the ER_PROC_LUA return code, with the original error message preserved. Similarly, an error that occurs inside Tarantool (observed on the client as an error code) during execution of a Lua procedure produces a genuine Lua error:
localhost> lua function f1() error("oops") end
---
...
localhost> call f1()
Call ERROR, Lua error: [string "function f1() error("oops") end"]:1: oops (ER_PROC_LUA)
localhost> call box.insert('99', 1, 'test')
Call ERROR, Space 99 is disabled (ER_SPACE_DISABLED)
localhost> lua pcall(box.insert, 99, 1, 'test')
---
- false
- Space 99 is disabled
...
It's possible not only to invoke trivial Lua code, but also to call into Tarantool storage functionality, using the box Lua library. The contents of the library can be inspected at runtime:
localhost> lua for k, v in pairs(box) do print(k, ": ", type(v)) end
---
fiber: table
space: table
cfg: table
on_reload_configuration: function
update: function
process: function
delete: function
insert: function
select: function
index: table
unpack: function
replace: function
select_range: function
pack: function
...
As shown in the listing, the box package ships:
- high-level functions, such as process(), update(), select(), select_range(), insert(), replace() and delete(), to manipulate tuples and access spaces from Lua;
- libraries, such as cfg, space, fiber, index and tuple, to access server configuration, create, resume and interrupt fibers, inspect contents of spaces, indexes and tuples, and send and receive data over the network.
Global Lua names added by Tarantool
Convert a given string or a Lua number to a 64-bit integer. The returned value supports all arithmetic operations, but uses 64-bit integer arithmetic rather than the floating-point arithmetic of the built-in number type.
localhost> lua tonumber64('123456789'), tonumber64(123456789)
---
- 123456789
- 123456789
...
localhost> lua i=tonumber64(1)
---
...
localhost> lua type(i), type(i*2), type(i/2), i, i*2, i/2
---
- cdata
- cdata
- cdata
- 1
- 2
- 0
...
Process a request passed in as a binary string. This is an entry point into the server request processor. It can be used to insert, update, select and delete tuples from within a Lua procedure.
This is a low-level API, and it expects all arguments to be packed in accordance with the binary protocol (iproto header excluded). Normally, there is no need to use box.process() directly: box.select(), box.update() and other convenience wrappers invoke box.process() with correctly packed arguments.
- op: number, any Tarantool command code except 22 (CALL); see doc/box-protocol.txt.
- request: command arguments packed in binary format.
This function returns zero or more tuples. In Lua, a tuple is represented by a userdata object of type box.tuple. If a Lua procedure is called from the administrative console, returned tuples are printed out in YAML format. When called from the binary protocol, the binary format is used.
Errors: any server error produced by the executed command.
Please note that, since all requests from Lua enter the core through box.process(), all checks and triggers run by the core apply automatically. For example, if the server is in read-only mode, an update or delete fails. Analogously, if a system-wide "instead of" trigger is defined, it is run.
Search for a tuple or tuples in the given space. A wrapper around box.process().
- space_no: space id;
- index_no: index number in the space, to be used for match;
- ...: index key, possibly multipart.
Returns zero or more tuples.
Errors: same as in box.process(). Any error results in a Lua exception.
localhost> call box.insert(0, 'test', 'my first tuple')
Call OK, 1 rows affected
['test', 'my first tuple']
localhost> call box.select(0, 0, 'test')
Call OK, 1 rows affected
['test', 'my first tuple']
localhost> lua box.insert(5, 'testtest', 'firstname', 'lastname')
---
- 'testtest': {'firstname', 'lastname'}
...
localhost> lua box.select(5, 1, 'firstname', 'lastname')
---
- 'testtest': {'firstname', 'lastname'}
...
Search for tuples in the given space. This is a full version of the built-in SELECT command, in which one can specify offset and limit in a multi-tuple return. The server may return multiple tuples when the index is non-unique or a partial key is used for search.
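One way to picture offset and limit is as skipping into, then truncating, the list of tuples the index has already matched. The helper below is a hypothetical plain-Lua illustration of that idea, not the server's implementation. It assumes a 0-based offset, so an offset of 0 skips nothing.

```lua
-- Hypothetical illustration: apply offset and limit to the tuples that
-- matched the key, given in index order as a Lua array.
local function apply_offset_limit(matches, offset, limit)
  local result = {}
  for i = offset + 1, math.min(#matches, offset + limit) do
    result[#result + 1] = matches[i]
  end
  return result
end
```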
Insert a tuple into a space. Tuple fields follow space_no. If a tuple with the same primary key already exists, box.insert() returns an error, while box.replace() replaces the existing tuple with a new one. These functions are wrappers around box.process().
Returns the inserted tuple.
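The difference between box.insert() and box.replace() can be modeled with a plain Lua table that maps a primary key to a tuple. This is a toy model. The space table, the two functions and the error text are made up for illustration and do not call Tarantool. In this model, replace() also stores a tuple whose key does not exist yet.

```lua
-- Toy model of a space: primary key -> tuple (a Lua array).
local space = {}

-- Like box.insert(): refuse to overwrite an existing primary key.
local function insert(key, ...)
  if space[key] ~= nil then
    error('duplicate primary key: ' .. tostring(key))  -- illustrative message
  end
  space[key] = { key, ... }
  return space[key]
end

-- Like box.replace(): store the tuple whether or not the key exists.
local function replace(key, ...)
  space[key] = { key, ... }
  return space[key]
end
```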
Update a tuple identified by a primary key. If a key is multipart, it is passed in as a Lua table. Update arguments follow, described by format. The format and arguments are passed to box.pack(), and the result is sent to box.process().
A correct format is a sequence of pairs: update operation, operation arguments. A single character of format describes either an operation which needs to take place or an operation argument. A format specifier also works as a placeholder for the number of a field which needs to be updated, or for an argument value.
For example:
- +p=p: add a value to one field and assign another;
- :p: splice a field: start at offset, cut length bytes, and add a string;
- #p: delete a field;
- !p: insert a field (before the one specified).
Possible format specifiers are: “=” for assignment, “+” for addition, “-” for subtraction, “&” for bitwise AND, “|” for bitwise OR, “^” for bitwise exclusive OR (XOR), “:” for string splice, “#” for field deletion, “!” for field insertion, and “p” for an operation argument.
Returns the updated tuple.
localhost> lua box.insert(0, 0, 'hello world')
---
- 0: {'hello world'}
...
localhost> lua box.update(0, 0, '+p', 1, 1) -- add value 1 to field #1
---
error: 'Illegal parameters, numeric operation on a field with length != 4'
...
localhost> lua box.update(0, 0, '=p', 1, 1) -- assign field #1 to value 1
---
- 0: {1}
...
localhost> lua box.update(0, 0, '+p', 1, 1)
---
- 0: {2}
...
localhost> lua box.update(0, 2, '!p', 1, 'Bienvenue tout le monde!')
---
- 2: {'Bienvenue tout le monde!', 'Hello world!'}
...
localhost> lua box.update(0, 2, '#p', 2, 'Bienvenue tout le monde!')
---
- 2: {'Bienvenue tout le monde!'}
...
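The sketch below shows how format characters pair with arguments. It applies a format such as '=p' or '+p=p' to a Lua array that stands in for a tuple. It is a hypothetical illustration; apply_update is not a Tarantool function. Field numbers are 0-based, as in the session above, and only =, +, # and ! are modeled. Unlike the server, it does not require numeric fields to be 4 bytes long.

```lua
-- Hypothetical sketch: apply a box.update()-style format to a Lua array.
-- Each operation character is followed by 'p'; the arguments supply the
-- 0-based field number and then the operation argument.
local function apply_update(tuple, format, ...)
  local args = { ... }
  local a = 1
  for i = 1, #format, 2 do
    local op = format:sub(i, i)
    assert(format:sub(i + 1, i + 1) == 'p', 'each operation must be followed by p')
    local field, value = args[a] + 1, args[a + 1]  -- 0-based -> 1-based index
    a = a + 2
    if op == '=' then
      tuple[field] = value                -- assign
    elseif op == '+' then
      tuple[field] = tuple[field] + value -- add
    elseif op == '#' then
      table.remove(tuple, field)          -- delete the field; value is ignored
    elseif op == '!' then
      table.insert(tuple, field, value)   -- insert before the given field
    else
      error('operation not modeled in this sketch: ' .. op)
    end
  end
  return tuple
end
```

Note that several operations in one format are applied left to right to the same tuple.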
Delete a tuple identified by a primary key.
Returns the deleted tuple.
localhost> call box.delete(0, 'test')
Call OK, 1 rows affected
['test', 'my first tuple']
localhost> call box.delete(0, 'test')
Call OK, 0 rows affected
localhost> call box.delete(0, 'tes')
Call ERROR, Illegal parameters, key is not u32 (ER_ILLEGAL_PARAMS)
Select a range of tuples, starting from the offset specified by key. The key can be multipart. Limit selection to at most limit tuples. If key is nil or unspecified, the selection starts from the first key in the index.
For TREE indexes, this returns tuples in sorted order. For HASH indexes, the order of tuples is unspecified, and can change significantly if data is inserted or deleted between two calls to box.select_range().
This is a simple wrapper around box.space[space_no]:select_range(index_no, ...).
BITSET index does not support this call.
localhost> show configuration
---
...
space[4].cardinality: "-1"
space[4].estimated_rows: "0"
space[4].index[0].type: "HASH"
space[4].index[0].unique: "true"
space[4].index[0].key_field[0].fieldno: "0"
space[4].index[0].key_field[0].type: "STR"
space[4].index[1].type: "TREE"
space[4].index[1].unique: "false"
space[4].index[1].key_field[0].fieldno: "1"
space[4].index[1].key_field[0].type: "STR"
...
localhost> insert into t4 values ('0', '0')
Insert OK, 1 rows affected
localhost> insert into t4 values ('1', '1')
Insert OK, 1 rows affected
localhost> insert into t4 values ('2', '2')
Insert OK, 1 rows affected
localhost> insert into t4 values ('3', '3')
Insert OK, 1 rows affected
localhost> lua box.select_range(4, 0, 10)
---
- '3': {'3'}
- '0': {'0'}
- '1': {'1'}
- '2': {'2'}
...
localhost> lua box.select_range(4, 1, 10)
---
- '0': {'0'}
- '1': {'1'}
- '2': {'2'}
- '3': {'3'}
...
localhost> lua box.select_range(4, 1, 2)
---
- '0': {'0'}
- '1': {'1'}
...
localhost> lua box.select_range(4, 1, 2, '1')
---
- '1': {'1'}
- '2': {'2'}
...
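On a TREE index, the calls above start at the first key greater than or equal to the given key, or at the beginning when no key is given. They stop after limit tuples. The plain-Lua sketch below illustrates this on a sorted array of single-part string keys. select_range here is a hypothetical stand-in; it does not model HASH ordering or multipart keys.

```lua
-- Hypothetical sketch of select_range() on an ordered (TREE) index.
-- keys: sorted array of keys; start_key: optional lower bound (inclusive).
local function select_range(keys, limit, start_key)
  local result = {}
  for _, k in ipairs(keys) do
    if #result >= limit then break end
    if start_key == nil or k >= start_key then
      result[#result + 1] = k
    end
  end
  return result
end
```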
Select a reverse range of tuples, starting from the offset specified by key. The key can be multipart. Limit selection to at most limit tuples. If key is nil or unspecified, the selection starts from the last key in the index.
For TREE indexes, this returns tuples in sorted (descending) order. Other index types do not support this call.
localhost> show configuration
---
...
space[4].cardinality: "-1"
space[4].estimated_rows: "0"
space[4].index[0].type: "HASH"
space[4].index[0].unique: "true"
space[4].index[0].key_field[0].fieldno: "0"
space[4].index[0].key_field[0].type: "STR"
space[4].index[1].type: "TREE"
space[4].index[1].unique: "false"
space[4].index[1].key_field[0].fieldno: "1"
space[4].index[1].key_field[0].type: "STR"
...
localhost> insert into t4 values ('0', '0')
Insert OK, 1 rows affected
localhost> insert into t4 values ('1', '1')
Insert OK, 1 rows affected
localhost> insert into t4 values ('2', '2')
Insert OK, 1 rows affected
localhost> insert into t4 values ('3', '3')
Insert OK, 1 rows affected
localhost> lua box.select_reverse_range(4, 0, 10)
---
error: 'Illegal parameters, hash iterator is forward only'
...
localhost> lua box.select_reverse_range(4, 1, 10)
---
- '3': {'3'}
- '2': {'2'}
- '1': {'1'}
- '0': {'0'}
...
localhost> lua box.select_reverse_range(4, 1, 2)
---
- '3': {'3'}
- '2': {'2'}
...
localhost> lua box.select_reverse_range(4, 1, 2, '1')
---
- '1': {'1'}
- '0': {'0'}
...
To use Tarantool binary protocol primitives from Lua, it's necessary to convert Lua variables to binary format. This helper function is modeled after Perl's pack. It takes a format and a list of arguments, and returns a binary string with all arguments packed according to the format.
- b: converts a Lua variable to a 1-byte integer and stores the integer in the resulting string;
- s: converts a Lua variable to a 2-byte integer and stores the integer in the resulting string, low byte first;
- i: converts a Lua variable to a 4-byte integer and stores the integer in the resulting string, low byte first;
- l: converts a Lua variable to an 8-byte integer and stores the integer in the resulting string, low byte first;
- w: converts a Lua integer to a BER-encoded integer;
- p: stores the length of the argument as a BER-encoded integer, followed by the argument itself (4 bytes in little-endian order for integers, a binary blob for other types);
- the characters =, +, &, |, ^ and : store the corresponding Tarantool UPDATE operation code: field assignment, addition, conjunction, disjunction, exclusive disjunction, splice (from the Perl SPLICE function). They expect the number of the field to update as an argument. These format specifiers only store the operation code and the field number, but do not describe operation arguments.
Errors: unknown format specifier.
localhost> lua box.insert(0, 0, 'hello world')
---
- 0: {'hello world'}
...
localhost> lua box.update(0, 0, "=p", 1, 'bye world')
---
- 0: {'bye world'}
...
localhost> lua box.update(0, 0, ":p", 1, box.pack('ppp', 0, 3, 'hello'))
---
- 0: {'hello world'}
...
localhost> lua box.update(0, 0, "=p", 1, 4)
---
- 0: {4}
...
localhost> lua box.update(0, 0, "+p", 1, 4)
---
- 0: {8}
...
localhost> lua box.update(0, 0, "^p", 1, 4)
---
- 0: {12}
...
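The byte layouts listed above can be reproduced in plain Lua. The sketch re-implements only b, s, i, w and p, for non-negative integers and strings. It is a hypothetical illustration of the encodings, not Tarantool's box.pack(). It omits l, because plain Lua numbers cannot hold every 64-bit value, and it omits the UPDATE operation specifiers. BER here means the base-128 encoding used by Perl's pack 'w': most significant group first, with the high bit set on every byte except the last.

```lua
-- Unsigned integer as `width` bytes, low byte first.
local function le_bytes(n, width)
  local out = {}
  for i = 1, width do
    out[i] = string.char(n % 256)
    n = math.floor(n / 256)
  end
  return table.concat(out)
end

-- BER compressed integer (as in Perl pack 'w').
local function ber(n)
  local groups = { string.char(n % 128) }            -- last byte: high bit clear
  n = math.floor(n / 128)
  while n > 0 do
    table.insert(groups, 1, string.char(n % 128 + 128))  -- high bit set
    n = math.floor(n / 128)
  end
  return table.concat(groups)
end

-- Hypothetical pack(): one argument per format character.
local function pack(format, ...)
  local args, out = { ... }, {}
  for i = 1, #format do
    local spec, v = format:sub(i, i), args[i]
    if spec == 'b' then out[#out + 1] = le_bytes(v, 1)
    elseif spec == 's' then out[#out + 1] = le_bytes(v, 2)
    elseif spec == 'i' then out[#out + 1] = le_bytes(v, 4)
    elseif spec == 'w' then out[#out + 1] = ber(v)
    elseif spec == 'p' then
      -- integers travel as 4 little-endian bytes, anything else as-is
      local data = type(v) == 'number' and le_bytes(v, 4) or tostring(v)
      out[#out + 1] = ber(#data) .. data
    else
      error('Unknown format specifier: ' .. spec)
    end
  end
  return table.concat(out)
end
```

For example, pack('w', 300) yields the two bytes 0x82 0x2C, and pack('p', 'hello') yields a length byte of 5 followed by the five characters.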
Counterpart to box.pack().
localhost> lua tuple=box.replace(2, 0)
---
...
localhost> lua string.len(tuple[0])
---
- 4
...
localhost> lua box.unpack('i', tuple[0])
---
- 0
...
localhost> lua box.unpack('bsil', box.pack('bsil', 255, 65535, 4294967295, tonumber64('18446744073709551615')))
---
- 255
- 65535
- 4294967295
- 18446744073709551615
...
localhost> lua num, str, num64 = box.unpack('ppp', box.pack('ppp', 666, 'string', tonumber64('666666666666666')))
---
...
localhost> lua print(box.unpack('i', num));
---
666
...
localhost> lua print(str);
---
string
...
localhost> lua print(box.unpack('l', num64))
---
666666666666666
...
localhost> lua box.unpack('=p', box.pack('=p', 1, '666'))
---
- 1
- 666
...
Redefines the Lua print() built-in to print either to the log file (when Lua is used from the binary port) or back to the user (for the administrative console).
When printing to the log file, INFO log level is used. When printing to the administrative console, all output is sent directly to the socket.
Note: the administrative console output must be YAML-compatible.
Evaluates an arbitrary chunk of Lua code passed in s. If there is a compilation error, it's raised as a Lua error. If compilation succeeds, all arguments which follow s are passed to the compiled chunk, and the chunk is invoked.
This function is mainly useful to define and run an arbitrary piece of Lua code, without having to introduce changes to the global Lua environment.
lua box.dostring('abc')
---
error: '[string "abc"]:1: ''='' expected near ''<eof>'''
...
lua box.dostring('return 1')
---
- 1
...
lua box.dostring('return ...', 'hello', 'world')
---
- hello
- world
...
lua box.dostring('local f = function(key) t=box.select(0, 0, key); if t ~= nil then return t[0] else return nil end end return f(...)', 0)
---
- nil
...
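Here is a minimal plain-Lua sketch of the behavior described above, using only the standard library. dostring is a hypothetical name. It uses loadstring where available (Lua 5.1, LuaJIT) and load otherwise. Arguments after s reach the chunk as '...', just as in the box.dostring('return ...', 'hello', 'world') example.

```lua
-- loadstring in Lua 5.1/LuaJIT; load accepts a string in Lua 5.2+.
local compile = loadstring or load

-- Hypothetical dostring(): compile s, raise compilation errors as Lua
-- errors, then run the chunk with the remaining arguments as '...'.
local function dostring(s, ...)
  local chunk, err = compile(s)
  if chunk == nil then
    error(err, 2)
  end
  return chunk(...)
end

print(dostring('return ...', 'hello', 'world'))
```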
Returns current system time (in seconds) as a Lua number. The time is taken from the event loop clock, which makes this call very cheap; the value is nevertheless good enough for constructing artificial tuple keys.
Returns current system time (in seconds) as a 64-bit integer. The time is taken from the event loop clock.
Generate a 128-bit (16-byte) unique id. The id is returned in binary form.
Requires the libuuid library to be installed. The library is loaded at runtime, and if it is not available, this function returns an error.
Generate a 128-bit (16-byte) unique id and return it as a 32-character hexadecimal string.
lua box.uuid_hex()
---
- a4f29fa0eb6d11e19f7737696d7fa8ff
...
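The relationship between the binary id and its hexadecimal form is a plain byte-to-hex conversion: two lowercase hex digits per byte, so 16 bytes become 32 characters. The helper below is a hypothetical plain-Lua illustration of that conversion, not Tarantool code.

```lua
-- Hypothetical helper: render a binary string as lowercase hex,
-- two characters per byte.
local function to_hex(bin)
  return (bin:gsub('.', function(c)
    return string.format('%02x', string.byte(c))
  end))
end
```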
Raises a client error. The difference between this function and the built-in Lua error() function is that when the error reaches the client, its error code is preserved, whereas every Lua error is presented to the client as ER_PROC_LUA. This function makes it possible to emulate any kind of native exception, such as a unique constraint violation, no such space/index, etc. A complete list of errors is present in the errcode.h file in the source tree.
Lua constants which correspond to Tarantool errors are defined in the box.error module. The error message can be arbitrary.
Throws a client error. A Lua procedure can emulate any request error (for example, a unique key violation).
lua box.raise(box.error.ER_WAL_IO, 'Wal I/O error')
---
error: 'Wal I/O error'
...
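Plain Lua shows why a dedicated function is needed. error() with a string carries only a message, whereas an error value that is a table can carry a code as well. The sketch below is an analogy only. raise and the numeric code 42 are made up, and the sketch does not show how box.raise() is actually implemented or how errors reach the client.

```lua
-- Made-up error code for this illustration; real codes are listed in errcode.h.
local ER_EXAMPLE = 42

-- Hypothetical raise(): throw a table so the code travels with the message.
local function raise(code, message)
  error({ code = code, message = message })
end

-- A plain Lua error value is just a message string.
local ok1, err1 = pcall(error, 'oops')

-- A structured error value keeps both the code and the message.
local ok2, err2 = pcall(raise, ER_EXAMPLE, 'Wal I/O error')
print(ok2, err2.code, err2.message)
```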
Insert values into the space designated by space_no, using an auto-increment primary key. The space must have a NUM or NUM64 primary key index of type TREE.
localhost> lua box.auto_increment(0, "I am a duplicate")
---
- 1: {'I am a duplicate'}
...
localhost> lua box.auto_increment(0, "I am a duplicate")
---
- 2: {'I am a duplicate'}
...
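The effect shown above can be modeled with a plain Lua table. This toy model assumes that the next key is the current maximum key plus one, starting from 1 in an empty space. That assumption matches the session but is not taken from Tarantool's source. auto_increment here is a hypothetical function.

```lua
-- Toy model (hypothetical): a space as a Lua table mapping a numeric
-- primary key to a tuple. Assumes next key = max(key) + 1.
local function auto_increment(space, ...)
  local max = 0
  for key in pairs(space) do      -- a TREE index can find the maximum directly
    if key > max then max = key end
  end
  local key = max + 1
  space[key] = { key, ... }
  return space[key]
end
```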
Increments a counter identified by the key. The key can be multipart, but there must be an index covering all fields of the key. If there is no tuple identified by the given key, creates a new one with the initial counter value set to 1. Returns the new counter value.
localhost> lua box.counter.inc(0, 'top.mail.ru')
---
- 1
...
localhost> lua box.counter.inc(0, 'top.mail.ru')
---
- 2
...
Decrements a counter identified by the given key. If the key is not found, the call is a no-op. When the counter value drops to 0, the tuple is deleted.
localhost> lua box.counter.dec(0, 'top.mail.ru')
---
- 1
...
localhost> lua box.counter.dec(0, 'top.mail.ru')
---
- 0
...
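The increment and decrement rules can be modeled with a plain Lua table in place of a space. counters, counter_inc and counter_dec are hypothetical names. For a missing key, this sketch's decrement returns nil; what box.counter.dec() returns in that case is not stated above.

```lua
-- Toy model (hypothetical): counters stored as key -> value.
local counters = {}

local function counter_inc(key)
  counters[key] = (counters[key] or 0) + 1   -- a missing key starts at 1
  return counters[key]
end

local function counter_dec(key)
  local value = counters[key]
  if value == nil then return nil end        -- missing key: no-op
  value = value - 1
  if value == 0 then
    counters[key] = nil                      -- delete the entry at zero
  else
    counters[key] = value
  end
  return value
end
```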
This package implements the box.tuple userdata type. It is possible to access individual tuple fields using an index, select a range of fields, iterate over all fields in a tuple or convert a tuple to a Lua table. Tuples are immutable.
localhost> lua t=box.insert(0, 1, 'abc', 'cde', 'efg', 'ghq', 'qkl')
---
...
localhost> lua #t
---
- 6
...
localhost> lua t[1], t[5]
---
- abc
- qkl
...
localhost> lua t[6]
---
error: 'Lua error: [string "return t[6]"]:1: box.tuple: index 6 is out of bounds (0..5)'
...
localhost> lua for k,v in t:pairs() do print(v) end
---
abc
cde
efg
ghq
qkl
...
localhost> lua t:unpack()
---
-
- abc
- cde
- efg
- ghq
- qkl
...
localhost> lua t:slice(1, 2)
---
- abc
...
localhost> lua t:slice(1, 3)
---
- abc
- cde
...
localhost> lua t:slice(1, -1)
---
- abc
- cde
- efg
- ghq
...
localhost> lua t:transform(1, 3)
---
- 1: {'ghq', 'qkl'}
...
localhost> lua t:transform(0, 1, 'zyx')
---
- 'zyx': {'abc', 'cde', 'efg', 'ghq', 'qkl'}
...
localhost> lua t:transform(-1, 1, 'zyx')
---
- 1: {'abc', 'cde', 'efg', 'ghq', 'zyx'}
...
localhost> lua t=box.insert(0, 'abc', 'def', 'abc')
---
...
localhost> lua t:find('abc')
---
- 0
...
localhost> lua t:findall('abc')
---
- 0
- 2
...
localhost> lua t:find(1, 'abc')
---
- 2
...
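The session shows several field-numbering rules. Fields count from 0, the end position of slice() is exclusive, and negative positions count from the end. A plain-Lua model can capture these rules. slice and transform below are hypothetical functions over an ordinary Lua array. They were inferred from the examples above rather than from Tarantool's source, and negative start positions for slice() are not modeled.

```lua
-- Element i+1 of `fields` holds tuple field i (fields are numbered from 0).
local function slice(fields, from, to)
  if to < 0 then to = #fields + to end  -- negative end counts from the end
  local out = {}
  for i = from, to - 1 do               -- end position is exclusive
    out[#out + 1] = fields[i + 1]
  end
  return out
end

-- Remove `len` fields starting at `offset`, then insert the new values there.
local function transform(fields, offset, len, ...)
  local out = {}
  for i = 1, #fields do out[i] = fields[i] end  -- tuples are immutable: copy
  if offset < 0 then offset = #out + offset end
  for _ = 1, len do
    table.remove(out, offset + 1)
  end
  local new = { ... }
  for j = #new, 1, -1 do
    table.insert(out, offset + 1, new[j])
  end
  return out
end
```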
Construct a new tuple from a Lua table or a scalar.
localhost> lua box.tuple.new({tonumber64('18446744073709551615'), 'string', 1})
---
- 18446744073709551615: {'string', 1}
...
This package is a container for all configured spaces. A space object provides access to space attributes, such as id, whether or not a space is enabled, space cardinality and estimated number of rows. It also contains object-oriented versions of box functions. For example, instead of box.insert(0, ...) one can write box.space[0]:insert(...).
Package source code is available in the file src/box/box.lua.
A list of all space members follows.
box.space[i].n == i
box.index with methods to search tuples and iterate over them in predefined order.
Select a range of tuples, starting from the offset specified by key. The key can be multipart. Limit selection to at most limit tuples. If key is nil or unspecified, the selection starts from the first key in the index.
For TREE indexes, this returns tuples in sorted order. For other indexes, the order of tuples is unspecified, and can change significantly if data is inserted or deleted between two calls to select_range().
limit, starting from key. The key can be multipart. A TREE index returns tuples in descending order. Other index types do not support this call.
box
methods.
A helper function to iterate over all space tuples, Lua style.
localhost> lua for k,v in box.space[0]:pairs() do print(v) end
---
1: {'hello'}
2: {'my '}
3: {'Lua '}
4: {'world'}
...
This package implements methods of type box.index. Indexes are contained in the box.space[i].index[] array within each space object. They provide an API for ordered iteration over tuples. This API is a direct binding to corresponding methods of index objects in the storage engine.
This method provides iteration support within an index. Parameter type is used to identify the semantics of iteration. Different index types support different iterators. The remaining arguments of the function vary and depend on the iteration type. For example, a TREE index maintains a strict order of keys and can return all tuples in ascending or descending order, starting from the specified key. Other index types, however, do not support ordering.
To understand the consistency of tuples returned by an iterator, it's essential to know the principles of the Tarantool transaction processing subsystem. An iterator in Tarantool does not own a consistent read view. Instead, each procedure is granted exclusive access to all tuples and spaces until it encounters a "context switch", caused by a write to disk or network, or by an explicit call to box.fiber.yield(). When the execution flow returns to the yielded procedure, the data set could have changed significantly. Iteration resumed after a yield point does not preserve the read view, but continues with the new content of the database.
- type: iteration strategy, as defined in the tables below.
This method returns an iterator closure, i.e. a function which can be used to get the next value on each invocation.
Errors: the selected iteration type is not supported by the subject index type, or the supplied parameters do not match the iteration type.
Table 4.1. Common iterator types
Type | Arguments | HASH | TREE | BITSET | Description |
---|---|---|---|---|---|
box.index.ALL | none | yes | yes | yes | Iterate over all tuples in an index. When iterating over a TREE index, tuples are returned in ascending order of the key. When iterating over a HASH or BITSET index, tuples are returned in physical order or, in other words, unordered. |
box.index.EQ | key | yes | yes | yes | Equality iterator: iterate over all tuples matching the key. Parts of a multipart key need to be separated by commas. Semantics of the match depend on the index. A HASH index only supports exact match: all parts of a key participating in the index must be provided. A TREE index also accepts a key prefix (only the first few parts of a key) for search; in this case, all tuples with the same prefix are considered matching the search criteria. A non-unique HASH index returns tuples in unspecified order. When a TREE index is not unique, or only part of a key is given as search criteria, matching tuples are returned in ascending order. BITSET indexes are always unique. |
box.index.GT | key | yes (*) | yes | no | Iterate over tuples strictly greater than the search key. For TREE indexes, a key prefix or key part can be sufficient. If the key is nil, iteration starts from the smallest key in the index. The tuples are returned in ascending order of the key. (*) A HASH index also supports this iterator type, but returns tuples in unspecified order. However, if the server does not receive updates, this iterator can be used to retrieve all tuples via a HASH index piece by piece, by supplying the last key from the previous range as the start key for an iterator over the next range. A BITSET index does not support this iteration type yet. |
Table 4.2. TREE iterator types
Type | Arguments | Description |
---|---|---|
box.index.REQ | key or key part | Reverse equality iterator. Equivalent to box.index.EQ, with the only distinction that the order of returned tuples is descending, not ascending. |
box.index.GE | key or key part | Iterate over all tuples for which the corresponding fields are greater than or equal to the search key. The tuples are returned in ascending order. Similarly to box.index.EQ, a key prefix or key part can be used to seed the iterator. If the key is nil, iteration starts from the smallest key in the index. |
box.index.LT | key or key part | Similar to box.index.GT, but returns all tuples which are strictly less than the search key. The tuples are returned in descending order of the key. A nil key can be used to start from the end of the index range. |
box.index.LE | key or key part | Similar to box.index.GE, but returns all tuples which are less than or equal to the search key or key prefix, in descending order, from biggest to smallest. If the key is nil, iteration starts from the end of the index range. |
Table 4.3. BITSET iterator types
Type | Arguments | Description |
---|---|---|
box.index.BITS_ALL_SET | bit mask | Matches tuples in which all specified bits are set. |
box.index.BITS_ANY_SET | bit mask | Matches tuples in which any of the specified bits is set. |
box.index.BITS_ALL_NOT_SET | bit mask | Matches tuples in which none of the specified bits is set. |
localhost> show configuration
---
...
space[0].enabled: "true"
space[0].index[0].type: "HASH"
space[0].index[0].unique: "true"
space[0].index[0].key_field[0].fieldno: "0"
space[0].index[0].key_field[0].type: "NUM"
space[0].index[1].type: "TREE"
space[0].index[1].unique: "false"
space[0].index[1].key_field[0].fieldno: "1"
space[0].index[1].key_field[0].type: "NUM"
space[0].index[1].key_field[1].fieldno: "2"
space[0].index[1].key_field[1].type: "NUM"
...
localhost> INSERT INTO t0 VALUES (1, 1, 0)
Insert OK, 1 rows affected
localhost> INSERT INTO t0 VALUES (2, 1, 1)
Insert OK, 1 rows affected
localhost> INSERT INTO t0 VALUES (3, 1, 2)
Insert OK, 1 rows affected
localhost> INSERT INTO t0 VALUES (4, 2, 0)
Insert OK, 1 rows affected
localhost> INSERT INTO t0 VALUES (5, 2, 1)
Insert OK, 1 rows affected
localhost> INSERT INTO t0 VALUES (6, 2, 2)
Insert OK, 1 rows affected
localhost> lua it = box.space[0].index[1]:iterator(box.index.EQ, 1); print(it(), " ", it(), " ", it());
---
1: {1, 0} 2: {1, 1} 3: {1, 2}
...
localhost> lua it = box.space[0].index[1]:iterator(box.index.EQ, 1, 2); print(it(), " ", it(), " ", it());
---
3: {1, 2} nil nil
...
localhost> lua it = box.space[0].index[1]:iterator(box.index.GE, 2, 1); print(it(), " ", it(), " ", it());
---
5: {2, 1} 6: {2, 2} nil
...
localhost> lua for v in box.space[0].index[1]:iterator(box.index.ALL) do print(v) end
---
1: {1, 0}
2: {1, 1}
3: {1, 2}
4: {2, 0}
5: {2, 1}
6: {2, 2}
...
localhost> lua i = box.space[0].index[0]:iterator(box.index.LT, 1);
---
error: 'Iterator type is not supported'
...
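The session above can be reproduced with a plain-Lua model of an iterator closure over an ordered two-part index. Everything here is hypothetical: entries and iterator stand in for the index object, and the strings 'ALL', 'EQ' and 'GE' stand in for the box.index constants. Only those three types are modeled. Key prefixes are compared part by part, as described for TREE indexes. Unlike the server, the sketch reports an unsupported type on the first call rather than at creation.

```lua
-- Compare only the first #key parts of a row with the key: -1, 0 or 1.
local function compare_prefix(row, key)
  for i = 1, #key do
    if row[i] < key[i] then return -1 end
    if row[i] > key[i] then return 1 end
  end
  return 0
end

-- Return a closure that yields the next matching row on every call, then nil.
local function iterator(entries, iter_type, ...)
  local key, pos = { ... }, 0
  return function()
    while true do
      pos = pos + 1
      local row = entries[pos]
      if row == nil then return nil end
      local c = compare_prefix(row, key)
      if iter_type == 'ALL' then
        return row
      elseif iter_type == 'EQ' then
        if c == 0 then return row end
        if c > 0 then return nil end   -- sorted: no further matches
      elseif iter_type == 'GE' then
        if c >= 0 then return row end
      else
        error('Iterator type is not supported')
      end
    end
  end
end

-- Index 1 from the session, sorted by key; each row is {field1, field2, field0}.
local entries = {
  { 1, 0, 1 }, { 1, 1, 2 }, { 1, 2, 3 },
  { 2, 0, 4 }, { 2, 1, 5 }, { 2, 2, 6 },
}

for row in iterator(entries, 'GE', 2, 1) do
  print(row[3])   -- prints 5, then 6
end
```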
Functions in this package allow you to create, run and manage existing fibers.
A fiber is an independent execution thread implemented using a mechanism of cooperative multitasking. A fiber has three possible states: running, suspended or dead. When a fiber is created with box.fiber.create(), it is suspended. When a fiber is started with box.fiber.resume(), it is running. When a fiber's control is yielded back to the caller with box.fiber.yield(), it is suspended. When a fiber ends (due to return or by reaching the end of the fiber function), it is dead.
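Tarantool fibers are not Lua coroutines, but, as noted earlier, the two are akin. The life cycle described above (created, running, yielded, dead) maps closely onto coroutine.status(). The plain-Lua snippet below records the state at each step. It only illustrates the analogy and does not use box.fiber.

```lua
-- Record the status of a coroutine at each stage of its life cycle.
local states = {}
local co
co = coroutine.create(function()
  states[#states + 1] = coroutine.status(co)  -- "running": it is executing now
  coroutine.yield()                          -- hand control back to the resumer
end)

states[#states + 1] = coroutine.status(co)    -- "suspended": created, not started
coroutine.resume(co)
states[#states + 1] = coroutine.status(co)    -- "suspended": after yield
coroutine.resume(co)
states[#states + 1] = coroutine.status(co)    -- "dead": the function returned

print(table.concat(states, ' '))  -- suspended running suspended dead
```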
A fiber can also be attached or detached. An attached fiber is a child of the creator, and is running only if the creator has called box.fiber.resume(). A detached fiber is a child of the Tarantool internal “sched” fiber, and gets scheduled only if there is a libev event associated with it. To detach, a running fiber must invoke box.fiber.detach(). A detached fiber loses connection with its parent forever.
All fibers are part of the fiber registry, box.fiber. This registry can be searched (box.fiber.find()) either by fiber id (fid), which is numeric, or by fiber name, which is a string. If there is more than one fiber with the given name, the first fiber that matches is returned.
Once the fiber function is done or calls return, the fiber is considered dead. Its carcass is put into a fiber pool, and can be reused when another fiber is created.
A runaway fiber can be stopped with box.fiber.cancel(). box.fiber.cancel(), however, is advisory: it works only if the runaway fiber calls box.fiber.testcancel() once in a while. Most box.* hooks, such as box.delete() or box.update(), call box.fiber.testcancel(). box.select() doesn't.
In practice, a runaway fiber can only become unresponsive if it does a lot of computations and doesn't check whether it's been canceled.
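Lua coroutines can illustrate why cancellation is advisory. In this hypothetical sketch, cancel() only sets a flag. The worker stops at the next point where it checks that flag, and that check plays the role of box.fiber.testcancel(). A loop that never checked the flag would keep running regardless. The sketch uses plain coroutines, not box.fiber, and make_worker is a made-up name.

```lua
-- Hypothetical sketch of advisory cancellation with a plain coroutine.
local function make_worker()
  local cancelled, steps = false, 0
  local worker = coroutine.create(function()
    while true do
      if cancelled then return 'cancelled' end  -- stands in for testcancel()
      steps = steps + 1                          -- one unit of work
      coroutine.yield()
    end
  end)
  local function cancel() cancelled = true end   -- only sets the flag
  local function get_steps() return steps end
  return worker, cancel, get_steps
end

local w, cancel, get_steps = make_worker()
coroutine.resume(w)
coroutine.resume(w)
cancel()                                  -- nothing stops yet...
local ok, result = coroutine.resume(w)    -- ...until the worker checks the flag
```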
The other potential problem comes from detached fibers which never get scheduled, because they are not subscribed to any events, or because no relevant events occur. Such fibers can be killed with box.fiber.cancel() at any time, since box.fiber.cancel() sends an asynchronous wakeup event to the fiber, and box.fiber.testcancel() is checked whenever such an event occurs.
Like all Lua objects, dead fibers are garbage collected: the garbage collector frees pool allocator memory owned by the fiber, resets all fiber data, and returns the fiber to the fiber pool.
box.fiber userdata object for the currently scheduled fiber.
Create a fiber for function. Can hit a recursion limit.
Yield control to the calling fiber, if the fiber is attached, or to sched otherwise. If the fiber is attached, whatever arguments are passed to this call are passed on to the calling fiber. If the fiber is detached, box.fiber.yield() returns everything passed into it.
time seconds. Only the current fiber can be made to sleep.
fiber. Running and suspended fibers can be canceled. Returns an error if the subject fiber does not permit cancel.
Learn session state, set on-connect and on-disconnect triggers.
A session is an object associated with each client connection. Through this module, it's possible to query session state, as well as set a Lua chunk executed on connect or disconnect event.
This module also makes it possible to define triggers on connect and disconnect events. Please see the triggers chapter for details.
Create a new communication channel with predefined capacity. The channel can be used to synchronously exchange messages between stored procedures. The channel is garbage collected when no one is using it, just like any other Lua object. Channels can be used with either functional or object-oriented syntax. For example, the following two lines are equivalent:
channel:put(message)
box.ipc.channel.put(channel, message)
Check if the channel is full and has writers waiting for empty room.
-- get_task() is a hypothetical helper standing in for, e.g., a box.select() call.
local channel = box.ipc.channel(10)

function consumer_fiber()
    while true do
        local task = channel:get()
        -- process the task
    end
end

function consumer2_fiber()
    while true do
        local task = channel:get(10) -- wait at most 10 seconds
        if task ~= nil then
            -- process the task
        else
            print("timeout!")
        end
    end
end

function producer_fiber()
    while true do
        local task = get_task()
        if channel:is_empty() then
            -- channel is empty
        end
        if channel:is_full() then
            -- channel is full
        end
        if channel:has_readers() then
            -- some fibers are waiting for data
        end
        if channel:has_writers() then
            -- some fibers are waiting for readers
        end
        channel:put(task)
    end
end

function producer2_fiber()
    while true do
        local task = get_task()
        if channel:put(task, 10) then -- wait at most 10 seconds
            -- the task was queued
        else
            print("timeout!")
        end
    end
end
BSD sockets are a mechanism to exchange data with a local or remote host in connection-oriented (TCP) or datagram-oriented (UDP) mode.
Semantics of the calls in the box.socket API closely follow semantics of the corresponding POSIX calls. Function names and signatures are mostly compatible with luasocket. Similarly to luasocket, box.socket doesn't throw exceptions on errors. On success, most calls return a socket object. On error, a multiple return of nil, status, errno, errstr is produced.
Status can be one of "error", "timeout", "eof" or "limit". On success, status is always nil.
A call which returns data (recv(), recvfrom(), readline()) on success returns a Lua string of the requested size and nil status. On error or timeout, an empty string is followed by the corresponding status, error number and message.
A call which sends data (send(), sendto()) on success returns the number of bytes sent, and the status is, again, nil. On error or timeout, 0 is returned, followed by status, error number and message.
The last error can be retrieved from the socket using socket:error(). Any call except error() clears the last error first (but may set a new one).
Calls which require a socket address, and in POSIX expect struct sockaddr_in, in box.socket simply accept host name and port as additional arguments. Name resolution is done automatically. If it fails, status is set to "error", errno is set to -1 and the error string is set to "Host name resolution failed".
All calls that can take time block the calling fiber and can get it preempted. The implementation, however, uses non-blocking cooperative I/O, so Tarantool continues processing queries while a call is blocked. A timeout can be provided for any socket call which can take a long time.
As with all other box libraries, the API can be used in procedural style (e.g. box.socket.close(socket)) as well as in object-oriented style (socket:close()).
A closed socket should not be used any more. A socket that is not closed explicitly is closed when its userdata is garbage collected by Lua.
Create a new TCP socket.
Returns a new socket or nil.
Create a new UDP socket.
Returns a new socket or nil.
Connect a socket to a remote host. Works with IPv4 and IPv6 addresses, as well as with domain names. If a domain name resolves to multiple addresses, each is tried in turn until a connection succeeds.
Returns a connected socket on success; nil, status, errno, errstr on error or timeout.
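For readers coming from C, the "try every resolved address" behaviour corresponds roughly to the classic POSIX getaddrinfo() loop below. This is a minimal blocking sketch, not Tarantool's implementation: connect_to_host() is a hypothetical name, and unlike box.socket it has no timeout and does not yield to other fibers.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: resolve host:port and try every returned
 * address (IPv4 or IPv6) in order until connect() succeeds.
 * Returns a connected descriptor, or -1 on failure. If name
 * resolution itself fails, *resolve_failed is set to 1.
 */
int connect_to_host(const char *host, const char *port, int *resolve_failed)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 and IPv6 results */
    hints.ai_socktype = SOCK_STREAM; /* TCP */
    *resolve_failed = 0;

    if (getaddrinfo(host, port, &hints, &res) != 0) {
        *resolve_failed = 1;         /* box.socket reports errno -1 here */
        return -1;
    }
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;                /* e.g. unsupported family: next one */
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                   /* connected */
        close(fd);
        fd = -1;                     /* failed: try the next address */
    }
    freeaddrinfo(res);
    return fd;
}
```

A caller checks resolve_failed first, mirroring box.socket's separate "Host name resolution failed" result, and otherwise treats -1 as a connection failure.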
Send data over a connected socket.
Returns the number of bytes sent. On success, this is exactly the length of data. In case of error or timeout, returns the number of bytes sent before the error, followed by status, errno, errstr.
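The partial-write contract (report how many bytes went out before the failure) is the same one a C program gets by looping over POSIX send(). The sketch below is a blocking approximation with a hypothetical send_all() helper; box.socket additionally applies a timeout and yields the fiber while the socket is not writable, which this code does not model.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper mirroring the send() contract above: keep
 * calling send() until all `len` bytes are written or an error
 * occurs. Returns the number of bytes actually sent; *err is 0 on
 * success, otherwise the errno value of the failing call.
 */
size_t send_all(int fd, const char *data, size_t len, int *err)
{
    size_t sent = 0;

    *err = 0;
    while (sent < len) {
        ssize_t n = send(fd, data + sent, len - sent, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            *err = errno;        /* e.g. EAGAIN, EPIPE, ECONNRESET */
            break;               /* report what was sent so far */
        }
        sent += (size_t)n;
    }
    return sent;
}
```

Note that writing to a peer that has closed its end raises SIGPIPE by default; a real program would ignore that signal or pass MSG_NOSIGNAL where the platform supports it.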
Read size bytes from a connected socket. An internal read-ahead buffer is used to reduce the cost of this call.
Returns a string of the requested length on success. On error or timeout, returns an empty string, followed by status, errno, errstr.
If some data was read before a timeout occurred, it will be available on the next call.
If the writing side has closed its end, returns the remainder read from the socket (possibly an empty string), followed by the "eof" status.
Read a line from a connected socket.
socket:readline() with no arguments reads data from a socket until '\n' or eof. If a limit is set, the call reads data until a separator is found or the limit is reached. By default, there is no limit.
Instead of the default separator, a Lua table with one or more separators can be used. The data is then read until the first matching separator is found.
Returns a Lua string with data on success, or an empty string on error. When a separator table is provided, the matched separator is returned as the third value.
Table 4.4. readline() returns
Return values | Outcome |
---|---|
data, nil, separator | success |
"", "timeout", ETIMEDOUT, errstr | timeout |
"", "error", errno, errstr | error |
data, "limit" | limit |
data, "eof" | eof |
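The separator and limit rules can be illustrated with a small buffer scan in C. This is a sketch under stated assumptions, not box.socket's code: rl_scan() and enum rl_status are hypothetical, the returned length is assumed to include the separator, timeouts, errors and eof are left to the caller that refills the buffer, and when two separators start at the same offset the one listed first wins.

```c
#include <stddef.h>
#include <string.h>

/* Scan outcomes; timeouts, errors and eof belong to the caller,
 * which reads from the socket and refills the buffer. */
enum rl_status { RL_FOUND, RL_LIMIT, RL_NEED_MORE };

/*
 * Hypothetical helper: look for the earliest separator in buf[0..len).
 * On RL_FOUND, *consumed is the line length including the separator
 * and *which is the separator's index in seps[]. On RL_LIMIT,
 * *consumed equals limit. limit == 0 means "no limit".
 */
enum rl_status rl_scan(const char *buf, size_t len,
                       const char *const *seps, size_t nseps,
                       size_t limit, size_t *consumed, size_t *which)
{
    size_t end = (limit != 0 && limit < len) ? limit : len;

    for (size_t pos = 0; pos < end; pos++) {
        for (size_t i = 0; i < nseps; i++) {
            size_t slen = strlen(seps[i]);
            if (slen > 0 && pos + slen <= end &&
                memcmp(buf + pos, seps[i], slen) == 0) {
                *consumed = pos + slen;
                *which = i;
                return RL_FOUND;
            }
        }
    }
    if (limit != 0 && len >= limit) {
        *consumed = limit;       /* hand back `limit` bytes: "limit" */
        return RL_LIMIT;
    }
    return RL_NEED_MORE;         /* no separator yet: read more data */
}
```

A caller keeps reading from the socket into the buffer while rl_scan() returns RL_NEED_MORE, and returns whatever remains with the "eof" status if the peer closes its end first.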
Bind a socket to the given host/port. After binding, a UDP socket can be used to receive data (see recvfrom()). A TCP socket can be used to accept new connections after it has been put in listen mode.
The timeout is used for name resolution only. If the host name is an IP address, the call never yields and the timeout is unused.
Returns the socket object on success; nil, status, errno, errstr on error.
Start listening for incoming connections. On Linux, the listen backlog is taken from /proc/sys/net/core/somaxconn, whereas on BSD it is set to SOMAXCONN.
Returns the socket on success; nil, "error", errno, errstr on error.
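A rough C analogue of this backlog rule reads /proc/sys/net/core/somaxconn when the file exists and falls back to the SOMAXCONN constant otherwise. listen_backlog() is a hypothetical helper sketched for illustration; it is not taken from the Tarantool source.

```c
#include <stdio.h>
#include <sys/socket.h>   /* SOMAXCONN */

/*
 * Hypothetical helper: prefer the kernel's runtime limit (Linux),
 * otherwise use the compile-time SOMAXCONN value (e.g. on BSD).
 */
int listen_backlog(void)
{
    int backlog = SOMAXCONN;
    FILE *f = fopen("/proc/sys/net/core/somaxconn", "r");

    if (f != NULL) {
        int value;
        if (fscanf(f, "%d", &value) == 1 && value > 0)
            backlog = value;
        fclose(f);
    }
    return backlog;
}
```

A server would then call listen(fd, listen_backlog()).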
Wait for a new client connection and create a connected socket.
Returns peer_socket, nil, peer_host, peer_port on success; nil, status, errno, errstr on error.
Send a message on a UDP socket to a specified host.
Returns the number of bytes sent on success; 0, status, errno, errstr on error or timeout.
Receive a message on a UDP socket.
Returns message, nil, client address, client port on success; "", status, errno, errstr on error or timeout.
Shut down the reading end, the writing end, or both ends of a socket. Accepts box.socket.SHUT_RD, box.socket.SHUT_WR and box.socket.SHUT_RDWR.
Returns the socket on success; nil, "error", errno, errstr on error.
Close (destroy) a socket. A closed socket should not be used any more.
Retrieve the last error that occurred on a socket.
Returns errno, errstr, or 0, "Success" if there is no error.
This package provides read-only access to all server configuration parameters.
Package box.info
This package provides access to information about server variables: pid, uptime, version and such. Its contents are identical to the output of SHOW INFO.
Since the contents of box.info are dynamic, it's not possible to iterate over its keys with the Lua pairs() function. For this purpose, box.info() builds and returns a Lua table with all keys and values provided in the package.
localhost> lua for k,v in pairs(box.info()) do print(k, ": ", v) end
---
version: 1.4.7-92-g4ba95ca
status: primary
pid: 1747
lsn: 1712
recovery_last_update: 1306964594.980
recovery_lag: 0.000
uptime: 39
build: table: 0x419cb880
logger_pid: 1748
config: /home/unera/work/tarantool/test/box/tarantool_good.cfg
...
localhost> lua box.info.pid
---
 - 1747
...
localhost> lua box.info.logger_pid
---
 - 1748
...
localhost> lua box.info.version
---
 - 1.4.7-92-g4ba95ca
...
localhost> lua box.info.config
---
 - /home/unera/work/tarantool/test/box/tarantool_good.cfg
...
localhost> lua box.info.uptime
---
 - 3672
...
localhost> lua box.info.lsn
---
 - 1712
...
localhost> lua box.info.status
---
 - primary
...
localhost> lua box.info.recovery_lag
---
 - 0.000
...
localhost> lua box.info.recovery_last_update
---
 - 1306964594.980
...
localhost> lua box.info.snapshot_pid
---
 - 0
...
localhost> lua for k, v in pairs(box.info.build) do print(k .. ': ', v) end
---
flags: -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -DCORO_ASM -fno-omit-frame-pointer -fno-stack-protector -fexceptions -funwind-tables -fgnu89-inline -pthread -Wno-sign-compare -Wno-strict-aliasing -std=gnu99 -Wall -Wextra -Werror
target: Linux-x86_64-Debug
compiler: /usr/bin/gcc
options: cmake . -DCMAKE_INSTALL_PREFIX=/usr/local -DENABLE_STATIC=OFF -DENABLE_GCOV=OFF -DENABLE_TRACE=ON -DENABLE_BACKTRACE=ON -DENABLE_CLIENT=OFF
...
Package box.slab
This package provides access to slab allocator statistics.
localhost> lua box.slab.arena_used
---
 - 4194304
...
localhost> lua box.slab.arena_size
---
 - 104857600
...
localhost> lua for k, v in pairs(box.slab.slabs) do print(k) end
---
64
128
...
localhost> lua for k, v in pairs(box.slab.slabs[64]) do print(k, ':', v) end
---
items:1
bytes_used:160
item_size:64
slabs:1
bytes_free:4194144
...
This package provides access to request statistics.
localhost> lua box.stat -- a virtual table
---
 - table: 0x41a07a08
...
localhost> lua box.stat() -- a full table (the same)
---
 - table: 0x41a0ebb0
...
localhost> lua for k, v in pairs(box.stat()) do print(k) end
---
DELETE
SELECT
REPLACE
CALL
UPDATE
DELETE_1_3
...
localhost> lua for k, v in pairs(box.stat().DELETE) do print(k, ': ', v) end
---
total: 23210
rps: 22
...
localhost> lua for k, v in pairs(box.stat.DELETE) do print(k, ': ', v) end -- the same
---
total: 23210
rps: 22
...
localhost> lua for k, v in pairs(box.stat.SELECT) do print(k, ': ', v) end
---
total: 34553330
rps: 23
...
localhost>
Additional examples can be found in the open source Lua stored procedures repository and in the server test suite.
Stored procedure support has two limitations one should be aware of: execution atomicity and lack of typing.
The Tarantool core is built around a cooperative multitasking paradigm: unless a running fiber deliberately yields control to some other fiber, it is not preempted. “Yield points” are built into all calls from the Tarantool core to the operating system. Any system call that can block is performed asynchronously, and the fiber waiting on the system call is preempted by a fiber that is ready to run. This model makes all programmatic locks unnecessary: cooperative multitasking ensures that there is no concurrency around a resource, no race conditions and no memory consistency issues.
When requests are small, e.g. simple UPDATE, INSERT, DELETE, SELECT, fiber scheduling is fair: it takes only a little time to process the request, schedule a disk write, and yield to a fiber serving the next client.
A stored procedure, however, can perform complex computations,
or be written in such a way that control is not given away for a
long time. This can lead to unfair scheduling, when a single
client throttles the rest of the system, or to apparent stalls
in request processing.
Avoiding this situation is the responsibility of the stored procedure author. Most box calls, such as box.insert(), box.update() and box.delete(), are yield points; box.select() and box.select_range(), however, are not.
Note also that, in the absence of transactions, any yield in a stored procedure is a potential change in the database state. Effectively, only CAS-like (compare-and-swap) atomic stored procedures are possible, i.e. procedures that select and then modify a record. A procedure that issues multiple data change requests always passes through a built-in yield point.
When invoking a stored procedure from the binary protocol, it's not possible to convey the types of arguments. Tuples are type-agnostic. The conventional workaround is to use strings to pass all (textual and numeric) data.
Triggers are Lua scripts invoked by the system upon a certain event. Tarantool currently only supports system-wide triggers, run when a new connection is established or dropped. Since a trigger body is a Lua script, it is external to the server, and a trigger must be set up on each server start. This is most commonly done in the initialization script. Once a trigger for an event exists, it is automatically invoked whenever the event occurs. As long as no trigger is defined, the performance overhead is minimal: merely a pointer dereference and check. If a trigger is defined, its overhead is equivalent to the overhead of calling a stored procedure.
Set a callback (trigger) invoked on each connected session. The callback doesn't get any arguments, but is the first thing invoked in the scope of the newly created session. If the trigger fails by raising an error, the error is sent to the client and the connection is shut down. Returns the old value of the trigger.
If a trigger always results in an error, it may become impossible to connect to the server to reset it.
To set up replication, it's necessary to prepare the master, configure a replica, and establish procedures for recovery from a degraded state.
A replica gets all updates from the master by continuously fetching and applying its write ahead log (WAL). Each record in the WAL represents a single Tarantool command, such as INSERT, UPDATE, DELETE and is assigned a monotonically growing log sequence number (LSN). In essence, Tarantool replication is row-based: all data change commands are fully deterministic and operate on a single record.
A stored procedure invocation does not enter the write ahead log. Instead, log events for the actual UPDATEs and DELETEs performed by the Lua code are written to the log. This ensures that possible non-determinism of Lua does not cause replication to go out of sync.
For replication to work correctly, the latest LSN on the replica must match or fall behind the latest LSN on the master. If the replica has its own updates, it gets out of sync, since updates from the master that have identical LSNs are not applied. For this reason, if replication is ON, Tarantool does not accept updates, even on its primary_port.
To prepare the master for connections from replicas, it's only necessary to enable replication_port in the configuration file. An example configuration file can be found in test/box_replication/cfg/master.cfg. A master with replication_port enabled can accept connections from as many replicas as necessary on that port. Each replica has its own replication state.
The server, master or replica, always requires a valid snapshot file to boot from. For a master, it's usually prepared with the --init-storage option; for replicas, it's usually copied from the master.
To start replication, configure replication_source. Other parameters can also be changed, but existing spaces and their primary keys on the replica must be identical to those on the master.
Once connected to the master, the replica requests all changes that happened after the latest local LSN. It is therefore necessary to keep WAL files on the master host as long as there are replicas that haven't applied them yet. An example configuration can be found in test/box_replication/cfg/replica.cfg.
In the absence of the required WALs, a replica can be "re-seeded" at any time with a newer snapshot file, manually copied from the master.
Replication parameters are "dynamic", which allows the replica to become a master and vice versa with the help of the RELOAD CONFIGURATION statement.
"Degraded state" is a situation when the master becomes unavailable -- either due to hardware or network failure, or a programming bug. There is no reliable way for a replica to detect that the master is gone for good, since sources of failure and replication environments vary significantly.
A separate monitoring script (or scripts, if a decision-making quorum is desirable) is necessary to detect a master failure. Such a script would typically try to update a tuple in an auxiliary space on the master, and raise an alarm if a network or disk error persists longer than is acceptable.
When a master failure is detected, the following needs to be done:
First and foremost, make sure that the master does not accept updates. This is necessary to prevent the situation where, should the master failure turn out to be transient, some updates still go to the master while others already end up on the replica.
If the master is available, the easiest way to turn on read-only mode is to turn Tarantool into a replica of itself. This can be done by setting master's replication_source to point to self.
If the master is not available, the best bet is to log into the machine and kill the server, or change the machine's network configuration (DNS, IP address).
If the machine is not available, it's perhaps prudent to power it off.
Record the replica's LSN by issuing SHOW INFO. This LSN may prove useful if there are updates on the master that never reached the replica.
Promote the replica to master. This is done by setting replication_source on the replica to an empty string.
Change the application configuration to point to the new master. This can be done either by changing the application's internal routing table, or by setting up old master's IP address on the new master's machine, or using some other approach.
Recover the old master. If there are updates that didn't make it to the new master, they have to be applied manually. You can use the --cat option to read server logs.
Typical server administration tasks include starting and stopping the server, reloading configuration, taking snapshots and log rotation.
The server is configured to shut down gracefully on SIGTERM, SIGINT (keyboard interrupt) or SIGHUP. SIGUSR1 can be used to save a snapshot. All other signals are blocked or ignored. The signals are processed in the main event loop. Thus, if the control flow never reaches the event loop (for example, because of a runaway stored procedure), the server stops responding to any signal and can only be killed with SIGKILL (this signal cannot be caught or ignored).
This chapter provides a cheatsheet for the most common server management routines on every supported operating system.
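The reason a runaway procedure blocks signal handling is that signal handlers conventionally only set flags, and the flags are acted upon when the event loop comes around. The C sketch below illustrates that pattern with POSIX sigaction(); it is not Tarantool's code, the function and variable names are hypothetical, and the snapshot call is left as a comment.

```c
#define _POSIX_C_SOURCE 200112L
#include <signal.h>
#include <string.h>

/* Set by the handler, consumed by the event loop (hypothetical names). */
static volatile sig_atomic_t shutdown_requested;
static volatile sig_atomic_t snapshot_requested;

/* Async-signal-safe: only records which signal arrived. */
static void on_signal(int signo)
{
    if (signo == SIGUSR1)
        snapshot_requested = 1;
    else
        shutdown_requested = 1;  /* SIGTERM, SIGINT or SIGHUP */
}

int install_handlers(void)
{
    const int sigs[] = { SIGTERM, SIGINT, SIGHUP, SIGUSR1 };
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof(sigs) / sizeof(sigs[0]); i++)
        if (sigaction(sigs[i], &sa, NULL) != 0)
            return -1;
    return 0;
}

/*
 * Called once per event-loop iteration. If control never returns
 * here (e.g. a runaway stored procedure), the requests simply wait.
 * Returns 1 when the server should shut down.
 */
int poll_signals(void)
{
    if (snapshot_requested) {
        snapshot_requested = 0;
        /* save_snapshot();  -- hypothetical */
    }
    return shutdown_requested != 0;
}
```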
Setting up an instance: ln -s /etc/tarantool/instances.available/instance-name.cfg /etc/tarantool/instances.enabled/
Starting all instances: service tarantool start
Stopping all instances: service tarantool stop
Starting/stopping one instance: service tarantool-instance-name start/stop
This chapter provides a reference of options that can be set on the command line or in the tarantool.cfg configuration file.
Tarantool splits its configuration parameters between command line options and a configuration file. Command line flags are provided for the most basic properties only: the rest must be set in the configuration file. At runtime, this makes it possible to disambiguate the source of a configuration setting: it unequivocally comes either from the command line or from the configuration file, but never from both.
Tarantool follows the GNU standard for its command line interface: long options start with a double dash (--option), their short counterparts use a single one (-o). In multi-word option names, both dashes and underscores can be used as word separators (--cfg-get and --cfg_get both work). If an option requires an argument, it can be separated from the option either with a space or with an equals sign (--cfg-get=pid_file and --cfg-get pid_file both work).
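These spelling rules can be captured in a few lines of C. The helper below is a hypothetical sketch for options that take an argument, not Tarantool's actual option parser: it treats '-' and '_' as equal after the option prefix and accepts the argument either after '=' or as the next command-line word.

```c
#include <stdbool.h>
#include <string.h>

/* '-' and '_' are interchangeable inside an option name. */
static bool same_char(char a, char b)
{
    if (a == '_')
        a = '-';
    if (b == '_')
        b = '-';
    return a == b;
}

/*
 * Hypothetical helper for an option that takes an argument.
 * `name` is the canonical spelling, e.g. "--cfg-get". Accepts
 * "--cfg_get=pid_file" as well as "--cfg-get pid_file"; in the
 * second form the argument is taken from argv[*i + 1] and *i is
 * advanced past it. *value is NULL if the argument is missing.
 */
bool match_option(int argc, char **argv, int *i,
                  const char *name, const char **value)
{
    const char *arg = argv[*i];
    size_t n = strlen(name);

    for (size_t k = 0; k < n; k++) {
        /* the "--" prefix itself is compared literally */
        bool ok = k < 2 ? arg[k] == name[k] : same_char(arg[k], name[k]);
        if (arg[k] == '\0' || !ok)
            return false;
    }
    if (arg[n] == '=') {
        *value = arg + n + 1;
    } else if (arg[n] == '\0') {
        *value = *i + 1 < argc ? argv[++*i] : NULL;
    } else {
        return false;            /* a longer, different option */
    }
    return true;
}
```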
Print an annotated list of all available options and exit.
Print product name and version, for example:
$ ./tarantool_box --version
Tarantool 1.4.0-69-g45551dd
In this example:
“Tarantool” is the name of the reusable asynchronous networking programming framework.
“Box” is the name of the storage back-end.
The 3-number version follows the standard <major>.<minor>.<patch> scheme, in which <major> is changed only rarely, <minor> is incremented for each new milestone and indicates possible incompatible changes, and <patch> stands for the number of bug fix releases made after the start of the milestone. The optional commit number and commit SHA1 are output for non-released versions only, and indicate how much this particular build has diverged from the last release.
Tarantool uses git describe to produce its version id, and this id can be used at any time to check out the corresponding source from our git repository.
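Because the version id has a fixed shape, it is easy to take apart programmatically, for example in a deployment script that checks the running build. A hypothetical C helper using sscanf() might look like this; it assumes the abbreviated SHA1 is lowercase hex, as git prints it.

```c
#include <stdio.h>

/* A version id such as "1.4.0-69-g45551dd", split into parts. */
struct version {
    int major, minor, patch;
    int commits;   /* commits since the release tag, 0 for releases */
    char sha[41];  /* abbreviated commit SHA1, "" for releases */
};

/*
 * Hypothetical helper. Returns 0 for a release id ("1.4.0") or a
 * development id ("1.4.0-69-g45551dd"), -1 for anything else.
 */
int parse_version(const char *id, struct version *v)
{
    int n;

    v->commits = 0;
    v->sha[0] = '\0';
    n = sscanf(id, "%d.%d.%d-%d-g%40[0123456789abcdef]",
               &v->major, &v->minor, &v->patch, &v->commits, v->sha);
    /* 3 fields: a released version; 5 fields: a development build */
    return (n == 3 || n == 5) ? 0 : -1;
}
```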
--config=/path/to/config.file, -c
Tarantool does not start without a configuration file. By default, the server looks for a file named tarantool.cfg in the current working directory. An alternative location can be provided using this option.
--check-config
Check the configuration file for errors. This option is normally used on the command line before “reload configuration” is issued on the administrative port, to ensure that the new configuration is valid. When the configuration is indeed correct, the program produces no output and returns 0. Otherwise, information about the discovered error is printed out and the program terminates with a non-zero exit status.
--cfg-get=option_name
Given an option name, print the option value. If the option does not exist, or the configuration file is incorrect, an error is returned. If the option is not explicitly specified, its default value is used instead. Example:
$ ./tarantool_box --cfg-get=admin_port
33015
Initialize the directory specified in the vardir configuration option by creating an empty snapshot file in it. If vardir doesn't contain at least one snapshot, the server does not start. There is no “magic” automatic initialization of vardir on boot, to make potential system errors more noticeable. For example, if the operating system reboots and fails to mount the partition on which vardir is expected to reside, the rc.d or service script responsible for server restart will also fail, thanks to this option.
The only two options which have effect on a running server are:
--verbose, -v
Increase verbosity level in log messages. This option currently has no effect.
--background, -b
Detach from the controlling terminal and run in the background. Tarantool uses stdout and stderr for debug and error log output. When starting the server with the --background option, make sure to either redirect its standard output and standard error streams, or provide the logger option in the configuration file, since otherwise all logging information will be lost.
All advanced configuration parameters must be specified in a configuration file, which is required for server start. If no path to the configuration file is specified on the command line (see --config), the server looks for a file named tarantool.cfg in the current working directory.
To facilitate centralized and automated configuration management, runtime configuration modifications are supported solely through the RELOAD CONFIGURATION administrative statement. Thus, the procedure to change the Tarantool configuration at runtime is to edit the configuration file and issue RELOAD CONFIGURATION. This ensures that, should the server get killed or restart, no unexpected changes to the configuration can occur.
Not all configuration file settings can be changed at runtime: the Dynamic? column of each table in this reference shows which ones can. If the same setting is given more than once, the latest occurrence takes effect. You can always invoke SHOW CONFIGURATION from the administrative console to show the current configuration.
Tarantool maintains the set of all allowed configuration parameters in two template files, which are easy to maintain and extend: cfg/core_cfg.cfg_tmpl and src/box/box_cfg.cfg_tmpl. These files can always be used as a reference for any parameter in this manual.
In addition, two working examples can be found in the source tree: test/box/tarantool.cfg and test/box_big/tarantool.cfg.
Table 7.1. Basic parameters
Name | Type | Default | Required? | Dynamic? | Description |
---|---|---|---|---|---|
username | string | "" | no | no | UNIX user name to switch to after start. |
work_dir | string | "" | no | no | A directory to switch to with chdir(2) after start. Can be relative to the starting directory. If not specified, the current working directory of the server is the same as the starting directory. |
wal_dir | string | "" | no | no | A directory to store the write ahead log files (WAL) in. Can be relative to work_dir. You may choose to separate your snapshots and logs and store them on separate disks. This is how this parameter is most commonly used. If not specified, defaults to work_dir. |
snap_dir | string | "" | no | no | A directory to store snapshots in. Can be relative to work_dir. If not specified, defaults to work_dir. See also wal_dir. |
bind_ipaddr | string | "INADDR_ANY" | no | no | The network interface to bind to. By default, the server binds to all available addresses. Applies to all ports opened by the server. |
primary_port | integer | none | yes | no | The read/write data port. Has no default value, so must be specified in the configuration file. Normally set to 33013. Note: a replica also binds to this port and accepts connections, but these connections can only serve reads until the replica becomes a master. |
secondary_port | integer | none | no | no | Additional, read-only port. Normally set to 33014. Not used unless it is set. |
admin_port | integer | none | no | no | The TCP port to listen on for administrative connections. Has no default value. Not used unless assigned a value. Normally set to 33015. |
pid_file | string | tarantool.pid | no | no | Store the process id in this file. Can be relative to work_dir. |
custom_proc_title | string | "" | no | no | Inject the given string into the server process title (what's shown in the COMMAND column of the ps and top commands). For example, an unmodified Tarantool process group looks like: kostja@shmita:~$ ps -a -o command \| grep box tarantool_box: primary pri:33013 sec:33014 adm:33015 After "sessions" custom_proc_title is injected, it looks like: kostja@shmita:~$ ps -a -o command \| grep box tarantool_box: primary@sessions pri:33013 sec:33014 adm:33015 |
Table 7.2. Configuring the storage
Name | Type | Default | Required? | Dynamic? | Description |
---|---|---|---|---|---|
slab_alloc_arena | float | 1.0 | no | no | How much memory Tarantool allocates to actually store tuples, in gigabytes. When the limit is reached, INSERT or UPDATE requests begin failing with error ER_MEMORY_ISSUE. While the server does not go beyond the defined limit to allocate tuples, there is additional memory used to store indexes and connection information. Depending on actual configuration and workload, Tarantool can consume up to 20-40% more than the limit set here. |
slab_alloc_minimal | integer | 64 | no | no | Size of the smallest allocation unit. It can be tuned down if most of the tuples are very small. |
slab_alloc_factor | float | 2.0 | no | no | Use slab_alloc_factor as the multiplier for computing the sizes of memory chunks that tuples are stored in. A lower value may result in less wasted memory depending on the total amount of memory available and the distribution of item sizes. |
space | array of objects | none | yes | no | This is the main Tarantool parameter, describing the data structure that users get access to via client/server protocol. It holds an array of entries, and each entry represents a tuple set served by the server. Every entry is a composite object, best seen as a C programming language "struct" [a]. |
[a] Space settings explained
Space is a composite parameter, i.e. it has properties.
/*
 * Each tuple consists of fields. Three field types are
 * supported.
 */
enum field_type { STR, NUM, NUM64 };
/*
 * Tarantool is interested in field types only inasmuch as
 * it needs to build indexes on fields. An index
 * can cover one or more fields.
 */
struct index_field_t {
  unsigned int fieldno;
  enum field_type type;
};
/*
 * HASH and TREE index types are supported.
 */
enum index_type { HASH, TREE };
struct index_t {
  struct index_field_t key_field[];
  enum index_type type;
  /* Secondary index may be non-unique */
  bool unique;
};
struct space_t {
  /* A space can be quickly disabled and re-enabled at run time. */
  bool enabled;
  /*
   * If given, each tuple in the space must have exactly
   * this many fields.
   */
  unsigned int cardinality;
  /* Only used for HASH indexes, to preallocate memory. */
  unsigned int estimated_rows;
  struct index_t index[];
};
The way a space is defined in a configuration file is similar to how you would initialize a C structure in a program. For example, a minimal storage configuration looks like this:
space[0].enabled = 1
space[0].index[0].type = HASH
space[0].index[0].unique = 1
space[0].index[0].key_field[0].fieldno = 0
space[0].index[0].key_field[0].type = NUM64
The parameters listed above are mandatory. Other space properties are set in the same way. An alternative syntax, mainly useful when defining large spaces, exists:
space[0] = {
  enabled = 1,
  index = [
    {
      type = HASH,
      key_field = [
        { fieldno = 0, type = NUM64 }
      ]
    }
  ]
}
When defining a space, please be aware of these restrictions:
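The interplay of slab_alloc_minimal and slab_alloc_factor in Table 7.2 can be estimated with a little arithmetic. The C sketch below assumes that each size class is the previous one multiplied by slab_alloc_factor, rounded up to 8 bytes and at least 8 bytes larger; the exact rounding in Tarantool's allocator is not described here and may differ, and slab_class_size() is a hypothetical helper. With the defaults (64, 2.0) the classes are 64, 128, 256, ... bytes, which matches the 64 and 128 classes visible in the box.slab output earlier.

```c
#include <stddef.h>

/*
 * Assumed model (illustrative only): class 0 holds items of
 * `minimal` bytes; each next class is the previous size times
 * `factor`, rounded up to a multiple of 8 and at least 8 bytes
 * larger. Returns the item size of the smallest class that fits
 * `item_size`, or 0 if none of the first `max_classes` does.
 */
size_t slab_class_size(size_t item_size, size_t minimal, double factor,
                       int max_classes)
{
    size_t size = minimal;

    for (int i = 0; i < max_classes; i++) {
        if (item_size <= size)
            return size;
        size_t next = (size_t)((double)size * factor);
        if (next < size + 8)
            next = size + 8;              /* always make progress */
        size = (next + 7) & ~(size_t)7;   /* round up to 8 bytes */
    }
    return 0;
}
```

Under this model a 65-byte tuple lands in the 128-byte class and wastes 63 bytes, while with slab_alloc_factor = 1.05 it lands in a 72-byte class, which is why a lower factor can reduce wasted memory.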
Table 7.3. Binary logging and snapshots
Name | Type | Default | Required? | Dynamic? | Description |
---|---|---|---|---|---|
panic_on_snap_error | boolean | true | no | no | If there is an error reading the snapshot (at server start), abort. |
panic_on_wal_error | boolean | false | no | no | If there is an error reading from a write ahead log (at server start), abort. |
rows_per_wal | integer | 500000 | no | no | How many log records to store in a single write ahead log file. When this limit is reached, Tarantool creates another WAL file named <first-lsn-in-wal>.wal. This can be useful for simple rsync-based backups. |
snap_io_rate_limit | float | 0.0 | no | yes | Reduce the throttling effect of SAVE SNAPSHOT on the INSERT/UPDATE/DELETE performance by setting a limit on how many megabytes per second it can write to disk. The same can be achieved by splitting wal_dir and snap_dir locations and moving snapshots to a separate disk. |
wal_fsync_delay | float | 0 | no | yes | Do not flush the write ahead log to disk more often than once in wal_fsync_delay seconds. By default the delay is zero, that is, the write ahead log is flushed after every write. Setting the delay may be necessary to increase write throughput, but may lead to the last few updates being lost in case of a power failure. Such a failure, however, does not lead to data corruption: all WAL records have a checksum, and only complete records are processed during recovery. |
wal_mode | string | "fsync_delay" | no | yes | Specify fiber-WAL-disk synchronization mode as: none: write ahead log is not maintained; write: fibers wait for their data to be written to the write ahead log (no fsync(2)); fsync: fibers wait for their data, fsync(2) follows each write(2); fsync_delay: fibers wait for their data, fsync(2) is called every N=wal_fsync_delay seconds (N=0.0 means no fsync(2) - equivalent to wal_mode = "write"); |
Table 7.4. Replication
Name | Type | Default | Required? | Dynamic? | Description |
---|---|---|---|---|---|
replication_port | integer | 0 | no | no | Replication port. If non-zero, Tarantool listens on the given port for incoming connections from replicas. See also replication_source, which complements this setting on the replica side. |
replication_source | string | NULL | no | yes | Pair ip:port describing the master. If not empty, replication is on, and Tarantool does not accept updates on primary_port. This parameter is dynamic, that is, to enter master mode, simply set the value to an empty string and issue RELOAD CONFIGURATION. |
Table 7.5. Networking
Name | Type | Default | Required? | Dynamic? | Description |
---|---|---|---|---|---|
io_collect_interval | float | 0.0 | no | yes | If non-zero, a sleep of the given duration is injected between iterations of the event loop. Can be used to reduce CPU load in deployments in which the number of client connections is large, but requests are not so frequent (for example, each connection issuing just a handful of requests per second). |
readahead | integer | 16384 | no | no | The size of the read-ahead buffer associated with a client connection. The larger the buffer, the more memory an active connection consumes and the more requests can be read from the operating system buffer in a single system call. The rule of thumb is to make sure the buffer can contain at least a few dozen requests. Therefore, if a typical tuple in a request is large, e.g. a few kilobytes or even megabytes, the readahead buffer should be increased. If batched request processing is not used, it's prudent to leave this setting at its default. |
backlog | integer | 1024 | no | no | The size of listen backlog. |
Table 7.6. Logging
Name | Type | Default | Required? | Dynamic? | Description |
---|---|---|---|---|---|
log_level | integer | 4 | no | yes | How verbose the logging is. There are 5 log verbosity classes: 1 -- ERROR, 2 -- CRITICAL, 3 -- WARNING, 4 -- INFO, 5 -- DEBUG. By setting log_level, you can enable logging of all classes below or equal to the given level. Tarantool prints its logs to the standard error stream by default, but this can be changed with "logger" configuration parameter. |
logger | string | "" | no | no | By default, the log is sent to the standard error stream (stderr). If this option is given, Tarantool creates a child process, executes the given command in it, and pipes its log output to the standard input of the created process. Example setting: tee --append tarantool.log (this will duplicate log output to stdout and a log file). |
logger_nonblock | integer | 0 | no | no | If this option is given, Tarantool does not block on the log file descriptor when it's not ready for write, and drops the message instead. If log_level is high, and a lot of messages go to the log file, setting this option to 1 may improve logging performance at the cost of some log messages getting lost. |
too_long_threshold | float | 0.5 | no | yes | If processing a request takes longer than the given value (in seconds), warn about it in the log. Has effect only if log_level is no less than 3 (WARNING). |
Table 7.7. Memcached protocol support
Name | Type | Default | Required? | Dynamic? | Description |
---|---|---|---|---|---|
memcached_port | integer | none | no | no | Turn on Memcached protocol support on the given port. All requests on this port are directed to a dedicated space, set in memcached_space. Memcached-style flags are supported and stored along with the value. The expiration time can also be set and is persistent, but is ignored unless memcached_expire is turned on. Unlike Memcached, all data still goes to the binary log and to the replica, if one is set up, which means that a power outage does not lead to loss of all data. Thanks to data persistence, cache warm-up time is also very short. |
memcached_space | integer | 23 | no | no | Space id to store memcached data in. The format of the tuple is [key, metadata, value], with a HASH index based on the key. Since the space format is defined by the Memcached data model, this space must not be configured separately. |
memcached_expire | boolean | false | no | no | Turn on tuple time-to-live support in memcached_space. This effectively turns Tarantool into a persistent, replicated and scriptable implementation of Memcached. |
memcached_expire_per_loop | integer | 1024 | no | yes | How many records to consider per iteration of the expiration loop. Tuple expiration is performed in a separate “green” thread within our cooperative multitasking framework and this setting effectively limits how long the expiration loop stays on CPU uninterrupted. |
memcached_expire_full_sweep | float | 3600 | no | yes | Try to make sure that every tuple is considered for expiration within this time frame (in seconds). Together with memcached_expire_per_loop this defines how often the expiration “green” thread is scheduled on CPU. |
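A worked example helps to see how memcached_expire_per_loop and memcached_expire_full_sweep interact. The sketch below assumes a simple pacing rule that the table does not spell out: a full sweep of N tuples at memcached_expire_per_loop tuples per iteration takes N / per_loop iterations, so spreading them over memcached_expire_full_sweep seconds leaves about full_sweep × per_loop / N seconds between iterations. Tarantool's actual scheduler may differ (for example, by capping the delay), and expire_loop_delay() is a hypothetical helper.

```c
/*
 * Assumed pacing rule (illustrative only): a full sweep of `tuples`
 * records at `per_loop` records per iteration takes tuples / per_loop
 * iterations; spreading them over `full_sweep` seconds leaves about
 * full_sweep * per_loop / tuples seconds between iterations.
 */
double expire_loop_delay(double full_sweep, unsigned per_loop,
                         unsigned long tuples)
{
    if (tuples == 0)
        return full_sweep;   /* empty space: nothing to pace */
    return full_sweep * per_loop / (double)tuples;
}
```

With the defaults and a space of 1,000,000 tuples this gives 3600 × 1024 / 1,000,000 ≈ 3.69 seconds between passes; a space ten times larger needs a pass roughly every 0.37 seconds. Raising memcached_expire_per_loop means fewer wake-ups, but longer uninterrupted runs on the CPU.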
This chapter documents APIs for various programming languages.
Apart from the native Tarantool client driver, you can always use a Memcached driver of your choice, after enabling Memcached protocol in the configuration file.
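Any client that emits Memcached commands can talk to memcached_port. As a minimal illustration, the Python helpers below build and parse the standard Memcached text-protocol `set` and `get` messages. The function names are hypothetical, and the sketch assumes the text (not binary) Memcached protocol and single-key `get` requests.

```python
def encode_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    # Text protocol: "set <key> <flags> <exptime> <bytes>\r\n<data>\r\n"
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode()
    return header + value + b"\r\n"


def encode_get(key: str) -> bytes:
    # Single-key retrieval: "get <key>\r\n"
    return f"get {key}\r\n".encode()


def parse_get_response(data: bytes):
    """Return (flags, value) on a hit, None on a miss (a bare "END\\r\\n")."""
    if data == b"END\r\n":
        return None
    # Hit: "VALUE <key> <flags> <bytes>\r\n<data>\r\nEND\r\n"
    header, _, rest = data.partition(b"\r\n")
    parts = header.split()
    if len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError(f"unexpected response: {data!r}")
    flags, length = int(parts[2]), int(parts[3])
    value = rest[:length]
    if rest[length:] != b"\r\nEND\r\n":
        raise ValueError("truncated or multi-key response")
    return flags, value
```

To try it against a running server, the bytes from encode_set or encode_get can be sent over a TCP connection to memcached_port (for instance one opened with socket.create_connection), and the reply to a `get` passed to parse_get_response.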
For C, please see connector/c in the source tree.
For Perl, please refer to the CPAN module DR::Tarantool.
For PHP, please see the tarantool-php project at GitHub.
For Python, please see http://github.com/mailru/tarantool-python.
For Ruby, you need Ruby 1.9 or later to use the connector. Connector sources are located at http://github.com/mailru/tarantool-ruby.
Linux and FreeBSD allow a running process to modify its title, which otherwise contains the program name. Tarantool uses this feature to help with system administration tasks, such as figuring out which services are running on a host, which TCP/IP ports are in use, et cetera.
The Tarantool process title uses the following naming scheme:
program_name: role[@custom_proc_title] [ports in use]
program_name is typically tarantool_box. The role can be one of the following:
primary -- the master node,
replica/IP:port -- a replication node,
wal_writer -- a write-ahead log management process (always paired with the main process, be it primary or replica),
replication_server -- runs only if replication_port is set; accepts connections on this port and creates a replication_relay for each of them,
replication_relay -- a process that serves a single replication connection.
Possible port names are: “pri” for primary_port, “sec” for secondary_port, “adm” for admin_port and “memcached” for memcached_port.
For example:
tarantool_box: primary pri:50000 sec:50001 adm:50002
tarantool_box: primary@infobox pri:15013 sec:15523 adm:10012
tarantool_box: wal_writer
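Since the title has a fixed shape, a monitoring script can recover the role and ports from `ps` output (for example `ps -eo args` on Linux). A minimal Python sketch, assuming the custom title contains no spaces and that every token after the role is a name:port pair; parse_proc_title and ProcTitle are hypothetical helpers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProcTitle:
    program: str                  # e.g. "tarantool_box"
    role: str                     # "primary", "replica/IP:port", "wal_writer", ...
    custom: str | None = None     # text after "@", if custom_proc_title is set
    ports: dict[str, int] = field(default_factory=dict)  # "pri" -> 50000, ...


def parse_proc_title(title: str) -> ProcTitle:
    """Split 'program_name: role[@custom_proc_title] [ports in use]'."""
    program, sep, rest = title.partition(": ")
    tokens = rest.split()
    if not sep or not tokens:
        raise ValueError(f"not a Tarantool process title: {title!r}")
    # The role may itself contain ':' (replica/IP:port), so split only on '@'.
    role, _, custom = tokens[0].partition("@")
    ports: dict[str, int] = {}
    for token in tokens[1:]:
        name, _, port = token.rpartition(":")
        ports[name] = int(port)
    return ProcTitle(program, role, custom or None, ports)
```

Applied to the second example title above, this yields role "primary", custom title "infobox" and the ports {"pri": 15013, "sec": 15523, "adm": 10012}.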
In the current version of the binary protocol, the error message, which is normally more descriptive than the error code, is not present in the server response. The actual message may contain a file name, a detailed reason or an operating system error code. All such messages, however, are logged in the error log. When using the Memcached protocol, the error message is sent to the client along with the code. Below are only general descriptions of some common codes. A complete list of errors can be found in the file errcode.h in the source tree.
List of error codes
Attempt to execute an update on a running replica.
Illegal parameters. Malformed protocol message.
Out of memory: slab_alloc_arena limit is reached.
Failed to record the change in the write ahead log. Some sort of disk error.
A unique index constraint violation: a tuple with the same key is already present in the index.
Key part count is greater than index part count.
Attempt to access a space that is not configured (doesn't exist).
No index with the given id exists.
An error inside a Lua procedure.