When Placid (Thailand) Ltd. started working on a greenfield FinTech application that would handle a mission-critical core-banking system at scale, they immediately chose YottaDB. Such a core-banking application requires both high performance as well as uncompromising robustness, and must deliver both at scale with large numbers of concurrent users.
Not all instances in the application use YottaDB, however. Instances that handle reporting data, for example, may not need the performance or the concurrency handling of the core banking system, and so they can use other databases.
When planning out the application architecture, the decision to use YottaDB as the data store was made even before deciding on the language to write the application in. The need for robustness and performance was the most important consideration, and combined with the fact that the YottaDB code-base has been production-tested for decades, it was the unquestioned choice. Ultimately, the team decided to write the application in Go, because it is a high-performance language and performance is the priority for this application.
Not every developer on the team interacts directly with YottaDB — and for those who don’t have experience with YottaDB’s native API, there would be a learning curve. Placid handles this by having a small, dedicated database team that creates a YottaDB framework that exposes a fintech-friendly API which allows other developers on the team to access YottaDB — they don’t need to know the YottaDB native API. This adds an element of future-proofing, by allowing the framework to be tweaked internally without requiring changes to the financial applications that use it.
As a result, the Placid team is able to combine YottaDB's high performance, robustness, and scalability to large numbers of concurrent users with the high performance that Go is known for. On an average day, the banking application handles one million customers.
Comsan Chanma, department manager of the SME team, said he’d recommend that other teams follow a similar strategy when implementing YottaDB with the language of their choice. The choice to use YottaDB can and should be completely independent of the programming language, and by having a dedicated database team, you’re able to take advantage of YottaDB’s exceptional performance and consistency.
Credits
- Photo of 1,000 baht bill appears to have no copyright.
- Photo of plush Go gopher courtesy the Go Authors and released under the Creative Commons Attribution 3.0 Unported license.
TL;DR
Efficient access to critical sections is perhaps the single most important factor in determining database performance in high contention environments like large production systems. In our development of r2.04, we are paying a lot of attention to critical section access. This is a summary of our work and results to date (r2.04 is still under development as of this post’s publication date).
This post comes with the caveat that critical sections are only one determinant of application throughput. Other factors like compute time, or user input time may well be more important to your application. Even within an application, workloads vary: for example, the workload of interest posting in a bank is different from the workload of processing customer transactions.
What are Critical Sections?
Imagine a bus where just one person can board at a time. How fast one can load the bus depends on how long it takes each person to board the bus, and how long it takes between people boarding. Space taken by people waiting to board the bus can also be a consideration when space is limited, such as inside a bus terminal.
If there are only a few people waiting to board, no special organization is needed. They can cluster round the door, and as soon as one person boards, another can follow. But if there are more than a few, it will be more efficient overall if they queue, as shown in the picture above, rather than cluster around the door. If there are many more people wanting to board the bus, it would be best to have barriers to organize the queue. Barriers are especially important to keep queues organized if space is limited.
Contrast the case of people boarding a bus with patients consulting a doctor. Both are “one at a time” activities, in the sense that one person boards the bus at a time, and the doctor sees one patient at a time. But the strategies for organizing access are different. Since the time that it takes to board a bus is short, it makes sense for those waiting to board to stand outside the bus. Since the time a doctor spends with a patient is longer, one usually makes an appointment. In some intermediate scenarios, like waiting to be served at a counter, one might take a ticket and wait for its number to be called. You don’t make an appointment to board a bus, and in most countries you don’t line up outside the doctor’s office.
A critical section in software is like the door of the bus. It is a section of code that only one process at a time can execute. Contention occurs when multiple processes all need to execute a critical section, like many people wanting to board the bus. In the course of r2.04, we invested heavily in analyzing and optimizing code, especially code that executes in critical sections. That is not discussed here; this blog post is about handling critical section contention.
Access to Critical Sections
Access to critical sections is conceptually like people boarding a bus.
If there are a small number of processes contending for access to a critical section, like a small number of people clustering around the bus door, it is probably most efficient for each process to just “spin” in a tight loop, and keep trying to get the critical section. This is efficient because there is no overhead, and as soon as the critical section becomes available, a waiting process will get it. The disadvantage of this approach is that a spinning process consumes a CPU. Since a computer has a limited number of CPUs, having more than a few spinning processes prevents processes that don’t need the critical section from doing useful work.
If there are many processes contending for access to a critical section, it makes sense for them to queue. As with people queuing for a bus, this can be a simple queue where the processes organize themselves into a queue, or a queue with barriers that requires the action of an external agent (e.g., someone to erect the barriers).
While this spin-and-queue approach is conceptually simple, the devil is in the details.
Implementing Critical Sections
GT.M / YottaDB
For many years, the upstream GT.M included its own code to control access to critical sections, which YottaDB used unchanged. This implementation works as follows.
- Spinning is like people clustering around the bus door.
- When a process is unable to get a critical section, it “hard spins” for a certain number of iterations, called the hard spin count, iterating in a tight loop to get the critical section. The r2.04 code base includes a PAUSE opcode to reduce the impact of its hard spin loops.
- After the hard spin limit is reached, it “soft spins” for a certain number of iterations called the soft spin count. The difference between a hard spin and a soft spin is that in the latter the process executes a sched_yield() each time through the loop. This relinquishes the CPU, and moves the process to the back of the run queue. When it reaches the front and gets the CPU, it again tries to acquire the critical section. It tries soft spins for soft spin count iterations. An important difference between a hard spin and a soft spin is that the sched_yield() of each iteration of a soft spin causes a context switch. Context switches use operating system resources, and may involve critical sections within an operating system.
- Queuing is like people waiting in line. When a process does not get the critical section after its hard spins and soft spins, it adds itself to a queue in shared memory. Adding itself to this queue requires a critical section of its own, albeit a tiny one that executes almost instantly and does not require any special organization. When a process releases a critical section, it wakes up the first process in the queue, which then tries to get the critical section. If it does not get the critical section, it goes through the hard spins, soft spins, and queuing all over again.
Owing to the number of changes we made to the builtin implementation and its evident abandonment by GT.M, we refer to it as the “YottaDB mutex.”
Linux
Linux provides pthread_mutex functions that use a similar technique.
Fairness vs. Throughput
Neither mutex implementation is “fair.” A process that reaches the front of the queue could again be pushed to the back of the queue by another process that just started its attempt to acquire the critical section (called “barging,” it is not unlike a queue-jumper who walks up to the door of the bus and gets on, pretending not to see the queue).
Fairness isn’t free. Implementing guaranteed first-in / first-out fairness would be computationally expensive to the point of being unacceptable for a high-throughput application. Practical code to handle contention balances fairness against throughput.
While spin-and-queue techniques work well across most workloads of typical applications, their unfairness can be exacerbated when a system is pushed to its limits, resulting in large numbers of processes contending for the same critical section. While some processes randomly execute faster, others equally randomly execute slower, i.e., under heavy loads, the unfairness can manifest itself as inconsistent response times and throughput, rather than a proportional slowdown of all processes.
From r1.24 (V6.3-006) to r1.26 (V6.3-007)
GT.M versions through V6.3-006 used the builtin implementation of critical sections, as did YottaDB releases through r1.24. GT.M versions starting with V6.3-007 use pthread_mutexes, as do YottaDB releases from r1.26 through r2.02. While we do not know the motivation for the GT.M change, the developers would have had a reason to make the change.
The r2.04 Critical Section Journey
While no customer or user of YottaDB had expressed any concerns about performance, and although each YottaDB release was slightly faster than the GT.M versions merged into its code base (owing to small performance enhancements we have made that the upstream developers have not adopted), we decided to make performance and scalability a key feature of r2.04.
The Journey Begins
The twin axioms of performance improvement are:
- Be careful what you measure, because that is what you will improve.
- If you choose well, you will improve performance overall.
We decided to start with an M program that computes lengths of 3n+1 sequences, since that is a database update intensive workload that can easily be scaled up and down in both problem size as well as the number of concurrent processes. Running the program with twice the number of concurrent processes as CPUs (i.e., with some contention, but not a heavy load), we observed the data in the table below.
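The actual benchmark is an M program; as a sketch of the workload's shape only, here is a hypothetical Go version that computes 3n+1 sequence lengths and fans the problem space out across concurrent workers. The real benchmark records results as database updates, which are stubbed out here.

```go
package main

import (
	"fmt"
	"sync"
)

// collatzLen returns the number of terms in the 3n+1 sequence starting
// at n and ending at 1; e.g. 6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1
// is 9 terms.
func collatzLen(n uint64) int {
	terms := 1
	for n != 1 {
		if n%2 == 0 {
			n /= 2
		} else {
			n = 3*n + 1
		}
		terms++
	}
	return terms
}

// longest strides the problem space across concurrent workers, the way
// the benchmark scales across processes, and returns the longest
// sequence length found for starting values 1..limit.
func longest(limit uint64, workers int) int {
	maxes := make([]int, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for n := uint64(w) + 1; n <= limit; n += uint64(workers) {
				if l := collatzLen(n); l > maxes[w] {
					maxes[w] = l
				}
			}
		}(w)
	}
	wg.Wait()
	best := 0
	for _, m := range maxes {
		if m > best {
			best = m
		}
	}
	return best
}

func main() {
	fmt.Println(collatzLen(27)) // prints 112
	fmt.Println(longest(100000, 8))
}
```

Both the problem size (limit) and the concurrency (workers) can be dialed up or down independently, which is what makes this workload convenient for scaling experiments.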
- The first column is a GT.M version or YottaDB release. Each YottaDB release is preceded by a GT.M version, the latest one whose source code is merged into the YottaDB release. r2.03-pre is the YottaDB code under development for r2.04 with V7.1-002 code and our enhancements and fixes merged prior to the critical section contention work.
- The second column shows an average elapsed time (i.e., less is better) for benchmark runs.
- The third column compares the elapsed times with GT.M V6.3-006, the last GT.M version released before the switch to pthread_mutexes (i.e., larger – more positive – is better; smaller – more negative – is worse).
- The fourth column compares the performance of each YottaDB release with the GT.M version it includes (larger – more positive – is better; smaller – more negative – is worse).
| Build | Elapsed time (milliseconds) | vs. V6.3-006 | YottaDB vs. GT.M |
|---|---|---|---|
| V6.3-006 | 12,576 | – | – |
| r1.24 | 12,279 | 2.4% | 2.4% |
| V6.3-007 | 14,257 | -13.4% | – |
| r1.26 | 14,069 | -11.9% | 1.32% |
| V7.0-005 | 14,868 | -18.2% | – |
| r2.02 | 16,570 | -31.8% | -11.45% |
| V7.1-002 | 15,519 | -23.4% | – |
| r2.03-pre | 14,956 | -18.9% | 3.6% |
| r2.03-pre+ | 11,243 | 10.6% | 24.8% |
The last row, labeled r2.03-pre+ is the build after we included a host of performance improvements to code that executes inside critical sections (analogous to the time taken by each person to board the bus), and before our changes discussed below. It shows that even prior to the enhancements discussed below, the evolving YottaDB r2.04 code executed the 3n+1 benchmark 24.8% faster than GT.M V7.1-002.
(We were somewhat surprised by the r2.02 number, because such a slowdown has been the experience of neither us nor our users, and no such slowdown was apparent in our testing prior to the release. While it is perhaps an artifact of a specific benchmark, it is nevertheless the number we saw, and as we are objective developers, it is the number we are reporting here.)
The “Culprit”
It didn’t take long to discover that the change from the builtin mutex to pthread_mutex from GT.M V6.3-006 to V6.3-007 was the cause of the apparent slowdown. When we reverted to the builtin mutex (whose code we carefully examined for speed-up opportunities, which we implemented), we found that performance had been restored!
Then began the quest to determine what benefit pthread_mutexes offered that motivated the change in V6.3-007. So we expanded the simulated workloads. One was a workload that simulated interest posting by a bank during day-end processing, and the other was code that did nothing but stress critical section acquisition. We ran all three benchmarks with up to 16,384 processes, which is an extreme level of stress. While a large application can have 16,384 processes, it would be unusual to have that many processes contending for the same critical section at the same time. In the bus boarding analogy, it would be like a large bus terminal with many buses and many people, but with everyone wanting to board the same bus at the same time.
Behavior under extreme stress of the interest posting benchmark is shown below. With the 3n+1 benchmark, the YottaDB mutex always performed better.
- Times are in milliseconds, i.e., smaller numbers are better.
- r2.02 is the current YottaDB production release.
- r2.03 corresponds to r2.03-pre+ in the table above, i.e., just prior to changes to use the YottaDB mutexes.
- r2.03+mutex is the code with the change to use YottaDB mutexes.

Two observations are evident from the graph.
- The changes to YottaDB r2.03 show a clear improvement over r2.02, analogous to speeding up the time it takes for each passenger to board the bus.
- YottaDB mutexes perform better than pthread_mutexes until around 2,000 processes, above which pthread_mutexes performed better. By analogy, when the number of people wanting to board the bus becomes large, externally organized queuing helps.
Adaptive Mutexes
That led us to ask whether it was possible to make mutexes adaptive: use YottaDB mutexes at low to normal loads, switch to pthread_mutexes under heavy loads, and switch back when loads return to normal. By way of analogy, as the number of people wanting to board a bus increases to a point where they no longer organize themselves into a queue, one can bring in retractable belt barriers to organize a queue.
Important requirements of any adaptive method are:
- It must have minimal overhead: it should take minimal additional code to implement.
- Switching should be relatively inexpensive, especially the switch from YottaDB mutexes to pthread_mutexes. Since that switch happens under an increasing load, the switch should not further stress the system.
- It should have hysteresis, i.e., it should not bounce back-and-forth between the two techniques.
- It should not pretend to be universal: there will be workloads for which the adaptive method is not well suited.
Overhead is minimized by building on the existing database statistics, as reported by, for example, ZSHOW. While the statistics in the database fileheader record data from creation of a database file, switching is based on changes in the statistics.
A field in the database file header, which is mirrored in the shared memory segment for that region, records the current mutex type in use. Switching from YottaDB mutexes to pthread_mutexes is accomplished by setting the field to specify pthread_mutexes, and waking up all processes waiting for the mutex. There will be a momentary blip in CPU usage as the awoken processes execute pthread_mutexes. In an attempt to encourage fairness, the process initiating the switch awakens queued processes in the order in which they are queued; however, there is no guarantee that queue order will be preserved. Switching from pthread_mutexes to YottaDB mutexes is essentially the reverse, except that the processes are woken up by Linux, the manager of pthread_mutexes.
Switching from Linux mutexes to YottaDB mutexes occurs at a lower load than that at which switching from YottaDB mutexes to Linux mutexes occurs. A heuristic that emulates a damped low-pass filter ensures that a load which randomly varies around the switching thresholds does not cause frequent switching back and forth.
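The precise heuristic is internal to YottaDB, but the general idea of a damped low-pass filter with asymmetric thresholds can be sketched in Go. The damping factor and both thresholds below are invented for illustration.

```go
package main

import "fmt"

// mutexKind identifies which implementation is in use.
type mutexKind int

const (
	ydbMutex mutexKind = iota
	pthreadMutex
)

const (
	alpha    = 0.1    // damping: weight given to each new load sample
	upLoad   = 2000.0 // switch to pthread mutexes above this estimate
	downLoad = 1000.0 // switch back to YottaDB mutexes below this
)

// adaptiveSwitcher keeps an exponentially damped (low-pass filtered)
// estimate of contention. The switch-down threshold sits well below
// the switch-up threshold, so a load hovering near either threshold
// cannot cause rapid flip-flopping between the two mutex kinds.
type adaptiveSwitcher struct {
	kind     mutexKind
	smoothed float64 // damped contention estimate
}

// observe feeds one raw load sample (e.g., processes contending for
// the critical section) and returns the mutex kind to use afterwards.
func (a *adaptiveSwitcher) observe(waiters float64) mutexKind {
	a.smoothed = (1-alpha)*a.smoothed + alpha*waiters
	switch {
	case a.kind == ydbMutex && a.smoothed > upLoad:
		a.kind = pthreadMutex
	case a.kind == pthreadMutex && a.smoothed < downLoad:
		a.kind = ydbMutex
	}
	return a.kind
}

func main() {
	a := &adaptiveSwitcher{}
	for i := 0; i < 10; i++ {
		a.observe(5000) // sustained heavy load
	}
	fmt.Println(a.kind == pthreadMutex) // prints true: switched under load
}
```

Because each sample only moves the estimate by a fraction alpha, momentary spikes are absorbed, and a load that settles between the two thresholds leaves the current mutex kind in place, which is the hysteresis the text describes.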
The following graph shows the performance of the adaptive method. The interest posting benchmark is the same as the above graph; however, the results are from a different server. This benchmark showed similar results on all servers.
- Times are in milliseconds, i.e., smaller numbers are better.
- r2.02 is the current YottaDB production release.
- V7.1-002 is the GT.M version merged into the YottaDB code base.
- r2.03 (pthread mutex) corresponds to the r2.04 code base under development, with V7.1-002 merged, prior to the change to use the YottaDB mutexes.
- r2.03+ (ydb_mutex) corresponds to the previous item, with the change to use YottaDB mutexes.
- r2.03+ (adaptive mutex) corresponds to the code with the adaptive method.

From the graph:
- The r2.04 code base with V7.1-002 merged tracks V7.1-002 – any differences are just the randomness inherent to benchmarking.
- Both the above show a significant improvement over r2.02.
- Benchmarks that use the YottaDB mutex and the adaptive mutex track each other, and are significantly faster than the above at workloads through 2,048 processes.
- Above 2,048 processes, the benchmark with the YottaDB mutex shows significantly worse performance compared to r2.03 (pthread mutex) and V7.1-002, whereas the adaptive mutex tracks r2.03 (pthread mutex) and V7.1-002.
In summary, the adaptive mutex adapts to the workload, and gives you the best option for the current load – you can have your cake and eat it too. In our analogy, it is as if when the number of people wanting to board the bus reaches a certain threshold, retractable belt barriers suddenly appear, and suddenly disappear when the number of people wanting to board drops.
Since there may be workloads for which the adaptive method does not work well, the YottaDB r2.04 code includes MUPIP SET commands to ensure that processes accessing a database region or file use only a specified mutex.
- `mupip set -mutex_type=adaptive` to specify the Adaptive mutex, the default for newly created database files.
- `mupip set -mutex_type=ydb` to use the YottaDB mutex.
- `mupip set -mutex_type=pthread` to use the pthread mutex.
You also have to use either the -region or the -file option to identify the regions or database files for which you wish to specify the mutex type.
What Now?
As of today, the mutex code for r2.04 is stable and merged into the master branch, ready for the r2.04 release. We are currently working on other code for r2.04. Although not suitable for production use, the master branch is stable for development, testing, and benchmarking. If you care to try the r2.04 master branch to see how mutexes perform with your application workloads, here’s how to install the master branch.
export tmpdir=<tmp directory> installdir=<ydb installation directory>
mkdir -p $tmpdir
wget https://download.yottadb.com/ydbinstall.sh
chmod +x ydbinstall.sh
sudo ./ydbinstall.sh --from-source --branch master --installdir $installdir --nolinkenv --nolinkexec --nopkg-config
Once you are done testing, simply remove the installed YottaDB from $installdir: sudo rm -rf $installdir.
If you do test it, please share your results. Thank you very much.
Credits
- Photo of People Boarding a bus at Davenport and Oakwood, Toronto, in 1927 is in the public domain.
- Photo of Mango Bus stand Jamshedpur bus terminal by Shahbaz26 released under the Creative Commons Attribution-Share Alike 4.0 International license.
We thank Lothar Jöckel for his first guest blog post, and hope there are many more to follow. If you would like to post on the YottaDB blog please contact us at info@yottadb.com.
TL;DR
If you’re a Nim developer interested in working with a powerful hierarchical NoSQL engine, or an M developer interested in working with a powerful modern programming language, then this is for you.
The combination of Nim’s modern language features with YottaDB’s battle-tested database engine creates a powerful stack for building high-performance, reliable systems. This binding bridges the gap between a database proven over decades of use in mission-critical applications and a contemporary systems programming language.
I’m pleased to announce nim-yottadb, a language binding that connects Nim with the YottaDB database. This gives you direct access to global and local variables, transactions, iteration, locks, and more – all from Nim.
A Simple Example
setvar:
  ^Users("john_doe", "profile", "name") = "John Doe"
  ^Users("john_doe", "profile", "email") = "john@example.com"
let userName = get: ^Users("john_doe", "profile", "name")
echo "Hello, ", userName
In this post, I want to walk you through:
- What YottaDB is (at a high level)
- Why binding it to Nim is interesting
- What features nim-yottadb currently offers
- How its API and DSL (Domain Specific Language) design works
- Caveats, threading, and implementation notes
- Examples and next steps
Key Features of YottaDB
YottaDB is a high-performance, schema-less, key-value database designed for extreme scalability and reliability, particularly in transaction-heavy environments. Its origin is the M (affectionately known as MUMPS) language and database, which has been battle-tested in mission-critical systems for decades.
Key-Value Data Model with a Hierarchical Twist
At its core, data is stored as sparse, multi-dimensional arrays. A “global” variable starts with a caret (^) and can have subscripts, creating a natural tree structure. A global variable is just a key-value node that persists and is shared, i.e., it is in the database.
^Patients("Smith", "John", 2024, "Visit") = "Checkup"
This model is incredibly flexible (schema-less) and allows for efficient hierarchical data access.
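To make the shape of the model concrete, here is a toy in-memory stand-in written in Go: just a sparse map keyed by subscript tuples. YottaDB itself stores such nodes sorted, on disk, and shared between processes; this sketch only illustrates how keys and values relate.

```go
package main

import (
	"fmt"
	"strings"
)

// globals is a toy stand-in for the global-variable model: a sparse
// map from a subscript tuple to a string value.
type globals map[string]string

// key flattens a global name plus subscripts into a map key, mimicking
// the ^Name(sub1,sub2,...) notation.
func key(name string, subs ...string) string {
	return name + "(" + strings.Join(subs, ",") + ")"
}

func main() {
	db := globals{}
	// Sibling nodes under ^Patients("Smith","John",2024,...) form a subtree.
	db[key("^Patients", "Smith", "John", "2024", "Visit")] = "Checkup"
	db[key("^Patients", "Smith", "John", "2024", "Diagnosis")] = "Healthy"
	fmt.Println(db[key("^Patients", "Smith", "John", "2024", "Visit")]) // prints Checkup
}
```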
Extreme Performance and Low Latency
YottaDB has an in-memory database engine, with cooperating processes managing database files that use transaction journals for Durability. All data operations are performed directly in memory, making it exceptionally fast.
A daemonless database whose logic executes in the address spaces of processes accessing the database, with control structures in shared memory, eliminates connection overheads and minimizes resource contention. This yields massive scalability on multi-core servers.
Rock-Solid Reliability and ACID Transactions
YottaDB is proven in mission-critical applications where data loss is unacceptable (e.g., banks, hospitals). It provides a fully ACID (Atomic, Consistent, Isolated, Durable), robust transaction processing system that uses write-ahead journaling. Database updates are first written to a journal file before being applied to the database, ensuring data can be recovered after a crash.
Tight Integration of Database and Programming Language
This is a hallmark of the M heritage. Database operations are simple commands within the M language. There is no separate query language (like SQL) or connection string. A simple command like SET ^Customer(123)="John" both updates the variable in memory and commits the change to the database. This eliminates object-relational mapping (ORM) overhead and makes the code very concise for data manipulation.
YottaDB brings this tight integration to languages other than M.
Mature and Robust Codebase
M technology has its roots in the 1960s MUMPS. YottaDB itself is a direct descendant of GT.M, which was first deployed in 1986. The code base has decades of use in mission-critical, high-availability systems, and is the database of record at some of the world’s largest real-time core-banking systems.
Efficient Database Replication
Efficient database replication provides built-in, low-latency replication between database instances. This is crucial for creating hot-standby systems for disaster recovery. Replicas can also be used downstream from production systems for real-time analytics, reporting, etc.
Replication protects mission-critical applications like core banking systems, healthcare information systems, stock exchanges, and any other application that requires the ability to remain continuously available in the face of unplanned as well as planned events.
Nim
The Nim programming language is a statically typed, compiled systems language that has a unique and powerful set of features. It’s often described as having the performance of C or C++, the expressiveness of Python, and the safety of Rust or Ada.
Python-like Syntax with Static Typing
This is one of the most immediately appealing features. Nim’s syntax uses significant whitespace (indentation) like Python, making it clean and easy to read. Unlike Python, it’s statically typed, meaning type errors are caught at compile time, leading to more robust and performant code. The compiler does all the type checking before the program ever runs.
Looks like Python, but is statically typed and compiled!
proc greet(name: string, age: int): string =
  return "Hello, " & name & ". You are " & $age & " years old."
echo greet("Alice", 30)
Compiles to Efficient C, C++, and JavaScript
Nim doesn’t have a virtual machine. Instead, it compiles its source code down to another language. By compiling to C, C++, or even Objective-C, it achieves performance comparable to these native languages. The generated C code can be compiled on virtually any platform.
The JavaScript target allows you to write both your backend and frontend logic in the same language, enabling full-stack development with Nim.
Powerful Metaprogramming
This is one of Nim’s superpowers. You can generate code at compile time, reducing boilerplate and creating powerful Domain-Specific Languages (DSLs).
With Templates you perform simple syntactic substitutions (hygienic macros). They are like C macros but much safer and more integrated.
The most powerful feature is Macros. You can manipulate the Abstract Syntax Tree (AST) of your code at compile time. This allows you to implement new language features, validate complex conditions, or generate code based on custom logic.
Memory Safety and Control
Nim offers a pragmatic approach to memory management.
It comes with several built-in garbage collectors (e.g., deferred reference counting, mark-and-sweep, …). The default is fast and pause-free for most applications.
For systems programming or real-time applications where garbage collection pauses are unacceptable, you can use manual memory management (alloc(), dealloc()) or leverage the --gc:arc (Automatic Reference Counting) or --gc:orc (ARC plus a cycle collector) options, which provide deterministic, non-tracing memory management without a garbage collector.
Generics, Union Types, and More
Nim’s type system is both practical and expressive. Full support for generic programming allows you to write flexible and reusable code for different types.
Sum Types (Variant Objects) let you define a type that can hold values of different, but fixed, types. This is excellent for modeling state.
With Distinct Types you create a new type that has the same underlying representation as an existing type but is considered incompatible with it (e.g., type Meter = distinct int and type Kilogram = distinct int prevents you from accidentally adding meters to kilograms).
Zero-Cost Abstraction and Efficiency
Nim is designed to be highly efficient.
- No Runtime Overhead: Features like iterators, templates, and generics are resolved at compile time, resulting in code that is as fast as hand-written C.
- Value Types: Structs are value types by default (stored on the stack), which is cache-friendly and fast.
- Direct Control: You have low-level control over memory layout, pointers, and can even inline assembly code when needed.
Unified Function Call Syntax (UFCS)
This syntactic sugar allows for both method-call and function-call syntax, where a.f(b) is equivalent to f(a, b). This enables a fluent, “chaining” style of programming that is very readable, similar to what you find in Unix pipes or Rust.
Cross-Platform and Interoperability
- Native Executables: Nim compiles to a single, dependency-free native executable, making deployment trivial.
- Excellent C/C++ Interoperability: You can directly import and call C/C++ functions and libraries with minimal effort, making it easy to leverage existing codebases.
- Cross-Compilation: It’s straightforward to compile for a different target platform (e.g., compile for Windows on a Linux machine).
Async/Await for Concurrency
Nim has a built-in async/await mechanism for writing highly scalable asynchronous I/O operations, similar to what you find in Python, JavaScript, or C#. This makes it well-suited for network servers and clients.
What nim-yottadb provides
Many environments use YottaDB already. With a Nim binding, you can write new components or tooling in Nim that integrate with existing YottaDB data.
The flexibility of Nim’s metaprogramming (macros, templates) enables a nicer API and DSL wrapper over the more “raw” C interface. You can mask lower-level details, make code more expressive, and reduce boilerplate.
Core (Simple-API)
The binding exposes a basic set of database operations, roughly mapping to YottaDB capabilities:
- ydb_data — inspect node or subtree state (e.g., whether there is data, a subtree, both, or neither)
- ydb_delete — delete a node or an entire subtree
- ydb_delete_excl — delete local variables except some exclusions
- ydb_get / ydb_set — read or assign the value of a local or global variable
- ydb_incr — atomic increment (local or global)
- ydb_lock — lock one or more global variables
- ydb_lock_incr / ydb_lock_dec — manipulate a lock count
- ydb_node_next / ydb_node_previous — traverse nodes in forward or reverse order
- ydb_subscript_next / ydb_subscript_previous — step through subscript ranges under a global
- ydb_ci — call a routine (the M / YottaDB “Call-In Interface”)
These correspond fairly directly to the YottaDB C API (or M primitives). The binding supports both single-threaded and multi-threaded modes (automatically selected when compiling with --threads:on).
Extensions & syntactic sugar
To make working with the binding more ergonomic, nim-yottadb adds:
Iterators
You can iterate over next/previous nodes or subscripts, looping over nodes instead of manually calling next node/sub.
YdbVar
YdbVar is a type with overloaded operators ($, []) so that a global can be referenced in a natural, array-like way.
DSL
Instead of using the simple API directly, you can use a DSL (Domain Specific Language) that offers Nim-style keywords and mnemonics for common operations. So instead of writing
ydb_setvar("^building", @["Room", "1", "size"], "22.5")
you can write
setvar: ^building("Room", 1, "size")=22.5
setvar / get
setvar:
  ^XX(1,2,3)=123
  ^XX("B",1)="AB"
Support for mixed type subscripts
setvar: ^X(id, 4711, "pi") = 3.1414
setvar: in a loop
for id in 0..<5:
  setvar:
    ^CUST(id, "Timestamp") = cpuTime()
    ^CUST(id, "loop") = id
increment
Increment a global in the database by 1
let nexttxid = increment: ^CNT("TXID")
let accid = increment: ^Customer(nexttxid, by=1000)
data
Test whether a node exists, has a value, and/or has a subtree.
setvar:
  ^X(5)="F"
  ^X(5,1)="D"
let dta = data: ^X(5)
assert YdbData(dta) == YDB_DATA_VALUE_DESC
delnode
Delete a node. If all nodes of a global are removed, the global itself is removed.
delnode: ^X(1) # delete node
deltree
Delete a subtree of a global. If all nodes are removed, the global itself is removed.
deltree: ^X(1)
lock
Lock up to 35 named lock resources. Other processes trying to lock a locked resource must wait until the lock is released. {} has to be used to lock multiple resources in one operation; an empty list releases all locked resources. If lock: is called again, the previous locks are automatically released first.
lock:
  {
    ^LL("HAUS", "11"),
    ^LL("HAUS", "12"),
    ^LL("HAUS", "XX"), # not yet existent, but ok
  }
Additional locks can be acquired or released, without affecting other locks held by a process, by prefixing lock resource names with + or -.
The template withlock simplifies the locking further:
let amount = 1500.50
withlock(4711):
  setvar:
    ^custacct(4711, "amount") = amount
    ^booking(4711, "txnbr") = amount
On leaving the withlock block, the lock is automatically released.
nextnode / prevnode / nextsubscript / prevsubscript
Traverse a global/subscript in the collating sequence.
(rc, node) = nextnode: ^LL()
(rc, node) = prevnode: ^LL("HAUS", "ELEKTRIK", "DOSEN", "1")
(rc, subs) = nextsubscript: ^LL("HAUS", "ELEKTRIK")
(rc, subs) = prevsubscript: ^LL("HAUS", "FLAECHEN")
‘get’ with postfix
It is possible to enforce a type when getting data from YottaDB. By using a “postfix” an expected type can be defined and tested.
let i = get: ^global(1).int16
let f = get: ^global(4711).float32
If the value from the db falls outside the range defined by the postfix, a ValueError exception is raised.
The following postfixes are implemented:
int, int8, int16, int32, int64
uint, uint8, uint16, uint32, uint64
float, float32, float64
The .binary Postfix
The binary postfix allows binary data of virtually unlimited size to be read from the DB. set can save data of, theoretically, up to 99,999,999 MB.
let dbval = get: ^tmp(4711).binary
Saving a Nim Object-Tree to the database
Based on the Nim object model, it is possible to store objects, even complex ones, in the database. A global variable is created for each type (Address, Customer, etc.). Attributes are then stored with their corresponding names.
type
  Address* = object of RootObj
    street*: string
    zip*: uint
    city*: string
    state*: string

let address = Address(street: "Bachstrasse 14", zip: 6033, city: "Buchs", state: "AG")
store(@["4711"], address)
The data is stored as
^Address(4711,"city")="Buchs"
^Address(4711,"state")="AG"
^Address(4711,"street")="Bachstrasse 14"
^Address(4711,"zip")=6033
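As an illustrative sketch of this flattening (in Python for brevity, not the binding's Nim implementation), each attribute becomes one node keyed by the global name, the id subscripts, and the attribute name:

```python
def store(global_name, ids, obj):
    # Map each attribute to a node keyed by (global, subscripts..., attribute),
    # mirroring nodes such as ^Address(4711,"city")="Buchs".
    nodes = {}
    for attr, value in vars(obj).items():
        nodes[(global_name, *ids, attr)] = value
    return nodes

class Address:
    def __init__(self, street, zip_code, city, state):
        self.street, self.zip, self.city, self.state = street, zip_code, city, state

addr = Address("Bachstrasse 14", 6033, "Buchs", "AG")
nodes = store("Address", ("4711",), addr)
assert nodes[("Address", "4711", "city")] == "Buchs"
```

Reading an object back is then just the reverse walk over the subtree of nodes under the id subscripts.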
Performance
In general, the Nim / YottaDB language binding has excellent performance. Simple tests on a Mac Mini M4 with a virtualized Ubuntu (2 cores / 4 GB memory) give the following figures; every test used 10 million different records.
upcount dsl   2439 ms  (increment a global)
set dsl       2479 ms  (set a global value)
nextnode dsl  1536 ms  (iterate over all nodes)
delnode dsl   2774 ms  (delete all nodes)
This means writing 4,100,041 records per second and traversing the nodes at 6,510,416 nodes per second. I think these are impressive numbers!
A Larger Example
The following program retrieves images from a directory, stores them in the database, and extracts them into another directory.
import os
import std/[times, strutils]
import yottadb
proc walk(path: string): seq[string] =
  for kind, path in walkDir(path):
    case kind:
    of pcFile, pcLinkToFile:
      result.add(path)
    of pcDir, pcLinkToDir:
      result.add(walk(path))

proc loadImagesToDb(basedir: string) =
  for image in walk(basedir):
    let image_number = increment(^CNT("image_number"))
    setvar:
      ^images($image_number) = readFile(image)
      ^images($image_number, "path") = image
      ^images($image_number, "created") = now()

proc saveImage(target: string, path: string, img: string) =
  if not dirExists(target):
    createDir(target)
  let filename = path.split("/")[^1]
  let fullpath = target & "/" & filename
  writeFile(fullpath, img)

proc readImagesFromDb(target: string) =
  var (rc, subs) = nextsubscript: ^images(@[""]) # -> @["223"], @["224"], ...
  while rc == YDB_OK:
    let img = getblob(^images(subs))
    let path = get(^images(subs, "path"))
    saveImage(target, path, img)
    (rc, subs) = nextsubscript: ^images(subs)

when isMainModule:
  loadImagesToDb("./images")          # read from the folder and save in db
  readImagesFromDb("./images_fromdb") # read from db and save under this folder
Conclusion
The nim-yottadb binding successfully bridges two powerful technologies from different eras of computing. YottaDB brings decades of refinement in hierarchical data management and transaction processing, while Nim offers modern language features, metaprogramming capabilities, and performance characteristics that rival lower-level systems languages.
What makes this integration particularly compelling is how Nim’s DSL capabilities and clean syntax make YottaDB’s hierarchical data model feel natural and expressive. The ability to write database operations that look like native Nim code, while maintaining the performance and reliability of a battle-tested database engine, represents the best of both worlds.
The performance benchmarks demonstrate that this binding doesn’t sacrifice speed for convenience—Nim applications can leverage YottaDB’s capabilities with minimal overhead, making it suitable for the same high-performance, transaction-heavy use cases that YottaDB has traditionally served.
For developers working with existing YottaDB systems, nim-yottadb provides a path to modernize tooling and develop new components without abandoning proven database infrastructure. For Nim developers, it opens access to a unique class of hierarchical database that excels in scenarios where relational databases might struggle.
As the binding continues to evolve, it represents not just a technical achievement, but a practical solution for building robust, high-performance systems that need both modern development ergonomics and proven data reliability. Whether you’re extending legacy M applications or building new systems from scratch, nim-yottadb offers a compelling combination of performance, reliability, and developer experience.
The project is available on GitHub as nim-yottadb, and welcomes contributions from both the Nim and YottaDB communities.
About Lothar Jöckel
Lothar began his IT career in 1989 at McDonnell Douglas in the Health Software Division, with Pick/Reality and the “Homer” system. In a varied career of many years, he worked on library systems, banking systems, military and intelligence systems, and was an early user of AI image recognition methods to determine the positions of trains in the Swiss Federal Railways (SBB). He is curious about new languages and frameworks, such as Rust and Nim.
Contact: lothar.joeckel@gmail.com
Credits
- Visualization of Nimber products of powers of 2. Copyright Watchduck and licensed under the Creative Commons Attribution 3.0 Unported license.
- Blog roll picture of Nim game. Copyright Uncopy and licensed under the Creative Commons Attribution 3.0 Unported license.
We thank Alex Woodhead for his first blog post, and hope there are many more to follow. If you would like to post on the YottaDB blog please contact us at info@yottadb.com.
TL;DR
Generating patterns for string pattern matching in the M language is expert-friendly. An AI tool eases the task. There is a live demonstration where you can try it for yourself.
Using AI to Generate M Patterns
Programs need to determine whether a string matches the pattern for a specific type of data. For example:
| Data Type | Example |
|---|---|
| Email | info@genput.com |
| Phone Number | 213-101-0101 |
| Date | 06-07-2025 |
Pattern matching allows M (affectionately known as MUMPS) language code to determine whether a string matches a pattern, for example, whether a line of input contains a telephone number, an e-mail address, a date, etc. For example, 1"("3N1")"2(1"-"3N)1N matches a US telephone number in one common format, e.g., (123)-456-7890.
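For readers more familiar with regular expressions, that M pattern corresponds roughly to the following Python regex; this translation is mine, for illustration, and is not part of the demonstrated tool:

```python
import re

# Rough regex equivalent of the M pattern 1"("3N1")"2(1"-"3N)1N:
# one "(", three digits, one ")", two repetitions of "-" plus three digits,
# and one final digit.
us_phone = re.compile(r'^\(\d{3}\)(?:-\d{3}){2}\d$')

assert us_phone.match("(123)-456-7890")
assert not us_phone.match("123-456-7890")
```

Note how the M repetition counts (1, 3, 2) map onto regex quantifiers, while the pattern codes (N for numeric, quoted strings for literals) map onto character classes and literal characters.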
As the syntax of M patterns can be arcane, this blog post describes an AI generative pretrained transformer (GPT) for generating the code for M patterns from natural language input as well as sample datasets.
Patterns can be generated from natural language descriptions or from examples.
Patterns from Descriptions
A natural language description of a telephone number in the format (###)-###-#### where # is a digit could be:
module A: one of character minus followed by three of numeric characters
the main pattern is as follows: one of character open-parenthesis followed by three of numeric characters followed by one of character close parenthesis followed by two of module A followed by one of numeric character
Natural language descriptions aim to be readable and understandable without technical training. Updating a textual description is more user friendly and less error prone than manually editing pattern match code. The technical demonstration uses generative AI to transform these text descriptions into M source code, thus making existing pattern match code accessible for maintenance and support.
Patterns from Samples
Consider the following sample records:
| Item | Record |
|---|---|
| 1 | (003)-615-2614 |
| 2 | (519)-523-0258 |
| 3 | (266)-885-4964 |
| 4 | (261)-274-3909 |
| 5 | (752)-129-3876 |
| 6 | (173)-514-2497 |
| 7 | (040)-511-8991 |
| 8 | (467)-715-3325 |
| 9 | (488)-269-8099 |
| 10 | (025)-705-8417 |
An author of pattern match code may follow a development process:
- Review the sample records for format features:
- Numeric sequence length
- Open and Closing Brackets
- Hyphen delimiter
- Decide the characteristics of format rule elements to use.
- A rule can be defined in different ways.
- Which approach is elegant, optimal and maintainable?
- Implement Pattern Match code to encapsulate the format.
- Test the code against the sample records.
What if AI could replace these steps and create pattern match code automatically from the same data samples? The technical demonstration replaces all manual steps 1 through 4 above with automatic code creation.
Finally this technical demonstration closes the development iteration loop for pattern match code by offering:
- Generating natural language descriptions from existing pattern code.
- Generating compliant samples and anti-samples from existing pattern code.
Building Models
In early model training cycles, it was found that the features needed for description-to-pattern were incompatible with those for samples-to-pattern. Hence development proceeded to create two separate models, one for each task area.
Description-to-Pattern Model
A synthetic dataset was created consisting of sets of pattern match codes followed by their respective descriptions. An extensive, balanced representation and randomized variation of pattern features were needed to learn:
- Repetition range, e.g., “One-To-Three” numeric characters
- Structure
- Optionality
- Alphanumeric, Numeric, Punctuation
- Literal strings
For the model description text, multiple sample variations are introduced to anticipate flexible prompt input, for example:
- Quantity alias: “two” can be represented in similar words like “Double”, “Twice” and “Pair” or the value “2”
- Character alias: “-” and the word “minus” can be used interchangeably.
By introducing description variations in multiple written natural languages, a single assimilated model can process prompts in multiple languages.
Samples-to-Pattern Model
Another synthetic dataset was created consisting of sets of sample values followed by their respective pattern code. As before, an extensive, balanced representation and randomized variation of pattern features were needed to learn:
- Repetition range: “one-To-three” numeric characters
- Structure
- Optionality
- Alphanumeric, Numeric, Punctuation
- Literal strings
Additionally the dataset is balanced for:
- Generalization
  - Prefer generating exact patterns for small data samples
  - Prefer generating flexible patterns for samples with wide variation
- Delimiters, for example:
  - Character "-" in text "(003)-615-2614"
  - Character "^" in text "Smith^Bob^M^19340815"
- “Open-Close” pair features, for example:
  - "(" and ")" in text "(003)-615-2614"
  - "<" and ">"
To clarify and elaborate on how the term “generalization” is being used here: In model training there are the concepts overfitting and underfitting. If a model is over trained on a sample dataset it does not perform well for future tasks on new samples not in the original training dataset. There is a similar scenario for the completed PatternMatch model. The model needs to suggest from a small number of sample records, the likely useful pattern match expected. Some patterns can represent tens of thousands of variations of possible samples. Some patterns are very specific with only a few possible samples that all fit in the prompt supplied to the AI. The training data is deliberately curated to provide a wide and graduated range of exact to more generalized patterns. This appears to imbue a logical “pattern usefulness”.
To postulate how the model may achieve this, consider the next possible generated character when outputting software code. This has a constrained range of possibilities.
- In samples where the association of the next token in sequence is weighted to relate to only a small number of possible tokens, this deep feature then suggests generating patterns with small number of possible samples.
- In samples where the association of the next token in sequence is weighted to relate to a wider variety of possible tokens, this deep feature cascades to prefer generating more general patterns with a large space of possible samples.
The term generalization wraps this spectrum of behavior. The process is not using human chain-of-thought reasoning to achieve output, but a more fundamental set of self-taught features, impressed during training.
Samples are required to fit within the context window of the GPT. Randomized variation in the number of sample records within each training item, improves deduction capability at low sample numbers. Empirical benchmarking is used to evaluate performance for Generalization vs Delimiters style patterns.
Within the pipeline, two forms of pattern were tracked, where one is a simplified form. This endows the generative behavior with a preference for shorter pattern forms; e.g., the rule “two numeric followed by two to four numeric” can be expressed more simply as “four to six numeric”.
Training Effort
Environment: Nvidia Cuda A10 GPU on the Huggingface platform.
Description-to-Pattern Model
A full retrain is needed when incorporating each new language.
| Stage | Continuous GPU training |
|---|---|
| New dataset | 4 days |
Samples-to-Pattern Model
| Stage | Continuous GPU training |
|---|---|
| Prototype base dataset | 4 days |
| Main dataset | 13 days |
| Second refined dataset | 2 days |
| Third refined dataset | 4 days |
As the models are separated by task, it becomes convenient to add new language support to descriptions with a relatively quick turnaround.
Benchmarking
Complete success of a pattern match was defined as its ability to satisfy all of its respective candidate sample records, not just those that fit within a context window.
Overview Benchmark Report
| Total benchmark tests used | 3895 |
|---|---|
| Mean success across all matches | 91.75% |
| Complete pattern match success | 81.98% |
| Partial pattern match success | 15.74% |
| Unsuccessful match records | 2.28% |
The following table gives examples from benchmark candidates demonstrating partial success. It shows the percentage of sample records successfully matched to generated pattern code.
| Item | Sample Size | Context Window | Rows Matched | % Match | Actual Generated Pattern | Pattern Template |
|---|---|---|---|---|---|---|
| 1 | 31 | 31 | 28 | 90.3 | 4UN5AN2.3(4UNP2"/"1UP... | 4UN5AN2(4PUN2"/"1PU... |
| 2 | 31 | 31 | 17 | 54.8 | 5N5ANP3"7Æ6N1ŃS8"1(2... | 5.6N5NPA3"7Æ6N1ŃS8"... |
| 3 | 31 | 24 | 15 | 62.5 | 5.8"6Ã02"1.2LNP1.2(1"K"4... | 5.8"6Ã02"1.2LNP1.2(1"K"4... |
| 4 | 31 | 19 | 28 | 90.3 | 3P5UN5NP5UN5UN3AN3... | 3P5NU5NP4UN5NU4NA... |
| 5 | 31 | 10 | 7 | 22.6 | 4LNP3.6AN4"×_"5"¤īĩĵ®ü"... | 4NPL2.5NA4"×_"5"¤īĩĵ®ü... |
| 6 | 30 | 9 | 26 | 86.7 | 5.6"ŀ¬¦"5"!pp%"1UNP5"Ù6... | 5.6"ŀ¬¦"5"!pp%"1PNU5"Ù... |
| 7 | 25 | 12 | 5 | 20.0 | 2"AAHH"5"30"3(4.7UP,4L)... | 2"AAHH"5"30"5(3UP,4L,4... |
| 8 | 31 | 14 | 30 | 96.8 | 5LP4")))¦¦§÷"4.10LN5.10U... | 5PL4")))¦¦§÷"4.11NL5.10P... |
Column Explanation
- Sample Size – The number of records available in the test sample. Sometimes a pattern describes fewer than 31 possible exact matches.
- Context Window – The maximum number of sample records used by the model to generate a new pattern.
- Rows Matched – The number of ALL sample records matched by the generated pattern. This may be larger than the context window.
- % Match – Rows Matched divided by Sample Size, as a percentage.
Items 4, 6 and 8 all achieve a match count greater than the context window. When pattern match code is generated from sample values, the context window used and the proportion of matching records are returned and displayed as a comment in the technical demonstration.
Front-end Framework
The technical demonstration employs the Gradio web framework as the web display layer. This fits well with the HuggingFace ecosystem and infrastructure choices. It provides both a browser-displayable layer and a callable API. The callable API is used by benchmarking scripts to leverage cloud GPUs for faster report turnaround. The Gradio framework provides a low-code user interface implementation approach, freeing more time to focus on domain-specific training challenges. For example, in a Python virtual environment with CUDA available, one can translate from English to French in just a few lines of code:
from transformers import pipeline
import gradio as gr
pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-en-fr",device="cuda")
demo = gr.Interface.from_pipeline(pipe)
demo.launch(server_port=8080)
This renders the following fully operational web interface for translation:

The following screenshot shows the more involved demo tool interface utilizing the same Gradio framework:

Legend:
| Item | Control | Description |
|---|---|---|
| 1 | Pattern Text | One or more lines of M code. The pattern can be extracted from the first line containing a pattern match expression. |
| 2 | Describe Text | This holds a structured description corresponding to the code expression pattern. |
| 3 | Describe Pattern Button | This button action transforms the pattern in code into a human readable structured description. |
| 4 | Pattern from Description | This button action uses generative AI to translate the natural language description found in English, French, Spanish, or Portuguese into new pattern match code. |
| 5 | Matches List | Sample values that match the code expression. |
| 6 | Non-Matches List | Sample values that fail to match the code expression. |
| 7 | Generate Values Button | This button action extracts the pattern in code and generates the two lists of matching and non-matching values. |
| 8 | Pattern from Values Button | This button uses generative AI to transform the values in the “Matches” column into a new pattern match code expression. |
| 9 | Validate Values Button | Re-validates all the sample values in both the “Matches” and “Non-Matches” columns. |
The “tick” and “cross” symbols in the “Good” column, confirm expected sample result for “Match” or “Non-Match” context. The use of “tick” and “cross” was chosen to communicate purpose in a mixed language user interface. The “gear” symbol on buttons is used to convey which parts of the user interface employ generative AI functionality.
Example of user workflow steps for the use-case to adjust an existing pattern match expression:
| Step | Description |
|---|---|
| 1 | Paste lines of code into “Pattern” text input. |
| 2 | Press “Describe Pattern” button. |
| 3 | Adjust or extend natural language description in chosen language (English, French, Spanish, Portuguese). |
| 4 | Press “Pattern from Description” button to generate new pattern code. |
| 5 | Press “Generate Values” button for new random testing values to explore and refine matching behavior. |
| 6 | Edit and add new samples in “Matching” and “Non-Matching” lists |
| 7 | Press “Validate Values” button to confirm adjusted sample values all pass as required. |
Internationalization
In order to provide runtime-switchable languages in the web interface, the community Python package gradio_i18n was employed. Label values are defined in a static dictionary keyed by language. The fiddliest part of the demo was getting the column titles to update when the language changes.
For description training data, a different approach is needed that also incorporates tracking for:
- Single or Plural quantities.
- Gender, e.g., French has words “un” and “une” for number one.
| English | French |
|---|---|
| a table | une table |
| a candlestick | un bougeoir |
In Spanish, the plural article differs by gender.
| English | Spanish |
|---|---|
| the tables | las mesas |
| the Cars | los coches |
For language specific error and informational messages, Python format strings are employed instead of composition. For example the info message placeholder: “info_X_of_Y_rows_used”:
| Language | Python Format String |
|---|---|
| English | "{} of {} rows used." |
| French | "{} de {} lignes sont utilisées" |
| Spanish | "utilizan {} de {} filas" |
| Portuguese | "{} de {} linhas são usadas" |
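A minimal sketch of how such per-language format strings might be looked up; the messages mirror the table above, while the helper function is illustrative rather than the demo's actual code:

```python
# Per-language Python format strings for the "info_X_of_Y_rows_used" message.
messages = {
    "en": "{} of {} rows used.",
    "fr": "{} de {} lignes sont utilisées",
    "es": "utilizan {} de {} filas",
    "pt": "{} de {} linhas são usadas",
}

def info_rows_used(lang: str, used: int, total: int) -> str:
    # Format strings preserve each language's word order, which naive string
    # composition would get wrong (e.g. Spanish places the verb first).
    return messages[lang].format(used, total)

print(info_rows_used("fr", 12, 31))
```

This is why format strings are preferable to concatenating translated fragments: the placeholder positions move with the grammar of each language.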
Application Tricks
Retry and Context
When using a list of values to deduce a pattern, the order of the supplied values can influence the model's success. To game this, there is a server-side retry loop that shuffles the values in the context window, attempting up to 5 times to find a pattern that fully matches, as validated against ALL of the sample values. It stops at the first full match.
When only a subset of sample values fit in the context window the returned pattern will include a description of the number of rows of samples used. This gives a hint and opportunity to promote specific rows into the context window for a better inference.
Generated patterns are validated against the whole list of sample values. Where a suggested pattern is only a partial success, the number of sample rows that do match is quantified and displayed in a code comment.
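The retry loop described above can be sketched as follows; `generate` (the model call) and `matches_all` (the pattern validator) are hypothetical stand-ins, not the demonstration's actual code:

```python
import random

def pattern_with_retry(samples, generate, matches_all, attempts=5):
    """Shuffle the context window and regenerate until the pattern matches
    ALL sample values, stopping at the first full match."""
    pattern = None
    for _ in range(attempts):
        window = samples[:]       # copy, so the caller's order is untouched
        random.shuffle(window)    # reorder the context window
        pattern = generate(window)
        if matches_all(pattern, samples):
            return pattern, True  # first full match wins
    return pattern, False         # partial success: report the last attempt
```

In the partial-success case, the demonstration reports the matched-row count back as a code comment, as described above.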
Future Opportunities
The YottaDB and GT.M pattern match operator offers an extensibility mechanism. Just as the pattern match code letters A, N, and P mean Alphanumeric, Numeric, and Punctuation respectively, additional letters can encode context-specific meanings. Should there be a commonly accepted usage of extensibility for a particular set of applications, it may be possible to tailor an additionally trained model to accommodate these features.
About Alex Woodhead
An M programmer with almost a quarter century of experience, Alex Woodhead has worked on clinical information system projects on multiple continents. Now living in Southern California, he is focused on synthetic data pipelines for generative AI for novel tools and automation, and is incubating genput.
Contact: info@genput.com
- Picture of Hunting Carpet made by Ghyath ud-Din Jami, Wool, cotton and silk, 1542–1543, Museo Poldi Pezzoli, Milan. This file has been identified as being free of known restrictions under copyright law, including all related and neighboring rights.
- Blog roll picture generated by OpenArt in response to a prompt by K.S. Bhaskar.
- Alan Turing quote from https://mathshistory.st-andrews.ac.uk/Biographies/Turing/quotations/
- Other graphics provided by the author.
YottaDB r2.02 includes a number of features and enhancements that make YottaDB easier to use, and more like other Linux programs. For example:
- With `ydbsh`, you can create shebang style scripts with M code.
- SOCKET devices support TLS connections using server certifications that do not require a password, such as those issued by Let’s Encrypt.
- Several optimizations to speed up Boolean operations.
In addition to enhancements and fixes made by YottaDB, r2.02 completes the merging of V7.0 GT.M versions into the YottaDB code base, GT.M V7.0-002, V7.0-003, V7.0-004, and V7.0-005, as described in the Release Notes.
Comparing YottaDB and Redis Using 3n+1 Sequences
TL;DR: Performance Comparisons has instructions for you to build a Docker container that allows you to make a side-by-side comparison of RedisⓇ,[1] Xider™,[2] and YottaDBⓇ. The image above is a screenshot of Xider and YottaDB outperforming Redis with a 32-process workload.
In 2025, the Journal of Computing Languages (JCL) plans a special issue recognizing 30 years of Lua. Since YottaDB has a native Lua API, we have submitted an article to JCL for that issue. A preprint of that article is on arXiv. The article compares the Redis and YottaDB APIs, and delves deeper into the performance comparison described below.
The choice of workload is important when benchmarking databases.
- A realistic benchmark must perform a large number of accesses, in order to remove timing jitter.
- Accesses must be different types, as real world workloads are not monolithic in their databases accesses.
- The benchmark must run on a variety of computing hardware.
- The benchmark must give consistent results on repeated runs.
- The workload should be simple and understandable.
Computing the lengths of 3n+1 sequences meets these requirements. Given an integer n, the next number in the sequence is (a) n÷2 if n is even, and (b) 3n+1 if n is odd. The Collatz Conjecture, one of the great unproven results of number theory, asserts that all such sequences end in 1, i.e., there are no loops, and no sequences of infinite length. For example:
- 3 → 10 → 5 → 16 → 8 → 4 → 2 → 1
- 13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1
Note that the sequences starting with 3 and 13 both meet at 10. If there are two processes, one computing the lengths of 3n+1 sequences starting with 13, and the other those of sequences starting with 3, and if they use a database to store the results as they work, then the process which reaches 10 later can simply use the results of the earlier process which has already computed and stored the results for the rest of the sequence.
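The result sharing described above can be sketched in Python, with a dictionary standing in for the shared database of stored lengths (in the benchmark, a Redis key space or a YottaDB global):

```python
# Shared store of already-computed sequence lengths; 1 ends every sequence.
cache = {1: 1}

def seq_length(n: int) -> int:
    """Length of the 3n+1 sequence starting at n, counting n itself and the
    final 1. Walking stops as soon as it reaches a number whose length is
    already stored, reusing earlier work."""
    path = []
    while n not in cache:
        path.append(n)
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    length = cache[n]
    for m in reversed(path):  # store lengths for every newly visited number
        length += 1
        cache[m] = length
    return length

assert seq_length(3) == 8    # 3 → 10 → 5 → 16 → 8 → 4 → 2 → 1
assert seq_length(13) == 10  # reuses the stored length of 10
```

With many concurrent processes, the interesting part is not this arithmetic but how cheaply and safely the shared `cache` lookups and updates can be done, which is exactly what the database comparison measures.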
The picture[3] is a visualization of the 3n+1 sequences of the numbers through 20,000, illustrating the convergence of the sequences, with all eventually converging on 1.
The problem is simple enough for programs to be easily written in virtually any language. While our intent in our blog post Solving the 3n+1 Problem with YottaDB was to compare programming languages, the programs can also be used to compare databases, and database APIs. In particular, we can compare Redis, YottaDB, and Xider. Xider is an API layer that allows YottaDB to provide applications with a Redis-compatible API, with two options:
- over TCP using the Redis Serialization Protocol (RESP); and
- for databases residing on the same machine as the client, we offer a direct connection to the database engine using Python and Lua APIs. The API is the same as the TCP / RESP API, except that it calls the database engine using our in-process native API calls.
In addition to the above two comparisons, which use the same program for Redis and Xider, there are also Lua, Python, and M programs that directly access the database, i.e., without going through a Redis compatible API layer.
We invite you to compare Redis, Xider, and YottaDB for yourself, and send us your comments.
[1] Redis is a registered trademark of Redis Ltd. Any rights therein are reserved to Redis Ltd. Any use by YottaDB is for referential purposes only and does not indicate any sponsorship, endorsement or affiliation between Redis and YottaDB.
[2] As Xider is still under development, not all Redis functionality is available. Any implemented functionality is upward compatible with Redis. Since Xider is continuously released, if functionality your application needs is not available, please check whether there is an Issue for it, and create one if there isn’t one. You can create a GitLab notification to stay informed with Xider updates.
[3] Picture generated using The Collatz Conjecture and Edmund Harriss’s visualisation tool.
We thank Kurt Le Breton for his first blog post, and hope there are many more to follow. If you would like to post on the YottaDB blog please contact us at info@yottadb.com.
A New Hope: The Emergence of MUMPS and YottaDB
In a computer universe not so far far away…
It is a time of great innovation.
Revolutionary technology is emerging from the depths of computing history, where storage space and code efficiency were once the ultimate challenges.
In the midst of this revolution, a succinct though ancient language, born from the constraints of early computing, stands as a beacon of simplicity and robustness.
MUMPS emerges from its historic confines on a unique mission: to engage in an epic battle against a squadron of transactions, overcoming all conflicts with unparalleled grace.
As millions of processes vie for victory in a grand auction of galactic proportions, the force of YottaDB’s internal transaction management triumphs, showing how old design constraints can lead to new levels of resilience.
A Chatbot, a Guest Writer, and a Coding Journey
Dramatic? Maybe a little over the top. Corny? Yeah, of course, but what self-respecting sort of nerd would I be without some sci-fi references! I do hope to have piqued your curiosity though, and yes, there will be an epic battle of transactions before we conclude, so stay tuned.
Truth be told, I’m genuinely surprised to be writing this today. I didn’t expect a spur-of-the-moment email to the YottaDB team would result in a request from Bhaskar to be a guest writer. Seriously, I’m a coder, not a wordsmith, so I apologize for getting ChatGPT to help out with the introduction (though I must admit, I’m partly responsible for the corniness).
ChatGPT! Yes, this whole episode started with AI – it got me into this mess in the first place. I promise, though, this is not a story about AI but rather … well … let me rewind.
Back in the late ’90s, when I landed my first programming job, it was with a medical technology company in Australia writing hospital information systems; shortly afterwards, I was with another firm writing laboratory ones. You can probably guess by now that at the core of both these products was MUMPS. That was my first introduction to the technology.
Now, I do know that the ANSI Standard refers to the language as M, however since I come from the world of medical software, I’m allowed a little case of the MUMPS now and then. So in the spirit of Bhaskar’s presentations, I’ll be keeping those mustard stains on my tie. But I digress…
Since I was fresh out of college, super green, and a little naive, I asked our database engineer why the company didn’t use XYZ Brand relational database. The short answer: it just wouldn’t cut it. At the time, I felt this wasn’t much of an explanation, but I did become curious.
I’m not quite sure what sparked my interest, but the hierarchical structure really appealed to me. Even today, I can’t fully explain why, but I’m still captivated by MUMPS. Bitten by the bug so to speak … I know, bad joke … perhaps I will change that tie after all! Thinking on it now, I would describe M as unique: it’s a little rough around the edges and yet, I think, remains fascinating and compelling even today.
So where is this all leading, you ask? Specifically to an implementation detail in GT.M, and by extension YottaDB, that truly stands out. It’s a hidden gem, hinted at in M’s language semantics, but only fully realized through GT.M’s engineers. I’ll delve into details a bit later.
But first let’s jump back to ChatGPT and the very start of our story.
The Hidden Gem: YottaDB’s Conflict Resolution
I enjoy coding! I really do. Like most programmers, I have my fair share of “pet” projects to work on. As a long-time M geek, it’s no surprise that at least one of those is M-based. And with YottaDB being open-sourced, it was the natural choice, since hobbies rarely generate income.
But, as any developer with niche interests knows, it can get a bit lonely in the wee hours of the morning when there’s no one around to bounce ideas off. So, of course I ended up talking to a chatbot!
It was during one of these moments with ChatGPT, while weighing the pros and cons of a decision I was stuck on, that the conversation delved into the internals of YottaDB’s transaction handling, particularly its automatic conflict resolution and retry mechanism.
And herein lies the hidden gem I wish to explore with you all.
A Reality of Relational Databases
But before we dive in, please let me take another quick step back in time. I didn’t spend my entire career with M-based companies. Eventually, I moved on and found myself working in the more ubiquitous world of relational databases. As capable as these systems are, I always felt something was missing. For certain data, I knew the more elegant model would be hierarchical; instead, I was stuck normalizing everything into a trillion tables.
I really missed M, but now I was working in the relational world and had to code within the constraints of that architecture. Speaking of those constraints, let me state an obvious, but crucial, point: relational databases are self-contained products. Your data is separate from your application logic. Hence the database has no understanding of how other software uses that data; it simply responds to the requests it’s given.
It simply responds, until it can’t!
Write issues are generally rare for any system under light load. But, just occasionally, one process might commit its transaction before another can, resulting in a concurrency violation. The database doesn’t have enough context to manage this for you, and so its responsibility rightly ends at maintaining data integrity.
All databases handle failed requests through various mechanisms, but ultimately, it’s your code that faces the brunt of those errors. The hot potato is always thrown back to the application developer, and it’s a significant amount of error handling to manage in addition to your primary code. The main point is that these issues intensify under heavier and heavier load.
In the relational world, this is just an accepted reality.
YottaDB’s Approach: Seamless Conflict Handling
But with YottaDB, a different reality awaits you.
At the heart of a YottaDB-based system is the integration of code execution and data access within the same process. This tight coupling, required by the M language, lays the foundation for YottaDB’s robust conflict-resolution mechanism.
Building on this, YottaDB also employs an optimistic concurrency strategy, which operates on the assumption of success and is highly performant most of the time. When inevitable conflicts occur, and multiple processes attempt to update the same piece of data, YottaDB steps in like a police officer directing traffic. It allows one process to proceed, then the next, until everything is resolved.
How? Well, you see, YottaDB has enough context to automatically manage this for you. Which approach it takes depends on whether you’re using the C-API with modern languages, or writing traditional M code. For now let’s keep a chronological flow and first focus on M to explore how the language specifies transactional semantics.
How M Specifies Transactions
A transaction begins with the command:
tstart
Your logic can then modify both variables and database entries. When it’s ready to finalize the transaction, it issues:
tcommit
Now here’s where YottaDB’s hidden gem becomes apparent: when you use a modified tstart and activate the automatic retry mechanism. On write conflicts, YottaDB presses its hidden rewind button: not only are database changes undone, but your application logic also rolls back to the tstart checkpoint. Depending on the transaction variant you use, some, none, or all local variables can be reverted to their checkpoint-time values. Once that’s done, your code runs again. Often, a retry will succeed with a tcommit, but in cases of persistent conflict, YottaDB dons its police officer cap to decide the order of resolution.
Please pause to ponder how cool this is.
Can you think of a strategy to retro-fit this ability into traditional databases? Neither can I.
Now, briefly, here are the variations that control which, if any, local variables are reset along with database changes. The upcoming code sample will showcase two of these.
- tstart * – All local variables on the stack will be reset on every retry.
- tstart () – No local variables will be reset at all.
- tstart (a,b,c) – Only variables a, b, and c will be reset on each retry.
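To make the mechanism concrete, here is a minimal sketch of a funds transfer, written as if conflicts cannot happen — because, from this code’s point of view, they can’t. The routine and global names (transfer, ^ACCT) are hypothetical, invented for illustration, not taken from any real application:

```m
; Hypothetical sketch: move amt from one ^ACCT balance to another.
; On a write conflict, YottaDB rolls the database back, restores the
; listed locals (from,to,amt,bal) to their tstart-time values, and
; simply re-executes this code. No retry loop, no error handler.
transfer(from,to,amt)
 new bal
 tstart (from,to,amt,bal)
 set bal=$get(^ACCT(from),0)
 if bal<amt trollback  quit 0            ; abandon: insufficient funds
 set ^ACCT(from)=bal-amt
 set ^ACCT(to)=$get(^ACCT(to),0)+amt
 tcommit
 quit 1
```

If two processes run this concurrently against the same account, one of them transparently re-executes from tstart with its listed locals restored; neither process contains any conflict-handling code.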
Why Developers Can Relax: YottaDB Has You Covered
So what’s left for the application developer to handle? Not much really. YottaDB manages conflict resolution for you, allowing you to focus on crafting simple business logic.
Its ability to handle conflicts gracefully is a core strength, ensuring robustness even under heavy load. While it might slow down during resolution, it happily works through your code to completion.
Reaching Out to the YottaDB Team
Thanks for sticking with me this far – your patience is about to pay off as our grand auction of galactic proportions starts soon.
But first, let me circle back to that AI conversation. I was surprised this unique capability wasn’t a bigger part of YottaDB’s marketing and remarked about that to ChatGPT. Its response piqued my interest enough that I decided to pass the insights along to the team.
In a world where developers must choose between NoSQL and relational databases, YottaDB’s ability to handle conflicts seamlessly is a huge differentiator – if only more people knew about it.
That’s why I reached out to the YottaDB team – I wanted to ask that this powerful feature get the attention it deserves. And as fate would have it, Bhaskar decided I was the right person to help spread the word. So here I am, not only showing why YottaDB is great but also sharing how a single email and a meddling AI can turn into a full-blown article!
Prepare for Battle: Bid Wars Begins
And now, the moment we’ve been waiting for has arrived. The epic showdown between a squadron of transactions as they battle for supremacy in a galactic-scale auction is about to commence. Buckle up, and let’s get our M code ready for this thrilling mission!
Breaking Down the Bid Wars Code
(Click on the image to access the code. Note: the code kills the ^Action global variable.)
The BidWars.m auction system simulates a highly concurrent bidding environment where many bidders compete to place bids on an item, in this case, our favourite astromech droid. The system launches a specified number of concurrent bidder processes, each trying to outbid the others by incrementing the auction price. This sample showcases YottaDB’s transaction mechanisms when handling contentious access and updates.
- Bidders: The system starts by launching a configurable number of bidders, each represented as an independent process.
- Concurrency: Once all bidders are launched, the auction begins, and each bidder attempts to become the leader by placing higher bids.
- Bid Placement: Each bidder process checks whether it’s the current auction leader. If not, the process places a new bid, incrementing the auction price randomly by a small amount.
- Average Time Calculation: Throughout the auction, the system calculates and logs the average time taken per bid.
- Final Bid: Once the auction duration is complete, the system waits for the remaining bidders to finish and announces the final winning bid, along with key statistics like total bids received and bids per second.
- Live Updates: As the auction progresses, the system provides real-time performance feedback on the current price, total bids, and average time per bid.
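To give a flavor of the bid-placement step in M, here is a simplified, hypothetical sketch; the real BidWars.m is more elaborate and uses different names (for instance, the ^Action global):

```m
; Hypothetical sketch of a single bid, in the spirit of BidWars.m
; (the ^Auction global and bid amounts here are illustrative only).
; If two bidders collide, YottaDB restarts the loser's transaction
; from tstart, restoring me and bid to their checkpoint values.
placebid(me)
 new bid
 tstart (me,bid)
 if $get(^Auction("leader"))=me tcommit  quit        ; already leading
 set bid=$get(^Auction("price"),0)+$random(10)+1     ; outbid by 1-10
 set ^Auction("price")=bid,^Auction("leader")=me
 if $increment(^Auction("bids"))                     ; tally total bids
 tcommit
 quit
```

Note what is absent: there is no retry loop and no error trap around the writes, which is precisely the point of the sample.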
A Simpler, Cleaner Way to Handle Conflicts
The entire auction process demonstrates YottaDB’s capability to manage concurrent transactions and handle bidding conflicts gracefully, ensuring the auction runs smoothly despite a frenzy of bidders.
As you examine the source code, you might notice there’s no error handling. This isn’t an oversight but rather a deliberate choice so I could highlight this under-appreciated gem. From my perspective, it’s fascinating how YottaDB handles writes so seamlessly you don’t need to include error-handling code for them.
For those not familiar with M, I hope you find the code approachable and clear. The real value here is in YottaDB’s ability to manage conflicts internally, which simplifies development and keeps your code clean. While error handling remains important in other contexts, the fact that YottaDB takes care of conflicts automatically is a powerful feature that I believe deserves more attention.
YottaDB’s Legacy
In reflecting on the design and elegance of this product, it’s remarkable to consider its origins. Born out of the need to maximize efficiency and keep things small due to the constraints of early hardware, it emerged with simplicity and robustness. Despite the passage of half a century and the evolution of modern database systems, no contemporary solution has yet matched the seamless ease of use that YottaDB offers in the area of handling write conflicts.
The necessity of those early days has shaped YottaDB into something truly special. As I look at the current landscape of database technologies, it’s a testament to the brilliance of those early engineers that their effortless conflict resolution mechanism remains unmatched.
This legacy technology, forged from necessity, continues to stand out, reminding us that sometimes amazing innovations can hide in unexpected places.
About Kurt Le Breton
Kurt is a software engineer based in Australia. His career began in the medical software field, where he discovered his enthusiasm for hierarchical data storage. He later transitioned to developing specialized software for accounting workflows. Outside of programming, he enjoys photography, reading, and dabbling in bookbinding. Sitting around a campfire for an evening is one of the ways he likes to relax and take a break from the modern world.
- Photo of R2D2. Credit: Kristen DelValle, used under Creative Commons by 2.0
Octo now supports dates and times.
While the ability to store and process dates and times is essential to many data processing applications, they are perhaps the least standard basic functionality across SQL implementations, as shown by the following table from SQL Date and Time.
| Example | Format | SQL | Oracle | MySQL | PostgreSQL |
|---|---|---|---|---|---|
| 2022-04-22 10:34:23 | YYYY-MM-DD | | | DATETIME | TIMESTAMP |
| 2022-04-22 | YYYY-MM-DD | DATE | | DATE | DATE |
| 10:34:23 | hh:mm:ss.nn | TIME | TIME | TIME | TIME |
| 2022-04-22 | YYYY-MM-DD | DATETIME | TIMESTAMP | | |
| 2022 | YYYY | | | YEAR | |
| 12-Jan-22 | DD-MON-YY | | TIMESTAMP | | |
ISO 8601 is an international standard for dates and times that SQL implementations support.
Applications can have specialized needs for dates. For example, medical applications need to store imprecise dates, like “July 1978”, or just “1978” (for example, I know that my tonsils were removed in 1958, but I have no idea when in 1958). Fileman dates allow for storing dates with arbitrary levels of (im)precision. Such specialized dates result in ad hoc implementations when the data is accessed via SQL, e.g., as VARCHAR, INTEGER, or NUMERIC types.
Octo provides several date and time types:
- DATE
- TIME [WITHOUT TIME ZONE]
- TIMESTAMP [WITHOUT TIME ZONE]
- TIME WITH TIME ZONE
- TIMESTAMP WITH TIME ZONE
The square brackets mean that the bracketed text is optional; for example, TIME and TIME WITHOUT TIME ZONE are the same type.
Formats that Octo supports are:
- TEXT (values such as '2023-01-01' and '01:01:01')
- HOROLOG (values in $HOROLOG format)
- ZHOROLOG (values in $ZHOROLOG format)
- ZUT (integers interpreted as $ZUT values)
- FILEMAN (numeric values of the form YYYMMDD.HHMMSS), where
- YYY is year since 1700 with 000 not allowed
- MM is month; two digits 01 through 12
- DD is the day of the month; two digits 01 through 31
- HH is the hour; two digits, 00 through 23
- MM is the minute; two digits, 00 through 59
- SS is the second; two digits, 00 through 59
For example, the Fileman value 3240715.103423 represents 2024-07-15 10:34:23 (324 being 2024 minus 1700).
Here is an example of a query using Fileman dates against a VistA VeHU Docker image, which contains simulated patient data for training purposes and which you can run and experiment with.
OCTO> SELECT P.NAME AS PATIENT_NAME, P.PATIENT_ID as PATIENT_ID, P.DATE_OF_BIRTH, P.Age, P.WARD_LOCATION, PM.DATE_TIME as Admission_Date_Time,
TOKEN(REPLACE(TOKEN(REPLACE(P.WARD_LOCATION,'WARD ',''),'-',1),'WARD ',''),' ',2) AS PCU,
CONCAT(TOKEN(REPLACE(TOKEN(REPLACE(P.WARD_LOCATION,'WARD ',''),'-',2),'WARD ',''),' ',1),' ',TOKEN(P.WARD_LOCATION,'-',3)) AS UNIT,
P.ROOM_BED as ROOM_BED, REPLACE(P.DIVISION,'VEHU ','') as FACILTY, P.SEX as SEX, P.CURRENT_ADMISSION as CURRENT_ADMISSION,
P.CURRENT_MOVEMENT as CURRENT_MOVEMENT, PM.PATIENT_MOVEMENT_ID as Current_Patient_Movement,
PM.TYPE_OF_MOVEMENT as Current_Movement_Type, AM.PATIENT_MOVEMENT_ID as Admission_Movement,
AM.TYPE_OF_MOVEMENT as Admission_Type
FROM PATIENT P
LEFT JOIN patient_movement PM ON P.CURRENT_MOVEMENT=PM.PATIENT_MOVEMENT_ID
LEFT JOIN patient_movement AM ON P.CURRENT_ADMISSION=AM.PATIENT_MOVEMENT_ID
WHERE P.CURRENT_MOVEMENT IS NOT NULL
AND P.ward_location NOT LIKE 'ZZ%'
AND P.NAME NOT LIKE 'ZZ%'
AND PM.DATE_TIME > timestamp'2015-01-01 00:00:00' LIMIT 5;
patient_name|patient_id|date_of_birth|age|ward_location|admission_date_time|pcu|unit|room_bed|facilty|sex|current_admission|current_movement|current_patient_movement|current_movement_type|admission_movement|admission_type
ONEHUNDRED,PATIENT|100013|1935-04-07|89|ICU/CCU|2015-09-10 08:38:18|| |ICU-10|DIVISION|M|4686|4686|4686|1|4686|1
TWOHUNDREDSIXTEEN,PATIENT|100162|1935-04-07|89|7A GEN MED|2016-06-26 20:24:39|GEN| ||CBOC|M|4764|4764|4764|1|4764|1
ONEHUNDREDNINETYSIX,PATIENT|100296|1935-04-07|89|7A GEN MED|2015-09-25 11:56:03|GEN| ||CBOC|M|4711|4711|4711|1|4711|1
ZERO,INPATIENT|100709|1945-03-09|79|7A SURG|2015-04-04 13:38:10|SURG| |775-A|CBOC|M|4672|4672|4672|1|4672|1
EIGHT,INPATIENT|100716|1945-03-09|79|3E NORTH|2015-04-03 11:25:45|NORTH| |3E-100-1|CBOC|M|4636|4636|4636|1|4636|1
(5 rows)
OCTO>
If your application has specialized dates, we invite you to discuss your requirements with us, so that we can extend Octo to meet your needs.
- Photo of Mayan calendar stone fragment from the Classical Period (550-800 CE) at Ethnologiches Museum Berlin. Credit: José Luis Filpo Cabana, used under the Creative Commons Attribution-Share Alike 4.0 International license.
- Photo of one of NIST’s ytterbium lattice atomic clocks. NIST physicists combined two of these experimental clocks to make the world’s most stable single atomic clock. The image is a stacked composite of about 10 photos in which an index card was positioned in front of the lasers to reveal the laser beam paths. Credit: N. Phillips/NIST
YottaDB r2.00 Released
YottaDB r2.00 is a major new release with substantial new functionality and database format enhancements.
- Inherited from the upstream GT.M V7.0-000, YottaDB r2.00 creates database files of up to 16Gi blocks. For example, the maximum size of a database file with 4KiB blocks is 64TiB, which means you can use fewer regions for extremely large databases. With YottaDB r2.00, you can continue to use database files created by r1.x releases, except that the maximum size of a database file created with prior YottaDB releases remains unchanged.
- For direct mode, as well as utility programs, YottaDB can optionally use GNU Readline, if it is installed on the system. This includes the ability to access and use command history from prior sessions.
- Listening TCP sockets can be passed between processes.
- The ydbinstall / ydbinstall.sh script has multiple enhancements.
In addition to enhancements and fixes made by YottaDB, r2.00 inherits numerous other enhancements and fixes from GT.M V7.0-000 and V7.0-001, all described in the Release Notes.
Graphical Monitoring of Statistics Shared by Processes
Quick Start
Monitor the shared database statistics of your existing applications in minutes, starting right now.
- Ensure that node.js is installed.
- Use the ydbinstall script to install the YottaDB GUI. (This also installs the YottaDB web server.)
- Ensure that in the environment of each application process, the variable ydb_statshare is set to 1.
- Start the YottaDB web server using the same global directory as your application.
- Connect to the web server to start the YottaDB GUI. In the Dashboard, choose Database Administration / Monitor Database. Choose the data you want to monitor, and choose how you want it displayed. Click Start to see the data.
The GUI comes with a demo that includes a simulated application. This video walks you through using the demo.
Read on to dig deeper.
Motivation
Visual presentation is the most effective way most of us ingest complex data.
As with Minard’s unique depiction of Napoleon’s disastrous march on Moscow, shown here, we routinely use graphs every day.
Processes accessing databases have tens of internal counters (collectively referred to as “statistics”) for each database file that they have open. The YottaDB GUI allows you to visually monitor these statistics in real time.
Statistics
There are two sources of statistics to monitor: statistics shared by processes and statistics in the database file header. Each has its uses.
Shared Statistics
YottaDB processes can opt to share their operational database statistics: a process shares statistics if, at process startup, the environment variable ydb_statshare is 1. Optionally, the environment variable ydb_statsdir can be set to a temporary directory through which statistics are shared and monitored.
File Header Statistics
Statistics in the database file header capture the aggregate data of all processes accessing the database from the database creation. Viewing these statistics requires access to the database file but does not require application processes to share statistics.
Monitoring
Shared Statistics
Monitoring statistics shared by processes enables focused analysis, for example, visualization of current performance issues and the behavior of specific processes. The YottaDB GUI provides graphical monitoring of shared statistics.
Intended to be intuitive to use, the GUI has integrated online documentation, including mouse-overs, but no separate user documentation other than installation instructions.
The video on this page walks you, from start to finish, through graphically monitoring the statistics of an existing application on a remote server.
If you want to implement your own monitoring of shared statistics, YottaDB provides a %YGBLSTAT() utility program.
File Header Statistics
For production instances, we recommend continuously capturing statistics every minute or so. You can use the gvstat program or write your own similar program. Capturing the data and creating baselines will help you in capacity planning as well as incident analysis. Continuously monitoring and displaying key parameters can additionally give you insight into the dynamic behavior of your application.
The guest blog post YottaDB Dashboard by Ram Sailopal demonstrates monitoring File Header Statistics with Grafana.
Security
Statistics are metadata, not data. While metadata should be shared advisedly, it does not typically have the same confidentiality restrictions as data. The GUI and web server follow normal YottaDB security policies.
- Processes must opt-in to share statistics.
- The web server process must be started by a userid on the system, and has no access capabilities beyond those of that userid.
- Database monitoring can be performed by a read-only GUI. You can see from the videos that the GUI is operating in read-only mode.
- While the GUI does have a read-write mode, for example, to support editing of global directories, database monitoring requires just read-only access.
- To access statistics, the web server needs access to the global directory and the $ydb_statsdir directory if it is specified. You can use Linux permissions to control access.
- The JavaScript libraries used are mature, versioned, and statically served from where the GUI is installed on the server.
Please Try It!
We invite you to use the GUI to monitor database statistics shared by your application processes and tell us what you think. As it is a new application, we are sure that it offers many opportunities for enhancement and improvement.
- If you have YottaDB support, please reach out to us through your YottaDB support channel.
- If you do not have YottaDB support, you can reach out to us:
- On the gui channel of our Discord server.
- By creating an Issue on the YDBGUI project at Gitlab.
Thank you for using the YottaDB GUI.