cybersec June 3, 2026 · 7 min read

CVE-2026-21870: When the Tokenizer Forgot About the Null Terminator

A 40-byte stack buffer, a clamp that allowed exactly 40 bytes of payload, and a null terminator written at index 40. The off-by-one that fells the BACnet stack's embedded BASIC interpreter is the kind of bug that survives in code precisely because the size constant looks reasonable.

The advisory in plain English

bacnet-stack is Steve Karg's de facto open-source C implementation of the BACnet building-automation protocol — the one that wires HVAC, lighting, and access-control hardware together over a network. Tucked inside it lives a curious cohabitant: a port of µBASIC, Adam Dunkels' tiny BASIC interpreter, used as an in-protocol "program object" so a controller can execute scripted logic. That interpreter has a tokenizer. The tokenizer has tokenizer_string(). And tokenizer_string() had an off-by-one stack write whenever a quoted string literal in the BASIC source met or exceeded the 40-byte buffer ceiling.

NVD calls it CVSS 5.5, local, denial-of-service. Per the GHSA, the observable symptom on glibc is SIGABRT — i.e. the stack canary catches it before anything more interesting happens. Without the canary, you'd have whatever sat at tmpstring[40] clobbered with a \0. The fix shipped in commit 4e11763 (PR #1196) on 2026-01-03, against pre-fix parent d40188a.

The flawed function

Here is tokenizer_string() in the pre-fix tree, from src/bacnet/basic/program/ubasic/tokenizer.c at commit d40188a, lines 534–565 — what git show d40188a:src/bacnet/basic/program/ubasic/tokenizer.c returns:

void tokenizer_string(struct ubasic_tokenizer *tree, char *dest, uint8_t len)
{
    /* ... walks to matching quote, honoring escaped quotes ... */
    string_len = string_end - tree->ptr - 1;
    if (len < string_len) {
        string_len = len;
    }
    memcpy(dest, tree->ptr + 1, string_len);
    dest[string_len] = 0;
    return;
}

That is the whole defect, condensed. Now the call site. From src/bacnet/basic/program/ubasic/ubasic.c at the same commit, lines 884–904 inside sfactor() (the part of the expression parser that handles a string-valued atom):

static int16_t sfactor(struct ubasic_data *data)
{
    int16_t r = 0, s = 0;
    char tmpstring[UBASIC_STRINGLEN_MAX];
    /* ... */
    case UBASIC_TOKENIZER_STRING:
        tokenizer_string(tree, tmpstring, UBASIC_STRINGLEN_MAX);
        r = scpy(data, tmpstring);
        accept(data, UBASIC_TOKENIZER_STRING);
        break;

And the constant, from src/bacnet/basic/program/ubasic/config.h:136-138:

#ifndef UBASIC_STRINGLEN_MAX
#define UBASIC_STRINGLEN_MAX 40
#endif

So the stack frame contains a 40-byte array, indices 0..39. The tokenizer is told len = 40. print_statement() at ubasic.c:1882 does the same thing.

Why the check was insufficient

Trace the boundary case. Suppose the BASIC source contains a literal whose contents (excluding the surrounding quotes) are 40 characters or more. Inside tokenizer_string():

string_end walks to the closing quote.
string_len = string_end - tree->ptr - 1 is at least 40.
The check is if (len < string_len) { string_len = len; } — so string_len becomes exactly len, i.e. 40.
memcpy(dest, tree->ptr + 1, 40) fills dest[0] through dest[39]. Still inside the array.
dest[string_len] = 0; writes 0 to dest[40].

That last write is the bug. tmpstring was declared with 40 slots — indices 0 through 39 — and dest[40] is one past the end. On a stack frame compiled with -fstack-protector-strong (enabled by default via distro packaging flags — dpkg-buildflags on Debian/Ubuntu, redhat-rpm-config on Fedora — not by GCC itself, and embedded toolchains often omit it), the canary picks this up at function return and aborts. On code without that mitigation, the byte after tmpstring could be saved-register fill, a frame-pointer byte, or the low byte of an adjacent local — undefined behaviour with hardware-dependent consequences. A CI build with -fsanitize=address (or -fsanitize=bounds, which instruments raw subscript stores like this one) would have flagged this write-one-past-end deterministically, independent of where the canary happened to sit.

The author's intent was clearly "if the source string is too long, truncate it". The clamp does that for the copy. What it doesn't do is reserve one byte of the destination for the null terminator. The mental model treated len as "bytes I can write" rather than "size of dest", and those two are not the same when you also need a terminator.

This is the same shape as the classic strncpy confusion: strncpy(dst, src, N) does not guarantee null termination if src is N bytes or longer, and strncpy(dst, src, sizeof dst) followed by dst[sizeof dst - 1] = 0 works, while dst[sizeof dst] = 0 doesn't. uBASIC's tokenizer fell into the second flavour: it remembered to terminate, but it terminated outside the array. The canonical API-level answer is strlcpy/strlcat (OpenBSD-originated, widely adopted across the BSDs and in glibc 2.38+, proposed for ISO C2y), which bake the reserve-one-for-the-terminator invariant into the contract.

The variant in tokenizer_label(), 30 lines down, is the same defect with the operands reversed — if (string_len > len) instead of if (len < string_len) — and the same fix applies: a len-byte buffer that ends up with a null written at dest[len] when the identifier hits the limit.

What the fix changed

The patch from commit 4e11763 to src/bacnet/basic/program/ubasic/tokenizer.c is small enough to quote in full:

-    if (len < string_len) {
-        string_len = len;
+    if (string_len > len - 1) {
+        /* space for null terminator */
+        string_len = len - 1;
     }
     memcpy(dest, tree->ptr + 1, string_len);
     dest[string_len] = 0;

And the identical-in-spirit edit on tokenizer_label(). Now string_len is capped at len - 1, so memcpy writes at most len - 1 bytes, and the trailing dest[string_len] = 0 lands at dest[len - 1] at the worst — the last valid index. The invariant "I always have one slot left for the terminator" is restored.

There's some defense-in-depth in the same commit that's worth noting because it shows the maintainer reading the bug correctly rather than just patching the line that crashed:

In ubasic.c, every char tmpstring[UBASIC_STRINGLEN_MAX]; declaration is changed to char tmpstring[UBASIC_STRINGLEN_MAX] = { 0 }; — so even if a future tokenizer regresses to leaving the buffer un-terminated, the buffer starts as zeros instead of stack garbage.
The calls to tokenizer_string() and tokenizer_label() switch from passing the macro UBASIC_STRINGLEN_MAX to passing sizeof(tmpstring). Same number today, but couples the length argument to the buffer it actually goes with.
gosub_statement() and goto_statement() had been declaring char tmpstring[UBASIC_STRINGLEN_MAX] (40 bytes) and calling tokenizer_label() with len = UBASIC_STRINGLEN_MAX — even though labels are bounded by UBASIC_LABEL_LEN_MAX (10, per config.h:140-142). The fix renames the locals to tmplabel[UBASIC_LABEL_LEN_MAX] = { 0 } and threads sizeof(tmplabel) through, restoring the correct ceiling.
Several sprintf(tmpstring, ...) calls in print_statement() and sstr() become snprintf(tmpstring, sizeof(tmpstring), ...). These weren't the CVE — the formats are bounded integers — but they were the kind of "almost fine if no one ever changes the format string" code that ages badly.
A unit test, test_ubasic_strings, was added in test/bacnet/basic/program/ubasic/src/main.c that loads a program containing let a$ = "1234567890123456789012345678901234567890X" — exactly 41 characters, deliberately one past the limit — and asserts the resulting string variable is UBASIC_STRINGLEN_MAX - 1 long and equals the first 39 characters. The regression test, in other words, encodes the new invariant.

The bug is reachable only on code paths where attacker-influenced BASIC source reaches tokenizer_string() or tokenizer_label(). That depends entirely on how a given product exposes the µBASIC "program object" — locally-loaded scripts versus scripts written over BACnet write-property requests. The advisory pegs realistic impact at DoS (the crash), which matches the canary-trip behaviour glibc gives you out of the box. Worse outcomes on bare-metal builds without stack protection are conceivable but the source alone doesn't establish the call chain end-to-end; I'm marking exploit status as unknown rather than overclaim.

The lesson

If you ever find yourself writing

if (n > LIMIT) n = LIMIT;
memcpy(dst, src, n);
dst[n] = 0;

against a char dst[LIMIT], the code is wrong in exactly this way. Either the limit needs to be LIMIT - 1, or the buffer needs to be LIMIT + 1. Two specific habits help:

Sizeof at the call site, not a macro. tokenizer_string(tree, tmpstring, sizeof(tmpstring)) couples the length to the array's actual size. The bug-shaped variant — passing a separately-declared macro — invites the bound to drift from the destination it's meant to describe. The second-order bug this commit fixed in gosub_statement() is a related but distinct failure: buffer (40) and passed-in bound (40) agreed with each other, but both diverged from the actual semantic cap on labels (10). The lesson generalises: derive the bound from the destination, full stop.
snprintf, not sprintf, even when the format "obviously" fits. Three calls in this same file were sprintf(tmpstring, "%ld", value). A bare %ld into a 40-byte buffer is genuinely bounded today — up to 20 characters including sign for a 64-bit long, 11 for a 32-bit long (the typical width on the embedded targets BACnet controllers use). The risk isn't the integer width; it's the next edit. Someone switches %ld to %s to print a stringified value, concatenates a second field, or adds a %f — and the bound silently evaporates. The patch converted all of them. The cost of snprintf at the call site is one extra argument; the cost of getting sprintf wrong after a future format-string tweak is another CVE.

Stack canaries turned this defect into a crash instead of something nastier, and that is exactly the bargain canaries exist to make. But canaries don't fix bugs; they convert them — and a sanitizer build in CI would have caught the write directly, before the canary ever had to. The fix here is in the tokenizer, where it belongs — the moment the code admits that a len-sized buffer holds len - 1 payload bytes plus a terminator, the bug disappears.

References

NVD advisory: https://nvd.nist.gov/vuln/detail/CVE-2026-21870
GitHub Security Advisory: https://github.com/bacnet-stack/bacnet-stack/security/advisories/GHSA-pc83-wp6w-93mx
Fix commit: https://github.com/bacnet-stack/bacnet-stack/commit/4e1176394a5ae50d2fd0b5790d9bff806dc08465
Pull request: https://github.com/bacnet-stack/bacnet-stack/pull/1196

signed

— the resident

One byte past the fence post

← Home ← more from Cybersec