WValue Encoding Specification
Version: 3.0 Status: Normative Release: 2026.07.04
This document is the definitive specification for the WValue NaN-boxed value encoding used in the Tungsten language runtime. Every dynamic value is represented as a single uint64_t. The encoding is designed for languages where only nil and false are falsy, and where inline integers, decimals, domain-aware types, and packed structured values are more important than pointer tag variety.
A conforming implementation MUST produce the exact bit patterns described below. Any deviation is a bug.
The reference implementation is wvalue.h — a standalone, dependency-free C header under the MIT license.
typedef uint64_t WValue;
Table of Contents
- Value Space Layout
- Singletons and Object Space (0x0000)
- Biased Doubles
- String / Symbol (0xFFF9)
- Int (0xFFFA)
- Instant (0xFFFB)
- Lexical + Char (0xFFFC)
- Numeric (0xFFFD)
- Packed Types (0xFFFE)
- Duration (0xFFFF)
- Heap Overflow (Domain Objects)
- Type Checking
- Constants Reference
- Design Rationale
1. Value Space Layout
The full uint64_t range is partitioned into contiguous, non-overlapping regions ordered from low to high:
Hex range Type
──────────────────────────────────────────────────────────────
0x0000_0000_0000_0000 nil (singleton)
0x0000_0000_0000_0001 false (singleton)
0x0000_0000_0000_0002 true (singleton)
0x0000_0000_0000_0003 undef (singleton)
0x0000_0000_0000_0004 memo miss (internal sentinel)
0x0000_0000_0000_0005 - 0x0000_0000_0000_000F reserved sentinels
0x0000_0000_0000_00x0+ heap objects (ptr | sub-tag)
0x0001_0000_0000_0000 - 0xFFF8_FFFF_FFFF_FFFF biased IEEE 754 doubles
0xFFF9_xxxx_xxxx_xxxx string / symbol
0xFFFA_xxxx_xxxx_xxxx int (48-bit signed)
0xFFFB_xxxx_xxxx_xxxx instant (48-bit signed Unix ms)
0xFFFC_xxxx_xxxx_xxxx lexical (token, lexchar, slice, char)
0xFFFD_xxxx_xxxx_xxxx numeric (decimal, currency, quantity)
0xFFFE_xxxx_xxxx_xxxx packed (color, complex, rational, date, ipv4, location)
0xFFFF_xxxx_xxxx_xxxx duration (ns or months+ms)
──────────────────────────────────────────────────────────────
Key invariant: Every valid WValue falls into exactly one region. The type-check functions are mutually exclusive and exhaustive.
2. Singletons and Object Space (0x0000)
Values with the top 16 bits equal to 0x0000 are either singletons (small constants) or heap object pointers.
2.1 Singletons
| Constant | Value | Truthiness |
|---|---|---|
| nil | 0x0000_0000_0000_0000 | Falsy |
| false | 0x0000_0000_0000_0001 | Falsy |
| true | 0x0000_0000_0000_0002 | Truthy |
| undef | 0x0000_0000_0000_0003 | Truthy (internal sentinel) |
| memo_miss | 0x0000_0000_0000_0004 | Truthy (memoization sentinel) |
nil is zero so that calloc-initialized memory defaults to nil.
Truthiness reduces to a single unsigned compare: v > 1.
2.2 Heap Objects
All heap-allocated objects use 16-byte-aligned pointers. The low 4 bits (guaranteed zero by alignment) are repurposed as a sub-tag nibble:
| Nibble | Type | Description |
|---|---|---|
| 0x0 (ptr=0) | nil | (singleton, not a pointer) |
| 0x0 (ptr≠0) | generic | type discriminator in struct header byte |
| 0x1 | false | (singleton, not a pointer) |
| 0x2 | true | (singleton, not a pointer) |
| 0x3 | undef | (singleton, not a pointer) |
| 0x4 | struct | user-defined class instance |
| 0x5 | hash | hash table |
| 0x6 | closure | function closure |
| 0x7 | regex | compiled regex |
| 0x8 | range | range object |
| 0x9 | module | module |
| 0xA | array | dynamic array |
| 0xB | bigint | arbitrary-precision integer |
| 0xC | class | class metaobject |
| 0xD | uuid | 128-bit UUID |
| 0xE | error | error object |
| 0xF | domain | heap-overflow domain type (see §11) |
Boxing: (uint64_t)(uintptr_t)ptr | subtag
Pointer extraction: value & ~0xF
Sub-tag extraction: value & 0xF
Object check: v >= 0x10 && (v >> 48) == 0
Generic objects (sub-tag 0x0) use a uint8_t type field in the struct header to discriminate between thread, atomic, socket, channel, fiber, bytes, and response objects.
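The boxing, extraction, and object-check expressions above can be gathered into a few helpers. This is a minimal sketch, not the contents of wvalue.h: apart from w_is_obj, which the type-checking table refers to, the function names are hypothetical, and the pointer is assumed to fit in the low 48 bits.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t WValue;

/* Only nil (0) and false (1) are falsy. */
static bool w_truthy(WValue v) { return v > 1; }

/* Box a 16-byte-aligned pointer together with a sub-tag nibble (0x0-0xF). */
static WValue w_box_obj(const void *ptr, unsigned subtag) {
    return (WValue)(uintptr_t)ptr | (WValue)(subtag & 0xFu);
}

/* Heap object: top 16 bits zero and above the singleton/sentinel range. */
static bool w_is_obj(WValue v) { return v >= 0x10 && (v >> 48) == 0; }

/* Clear the low nibble to recover the pointer. */
static void *w_obj_ptr(WValue v) {
    return (void *)(uintptr_t)(v & ~(WValue)0xF);
}

/* The low nibble is the sub-tag. */
static unsigned w_obj_subtag(WValue v) { return (unsigned)(v & 0xF); }
```

Because singletons and reserved sentinels all sit below 0x10, the object check never mistakes them for pointers even though they share the 0x0000 prefix.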
3. Biased Doubles
IEEE 754 doubles are stored with a bias added to the raw bit pattern:
biased = raw_bits + 0x0001_0000_0000_0000
This shifts the entire double range above the object/singleton space and below the tagged value space:
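- Minimum biased: 0x0001_0000_0000_0000 (represents +0.0)
- Maximum biased: 0xFFF8_FFFF_FFFF_FFFF (upper bound of the double region; after NaN canonicalization the largest value boxing produces is 0xFFF1_0000_0000_0000, which represents -infinity)
- Biased NaN: 0x7FF9_0000_0000_0000 (canonical quiet NaN)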
NaN canonicalization: All NaN variants (signaling, quiet, with payloads) MUST be normalized to canonical quiet NaN (0x7FF8_0000_0000_0000) before biasing. NaN lives in double space; there is no separate NaN sentinel.
Boxing:
uint64_t bits;
memcpy(&bits, &d, 8);
if ((bits & 0x7FF0000000000000) == 0x7FF0000000000000 &&
(bits & 0x000FFFFFFFFFFFFF) != 0)
bits = 0x7FF8000000000000; // normalize NaN
return bits + 0x0001000000000000;
Unboxing:
uint64_t bits = v - 0x0001000000000000;
double d;
memcpy(&d, &bits, 8);
Type check:
(v - 0x0001000000000000) <= 0xFFF7FFFFFFFFFFFF
This is a single unsigned subtract + compare; no branch is needed.
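The fragments above can be wrapped into complete functions and checked against a few known bit patterns. The sketch below assumes IEEE 754 binary64 doubles; the function names are hypothetical and only W_DOUBLE_BIAS comes from the constants reference.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t WValue;
#define W_DOUBLE_BIAS 0x0001000000000000ULL

/* Box: canonicalize any NaN, then add the bias to the raw bits. */
static WValue w_box_double(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    if ((bits & 0x7FF0000000000000ULL) == 0x7FF0000000000000ULL &&
        (bits & 0x000FFFFFFFFFFFFFULL) != 0)
        bits = 0x7FF8000000000000ULL;   /* canonical quiet NaN */
    return bits + W_DOUBLE_BIAS;
}

/* Unbox: subtract the bias and reinterpret the bits as a double. */
static double w_unbox_double(WValue v) {
    uint64_t bits = v - W_DOUBLE_BIAS;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Values below the bias wrap around to huge numbers and fail the compare. */
static bool w_is_double(WValue v) {
    return (v - W_DOUBLE_BIAS) <= 0xFFF7FFFFFFFFFFFFULL;
}
```

Under these assumptions 0.0 boxes to 0x0001_0000_0000_0000, 1.0 to 0x3FF1_0000_0000_0000, and every NaN to 0x7FF9_0000_0000_0000.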
4. String / Symbol (0xFFF9)
Strings and symbols share the 0xFFF9 tag, distinguished by bit 0:
- bit 0 = 0 → string
- bit 0 = 1 → symbol (interned)
4.1 Inline Strings (SSO-5)
Strings of 0-5 bytes are stored inline with no heap allocation:
bits 47-44: 0 (padding)
bits 43-4: up to 5 bytes of character data (byte 0 at bits 4-11)
bits 3-1: length (0-5)
bit 0: 0 (string flag)
The 3-bit field in bits 3-1 doubles as a mode selector: values 0-5 are inline lengths, mode value 6 is used for slab-backed interned strings and symbols, and mode value 7 indicates a heap string or symbol.
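As an illustration of the inline layout, the following sketch packs and unpacks SSO-5 values. The helper names and the bool-plus-out-parameter convention are assumptions made for this example, not part of the reference header.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t WValue;
#define W_TAG_STRINGSYM 0xFFF9000000000000ULL

/* Pack a NUL-terminated string of 0-5 bytes; returns false if it is too long. */
static bool w_sso5_pack(const char *s, bool is_symbol, WValue *out) {
    size_t len = strlen(s);
    if (len > 5) return false;
    WValue v = W_TAG_STRINGSYM | ((WValue)len << 1) | (is_symbol ? 1u : 0u);
    for (size_t i = 0; i < len; i++)            /* byte 0 lands at bits 4-11 */
        v |= (WValue)(unsigned char)s[i] << (4 + 8 * i);
    *out = v;
    return true;
}

/* True when the value is a string/symbol whose mode field is an inline length. */
static bool w_sso5_is_inline(WValue v) {
    return (v >> 48) == 0xFFF9 && ((v >> 1) & 7) <= 5;
}

/* Copy the bytes out and NUL-terminate; the caller checks w_sso5_is_inline first. */
static size_t w_sso5_unpack(WValue v, char out[6]) {
    size_t len = (size_t)((v >> 1) & 7);
    for (size_t i = 0; i < len; i++)
        out[i] = (char)((v >> (4 + 8 * i)) & 0xFF);
    out[len] = '\0';
    return len;
}
```

With this layout the string "hi" encodes as 0xFFF9_0000_0006_9684: length 2 in bits 3-1, 'h' (0x68) at bits 4-11, and 'i' (0x69) at bits 12-19.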
4.2 Slab Interned Strings (SSO-61)
Strings and symbols of 6-61 bytes can live in the permanent string slab. The WValue stores only a 24-bit slab index:
bits 47-28: 0
bits 27-4: slab index (24 bits)
bits 3-1: 6 (slab mode)
bit 0: 0 = string, 1 = symbol
The slab itself stores up to 29 bytes in one 32-byte slot or up to 61 bytes in two contiguous slots:
- slot 0: [flags][length][30 bytes of payload]
- slot 1 when needed: [32 bytes of payload]
Single-slot strings are NUL-terminated by zero-fill. Two-slot strings use the second slot entirely for the trailing payload bytes plus the NUL terminator.
4.3 Heap Strings
Strings longer than 61 bytes use a heap-allocated WString:
typedef struct WString {
uint32_t len;
char data[]; // UTF-8, null-terminated
} WString;
The pointer is stored in bits 4-47 (16-byte aligned):
bits 47-4: WString* (masked with 0x0000_FFFF_FFFF_FFF0)
bits 3-1: 7 (heap sentinel)
bit 0: 0 (string flag)
4.4 Symbols
Symbols share the exact same mode encoding as strings; only bit 0 differs:
- mode 0-5: inline symbol (SSO-5)
- mode 6: slab-backed interned symbol (24-bit slab index)
- mode 7: heap symbol
5. Int (0xFFFA)
48-bit signed two's complement integer.
bits 63-48: 0xFFFA (tag)
bits 47-0: signed integer value
Range: -140,737,488,355,328 to +140,737,488,355,327 (-2^47 to 2^47 - 1)
Boxing: 0xFFFA000000000000 | (value & 0x0000FFFFFFFFFFFF)
Unboxing (sign extension): ((int64_t)(v << 16)) >> 16
On ARM64 this compiles to a single sbfx instruction.
Values exceeding this range overflow to heap BigInt objects (sub-tag 0xB).
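A short sketch of the Int fast path follows. The constants are taken from the constants reference; the function names are hypothetical, and the unboxing relies on arithmetic right shift of signed values, which mainstream compilers provide but the C standard leaves implementation-defined.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t WValue;
#define W_TAG_INT      0xFFFA000000000000ULL
#define W_PAYLOAD_MASK 0x0000FFFFFFFFFFFFULL
#define W_INT48_MAX ((int64_t)((1ULL << 47) - 1))
#define W_INT48_MIN ((int64_t)(-(1LL << 47)))

/* True when n can be stored inline; otherwise the caller promotes to BigInt. */
static bool w_int_fits(int64_t n) {
    return n >= W_INT48_MIN && n <= W_INT48_MAX;
}

/* Keep the low 48 bits of the two's complement value under the tag. */
static WValue w_box_int(int64_t n) {
    return W_TAG_INT | ((uint64_t)n & W_PAYLOAD_MASK);
}

/* Shift the tag out, then shift back with sign extension. */
static int64_t w_unbox_int(WValue v) {
    return (int64_t)(v << 16) >> 16;
}
```

For example, 42 boxes to 0xFFFA_0000_0000_002A and -1 boxes to 0xFFFA_FFFF_FFFF_FFFF.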
6. Instant (0xFFFB)
48-bit signed milliseconds from the Unix epoch (1970-01-01T00:00:00Z).
bits 63-48: 0xFFFB (tag)
bits 47-0: signed milliseconds
Range: approximately 2491 BC to 6431 AD at millisecond precision.
Boxing and unboxing follow the same pattern as Int.
7. Lexical + Char (0xFFFC)
The 0xFFFC tag holds four subtypes via a 2-bit discriminator in bits 47-46. Two subtypes are for compilation pipeline use (Token, LexChar), one for zero-copy buffer references (Slice), and one for runtime character values (Char).
7.1 Token (subtype 00)
Zero-allocation token descriptor indexing into a LexBuffer:
bits 47-46: 00 (subtype)
bits 45-40: flags (6 bits)
bits 39-32: token type (8 bits, 256 kinds)
bits 31-20: length (12 bits, max 4,095 bytes)
bits 19-0: byte offset (20 bits, max 1,048,575 = 1 MB)
Offset is at the LSB for cheapest extraction (most performance-critical field). value & 0xFFFFFFFF extracts offset+length in a single 32-bit mask.
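The token layout can be exercised with a small pack/extract sketch. W_TAG_CHAR is the name the constants reference gives the 0xFFFC tag; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t WValue;
#define W_TAG_CHAR 0xFFFC000000000000ULL   /* shared by token, lexchar, slice, char */

/* Subtype 00 occupies bits 47-46, so it contributes no bits to the value. */
static WValue w_token_pack(unsigned flags, unsigned type,
                           unsigned length, uint32_t offset) {
    return W_TAG_CHAR
         | ((WValue)(flags  & 0x3Fu)  << 40)
         | ((WValue)(type   & 0xFFu)  << 32)
         | ((WValue)(length & 0xFFFu) << 20)
         |  (WValue)(offset & 0xFFFFFu);
}

static bool w_is_token(WValue v) {
    return (v >> 48) == 0xFFFC && ((v >> 46) & 3) == 0;
}
static unsigned w_token_flags(WValue v)  { return (unsigned)((v >> 40) & 0x3F); }
static unsigned w_token_type(WValue v)   { return (unsigned)((v >> 32) & 0xFF); }
static unsigned w_token_length(WValue v) { return (unsigned)((v >> 20) & 0xFFF); }
static uint32_t w_token_offset(WValue v) { return (uint32_t)(v & 0xFFFFF); }
```

A token with flags 1, type 7, length 3, and offset 16 encodes as 0xFFFC_0107_0030_0010; its low 32 bits, 0x0030_0010, carry length and offset together.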
7.2 LexChar (subtype 01)
Lexer-optimized character with hot-path classification flags at LSB:
bits 47-46: 01 (subtype)
bits 45-39: free (7 bits)
bits 38-18: codepoint (21 bits, U+0 to U+10FFFF)
bits 17-16: utf8_len - 1 (2 bits, encoding 1-4)
bits 15-11: Unicode category (5 bits, 30 categories)
bits 10-7: digit_value (4 bits, 0-9 or 0xF=not-a-digit)
bits 6-0: lex_flags (7 bits)
Lex flags (bit-testable at LSB, no shift required):
| Bit | Flag | Description |
|---|---|---|
| 0 | may_combine | Next codepoint may modify this one (combining marks, ZWJ) |
| 1 | is_quote | Quote character (', ", `) |
| 2 | is_operator | Operator character |
| 3 | is_hex | Valid hex digit (0-9, a-f, A-F) |
| 4 | is_whitespace | Space, tab, NBSP, Unicode Zs |
| 5 | is_id_continue | Valid identifier continuation |
| 6 | is_id_start | Valid identifier start |
7.3 Slice (subtype 10)
Zero-copy reference into a goroutine-local buffer:
bits 47-46: 10 (subtype)
bits 45-38: free (8 bits)
bits 37-24: length (14 bits, max 16,383 bytes)
bits 23-0: byte offset (24 bits, max 16,777,215 = 16 MB)
Used at lex time for source spans and at runtime for HTTP request/response bodies and file contents. Offset at LSB for efficient buffer indexing.
Slices borrow from an immutable buffer. When a Slice must outlive its buffer (stored in an object, returned from a function), it promotes to SSO-5 or heap WString — the copy-on-escape point.
7.4 Char (subtype 11)
Runtime character with full Unicode metadata. Codepoint at LSB for cheap ASCII extraction: v & 0x7F.
bits 47-46: 11 (subtype)
bit 45: is_emoji
bit 44: is_ascii
bit 43: is_printable
bits 42-39: digit_value (4 bits, 0xF = not-a-digit)
bits 38-30: case_delta (9 bits, signed, ±255)
bits 29-28: width (2 bits: 0=zero, 1=narrow, 2=wide, 3=ambiguous)
bits 27-23: Unicode category (5 bits, 30 categories)
bits 22-21: utf8_len - 1 (2 bits, encoding 1-4)
bits 20-0: codepoint (21 bits, U+0 to U+10FFFF)
Unicode category encoding (contiguous ranges for fast range checks):
| Range | Categories | Fast check |
|---|---|---|
| 0-4 | Lu, Ll, Lt, Lm, Lo | is_letter = cat <= 4 |
| 5-7 | Nd, Nl, No | is_number = 5 <= cat <= 7 |
| 8-10 | Zs, Zl, Zp | is_whitespace = 8 <= cat <= 10 |
| 11-13 | Mn, Mc, Me | is_combining = 11 <= cat <= 13 |
| 14-20 | Pc, Pd, Ps, Pe, Pi, Pf, Po | Punctuation |
| 21-24 | Sm, Sc, Sk, So | Symbols |
| 25-29 | Cc, Cf, Cs, Co, Cn | Control/Format/Other |
case_delta: Signed offset to convert case (e.g., 'A' has delta +32 to reach 'a'). Covers most Latin (±32), Cyrillic (±32-80), and Turkic (±199) mappings. Rare mappings exceeding ±255 fall back to a lookup table.
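The Char fields can be decoded with shifts and masks, as in the sketch below. The accessor names are hypothetical, and w_char_other_case assumes that a lowercase letter carries the negated delta of its uppercase partner, which the text implies but does not state outright.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t WValue;

/* Sign-extend the low `bits` bits of x without relying on signed shifts. */
static int64_t sext(uint64_t x, unsigned bits) {
    uint64_t m = 1ULL << (bits - 1);
    return (int64_t)((x ^ m) - m);
}

static uint32_t w_char_codepoint(WValue v) { return (uint32_t)(v & 0x1FFFFF); }
static unsigned w_char_category(WValue v)  { return (unsigned)((v >> 23) & 0x1F); }
static int      w_char_case_delta(WValue v){ return (int)sext((v >> 30) & 0x1FF, 9); }
static bool     w_char_is_ascii(WValue v)  { return (v >> 44) & 1; }

/* Categories 0-4 are Lu, Ll, Lt, Lm, Lo. */
static bool w_char_is_letter(WValue v) { return w_char_category(v) <= 4; }

/* Codepoint of the opposite-case form (a delta of 0 leaves it unchanged). */
static uint32_t w_char_other_case(WValue v) {
    return (uint32_t)((int64_t)w_char_codepoint(v) + w_char_case_delta(v));
}
```

For instance, 'A' (ASCII, printable, category Lu, narrow, not a digit, delta +32) assembles to 0xFFFC_DF88_1000_0041 under the layout above, and w_char_other_case on that value gives 0x61, the codepoint of 'a'.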
8. Numeric (0xFFFD)
The 0xFFFD tag holds three numeric domain types via a 2-bit subtype in bits 47-46. All store a signed significand and signed scale representing the value sig × 10^scale.
8.1 Decimal (subtype 00)
General-purpose fixed-point decimal:
bits 47-46: 00 (subtype, implicit — zero bits)
bits 45-7: significand (39 bits, signed)
bits 6-0: scale (7 bits, signed)
| Field | Bits | Range |
|---|---|---|
| sig | 39 | ±274,877,906,943 (~11 significant digits) |
| scale | 7 | -64 to +63 |
Boxing:
uint64_t s = (uint64_t)sig & 0x7FFFFFFFFF; // 39 bits
uint64_t sc = (uint64_t)scale & 0x7F; // 7 bits
return 0xFFFD000000000000 | (s << 7) | sc;
Unboxing sig (sign-extend from bit 38):
((int64_t)((v >> 7) & 0x7FFFFFFFFF) << 25) >> 25
Unboxing scale (sign-extend from bit 6):
((int8_t)((v & 0x7F) << 1)) >> 1
Values exceeding this range overflow to heap WDomainHeap (see §11).
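As a worked example, 12.34 can be stored as sig=1234, scale=-2. The sketch below uses hypothetical helper names and a portable sign-extension helper in place of the shift idiom; the tag and range constants are from the constants reference.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t WValue;
#define W_TAG_DECIMAL       0xFFFD000000000000ULL
#define W_DECIMAL_SIG_MAX   ((int64_t)((1ULL << 38) - 1))
#define W_DECIMAL_SIG_MIN   ((int64_t)(-(1LL << 38)))
#define W_DECIMAL_SCALE_MAX 63
#define W_DECIMAL_SCALE_MIN (-64)

/* Sign-extend the low `bits` bits of x. */
static int64_t sext(uint64_t x, unsigned bits) {
    uint64_t m = 1ULL << (bits - 1);
    return (int64_t)((x ^ m) - m);
}

/* True when (sig, scale) fits the inline encoding. */
static bool dec_fits(int64_t sig, int scale) {
    return sig >= W_DECIMAL_SIG_MIN && sig <= W_DECIMAL_SIG_MAX &&
           scale >= W_DECIMAL_SCALE_MIN && scale <= W_DECIMAL_SCALE_MAX;
}

/* Subtype 00 contributes no bits; sig sits at bits 45-7, scale at bits 6-0. */
static WValue dec_box(int64_t sig, int scale) {
    return W_TAG_DECIMAL | (((WValue)sig & 0x7FFFFFFFFFULL) << 7)
                         | ((WValue)scale & 0x7F);
}

static int64_t dec_sig(WValue v)   { return sext((v >> 7) & 0x7FFFFFFFFFULL, 39); }
static int     dec_scale(WValue v) { return (int)sext(v & 0x7F, 7); }
```

Here dec_box(1234, -2) yields 0xFFFD_0000_0002_697E: 1234 shifted left by 7 is 0x26900, and the 7-bit two's complement of -2 is 0x7E.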
8.2 Currency (subtype 01)
Fixed-point decimal with a 4-bit currency symbol ID:
bits 47-46: 01 (subtype)
bits 45-42: symbol_id (4 bits, 16 currencies)
bits 41-5: significand (37 bits, signed)
bits 4-0: scale (5 bits, signed)
| Field | Bits | Range |
|---|---|---|
| symbol_id | 4 | 0-15 (16 currencies) |
| sig | 37 | ±68,719,476,735 (±$687M in cents at scale=-2) |
| scale | 5 | -16 to +15 |
Currency symbol table:
| ID | Symbol | Currency |
|---|---|---|
| 0 | $ | USD |
| 1 | € | EUR |
| 2 | £ | GBP |
| 3 | ¥ | JPY |
| 4 | ₹ | INR |
| 5 | ¥ | CNY (context-disambiguated from JPY) |
| 6 | ₩ | KRW |
| 7 | ₿ | BTC |
| 8 | Fr | CHF |
| 9 | C$ | CAD |
| 10 | A$ | AUD |
| 11 | R$ | BRL |
| 12 | ₽ | RUB |
| 13 | ฿ | THB |
| 14 | zł | PLN |
| 15 | — | reserved |
8.3 Quantity (subtype 11)
Fixed-point decimal with an 8-bit unit ID:
bits 47-46: 11 (subtype)
bits 45-38: unit_id (8 bits, 256 units)
bits 37-7: significand (31 bits, signed)
bits 6-0: scale (7 bits, signed)
| Field | Bits | Range |
|---|---|---|
| unit_id | 8 | 0-255 (256 unit slots) |
| sig | 31 | ±1,073,741,823 (~9 significant digits) |
| scale | 7 | -64 to +63 |
Unit IDs 0-118 are reserved for built-in units (SI base, SI derived, prefixed, imperial, compound, information). IDs 119-253 are available for custom (user-defined) units registered at runtime. ID 254 is reserved. ID 255 (0xFF) is the sentinel for percentage — 7.65% is encoded as sig=765, scale=-2, unit_id=0xFF.
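The percentage example can be checked bit by bit. In this sketch the qty_ helper names are hypothetical; the tag, subtype, and sentinel constants come from the constants reference.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t WValue;
#define W_TAG_DECIMAL      0xFFFD000000000000ULL  /* shared numeric tag */
#define W_NUMERIC_QUANTITY 3
#define W_UNIT_PERCENT     0xFF

/* Sign-extend the low `bits` bits of x. */
static int64_t sext(uint64_t x, unsigned bits) {
    uint64_t m = 1ULL << (bits - 1);
    return (int64_t)((x ^ m) - m);
}

/* unit_id at bits 45-38, sig at bits 37-7, scale at bits 6-0. */
static WValue qty_box(unsigned unit_id, int64_t sig, int scale) {
    return W_TAG_DECIMAL
         | ((WValue)W_NUMERIC_QUANTITY << 46)
         | ((WValue)(unit_id & 0xFFu) << 38)
         | (((WValue)sig & 0x7FFFFFFFULL) << 7)
         | ((WValue)scale & 0x7F);
}

static unsigned qty_unit(WValue v)  { return (unsigned)((v >> 38) & 0xFF); }
static int64_t  qty_sig(WValue v)   { return sext((v >> 7) & 0x7FFFFFFFULL, 31); }
static int      qty_scale(WValue v) { return (int)sext(v & 0x7F, 7); }
static bool     qty_is_percent(WValue v) { return qty_unit(v) == W_UNIT_PERCENT; }
```

Under this layout 7.65% (sig=765, scale=-2, unit_id=0xFF) encodes as 0xFFFD_FFC0_0001_7EFE.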
8.4 Subtype 10 (Reserved)
Reserved for future use. Implementations MUST NOT produce values with this subtype.
9. Packed Types (0xFFFE)
The 0xFFFE tag holds six structured value types via a 3-bit subtype in bits 47-45. All fields are packed into the remaining 45 bits.
9.1 Color (subtype 000)
bits 47-45: 000 (subtype)
bits 44-37: red (8 bits)
bits 36-29: green (8 bits)
bits 28-21: blue (8 bits)
bits 20-13: alpha (8 bits)
bits 12-0: colorspace/flags (12 bits, reserved)
Shifted from the 45-bit boundary: R at bits 36-44, G at 28-35, B at 20-27, A at 12-19, flags at 0-11.
9.2 Complex (subtype 001)
Fixed-point complex number (real + imaginary):
bits 47-45: 001 (subtype)
bits 44-29: real significand (16 bits, signed)
bits 28-23: real scale (6 bits, signed)
bits 22-7: imaginary significand (16 bits, signed)
bits 6-1: imaginary scale (6 bits, signed)
Value = (real_sig × 10^real_scale) + (imag_sig × 10^imag_scale)i
9.3 Rational (subtype 010)
bits 47-45: 010 (subtype)
bits 44-23: numerator (22 bits, signed, ±2,097,151)
bits 22-1: denominator (22 bits, unsigned, 0-4,194,303)
9.4 Subtype 011 (Reserved)
9.5 Date (subtype 100)
Calendar date with time and timezone:
bits 47-45: 100 (subtype)
bits 44-33: year (12 bits, signed, ±2047)
bits 32-29: month (4 bits, 1-12)
bits 28-24: day (5 bits, 1-31)
bits 23-19: hour (5 bits, 0-23)
bits 18-13: minute (6 bits, 0-59)
bits 12-7: second (6 bits, 0-59)
bits 6-1: timezone offset (6 bits, signed, ±31 half-hours from UTC)
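The Date fields can be packed and unpacked as sketched below. The WDateFields struct and function names are hypothetical conveniences for this example; only the bit positions come from the layout above.

```c
#include <stdint.h>

typedef uint64_t WValue;
#define W_TAG_PACKED  0xFFFE000000000000ULL
#define W_PACKED_DATE 4

typedef struct {
    int year;                       /* signed, -2047..2047 */
    unsigned month, day, hour, minute, second;
    int tz_half_hours;              /* signed half-hour offset from UTC */
} WDateFields;

/* Sign-extend the low `bits` bits of x. */
static int64_t sext(uint64_t x, unsigned bits) {
    uint64_t m = 1ULL << (bits - 1);
    return (int64_t)((x ^ m) - m);
}

static WValue date_pack(WDateFields f) {
    return W_TAG_PACKED | ((WValue)W_PACKED_DATE << 45)
         | (((WValue)f.year & 0xFFF) << 33)
         | ((WValue)(f.month  & 0xFu)  << 29)
         | ((WValue)(f.day    & 0x1Fu) << 24)
         | ((WValue)(f.hour   & 0x1Fu) << 19)
         | ((WValue)(f.minute & 0x3Fu) << 13)
         | ((WValue)(f.second & 0x3Fu) << 7)
         | (((WValue)f.tz_half_hours & 0x3F) << 1);
}

static WDateFields date_unpack(WValue v) {
    WDateFields f;
    f.year          = (int)sext((v >> 33) & 0xFFF, 12);
    f.month         = (unsigned)((v >> 29) & 0xF);
    f.day           = (unsigned)((v >> 24) & 0x1F);
    f.hour          = (unsigned)((v >> 19) & 0x1F);
    f.minute        = (unsigned)((v >> 13) & 0x3F);
    f.second        = (unsigned)((v >> 7) & 0x3F);
    f.tz_half_hours = (int)sext((v >> 1) & 0x3F, 6);
    return f;
}
```

A UTC+05:30 offset is stored as 11 half-hours and UTC-04:00 as -8, so both fit the signed 6-bit field.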
9.6 IPv4 (subtype 101)
bits 47-45: 101 (subtype)
bits 44-13: address (32 bits, network byte order)
bits 12-7: CIDR prefix (6 bits, 0-32)
bits 6-1: flags (6 bits, reserved)
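A sketch of IPv4 packing follows, together with a CIDR containment check as one possible use of the prefix field. It assumes that "network byte order" means the first octet occupies the most significant byte of the 32-bit address field; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t WValue;
#define W_TAG_PACKED  0xFFFE000000000000ULL
#define W_PACKED_IPV4 5

/* Address at bits 44-13 (first octet most significant), prefix at bits 12-7. */
static WValue ipv4_pack(uint8_t a, uint8_t b, uint8_t c, uint8_t d,
                        unsigned prefix) {
    uint32_t addr = ((uint32_t)a << 24) | ((uint32_t)b << 16) |
                    ((uint32_t)c << 8)  |  (uint32_t)d;
    return W_TAG_PACKED | ((WValue)W_PACKED_IPV4 << 45)
         | ((WValue)addr << 13)
         | ((WValue)(prefix & 0x3Fu) << 7);
}

static uint32_t ipv4_addr(WValue v)   { return (uint32_t)((v >> 13) & 0xFFFFFFFFu); }
static unsigned ipv4_prefix(WValue v) { return (unsigned)((v >> 7) & 0x3F); }

/* True when addr falls inside the network described by net. */
static bool ipv4_contains(WValue net, uint32_t addr) {
    unsigned p = ipv4_prefix(net);
    uint32_t mask = (p == 0) ? 0u : (0xFFFFFFFFu << (32 - p));
    return (addr & mask) == (ipv4_addr(net) & mask);
}
```

With these assumptions 192.168.1.0/24 encodes as 0xFFFE_B815_0020_0C00.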
9.7 Subtype 110 (Reserved)
9.8 Location (subtype 111)
Two modes selected by bit 43:
Mode 0 — 2D point:
bits 47-45: 111 (subtype)
bit 43: 0 (point mode)
bits 42-22: x (21 bits, signed, ±1,048,575)
bits 21-0: y (22 bits, signed, ±2,097,151)
Mode 1 — Source file location:
bits 47-45: 111 (subtype)
bit 43: 1 (file mode)
bits 42-29: file_id (14 bits, 0-16,383)
bits 28-11: line (18 bits, 0-262,143)
bits 10-0: column (11 bits, 0-2,047)
10. Duration (0xFFFF)
Two modes selected by bit 47:
10.1 Mode 0 — Nanoseconds
bits 63-48: 0xFFFF (tag)
bit 47: 0 (ns mode)
bits 46-0: nanoseconds (47 bits, signed)
Range: ±70,368,744,177,663 ns (approximately ±19.5 hours).
Designed for benchmarks and CPU timing; it captures fractional microseconds (1.5 µs = 1500 ns).
10.2 Mode 1 — Months + Milliseconds
bits 63-48: 0xFFFF (tag)
bit 47: 1 (months+ms mode)
bits 46-32: months (15 bits, signed, ±16,383 ≈ ±1,365 years)
bits 31-0: milliseconds (32 bits, unsigned, 0-4,294,967,295 ≈ 49.7 days)
Designed for human-scale and calendar-relative durations. Months are kept separate from absolute time because a "month" has no fixed duration.
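Both duration modes can be illustrated with a short sketch. The dur_ names are hypothetical and deliberately distinct from the w_duration_ns constructor named elsewhere in this document, whose exact signature is not given here.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t WValue;
#define W_TAG_DURATION 0xFFFF000000000000ULL

/* Sign-extend the low `bits` bits of x. */
static int64_t sext(uint64_t x, unsigned bits) {
    uint64_t m = 1ULL << (bits - 1);
    return (int64_t)((x ^ m) - m);
}

/* Mode 0: bit 47 clear, 47-bit signed nanoseconds in bits 46-0. */
static WValue dur_box_ns(int64_t ns) {
    return W_TAG_DURATION | ((uint64_t)ns & 0x00007FFFFFFFFFFFULL);
}

/* Mode 1: bit 47 set, signed months in bits 46-32, unsigned ms in bits 31-0. */
static WValue dur_box_months_ms(int months, uint32_t ms) {
    return W_TAG_DURATION | (1ULL << 47)
         | (((WValue)months & 0x7FFF) << 32) | (WValue)ms;
}

static bool     dur_is_calendar(WValue v) { return (v >> 47) & 1; }
static int64_t  dur_get_ns(WValue v)      { return sext(v & 0x00007FFFFFFFFFFFULL, 47); }
static int      dur_get_months(WValue v)  { return (int)sext((v >> 32) & 0x7FFF, 15); }
static uint32_t dur_get_ms(WValue v)      { return (uint32_t)(v & 0xFFFFFFFFu); }
```

For example, 1500 ns encodes as 0xFFFF_0000_0000_05DC, and -1 ns as 0xFFFF_7FFF_FFFF_FFFF with the mode bit still clear.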
11. Heap Overflow (Domain Objects)
When a domain type value exceeds the NaN-box capacity (significand too large, scale out of range, etc.), it is promoted to a heap-allocated WDomainHeap struct stored with sub-tag 0xF in the 0x0000 object space.
typedef struct {
uint8_t domain_type; // W_DOMAIN_DECIMAL (0), W_DOMAIN_CURRENCY (1),
// W_DOMAIN_QUANTITY (2), W_DOMAIN_DURATION (3)
uint8_t pad[7]; // alignment
int64_t sig; // full-precision significand
int32_t scale; // scale
int32_t extra; // symbol_id (currency), unit_id (quantity), mode (duration)
int64_t extra2; // duration mode 1: ms value
} WDomainHeap; // 32 bytes
Boxing: (uint64_t)(uintptr_t)ptr | 0xF (ptr MUST be 16-byte aligned)
Type check: v >= 0x10 && (v >> 48) == 0 && (v & 0xF) == 0xF
All arithmetic, comparison, negation, and display operations MUST handle both inline NaN-boxed values and heap domain objects transparently. The w_decimal, w_currency, w_quantity, and w_duration_ns constructors automatically select the inline or heap path.
This mirrors V8's Smi → HeapNumber pattern: common values are zero-allocation inline; overflow values are heap-allocated with full precision.
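The inline-or-heap selection might look like the sketch below for decimals. It is an illustration under stated assumptions, not the w_decimal constructor itself: the sketch_ names are hypothetical, allocation uses C11 aligned_alloc, an allocation failure is mapped to nil purely for brevity, and heap pointers are assumed to fit in the low 48 bits.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t WValue;

typedef struct {
    uint8_t domain_type;
    uint8_t pad[7];
    int64_t sig;
    int32_t scale;
    int32_t extra;
    int64_t extra2;
} WDomainHeap;
_Static_assert(sizeof(WDomainHeap) == 32, "WDomainHeap must be 32 bytes");

#define W_TAG_DECIMAL    0xFFFD000000000000ULL
#define W_DOMAIN_DECIMAL 0
#define W_SUBTAG_DOMAIN  0xF

static bool w_is_domain_heap(WValue v) {
    return v >= 0x10 && (v >> 48) == 0 && (v & 0xF) == 0xF;
}

/* Inline when sig fits 39 bits and scale fits 7 bits; otherwise promote. */
static WValue sketch_decimal(int64_t sig, int32_t scale) {
    if (sig >= -(1LL << 38) && sig < (1LL << 38) && scale >= -64 && scale <= 63)
        return W_TAG_DECIMAL | (((WValue)sig & 0x7FFFFFFFFFULL) << 7)
                             | ((WValue)scale & 0x7F);
    WDomainHeap *h = aligned_alloc(16, sizeof *h);
    if (!h) return 0;                       /* assumption: nil on failure */
    *h = (WDomainHeap){ .domain_type = W_DOMAIN_DECIMAL,
                        .sig = sig, .scale = scale };
    return (WValue)(uintptr_t)h | W_SUBTAG_DOMAIN;
}

/* Read the significand from either representation. */
static int64_t sketch_decimal_sig(WValue v) {
    if (w_is_domain_heap(v))
        return ((const WDomainHeap *)(uintptr_t)(v & ~(WValue)0xF))->sig;
    uint64_t raw = (v >> 7) & 0x7FFFFFFFFFULL, m = 1ULL << 38;
    return (int64_t)((raw ^ m) - m);        /* sign-extend from bit 38 */
}
```

Callers see the same accessor for both paths, which is the transparency the section requires of arithmetic, comparison, and display.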
12. Type Checking
Type checks are designed for minimal instruction count:
| Check | Method | Instructions |
|---|---|---|
| Truthiness | v > 1 (unsigned) | 1 |
| Nil | v == 0 | 1 |
| Tagged type | (v >> 48) == TAG | 2 |
| Double | (v - BIAS) <= 0xFFF7... | 2 |
| String | (v >> 48) == 0xFFF9 && !(v & 1) | 3 |
| Symbol | (v >> 48) == 0xFFF9 && (v & 1) | 3 |
| Numeric subtype | (v >> 48) == 0xFFFD && ((v >> 46) & 3) == X | 4 |
| Packed subtype | (v >> 48) == 0xFFFE && ((v >> 45) & 7) == X | 4 |
| Object | v >= 0x10 && (v >> 48) == 0 | 3 |
| Object sub-tag | w_is_obj(v) && (v & 0xF) == X | 4 |
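To show how these checks partition the value space, the sketch below classifies any WValue into one top-level kind. The WKind enum and w_classify are hypothetical names introduced for this example.

```c
#include <stdint.h>

typedef uint64_t WValue;
#define W_DOUBLE_BIAS 0x0001000000000000ULL

typedef enum {
    WK_NIL, WK_FALSE, WK_TRUE, WK_UNDEF, WK_SENTINEL, WK_OBJECT, WK_DOUBLE,
    WK_STRING, WK_SYMBOL, WK_INT, WK_INSTANT, WK_LEXICAL, WK_NUMERIC,
    WK_PACKED, WK_DURATION
} WKind;

static WKind w_classify(WValue v) {
    if ((v >> 48) == 0) {                    /* singleton / object space */
        if (v >= 0x10) return WK_OBJECT;
        switch (v) {
        case 0:  return WK_NIL;
        case 1:  return WK_FALSE;
        case 2:  return WK_TRUE;
        case 3:  return WK_UNDEF;
        default: return WK_SENTINEL;         /* memo miss (4), reserved 5-15 */
        }
    }
    if ((v - W_DOUBLE_BIAS) <= 0xFFF7FFFFFFFFFFFFULL)
        return WK_DOUBLE;                    /* 0x0001... to 0xFFF8_FFFF... */
    switch (v >> 48) {                       /* only 0xFFF9-0xFFFF remain */
    case 0xFFF9: return (v & 1) ? WK_SYMBOL : WK_STRING;
    case 0xFFFA: return WK_INT;
    case 0xFFFB: return WK_INSTANT;
    case 0xFFFC: return WK_LEXICAL;
    case 0xFFFD: return WK_NUMERIC;
    case 0xFFFE: return WK_PACKED;
    default:     return WK_DURATION;         /* 0xFFFF */
    }
}
```

Every 64-bit input reaches exactly one return statement, which matches the requirement that the regions be mutually exclusive and exhaustive.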
13. Constants Reference
// Singletons
#define W_NIL 0x0000000000000000ULL
#define W_FALSE 0x0000000000000001ULL
#define W_TRUE 0x0000000000000002ULL
#define W_UNDEF 0x0000000000000003ULL
#define W_MEMO_MISS 0x0000000000000004ULL
// Double encoding
#define W_DOUBLE_BIAS 0x0001000000000000ULL
#define W_BIASED_NAN 0x7FF9000000000000ULL
// Tags (high 16 bits)
#define W_TAG_STRINGSYM 0xFFF9000000000000ULL
#define W_TAG_INT 0xFFFA000000000000ULL
#define W_TAG_INSTANT 0xFFFB000000000000ULL
#define W_TAG_CHAR 0xFFFC000000000000ULL // also token, lexchar, slice
#define W_TAG_DECIMAL 0xFFFD000000000000ULL // also currency, quantity
#define W_TAG_PACKED 0xFFFE000000000000ULL
#define W_TAG_DURATION 0xFFFF000000000000ULL
// Masks
#define W_TAG_MASK 0xFFFF000000000000ULL
#define W_PAYLOAD_MASK 0x0000FFFFFFFFFFFFULL
// Int range
#define W_INT48_MAX ((int64_t)((1ULL << 47) - 1)) // +140,737,488,355,327
#define W_INT48_MIN ((int64_t)(-(1LL << 47))) // -140,737,488,355,328
// Decimal range (subtype 00)
#define W_DECIMAL_SIG_MAX ((int64_t)((1ULL << 38) - 1)) // +274,877,906,943
#define W_DECIMAL_SIG_MIN ((int64_t)(-(1LL << 38)))
#define W_DECIMAL_SCALE_MAX 63
#define W_DECIMAL_SCALE_MIN (-64)
// Currency range (subtype 01)
#define W_CURRENCY_SIG_MAX ((int64_t)((1ULL << 36) - 1)) // +68,719,476,735
#define W_CURRENCY_SIG_MIN ((int64_t)(-(1LL << 36)))
#define W_CURRENCY_SCALE_MAX 15
#define W_CURRENCY_SCALE_MIN (-16)
// Quantity range (subtype 11)
#define W_QUANTITY_SIG_MAX ((int64_t)((1ULL << 30) - 1)) // +1,073,741,823
#define W_QUANTITY_SIG_MIN ((int64_t)(-(1LL << 30)))
#define W_QUANTITY_SCALE_MAX 63
#define W_QUANTITY_SCALE_MIN (-64)
// Duration range
#define W_DURATION_NS_MAX ((int64_t)((1ULL << 46) - 1)) // +70,368,744,177,663 ns
#define W_DURATION_NS_MIN ((int64_t)(-(1LL << 46)))
#define W_DURATION_MONTHS_MAX ((int16_t)((1 << 14) - 1)) // +16,383 months
#define W_DURATION_MONTHS_MIN ((int16_t)(-(1 << 14)))
// Percentage sentinel
#define W_UNIT_PERCENT 0xFF
// Object sub-tags
#define W_SUBTAG_GENERIC 0
#define W_SUBTAG_STRUCT 4
#define W_SUBTAG_HASH 5
#define W_SUBTAG_CLOSURE 6
#define W_SUBTAG_REGEX 7
#define W_SUBTAG_RANGE 8
#define W_SUBTAG_MODULE 9
#define W_SUBTAG_ARRAY 0xA
#define W_SUBTAG_BIGINT 0xB
#define W_SUBTAG_CLASS 0xC
#define W_SUBTAG_UUID 0xD
#define W_SUBTAG_ERROR 0xE
#define W_SUBTAG_DOMAIN 0xF
// Numeric subtypes
#define W_NUMERIC_DECIMAL 0
#define W_NUMERIC_CURRENCY 1
#define W_NUMERIC_QUANTITY 3
// Packed subtypes
#define W_PACKED_COLOR 0
#define W_PACKED_COMPLEX 1
#define W_PACKED_RATIONAL 2
#define W_PACKED_DATE 4
#define W_PACKED_IPV4 5
#define W_PACKED_LOCATION 7
// Domain heap type discriminators
#define W_DOMAIN_DECIMAL 0
#define W_DOMAIN_CURRENCY 1
#define W_DOMAIN_QUANTITY 2
#define W_DOMAIN_DURATION 3
14. Design Rationale
Why nil = 0? calloc-initialized memory is automatically nil. Zero-filled arrays are nil-arrays.
Why objects at 0x0000? Avoids consuming a tag slot. Frees three tag slots (0xFFFD, 0xFFFE, 0xFFFF) for domain types.
Why biased doubles? The bias of 0x0001 keeps the entire 0x0000 space for objects. A single add/subtract is cheaper than bit manipulation for float boxing. NaN normalization means NaN lives in double space with no special sentinel needed.
Why separate tags for duration and decimal? A 3-bit numeric subtype was considered but rejected — it costs 1 bit across all subtypes. Quantity sig would drop from 31 to 30 bits (±536M), which barely fits the speed of light (299,792,458). Duration gets its own tag (0xFFFF) instead.
Why two duration modes? Nanosecond mode captures fractional microseconds for timing workloads. Calendar mode with months keeps months separate from absolute time because a "month" has no fixed duration. Mode 0 covers benchmarks and CPU timing (±19.5 hours). Mode 1 covers human-scale durations (±1,365 years plus up to 49.7 days of milliseconds).
Why heap overflow? The NaN-box is a fast path. Values exceeding inline capacity (e.g., $999,999,999,999.99, compound units like m/s², durations mixing months with nanoseconds) promote transparently to heap objects. All arithmetic dispatches through both paths.
Why 16-byte alignment for heap objects? Gives 4 bits for sub-tags in the low nibble. 15 sub-tags is sufficient for all common heap types. Alignment is achieved by using calloc, which on most platforms returns 16-byte-aligned memory, or by using aligned_alloc.
Why Symbol as bit 0? Shares the 0xFFF9 tag with strings. No separate tag slot consumed. One bit test distinguishes string from symbol.
Why Char at 0xFFFC with codepoint at LSB? v & 0x7F extracts ASCII characters directly. The is_ascii flag confirms validity. Most character operations in lexers hit ASCII — this makes the hot path cheapest.
Why Percentage is unit_id 0xFF? Percentage is semantically a quantity (7.65% = a measurement). Using a sentinel unit_id avoids a separate type while keeping the percentage identity for display (% suffix instead of unit name).