When we say ReLU.chat embeds text into "384 dimensions," it sounds arbitrary. It is not. The 384-dim output is a deliberate design choice for all-MiniLM-L6-v2, the sentence transformer we use. This post explains what those numbers mean, why the dimension count is what it is, and what that implies for browser deployment.
What a Sentence Embedding Is
A sentence embedding is a fixed-length vector of real numbers that represents the meaning of a sentence. Two sentences with similar meaning should have similar vectors. Similarity is usually measured by cosine similarity (cosine distance is simply 1 minus this value):
similarity(A, B) = (A · B) / (||A|| * ||B||)
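The vector lives in a high-dimensional space. Cosine similarity ranges from -1 (vectors pointing in opposite directions) to 1 (vectors pointing in the same direction). For semantically related sentences, scores of roughly 0.6 to 0.9 are typical, although the exact range varies from model to model.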
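The formula above translates directly into code. Here is a minimal sketch in plain Python, using toy 4-dimensional vectors in place of real 384-dimensional embeddings. The function name `cosine_similarity` is only illustrative.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 4-dimensional vectors standing in for real 384-dimensional embeddings.
u = [0.2, 0.1, 0.7, 0.0]
v = [0.4, 0.2, 1.4, 0.0]     # same direction as u, twice as long
w = [-0.2, -0.1, -0.7, 0.0]  # exactly the opposite direction
print(cosine_similarity(u, v))  # close to 1.0: only direction matters, not length
print(cosine_similarity(u, w))  # close to -1.0
```

If embeddings are L2-normalized to unit length first, the denominator becomes 1 and cosine similarity reduces to a plain dot product. This is why retrieval pipelines often normalize vectors once, at indexing time.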
A 384-dimensional vector is just 384 numbers. For the sentence "A Nash equilibrium is a strategy profile where no player wants to deviate," it is tempting to imagine one dimension standing for "game theory," another for "stability," another for "multi-agent," and so on. That picture is misleading. The dimensions are learned rather than designed, individual dimensions are generally not interpretable, and the meaning is distributed across all of them.
Why 384 Specifically
The choice of 384 is the result of three competing pressures:
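Accuracy: more dimensions give the model more capacity to encode nuance. BERT-base uses 768, BERT-large uses 1024. Other things being equal, wider vectors tend to discriminate better between closely related concepts.
Storage: a 384-dim FP32 vector is 1,536 bytes (1.5 KB). For a knowledge base of 10,000 fragments, that is about 15 MB. At 768 dims it would be about 30 MB; at 1024 dims, about 40 MB.
Latency: the hidden size sets the cost of every transformer layer. The dense matrix multiplications scale with the square of the hidden size, so a 384-wide layer does roughly a quarter of the work of a 768-wide one in those operations. Attention scores scale only linearly, so the overall per-layer saving falls somewhere between 2x and 4x. For browser inference this matters.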
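The storage figures in this list are straightforward arithmetic, and the latency point is easier to see with a rough cost model. This back-of-the-envelope sketch makes several assumptions: FP32 storage, decimal megabytes, and that the per-layer cost is dominated by matrix multiplications whose weight shapes scale with the hidden size. It ignores attention-score computation, sequence length, and memory bandwidth. The helper names are hypothetical.

```python
BYTES_PER_FP32 = 4


def index_size_mb(num_vectors: int, dim: int, bytes_per_value: int = BYTES_PER_FP32) -> float:
    """Raw size of num_vectors dense vectors in decimal megabytes (1 MB = 1,000,000 bytes)."""
    return num_vectors * dim * bytes_per_value / 1_000_000


def dense_flops_ratio(d_small: int, d_large: int) -> float:
    """Relative cost of the projection and feed-forward matmuls in one layer.
    Their weight matrices grow with the hidden width in both directions,
    so the cost grows with the square of the width."""
    return (d_small / d_large) ** 2


for dim in (384, 768, 1024):
    per_vector = dim * BYTES_PER_FP32
    print(f"{dim} dims: {per_vector} bytes per vector, "
          f"{index_size_mb(10_000, dim):.2f} MB for 10,000 fragments")
print(f"384-wide vs 768-wide dense matmuls: {dense_flops_ratio(384, 768):.2f}x the work")
```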
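In all-MiniLM-L6-v2, the 384 comes from the backbone rather than being chosen at the output. The model is built on a 6-layer MiniLM checkpoint with a hidden size of 384, and mean pooling over the token vectors produces a sentence vector of that same width. MiniLM itself was distilled from larger BERT-style teachers to keep most of their accuracy at a fraction of the size. Its 12-layer and 6-layer sentence-transformer variants, all-MiniLM-L12-v2 and all-MiniLM-L6-v2, share the same 384-wide hidden state. The number 384 is not fundamental. It is the point where the model is "good enough" for the size and speed budget.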
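The sketch below shows why the output width equals the hidden size. It applies mean pooling followed by L2 normalization, which, to our understanding, matches the recipe in the all-MiniLM-L6-v2 model card for turning token outputs into a sentence vector. The token vectors are made-up numbers with a hidden size of 4 instead of 384, and the helper names are hypothetical.

```python
import math


def mean_pool(token_vectors: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average the token vectors marked 1 in the attention mask, skipping padding (0).
    The result has one entry per hidden unit, whatever the sentence length."""
    hidden_size = len(token_vectors[0])
    summed = [0.0] * hidden_size
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            for i, x in enumerate(vec):
                summed[i] += x
            count += 1
    if count == 0:
        raise ValueError("attention mask contains no real tokens")
    return [s / count for s in summed]


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


# Toy transformer output: 3 real tokens plus 1 padding position, hidden size 4.
tokens = [
    [1.0, 0.0, 2.0, 0.0],
    [3.0, 0.0, 0.0, 0.0],
    [2.0, 0.0, 1.0, 0.0],
    [9.0, 9.0, 9.0, 9.0],  # padding, excluded by the mask
]
mask = [1, 1, 1, 0]
pooled = mean_pool(tokens, mask)  # [2.0, 0.0, 1.0, 0.0]
embedding = l2_normalize(pooled)
print(len(embedding), embedding)
```

Because the pooled vector always has one entry per hidden unit, changing the embedding size would mean changing the backbone or adding a projection layer after pooling.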
How It Is Learned
A sentence transformer is trained with a contrastive objective. The model sees pairs of sentences and learns to:
- Make similar sentences have similar embeddings
- Make dissimilar sentences have dissimilar embeddings
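One classic form is the triplet loss. Here A is an anchor sentence, A_positive is a sentence with similar meaning, A_negative is an unrelated one, and margin is a fixed hyperparameter:
L = max(0, margin - cos(A, A_positive) + cos(A, A_negative))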
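The more modern setup uses in-batch negatives, and this is the objective all-MiniLM-L6-v2 was fine-tuned with. Each sentence must pick out its true partner from among all the other sentences in the batch, using a softmax over similarities scaled by a temperature. Either way, the model is rewarded for pulling similar sentences together and pushing dissimilar ones apart in the embedding space.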
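Both objectives fit in a few lines. The following is an illustration rather than training code. The values margin=0.5 and temperature=0.05 were chosen for the example and are not taken from any training recipe. Real training also computes these losses on batched tensors with automatic differentiation instead of Python lists.

```python
import math


def cos(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def triplet_loss(anchor, positive, negative, margin: float = 0.5) -> float:
    """Hinge loss: zero once cos(anchor, positive) beats cos(anchor, negative) by at least margin."""
    return max(0.0, margin - cos(anchor, positive) + cos(anchor, negative))


def in_batch_softmax_loss(anchors, positives, temperature: float = 0.05) -> float:
    """Mean cross-entropy where anchors[i] must select positives[i] from all positives
    in the batch; every other row's positive acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cos(anchor, p) / temperature for p in positives]
        m = max(logits)  # subtract the max before exponentiating, for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log(softmax(logits)[i])
    return total / len(anchors)


# Toy 2-dimensional "embeddings".
a, near, far = [1.0, 0.0], [0.9, 0.1], [0.0, 1.0]
print(triplet_loss(a, near, far))  # 0.0: the positive already wins by more than the margin
print(triplet_loss(a, far, near))  # about 1.49: the "positive" is the unrelated sentence
batch = [[1.0, 0.0], [0.0, 1.0]]
print(in_batch_softmax_loss(batch, batch))                  # near 0: every anchor matches its own positive
print(in_batch_softmax_loss(batch, list(reversed(batch))))  # about 20: the pairs are mismatched
```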
The "L6" in MiniLM-L6-v2 means 6 transformer layers; the original BERT-base has 12. The L6 model is smaller and faster than its 12-layer sibling, all-MiniLM-L12-v2, but slightly less accurate. For a retrieval system that already uses BM25 as a complement, the L6 accuracy is sufficient.
What 384 Dimensions Actually Encode
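The dimensions are not labeled, but we can probe them. A linear probe is a simple classifier trained on frozen embeddings to predict some property of the input. If it succeeds, that property is linearly recoverable from the vector. Probes like this, or plain correlations with a single dimension such as dimension 47, might reveal signals like the presence of a named entity, sentiment polarity, or the tense of the main verb. The model allocates its capacity to whatever signals help minimize the contrastive loss, and each signal is usually spread across many dimensions rather than confined to one.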
Sentence encoders like MiniLM-L6-v2 typically capture a mix of:
- Topical signals — what the sentence is about
- Syntactic signals — grammar, structure, length
- Semantic relations — synonymy, entailment, contradiction
- Style markers — formal vs informal, technical vs general
The Practical Sweet Spot
For a browser-deployed retrieval system, 384 dim hits a sweet spot:
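- Accuracy: roughly 90-95% of a BERT-base-sized sentence encoder on sentence-similarity tasks
- Size: about 22 MB with 8-bit quantization (the model has roughly 22M parameters, so FP32 weights are about four times larger), small enough to download to a phone in seconds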
- Latency: ~15ms per embedding on a mid-range laptop, ~3ms on WebGPU
- Storage: about 15 MB of FP32 vectors for 10,000 knowledge-base fragments, which is acceptable
What This Means for Users
The user does not see "384 dimensions." They see a chatbot that:
- Recognizes paraphrases ("how do I split profits fairly" → finds "Shapley value")
- Handles synonyms ("auction" ↔ "sealed-bid mechanism")
- Tolerates typos and slight rewordings
- Fails gracefully on out-of-domain queries (low similarity → no good match)
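The graceful failure in the last bullet depends on a similarity cutoff. Here is a minimal sketch of brute-force retrieval with a threshold. The cutoff of 0.5, the toy 3-dim vectors, and the function names are all hypothetical. ReLU.chat's actual threshold, index structure, and BM25 combination are not described here; in practice, a threshold would be tuned on held-out queries.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query_vec, fragments, top_k: int = 3, min_similarity: float = 0.5):
    """Rank (text, vector) fragments by cosine similarity to the query and keep the
    top_k that clear min_similarity. An empty list means "no good match"."""
    scored = sorted(
        ((cosine(query_vec, vec), text) for text, vec in fragments),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [(score, text) for score, text in scored[:top_k] if score >= min_similarity]


# Toy 3-dim vectors standing in for embedded knowledge-base fragments.
fragments = [
    ("Shapley value", [0.9, 0.1, 0.0]),
    ("sealed-bid auction", [0.0, 1.0, 0.0]),
    ("Nash equilibrium", [0.1, 0.0, 1.0]),
]
print(retrieve([1.0, 0.0, 0.0], fragments))   # only "Shapley value" clears the cutoff
print(retrieve([0.0, 0.0, -1.0], fragments))  # []: nothing is similar enough
```

Returning an empty list, rather than the least-bad match, is what allows the chatbot to say it has no good answer instead of confidently citing an unrelated fragment.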