How ReLU.chat Works

An open-source, browser-based chatbot platform. A sentence transformer embeds queries and knowledge-base entries into 384-dimensional vectors. A signal layer extracts BM25 sparse signals, dense cosine similarities, entity matches, and intent scores — then fuses them into a 25-feature decision packet. An RL-trained MLP policy network decides how to respond, and a fragment-based composition engine renders the final answer.

1. Overview

ReLU.chat processes every message entirely in your browser. No API keys, no cloud LLMs, no telemetry. The system uses a sentence-transformer (all-MiniLM-L6-v2, quantized ONNX, ~22 MB) to embed queries and knowledge-base entries into 384-dimensional vectors. A signal layer combines BM25 sparse retrieval with dense cosine similarity, entity extraction, and temperature-calibrated intent classification into a 25-feature decision packet. A reinforcement-learning-trained MLP policy network (25 inputs → 128 → 64 → 6 action heads) decides how to respond, and a fragment-based composition engine renders the final answer with linguistic connectors.

Query:         User types
  ↓
Embedding:     MiniLM-L6-v2, 384-dim ONNX
  ↓
Signal Layer:  BM25 + cosine, ensemble ranking
  ↓
Features:      25-dim vector extraction
  ↓
Policy:        MLP 25→128→64, 6 action heads
  ↓
Compose:       Fragment engine + connectors
  ↓
Response:      Rendered text

2. Signal Layer

After embedding, a lightweight signal layer prepares a structured DecisionPacket for the policy network. It combines multiple retrieval and classification signals into a coherent pre-policy feature bundle.

  1. BM25 Sparse Retrieval — Term-frequency-based scoring (k1=1.5, b=0.75) with IDF pre-computed from the knowledge base at load time. Provides complementary lexical matching alongside dense embeddings.
  2. Entity Extraction — Regex-based alias matching with session context enrichment. Detected entities are boosted in downstream ranking.
  3. Intent Classification — Cosine similarity against intent prototypes (definition, example, formal, application, comparison), calibrated with temperature=1.5 softmax for reliable confidence estimates.
  4. Ensemble Ranking — Dense cosine similarity (0.7 weight) and BM25 scores (0.3 weight) are fused into a combined ranking. A neural reranking pass applies a token-overlap bonus to refine the top results.
  5. Feature Extraction — The ensemble ranking, calibrated intent scores, entity data, and session context are compiled into the 25-feature vector that feeds the policy network.
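
The BM25 scoring in item 1 can be sketched as follows. This is a minimal illustration, not the project's code: the tokenizer, the smoothed IDF variant, and the names buildBm25Index and bm25Score are assumptions. Only k1=1.5, b=0.75, and the load-time IDF pre-computation come from the description above.

```javascript
// Hypothetical BM25 sketch (k1 = 1.5, b = 0.75). Tokenization and the
// IDF variant are assumptions for illustration.
const K1 = 1.5;
const B = 0.75;

function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Pre-compute IDF and the average document length once, at KB load time.
function buildBm25Index(docs) {
  const tokenized = docs.map(tokenize);
  const docFreq = new Map();
  for (const tokens of tokenized) {
    for (const term of new Set(tokens)) {
      docFreq.set(term, (docFreq.get(term) ?? 0) + 1);
    }
  }
  const n = docs.length;
  const idf = new Map();
  for (const [term, df] of docFreq) {
    // Smoothed IDF that stays non-negative for very common terms.
    idf.set(term, Math.log(1 + (n - df + 0.5) / (df + 0.5)));
  }
  const totalLen = tokenized.reduce((sum, t) => sum + t.length, 0);
  return { tokenized, idf, avgLen: totalLen / Math.max(n, 1) };
}

// Score one KB entry against a query.
function bm25Score(index, query, docIdx) {
  const tokens = index.tokenized[docIdx];
  const tf = new Map();
  for (const t of tokens) tf.set(t, (tf.get(t) ?? 0) + 1);
  let score = 0;
  for (const term of new Set(tokenize(query))) {
    const f = tf.get(term) ?? 0;
    if (f === 0) continue;
    // Length normalization: longer-than-average entries are penalized via b.
    const norm = f + K1 * (1 - B + B * (tokens.length / index.avgLen));
    score += (index.idf.get(term) ?? 0) * ((f * (K1 + 1)) / norm);
  }
  return score;
}

const bm25Index = buildBm25Index([
  "ReLU is an activation function",
  "Softmax turns scores into probabilities",
]);
const reluScore = bm25Score(bm25Index, "what is relu", 0);
const softmaxScore = bm25Score(bm25Index, "what is relu", 1);
```

In this toy knowledge base the first entry shares terms with the query and the second does not, so only the first receives a positive score.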

The signal layer is stateless and runs entirely in the browser — no server calls, no external inference APIs. The resulting DecisionPacket contains the query embedding, entity list, calibrated intent distribution, dense and sparse rankings, confidence metrics, and session context.
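
Two of the steps above, temperature-calibrated intent scores and the 0.7/0.3 ensemble fusion, are simple enough to sketch. The function names are hypothetical, and the min-max normalization of BM25 scores before fusion is an assumption, since the text does not say how the two score scales are reconciled.

```javascript
// Hypothetical helpers; names and the BM25 normalization are assumptions.

// Softmax with temperature: T > 1 flattens the distribution, which
// tempers over-confident intent scores.
function softmaxWithTemperature(scores, temperature = 1.5) {
  const scaled = scores.map((s) => s / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Map raw BM25 scores into [0, 1] so they are comparable with cosine scores.
function minMaxNormalize(values) {
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  if (hi === lo) return values.map(() => 0);
  return values.map((v) => (v - lo) / (hi - lo));
}

// Weighted fusion of dense and sparse scores, sorted best-first.
function fuseRankings(denseScores, bm25Scores, wDense = 0.7, wSparse = 0.3) {
  const sparse = minMaxNormalize(bm25Scores);
  return denseScores
    .map((dense, idx) => ({ idx, score: wDense * dense + wSparse * sparse[idx] }))
    .sort((a, b) => b.score - a.score);
}

// Entry 1 has slightly lower cosine similarity but much stronger lexical
// overlap, so it overtakes entry 0 after fusion.
const fused = fuseRankings([0.62, 0.58, 0.1], [1.2, 4.8, 0.0]);
const intentProbs = softmaxWithTemperature([0.7, 0.2, 0.1, 0.3, 0.1]);
```

The example shows why the sparse signal is complementary: a strong lexical match can reorder two entries whose dense similarities are nearly tied.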

3. Feature Extraction

Every query produces a 25-feature vector that feeds the policy network. These features capture similarity, entity presence, intent distribution, session history, and fragment metadata.

25-Feature Layout

Idx   Name              Type  Range   Description
0     qSimTop1          f32   [0,1]   Ensemble similarity (dense + BM25) to top-1 KB entry
1     qSimTop2          f32   [0,1]   Ensemble similarity to top-2 KB entry
2     entityCount       u8    [0,3]   Named entities extracted (capped)
3     entityBoostHit    bool  {0,1}   Top-5 ranked entry matches a detected entity
4–8   intent*Score      f32   [0,1]   Cosine scores vs definition, example, formal, application, comparison prototypes
9     lastTopicSim      f32   [0,1]   Cosine of query to last topic embedding
10    lastTopicAge      u8    [0,8]   Turns since last topic change (capped)
11    kbCoverage        f32   [0,1]   Fraction of KB entries with sim > 0.25
12    queryLenTokens    u8    [1,32]  Token count after stop-word removal
13    hasComparisonCue  bool  {0,1}   "vs", "compare", "difference" detected
14    hasFormalCue      bool  {0,1}   "prove", "theorem", "formal" detected
15    hasExampleCue     bool  {0,1}   "example", "illustrate", "case" detected
16    botCreativity     f32   [0,1]   Bot profile creativity ceiling
17    domainMatch       f32   [0,1]   Max cosine to domain prototype embeddings
18    followUpType      u8    [0,7]   Session follow-up type (simplify, elaborate, etc.)
19    wasAmbiguous      bool  {0,1}   Previous turn flagged as ambiguous
20    avgTruthConf      f32   [0,1]   Average truth confidence of fragments in top results
21    avgSourceConf     f32   [0,1]   Average source confidence of fragments in top results
22    minDifficulty     u8    [0,4]   Minimum difficulty across available fragments
23    fragDiversity     u8    [0,5]   Distinct fragment styles available
24    avoidWithCount    f32   [0,1]   Fraction of top entries with compatibility constraints
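
A few of the simpler features can be derived as in the sketch below. The cue words, the 0.25 coverage threshold, and the caps come from the table. The function names, the exact-token matching, and the input shapes are assumptions made for illustration.

```javascript
// Hypothetical sketch: derive a handful of the listed features.
const CUES = {
  hasComparisonCue: ["vs", "compare", "difference"],
  hasFormalCue: ["prove", "theorem", "formal"],
  hasExampleCue: ["example", "illustrate", "case"],
};

// Returns 1 if any cue word appears as a whole token in the query, else 0.
function hasCue(query, words) {
  const tokens = new Set(query.toLowerCase().match(/[a-z]+/g) ?? []);
  return words.some((w) => tokens.has(w)) ? 1 : 0;
}

function extractSimpleFeatures(query, similarities, entities, turnsSinceTopicChange) {
  const covered = similarities.filter((s) => s > 0.25).length;
  return {
    entityCount: Math.min(entities.length, 3), // capped at 3
    lastTopicAge: Math.min(turnsSinceTopicChange, 8), // capped at 8
    kbCoverage: similarities.length ? covered / similarities.length : 0,
    hasComparisonCue: hasCue(query, CUES.hasComparisonCue),
    hasFormalCue: hasCue(query, CUES.hasFormalCue),
    hasExampleCue: hasCue(query, CUES.hasExampleCue),
  };
}

const simpleFeatures = extractSimpleFeatures(
  "ReLU vs sigmoid: give an example",
  [0.8, 0.3, 0.1, 0.2], // similarity of the query to each KB entry
  ["relu", "sigmoid"],
  12
);
```

Exact-token matching means "compare" would not fire on "compared"; a real implementation may stem or use patterns instead.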

4. Policy Network (MLP)

The policy is a multilayer perceptron trained via reinforcement learning to select the optimal response parameters given the 25-dim feature context.

Architecture

Input:     Float32Array(25) — 25 normalized features
  ↓
fc1:       Linear(25, 128) + ReLU       (3,328 params)
  ↓
fc2:       Linear(128, 64) + ReLU       (8,256 params)
  ↓
Heads (all share fc2 output, 64 dims):
  mode_head:        Linear(64, 5)   → softmax → [normal, off_topic, greeting, help, comparison]
  intent_head:      Linear(64, 5)   → softmax → [definition, example, formal, application, comparison]
  topic_count_head: Linear(64, 4)   → softmax → [1, 2, 3, 4]
  frag_count_head:  Linear(64, 4)   → softmax → [1, 2, 3, 4]
  creativity_head:  Linear(64, 1)   → sigmoid → [0, 1]
  tone_head:        Linear(64, 4)   → softmax → [neutral, formal, intuitive, playful]

Total:  13,079 parameters (3,328 + 8,256 + 1,495 across the six heads; trained, exportable)
Version: 0.2.0 (25-feature input, includes avoidWithCount)
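
The layer sizes above are enough to sketch a forward pass and to check the parameter count. This is an illustration only: the weights are zero-filled placeholders rather than trained values, the row-major [out][in] weight layout is an assumption, and makeLinear, buildModel, forward, and countParams are hypothetical names.

```javascript
// Hypothetical sketch of the 25 → 128 → 64 → heads forward pass.
const HEADS = {
  mode_head: 5,
  intent_head: 5,
  topic_count_head: 4,
  frag_count_head: 4,
  creativity_head: 1,
  tone_head: 4,
};

function makeLinear(inDim, outDim) {
  return {
    inDim,
    outDim,
    weight: new Float32Array(inDim * outDim), // row-major [out][in], assumed
    bias: new Float32Array(outDim),
  };
}

// y = W x + b
function linear(layer, x) {
  const y = new Float32Array(layer.outDim);
  for (let o = 0; o < layer.outDim; o++) {
    let acc = layer.bias[o];
    for (let i = 0; i < layer.inDim; i++) acc += layer.weight[o * layer.inDim + i] * x[i];
    y[o] = acc;
  }
  return y;
}

const relu = (v) => v.map((x) => Math.max(0, x));
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

function softmax(v) {
  const max = Math.max(...v);
  const exps = Array.from(v, (x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

function buildModel() {
  const model = { fc1: makeLinear(25, 128), fc2: makeLinear(128, 64) };
  for (const [name, out] of Object.entries(HEADS)) model[name] = makeLinear(64, out);
  return model;
}

// All heads read the same 64-dim fc2 activation.
function forward(model, features) {
  const h = relu(linear(model.fc2, relu(linear(model.fc1, features))));
  const out = {};
  for (const name of Object.keys(HEADS)) {
    const logits = linear(model[name], h);
    out[name] = name === "creativity_head" ? sigmoid(logits[0]) : softmax(logits);
  }
  return out;
}

function countParams(model) {
  return Object.values(model).reduce((n, l) => n + l.weight.length + l.bias.length, 0);
}

const sketchModel = buildModel();
const sketchOutput = forward(sketchModel, new Float32Array(25));
const sketchParamCount = countParams(sketchModel);
```

With zero weights every softmax head is uniform and the creativity head sits at 0.5, which makes the sketch easy to sanity-check before real weights are loaded.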

Multi-Engine Inference

The policy runtime (policy/policy-runtime.js) attempts engines in priority order:

  1. WASM Engine — Compiled ONNX model via WebAssembly. Fastest path when available. Implemented by _wasmPlanAnswer() in policy-runtime.js.
  2. MLP Engine — Pure-JS float32 math (policy/mlp-inference.js). No dependencies. Same architecture and weights as the PyTorch-trained model. Always available.
  3. Heuristic Fallback — 15 parameterized decision thresholds. Used when neither WASM nor MLP weights are loaded. Ensures the system is always functional.
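
The priority order above amounts to a try-in-order loop. The sketch below is one way to express it; the engine object shape (name, isReady, plan) and the planWithFallback name are hypothetical and not taken from policy-runtime.js.

```javascript
// Hypothetical sketch of priority-ordered engine selection.
function planWithFallback(engines, features) {
  for (const engine of engines) {
    if (!engine.isReady()) continue; // e.g. WASM module or weights not loaded
    try {
      return { engine: engine.name, plan: engine.plan(features) };
    } catch (err) {
      // This engine failed at inference time; try the next one.
    }
  }
  throw new Error("no policy engine available");
}

// Stub engines: WASM is not loaded, the MLP fails, the heuristic always works.
const stubEngines = [
  { name: "wasm", isReady: () => false, plan: () => ({}) },
  { name: "mlp", isReady: () => true, plan: () => { throw new Error("bad weights"); } },
  { name: "heuristic", isReady: () => true, plan: () => ({ mode: "normal" }) },
];
const fallbackResult = planWithFallback(stubEngines, new Float32Array(25));
```

Keeping the heuristic last and always ready is what guarantees that a plan is produced on every call.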

5. Training Pipeline

The MLP policy network is trained offline using PyTorch, then exported as JSON weights for the browser-based JS engine.

Pipeline Stages

Step 1: Prompt Generation

Seed prompts are generated from KB entries and intent prototypes, then automatically augmented with synonym substitution, typos, informal phrasing, conversational context, and rephrasing. Target: 5000+ per bot. An optional LLM augmentation pass adds additional diversity.

Step 2: Retrieval Dataset

build_retrieval_dataset() embeds all KB entries and queries using sentence-transformers/all-MiniLM-L6-v2 (real embeddings) or a TF-IDF fallback when the library is unavailable. Computes per-sample cosine rankings, entity extractions, intent scores, and the full 25-feature vector.

Step 3: RL Training (REINFORCE)

The policy network is trained with a state-dependent value baseline. Each step: forward pass → sample actions → ε-greedy exploration → compute reward → policy gradient update with gradient clipping. The reward function has 6 dynamic components: intent match, topic precision, fragment coherence, length penalty, creativity alignment, and guardrail compliance.
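
The arithmetic of one such step can be sketched for the categorical heads. The actual training runs in PyTorch; this JavaScript version only illustrates ε-greedy sampling and the baseline-adjusted loss. The function names, the squared-error value loss, and the omission of the continuous creativity head are assumptions.

```javascript
// Hypothetical sketch of one REINFORCE step's bookkeeping.

// With probability epsilon pick a uniformly random action,
// otherwise sample from the policy's distribution.
function sampleAction(probs, epsilon, rand = Math.random) {
  if (rand() < epsilon) return Math.floor(rand() * probs.length);
  let r = rand();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r <= 0) return i;
  }
  return probs.length - 1; // guard against floating-point round-off
}

// Policy-gradient surrogate: -(reward - baseline) * sum of log-probabilities
// of the sampled actions. The baseline is the value estimate for this state.
function reinforceLoss(headProbs, actions, reward, baseline) {
  const advantage = reward - baseline;
  let logProb = 0;
  for (const [head, probs] of Object.entries(headProbs)) {
    logProb += Math.log(probs[actions[head]]);
  }
  return {
    advantage,
    policyLoss: -advantage * logProb,
    valueLoss: advantage ** 2, // assumed squared-error loss for the baseline
  };
}

const stepLoss = reinforceLoss({ mode: [0.5, 0.5] }, { mode: 0 }, 1.0, 0.4);
```

Subtracting the baseline leaves the expected gradient unchanged while reducing its variance, which is the usual motivation for a state-dependent value estimate.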

Step 4: Weight Export

Trained PyTorch parameters are remapped to JS-compatible keys (fc1.weight, mode_head.bias, etc.) and exported to assets/models/policy/policy.weights.json. The JS MLPPolicy class validates all 16 weight tensor shapes at construction time (fail-fast).
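
A fail-fast shape check over the 16 tensors (a weight and a bias for each of fc1, fc2, and the six heads) might look like the sketch below. The JSON layout, with nested arrays keyed by "layer.weight" and "layer.bias", is an assumption, as is the validateWeights name. The [out, in] weight shape follows the PyTorch Linear convention.

```javascript
// Hypothetical sketch of fail-fast validation for the exported weights.
const LAYER_SHAPES = {
  fc1: [128, 25],
  fc2: [64, 128],
  mode_head: [5, 64],
  intent_head: [5, 64],
  topic_count_head: [4, 64],
  frag_count_head: [4, 64],
  creativity_head: [1, 64],
  tone_head: [4, 64],
};

// Throws on the first mismatch; returns the number of tensors checked.
function validateWeights(weights) {
  let checked = 0;
  for (const [layer, [out, inp]] of Object.entries(LAYER_SHAPES)) {
    const w = weights[`${layer}.weight`];
    const b = weights[`${layer}.bias`];
    const weightOk =
      Array.isArray(w) &&
      w.length === out &&
      w.every((row) => Array.isArray(row) && row.length === inp);
    if (!weightOk) throw new Error(`${layer}.weight: expected shape [${out}, ${inp}]`);
    if (!Array.isArray(b) || b.length !== out) {
      throw new Error(`${layer}.bias: expected length ${out}`);
    }
    checked += 2;
  }
  return checked;
}

// Build a zero-filled weights object with the expected shapes.
const zeroWeights = {};
for (const [layer, [out, inp]] of Object.entries(LAYER_SHAPES)) {
  zeroWeights[`${layer}.weight`] = Array.from({ length: out }, () => new Array(inp).fill(0));
  zeroWeights[`${layer}.bias`] = new Array(out).fill(0);
}
const tensorsChecked = validateWeights(zeroWeights);
```

Validating at construction time means a truncated or mismatched export fails before the first query, and the runtime can then fall back to the heuristic engine.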

Step 5: ONNX & WASM

export_onnx() freezes the PyTorch graph and exports to policy.onnx (opset 17, constant folding, validated). compile_wasm() compiles to WASM via available toolchains (wonnx-cli or onnx2json), with wasm-opt -O3 optimization. When no compilation tools are available, the JS MLP engine serves as the primary runtime.

6. Fragment-Based Response Composition

Each knowledge-base entry contains categorized fragments (def, int, ex, form, app) with metadata fields: truth_confidence, source_confidence, difficulty, style, avoid_with.

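The policy produces an AnswerPlan that carries, at minimum, the outputs of the six action heads from section 4: mode, intent, topic count, fragment count, creativity, and tone.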

composeV2() in core/nlp.js reads the AnswerPlan and assembles the final text by selecting fragments, applying linguistic connectors ("For instance,", "More formally,", etc.), prefixed by openers and suffixed by closers — all indexed from the plan with modulo-safety.
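
The modulo-safe indexing mentioned above can be illustrated with a small sketch. It is not composeV2() itself: the plan field names (openerIdx, connectorIdx, closerIdx), the template contents, and the composeSketch name are hypothetical.

```javascript
// Hypothetical sketch of modulo-safe template lookup during composition.

// Wraps any integer index (including negatives) into the list's range.
const pick = (list, index) => list[((index % list.length) + list.length) % list.length];

function composeSketch(plan, fragments, template) {
  const parts = [pick(template.openers, plan.openerIdx)];
  fragments.forEach((frag, i) => {
    // The first fragment follows the opener directly; later ones get a connector.
    const connector = i === 0 ? "" : pick(template.connectors, plan.connectorIdx + i - 1) + " ";
    parts.push(connector + frag);
  });
  parts.push(pick(template.closers, plan.closerIdx));
  return parts.join(" ");
}

const sketchTemplate = {
  openers: ["Sure."],
  connectors: ["For instance,", "More formally,"],
  closers: ["Hope that helps."],
};
// Indices deliberately fall outside each list to show the wrap-around.
const composed = composeSketch(
  { openerIdx: 7, connectorIdx: 2, closerIdx: -1 },
  ["ReLU outputs max(0, x).", "ReLU(-2) is 0."],
  sketchTemplate
);
```

Because every lookup wraps, a plan produced for a larger template still renders against a smaller one instead of yielding undefined.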

Comparison Mode

When mode === 'comparison', the policy selects a comparisonOpenerKey (both, contrast, or similarity) from the template. The renderer uses patterned openers like "Both A and B are important concepts here." and distributes categories across multiple topics.

7. Action Schema & Validation

Every AnswerPlan passes through validatePlan() (policy/action-schema.js) which enforces:

8. Feature Serialization

For the WASM boundary, features are packed into a 107-byte buffer:

packFeatures(features) → {
  float32: Float32Array(25),     // offset 0,  100 bytes
  uint8:   Uint8Array(7),       // offset 100, 7 bytes
  buffer:  ArrayBuffer(107)      // total
}

Uint8Array layout:
  [0] = entityCount         (u8, 0-3)
  [1] = packed booleans     (bits: entityBoostHit|hasComparisonCue|hasFormalCue|hasExampleCue|wasAmbiguous)
  [2] = lastTopicAge        (u8, 0-8)
  [3] = queryLenTokens      (u8, 1-32)
  [4] = followUpType        (u8, 0-7)
  [5] = minDifficulty       (u8, 0-4)
  [6] = fragDiversity       (u8, 0-5)
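
One way to realize this layout is with two typed-array views over a single ArrayBuffer. The sketch assumes that features arrives as a 25-element array indexed as in the feature table and that entityBoostHit occupies bit 0 of the packed byte. Neither detail is stated above.

```javascript
// Hypothetical sketch of the 107-byte packing.
function packFeatures(features) {
  if (features.length !== 25) throw new RangeError("expected 25 features");
  const buffer = new ArrayBuffer(107);
  const float32 = new Float32Array(buffer, 0, 25); // bytes 0..99
  const uint8 = new Uint8Array(buffer, 100, 7); // bytes 100..106
  float32.set(features);

  // Pack one boolean feature into the given bit position (bit order assumed).
  const bit = (idx, pos) => (features[idx] ? 1 : 0) << pos;

  uint8[0] = features[2]; // entityCount
  uint8[1] = bit(3, 0) | bit(13, 1) | bit(14, 2) | bit(15, 3) | bit(19, 4);
  uint8[2] = features[10]; // lastTopicAge
  uint8[3] = features[12]; // queryLenTokens
  uint8[4] = features[18]; // followUpType
  uint8[5] = features[22]; // minDifficulty
  uint8[6] = features[23]; // fragDiversity
  return { float32, uint8, buffer };
}

const rawFeatures = new Array(25).fill(0);
rawFeatures[2] = 2; // entityCount
rawFeatures[3] = 1; // entityBoostHit
rawFeatures[13] = 1; // hasComparisonCue
rawFeatures[19] = 1; // wasAmbiguous
rawFeatures[12] = 5; // queryLenTokens
const packed = packFeatures(rawFeatures);
```

Both views share the same memory, so the WASM side can receive the single buffer while the JS side keeps typed access to each region. The Float32Array view starts at offset 0, which satisfies its 4-byte alignment requirement.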

9. Heuristic Fallback

When the MLP policy engine is unavailable (e.g., during cold start or weight load failure), planAnswerHeuristic() generates the same AnswerPlan structure using 15 parameterized decision thresholds covering greeting detection, off-topic handling, comparison fallback, entity boost, and creativity defaults. This ensures the system is always functional even without trained weights.

10. Open Source

The full codebase is available at github.com/yunusemrejr/relu-chat under the MIT license. This includes: