An open-source, browser-based chatbot platform. Sentence-transformer retrieval and a lightweight signal layer feed an RL-trained policy network, and a fragment-based composition engine renders each answer.
ReLU.chat processes every message entirely in your browser. No API keys, no cloud LLMs, no telemetry. The system uses a sentence-transformer (all-MiniLM-L6-v2, quantized ONNX, ~22 MB) to embed queries and knowledge-base entries into 384-dimensional vectors. A signal layer combines BM25 sparse retrieval with dense cosine similarity, entity extraction, and temperature-calibrated intent classification into a 25-feature decision packet. A reinforcement-learning-trained MLP policy network (25 inputs → 128 → 64 → 6 action heads) decides how to respond, and a fragment-based composition engine renders the final answer with linguistic connectors.
After embedding, a lightweight signal layer prepares a structured DecisionPacket for the policy network. It combines multiple retrieval and classification signals into a coherent pre-policy feature bundle.
The signal layer is stateless and runs entirely in the browser — no server calls, no external inference APIs. The resulting DecisionPacket contains the query embedding, entity list, calibrated intent distribution, dense and sparse rankings, confidence metrics, and session context.
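The sketch below shows one way the signal layer could fuse a dense cosine score and a BM25 score into the "ensemble similarity" behind qSimTop1/qSimTop2. It also adds a temperature-scaled softmax for intent calibration. All of the following are assumptions, not values taken from ReLU.chat:

- the BM25 constants (k1 = 1.2, b = 0.75);
- min-max normalization of the BM25 scores;
- the fusion weight alpha = 0.7;
- the temperature of 0.1.

The function names are hypothetical.

```javascript
// Signal-layer fusion sketch. ASSUMED constants (not from ReLU.chat):
// BM25 k1 = 1.2, b = 0.75; fusion weight alpha = 0.7; temperature = 0.1.

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Okapi BM25 score of each tokenized document for a tokenized query.
function bm25Scores(queryTokens, docs, k1 = 1.2, b = 0.75) {
  const N = docs.length;
  const avgdl = docs.reduce((s, d) => s + d.length, 0) / N;
  const df = new Map(); // term -> number of documents containing it
  for (const d of docs) {
    for (const t of new Set(d)) df.set(t, (df.get(t) || 0) + 1);
  }
  return docs.map((d) => {
    const tf = new Map();
    for (const t of d) tf.set(t, (tf.get(t) || 0) + 1);
    let score = 0;
    for (const q of queryTokens) {
      const f = tf.get(q) || 0;
      if (f === 0) continue;
      const n = df.get(q);
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
      score += (idf * f * (k1 + 1)) / (f + k1 * (1 - b + (b * d.length) / avgdl));
    }
    return score;
  });
}

// Min-max normalize BM25, clamp cosine to [0, 1], then take a weighted sum.
function ensembleScores(dense, sparse, alpha = 0.7) {
  const lo = Math.min(...sparse);
  const span = Math.max(...sparse) - lo;
  return dense.map((d, i) => {
    const s = span > 0 ? (sparse[i] - lo) / span : 0;
    return alpha * Math.max(0, Math.min(1, d)) + (1 - alpha) * s;
  });
}

// Temperature-scaled softmax over cosine scores against intent prototypes.
function calibratedIntents(scores, temperature = 0.1) {
  const z = scores.map((s) => s / temperature);
  const m = Math.max(...z); // subtract the max for numerical stability
  const e = z.map((v) => Math.exp(v - m));
  const total = e.reduce((a, v) => a + v, 0);
  return e.map((v) => v / total);
}
```

A low temperature sharpens the intent distribution. This helps when raw cosine scores against the five prototypes sit close together.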
Every query produces a 25-feature vector that feeds the policy network. These features capture similarity, entity presence, intent distribution, session history, and fragment metadata.
| Idx | Name | Type | Range | Description |
|---|---|---|---|---|
| 0 | qSimTop1 | f32 | [0,1] | Ensemble similarity (dense + BM25) to top-1 KB entry |
| 1 | qSimTop2 | f32 | [0,1] | Ensemble similarity to top-2 KB entry |
| 2 | entityCount | u8 | [0,3] | Named entities extracted (capped) |
| 3 | entityBoostHit | bool | {0,1} | Top-5 ranked entry matches a detected entity |
| 4–8 | intent*Score | f32 | [0,1] | Cosine scores vs definition, example, formal, application, comparison prototypes |
| 9 | lastTopicSim | f32 | [0,1] | Cosine of query to last topic embedding |
| 10 | lastTopicAge | u8 | [0,8] | Turns since last topic change (capped) |
| 11 | kbCoverage | f32 | [0,1] | Fraction of KB entries with sim > 0.25 |
| 12 | queryLenTokens | u8 | [1,32] | Token count after stop-word removal |
| 13 | hasComparisonCue | bool | {0,1} | "vs", "compare", "difference" detected |
| 14 | hasFormalCue | bool | {0,1} | "prove", "theorem", "formal" detected |
| 15 | hasExampleCue | bool | {0,1} | "example", "illustrate", "case" detected |
| 16 | botCreativity | f32 | [0,1] | Bot profile creativity ceiling |
| 17 | domainMatch | f32 | [0,1] | Max cosine to domain prototype embeddings |
| 18 | followUpType | u8 | [0,7] | Session follow-up type (simplify, elaborate, etc.) |
| 19 | wasAmbiguous | bool | {0,1} | Previous turn flagged as ambiguous |
| 20 | avgTruthConf | f32 | [0,1] | Average truth confidence of fragments in top results |
| 21 | avgSourceConf | f32 | [0,1] | Average source confidence of fragments in top results |
| 22 | minDifficulty | u8 | [0,4] | Minimum difficulty across available fragments |
| 23 | fragDiversity | u8 | [0,5] | Distinct fragment styles available |
| 24 | avoidWithCount | f32 | [0,1] | Fraction of top entries with compatibility constraints |
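Here is a minimal sketch of how the table's raw values could become the normalized Float32Array(25) that the policy consumes. It makes two assumptions: each integer feature is divided by the upper bound of its documented range, and every value is clamped to [0, 1]. The individual intent feature names (intentDefinitionScore and so on) are hypothetical expansions of intent*Score.

```javascript
// ASSUMPTION: normalize by the upper bound of each documented range, then clamp.
// Order matches the Idx column; the intent*Score names are hypothetical.
const FEATURE_SPEC = [
  ["qSimTop1", 1], ["qSimTop2", 1], ["entityCount", 3], ["entityBoostHit", 1],
  ["intentDefinitionScore", 1], ["intentExampleScore", 1], ["intentFormalScore", 1],
  ["intentApplicationScore", 1], ["intentComparisonScore", 1],
  ["lastTopicSim", 1], ["lastTopicAge", 8], ["kbCoverage", 1], ["queryLenTokens", 32],
  ["hasComparisonCue", 1], ["hasFormalCue", 1], ["hasExampleCue", 1],
  ["botCreativity", 1], ["domainMatch", 1], ["followUpType", 7], ["wasAmbiguous", 1],
  ["avgTruthConf", 1], ["avgSourceConf", 1], ["minDifficulty", 4],
  ["fragDiversity", 5], ["avoidWithCount", 1],
];

function toFeatureVector(raw) {
  const out = new Float32Array(FEATURE_SPEC.length);
  FEATURE_SPEC.forEach(([name, max], i) => {
    // Booleans become 0/1 via Number(); missing fields default to 0.
    const v = Number(raw[name] ?? 0) / max;
    out[i] = Math.max(0, Math.min(1, v));
  });
  return out;
}
```

Keeping the spec as one ordered array makes the Idx column the single source of truth for feature order.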
The policy is a multilayer perceptron, trained via reinforcement learning, that selects response parameters from the 25-dimensional feature vector.
```
Input: Float32Array(25), 25 normalized features
  ↓
fc1: Linear(25, 128) + ReLU    (3,328 params)
  ↓
fc2: Linear(128, 64) + ReLU    (8,256 params)
  ↓
Heads (all share the 64-dim fc2 output):
  mode_head:        Linear(64, 5) → softmax → [normal, off_topic, greeting, help, comparison]
  intent_head:      Linear(64, 5) → softmax → [definition, example, formal, application, comparison]
  topic_count_head: Linear(64, 4) → softmax → [1, 2, 3, 4]
  frag_count_head:  Linear(64, 4) → softmax → [1, 2, 3, 4]
  creativity_head:  Linear(64, 1) → sigmoid → [0, 1]
  tone_head:        Linear(64, 4) → softmax → [neutral, formal, intuitive, playful]

Total: 13,079 parameters (trained, exportable)
Version: 0.2.0 (25-feature input, includes avoidWithCount)
```
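The diagram maps onto a short forward pass. This sketch assumes each layer is stored as { weight: [out][in], bias: [out] }, which is the PyTorch nn.Linear convention, and its helper names are hypothetical. Counting the parameters of zero-initialized layers reproduces the 13,079 total.

```javascript
// ASSUMPTION: layer = { weight: number[out][in], bias: number[out] }.
const HEADS = {
  mode_head: 5, intent_head: 5, topic_count_head: 4,
  frag_count_head: 4, creativity_head: 1, tone_head: 4,
};

// y = W x + b for one layer.
function linear(layer, x) {
  return layer.weight.map((row, o) => row.reduce((s, w, i) => s + w * x[i], layer.bias[o]));
}
const relu = (v) => v.map((x) => Math.max(0, x));
function softmax(v) {
  const m = Math.max(...v); // subtract the max for numerical stability
  const e = v.map((x) => Math.exp(x - m));
  const s = e.reduce((a, x) => a + x, 0);
  return e.map((x) => x / s);
}
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

function forward(params, features) {
  const h1 = relu(linear(params.fc1, features));
  const h2 = relu(linear(params.fc2, h1)); // shared 64-dim trunk
  const out = {};
  for (const name of Object.keys(HEADS)) {
    const logits = linear(params[name], h2);
    out[name] = name === "creativity_head" ? sigmoid(logits[0]) : softmax(logits);
  }
  return out;
}

// Zero-filled parameters with the documented shapes (handy for tests).
function zeroParams() {
  const mk = (outDim, inDim) => ({
    weight: Array.from({ length: outDim }, () => new Array(inDim).fill(0)),
    bias: new Array(outDim).fill(0),
  });
  const p = { fc1: mk(128, 25), fc2: mk(64, 128) };
  for (const [name, dim] of Object.entries(HEADS)) p[name] = mk(dim, 64);
  return p;
}

function countParams(p) {
  return Object.values(p).reduce(
    (s, l) => s + l.weight.length * l.weight[0].length + l.bias.length, 0);
}
```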
The policy runtime (policy/policy-runtime.js) attempts engines in priority order:
1. WASM engine: _wasmPlanAnswer() in policy-runtime.js.
2. JS MLP engine (policy/mlp-inference.js): no dependencies, same architecture and weights as the PyTorch-trained model, always available.

The MLP policy network is trained offline using PyTorch, then exported as JSON weights for the browser-based JS engine.
Seed prompts are generated from KB entries and intent prototypes, then automatically augmented with synonym substitution, typos, informal phrasing, conversational context, and rephrasing. The target is 5,000+ prompts per bot. An optional LLM augmentation pass adds further diversity.
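Two of these augmentations, synonym substitution and typo injection, are simple enough to sketch. The synonym table, the probabilities, and the seeded mulberry32 PRNG are illustrative assumptions; the project's own scripts live in dev/scripts/ and may work differently. A seeded generator keeps augmented datasets reproducible across runs.

```javascript
// ASSUMPTIONS: toy synonym table, illustrative probabilities, mulberry32 PRNG.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

const SYNONYMS = { explain: ["describe", "clarify"], example: ["instance", "sample"], difference: ["contrast"] };

// Replace known words with a random synonym, each with probability p.
function synonymSwap(prompt, rand, p = 0.5) {
  return prompt
    .split(" ")
    .map((w) => {
      const options = SYNONYMS[w.toLowerCase()];
      return options && rand() < p ? options[Math.floor(rand() * options.length)] : w;
    })
    .join(" ");
}

// Swap two adjacent letters in one word of length >= 4, never moving the first letter.
function injectTypo(prompt, rand) {
  const words = prompt.split(" ");
  const candidates = words.map((w, i) => (w.length >= 4 ? i : -1)).filter((i) => i >= 0);
  if (candidates.length === 0) return prompt;
  const wi = candidates[Math.floor(rand() * candidates.length)];
  const w = words[wi];
  const ci = 1 + Math.floor(rand() * (w.length - 2));
  words[wi] = w.slice(0, ci) + w[ci + 1] + w[ci] + w.slice(ci + 2);
  return words.join(" ");
}
```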
build_retrieval_dataset() embeds all KB entries and queries using sentence-transformers/all-MiniLM-L6-v2 (real embeddings), or a TF-IDF fallback when that library is unavailable. It then computes per-sample cosine rankings, entity extractions, intent scores, and the full 25-feature vector.
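For the TF-IDF fallback, here is a minimal sketch. It is written in JavaScript for consistency with the other examples, although the training pipeline itself is PyTorch-based. It builds a vocabulary, weights term counts by a smoothed IDF, and L2-normalizes each vector so that cosine similarity reduces to a dot product. The smoothing formula ln((1 + N) / (1 + df)) + 1 is an assumption borrowed from common TF-IDF practice, not confirmed from the source.

```javascript
// ASSUMPTION: smoothed IDF = ln((1 + N) / (1 + df)) + 1; raw term counts as TF.
function buildTfidf(docs) {
  const vocab = new Map(); // term -> column index
  for (const d of docs) for (const t of d) if (!vocab.has(t)) vocab.set(t, vocab.size);
  const df = new Array(vocab.size).fill(0);
  for (const d of docs) for (const t of new Set(d)) df[vocab.get(t)] += 1;
  const N = docs.length;
  const idf = df.map((n) => Math.log((1 + N) / (1 + n)) + 1);
  const model = { vocab, idf };
  model.vectors = docs.map((d) => embed(model, d));
  return model;
}

// Unit-length TF-IDF vector; tokens outside the vocabulary are ignored.
function embed(model, tokens) {
  const v = new Array(model.vocab.size).fill(0);
  for (const t of tokens) {
    const i = model.vocab.get(t);
    if (i !== undefined) v[i] += model.idf[i];
  }
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return norm > 0 ? v.map((x) => x / norm) : v;
}

// Cosine ranking (dot product of unit vectors), best match first.
function rank(model, queryTokens) {
  const q = embed(model, queryTokens);
  return model.vectors
    .map((v, i) => ({ index: i, score: v.reduce((s, x, j) => s + x * q[j], 0) }))
    .sort((a, b) => b.score - a.score);
}
```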
The policy network is trained by policy gradient with a state-dependent value baseline. Each step runs: forward pass → sample actions → ε-greedy exploration → compute reward → policy-gradient update with gradient clipping. The reward function has 6 dynamic components: intent match, topic precision, fragment coherence, length penalty, creativity alignment, and guardrail compliance.
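The update described above can be sketched for a single softmax head. The sketch rests on these assumptions:

- the gradient is taken with respect to the head's logits only;
- the baseline is passed in as a number, standing in for the state-dependent value estimate;
- clipping rescales by the L2 norm;
- the learning rate, ε, and clip threshold are illustrative.

The function names are hypothetical.

```javascript
// ASSUMPTIONS: single head, logits-only update, scalar baseline standing in for V(s);
// lr, epsilon, and maxNorm are illustrative, not ReLU.chat's settings.
function softmax(logits) {
  const m = Math.max(...logits);
  const e = logits.map((z) => Math.exp(z - m));
  const s = e.reduce((a, v) => a + v, 0);
  return e.map((v) => v / s);
}

// Epsilon-greedy: uniform action with probability epsilon, else sample from probs.
function sampleAction(probs, epsilon, rand) {
  if (rand() < epsilon) return Math.floor(rand() * probs.length);
  const r = rand();
  let acc = 0;
  for (let i = 0; i < probs.length; i++) {
    acc += probs[i];
    if (r < acc) return i;
  }
  return probs.length - 1;
}

// Gradient of the loss -(advantage * log p[action]) w.r.t. the logits:
// dL/dz_i = -advantage * (1[i == action] - p_i).
function logitGradient(probs, action, advantage) {
  return probs.map((p, i) => -advantage * ((i === action ? 1 : 0) - p));
}

// Rescale the gradient if its L2 norm exceeds maxNorm.
function clipByNorm(grad, maxNorm) {
  const norm = Math.sqrt(grad.reduce((s, g) => s + g * g, 0));
  return norm > maxNorm ? grad.map((g) => (g * maxNorm) / norm) : grad;
}

function policyStep(logits, reward, baseline, opts = {}) {
  const { lr = 0.05, epsilon = 0.1, maxNorm = 1.0, rand = Math.random } = opts;
  const probs = softmax(logits);
  const action = sampleAction(probs, epsilon, rand);
  const advantage = reward - baseline; // baseline reduces variance
  const grad = clipByNorm(logitGradient(probs, action, advantage), maxNorm);
  // Gradient descent on the loss = ascent on advantage-weighted log-probability.
  const newLogits = logits.map((z, i) => z - lr * grad[i]);
  return { action, advantage, newLogits };
}
```

Because ε-greedy draws can come from outside the current policy, this gradient is no longer a strictly unbiased on-policy estimate. In a full setup the baseline would also be regressed toward observed rewards, and gradients would flow back through fc2 and fc1.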
Trained PyTorch parameters are remapped to JS-compatible keys (fc1.weight, mode_head.bias, etc.) and exported to assets/models/policy/policy.weights.json. The JS MLPPolicy class validates all 16 weight tensor shapes at construction time (fail-fast).
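A fail-fast shape check along these lines might look like the sketch below. The expected shapes come straight from the architecture diagram: 8 layers, each with a weight and a bias, give 16 tensors. The [out][in] nested-array storage format and the validateWeights name are assumptions; this is not the actual MLPPolicy code. Only the first element at each nesting level is inspected, so ragged arrays would need an extra check.

```javascript
// ASSUMPTION: tensors are nested arrays in PyTorch's [out][in] layout,
// keyed like "fc1.weight" / "mode_head.bias".
const LAYERS = {
  fc1: [128, 25], fc2: [64, 128],
  mode_head: [5, 64], intent_head: [5, 64], topic_count_head: [4, 64],
  frag_count_head: [4, 64], creativity_head: [1, 64], tone_head: [4, 64],
};

function expectedShapes() {
  const shapes = {};
  for (const [name, [outDim, inDim]] of Object.entries(LAYERS)) {
    shapes[`${name}.weight`] = [outDim, inDim];
    shapes[`${name}.bias`] = [outDim];
  }
  return shapes;
}

// Shape from nested-array lengths (first element at each level only).
function shapeOf(t) {
  const dims = [];
  let cur = t;
  while (Array.isArray(cur)) {
    dims.push(cur.length);
    cur = cur[0];
  }
  return dims;
}

// Throws on the first missing or mis-shaped tensor; returns the number checked.
function validateWeights(weights) {
  const expected = expectedShapes();
  for (const [key, shape] of Object.entries(expected)) {
    if (!(key in weights)) throw new Error(`missing tensor ${key}`);
    const actual = shapeOf(weights[key]);
    if (actual.join("x") !== shape.join("x")) {
      throw new Error(`${key}: expected ${shape.join("x")}, got ${actual.join("x")}`);
    }
  }
  return Object.keys(expected).length;
}
```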
export_onnx() freezes the PyTorch graph and exports to policy.onnx (opset 17, constant folding, validated). compile_wasm() compiles to WASM via available toolchains (wonnx-cli or onnx2json), with wasm-opt -O3 optimization. When no compilation tools are available, the JS MLP engine serves as the primary runtime.
Each knowledge-base entry contains categorized fragments (def, int, ex, form, app) with metadata fields: truth_confidence, source_confidence, difficulty, style, avoid_with.
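Features 20 to 24 in the table can be derived from these metadata fields. This sketch makes two assumptions: each top-ranked entry exposes a fragments array whose items carry the fields listed above, and an entry counts as constrained when any of its fragments has a non-empty avoid_with list.

```javascript
// ASSUMED data shape: entry = { fragments: [{ category, truth_confidence,
//   source_confidence, difficulty, style, avoid_with: [] }] }.
function fragmentFeatures(topEntries) {
  const frags = topEntries.flatMap((e) => e.fragments);
  if (frags.length === 0) {
    return { avgTruthConf: 0, avgSourceConf: 0, minDifficulty: 0, fragDiversity: 0, avoidWithCount: 0 };
  }
  const mean = (key) => frags.reduce((s, f) => s + f[key], 0) / frags.length;
  // ASSUMED rule: an entry is constrained if any fragment lists avoid_with targets.
  const constrained = topEntries.filter((e) =>
    e.fragments.some((f) => (f.avoid_with || []).length > 0)
  ).length;
  return {
    avgTruthConf: mean("truth_confidence"),
    avgSourceConf: mean("source_confidence"),
    minDifficulty: Math.min(...frags.map((f) => f.difficulty)),
    fragDiversity: Math.min(5, new Set(frags.map((f) => f.style)).size), // capped at 5
    avoidWithCount: constrained / topEntries.length,
  };
}
```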
The policy produces an AnswerPlan specifying:
composeV2() in core/nlp.js reads the AnswerPlan and assembles the final text. It selects fragments, joins them with linguistic connectors ("For instance,", "More formally,", etc.), and wraps the result in an opener and a closer. Openers, connectors, and closers are all indexed from the plan with modulo-safety, so an out-of-range index wraps around its list instead of failing.
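Modulo-safe indexing is easy to illustrate. In this sketch the plan fields (openerIndex, connectorIndex, closerIndex), the lexicon layout, and the compose name are hypothetical; only the idea of wrapping every index into its list comes from the description. The double-modulo form also handles negative indices.

```javascript
// HYPOTHETICAL plan fields and lexicon layout; only the wrap-every-index idea is from the text.

// Wrap any integer index (negative or too large) into the list; "" if the list is empty.
function pick(list, index) {
  const n = list.length;
  if (n === 0) return "";
  return list[((index % n) + n) % n];
}

function compose(plan, fragments, lexicon) {
  const parts = [pick(lexicon.openers, plan.openerIndex)];
  fragments.forEach((frag, i) => {
    const connectors = lexicon.connectors[frag.category];
    // The first fragment stands alone; later ones get a category connector if one exists.
    const useConnector = i > 0 && connectors && connectors.length > 0;
    parts.push(useConnector ? `${pick(connectors, plan.connectorIndex + i)} ${frag.text}` : frag.text);
  });
  parts.push(pick(lexicon.closers, plan.closerIndex));
  return parts.filter(Boolean).join(" "); // drop empty opener/closer slots
}
```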
When mode === 'comparison', the policy selects a comparisonOpenerKey (both, contrast, or similarity) from the template. The renderer uses patterned openers like "Both A and B are important concepts here." and distributes categories across multiple topics.
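Here is a sketch of the comparison path. Only the "both" template comes from the text; the contrast and similarity templates and the round-robin distribution of categories across topics are assumptions made for illustration. The replacement uses a callback so that topic names containing "$" are not treated as replacement patterns.

```javascript
// Only "both" is from the text; the other two templates are HYPOTHETICAL.
const COMPARISON_OPENERS = {
  both: "Both {A} and {B} are important concepts here.",
  contrast: "{A} and {B} differ in a few key ways.",
  similarity: "{A} and {B} have a lot in common.",
};

// Fill the opener template; unknown keys fall back to "both".
function comparisonOpener(key, topicA, topicB) {
  const template = COMPARISON_OPENERS[key] ?? COMPARISON_OPENERS.both;
  return template.replace("{A}", () => topicA).replace("{B}", () => topicB);
}

// ASSUMED strategy: deal categories out round-robin across the compared topics.
function distributeCategories(categories, topics) {
  const plan = topics.map((topic) => ({ topic, categories: [] }));
  categories.forEach((c, i) => plan[i % topics.length].categories.push(c));
  return plan;
}
```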
Every AnswerPlan passes through validatePlan() (policy/action-schema.js), which enforces:
For the WASM boundary, features are packed into a 107-byte buffer:
```
packFeatures(features) → {
  float32: Float32Array(25),  // offset 0, 100 bytes
  uint8:   Uint8Array(7),     // offset 100, 7 bytes
  buffer:  ArrayBuffer(107)   // total
}

Uint8Array layout:
  [0] = entityCount     (u8, 0-3)
  [1] = packed booleans (bits: entityBoostHit|hasComparisonCue|hasFormalCue|hasExampleCue|wasAmbiguous)
  [2] = lastTopicAge    (u8, 0-8)
  [3] = queryLenTokens  (u8, 1-32)
  [4] = followUpType    (u8, 0-7)
  [5] = minDifficulty   (u8, 0-4)
  [6] = fragDiversity   (u8, 0-5)
```
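This layout can be reproduced with two typed-array views over one ArrayBuffer. The sketch makes two assumptions: bit 0 of byte [1] holds entityBoostHit, with the remaining booleans following in the listed order, and out-of-range integers are clamped to their documented caps. packFeaturesSketch is a hypothetical name, not the runtime's packFeatures. Typed arrays use the host's byte order; that is little-endian on mainstream platforms, which matches WebAssembly's little-endian linear memory.

```javascript
// ASSUMPTIONS: bit 0 of byte [1] = entityBoostHit, others follow in documented order;
// integers are clamped to their documented caps before storage.
const BOOL_BITS = ["entityBoostHit", "hasComparisonCue", "hasFormalCue", "hasExampleCue", "wasAmbiguous"];

// Truncate to an integer and clamp into [0, max].
const clampU8 = (v, max) => Math.min(max, Math.max(0, Math.trunc(Number(v) || 0)));

function packFeaturesSketch(vector, raw) {
  if (vector.length !== 25) throw new RangeError("expected 25 features");
  const buffer = new ArrayBuffer(107);
  const float32 = new Float32Array(buffer, 0, 25); // bytes 0-99, host byte order
  float32.set(vector);
  const uint8 = new Uint8Array(buffer, 100, 7); // bytes 100-106
  uint8[0] = clampU8(raw.entityCount, 3);
  uint8[1] = BOOL_BITS.reduce((bits, name, i) => (raw[name] ? bits | (1 << i) : bits), 0);
  uint8[2] = clampU8(raw.lastTopicAge, 8);
  uint8[3] = clampU8(raw.queryLenTokens, 32);
  uint8[4] = clampU8(raw.followUpType, 7);
  uint8[5] = clampU8(raw.minDifficulty, 4);
  uint8[6] = clampU8(raw.fragDiversity, 5);
  return { float32, uint8, buffer };
}
```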
When the MLP policy engine is unavailable (e.g., during cold start or weight load failure), planAnswerHeuristic() generates the same AnswerPlan structure using 15 parameterized decision thresholds covering greeting detection, off-topic handling, comparison fallback, entity boost, and creativity defaults. This ensures the system is always functional even without trained weights.
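The engine priority order plus the heuristic fallback amounts to a try-each-engine loop. In this sketch the engine interface ({ name, plan }), the two thresholds, and the function names are hypothetical. The documented heuristic uses 15 thresholds and returns the full AnswerPlan structure, not the single mode field shown here.

```javascript
// HYPOTHETICAL engine interface: { name, plan(packet) } returning a plan, null, or throwing.
function planWithFallback(engines, packet) {
  const errors = [];
  for (const { name, plan } of engines) {
    try {
      const result = plan(packet);
      if (result) return { engine: name, plan: result, errors };
    } catch (err) {
      errors.push(`${name}: ${err.message}`); // record the failure and try the next engine
    }
  }
  throw new Error(`no engine produced a plan (${errors.join("; ")})`);
}

// HYPOTHETICAL heuristic with two illustrative thresholds (the documented one has 15).
function heuristicPlan(packet, t = { offTopicSim: 0.25, comparisonCue: 1 }) {
  if (packet.qSimTop1 < t.offTopicSim) return { mode: "off_topic" };
  if (packet.hasComparisonCue >= t.comparisonCue) return { mode: "comparison" };
  return { mode: "normal" };
}
```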
The full codebase is available at github.com/yunusemrejr/relu-chat under the MIT license. This includes:
- core/: NLP engine, chatbot engine, session memory, BM25 scorer, signal layer
- policy/: feature extractor, MLP inference, action schema, policy runtime
- dev/scripts/: PyTorch training, weight export, prompt augmentation