RIN Engine — Experimental Neural Inference Runtime

Deterministic memory allocation, energy measurement, and CPU inference without GPU.

RIN Engine is an experimental C runtime + Python CLI for running neural network inference on CPU. It explores techniques for energy-efficient computation: arena allocation (zero malloc during inference), RAPL-based energy metering, and alternative normalization/softmax implementations using integer arithmetic.

Quick start

git clone https://github.com/KakashiTech/rin-engine.git
cd rin-engine
make shared
pip install -e .
rin bench --mode transformer

If you have a .rin model:

rin run model.rin --prompt "Hello, world!" --max-tokens 50

What works today (production-ready)

Component	Description	File
Arena allocator	O(1) allocation, O(1) reset, zero malloc/free during inference	`rin_arena.h`
RAPL energy meter	Per-inference energy measurement via Intel/AMD RAPL MSRs	`rin_energy_meter.h`
5 inference modes	MLP, SNN, ATTN, RIN, Transformer	`rin run --mode`
Python CLI	`rin {run,bench,energy,inspect,import}`	`rin/`
C test suite	8 quantitative validation tests, 0 failures	`make test`
Backend router	Auto-selects native CPython ext → ctypes fallback	`rin/_backend.py`

Verified performance

Hardware: AMD Ryzen 5 5600G, DDR4-3200, GCC 15. Model: 4-layer, 262K params, INT8.

Comparison	RIN Engine	ONNX	Ratio
Latency (µs/inf)	202	783 (FP32)	3.9× faster
Energy (µJ/pkg)	5,654	28,510 (FP32)	5.0× efficient

Note: RIN Engine uses INT8 quantization while ONNX runs FP32. This is the current baseline. A direct INT8-vs-INT8 comparison requires an INT8-capable ONNX Runtime build (ortextensions or onnxruntime-trt) — will be added in a future release.

Research components (experimental)

These components are implemented but not independently validated:

Component	Description	Status
LIF engine	Multiplication-free spiking neurons via bit shifts	`tested`
Power-of-Two softmax	Softmax using LUT instead of exp()	`tested`
BSPN	LayerNorm replacement using only bit shifts	`tested` (<0.5% error)
Phase gating	Spectral gating for dynamic sparsity	`unverified`
TDA validator	Betti numbers for model distillation quality	`unverified`
Mechanistic distill	Activation-pattern-preserving distillation	`unverified`
BHRR (Bounded Hash Retrieval & Replace)	Multi-slot associative memory (S=8, gradient descent through hash slots)	`tested` (dim=128/text, dim=256/logits)
AVX-512 kernels	VNNI, block sparsity, kernel fusion	`needs hardware`
DCT engine	Fixed-point DCT-II, AAN algorithm	`tested` (7.3% Q15 err)
Distributed infer	Multi-node protocol	`wireframe`

CLI reference

rin run model.rin --prompt "Hello" --max-tokens 100
rin bench model.rin --mode transformer
rin energy model.rin
rin inspect model.rin
rin import model.pth --output model.rin

Python API

from rin import RinEngine

with RinEngine("model.rin", mode="mlp") as engine:
    output = engine.generate("Hello", max_tokens=50)
    print(output)
    print(f"Energy: {engine.energy_joules:.3f} J")

Project structure

rin-engine/
├── rin/               ← Python package (CLI + runtime)
├── lib/               ← C inference engine + headers
├── _cengine.c         ← CPython C extension
├── experimental/      ← Research prototypes
├── benchmarks/        ← Benchmark scripts and data
├── bench-all.sh       ← Reproducible benchmark
├── Makefile           ← Build system
└── setup.py           ← Python package build

What this project IS and IS NOT

IS	IS NOT
Experimental CPU inference runtime	A production ML deployment framework
Arena + RAPL + CLI	A drop-in PyTorch replacement
Research prototype for novel techniques	A validated scientific contribution
Honest about verified vs unverified claims	A drop-in PyTorch replacement or production deployment framework

Known limitations

RAPL measures package energy only (no DRAM, I/O, PSU)
Turbo Boost not locked during benchmarks
Weight format compatibility between Python export and C loader may diverge
No statistical significance testing on benchmarks
AVX-512 kernels require hardware most consumers don't have

BHRR — Bounded Hash Retrieval & Replace

BHRR replaces the standard KV-cache attention with associative memory: K-sign bits are hashed to one of S slots; V-sign bits accumulate in that slot. Retrieval reconstructs a weighted context by hashing the query's K-sign to the same slot and element-wise multiplying with Q-sign.

Key properties:

O(1) memory per head: S × hd int32 accumulators (no KV sequence growth)
Differentiable slots: slot_id = Σ(K_sign[i]·(i+1)) mod S, gradients flow through to K
Multi-slot SNR: √(hd·S/K) — at hd=32, S=8, K=64 → 2.0
Training: scripts/train_bhrr_fast.py — vectorized multi-slot transformer with .rin export
C inference: rin_demo_bhrr — uint8 symmetric weights, Q15 arithmetic, VPMADDUBSW-based GEMV

Validation results (Shakespeare, CPU-only):

Config	Val loss	Output quality
dim=128, 2L, 4H, BHRR (476K params)	2.45	Coherent structured text ("LARI:\nS:\nWhererere t c...")
dim=128, 2L, 4H, softmax baseline (460K params)	2.40	—
dim=256, 2L, 8H, BHRR (1.7M params)	2.40	Meaningful English ("MI thand thay the thand the hand...")

BHRR output at dim=256 demonstrates that the hash-retrieval mechanism learns valid linguistic patterns, though uint8 quantization error at larger dimensions can shift the argmax on some tokens.

Architecture decision record: lib/BHRR_ADR.md

Building

make              # C library + test binaries
make test         # run validation suite (0 failures expected)
make shared       # build librin.so for ctypes
make rin_demo_bhrr# build BHRR C inference demo
pip install -e .  # build CPython extension + install CLI

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
.github/workflows		.github/workflows
experimental		experimental
lib		lib
rin		rin
scripts		scripts
tests		tests
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
ROADMAP.md		ROADMAP.md
_cengine.c		_cengine.c
bench-all.sh		bench-all.sh
pyproject.toml		pyproject.toml
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RIN Engine — Experimental Neural Inference Runtime

Quick start

What works today (production-ready)

Verified performance

Research components (experimental)

CLI reference

Python API

Project structure

What this project IS and IS NOT

Known limitations

BHRR — Bounded Hash Retrieval & Replace

Building

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

RIN Engine — Experimental Neural Inference Runtime

Quick start

What works today (production-ready)

Verified performance

Research components (experimental)

CLI reference

Python API

Project structure

What this project IS and IS NOT

Known limitations

BHRR — Bounded Hash Retrieval & Replace

Building

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages