Nivalis - AI product engineering studio
Ship AI features that hold up in production.
Nivalis builds AI-enabled products for startups and e-commerce teams - and open-sources the tooling we rely on: evaluation, orchestration, and reliability.
- Entity: Nivalis - AI product engineering studio
- Focus: LLM features, retrieval, evals, orchestration, production reliability
- Proof: Open-source libraries maintained by Nivalis
- Engage: Ship an AI feature or adopt our OSS
Principles
We build AI systems like software systems: explicit constraints, measurable behavior, and clear ownership.
Outcomes over prompts.
Prompts are cheap; measurable behavior is not.
Evals are the product spec.
If you can't score it, you can't improve it.
Reliability beats cleverness.
Guardrails, fallbacks, and clear failure modes.
Latency is UX.
Tokens, calls, and retrieval have a cost - optimize end-to-end.
Data is leverage.
Use real conversations, real edge cases, real feedback loops.
Secure by default.
PII discipline, least privilege, vendor boundaries.
Open source when it compounds.
Publish reusable primitives, not glue.
What we build
AI product features
Assistants, copilots, semantic search, automated workflows. Designed to be useful, not magical.
AI infrastructure (the boring parts that matter)
Retrieval pipelines, tool/function calling, queues, caching, rate limits, and cost controls.
Quality systems
Offline evals, online monitoring, dataset curation, red-teaming, regression tests, human feedback loops.
Fast iteration, hard reliability.
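The "boring parts" above can be sketched in a few lines. This is a minimal, illustrative reliability wrapper (not a Nivalis library API): a cache in front of a model call, a timeout guard, and a fallback model as the explicit failure mode. `ModelCall`, `reliableCall`, and both clients are hypothetical names.

```typescript
// Illustrative reliability wrapper: cache -> timeout -> fallback.
type ModelCall = (prompt: string) => Promise<string>;

const cache = new Map<string, string>();

// Reject if the underlying call takes longer than `ms`.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    p,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error("timeout")), ms),
    ),
  ]);
}

async function reliableCall(
  prompt: string,
  primary: ModelCall,
  fallback: ModelCall,
  timeoutMs = 2000,
): Promise<string> {
  const cached = cache.get(prompt);
  if (cached !== undefined) return cached; // cache hit: no tokens spent

  let answer: string;
  try {
    answer = await withTimeout(primary(prompt), timeoutMs);
  } catch {
    // Explicit failure mode: degrade to the fallback model, don't crash.
    answer = await fallback(prompt);
  }
  cache.set(prompt, answer);
  return answer;
}
```

The point is not the specific code but that cost controls and failure modes live in ordinary, testable software, not in prompts.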
Open source
We publish the components we use to build reliable AI systems - small, documented, opinionated.
Featured library
<LIB_NAME>
<what it enables in one sentence>
Designed for: <evals | orchestration | memory | RAG | monitoring | etc.>
Why it exists: <recurring pain removed>
Install
npm i <package-name>
How we work
We optimize for time-to-reliable, not time-to-demo.
Define success
We turn the feature into an eval suite + acceptance thresholds.
Build the system
Retrieval/tools/flows with guardrails, caching, and fallbacks.
Close the loop
Ship, measure, collect failures, improve the dataset, repeat.
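"Define success" above can be made concrete. A minimal sketch of a feature turned into an eval suite with an acceptance threshold; the cases, scoring rule, and threshold here are illustrative, not a real spec.

```typescript
// Illustrative eval suite: cases, a scoring function, and a go/no-go threshold.
interface EvalCase {
  input: string;
  mustContain: string; // simplest possible check; real suites score richer criteria
}

// Fraction of cases the system under test passes.
function score(cases: EvalCase[], run: (input: string) => string): number {
  const passed = cases.filter((c) => run(c.input).includes(c.mustContain));
  return passed.length / cases.length;
}

// Acceptance threshold agreed before building, not after.
const THRESHOLD = 0.9;

function accept(cases: EvalCase[], run: (input: string) => string): boolean {
  return score(cases, run) >= THRESHOLD;
}
```

Failures collected in production become new `EvalCase` entries, which is what "improve the dataset, repeat" means in practice.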
Selected outcomes
- Lowered hallucination rates on critical intents via targeted evals + retrieval constraints.
- Reduced latency and cost through caching, batching, and prompt/token budgets.
- Increased conversion on high-intent flows by integrating AI assistance with deterministic UX.
FAQ
Do you build "agents"?
Yes - when they're bounded. We prefer narrow tools with explicit permissions and measurable outcomes.
How do you ensure quality?
We treat evals as a first-class artifact: offline regression + online monitoring + human review on edge cases.
Which models do you use?
We're model-agnostic. We pick based on quality/latency/cost constraints and add fallbacks.
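A sketch of what "pick based on quality/latency/cost constraints and add fallbacks" can look like; the profile fields and numbers are made up for illustration.

```typescript
// Illustrative model selection: filter by constraints, then order
// best-quality first so the result doubles as a fallback chain.
interface ModelProfile {
  name: string;
  quality: number; // e.g. eval-suite pass rate, 0..1
  latencyMs: number;
  costPer1kTokens: number;
}

function pickModels(
  models: ModelProfile[],
  maxLatencyMs: number,
  maxCostPer1kTokens: number,
): ModelProfile[] {
  return models
    .filter(
      (m) =>
        m.latencyMs <= maxLatencyMs &&
        m.costPer1kTokens <= maxCostPer1kTokens,
    )
    .sort((a, b) => b.quality - a.quality);
}
```

The first entry is the primary model; the rest are fallbacks, so swapping vendors is a data change rather than a code change.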
Can we use your open-source libraries commercially?
Yes, subject to each repository license.
Work with Nivalis
If you want AI features that ship with evals, observability, and sane failure modes: