Nivalis - AI product engineering studio

Ship AI features that hold up in production.

Nivalis builds AI-enabled products for startups and e-commerce teams - and open-sources the tooling we rely on: evaluation, orchestration, and reliability.

Talk to us · Browse open source

We work with a limited number of teams at a time.

Entity: Nivalis - AI product engineering studio
Focus: LLM features, retrieval, evals, orchestration, production reliability
Proof: Open-source libraries maintained by Nivalis
Engage: Ship an AI feature or adopt our OSS

Principles

We build AI systems like software systems: explicit constraints, measurable behavior, and clear ownership.

  • Outcomes over prompts.

    Prompts are cheap; measurable behavior is not.

  • Evals are the product spec.

    If you can't score it, you can't improve it.

  • Reliability beats cleverness.

    Guardrails, fallbacks, and clear failure modes.

  • Latency is UX.

    Tokens, calls, and retrieval have a cost - optimize end-to-end.

  • Data is leverage.

    Use real conversations, real edge cases, real feedback loops.

  • Secure by default.

    PII discipline, least privilege, vendor boundaries.

  • Open source when it compounds.

    Publish reusable primitives, not glue.

What we build

AI product features

Assistants, copilots, semantic search, automated workflows. Designed to be useful, not magical.

AI infrastructure (the boring parts that matter)

Retrieval pipelines, tool/function calling, queues, caching, rate limits, and cost controls.
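As an illustration of what "the boring parts" look like in practice, here is a minimal sketch of a model call wrapped in a cache and a fallback chain. The types and function names (`cachedWithFallback`, `ModelCall`) are hypothetical, not an API from our libraries:

```typescript
// Hypothetical sketch: cache model responses and degrade to a
// fallback provider instead of failing outright.

type ModelCall = (prompt: string) => Promise<string>;

// In-memory cache keyed by prompt; a real system would hash
// prompt + model + parameters and add a TTL.
const cache = new Map<string, string>();

async function cachedWithFallback(
  prompt: string,
  primary: ModelCall,
  fallback: ModelCall,
): Promise<string> {
  const hit = cache.get(prompt);
  if (hit !== undefined) return hit; // cache hit: zero tokens spent

  let answer: string;
  try {
    answer = await primary(prompt);
  } catch {
    // Clear failure mode: degrade to the cheaper/secondary model.
    answer = await fallback(prompt);
  }
  cache.set(prompt, answer);
  return answer;
}
```

The same shape extends naturally to rate limits and cost budgets: every call site goes through one wrapper, so controls live in one place.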

Quality systems

Offline evals, online monitoring, dataset curation, red-teaming, regression tests, human feedback loops.

Fast iteration, hard reliability.
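To make "evals as the spec" concrete, here is a minimal offline-eval sketch: score a system against labeled cases and gate on an acceptance threshold. The shapes (`EvalCase`, `runEvals`, `mustContain`) are illustrative assumptions, not a published interface:

```typescript
// Hypothetical sketch: a pass/fail eval gate over labeled cases.

interface EvalCase {
  input: string;
  mustContain: string; // simplest possible check; real scoring is richer
}

function runEvals(
  answer: (input: string) => string,
  cases: EvalCase[],
  threshold: number, // acceptance threshold, e.g. 0.95
): { passRate: number; passed: boolean } {
  const passes = cases.filter(
    (c) => answer(c.input).includes(c.mustContain),
  ).length;
  const passRate = passes / cases.length;
  return { passRate, passed: passRate >= threshold };
}
```

Run this in CI and a prompt change that regresses behavior fails the build, the same way a broken unit test would.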

Open source

We publish the components we use to build reliable AI systems - small, documented, opinionated.

Featured library

<LIB_NAME>

<what it enables in one sentence>

Designed for: <evals | orchestration | memory | RAG | monitoring | etc.>

Why it exists: <recurring pain removed>

Install

npm i <package-name>
Browse all libraries
GitHub · Docs · Examples · Changelog
<LIB_2> - <one-line value>
<LIB_3> - <one-line value>
<LIB_4> - <one-line value>

How we work

We optimize for time-to-reliable, not time-to-demo.

  1. Define success

    We turn the feature into an eval suite + acceptance thresholds.

  2. Build the system

    Retrieval/tools/flows with guardrails, caching, and fallbacks.

  3. Close the loop

    Ship, measure, collect failures, improve the dataset, repeat.
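The loop in step 3 can be sketched in a few lines: every failure observed in production becomes a new eval case, so the offline suite only grows. The types here (`LoggedFailure`, `growDataset`) are illustrative, not part of any specific library:

```typescript
// Hypothetical sketch: fold production failures back into the eval dataset.

interface EvalExample {
  input: string;
  expected: string;
}

type LoggedFailure = EvalExample; // a failure is just a case we now know the answer to

function growDataset(
  dataset: EvalExample[],
  failures: LoggedFailure[],
): EvalExample[] {
  const seen = new Set(dataset.map((d) => d.input));
  const fresh = failures.filter((f) => !seen.has(f.input));
  // The dataset only grows: every old case stays as a regression test.
  return [...dataset, ...fresh];
}
```

This is why the cycle compounds: each shipped iteration leaves behind a larger, harder test suite for the next one.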

Selected outcomes

  • Lowered hallucination rates on critical intents via targeted evals + retrieval constraints.
  • Reduced latency and cost through caching, batching, and prompt/token budgets.
  • Increased conversion on high-intent flows by integrating AI assistance with deterministic UX.

FAQ

Do you build "agents"?

Yes - when they're bounded. We prefer narrow tools with explicit permissions and measurable outcomes.

How do you ensure quality?

We treat evals as a first-class artifact: offline regression + online monitoring + human review on edge cases.

Which models do you use?

We're model-agnostic. We pick based on quality/latency/cost constraints and add fallbacks.

Can we use your open-source libraries commercially?

Yes, subject to each repository license.

Work with Nivalis

If you want AI features that ship with evals, observability, and sane failure modes: