QiYuan Labs — An AI engineering studio

§ 01 — MANIFESTO

AI in production. Not in pitch decks.

Most AI features die between demo and deploy. The prompt that worked on Tuesday breaks on Thursday. The agent that looked autonomous needs a human to review every reply. The cost per query, which no one ran past finance, turns out to be twelve dollars.

QiYuan Labs is the studio you call when the demo has to become a product. We build the unglamorous infrastructure that keeps LLM features running in production: evaluation harnesses, retrieval pipelines, guardrails, cost telemetry, and the model-routing layer your founder's vision didn't account for. We also build the SaaS and commerce systems your AI features sit inside.

We ship the AI that survives contact with real users. Not the slide that wins the meeting.

§ 02 — AI IN PRODUCTION

What we ship in AI.

Production-grade LLM features, not weekend demos. Each block below is a deliverable we've shipped, with the eval, guardrails, and cost telemetry that keep it running.

AGENTS

Agentic workflows.

Multi-step agents on Claude or OpenAI, with the human-in-the-loop checkpoints production teams actually trust. Built around your tools, not a framework demo.

RAG

Retrieval over private data.

Search across your Drive, Notion, Slack, or product database. Citations back to source. Role-based access enforced at query time, not at index time.

COPILOTS

Internal copilots.

Support, sales, ops, analytics. We surface drafts in your team's existing tools — Gorgias, Zendesk, Linear, Slack — not a separate UI no one logs into.

EVAL

Eval harnesses.

Automated evaluation on every prompt or model change. Catch regressions before customers do, not in the Slack thread the day after launch.

SAFETY

Guardrails & safety.

Input filtering, output validation, jailbreak detection, PII scrubbing. The layer that makes AI shippable to enterprise customers.

COST

Cost & cache tuning.

Prompt-cache strategy, model routing, batch processing. We bring LLM bills down by 40–70% without measurable quality loss. We measure both.

§ 03 — PRACTICES

Four practices. One partner.

AI is the lead practice, but production AI rarely lives alone. We also build the SaaS, commerce, and advisory layers your AI features depend on.

§ 01

AI Engineering

The full AI lifecycle in production — agents, RAG, copilots, eval, guardrails, cost. We use Claude, OpenAI, and open-weight models depending on what your workload actually needs.

Claude API
OpenAI
RAG
LangGraph
Eval pipelines

From $8,000 · Per workstream, or $180/hr

§ 02

Custom SaaS Development

End-to-end product engineering for B2B SaaS — multi-tenant architecture, billing, admin tooling, and the observability you need to sleep at night. Frequently the home of an AI feature.

TypeScript
Postgres
ClickHouse
Stripe
AWS / GCP

From $15,000 · Fixed scope, or T&M

§ 03

E-commerce Engineering

For brands scaling past their platform's defaults — headless storefronts, custom Shopify apps, checkout extensions, AI-driven product search, and the integrations that make a modern DTC stack work.

Shopify Plus
Hydrogen
Next.js
Adyen
Airwallex

From $5,000 · Per integration

§ 04

Technical Advisory

Fractional CTO and architecture reviews — including AI strategy, vendor selection (Claude vs OpenAI vs open-weight), and the call no one wants to make on whether your prototype is ready for production.

AI strategy
Architecture
Security
Hiring
Cost audit

From $1,500 · Per month, retainer

§ 04 — SELECTED WORK

Recent AI engagements.

Production AI we've shipped, plus the SaaS and commerce work alongside it. Some clients are named with permission; others are anonymized at their request. Metrics are real.

AI · Agent · Internal Tools · 2025

Agentic support copilot for a DTC apparel brand.

Claude-powered triage agent drafts replies for the support team. Retrieval over Shopify orders, returns, product specs. Eval harness on every prompt change.

62% · Reply-time cut
$11k · Monthly savings
0 · Hallucinations / 90d

AI · RAG · Enterprise · 2025

Knowledge retrieval for a 60-person services firm.

Claude-backed search over twelve years of Drive and Notion archives. Role-based access at query time, citations back to source, nightly embedding refresh.

3 wk → 4 d · Ramp time
8.6 · User NPS (/10)
100% · Citation accuracy

SaaS · B2B · Analytics · 2025

Multi-tenant analytics for a logistics SaaS.

Customer-facing analytics with row-level tenant isolation, custom query builder, and scheduled CSV/PDF delivery. ClickHouse-backed; Next.js + custom D3 chart library.

14 wks · Discovery → launch
3× · Query speed gain
22% · Tier-1 churn drop

Commerce · Shopify · 2025

Headless Shopify for a beauty brand expanding to APAC.

Liquid theme → Next.js Hydrogen on Cloudflare. Multi-currency Markets, Sanity content, custom subscription bundles via checkout extension.

+18% · Mobile conv. rate
1.4s · LCP, P75, APAC
3 · Markets in 9 wks

§ 05 — PRINCIPLES

How we operate.

Six principles we apply to every engagement. They're written down because we've been on the other side of partners who didn't.

01

Honest scoping.

We write a fixed scope before kickoff. If we can't see the path to delivery, we say so — and send you to someone who can.

02

Senior-only delivery.

Every engineer on your project has shipped production systems. No bait-and-switch to junior teams after signature.

03

You own the code.

Work-for-hire. You own the IP, the repo, the deployment, and the keys. Everything we build is yours to fork or hand off.

04

Ship to measure.

Every project ships with telemetry. We track the metric we promised to move and report against it weekly.

05

Async-first.

Daily written updates in Slack or Linear. Synchronous time reserved for design reviews, demos, and whiteboard conversations.

06

30-day support.

Every project includes a thirty-day post-launch support window for production bugs and tuning. No surprise invoices.

§ 06 — FIELD REPORTS

Word from clients.

People we've shipped for. Names used with permission; some titles abbreviated.

They scoped the project the way an internal team would, not the way an agency does. We knew exactly what we were buying — and the analytics module they shipped is now one of our top three reasons customers don't churn.

Marcus R.

Co-founder, Logistics SaaS

Our LLM copilot is in production with zero hallucination incidents in three months. The eval harness they built is now the template for the rest of our AI roadmap.

Jennifer L.

Head of Ops, DTC apparel brand

We ship the AI features that survive contact with customers.