Overhead close-up of an open engineering notebook filled with architecture diagrams and handwritten data schema notes, a mechanical pencil resting diagonally across the page, flat north-facing daylight, no shadows

/ Work we shipped

The problems, the pivots, and what actually shipped.

Each project brief starts with the problem statement as written on day one. Then we show how the data changed it. Outcomes are measured in latency, cost, accuracy, and adoption — not benchmark scores.

Wide shot of two engineers at a whiteboard covered in data-flow diagrams, one pointing at a node cluster, flat architectural studio light, genuine work moment, no eye contact with camera

Close-up of hands annotating a printed contract page with a red pen, dense margin notes visible, flat daylight from a window to the left, paper texture sharp in focus

— Four shipped systems

Projects indexed by problem, not by client.

Demand forecasting

Why the model was right and the data was wrong.

Initial brief: improve inventory accuracy. Actual problem: upstream ERP exports were dropping nulls silently. Rebuilt the ingestion layer before touching the model. Result: 34% reduction in stockout events within 90 days.

Outcome: −34% stockouts, +18% forecast accuracy, zero retraining in 6 months.

Document intelligence

The workflow nobody mapped was the one breaking the system.

Brief: automate contract review. Discovery: reviewers had three informal workarounds not in any SOP. Redesigned extraction targets around actual behavior. Adoption hit 78% in week two — no training required.

Outcome: 78% adoption week 2, review time −61%, zero rollback requests.

Anomaly detection

Retrieval-augmented generation

We shipped the second model. The first one taught us why.

The knowledge base was clean. The query patterns were not.

First model flagged everything — alert fatigue killed adoption in week one. Spent three weeks redefining what 'anomaly' meant to the ops team. Second model shipped with a 94% true-positive rate.

Brief: internal Q&A over proprietary docs. Problem: users asked questions the docs never anticipated. Rebuilt the chunking and retrieval layer around real query logs. Latency dropped to under 800ms at p95.

Outcome: 94% true-positive rate, mean response time −47%, ops team expanded coverage to two new services.

Outcome: p95 latency 780ms, answer relevance score +41%, support ticket volume −29%.

Start a conversation

Pattern recognition

We document what we didn't ship.

Every case study here includes a section on the approach we abandoned and why. That's not humility — it's the most technically useful thing we can hand a new client before scoping begins.

Groundwork AI

The plumbing that makes the magic work.

Pages

Home

Services

Our Work

About

Writing

Contact

Start a conversation

hello@groundwork.ai

New York, NY — remote-first

Response within one business day

Problem first. Code second.