
Running an AI headshot product at scale is not a model problem. We had good models. It's an infrastructure problem — and for a long time, that infrastructure was quietly killing us.
GPU jobs failing silently. Customers paying for professional headshots and waiting a full day. Our engineering team spending entire sprints debugging providers instead of improving the product. Gross margins sitting at around 40% with no clear path up.
That changed when we started working with Runflow.
BetterPic processes hundreds of thousands of AI inference jobs every month — portrait generation, background removal, quality scoring, clothing swap, and more. Each job touches multiple models, requires specific GPU memory, and needs to return a result fast enough that the customer never feels the wait.
The reality before Runflow: we had no retry logic. If a GPU job failed — which happened constantly due to quota limits, provider outages, or memory errors — it simply didn't complete. Customers opened support tickets. We opened dashboards. Time passed.
"Customers were paying for AI headshots and waiting a full day. That's not a cost problem, that's a product crisis."
— Thibaut Hennau, CEO, BetterPic
We faced an uncomfortable choice: hire a dedicated ML infrastructure and DevOps team — expensive, slow, and a distraction from our core product — or find a platform that had already solved this problem from the inside out.
Runflow is a ComfyUI deployment and orchestration platform built specifically for production AI workloads. The short version: we call an endpoint, and everything underneath — GPU routing, retry logic, quality checks, failover across datacenters — happens automatically.
Today, BetterPic runs more than 10 distinct AI inference pipelines through Runflow. Here's what that looks like in practice:
The part that changed our engineering velocity most wasn't any single feature — it was the things we stopped having to think about.
When a GPU job fails, Runflow retries automatically and routes across providers and datacenters. GPU quota limits and availability issues that used to cause outages are now invisible to us. Customer wait times collapsed. Support tickets dropped.
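Runflow does this server-side, so we never see it, but the pattern itself is simple. A minimal illustrative sketch of retry with exponential backoff, where `submit_job` is a stub standing in for any flaky GPU call (not Runflow's actual implementation):

```shell
# Illustrative only: Runflow handles retries server-side.
# The stub fails twice, then succeeds, simulating quota/OOM errors.
attempts=0
max_attempts=5
delay=1

submit_job() {
  [ "$attempts" -ge 2 ]   # stub: succeeds on the third try
}

until submit_job; do
  attempts=$((attempts + 1))
  if [ "$attempts" -ge "$max_attempts" ]; then
    echo "job failed after $attempts attempts" >&2
    exit 1
  fi
  sleep "$delay"
  delay=$((delay * 2))    # back off before retrying (or rerouting)
done
echo "job succeeded after $attempts retries"
```

The real system adds a second dimension the sketch omits: on each retry it can also reroute to a different provider or datacenter, so a quota limit on one provider never exhausts the retry budget.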
Runflow's Sentinel layer evaluates every output before delivery. It's not checking whether the job completed — it's checking whether the result is actually good. Eight specialized evaluation passes run per generation: face similarity, segmentation, pose analysis, and LLM-based judges for identity, garment fit, skin realism, and more. If an output fails, it's retried. If the retry passes, it ships. If it doesn't, we know about it before the customer does.
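Our mental model of that gate-retry-flag flow, sketched in shell with hypothetical stub evaluators (the `check_*` names and `regenerate` are ours for illustration, not Runflow's API):

```shell
# Sketch of Sentinel-style output gating; not Runflow's actual code.
# Each check_* stands in for one evaluation pass.
identity_ok=false                  # simulate one pass failing initially

check_face_similarity() { true; }
check_segmentation()    { true; }
check_pose()            { true; }
check_identity_judge()  { $identity_ok; }

evaluate_output() {
  # The output ships only if every pass is green.
  check_face_similarity && check_segmentation \
    && check_pose && check_identity_judge
}

regenerate() { identity_ok=true; }  # simulate the retry producing a better output

if evaluate_output; then
  verdict="ship"
elif regenerate && evaluate_output; then
  verdict="ship"                    # retry passed: deliver as normal
else
  verdict="flag"                    # surface internally before the customer sees it
fi
echo "$verdict"
```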
The only comparable quality evaluation system we're aware of lives inside Google's Vertex AI enterprise platform. For us, it's a single configuration toggle.
Our team builds and iterates workflows in ComfyUI. When a workflow is ready, it deploys to Runflow's infrastructure directly from inside ComfyUI — no file uploads, no manual configuration, no DevOps ceremony. The endpoint is live with typed inputs, auto-generated API docs, and auto-scaling GPU already configured.
# One endpoint call from our application layer
curl -X POST https://api.runflow.io/v1/flows/headshot/run \
  -H "Authorization: Bearer rf_..." \
  -H "Content-Type: application/json" \
  -d '{"images": ["selfie_1.jpg", "selfie_2.jpg"],
       "style": "professional",
       "quality": "premium"}'
Our developers don't need to understand GPU infrastructure, model quantization, or provider-level scheduling. They call an endpoint. Runflow handles the rest.
Runflow runs on L40S GPUs at $1.95 per hour — 44% cheaper than the market rate of $3.51/hr, billed by the second with zero idle cost. Workers scale to zero when not processing requests, which means we pay only for actual compute consumed.
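Per-second billing makes this easy to sanity-check. A quick sketch using the rates above and an illustrative 120-second job (the job length is our example, not a BetterPic figure):

```shell
# Cost of one job under per-second billing: seconds * hourly_rate / 3600
runflow_cost=$(awk 'BEGIN { printf "%.3f", 120 * 1.95 / 3600 }')
market_cost=$(awk 'BEGIN { printf "%.3f", 120 * 3.51 / 3600 }')
savings_pct=$(awk 'BEGIN { printf "%.0f", (1 - 1.95 / 3.51) * 100 }')
echo "per-job: \$$runflow_cost vs \$$market_cost (${savings_pct}% cheaper)"
```

At that rate the compute behind a single job costs fractions of a cent, and scale-to-zero means there is no idle floor underneath it.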
But the pricing model is only part of the story. The bigger driver of margin improvement was the orchestration layer: intelligent GPU scheduling across providers, model quantization, and multi-step workflows combining open-source and closed-source models efficiently.
87% current gross margin · 40% → 87% in 12 months · 30%+ savings vs. building in-house
The steepest gains came in the first three to four months — primarily from eliminating silent job failures and their downstream costs in support, refunds, and churn. The curve has continued upward since, as optimization compounds over time.
Before Runflow: ~40% gross margin · 24-hour customer wait times · silent job failures · zero retry mechanisms.
After Runflow: 87% gross margin · reliable delivery · 10+ AI workflows running autonomously.
The organizational change is as significant as the financial one. BetterPic has no dedicated ML infrastructure team. No DevOps headcount focused on GPU management. Our engineers build product features — and when a new AI capability is ready, they ship it by calling an endpoint.
"Our developers can integrate a new AI feature by calling an endpoint. They don't need machine learning expertise, infrastructure knowledge, or DevOps skills. We just focus on making the best headshot product, and Runflow handles everything underneath."
— Thibaut Hennau, CEO, BetterPic
This is not a small thing. Every hour not spent debugging infrastructure is an hour spent improving the headshot product, expanding to new markets, or onboarding the next team customer. The leverage compounds.
We're sharing this because the problem we had — reliable, cost-efficient GPU orchestration for production AI workflows — is not unique to us. If you're running an AI product and your team spends meaningful time on infrastructure, retry logic, or quality control, the math on a managed orchestration layer is worth running.
Runflow's platform is now open to other teams. You can deploy any ComfyUI workflow as a production API, enable Sentinel quality evaluation, and get GPU pricing that undercuts the market — without building any of the supporting infrastructure yourself.
Every BetterPic headshot, background edit, and clothing swap runs through Runflow's infrastructure. If you're building with AI at scale, it's worth a look.

