Pilot Batch Protocol: The Test Run That Saves a Full-Run Failure

The Run Before the Run

The single most expensive day in a CPG founder's first year is often the first full production run that turns out wrong. The labels peel. The fill weights miss spec. The pH drifts. The viscosity is off. The co-packer has already burned a kettle's worth of ingredients, used a slot of fill-line time, and produced a few thousand units of inventory that may or may not be sellable. The bill arrives whether the product is right or not.

The fix is a pilot batch: a small, controlled production run that proves the recipe works in this facility, on this equipment, with this packaging, before anyone commits to a full run. Done well, the pilot saves money, accelerates the launch, and surfaces most problems early enough to fix without panic.

What a Pilot Batch Is (and Isn't)

A pilot batch is not a kitchen test. It's not a co-packer "sample." It is a deliberately small production run inside the actual facility, on the actual equipment, with the actual packaging, run as a dress rehearsal of the full production day. Typical sizes range from 10 to 50 gallons, sometimes up to 100 gallons depending on the kettle minimum and the volume needed to test the fill line.

A pilot is not a free run. Co-packers price pilots, and the per-unit cost is much higher than commercial-scale production because all the fixed setup costs land on a small batch. Treat the pilot fee as insurance against a much more expensive failure on the full run. When it catches even one serious problem, it typically pays for itself many times over.

Sizing the Pilot

Right-size the pilot to test the questions that matter most.

If your concern is recipe behavior in the kettle, you need enough volume to get a representative cook. A 5-gallon pilot in a 100-gallon kettle isn't a real test. It heats and evaporates differently. Aim for at least 30 to 40 percent of the kettle's working capacity, ideally more.

If your concern is fill line and packaging, you need enough finished product to run the line at production speed for at least 15 to 30 minutes. Setup-to-running on a filler is where most issues surface, and you need enough material to push past the setup phase into steady-state operation.

If your concern is shelf life or post-fill behavior, you need enough finished units to set aside a meaningful sample for storage testing, typically 24 to 48 jars minimum, often more.

The pilot size that satisfies all three concerns at once is usually 30 to 50 gallons, producing roughly 800 to 1,500 units in small containers (around 4 to 5 fl oz) and proportionally fewer in larger jars.
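The three sizing questions above reduce to simple arithmetic. Here is a minimal sketch, assuming US gallons (128 fl oz each) and treating the kettle size, line speed, and container size as hypothetical inputs you would replace with your co-packer's real numbers. The function names are illustrative, not from any standard tool.

```python
FL_OZ_PER_US_GALLON = 128  # US fluid ounces in one US gallon


def units_from_gallons(gallons, container_fl_oz, loss_fraction=0.0):
    """Whole containers a batch can fill, after an assumed fractional loss."""
    usable_fl_oz = gallons * FL_OZ_PER_US_GALLON * (1 - loss_fraction)
    return int(usable_fl_oz // container_fl_oz)


def minimum_pilot_gallons(kettle_working_gallons, kettle_fraction,
                          line_units_per_minute, line_minutes,
                          retained_units, container_fl_oz):
    """Smallest batch that clears the kettle, fill-line, and storage-sample floors."""
    kettle_floor = kettle_working_gallons * kettle_fraction
    # Storage samples come off the same line run, so take the larger unit count.
    units_needed = max(line_units_per_minute * line_minutes, retained_units)
    fill_floor = units_needed * container_fl_oz / FL_OZ_PER_US_GALLON
    return max(kettle_floor, fill_floor)


# Hypothetical inputs: 100-gallon kettle at 30 percent, 12 fl oz jars,
# filler running 20 jars per minute for 15 minutes, 48 jars retained.
gallons = minimum_pilot_gallons(100, 0.30, 20, 15, 48, 12)
print(f"Minimum pilot: {gallons:.1f} gallons")
print("Units from 30 gal in 5 fl oz bottles:", units_from_gallons(30, 5))
print("Units from 50 gal in 12 fl oz jars:", units_from_gallons(50, 12))
```

The unit count swings widely with container size, so run the numbers for your own jar before you order pilot packaging.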

What to Test on the Pilot

This is the part most founders skip. They run the pilot and "see how it goes." That's not a test, it's a vibe. Build a checklist before pilot day and bring it with you.

Flavor match

Side-by-side taste against the gold-standard kitchen sample. Rate at the line, at one hour, at 24 hours, and ideally at one week. Flavor drift in the first 24 hours after fill is common and tells you whether the recipe needs adjustment.

Viscosity

Bostwick or Brookfield, at multiple time points, at a defined temperature. Compare to your spec. See sauce viscosity at scale for the deeper version of this.

Fill weight

Randomly sample at least 20 jars off the line and weigh each. Calculate the average and standard deviation, and compare them to the label declaration. Fill weight is one of the most common spec failures, and it's easy to miss until a weights and measures inspector flags it.
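The fill weight check is easy to script so the numbers land on your data sheet the same day. This is a minimal sketch, assuming the weights are net weights in grams (empty jar, lid, and label already subtracted). The sample values and the 340 g declaration are made up for illustration. It is a quick screen, not a substitute for the sampling method your weights and measures authority uses.

```python
import statistics


def summarize_fill_weights(net_weights_g, declared_g):
    """Basic fill-weight statistics for a sample of net weights in grams."""
    mean_g = statistics.mean(net_weights_g)
    return {
        "n": len(net_weights_g),
        "mean_g": mean_g,
        "stdev_g": statistics.stdev(net_weights_g),  # sample standard deviation
        "min_g": min(net_weights_g),
        "units_below_declared": sum(1 for w in net_weights_g if w < declared_g),
        "mean_meets_declared": mean_g >= declared_g,
    }


# Hypothetical sample of 20 jars against a 340 g label declaration.
sample = [342.1, 341.5, 339.8, 343.0, 340.6, 341.9, 338.9, 342.4, 341.1, 340.2,
          343.3, 341.7, 340.9, 339.5, 342.8, 341.3, 340.4, 342.0, 341.6, 340.8]
summary = summarize_fill_weights(sample, declared_g=340)
print(f"n={summary['n']}  mean={summary['mean_g']:.2f} g  "
      f"stdev={summary['stdev_g']:.2f} g  min={summary['min_g']:.1f} g")
print("Units below declared:", summary["units_below_declared"])
print("Mean meets declaration:", summary["mean_meets_declared"])
```

A mean that clears the declaration with a wide standard deviation still means some jars ship light, so look at both numbers together.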

pH and water activity

If your product is acidified or relies on Aw for shelf stability, measure both at multiple points across the batch. Verify spec compliance and check batch homogeneity.
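A homogeneity check can be as simple as comparing readings taken at different points in the batch. The sketch below assumes pH readings taken to two decimal places at the start, middle, and end of the fill. The 4.00 upper limit and 0.10 maximum spread are hypothetical placeholders. Use the limits from your own spec.

```python
def check_ph_readings(readings, max_ph, max_spread):
    """Check pH readings (location -> pH) against an upper limit and a spread limit."""
    values = list(readings.values())
    # Round to 0.01 to avoid floating-point noise in the comparison.
    spread = round(max(values) - min(values), 2)
    out_of_spec = {loc: ph for loc, ph in readings.items() if ph > max_ph}
    return {
        "spread": spread,
        "out_of_spec": out_of_spec,
        "in_spec": not out_of_spec,
        "homogeneous": spread <= max_spread,
    }


# Hypothetical readings from three points in the fill.
readings = {"start of fill": 3.82, "mid fill": 3.85, "end of fill": 3.90}
result = check_ph_readings(readings, max_ph=4.00, max_spread=0.10)
print("Spread:", result["spread"])
print("In spec:", result["in_spec"], "| Homogeneous:", result["homogeneous"])
```

The same pattern works for water activity: swap in Aw readings and the Aw limits from your spec.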

Brix

For sweetened or reduction-style products, confirm Brix lands in spec.

Label adhesion

Apply labels to filled jars immediately, and again to jars that have been refrigerated and returned to room temperature (simulating warehouse conditions). Check for bubbling, lifting, or flagging at the corners.

Sealing and closure integrity

Pull random jars and run a vacuum check, torque test, or inverted leak test, depending on closure type. Set aside a few for two-week and four-week revisits.

Color and visual presentation

Photograph the jar against a neutral background. Compare to your reference. Color drift between kitchen and kettle is common, especially for tomato-based or dark sauces.

Yield

Compare expected to actual yield. A yield miss of 10 percent or more flags evaporation, transfer loss, or formulation error that needs to be reconciled before the full run.
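The yield comparison is one line of arithmetic, sketched here with hypothetical unit counts. One assumption on my part: the check flags a miss in either direction, since an unexpectedly high yield can point to a formulation or measurement error just as a shortfall can.

```python
def yield_variance(expected_units, actual_units):
    """Signed yield variance as a fraction of expected (negative means a shortfall)."""
    return (actual_units - expected_units) / expected_units


def yield_needs_reconciling(expected_units, actual_units, threshold=0.10):
    """True when the yield miss, in either direction, reaches the threshold."""
    return abs(yield_variance(expected_units, actual_units)) >= threshold


# Hypothetical pilot: 1,000 units expected, 880 actually packed.
variance = yield_variance(1000, 880)
print(f"Yield variance: {variance:+.1%}")
print("Reconcile before full run:", yield_needs_reconciling(1000, 880))
```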

The Data Sheet to Bring

Walk into pilot day with a one-page data sheet that lists every measurement, the target value, the acceptable range, and a column for the actual reading. Have your co-packer's QC tech sign each row as it's verified. This sheet becomes your go/no-go document for the full run, and it lives in your batch record file forever.
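If you keep the sheet in a script or spreadsheet export, one way to structure it is a row per measurement with the target, the acceptable range, the actual reading, and the sign-off. This is a sketch only. The row names, targets, ranges, and the "QC tech" sign-off below are hypothetical examples, not recommended specs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SheetRow:
    measurement: str
    target: float
    low: float                          # bottom of acceptable range
    high: float                         # top of acceptable range
    actual: Optional[float] = None      # filled in on pilot day
    verified_by: Optional[str] = None   # QC tech sign-off

    def status(self) -> str:
        if self.actual is None:
            return "not measured"
        if not (self.low <= self.actual <= self.high):
            return "out of range"
        if self.verified_by is None:
            return "in range, unsigned"
        return "in range, signed"


# Hypothetical rows; real targets and ranges come from your spec.
sheet = [
    SheetRow("pH", target=3.85, low=3.70, high=4.00, actual=3.88, verified_by="QC tech"),
    SheetRow("Brix", target=28.0, low=27.0, high=29.0, actual=29.6, verified_by="QC tech"),
    SheetRow("Bostwick, cm at 30 s", target=6.0, low=5.0, high=7.0, actual=6.4),
    SheetRow("Net fill weight average, g", target=342.0, low=340.0, high=346.0),
]

for row in sheet:
    print(f"{row.measurement:<28} target {row.target:>6}  "
          f"range {row.low}-{row.high}  actual {row.actual}  -> {row.status()}")
```

Any row that ends pilot day as "not measured" or "unsigned" is a gap in your go/no-go document, so close those before you leave the facility.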

Without this sheet, you'll show up to the pilot, watch a few things, and leave with vague impressions instead of decisions.

Who Should Be in the Room

The brand owner. The recipe developer or formulator. The co-packer's production lead. Ideally the co-packer's QC tech. If the recipe involves any third-party ingredient with sensitivity (a custom spice blend, a specialty hot pepper, a unique vinegar), having that supplier reachable by phone helps.

What you do not want: a pilot run remotely while you wait for an email summary. The point is to be in the room and see the equipment behave with your recipe.

Decision Criteria for Go/No-Go on Full Production

Define these before the pilot, in writing. A useful framework:

Green (proceed to full production): All critical specs in range (pH, Aw, Brix, fill weight average, viscosity at 24 hours). Flavor match approved by brand. Sealing and closure integrity pass.

Yellow (proceed with adjustment): Specs largely in range with one or two flags that each have a clear, named fix (e.g., viscosity running 1 cm too thin on Bostwick, addressed by a 0.05 percent xanthan increase before the full run). Run the adjustment in a small re-test or commit to it on the full run with a tight QC gate.

Red (do not run full production): A critical spec misses by a meaningful margin, or the flavor doesn't match, or a packaging integrity issue surfaces. Reformulate or re-engineer the process before scheduling the full run.
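Writing the criteria down as logic forces you to decide the edge cases before pilot day. Here is a minimal sketch of the green/yellow/red framework above. The result labels ("in_range", "fixable_flag", "meaningful_miss") are names I made up for illustration, and treating more than two fixable flags as red is my assumption, since the framework only describes yellow as one or two flags.

```python
from enum import Enum


class Decision(Enum):
    GREEN = "proceed to full production"
    YELLOW = "proceed with adjustment"
    RED = "do not run full production"


VALID_RESULTS = {"in_range", "fixable_flag", "meaningful_miss"}


def go_no_go(spec_results, flavor_approved, closure_pass, max_fixable_flags=2):
    """Classify a pilot from per-spec results plus flavor and closure checks."""
    unknown = set(spec_results.values()) - VALID_RESULTS
    if unknown:
        raise ValueError(f"Unrecognized spec result(s): {sorted(unknown)}")
    # Flavor mismatch or a packaging integrity issue is red on its own.
    if not flavor_approved or not closure_pass:
        return Decision.RED
    results = list(spec_results.values())
    if "meaningful_miss" in results:
        return Decision.RED
    fixable = results.count("fixable_flag")
    if fixable == 0:
        return Decision.GREEN
    if fixable <= max_fixable_flags:
        return Decision.YELLOW
    return Decision.RED  # assumption: too many flags to call it a simple adjustment


# Hypothetical pilot: everything in range except viscosity, which has a named fix.
pilot = {
    "pH": "in_range",
    "Aw": "in_range",
    "Brix": "in_range",
    "fill weight average": "in_range",
    "viscosity at 24 hours": "fixable_flag",
}
decision = go_no_go(pilot, flavor_approved=True, closure_pass=True)
print(decision.name, "-", decision.value)
```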

The tempting move when a yellow shows up is to run the full production anyway and hope. Don't. The fixes you apply on a 50-gallon pilot are cheap. The same fixes applied to a 1,000-gallon failed production run are not.

What a Pilot Costs

Variable, but typical ranges I see for sauces and condiments:

Co-packer pilot fee: $1,500 to $5,000 for kettle time, fill setup, and labor on a 30 to 50 gallon run. Some co-packers credit part of this against the eventual full run; ask.

Ingredients: at pilot scale, often $300 to $1,500 depending on recipe. The per-unit cost is usually higher than at commercial scale because you can't access full-pallet pricing on a small order.

Packaging: a few hundred to a few thousand dollars depending on whether you can buy partial pallets or have to commit to full minimums. This is sometimes the binding constraint that makes pilots inconvenient.

Lab work: optional but valuable. A few hundred dollars to send finished product out for pH, Aw, or microbial baseline testing.

All-in, a structured pilot for a sauce or condiment commonly lands in the $3,000 to $8,000 range. That sounds expensive until you compare it to the cost of a full run gone wrong.

Frequently Asked Questions

Can I skip the pilot and go straight to a small full run?

You can, and many founders do. The trade-off is that the smallest full run on most co-packer schedules is several thousand units, so a "small" mistake costs several thousand units of bad inventory. A pilot caps that downside.

What if the co-packer says they don't do pilots?

Many large co-packers don't, or charge so much that the economics tilt toward just running the full batch and accepting risk. In that case, find a smaller co-packer for the pilot (a craft or commissary partner) and run the production at the larger facility once the recipe is locked.

Should I do a pilot for every reformulation, or just the first launch?

For meaningful changes (ingredient swaps, process changes, packaging changes), yes, run a pilot or at minimum a tightly controlled first batch with full QC. For minor adjustments (small spice tweak, sweetness adjustment within range), often a side-by-side at the start of a normal production run is enough.

Who keeps the pilot data?

You do. The brand owns the recipe and the spec. Get a copy of every measurement, every photo, and every notation from pilot day. The co-packer keeps their own records, but yours is the master file.

The Honest Reason Pilots Get Skipped

Most founders skip pilots because they want to feel like they're moving forward, and a pilot feels like a delay. In practice, pilots almost always accelerate the timeline, because they prevent the much longer setback of a failed full run. If you're approaching a first production run, a pilot is the cheapest insurance you can buy.