Data Schema & Validation

Before You Begin

Because BidOptic is a zero-egress system, we cannot look at your data to tell you if it is formatted correctly. Before initiating an Evaluation Agreement, we ask prospective Design Partners to validate a sample of their historical logs locally.

👉 Go to the BidOptic Local Schema Validator Repository

The validator is a standalone, open-source Python script. It requires no BidOptic licence, runs entirely locally on your machine, and produces a pass/fail report in under 60 seconds. Fix any [BLOCKER] errors it raises before scheduling your calibration call.

Required Schema

Your extract must be a single flat CSV or Parquet file containing exactly these 11 columns. Column names are case-sensitive.

Column	Type	Description
`timestamp`	ISO 8601 Datetime	UTC timestamp of the bid request. Used to reconstruct temporal patterns, hourly behaviour profiles, and recency decay curves.
`user_id`	String / Integer	Your internal user identifier. BidOptic remaps these to anonymous sequential integers during ingestion. The original IDs are never stored.
`publisher_id`	String	Your internal publisher or supply-source identifier. Used to train publisher-level floor price models and quality signals.
`ad_size`	String	Creative dimension or format (e.g. `300x250`, `728x90`, `native`, `video_pre`). Used as a feature in CTR and floor modelling.
`bid_price`	Float	The price your DSP submitted for this auction. Must be greater than zero for won auctions.
`clearing_price`	Float	The price you actually paid on won auctions. Set to `0.0` on lost auctions. Used to calibrate second-price dynamics and margin.
`is_won`	Integer (0 / 1)	Whether your bid won the auction. Used to train the win-rate model and derive publisher floor estimates.
`is_clicked`	Integer (0 / 1)	Whether the impression resulted in a click. The primary training signal for the CTR model.
`is_converted`	Integer (0 / 1)	Whether the impression chain resulted in a conversion event. The primary training signal for the CVR and LTV models.
`conversion_timestamp`	ISO 8601 Datetime (nullable)	UTC timestamp of the conversion event. Null for non-converting rows. Required for the conversion delay model. Rows with no conversions at all should leave this column entirely null.
`conversion_value`	Float	Revenue attributed to this conversion. Set to `0.0` for non-converting rows. Used by the LTV model. Note: If your data contains no revenue variance (e.g., all 0s or 1s), the system automatically enters Binary Conversion Mode. The LTV model is disabled, and every conversion is assigned a fixed value of 1.00 in your campaign currency. This is ideal for CPA-focused campaigns.

Optional column: bid_latency_ms (Float, milliseconds). This represents the round-trip time from bid request receipt to bid response submission, as observed by your DSP. If present, it trains the Latency Twin directly from your infrastructure data. If absent, a lognormal distribution is synthesized from market priors and flagged in the calibration audit output.

Hard Limits

Minimum 100,000 rows required

Minimum 100,000 rows required. Datasets below this threshold do not provide sufficient statistical coverage for the CTR, CVR, and floor price models to produce reliable calibrations. The calibration pipeline will abort with a hard error if this threshold is not met.

Constraint	Value	Impact if Violated
Minimum row count	100,000 rows	Calibration aborted
Minimum date span	7 days	Calibration aborted
Minimum conversion count	50 conversions	Pre-flight validation aborted. Below this count, model output is statistically unreliable, do not bypass this check.
ML CVR/LTV Threshold	100 conversions	If between 50-99 conversions, ML models are disabled and simulation falls back to tabular segment-mean estimates
Maximum null rate on critical columns	5%	Calibration aborted for the affected column
Maximum null rate on non-critical columns	20%	Warning issued; affected model accuracy degrades
Minimum win rate	0.1%	Calibration aborted (likely a pre-filtered dataset)
Maximum `conversion_timestamp` null rate (among converted rows)	20%	Warning issued; conversion delay model accuracy degrades

Critical columns (5% null hard limit): bid_price, publisher_id, timestamp, is_won.

Non-critical columns (20% null soft limit): clearing_price, is_clicked, conversion_value.

ad_size nulls are handled separately. Any null triggers an informational warning, but ad_size is not subject to the 20% threshold and is never a blocker. Null values are expected for native and video inventory and are relabelled unknown internally — no action is required if they reflect your inventory mix.

Notes on Data Preparation

Time range. A minimum of 7 days is a hard requirement. Datasets shorter than this are rejected outright (see Hard Limits table). We recommend at least 14 days for reliable weekly seasonality coverage and no more than 90 days. Beyond 90 days, older data may dilute recent trends. Note that the evaluation accuracy protocol requires approximately 9 weeks of history (8 weeks for calibration, 1 week held out for comparison against your live week-9 results). If your initial export is shorter than ~63 days, you will pass schema validation but may not be able to run the full evaluation trigger, so consider pulling a longer window before beginning the evaluation.

Sampling. Do not downsample by outcome. If you subsample to reduce file size, use random sampling across all rows. Outcome-stratified samples (e.g. keeping only won impressions) will produce a miscalibrated win-rate model.

User IDs. You may hash or otherwise pseudonymise user IDs before providing the extract. BidOptic remaps all IDs to sequential integers during ingestion regardless.

Currency. All price columns must be denominated in currency. If your DSP logs in CPM, divide by 1,000 before export.