Data Schema & Validation
Before You Begin
Because BidOptic is a zero-egress system, we cannot look at your data to tell you if it is formatted correctly. Before initiating an Evaluation Agreement, we ask prospective Design Partners to validate a sample of their historical logs locally.
👉 Go to the BidOptic Local Schema Validator Repository
The validator is a standalone, open-source Python script. It requires no BidOptic licence, runs entirely locally on your machine, and produces a pass/fail report in under 60 seconds. Fix any [BLOCKER] errors it raises before scheduling your calibration call.
Required Schema
Your extract must be a single flat CSV or Parquet file containing the 11 required columns below. Column names are case-sensitive. One optional column, bid_latency_ms, is described after the table.
| Column | Type | Description |
|---|---|---|
| timestamp | ISO 8601 datetime | UTC timestamp of the bid request. Used to reconstruct temporal patterns, hourly behaviour profiles, and recency decay curves. |
| user_id | String / Integer | Your internal user identifier. BidOptic remaps these to anonymous sequential integers during ingestion; the original IDs are never stored. |
| publisher_id | String | Your internal publisher or supply-source identifier. Used to train publisher-level floor price models and quality signals. |
| ad_size | String | Creative dimension or format (e.g. 300x250, 728x90, native, video_pre). Used as a feature in CTR and floor modelling. |
| bid_price | Float | The price your DSP submitted for this auction. Must be greater than zero for won auctions. |
| clearing_price | Float | The price you actually paid on won auctions. Set to 0.0 on lost auctions. Used to calibrate second-price dynamics and margin. |
| is_won | Integer (0 / 1) | Whether your bid won the auction. Used to train the win-rate model and derive publisher floor estimates. |
| is_clicked | Integer (0 / 1) | Whether the impression resulted in a click. The primary training signal for the CTR model. |
| is_converted | Integer (0 / 1) | Whether the impression chain resulted in a conversion event. The primary training signal for the CVR and LTV models. |
| conversion_timestamp | ISO 8601 datetime (nullable) | UTC timestamp of the conversion event. Required for the conversion delay model. Leave it null on every non-converting row. |
| conversion_value | Float | Revenue attributed to this conversion. Set to 0.0 for non-converting rows. Used by the LTV model. Note: if your data contains no revenue variance (e.g. all 0s or all 1s), the system automatically enters Binary Conversion Mode: the LTV model is disabled and every conversion is assigned a fixed value of 1.00 in your campaign currency. This suits CPA-focused campaigns. |
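While building the export, a quick local self-check of the column names and per-row rules in the table can catch problems early. The sketch below is hypothetical and is not the BidOptic Local Schema Validator: it assumes a CSV read with Python's csv.DictReader (every value is a string, and an empty string means null), and the helper names check_header, check_row, and check_file are illustrative only.

```python
import csv
from datetime import datetime

# Hypothetical pre-export self-check; not the official BidOptic validator.
REQUIRED_COLUMNS = [
    "timestamp", "user_id", "publisher_id", "ad_size", "bid_price",
    "clearing_price", "is_won", "is_clicked", "is_converted",
    "conversion_timestamp", "conversion_value",
]
OPTIONAL_COLUMNS = {"bid_latency_ms"}


def check_header(header):
    """Return (missing, unexpected) column names. Matching is case-sensitive."""
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    allowed = set(REQUIRED_COLUMNS) | OPTIONAL_COLUMNS
    unexpected = [c for c in header if c not in allowed]
    return missing, unexpected


def check_row(row):
    """Return rule violations for one row of strings ('' means null and is skipped)."""
    problems = []
    flags = {}
    for name in ("is_won", "is_clicked", "is_converted"):
        value = row[name]
        if value == "":
            continue  # null rates are limited separately (see Hard Limits)
        if value in ("0", "1"):
            flags[name] = int(value)
        else:
            problems.append(f"{name} must be 0 or 1, got {value!r}")

    if row["timestamp"]:
        try:
            # Python < 3.11 rejects a trailing 'Z', so rewrite it as +00:00.
            datetime.fromisoformat(row["timestamp"].replace("Z", "+00:00"))
        except ValueError:
            problems.append(f"timestamp is not ISO 8601: {row['timestamp']!r}")

    if flags.get("is_won") == 1 and row["bid_price"] and float(row["bid_price"]) <= 0:
        problems.append("bid_price must be > 0 on won auctions")
    if flags.get("is_won") == 0 and row["clearing_price"] and float(row["clearing_price"]) != 0.0:
        problems.append("clearing_price must be 0.0 on lost auctions")
    if flags.get("is_converted") == 0:
        if row["conversion_timestamp"]:
            problems.append("conversion_timestamp must be null on non-converting rows")
        if row["conversion_value"] and float(row["conversion_value"]) != 0.0:
            problems.append("conversion_value must be 0.0 on non-converting rows")
    return problems


def check_file(path):
    """Return (missing, unexpected, {row_number: problems}) for a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing, unexpected = check_header(reader.fieldnames or [])
        if missing:
            return missing, unexpected, {}
        bad_rows = {}
        # Row numbers assume one record per line, with the header on line 1.
        for row_number, row in enumerate(reader, start=2):
            problems = check_row(row)
            if problems:
                bad_rows[row_number] = problems
    return missing, unexpected, bad_rows
```

Passing this check does not replace the official validator; it only mirrors the rules stated in the table above.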
Optional column: bid_latency_ms (Float, milliseconds). This represents the round-trip time from bid request receipt to bid response submission, as observed by your DSP. If present, it trains the Latency Twin directly from your infrastructure data. If absent, a lognormal distribution is synthesized from market priors and flagged in the calibration audit output.
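This page does not say how BidOptic fits or synthesizes its latency distribution. If you do supply bid_latency_ms, one simple local sanity check is to fit a lognormal to your own values and look at the implied median and tail. The sketch below is a hypothetical helper, not the Latency Twin: it does a standard maximum-likelihood lognormal fit with the Python standard library (statistics.NormalDist needs Python 3.8+).

```python
import math
import statistics


def fit_lognormal(latencies_ms):
    """Maximum-likelihood lognormal fit; returns (mu, sigma) of ln(latency)."""
    if not latencies_ms:
        raise ValueError("no latency values")
    if any(x <= 0 for x in latencies_ms):
        raise ValueError("latencies must be strictly positive for a lognormal fit")
    logs = [math.log(x) for x in latencies_ms]
    mu = statistics.mean(logs)
    sigma = statistics.pstdev(logs, mu)  # population stdev is the MLE for sigma
    return mu, sigma


def lognormal_percentile(mu, sigma, p):
    """Latency (ms) at cumulative probability p under the fitted lognormal."""
    return math.exp(mu + sigma * statistics.NormalDist().inv_cdf(p))
```

For example, lognormal_percentile(mu, sigma, 0.99) gives the fitted p99 latency, which you can compare with what your DSP monitoring reports. Zero or negative latencies make the fit fail, which usually points to a logging problem worth fixing before export.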
Hard Limits
Minimum 100,000 rows required. Datasets below this threshold do not provide sufficient statistical coverage for the CTR, CVR, and floor price models to produce reliable calibrations. The calibration pipeline will abort with a hard error if this threshold is not met.
| Constraint | Value | Impact if Violated |
|---|---|---|
| Minimum row count | 100,000 rows | Calibration aborted |
| Minimum date span | 7 days | Calibration aborted |
| Minimum conversion count | 50 conversions | Pre-flight validation aborted. The schema validator enforces this 50-conversion gate as a quality floor; the calibration engine itself only hard-aborts when the count drops below 5. If you bypass the validator (e.g. by ingesting data directly), calibration will still run with 5 to 49 conversions, but model output in that range is unreliable. |
| ML CVR/LTV threshold | 100 conversions | With 50 to 99 conversions, the CVR and LTV ML models are disabled and the simulation falls back to tabular segment-mean estimates |
| Maximum null rate on critical columns | 5% | Calibration aborted for the affected column |
| Maximum null rate on non-critical columns | 20% | Warning issued; affected model accuracy degrades |
| Minimum win rate | 0.1% | Calibration aborted (likely a pre-filtered dataset) |
| Maximum conversion_timestamp null rate (among converted rows) | 20% | Warning issued; conversion delay model accuracy degrades |
Critical columns (5% null hard limit): bid_price, publisher_id, timestamp, is_won.
Non-critical columns (20% null soft limit): clearing_price, is_clicked, conversion_value.
ad_size nulls are handled separately. Any null triggers an informational warning, but ad_size is not subject to the 20% threshold and is never a blocker. Nulls are expected for native and video inventory and are relabelled as unknown internally; no action is required if they reflect your inventory mix.
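The Hard Limits table and the critical/non-critical column lists can be approximated locally. The sketch below is hypothetical and is not the BidOptic validator. It assumes the data is already loaded into a dict of column lists (timestamps as timezone-aware datetimes, flags as 0/1 integers, None or an empty string for nulls), counts the date span in inclusive UTC calendar days (the validator's exact rule is not documented here), and uses WARNING and INFO as illustrative labels next to the validator's [BLOCKER].

```python
from datetime import datetime

# Thresholds from the Hard Limits table. "BLOCKER" mirrors the validator's
# [BLOCKER] label; "WARNING" and "INFO" are illustrative labels.
MIN_ROWS = 100_000
MIN_SPAN_DAYS = 7
MIN_CONVERSIONS = 50    # schema-validator quality floor
ML_CONVERSIONS = 100    # below this, CVR/LTV ML models are disabled
MIN_WIN_RATE = 0.001    # 0.1%
CRITICAL = ("bid_price", "publisher_id", "timestamp", "is_won")
NON_CRITICAL = ("clearing_price", "is_clicked", "conversion_value")


def parse_utc(text):
    """Parse ISO 8601 text; a trailing 'Z' is rewritten for Python < 3.11."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def is_null(value):
    return value is None or value == ""


def null_rate(values):
    values = list(values)
    return sum(is_null(v) for v in values) / len(values) if values else 0.0


def check_hard_limits(cols):
    """Return (severity, message) findings; cols maps column name -> list of values."""
    findings = []
    n = len(cols["timestamp"])
    if n < MIN_ROWS:
        findings.append(("BLOCKER", f"{n} rows; at least {MIN_ROWS} required"))

    stamps = [t for t in cols["timestamp"] if not is_null(t)]
    if stamps:
        # Assumption: span counted as inclusive UTC calendar days.
        span = (max(stamps).date() - min(stamps).date()).days + 1
        if span < MIN_SPAN_DAYS:
            findings.append(("BLOCKER", f"{span}-day span; at least {MIN_SPAN_DAYS} required"))

    conversions = sum(1 for c in cols["is_converted"] if c == 1)
    if conversions < MIN_CONVERSIONS:
        findings.append(("BLOCKER", f"{conversions} conversions; at least {MIN_CONVERSIONS} required"))
    elif conversions < ML_CONVERSIONS:
        findings.append(("WARNING", f"{conversions} conversions: tabular segment-mean fallback"))

    won = [w for w in cols["is_won"] if not is_null(w)]
    win_rate = sum(won) / len(won) if won else 0.0
    if win_rate < MIN_WIN_RATE:
        findings.append(("BLOCKER", f"win rate {win_rate:.3%} is below 0.1%"))

    for name in CRITICAL:
        rate = null_rate(cols[name])
        if rate > 0.05:
            findings.append(("BLOCKER", f"{name}: {rate:.1%} null (limit 5%)"))
    for name in NON_CRITICAL:
        rate = null_rate(cols[name])
        if rate > 0.20:
            findings.append(("WARNING", f"{name}: {rate:.1%} null (limit 20%)"))
    if null_rate(cols["ad_size"]) > 0:
        findings.append(("INFO", "ad_size has nulls; they are relabelled as unknown"))

    # conversion_timestamp null rate is measured among converted rows only.
    converted_ts = [t for t, c in zip(cols["conversion_timestamp"], cols["is_converted"]) if c == 1]
    if null_rate(converted_ts) > 0.20:
        findings.append(("WARNING", "conversion_timestamp is over 20% null among converted rows"))
    return findings
```

An empty findings list means the extract clears every limit in the table as this sketch interprets it; the official validator remains the authority.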
Notes on Data Preparation
Time range. A minimum of 7 days is a hard requirement; shorter datasets are rejected outright (see Hard Limits). We recommend at least 14 days for reliable weekly seasonality coverage and no more than 90 days, since older data may dilute recent trends. Note that the evaluation accuracy protocol requires approximately 9 weeks (about 63 days) of history: 8 weeks for calibration and 1 week held out for comparison against your live week-9 results. An export shorter than about 63 days will pass schema validation, but you may not be able to run the full evaluation trigger, so consider pulling a longer window before beginning the evaluation.
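The time-range arithmetic above can be checked before you export. The sketch below is hypothetical: plan_window is an illustrative helper, date ranges are treated as inclusive UTC calendar days, and the split rule (take the most recent 9 weeks and hold out the final week) is an assumption about how the 8 + 1 week protocol lines up with your data, not a documented BidOptic rule.

```python
from datetime import date, timedelta

# Values from the time-range guidance; the split rule itself is an assumption.
MIN_DAYS, RECOMMENDED_MIN_DAYS, RECOMMENDED_MAX_DAYS = 7, 14, 90
CALIBRATION_DAYS = 8 * 7  # 8 weeks of calibration history
HOLDOUT_DAYS = 7          # 1 week held out for comparison


def plan_window(first_day, last_day):
    """Classify an inclusive date range and propose a calibration/holdout split."""
    days = (last_day - first_day).days + 1
    notes = []
    if days < MIN_DAYS:
        notes.append("rejected: shorter than the 7-day hard minimum")
    elif days < RECOMMENDED_MIN_DAYS:
        notes.append("accepted, but shorter than the recommended 14 days")
    if days > RECOMMENDED_MAX_DAYS:
        notes.append("longer than 90 days: older data may dilute recent trends")

    if days < CALIBRATION_DAYS + HOLDOUT_DAYS:
        notes.append(f"{days} days: the full evaluation needs about 63")
        return days, None, notes

    # Assumption: use the most recent 9 weeks, holding out the final week.
    holdout_start = last_day - timedelta(days=HOLDOUT_DAYS - 1)
    calibration_start = holdout_start - timedelta(days=CALIBRATION_DAYS)
    calibration = (calibration_start, holdout_start - timedelta(days=1))
    holdout = (holdout_start, last_day)
    return days, (calibration, holdout), notes
```

For example, an export from 1 January to 3 March 2024 covers exactly 63 days, so it yields a calibration window of 1 January to 25 February and a holdout week of 26 February to 3 March.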
Sampling. Do not downsample by outcome. If you subsample to reduce file size, use random sampling across all rows. Outcome-stratified samples (e.g. keeping only won impressions) will produce a miscalibrated win-rate model.
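Random sampling gives every row the same chance of selection regardless of outcome, so win, click, and conversion rates in the extract stay close to those of the full log, up to sampling noise. Keeping only won impressions, by contrast, would push the observed win rate to 100%. A minimal sketch, assuming rows are already in memory as a list; uniform_sample is a hypothetical helper built on Python's random.Random.sample.

```python
import random


def uniform_sample(rows, fraction, seed=0):
    """Simple random sample of rows, chosen without looking at any outcome column."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = round(len(rows) * fraction)
    rng = random.Random(seed)  # fixed seed makes the extract reproducible
    keep = sorted(rng.sample(range(len(rows)), k))  # sorted to preserve row order
    return [rows[i] for i in keep]
```

Sampling also shrinks the conversion count, so confirm the sampled file still meets the 100,000-row and 50-conversion minimums.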
User IDs. You may hash or otherwise pseudonymise user IDs before providing the extract. BidOptic remaps all IDs to sequential integers during ingestion regardless.
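If you choose to hash user IDs, a keyed hash such as HMAC-SHA256 with a secret you keep is safer than a plain hash, because an unkeyed hash of a small or sequential ID space can be reversed by hashing every candidate ID. The sketch below uses Python's hmac and hashlib modules; pseudonymise is a hypothetical helper. remap_sequential only illustrates the kind of first-seen remapping to sequential integers described above; BidOptic's actual assignment order is not documented here.

```python
import hashlib
import hmac


def pseudonymise(user_id, secret_key: bytes) -> str:
    """Keyed SHA-256 (HMAC) of a user ID; stable for a given key."""
    return hmac.new(secret_key, str(user_id).encode("utf-8"), hashlib.sha256).hexdigest()


def remap_sequential(ids):
    """Illustration only: map each distinct ID to 0, 1, 2, ... in first-seen order."""
    mapping = {}
    return [mapping.setdefault(i, len(mapping)) for i in ids]
```

Use the same key for the whole extract so that each user keeps one consistent pseudonym across rows.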
Currency. All price columns (bid_price and clearing_price) must be denominated in currency units per impression, not CPM. If your DSP logs prices as CPM, divide by 1,000 before export.
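The CPM conversion is a single division per price column. A minimal sketch, assuming each row is a dict of floats; convert_row_from_cpm is a hypothetical helper, and it deliberately leaves conversion_value alone on the assumption that revenue per conversion is not logged as CPM.

```python
PRICE_COLUMNS = ("bid_price", "clearing_price")


def convert_row_from_cpm(row):
    """Return a copy of row with CPM prices converted to per-impression prices."""
    converted = dict(row)
    for column in PRICE_COLUMNS:
        converted[column] = row[column] / 1000.0  # CPM = price per 1,000 impressions
    return converted
```

For example, a 2.50 CPM bid becomes a bid_price of 0.0025.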