Data Schema & Validation
Before You Begin
Because BidOptic is a zero-egress system, we cannot look at your data to tell you if it is formatted correctly. Before initiating an Evaluation Agreement, we ask prospective Design Partners to validate a sample of their historical logs locally.
👉 Go to the BidOptic Local Schema Validator Repository
The validator is a standalone, open-source Python script. It requires no BidOptic licence, runs entirely locally on your machine, and produces a pass/fail report in under 60 seconds. Fix any [BLOCKER] errors it raises before scheduling your calibration call.
Required Schema
Your extract must be a single flat CSV or Parquet file containing exactly these 11 columns. Column names are case-sensitive.
| Column | Type | Description |
|---|---|---|
timestamp |
ISO 8601 Datetime | UTC timestamp of the bid request. Used to reconstruct temporal patterns, hourly behaviour profiles, and recency decay curves. |
user_id |
String / Integer | Your internal user identifier. BidOptic remaps these to anonymous sequential integers during ingestion. The original IDs are never stored. |
publisher_id |
String | Your internal publisher or supply-source identifier. Used to train publisher-level floor price models and quality signals. |
ad_size |
String | Creative dimension or format (e.g. 300x250, 728x90, native, video). Used as a feature in CTR and floor modelling. |
bid_price |
Float (USD) | The price your DSP submitted for this auction. Must be greater than zero for won auctions. |
clearing_price |
Float (USD) | The price you actually paid on won auctions. Set to 0.0 on lost auctions. Used to calibrate second-price dynamics and margin. |
is_won |
Integer (0 / 1) | Whether your bid won the auction. Used to train the win-rate model and derive publisher floor estimates. |
is_clicked |
Integer (0 / 1) | Whether the impression resulted in a click. The primary training signal for the CTR model. |
is_converted |
Integer (0 / 1) | Whether the impression chain resulted in a conversion event. The primary training signal for the CVR and LTV models. |
conversion_timestamp |
ISO 8601 Datetime (nullable) | UTC timestamp of the conversion event. Null for non-converting rows. Required for the conversion delay model. Rows with no conversions at all should leave this column entirely null. |
conversion_value |
Float (USD) | Revenue attributed to this conversion. Set to 0.0 for non-converting rows. Used by the LTV (Tweedie) model. Note: If your data contains no revenue variance (e.g., all 0s or 1s), the system automatically enters Binary Conversion Mode. The LTV model is disabled, and every conversion is fixed at $1.00. This is ideal for CPA-focused campaigns. |
Optional column: bid_latency_ms (Float, milliseconds) — round-trip time from bid request receipt to bid response submission, as observed by your DSP. If present, it trains the Latency Twin directly from your infrastructure data. If absent, a lognormal distribution is synthesized from market priors and flagged in the calibration audit output.
Hard Limits
Minimum 100,000 rows required
Minimum 100,000 rows required. Datasets below this threshold do not provide sufficient statistical coverage for the CTR, CVR, and floor price models to produce reliable calibrations. The calibration pipeline will abort with a hard error if this threshold is not met.
| Constraint | Value | Impact if Violated |
|---|---|---|
| Minimum row count | 100,000 rows | Calibration aborted |
| Minimum conversion count | 50 conversions | Calibration aborted |
| ML CVR/LTV Threshold | 100 conversions | If between 50-99 conversions, ML models are disabled and simulation falls back to tabular segment-mean estimates |
| Maximum null rate on critical columns | 5% | Calibration aborted for the affected column |
| Maximum null rate on non-critical columns | 20% | Warning issued; affected model accuracy degrades |
| Minimum win rate | 0.1% | Calibration aborted (likely a pre-filtered dataset) |
Maximum conversion_timestamp null rate (among converted rows) |
20% | Warning issued; conversion delay model accuracy degrades |
Critical columns (5% null hard limit): bid_price, publisher_id, timestamp, is_won.
Non-critical columns (20% null soft limit): clearing_price, is_clicked, conversion_value, ad_size. Null ad_size values are expected for native and video inventory and are relabelled unknown internally — no action is required if they reflect your inventory mix.
Notes on Data Preparation
Time range. We recommend a minimum of 14 days of data and a maximum of 90 days. Data older than 90 days may reflect market conditions that no longer apply. The calibration report includes a freshness warning if the dataset end date is more than 30 days before the calibration run date.
Sampling. Do not downsample by outcome. If you subsample to reduce file size, use random sampling across all rows. Outcome-stratified samples (e.g. keeping only won impressions) will produce a miscalibrated win-rate model.
User IDs. You may hash or otherwise pseudonymise user IDs before providing the extract. BidOptic remaps all IDs to sequential integers during ingestion regardless.
Currency. All price columns must be denominated in USD. If your DSP logs in CPM, divide by 1,000 before export.