Retrieval, conversion, cost, and feedback gates
Evaluation dashboard
Shows positive hits, negative cases, known misses, borderline checks, precision/recall, conversion risk, and stored run metadata.
Add validation case
Capture reviewer findings as reusable retrieval benchmark rows for this case.
| Case | Kind | Query | Expected | Found | Precision | Recall | Notes |
|---|---|---|---|---|---|---|---|
| VAL-001 | positive | Which sources support the cost structure? | SRC-004, SRC-018 | SRC-020, SRC-003, SRC-008, SRC-002, SRC-015 | 0.0 | 0.0 | Should find finance and model costs. |
| VAL-002 | positive | Where does it say that the platform uses ADCP? | SRC-003 | SRC-003, SRC-015, SRC-016, SRC-013, SRC-010 | 0.2 | 1.0 | Architecture term from the canvas reference. |
| VAL-003 | negative | Prove that a public API already exists. | no source expected | SRC-003, SRC-005, SRC-007, SRC-016, SRC-001 | 0.0 | 0.0 | Not substantiated; the system must not force a source. |
| VAL-004 | borderline | Which documents have OCR or conversion warnings? | SRC-010, SRC-019 | SRC-010, SRC-019, SRC-002 | 0.67 | 1.0 | Checks the intake/conversion trace. |
| VAL-005 | known_miss | Which claims belong to healthcare sector value? | SRC-011 | SRC-006, SRC-017, SRC-015 | 0.0 | 0.0 | Recall risk with sector-specific language. |
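The Precision and Recall columns can be reproduced from the Expected and Found columns. The following is a minimal sketch, not the dashboard's actual implementation: the `ValidationCase` class and the `precision_recall` function are hypothetical names, and reporting recall as 0.0 when no source is expected is an assumption taken from the VAL-003 row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationCase:
    """One benchmark row (hypothetical structure, mirroring the table columns)."""
    case_id: str
    kind: str             # positive, negative, borderline, or known_miss
    expected: frozenset   # source ids the reviewer expects
    found: tuple          # source ids the retriever returned


def precision_recall(case):
    """Return (precision, recall) for one case, rounded to two decimals."""
    found = set(case.found)
    hits = len(found & case.expected)
    precision = hits / len(found) if found else 0.0
    # Assumption: with no expected sources, recall is reported as 0.0,
    # which matches how the VAL-003 row is displayed.
    recall = hits / len(case.expected) if case.expected else 0.0
    return round(precision, 2), round(recall, 2)


val_002 = ValidationCase(
    "VAL-002", "positive",
    frozenset({"SRC-003"}),
    ("SRC-003", "SRC-015", "SRC-016", "SRC-013", "SRC-010"),
)
val_004 = ValidationCase(
    "VAL-004", "borderline",
    frozenset({"SRC-010", "SRC-019"}),
    ("SRC-010", "SRC-019", "SRC-002"),
)

print(precision_recall(val_002))  # (0.2, 1.0)
print(precision_recall(val_004))  # (0.67, 1.0)
```

VAL-002 finds its single expected source among five results, so precision is 1/5 and recall is 1/1. VAL-004 finds both expected sources among three results, so precision is 2/3.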
Conversion warnings
OCR and format checks are represented in SRC-010 and SRC-019; upload rejection is modeled as a validation gate.
Run telemetry
SQLite stores prompt refs, model, retrieval mode, selected source ids, intermediate canvas blocks, and placeholders for latency and usage.
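As an illustration of how such a run record could be stored, here is a minimal sketch using Python's standard `sqlite3` module. The table name `run_telemetry`, every column name, the `record_run` helper, and the sample values are assumptions made for this example; the actual schema is not described here.

```python
import json
import sqlite3

# In-memory database for illustration; a real run store would use a file path.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE run_telemetry (
        run_id              INTEGER PRIMARY KEY,
        prompt_ref          TEXT NOT NULL,
        model               TEXT NOT NULL,
        retrieval_mode      TEXT NOT NULL,
        selected_source_ids TEXT NOT NULL,  -- JSON array of source ids
        canvas_blocks       TEXT NOT NULL,  -- JSON array of intermediate blocks
        latency_ms          REAL,           -- placeholder, NULL until measured
        usage_tokens        INTEGER         -- placeholder, NULL until measured
    )
    """
)


def record_run(conn, prompt_ref, model, retrieval_mode, source_ids,
               canvas_blocks, latency_ms=None, usage_tokens=None):
    """Insert one run and return its row id (hypothetical helper)."""
    cur = conn.execute(
        "INSERT INTO run_telemetry (prompt_ref, model, retrieval_mode, "
        "selected_source_ids, canvas_blocks, latency_ms, usage_tokens) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (prompt_ref, model, retrieval_mode, json.dumps(source_ids),
         json.dumps(canvas_blocks), latency_ms, usage_tokens),
    )
    conn.commit()
    return cur.lastrowid


# Sample values are invented for the example.
run_id = record_run(
    conn,
    prompt_ref="prompts/example-v1",
    model="example-model",
    retrieval_mode="hybrid",
    source_ids=["SRC-010", "SRC-019"],
    canvas_blocks=[{"block": "conversion-warnings", "text": "draft"}],
)
```

Storing the list-valued fields as JSON text keeps the sketch to a single table. A separate join table for source ids would make per-source queries easier if that becomes necessary.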
Regression focus
Known misses and negative cases prevent generic GPT claims from replacing source-grounded canvas evidence.
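One way to turn these cases into an automated regression check is sketched below. The pass criteria are assumptions, not documented behavior: a negative case passes only when no source is returned, and every other kind passes only when all expected sources appear in the results. The `regression_failures` function and the dictionary keys are hypothetical.

```python
def regression_failures(cases):
    """Return (case id, reason) pairs for every case that fails the check."""
    failures = []
    for case in cases:
        expected = set(case["expected"])
        found = set(case["found"])
        if case["kind"] == "negative":
            # Assumption: an unsupported claim must not be given any source.
            if found:
                failures.append((case["id"], "returned sources for an unsupported claim"))
        elif not expected <= found:
            # Assumption: every expected source must be among the results.
            missing = sorted(expected - found)
            failures.append((case["id"], "missing " + ", ".join(missing)))
    return failures


# Rows copied from the validation table.
cases = [
    {"id": "VAL-003", "kind": "negative", "expected": [],
     "found": ["SRC-003", "SRC-005", "SRC-007", "SRC-016", "SRC-001"]},
    {"id": "VAL-004", "kind": "borderline", "expected": ["SRC-010", "SRC-019"],
     "found": ["SRC-010", "SRC-019", "SRC-002"]},
    {"id": "VAL-005", "kind": "known_miss", "expected": ["SRC-011"],
     "found": ["SRC-006", "SRC-017", "SRC-015"]},
]

for case_id, reason in regression_failures(cases):
    print(case_id, reason)
```

With these three rows the check reports VAL-003 and VAL-005 and passes VAL-004, because extra results lower precision but do not hide an expected source.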