
Demo Manufacturing BV

Source intake runs are stored and source-traceable.

Retrieval, conversion, cost, and feedback gates

Evaluation dashboard

Shows positive hits, negative cases, known misses, borderline checks, precision/recall, conversion risk, and stored run metadata.

Add validation case

Capture reviewer findings as reusable retrieval benchmark rows for this case.
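A captured reviewer finding could be stored as a small typed record so it can be replayed as a benchmark row. This is a minimal sketch, assuming a hypothetical `ValidationCase` class with illustrative field names and two consistency rules (negative cases list no expected sources, every other kind lists at least one). Only the kind names come from the dashboard's case table.

```python
from dataclasses import dataclass

# Kind names taken from the Kind column of the case table; the rest is hypothetical.
VALID_KINDS = {"positive", "negative", "borderline", "known_miss"}


@dataclass(frozen=True)
class ValidationCase:
    """One reusable benchmark row captured from a reviewer finding (illustrative)."""

    case_id: str
    kind: str
    query: str
    expected: frozenset[str] = frozenset()  # expected source IDs, e.g. "SRC-003"
    notes: str = ""

    def __post_init__(self) -> None:
        if self.kind not in VALID_KINDS:
            raise ValueError(f"unknown case kind: {self.kind!r}")
        # A negative case asserts that no source supports the query.
        if self.kind == "negative" and self.expected:
            raise ValueError("negative cases must not list expected sources")
        if self.kind != "negative" and not self.expected:
            raise ValueError(f"{self.kind} cases need at least one expected source")


case = ValidationCase(
    "VAL-002", "positive", "Where does it say the platform uses ADCP?",
    frozenset({"SRC-003"}), "Architecture term from the canvas reference.",
)
```

Making the record frozen keeps a stored benchmark row from being mutated after review; rejecting inconsistent rows at construction time is one possible design, not necessarily the product's.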

5 cases
| Case | Kind | Query | Expected | Found | Precision | Recall | Notes |
|---|---|---|---|---|---|---|---|
| VAL-001 | positive | Which sources support the cost structure? | SRC-004, SRC-018 | SRC-020, SRC-003, SRC-008, SRC-002, SRC-015 | 0.0 | 0.0 | Should find finance and model costs. |
| VAL-002 | positive | Where does it say the platform uses ADCP? | SRC-003 | SRC-003, SRC-015, SRC-016, SRC-013, SRC-010 | 0.2 | 1.0 | Architecture term from the canvas reference. |
| VAL-003 | negative | Evidence that a public API already exists. | no source expected | SRC-003, SRC-005, SRC-007, SRC-016, SRC-001 | 0.0 | 0.0 | Not substantiated; the system must not force a source. |
| VAL-004 | borderline | Which documents have OCR or conversion warnings? | SRC-010, SRC-019 | SRC-010, SRC-019, SRC-002 | 0.67 | 1.0 | Checks the intake/conversion trace. |
| VAL-005 | known_miss | Which claims relate to value for the healthcare sector? | SRC-011 | SRC-006, SRC-017, SRC-015 | 0.0 | 0.0 | Recall risk with sector-specific language. |
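The Precision and Recall columns are consistent with plain set-based retrieval metrics over source IDs: precision is the share of found sources that were expected, recall is the share of expected sources that were found. The dashboard does not show its formula, so this is a sketch that reproduces the table's values, assuming recall is reported as 0.0 when no source is expected (the negative case) and values are rounded to two decimals. The function name is hypothetical.

```python
def precision_recall(expected: set[str], found: list[str]) -> tuple[float, float]:
    """Set-based precision and recall over source IDs, rounded to 2 decimals."""
    hits = len(expected & set(found))
    precision = hits / len(found) if found else 0.0
    # Recall is undefined with no expected sources; report 0.0 as the table does.
    recall = hits / len(expected) if expected else 0.0
    return round(precision, 2), round(recall, 2)


# VAL-004: two of three found sources are expected, and both expected sources were found.
print(precision_recall({"SRC-010", "SRC-019"}, ["SRC-010", "SRC-019", "SRC-002"]))
# → (0.67, 1.0)
```

For VAL-002, one of five found sources is the single expected one, giving 0.2 precision and 1.0 recall, matching the table.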

Conversion warnings

OCR and format checks are represented in SRC-010 and SRC-019; upload rejection is modeled as a validation gate.
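An upload validation gate of this kind might reject unsupported or oversized files outright and attach a warning when a file yields no extractable text. This is a minimal sketch; the allowed suffixes, the size limit, the "no text means OCR is needed" heuristic, and the names `upload_gate` and `GateResult` are all hypothetical, since the dashboard does not state its actual rules.

```python
from dataclasses import dataclass, field
from pathlib import PurePath

# Hypothetical gate settings, not taken from the product.
ALLOWED_SUFFIXES = {".pdf", ".docx", ".txt", ".md"}
MAX_BYTES = 25 * 1024 * 1024


@dataclass
class GateResult:
    accepted: bool
    warnings: list[str] = field(default_factory=list)
    reason: str = ""


def upload_gate(filename: str, size_bytes: int, text_chars: int) -> GateResult:
    """Reject unsupported or oversized uploads; warn when no text was extracted."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        return GateResult(False, reason=f"unsupported format: {suffix or '(none)'}")
    if size_bytes > MAX_BYTES:
        return GateResult(False, reason="file exceeds size limit")
    result = GateResult(True)
    # No extracted text usually indicates a scanned document that needs OCR.
    if text_chars == 0:
        result.warnings.append("no extractable text; OCR required")
    return result
```

Separating hard rejections (`accepted=False`) from soft warnings lets a conversion warning travel with an accepted source, which is what a traceable intake record would need.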

Run telemetry

SQLite stores, per run, the prompt references, model, retrieval mode, selected source IDs, intermediate canvas blocks, and placeholders for latency and usage.
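One way to hold this run metadata is a single table with JSON-encoded list columns, using Python's built-in `sqlite3` module. This is a sketch under assumptions: the table name `runs`, its column names, the `record_run` helper, and the example values are hypothetical, and latency and usage are nullable to match their placeholder status.

```python
import json
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    run_id         INTEGER PRIMARY KEY,
    prompt_ref     TEXT NOT NULL,
    model          TEXT NOT NULL,
    retrieval_mode TEXT NOT NULL,
    source_ids     TEXT NOT NULL,  -- JSON array of selected source IDs
    canvas_blocks  TEXT NOT NULL,  -- JSON array of intermediate canvas blocks
    latency_ms     REAL,           -- placeholder, NULL until measured
    usage_json     TEXT            -- placeholder, NULL until measured
);
"""


def record_run(conn, prompt_ref, model, retrieval_mode, source_ids,
               canvas_blocks, latency_ms=None, usage=None):
    """Insert one run row and return its run_id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO runs (prompt_ref, model, retrieval_mode, source_ids,"
            " canvas_blocks, latency_ms, usage_json) VALUES (?, ?, ?, ?, ?, ?, ?)",
            (prompt_ref, model, retrieval_mode, json.dumps(source_ids),
             json.dumps(canvas_blocks), latency_ms,
             None if usage is None else json.dumps(usage)),
        )
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
run_id = record_run(conn, "prompt-ref-001", "example-model", "example-mode",
                    ["SRC-003", "SRC-015"], [{"block": "value_proposition"}])
```

Storing the selected source IDs with each run is what lets a later evaluation compare the Found column against the expected sources for the same query.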

Regression focus

Known-miss and negative cases guard against regressions in which generic GPT claims replace source-grounded canvas evidence.
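A regression gate along these lines could fail a run when a negative case returns any source, or when a positive or borderline case loses recall relative to a baseline, while known misses are tracked without failing the gate. The rules, the tuple layout, and the name `regression_failures` are assumptions for illustration. Applied to the table's current results, the first rule would flag VAL-003, which returned five sources for a query that should have none.

```python
def regression_failures(results, baseline_recall):
    """Return human-readable failures for one evaluation run (illustrative rules).

    results: iterable of (case_id, kind, found_ids, recall) tuples.
    baseline_recall: case_id -> recall from the last accepted run.
    """
    failures = []
    for case_id, kind, found, recall in results:
        if kind == "negative" and found:
            # Any source here means the system forced unsupported evidence.
            failures.append(f"{case_id}: negative case returned {len(found)} source(s)")
        elif kind in ("positive", "borderline") and recall < baseline_recall.get(case_id, 0.0):
            failures.append(
                f"{case_id}: recall dropped from {baseline_recall[case_id]} to {recall}"
            )
        # known_miss cases are tracked but do not fail the gate until they are fixed.
    return failures
```

Treating known misses as non-blocking keeps the gate usable while still recording them, so a fix for a case such as VAL-005 shows up as improved recall rather than as a newly passing failure.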