# Semantic Document Analysis

A UAPF Level-4 process package for extracting VDVC-conformant semantic metadata from free-text documents.

## What this package is

A **real, inspectable process**, not a single AI call in BPMN costume. The flow has six executable nodes; three of them are DMN decision tables that carry the actual algorithm, with explicit ranked rules and weights.

```
Start
  -> [service]  Detect and redact PII         ai.redact@1
  -> [decision] Assess personal-data risk     DMN assess-personal-data-risk
  -> [decision] Decide GDPR processing route  DMN gdpr-processing-route
  -> [service]  Extract semantic metadata     ai.extract@1
  -> [decision] Determine validation status   DMN human-validation-gate
  -> [service]  Emit completed event          event.emit@1
End
```

Only **one** node performs model inference (semantic extraction). PII detection, risk classification, GDPR routing and the human-validation gate are deterministic: the host cannot make them up.

## The decision tables (dmn/)

### assess-personal-data-risk

PII regex signals -> `personalDataRisk`. Hit policy FIRST (ranked), so the first matching rule wins:

- A personas kods (Latvian personal code) or an IBAN forces HIGH.
- Two or more PII categories, or contact data, gives MEDIUM.
- One category gives LOW.
- Nothing gives NONE.

### gdpr-processing-route

`personalDataRisk` x `allowCentralization` -> `processingRoute` (CENTRAL | LOCAL), `anonymizationRequired`, `redactionLevel`.

A sensitive document whose owner has not permitted centralisation stays LOCAL with full redaction. This is the routing rule lifted out of the host's `generate_semantic_metadata`.

### human-validation-gate

`outputPiiErrorCount`, `aiConfidenceScore`, `personalDataRisk` -> `humanValidationStatus` (REJECTED | PENDING_REVIEW | APPROVED_AUTO) and `requiresHumanReview`.

- Any leaked PII, or confidence below 0.3 -> REJECTED.
- Confidence below 0.7, or HIGH risk -> PENDING_REVIEW.
- Confidence of 0.7 or more with clean output -> APPROVED_AUTO.

The thresholds 0.3 / 0.7 are the weights.
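To make the FIRST hit policy concrete, here is a minimal Python sketch of the two tables whose rules are fully stated above (`assess-personal-data-risk` and `human-validation-gate`). It is an illustration, not the package's DMN: the `first_hit` helper, the function names and the PII category names (`personas_kods`, `iban`, `email`, `phone`) are assumptions, and `requiresHumanReview` is left out because its mapping is not spelled out here.

```python
from typing import Any, Callable

# A rule is (condition over the input context, output value).
Rule = tuple[Callable[[dict[str, Any]], bool], str]


def first_hit(rules: list[Rule], ctx: dict[str, Any]) -> str:
    """Return the output of the first matching rule (DMN hit policy FIRST)."""
    for condition, output in rules:
        if condition(ctx):
            return output
    raise ValueError("no rule matched")


# Hypothetical category names; the real regex signal names may differ.
STRONG_IDENTIFIERS = {"personas_kods", "iban"}
CONTACT_DATA = {"email", "phone"}

# Ranked rules: order matters, the catch-all comes last.
RISK_RULES: list[Rule] = [
    (lambda c: bool(c["categories"] & STRONG_IDENTIFIERS), "HIGH"),
    (lambda c: len(c["categories"]) >= 2
        or bool(c["categories"] & CONTACT_DATA), "MEDIUM"),
    (lambda c: len(c["categories"]) == 1, "LOW"),
    (lambda c: True, "NONE"),
]

GATE_RULES: list[Rule] = [
    (lambda c: c["outputPiiErrorCount"] > 0
        or c["aiConfidenceScore"] < 0.3, "REJECTED"),
    (lambda c: c["aiConfidenceScore"] < 0.7
        or c["personalDataRisk"] == "HIGH", "PENDING_REVIEW"),
    (lambda c: True, "APPROVED_AUTO"),
]


def assess_personal_data_risk(categories: set[str]) -> str:
    """Map the set of detected PII categories to a risk level."""
    return first_hit(RISK_RULES, {"categories": categories})


def human_validation_status(pii_errors: int, confidence: float, risk: str) -> str:
    """Map leak count, model confidence and risk level to a validation status."""
    return first_hit(GATE_RULES, {
        "outputPiiErrorCount": pii_errors,
        "aiConfidenceScore": confidence,
        "personalDataRisk": risk,
    })


if __name__ == "__main__":
    risk = assess_personal_data_risk({"iban", "email"})
    print(risk, human_validation_status(0, 0.9, risk))
```

Because the rules are ranked, a document with both an IBAN and an email address resolves to HIGH rather than MEDIUM, and a HIGH-risk document is held for review even when the extraction confidence is above 0.7.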
## Capabilities required of the host

| Capability     | Used by                | Purpose                          |
|----------------|------------------------|----------------------------------|
| ai.redact@1    | Task_DetectRedactPii   | Mask PII + return regex signals  |
| ai.extract@1   | Task_ExtractSemantics  | VDVC semantic extraction         |
| event.emit@1   | Task_EmitResult        | Publish completion CloudEvent    |

DMN decisions need no host capability; the runtime evaluates them.

## Output contract

`resources/schemas/vdvc-semantic-summary.schema.json` is the schema for the ai.extract@1 output. The process additionally yields the DMN-decided fields (`personalDataRisk`, `processingRoute`, `redactionLevel`, `humanValidationStatus`, `requiresHumanReview`).

## Compliance

EU AI Act 2024/1689 Annex III high-risk; GDPR 2016/679 data minimisation. See `resources/guardrails.yaml` and `docs/`.