Import UAPF package

The 1.x package was a single ai.extract call wrapped in three BPMN
service tasks. No decision logic, no dmn cornerstone, no weights: the
risk/routing/validation algorithm lived invisibly in host code. There
was nothing for a runtime to actually execute.

2.0.0 makes it a real process:

- dmn cornerstone added with three decision tables:
  * assess-personal-data-risk: PII regex signals -> risk level
  * gdpr-processing-route: risk x centralisation -> CENTRAL/LOCAL,
    anonymisation, redaction level
  * human-validation-gate: confidence thresholds + PII re-scan
    -> REJECTED/PENDING_REVIEW/APPROVED_AUTO
- BPMN expanded 3 -> 6 nodes (3 serviceTask + 3 businessRuleTask),
  with horizontal DI.
- Task ids, mappings, docs, manifest (dmn:true), uapf.yaml, lifecycle
  and eval-set updated; added a PII-bearing fixture.

Only the semantic extraction remains a model step. Risk classification,
GDPR routing and validation gating are now explicit ranked DMN rules:
inspectable, versioned, portable. Breaking change: structure + outputs.

# Semantic Document Analysis

A UAPF Level-4 process package for extracting VDVC-conformant semantic
metadata from free-text documents.

## What this package is

A **real, inspectable process**, not a single AI call in BPMN costume.
The flow has six executable nodes; three of them are DMN decision tables
that carry the actual algorithm, with explicit ranked rules and weights.

```
Start
  -> [service]  Detect and redact PII         ai.redact@1
  -> [decision] Assess personal-data risk     DMN assess-personal-data-risk
  -> [decision] Decide GDPR processing route  DMN gdpr-processing-route
  -> [service]  Extract semantic metadata     ai.extract@1
  -> [decision] Determine validation status   DMN human-validation-gate
  -> [service]  Emit completed event          event.emit@1
End
```

Only **one** node performs model inference (semantic extraction). PII
detection, risk classification, GDPR routing and the human-validation
gate are deterministic, so the host cannot make them up.

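As a rough illustration, the six-node flow can be written down as plain data and queried, for example to list which capabilities the service tasks call and which decisions the runtime evaluates. The `Node` type and the two helper functions are hypothetical names for this sketch; they are not part of the package.

```python
from typing import NamedTuple


class Node(NamedTuple):
    kind: str    # "service" (host capability) or "decision" (DMN table)
    label: str
    target: str  # capability id for services, decision id for DMN nodes


# Node order, labels and targets are copied from the flow listing.
FLOW = [
    Node("service", "Detect and redact PII", "ai.redact@1"),
    Node("decision", "Assess personal-data risk", "assess-personal-data-risk"),
    Node("decision", "Decide GDPR processing route", "gdpr-processing-route"),
    Node("service", "Extract semantic metadata", "ai.extract@1"),
    Node("decision", "Determine validation status", "human-validation-gate"),
    Node("service", "Emit completed event", "event.emit@1"),
]


def required_capabilities(flow):
    """Return the capability ids used by service nodes, in flow order."""
    return [node.target for node in flow if node.kind == "service"]


def decision_tables(flow):
    """Return the DMN decision ids used by decision nodes, in flow order."""
    return [node.target for node in flow if node.kind == "decision"]
```

Calling `required_capabilities(FLOW)` yields the three capability ids a host has to supply; the decision nodes need nothing from the host.
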
## The decision tables (dmn/)

### assess-personal-data-risk
PII regex signals -> `personalDataRisk`. A personas kods (Latvian
personal identity number) or an IBAN forces HIGH; two or more PII
categories, or contact data, give MEDIUM; one category gives LOW; no
signals give NONE. Hit policy FIRST (ranked): rules are evaluated in
order and the first match wins.

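The ranking described for `assess-personal-data-risk` can be sketched in Python as an ordered rule list where the first matching rule decides the output. This is only an illustration of the FIRST hit policy, not the DMN file itself. The input shape (a mapping of category name to regex hit count) and the category names `personas_kods`, `iban`, `email` and `phone` are assumptions; the real signal keys come from `ai.redact@1`.

```python
from typing import Mapping

# ASSUMPTION: hypothetical category names, chosen for this sketch only.
FORCE_HIGH = {"personas_kods", "iban"}
CONTACT = {"email", "phone"}

# Ordered (condition, output) pairs; order matters under hit policy FIRST.
RULES = [
    (lambda cats: bool(cats & FORCE_HIGH), "HIGH"),
    (lambda cats: len(cats) >= 2 or bool(cats & CONTACT), "MEDIUM"),
    (lambda cats: len(cats) == 1, "LOW"),
    (lambda cats: True, "NONE"),  # catch-all: no PII signals
]


def assess_personal_data_risk(signals: Mapping[str, int]) -> str:
    """Return the risk level for a mapping of PII category -> hit count."""
    # Only categories with at least one hit count as present.
    present = {category for category, hits in signals.items() if hits > 0}
    for condition, risk in RULES:
        if condition(present):
            return risk
    raise AssertionError("unreachable: the last rule always matches")
```

Because the HIGH rule comes first, a document with both an IBAN and an e-mail address is classified HIGH rather than MEDIUM.
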
### gdpr-processing-route
`personalDataRisk` x `allowCentralization` -> `processingRoute`
(CENTRAL | LOCAL), `anonymizationRequired`, `redactionLevel`. A
sensitive document whose owner has not permitted centralisation stays
LOCAL with full redaction. This is the routing rule lifted out of the
host's `generate_semantic_metadata`.

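A minimal sketch of the routing decision follows. Only one row is taken from the text above: a sensitive document without permission to centralise stays LOCAL with full redaction. Everything else is an assumption made so the example runs: that "sensitive" means HIGH risk, that `anonymizationRequired` is true on that row, the catch-all row, and the `PARTIAL` and `NONE` redaction values. The actual rows live in the DMN file under `dmn/`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteDecision:
    processing_route: str         # "CENTRAL" or "LOCAL"
    anonymization_required: bool
    redaction_level: str          # "FULL" is from the text; other values are assumed


# ASSUMPTION: "sensitive" is read as HIGH risk only.
SENSITIVE_RISKS = {"HIGH"}


def gdpr_processing_route(personal_data_risk: str,
                          allow_centralization: bool) -> RouteDecision:
    # Rule stated in the text: sensitive and centralisation not permitted
    # -> stay LOCAL with full redaction.
    # ASSUMPTION: anonymization_required=True for this row.
    if personal_data_risk in SENSITIVE_RISKS and not allow_centralization:
        return RouteDecision("LOCAL", True, "FULL")

    # ASSUMPTION: placeholder catch-all row, not taken from the package.
    has_pii = personal_data_risk != "NONE"
    return RouteDecision("CENTRAL", has_pii, "PARTIAL" if has_pii else "NONE")
```

Returning all three outputs as one frozen value mirrors a DMN rule with several output columns: one matched row sets every output at once.
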
### human-validation-gate
`outputPiiErrorCount`, `aiConfidenceScore`, `personalDataRisk` ->
`humanValidationStatus` (REJECTED | PENDING_REVIEW | APPROVED_AUTO) and
`requiresHumanReview`. Any leaked PII or confidence below 0.3 -> REJECTED;
below 0.7 or HIGH risk -> PENDING_REVIEW; 0.7+ with clean output ->
APPROVED_AUTO. The thresholds 0.3 / 0.7 are the weights.

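The gate's ranked rules can be sketched as three checks evaluated in order. The function and constant names are hypothetical; the thresholds and the rule order are the ones given above. The sketch returns only `humanValidationStatus`, because the text does not say how `requiresHumanReview` is derived from it.

```python
REJECT_BELOW = 0.3   # confidence strictly below this is rejected
REVIEW_BELOW = 0.7   # confidence strictly below this goes to review


def human_validation_gate(output_pii_error_count: int,
                          ai_confidence_score: float,
                          personal_data_risk: str) -> str:
    """Return REJECTED, PENDING_REVIEW or APPROVED_AUTO (first match wins)."""
    # Rule 1: any PII found in the output, or very low confidence.
    if output_pii_error_count > 0 or ai_confidence_score < REJECT_BELOW:
        return "REJECTED"
    # Rule 2: middling confidence, or a HIGH-risk document regardless of score.
    if ai_confidence_score < REVIEW_BELOW or personal_data_risk == "HIGH":
        return "PENDING_REVIEW"
    # Rule 3: confidence of 0.7 or more, clean output, risk below HIGH.
    return "APPROVED_AUTO"
```

Evaluation order is what keeps a HIGH-risk document with a confidence of 0.9 out of the auto-approved bucket: rule 2 catches it before rule 3 is reached.
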
## Capabilities required of the host

| Capability     | Used by                | Purpose                          |
|----------------|------------------------|----------------------------------|
| ai.redact@1    | Task_DetectRedactPii   | Mask PII + return regex signals  |
| ai.extract@1   | Task_ExtractSemantics  | VDVC semantic extraction         |
| event.emit@1   | Task_EmitResult        | Publish completion CloudEvent    |

DMN decisions need no host capability; the runtime evaluates them.

## Output contract

`resources/schemas/vdvc-semantic-summary.schema.json` is the schema of
the ai.extract@1 output. The process additionally yields the DMN-decided
fields (`personalDataRisk`, `processingRoute`, `redactionLevel`,
`humanValidationStatus`, `requiresHumanReview`).

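A consumer of the process result might sanity-check the DMN-decided fields along these lines. This is a sketch under assumptions, not the package's validator. The field names and the enumerated values come from this README. The helper name `check_decided_fields`, the flat-dictionary result shape, and treating `redactionLevel` as a free-form string (its allowed values are not listed here) are assumptions.

```python
from typing import Any, Mapping

# Allowed values where this README lists them.
# None: values not enumerated here, so only "is a string" is checked.
# bool: the field must be a real boolean.
DECIDED_FIELDS = {
    "personalDataRisk": ("NONE", "LOW", "MEDIUM", "HIGH"),
    "processingRoute": ("CENTRAL", "LOCAL"),
    "redactionLevel": None,
    "humanValidationStatus": ("REJECTED", "PENDING_REVIEW", "APPROVED_AUTO"),
    "requiresHumanReview": bool,
}


def check_decided_fields(result: Mapping[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the fields look well-formed."""
    problems = []
    for name, allowed in DECIDED_FIELDS.items():
        if name not in result:
            problems.append(f"missing field: {name}")
            continue
        value = result[name]
        if allowed is bool:
            ok = isinstance(value, bool)
        elif allowed is None:
            ok = isinstance(value, str)
        else:
            ok = value in allowed
        if not ok:
            problems.append(f"unexpected value for {name}: {value!r}")
    return problems
```

The check covers only the decided fields; validating the extraction output itself is the job of the JSON schema.
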
## Compliance

EU AI Act 2024/1689 Annex III high-risk; GDPR 2016/679 data
minimisation. See `resources/guardrails.yaml` and `docs/`.