diff --git a/README.md b/README.md index 2758e5c..5d2b1ba 100644 --- a/README.md +++ b/README.md @@ -1,19 +1,68 @@ -# dev.uapf.semantic-document-analysis +# Semantic Document Analysis -UAPF v1.1 SSOT-conformant Level 4 process package providing reusable -semantic document analysis as `Process_SemanticDocumentAnalysis`. +A UAPF Level-4 process package for extracting VDVC-conformant semantic +metadata from free-text documents. -See `docs/00-overview.md` for what it does, `docs/01-eu-ai-act.md` for -the regulatory analysis, and `docs/02-integration.md` for runtime -integration notes. +## What this package is -## Validates against +A **real, inspectable process** — not a single AI call in BPMN costume. +The flow has six executable nodes; three of them are DMN decision tables +that carry the actual algorithm, with explicit ranked rules and weights. -Run `jsonschema -i ` against -the canonical schemas in -`github.com/UAPFormat/UAPF-specification/schemas/`: +``` +Start + -> [service] Detect and redact PII ai.redact@1 + -> [decision] Assess personal-data risk DMN assess-personal-data-risk + -> [decision] Decide GDPR processing route DMN gdpr-processing-route + -> [service] Extract semantic metadata ai.extract@1 + -> [decision] Determine validation status DMN human-validation-gate + -> [service] Emit completed event event.emit@1 +End +``` -- `uapf-manifest.schema.json` (root manifest) -- `ownership.schema.json` (metadata/ownership.yaml) -- `lifecycle.schema.json` (metadata/lifecycle.yaml) -- `resource-mapping.schema.json` (resources/mappings.yaml) +Only **one** node performs model inference (semantic extraction). PII +detection, risk classification, GDPR routing and the human-validation +gate are deterministic — the host cannot make them up. + +## The decision tables (dmn/) + +### assess-personal-data-risk +PII regex signals -> `personalDataRisk`. Personas kods or IBAN forces +HIGH; two or more PII categories, or contact data, gives MEDIUM; one +category LOW; nothing NONE. 
Hit policy FIRST (ranked). + +### gdpr-processing-route +`personalDataRisk` x `allowCentralization` -> `processingRoute` +(CENTRAL | LOCAL), `anonymizationRequired`, `redactionLevel`. A +sensitive document whose owner has not permitted centralisation stays +LOCAL with full redaction. This is the routing rule lifted out of the +host's `generate_semantic_metadata`. + +### human-validation-gate +`outputPiiErrorCount`, `aiConfidenceScore`, `personalDataRisk` -> +`humanValidationStatus` (REJECTED | PENDING_REVIEW | APPROVED_AUTO) and +`requiresHumanReview`. Any leaked PII or confidence below 0.3 -> REJECTED; +below 0.7 or HIGH risk -> PENDING_REVIEW; 0.7+ with clean output -> +APPROVED_AUTO. The thresholds 0.3 / 0.7 are the weights. + +## Capabilities required of the host + +| Capability | Used by | Purpose | +|----------------|------------------------|----------------------------------| +| ai.redact@1 | Task_DetectRedactPii | Mask PII + return regex signals | +| ai.extract@1 | Task_ExtractSemantics | VDVC semantic extraction | +| event.emit@1 | Task_EmitResult | Publish completion CloudEvent | + +DMN decisions need no host capability — the runtime evaluates them. + +## Output contract + +`resources/schemas/vdvc-semantic-summary.schema.json` — the ai.extract@1 +output. The process additionally yields the DMN-decided fields +(`personalDataRisk`, `processingRoute`, `redactionLevel`, +`humanValidationStatus`, `requiresHumanReview`). + +## Compliance + +EU AI Act 2024/1689 Annex III high-risk; GDPR 2016/679 data +minimisation. See `resources/guardrails.yaml` and `docs/`. diff --git a/bpmn/semantic-document-analysis.bpmn b/bpmn/semantic-document-analysis.bpmn index dc83acd..09e268a 100644 --- a/bpmn/semantic-document-analysis.bpmn +++ b/bpmn/semantic-document-analysis.bpmn @@ -14,45 +14,89 @@ - - Calls ai.redact@1 to mask names, identifiers, addresses, financial - and health data before downstream extraction. Required by - resources/guardrails.yaml (GDPR Art. 
5 minimisation). + Calls ai.redact@1 over the source text. Beyond masking, the host + runs the four Latvian PII regex detectors (personas kods, IBAN, + e-mail, phone) and returns the deterministic signal set the risk + decision consumes: personasKodaPresent, financialDataPresent, + contactDataPresent, piiCategoryCount, detectedEntityTypes, plus + redactedContent. No model inference — pure pattern detection. + + + DMN dmn/assess-personal-data-risk.dmn. Maps the PII signal set to + personalDataRisk (NONE | LOW | MEDIUM | HIGH) by explicit ranked + rules. Personas kods or IBAN forces HIGH; two or more categories + or contact data gives MEDIUM. Deterministic and auditable. + + + + + + DMN dmn/gdpr-processing-route.dmn. From personalDataRisk and + allowCentralization decides processingRoute (CENTRAL | LOCAL), + anonymizationRequired and redactionLevel. This is the routing + rule extracted from the host's generate_semantic_metadata: a + sensitive document where centralisation is not permitted stays + LOCAL with full redaction. + + + - Calls ai.extract@1 with the redacted text and the VDVC v1.1 output - schema (resources/schemas/vdvc-semantic-summary.schema.json). The - host's AI agent must produce output that validates against that - schema. Output records aiModelVersion + aiConfidenceScore per - EU AI Act Art. 13. + Calls ai.extract@1 on redactedContent with the VDVC v1.1 output + schema. This is the single bounded model step: it produces the + semanticSummary (topic, summary, keywords, urgency, risk) and + must validate against resources/schemas/vdvc-semantic-summary. + The host also returns flat aiConfidenceScore and the result of + the post-extraction PII re-scan as outputPiiErrorCount. - + + DMN dmn/human-validation-gate.dmn. From outputPiiErrorCount, + aiConfidenceScore and personalDataRisk decides + humanValidationStatus (REJECTED | PENDING_REVIEW | APPROVED_AUTO) + and requiresHumanReview. 
Any leaked PII or confidence below 0.3 + rejects; below 0.7, or HIGH risk, forces review; 0.7 and above + with clean output auto-approves. The thresholds are the weights. + + + + - Calls event.emit@1 to publish a CloudEvent containing the extracted - semantic summary. Downstream processes consume this event. + Calls event.emit@1 to publish a CloudEvent carrying the semantic + summary, the routing decision and the validation status. - - - - + + + + + + + @@ -61,33 +105,54 @@ - - + + + + + + + + - + - - + + + + + - + - - + + - + - + + + + + + + + + + + + + diff --git a/dmn/assess-personal-data-risk.dmn b/dmn/assess-personal-data-risk.dmn new file mode 100644 index 0000000..413f5f6 --- /dev/null +++ b/dmn/assess-personal-data-risk.dmn @@ -0,0 +1,71 @@ + + + + + + personasKodaPresent + + + financialDataPresent + + + contactDataPresent + + + piiCategoryCount + + + + + true + - + - + - + "HIGH" + "Personas kods (national ID) detected" + + + - + true + - + - + "HIGH" + "Financial account identifier (IBAN) detected" + + + - + - + - + [2..999] + "MEDIUM" + "Two or more distinct PII categories present" + + + - + - + true + - + "MEDIUM" + "Contact data (email/phone) detected" + + + - + - + - + 1 + "LOW" + "Single PII category present" + + + - + - + - + - + "NONE" + "No personal data detected by regex scan" + + + + diff --git a/dmn/gdpr-processing-route.dmn b/dmn/gdpr-processing-route.dmn new file mode 100644 index 0000000..37711c9 --- /dev/null +++ b/dmn/gdpr-processing-route.dmn @@ -0,0 +1,60 @@ + + + + + + personalDataRisk + + + allowCentralization + + + + + + "HIGH","MEDIUM" + false + "LOCAL" + true + "FULL" + + + - + false + "LOCAL" + false + "PARTIAL" + + + "HIGH","MEDIUM" + true + "CENTRAL" + true + "FULL" + + + "LOW" + true + "CENTRAL" + false + "PARTIAL" + + + "NONE" + true + "CENTRAL" + false + "NONE" + + + - + - + "CENTRAL" + false + "PARTIAL" + + + + diff --git a/dmn/human-validation-gate.dmn b/dmn/human-validation-gate.dmn new file mode 100644 index 
0000000..9b00ca8 --- /dev/null +++ b/dmn/human-validation-gate.dmn @@ -0,0 +1,62 @@ + + + + + + outputPiiErrorCount + + + aiConfidenceScore + + + personalDataRisk + + + + + [1..9999] + - + - + "REJECTED" + true + + + - + [0..0.3) + - + "REJECTED" + true + + + - + [0.3..0.7) + - + "PENDING_REVIEW" + true + + + - + - + "HIGH" + "PENDING_REVIEW" + true + + + - + [0.7..1] + - + "APPROVED_AUTO" + false + + + - + - + - + "PENDING_REVIEW" + true + + + + diff --git a/docs/00-overview.md b/docs/00-overview.md index 421ee3a..ee92308 100644 --- a/docs/00-overview.md +++ b/docs/00-overview.md @@ -1,32 +1,46 @@ # dev.uapf.semantic-document-analysis — Overview -**UAPF v1.1 SSOT-conformant** Level 4 process package providing -reusable semantic document analysis. +**UAPF v1.1 SSOT-conformant** Level 4 process package for semantic +document analysis. ## What -A 3-step BPMN process that, given free-text document content: +A six-node BPMN process that, given free-text document content: -1. Redacts PII via `ai.redact@1` -2. Extracts VDVC v1.1 structured semantic metadata via `ai.extract@1` -3. Emits `document.semantic-analysis.completed.v1` CloudEvent via `event.emit@1` +1. **Detect and redact PII** (`ai.redact@1`) — masks PII and returns the + deterministic regex signal set (personas kods / IBAN / contact data / + category count). +2. **Assess personal-data risk** (DMN `assess-personal-data-risk`) — + ranked rules map the signal set to `personalDataRisk`. +3. **Decide GDPR processing route** (DMN `gdpr-processing-route`) — + `personalDataRisk` x `allowCentralization` -> CENTRAL/LOCAL, + anonymisation and redaction level. +4. **Extract semantic metadata** (`ai.extract@1`) — the one model step; + produces VDVC v1.1 structured metadata. +5. **Determine validation status** (DMN `human-validation-gate`) — + confidence thresholds + PII re-scan -> REJECTED / PENDING_REVIEW / + APPROVED_AUTO. +6. **Emit** `document.semantic-analysis.completed.v1` (`event.emit@1`). 
+ +## Why this shape + +The previous 1.x package was a single `ai.extract` call wrapped in +BPMN. The decision logic — risk, routing, validation gating — lived +invisibly in host code. Version 2.0 extracts that logic into three +versioned DMN decision tables. The algorithm is now in the package: +inspectable, diff-able, portable. The host supplies inference for one +bounded step only. ## What's portable -The package ships: -- The BPMN flow (the algorithm shape) -- The VDVC output JSON Schema (the output contract) -- The resource mapping (input/output contracts, timeouts, retries) -- The guardrails policy (GDPR + EU AI Act constraints) - -The host system supplies the actual AI agent that fulfils the three -capabilities. Multiple hosts can implement the same capabilities; -multiple packages can require the same capabilities. +- The BPMN flow (the process shape) +- Three DMN decision tables (the algorithm and its weights) +- The VDVC output JSON Schema (the extraction contract) +- The resource mapping and the guardrails policy ## How to consume -Drop this `.uapf` into any UAPF-conformant runtime. The runtime -exposes `uapf.run_process` (per UAPF-specification §6.3.1) targeting -`Process_SemanticDocumentAnalysis`. The runtime resolves the resource -mapping to find a target with the three required capabilities and -invokes them in order per the BPMN flow. +Drop this `.uapf` into any UAPF-conformant runtime and run +`Process_SemanticDocumentAnalysis`. The runtime evaluates the DMN +decisions itself and resolves the resource mapping for the three +capability-backed service tasks. diff --git a/fixtures/child-rights-input.json b/fixtures/child-rights-input.json index 7ea1d47..fe6b39d 100644 --- a/fixtures/child-rights-input.json +++ b/fixtures/child-rights-input.json @@ -1,3 +1,3 @@ { - "text": "Vēršos Tiesībsarga birojā par bāriņtiesas lēmumu attiecībā uz manu mazdēlu (vecums 7 gadi). Lēmums pieņemts steidzamā kārtībā 2026. gada martā. 
Lūdzu izvērtēt rīcības atbilstību Bērnu tiesību aizsardzības likumam." -} + "content": "Vēršos Tiesībsarga birojā par bāriņtiesas lēmumu attiecībā uz manu mazdēlu (vecums 7 gadi). Lēmums pieņemts steidzamā kārtībā 2026. gada martā. Lūdzu izvērtēt rīcības atbilstību Bērnu tiesību aizsardzības likumam." +} \ No newline at end of file diff --git a/fixtures/discrimination-input.json b/fixtures/discrimination-input.json index e768e8f..949b254 100644 --- a/fixtures/discrimination-input.json +++ b/fixtures/discrimination-input.json @@ -1,3 +1,3 @@ { - "text": "Iesniedzējs ar otrās grupas invaliditāti norāda, ka valsts iestādes darba intervijā atklāti pateikts: 'nevaram pieņemt cilvēkus ar īpašām vajadzībām'. Lūdz Tiesībsargu izmeklēt." -} + "content": "Iesniedzējs ar otrās grupas invaliditāti norāda, ka valsts iestādes darba intervijā atklāti pateikts: 'nevaram pieņemt cilvēkus ar īpašām vajadzībām'. Lūdz Tiesībsargu izmeklēt." +} \ No newline at end of file diff --git a/fixtures/pii-bearing-input.json b/fixtures/pii-bearing-input.json new file mode 100644 index 0000000..277cff7 --- /dev/null +++ b/fixtures/pii-bearing-input.json @@ -0,0 +1,3 @@ +{ + "content": "Iesniedzējs (personas kods 010180-12345) lūdz pārskatīt lēmumu. Saziņai: janis.berzins@example.lv, tālrunis +371 29123456. Norēķini: LV80BANK0000435195001." +} \ No newline at end of file diff --git a/manifest.json b/manifest.json index 9733923..280e07e 100644 --- a/manifest.json +++ b/manifest.json @@ -1,15 +1,15 @@ { "kind": "uapf.package", "id": "dev.uapf.semantic-document-analysis", - "name": "Semantic Document Analysis (UAPF reference algorithm)", - "description": "Level-4 UAPF process for extracting VDVC-conformant semantic metadata\n(topic, summary, urgency, risk, sensitivity) from a free-text document.\n\nPortable across document management systems, intake portals, mailroom\nscanners, case-management platforms. 
Three BPMN service tasks invoke\nthe reserved UAPF-IP capabilities ai.redact@1, ai.extract@1, event.emit@1.\nThe host fulfils each capability with its own AI agent; this package\nsupplies the BPMN flow, the VDVC output JSON Schema, the guardrails,\nand the resource mapping contract.\n", + "name": "Semantic Document Analysis", + "description": "Level-4 UAPF process for semantic analysis of free-text documents.\n\nThree BPMN service tasks invoke the UAPF-IP capabilities ai.redact@1,\nai.extract@1 and event.emit@1. Three DMN decision tables encode the\ndeterministic algorithm the host previously hid inside application\ncode: assess-personal-data-risk maps PII regex signals to a risk\nlevel; gdpr-processing-route selects CENTRAL vs LOCAL processing,\nanonymisation and redaction level; human-validation-gate applies the\nconfidence thresholds that decide REJECTED / PENDING_REVIEW /\nAPPROVED_AUTO.\n\nOnly the semantic extraction is a model step. Risk classification,\nGDPR routing and the validation gate are explicit ranked rules in\nversioned DMN \u2014 inspectable, auditable, portable. Extraction output\nvalidates against the VDVC v1.1 semantic-summary JSON Schema.\n", "level": 4, - "version": "1.0.0", + "version": "2.0.0", "includes": [], "dependencies": {}, "cornerstones": { "bpmn": true, - "dmn": false, + "dmn": true, "cmmn": false, "resources": true }, @@ -30,6 +30,7 @@ "exposedArtifacts": [ "manifest", "bpmn", + "dmn", "docs" ] } @@ -42,4 +43,4 @@ } ], "lifecycle": "draft" -} \ No newline at end of file +} diff --git a/metadata/lifecycle.yaml b/metadata/lifecycle.yaml index cd7ce16..ae4a81c 100644 --- a/metadata/lifecycle.yaml +++ b/metadata/lifecycle.yaml @@ -1,9 +1,13 @@ kind: uapf.metadata.lifecycle status: draft created: "2026-05-15T20:30:00Z" -lastModified: "2026-05-15T20:30:00Z" +lastModified: "2026-05-17T00:00:00Z" changeHistory: - version: "1.0.0" date: "2026-05-15" - summary: "Initial release. 
Reusable UAPF v1.1 Level 4 process for VDVC semantic metadata extraction. Three reserved-namespace capabilities (ai.redact@1, ai.extract@1, event.emit@1). VDVC output schema in resources/schemas/." + summary: "Initial release. Three reserved-namespace capabilities; VDVC output schema." + author: "uapf-stewards" + - version: "2.0.0" + date: "2026-05-17" + summary: "Full rewrite into a real process. Added the dmn cornerstone with three decision tables (assess-personal-data-risk, gdpr-processing-route, human-validation-gate) carrying the risk, routing and validation-gating algorithm previously hidden in host code. BPMN expanded from 3 to 6 nodes. ai.redact now returns deterministic PII regex signals; ai.extract returns flat aiConfidenceScore and outputPiiErrorCount. Breaking: package structure and outputs changed." author: "uapf-stewards" diff --git a/processgit.mcp.yaml b/processgit.mcp.yaml index 5e9bad2..d921115 100644 --- a/processgit.mcp.yaml +++ b/processgit.mcp.yaml @@ -3,16 +3,24 @@ version: 1 server: - name: "Semantic Document Analysis - UAPF reference algorithm" - description: "MCP server for the semantic document analysis process (UAPF package dev.uapf.semantic-document-analysis)." + name: "Semantic Document Analysis" + description: "MCP server for the semantic document analysis UAPF package (dev.uapf.semantic-document-analysis) - BPMN flow plus three DMN decision tables." instructions: | - This repository is a UAPF process package - the reference algorithm for - extracting VDVC-conformant semantic metadata from a document. Use - 'search' and 'get_entity' to explore the BPMN flow, and 'validate' to - check the model. The process executes via UAPF-IP; see the /uapf-ip - endpoint of this repo. + This repository is a UAPF process package for semantic document + analysis. The algorithm lives in dmn/ as three decision tables; the + BPMN in bpmn/ wires them with three capability-backed service tasks. 
+ Use 'search' and 'get_entity' to explore, 'validate' to check models. sources: - path: "bpmn/semantic-document-analysis.bpmn" type: "xml" - description: "BPMN - redact, extract, emit flow" + description: "BPMN - the six-node process flow" + - path: "dmn/assess-personal-data-risk.dmn" + type: "xml" + description: "DMN - PII signals to personal-data risk" + - path: "dmn/gdpr-processing-route.dmn" + type: "xml" + description: "DMN - risk to CENTRAL/LOCAL processing route" + - path: "dmn/human-validation-gate.dmn" + type: "xml" + description: "DMN - confidence thresholds to validation status" diff --git a/resources/mappings.yaml b/resources/mappings.yaml index f210569..98eb7d2 100644 --- a/resources/mappings.yaml +++ b/resources/mappings.yaml @@ -1,49 +1,60 @@ kind: uapf.resources.mapping +# Host-readable contract for the capability-backed service tasks. The three +# DMN decisions (assess-personal-data-risk, gdpr-processing-route, +# human-validation-gate) are NOT listed here: they are evaluated by the +# UAPF runtime against the dmn/ cornerstone and need no host resource. + targets: - id: agent.semantic-extractor type: ai_agent name: Semantic Extraction AI Agent description: | - Host-provided AI agent that fulfils ai.redact@1, ai.extract@1, and - event.emit@1 for this process. Implementation is the host's choice - (Claude, GPT, on-prem LLM, etc.); this package supplies the BPMN - flow, the output schema, and the guardrails. + Host-provided agent fulfilling ai.redact@1, ai.extract@1 and + event.emit@1. Implementation is the host's choice; this package + supplies the BPMN flow, the DMN decision logic, the output schema + and the guardrails. 
capabilities: - capability.ai.redact - capability.ai.extract - capability.event.emit bindings: - - source: { type: bpmn.serviceTask, ref: Task_RedactPii } + - source: { type: bpmn.serviceTask, ref: Task_DetectRedactPii } targetId: agent.semantic-extractor mode: autonomous contract: input: - - { name: text, type: string, required: true } - - { name: categories, type: array, required: false, description: "Optional PII categories; defaults to host policy." } + - { name: content, type: string, required: true } output: - - { name: redactedText, type: string } - - { name: detections, type: array } + - { name: redactedContent, type: string, description: "Source text with PII masked." } + - { name: detectedEntityTypes, type: array, description: "PII TYPE names only, never values." } + - { name: personasKodaPresent, type: boolean, description: "Latvian national ID regex hit." } + - { name: financialDataPresent,type: boolean, description: "IBAN regex hit." } + - { name: contactDataPresent, type: boolean, description: "E-mail or phone regex hit." } + - { name: piiCategoryCount, type: number, description: "Count of distinct PII categories detected." } timeout: "10s" requiredCapabilities: [capability.ai.redact] + feeds: [assess-personal-data-risk] - source: { type: bpmn.serviceTask, ref: Task_ExtractSemantics } targetId: agent.semantic-extractor mode: autonomous contract: input: - - { name: text, type: string, required: true, description: "Redacted text from previous task." } - - { name: schema, type: object, required: true, description: "VDVC v1.1 output schema. 
Reference: resources/schemas/vdvc-semantic-summary.schema.json" } + - { name: redactedContent, type: string, required: true } + - { name: schemaRef, type: string, required: true, description: "resources/schemas/vdvc-semantic-summary.schema.json" } output: - - { name: extracted, type: object, description: "Validates against resources/schemas/vdvc-semantic-summary.schema.json" } - - { name: confidence, type: number } - - { name: modelUsed, type: string } + - { name: semanticSummary, type: object, description: "Validates against the VDVC v1.1 schema." } + - { name: sensitivityControl, type: object } + - { name: aiConfidenceScore, type: number, description: "Flat 0.0-1.0; consumed by human-validation-gate." } + - { name: outputPiiErrorCount, type: number, description: "PII re-scan hits on extracted text; consumed by human-validation-gate." } timeout: "30s" retries: { maxAttempts: 2, backoffMs: 2000 } requiredCapabilities: [capability.ai.extract] + feeds: [human-validation-gate] - - source: { type: bpmn.serviceTask, ref: Task_EmitResultEvent } + - source: { type: bpmn.serviceTask, ref: Task_EmitResult } targetId: agent.semantic-extractor mode: autonomous contract: diff --git a/tests/eval-set.json b/tests/eval-set.json index 1d59843..4081456 100644 --- a/tests/eval-set.json +++ b/tests/eval-set.json @@ -1,24 +1,41 @@ { "algorithm": "Process_SemanticDocumentAnalysis", - "package_version": "1.0.0", + "package_version": "2.0.0", "cases": [ { - "id": "child-rights", - "input_fixture": "fixtures/child-rights-input.json", - "expected_facets": { - "mentions_child": true, - "vulnerable_group": true, - "humanValidationStatus": "PENDING" + "id": "pii-bearing-high-risk", + "input_fixture": "fixtures/pii-bearing-input.json", + "input_vars": { + "allowCentralization": false + }, + "expect_decisions": { + "assess-personal-data-risk": { + "personalDataRisk": "HIGH" + }, + "gdpr-processing-route": { + "processingRoute": "LOCAL", + "redactionLevel": "FULL" + } } }, { - "id": 
"discrimination-disability", + "id": "no-pii-discrimination", "input_fixture": "fixtures/discrimination-input.json", - "expected_facets": { - "vulnerable_group": true, - "humanValidationStatus": "PENDING" + "input_vars": { + "allowCentralization": true + }, + "expect_decisions": { + "assess-personal-data-risk": { + "personalDataRisk": "NONE" + }, + "gdpr-processing-route": { + "processingRoute": "CENTRAL" + } } } ], - "success_criteria": { "min_pass_rate": 0.95, "max_p95_latency_ms": 15000 } + "success_criteria": { + "min_pass_rate": 0.95, + "max_p95_latency_ms": 20000 + } } diff --git a/uapf.yaml b/uapf.yaml index 394696f..e1f3ec7 100644 --- a/uapf.yaml +++ b/uapf.yaml @@ -1,28 +1,38 @@ kind: uapf.package id: dev.uapf.semantic-document-analysis -name: Semantic Document Analysis (UAPF reference algorithm) +name: Semantic Document Analysis description: | - Level-4 UAPF process for extracting VDVC-conformant semantic metadata - (topic, summary, urgency, risk, sensitivity) from a free-text document. + Level-4 UAPF process for semantic analysis of free-text documents. - Portable across document management systems, intake portals, mailroom - scanners, case-management platforms. Three BPMN service tasks invoke - the reserved UAPF-IP capabilities ai.redact@1, ai.extract@1, event.emit@1. - The host fulfils each capability with its own AI agent; this package - supplies the BPMN flow, the VDVC output JSON Schema, the guardrails, - and the resource mapping contract. + Three BPMN service tasks invoke the UAPF-IP capabilities ai.redact@1, + ai.extract@1 and event.emit@1. Three DMN decision tables encode the + deterministic algorithm the host previously hid inside application + code: assess-personal-data-risk maps PII regex signals to a risk + level; gdpr-processing-route selects CENTRAL vs LOCAL processing, + anonymisation and redaction level; human-validation-gate applies the + confidence thresholds that decide REJECTED / PENDING_REVIEW / + APPROVED_AUTO. 
+ + Only the semantic extraction is a model step. Risk classification, + GDPR routing and the validation gate are explicit ranked rules in + versioned DMN — inspectable, auditable, portable. Extraction output + validates against the VDVC v1.1 semantic-summary JSON Schema. level: 4 -version: "1.0.0" +version: "2.0.0" # ── UAPF-IP integration (capability needs + profile + guardrails) ── -# Declared so a UAPF-IP runtime / the ProcessGit /uapf-ip endpoint can -# discover what this package requires before loading it. requires_capabilities: - ai.redact@1+ - ai.extract@1+ - event.emit@1+ +# DMN decisions are evaluated by the runtime itself — no host capability. +provides_decisions: + - assess-personal-data-risk + - gdpr-processing-route + - human-validation-gate + profiles_supported: - uapf-ip-orchestrated @@ -33,7 +43,7 @@ dependencies: {} cornerstones: bpmn: true - dmn: false + dmn: true cmmn: false resources: true @@ -53,6 +63,7 @@ exposure: exposedArtifacts: - manifest - bpmn + - dmn - docs owners:
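
Review note: the three decision tables added under `dmn/` can be read as ordinary first-match rule lists. The sketch below restates the rules visible in this diff in Python, as a reading aid rather than the runtime's implementation. It assumes hit policy FIRST for all three tables (the README states it only for `assess-personal-data-risk`), and the function and tuple names are hypothetical; only the input and output values come from the diff.

```python
from typing import NamedTuple, Optional


def assess_personal_data_risk(personas_koda_present: bool,
                              financial_data_present: bool,
                              contact_data_present: bool,
                              pii_category_count: int) -> str:
    """Rules of dmn/assess-personal-data-risk.dmn, evaluated top to bottom."""
    if personas_koda_present:            # national ID detected
        return "HIGH"
    if financial_data_present:           # IBAN detected
        return "HIGH"
    if 2 <= pii_category_count <= 999:   # two or more distinct categories
        return "MEDIUM"
    if contact_data_present:             # e-mail or phone detected
        return "MEDIUM"
    if pii_category_count == 1:          # single category
        return "LOW"
    return "NONE"                        # catch-all rule


class Route(NamedTuple):                 # hypothetical container name
    processing_route: str
    anonymization_required: bool
    redaction_level: str


def gdpr_processing_route(risk: str,
                          allow_centralization: Optional[bool]) -> Route:
    """Rules of dmn/gdpr-processing-route.dmn (first match assumed)."""
    sensitive = risk in ("HIGH", "MEDIUM")
    if sensitive and allow_centralization is False:
        return Route("LOCAL", True, "FULL")
    if allow_centralization is False:
        return Route("LOCAL", False, "PARTIAL")
    if sensitive and allow_centralization is True:
        return Route("CENTRAL", True, "FULL")
    if risk == "LOW" and allow_centralization is True:
        return Route("CENTRAL", False, "PARTIAL")
    if risk == "NONE" and allow_centralization is True:
        return Route("CENTRAL", False, "NONE")
    return Route("CENTRAL", False, "PARTIAL")   # catch-all rule


class Gate(NamedTuple):                  # hypothetical container name
    human_validation_status: str
    requires_human_review: bool


def human_validation_gate(output_pii_error_count: int,
                          ai_confidence_score: float,
                          risk: str) -> Gate:
    """Rules of dmn/human-validation-gate.dmn (first match assumed)."""
    if 1 <= output_pii_error_count <= 9999:     # any leaked PII
        return Gate("REJECTED", True)
    if 0 <= ai_confidence_score < 0.3:
        return Gate("REJECTED", True)
    if 0.3 <= ai_confidence_score < 0.7:
        return Gate("PENDING_REVIEW", True)
    if risk == "HIGH":
        return Gate("PENDING_REVIEW", True)
    if 0.7 <= ai_confidence_score <= 1:
        return Gate("APPROVED_AUTO", False)
    return Gate("PENDING_REVIEW", True)         # catch-all rule


# Signals resembling fixtures/pii-bearing-input.json; the category count of 3
# and the confidence of 0.92 are illustrative values, not taken from the diff.
risk = assess_personal_data_risk(True, True, True, 3)
route = gdpr_processing_route(risk, allow_centralization=False)
gate = human_validation_gate(0, 0.92, risk)
print(risk, route, gate)
```

Rule order is part of the logic in a first-match table: a HIGH-risk document with clean output and 0.92 confidence matches the HIGH-risk review rule before the auto-approve rule, so it is never auto-approved. The first two results agree with the `pii-bearing-high-risk` case in `tests/eval-set.json` (HIGH, LOCAL, FULL).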