diff --git a/README.md b/README.md
index 0adfc09..c5106c0 100644
--- a/README.md
+++ b/README.md
@@ -1,17 +1,21 @@
 # Semantic Document Analysis
 
 UAPF Level-4 process for semantic analysis of free-text documents,
-governed by **UAPF v2.3.0** (Algorithm Cards).
+governed by **UAPF v2.4.0** (Algorithm Cards visible on BPMN tasks).
 
 ## What this package does
 
-Three BPMN service tasks invoke three UAPF-IP host capabilities:
+Three BPMN service tasks invoke three UAPF-IP host capabilities. Each
+service task carries `uapf24:algorithmCardRef` pointing at the
+Algorithm Card that governs the algorithm being invoked, and a
+`` synthesised from the card's `io` block so
+inputs and outputs render as visible data objects.
 
-| Task | Capability | Algorithm Card |
-|-----------------------|----------------|---------------------------------------------------------------------|
-| `Task_DetectRedactPii`| `ai.redact@1` | [`algorithms/pii_redactor.card.yaml`](algorithms/pii_redactor.card.yaml) |
-| `Task_ExtractSemantics`| `ai.extract@1`| [`algorithms/vdvc_semantic_extractor.card.yaml`](algorithms/vdvc_semantic_extractor.card.yaml) |
-| `Task_EmitResult` | `event.emit@1` | [`algorithms/completion_event_emitter.card.yaml`](algorithms/completion_event_emitter.card.yaml) |
+| Task | Capability | Algorithm Card | Risk class |
+|-----------------------|----------------|---------------------------------------------------------------------|------------|
+| `Task_DetectRedactPii`| `ai.redact@1` | [`pii_redactor.card.yaml`](algorithms/pii_redactor.card.yaml) | limited |
+| `Task_ExtractSemantics`| `ai.extract@1`| [`vdvc_semantic_extractor.card.yaml`](algorithms/vdvc_semantic_extractor.card.yaml) | high |
+| `Task_EmitResult` | `event.emit@1` | [`completion_event_emitter.card.yaml`](algorithms/completion_event_emitter.card.yaml) | minimal |
 
 Three DMN decision tables encode the deterministic policy:
 
@@ -25,42 +29,59 @@ Only `Task_ExtractSemantics` is a model-inference step (governed by the
 high-risk `vdvc_semantic_extractor` Card). Everything else is
 deterministic.
 
-## v3.0.0 — Algorithm Cards
+## v3.1.0 — Algorithm Cards visible on BPMN
 
-The three opaque host capabilities are now wrapped in Algorithm Cards
-under `algorithms/`. Each Card supplies, per UAPF v2.3.0 chapter 13:
-intent, IO contract, ownership, validation history, risk class, audit
-configuration, and (where relevant) `privacy` and `risk` extensions.
+In v3.1.0, the Algorithm Card references move from `resources/mappings.yaml`
+targets onto the BPMN service tasks themselves, per UAPF v2.4.0. This
+matters because:
+
+- A reader of the BPMN diagram now sees *which algorithm* runs at each
+  step, by inspecting the rendered task.
+- The card's IO contract is synthesised into the task's
+  ``, so downstream gateway conditions branching
+  on outputs like `ai_confidence_score` or `personas_koda_present`
+  are visually traceable to their source.
+- A renderer that supports `uapf24:algorithmCardRef` (e.g., ProcessGit
+  preview, OpenDMS visualiser) draws the algorithm-card icon, name,
+  version, and risk-class dot directly on the task.
 
 Audit question → answer-location:
 
 | Auditor asks | Read this |
 |-----------------------------------------------|------------------------------------------------|
-| What does the redactor detect? | `algorithms/pii_redactor.card.yaml` § io |
-| What's the AI Act risk class of the extractor?| `vdvc_semantic_extractor.card.yaml` § risk |
-| Who owns each algorithm? | each Card § owners |
-| When was each algorithm last validated? | each Card § validation |
-| What gets logged, with what retention? | each Card § audit |
-| Why is human oversight needed? | `vdvc_semantic_extractor.card.yaml` § confidence |
+| Which algorithm runs at task X? | the BPMN itself: `uapf24:algorithmCardRef` attr |
+| What inputs/outputs does it have? | the BPMN task's `` block |
+| What is the algorithm's risk class? | the Card's `risk.aiActRiskClass` field |
+| When was the algorithm last validated? | the Card's `validation.last_validated` |
+| What gets logged, with what retention? | the Card's `audit` block |
+| Why is human oversight needed? | the Card's `confidence` + `risk` blocks |
 
-### Delta from v2.0.0
+### Delta from v3.0.0
 
-- **+** `algorithms/` folder with three Cards (one per opaque host capability).
-- **+** `algorithm_cards: true` and `paths.algorithms` in `uapf.yaml` / `manifest.json`.
-- **~** `resources/mappings.yaml`: single `agent.semantic-extractor` target split into three algorithm-specific targets (`agent.pii_redactor`, `agent.vdvc_semantic_extractor`, `agent.completion_event_emitter`), each carrying its `algorithm_card` reference. Binding shape unchanged.
-- **~** `bpmn/semantic-document-analysis.bpmn`: **unchanged**. Algorithm Cards live on resource targets, not in the BPMN — no extension elements required.
-- **−** `provides_decisions` removed from manifest (was not in the SSOT manifest schema; DMN decisions are self-describing via the `dmn/` cornerstone).
+- **~** `bpmn/semantic-document-analysis.bpmn`: each of the 3 service tasks now carries `xmlns:uapf24="https://uapf.dev/bpmn/v2.4"` `uapf24:algorithmCardRef` attribute, plus a `` synthesised from the card's `io` block.
+- **~** `resources/mappings.yaml`: `algorithm_card:` removed from each of the 3 targets. They go back to being just dispatch endpoints, per UAPF v2.4.0.
+- **~** `uapf.yaml` / `manifest.json`: version `3.0.0` → `3.1.0`.
+- **=** `algorithms/*.card.yaml`: unchanged.
+- **=** `dmn/*.dmn`: unchanged.
+
+### Why the v3.0.0 → v3.1.0 churn
+
+v3.0.0 followed UAPF v2.3.0, which placed the algorithm card on the
+resource target. That hid the algorithm from the BPMN diagram. UAPF
+v2.4.0 reverses that decision and moves the reference onto the BPMN
+task. v3.1.0 of this package follows the corrected spec. Algorithm
+Cards themselves are unchanged across both revisions.
 
 ## Structure
 
 ```
 .
-├── uapf.yaml + manifest.json # Package manifest (UAPF v2.3.0)
-├── bpmn/ # 1 BPMN process (unchanged from v2.0.0)
-├── dmn/ # 3 DMN decision tables (unchanged from v2.0.0)
-├── algorithms/ # 3 Algorithm Cards (NEW in v3.0.0)
+├── uapf.yaml + manifest.json # Package manifest (UAPF v2.4.0)
+├── bpmn/ # 1 BPMN process (algorithm refs + ioSpecification)
+├── dmn/ # 3 DMN decision tables
+├── algorithms/ # 3 Algorithm Cards (introduced in v3.0.0)
 ├── resources/
-│ ├── mappings.yaml # Resource targets w/ algorithm_card refs (REFACTORED)
+│ ├── mappings.yaml # Resource targets (dispatch endpoints only)
 │ ├── guardrails.yaml
 │ └── schemas/ # Output JSON Schemas
 ├── metadata/ # ownership + lifecycle
@@ -71,7 +92,7 @@ Audit question → answer-location:
 
 ## Validation
 
-Validates against UAPF v2.3.0 schemas at
+Validates against UAPF v2.4.0 schemas at
 `github.com/UAPFormat/UAPF-specification`:
 
 ```bash
diff --git a/bpmn/semantic-document-analysis.bpmn b/bpmn/semantic-document-analysis.bpmn
index 09e268a..df5c87e 100644
--- a/bpmn/semantic-document-analysis.bpmn
+++ b/bpmn/semantic-document-analysis.bpmn
@@ -2,6 +2,7 @@
+ uapf:capability="ai.redact@1"
+ uapf24:algorithmCardRef="algo.semantic_document_analysis.pii_redactor">
- Calls ai.redact@1 over the source text. Beyond masking, the host
+ Calls ai.redact@1 over the source text. Governed by Algorithm
+ Card algo.semantic_document_analysis.pii_redactor (see
+ algorithms/pii_redactor.card.yaml). Beyond masking, the host
  runs the four Latvian PII regex detectors (personas kods, IBAN,
  e-mail, phone) and returns the deterministic signal set the risk
- decision consumes: personasKodaPresent, financialDataPresent,
- contactDataPresent, piiCategoryCount, detectedEntityTypes, plus
- redactedContent. No model inference — pure pattern detection.
+ decision consumes.
+
+
+
+
+
+
+
+
+
+ content
+
+
+ redacted_content
+ detected_entity_types
+ personas_koda_present
+ financial_data_present
+ contact_data_present
+ pii_category_count
+
+
  DMN dmn/assess-personal-data-risk.dmn. Maps the PII signal set to
- personalDataRisk (NONE | LOW | MEDIUM | HIGH) by explicit ranked
- rules. Personas kods or IBAN forces HIGH; two or more categories
- or contact data gives MEDIUM. Deterministic and auditable.
+ personalDataRisk (NONE | LOW | MEDIUM | HIGH).
@@ -44,48 +64,75 @@
  DMN dmn/gdpr-processing-route.dmn. From personalDataRisk and
  allowCentralization decides processingRoute (CENTRAL | LOCAL),
- anonymizationRequired and redactionLevel. This is the routing
- rule extracted from the host's generate_semantic_metadata: a
- sensitive document where centralisation is not permitted stays
- LOCAL with full redaction.
+ anonymizationRequired and redactionLevel.
+ uapf:schemaRef="resources/schemas/vdvc-semantic-summary.schema.json"
+ uapf24:algorithmCardRef="algo.semantic_document_analysis.vdvc_semantic_extractor">
  Calls ai.extract@1 on redactedContent with the VDVC v1.1 output
- schema. This is the single bounded model step: it produces the
- semanticSummary (topic, summary, keywords, urgency, risk) and
- must validate against resources/schemas/vdvc-semantic-summary.
- The host also returns flat aiConfidenceScore and the result of
- the post-extraction PII re-scan as outputPiiErrorCount.
+ schema. Governed by Algorithm Card
+ algo.semantic_document_analysis.vdvc_semantic_extractor (see
+ algorithms/vdvc_semantic_extractor.card.yaml). EU AI Act
+ Annex III high-risk; human oversight is mandatory and is
+ enforced downstream by the human-validation-gate DMN.
+
+
+
+
+
+
+
+
+ redacted_content
+ schema_ref
+
+
+ semantic_summary
+ sensitivity_control
+ ai_confidence_score
+ output_pii_error_count
+
+
- DMN dmn/human-validation-gate.dmn. From outputPiiErrorCount,
- aiConfidenceScore and personalDataRisk decides
- humanValidationStatus (REJECTED | PENDING_REVIEW | APPROVED_AUTO)
- and requiresHumanReview. Any leaked PII or confidence below 0.3
- rejects; below 0.7, or HIGH risk, forces review; 0.7 and above
- with clean output auto-approves. The thresholds are the weights.
+ DMN dmn/human-validation-gate.dmn. From output_pii_error_count,
+ ai_confidence_score and personalDataRisk decides
+ humanValidationStatus (REJECTED | PENDING_REVIEW | APPROVED_AUTO).
+ uapf:eventType="document.semantic-analysis.completed.v1"
+ uapf24:algorithmCardRef="algo.semantic_document_analysis.completion_event_emitter">
- Calls event.emit@1 to publish a CloudEvent carrying the semantic
- summary, the routing decision and the validation status.
+ Calls event.emit@1 to publish a CloudEvent. Governed by
+ Algorithm Card algo.semantic_document_analysis.completion_event_emitter
+ (see algorithms/completion_event_emitter.card.yaml).
+
+
+
+
+
+ event_type
+ payload
+
+
+ published
+
+
diff --git a/manifest.json b/manifest.json
index 9bb20b9..a300298 100644
--- a/manifest.json
+++ b/manifest.json
@@ -2,9 +2,9 @@
   "kind": "uapf.package",
   "id": "dev.uapf.semantic-document-analysis",
   "name": "Semantic Document Analysis",
-  "description": "Level-4 UAPF process for semantic analysis of free-text documents.\n\nThree BPMN service tasks invoke the UAPF-IP capabilities ai.redact@1,\nai.extract@1 and event.emit@1. Three DMN decision tables encode the\ndeterministic algorithm the host previously hid inside application\ncode: assess-personal-data-risk maps PII regex signals to a risk\nlevel; gdpr-processing-route selects CENTRAL vs LOCAL processing,\nanonymisation and redaction level; human-validation-gate applies the\nconfidence thresholds that decide REJECTED / PENDING_REVIEW /\nAPPROVED_AUTO.\n\nOnly the semantic extraction is a model step. Risk classification,\nGDPR routing and the validation gate are explicit ranked rules in\nversioned DMN \u2014 inspectable, auditable, portable. Extraction output\nvalidates against the VDVC v1.1 semantic-summary JSON Schema.\n\nv3.0.0: the three opaque host capabilities (ai.redact@1,\nai.extract@1, event.emit@1) are now governed by Algorithm Cards\nin algorithms/ per UAPF v2.3.0 chapter 13. Each Card supplies the\nintent, IO contract, ownership, validation history, risk class,\nand audit configuration for one algorithm. Cards are referenced\nfrom resource targets in resources/mappings.yaml.\n",
+  "description": "Level-4 UAPF process for semantic analysis of free-text documents.\n\nThree BPMN service tasks invoke the UAPF-IP capabilities ai.redact@1,\nai.extract@1 and event.emit@1. Three DMN decision tables encode the\ndeterministic algorithm the host previously hid inside application\ncode: assess-personal-data-risk maps PII regex signals to a risk\nlevel; gdpr-processing-route selects CENTRAL vs LOCAL processing,\nanonymisation and redaction level; human-validation-gate applies the\nconfidence thresholds that decide REJECTED / PENDING_REVIEW /\nAPPROVED_AUTO.\n\nOnly the semantic extraction is a model step. Risk classification,\nGDPR routing and the validation gate are explicit ranked rules in\nversioned DMN \u2014 inspectable, auditable, portable. Extraction output\nvalidates against the VDVC v1.1 semantic-summary JSON Schema.\n\nv3.1.0: aligned with UAPF v2.4.0 \u2014 Algorithm Card references move\nfrom resource targets to the BPMN service tasks themselves (via\nuapf24:algorithmCardRef attribute). Each card's io block is also\ndenormalised into a on the task so inputs\nand outputs render as visible data objects on the diagram. The\ncards themselves and the DMN decisions are unchanged from v3.0.0.\n",
   "level": 4,
-  "version": "3.0.0",
+  "version": "3.1.0",
   "requires_capabilities": [
     "ai.redact@1+",
     "ai.extract@1+",
diff --git a/resources/mappings.yaml b/resources/mappings.yaml
index 6fc4997..fa9b834 100644
--- a/resources/mappings.yaml
+++ b/resources/mappings.yaml
@@ -2,10 +2,11 @@ kind: uapf.resources.mapping
 
 # Host-readable contract for the capability-backed service tasks.
 #
-# v3.0.0 change: the single agent.semantic-extractor target has been
-# split into three algorithm-specific targets, each referencing an
-# Algorithm Card under algorithms/ (UAPF v2.3.0, chapter 13). The
-# binding shape is unchanged. The BPMN file is unchanged.
+# v3.1.0 change: the algorithm_card reference (added in v3.0.0 on each
+# target) has been removed per UAPF v2.4.0 — the Algorithm Card reference
+# now lives on the BPMN serviceTask itself via the
+# uapf24:algorithmCardRef attribute (see bpmn/semantic-document-analysis.bpmn).
+# Targets here keep their role as dispatch endpoints only.
 #
 # The three DMN decisions (assess-personal-data-risk,
 # gdpr-processing-route, human-validation-gate) remain self-describing
@@ -21,7 +22,6 @@ targets:
       pii_redactor Algorithm Card.
     capabilities:
       - capability.ai.redact
-    algorithm_card: algo.semantic_document_analysis.pii_redactor
 
   - id: agent.vdvc_semantic_extractor
     type: ai_agent
@@ -33,7 +33,6 @@ targets:
       enforced downstream by the human-validation-gate DMN.
     capabilities:
       - capability.ai.extract
-    algorithm_card: algo.semantic_document_analysis.vdvc_semantic_extractor
 
   - id: agent.completion_event_emitter
     type: ai_agent
@@ -43,7 +42,6 @@ targets:
       completion_event_emitter Algorithm Card.
     capabilities:
       - capability.event.emit
-    algorithm_card: algo.semantic_document_analysis.completion_event_emitter
 
 bindings:
   - source: { type: bpmn.serviceTask, ref: Task_DetectRedactPii }
diff --git a/uapf.yaml b/uapf.yaml
index 1dd6673..d382b6f 100644
--- a/uapf.yaml
+++ b/uapf.yaml
@@ -18,15 +18,15 @@ description: |
   versioned DMN — inspectable, auditable, portable. Extraction output
   validates against the VDVC v1.1 semantic-summary JSON Schema.
 
-  v3.0.0: the three opaque host capabilities (ai.redact@1,
-  ai.extract@1, event.emit@1) are now governed by Algorithm Cards
-  in algorithms/ per UAPF v2.3.0 chapter 13. Each Card supplies the
-  intent, IO contract, ownership, validation history, risk class,
-  and audit configuration for one algorithm. Cards are referenced
-  from resource targets in resources/mappings.yaml.
+  v3.1.0: aligned with UAPF v2.4.0 — Algorithm Card references move
+  from resource targets to the BPMN service tasks themselves (via
+  uapf24:algorithmCardRef attribute). Each card's io block is also
+  denormalised into a on the task so inputs
+  and outputs render as visible data objects on the diagram. The
+  cards themselves and the DMN decisions are unchanged from v3.0.0.
 
 level: 4
-version: "3.0.0"
+version: "3.1.0"
 
 # ── UAPF-IP integration (capability needs + profile + guardrails) ──
 requires_capabilities: