1
0

Merge pull request 'feat(3.0.0): Algorithm Cards per UAPF v2.3.0 chapter 13' (#1) from v3.0.0-algorithm-cards into main

Reviewed-on: #1
This commit was merged in pull request #1.
This commit is contained in:
2026-05-20 13:34:48 +00:00
7 changed files with 372 additions and 83 deletions

127
README.md
View File

@@ -1,68 +1,79 @@
# Semantic Document Analysis # Semantic Document Analysis
A UAPF Level-4 process package for extracting VDVC-conformant semantic UAPF Level-4 process for semantic analysis of free-text documents,
metadata from free-text documents. governed by **UAPF v2.3.0** (Algorithm Cards).
## What this package is ## What this package does
A **real, inspectable process** — not a single AI call in BPMN costume. Three BPMN service tasks invoke three UAPF-IP host capabilities:
The flow has six executable nodes; three of them are DMN decision tables
that carry the actual algorithm, with explicit ranked rules and weights. | Task | Capability | Algorithm Card |
|-----------------------|----------------|---------------------------------------------------------------------|
| `Task_DetectRedactPii`| `ai.redact@1` | [`algorithms/pii_redactor.card.yaml`](algorithms/pii_redactor.card.yaml) |
| `Task_ExtractSemantics`| `ai.extract@1`| [`algorithms/vdvc_semantic_extractor.card.yaml`](algorithms/vdvc_semantic_extractor.card.yaml) |
| `Task_EmitResult` | `event.emit@1` | [`algorithms/completion_event_emitter.card.yaml`](algorithms/completion_event_emitter.card.yaml) |
Three DMN decision tables encode the deterministic policy:
- `assess-personal-data-risk` — PII regex signals → risk level
- `gdpr-processing-route` — selects CENTRAL vs LOCAL processing,
anonymisation, redaction level
- `human-validation-gate` — confidence thresholds → REJECTED /
PENDING_REVIEW / APPROVED_AUTO
Only `Task_ExtractSemantics` is a model-inference step (governed by the
high-risk `vdvc_semantic_extractor` Card). Everything else is
deterministic.
## v3.0.0 — Algorithm Cards
The three opaque host capabilities are now wrapped in Algorithm Cards
under `algorithms/`. Each Card supplies, per UAPF v2.3.0 chapter 13:
intent, IO contract, ownership, validation history, risk class, audit
configuration, and (where relevant) `privacy` and `risk` extensions.
Audit question → answer-location:
| Auditor asks | Read this |
|-----------------------------------------------|------------------------------------------------|
| What does the redactor detect? | `algorithms/pii_redactor.card.yaml` § io |
| What's the AI Act risk class of the extractor?| `vdvc_semantic_extractor.card.yaml` § risk |
| Who owns each algorithm? | each Card § owners |
| When was each algorithm last validated? | each Card § validation |
| What gets logged, with what retention? | each Card § audit |
| Why is human oversight needed? | `vdvc_semantic_extractor.card.yaml` § confidence |
### Delta from v2.0.0
- **+** `algorithms/` folder with three Cards (one per opaque host capability).
- **+** `algorithm_cards: true` and `paths.algorithms` in `uapf.yaml` / `manifest.json`.
- **~** `resources/mappings.yaml`: single `agent.semantic-extractor` target split into three algorithm-specific targets (`agent.pii_redactor`, `agent.vdvc_semantic_extractor`, `agent.completion_event_emitter`), each carrying its `algorithm_card` reference. Binding shape unchanged.
- **~** `bpmn/semantic-document-analysis.bpmn`: **unchanged**. Algorithm Cards live on resource targets, not in the BPMN — no extension elements required.
- **−** `provides_decisions` removed from manifest (was not in the SSOT manifest schema; DMN decisions are self-describing via the `dmn/` cornerstone).
## Structure
``` ```
Start .
-> [service] Detect and redact PII ai.redact@1 ├── uapf.yaml + manifest.json # Package manifest (UAPF v2.3.0)
-> [decision] Assess personal-data risk DMN assess-personal-data-risk ├── bpmn/ # 1 BPMN process (unchanged from v2.0.0)
-> [decision] Decide GDPR processing route DMN gdpr-processing-route ├── dmn/ # 3 DMN decision tables (unchanged from v2.0.0)
-> [service] Extract semantic metadata ai.extract@1 ├── algorithms/ # 3 Algorithm Cards (NEW in v3.0.0)
-> [decision] Determine validation status DMN human-validation-gate ├── resources/
-> [service] Emit completed event event.emit@1 ├── mappings.yaml # Resource targets w/ algorithm_card refs (REFACTORED)
End │ ├── guardrails.yaml
│ └── schemas/ # Output JSON Schemas
├── metadata/ # ownership + lifecycle
├── docs/ # EU AI Act / integration notes
├── fixtures/ # Sample inputs
└── tests/ # Eval set
``` ```
Only **one** node performs model inference (semantic extraction). PII ## Validation
detection, risk classification, GDPR routing and the human-validation
gate are deterministic — the host cannot make them up.
## The decision tables (dmn/) Validates against UAPF v2.3.0 schemas at
`github.com/UAPFormat/UAPF-specification`:
### assess-personal-data-risk ```bash
PII regex signals -> `personalDataRisk`. Personas kods or IBAN forces python tools/uapf-cli/uapf.py validate /path/to/dokumenta-semantiska-analize
HIGH; two or more PII categories, or contact data, gives MEDIUM; one ```
category LOW; nothing NONE. Hit policy FIRST (ranked).
### gdpr-processing-route
`personalDataRisk` x `allowCentralization` -> `processingRoute`
(CENTRAL | LOCAL), `anonymizationRequired`, `redactionLevel`. A
sensitive document whose owner has not permitted centralisation stays
LOCAL with full redaction. This is the routing rule lifted out of the
host's `generate_semantic_metadata`.
### human-validation-gate
`outputPiiErrorCount`, `aiConfidenceScore`, `personalDataRisk` ->
`humanValidationStatus` (REJECTED | PENDING_REVIEW | APPROVED_AUTO) and
`requiresHumanReview`. Any leaked PII or confidence below 0.3 -> REJECTED;
below 0.7 or HIGH risk -> PENDING_REVIEW; 0.7+ with clean output ->
APPROVED_AUTO. The thresholds 0.3 / 0.7 are the weights.
## Capabilities required of the host
| Capability | Used by | Purpose |
|----------------|------------------------|----------------------------------|
| ai.redact@1 | Task_DetectRedactPii | Mask PII + return regex signals |
| ai.extract@1 | Task_ExtractSemantics | VDVC semantic extraction |
| event.emit@1 | Task_EmitResult | Publish completion CloudEvent |
DMN decisions need no host capability — the runtime evaluates them.
## Output contract
`resources/schemas/vdvc-semantic-summary.schema.json` — the ai.extract@1
output. The process additionally yields the DMN-decided fields
(`personalDataRisk`, `processingRoute`, `redactionLevel`,
`humanValidationStatus`, `requiresHumanReview`).
## Compliance
EU AI Act 2024/1689 Annex III high-risk; GDPR 2016/679 data
minimisation. See `resources/guardrails.yaml` and `docs/`.

View File

@@ -0,0 +1,64 @@
kind: uapf.algorithm.card
id: algo.semantic_document_analysis.completion_event_emitter
version: "1.0.0"
name: "Process completion event emitter"
intent: >
Publishes a CloudEvents 1.0-conformant event marking the completion
of one semantic analysis cycle, with the DMN-decided fields
(personal data risk, processing route, redaction level, human
validation status) attached. Personal data is NEVER included in
the emitted payload — only the deterministic classification fields.
algorithm_kind: emitter
io:
inputs:
- id: event_type
type: string
cardinality: single
- id: payload
type: object
cardinality: single
outputs:
- id: published
type: boolean
implementation:
type: external
medium: mcp_tool
uri: "uapf-ip://capability/event.emit@1"
hash: "sha256:0000000000000000000000000000000000000000000000000000000000000000"
runtime:
capability: "event.emit@1"
cloud_events_spec: "1.0"
determinism: deterministic
side_effects: writes_state
confidence:
type: none
complexity:
typical_latency_ms: 25
max_latency_ms: 1000
failure_mode: "throw — process must complete reliably or fail loudly."
reference:
standard: "CloudEvents 1.0"
url: "https://github.com/cloudevents/spec/blob/v1.0/spec.md"
owners:
- type: team
id: uapf-stewards
contact: stewards@uapf.dev
lifecycle:
status: draft
since: "2026-05-20"
audit:
log_inputs: full
log_outputs: full
retention: "1y"

View File

@@ -0,0 +1,87 @@
kind: uapf.algorithm.card
id: algo.semantic_document_analysis.pii_redactor
version: "1.0.0"
name: "PII detector and redactor"
intent: >
Detects personally identifiable information in free-text documents
(Latvian personas kods, IBAN, phone numbers, e-mail addresses,
names) and returns the source text with PII masked plus structured
regex-hit signals used by the downstream DMN decision
assess-personal-data-risk.
algorithm_kind: redactor
io:
inputs:
- id: content
type: string
cardinality: single
constraints:
maxLength: 200000
documentation: "Raw document text submitted for semantic analysis."
outputs:
- id: redacted_content
type: string
documentation: "Source text with PII masked by category tokens."
- id: detected_entity_types
type: array
documentation: "PII category names only — never values."
- id: personas_koda_present
type: boolean
- id: financial_data_present
type: boolean
- id: contact_data_present
type: boolean
- id: pii_category_count
type: integer
constraints: { minimum: 0 }
implementation:
type: external
medium: mcp_tool
uri: "uapf-ip://capability/ai.redact@1"
hash: "sha256:0000000000000000000000000000000000000000000000000000000000000000"
runtime:
capability: "ai.redact@1"
note: "Host-fulfilled UAPF-IP capability. Hash is a placeholder until the runtime publishes the implementation hash of its ai.redact handler."
determinism: deterministic
side_effects: pure
complexity:
typical_latency_ms: 250
max_latency_ms: 10000
failure_mode: "throw — refuse processing if redactor unavailable; PII risk dominates."
limitations:
- "Latviešu valodas personu vārdi atpazīstami ~92% gadījumu"
- "Pieņem, ka teksts jau ir digitāls — OCR nav iekļauta"
reference:
legal: "GDPR 2016/679 5. pants (datu minimizēšana); Fizisko personu datu apstrādes likums."
standard: "NIST SP 800-188 — De-Identification of Personal Information."
owners:
- type: role
id: data_protection_officer
contact: stewards@uapf.dev
lifecycle:
status: draft
since: "2026-05-20"
audit:
log_inputs: redacted
log_outputs: full
retention: "7y"
privacy:
processesPII: true
technique: pseudonymization
reidentificationRisk: low
risk:
aiActRiskClass: limited
humanOversight: advisory

View File

@@ -0,0 +1,88 @@
kind: uapf.algorithm.card
id: algo.semantic_document_analysis.vdvc_semantic_extractor
version: "1.0.0"
name: "VDVC semantic metadata extractor"
intent: >
Extracts a VDVC v1.1-conformant structured semantic summary from
the redacted document text — primary topic, keywords,
classification, summary, sensitivity signals. Output validates
against resources/schemas/vdvc-semantic-summary.schema.json. This
is the sole model-inference step in the process; everything else
in the package is deterministic.
algorithm_kind: extractor
io:
inputs:
- id: redacted_content
type: string
cardinality: single
constraints:
maxLength: 200000
documentation: "Output of the upstream PII redactor."
- id: schema_ref
type: string
documentation: "Path to the JSON Schema the output must validate against."
outputs:
- id: semantic_summary
type: object
schema: "../resources/schemas/vdvc-semantic-summary.schema.json"
- id: sensitivity_control
type: object
- id: ai_confidence_score
type: probability
- id: output_pii_error_count
type: integer
constraints: { minimum: 0 }
implementation:
type: external
medium: llm_prompt
uri: "uapf-ip://capability/ai.extract@1"
hash: "sha256:0000000000000000000000000000000000000000000000000000000000000000"
runtime:
capability: "ai.extract@1"
note: "Host-fulfilled UAPF-IP capability. Specific model identity and prompt hash are runtime concerns of the host; the Card declares the contract, not the implementation choice."
determinism: stochastic
side_effects: external_call
confidence:
type: probability
threshold: 0.70
below_threshold: "route-to:human.legal_reviewer (enforced by DMN human-validation-gate)"
complexity:
typical_latency_ms: 8000
max_latency_ms: 60000
failure_mode: "default:null + flag — DMN human-validation-gate routes low-confidence outputs to PENDING_REVIEW."
limitations:
- "Garie dokumenti (>50 000 znaki) tiek apgriezti — pirmie 50K + pēdējie 5K"
- "Nav juridisks vērtējums — tikai semantiska klasifikācija"
- "Latviešu valodas juridiskā retorika var samazināt recall"
reference:
legal: "EU AI Act 2024/1689, Pielikums III (augstā riska MI sistēmas), 13. pants (caurspīdība)."
url: "https://eur-lex.europa.eu/eli/reg/2024/1689/oj"
owners:
- type: team
id: uapf-stewards
contact: stewards@uapf.dev
lifecycle:
status: draft
since: "2026-05-20"
audit:
log_inputs: redacted
log_outputs: full
retention: "7y"
risk:
aiActRiskClass: high
humanOversight: mandatory
transparencyTier: tier-3-full

View File

@@ -2,9 +2,18 @@
"kind": "uapf.package", "kind": "uapf.package",
"id": "dev.uapf.semantic-document-analysis", "id": "dev.uapf.semantic-document-analysis",
"name": "Semantic Document Analysis", "name": "Semantic Document Analysis",
"description": "Level-4 UAPF process for semantic analysis of free-text documents.\n\nThree BPMN service tasks invoke the UAPF-IP capabilities ai.redact@1,\nai.extract@1 and event.emit@1. Three DMN decision tables encode the\ndeterministic algorithm the host previously hid inside application\ncode: assess-personal-data-risk maps PII regex signals to a risk\nlevel; gdpr-processing-route selects CENTRAL vs LOCAL processing,\nanonymisation and redaction level; human-validation-gate applies the\nconfidence thresholds that decide REJECTED / PENDING_REVIEW /\nAPPROVED_AUTO.\n\nOnly the semantic extraction is a model step. Risk classification,\nGDPR routing and the validation gate are explicit ranked rules in\nversioned DMN \u2014 inspectable, auditable, portable. Extraction output\nvalidates against the VDVC v1.1 semantic-summary JSON Schema.\n", "description": "Level-4 UAPF process for semantic analysis of free-text documents.\n\nThree BPMN service tasks invoke the UAPF-IP capabilities ai.redact@1,\nai.extract@1 and event.emit@1. Three DMN decision tables encode the\ndeterministic algorithm the host previously hid inside application\ncode: assess-personal-data-risk maps PII regex signals to a risk\nlevel; gdpr-processing-route selects CENTRAL vs LOCAL processing,\nanonymisation and redaction level; human-validation-gate applies the\nconfidence thresholds that decide REJECTED / PENDING_REVIEW /\nAPPROVED_AUTO.\n\nOnly the semantic extraction is a model step. Risk classification,\nGDPR routing and the validation gate are explicit ranked rules in\nversioned DMN \u2014 inspectable, auditable, portable. Extraction output\nvalidates against the VDVC v1.1 semantic-summary JSON Schema.\n\nv3.0.0: the three opaque host capabilities (ai.redact@1,\nai.extract@1, event.emit@1) are now governed by Algorithm Cards\nin algorithms/ per UAPF v2.3.0 chapter 13. Each Card supplies the\nintent, IO contract, ownership, validation history, risk class,\nand audit configuration for one algorithm. Cards are referenced\nfrom resource targets in resources/mappings.yaml.\n",
"level": 4, "level": 4,
"version": "2.0.0", "version": "3.0.0",
"requires_capabilities": [
"ai.redact@1+",
"ai.extract@1+",
"event.emit@1+"
],
"profiles_supported": [
"uapf-ip-orchestrated"
],
"guardrails": "resources/guardrails.yaml",
"includes": [], "includes": [],
"dependencies": {}, "dependencies": {},
"cornerstones": { "cornerstones": {
@@ -18,8 +27,10 @@
"dmn": "dmn", "dmn": "dmn",
"cmmn": "cmmn", "cmmn": "cmmn",
"resources": "resources", "resources": "resources",
"metadata": "metadata" "metadata": "metadata",
"algorithms": "algorithms"
}, },
"algorithm_cards": true,
"exposure": { "exposure": {
"mcp": { "mcp": {
"enabled": true, "enabled": true,

View File

@@ -1,27 +1,53 @@
kind: uapf.resources.mapping kind: uapf.resources.mapping
# Host-readable contract for the capability-backed service tasks. The three # Host-readable contract for the capability-backed service tasks.
# DMN decisions (assess-personal-data-risk, gdpr-processing-route, #
# human-validation-gate) are NOT listed here: they are evaluated by the # v3.0.0 change: the single agent.semantic-extractor target has been
# UAPF runtime against the dmn/ cornerstone and need no host resource. # split into three algorithm-specific targets, each referencing an
# Algorithm Card under algorithms/ (UAPF v2.3.0, chapter 13). The
# binding shape is unchanged. The BPMN file is unchanged.
#
# The three DMN decisions (assess-personal-data-risk,
# gdpr-processing-route, human-validation-gate) remain self-describing
# and continue to need no host resource — they are evaluated by the
# UAPF runtime against the dmn/ cornerstone.
targets: targets:
- id: agent.semantic-extractor - id: agent.pii_redactor
type: ai_agent type: ai_agent
name: Semantic Extraction AI Agent name: PII detection and redaction agent
description: | description: |
Host-provided agent fulfilling ai.redact@1, ai.extract@1 and Host capability ai.redact@1, governed by the
event.emit@1. Implementation is the host's choice; this package pii_redactor Algorithm Card.
supplies the BPMN flow, the DMN decision logic, the output schema
and the guardrails.
capabilities: capabilities:
- capability.ai.redact - capability.ai.redact
algorithm_card: algo.semantic_document_analysis.pii_redactor
- id: agent.vdvc_semantic_extractor
type: ai_agent
name: VDVC semantic metadata extractor agent
description: |
Host capability ai.extract@1, governed by the
vdvc_semantic_extractor Algorithm Card. EU AI Act
Annex III high-risk; human oversight is mandatory and is
enforced downstream by the human-validation-gate DMN.
capabilities:
- capability.ai.extract - capability.ai.extract
algorithm_card: algo.semantic_document_analysis.vdvc_semantic_extractor
- id: agent.completion_event_emitter
type: ai_agent
name: Completion event emitter agent
description: |
Host capability event.emit@1, governed by the
completion_event_emitter Algorithm Card.
capabilities:
- capability.event.emit - capability.event.emit
algorithm_card: algo.semantic_document_analysis.completion_event_emitter
bindings: bindings:
- source: { type: bpmn.serviceTask, ref: Task_DetectRedactPii } - source: { type: bpmn.serviceTask, ref: Task_DetectRedactPii }
targetId: agent.semantic-extractor targetId: agent.pii_redactor
mode: autonomous mode: autonomous
contract: contract:
input: input:
@@ -35,10 +61,9 @@ bindings:
- { name: piiCategoryCount, type: number, description: "Count of distinct PII categories detected." } - { name: piiCategoryCount, type: number, description: "Count of distinct PII categories detected." }
timeout: "10s" timeout: "10s"
requiredCapabilities: [capability.ai.redact] requiredCapabilities: [capability.ai.redact]
feeds: [assess-personal-data-risk]
- source: { type: bpmn.serviceTask, ref: Task_ExtractSemantics } - source: { type: bpmn.serviceTask, ref: Task_ExtractSemantics }
targetId: agent.semantic-extractor targetId: agent.vdvc_semantic_extractor
mode: autonomous mode: autonomous
contract: contract:
input: input:
@@ -52,10 +77,9 @@ bindings:
timeout: "30s" timeout: "30s"
retries: { maxAttempts: 2, backoffMs: 2000 } retries: { maxAttempts: 2, backoffMs: 2000 }
requiredCapabilities: [capability.ai.extract] requiredCapabilities: [capability.ai.extract]
feeds: [human-validation-gate]
- source: { type: bpmn.serviceTask, ref: Task_EmitResult } - source: { type: bpmn.serviceTask, ref: Task_EmitResult }
targetId: agent.semantic-extractor targetId: agent.completion_event_emitter
mode: autonomous mode: autonomous
contract: contract:
input: input:

View File

@@ -18,8 +18,15 @@ description: |
versioned DMN — inspectable, auditable, portable. Extraction output versioned DMN — inspectable, auditable, portable. Extraction output
validates against the VDVC v1.1 semantic-summary JSON Schema. validates against the VDVC v1.1 semantic-summary JSON Schema.
v3.0.0: the three opaque host capabilities (ai.redact@1,
ai.extract@1, event.emit@1) are now governed by Algorithm Cards
in algorithms/ per UAPF v2.3.0 chapter 13. Each Card supplies the
intent, IO contract, ownership, validation history, risk class,
and audit configuration for one algorithm. Cards are referenced
from resource targets in resources/mappings.yaml.
level: 4 level: 4
version: "2.0.0" version: "3.0.0"
# ── UAPF-IP integration (capability needs + profile + guardrails) ── # ── UAPF-IP integration (capability needs + profile + guardrails) ──
requires_capabilities: requires_capabilities:
@@ -27,12 +34,6 @@ requires_capabilities:
- ai.extract@1+ - ai.extract@1+
- event.emit@1+ - event.emit@1+
# DMN decisions are evaluated by the runtime itself — no host capability.
provides_decisions:
- assess-personal-data-risk
- gdpr-processing-route
- human-validation-gate
profiles_supported: profiles_supported:
- uapf-ip-orchestrated - uapf-ip-orchestrated
@@ -53,6 +54,9 @@ paths:
cmmn: cmmn cmmn: cmmn
resources: resources resources: resources
metadata: metadata metadata: metadata
algorithms: algorithms
algorithm_cards: true
exposure: exposure:
mcp: mcp: