dokumenta-semantiska-analize/README.md
Rihards 9b3790c1fa feat(3.2.0): align with UAPF v2.5.0 — embed algorithm card tests, drop sidecar
Per UAPF v2.5.0, tests move from sidecar files
(tests/algorithms/<card-id>.test.yaml — removed in v2.5.0) into a
top-level tests array on each algorithm card. Minimum two entries per
card; the Algorithm Card viewer (UAPF chapter 13.16, ProcessGit
Preview tab) consumes these as its primary interaction surface.

This package's three cards now carry embedded tests:

- algo.semantic_document_analysis.pii_redactor (deterministic redactor)
  — 3 cases: Latvian personas kods inline (positive — three entity
  types detected), plain administrative text (negative — no PII
  signals), financial figures with IBAN (mixed — financial yes,
  personas_kods no).

- algo.semantic_document_analysis.vdvc_semantic_extractor (stochastic
  LLM extractor, EU AI Act high-risk + mandatory oversight) — 2
  cases: regulatory construction-permit appeal (in-domain, expected
  topic + applicable_regulations), non-regulatory thank-you note
  (out-of-domain, low confidence). Both carry ai_confidence_score
  tolerance bands appropriate for a stochastic output.

- algo.semantic_document_analysis.completion_event_emitter
  (deterministic CloudEvents emitter) — 2 cases: successful
  completion event, failure completion event. The emitter does not
  gate on payload contents, so both succeed.

Other changes:
- uapf.yaml + manifest.json: version 3.1.0 -> 3.2.0
- README.md: v3.2.0 section added describing embedded tests and the
  removed sidecar location

BPMN file unchanged from v3.1.0 — uapf:algorithmCardRef on each
service task per UAPF v2.4.0 + ioSpecification synthesis. Mappings
unchanged. DMN tables unchanged.

uapf-cli validate against v2.5.0 schemas passes cleanly.
2026-05-21 08:02:26 +00:00


# Semantic Document Analysis
UAPF Level-4 process for semantic analysis of free-text documents,
governed by **UAPF v2.4.0** (Algorithm Cards visible on BPMN tasks).
## What this package does
Three BPMN service tasks invoke three UAPF-IP host capabilities. Each
service task carries `uapf24:algorithmCardRef` pointing at the
Algorithm Card that governs the algorithm being invoked, and a
`<bpmn:ioSpecification>` synthesised from the card's `io` block so
inputs and outputs render as visible data objects.
| Task | Capability | Algorithm Card | Risk class |
|-----------------------|----------------|---------------------------------------------------------------------|------------|
| `Task_DetectRedactPii`| `ai.redact@1` | [`pii_redactor.card.yaml`](algorithms/pii_redactor.card.yaml) | limited |
| `Task_ExtractSemantics`| `ai.extract@1`| [`vdvc_semantic_extractor.card.yaml`](algorithms/vdvc_semantic_extractor.card.yaml) | high |
| `Task_EmitResult` | `event.emit@1` | [`completion_event_emitter.card.yaml`](algorithms/completion_event_emitter.card.yaml) | minimal |

Three DMN decision tables encode the deterministic policy:
- `assess-personal-data-risk` — PII regex signals → risk level
- `gdpr-processing-route` — selects CENTRAL vs LOCAL processing,
anonymisation, redaction level
- `human-validation-gate` — confidence thresholds → REJECTED /
PENDING_REVIEW / APPROVED_AUTO
Only `Task_ExtractSemantics` is a model-inference step (governed by the
high-risk `vdvc_semantic_extractor` Card). Everything else is
deterministic.
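
As an illustration of how a confidence-threshold gate such as `human-validation-gate` behaves, here is a minimal Python sketch. The two threshold values and the function name are assumptions made for this example; the actual rules live in the DMN table under `dmn/` and are not reproduced in this README.

```python
# Hypothetical thresholds, chosen only for illustration.
# The real values are defined in the human-validation-gate DMN table.
REJECT_BELOW = 0.40
AUTO_APPROVE_FROM = 0.85


def human_validation_gate(ai_confidence_score: float) -> str:
    """Map a confidence score to one of the three gate outcomes."""
    if not 0.0 <= ai_confidence_score <= 1.0:
        raise ValueError("ai_confidence_score must be within [0, 1]")
    if ai_confidence_score < REJECT_BELOW:
        return "REJECTED"
    if ai_confidence_score < AUTO_APPROVE_FROM:
        return "PENDING_REVIEW"
    return "APPROVED_AUTO"


for score in (0.10, 0.50, 0.90):
    print(score, human_validation_gate(score))
```

Because the gate is a pure function of its input, the same score always yields the same outcome, which is what makes this step deterministic even though the score itself comes from a stochastic extractor.
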
## v3.1.0 — Algorithm Cards visible on BPMN
In v3.1.0, the Algorithm Card references move from `resources/mappings.yaml`
targets onto the BPMN service tasks themselves, per UAPF v2.4.0. This
matters because:
- A reader of the BPMN diagram now sees *which algorithm* runs at each
step, by inspecting the rendered task.
- The card's IO contract is synthesised into the task's
`<bpmn:ioSpecification>`, so downstream gateway conditions branching
on outputs like `ai_confidence_score` or `personas_koda_present`
are visually traceable to their source.
- A renderer that supports `uapf24:algorithmCardRef` (e.g., ProcessGit
preview, OpenDMS visualiser) draws the algorithm-card icon, name,
version, and risk-class dot directly on the task.
Audit question → answer-location:
| Auditor asks | Read this |
|-----------------------------------------------|------------------------------------------------|
| Which algorithm runs at task X? | the BPMN itself: `uapf24:algorithmCardRef` attr |
| What inputs/outputs does it have? | the BPMN task's `<bpmn:ioSpecification>` block |
| What is the algorithm's risk class? | the Card's `risk.aiActRiskClass` field |
| When was the algorithm last validated? | the Card's `validation.last_validated` |
| What gets logged, with what retention? | the Card's `audit` block |
| Why is human oversight needed? | the Card's `confidence` + `risk` blocks |
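
The first row of the table can be answered programmatically. The sketch below lists the `uapf24:algorithmCardRef` attribute of every service task using only the Python standard library. The sample XML, the `Task_NoCard` task, and the shape of the attribute value (a card id) are assumptions for illustration; only the namespace URI and the attribute name come from this README.

```python
import xml.etree.ElementTree as ET

BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"  # standard BPMN 2.0 model namespace
UAPF24_NS = "https://uapf.dev/bpmn/v2.4"  # namespace URI quoted in this README


def card_refs(bpmn_xml: str) -> dict:
    """Return {service task id: algorithmCardRef value, or None if the attribute is absent}."""
    root = ET.fromstring(bpmn_xml)
    refs = {}
    for task in root.iter(f"{{{BPMN_NS}}}serviceTask"):
        refs[task.get("id")] = task.get(f"{{{UAPF24_NS}}}algorithmCardRef")
    return refs


# Hypothetical, trimmed-down BPMN document (not the package's real file).
SAMPLE = """<bpmn:definitions
    xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL"
    xmlns:uapf24="https://uapf.dev/bpmn/v2.4">
  <bpmn:process id="Process_1">
    <bpmn:serviceTask id="Task_ExtractSemantics"
        uapf24:algorithmCardRef="algo.semantic_document_analysis.vdvc_semantic_extractor"/>
    <bpmn:serviceTask id="Task_NoCard"/>
  </bpmn:process>
</bpmn:definitions>"""

for task_id, ref in card_refs(SAMPLE).items():
    print(task_id, "->", ref)
```

A task that reports `None` has no card reference, which is the kind of gap an auditor would want flagged.
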
### Delta from v3.0.0
- **~** `bpmn/semantic-document-analysis.bpmn`: each of the 3 service tasks now carries a `uapf24:algorithmCardRef` attribute (with the namespace declared as `xmlns:uapf24="https://uapf.dev/bpmn/v2.4"`), plus a `<bpmn:ioSpecification>` synthesised from the card's `io` block.
- **~** `resources/mappings.yaml`: `algorithm_card:` removed from each of the 3 targets. They go back to being just dispatch endpoints, per UAPF v2.4.0.
- **~** `uapf.yaml` / `manifest.json`: version `3.0.0` → `3.1.0`.
- **=** `algorithms/*.card.yaml`: unchanged.
- **=** `dmn/*.dmn`: unchanged.
### Why the v3.0.0 → v3.1.0 churn
v3.0.0 followed UAPF v2.3.0, which placed the algorithm card on the
resource target. That hid the algorithm from the BPMN diagram. UAPF
v2.4.0 reverses that decision and moves the reference onto the BPMN
task. v3.1.0 of this package follows the corrected spec. Algorithm
Cards themselves are unchanged across both revisions.
## Structure
```
.
├── uapf.yaml + manifest.json # Package manifest (UAPF v2.4.0)
├── bpmn/ # 1 BPMN process (algorithm refs + ioSpecification)
├── dmn/ # 3 DMN decision tables
├── algorithms/ # 3 Algorithm Cards (introduced in v3.0.0)
├── resources/
│ ├── mappings.yaml # Resource targets (dispatch endpoints only)
│ ├── guardrails.yaml
│ └── schemas/ # Output JSON Schemas
├── metadata/ # ownership + lifecycle
├── docs/ # EU AI Act / integration notes
├── fixtures/ # Sample inputs
└── tests/ # Eval set
```
## Validation
Validates against UAPF v2.4.0 schemas at
`github.com/UAPFormat/UAPF-specification`:
```bash
python tools/uapf-cli/uapf.py validate /path/to/dokumenta-semantiska-analize
```
## v3.2.0 (UAPF v2.5.0 alignment)
Tests are now **embedded in each algorithm card** under a top-level `tests:` array (minimum 2 entries per card). The old sidecar location `tests/algorithms/<card-id>.test.yaml` is **removed** per UAPF v2.5.0 — that location no longer applies to algorithm cards.
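
The minimum-entries rule is easy to check locally before running the full validator. The following is a rough sketch, not the `uapf-cli` implementation: it assumes the cards have already been parsed from YAML into plain dictionaries, and the keys inside each test entry (`name`) as well as the `algo.example.incomplete` card are hypothetical.

```python
MIN_TESTS_PER_CARD = 2  # minimum stated above for UAPF v2.5.0


def cards_missing_tests(cards: dict) -> list:
    """Return ids of cards whose top-level `tests` array is absent or too short."""
    failing = []
    for card_id, card in cards.items():
        tests = card.get("tests")
        if not isinstance(tests, list) or len(tests) < MIN_TESTS_PER_CARD:
            failing.append(card_id)
    return failing


# Parsed cards; the entry contents are placeholders for illustration.
cards = {
    "algo.semantic_document_analysis.pii_redactor": {
        "tests": [{"name": "personas-kods-inline"}, {"name": "plain-text"}, {"name": "iban"}],
    },
    "algo.example.incomplete": {"tests": [{"name": "only-one"}]},
}

print(cards_missing_tests(cards))  # ['algo.example.incomplete']
```
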
Embedded tests for this package:
- `algo.semantic_document_analysis.pii_redactor` — 3 cases (Latvian personas kods inline, plain text with no PII, financial figures + IBAN)
- `algo.semantic_document_analysis.vdvc_semantic_extractor` — 2 cases (regulatory complaint, non-regulatory thank-you), both with `ai_confidence_score` tolerance bands appropriate for a stochastic LLM extractor
- `algo.semantic_document_analysis.completion_event_emitter` — 2 cases (success completion, failure completion)
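
For a stochastic extractor, an exact-match assertion on `ai_confidence_score` would be brittle, which is why the test cases use tolerance bands. The sketch below shows one way such a comparison could work. The test-case layout (`min`, `max`, `exact`), the `topic` value, and all numbers are assumptions for illustration, not the UAPF v2.5.0 test schema.

```python
def check_case(output: dict, expected: dict) -> list:
    """Compare an extractor output against a test case; return a list of mismatches."""
    problems = []
    band = expected["ai_confidence_score"]
    score = output["ai_confidence_score"]
    # The score passes if it falls anywhere inside the closed band.
    if not band["min"] <= score <= band["max"]:
        problems.append(
            f"ai_confidence_score {score} outside [{band['min']}, {band['max']}]"
        )
    # Remaining fields, if any, are compared exactly.
    for key, want in expected.get("exact", {}).items():
        if output.get(key) != want:
            problems.append(f"{key}: expected {want!r}, got {output.get(key)!r}")
    return problems


# Hypothetical in-domain case: high confidence and a fixed topic are expected.
in_domain = {
    "ai_confidence_score": {"min": 0.7, "max": 1.0},
    "exact": {"topic": "construction_permit_appeal"},
}
# Hypothetical out-of-domain case: only a low confidence is expected.
out_of_domain = {"ai_confidence_score": {"min": 0.0, "max": 0.3}}

print(check_case({"ai_confidence_score": 0.82, "topic": "construction_permit_appeal"}, in_domain))
print(check_case({"ai_confidence_score": 0.55}, out_of_domain))
```

The first call returns an empty list; the second reports the score as outside its band.
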
The Algorithm Card viewer (UAPF v2.5.0 chapter 13.16, ProcessGit Preview tab) consumes these embedded tests as its primary interaction surface — sample browser for `external` cards, regex/FEEL/source-display for `inline` cards.