Import UAPF package
Per UAPF v2.5.0, tests move from sidecar files (tests/algorithms/<card-id>.test.yaml, removed in v2.5.0) into a top-level tests array on each algorithm card. Minimum two entries per card; the Algorithm Card viewer (UAPF chapter 13.16, ProcessGit Preview tab) consumes these as its primary interaction surface.

This package's three cards now carry embedded tests:

- algo.semantic_document_analysis.pii_redactor (deterministic redactor), 3 cases: Latvian personas kods inline (positive: three entity types detected), plain administrative text (negative: no PII signals), financial figures with IBAN (mixed: financial yes, personas_kods no).
- algo.semantic_document_analysis.vdvc_semantic_extractor (stochastic LLM extractor, EU AI Act high-risk + mandatory oversight), 2 cases: regulatory construction-permit appeal (in-domain, expected topic + applicable_regulations), non-regulatory thank-you note (out-of-domain, low confidence). Both carry ai_confidence_score tolerance bands appropriate for a stochastic output.
- algo.semantic_document_analysis.completion_event_emitter (deterministic CloudEvents emitter), 2 cases: successful completion event, failure completion event. The emitter does not gate on payload contents, so both succeed.

Other changes:

- uapf.yaml + manifest.json: version 3.1.0 -> 3.2.0
- README.md: v3.2.0 section added describing embedded tests and the removed sidecar location

BPMN file unchanged from v3.1.0 (uapf:algorithmCardRef on each service task per UAPF v2.4.0 + ioSpecification synthesis). Mappings unchanged. DMN tables unchanged. uapf-cli validate against v2.5.0 schemas passes cleanly.
# Semantic Document Analysis

UAPF Level-4 process for semantic analysis of free-text documents,
governed by **UAPF v2.4.0** (Algorithm Cards visible on BPMN tasks).

## What this package does

Three BPMN service tasks invoke three UAPF-IP host capabilities. Each
service task carries `uapf24:algorithmCardRef` pointing at the
Algorithm Card that governs the algorithm being invoked, and a
`<bpmn:ioSpecification>` synthesised from the card's `io` block so
inputs and outputs render as visible data objects.

| Task | Capability | Algorithm Card | Risk class |
|------|------------|----------------|------------|
| `Task_DetectRedactPii` | `ai.redact@1` | [`pii_redactor.card.yaml`](algorithms/pii_redactor.card.yaml) | limited |
| `Task_ExtractSemantics` | `ai.extract@1` | [`vdvc_semantic_extractor.card.yaml`](algorithms/vdvc_semantic_extractor.card.yaml) | high |
| `Task_EmitResult` | `event.emit@1` | [`completion_event_emitter.card.yaml`](algorithms/completion_event_emitter.card.yaml) | minimal |

Three DMN decision tables encode the deterministic policy:

- `assess-personal-data-risk`: PII regex signals → risk level
- `gdpr-processing-route`: selects CENTRAL vs LOCAL processing,
  anonymisation, redaction level
- `human-validation-gate`: confidence thresholds → REJECTED /
  PENDING_REVIEW / APPROVED_AUTO

Only `Task_ExtractSemantics` is a model-inference step (governed by the
high-risk `vdvc_semantic_extractor` Card). Everything else is
deterministic.

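As a rough illustration of how a gate like `human-validation-gate` maps a confidence score to one of its three outcomes, here is a minimal Python sketch. The two threshold values and the function name are hypothetical and chosen only for the example; the real thresholds live in the DMN table and are not reproduced here.

```python
from enum import Enum


class GateDecision(Enum):
    REJECTED = "REJECTED"
    PENDING_REVIEW = "PENDING_REVIEW"
    APPROVED_AUTO = "APPROVED_AUTO"


# Hypothetical thresholds for illustration only; the actual values are
# defined in dmn/ and may differ.
REJECT_BELOW = 0.40
AUTO_APPROVE_FROM = 0.90


def human_validation_gate(ai_confidence_score: float) -> GateDecision:
    """Map a confidence score in [0, 1] to a gate decision."""
    if not 0.0 <= ai_confidence_score <= 1.0:
        raise ValueError("ai_confidence_score must be within [0, 1]")
    if ai_confidence_score < REJECT_BELOW:
        return GateDecision.REJECTED
    if ai_confidence_score < AUTO_APPROVE_FROM:
        return GateDecision.PENDING_REVIEW
    return GateDecision.APPROVED_AUTO
```

Because the function is a pure threshold lookup, the same score always yields the same decision, which is the property that makes the gate deterministic even though the score it consumes comes from a stochastic step.
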
## v3.1.0 — Algorithm Cards visible on BPMN

In v3.1.0, the Algorithm Card references move from `resources/mappings.yaml`
targets onto the BPMN service tasks themselves, per UAPF v2.4.0. This
matters because:

- A reader of the BPMN diagram now sees *which algorithm* runs at each
  step, by inspecting the rendered task.
- The card's IO contract is synthesised into the task's
  `<bpmn:ioSpecification>`, so downstream gateway conditions branching
  on outputs like `ai_confidence_score` or `personas_koda_present`
  are visually traceable to their source.
- A renderer that supports `uapf24:algorithmCardRef` (e.g., ProcessGit
  preview, OpenDMS visualiser) draws the algorithm-card icon, name,
  version, and risk-class dot directly on the task.

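To make the first point concrete, the sketch below lists the card reference attached to each service task using only the Python standard library. The `uapf24` namespace URI is the one named in this README and the BPMN namespace is the standard BPMN 2.0 model namespace; the sample XML, the `Task_NoCard` task, the function name, and the assumption that the attribute value is a card id are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"
UAPF24_NS = "https://uapf.dev/bpmn/v2.4"

# Hypothetical, heavily trimmed BPMN fragment for illustration only.
SAMPLE_BPMN = f"""<bpmn:definitions xmlns:bpmn="{BPMN_NS}" xmlns:uapf24="{UAPF24_NS}">
  <bpmn:process id="Process_Sample">
    <bpmn:serviceTask id="Task_DetectRedactPii"
        uapf24:algorithmCardRef="algo.semantic_document_analysis.pii_redactor"/>
    <bpmn:serviceTask id="Task_NoCard"/>
  </bpmn:process>
</bpmn:definitions>"""


def card_refs_by_task(bpmn_xml: str) -> Dict[str, Optional[str]]:
    """Return {service task id: algorithmCardRef value, or None if absent}."""
    root = ET.fromstring(bpmn_xml)
    refs: Dict[str, Optional[str]] = {}
    for task in root.iter(f"{{{BPMN_NS}}}serviceTask"):
        # Namespaced attributes are addressed as "{namespace-uri}localname".
        refs[task.get("id")] = task.get(f"{{{UAPF24_NS}}}algorithmCardRef")
    return refs


refs = card_refs_by_task(SAMPLE_BPMN)
```

A task that comes back with `None` has no card reference, which is the kind of gap an auditor would want surfaced before reading further.
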
Audit question → answer location:

| Auditor asks | Read this |
|--------------|-----------|
| Which algorithm runs at task X? | the BPMN itself: `uapf24:algorithmCardRef` attribute |
| What inputs/outputs does it have? | the BPMN task's `<bpmn:ioSpecification>` block |
| What is the algorithm's risk class? | the Card's `risk.aiActRiskClass` field |
| When was the algorithm last validated? | the Card's `validation.last_validated` |
| What gets logged, with what retention? | the Card's `audit` block |
| Why is human oversight needed? | the Card's `confidence` + `risk` blocks |

### Delta from v3.0.0

- **~** `bpmn/semantic-document-analysis.bpmn`: each of the 3 service tasks now carries a `uapf24:algorithmCardRef` attribute (namespace `xmlns:uapf24="https://uapf.dev/bpmn/v2.4"`), plus a `<bpmn:ioSpecification>` synthesised from the card's `io` block.
- **~** `resources/mappings.yaml`: `algorithm_card:` removed from each of the 3 targets. They go back to being just dispatch endpoints, per UAPF v2.4.0.
- **~** `uapf.yaml` / `manifest.json`: version `3.0.0` → `3.1.0`.
- **=** `algorithms/*.card.yaml`: unchanged.
- **=** `dmn/*.dmn`: unchanged.

### Why the v3.0.0 → v3.1.0 churn

v3.0.0 followed UAPF v2.3.0, which placed the algorithm card on the
resource target. That hid the algorithm from the BPMN diagram. UAPF
v2.4.0 reverses that decision and moves the reference onto the BPMN
task. v3.1.0 of this package follows the corrected spec. Algorithm
Cards themselves are unchanged across both revisions.

## Structure

```
.
├── uapf.yaml + manifest.json    # Package manifest (UAPF v2.4.0)
├── bpmn/                        # 1 BPMN process (algorithm refs + ioSpecification)
├── dmn/                         # 3 DMN decision tables
├── algorithms/                  # 3 Algorithm Cards (introduced in v3.0.0)
├── resources/
│   ├── mappings.yaml            # Resource targets (dispatch endpoints only)
│   ├── guardrails.yaml
│   └── schemas/                 # Output JSON Schemas
├── metadata/                    # ownership + lifecycle
├── docs/                        # EU AI Act / integration notes
├── fixtures/                    # Sample inputs
└── tests/                       # Eval set
```

## Validation

This package validates against the UAPF v2.4.0 schemas published at
`github.com/UAPFormat/UAPF-specification`:

```bash
python tools/uapf-cli/uapf.py validate /path/to/dokumenta-semantiska-analize
```

## v3.2.0 (UAPF v2.5.0 alignment)

Tests are now **embedded in each algorithm card** under a top-level `tests:` array (minimum 2 entries per card). The old sidecar location `tests/algorithms/<card-id>.test.yaml` is **removed** per UAPF v2.5.0 and no longer applies to algorithm cards.

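The minimum-two-entries rule can be sketched as a small check over already-parsed cards. This is an illustration, not the `uapf-cli` implementation: the cards are modelled as plain dictionaries, and the `id` and `name` keys inside them are hypothetical; only the top-level `tests` array and the minimum of two entries come from the text above.

```python
from typing import Any, Dict, List

MIN_TESTS_PER_CARD = 2  # minimum stated for UAPF v2.5.0 in this README


def cards_missing_tests(cards: List[Dict[str, Any]]) -> List[str]:
    """Return the ids of cards whose top-level `tests` array is absent or too short."""
    failing = []
    for card in cards:
        tests = card.get("tests")
        if not isinstance(tests, list) or len(tests) < MIN_TESTS_PER_CARD:
            failing.append(card["id"])
    return failing


# Hypothetical parsed cards; the "id" and "name" keys are assumptions.
sample_cards = [
    {"id": "card.with_two_tests", "tests": [{"name": "a"}, {"name": "b"}]},
    {"id": "card.with_one_test", "tests": [{"name": "a"}]},
    {"id": "card.without_tests"},
]

missing = cards_missing_tests(sample_cards)
```
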
Embedded tests for this package:

- `algo.semantic_document_analysis.pii_redactor`: 3 cases (Latvian personas kods inline, plain text with no PII, financial figures + IBAN)
- `algo.semantic_document_analysis.vdvc_semantic_extractor`: 2 cases (regulatory complaint, non-regulatory thank-you), both with `ai_confidence_score` tolerance bands appropriate for a stochastic LLM extractor
- `algo.semantic_document_analysis.completion_event_emitter`: 2 cases (success completion, failure completion)

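A tolerance band exists because a stochastic extractor will not return exactly the same `ai_confidence_score` on every run, so a test asserts a range instead of a single value. The sketch below shows one way such a check could look; the `ToleranceBand` class, its field names, and the sample numbers are hypothetical and do not reflect how the cards actually encode their bands.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ToleranceBand:
    """Inclusive range of acceptable confidence scores (hypothetical shape)."""
    minimum: float
    maximum: float

    def __post_init__(self) -> None:
        if self.minimum > self.maximum:
            raise ValueError("minimum must not exceed maximum")

    def contains(self, score: float) -> bool:
        return self.minimum <= score <= self.maximum


def all_within_band(scores: Sequence[float], band: ToleranceBand) -> bool:
    """True only if at least one score was observed and every score is in the band."""
    return bool(scores) and all(band.contains(s) for s in scores)


# Illustrative numbers only: scores from repeated runs of the same test case.
in_domain_band = ToleranceBand(minimum=0.70, maximum=1.00)
stable_runs = all_within_band([0.82, 0.91, 0.78], in_domain_band)
drifting_runs = all_within_band([0.82, 0.55], in_domain_band)
```

Treating an empty list of scores as a failure is a deliberate choice in this sketch: a test that never ran should not count as passing.
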
The Algorithm Card viewer (UAPF v2.5.0 chapter 13.16, ProcessGit Preview tab) consumes these embedded tests as its primary interaction surface: a sample browser for `external` cards, and regex/FEEL/source display for `inline` cards.