Two changes:

1. INDEX.md (new, generated). Workspace-level index that lists every package grouped by UAPF level (L0/L1/L2/L3/L4) for navigation. Per UAPF specification §01-concepts.md, levels are an aggregation and governance scope only and MUST NOT be used to imply modeling semantics; per §04-folder-structure.md the SHOULD-recommended on-disk layout is enterprise/ + domains/ + processes/ regardless of level. INDEX.md is therefore a level-grouped *view* over the spec-conformant on-disk layout, not an alternative layout. Generated by tools/build-index/build_index.py (new), which walks the workspace manifests and emits INDEX.md from current state.

2. docs/methodology.md updated to document the §07 auto-layout DI as a known deliberate non-conformance. §07-package-format.md requires authored DI (from a conforming OMG modeler) and explicitly forbids automatic layout generation as a substitute. The DI in the five .bpmn files in this workspace is auto-generated by tools/register-transcoder/bpmn_di.py and is therefore non-conformant under §07. It is kept for the POC so artefacts preview visually; the conformant path is to re-author each .bpmn in Camunda Modeler or bpmn-js Studio and recommit, which emits authored DI. New *Known non-conformances* section in methodology.md cites §07 verbatim and explains this rationale. The Pass-1 transcoder section and Final-validation-pass section are softened to call out that the DI is auto-generated.

uapf-cli validate green on all 13 packages (L1 domains/gramatvediba, L2 fg1-fg6, L3 fg3-2/3/6, L4 fg3-1/4/5).

# Methodology: `vk-gramatvediba` transcription pipeline

## Context

The Valsts Kase (State Treasury) together with the Vienotais Pakalpojumu
Centrs (VPC) publishes a normative description of public-sector bookkeeping
in Latvia, *Grāmatvedības uzskaites procesu apraksts*, as a set of six
function-group spreadsheets (FG1–FG6) spanning planning, expenditure,
settlement, fixed assets, payroll and reporting. Each register row is a
process step with an explicit responsible actor (RACI), the IT system used,
an SLA, the data produced, and predecessor/successor references to other
steps in the same or adjacent registers.

This workspace, `vk-gramatvediba`, is a UAPF 2.2.0 transcription of the FG3
register (*saistību un izdevumu uzskaite*: obligations and expenditure
accounting) into executable process artefacts. It is one of the projects
accepted into the Latvian AI regulatory sandbox (MIC), with PPPA as
institutional applicant and KISC as pilot partner.

The transcription is not a one-off translation. It is a *methodology*:
a repeatable two-pass pipeline from published normative documents to signed,
runnable process packages. This note describes the methodology, shows how
the same pipeline applied to two sub-processes (3.5.2 and 3.5.3) produced
the curated FG3-4 and FG3-5 packages, and reports the final validation pass
over the workspace.

## The pipeline

Transcription has two passes with deliberately different epistemic status.

**Pass 1: mechanical transcoding.** A deterministic tool reads the
register and emits a BPMN skeleton: one task per register step, swimlanes
from RACI, sequence flows from the register's own predecessor/successor
columns, synthesised start/end events at the fragment's real boundary. The
skeleton is faithful to the register, including its inconsistencies, and
is `isExecutable="false"`. This pass lives in `tools/register-transcoder/`.

**Pass 2: curated refinement.** A human curator, optionally AI-assisted,
takes the skeleton and resolves it into a Level 4 executable: explicit
gateways, decision logic extracted into DMN, resource roles/agents/mappings,
metadata (ownership, lifecycle, policies), and a package manifest. The
result is a `processes/<id>/uapf.yaml` package that validates against the
UAPF 2.2.0 schemas, passes `uapf-cli validate`, and is runnable on the
uapf-engine.

The separation is the methodology's load-bearing claim. Pass 1 is
deterministic and traceable: re-running the transcoder on the same
register produces identical output, and every task traces mechanically to
a register row. Pass 2 is interpretive and signable: it
adds judgement the register does not contain (rule tables that the register
describes only in prose, branches the register implies in footnotes,
metadata the register does not carry), and the curator is identified in the
package's ownership metadata. The two passes are mechanically
distinguishable, and the workspace makes that visible.

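The determinism claim for Pass 1 can be checked mechanically. The following is a minimal sketch, not the workspace's own test: `is_byte_deterministic` and `fake_transcode` are hypothetical names, and the stand-in function only imitates a transcoder that is a pure function of the register bytes.

```python
import hashlib
from typing import Callable


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of one transcoder output."""
    return hashlib.sha256(data).hexdigest()


def is_byte_deterministic(transcode: Callable[[bytes], bytes],
                          register: bytes, runs: int = 3) -> bool:
    """True if repeated runs on the same register yield identical bytes."""
    digests = {digest(transcode(register)) for _ in range(runs)}
    return len(digests) == 1


# Hypothetical stand-in for Pass 1: any pure function of the register bytes.
def fake_transcode(register: bytes) -> bytes:
    return b"<definitions>" + register.upper() + b"</definitions>"


print(is_byte_deterministic(fake_transcode, b"3.5.2"))
```

Comparing digests rather than parsed XML is deliberate: it also catches differences in attribute order, whitespace or generated ids, which a structural comparison would hide.
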
## Workspace structure: the L0–L4 level model

The workspace's directory layout is grounded in the UAPF SSOT
specification at `UAPFormat/UAPF-specification/specification/`, in
particular `01-concepts.md` (Levels), `04-folder-structure.md`,
`05-level-composition.md` and the conformance checklist in
`10-conformance-checklist.md`.

The spec defines five levels as aggregation and governance scope only,
not as modeling semantics:

- **L0: Enterprise process collection index.** Workspace-level. MUST NOT
  contain executable logic. Here: `enterprise/enterprise.yaml`.
- **L1: Domain process collection.** Composes L2/L3/L4 packages within a
  domain. Here: `domains/gramatvediba/`.
- **L2: End-to-end business process.** Composes L3/L4 packages.
  Here: `processes/fg1`, `fg2`, `fg3`, `fg4`, `fg5`, `fg6`, one per
  Valsts Kase function group.
- **L3: Composed subprocess / variant.** A composition placeholder that
  references one or more L4 packages.
  Here: `processes/fg3-2`, `fg3-3`, `fg3-6`, the FG3 sub-processes that
  were in scope for the POC but not built out to atomic executables.
- **L4: Atomic executable process.** MUST include at least one BPMN file
  and MUST include resource mappings. Cornerstones (BPMN, optional DMN,
  optional CMMN, resources) live here and only here.
  Here: `processes/fg3-1`, `fg3-4`, `fg3-5`.

The spec enforces strict containment of executable artefacts at L4: L1–L3
packages MUST reference lower-level packages via `includes` and MUST NOT
duplicate BPMN/DMN/CMMN files. The validator in `uapf-cli` and the
conformance rules in `05-level-composition.md` reject workspaces that
mix the layers.

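The containment rules above can be expressed as a small check. This is a sketch under stated assumptions, not the `uapf-cli` implementation: the `Package` class is a hypothetical in-memory view and does not mirror the UAPF manifest schema.

```python
from dataclasses import dataclass, field

EXECUTABLE_SUFFIXES = (".bpmn", ".dmn", ".cmmn")


@dataclass
class Package:
    """Hypothetical in-memory view of one package; not the UAPF schema."""
    path: str
    level: int
    files: list[str] = field(default_factory=list)
    includes: list[str] = field(default_factory=list)
    has_resource_mappings: bool = False


def level_violations(pkg: Package) -> list[str]:
    """Return human-readable violations of the level rules listed above."""
    problems = []
    executables = [f for f in pkg.files if f.endswith(EXECUTABLE_SUFFIXES)]
    if pkg.level == 4:
        if not any(f.endswith(".bpmn") for f in pkg.files):
            problems.append(f"{pkg.path}: L4 package has no BPMN file")
        if not pkg.has_resource_mappings:
            problems.append(f"{pkg.path}: L4 package has no resource mappings")
    else:
        if executables:
            problems.append(f"{pkg.path}: L{pkg.level} package carries {executables}")
        if pkg.level in (1, 2, 3) and not pkg.includes:
            problems.append(f"{pkg.path}: L{pkg.level} package includes nothing")
    return problems


# A stub wrongly marked as L4 fails both L4 requirements.
for problem in level_violations(Package("processes/example-stub", level=4)):
    print(problem)
```

Returning a list of messages instead of raising on the first problem lets one run report every mislabelled package in a workspace at once.
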
## Pass 1 in detail: the transcoder

The transcoder, `tools/register-transcoder/transcode.py`, is a small Python
tool with one external dependency (`openpyxl`) plus a co-installed layout
helper `bpmn_di.py` (also runnable standalone). It locates the worksheet
and header row by content rather than by position, so it tolerates the
leading title rows the registers carry and applies unchanged to any of the
FG1–FG6 registers. It expects the standard register columns: the
predecessor block (FG-group and step-number in adjacent cells), the step's
*Nr.p.k.*, *Process, apakšprocess*, the RACI block split across the three
actor sub-columns (Nodarbinātais / Iestāde / VPC), *Darbību apraksts*,
*Izmantotā IS*, *Izpildes termiņš*, *Sagatavotie dati*, and the successor
block *Uz procesa darbības soli*.

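Locating the header row by content rather than by position can be sketched as follows. This is an illustration, not the code in `transcode.py`: the function names, the choice of three required labels and the `max_scan` limit are assumptions. With `openpyxl`, the rows could come from `worksheet.iter_rows(values_only=True)`; the sketch itself works on any iterable of cell-value sequences.

```python
import unicodedata
from typing import Iterable, Optional, Sequence

# Labels taken from the column list above; matching is by substring.
REQUIRED_HEADERS = ("Nr.p.k.", "Darbību apraksts", "Izmantotā IS")


def normalise(cell: object) -> str:
    """Lower-case, NFC-normalise and collapse whitespace in a cell value."""
    text = "" if cell is None else str(cell)
    return " ".join(unicodedata.normalize("NFC", text).lower().split())


def find_header_row(rows: Iterable[Sequence[object]],
                    required: Sequence[str] = REQUIRED_HEADERS,
                    max_scan: int = 50) -> Optional[tuple[int, dict[str, int]]]:
    """Return (row_index, {header: column_index}) for the first row that
    contains every required header, or None if no such row is found."""
    wanted = [normalise(h) for h in required]
    for row_index, row in enumerate(rows):
        if row_index >= max_scan:
            break
        cells = [normalise(c) for c in row]
        columns = {}
        for header, key in zip(required, wanted):
            hit = next((i for i, c in enumerate(cells) if key in c), None)
            if hit is None:
                break  # this row lacks a required header; try the next row
            columns[header] = hit
        else:
            return row_index, columns
    return None
```

Because the result is a header-to-column map, downstream code can address cells by label, which is what lets the same parser survive title rows and column shifts between registers.
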
Rows that carry a number and a name but no description and no RACI entry
are treated as sub-process headers; rows with a description or any RACI
entry are treated as steps and assigned to the most recently encountered
header. For a requested sub-process the transcoder emits a single BPMN
process containing:

- one `bpmn:userTask` per step, with the step's description, system, SLA,
  RACI cells and any cross-sub-process or cross-FG references preserved in
  `bpmn:documentation`;
- swimlanes for each actor that has steps in the sub-process, with each
  step placed in the lane of its Responsible actor;
- sequence flows reconstructed from the union of predecessor and successor
  references whose endpoints are both inside the sub-process;
- one `bpmn:startEvent` per *entry step* (no in-group predecessor) and one
  `bpmn:endEvent` per *exit step* (no in-group successor), so the fragment's
  real boundary is visible rather than hidden behind synthesised gateways.

The output then has BPMN Diagram Interchange (`bpmndi:BPMNDiagram` with
`BPMNShape` and `BPMNEdge` elements) appended by `bpmn_di.py` via a
swim-lane left-to-right auto-layout, so the resulting file previews in
bpmn.io, Camunda Modeler and the ProcessGit web view. **This DI is
auto-generated, not authored**: §07 of the UAPF specification requires
authored DI produced by a conforming OMG modeler and explicitly forbids
automatic layout generation as a substitute. The auto-layout output is
kept as a known deliberate non-conformance pending re-authoring of each
`.bpmn` in a modeler. See *Known non-conformances* below.

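The row classification and flow reconstruction described in this section can be sketched in a few functions. The `Row` fields and function names are hypothetical, and the sample rows are reconstructed from this note's prose about sub-process 3.5.2, not read from the register; step names and descriptions are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """Hypothetical parsed register row; field names are illustrative."""
    number: str
    name: str
    description: str = ""
    raci: tuple[str, ...] = ()
    predecessors: tuple[str, ...] = ()
    successors: tuple[str, ...] = ()


def is_header(row: Row) -> bool:
    """A header has a number and a name but no description and no RACI."""
    return bool(row.number and row.name) and not row.description and not any(row.raci)


def group_steps(rows: list[Row]) -> dict[str, list[Row]]:
    """Assign each step to the most recently encountered header."""
    groups: dict[str, list[Row]] = {}
    current = None
    for row in rows:
        if is_header(row):
            current = row.number
            groups[current] = []
        elif current is not None:
            groups[current].append(row)
    return groups


def skeleton(steps: list[Row]):
    """Return (flows, entry_steps, exit_steps) for one sub-process."""
    inside = {s.number for s in steps}
    flows = set()
    for s in steps:
        # Union of both columns, keeping only flows with both ends in-group.
        flows |= {(p, s.number) for p in s.predecessors if p in inside}
        flows |= {(s.number, t) for t in s.successors if t in inside}
    has_incoming = {target for _, target in flows}
    has_outgoing = {source for source, _ in flows}
    return sorted(flows), sorted(inside - has_incoming), sorted(inside - has_outgoing)


rows = [
    Row("3.5.2", "Saimnieciskie norēķini un to kustība"),
    Row("3.5.2.1", "(placeholder)", "(placeholder)", ("R", "", ""),
        successors=("FG2/2.3.2",)),
    Row("3.5.2.2", "(placeholder)", "(placeholder)", ("R", "", ""),
        predecessors=("3.5.2.3",), successors=("3.5.2.3",)),
    Row("3.5.2.3", "(placeholder)", "(placeholder)", ("", "", "R"),
        predecessors=("3.5.2.2",), successors=("3.5.2.2",)),
]
flows, entries, exits = skeleton(group_steps(rows)["3.5.2"])
print(flows, entries, exits)
```

Taking the union of the predecessor and successor columns means a link recorded on only one side still produces a flow, while a link recorded inconsistently shows up as a reciprocal pair instead of being silently dropped.
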
The output is `isExecutable="false"` and deliberately unembellished: no
inferred gateways, no synthesised decision logic, no compensation for
register-side inconsistencies. Reproducing register defects is a feature:
it makes the refinement step's contribution explicit.

## Pass 2 in detail: the refinement

Pass 2 takes a skeleton and authors the material the register implies but
does not encode. The operations, in roughly the order they apply:

- *Link reconciliation*: the skeleton may show reciprocal edges, short
  cycles, or disjoint fragments where the register's predecessor and
  successor columns disagree. The curator decides the intended topology
  and rewrites flows accordingly.
- *Gateway promotion*: a task whose semantics implies a decision is split
  into the evaluating step plus an explicit gateway (typically
  `exclusiveGateway`) with named branches; multiple register steps that
  represent alternative outcomes collapse into branches.
- *DMN extraction*: where the register's prose implies a rule table, the
  decision is lifted into a separate `.dmn` file with FIRST or UNIQUE hit
  policy and named inputs/outputs. The BPMN gets a `businessRuleTask` whose
  decision reference points to the DMN decision.
- *Resource authoring*: `resources/roles.yaml`, `agents.yaml` and
  `mappings.yaml` enumerate the responsible parties, bind roles to the
  systems named in *Izmantotā IS* (HoP, RVS Horizon, ePNS, etc.), and link
  BPMN tasks to roles and agents.
- *Metadata authoring*: `metadata/ownership.yaml`, `lifecycle.yaml` and
  `policies.yaml` record who curated the package, its UAPF lifecycle stage,
  and policy constraints (retention, signing, jurisdictional applicability).
- *Manifest assembly*: the package's `uapf.yaml` lists kind, id, name,
  level (4 for atomic executables), cornerstones referencing the BPMN, DMN,
  resources and metadata, exposed inputs/outputs/artifacts, and any MCP
  exposure.

The refined package validates against the UAPF 2.2.0 schemas and against
`uapf-cli validate`. Once validated, the package is signable.

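Link reconciliation is a curator's judgement, but the symptoms it starts from (reciprocal edges and disjoint fragments) can be listed mechanically. The sketch below is a hypothetical helper, not part of the workspace tooling; it takes node ids and directed flows and reports both symptoms.

```python
from collections import defaultdict

Flow = tuple[str, str]


def reciprocal_pairs(flows: list[Flow]) -> list[Flow]:
    """Return each pair (a, b), with a < b, that is linked in both directions."""
    edges = set(flows)
    return sorted((a, b) for a, b in edges if a < b and (b, a) in edges)


def fragments(nodes: list[str], flows: list[Flow]) -> list[list[str]]:
    """Return weakly connected components, ignoring flow direction."""
    neighbours = defaultdict(set)
    for a, b in flows:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, result = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        result.append(sorted(component))
    return result


nodes = ["3.5.2.1", "3.5.2.2", "3.5.2.3"]
flows = [("3.5.2.2", "3.5.2.3"), ("3.5.2.3", "3.5.2.2")]
print(reciprocal_pairs(flows))
print(fragments(nodes, flows))
```

A report like this only tells the curator where the register's columns disagree; which direction is intended, and how fragments should be joined, remains a Pass 2 decision.
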
## Concrete comparison: 3.5.2 / FG3-4

Sub-process 3.5.2 (*Saimnieciskie norēķini un to kustība*) has three
register steps. The transcoder's `sample-output/3.5.2.skeleton.bpmn`
contains three `userTask`s in two lanes (Nodarbinātais for 3.5.2.1 and
3.5.2.2, VPC for 3.5.2.3), four sequence flows, one start event and one
end event. It makes two register-side artefacts visible. Step 3.5.2.1
(the advance request) has only external successor references: it routes
out to *FG2/2.3.2* (budget commitment) and back, and is not directly
linked to 3.5.2.2 in the register's columns, so the skeleton shows it as
a small linear fragment of its own. Steps 3.5.2.2 and 3.5.2.3 have
reciprocal predecessor/successor entries in the register (each lists the
other in both directions), so the skeleton renders a two-task cycle.

The curated `processes/fg3-4` package resolves both. Its BPMN
`Process_SaimnieciskaNorekina` has 14 nodes and 14 flows across all three
lanes, a single clean entry and exit, and a `businessRuleTask` linked to a
separate DMN. The DMN `Decision_AvansaNorekins` is a FIRST-hit decision
with five rules; its inputs are `avansaSituacija` and `avansaVeids`, and
its output `norekinResultats` takes one of four values (`slegts`,
`atmaksa`, `papildu-izmaksa`, `parnesums`), the four outcomes the register
describes in prose but does not encode in a table. The gateway downstream
of the DMN routes to the finalisation task for each outcome (close,
reclaim from employee, additional disbursement, carry forward).

Three tasks in the skeleton; 14 nodes plus a 5-rule DMN in the executable.
None of what the executable adds is in the register's columns. It is in
the register's prose, in the SLA cells, and in the normative documents the
register cites: *Grāmatvedības likums*, MK noteikumi Nr. 749, MK
noteikumi Nr. 877. Pass 2 is where that material enters.

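To make the FIRST hit policy concrete, here is a minimal evaluation sketch. Only the input names and the four output values come from the package described above. The rule conditions, the input values such as `"atlikums"`, and the number of rules are invented placeholders and do not reproduce the five rules of `Decision_AvansaNorekins`.

```python
from typing import Optional

ANY = None  # wildcard: the rule does not constrain this input

# PLACEHOLDER rules: input values and their pairing with outputs are
# invented for illustration and are NOT the rules of the real DMN.
RULES = [
    (("izlietots", ANY), "slegts"),
    (("atlikums", "vienreizejs"), "atmaksa"),
    (("atlikums", ANY), "parnesums"),
    (("parterins", ANY), "papildu-izmaksa"),
]


def first_hit(rules, avansa_situacija: str, avansa_veids: str) -> Optional[str]:
    """Return the output of the first rule whose conditions all match."""
    inputs = (avansa_situacija, avansa_veids)
    for conditions, output in rules:
        if all(c is ANY or c == v for c, v in zip(conditions, inputs)):
            return output
    return None  # no rule matched


print(first_hit(RULES, "atlikums", "vienreizejs"))
print(first_hit(RULES, "atlikums", "cits"))
```

Under FIRST, rule order is part of the logic: the second and third placeholder rules overlap, and the more specific one wins only because it is listed first. Under UNIQUE the same overlap would be an authoring error.
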
## Concrete comparison: 3.5.3 / FG3-5

Sub-process 3.5.3 (*Komandējuma (darba brauciena) dokumenti un to
kustība*) has four register steps. The transcoder's
`sample-output/3.5.3.skeleton.bpmn` contains eight nodes and six flows.

The curated `processes/fg3-5` package likewise has 14 nodes, here with 15
flows, across three lanes. It introduces an explicit *cancellation branch*
(a trip annulled before settlement versus a trip that goes ahead) and the
DMN `Decision_KomandejumaNorekins`, whose *parnesums* rule reconciles an
advance surplus against the next approved business trip from the same
funding line. Neither the cancellation branch nor the carry-forward rule
is in the register's predecessor/successor columns; both are in the
register's prose and in the cited *Komandējuma izdevumu noteikumi*.

## Final validation pass

The workspace contains three Level 4 executable packages, three Level 3
composition stubs, six Level 2 function-group manifests, the Level 1
domain manifest, the Level 0 enterprise index, the transcoder tool, and
this methodology note. The validation pass, run after the level-marker
correction (next section), gave these results:

- `uapf-cli validate processes/fg3-1` → `OK: package valid`.
- `uapf-cli validate processes/fg3-4` → `OK: package valid`.
- `uapf-cli validate processes/fg3-5` → `OK: package valid`.
- All `.bpmn` and `.dmn` files in the workspace are XML well-formed.
- BPMN graph integrity: every `sequenceFlow` references existing
  `sourceRef`/`targetRef` nodes; every `flowNodeRef` resolves to a defined
  node; every `incoming`/`outgoing` reference is consistent with the
  corresponding flow's source/target.
- All `.bpmn` files carry BPMN Diagram Interchange (auto-generated; see
  *Known non-conformances* below), which is sufficient for preview in
  bpmn.io, Camunda Modeler and ProcessGit's web view.
- The transcoder is byte-deterministic: re-running it on the FG3 register
  for 3.5.2 and 3.5.3 reproduces the committed `sample-output/` files
  exactly.

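The first two graph-integrity checks in the list above can be sketched with the standard library. This is not the check that was actually run: the function name is hypothetical, only direct children of each `process` are treated as nodes (nested sub-processes are not handled), and the `incoming`/`outgoing` consistency check is left out.

```python
import xml.etree.ElementTree as ET

BPMN = "{http://www.omg.org/spec/BPMN/20100524/MODEL}"
NOT_NODES = (f"{BPMN}sequenceFlow", f"{BPMN}laneSet")


def graph_integrity_errors(bpmn_xml: str) -> list[str]:
    """Check flow endpoints and lane references against defined node ids."""
    root = ET.fromstring(bpmn_xml)
    errors = []
    for process in root.iter(f"{BPMN}process"):
        # Direct children with an id, excluding flows and lane sets.
        node_ids = {
            child.get("id") for child in process
            if child.get("id") and child.tag not in NOT_NODES
        }
        for flow in process.iter(f"{BPMN}sequenceFlow"):
            for attr in ("sourceRef", "targetRef"):
                if flow.get(attr) not in node_ids:
                    errors.append(f"{flow.get('id')}: {attr}={flow.get(attr)!r} is undefined")
        for ref in process.iter(f"{BPMN}flowNodeRef"):
            if (ref.text or "").strip() not in node_ids:
                errors.append(f"flowNodeRef {ref.text!r} is undefined")
    return errors


SAMPLE = """<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL">
  <process id="P" isExecutable="false">
    <laneSet id="LS"><lane id="Lane1"><flowNodeRef>T1</flowNodeRef></lane></laneSet>
    <startEvent id="S"/><userTask id="T1"/><endEvent id="E"/>
    <sequenceFlow id="F1" sourceRef="S" targetRef="T1"/>
    <sequenceFlow id="F2" sourceRef="T1" targetRef="E"/>
  </process>
</definitions>"""

print(graph_integrity_errors(SAMPLE))
```

Parsing with `ET.fromstring` also covers the well-formedness item: malformed XML raises `xml.etree.ElementTree.ParseError` before any graph check runs.
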
## Conformance correction: Step-4 level-labelling

An initial pass of this workspace shipped with the FG3 sub-process stubs
(`fg3-2`, `fg3-3`, `fg3-6`) marked `level: 4` by template inheritance from
`fg3-1`, with no BPMN and no resources. That fails the spec's L4
requirement: *§01-concepts: "A Level-4 package MUST include BPMN and MUST
include resources and mappings, even if minimal."*

The cause was a Step-4 design error: the FG3 sub-process packages were
created in a single sweep with the same level marker as the FG3-1
template, without checking whether each one would actually carry
executable artefacts. Three of them never would in this POC's scope; they
are composition placeholders, which the spec models as **L3** (composed
subprocess / variant, `05-level-composition.md`).

The correction is a level-marker change: `fg3-2`, `fg3-3`, `fg3-6` are
now `level: 3` with `cornerstones.bpmn: false`. Their lack of BPMN is now
spec-conformant (L1–L3 MUST NOT duplicate L4 content). The three real
executables (`fg3-1`, `fg3-4`, `fg3-5`) remain L4. The Mermaid diagram in
`05-level-composition.md` shows L2 → L3 → L4 as a typical chain, but the
spec text is explicit that the diagram is informative and that L2
packages may reference L4 directly when no intermediate composition is
needed. The `includes` list of `fg3` references the three L4 packages and
the three L3 stubs side by side, which is conformant.

## Known non-conformances

This POC ships with one documented non-conformance against the UAPF
specification, retained deliberately for the POC and tracked here.

**Authored Diagram Interchange (§07-package-format.md).** §07 makes DI a
MUST for cornerstone BPMN/DMN/CMMN files and explicitly forbids automatic
layout generation as a substitute:

> "An implementation MUST NOT rely on automatic layout generation as a
> substitute for authored DI: a generated layout is not the authored
> layout and is not deterministic across tools."

The DI in the five `.bpmn` files (`fg3-1`, `fg3-4`, `fg3-5`, and the two
transcoder samples under `tools/register-transcoder/sample-output/`) is
produced by `tools/register-transcoder/bpmn_di.py` using a swim-lane
left-to-right auto-layout. It renders the diagrams in bpmn.io, Camunda
Modeler and ProcessGit's web view, but it is not authored DI and is
therefore non-conformant under §07. The auto-layout is kept so the POC
artefacts are reviewable visually; the conformant path is to open each
`.bpmn` in a conforming OMG modeler (Camunda Modeler, bpmn-js Studio) and
re-save, which emits authored DI. The DMN files are unaffected: §07
exempts DMNs whose only logic is decision tables, and all three workspace
DMNs are exactly that.

## Implications for the AI regulatory sandbox

The pipeline has four properties that bear on the sandbox's evaluation.

- *Provenance*: every executable step has a mechanical trace back to a
  register row, and the refinement layer is recorded in ownership metadata.
  A reviewer can audit either pass independently.
- *Versionability*: the register is itself versioned (publication dates,
  changelog); the workspace is git-versioned on ProcessGit; the packages
  carry lifecycle metadata. A register change triggers a re-transcode and
  a diffable refinement.
- *Separability of judgement*: what the register says and what the
  refinement adds are mechanically distinguishable. The skeleton is
  reproducible from the register alone, and the curator's contribution is
  exactly the diff.
- *Coverage*: the same tool applies unchanged to FG1, FG2, FG4, FG5 and
  FG6, because the parser locates headers by content. Three of FG3's nine
  sub-processes have curated executables in this POC (FG3-1, FG3-4,
  FG3-5). Of the remaining six, only three (FG3-2, FG3-3, FG3-6) are
  represented in the workspace, as L3 composition stubs. The other five
  function groups are untouched beyond their L2 manifests.

The sandbox question is whether AI-assisted transcription of regulatory
processes into executable, signable artefacts is feasible at production
scale. This workspace is one positive existence proof: small,
end-to-end, with both passes shipped and reproducible.

## Limitations and next steps

The POC has known gaps.

- *Semantic validation* is structural only: the packages validate against
  UAPF schemas and graph-integrity rules, not yet against accounting-law
  semantics. There is no automated check that, for instance, the
  *parnesums* rule complies with the relevant MK noteikumi or that the
  deadlines align with *Grāmatvedības likums* 28.p(5). A second validator
  layer (rule coverage against the cited statutes) is the obvious next
  step.
- *Engine execution* is in scope for the uapf-engine in isolation but not
  yet integrated with a Sledger host that would run the packages against
  real accounting state.
- *Coverage*: the POC takes three sub-processes to L4; production-grade
  coverage of FG3 alone needs six more, and FG1, FG2, FG4, FG5 and FG6 are
  still untouched.
- *Refinement automation* is currently human-driven; quantifying the
  AI-assisted portion of Pass 2 (applying an LLM to the skeleton and
  measuring the curator's remaining diff) is the natural sandbox
  experiment that follows.

---

References inside the workspace: `tools/register-transcoder/README.md` for
the transcoder's CLI and register-format assumptions;
`processes/fg3-1`, `processes/fg3-4`, `processes/fg3-5` for the curated
executables; `tools/register-transcoder/sample-output/` for the skeletons
the comparisons in this note refer to.