Adds docs/methodology.md — the prose deliverable that ties the vk-gramatvediba workspace together for the AI regulatory sandbox. Describes the two-pass transcription pipeline (mechanical transcoder -> curated refinement), the concrete delta between the transcoder skeletons and the curated FG3-4/FG3-5 executables (gateways, DMN extraction, link reconciliation, resource/metadata authoring), the final validation pass (all .bpmn/.dmn well-formed, all 3 L4 packages pass uapf-cli, transcoder samples byte-deterministic), and the implications and limitations for the sandbox. Workspace now closes the 8-step build plan: skeleton, six FG L2 stubs, FG3 L2 + six FG3-x sub-process stubs, FG3-1/FG3-4/FG3-5 executables, register-to-BPMN transcoder, methodology note.
236 lines
13 KiB
Markdown
# Methodology — `vk-gramatvediba` transcription pipeline

## Context

The Valsts Kase (State Treasury) together with the Vienotais Pakalpojumu
Centrs (VPC) publishes a normative description of public-sector bookkeeping
in Latvia — *Grāmatvedības uzskaites procesu apraksts* — as a set of six
function-group spreadsheets, referred to below as registers (FG1–FG6),
spanning planning, expenditure, settlement, fixed assets, payroll and
reporting. Each register row is a process step with an explicit responsible
actor (RACI), the IT system used, an SLA, the data produced, and
predecessor/successor references to other steps in the same or adjacent
registers.

This workspace, `vk-gramatvediba`, is a UAPF 2.2.0 transcription of the FG3
register (*saistību un izdevumu uzskaite* — obligations and expenditure
accounting) into executable process artefacts. It is one of the projects
accepted into the Latvian AI regulatory sandbox (MIC), with PPPA as
institutional applicant and KISC as pilot partner.

The transcription is not a one-off translation. It is a *methodology* —
a repeatable two-pass pipeline from published normative documents to signed,
runnable process packages. This note describes the methodology, shows how
the same pipeline applied to two sub-processes (3.5.2 and 3.5.3) produced
the curated FG3-4 and FG3-5 packages, and reports the final validation pass
over the workspace.

## The pipeline

Transcription has two passes with deliberately different epistemic status.

**Pass 1 — mechanical transcoding.** A deterministic tool reads the
register and emits a BPMN skeleton: one task per register step, swimlanes
from RACI, sequence flows from the register's own predecessor/successor
columns, synthesised start/end events at the fragment's real boundary. The
skeleton is faithful to the register — including its inconsistencies — and
is `isExecutable="false"`. This pass lives in `tools/register-transcoder/`.

**Pass 2 — curated refinement.** A human curator, optionally AI-assisted,
takes the skeleton and resolves it into a Level 4 executable: explicit
gateways, decision logic extracted into DMN, resource roles/agents/mappings,
metadata (ownership, lifecycle, policies), and a package manifest. The
result is a `processes/<id>/uapf.yaml` package that validates against the
UAPF 2.2.0 schemas and against `uapf-cli`, and is runnable on the
uapf-engine.

The separation is the methodology's load-bearing claim. Pass 1 is
deterministic and traceable — re-running the transcoder on the same
register produces identical output, and every node has a mechanical
provenance to a register row. Pass 2 is interpretive and signable — it
adds judgement the register does not contain (rule tables that the register
describes only in prose, branches the register implies in footnotes,
metadata the register does not carry), and the curator is identified in the
package's ownership metadata. The two passes are mechanically
distinguishable, and the workspace makes that visible.

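The determinism claim for Pass 1 can be checked by hashing the output of repeated runs. The following is a minimal sketch of such a check, not the workspace's actual test harness: `fake_transcode` is a hypothetical stand-in for invoking the transcoder, and the helper names are assumptions made for illustration.

```python
import hashlib
from typing import Callable


def sha256_hex(data: bytes) -> str:
    """Hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def is_byte_deterministic(produce: Callable[[], bytes], runs: int = 3) -> bool:
    """True if `produce` returns byte-identical output on every run."""
    digests = {sha256_hex(produce()) for _ in range(runs)}
    return len(digests) == 1


def matches_committed(produce: Callable[[], bytes], committed: bytes) -> bool:
    """True if a fresh run reproduces the committed bytes exactly."""
    return sha256_hex(produce()) == sha256_hex(committed)


def fake_transcode() -> bytes:
    """Hypothetical stand-in for the transcoder (assumption, not the real tool).

    Emitting steps in sorted order is one way to keep output stable
    regardless of the order in which rows were collected.
    """
    steps = {"3.5.2.2": "second", "3.5.2.1": "first"}
    lines = [f'<task id="{number}"/>' for number in sorted(steps)]
    return "\n".join(lines).encode("utf-8")


if __name__ == "__main__":
    print(is_byte_deterministic(fake_transcode))
```

In a real check, `produce` would run the transcoder on the register and return the skeleton's bytes, and `committed` would be read from the corresponding file under `sample-output/`.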
## Pass 1 in detail — the transcoder

The transcoder, `tools/register-transcoder/transcode.py`, is a single-file
Python tool with one external dependency (`openpyxl`). It locates the
worksheet and header row by content rather than by position, so it tolerates
the leading title rows the registers carry and applies unchanged to any of
the FG1–FG6 registers. It expects the standard register columns: the
predecessor block (FG-group and step-number in adjacent cells), the step's
*Nr.p.k.*, *Process, apakšprocess*, the RACI block split across the three
actor sub-columns (Nodarbinātais / Iestāde / VPC), *Darbību apraksts*,
*Izmantotā IS*, *Izpildes termiņš*, *Sagatavotie dati*, and the successor
block *Uz procesa darbības soli*.

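Locating the header row by content can be sketched as follows. This is an illustration under stated assumptions, not the code in `transcode.py`: the function names and the choice of which column titles to require are hypothetical, and the rows are plain sequences of cell values (with `openpyxl`, such rows could be obtained from `worksheet.iter_rows(values_only=True)`).

```python
from typing import Optional, Sequence

# A subset of the column titles named in the text; which titles the real
# tool keys on is an assumption made for this sketch.
REQUIRED_HEADERS = ("Nr.p.k.", "Darbību apraksts", "Izmantotā IS")


def normalise(cell: object) -> str:
    """Collapse whitespace and case so header matching is tolerant."""
    if cell is None:
        return ""
    return " ".join(str(cell).split()).casefold()


def find_header_row(
    rows: Sequence[Sequence[object]],
    required: Sequence[str] = REQUIRED_HEADERS,
) -> Optional[int]:
    """Return the index of the first row containing every required title."""
    wanted = {normalise(title) for title in required}
    for index, row in enumerate(rows):
        present = {normalise(cell) for cell in row}
        if wanted <= present:
            return index
    return None


if __name__ == "__main__":
    sheet = [
        ["Grāmatvedības uzskaites procesu apraksts", None, None, None],
        [None, None, None, None],
        ["Nr.p.k.", "Process, apakšprocess", "Darbību  apraksts", "Izmantotā IS"],
    ]
    print(find_header_row(sheet))
```

Because the search ignores position, leading title rows and blank rows above the header do not need special handling.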

Rows that carry a number and a name but no description and no RACI entry
are treated as sub-process headers; rows with a description or any RACI
entry are treated as steps and assigned to the most recently encountered
header. For a requested sub-process the transcoder emits a single BPMN
process containing:

- one `bpmn:userTask` per step, with the step's description, system, SLA,
  RACI cells and any cross-sub-process or cross-FG references preserved in
  `bpmn:documentation`;
- swimlanes for each actor that has steps in the sub-process, with each
  step placed in the lane of its Responsible actor;
- sequence flows reconstructed from the union of predecessor and successor
  references whose endpoints are both inside the sub-process;
- one `bpmn:startEvent` per *entry step* (no in-group predecessor) and one
  `bpmn:endEvent` per *exit step* (no in-group successor), so the
  fragment's real boundary is visible rather than hidden behind synthesised
  gateways.

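The row classification and flow reconstruction described above can be sketched in a few functions. This is a simplified illustration, not the transcoder's implementation: the `Row` type and function names are hypothetical, and the example rows are made-up data shaped like a three-step sub-process in which one step has only external references and two steps reference each other.

```python
from dataclasses import dataclass


@dataclass
class Row:
    number: str
    name: str = ""
    description: str = ""
    raci: tuple = ()          # non-empty RACI cells, if any
    predecessors: tuple = ()  # referenced step numbers
    successors: tuple = ()


def is_header(row: Row) -> bool:
    """A number and a name, but no description and no RACI entry."""
    return bool(row.number and row.name) and not row.description and not any(row.raci)


def group_steps(rows):
    """Assign each step row to the most recently encountered header."""
    groups, current = {}, None
    for row in rows:
        if is_header(row):
            current = row.number
            groups[current] = []
        elif (row.description or any(row.raci)) and current is not None:
            groups[current].append(row)
    return groups


def reconstruct(steps):
    """Union of predecessor and successor references with both ends in-group."""
    inside = {step.number for step in steps}
    flows = set()
    for step in steps:
        flows |= {(p, step.number) for p in step.predecessors if p in inside}
        flows |= {(step.number, s) for s in step.successors if s in inside}
    entries = sorted(inside - {target for _, target in flows})
    exits = sorted(inside - {source for source, _ in flows})
    return sorted(flows), entries, exits


# Illustrative rows only; cell contents are invented for the sketch.
EXAMPLE_ROWS = [
    Row("3.5.2", name="Sub-process header"),
    Row("3.5.2.1", description="step", raci=("R",), successors=("FG2/2.3.2",)),
    Row("3.5.2.2", description="step", raci=("R",),
        predecessors=("3.5.2.3",), successors=("3.5.2.3",)),
    Row("3.5.2.3", description="step", raci=("R",),
        predecessors=("3.5.2.2",), successors=("3.5.2.2",)),
]

if __name__ == "__main__":
    steps = group_steps(EXAMPLE_ROWS)["3.5.2"]
    print(reconstruct(steps))
```

Each entry step would then receive a start event and each exit step an end event. A step with neither an in-group predecessor nor an in-group successor is both, which is how an isolated step ends up as a small fragment of its own.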

The output is `isExecutable="false"` and deliberately unembellished: no
inferred gateways, no synthesised decision logic, no compensation for
register-side inconsistencies. Reproducing register defects is a feature —
it makes the refinement step's contribution explicit.

## Pass 2 in detail — the refinement

Pass 2 takes a skeleton and authors the material the register implies but
does not encode. The operations, in roughly the order they apply:

- *Link reconciliation* — the skeleton may show reciprocal edges, short
  cycles, or disjoint fragments where the register's predecessor and
  successor columns disagree. The curator decides the intended topology
  and rewrites flows accordingly.
- *Gateway promotion* — a task whose semantics implies a decision is split
  into the evaluating step plus an explicit gateway (typically
  `exclusiveGateway`) with named branches; multiple register steps that
  represent alternative outcomes collapse into branches.
- *DMN extraction* — where the register's prose implies a rule table, the
  decision is lifted into a separate `.dmn` file with FIRST or UNIQUE hit
  policy and named inputs/outputs. The BPMN gets a `businessRuleTask` whose
  decision reference points to the DMN decision.
- *Resource authoring* — `resources/roles.yaml`, `agents.yaml` and
  `mappings.yaml` enumerate the responsible parties, bind roles to the
  systems named in *Izmantotā IS* (HoP, RVS Horizon, ePNS, etc.), and link
  BPMN tasks to roles and agents.
- *Metadata authoring* — `metadata/ownership.yaml`, `lifecycle.yaml` and
  `policies.yaml` record who curated the package, its UAPF lifecycle stage,
  and policy constraints (retention, signing, jurisdictional applicability).
- *Manifest assembly* — the package's `uapf.yaml` lists kind, id, name,
  level (4 for atomic executables), cornerstones referencing the BPMN, DMN,
  resources and metadata, exposed inputs/outputs/artifacts, and any MCP
  exposure.

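Link reconciliation itself is a curator's judgement, but the defects it starts from can be listed mechanically. The sketch below, with hypothetical function names and made-up step identifiers, reports reciprocal edges and disjoint fragments in a set of task-to-task flows. It is offered as one possible aid for the curator, not as part of the workspace's tooling.

```python
from collections import defaultdict


def reciprocal_pairs(flows):
    """Pairs of nodes connected in both directions (two-task cycles)."""
    edges = set(flows)
    return sorted(
        {tuple(sorted((a, b))) for a, b in edges if a != b and (b, a) in edges}
    )


def fragments(nodes, flows):
    """Connected components of the graph, ignoring edge direction."""
    neighbours = defaultdict(set)
    for a, b in flows:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, result = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        result.append(sorted(component))
    return result


if __name__ == "__main__":
    # Made-up identifiers: s1 is isolated, s2 and s3 point at each other.
    nodes = ["s1", "s2", "s3"]
    flows = [("s2", "s3"), ("s3", "s2")]
    print(reciprocal_pairs(flows))
    print(fragments(nodes, flows))
```

More than one fragment, or any reciprocal pair, marks a place where the predecessor and successor columns disagree and the intended topology has to be decided by hand.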

The refined package validates against the UAPF 2.2.0 schemas and against
`uapf-cli validate`. Once validated, the package is signable.

## Concrete comparison — 3.5.2 / FG3-4

Sub-process 3.5.2 (*Saimnieciskie norēķini un to kustība*) has three
register steps. The transcoder's `sample-output/3.5.2.skeleton.bpmn`
contains three `userTask`s in two lanes (Nodarbinātais for 3.5.2.1 and
3.5.2.2, VPC for 3.5.2.3), four sequence flows, one start event and one
end event. It makes two register-side artefacts visible. Step 3.5.2.1
(the advance request) has only external successor references — it routes
out to *FG2/2.3.2* (budget commitment) and back, and is not directly
linked to 3.5.2.2 in the register's columns, so the skeleton shows it as
a small linear fragment of its own. Steps 3.5.2.2 and 3.5.2.3 have
reciprocal predecessor/successor entries in the register — each lists the
other in both directions — so the skeleton renders a two-task cycle.

The curated `processes/fg3-4` package resolves both. Its BPMN
`Process_SaimnieciskaNorekina` has 14 nodes and 14 flows across all three
lanes, a single clean entry and exit, and a `businessRuleTask` linked to a
separate DMN. The DMN `Decision_AvansaNorekins` is a FIRST-hit decision
with five rules; its inputs are `avansaSituacija` and `avansaVeids`, and
its output `norekinResultats` takes one of four values — `slegts`,
`atmaksa`, `papildu-izmaksa`, `parnesums` — the four outcomes the register
describes in prose but does not encode in a table. The gateway downstream
of the DMN routes to the finalisation task for each outcome (close,
reclaim from employee, additional disbursement, carry forward).

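A FIRST hit policy means the rules are tried in order and the first match decides the output. The sketch below illustrates that evaluation order only. The input and output names are taken from the text, but the rule conditions and the input values (`balanced`, `surplus`, `deficit`, `one-off`, `recurring`) are invented for the example and are not the contents of `Decision_AvansaNorekins`, whose five rules this note does not list.

```python
from typing import Optional

ANY = None  # wildcard: the rule does not constrain this input

# Hypothetical rules (assumption). Order matters under FIRST: the more
# specific surplus rule must precede the general surplus rule.
RULES = [
    ({"avansaSituacija": "balanced", "avansaVeids": ANY}, "slegts"),
    ({"avansaSituacija": "surplus", "avansaVeids": "recurring"}, "parnesums"),
    ({"avansaSituacija": "surplus", "avansaVeids": ANY}, "atmaksa"),
    ({"avansaSituacija": "deficit", "avansaVeids": ANY}, "papildu-izmaksa"),
]


def evaluate_first_hit(context: dict, rules=RULES) -> Optional[str]:
    """Return the output of the first rule whose conditions all match."""
    for conditions, output in rules:
        if all(
            expected is ANY or context.get(name) == expected
            for name, expected in conditions.items()
        ):
            return output
    return None  # no rule matched


if __name__ == "__main__":
    result = evaluate_first_hit(
        {"avansaSituacija": "surplus", "avansaVeids": "recurring"}
    )
    print({"norekinResultats": result})
```

In the package, the returned value would correspond to `norekinResultats`, on which the downstream gateway branches. Under a UNIQUE hit policy the rules would instead have to be mutually exclusive, so rule order would carry no meaning.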

Three tasks in the skeleton; 14 nodes plus a 5-rule DMN in the executable.
None of what the executable adds is in the register's columns. It is in
the register's prose, in the SLA cells, and in the normative documents the
register cites — *Grāmatvedības likums*, MK noteikumi Nr. 749, MK
noteikumi Nr. 877. Pass 2 is where that material enters.

## Concrete comparison — 3.5.3 / FG3-5

Sub-process 3.5.3 (*Komandējuma (darba brauciena) dokumenti un to
kustība*) has four register steps. The transcoder's
`sample-output/3.5.3.skeleton.bpmn` contains eight nodes and six flows.

The curated `processes/fg3-5` package likewise has 14 nodes and 15 flows
across three lanes. It introduces an explicit *cancellation branch* — a
trip annulled before settlement vs a trip proceeded with — and the DMN
`Decision_KomandejumaNorekins` whose *parnesums* rule reconciles an
advance surplus against the next approved business trip from the same
funding line. Neither the cancellation branch nor the carry-forward rule
is in the register's predecessor/successor columns; both are in the
register's prose and in the cited *Komandējuma izdevumu noteikumi*.

## Final validation pass

The workspace at HEAD `a608de4` contains three Level 4 executable packages,
six Level 2 composition stubs, the function-group L2 manifests, the
transcoder tool, and this methodology note. The validation pass run for
this step covered the following:

- `uapf-cli validate processes/fg3-4` → `OK: package valid`.
- `uapf-cli validate processes/fg3-5` → `OK: package valid`.
- `uapf-cli validate processes/fg3-1` passed in its build session (see
  commit `81d32e8`), and the package has not been touched since.
- All `.bpmn` and `.dmn` files in the workspace are XML-well-formed
  (`xmllint --noout`).
- All schema-validated UAPF files (`uapf.yaml`, `resources/*.yaml`,
  `metadata/policies.yaml`) pass the UAPF 2.2.0 JSON schemas.
- BPMN graph integrity: every `sequenceFlow` references existing
  `sourceRef`/`targetRef` nodes; every `flowNodeRef` resolves to a defined
  node; every `incoming`/`outgoing` reference is consistent with the
  corresponding flow's source/target.
- The transcoder is byte-deterministic: re-running it on the FG3 register
  for 3.5.2 and 3.5.3 reproduces the committed `sample-output/` files
  exactly.

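The graph-integrity item in the list above can be sketched with the standard library's XML parser. This is an assumed reconstruction of such a check, not the script used for the validation pass. It looks only at direct children of each `bpmn:process`, so nested sub-processes are out of scope, and it treats every child with an `id` other than flows and lane sets as a node.

```python
import xml.etree.ElementTree as ET

NS = {"bpmn": "http://www.omg.org/spec/BPMN/20100524/MODEL"}


def check_graph_integrity(xml_text):
    """Return a list of human-readable problems; empty means consistent."""
    root = ET.fromstring(xml_text)
    problems = []
    for process in root.iterfind("bpmn:process", NS):
        nodes, flows = {}, {}
        for child in process:
            tag = child.tag.split("}")[-1]
            child_id = child.get("id")
            if child_id is None:
                continue
            if tag == "sequenceFlow":
                flows[child_id] = (child.get("sourceRef"), child.get("targetRef"))
            elif tag != "laneSet":
                nodes[child_id] = child

        # 1. Every flow endpoint must be a defined node.
        for flow_id, (source, target) in flows.items():
            for role, ref in (("sourceRef", source), ("targetRef", target)):
                if ref not in nodes:
                    problems.append(f"{flow_id}: {role} '{ref}' is not a defined node")

        # 2. Every lane membership entry must resolve to a node.
        for ref in process.iterfind(".//bpmn:flowNodeRef", NS):
            node_id = (ref.text or "").strip()
            if node_id not in nodes:
                problems.append(f"flowNodeRef '{node_id}' is not a defined node")

        # 3. incoming/outgoing must agree with the flow's target/source.
        for node_id, node in nodes.items():
            for tag, end in (("incoming", 1), ("outgoing", 0)):
                for element in node.iterfind(f"bpmn:{tag}", NS):
                    flow_id = (element.text or "").strip()
                    if flow_id not in flows:
                        problems.append(f"{node_id}: {tag} references unknown flow '{flow_id}'")
                    elif flows[flow_id][end] != node_id:
                        problems.append(f"{node_id}: {tag} '{flow_id}' disagrees with the flow")
    return problems
```

Calling `check_graph_integrity(path.read_text(encoding="utf-8"))` for each `.bpmn` file and requiring an empty list would cover the three conditions named in the list. Well-formedness is checked implicitly, because `ET.fromstring` raises `ParseError` on malformed XML.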
## Implications for the AI regulatory sandbox

The pipeline has four properties that bear on the sandbox's evaluation.

- *Provenance*: every executable step has a mechanical trace back to a
  register row, and the refinement layer is recorded in ownership metadata.
  A reviewer can audit either pass independently.
- *Versionability*: the register is itself versioned (publication dates,
  changelog); the workspace is git-versioned on processgit; the packages
  carry lifecycle metadata. A register change triggers a re-transcode and a
  diffable refinement.
- *Separability of judgement*: what the register says and what the
  refinement adds are mechanically distinguishable — the skeleton is
  reproducible from the register alone, and the curator's contribution is
  exactly the diff.
- *Coverage*: the same tool applies unchanged to FG1, FG2, FG4, FG5 and
  FG6 — the parser locates headers by content. Three of FG3's nine
  sub-processes have curated executables in this POC (FG3-1, FG3-4,
  FG3-5); the remaining six FG3 sub-processes are composition stubs; the
  other five function groups are untouched.

The sandbox question is whether AI-assisted transcription of regulatory
processes into executable, signable artefacts is feasible at production
scale. This workspace is one positive existence proof — small,
end-to-end, with both passes shipped and reproducible.

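The idea that the curator's contribution is the diff between skeleton and executable can be made concrete with a line-based diff. The sketch below uses the standard library's `difflib`. The function names are hypothetical, and a textual diff of XML is only a crude proxy for a structural comparison of the two process graphs.

```python
import difflib


def refinement_diff(skeleton: str, curated: str):
    """Unified diff from the skeleton text to the curated text."""
    return list(
        difflib.unified_diff(
            skeleton.splitlines(),
            curated.splitlines(),
            fromfile="skeleton.bpmn",
            tofile="curated.bpmn",
            lineterm="",
        )
    )


def summarise(diff_lines):
    """Count added and removed lines, skipping the two file-header lines."""
    added = sum(
        1 for line in diff_lines
        if line.startswith("+") and not line.startswith("+++")
    )
    removed = sum(
        1 for line in diff_lines
        if line.startswith("-") and not line.startswith("---")
    )
    return added, removed


if __name__ == "__main__":
    skeleton = "<task id='a'/>\n<task id='b'/>"
    curated = "<task id='a'/>\n<gateway id='g'/>\n<task id='b'/>"
    print(summarise(refinement_diff(skeleton, curated)))
```

The same measurement, applied to an AI-drafted refinement and the curator's final version, would give a first rough number for how much of Pass 2 still needs a human.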
## Limitations and next steps

The POC has known gaps.

- *Semantic validation* is structural only: the packages validate against
  UAPF schemas and graph-integrity rules, not yet against accounting-law
  semantics. There is no automated check that, for instance, the
  *parnesums* rule complies with the relevant MK noteikumi or that the
  deadlines align with *Grāmatvedības likums* 28.p(5). A second validator
  layer — rule coverage against the cited statutes — is the obvious next
  step.
- *Engine execution* is in scope for the uapf-engine in isolation but not
  yet integrated with a Sledger host that would run the packages against
  real accounting state.
- *Coverage*: the POC takes three sub-processes to L4; production-grade
  coverage of FG3 alone needs six more, and FG1, FG2, FG4, FG5, FG6 are
  still untouched.
- *Refinement automation* is currently human-driven; quantifying the
  AI-assisted portion of Pass 2 — applying an LLM to the skeleton and
  measuring the curator's remaining diff — is the natural sandbox
  experiment that follows.

---

References inside the workspace: `tools/register-transcoder/README.md` for
the transcoder's CLI and register-format assumptions;
`processes/fg3-1`, `processes/fg3-4`, `processes/fg3-5` for the curated
executables; `tools/register-transcoder/sample-output/` for the skeletons
the comparisons in this note refer to.