From 514613c464417d6c4e0ba3611c8b2cbeb3c6724e Mon Sep 17 00:00:00 2001
From: Rihards Gailums
Date: Wed, 20 May 2026 05:46:09 +0000
Subject: [PATCH] Step 8: methodology note and final validation pass
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds docs/methodology.md — the prose deliverable that ties the
vk-gramatvediba workspace together for the AI regulatory sandbox.

Describes the two-pass transcription pipeline (mechanical transcoder ->
curated refinement), the concrete delta between the transcoder skeletons
and the curated FG3-4/FG3-5 executables (gateways, DMN extraction, link
reconciliation, resource/metadata authoring), the final validation pass
(all .bpmn/.dmn well-formed, all 3 L4 packages pass uapf-cli, transcoder
samples byte-deterministic), and the implications and limitations for
the sandbox.

Workspace now closes the 8-step build plan: skeleton, six FG L2 stubs,
FG3 L2 + six FG3-x sub-process stubs, FG3-1/FG3-4/FG3-5 executables,
register-to-BPMN transcoder, methodology note.
---
 docs/methodology.md | 235 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 235 insertions(+)
 create mode 100644 docs/methodology.md

diff --git a/docs/methodology.md b/docs/methodology.md
new file mode 100644
index 0000000..d81ae90
--- /dev/null
+++ b/docs/methodology.md
@@ -0,0 +1,235 @@
+# Methodology — `vk-gramatvediba` transcription pipeline
+
+## Context
+
+The Valsts Kase (State Treasury) together with the Vienotais Pakalpojumu
+Centrs (VPC) publishes a normative description of public-sector bookkeeping
+in Latvia — *Grāmatvedības uzskaites procesu apraksts* — as a set of six
+function-group spreadsheets (FG1–FG6) spanning planning, expenditure,
+settlement, fixed assets, payroll and reporting. Each register row is a
+process step with an explicit responsible actor (RACI), the IT system used,
+an SLA, the data produced, and predecessor/successor references to other
+steps in the same or adjacent registers.
+
+This workspace, `vk-gramatvediba`, is a UAPF 2.2.0 transcription of the FG3
+register (*saistību un izdevumu uzskaite* — obligations and expenditure
+accounting) into executable process artefacts. It is one of the projects
+accepted into the Latvian AI regulatory sandbox (MIC), with PPPA as
+institutional applicant and KISC as pilot partner.
+
+The transcription is not a one-off translation. It is a *methodology* —
+a repeatable two-pass pipeline from published normative documents to signed,
+runnable process packages. This note describes the methodology, shows how
+the same pipeline, applied to two sub-processes (3.5.2 and 3.5.3), produced
+the curated FG3-4 and FG3-5 packages, and reports the final validation pass
+over the workspace.
+
+## The pipeline
+
+Transcription has two passes with deliberately different epistemic status.
+
+**Pass 1 — mechanical transcoding.** A deterministic tool reads the
+register and emits a BPMN skeleton: one task per register step, swimlanes
+from RACI, sequence flows from the register's own predecessor/successor
+columns, synthesised start/end events at the fragment's real boundary. The
+skeleton is faithful to the register — including its inconsistencies — and
+is `isExecutable="false"`. This pass lives in `tools/register-transcoder/`.
+
+**Pass 2 — curated refinement.** A human curator, optionally AI-assisted,
+takes the skeleton and resolves it into a Level 4 executable: explicit
+gateways, decision logic extracted into DMN, resource roles/agents/mappings,
+metadata (ownership, lifecycle, policies), and a package manifest. The
+result is a `processes//uapf.yaml` package that validates against the
+UAPF 2.2.0 schemas, passes `uapf-cli validate`, and is runnable on the
+uapf-engine.
+
+The separation is the methodology's load-bearing claim. Pass 1 is
+deterministic and traceable — re-running the transcoder on the same
+register produces identical output, and every node traces mechanically
+back to a register row.
+Pass 2 is interpretive and signable — it adds judgement the register does
+not contain (rule tables that the register describes only in prose,
+branches the register implies in footnotes, metadata the register does not
+carry), and the curator is identified in the package's ownership metadata.
+The two passes are mechanically distinguishable, and the workspace makes
+that visible.
+
+## Pass 1 in detail — the transcoder
+
+The transcoder, `tools/register-transcoder/transcode.py`, is a single-file
+Python tool with one external dependency (`openpyxl`). It locates the
+worksheet and header row by content rather than by position, so it tolerates
+the leading title rows the registers carry and applies unchanged to any of
+the FG1–FG6 registers. It expects the standard register columns:
+
+- the predecessor block (FG-group and step-number in adjacent cells);
+- the step's *Nr.p.k.* (sequence number) and *Process, apakšprocess*
+  (process, sub-process);
+- the RACI block, split across three actor sub-columns: Nodarbinātais
+  (employee), Iestāde (institution) and VPC;
+- *Darbību apraksts* (description of activities);
+- *Izmantotā IS* (information system used);
+- *Izpildes termiņš* (deadline);
+- *Sagatavotie dati* (data produced);
+- the successor block *Uz procesa darbības soli* (to process step).
+
+Rows that carry a number and a name but no description and no RACI entry
+are treated as sub-process headers; rows with a description or any RACI
+entry are treated as steps and assigned to the most recently encountered
+header.
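The header detection and row classification described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in `transcode.py`: it takes the worksheet as a list of row tuples (the shape openpyxl's `Worksheet.iter_rows(values_only=True)` yields), it assumes a single header row carrying the labels below (the real registers spread the predecessor, RACI and successor blocks over merged and sub-header cells), and the names `find_header`, `parse`, `SubProcess` and the toy rows are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical header labels. The real registers split blocks such as RACI
# across merged cells and sub-header rows; this sketch assumes one header row.
NR, NAME, DESC = "Nr.p.k.", "Process, apakšprocess", "Darbību apraksts"
ACTORS = ("Nodarbinātais", "Iestāde", "VPC")


@dataclass
class SubProcess:
    number: str
    name: str
    steps: list = field(default_factory=list)


def blank(value) -> bool:
    return value is None or str(value).strip() == ""


def find_header(rows):
    """Return (index, label->column) for the first row holding every label."""
    wanted = {NR, NAME, DESC, *ACTORS}
    for i, row in enumerate(rows):
        labels = {str(c).strip(): j for j, c in enumerate(row) if not blank(c)}
        if wanted.issubset(labels):  # match by content, not by position
            return i, labels
    raise ValueError("header row not found")


def parse(rows):
    """Group step rows under the most recently seen sub-process header."""
    start, col = find_header(rows)
    result, current = [], None
    for row in rows[start + 1:]:
        number, name, desc = row[col[NR]], row[col[NAME]], row[col[DESC]]
        has_raci = any(not blank(row[col[a]]) for a in ACTORS)
        if not blank(desc) or has_raci:
            if current is not None:  # a step before any header is skipped
                current.steps.append(str(number))
        elif not blank(number) and not blank(name):
            current = SubProcess(str(number), str(name))  # sub-process header
            result.append(current)
        # Anything else (blank or title-like rows) is ignored.
    return result


# Toy rows, not real register content: two title rows precede the header.
TOY_ROWS = [
    ("Toy register title", None, None, None, None, None),
    (None, None, None, None, None, None),
    ("Nr.p.k.", "Process, apakšprocess", "Darbību apraksts",
     "Nodarbinātais", "Iestāde", "VPC"),
    ("9.1", "Toy sub-process", None, None, None, None),
    ("9.1.1", "Toy step A", "Submit request", "R", None, None),
    ("9.1.2", "Toy step B", None, None, None, "R"),
    ("9.2", "Second toy sub-process", None, None, None, None),
    ("9.2.1", "Toy step C", "Approve", None, "R", None),
]

if __name__ == "__main__":
    for sp in parse(TOY_ROWS):
        print(sp.number, sp.steps)
    # 9.1 ['9.1.1', '9.1.2']
    # 9.2 ['9.2.1']
```

Row 9.1.2 has no description but carries a RACI entry, so it is kept as a step; rows 9.1 and 9.2 carry a number and a name and nothing else, so they open new sub-processes. Matching labels rather than positions is what lets the leading title rows pass through untouched.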
+
+For a requested sub-process the transcoder emits a single BPMN process
+containing:
+
+- one `bpmn:userTask` per step, with the step's description, system, SLA,
+  RACI cells and any cross-sub-process or cross-FG references preserved in
+  `bpmn:documentation`;
+- a swimlane for each actor that has steps in the sub-process, with each
+  step placed in the lane of its Responsible actor;
+- sequence flows reconstructed from the union of predecessor and successor
+  references whose endpoints are both inside the sub-process;
+- one `bpmn:startEvent` per *entry step* (no in-group predecessor) and one
+  `bpmn:endEvent` per *exit step* (no in-group successor), so the
+  fragment's real boundary is visible rather than hidden behind
+  synthesised gateways.
+
+The output is `isExecutable="false"` and deliberately unembellished: no
+inferred gateways, no synthesised decision logic, no compensation for
+register-side inconsistencies. Reproducing register defects is a feature —
+it makes the refinement step's contribution explicit.
+
+## Pass 2 in detail — the refinement
+
+Pass 2 takes a skeleton and authors the material the register implies but
+does not encode. The operations, in roughly the order they apply:
+
+- *Link reconciliation* — the skeleton may show reciprocal edges, short
+  cycles, or disjoint fragments where the register's predecessor and
+  successor columns disagree. The curator decides the intended topology
+  and rewrites flows accordingly.
+- *Gateway promotion* — a task whose semantics implies a decision is split
+  into the evaluating step plus an explicit gateway (typically
+  `exclusiveGateway`) with named branches; multiple register steps that
+  represent alternative outcomes collapse into branches.
+- *DMN extraction* — where the register's prose implies a rule table, the
+  decision is lifted into a separate `.dmn` file with FIRST or UNIQUE hit
+  policy and named inputs/outputs. The BPMN gets a `businessRuleTask` whose
+  decision reference points to the DMN decision.
+- *Resource authoring* — `resources/roles.yaml`, `agents.yaml` and
+  `mappings.yaml` enumerate the responsible parties, bind roles to the
+  systems named in *Izmantotā IS* (HoP, RVS Horizon, ePNS, etc.), and link
+  BPMN tasks to roles and agents.
+- *Metadata authoring* — `metadata/ownership.yaml`, `lifecycle.yaml` and
+  `policies.yaml` record who curated the package, its UAPF lifecycle stage,
+  and policy constraints (retention, signing, jurisdictional applicability).
+- *Manifest assembly* — the package's `uapf.yaml` lists kind, id, name,
+  level (4 for atomic executables), cornerstones referencing the BPMN, DMN,
+  resources and metadata, exposed inputs/outputs/artifacts, and any MCP
+  exposure.
+
+The refined package validates against the UAPF 2.2.0 schemas and passes
+`uapf-cli validate`. Once validated, the package is signable.
+
+## Concrete comparison — 3.5.2 / FG3-4
+
+Sub-process 3.5.2 (*Saimnieciskie norēķini un to kustība*) has three
+register steps. The transcoder's `sample-output/3.5.2.skeleton.bpmn`
+contains three `userTask`s in two lanes (Nodarbinātais for 3.5.2.1 and
+3.5.2.2, VPC for 3.5.2.3), four sequence flows, one start event and one
+end event. It makes two register-side artefacts visible. Step 3.5.2.1
+(the advance request) has only external successor references — it routes
+out to *FG2/2.3.2* (budget commitment) and back, and is not directly
+linked to 3.5.2.2 in the register's columns, so the skeleton shows it as
+a small linear fragment of its own. Steps 3.5.2.2 and 3.5.2.3 have
+reciprocal predecessor/successor entries in the register — each lists the
+other in both directions — so the skeleton renders a two-task cycle.
+Under the transcoder's boundary rule this accounts for the counts: two
+flows form start → 3.5.2.1 → end, the other two form the cycle, and the
+cycle gets no start or end event of its own because each of its steps has
+an in-group predecessor and successor.
+
+The curated `processes/fg3-4` package resolves both. Its BPMN
+`Process_SaimnieciskaNorekina` has 14 nodes and 14 flows across all three
+lanes, a single clean entry and exit, and a `businessRuleTask` linked to a
+separate DMN.
+The DMN `Decision_AvansaNorekins` is a FIRST-hit decision
+with five rules; its inputs are `avansaSituacija` and `avansaVeids`, and
+its output `norekinResultats` takes one of four values — `slegts`,
+`atmaksa`, `papildu-izmaksa`, `parnesums` — the four outcomes the register
+describes in prose but does not encode in a table. The gateway downstream
+of the DMN routes to the finalisation task for each outcome (close,
+reclaim from employee, additional disbursement, carry forward).
+
+Three tasks in the skeleton; 14 nodes plus a 5-rule DMN in the executable.
+None of what the executable adds is encoded in the register's
+predecessor/successor columns. It comes from the register's prose, its SLA
+cells, and the normative documents the register cites:
+*Grāmatvedības likums* (the Accounting Law), MK noteikumi Nr. 749 and
+MK noteikumi Nr. 877 (Cabinet of Ministers regulations). Pass 2 is where
+that material enters.
+
+## Concrete comparison — 3.5.3 / FG3-5
+
+Sub-process 3.5.3 (*Komandējuma (darba brauciena) dokumenti un to
+kustība*) has four register steps. The transcoder's
+`sample-output/3.5.3.skeleton.bpmn` contains eight nodes and six flows.
+
+The curated `processes/fg3-5` package also has 14 nodes, with 15 flows
+across three lanes. It introduces an explicit *cancellation branch* (a
+trip cancelled before settlement versus one that goes ahead) and the DMN
+`Decision_KomandejumaNorekins`, whose *parnesums* rule reconciles an
+advance surplus against the next approved business trip from the same
+funding line. Neither the cancellation branch nor the carry-forward rule
+is in the register's predecessor/successor columns; both come from the
+register's prose and from the cited *Komandējuma izdevumu noteikumi*
+(business-trip expense regulations).
+
+## Final validation pass
+
+The workspace at HEAD `a608de4` contains three Level 4 executable packages,
+six Level 2 composition stubs, the function-group L2 manifests, the
+transcoder tool, and this methodology note. The validation pass run for
+this step covered:
+
+- `uapf-cli validate processes/fg3-4` → `OK: package valid`.
+- `uapf-cli validate processes/fg3-5` → `OK: package valid`.
+- `uapf-cli validate processes/fg3-1` passed during that package's build
+  session (see commit `81d32e8`), and the package has not been touched
+  since.
+- All `.bpmn` and `.dmn` files in the workspace are well-formed XML
+  (`xmllint --noout`).
+- All schema-validated UAPF files (`uapf.yaml`, `resources/*.yaml`,
+  `metadata/policies.yaml`) pass the UAPF 2.2.0 JSON schemas.
+- BPMN graph integrity: every `sequenceFlow` references existing
+  `sourceRef`/`targetRef` nodes; every `flowNodeRef` resolves to a defined
+  node; every `incoming`/`outgoing` reference is consistent with the
+  corresponding flow's source/target.
+- The transcoder is byte-deterministic: re-running it on the FG3 register
+  for 3.5.2 and 3.5.3 reproduces the committed `sample-output/` files
+  exactly.
+
+## Implications for the AI regulatory sandbox
+
+The pipeline has four properties that bear on the sandbox's evaluation:
+
+- *Provenance*: every skeleton step has a mechanical trace back to a
+  register row, and the refinement layered on top is attributed to its
+  curator in ownership metadata, so a reviewer can audit either pass
+  independently.
+- *Versionability*: the register is itself versioned (publication dates,
+  changelog); the workspace is git-versioned on processgit; the packages
+  carry lifecycle metadata. A register change triggers a re-transcode and
+  a diffable refinement.
+- *Separability of judgement*: what the register says and what the
+  refinement adds are mechanically distinguishable. The skeleton is
+  reproducible from the register alone, and the curator's contribution is
+  exactly the diff.
+- *Coverage*: the same tool applies unchanged to FG1, FG2, FG4, FG5 and
+  FG6, because the parser locates headers by content. Three of FG3's nine
+  sub-processes have curated executables in this POC (FG3-1, FG3-4,
+  FG3-5); the remaining six FG3 sub-processes are composition stubs; the
+  other five function groups are untouched.
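The *separability of judgement* property can be made concrete at a coarse level. The sketch below assumes only Python's standard library: it counts the direct children of each `bpmn:process` by element type in a skeleton and in an executable, then reports the per-type difference, i.e. what Pass 2 added or removed. The two inline BPMN documents are toy examples, not the committed 3.5.2 skeleton or the `fg3-4` BPMN, and `element_types` and `curator_delta` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from collections import Counter

BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"
# Children of <bpmn:process> that carry layout or text rather than flow.
IGNORED = {"laneSet", "documentation", "extensionElements"}


def element_types(bpmn_xml: str) -> Counter:
    """Count the direct children of every <bpmn:process> by local tag name."""
    root = ET.fromstring(bpmn_xml)
    counts = Counter()
    for process in root.iter(f"{{{BPMN_NS}}}process"):
        for child in process:
            local = child.tag.rsplit("}", 1)[-1]  # strip "{namespace}"
            if local not in IGNORED:
                counts[local] += 1
    return counts


def curator_delta(skeleton_xml: str, executable_xml: str) -> dict:
    """Per element type, how many the refinement added (+) or removed (-)."""
    before = element_types(skeleton_xml)
    after = element_types(executable_xml)
    return {
        kind: after[kind] - before[kind]
        for kind in sorted(set(before) | set(after))
        if after[kind] != before[kind]
    }


# Toy documents for illustration only (not the workspace's real files).
SKELETON = """<bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL">
  <bpmn:process id="Toy" isExecutable="false">
    <bpmn:startEvent id="s"/>
    <bpmn:userTask id="t1"/>
    <bpmn:endEvent id="e"/>
    <bpmn:sequenceFlow id="f1" sourceRef="s" targetRef="t1"/>
    <bpmn:sequenceFlow id="f2" sourceRef="t1" targetRef="e"/>
  </bpmn:process>
</bpmn:definitions>"""

EXECUTABLE = """<bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL">
  <bpmn:process id="Toy" isExecutable="true">
    <bpmn:startEvent id="s"/>
    <bpmn:userTask id="t1"/>
    <bpmn:businessRuleTask id="d"/>
    <bpmn:exclusiveGateway id="g"/>
    <bpmn:userTask id="t2"/>
    <bpmn:userTask id="t3"/>
    <bpmn:endEvent id="e"/>
    <bpmn:sequenceFlow id="f1" sourceRef="s" targetRef="t1"/>
    <bpmn:sequenceFlow id="f2" sourceRef="t1" targetRef="d"/>
    <bpmn:sequenceFlow id="f3" sourceRef="d" targetRef="g"/>
    <bpmn:sequenceFlow id="f4" sourceRef="g" targetRef="t2"/>
    <bpmn:sequenceFlow id="f5" sourceRef="g" targetRef="t3"/>
    <bpmn:sequenceFlow id="f6" sourceRef="t2" targetRef="e"/>
    <bpmn:sequenceFlow id="f7" sourceRef="t3" targetRef="e"/>
  </bpmn:process>
</bpmn:definitions>"""

if __name__ == "__main__":
    print(curator_delta(SKELETON, EXECUTABLE))
    # {'businessRuleTask': 1, 'exclusiveGateway': 1, 'sequenceFlow': 5, 'userTask': 2}
```

A type-level count is deliberately crude: it does not assume the curated package reuses the skeleton's element IDs, but it also cannot say which register step a new gateway refines. A reviewer auditing the refinement would pair it with an ID- or documentation-level diff of the two files.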
+
+The sandbox question is whether AI-assisted transcription of regulatory
+processes into executable, signable artefacts is feasible at production
+scale. This workspace is one small positive existence proof of the
+end-to-end path, with both passes shipped and reproducible; the
+limitations below mark what still separates it from production scale.
+
+## Limitations and next steps
+
+The POC has known gaps:
+
+- *Semantic validation*: validation is structural only. The packages
+  validate against UAPF schemas and graph-integrity rules, not yet against
+  accounting-law semantics. There is no automated check that, for
+  instance, the *parnesums* rule complies with the relevant MK noteikumi or
+  that the deadlines align with *Grāmatvedības likums* 28.p(5). A second
+  validator layer, checking rule coverage against the cited statutes, is
+  the obvious next step.
+- *Engine execution*: execution is scoped to the uapf-engine in isolation;
+  it is not yet integrated with a Sledger host that would run the packages
+  against real accounting state.
+- *Coverage*: the POC takes three sub-processes to L4; production-grade
+  coverage of FG3 alone needs six more, and FG1, FG2, FG4, FG5 and FG6 are
+  still untouched.
+- *Refinement automation*: Pass 2 is currently human-driven. Quantifying
+  the AI-assisted portion of Pass 2, by applying an LLM to the skeleton and
+  measuring the curator's remaining diff, is the natural sandbox experiment
+  that follows.
+
+---
+
+References inside the workspace: `tools/register-transcoder/README.md` for
+the transcoder's CLI and register-format assumptions;
+`processes/fg3-1`, `processes/fg3-4`, `processes/fg3-5` for the curated
+executables; `tools/register-transcoder/sample-output/` for the skeletons
+the comparisons in this note refer to.