Step 8: methodology note and final validation pass

Adds docs/methodology.md — the prose deliverable that ties the vk-gramatvediba workspace together for the AI regulatory sandbox. Describes the two-pass transcription pipeline (mechanical transcoder -> curated refinement), the concrete delta between the transcoder skeletons and the curated FG3-4/FG3-5 executables (gateways, DMN extraction, link reconciliation, resource/metadata authoring), the final validation pass (all .bpmn/.dmn well-formed, all 3 L4 packages pass uapf-cli, transcoder samples byte-deterministic), and the implications and limitations for the sandbox. Workspace now closes the 8-step build plan: skeleton, six FG L2 stubs, FG3 L2 + six FG3-x sub-process stubs, FG3-1/FG3-4/FG3-5 executables, register-to-BPMN transcoder, methodology note.
2026-05-20 05:46:09 +00:00
parent a608de41ad
commit 514613c464
1 changed files with 235 additions and 0 deletions
--- a/docs/methodology.md
+++ b/docs/methodology.md
@@ -0,0 +1,235 @@
+# Methodology — `vk-gramatvediba` transcription pipeline
+
+## Context
+
+The Valsts Kase (State Treasury), together with the Vienotais Pakalpojumu
+Centrs (VPC), publishes a normative description of public-sector bookkeeping
+in Latvia, *Grāmatvedības uzskaites procesu apraksts*, as a set of six
+function-group spreadsheets (FG1–FG6) spanning planning, expenditure,
+settlement, fixed assets, payroll and reporting. Apart from sub-process
+header rows, each register row is a process step with an explicit RACI
+assignment (including the Responsible actor), the IT system used, an SLA,
+the data produced, and predecessor/successor references to other steps in
+the same or adjacent registers.
+
+This workspace, `vk-gramatvediba`, is a UAPF 2.2.0 transcription of the FG3
+register (*saistību un izdevumu uzskaite* — obligations and expenditure
+accounting) into executable process artefacts. It is one of the projects
+accepted into the Latvian AI regulatory sandbox (MIC), with PPPA as
+institutional applicant and KISC as pilot partner.
+
+The transcription is not a one-off translation. It is a *methodology*: a
+repeatable two-pass pipeline from published normative documents to
+signable, runnable process packages. This note describes the methodology,
+shows how the same pipeline, applied to two sub-processes (3.5.2 and
+3.5.3), produced the curated FG3-4 and FG3-5 packages, and reports the
+final validation pass over the workspace.
+
+## The pipeline
+
+Transcription has two passes with deliberately different epistemic status.
+
+**Pass 1 — mechanical transcoding.** A deterministic tool reads the
+register and emits a BPMN skeleton: one task per register step, swimlanes
+from RACI, sequence flows from the register's own predecessor/successor
+columns, synthesised start/end events at the fragment's real boundary. The
+skeleton is faithful to the register — including its inconsistencies — and
+is `isExecutable="false"`. This pass lives in `tools/register-transcoder/`.
+
+**Pass 2 — curated refinement.** A human curator, optionally AI-assisted,
+takes the skeleton and resolves it into a Level 4 executable: explicit
+gateways, decision logic extracted into DMN, resource roles/agents/mappings,
+metadata (ownership, lifecycle, policies), and a package manifest. The
+result is a `processes/<id>/uapf.yaml` package that validates against the
+UAPF 2.2.0 schemas and against `uapf-cli`, and is runnable on the
+uapf-engine.
+
+The separation is the methodology's load-bearing claim. Pass 1 is
+deterministic and traceable — re-running the transcoder on the same
+register produces identical output, and every node has a mechanical
+provenance to a register row. Pass 2 is interpretive and signable — it
+adds judgement the register does not contain (rule tables that the register
+describes only in prose, branches the register implies in footnotes,
+metadata the register does not carry), and the curator is identified in the
+package's ownership metadata. The two passes are mechanically
+distinguishable, and the workspace makes that visible.
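The following is a minimal sketch of how Pass 1 can be deterministic and traceable at the same time. It assumes, hypothetically, two things: BPMN ids are derived from the register step number (`Task_3_5_2_1` for 3.5.2.1), and tasks are serialised in numeric step order. The names `task_id`, `step_key`, `emit_tasks` and `Process_Skeleton` are illustrative and are not taken from `transcode.py`.

```python
import xml.etree.ElementTree as ET

BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"
ET.register_namespace("bpmn", BPMN_NS)


def task_id(step_no: str) -> str:
    """Derive a stable BPMN id from a register step number (hypothetical scheme)."""
    return "Task_" + step_no.replace(".", "_")


def step_key(step_no: str) -> list[int]:
    """Sort key that orders 3.5.2.2 before 3.5.2.10 (numeric, not lexical)."""
    return [int(part) for part in step_no.split(".")]


def emit_tasks(steps: dict[str, str]) -> bytes:
    """Serialise one userTask per step in a fixed order, independent of input order."""
    process = ET.Element(f"{{{BPMN_NS}}}process",
                         {"id": "Process_Skeleton", "isExecutable": "false"})
    for step_no in sorted(steps, key=step_key):
        ET.SubElement(process, f"{{{BPMN_NS}}}userTask",
                      {"id": task_id(step_no), "name": steps[step_no]})
    return ET.tostring(process, encoding="utf-8")


# Two runs over the same steps, given in different input order, yield identical bytes.
first = emit_tasks({"3.5.2.2": "Step B", "3.5.2.1": "Step A"})
second = emit_tasks({"3.5.2.1": "Step A", "3.5.2.2": "Step B"})
assert first == second
```

Sorting numerically rather than lexically keeps 3.5.2.10 after 3.5.2.2. Deriving ids from step numbers lets a reviewer read each node's provenance directly off its id.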
+
+## Pass 1 in detail — the transcoder
+
+The transcoder, `tools/register-transcoder/transcode.py`, is a single-file
+Python tool with one external dependency (`openpyxl`). It locates the
+worksheet and header row by content rather than by position, so it tolerates
+the leading title rows the registers carry and applies unchanged to any of
+the FG1–FG6 registers. It expects the standard register columns: the
+predecessor block (FG-group and step-number in adjacent cells), the step's
+*Nr.p.k.*, *Process, apakšprocess*, the RACI block split across the three
+actor sub-columns (Nodarbinātais / Iestāde / VPC), *Darbību apraksts*,
+*Izmantotā IS*, *Izpildes termiņš*, *Sagatavotie dati*, and the successor
+block *Uz procesa darbības soli*.
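Here is a minimal sketch of locating the header row by content rather than by position. It assumes the worksheet has already been read into rows of cell values, for example with openpyxl's `Worksheet.iter_rows(values_only=True)`. The required labels come from the column list above. Requiring all of them in a single row is an assumed matching rule. `find_header_row` and `column_index` are hypothetical helpers, not the transcoder's functions.

```python
REQUIRED_LABELS = ("Nr.p.k.", "Darbību apraksts", "Izmantotā IS")


def find_header_row(rows) -> int:
    """Return the index of the first row whose cells contain every required label."""
    for idx, row in enumerate(rows):
        cells = [str(cell).strip() for cell in row if cell is not None]
        if all(any(label in cell for cell in cells) for label in REQUIRED_LABELS):
            return idx
    raise ValueError("no header row found")


def column_index(header, label: str) -> int:
    """Return the position of the first header cell containing the label."""
    for pos, cell in enumerate(header):
        if cell is not None and label in str(cell):
            return pos
    raise KeyError(label)


# Leading title rows are skipped because matching is by content, not position.
rows = [
    ("Grāmatvedības uzskaites procesu apraksts", None, None, None),
    (None, None, None, None),
    ("Nr.p.k.", "Process, apakšprocess", "Darbību apraksts", "Izmantotā IS"),
    ("3.5.2", "Saimnieciskie norēķini un to kustība", None, None),
]
header_idx = find_header_row(rows)
```

Applying the same test to each worksheet in turn would also select the worksheet, not just the row.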
+
+Rows that carry a number and a name but no description and no RACI entry
+are treated as sub-process headers; rows with description or any RACI entry
+are treated as steps and assigned to the most recently encountered header.
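The header/step rule can be sketched as follows. Rows are assumed to be dicts with hypothetical keys: `nr`, `name`, `description`, and one key per RACI actor sub-column. Rows that match neither rule, such as blank spacer rows, are skipped. Treating a step that appears before any header as an error is an assumption, not documented transcoder behaviour.

```python
from dataclasses import dataclass, field

# One key per RACI actor sub-column (Nodarbinātais / Iestāde / VPC); names are hypothetical.
RACI_KEYS = ("nodarbinatais", "iestade", "vpc")


@dataclass
class SubProcess:
    number: str
    name: str
    steps: list = field(default_factory=list)


def group_rows(rows: list[dict]) -> list[SubProcess]:
    """Classify rows as sub-process headers or steps; attach steps to the latest header."""
    groups: list[SubProcess] = []
    for row in rows:
        has_raci = any(row.get(key) for key in RACI_KEYS)
        has_desc = bool(row.get("description"))
        if row.get("nr") and row.get("name") and not has_desc and not has_raci:
            groups.append(SubProcess(row["nr"], row["name"]))
        elif has_desc or has_raci:
            if not groups:
                raise ValueError(f"step {row.get('nr')} appears before any header")
            groups[-1].steps.append(row)
        # Anything else (blank or spacer rows) is ignored.
    return groups
```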
+For a requested sub-process, the transcoder emits a single BPMN process
+containing:
+
+- one `bpmn:userTask` per step, with the step's description, system, SLA,
+  RACI cells and any cross-sub-process or cross-FG references preserved in
+  `bpmn:documentation`;
+- one swimlane for each actor that has steps in the sub-process, with each
+  step placed in the lane of its Responsible actor;
+- sequence flows reconstructed from the union of predecessor and successor
+  references whose endpoints are both inside the sub-process;
+- one `bpmn:startEvent` per *entry step* (no in-group predecessor) and one
+  `bpmn:endEvent` per *exit step* (no in-group successor), so the fragment's
+  real boundary is visible rather than hidden behind synthesised gateways.
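The flow and boundary rules can be sketched as a single function. It takes the step numbers of one sub-process together with their raw predecessor and successor references. This is an illustrative reconstruction of the rules described above, not the transcoder's code, and the `Start_`/`End_` naming is hypothetical. References that leave the sub-process are simply dropped here, whereas the transcoder keeps them in `bpmn:documentation`.

```python
def build_skeleton(steps, preds, succs):
    """Return (source, target) flows for one sub-process skeleton.

    steps: step numbers in the sub-process.
    preds, succs: step -> referenced step numbers, possibly outside the sub-process.
    """
    inside = set(steps)
    edges = set()
    for step in steps:
        # A flow exists if either endpoint records it (union of both columns).
        edges.update((p, step) for p in preds.get(step, ()) if p in inside)
        edges.update((step, s) for s in succs.get(step, ()) if s in inside)
    has_in = {target for _, target in edges}
    has_out = {source for source, _ in edges}
    flows = sorted(edges)
    for step in steps:
        if step not in has_in:   # entry step: no in-group predecessor
            flows.append((f"Start_{step}", step))
        if step not in has_out:  # exit step: no in-group successor
            flows.append((step, f"End_{step}"))
    return flows


# Hypothetical steps: 1.1 -> 1.2 is recorded on both rows, 1.2 -> 1.3 only on 1.3.
flows = build_skeleton(
    ["1.1", "1.2", "1.3"],
    preds={"1.2": ["1.1"], "1.3": ["1.2"]},
    succs={"1.1": ["1.2", "FG2/2.3.2"]},
)
```

Because edges are collected into a set, a link recorded in both columns yields one flow rather than two.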
+
+The output is `isExecutable="false"` and deliberately unembellished: no
+inferred gateways, no synthesised decision logic, no compensation for
+register-side inconsistencies. Reproducing register defects is a feature —
+it makes the refinement step's contribution explicit.
+
+## Pass 2 in detail — the refinement
+
+Pass 2 takes a skeleton and authors the material the register implies but
+does not encode. The operations, in roughly the order they apply:
+
+- *Link reconciliation* — the skeleton may show reciprocal edges, short
+  cycles, or disjoint fragments where the register's predecessor and
+  successor columns disagree. The curator decides the intended topology
+  and rewrites flows accordingly.
+- *Gateway promotion* — a task whose semantics implies a decision is split
+  into the evaluating step plus an explicit gateway (typically
+  `exclusiveGateway`) with named branches; multiple register steps that
+  represent alternative outcomes collapse into branches.
+- *DMN extraction* — where the register's prose implies a rule table, the
+  decision is lifted into a separate `.dmn` file with FIRST or UNIQUE hit
+  policy and named inputs/outputs. The BPMN gets a `businessRuleTask` whose
+  decision reference points to the DMN decision.
+- *Resource authoring* — `resources/roles.yaml`, `agents.yaml` and
+  `mappings.yaml` enumerate the responsible parties, bind roles to the
+  systems named in *Izmantotā IS* (HoP, RVS Horizon, ePNS, etc.), and link
+  BPMN tasks to roles and agents.
+- *Metadata authoring* — `metadata/ownership.yaml`, `lifecycle.yaml` and
+  `policies.yaml` record who curated the package, its UAPF lifecycle stage,
+  and policy constraints (retention, signing, jurisdictional applicability).
+- *Manifest assembly* — the package's `uapf.yaml` lists kind, id, name,
+  level (4 for atomic executables), cornerstones referencing the BPMN, DMN,
+  resources and metadata, exposed inputs/outputs/artifacts, and any MCP
+  exposure.
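Among the operations above, DMN extraction depends most directly on hit-policy semantics. The plain-Python sketch below is not a DMN engine and not part of the workspace. It contrasts FIRST, where table order decides among overlapping rules, with UNIQUE, where an overlap is an error. The rules and the `balance` input are hypothetical.

```python
def evaluate(rules, inputs, hit_policy="FIRST"):
    """Evaluate a decision table given as (conditions, output) rules in table order.

    conditions maps an input name to a predicate; a rule matches when all hold.
    """
    if hit_policy not in ("FIRST", "UNIQUE"):
        raise NotImplementedError(hit_policy)
    matches = [output for conditions, output in rules
               if all(test(inputs[name]) for name, test in conditions.items())]
    if hit_policy == "UNIQUE" and len(matches) > 1:
        raise ValueError(f"UNIQUE hit policy violated: {len(matches)} rules match")
    return matches[0] if matches else None


rules = [
    ({"balance": lambda b: b == 0}, "zero"),
    ({"balance": lambda b: b > 0}, "positive"),
    ({"balance": lambda b: b >= 0}, "non-negative"),  # overlaps both rules above
]
```

Under FIRST, `evaluate(rules, {"balance": 5})` returns the earlier of the two matching rules. Under UNIQUE, the same call raises, which is how an overlapping table surfaces as a curation defect.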
+
+The refined package validates against the UAPF 2.2.0 schemas and against
+`uapf-cli validate`. Once validated, the package is signable.
+
+## Concrete comparison — 3.5.2 / FG3-4
+
+Sub-process 3.5.2 (*Saimnieciskie norēķini un to kustība*) has three
+register steps. The transcoder's `sample-output/3.5.2.skeleton.bpmn`
+contains three `userTask`s in two lanes (Nodarbinātais for 3.5.2.1 and
+3.5.2.2, VPC for 3.5.2.3), four sequence flows, one start event and one
+end event. It makes two register-side artefacts visible. Step 3.5.2.1
+(the advance request) has only references outside the sub-process: it
+routes out to *FG2/2.3.2* (budget commitment) and back, and is not
+directly linked to 3.5.2.2 in the register's columns, so the skeleton
+shows it as a small linear fragment of its own (start event, task, end
+event). Steps 3.5.2.2 and 3.5.2.3 have reciprocal predecessor/successor
+entries in the register, each listing the other in both directions, so
+the skeleton renders a two-task cycle with no start or end event attached.
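Defects like these can be flagged mechanically before the curator decides on the intended topology. The minimal sketch below assumes the skeleton's task-to-task edges are available as pairs. It reports reciprocal pairs and weakly connected fragments. The 3.5.2 edges in the example are reconstructed from the description above, not read from the register.

```python
def diagnose(steps, edges):
    """Report reciprocal task pairs and weakly connected fragments of a skeleton.

    edges must be task-to-task pairs whose endpoints are all in steps.
    """
    edge_set = set(edges)
    reciprocal = sorted({tuple(sorted(pair)) for pair in edge_set
                         if pair[0] != pair[1] and (pair[1], pair[0]) in edge_set})
    parent = {step: step for step in steps}

    def find(step):
        # Union-find root lookup with path halving.
        while parent[step] != step:
            parent[step] = parent[parent[step]]
            step = parent[step]
        return step

    for source, target in edge_set:
        parent[find(source)] = find(target)
    fragments = {}
    for step in steps:
        fragments.setdefault(find(step), []).append(step)
    return reciprocal, sorted(fragments.values())


# 3.5.2 as described: 3.5.2.1 has no in-group links; 3.5.2.2 and 3.5.2.3 reference each other.
steps_352 = ["3.5.2.1", "3.5.2.2", "3.5.2.3"]
edges_352 = [("3.5.2.2", "3.5.2.3"), ("3.5.2.3", "3.5.2.2")]
reciprocal, fragments = diagnose(steps_352, edges_352)
```

On this input the sketch reports one reciprocal pair and two fragments, matching the two artefacts the skeleton makes visible.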
+
+The curated `processes/fg3-4` package resolves both. Its BPMN
+`Process_SaimnieciskaNorekina` has 14 nodes and 14 flows across all three
+lanes, a single clean entry and exit, and a `businessRuleTask` linked to a
+separate DMN. The DMN `Decision_AvansaNorekins` is a FIRST-hit decision
+with five rules; its inputs are `avansaSituacija` and `avansaVeids`, and
+its output `norekinResultats` takes one of four values — `slegts`,
+`atmaksa`, `papildu-izmaksa`, `parnesums` — the four outcomes the register
+describes in prose but does not encode in a table. The gateway downstream
+of the DMN routes to the finalisation task for each outcome (close,
+reclaim from employee, additional disbursement, carry forward).
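One consistency check the curator can automate is that the gateway after the DMN has exactly one branch for each value `norekinResultats` can take. In the sketch below, the four output values come from `Decision_AvansaNorekins` as described above. The task ids are hypothetical placeholders for the four finalisation tasks.

```python
# Output values of Decision_AvansaNorekins, as listed in this note.
OUTCOMES = {"slegts", "atmaksa", "papildu-izmaksa", "parnesums"}

# Hypothetical ids for the finalisation tasks behind the exclusive gateway.
ROUTES = {
    "slegts": "Task_Close",
    "atmaksa": "Task_ReclaimFromEmployee",
    "papildu-izmaksa": "Task_AdditionalDisbursement",
    "parnesums": "Task_CarryForward",
}


def gateway_gaps(outcomes, routes):
    """Return (outcomes with no branch, branches with no matching outcome)."""
    return sorted(set(outcomes) - set(routes)), sorted(set(routes) - set(outcomes))


def route(norekin_resultats: str) -> str:
    """Pick the single outgoing branch for a DMN result, as an exclusive gateway would."""
    if norekin_resultats not in ROUTES:
        raise ValueError(f"no gateway branch for {norekin_resultats!r}")
    return ROUTES[norekin_resultats]
```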
+
+Three tasks in the skeleton; 14 nodes plus a 5-rule DMN in the executable.
+None of what the executable adds is in the register's columns. It is in
+the register's prose, in the SLA cells, and in the normative documents the
+register cites — *Grāmatvedības likums*, MK noteikumi Nr. 749, MK
+noteikumi Nr. 877. Pass 2 is where that material enters.
+
+## Concrete comparison — 3.5.3 / FG3-5
+
+Sub-process 3.5.3 (*Komandējuma (darba brauciena) dokumenti un to
+kustība*) has four register steps. The transcoder's
+`sample-output/3.5.3.skeleton.bpmn` contains eight nodes and six flows.
+
+Like FG3-4, the curated `processes/fg3-5` package has 14 nodes (here with
+15 flows) across three lanes. It introduces an explicit *cancellation
+branch* (a trip annulled before settlement vs a trip that goes ahead) and the DMN
+`Decision_KomandejumaNorekins` whose *parnesums* rule reconciles an
+advance surplus against the next approved business trip from the same
+funding line. Neither the cancellation branch nor the carry-forward rule
+is in the register's predecessor/successor columns; both are in the
+register's prose and in the cited *Komandējuma izdevumu noteikumi*.
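The selection step of the carry-forward rule might look like the sketch below. The `Trip` fields are assumptions for illustration, as is reading "next" as the earliest approved trip on the same funding line that starts after the settled one. The authoritative rule lives in `Decision_KomandejumaNorekins` and the cited regulations.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Trip:
    trip_id: str
    funding_line: str
    start: date
    approved: bool


def carry_forward_target(settled: Trip, trips: list[Trip]) -> Optional[Trip]:
    """Return the earliest approved trip on the same funding line starting after `settled`."""
    candidates = [
        t for t in trips
        if t.approved and t.funding_line == settled.funding_line and t.start > settled.start
    ]
    return min(candidates, key=lambda t: t.start, default=None)
```

When no candidate exists, the sketch returns `None`. How the DMN handles that case is not specified here.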
+
+## Final validation pass
+
+The workspace at HEAD `a608de4` contains three Level 4 executable packages,
+six Level 2 composition stubs, the function-group L2 manifests, the
+transcoder tool, and this methodology note. The validation pass run for
+this step:
+
+- `uapf-cli validate processes/fg3-4` → `OK: package valid`.
+- `uapf-cli validate processes/fg3-5` → `OK: package valid`.
+- `uapf-cli validate processes/fg3-1` was not re-run in this pass: it
+  passed when the package was built (commit `81d32e8`), and the package
+  has not been modified since.
+- All `.bpmn` and `.dmn` files in the workspace are XML-well-formed
+  (`xmllint --noout`).
+- All schema-validated UAPF files (`uapf.yaml`, `resources/*.yaml`,
+  `metadata/policies.yaml`) pass the UAPF 2.2.0 JSON schemas.
+- BPMN graph integrity: every `sequenceFlow` references existing
+  `sourceRef`/`targetRef` nodes; every `flowNodeRef` resolves to a defined
+  node; every `incoming`/`outgoing` reference is consistent with the
+  corresponding flow's source/target.
+- The transcoder is byte-deterministic: re-running it on the FG3 register
+  for 3.5.2 and 3.5.3 reproduces the committed `sample-output/` files
+  exactly.
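The graph-integrity checks listed above are simple to express over the BPMN XML. The minimal sketch below uses only the standard library; it is not the script used in this validation pass. An `ET.ParseError` from `ET.fromstring` stands in for the `xmllint --noout` well-formedness check. The three reference checks follow the integrity bullet. The sketch treats every id-bearing direct child of a `process` as a node, except flows, lane sets and documentation, which is a simplification.

```python
import xml.etree.ElementTree as ET

BPMN = "{http://www.omg.org/spec/BPMN/20100524/MODEL}"
NON_NODES = {f"{BPMN}sequenceFlow", f"{BPMN}laneSet", f"{BPMN}documentation"}


def check_bpmn(xml_text: str) -> list[str]:
    """Return graph-integrity problems for every process in a BPMN document.

    ET.fromstring raises ET.ParseError on malformed XML (well-formedness check).
    """
    root = ET.fromstring(xml_text)
    problems = []
    for process in root.iter(f"{BPMN}process"):
        flows = {f.get("id"): f for f in process.findall(f"{BPMN}sequenceFlow")}
        nodes = {el.get("id"): el for el in process
                 if el.get("id") and el.tag not in NON_NODES}
        # 1. Every sequenceFlow points at existing nodes.
        for fid, flow in flows.items():
            for attr in ("sourceRef", "targetRef"):
                if flow.get(attr) not in nodes:
                    problems.append(f"{fid}: {attr}={flow.get(attr)!r} is not a node")
        # 2. Every lane's flowNodeRef resolves.
        for ref in process.iter(f"{BPMN}flowNodeRef"):
            if (ref.text or "").strip() not in nodes:
                problems.append(f"flowNodeRef {ref.text!r} does not resolve")
        # 3. incoming/outgoing agree with the referenced flow's target/source.
        for nid, node in nodes.items():
            for tag, attr in (("incoming", "targetRef"), ("outgoing", "sourceRef")):
                for ref in node.findall(f"{BPMN}{tag}"):
                    flow = flows.get((ref.text or "").strip())
                    if flow is None or flow.get(attr) != nid:
                        problems.append(f"{nid}: {tag} {ref.text!r} is inconsistent")
    return problems
```

Running the check over the text of each `.bpmn` file and failing on any non-empty result mirrors the integrity bullet.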
+
+## Implications for the AI regulatory sandbox
+
+The pipeline has four properties that bear on the sandbox's evaluation:
+
+- *Provenance*: every skeleton node has a mechanical trace back to a
+  register row, and everything the refinement adds is attributed to its
+  curator in ownership metadata. A reviewer can audit either pass
+  independently.
+- *Versionability*: the register is itself versioned (publication dates,
+  changelog); the workspace is git-versioned on processgit; the packages
+  carry lifecycle metadata. A register change triggers a re-transcode and a
+  diffable refinement.
+- *Separability of judgement*: what the register says and what the
+  refinement adds are mechanically distinguishable. The skeleton is
+  reproducible from the register alone, and the curator's contribution is
+  exactly the diff between skeleton and executable.
+- *Coverage*: the same tool applies unchanged to FG1, FG2, FG4, FG5 and
+  FG6, because the parser locates headers by content. Three of FG3's nine
+  sub-processes have curated executables in this POC (FG3-1, FG3-4,
+  FG3-5); the remaining six are composition stubs, and the other five
+  function groups are untouched.
+
+The sandbox question is whether AI-assisted transcription of regulatory
+processes into executable, signable artefacts is feasible at production
+scale. This workspace is one positive existence proof — small,
+end-to-end, with both passes shipped and reproducible.
+
+## Limitations and next steps
+
+The POC has known gaps:
+
+- *Semantic validation* is structural only: the packages validate against
+  UAPF schemas and graph-integrity rules, not yet against accounting-law
+  semantics. There is no automated check that, for instance, the
+  *parnesums* rule complies with the relevant MK noteikumi or that the
+  deadlines align with *Grāmatvedības likums* 28.p(5). A second validator
+  layer, checking rule coverage against the cited statutes, is the obvious
+  next step.
+- *Engine execution* is limited to the uapf-engine in isolation; the
+  packages are not yet integrated with a Sledger host that would run them
+  against real accounting state.
+- *Coverage*: the POC takes three sub-processes to L4; production-grade
+  coverage of FG3 alone needs six more, and FG1, FG2, FG4, FG5 and FG6 are
+  still untouched.
+- *Refinement automation* is currently human-driven. Quantifying the
+  AI-assisted portion of Pass 2, by applying an LLM to the skeleton and
+  measuring the curator's remaining diff, is the natural sandbox
+  experiment that follows.
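The proposed remaining-diff measurement could start from a line-level comparison of the AI-assisted draft with the curated file. The hedged sketch below uses `difflib.SequenceMatcher`. Counting added, removed and unchanged lines is an assumed metric; a structural diff over BPMN nodes may turn out to be more meaningful.

```python
import difflib


def remaining_diff(ai_draft: str, curated: str) -> dict:
    """Count lines the curator added, removed or kept when turning a draft into the curated file."""
    draft_lines, curated_lines = ai_draft.splitlines(), curated.splitlines()
    matcher = difflib.SequenceMatcher(None, draft_lines, curated_lines, autojunk=False)
    counts = {"added": 0, "removed": 0, "unchanged": 0}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            counts["unchanged"] += i2 - i1
        if tag in ("replace", "delete"):
            counts["removed"] += i2 - i1
        if tag in ("replace", "insert"):
            counts["added"] += j2 - j1
    return counts
```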
+
+---
+
+References inside the workspace: `tools/register-transcoder/README.md` for
+the transcoder's CLI and register-format assumptions;
+`processes/fg3-1`, `processes/fg3-4`, `processes/fg3-5` for the curated
+executables; `tools/register-transcoder/sample-output/` for the skeletons
+the comparisons in this note refer to.