Managing Trust in Machine-Generated Analysis
Kenneth Tombs sets out a novel approach to building trust in evidence produced with AI, and calls for help in testing it.
| While testing AI internally, the chief compliance officer of a global financial organisation told the author: “I tested it [an AI] myself using a regulation I am very familiar with. With a basic question it gave me an equally basic and unhelpful answer. However, when I programmed [primed] it to think and act like a compliance officer and gave it ‘personality’ and way of thinking, then it gave more meaningful responses.” |
The genie is out of the bottle and now reads the evidence bundle! Artificial intelligence has arrived in professional practice, not as a distant prospect but as a present reality that courts, regulators, and practitioners are beginning to navigate whether they like it or not. Revisiting a highly complex case from 2006, the author estimated that the reading-in by chambers, which then consumed more than 15 weeks of resource, could now be completed in minutes, with the material immediately available for initial ‘what if’ analysis and timelines. Yet like all genies, AI brings extraordinary power and genuine danger in equal measure. The question facing the legal profession and the judiciary is not whether to put it back in the bottle; that is no longer possible. It is whether we learn quickly enough to ask it the right questions and to extract the value it can offer.
Where the Existing Evidential Approach Falls Short
Courts and professionals are no longer asked whether computers can produce evidence; that question was settled three decades ago. What courts now face is a subtler and less clearly defined problem: how can AI-assisted analysis and conclusions be tested, challenged, and weighed when there is no human expert who can be cross-examined on their veracity?
This is not an abstract question. AI systems are being tested in professional legal workflows: assembling case chronologies, extracting patterns from large document collections, and supporting analytical judgements, tasks that would otherwise take teams of paralegals and professionals months of work. Their presence in legal and regulatory environments is now unavoidable, and the pace of adoption is outstripping the profession’s ability to evaluate it critically.
The difficulty is not one of capability. Contemporary AI systems can process vast bodies of information and assist in reasoning tasks at speeds no human practitioner can match. The difficulty is one of trust: whether the conclusions these systems produce can be understood, tested, and challenged within the robust and disciplined framework that evidential work demands.
The Gap in Existing Evidential Doctrine
Existing evidential doctrine secures provenance and authenticity, but it does not secure the structure of the reasoning that produced a conclusion; that reasoning has traditionally been examined through witnesses. Courts already possess established tools for assessing computer-generated material. They examine where information originated, how it was handled, what analytical methods were applied, and whether conclusions can be explained and challenged. Digital forensic standards, most notably ISO/IEC 27037 (identification and collection of digital evidence) and ISO/IEC 27042 (analysis and interpretation), provide a recognised framework for preserving evidential integrity across these stages.
These doctrines were developed for a world in which computer systems recorded facts. An AI system does something fundamentally different: it selects data, transforms it, applies comparative judgement, and may adapt its behaviour over time. An AI output therefore resembles expert reasoning without an expert witness.
Existing doctrine does not fully address this distinction. Chain-of-custody procedures, forensic protocols, and expert testimony were designed to interrogate human analytical processes, supported by computers that recorded and stored information. When the analytical process itself occurs within a computational environment, operating at speeds and levels of complexity that are difficult for humans to reconstruct after the fact, those tools require extension rather than replacement.
The question is not whether AI can produce useful analysis, but whether that analysis can be made visible, replicable, and open to the same scrutiny that applies to any other form of evidence. AI is potentially transformational, but how, precisely, might that transformation take place?
Why Existing Governance Approaches Have Delivered Limited Results
Attempts to govern AI reasoning in professional environments have to date largely applied established management doctrines, most notably the Plan-Do-Check-Act (PDCA) cycle, which has underpinned compliance and management frameworks for more than 50 years. ISO/IEC 42001, the relatively new international standard for Artificial Intelligence Management Systems, explicitly uses PDCA as its operational backbone.
Practical simulation tests by the author, published separately, established that PDCA applied to AI reasoning has significant shortcomings. The core difficulty is structural. PDCA assumes that the agent executing a task arrives with professional judgement, contextual understanding, and the capacity to interpret ambiguous situations. Humans arrive pre-loaded with these; AI systems do not. Forcing AI to reason strictly within human procedural terms constrains its strengths and introduces subtle but significant weaknesses, particularly at the boundary between defined intent and real-world complexity.
Risk-based balancing approaches to AI face a related problem. The very flexibility they offer can misrepresent the material being assessed, so a purely risk-based weighting of evidence becomes unworkable.
What is needed is a two-way model, in which professionals define intent, boundaries, and standards of acceptability, while AI interprets and executes tasks according to the capabilities available to it at a given point in time. For the best outcome, each must understand the other’s meaning. The architecture described in this article pursues such a model in a form suited to evidential work. The challenge is not merely practical but structural.
Longer-term academic work has surfaced a foundational challenge for any framework that seeks to govern AI-assisted analysis, articulated clearly by Virginia and Frank Dignum in their November 2025 paper Agentifying Agentic AI. Their central argument is that current agentic AI systems, including those built on large language models, display a form of autonomy that is primarily behavioural rather than reasoned. Intentionality is not explicitly defined but inferred from outputs, leaving motivations structurally opaque. Consequently, these systems lack reliable links between intent and action, which makes them difficult to align, predict, or verify. Dignum and Dignum argue that genuine agency requires explicit models of reasoning and governance, most notably Belief-Desire-Intention (BDI) architectures, in which beliefs, goals, and intentions are represented separately and are open to logical scrutiny.
Out of all this, the concept of what we call a Dossier Space has emerged. In conventional legal and investigative practice, a dossier is a bounded collection of documents: a container for evidence assembled for a defined purpose. The word ‘Space’ signals something different: an active, three-dimensional environment in which evidence, analysis, and reasoning coexist and remain mutually visible throughout the analytical process, rather than being assembled, analysed, and reported in separate, sequential stages. Where a traditional dossier is a record of what has been gathered, a Dossier Space is a governed environment in which the acts of gathering, reasoning, and explaining are structurally inseparable.
Dossier Space should therefore be understood not as an additional governance layer wrapped around existing analytical systems, but as an attempt to impose these conditions within the analytical process itself. It operationalises the requirement for explicit reasoning by structuring AI reasoning through a primer architecture that makes intent, boundaries, and reasoning pathways explicit before any analysis begins. In doing so, it converts implicit, post hoc inference into explicit, preconditioned reasoning, directly addressing the accountability gap that Dignum and Dignum identify as a critical weakness of contemporary agentic systems deployed in consequential, high-value settings.
What a Disciplined Evidential Environment Requires – and How a Dossier Space Might Provide It
Dossier Space is a structured evidential environment in which analytical reasoning is preconditioned, not merely recorded.
The Three Requirements
The framework explored in this article treats AI-assisted evidence not as isolated outputs but as part of a structured evidential environment, referred to here as a Dossier Space. The word ‘space’ is intended to represent a geometric volume within which only validated evidence and analysis can exist. Within such an environment, three elements, or dimensions, must remain continuously accessible to those relying on the evidence. Any framework capable of governing AI-assisted analysis must therefore satisfy three conditions:
First, the provenance of the material must be clear. Professionals must be able to establish where information originated, how it was obtained, and whether it has been preserved without alteration. This reflects long-standing digital forensic practice and underpins confidence in the authenticity of electronic evidence.
Second, the reasoning pathway used to interpret the evidence must be visible. When analytical tools, including AI systems, are used to examine data, the observations, methods, and conclusions involved in that analysis should be recorded in a way that allows other professionals to review and challenge the process.
Third, the ability to interrogate the results must remain intact. Courts and professionals must be able to question how conclusions were reached, test the assumptions behind them, and determine whether the analysis has been applied appropriately in the context of the case.
These three requirements correspond directly to the questions courts and professionals already ask when evaluating computer-generated material: where the information came from, how it was handled, how conclusions were reached, and how those conclusions can be examined. Dossier Space is not intended to introduce new evidential doctrine. Its goal is to explore practical ways of organising AI-assisted analysis so that established principles can continue to operate effectively, with the potential to add considerable value.
How the Architecture Emerged
The architecture described in this article did not originate as a theoretical exercise. It emerged from two years of practical work by the author examining how AI systems interact with structured management information, particularly within Governance, Risk and Compliance (GRC) frameworks and management systems. Combined with hands-on experience as an expert witness and a wider interest in legal admissibility, that work has led to the AI pilot outlined in this article.
A recurring pattern emerged from those tests: mainstream AI systems were highly capable at analysing information but struggled when context, boundaries, and intent were not clearly defined. Worse, misleading conclusions could emerge, because such systems tend to frame their answers to reflect the user’s subconscious biases. This led to the development of structured prompts, referred to here as primers, designed to define purpose, scope, sources, and standards before any analytical process begins. Unlike prompt engineering or workflow controls, the primer defines the conditions under which reasoning is allowed to occur.
At the same time, a second problem became apparent: when AI generated conclusions, professionals needed to be able to reconstruct how those conclusions had been reached. This led to the evolution of PDCA into Context, Intent, Plan, Deliver, Assure (CIPDA), with its Reasoning, Observations, Methods, Evidence, and Results (ROMER) structure serving as a systematic record of the analytical pathway.
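To make the idea concrete, the primer’s role as a precondition can be sketched in Python. This is a minimal, hypothetical illustration rather than the author’s primer format: the four fields mirror the purpose, scope, sources, and standards named above, while the `Primer` and `AnalysisRequest` classes and the `admit_request` check are assumptions made for the example. A request that strays outside the defined scope, or relies on material that has not been admitted, is flagged before any reasoning takes place.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Primer:
    """Hypothetical primer: conditions fixed before any analysis begins."""
    purpose: str
    scope: frozenset[str]    # topics the analysis may address
    sources: frozenset[str]  # identifiers of admitted source documents
    standard: str            # acceptance standard for conclusions


@dataclass
class AnalysisRequest:
    question: str
    topic: str
    cited_sources: list[str] = field(default_factory=list)


def admit_request(primer: Primer, request: AnalysisRequest) -> list[str]:
    """Return the reasons a request falls outside the primer (empty list = admissible)."""
    problems = []
    if request.topic not in primer.scope:
        problems.append(f"topic '{request.topic}' is outside the defined scope")
    for source in request.cited_sources:
        if source not in primer.sources:
            problems.append(f"source '{source}' is not an admitted source")
    if not request.cited_sources:
        problems.append("no sources cited; conclusions must rest on admitted material")
    return problems


# Illustrative use with invented identifiers.
primer = Primer(
    purpose="Reconstruct the payment timeline",
    scope=frozenset({"payments", "correspondence"}),
    sources=frozenset({"DOC-001", "DOC-002"}),
    standard="Every finding must cite at least one admitted source",
)
print(admit_request(primer, AnalysisRequest("Was the director negligent?", "liability", ["DOC-009"])))
```

Returning reasons rather than a bare yes or no is a deliberate choice in this sketch: the refusal itself becomes part of the record that others can review.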
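A ROMER entry might be represented as a simple structured record tagged with the CIPDA stage it belongs to. The sketch below is an assumption, not a published schema: it takes one entry per analytical step, uses illustrative field names, and adds a hypothetical `missing_stages` check that reports which CIPDA stages have no recorded reasoning. Serialising to JSON keeps the record machine-readable for later review.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class CipdaStage(Enum):
    """The five CIPDA stages named in the text."""
    CONTEXT = "Context"
    INTENT = "Intent"
    PLAN = "Plan"
    DELIVER = "Deliver"
    ASSURE = "Assure"


@dataclass(frozen=True)
class RomerEntry:
    """One step of the analytical pathway under the five ROMER headings (illustrative)."""
    stage: CipdaStage
    reasoning: str                 # why this step was taken
    observations: tuple[str, ...]  # what was seen in the material
    methods: tuple[str, ...]       # how the material was examined
    evidence: tuple[str, ...]      # identifiers of artefacts relied upon
    results: str                   # what the step concluded

    def to_json(self) -> str:
        data = asdict(self)
        data["stage"] = self.stage.value  # an Enum is not JSON-serialisable as-is
        return json.dumps(data, sort_keys=True)


def missing_stages(pathway: list[RomerEntry]) -> list[CipdaStage]:
    """CIPDA stages for which no reasoning has been recorded, in cycle order."""
    covered = {entry.stage for entry in pathway}
    return [stage for stage in CipdaStage if stage not in covered]
```

A pathway with gaps is not necessarily wrong, but under this sketch the gaps are at least visible rather than silent.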
When these elements were combined within a disciplined data/evidential environment, a broader insight emerged: professionals do not simply need answers from AI, but a structured workspace in which evidence, reasoning, and explanation remain visible and open to challenge.
The Structure of Dossier Space
As an evolution of established dossier practice, Dossier Space is organised around three structural axes, each corresponding to a distinct layer of evidential governance. AI can work across many dimensions concurrently, as readily as a human uses stereoscopic binoculars to improve vision. Each axis represents an independently testable dimension of evidential reliability; defensibility arises only where all three are satisfied simultaneously.
The Procedural Primer (X-axis) functions as the integrity and compliance layer, ensuring that AI analytical activity remains within boundaries appropriate to the evidential material. It reflects the requirements of recognised standards including ISO/IEC 27037, ISO/IEC 27042, and ISO/IEC 42001.
The ROMER Primer (Y-axis) functions as the reasoning map, recording the logical pathway through which conclusions have been derived from evidence. Such a record matters because the legal treatment of machine-generated material is under active review. In the UK, the Ministry of Justice’s 2025 call for evidence is reviewing the presumption of computer reliability, and its outcome will affect how AI-generated material is treated in criminal proceedings. In the US, AI-generated analysis may need to meet the standard for admissible expert evidence under requirements including the proposed Federal Rule of Evidence 707, read together with the amended Rule 702.
The Agentic Primer (Z-axis) provides the interrogation environment, creating an audit trail that tethers AI output to its procedural and logical foundations. It enables professionals to approach the dossier from multiple perspectives – investigative, analytical, or adversarial – and to examine the AI’s analytical choices rather than simply accepting its conclusions.
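The article does not specify how the Agentic Primer’s audit trail is built. One conventional technique for making such a trail tamper-evident, offered here purely as an assumption, is a hash chain: each entry stores the SHA-256 digest of its predecessor, so editing any earlier entry breaks the links that follow. The `AuditTrail` class and its field names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous" digest for the first entry


def _digest(payload: dict, previous: str) -> str:
    """SHA-256 over the previous digest plus the payload's canonical JSON."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((previous + canonical).encode("utf-8")).hexdigest()


@dataclass
class AuditTrail:
    """Hypothetical append-only log tethering AI outputs to their foundations."""
    entries: list[dict] = field(default_factory=list)

    def append(self, actor: str, action: str, refs: list[str]) -> str:
        previous = self.entries[-1]["hash"] if self.entries else GENESIS
        payload = {"actor": actor, "action": action, "refs": refs}
        entry_hash = _digest(payload, previous)
        self.entries.append({**payload, "previous": previous, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute every link; an edited or reordered entry, or one removed
        from the middle of the chain, causes verification to fail."""
        previous = GENESIS
        for entry in self.entries:
            payload = {k: entry[k] for k in ("actor", "action", "refs")}
            if entry["previous"] != previous or _digest(payload, previous) != entry["hash"]:
                return False
            previous = entry["hash"]
        return True
```

A hash chain shows that entries have not been altered or reordered; it cannot on its own reveal that the most recent entries were deleted, so in practice the latest digest would need to be recorded somewhere independent.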
These three axes intersect at a fixed point of origin – the Defensibility Scorecard (0,0,0) – which provides the baseline against which all derivative judgements are measured. A governing Meta-Primer provides the master framework within which these elements operate.
The value of this geometric model lies in what it prevents: within a properly constructed Dossier Space, tampering is structurally constrained, opaque reasoning should not persist, and the conditions under which AI hallucination could mislead professional judgement are substantially reduced.
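The rule that defensibility arises only where all three axes are satisfied can be expressed as a simple conjunction. The sketch below is hypothetical: the article does not define how the Defensibility Scorecard is scored, so the per-axis pass/fail results and the `DefensibilityScorecard` and `AxisResult` names are assumptions. Its one deliberate design choice is that no axis can compensate for another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AxisResult:
    passed: bool
    notes: str = ""


@dataclass(frozen=True)
class DefensibilityScorecard:
    """Hypothetical scorecard in which each axis is tested independently."""
    procedural: AxisResult  # X-axis: integrity and compliance
    romer: AxisResult       # Y-axis: reasoning map
    agentic: AxisResult     # Z-axis: interrogation and audit trail

    def failed_axes(self) -> list[str]:
        axes = {
            "X (procedural)": self.procedural,
            "Y (ROMER)": self.romer,
            "Z (agentic)": self.agentic,
        }
        return [name for name, result in axes.items() if not result.passed]

    def defensible(self) -> bool:
        # A conjunction rather than a weighted sum: strength on one axis
        # cannot offset a failure on another.
        return not self.failed_axes()
```

Reporting which axes failed, rather than a single score, keeps each dimension independently open to challenge.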
What This Architecture Does Not Do
To avoid misinterpretation, it is equally important to be clear about what a Dossier Space does not attempt to do.
The framework does not replace legal judgement. Questions of fact, interpretation, and legal consequence remain matters for professionals, courts, and tribunals. The architecture assists only in the organisation, analysis, examination, and explanation of evidential material.
It does not allow AI systems to determine outcomes. AI operating within a Dossier Space functions as an analytical tool under professional supervision; its role is closer to that of a forensic support function than that of a decision-maker.
It does not assume AI outputs are inherently reliable. The entire structure is designed around the opposite, zero-trust assumption: that AI reasoning must be visible, replicable, and open to challenge if it is to be used responsibly in evidential contexts.
It does not replace established evidential principles. Courts already possess long-standing doctrines for assessing documentary and computer-generated evidence. This framework organises AI-assisted processes in a way that allows those principles to continue to operate effectively.
It does not eliminate bias in AI-assisted analysis, whether human or AI in origin. Rather, it makes the presence, source, and influence of bias more visible and therefore open to scrutiny. By requiring that intent, constraints and reasoning pathways are explicitly defined in advance, it allows bias to be identified, challenged and, where necessary, mitigated as part of the evidential process.
Finally, it does not require professionals to accept new technologies uncritically. One objective of the pilot described below is precisely to examine the limitations, weaknesses, and practical constraints of AI systems when placed within disciplined evidential environments.
A Practical Stress Test for AI-Supported Evidence
Professionals do not need advanced technical knowledge to form an initial view about the reliability of AI-supported evidence. A simple three-question stress test maps directly onto the Dossier Space framework.
Provenance: “Show me it is real.” Can the origin and integrity of the material be demonstrated? When evidence is ingested into a Dossier Space, a unique cryptographic hash is generated at the point of entry (per ISO/IEC 27037). AI systems interact only with a copy or vector representation, leaving the original artefact intact. In keeping with ISO/IEC 42001, the AI’s reasoning for including any artefact in an evidential bundle must be exportable in machine-readable formats (JSON/XML), so that an auditor can verify the link between the original material and any legal claim derived from it.
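The hashing and export steps can be sketched with Python’s standard library. This is a minimal illustration under stated assumptions: SHA-256 is chosen as the hash function (the text names none), and the JSON field names such as `inclusion_reason` and `supports_claim` are hypothetical rather than drawn from any standard. An XML export would follow the same pattern.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class IngestRecord:
    """Provenance captured once, at the point of entry."""
    artefact_id: str
    sha256: str
    ingested_at: str  # UTC timestamp in ISO 8601 form


def ingest(artefact_id: str, original: bytes) -> IngestRecord:
    """Hash the original artefact on entry; analysis then works only on copies."""
    return IngestRecord(
        artefact_id=artefact_id,
        sha256=hashlib.sha256(original).hexdigest(),
        ingested_at=datetime.now(timezone.utc).isoformat(),
    )


def verify_unaltered(record: IngestRecord, candidate: bytes) -> bool:
    """True only if the candidate bytes are identical to what was ingested."""
    return hashlib.sha256(candidate).hexdigest() == record.sha256


def export_inclusion(record: IngestRecord, reason: str, claim: str) -> str:
    """Machine-readable link between an artefact and the claim it supports
    (hypothetical field names)."""
    return json.dumps(
        {
            "artefact_id": record.artefact_id,
            "sha256": record.sha256,
            "ingested_at": record.ingested_at,
            "inclusion_reason": reason,
            "supports_claim": claim,
        },
        indent=2,
        sort_keys=True,
    )
```

Because the digest travels with every exported inclusion reason, an auditor can re-hash the original artefact and confirm it is the same material the claim relies on.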
Analytical Pathway: “Can the reasoning be reconstructed?” Can another competent professional understand and replicate the steps through which conclusions were reached? The ROMER Primer records observations, methods, and results as part of the evidential record rather than in a retrospective report separated from the evidence itself. If the reasoning process cannot be reconstructed, the evidential value of the conclusion is significantly weakened.
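Whether another professional can replicate a step can be checked mechanically in the narrow case where the step is deterministic. The sketch below assumes exactly that: it records digests of a step’s inputs and result, then lets a reviewer re-run the method and compare. Generative models are not always deterministic, so this strict comparison would suit only fixed, repeatable steps; where outputs vary, a replication test would compare conclusions rather than exact results. The `record_step`, `replicate`, and `build_timeline` names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable


def fingerprint(value: Any) -> str:
    """SHA-256 of a JSON-serialisable value in canonical (sorted-key) form."""
    canonical = json.dumps(value, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class RecordedStep:
    method: str  # label for the method applied (hypothetical)
    input_digest: str
    result_digest: str


def record_step(method: str, inputs: Any, analyse: Callable[[Any], Any]) -> tuple[RecordedStep, Any]:
    """Run one analytical step and record digests of what went in and came out."""
    result = analyse(inputs)
    return RecordedStep(method, fingerprint(inputs), fingerprint(result)), result


def replicate(step: RecordedStep, inputs: Any, analyse: Callable[[Any], Any]) -> bool:
    """Re-run the step independently; succeed only if inputs and result both match."""
    if fingerprint(inputs) != step.input_digest:
        return False  # different material from that originally analysed
    return fingerprint(analyse(inputs)) == step.result_digest


def build_timeline(events: list[dict]) -> list[dict]:
    """Example deterministic method: order events by their ISO date strings."""
    return sorted(events, key=lambda event: event["date"])
```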
Interrogation: “Explain the connection.” Can the conclusions be explained in clear terms and challenged through questioning and independent review? Courts and practitioners must never be placed in a position where the only explanation available is that “the system produced the result.” The Agentic Primer ensures that the basis for any AI-derived conclusion remains accessible and open to professional scrutiny.
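The requirement that a conclusion never rests on “the system produced the result” can be illustrated as a trace from each finding back to registered artefacts. In this hypothetical sketch, `Finding` and `explain` are invented names, and the register is assumed to map artefact identifiers to the hashes recorded at ingestion. A finding that cites nothing, or cites material outside the register, is refused rather than explained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    conclusion: str
    method: str
    evidence: tuple[str, ...]  # identifiers of artefacts relied upon


def explain(finding: Finding, register: dict[str, str]) -> str:
    """Trace a finding back to registered artefacts, or refuse to explain it.

    `register` maps artefact identifiers to the hashes recorded at ingestion.
    """
    if not finding.evidence:
        raise ValueError(f"'{finding.conclusion}' cites no evidence and cannot be explained")
    unknown = [a for a in finding.evidence if a not in register]
    if unknown:
        raise ValueError(f"'{finding.conclusion}' relies on unregistered artefacts: {unknown}")
    lines = [f"Conclusion: {finding.conclusion}", f"Method: {finding.method}"]
    lines += [f"Evidence: {a} (sha256 {register[a][:12]}...)" for a in finding.evidence]
    return "\n".join(lines)
```

Raising an error, rather than returning a weaker explanation, reflects the zero-trust stance: an untraceable conclusion should halt the process, not pass quietly into the bundle.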
Where evidence demonstrates clear provenance, a documented analytical pathway, and conclusions that can be explained and challenged, it is far more likely to withstand professional and judicial scrutiny. Where any of these elements is missing, caution is required.
Alignment with Current Standards and Regulatory Developments
The framework described in this article aligns with standards and regulatory requirements that are already in effect or in advanced development.
ISO/IEC 42001 is increasingly recognised in emerging regulation. Under the Colorado AI Act (effective 2026), alignment with recognised frameworks such as ISO/IEC 42001 or the NIST AI Risk Management Framework may support a rebuttable presumption of reasonable care, providing organisations with a structured basis for demonstrating compliance.
The direction of travel in legal technology reinforces this framing. Leading platforms have moved beyond document search toward agentic workflows that autonomously assemble case chronologies and order-of-proof dossiers. AI will increasingly be used to build entity-level dossiers tracking relationships between parties across millions of pages, creating outputs that must be relied upon, challenged, and defended. Research at the University of Gothenburg examines how AI constructs evidence from diverse digital traces (drones, body cameras, social media), shifting the role of the lawyer from searcher to evaluator of algorithmically generated patterns.
These developments make the governance question urgent rather than speculative. As AI builds dossiers, it must be subject to the kind of structured interrogation the Dossier Space framework is intended to support.
How Would a Pilot Test Dossier Space, and How Would Results Be Reported?
Purpose and Scope
The pilot outlined in this article is a practical exploration of how contemporary AI platforms behave within a structured evidential environment of the kind described above. Its purpose is not to promote any particular technology, nor to develop commercial software. It seeks to evaluate whether existing AI tools can be used safely, transparently, and usefully within professional legal workflows.
The exercise will examine current AI platforms against two principal objectives:
- Security – whether platforms can operate within boundaries appropriate for handling professionally sensitive material.
- Workability – whether systems can assist with the practical tasks involved in examining complex evidence.
The pilot will be judged successful if structured preconditioning of reasoning demonstrably improves the transparency, reproducibility, and challengeability of AI-assisted analysis.
Design of the Exercise
At the centre of the exploration will be a simulated case dossier containing a curated, or anonymised, set of materials reflecting the types of evidence typically encountered in litigation or investigation: documentary records, structured data, correspondence, technical reports, timelines, and other evidential artefacts. The dossier will be treated as if it were a live evidential environment.
The pilot’s value will come from participating professionals interacting with the materials through an AI platform, yet to be selected, operating under defined constraints: specifically, the primer and ROMER structures described above. These mechanisms will guide how AI systems interpret the dossier and how their analytical responses are recorded.
During the exercise, the pilot will examine:
- Whether AI systems can reliably assist in navigating large evidence collections.
- How clearly they can explain their reasoning processes.
- Whether they respect evidential boundaries and constraints.
- How easily their outputs can be challenged or interrogated by professionals.
- Whether AI-supported tools can ‘truly’ manage large volumes of evidence, identify relevant connections, and explore dynamic ‘what if’ scenarios.
Participants may approach the dossier from different professional perspectives – investigative, analytical, or adversarial – reflecting the reality that evidence must be tested from multiple viewpoints within the justice system.
What Will the Pilot Deliver?
Three forms of learning are expected from the exercise.
First, practical observations about current AI platforms: how they perform when applied to evidential materials; where their reasoning is transparent and where it is not; whether their behaviour within structured constraints differs meaningfully from their behaviour in open-ended environments.
Second, insight into how AI-assisted evidence can be managed responsibly: how professionals might organise, analyse, and interrogate large bodies of material while maintaining clear provenance, reasoning pathways, and professional oversight.
Third, a set of professional lessons for future practice. Results will be reported through a series of articles documenting the progress of the experiment and the lessons emerging from it, examining both the strengths and limitations observed and recommendations for further work.
The pilot will run to a timescale set by participants, concluding with a published summary of observations and lessons learned, accompanied by a formally reviewed academic paper. The objective is not necessarily to produce definitive answers, but rather to develop practical insight into how AI might responsibly support professional evidential work in the years ahead.
A Call to Participate
Artificial intelligence will inevitably become part of the professional landscape in which law, regulation, and investigation operate. The question facing the legal community is not whether the technology will be employed, but whether it will be understood, governed, and applied with the level of care that the administration of justice requires.
Members of the Society, and others with relevant experience or interest, are invited to engage with the exercise, whether by participating directly, reviewing materials, contributing professional perspectives, or following the progress of the work. Constructive scrutiny is particularly welcome. If you would like to be part of the pilot, please email me at Kenneth.tombs@BUSINESS-COMPASS.COM.
The aim is neither to advocate a specific technology nor to resist it reflexively, but to examine it with the seriousness that the legal profession owes to the systems of justice it serves. Only through that kind of engagement can the profession determine whether AI ultimately strengthens professional practice, or whether its use must remain carefully limited.
Either outcome will be valuable. What matters most is that the profession reaches its conclusions through knowledge rather than presumption or marketing speak.

Ken Tombs began his career in electronics engineering before moving into early office technologies, working with organisations such as NEXOS and Honeywell. He went on to advise HM Government, including the Cabinet Office and HM Treasury, on emerging technologies and digital record preservation. He led the Legal Images Initiative on computer-generated evidence, supported by the SCL. Now retired, he continues private research into the application of AI to governance, risk and compliance systems.
Use of Artificial Intelligence in this Article
Artificial intelligence tools were used during the preparation of this article to assist with drafting, editing, simulating and the organisation of ideas. Their use formed part of the broader exploratory process described in the paper, examining how human expertise and AI systems can interact within structured reasoning frameworks. The concepts, analysis, and conclusions presented in the article are those of the author, who remains solely responsible for the content.
© 2026 Kenneth Tombs. All rights reserved.