The accuracy and thoroughness of documentation are crucial for system certification. While AI can accelerate the preparation of such documentation, it can also degrade its quality. How can extensive documentation be reviewed effectively? Information must be presented in a way that makes it easy for humans to review, and the structured arguments used in assurance cases do exactly that.
How serious is the problem, and are such reviews needed? In April, the FDA issued a Warning Letter rejecting a drug registration application, citing serious procedural errors including the inappropriate use of AI (see the section ‘Inappropriate Use of Artificial Intelligence’). The FDA’s approach is very restrictive, and the consequences of providing false information in submitted documentation can be severe. What are the causes of these problems? Do documents created by AI merely cover up human errors? Unfortunately, AI not only hallucinates but also introduces errors of its own. A recent Microsoft Research publication, ‘LLMs Corrupt Your Documents When You Delegate’, indicates that AI can corrupt documents as it works on them: twenty iterations of AI edits can damage up to 25% of the documentation. While AI can accelerate work, it also creates new risks.
Structured arguments were created to organise information for verification and to make the documentation of complex systems easier to review. Information that is related and should be verified together, because it leads to a specific conclusion, is presented side by side in the argument, so the verifier doesn’t have to search for it. An assurance case makes life easier for verifiers, not for system developers. If the AI creates some system documentation, we can also delegate to the AI the task of organising information from that documentation into an argument. Typically, no modifications to the system documentation itself are necessary, as the argument is an additional information structure that simply references individual documents and their fragments. The created argument provides insight into the documentation from an auditor’s perspective. Creating arguments can also help the AI better understand its goals and improve the documentation. Let the AI create such arguments and let humans verify them: maximum effect with minimum effort.
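To make this concrete, here is a minimal sketch in Python of what such an argument structure might look like. All names here (EvidenceRef, Claim, the example documents) are hypothetical, chosen only for illustration: each claim node records its acceptance criterion and references the documentation fragments that support it, without modifying the documents themselves.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class EvidenceRef:
    """Pointer into existing documentation; the document itself is unchanged."""
    document: str   # e.g. "test_report_v3.pdf" (hypothetical file)
    fragment: str   # e.g. "Section 4.2, Table 7"

@dataclass
class Claim:
    """One inference step of the argument."""
    statement: str                 # what is being claimed
    acceptance_criterion: str      # what a reviewer must check to accept the step
    evidence: List[EvidenceRef] = field(default_factory=list)
    subclaims: List["Claim"] = field(default_factory=list)

# A top-level goal decomposed into small, reviewable inference steps.
safety_goal = Claim(
    statement="The system is acceptably safe to operate",
    acceptance_criterion="All subclaims are accepted by the reviewer",
    subclaims=[
        Claim(
            statement="All identified hazards are mitigated",
            acceptance_criterion="Each hazard in the log maps to a verified mitigation",
            evidence=[EvidenceRef("hazard_log.xlsx", "rows 1-42")],
        ),
    ],
)
```

The argument sits beside the documentation as a separate structure, which is why AI-generated documents need not be reworked before an argument can be built over them.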
How do arguments facilitate verification?
- Arguments clearly show how a system goal, such as safety or security, can be demonstrated from the evidence in the documentation. The argument is broken down into small inference steps with clearly defined acceptance criteria. Rather than evaluating large blocks of information, we focus on small, specific steps of the argument. An incomplete step, for example an unfinished testing task, is easily identified.
- The reasoning steps are interconnected and form a coherent whole. When contextual inconsistencies are identified, specific areas for improvement can be pinpointed. For instance, a component’s operating environment requirements may be incompatible with the target system environment.
- An argument is a permanent record of information. If we identify an incorrect argumentation step, this is recorded and can be tracked until it is resolved. A complete history of reviews and argument changes is maintained, so we can monitor progress until the argument is fully approved.
- Arguments maintain links to the documentation, from system-level goals such as safety, security or compliance, through the design and testing evidence, to the operational documentation. If we identify a gap or error in an argument, we can easily link it to the system’s defined requirements and goals.
- With the argument in place, we can produce a review report; a minimal sketch of such a report follows this list. The report provides evidence that the AI’s work is overseen by humans, which is important for system certification.
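Continuing the sketch above (reusing the hypothetical Claim and EvidenceRef structures, and still purely illustrative), a review report can be generated by walking the argument tree and recording, for each inference step, the reviewer’s verdict and the evidence it rests on:

```python
from dataclasses import dataclass

@dataclass
class Review:
    """A reviewer's verdict on one inference step."""
    reviewer: str
    accepted: bool
    comment: str = ""

def review_report(claim, reviews, indent=0):
    """Print one line per inference step of the argument.

    `reviews` is a hypothetical mapping from claim statement to Review;
    steps without a verdict are flagged OPEN so they can be tracked
    until they are resolved."""
    verdict = reviews.get(claim.statement)
    status = ("ACCEPTED" if verdict and verdict.accepted
              else "REJECTED" if verdict
              else "OPEN")
    refs = "; ".join(f"{e.document} ({e.fragment})" for e in claim.evidence)
    print(f"{'  ' * indent}[{status}] {claim.statement}"
          + (f" <- {refs}" if refs else ""))
    for sub in claim.subclaims:
        review_report(sub, reviews, indent + 1)

reviews = {
    "All identified hazards are mitigated":
        Review(reviewer="A. Auditor", accepted=True),
}
review_report(safety_goal, reviews)
# [OPEN] The system is acceptably safe to operate
#   [ACCEPTED] All identified hazards are mitigated <- hazard_log.xlsx (rows 1-42)
```

Because every step carries its verdict and evidence references, the printed report itself documents that a human examined each small step of the AI-generated material.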
In this way, assurance cases can be used to verify work products generated by AI. Human analysis of the documentation cannot be avoided, but this approach makes it considerably easier.
Are you planning to use assurance case arguments to verify AI-generated results? Or perhaps you are already doing so? Share your comments.
Andrzej Wardziński
