Summary

The adaptation of the global economy to the capabilities offered by digital tools in general, and artificial intelligence (AI) systems in particular, is transforming our societies in a profound but also insidious way, raising questions that touch the very foundations of our value system. Indeed, the continuous growth in the size of networks and in computing power, the arrival of new technologies such as machine learning, and the availability of large quantities of data, accentuated by the Internet of Things, are accelerating the penetration of AI into sectors as diverse as health, defence, transport, education, energy and justice.

Nevertheless, the reliability of the algorithms underlying the decisions made by these AI-based systems is often highly questionable (fragmentation, bias, discrimination, exclusion, etc.). Moreover, for “algorithms derived from the artificial neural network paradigm, in particular, from deep learning (i.e. comprising several linked layers of nodes that process different information and then aggregate them)”, it is impossible to explain the result produced since, “unlike the logic-deductive reasoning of AI systems, it is not possible to extract a clear and coherent decision tree” [1].

Therefore, how could a decision be accepted if it is not explicable, measurable and provable?

“The logic of the latter (the algorithms) tends to shift decision making towards the technical stages of system design” [1]. This runs counter to the imperative of making algorithms more intelligible and transparent. It leads us to imagine, following the example of the “privacy by design” principle provided for in personal data protection law (GDPR), that the governance of algorithms could be part of an “ethics by design” approach. It is therefore a management system, and not a final product, that will have to be analysed in terms of ethics. Normative ethics, and in particular consequentialism, which is distinct from law and compliance, can sometimes serve to assess the correctness of the rules laid down by law. Moreover, it can prefigure the law, particularly in the area of risk assessment.

Concerning this assessment, well-proven methods and tools are already available for analysing the operational safety and risk control of complex systems, of which AI is only a new construction model; with the reservation, however, that these methods and tools apply to “products” and not to “management systems”, as is the case here.

Thus, we consider that the acceptability of AI-based systems, which will only come about under conditions that ensure trust (I), presupposes, on the one hand, their rigorous evaluation, in particular by drawing up reference frameworks, charters, labels and certifications (II), and, on the other hand, the implementation of mechanisms capable of establishing a genuine gradation of risks in order, if necessary, to provide irrefutable proof of the chain of responsibility (III). We develop these aspects in greater depth below, taking the Arborus Charter on Artificial Intelligence as an example.

I – Building trust in AI systems and their uses

On mechanisms for trust building

The trust placed in a system by its users depends essentially on their perception of the designer’s control of the risks inherent in the system.
In order to increase this perception, the system provider can implement various commercial recognition processes:

  • Spontaneously adhering to a charter, which will most often incorporate the general principles of respect for fundamental rights, non-discrimination, quality and security, transparency and neutrality, user control and human oversight, respect for privacy, etc.
  • Obtaining a label based on a reference framework close to a charter, under the control of a body awarding this label according to criteria that are more or less transparent.
  • Obtaining a certification, which will generally be based on a reference framework of the same type as the previous ones, but this time endorsed by an official standardisation body such as ISO; this reference framework, thus endowed with the status of a “voluntary standard”, then complements the legal standards that are laws and regulations; the “certifying” body, in order to be able to issue its certificate, will itself have to undergo an accreditation process, complying with another voluntary standard.

These standards and certifications obviously generate very lucrative activity for the training and certification bodies involved. The question of confidence in the system is then replaced by that of confidence in the procedures for accrediting that system.

What level of confidence should be placed in these charter, label and certification mechanisms? Even in technical matters, such mechanisms offer only a very relative guarantee. For example, the Boeing 737 Max was certified to fly. In terms of cybersecurity, ISO 27001-certified companies are not immune to data leaks and attacks by hackers… As regards ethical commitments, examples of non-compliance or non-conformity are not rare, even for products distributed by labelled or certified companies.

When users, be they citizens or organisations, see the trust they placed in these processes betrayed, and this will inevitably also happen with AI-based systems, they can still have recourse to law and justice. Called upon in the event of a breach of trust, the legal and judicial system will then itself have to demonstrate that it is trustworthy: it will have to guarantee litigants that, should the commercial mechanisms of trust fail, the justice system will be able to determine responsibilities.

On the criteria applicable to AI-based systems

Trust is always a prediction, a bet on the future with a real proportion of uncertainty.

A “trustworthy” AI should be based on Council of Europe (CoE) values and rules.

Some European legislation on data protection, privacy or non-discrimination already applies to AI-based systems. Given their specificities, should complementary legislation be added, and which one?

Ethical charters, such as those of the CEPEJ [3], or of the Arborus [4] collective, provide recommendations to AI designers, developers and users.

While the commitment to comply with a charter is an element of trust, it does not constitute absolute certainty as to the absence of risk. The effective application of a charter is not always subject to control. Signing a charter attests to a statement of intent, which very often amounts to little more than a marketing and image-building operation.

In the same way, product and service labelling and certification processes, by giving “reasonable” assurance of compliance with a predefined set of requirements, offer an element of confidence with regard to AI, just as certification schemes do in the fields of quality, safety or the environment.

The objective of these processes is to inspire confidence by having a third-party organization verify compliance with a set of requirements. Confidence in these recognition processes is thus inferred from the choice of the reference system of requirements and the credit given to the certifying and accrediting bodies.
Beyond the certification reference framework, awarding the certificate requires the existence of an evaluation model that defines the following rules:

  • who is in charge of assessing conformity to the reference framework?
  • according to which process?
  • what evidence can be required and verified?

All these questions need to be answered before the certification of AI-based systems can be considered.

II – Reducing and controlling risks

It is now usual to consider trust as a “risk-reduction mechanism”.

In order to reduce risks, they first have to be assessed. This assessment is contextual rather than absolute, and will evolve over the lifetime of the AI-based system. The risk level of an AI system depends on multiple factors: the technologies implemented, the field of application, the users targeted and the company implementing the system, among others.

Several potential risks of AI have already been identified: lack of transparency in decision-making, discrimination on the basis of sex or other grounds, intrusion into privacy, and criminal use.

A distinction has been proposed between ordinary AI systems and high-risk ones, the latter requiring special protection measures. While this distinction is necessary, establishing it is not always easy.

AI-based systems are very complex, as they often themselves include other AI-based systems, and the risk analysis of such systems is an ongoing task. We therefore believe that the risk level of an AI-based system should be assessed before its design (“by design”) and reassessed at regular intervals. Ethical labels and certifications of AI-based systems should necessarily be based on an identification of risks, their regular re-evaluation, and control of the effective implementation of risk-reduction actions.

It seems to us that risk analysis and management methodologies widely used in industry, such as FMECA (Failure Mode, Effects and Criticality Analysis), can be usefully applied to AI-based systems by adapting them to the context and transposing the terminology to the specific failure modes of artificial intelligence. Lastly, labels and certifications are more precise than simple charters, and they can help define common rules that promote the emergence of fair competition between AI-based system designers for the benefit of their users.
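As a minimal illustration of such a transposition, the sketch below computes the classic FMECA risk priority number (severity × occurrence × detectability) for a few failure modes specific to AI. The failure modes, rating scales and acceptability threshold are assumptions chosen purely for the example; they are not prescribed by any standard or by the charters discussed here.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    """One row of a hypothetical FMECA worksheet transposed to an AI-based system."""
    name: str
    severity: int       # 1 (negligible) .. 10 (catastrophic)
    occurrence: int     # 1 (rare) .. 10 (frequent)
    detectability: int  # 1 (almost always detected) .. 10 (practically undetectable)

    @property
    def rpn(self) -> int:
        # Classic FMECA risk priority number.
        return self.severity * self.occurrence * self.detectability

# Illustrative AI failure modes with assumed ratings.
failure_modes = [
    FailureMode("Discriminatory bias in training data", severity=8, occurrence=6, detectability=7),
    FailureMode("Model drift after deployment",          severity=6, occurrence=7, detectability=5),
    FailureMode("Unexplainable individual decision",     severity=7, occurrence=8, detectability=9),
]

RPN_THRESHOLD = 200  # assumed acceptability threshold, to be set for each context

for fm in sorted(failure_modes, key=lambda f: f.rpn, reverse=True):
    action = "risk-reduction action required" if fm.rpn >= RPN_THRESHOLD else "monitor and re-evaluate"
    print(f"{fm.name}: RPN = {fm.rpn} -> {action}")
```

Re-running such a worksheet at regular intervals, as argued above, would make the “by design” assessment a living document rather than a one-off exercise.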

III – Providing proof

However, these labelling and certification systems must be based on a sufficiently solid and trustworthy system of proof.

To address the problem of proof, we take the Arborus Charter on Artificial Intelligence [4] as an example. Designed to create a framework of trust around artificial intelligence, this charter essentially seeks to prevent potential biases with regard to gender equality.

“It is a reference document both for Tech companies and for all those implementing AI to respect diversity by ensuring that the entire data value chain is responsible and that discriminatory biases are identified and controlled.” [4]

Our objective here is to highlight, in concrete terms, the means required to seek evidence for such claims, and to underline the difficulty of rigorously evaluating compliance with them. We are in no way seeking to criticise the Arborus initiative, which can only have positive effects in the pursuit of its objective.

Even though this charter does not cover all the areas addressed by the other current initiatives for labelling or certifying ethical and trustworthy AI, it appears to be very representative of the problems that certifiers, as well as experts having to establish liability after a disaster, may encounter.

Let us examine, one after the other, the seven commitments made by the signatories of this charter and ask ourselves how to demonstrate to an independent third party that each commitment has been fulfilled. This exercise is not exhaustive; it is merely illustrative:

1. Promote gender balance and diversity in the teams working on AI-based solutions.

As evidence of this first commitment, one could imagine:

  • Looking at the result: counting the staff of the teams working on AI and comparing the women/men ratio with the same ratio for the rest of the company?
  • Looking at the “promotion” work that has been done, in which case it is a matter of evaluating the means deployed?
  • Collecting these figures over time in order to evaluate the effectiveness of the means deployed?

We have only considered the ratio of women to men; since the objective here is diversity, we should probably take the analysis further. We could also imagine collecting more fine-grained indicators, carrying out analyses by site rather than for the company as a whole, or using strict or broad definitions of the notion of “teams working on AI”…
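Purely as an illustration, and assuming access to a (hypothetical) HR extract indicating which employees work on AI-based solutions, such an indicator could be computed along the following lines:

```python
from collections import Counter

# Hypothetical HR extract: (employee id, gender, works on AI-based solutions?)
staff = [
    ("e01", "F", True), ("e02", "M", True), ("e03", "M", True),
    ("e04", "F", False), ("e05", "F", False), ("e06", "M", False),
]

def share_of_women(records):
    """Share of women in a list of (id, gender, works_on_ai) records."""
    counts = Counter(gender for _, gender, _ in records)
    total = sum(counts.values())
    return counts["F"] / total if total else 0.0

ai_teams = [r for r in staff if r[2]]
rest_of_company = [r for r in staff if not r[2]]

print(f"Share of women in AI teams:                {share_of_women(ai_teams):.0%}")
print(f"Share of women in the rest of the company: {share_of_women(rest_of_company):.0%}")
```

Collected at regular intervals, per site or per team, the same figures would give the longitudinal view mentioned in the last bullet point above.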

2. Organize to assess and respond to all forms of discrimination that could result from biased or stereotyped data.

With regard to this second commitment, one may wonder about the definition of an occurrence of discrimination: is a single isolated case enough, or must there be a sufficient number of cases before one can declare that discrimination has occurred?
Once this has been defined, one could then look for evidence of the existence of a procedure for assessing and reacting in the event of discrimination. This would be indirect evidence. To go further, one would have to ask about the materiality of this procedure: Is it documented? Is it known? By how many people in the teams? Is it applied? Is its application traced? Has it already been used? What were the results of its use, both in terms of assessment and of response? Has it been modified over time to adapt it to the realities encountered? What lessons have been learned?
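One possible way, among others, of making “has discrimination occurred?” operational is a group-level fairness metric such as the disparate impact ratio. The sketch below illustrates it on a hypothetical decision log; the 0.8 (“four-fifths”) threshold is a common convention taken here as an assumption, not a rule imposed by the charter.

```python
# Hypothetical decision log: (protected attribute value, favourable outcome?)
decisions = [
    ("women", True), ("women", False), ("women", False), ("women", True),
    ("men",   True), ("men",   True),  ("men",   False), ("men",   True),
]

def selection_rate(group: str) -> float:
    """Proportion of favourable outcomes for one group."""
    outcomes = [favourable for g, favourable in decisions if g == group]
    return sum(outcomes) / len(outcomes)

rate_women, rate_men = selection_rate("women"), selection_rate("men")
# Disparate impact ratio: selection rate of the less favoured group over the more favoured one.
ratio = min(rate_women, rate_men) / max(rate_women, rate_men)

alert = "possible discrimination, to be investigated" if ratio < 0.8 else "no alert"
print(f"Disparate impact ratio: {ratio:.2f} -> {alert}")
```

Evidence of the commitment would then combine such measurements with the procedural questions listed above (is the check documented, applied, traced, and acted upon?).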

3. Ensure the quality of the data used to guarantee the most equitable systems possible: unified, consistent, verified, traceable and usable data.

We assume that this refers to the data used to develop the algorithms, i.e. learning data (in the case of machine learning) and test data. The commitment attaches a large number of qualifiers to these data.

The first two, “unified” and “consistent”, need to be explained in order to know how to check these points. As for the next one, “verified”: against which reference framework?

We understand traceability as conclusive archiving of the learning and test data. To verify this point, one would need to make sure that these data have been backed up and that the backup presented is indeed that of the data used to develop the algorithm. As there are usually several successive versions, one should be able to trace these data for each version. And in the event of a post-incident search for evidence, it should be possible to establish that the version in operation at the time of the incident is indeed the version presented to an external “auditor”.
The last qualifier, “usable”, also requires an explanation of how its application is to be verified.
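One conceivable way of making traceability verifiable, offered here only as a sketch, is to record a cryptographic fingerprint of the training and test data for each version of the algorithm, so that an external “auditor” can later check that the archive presented is indeed the data that were used. The file layout and registry format below are assumptions made for the example.

```python
import hashlib
import json
import pathlib

def dataset_fingerprint(path: str) -> str:
    """SHA-256 fingerprint of a dataset file (for a directory, hash each file and combine)."""
    return hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()

def register_version(registry_file: str, version: str, dataset_path: str) -> None:
    """Record the fingerprint of the data used to build a given version of the algorithm."""
    registry_path = pathlib.Path(registry_file)
    registry = json.loads(registry_path.read_text()) if registry_path.exists() else {}
    registry[version] = dataset_fingerprint(dataset_path)
    registry_path.write_text(json.dumps(registry, indent=2))

def verify_version(registry_file: str, version: str, archived_path: str) -> bool:
    """Check that the archive presented matches the fingerprint recorded for that version."""
    registry = json.loads(pathlib.Path(registry_file).read_text())
    return registry.get(version) == dataset_fingerprint(archived_path)

# Usage sketch: register_version("registry.json", "v1.2", "training_data_v1.2.csv")
#               verify_version("registry.json", "v1.2", "archive/training_data_v1.2.csv")
```

The registry itself would of course have to be protected and time-stamped for the evidence to be conclusive.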

4. Train designers, developers and all actors involved in the AI industry in order to raise their awareness of, and their responsibility for, the stereotypes and biases that can generate discrimination.

This is a training objective.

Evidence could therefore be sought in terms of the means implemented; for example, the number of hours of training on this theme per person concerned. This indicator can be refined by distinguishing between the different categories of the population concerned.

One could also look for evidence of results: how did the training affect the people concerned? Were there evaluations by participants at the end of each training session? What were the results? Should people who have taken the training be interviewed at random to check their level of knowledge?
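As a simple illustration of the means-based indicator mentioned above, the following sketch aggregates a hypothetical training log into average hours of bias-awareness training per person and per category:

```python
from collections import defaultdict

# Hypothetical training log: (employee category, hours of bias-awareness training attended)
training_log = [
    ("designer", 4.0), ("designer", 2.0),
    ("developer", 1.5), ("developer", 0.0),
    ("product manager", 3.0),
]

hours_by_category = defaultdict(list)
for category, hours in training_log:
    hours_by_category[category].append(hours)

for category, hours in sorted(hours_by_category.items()):
    average = sum(hours) / len(hours)
    print(f"{category}: {average:.1f} h of training per person on average")
```

Such an indicator only measures means; the results-based questions above (post-training evaluations, spot checks of knowledge) would still need separate evidence.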

5. Raise awareness among those who prescribe AI-based solutions (HR, finance, customer relations, marketing) of the risks of bias and stereotypes that can generate discrimination, and include checkpoints and iterative evaluation points in the specifications.

Here again, we are in a logic of training and awareness-raising, and the same remarks as for the previous commitment apply, with two logics: a logic of means (what means have been implemented, and are they sufficient and suitable?) and a logic of results, which could be measured in various ways, both on the prescribers’ level of awareness and on the content of the specifications. With regard to the latter, here again we would mainly be measuring intentions.
If one wanted to go further, one would have to examine the checkpoints being developed for the system and the content of the evaluations. If one wanted to be complete, one would also have to follow the evolution of the specifications during development. In an agile approach, where documentation of developments is light, it would be necessary to track every iteration.

Being exhaustive may require considerable effort, even with the active contribution of the audited organization.

6. Ensure that suppliers are carefully selected and evaluated iteratively to ensure that the entire AI value chain is non-discriminatory.

Here, we would like to point out that most systems currently under development, and many systems already in use, are in fact combinations of AI-based systems. In other words, AI-based system providers are in fact assemblers and integrators.

Some systems with a broad purpose are standardised and available off the shelf (e.g. voice recognition); others need to be developed more specifically. Some operate in the cloud, others on the premises of the user or of the provider offering the service to the public.

Faced with such diversity and variety of situations, how will purchasing managers be able to evaluate suppliers spread across different parts of the world and whose products are constantly evolving? How will they look for evidence after the fact? And how will the certifying body be able to attest to this? Above all, problems can arise from the simple integration of these systems: system A works well, system B works well, but connecting system A to system B causes problems. This kind of situation is classic in computer science, but with neural-network-based systems its impact can be very significant.

7. Monitor AI-based solutions and continuously adapt processes.

Test data is there to control and validate AI-based solutions. This is true for any software that is released on the market.

Is there, however, a single piece of software on the market that is functionally flawless and secure? Unfortunately, the answer is no.

It is usually the users who report functional flaws, and hackers, ethical or otherwise, who reveal security holes. AI-based systems will be no different.

It would certainly be easier to evaluate the reactions to reported faults: Is there a process? Is it documented? Is it known to the people who have to implement it? Have the reactions succeeded in closing the gaps? What is the frequency of occurrence of faults? Are the reactions proportionate and suited to the problems encountered?
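By way of illustration, some of these indicators (frequency of reported faults, time taken to close them) could be derived from an incident log along the following lines; the log structure is a hypothetical example:

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident log: (fault id, reported at, closed at or None if still open)
incidents = [
    ("F-001", datetime(2021, 3, 1),  datetime(2021, 3, 5)),
    ("F-002", datetime(2021, 4, 10), datetime(2021, 4, 12)),
    ("F-003", datetime(2021, 5, 2),  None),
]

closed = [(reported, closed_at) for _, reported, closed_at in incidents if closed_at is not None]
resolution_days = [(closed_at - reported).days for reported, closed_at in closed]

print(f"Faults reported: {len(incidents)}")
print(f"Faults closed:   {len(closed)}")
if resolution_days:
    print(f"Average days to close: {mean(resolution_days):.1f}")
```

The harder questions, such as whether the reactions were proportionate and suited to the problems encountered, remain a matter of expert judgement rather than of metrics.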

Conclusion

The main purpose of this contribution was to show that, in our view, obtaining evidence to certify that a final product conforms to a standard is a well-known exercise. But this has very little to do with obtaining evidence that an AI-based system is working well, will work well in the future and has worked well in the past, if liability is sought.

Information systems are not foolproof either functionally or in terms of security. AI-based systems will be no exception. Therefore, in absolute terms, one cannot consider a risk classification or scale of risks as an achievable goal. Nevertheless, such a classification is necessary and would allow players in the field to share common reading and assessment benchmarks.

Making the provider accountable for implementing an ethical and responsible approach, under the supervision of an AI supervisory authority, as was provided for with the GDPR, therefore seems to us to be a path worth exploring.

1. Jocelyne Maclure and Marie-Noëlle Saint-Pierre, “Le nouvel âge de l’intelligence artificielle : une synthèse des enjeux éthiques”, Les Cahiers de la propriété intellectuelle, vol. 30, p. 758, 19 September 2019, available at: http://www.ethique.gouv.qc.ca/fr/assets/documents/CPI_Maclure_Saint-Pierre.pdf
2. CNIL, “Comment permettre à l’homme de garder la main ? Les enjeux éthiques des algorithmes et de l’intelligence artificielle”, December 2017, available at: https://www.cnil.fr/sites/default/files/atoms/files/cnil_rapport_garder_la_main_web.pdf#page=60
3. Council of Europe, European Commission for the Efficiency of Justice (CEPEJ), European Ethical Charter on the Use of Artificial Intelligence in Judicial Systems, Sept. 2019, available at: https://rm.coe.int/charte-ethique-fr-pour-publication4-decembre-2018/16808f699b
4. ARBORUS Association, a collective of European companies under the patronage of the European Economic and Social Committee, dedicated to promoting equality between women and men, First International Charter for Inclusive AI, April 2020, available at: https://charteia.arborus.org/