ISO/IEC TR 24028

Artificial Intelligence - Trustworthiness

Technology & Innovation Published: 2020

Overview

Framework defining trustworthiness characteristics for AI systems including accountability, transparency, reliability, fairness, privacy protection, and robustness

ISO/IEC TR 24028:2020 "Information technology — Artificial intelligence — Overview of trustworthiness in artificial intelligence" is a foundational technical report published in May 2020 that provides a comprehensive framework for understanding, assessing, and implementing trustworthy artificial intelligence systems. As AI technologies rapidly proliferate across critical applications in healthcare, finance, criminal justice, autonomous systems, and countless other domains, ensuring that AI systems are trustworthy becomes paramount. This standard addresses the growing recognition that technical performance alone is insufficient—AI systems must also be transparent, fair, accountable, safe, secure, and respectful of privacy and human rights to earn and maintain stakeholder trust. ISO/IEC TR 24028 establishes a common language and conceptual foundation enabling organizations to systematically address trustworthiness throughout the AI lifecycle, from design and development through deployment, operation, and decommissioning.

Defining Trustworthiness in AI: A Multidimensional Framework

ISO/IEC TR 24028 defines trustworthiness as "the ability to meet stakeholders' expectations in a verifiable way," recognizing that trust in AI systems is not a single attribute but emerges from multiple interdependent characteristics that must be balanced according to application context and stakeholder needs. The standard identifies the characteristics that trustworthiness encompasses: reliability (consistent, predictable performance under specified conditions), availability (accessibility and usability when needed by authorized entities), resilience (ability to maintain acceptable service levels under stress, attacks, or failures), security (protection against unauthorized access, manipulation, or exploitation), privacy (safeguarding personal information throughout the AI lifecycle), safety (freedom from unacceptable risk of harm to people, property, or the environment), accountability (ability to determine who or what is responsible for AI system behavior and outcomes), transparency (ability to understand how AI systems work and make decisions), integrity (accuracy, completeness, and consistency of data and processing), and authenticity (assurance that entities, data, and communications are genuine). These characteristics are not independent but interact in complex ways, and improving one can affect others: increased transparency might expose vulnerabilities that reduce security, requiring careful balancing of competing objectives.

The standard emphasizes that trustworthiness is context-dependent and must be tailored to specific use cases, stakeholder expectations, regulatory requirements, and risk profiles. High-risk AI applications such as medical diagnosis, autonomous vehicles, credit scoring, criminal risk assessment, or critical infrastructure control demand stronger trustworthiness measures across all dimensions compared to low-risk applications like product recommendations, entertainment content curation, or gaming. Organizations must conduct context-specific trustworthiness assessments considering the application domain (healthcare, finance, public sector, consumer services), the nature and severity of potential harms from AI failures or misbehavior, the vulnerability of affected populations (children, elderly, disadvantaged groups), the degree of human oversight versus autonomous decision-making, regulatory and legal requirements applicable to the sector and jurisdiction, stakeholder expectations including users, affected individuals, regulators, and civil society, and the maturity and proven track record of the AI techniques being employed. This risk-based, context-aware approach enables proportionate trustworthiness measures avoiding both under-protection in high-stakes applications and excessive burden in lower-risk scenarios.

Transparency, Explainability, and Interpretability

Transparency represents a cornerstone of trustworthy AI, addressing the widespread concern that modern AI systems, particularly those employing deep learning and ensemble methods, operate as "black boxes" where decision-making processes are opaque and inscrutable. ISO/IEC TR 24028 distinguishes between transparency (openness about AI system capabilities, limitations, and decision-making processes), explainability (ability to explain in human-understandable terms how specific decisions or predictions were reached), and interpretability (degree to which humans can understand the cause of decisions made by the AI system). These related but distinct concepts support different stakeholder needs: transparency enables oversight and accountability by disclosing information about the AI system; explainability supports trust and acceptance by helping users understand why particular outcomes occurred; and interpretability facilitates debugging, improvement, and validation by enabling developers and domain experts to comprehend system behavior.

The standard recognizes that different levels and types of transparency, explainability, and interpretability are appropriate for different stakeholders and contexts. End users affected by AI decisions may require explanations focused on the factors influencing decisions relevant to them (why a loan was denied, why a medical diagnosis was made, why a job application was rejected), presented in accessible language without technical jargon. Operators and human-in-the-loop decision-makers need explanations supporting effective oversight, including confidence levels, alternative possibilities considered, and factors that could change outcomes. Developers and data scientists require detailed interpretability supporting debugging, validation, bias detection, and improvement, including feature importance, decision pathways, activation patterns, and model behavior under various conditions. Auditors and regulators need comprehensive transparency supporting compliance verification, fairness assessment, and accountability determination, including documentation of training data, algorithms, validation methods, and performance across demographic groups. Domain experts (physicians, loan officers, judges) require explanations aligning with their professional knowledge and decision-making frameworks, enabling them to assess AI recommendations critically and override when appropriate.

ISO/IEC TR 24028 surveys approaches to achieving transparency, explainability, and interpretability across the AI lifecycle. Model selection and design choices significantly impact interpretability, with inherently interpretable models (linear regression, decision trees, rule-based systems) offering transparency advantages over complex neural networks, though potentially sacrificing performance for some tasks. Post-hoc explanation methods apply to already-trained models, including local explanation techniques like LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations) that explain individual predictions, and global explanation methods that characterize overall model behavior through feature importance, partial dependence plots, or surrogate models. Attention mechanisms and saliency maps visualize which input features or regions the model focused on when making decisions, particularly valuable for image, text, and time-series applications. Example-based explanations show similar cases from training data or identify influential training examples that most affected the prediction, leveraging human pattern recognition and analogical reasoning. Counterfactual explanations describe minimal changes to inputs that would alter the prediction, helping users understand decision boundaries and what factors matter most.
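To make the local post-hoc idea concrete, the sketch below imitates the core of a LIME-style explanation without relying on any particular explainability library: it perturbs a single input, queries a black-box classifier (any model exposing a predict_proba-style interface is assumed), and fits a proximity-weighted linear surrogate whose coefficients approximate local feature influence. All names and parameter values are illustrative assumptions rather than part of ISO/IEC TR 24028 or the LIME package.

```python
# Minimal sketch of a LIME-style local surrogate explanation (illustrative only).
# `black_box` stands in for any trained classifier exposing predict_proba; all
# names and parameters here are assumptions made for this example.
import numpy as np
from sklearn.linear_model import Ridge

def local_surrogate_explanation(black_box, x, n_samples=500, kernel_width=1.0, rng=None):
    """Explain one prediction by fitting a weighted linear model around x."""
    rng = rng or np.random.default_rng(0)
    # 1. Perturb the instance with Gaussian noise to sample its neighbourhood.
    X_local = x + rng.normal(scale=0.5, size=(n_samples, x.shape[0]))
    # 2. Query the black-box model for the class-1 probability of each perturbation.
    y_local = black_box.predict_proba(X_local)[:, 1]
    # 3. Weight samples by proximity to x (closer samples matter more).
    dists = np.linalg.norm(X_local - x, axis=1)
    weights = np.exp(-(dists ** 2) / (kernel_width ** 2))
    # 4. Fit an interpretable (linear) surrogate on the weighted neighbourhood.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(X_local, y_local, sample_weight=weights)
    # Coefficients approximate each feature's local influence on the prediction.
    return surrogate.coef_
```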

However, the standard acknowledges significant challenges and limitations in AI explainability. The fidelity-interpretability trade-off means that surrogate explanations simple enough for humans to understand may not faithfully capture the behavior of the complex model they are meant to explain. Explanation stability concerns arise when small input perturbations produce dramatically different explanations even when predictions remain similar, undermining trust in the explanations. Adversarial manipulation of explanations is possible, where explanations can be crafted to appear reasonable while masking problematic behavior. Human cognitive limitations mean that explanations exceeding certain complexity thresholds become incomprehensible regardless of technique, and humans may over-trust plausible-sounding but misleading explanations. Cultural and contextual factors influence which explanations are considered satisfactory, with preferences varying across cultures, domains, and individual backgrounds. The standard emphasizes that explainability is a means to trustworthiness, not an end in itself—explanations must be accurate, relevant, timely, and actionable to genuinely support trust, and should be validated through user studies and domain expert review rather than assumed to be effective.

Fairness, Bias, and Non-Discrimination

Fairness in AI systems represents a critical trustworthiness dimension addressing concerns that AI can perpetuate, amplify, or introduce discriminatory outcomes affecting individuals and groups based on protected characteristics such as race, gender, age, disability, religion, sexual orientation, or other sensitive attributes. ISO/IEC TR 24028 recognizes that fairness is both technically challenging and normatively complex, with multiple mathematical definitions of fairness that can be mutually incompatible and no single universally applicable fairness criterion. The standard surveys sources of bias and unfairness in AI systems, approaches to detect and mitigate bias, and the fundamental tensions requiring careful contextual judgment in fairness determinations.

Bias can enter AI systems through multiple pathways requiring different mitigation strategies. Historical bias exists when training data reflects past discriminatory practices, social inequalities, or systemic disadvantages that the AI system learns and perpetuates—for example, hiring algorithms trained on historical hiring decisions that favored certain demographics, or criminal risk assessment tools trained on arrest data reflecting biased policing patterns. Representation bias occurs when training data inadequately represents certain groups, leading to poor model performance for underrepresented populations—for example, facial recognition systems with lower accuracy for darker-skinned individuals when trained predominantly on lighter-skinned faces. Measurement bias arises when the features, labels, or outcomes measured imperfectly proxy the actual construct of interest, with measurement quality potentially varying across groups—for example, using arrest records as a proxy for criminal behavior when arrest rates reflect enforcement patterns not just underlying crime rates. Aggregation bias emerges when a single model is used across diverse groups whose relationships between features and outcomes differ, producing poor fit for some groups. Evaluation bias occurs when benchmark datasets or evaluation metrics fail to adequately assess performance across all relevant groups or use cases.

Algorithmic bias can be introduced through design choices independent of data bias. Objective function specification that optimizes overall accuracy may produce disparate performance across groups if some groups are smaller or harder to predict. Feature engineering choices that include proxy variables correlated with protected characteristics can enable indirect discrimination even when protected attributes are excluded from the model. Learning algorithms may exhibit different generalization properties across groups, particularly when groups have different sample sizes or noise characteristics. Threshold setting and decision rules applied to model outputs can produce disparate impacts if not calibrated appropriately for different groups. Feedback loops occur when AI system decisions influence future data collection in ways that reinforce existing biases—for example, if a risk assessment tool assigns higher risk scores to certain demographics, leading to increased scrutiny, more detected violations, and reinforcement of the pattern in future training data.

ISO/IEC TR 24028 surveys multiple mathematical fairness definitions reflecting different normative principles about what constitutes fair treatment. Individual fairness requires that similar individuals receive similar outcomes, but requires defining meaningful similarity metrics that themselves avoid bias. Group fairness metrics ensure equal treatment or equal outcomes across demographic groups, including demographic parity (equal selection rates across groups), equalized odds (equal true positive and false positive rates across groups), equal opportunity (equal true positive rates across groups), and calibration (equal positive predictive value across groups for each risk score). These group fairness criteria can be mutually incompatible—it is mathematically impossible to simultaneously satisfy demographic parity, equalized odds, and calibration except in trivial cases, requiring organizations to choose which fairness criterion aligns with their context and values. Counterfactual fairness examines whether an individual would have received the same outcome had they belonged to a different demographic group, holding all other causally-determined factors equal. Procedural fairness focuses on the decision-making process rather than outcomes, ensuring transparent, accountable, contestable decisions with opportunities for explanation and recourse.
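As a concrete illustration of how these group fairness criteria can be measured, the following sketch computes per-group selection rate (demographic parity), true positive rate (equal opportunity), and positive predictive value (calibration), along with the largest gap for each. The array names and reporting format are assumptions made for the example, not metric definitions prescribed by the report.

```python
# Illustrative computation of common group fairness metrics from binary
# predictions; the array names are assumptions for this sketch, not a standard API.
import numpy as np

def group_fairness_report(y_true, y_pred, group):
    """Return per-group selection rate, TPR, and PPV, plus the max gap for each."""
    metrics = {}
    for g in np.unique(group):
        mask = group == g
        yt, yp = y_true[mask], y_pred[mask]
        selection_rate = yp.mean()                               # demographic parity
        tpr = yp[yt == 1].mean() if (yt == 1).any() else np.nan  # equal opportunity
        ppv = yt[yp == 1].mean() if (yp == 1).any() else np.nan  # calibration (PPV)
        metrics[g] = dict(selection_rate=selection_rate, tpr=tpr, ppv=ppv)
    gaps = {
        m: np.nanmax([v[m] for v in metrics.values()])
           - np.nanmin([v[m] for v in metrics.values()])
        for m in ("selection_rate", "tpr", "ppv")
    }
    return metrics, gaps
```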

The standard describes technical approaches to bias detection and mitigation across the AI lifecycle. Pre-processing methods address data bias before training through techniques like resampling to balance representation across groups, reweighting instances to equalize influence, or generating synthetic data to augment underrepresented groups. In-processing approaches modify the learning algorithm to incorporate fairness constraints during training, optimizing for both accuracy and fairness metrics simultaneously through constrained optimization, adversarial debiasing, or fairness-aware regularization. Post-processing methods adjust trained model outputs to satisfy fairness criteria through threshold optimization, calibration adjustments, or output transformations. Fairness auditing and testing systematically evaluate AI systems for disparate impacts through disaggregated performance analysis, bias stress tests, and fairness metric computation across demographic groups. However, the standard emphasizes that technical mitigation alone is insufficient—organizational processes including diverse development teams, stakeholder engagement with affected communities, domain expert involvement, ethical review boards, and ongoing monitoring for emergent fairness issues are essential to achieving genuinely fair AI systems.
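The sketch below illustrates one possible post-processing mitigation under the assumption that demographic parity is the chosen criterion: per-group thresholds are selected on model scores so that each group's selection rate approximates a common target. The function names and quantile-based rule are illustrative; other criteria such as equalized odds would call for different adjustments.

```python
# Sketch of a post-processing mitigation: per-group score thresholds chosen so
# that each group's selection rate matches a common target (demographic parity).
# This is one possible criterion among several; all names are illustrative.
import numpy as np

def equalize_selection_rates(scores, group, target_rate):
    """Pick a threshold per group so roughly `target_rate` of that group is selected."""
    thresholds = {}
    for g in np.unique(group):
        g_scores = scores[group == g]
        # The (1 - target_rate) quantile selects about target_rate of the group.
        thresholds[g] = np.quantile(g_scores, 1.0 - target_rate)
    return thresholds

def apply_group_thresholds(scores, group, thresholds):
    """Binary decisions using the group-specific thresholds."""
    return np.array([scores[i] >= thresholds[group[i]] for i in range(len(scores))])
```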

Privacy Protection and Data Governance

Privacy represents a fundamental trustworthiness characteristic addressing concerns that AI systems' data-intensive nature creates unprecedented privacy risks through massive data collection, powerful inference capabilities, potential re-identification of supposedly anonymized data, and surveillance applications. ISO/IEC TR 24028 emphasizes that privacy protection must be embedded throughout the AI lifecycle, from data collection and processing through model training, deployment, and ultimate data deletion, following privacy-by-design and privacy-by-default principles. The standard aligns with privacy frameworks including GDPR, CCPA, ISO/IEC 27701, and the OECD Privacy Principles, while addressing AI-specific privacy challenges requiring specialized technical and organizational measures.

AI systems create distinctive privacy risks beyond traditional data processing. Model inversion attacks can reconstruct training data from trained models, potentially exposing sensitive information about individuals whose data was used in training. Membership inference attacks determine whether particular individuals' data was included in the training set, revealing potentially sensitive information about participation in studies or datasets associated with medical conditions, financial situations, or other private matters. Attribute inference enables AI models to infer sensitive attributes not explicitly provided, such as predicting sexual orientation from social media activity, health conditions from purchasing patterns, or political beliefs from seemingly unrelated data. De-anonymization and re-identification leverage AI's pattern recognition capabilities to link supposedly anonymous records to identified individuals by correlating multiple datasets or exploiting unique combinations of quasi-identifiers. Model extraction attacks steal proprietary AI models through carefully crafted queries, representing both intellectual property and privacy concerns if the models encode training data characteristics. Finally, even aggregate statistics can leak information about individuals through careful analysis and reconstruction, motivating formal guarantees such as differential privacy rather than reliance on simple aggregation.
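The following toy check illustrates the membership inference risk in its simplest form: a model that is markedly more confident on records it was trained on than on unseen records leaks membership information to anyone who can query it. The code assumes a classifier with a predict_proba-style interface and an arbitrary confidence threshold; it is a diagnostic sketch, not an attack toolkit.

```python
# Toy illustration of a confidence-threshold membership inference test: models
# that are much more confident on training records than on unseen records leak
# membership information. The model interface and threshold are assumptions.
import numpy as np

def membership_inference_rates(model, X_train, X_holdout, threshold=0.95):
    """Rate at which the simple attack flags training vs. holdout records as members."""
    conf_train = model.predict_proba(X_train).max(axis=1)
    conf_holdout = model.predict_proba(X_holdout).max(axis=1)
    # A large gap between these two rates indicates membership leakage.
    return (conf_train >= threshold).mean(), (conf_holdout >= threshold).mean()
```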

ISO/IEC TR 24028 surveys technical approaches to privacy protection in AI systems. Differential privacy provides mathematical guarantees that outputs from analyses of databases change minimally whether any individual's data is included or excluded, preventing inference about individuals while enabling useful aggregate analysis. Implementation requires adding carefully calibrated random noise to query results, gradients during model training, or model parameters, with privacy budgets tracking cumulative privacy loss across multiple queries or analyses. Federated learning enables model training across distributed datasets without centralizing data, sending model updates rather than raw data to a central server, preserving data locality, and supporting privacy protection—though additional protections like secure aggregation and differential privacy may be needed to prevent inference from model updates. Homomorphic encryption allows computations on encrypted data without decryption, enabling privacy-preserving inference or training, though currently with significant computational overhead limiting practical applications. Secure multi-party computation protocols enable multiple parties to jointly compute functions over their combined data without revealing individual inputs, supporting collaborative AI development while protecting proprietary or sensitive data.
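As a minimal illustration of differential privacy, the sketch below applies the Laplace mechanism to a counting query (sensitivity 1) and tracks a naive cumulative privacy budget. Class, method, and parameter names are assumptions for the example; production systems should rely on audited differential privacy libraries rather than hand-rolled mechanisms.

```python
# Minimal sketch of the Laplace mechanism for a counting query, with a naive
# privacy-budget tracker. Real deployments should use audited DP libraries;
# the class and parameter names here are illustrative assumptions.
import numpy as np

class LaplaceCounter:
    def __init__(self, total_epsilon):
        self.remaining = total_epsilon

    def noisy_count(self, data, predicate, epsilon):
        """Answer 'how many records satisfy predicate' with epsilon-DP noise."""
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= epsilon
        true_count = sum(1 for record in data if predicate(record))
        # A counting query has sensitivity 1, so the noise scale is 1 / epsilon.
        return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)
```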

Data minimization and purpose limitation represent foundational privacy principles requiring AI systems to collect only data necessary for specified purposes and retain it no longer than needed. This requires defining clear purposes before data collection, limiting data collection to what is strictly necessary for those purposes, avoiding function creep where data collected for one purpose is repurposed for others without consent, implementing retention policies with automatic deletion after specified periods, and providing granular consent mechanisms enabling individuals to authorize specific uses. Anonymization and pseudonymization techniques remove or replace identifying information to reduce privacy risks, though the standard acknowledges that true anonymization is increasingly difficult given AI's ability to re-identify individuals from supposedly anonymous data through linkage attacks. Transparency and individual rights support privacy by enabling individuals to know what data is collected, how it's used, who has access, and what automated decisions affect them, with rights to access, rectify, erase, and object to processing recognized in privacy regulations worldwide.

The standard emphasizes that privacy protection requires organizational governance beyond technical measures. Privacy impact assessments (PIAs) systematically identify and mitigate privacy risks before deploying AI systems, particularly important for high-risk processing involving sensitive data or vulnerable populations. Data governance frameworks establish accountability for data quality, security, and privacy throughout the lifecycle through defined roles, policies, access controls, audit trails, and incident response procedures. Training and awareness programs ensure that everyone involved in AI development and deployment understands privacy principles, requirements, and their responsibilities. Privacy by design embeds privacy considerations from initial system design rather than adding them retrospectively, considering privacy implications of architectural choices, data flows, retention policies, and access patterns. Third-party management ensures that vendors, partners, and service providers handling data on the organization's behalf maintain equivalent privacy protections through contractual obligations, audits, and oversight.

Robustness, Security, and Resilience

Robustness refers to an AI system's ability to maintain acceptable performance under adverse conditions including noisy or corrupted inputs, distributional shifts where operational data differs from training data, adversarial attacks deliberately crafted to fool the system, and unexpected edge cases not anticipated during development. ISO/IEC TR 24028 emphasizes that AI systems face distinctive robustness and security challenges compared to traditional software because learned behaviors can be manipulated through data poisoning, adversarial examples, model extraction, and other AI-specific attack vectors requiring specialized defenses beyond conventional cybersecurity measures.

Adversarial examples represent a fundamental vulnerability of many AI systems where small, often imperceptible perturbations to inputs cause dramatic mispredictions. In image recognition, adding carefully crafted noise patterns invisible or barely visible to humans can cause classifiers to misidentify objects with high confidence—for example, causing a stop sign to be classified as a speed limit sign, or a benign skin lesion to be classified as malignant. Adversarial text examples can fool natural language processing systems through synonym substitutions, character manipulations, or grammatical changes that preserve meaning for humans but alter model predictions. Audio adversarial examples can cause speech recognition systems to transcribe attacker-chosen phrases from audio that sounds innocuous to humans. These vulnerabilities arise from AI models' reliance on statistical patterns that may differ from human perception, creating exploitable decision boundaries in high-dimensional input spaces.
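The sketch below shows the fast gradient sign method (FGSM) against a plain logistic regression model, where the gradient of the cross-entropy loss with respect to the input has the closed form (p - y) · w. The weights, bias, and perturbation budget eps are illustrative assumptions; the same idea applies to deep networks via automatic differentiation.

```python
# Sketch of the fast gradient sign method (FGSM) against a logistic regression
# model, where the input gradient has a closed form: dLoss/dx = (p - y) * w.
# Weights, bias, epsilon, and variable names are illustrative assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y_true, w, b, eps=0.05):
    """Return an adversarially perturbed copy of x within an L-infinity ball of radius eps."""
    p = sigmoid(np.dot(w, x) + b)          # model's predicted probability of class 1
    grad_x = (p - y_true) * w              # gradient of cross-entropy loss w.r.t. the input
    return x + eps * np.sign(grad_x)       # small step that maximally increases the loss
```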

Data poisoning attacks manipulate training data to compromise model behavior during the training phase. Targeted poisoning aims to cause misclassification of specific inputs chosen by the attacker—for example, causing a malware detector to miss particular malware families by including carefully crafted malicious samples in training data labeled as benign. Backdoor attacks embed hidden triggers causing the model to produce attacker-chosen outputs when the trigger is present while maintaining normal performance otherwise—for example, training a face recognition system to grant access to the attacker when a particular pattern appears in the background. Clean-label poisoning achieves similar results without requiring label manipulation, making detection more difficult. Availability attacks degrade overall model performance through low-quality, corrupted, or adversarially perturbed training examples, potentially requiring expensive retraining to recover performance.

Model extraction and model inversion attacks steal information from deployed AI systems through carefully crafted queries. Model extraction clones proprietary models by querying them extensively and training surrogate models on the query-response pairs, enabling intellectual property theft and allowing attackers to probe the extracted surrogate for vulnerabilities that transfer to attacks on the original. Model inversion reconstructs training data from models or their predictions, potentially exposing sensitive information about individuals whose data was used in training. Membership inference determines whether particular data points were in the training set, revealing potentially sensitive information about participation in certain datasets. These attacks exploit the information leakage inherent when models are exposed through prediction APIs or when model parameters are published.

ISO/IEC TR 24028 surveys defenses against these attacks across the AI lifecycle. Adversarial training augments training data with adversarial examples to improve robustness, though this increases computational costs and may not generalize to all attack types. Input validation and sanitization detect and reject out-of-distribution or anomalous inputs, filtering potential adversarial examples before they reach the model. Certified defenses provide provable robustness guarantees within specified perturbation bounds, though currently with significant computational overhead and accuracy trade-offs. Ensemble methods combine multiple models with different architectures or training procedures, making adversarial attacks harder to craft as they must fool all ensemble members. Randomization and obfuscation introduce stochasticity into model predictions or obscure model details, making attacks requiring precise knowledge of model behavior more difficult. Monitoring and anomaly detection identify unusual prediction patterns, high-confidence errors, or adversarial attack attempts, enabling response before significant damage occurs.
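One simple instantiation of input validation is a distance-based gate that rejects inputs lying far from the training distribution before they ever reach the model, as sketched below using the Mahalanobis distance. The threshold and class names are assumptions for the example; in practice the gate would be calibrated on held-out data and combined with other defenses.

```python
# Simple input-validation gate: reject inputs whose Mahalanobis distance from
# the training distribution exceeds a threshold, as one way to filter anomalous
# or potentially adversarial inputs. Threshold and names are assumptions.
import numpy as np

class DistanceGate:
    def __init__(self, X_train, threshold):
        self.mean = X_train.mean(axis=0)
        # Regularize the covariance slightly so inversion is numerically stable.
        cov = np.cov(X_train, rowvar=False) + 1e-6 * np.eye(X_train.shape[1])
        self.cov_inv = np.linalg.inv(cov)
        self.threshold = threshold

    def accept(self, x):
        diff = x - self.mean
        dist = np.sqrt(diff @ self.cov_inv @ diff)
        return dist <= self.threshold   # False means: do not pass the input to the model
```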

Distribution shift—where operational data differs from training data due to changing environments, evolving user behavior, adversary adaptation, or concept drift—represents a pervasive robustness challenge for deployed AI systems. The standard emphasizes the need for continual monitoring to detect performance degradation, regular retraining on updated data reflecting current distributions, uncertainty quantification to identify when models encounter unfamiliar inputs where predictions may be unreliable, and fallback mechanisms enabling graceful degradation or human oversight when distribution shift is detected. Safety-critical applications require particularly robust handling of distribution shift through conservative design, extensive out-of-distribution testing, and fail-safe mechanisms preventing dangerous behavior when encountering unfamiliar situations.
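A common monitoring heuristic for detecting such shift on a single feature is the Population Stability Index, sketched below. The bin count and the rule-of-thumb alert level are conventions that vary across organizations; all names here are illustrative rather than defined by the report.

```python
# Sketch of distribution-shift monitoring with the Population Stability Index
# (PSI) for a single feature; bin count and alert threshold are conventions
# that vary in practice, and all names here are illustrative.
import numpy as np

def population_stability_index(reference, current, bins=10):
    """PSI between a reference (training) sample and current operational data."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges = np.unique(edges)                       # guard against duplicate quantiles
    edges[0], edges[-1] = -np.inf, np.inf          # cover the whole real line
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_frac = np.histogram(current, bins=edges)[0] / len(current)
    ref_frac = np.clip(ref_frac, 1e-6, None)       # avoid division by zero / log(0)
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac))

# A common rule of thumb: PSI above roughly 0.2 warrants investigation or retraining.
```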

Safety Assurance and Risk Management

Safety is of paramount importance for AI systems controlling physical processes or making decisions affecting human welfare, health, or life. ISO/IEC TR 24028 addresses AI safety challenges, recognizing that AI's statistical, learned nature fundamentally differs from traditional software whose behavior is explicitly programmed, making exhaustive testing and formal verification substantially more difficult. The standard complements domain-specific safety standards (automotive functional safety ISO 26262, machinery safety ISO 13849, medical device software lifecycle IEC 62304) by addressing AI-specific safety considerations requiring specialized approaches to hazard analysis, safety architecture, validation, and assurance.

AI introduces distinctive safety challenges requiring adapted safety engineering approaches. Emergent behavior arises from learned patterns rather than explicit programming, making it difficult to predict all possible system responses, particularly for unusual inputs or rare combinations of conditions not well-represented in training data. Opacity and lack of interpretability complicate hazard analysis and safety argumentation when engineers cannot fully understand why the AI system behaves as it does or confidently bound its behavior across all possible inputs. Performance uncertainty and probabilistic behavior mean that AI systems lack the determinism traditionally assumed in safety-critical systems, with predictions subject to errors whose frequency and severity depend on input characteristics and training data coverage. Distribution shift degrades performance when operational conditions diverge from training conditions, potentially creating safety hazards if degradation goes undetected. Continuous learning and adaptation, while enabling improvement, can introduce new behaviors that were not validated during initial safety assessment, requiring ongoing safety assurance throughout the operational lifecycle.

ISO/IEC TR 24028 describes AI-specific safety engineering practices across the development lifecycle. Hazard analysis and risk assessment identify potential harms from AI system failures, errors, or unexpected behaviors, considering both random failures (incorrect predictions, software bugs, hardware faults) and systematic failures (design limitations, training data inadequacies, environmental conditions outside operational design domain). Techniques include AI-specific HAZOP (Hazard and Operability study) examining how AI failures could contribute to hazards, FMEA (Failure Modes and Effects Analysis) analyzing consequences of different AI failure modes, and scenario-based analysis considering safety-critical situations the AI may encounter. Risk assessment determines severity and likelihood of identified hazards, with particular attention to high-severity harms even if low probability, and establishes risk reduction requirements driving safety architecture and validation activities.

Safety architecture establishes system-level safeguards ensuring acceptable safety even when AI components fail or behave unexpectedly. Redundancy and diversity employ multiple independent AI models or combine AI with conventional algorithms, reducing the probability that all components fail simultaneously in the same way. Monitoring and supervision continuously assess AI component outputs for anomalies, out-of-distribution inputs, or safety-critical errors, enabling intervention before hazards materialize. Fallback and degradation mechanisms provide safe alternatives when AI performance is insufficient, such as transferring control to human operators in autonomous vehicles when uncertainty exceeds thresholds, or reverting to conservative default behaviors in medical decision support systems. Physical barriers and constraints limit the consequences of AI failures in robotic systems through mechanical safeguards, force limiting, and workspace restrictions preventing contact with humans. Uncertainty quantification enables AI systems to recognize when predictions are unreliable and should not be acted upon, supporting decisions about when to defer to human judgment or safer default behaviors.
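The following sketch shows an uncertainty-gated fallback of the kind described above: the system acts on the model's prediction only when predictive entropy is below a threshold and otherwise defers to human oversight or a conservative default. The threshold and the act/defer callables are assumptions made for the example.

```python
# Sketch of an uncertainty-gated fallback: act on the model's prediction only
# when predictive entropy is below a threshold, otherwise defer to a human or
# a conservative default. Threshold and callable names are assumptions.
import numpy as np

def predictive_entropy(probs):
    """Entropy of a predicted class distribution (higher = more uncertain)."""
    probs = np.clip(probs, 1e-12, 1.0)
    return -np.sum(probs * np.log(probs))

def decide_with_fallback(probs, act, defer, entropy_threshold=0.5):
    """Route the decision based on how confident the model is; act/defer are caller-supplied."""
    if predictive_entropy(probs) <= entropy_threshold:
        return act(int(np.argmax(probs)))    # confident enough to act automatically
    return defer(probs)                      # uncertain: hand off to human oversight
```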

Validation and verification of AI safety requires adapted approaches acknowledging the impossibility of exhaustive testing for systems with vast input spaces learned from data. The standard describes techniques including extensive testing on diverse datasets covering nominal conditions, edge cases, rare events, and failure modes; stress testing and adversarial evaluation deliberately seeking failures through out-of-distribution inputs, adversarial examples, and worst-case scenarios; formal methods and verification applied to AI systems through neural network verification, property checking, or verified training, though currently limited in scalability; simulation and synthetic testing generating challenging scenarios difficult or dangerous to collect in reality, particularly valuable for safety-critical applications like autonomous driving or medical treatment; and field testing and piloting in controlled operational environments with extensive monitoring, data logging, and progressive expansion of operational design domain as confidence increases. Validation must address not only nominal performance metrics like accuracy but safety-specific requirements like false negative rates for critical failures, robustness to perturbations and distribution shift, and behavior in failure modes.

Ongoing safety assurance during operation addresses the reality that AI safety cannot be fully validated before deployment but must be continuously monitored and maintained. Operational monitoring tracks AI system performance, environmental conditions, and safety-relevant events in deployment, detecting performance degradation, distribution shift, or emerging failure patterns. Incident reporting and investigation analyze safety-relevant events including near-misses to identify contributing factors and improvement opportunities. Software and model updates require safety revalidation ensuring that changes improving performance in some aspects do not compromise safety in others. Change management processes ensure that modifications to AI systems, data, or operational contexts undergo appropriate safety review and approval before implementation. Safety culture and organizational learning embed safety consciousness throughout the organization, learning from incidents and near-misses to continuously improve AI safety.

Accountability, Governance, and Ethics

Accountability represents a foundational trustworthiness characteristic addressing the critical question: who is responsible when AI systems cause harm, make errors, or exhibit undesirable behaviors? ISO/IEC TR 24028 emphasizes that technical capabilities alone do not ensure trustworthy AI—organizational governance, ethical principles, clear accountability structures, and mechanisms for recourse and remedy are equally essential. The standard addresses accountability challenges arising from AI's distributed development and deployment involving multiple actors (data providers, algorithm developers, system integrators, deployers, operators), the opacity of decision-making in complex AI systems, potential automation bias where humans over-rely on AI recommendations without appropriate critical evaluation, and the novel legal and ethical questions about responsibility for autonomous system behaviors.

Establishing accountability requires clear definition of roles and responsibilities throughout the AI lifecycle. Organizations developing AI systems bear responsibility for design choices, algorithm selection, validation adequacy, documentation quality, and honest communication about capabilities and limitations. Organizations deploying AI systems are accountable for appropriate use case selection, adequate validation in operational context, competent operation and monitoring, and ensuring human oversight is effective rather than perfunctory. Operators using AI systems in decision-making retain responsibility for their decisions even when informed by AI recommendations, requiring sufficient understanding to critically evaluate AI outputs. Data providers share accountability for data quality, appropriate collection and use, and potential biases in data that could lead to discriminatory outcomes. Regulators and standard-setters establish accountability frameworks, enforcement mechanisms, and minimum requirements promoting responsible AI development and use.

ISO/IEC TR 24028 describes governance structures supporting accountability. AI ethics boards or committees provide oversight of AI development and deployment decisions, reviewing high-risk applications, assessing alignment with organizational values and ethical principles, and advising on resolution of ethical dilemmas. Clear policies establish organizational positions on AI use cases, ethical principles, risk tolerances, and prohibited applications, guiding development teams and deployment decisions. Impact assessments identify potential negative consequences of AI systems including fairness impacts, privacy risks, safety hazards, environmental effects, and broader societal implications, informing go/no-go decisions and risk mitigation strategies. Stakeholder engagement involves affected communities, domain experts, ethicists, and diverse perspectives in AI development and governance, avoiding the myopia that can result from homogeneous development teams. Documentation and audit trails record decisions, rationales, data provenance, model versions, validation results, and incident responses, enabling after-the-fact accountability investigation when problems arise.
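To illustrate the kind of audit trail such governance relies on, the sketch below defines a minimal append-only record for AI-assisted decisions, capturing model version, input reference, output, confidence, reviewer, and rationale. The field names are illustrative assumptions, not a schema specified by ISO/IEC TR 24028.

```python
# Minimal append-only audit record for AI-assisted decisions, illustrating the
# kind of documentation and audit trail the report describes. Field names are
# illustrative assumptions, not a schema defined by ISO/IEC TR 24028.
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class DecisionRecord:
    model_version: str          # which model version produced the output
    input_reference: str        # pointer to the input data (not the data itself)
    output: str                 # prediction or recommendation issued
    confidence: float           # reported confidence or score
    human_reviewer: str | None  # who reviewed or overrode the output, if anyone
    rationale: str              # explanation shown to the decision-maker
    timestamp: str = ""

    def append_to(self, path: str) -> None:
        """Write the record as one JSON line to an append-only log file."""
        self.timestamp = datetime.now(timezone.utc).isoformat()
        with open(path, "a", encoding="utf-8") as log:
            log.write(json.dumps(asdict(self)) + "\n")
```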

Transparency toward users and affected individuals supports accountability by enabling informed consent, appropriate reliance, and meaningful recourse when AI decisions adversely affect them. Notification when AI is involved in decisions affecting individuals enables appropriate calibration of trust and critical evaluation rather than assuming human decision-making. Explanation of how decisions were made, what factors were influential, and how outcomes could potentially be changed supports understanding and appropriate response to adverse decisions. Access to data about individuals held by AI systems and ability to correct inaccuracies support individual autonomy and control. Right to human review of automated decisions provides recourse when individuals disagree with AI outputs, particularly important for high-stakes decisions affecting employment, credit, healthcare, or legal status. Contestability mechanisms enable individuals to challenge AI decisions, present additional information, or request reconsideration through processes that are accessible, timely, and effective.

Ethical principles provide normative guidance for trustworthy AI extending beyond compliance with legal requirements to address broader societal values and human rights. Respect for human autonomy means AI should support and enhance human decision-making rather than replacing human judgment in contexts where human autonomy is valued, providing recommendations that humans can accept or reject rather than fully autonomous decisions, and ensuring humans remain ultimately in control of AI systems particularly in high-stakes domains. Prevention of harm requires proactive identification and mitigation of potential harms from AI use, application of precautionary principles when risks are significant but uncertain, and prioritization of safety over performance when trade-offs arise. Fairness and non-discrimination demand active efforts to identify and mitigate biases, ensure equitable treatment across different groups, and avoid perpetuating or amplifying societal inequalities. Privacy and data governance require respect for individuals' privacy rights, responsible data stewardship, and transparency about data use. Societal and environmental wellbeing consider broader impacts including effects on employment, social cohesion, democratic processes, and environmental sustainability, avoiding applications that benefit narrow interests at broader societal expense. These principles often require difficult trade-offs and contextual judgment, with ethics boards, stakeholder engagement, and transparent deliberation supporting principled decision-making.

Integration with AI Standards Ecosystem and Regulatory Frameworks

ISO/IEC TR 24028:2020 functions as a foundational standard within a growing ecosystem of AI-related standards, regulatory frameworks, and governance initiatives addressing different aspects of AI trustworthiness and providing complementary guidance. Understanding how ISO/IEC TR 24028 relates to and supports these other frameworks enables organizations to build comprehensive AI governance programs leveraging synergies across standards and satisfying multiple requirements through integrated approaches rather than parallel compliance efforts.

Within the ISO/IEC AI standards family, ISO/IEC TR 24028 provides foundational concepts and definitions for AI trustworthiness that inform and support other AI standards. ISO/IEC 42001:2023 "Artificial Intelligence Management System" establishes requirements for organizations to implement systematic management of AI systems addressing risks and opportunities, optimizing AI benefits, and demonstrating responsible AI use. ISO/IEC TR 24028 supports ISO/IEC 42001 by providing detailed guidance on trustworthiness dimensions that AI management systems must address. ISO/IEC 23894 "Information technology — Artificial intelligence — Guidance on risk management" provides specific AI risk management frameworks and techniques, complementing ISO/IEC TR 24028's trustworthiness overview with detailed risk assessment and treatment methodologies for AI-specific risks including bias, explainability, robustness, and safety. ISO/IEC 22989 "Information technology — Artificial intelligence — Concepts and terminology" establishes common AI vocabulary enabling consistent communication, providing terminological foundation referenced by ISO/IEC TR 24028 and other AI standards. ISO/IEC 38507 "Information technology — Governance of IT — Governance implications of the use of artificial intelligence by organizations" addresses governance of AI at the organizational level, complementing ISO/IEC TR 24028's technical trustworthiness focus with governance structures and processes.

Technical standards for specific AI trustworthiness dimensions provide detailed requirements and test methods. ISO/IEC 23053 "Framework for Artificial Intelligence (AI) Systems Using Machine Learning (ML)" addresses AI system frameworks and architectures. ISO/IEC TR 24027 "Information technology — Artificial intelligence — Bias in AI systems and AI aided decision making" provides specific guidance on identifying, measuring, and mitigating bias. ISO/IEC TS 4213 "Information technology — Artificial intelligence — Assessment of machine learning classification performance" establishes methods for evaluating AI classification performance. These technical standards operationalize the trustworthiness principles established in ISO/IEC TR 24028 through specific requirements, metrics, and validation approaches for particular trustworthiness dimensions or AI application areas.

ISO/IEC TR 24028 also complements established information security and privacy standards that remain highly relevant for AI systems. ISO/IEC 27001 "Information security management systems" provides comprehensive security management frameworks applicable to AI systems' confidentiality, integrity, and availability, with AI-specific security considerations augmenting general information security requirements. ISO/IEC 27701 "Privacy information management" extends ISO/IEC 27001 to privacy management, addressing personal data protection essential for trustworthy AI given AI systems' intensive data use. ISO/IEC 29100 "Privacy framework" establishes privacy principles and terminology supporting privacy-preserving AI development and deployment. Organizations can integrate AI trustworthiness requirements from ISO/IEC TR 24028 into existing ISMS or privacy management systems rather than maintaining separate governance structures.

Regulatory frameworks worldwide increasingly reference ISO/IEC TR 24028 and related AI standards as technical elaboration of legal requirements. The EU Artificial Intelligence Act establishes a risk-based regulatory framework classifying AI systems by risk level (unacceptable, high, limited, minimal) with requirements proportionate to risk. High-risk AI systems must comply with requirements including data governance, technical documentation, transparency, human oversight, accuracy, robustness, and cybersecurity that align closely with trustworthiness characteristics in ISO/IEC TR 24028. The European Commission has requested development of harmonized standards supporting AI Act compliance, with ISO/IEC standards including ISO/IEC TR 24028, ISO/IEC 42001, and ISO/IEC 23894 positioned as a foundation for these harmonized standards. Organizations implementing these ISO/IEC standards can demonstrate AI Act compliance more effectively than by developing proprietary approaches, with conformity to harmonized standards providing a presumption of conformity to corresponding AI Act requirements.

Other regulatory frameworks similarly align with ISO/IEC TR 24028 trustworthiness principles. The U.S. NIST AI Risk Management Framework provides voluntary guidance for organizations developing and deploying AI systems, establishing four core functions (Govern, Map, Measure, Manage) addressing AI risks throughout the lifecycle. NIST AI RMF trustworthiness characteristics closely parallel ISO/IEC TR 24028 dimensions, with NIST providing implementation guidance complementing ISO/IEC TR 24028's conceptual framework. Sector-specific regulations increasingly incorporate AI requirements aligned with trustworthiness principles, including FDA guidance for AI/ML-based medical devices emphasizing safety, effectiveness, and transparency; financial services regulations addressing AI model risk management, fairness, and explainability in credit decisions; and automotive functional safety standards adapted for AI/ML components in autonomous driving systems. ISO/IEC TR 24028 provides a common technical foundation supporting compliance across these varied regulatory frameworks through shared trustworthiness principles operationalized in sector-specific ways.

Industry Applications and Sector-Specific Trustworthiness

ISO/IEC TR 24028 provides general guidance applicable across AI applications, but trustworthiness requirements and priorities vary significantly across sectors based on potential harms, regulatory requirements, stakeholder expectations, and operational contexts. Understanding sector-specific trustworthiness considerations enables organizations to prioritize and tailor trustworthiness measures appropriately for their AI use cases.

Healthcare AI applications including medical diagnosis, treatment recommendation, patient risk stratification, and drug discovery demand the highest levels of trustworthiness given direct impacts on patient safety and health outcomes. Safety and accuracy are paramount, requiring extensive clinical validation demonstrating that AI systems meet or exceed human expert performance while avoiding dangerous errors. Explainability supports clinical decision-making by enabling physicians to understand AI recommendations, assess whether they align with clinical knowledge and patient-specific factors, and exercise appropriate clinical judgment in accepting or overriding AI advice. Fairness is critical to avoid healthcare disparities, requiring validation that diagnostic accuracy and treatment recommendations are equitable across demographic groups, socioeconomic statuses, and geographic regions. Privacy protection must satisfy HIPAA, GDPR, and other health data regulations while enabling legitimate secondary uses for research and public health. Regulatory approval through FDA, EMA, or equivalent bodies requires demonstrating safety, effectiveness, and compliance with medical device regulations, with ISO/IEC TR 24028 trustworthiness principles informing regulatory submissions and post-market surveillance.

Financial services AI for credit scoring, fraud detection, algorithmic trading, insurance underwriting, and robo-advisory face stringent fairness and explainability requirements given impacts on individuals' economic opportunities and regulatory prohibitions on discriminatory lending. Fair lending laws mandate equitable access to credit regardless of protected characteristics, requiring rigorous bias testing and mitigation in credit scoring models. Explainability supports regulatory compliance with laws requiring adverse action notices explaining why credit was denied, enabling consumers to understand and potentially remediate factors affecting creditworthiness. Model risk management regulations require financial institutions to validate models, monitor performance, and maintain model governance throughout the lifecycle. Robustness and security protect against market manipulation, fraud, and adversarial attacks that could cause financial losses or systemic risks. Accountability mechanisms ensure clear responsibility for algorithmic trading decisions, consumer harm from automated systems, and compliance with financial regulations.

Autonomous vehicles represent quintessential safety-critical AI applications where robustness, safety assurance, and testing rigor determine life-or-death outcomes. Functional safety requirements adapted from ISO 26262 establish safety goals, hazard analysis, safety architecture, and validation requirements for AI/ML components in ADAS and autonomous driving systems. Validation through billions of miles of testing data, simulation, and scenario-based evaluation aims to demonstrate safety superior to human drivers before deployment. Resilience to adversarial attacks, sensor failures, and edge cases ensures safe behavior even under challenging conditions. Ethical decision-making algorithms addressing unavoidable crash scenarios ("trolley problem" scenarios) require societal dialogue and transparent principles about how life-or-death trade-offs should be resolved. Liability and accountability frameworks determine responsibility when autonomous vehicles cause crashes, currently unresolved questions balancing manufacturer, software developer, operator, and fleet owner responsibilities.

Criminal justice AI applications for risk assessment, predictive policing, evidence analysis, and sentencing recommendations face intense scrutiny regarding fairness, accuracy, transparency, and impacts on fundamental rights. Fairness is paramount given racial disparities in criminal justice and risks that AI could perpetuate or amplify discriminatory outcomes, requiring rigorous bias testing across racial, ethnic, gender, and socioeconomic groups. Accuracy directly affects liberty interests, with false positives potentially leading to unjust incarceration and false negatives allowing recidivism. Transparency and explainability support due process rights enabling defendants to understand and challenge evidence or risk assessments used against them. Judicial oversight ensures human decision-makers retain ultimate authority with AI providing decision support rather than automated determination. Validation and accountability determine admissibility in adversarial legal proceedings, with Daubert/Frye standards requiring demonstrable scientific validity, known error rates, peer review, and general acceptance.

Human resources AI for resume screening, candidate ranking, employee monitoring, and performance evaluation must prioritize fairness and transparency to avoid employment discrimination and support employee rights. Anti-discrimination laws prohibit adverse employment actions based on protected characteristics, requiring bias testing ensuring hiring algorithms do not have disparate impacts on protected groups. Transparency about use of AI in employment decisions supports informed consent and employee understanding of how they are evaluated. Privacy protection addresses employee monitoring and surveillance, balancing employer interests in productivity and security against employee privacy expectations and rights. Right to explanation enables job applicants to understand why they were rejected and potentially address deficiencies, supporting fair employment processes.

Implementation Pathways and Organizational Adoption

Implementing ISO/IEC TR 24028 guidance requires systematic organizational approaches spanning strategy, governance, technical practices, and culture. While ISO/IEC TR 24028 is a technical report providing guidance rather than certifiable requirements, organizations can use it as a foundation for trustworthy AI programs tailored to their specific contexts, risk profiles, and stakeholder expectations.

Organizational readiness assessment establishes baseline understanding of current AI governance maturity, trustworthiness gaps, and improvement priorities. This includes inventorying existing AI systems and use cases, assessing current practices against ISO/IEC TR 24028 trustworthiness dimensions, identifying high-risk applications requiring priority attention, evaluating existing governance structures and technical capabilities, and engaging stakeholders to understand trustworthiness expectations and concerns. Readiness assessment informs roadmap development prioritizing improvements based on risk, feasibility, and resource availability.

Governance structures embed trustworthiness throughout the AI lifecycle. AI ethics boards provide oversight, reviewing high-risk applications, establishing principles, and resolving ethical dilemmas. Policies and standards define organizational positions on AI use, prohibited applications, ethical principles, risk tolerances, and required practices. Roles and responsibilities clarify accountability for AI trustworthiness across development, deployment, operation, and governance functions. Risk management processes systematically identify, assess, treat, and monitor AI-specific risks informed by ISO/IEC 23894 guidance. Impact assessments evaluate potential negative consequences before deploying AI systems, considering fairness, privacy, safety, and societal impacts. Stakeholder engagement involves affected communities, domain experts, and diverse perspectives in AI development and governance decisions.

Technical practices operationalize trustworthiness throughout the AI development lifecycle. Responsible data practices ensure training data quality, representativeness, and appropriate collection and use, with data governance, bias testing, and privacy protection. Model development incorporates fairness constraints, robustness techniques, uncertainty quantification, and explainability methods aligned with use case requirements. Validation and testing assess performance across trustworthiness dimensions using diverse datasets, bias metrics, robustness stress tests, and safety scenario analysis. Documentation maintains records of data provenance, model decisions, validation results, and incident responses supporting transparency and accountability. Deployment includes monitoring, fallback mechanisms, human oversight, and continuous validation ensuring sustained trustworthiness in operation. Incident response procedures address AI failures, bias discoveries, security breaches, or safety events through rapid investigation, remediation, and learning.

Organizational culture and capability building foster trustworthy AI as an organizational priority and a shared responsibility. Training and education ensure that AI developers, deployers, operators, and leadership understand trustworthiness principles, requirements, and techniques. Diverse and inclusive teams bring varied perspectives helping identify blind spots and biases that homogeneous teams might miss. Ethics integration makes ethical considerations routine parts of development processes rather than afterthoughts, with ethics checklists, review gates, and decision frameworks. Continuous learning captures lessons from incidents, near-misses, research advances, and evolving best practices, feeding improvements to practices and governance. Leadership commitment and culture emphasize that trustworthiness is not optional or negotiable but fundamental to organizational values and long-term success.

Future Directions and Evolving Trustworthiness Landscape

The AI trustworthiness landscape continues to evolve rapidly driven by technological advances, regulatory developments, societal expectations, and lessons learned from AI deployments. ISO/IEC TR 24028 will likely see revisions reflecting these developments while maintaining stable foundational principles.

Emerging technologies create new trustworthiness challenges and opportunities. Foundation models and large language models raise novel questions about trustworthiness at scale given their broad capabilities, opaque training data, unexpected emergent abilities, and potential for misuse. Generative AI, including deepfakes and synthetic content, challenges the authenticity and trustworthiness of media, requiring detection methods, watermarking, provenance tracking, and media literacy. Increasingly autonomous systems operating in the physical world require enhanced safety assurance, human-machine teaming approaches, and accountability frameworks for autonomous actions. Federated and decentralized AI enables privacy-preserving collaborative learning but creates challenges for governance, quality assurance, and accountability across organizational boundaries. Neuromorphic and quantum computing may enable new AI architectures requiring adapted trustworthiness approaches as classical methods may not transfer directly.

Regulatory landscape evolution worldwide establishes enforceable requirements for AI trustworthiness beyond voluntary standards. Organizations must track regulatory developments, anticipate future requirements, and implement trustworthiness measures satisfying emerging compliance obligations. International harmonization efforts seek alignment across jurisdictions, avoiding fragmentation that would burden global AI development and deployment. Sector-specific regulations increasingly incorporate AI requirements tailored to domain-specific risks and existing regulatory frameworks.

Societal expectations for AI trustworthiness will likely increase as understanding of AI capabilities and risks grows, high-profile AI incidents raise awareness, and stakeholder demands for transparency, fairness, and accountability intensify. Organizations treating trustworthiness as a competitive advantage rather than a compliance burden will be better positioned for long-term success in an environment of rising trustworthiness expectations. Trustworthy AI leadership differentiates organizations, builds customer loyalty, attracts talent, and creates resilience against reputational and regulatory risks.

Purpose

To provide a comprehensive framework for understanding and implementing trustworthy AI systems by defining key trustworthiness characteristics and providing guidance for addressing transparency, fairness, accountability, privacy, robustness, and safety concerns throughout the AI lifecycle

Key Benefits

  • Comprehensive framework for AI trustworthiness addressing all key dimensions
  • Enhanced stakeholder confidence in AI systems and outputs
  • Support for responsible AI development and deployment practices
  • Risk-based approach tailoring trustworthiness to application context
  • Improved AI transparency and explainability
  • Mitigation of AI bias and promotion of fairness
  • Enhanced privacy protection in AI processing
  • Better accountability and governance for AI decisions
  • Improved AI robustness and reliability
  • Regulatory compliance support (EU AI Act, sector regulations)
  • Competitive advantage through trustworthy AI leadership
  • Integration with AI management systems (ISO/IEC 42001) and AI risk management (ISO/IEC 23894)

Key Requirements

  • Understanding context-specific trustworthiness requirements for AI applications
  • Implementing accountability mechanisms including roles, responsibilities, and audit trails
  • Ensuring transparency through documentation, explainability, and interpretability
  • Establishing reliability through testing, validation, and performance monitoring
  • Implementing safety measures including risk assessment and hazard mitigation
  • Addressing fairness through bias detection, mitigation, and equitable outcomes
  • Protecting privacy through data minimization, anonymization, and security
  • Building robustness against adversarial attacks and distribution shift
  • Risk-based assessment determining appropriate trustworthiness levels
  • Stakeholder engagement including affected individuals and communities
  • Continuous monitoring and improvement of trustworthiness characteristics
  • Documentation and communication of trustworthiness measures and limitations
  • Integration of trustworthiness throughout AI lifecycle (design, development, deployment, operation)
  • Governance frameworks supporting trustworthy AI decision-making

Who Needs This Standard?

Organizations developing or deploying AI systems, AI engineers and data scientists, AI governance and ethics teams, risk managers and compliance officers addressing AI risks, product managers for AI-powered products, regulators and policymakers, auditors and certification bodies, and any organization seeking to build trustworthy and responsible AI systems.

Related Standards