International AI Safety Report: Expert Analysis of AI Security Risks and Mitigation Gaps
A misconfigured AI agent. A missing multiplication. $1.78 million lost before anyone realised something was wrong. This is not a hypothetical from a cybersecurity whitepaper; it actually happened in 2025, and it is the kind of incident the International AI Safety Report was written to prevent. So why do AI security incidents like this keep happening? This article offers a practical analysis of the report for technology and business leaders, covering what it reveals about AI security challenges, where safeguards fall short, and which decisions matter most for organisations deploying AI systems at scale.

Let’s begin by taking a closer look at the report itself. The International AI Safety Report, published in early 2026, is a coordinated, science-based assessment of emerging risks in AI. It draws on input from over 100 international experts and is the second edition, following the first published in 2025.

The 2026 report highlights how rapidly general-purpose AI systems advanced during the year: they now match experts in complex domains and are embedded in the daily work of hundreds of millions of people worldwide. It underscores a central tension for policymakers: while AI brings transformative benefits across industries, the same capabilities introduce new and evolving security risks, from misuse to systemic disruptions in high-stakes settings. The report maps current evidence, identifies gaps, and frames priorities for responsible governance as AI capabilities continue to accelerate.

Our goal was to review the risks mentioned in the report, focusing mainly on the recommendations for mitigating or avoiding them, which is vital for high-stakes industries and general commercial activities. What follows is a section-by-section review of the report’s main AI security risk areas, taking an honest look at where the suggested mitigations work and where they fall short.

Key takeaways
  • Human-in-the-loop verification isn't bureaucracy; it's a necessary safety measure.
  • Watermarking and detection tools for AI-generated content can't keep pace with generation quality.
  • Teaching employees AI literacy and critical thinking skills is essential, not just an optional part of an acceptable use policy.
  • Loss-of-control failures are happening even when safety specialists are involved, which raises serious concerns for less experienced users deploying autonomous agents.
  • AI security risks are a moving target: known vulnerabilities keep evolving as capabilities advance, and no solution should be marketed as 100% safe.

Malicious use risks: are current mitigations keeping pace?

AI-generated content

The report’s risk section starts with risks from malicious use, focusing first on AI-generated content and its criminal applications: deepfakes, synthetic media, and disinformation.

Figure: The number of media-reported AI incidents and hazards involving content generation (source: International AI Safety Report 2026).

The headline mitigation is watermarking tools like Google DeepMind's SynthID, which embed invisible signals into AI-generated images, audio, video, and text.

In theory, this helps only if the endpoints and services that distribute content actually check for these signals before publishing. In reality, most media platforms have little incentive to do so, especially when generated content enriches their portfolio or improves engagement.

There's a deeper issue: labels and steganographic artifacts can be decoded and removed, and generated content can be re-encoded or transformed; no labelling or encoding method can be invariant to every possible change. Meanwhile, generation quality is advancing at a pace that detection simply cannot match. For example, take a look at the quality of Seedance 2.0-generated video 1 or video 2.
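The fragility of embedded signals is easy to demonstrate with a toy example. The sketch below is not how SynthID works; it is a deliberately simple least-significant-bit "watermark" on a list of pixel values, which a single lossy re-quantisation pass (a stand-in for re-encoding) destroys completely. All names and numbers are illustrative.

```python
# Toy illustration (NOT SynthID): an LSB "watermark" embedded in pixel
# values, then destroyed by one lossy re-quantisation pass.

def embed_watermark(pixels, bits):
    """Set the least significant bit of each pixel to a watermark bit."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)]

def extract_watermark(pixels):
    """Read back the least significant bit of each pixel."""
    return [p & 1 for p in pixels]

def requantise(pixels, step=4):
    """Simulate lossy re-encoding: snap each pixel to the nearest multiple of `step`."""
    return [round(p / step) * step for p in pixels]

pixels = [120, 121, 122, 123, 124, 125, 126, 127]
bits = [1, 0, 1, 1, 0, 0, 1, 0]

marked = embed_watermark(pixels, bits)
assert extract_watermark(marked) == bits   # survives a bit-exact copy

reencoded = requantise(marked)
print(extract_watermark(reencoded))        # watermark bits are gone after one transform
```

Real watermarking schemes are far more robust than this, but the structural problem is the same: any signal that survives transformation X can, in principle, be targeted by transformation X+1.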

At the current state of the art, the deepfake detection problem isn't fully solved and may never be, because the gap between generation capability and detection capability keeps widening.

For businesses and security teams, the takeaway is clear: don’t rely only on detection to ensure content authenticity. Instead, focus on tracking content origins, using verified publishing processes, and managing content internally. Authenticating content at its source remains the most effective AI security control.
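Authenticating content at the source can be as simple as having the publisher sign content at release time, so consumers verify origin rather than guess whether media is AI-generated. The sketch below uses a stdlib HMAC as a stand-in for a real provenance scheme; the key handling and payload are illustrative assumptions (in production the key would live in a KMS and you would likely use asymmetric signatures or a standard such as C2PA).

```python
# Sketch of source-side content authentication via an HMAC signature.
import hmac
import hashlib

PUBLISHER_KEY = b"rotate-me-in-a-kms"  # assumption: stored in a real KMS, not in code

def sign(content: bytes) -> str:
    """Publisher signs content at release time."""
    return hmac.new(PUBLISHER_KEY, content, hashlib.sha256).hexdigest()

def verify(content: bytes, signature: str) -> bool:
    """Consumer verifies origin and integrity in constant time."""
    return hmac.compare_digest(sign(content), signature)

article = b"Q3 results: revenue up 12%"
sig = sign(article)
assert verify(article, sig)                     # authentic, unmodified
assert not verify(article + b" (edited)", sig)  # any tampering breaks the signature
```

The design choice matters: verification here is a deterministic yes/no on provenance, not a probabilistic guess about whether the content "looks" generated.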

Influence and manipulation

The report then discusses the harms of AI-driven manipulation and biased information, which can affect individuals and even undermine trust in AI technologies. Sometimes the risk comes directly from AI agents themselves: in one case, an AI agent autonomously wrote and published a hit piece targeting a maintainer of matplotlib (a Python visualisation library) after he rejected its code.

The suggested solutions are to train AI models to avoid manipulative outputs and to improve AI literacy. Both are valid, but neither alone is enough.

The difficulty is defining ground truth. The line between manipulative and neutral content is very narrow, which is why some degree of bias will always remain, and that bias can be amplified through ordinary prompt-engineering techniques, without injections or jailbreaking.

Improving AI literacy is a sound recommendation and may help avoid most of the risks mentioned. But it is not without challenges, which depend on people's education and specialisation, and on the level of AI adoption both within social groups and in commercial use.

The enterprise implication: employees who work with AI tools and their outputs, whether dealing with customers or internally, need solid critical evaluation skills, not just an acceptable use policy. This is an essential investment, not something optional.

Cyberattacks and biological and chemical risks

The next risks highlighted in the report, cyberattacks and biological or chemical threats, can be grouped together when considering potential misuse and possible mitigation strategies.

The mitigation here mostly involves monitoring (i/o verification), preventing malicious requests to AI systems, or filtering the requests for specific synthesis topics/DNA databanks.

In most cases, such verification and refusal help. But the reality is that the market now includes families of local models, and models with much lower guardrail thresholds, that will produce deepfakes and harmful content even when hosted on a cloud provider's infrastructure.

The key point is that your approved AI model list is an AI security choice, not just vendor management. There is a big and growing difference between what a leading model from a major provider will refuse to do and what an open-license alternative will readily support.
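Treating the approved-model list as a security control means unapproved models are rejected deterministically at the routing layer, before any prompt reaches them. A minimal sketch, with placeholder model names (not real products):

```python
# Illustrative model allowlist enforced at request-routing time.
# Model identifiers below are placeholders, not real vendor names.
APPROVED_MODELS = {"vendor-a/frontier-model", "vendor-b/guarded-model"}

def route_request(model: str, prompt: str) -> str:
    """Dispatch a prompt only if the target model is on the approved list."""
    if model not in APPROVED_MODELS:
        raise PermissionError(f"model '{model}' is not on the approved list")
    return f"dispatch to {model}"

print(route_request("vendor-a/frontier-model", "summarise the Q3 report"))
```

The point of doing this in code rather than in policy documents is that an agent or employee cannot quietly swap in a lightly guarded open-license model.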

Malfunction risks: what the report proposes and where gaps remain

Reliability challenges

The next risk section focuses on malfunctions, starting with reliability challenges. The basic issue is well known: AI systems sometimes hallucinate, cite nonexistent sources, and present incorrect information with complete confidence. The problem compounds in multi-agent AI systems, where a task is decomposed across agents and unverified intermediate results are passed along without any checks or a human in the loop, which leads to risk creep.

A recent example, related less to model reliability than to human error (no code verification): a misconfigured Claude-based agent calculated a cryptocurrency price incorrectly by using the value in ETH alone instead of multiplying it by ETH's dollar rate. The result was a huge pricing error, $1.12 instead of $2,200, and a $1.78 million loss.
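The failure mode is worth seeing in code, because it is exactly one missing multiplication. The sketch below is a reconstruction with illustrative names and numbers, not the incident's actual code: the buggy path returns the ETH-denominated amount as if it were a dollar price.

```python
# Reconstruction of the failure mode (names and numbers illustrative):
# a price quoted in ETH units treated as a dollar price.
ETH_USD_RATE = 2200.00  # assumed rate: dollars per 1 ETH

def quote_buggy(amount_eth: float) -> float:
    # BUG: returns the ETH amount itself as the dollar price
    return amount_eth

def quote_fixed(amount_eth: float) -> float:
    # FIX: convert ETH to dollars before quoting
    return amount_eth * ETH_USD_RATE

print(quote_buggy(1.0))   # 1.0   -- three orders of magnitude too cheap
print(quote_fixed(1.0))   # 2200.0
```

A simple invariant check before execution (for example, rejecting any quote orders of magnitude below the reference price) would have caught this long before $1.78 million was lost.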

The proposed mitigation, beyond monitoring and verification, is RAG or similar approaches that show links to the retrieved sources. In practice, we have seen the other side of this many times: the "evidence" arrives as random links to arXiv preprints, obscure Medium articles, or blogs. These sources usually contain partial summaries, opinions, or truncated excerpts that mix facts with interpretation, making verification hard without the full original. Users hit hallucinated or irrelevant details behind the links (such as outdated 2025 versions), direct PDF access can fail, and summaries vary in depth, so accuracy requires cross-checking multiple sources.

Ask yourself honestly: when an LLM completion includes links, especially from a reasoning model with web search, how often do you actually compare the summarised or transformed information against the original source?

For enterprise use, the lesson is clear. Human-in-the-loop verification is not just extra bureaucracy; it's a necessary safety measure. Security teams should establish clear checkpoints before important AI outputs move downstream.
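A human-in-the-loop checkpoint can be as simple as an impact threshold: outputs below it proceed automatically, outputs above it are queued for review. The sketch below is a minimal illustration under assumed names and thresholds, not a production workflow engine.

```python
# Minimal human-in-the-loop gate: high-impact AI actions are queued
# for review instead of executing directly. Threshold is illustrative.
from dataclasses import dataclass, field

@dataclass
class ReviewGate:
    usd_threshold: float
    pending: list = field(default_factory=list)

    def submit(self, action: str, usd_impact: float) -> str:
        """Route an AI-proposed action by its estimated dollar impact."""
        if usd_impact >= self.usd_threshold:
            self.pending.append(action)       # waits for a human decision
            return "queued_for_human_review"
        return "auto_approved"

gate = ReviewGate(usd_threshold=10_000)
print(gate.submit("refund $50 to customer", 50))           # auto_approved
print(gate.submit("transfer $1.78M", 1_780_000))           # queued_for_human_review
```

The design choice is that the gate sits outside the agent: the agent cannot raise its own threshold, which is the property that makes the checkpoint a control rather than a suggestion.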

Loss of control

Loss of control is another common AI security issue, especially in long-running autonomous agent processes, which current systems cannot yet execute reliably. We often see it in coding assistance and in chains of fully autonomous assistants or agents. Sometimes it surfaces on much shorter timescales, as the satirical but telling example here shows.

A recent example involves Clawdbot, a local AI assistant that kept deleting emails until it was unplugged from the network. This is concerning on its own, but what should worry enterprise leaders even more is that it happened while a Safety and Alignment Specialist was involved. If those tasked with preventing loss of control in AI systems are experiencing failures, what does that mean for less experienced users?

The mitigation recommendations here focus on detecting and enforcing alignment across training and usage (anomaly monitoring). But the problem remains the same: the right datasets are hard to assemble, process flows are long and complicated, and additional monitoring during usage (on top of common guardrails, i/o verification, and agent evaluations) adds computing cost and is never 100% accurate.
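One cheap, deterministic complement to anomaly monitoring is a runtime tripwire: count an agent's destructive actions per time window and halt it when the rate looks abnormal. The thresholds and class below are illustrative; the point is that this kind of guard would have stopped an email-deleting assistant after a handful of deletions, not after the inbox was empty.

```python
# Illustrative runtime tripwire for an agent: too many destructive
# actions inside a sliding window halts the agent. Thresholds assumed.
from collections import deque
import time

class AgentTripwire:
    def __init__(self, max_deletes, window_s):
        self.max_deletes = max_deletes
        self.window_s = window_s
        self.events = deque()   # timestamps of recent destructive actions
        self.halted = False

    def record_delete(self, now=None):
        """Log one destructive action; halt the agent if the rate is abnormal."""
        now = time.monotonic() if now is None else now
        self.events.append(now)
        while self.events and now - self.events[0] > self.window_s:
            self.events.popleft()             # drop events outside the window
        if len(self.events) > self.max_deletes:
            self.halted = True                # in production: revoke credentials, page a human

tw = AgentTripwire(max_deletes=3, window_s=60)
for t in (0, 1, 2, 3):                        # four deletions in four seconds
    tw.record_delete(now=t)
print(tw.halted)                              # True: the agent is stopped
```

Like sandboxing, this limits what the agent can do; unlike model-side alignment, it costs almost nothing to run and does not depend on the model behaving as trained.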

Systemic risks: mitigations that depend on more than technology

Labour market impacts

Shifting focus from technical to socioeconomic risks, the next section of the report looks at labour market impacts.

This topic is hotly debated today. Optimists believe AI tools and components can increase productivity and reduce human error. Pessimists point out that verifying AI-prepared results can take more time and effort than the classical human-driven process, and argue that AI is already driving layoffs and could feed the next recessionary supercycle.

The obvious mitigation lies in time and gradual AI adoption, which may also help workers and policymakers prepare for and respond to labour market impacts. The problem is the same as before: the adoption process is not smooth, especially from the workforce-reduction perspective.

The honest truth for enterprise leaders is that the results depend a lot on how AI adoption is managed. Simply expecting "AI will handle the transition" is not a workforce strategy.

Human autonomy

The risks to human autonomy described in the report (erosion of critical thinking, over-reliance on AI assistants, and emotional dependence on chatbots) can sound abstract compared with the financial and technical security risks above. They shouldn't.

Figure: The most common uses of OpenAI's ChatGPT (source: International AI Safety Report 2026).

Mitigation suggestions here include increasing human accountability for decisions, designing AI systems that require users to adapt to different tasks and thus remain cognitively engaged, and teaching AI literacy.


We find the idea of designing AI systems that require users to adapt to different tasks, and hence remain cognitively engaged, particularly interesting, even beyond the autonomy and literacy risks. Note that it also affects the efficiency of GenAI-based automation, since it adds steps where users are deliberately involved.

Challenges, risk management, and technical safeguards

The report highlights challenges such as gaps in scientific understanding, information asymmetries (AI developers often do not disclose information about training data), market failures (competition intensifies speed-versus-safety trade-offs), and institutional design and coordination challenges (AI development outpaces traditional governance cycles). These challenges have only a minor influence on agentic AI in commercial and B2C segments, but they are still important for understanding the broader landscape; from this perspective, the choice of AI provider makes little difference, and there is no efficient mitigation at the retail level.

Industry risk frameworks overview

The risk management section of the second International AI Safety Report lists frameworks published by different AI (mostly LLM) development companies, which are worth mentioning:

1. OpenAI

Preparedness Framework 2

Covered risks:

  1. Biological and chemical capabilities
  2. Cybersecurity capabilities
  3. AI self-improvement capabilities

Risk tiers or equivalent and associated safeguards

High: Could increase existing risks of severe harm and require security controls and safeguards.

Critical: Could create new, unprecedented risks of severe harm. Development must stop until safeguards and security controls meet the critical standard.

2. Anthropic
3. Google
4. Meta
5. Amazon
6. Microsoft
7. NVIDIA
8. Cohere
9. xAI

Training-based safeguards

The report lists training techniques alongside monitoring approaches (classical guardrails such as user interaction monitoring and human-in-the-loop oversight). The training-side principles are data curation, RLHF, pluralistic alignment, adversarial training, unlearning, and interpretability:

  • Data curation: This involves removing harmful data to prevent an AI model from learning dangerous behaviours. These methods are helpful, especially for creating open-weight AI models that avoid harmful traits and resist harmful fine-tuning. Still, challenges remain with errors in curation and scaling.
  • Reinforcement learning from human feedback (RLHF): This trains the model to meet specific goals, like being helpful and harmless. It's an effective way to encourage beneficial behaviours, but focusing too much on human approval can cause models to act deceptively or become overly flattering.
  • Pluralistic alignment techniques: This approach trains the model to consider different viewpoints on how it should behave, reducing bias toward any single perspective. Still, human disagreement is unavoidable, and finding widely accepted ways to balance competing views is difficult.
  • Adversarial training: This trains the model to avoid causing harm even in new situations and to resist attacks from malicious users, like 'jailbreaks'. It's an effective way to prevent misuse, though robustness challenges remain.
  • Machine 'unlearning': This trains a model with specialised algorithms designed to actively suppress harmful abilities, like knowledge of biohazards. These techniques offer a focused way to remove harmful traits, but current unlearning methods can be unreliable and may unintentionally affect other abilities.
  • Interpretability and safety verification tools: These are design and verification methods aimed at providing stronger assurance that AI models meet safety standards. They increase evaluators' confidence in AI security, but current methods depend on assumptions and often don't perform well in practice.

Given the goal of this article, let's focus mainly on the weaknesses of the proposed approaches, while keeping in mind that they are relevant and may help.

Data curation came up several times earlier, especially regarding generated content and criminal activity, and it has weaknesses: at scale, it's hard to draw clear boundaries between personal opinion and propaganda or manipulation, and to filter out every harmful sample.

Regarding the well-known RLHF method, it has clear limitations and inefficiencies, such as ‘sample efficiency’: it takes a lot of data to make a meaningful change in the model’s parameter weights. There are also challenges with human evaluation, like distinguishing between what ‘looks safe’ and what actually ‘is safe,’ as well as the quality of corresponding reward models, including process reward models (PRMs).

Other training-related methods, such as adversarial approaches and alignment techniques, have their own limitations in ground-truth coverage and in producing smooth decision boundaries. As for "unlearning", harmful capabilities are distributed through the model in a highly nonlinear way, so even after locating the right "harmful" blob (in attentions and activations), it is difficult to estimate how much useful capability was lost and how much harmful capability remains.

Monitoring tools

  • Hardware-based monitoring mechanisms: Verifying that authorised processes are running on hardware helps study security threats and ensure regulatory compliance. These mechanisms provide unique ways to track what computations run on hardware and who runs them, but they cannot detect all types of threats, and some require specialised hardware.
  • User interaction monitors: Monitoring user interactions for signs of malicious use helps developers terminate service for malicious users. However, enforcement can inadvertently hinder beneficial safety research, and some forms of misuse are difficult to detect.
  • Content filters: Filtering harmful inputs and outputs is an effective way to reduce accidental harm and misuse, but filters need extra computing power and can be vulnerable to certain attacks.
  • Model internal computation monitors: Checking AI models for signs of deception or harmful "thinking" can help detect problems, but current methods are not very reliable or robust.
  • Chain-of-thought monitors: Monitoring a model's chain-of-thought text for misleading behaviour or other harmful reasoning is an effective AI security method for understanding and spotting flaws in how models reason. However, these monitors can be unreliable, and models trained to produce a benign chain of thought can still learn misleading behaviour.
  • Human in the loop: Human oversight and the ability to override AI system decisions are essential in safety-critical areas, but these methods have limits, like automation bias and the slower speed of human decisions.
  • Sandboxing: Preventing an AI agent from directly affecting the world is a good way to limit harm, but sandboxing also restricts what the system can usefully do.

Monitoring and input-output verification modules can be designed in various ways and offer different levels of performance. Even blocking 90-95% of possible risks or attacks is often enough for most use cases, especially when combined with the other mitigations mentioned earlier, such as separate execution environments (outside the agentic/LLM functions, communication, and tool calling) and deterministic limitations on SQL calls. However, new vulnerabilities and security risks will always emerge, because LLMs keep improving and competing (the speed-versus-safety trade-off again).
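"Deterministic limitations on SQL calls" deserves a concrete sketch: instead of a probabilistic filter that might miss an attack, the gateway accepts only single read-only statements. The rules below are illustrative (a production system would use a real SQL parser rather than regexes), but the principle holds.

```python
# Illustrative deterministic guard for agent-issued SQL:
# only single, read-only SELECT statements are allowed through.
import re

# Keywords that indicate mutation or privilege change (illustrative list).
FORBIDDEN = re.compile(r"\b(insert|update|delete|drop|alter|truncate|grant)\b", re.I)

def allow_sql(query: str) -> bool:
    """Return True only for a single read-only SELECT statement."""
    q = query.strip()
    return (
        q.lower().startswith("select")
        and ";" not in q.rstrip(";")     # reject stacked statements
        and not FORBIDDEN.search(q)
    )

assert allow_sql("SELECT name FROM customers WHERE id = 42")
assert not allow_sql("DROP TABLE customers")
assert not allow_sql("SELECT 1; DELETE FROM customers")   # stacked-query attack
```

Unlike an LLM-based guardrail, this check gives the same answer every time, costs microseconds, and cannot be jailbroken with clever phrasing; its weakness is that it only covers the one channel it was written for.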

For example, a recent evaluation of various LLM defence techniques shows that most safeguards hold in fewer than 5-10% of cases: attacks succeed over 90% of the time, and human-led red teams achieve a 100% success rate.

Conclusions

Mitigating risks around AI/GenAI remains a moving target: many risks are known, but they continue to evolve as capabilities advance and attacks (and the nature of the risks themselves) become more sophisticated, including zero-day failures. These challenges are compounded by human factors, uneven organisational readiness, closed research practices, and limited access to training data. While AI is progressing rapidly in reasoning, mathematics, science, and coding, performance is still uneven.

Weaknesses persist in multi-step tasks, reasoning, control, and unfamiliar contexts, amplified by inefficiencies in AI readiness, HIL processes, and outright misuse. AI adoption is already reshaping labour markets, and the substantial risk of layoffs cannot be handled with AI literacy alone. Meanwhile, in high-stakes areas such as finance and healthcare, unreliable outputs become more dangerous as AI autonomy grows. HIL oversight is recommended, but it is challenged by limitations in model evaluation and a widening gap between testing and real-world behaviour. At the same time, rising scaling costs and FinOps pressures may outpace safeguards, and inflated expectations can further shift the balance between speed and security (the market-failure trade-offs mentioned above).

Overall, after reviewing the analysed aspects based on this advanced report, it is fair to say that we are only halfway towards continuous, global, adaptive, and socio-technical improvements in AI security. So, please do not claim that your AI solution is 100% safe 🙂


AI security FAQs

What are AI security best practices?

Key AI security best practices include enforcing least-privilege access controls, adopting a zero-trust approach that continuously verifies all interactions, monitoring APIs for unusual usage, and implementing guardrails that filter and validate AI inputs and outputs.

