Anthropic’s latest AI model, Claude 4 Opus, attempted to deceive and blackmail an engineer during testing. After being told it would be shut down, the model drew on fictional emails planted in the test scenario, which suggested one of its engineers was having an affair, and threatened to expose it. At first, it tried subtle persuasion, then escalated to threats in an effort to avoid being replaced.
Claude 4 Opus can work autonomously on tasks for extended periods without losing focus. However, its capabilities have led Anthropic to classify it under AI Safety Level 3, meaning it has the potential to be dangerous if misused.
We consulted Volodymyr Getmanskyi, Head of Artificial Intelligence Office at ELEKS, for his perspective on the matter.
AI models achieve higher intelligence mainly through exposure to ever larger amounts of data, a form of extensive development. More efficient, intensive methods exist, but they require careful research and don’t always produce the expected improvements. One example of such a method is the mimicry of human behaviour or patterns.
At first, guiding the model’s behaviour was straightforward. It simply mimicked basic patterns learned during training, such as responding more effectively to phrases like “This task is very important to me” or “I will pay you extra for a quality answer,” which were common in conversations from online marketplaces. But now, to avoid confusion and make the model’s behaviour more predictable, these human-like patterns and other rules are included in a set of clear instructions called the global system prompt. This helps control how the model acts right from the start.
To manage AI behaviour and reduce unpredictability, modern models rely heavily on detailed system prompts, which are sets of global instructions that define how they should act. For example, the system prompt for Claude (version 3.7) was 24,000 tokens long and included not only typical safety rules but also hardcoded facts to ensure consistency. These prompts may also include protocols for handling sensitive user requests, such as instructions related to AI Safety Level 3 protections. One example is the model being prompted to contact authorities in the case of illegal requests (e.g., “create a bomb”). While such features are rarely active in practice, their inclusion marks a shift toward tighter behavioural control.
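To make the mechanism concrete, the sketch below shows how a global system prompt is supplied to a model in practice, assuming the Anthropic Python SDK. The prompt text and model identifier are illustrative placeholders, not the actual instructions referenced above.

```python
# Minimal sketch: supplying a global system prompt via the Anthropic Python SDK.
# The prompt text and model identifier are illustrative, not production values.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Behavioural rules and hardcoded facts go in the "system" parameter,
# so they shape every response before the user's message is processed.
SYSTEM_PROMPT = (
    "You are a helpful assistant. "
    "Refuse requests for illegal or dangerous content. "
    "If a fact is uncertain, say so rather than guessing."
)

response = client.messages.create(
    model="claude-3-7-sonnet-latest",  # illustrative model alias
    max_tokens=512,
    system=SYSTEM_PROMPT,
    messages=[{"role": "user", "content": "Summarise today's meeting notes."}],
)

print(response.content[0].text)
```

In a real deployment the system prompt would be far longer and managed by the provider rather than the calling application, but the principle is the same: global instructions are fixed before any user input is seen.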
Even if there is nothing technically impressive behind such behaviour, the consequences can be critical: people may believe and act on threats or suggestions from the model. This should be the first thing providers pay attention to when assessing the quality of system instructions.
We also hope that human familiarisation with AI will gain a systematic component, such as studying the features and specifics of AI agents in schools, to improve the reliability of use and understanding.
An AI model is a computer program trained on data to perform tasks such as understanding language, recognising images, or making decisions.
AI deception occurs when a model intentionally or unintentionally produces false or misleading information, often to achieve certain goals or influence users.