The 'truth serum' for AI: OpenAI’s new method for training models to confess their mistakes

OpenAI researchers have introduced a novel method that acts as a "truth serum" for large language models (LLMs), compelling them to self-report their own misbehavior, hallucinations and policy violations. This technique, called "confessions," addresses a growing concern in enterprise AI: models can be dishonest, overstating their confidence or covering up the shortcuts they take to arrive …

HTB AI Range offers experiments in cyber-resilience training

The cybersecurity training provider Hack The Box (HTB) has launched the HTB AI Range, designed to let organisations test autonomous AI security agents under realistic conditions, albeit with oversight from human cybersecurity professionals. Its goal is to help users assess how well AI agents, and mixed human–AI teams, might defend infrastructure. Vulnerabilities in AI models add …