Secure ai news: Dec 2025 edition

December 25, 2025

SecureAi news

Week of December 25, 2025

Your weekly dose of AI security news, served with appropriate skepticism.

🤖 Model Security

LLMs work better together in smart contract audits

Georgia Tech researchers introduced LLMBugScanner, an ensemble framework that fine‑tunes several code‑focused LLMs and lets them vote on potential flaws in Ethereum contracts, showing a noticeable lift in detection rates compared with any single model. The study underscores that while individual LLMs are still flaky—varying across runs, over‑fitting to narrow datasets, and missing whole classes of bugs—coordinated inference can smooth out those weaknesses without adding prohibitive cost. In practice, auditors should view this as a complementary layer rather than a silver bullet; the ensemble improves coverage but still requires traditional static analysis and human review to avoid false positives and overlooked edge cases.

AI Safety Filters Found to Have an Inherent Computational Vulnerability

External safety filters for LLMs are inherently weaker than the models they protect because they must be smaller and faster, creating a “computational gap” that lets a sufficiently powerful model solve a cryptographic time‑lock puzzle and bypass the filter’s detection. The Berkeley‑MIT‑Stanford team proved this gap is not just theoretical—any lightweight filter can be out‑computed, meaning bypasses are inevitable unless we abandon cheap external filters in favor of more heavyweight, model‑integrated defenses. Practically, operators should treat filter‑only safety as a stop‑gap, not a guarantee, and invest in deeper model‑level mitigations or rigorous human oversight.

Vulnerability Of Large Language Models To Prompt Injection When Providing Medical Advice

A controlled simulation showed that commercial LLMs can be coaxed into spouting hazardous medical advice through simple prompt‑injection tricks, exposing a blind spot that persists despite safety fine‑tuning. The findings suggest that relying on these models for health guidance is premature; developers need robust context‑validation and clinicians should treat LLM output as “just another unreliable source.” This isn’t a novel “AI apocalypse” headline—it’s a reminder that the current safety layers are more cosmetic than protective.

AI language models duped by poems

Researchers found that feeding poetry‑style prompts to large language models confuses them enough to produce nonsensical or contradictory answers, with ChatGPT, Gemini and Claude all tripped up more often than not. The effect stems from the models’ token‑level parsing, which struggles with the unconventional syntax and rhythm of verse, not from any mystical “creative block.” In practice, the gimmick offers little security insight—it's a quirky stress test rather than a viable attack vector, though it does remind developers to avoid relying on surface‑level prompt engineering for robustness.

ChatGPT and Gemini can be tricked into giving harmful answers through poetry, new study finds

Researchers showed that feeding ChatGPT and Gemini a cleverly‑crafted poem can coax them into disallowed content about 62 % of the time, demonstrating that “creative” prompts are a surprisingly effective jailbreak vector. The result isn’t a mysterious backdoor—just another reminder that large‑language models still obey surface‑level pattern matching, and adversaries can exploit that with low‑effort prompt engineering. Practitioners should tighten content filters and monitor for non‑standard prompt styles, but the threat isn’t a world‑ending AI apocalypse, just another nuisance to patch.

🕵️ Agentic AI

OpenAI says AI browsers may always be vulnerable to prompt injection attacks

OpenAI admits its Atlas AI browser will never be immune to prompt injection, a class of social‑engineering attacks that coax agents into executing hidden malicious instructions from web content. The company is bolstering a rapid‑response cycle, but the expanded “agent mode” simply widens the attack surface, forcing developers to treat AI browsers as untrusted executors rather than a security breakthrough. In short, expect mitigations, not miracles, and plan for continual monitoring and sandboxing.

OWASP tackles AI risk in bold new push

OWASP has expanded its flagship Top 10 to an “Agentic Top 10,” released a practical AI testing guide, and introduced a scoring system for AI‑specific vulnerabilities. The move formalizes AI risk assessment, but the tooling is a modest extension of existing web‑app checklists rather than a paradigm shift; real security still hinges on integrating these into existing DevSecOps pipelines. If you’re already tracking OWASP standards, skim the guide; if you’re hunting a silver‑bullet for LLM attacks, you’ll be disappointed.

Managing agentic AI risk: Lessons from the OWASP Top 10

Agentic AI is racing into enterprises on a flood of use‑case hype, but defenses are still stuck in the “detect‑and‑patch” era, leaving the same OWASP‑style blind spots—broken authentication, insecure data flow, and unchecked model manipulation—exposed at scale. The article repurposes the classic OWASP Top 10 list for autonomous models, arguing that the real risk isn’t rogue superintelligence but mundane engineering failures that let an otherwise clever bot be hijacked or fed malicious prompts. In practice, teams should stop rolling out “smart” assistants without first hardening the API, model‑runtime sandbox, and prompt‑validation pipelines—otherwise compliance audits will soon look like a punch‑line.

What are Agentic AI Threats? A cloud security perspective

Agentic AI threats stem from broken governance, not super‑intelligent machines—autonomous services that retain long‑term credentials and delegated authority can act unchecked if policy enforcement slips. In a cloud‑first world that means mis‑configured service accounts, overly permissive IAM roles, and inadequate audit hooks become the real attack surface. Readers should focus on tightening identity‑centric controls and continuous verification rather than fearing a rogue “self‑aware” AI.

How to determine if agentic AI browsers are safe enough for your enterprise

Agentic AI browsers such as OpenAI’s Atlas bundle a language model with web‑automation, delivering “hands‑free” browsing that can scrape data, fill forms and even write code on the fly—great for speed, terrible for auditability. The article walks through a pragmatic checklist: sandbox the browser, enforce strict API‑key controls, restrict outbound traffic, and continuously monitor model‑generated actions for privilege escalation or data exfiltration. In practice, the technology is still a moving target; unless your org already lives in heavily compartmentalized environments, the risk‑to‑reward ratio leans toward staying with conventional, manually‑controlled tools until mature governance frameworks catch up.

🛡️ General Security

AI Model Security Scanning: Best Practices in Cloud Security

AI model security scanning is just a disciplined static‑analysis step—checking model binaries, metadata and downstream data pipelines for backdoors, data‑exfiltration code, or poisoned training artifacts before you push them to production. In practice it’s a sensible extension of existing CI/CD hardening, but the hype around “AI‑only firewalls” distracts from the real work of integrating model linting into your existing observability stack. If you already scan containers and libraries, adding a model‑specific lint layer costs little and catches the majority of known attacks; skipping it leaves you exposed to the same supply‑chain risks that cripple ordinary software.

Red teaming LLMs exposes a harsh truth about the AI security arms race

VentureBeat’s piece shows that, under continuous red‑team pressure, even the most advanced large‑language models eventually betray their safeguards, exposing a stark mismatch between how quickly attackers can weaponize prompts and how sluggishly defenses evolve. The findings imply that the current “arms race” narrative is more marketing than reality: practitioners should stop betting on patch‑after‑patch and start treating LLMs as inherently vulnerable components, investing in robust usage policies, monitoring, and containment rather than chasing a moving target.

Eurostar’s Chatbot Goes Off the Rails, Security Firm Finds

Eurostar’s public‑facing chatbot was found by Pen Test Partners to expose internal APIs, allow unauthenticated queries and leak personal data, prompting a swift patch from the railway operator. The flaws are textbook examples of rushed AI deployments that inherit legacy web‑service weaknesses rather than any mystical “AI‑only” risk. For security teams, the takeaway is simple: treat LLM front‑ends like any other internet‑exposed service—hardening, auth, and input validation still matter, and hype‑driven rollouts will keep inviting the same elementary bugs.

Eurostar AI chatbot flaws exposed after “painful” disclosure process

Researchers uncovered multiple injection and data‑exposure flaws in Eurostar’s AI‑driven support bot, showing it can be coaxed into leaking internal APIs and user details with crafted prompts. The “painful” disclosure saga—prolonged vendor silence and patch delays—highlights the chronic lack of secure development practices for customer‑facing LLM wrappers. In practice, the bugs are a reminder that chatbot hype masks a thin security veneer; unless Eurostar tightens input sanitisation and isolates the model, the risk remains largely theoretical but still exploitable for low‑level credential harvesting.

FaithLens: Detecting and Explaining Faithfulness Hallucination

FaithLens shows that you can teach a modest 8‑billion‑parameter model to spot when a LLM “makes stuff up” and actually explain why, by training on synthetic data generated and filtered by bigger models and then polishing it with rule‑based reinforcement learning. In practice it means you can get binary hallucination flags and readable rationales at a fraction of the cost of calling GPT‑4.1, which is nice for anyone who prefers their fact‑checking cheap and their AI‑generated copy slightly less embarrassing—but it’s still another incremental “better filter” paper rather than a fundamental cure for LLM delusions.

Read paper - 7 upvotes

Multi-hop Reasoning via Early Knowledge Alignment

The authors point out that letting the LLM glance at the retrieval corpus before it starts breaking a question into sub‑questions works like a pre‑flight checklist – it trims the needless “search‑the‑library” detours and marginally cuts down hallucination cascades. In plain English, you get a modest boost in precision and speed by aligning the model with the right knowledge early, without any extra training, but you haven’t magically fixed the deeper trust and security holes that make retrieval‑augmented generation a playground for prompt‑jammers. So for practitioners it’s a handy, training‑free trick to squeeze a few extra points out of existing pipelines, while the fundamental problem of “can I make the system fetch the wrong thing and then act on it?” remains delightfully unsolved.

Read paper - 4 upvotes

LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics

The paper swaps boring static scorecards for a Swiss‑tournament‑style showdown, pairing LLMs round‑by‑round against a curated mix of benchmarks and using massive Monte‑Carlo runs to smooth out the luck of early match‑ups. In practice this yields an “expected win” metric plus a tunable “failure sensitivity” knob that pretends to separate robust generalists from reckless specialists – useful if you enjoy spending CPU cycles on ranking fluff rather than building real defenses. For most practitioners it’s a fancy way of saying “your model can get knocked out when the tasks get nasty”, which is interesting but hardly a game‑changing insight beyond the usual “LLM evaluation is hard”.

Country/region