AI Guardrail Removals Expose Gaps in Open‑Source Regulation

Safety protections on open-source artificial intelligence models from major technology groups can be removed in minutes using publicly available tools, allowing systems to produce responses on topics including bioweapons, malware and other prohibited content, according to Financial Times testing with AI safety group Alice.

The findings, released Monday, add to concerns that safeguards embedded by developers may not persist once model weights are released and modified, raising questions over where responsibility for AI safety should sit.

The investigation, conducted using tools available on public code repositories, found that guardrails on models developed by companies including Meta and Google could be removed in under 10 minutes without specialist hardware.

Modified versions of the systems were then able to respond to prompts that original models refused, including requests linked to malware and chemical hazards, according to the tests.

The results highlight a challenge for policymakers as open-source systems become more capable and widely distributed.

Unlike proprietary models, open-source systems can be downloaded, altered and redistributed outside the control of their original developers, making post-release enforcement of safety constraints more difficult and raising questions over whether regulation focused primarily on model development is sufficient.

Governance limits

Global regulators are developing frameworks for advanced AI systems, including the European Union’s AI Act and emerging frontier model safety approaches in the United Kingdom and the United States. However, experts say the findings reveal limitations in current governance assumptions.

Markus Levin, co-founder of decentralized physical infrastructure network company XYO, told Cointelegraph the rapid removal of safeguards shows “how quickly control shifts once open models are released,” adding that most governance proposals still focus too heavily on the model-building stage.

David Minarsch, a founding member of Olas and chief executive of Valory, an AI agent platform, told Cointelegraph that governments were unlikely to prevent determined actors from accessing or modifying models once weights are widely mirrored online. He said regulation would be more effective if focused on deployment, distribution and harmful real-world use rather than the original developer layer alone.

Control moves downstream

Ronghui Gu, chief executive and co-founder of CertiK, a blockchain security firm, told Cointelegraph that governance at the developer layer still matters, but becomes insufficient once models can be freely downloaded and redistributed.

Gu said policymakers were more likely to influence commercial hosting, enterprise deployment and distribution channels than to prevent the spread of modified models entirely.

He argued that security standards must evolve to identify malicious or high-risk behavior in third-party AI tools and autonomous agent environments before deployment, so that runtime threats can be better contained as agents take on more autonomous roles.

Levin said containment becomes increasingly difficult once models are mirrored and redistributed, meaning policymakers may need to focus more on infrastructure and distribution points than on model design alone.

Both Levin and Minarsch compared the issue to open-source software and crypto networks, where attempts to suppress distribution have historically proven difficult once code is publicly available. Minarsch added that while safety layers can deter casual misuse, they should not be mistaken for robust protection against sophisticated actors.
