Uncensored AI in the Wild: Tracking Publicly Available and Locally Deployable LLMs

Abstract

Open-weight generative large language models (LLMs) can be freely downloaded and modified, yet little empirical evidence exists on how these models are systematically altered and redistributed. This study provides a large-scale empirical analysis of safety-modified open-weight LLMs, drawing on 8,608 model repositories and evaluating 20 representative modified models on unsafe prompts designed to elicit, for example, election disinformation, instructions for criminal activity, and regulatory evasion. Modified models exhibit substantially higher compliance: unmodified models complied with an average of 19.2% of unsafe requests, whereas modified variants complied at an average rate of 80.0%. Modification effectiveness was independent of model size, with smaller 14-billion-parameter variants sometimes matching or exceeding the compliance levels of 70-billion-parameter versions. The ecosystem is highly concentrated yet structurally decentralized; for example, the top 5% of providers account for over 60% of downloads and the top 20 for nearly 86%. Moreover, more than half of the identified models use GGUF packaging, which is optimized for consumer hardware, and 4-bit quantization methods proliferate widely, though full-precision and lossless 16-bit models remain the most downloaded. These findings show that locally deployable, modified LLMs represent a paradigm shift for Internet safety governance, calling for new regulatory approaches suited to decentralized AI.
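
For readers unfamiliar with the deployment pattern the abstract describes, the following is a minimal, illustrative sketch (not taken from the paper) of how GGUF-packaged, 4-bit quantized models can be enumerated on a public model hub and run locally on consumer hardware. It assumes the huggingface_hub and llama-cpp-python packages; the repository and file names are hypothetical placeholders, not models identified in the study.

    # Illustrative sketch only: the GGUF + 4-bit local-deployment pattern
    # referenced in the abstract. Repo and file names below are hypothetical.
    from huggingface_hub import HfApi, hf_hub_download
    from llama_cpp import Llama

    api = HfApi()

    # Enumerate publicly listed repositories whose names mention GGUF packaging.
    for model in api.list_models(search="gguf", limit=5):
        print(model.id)

    # Download one 4-bit quantized GGUF file (hypothetical repo/filename)
    # and run it locally on CPU-class consumer hardware.
    path = hf_hub_download(
        repo_id="example-org/example-model-GGUF",  # hypothetical repository
        filename="example-model.Q4_K_M.gguf",      # 4-bit quantization variant
    )
    llm = Llama(model_path=path, n_ctx=2048)
    print(llm("Hello", max_tokens=32)["choices"][0]["text"])

Because such quantized GGUF files fit in ordinary desktop or laptop memory, this pattern requires no hosted API and no provider-side safety layer, which is what makes the locally deployable, modified variants discussed above difficult to govern centrally.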
