Concept Siever: Controllable Erasure of Concepts from Diffusion Models without Side-effects

A framework to surgically remove unwanted concepts (such as nudity or copyrighted styles) from text-to-image diffusion models without side effects on related concepts.

Aakash Kumar Singh · Priyam Dey · Sribhav Srivatsa · R. Venkatesh Babu

Vision and AI Lab (VAL), IISc Bangalore

Concept Siever Main Figure
Figure 1: Concept Siever removes concepts like "Van Gogh's Style" (top) and "Snoopy" (bottom) while preserving related concepts like "Van Gogh's Identity" and "Charlie Brown". It also offers inference-time control (right).
📄 Paper (TMLR) 💾 Code (GitHub) 🐦 Twitter Thread

Why is Forgetting in AI So Hard?

The "Preservation Set" Nightmare

Current methods force you to create a 'preservation set'—a massive, manually curated list of concepts that must not be forgotten. This is slow, biased, and can never be exhaustive.


Catastrophic Side-Effects

Even with a preservation set, previous methods suffer from 'concept leakage': forgetting one concept (like 'Snoopy') breaks related concepts (like 'Charlie Brown').

Concept leakage comparison

Concepts are Subjective

Finally, what does 'forgetting' even mean? For a concept like 'nudity,' the level of desired erasure is deeply subjective and cultural. Current methods offer a single, 'one-size-fits-all' solution with no user control.

Subjectivity of forgetting example

Our Solution: A Surgical Sieve, Not a Hammer.

Concept Siever Framework Diagram
The Concept Siever Framework: (I) Automated paired dataset curation, (II) Training the Concept Sieve, (III) Controllable forgetting.

Part 1: Automated Paired Dataset

We ditch the manual "preservation lists." Our method automatically creates its own paired dataset. By intelligently perturbing concept tokens in the text embedding, we generate "concept" and "concept-negated" images. These pairs are nearly identical in structure and background, differing only in the target concept.
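The idea of perturbing a concept token can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the suppression rule (damping the concept token toward the mean of the other tokens) and the function name `make_paired_embeddings` are our own assumptions for the sketch.

```python
import torch

def make_paired_embeddings(text_emb, concept_idx, alpha=1.0):
    """Perturb one token's embedding to obtain a 'concept-negated' prompt.

    All other tokens stay identical, so images generated from the two
    embeddings with a shared initial noise differ mainly in the concept.
    `alpha` scales how strongly the concept token is suppressed
    (assumption: simple linear damping toward the context mean).
    """
    neg_emb = text_emb.clone()
    context_mean = torch.cat(
        [text_emb[:concept_idx], text_emb[concept_idx + 1:]]
    ).mean(dim=0)
    neg_emb[concept_idx] = (1 - alpha) * text_emb[concept_idx] + alpha * context_mean
    return text_emb, neg_emb

# Toy usage: 8 tokens, 16-dim embeddings; token 3 carries the concept.
emb = torch.randn(8, 16)
orig, neg = make_paired_embeddings(emb, concept_idx=3)
# Generating from (orig, neg) with the same seed yields one
# "concept" / "concept-negated" image pair.
```

In practice the two embeddings would be fed to the same frozen diffusion pipeline with identical noise, which is what makes the resulting image pairs structurally aligned.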

Part 2: Sparse, Targeted Steering Vector

This paired dataset lets us train a "Concept Sieve" τ—a precise steering vector obtained by taking the difference between the two model states. This isolates the concept's signal. We further sparsify this vector by identifying the exact model layers responsible for the concept. This targeted, surgical edit dramatically reduces side effects, like forgetting Van Gogh's identity when erasing his style.
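A rough sketch of the difference-and-sparsify step, in PyTorch. The selection criterion shown (rank layers by relative update norm and keep the top fraction) is an assumption for illustration; the paper's layer-identification procedure may differ.

```python
import torch

def concept_sieve(orig_state, negated_state, keep_frac=0.1):
    """Compute a per-layer steering vector tau = negated - original,
    then sparsify it by keeping only the layers whose difference is
    largest relative to the original weights (assumed criterion)."""
    tau = {k: negated_state[k] - orig_state[k] for k in orig_state}
    scores = {k: (v.norm() / (orig_state[k].norm() + 1e-8)).item()
              for k, v in tau.items()}
    k_keep = max(1, int(len(tau) * keep_frac))
    keep = set(sorted(scores, key=scores.get, reverse=True)[:k_keep])
    # Zero out layers not responsible for the concept.
    return {k: (v if k in keep else torch.zeros_like(v)) for k, v in tau.items()}

# Toy usage: two layers; "up.0" changed a lot, "mid.0" barely moved.
orig = {"up.0": torch.ones(4, 4), "mid.0": torch.ones(4, 4)}
neg = {"up.0": 3 * torch.ones(4, 4), "mid.0": 1.001 * torch.ones(4, 4)}
tau = concept_sieve(orig, neg, keep_frac=0.5)  # keeps only "up.0"
```

Zeroing the untouched layers is what makes the edit surgical: parameters unrelated to the concept are never moved.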

Part 3: Inference Time Control

This "Concept Sieve" vector τ isn't just an on/off switch. We provide two mechanisms for fine-grained control at inference time, requiring no retraining. Users can scale the vector's magnitude λ for overall strength or use "Column Masking" to control the scope of the edit. This directly solves the subjectivity problem by letting users choose their own level of forgetting.
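Both controls can be expressed as a one-line weight edit at load time. This is a simplified sketch of the mechanism; the helper name `apply_sieve` and the 0/1 column-mask convention are our illustrative assumptions.

```python
import torch

def apply_sieve(weight, tau, lam=1.0, col_mask=None):
    """Edit one weight matrix at inference time, no retraining.

    lam scales the overall forgetting strength; col_mask (an optional
    0/1 vector over columns) restricts which columns the edit touches,
    controlling its scope.
    """
    if col_mask is not None:
        tau = tau * col_mask  # broadcasts over rows; masked columns untouched
    return weight + lam * tau

# Toy usage: half-strength edit restricted to the first two columns.
W = torch.eye(4)
tau = torch.ones(4, 4)
mask = torch.tensor([1.0, 1.0, 0.0, 0.0])
W_edited = apply_sieve(W, tau, lam=0.5, col_mask=mask)
```

Because λ and the mask are applied when weights are loaded, a user can dial the erasure strength per generation without touching the training pipeline.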


It Works. And You're in Control.

We don't just solve the "preservation" and "side effects" problems—we also introduce fine-grained user control, a first for this task.

Part 1: The Proof (Surgical Precision)

Result 1: State-of-the-Art on NSFW Removal

Concept Siever sets a new state-of-the-art on the I2P benchmark, reducing inappropriate images by over 33% compared to the previous best domain-agnostic methods.

Result 2: Superior Preservation (The "Aha!" Moment)

We solve the 'concept leakage' problem. We can forget Van Gogh's style without forgetting his identity—something other methods fail to do. Our 'Structure LPIPS' metric confirms we preserve the underlying image content far better than the baselines.

Van Gogh style vs identity comparison
Other methods forget both style and identity. Ours forgets only the style.

You Are in Control: Inference-Time Forgetting

We solve the subjectivity problem. Forgetting isn't 'one-size-fits-all,' so we're the first to provide fine-grained control over the strength of forgetting at inference time. Just scale the steering vector by λ and decide for yourself—no retraining needed.

Read the Paper, Run the Code.

Concept Siever makes generative AI safer, more controllable, and more reliable. Dive deeper into our work and use it in your own projects.

Quick Summary (Twitter Thread)

For a quick, visual summary of the entire project, check out our main Twitter thread.

Citation

Citation (BibTeX)

@article{
singh2025concept,
title={Concept Siever : Towards Controllable Erasure of Concepts from Diffusion Models without Side-effect},
author={Aakash Kumar Singh and Priyam Dey and Sribhav Srivatsa and Venkatesh Babu Radhakrishnan},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2025},
url={https://openreview.net/forum?id=O7zTvlSBZ9},
note={}
}