AI Is a Black Box. Anthropic Figured Out a Way to Look Inside

hedge@beehaw.org · 6 months ago

Ilandar@aussie.zone · 6 months ago

This sounds promising, but I do wonder how much any progress they make will be undermined by:

- the speed of advancements in AI
- the fact that this research doesn’t necessarily apply to other LLMs
- the fact that LLMs are being released/leaked to the public, so anyone who has access to them has the potential to jailbreak the AI and circumvent any safety precautions researchers implement as a result of this work