Your website can now opt out of training Google's Bard and future AIs

Einar@lemm.ee · 1 year ago

underisk@lemmy.ml · 1 year ago

It’s just a robots.txt flag that explicitly mentions a Google user agent string. This is about as effective at stopping AI from training on your data as a “no trespassing” sign hidden behind the hedges of your unfenced lawn is at stopping trespassers.

Otter@lemmy.ca · edit-2 1 year ago

We could put it on the various Lemmy sites, but it’s even more ineffective because of federation.

Not sure what the analogy would be in that case. An infinite number of people get to decide whether that sign exists on your lawn?

edit: An infinite number of people have a copy of your lawn, and they need to put a sign on it
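A rough sketch of the federation problem, under loud assumptions: the instance names and their robots.txt contents below are hypothetical, and each instance that stores a federated copy of a post is assumed to serve its own robots.txt. An opt-out only covers the copies whose host has added the rule, which Python's standard `urllib.robotparser` makes easy to check without any network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical instances that each hold a federated copy of the same post,
# mapped to the robots.txt each one serves (contents made up for illustration).
INSTANCE_ROBOTS = {
    "instance-a.example": "User-agent: Google-Extended\nDisallow: /\n",
    "instance-b.example": "User-agent: *\nDisallow: /admin/\n",
    "instance-c.example": "",  # no robots.txt rules at all
}

def copies_open_to(agent: str, path: str) -> list[str]:
    """Return the instances whose robots.txt still lets `agent` fetch `path`."""
    open_hosts = []
    for host, robots_txt in INSTANCE_ROBOTS.items():
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        if parser.can_fetch(agent, f"https://{host}{path}"):
            open_hosts.append(host)
    return open_hosts

# Only instance-a has put up the "sign"; the other two copies remain open.
print(copies_open_to("Google-Extended", "/post/123"))
```

Every instance admin would have to add the rule independently for the opt-out to cover all copies of the post.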

orca@orcas.enjoying.yachts · 1 year ago

I’ve always been told the Robots file is pretty much just a suggestion.
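To illustrate why robots.txt is only a suggestion: the check happens entirely inside the crawler's own code, so a crawler that skips it simply fetches anyway. This is a minimal sketch using Python's standard `urllib.robotparser`; the `crawler_would_fetch` helper and its `honor_robots` switch are hypothetical, and no network request is made.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""

def crawler_would_fetch(agent: str, url: str, honor_robots: bool) -> bool:
    """Hypothetical crawler decision: robots.txt only matters if the crawler checks it."""
    if not honor_robots:
        # Nothing in HTTP enforces robots.txt; a crawler can just skip this step.
        return True
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)

url = "https://example.com/blog/post"
print(crawler_would_fetch("Google-Extended", url, honor_robots=True))   # False
print(crawler_would_fetch("Google-Extended", url, honor_robots=False))  # True
```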

👁️👄👁️@lemm.ee · 1 year ago

Just like Google honors “do not track”, right?

Mongostein@lemmy.ca · edit-2 1 year ago

So since what’s available now isn’t actually AI, what do we call it when we do get real AI? Will it be like what happened with HD? With True AI™ followed by Ultra AI™, AI4K™, and so on until we just call them master?

pjhenry1216@kbin.social · 1 year ago

I’ve seen AGI thrown around. Artificial General Intelligence.

chameleon@kbin.social · 1 year ago

AGI (artificial general intelligence) is the current term for “The Concept Formerly Known As AI”. Not really a new term, but it’s only recently that companies decided that any algorithm can qualify as regular “AI” if they consider it good enough.

lloram239@feddit.de · edit-2 1 year ago

Artificial Intelligence never meant AGI. AI was simply the attempt to build software that can solve problems that computers traditionally can’t do, but humans can. That includes stuff like chess, image recognition, language, etc. Exactly the kind of things AI has been getting really good at over the last decade.

AGI, on the other hand, is an autonomous AI system that can solve all the problems a human can solve, not just some. That’s frankly a much blurrier concept, since what humans can do is largely an artifact of evolution and our environment, and it also varies quite a bit between humans. AGI is not some magical point with any significance in the underlying science, and it’s unlikely we’ll ever land exactly on that point, since capabilities differ so much between AI and humans, and AI is often far superior once we’ve figured out the basics (e.g. Stable Diffusion can paint images 1000x faster than a human, and ChatGPT has more knowledge than any human ever had).

Also, AGI still doesn’t imply sentience, since sentience isn’t really needed to replace a human.

> what do we call it when we do get real AI?

The “real AI” is in the underlying algorithms, e.g. backpropagation. Those are the foundation of modern AI systems, and those algorithms are what allow you to find patterns in the data. They are also why the field seems to be progressing so rapidly: we can throw data at them and get good results. Our actual understanding of why it all works is still somewhat limited, since those algorithms are what extract the patterns from the data, not us.
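As a toy illustration of what backpropagation does (not how ChatGPT or any production system is implemented): the chain rule is applied backwards through a computation to get the gradient of the loss with respect to every weight, and training uses those gradients to nudge the weights. The two-weight model, the single training example, and the learning rate below are arbitrary assumptions; the finite-difference check only confirms the hand-derived gradients.

```python
import math

def forward(w1: float, w2: float, x: float, y: float) -> float:
    """Tiny two-weight 'network': y_hat = w2 * tanh(w1 * x), squared-error loss."""
    y_hat = w2 * math.tanh(w1 * x)
    return (y_hat - y) ** 2

def backward(w1: float, w2: float, x: float, y: float) -> tuple[float, float]:
    """Backpropagation: chain rule from the loss back to each weight."""
    # Forward pass, keeping the intermediate values the backward pass needs.
    z = w1 * x
    h = math.tanh(z)
    y_hat = w2 * h
    # Backward pass, one local derivative per step.
    dloss_dyhat = 2.0 * (y_hat - y)
    dloss_dw2 = dloss_dyhat * h          # y_hat = w2 * h
    dloss_dh = dloss_dyhat * w2
    dloss_dz = dloss_dh * (1.0 - h * h)  # d tanh(z)/dz = 1 - tanh(z)^2
    dloss_dw1 = dloss_dz * x             # z = w1 * x
    return dloss_dw1, dloss_dw2

def numerical_grad(w1: float, w2: float, x: float, y: float, eps: float = 1e-6) -> tuple[float, float]:
    """Central finite differences, used only to sanity-check backward()."""
    g1 = (forward(w1 + eps, w2, x, y) - forward(w1 - eps, w2, x, y)) / (2 * eps)
    g2 = (forward(w1, w2 + eps, x, y) - forward(w1, w2 - eps, x, y)) / (2 * eps)
    return g1, g2

w1, w2, x, y = 0.5, -1.2, 0.8, 0.3  # arbitrary starting weights and one example
analytic = backward(w1, w2, x, y)
numeric = numerical_grad(w1, w2, x, y)
print(analytic, numeric)

# A few steps of gradient descent driven by the backpropagated gradients.
start_loss = forward(w1, w2, x, y)
for _ in range(50):
    g1, g2 = backward(w1, w2, x, y)
    w1, w2 = w1 - 0.1 * g1, w2 - 0.1 * g2
print(start_loss, forward(w1, w2, x, y))
```

Real models do the same thing with billions of weights and automatic differentiation instead of hand-written derivatives.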

ChatGPT, on the other hand, is just an application built with those algorithms and a lot of data. It’s interesting because we can play with it today, but AI models get thrown away and new ones get trained from scratch all the time.

dangblingus@lemmy.world · 1 year ago

*AI puts on ski mask* Alright, I promise I’m not an AI scraping your website.

TwoGems@lemmy.world · 1 year ago

Is it technically possible to prevent AI scraping on your website?

lloram239@feddit.de · 1 year ago

Yes, pull the plug that connects the machine to the Internet.

TwoGems@lemmy.world · 1 year ago

dangblingus@lemmy.world · 1 year ago

No. There is nothing a website admin can do to prevent it. Every single tool to flag an AI would be circumvented by the AI learning what tools are being used.
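A sketch of the most common blocking approach, filtering on the `User-Agent` header, and why it is easy to get around: the header is whatever the client chooses to send. The blocklist is illustrative, not exhaustive (`GPTBot` and `CCBot` are tokens that OpenAI's and Common Crawl's crawlers include in their user-agent strings; `Google-Extended` is left out because Google describes it as a robots.txt token with no separate user-agent string), and `is_blocked` is a hypothetical helper, not part of any web server's API.

```python
# Illustrative blocklist of crawler tokens (not exhaustive).
BLOCKED_TOKENS = ("gptbot", "ccbot")

def is_blocked(user_agent: str) -> bool:
    """Hypothetical server-side check: reject requests whose User-Agent names a listed crawler."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_TOKENS)

honest = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
          "GPTBot/1.0; +https://openai.com/gptbot)")
# The same scraper can send any header it likes, e.g. a generic browser string.
disguised = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/120.0 Safari/537.36")

print(is_blocked(honest))     # True
print(is_blocked(disguised))  # False
```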

Psythik@lemm.ee · 1 year ago

How do I opt in?

AutoTL;DR@lemmings.world · 1 year ago

This is the best summary I could come up with:

Large language models are trained on all kinds of data, most of which, it seems, was collected without anyone’s knowledge or consent.

Now you have a choice whether to allow your web content to be used by Google as material to feed its Bard AI and any future models it decides to make.

It’s as simple as disallowing “User-Agent: Google-Extended” in your site’s robots.txt, the document that tells automated web crawlers what content they’re able to access.
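Spelled out, the opt-out the summary describes is a robots.txt group for the `Google-Extended` token with a `Disallow: /` rule. Google describes Google-Extended as a robots.txt product token rather than a separate crawler user-agent string, so blocking it does not affect regular search crawling by Googlebot. The sketch below checks such a file with Python's standard `urllib.robotparser`; the `/private/` rule and the URLs are made up for illustration, and Python's parser is simpler than Google's own (it applies the first matching rule rather than the most specific one), so treat it as an illustration only.

```python
from urllib.robotparser import RobotFileParser

# The Google-Extended opt-out, plus an unrelated example group for all other crawlers.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent, url in [
    ("Google-Extended", "https://example.com/articles/1"),  # AI-training token: blocked site-wide
    ("Googlebot", "https://example.com/articles/1"),        # search crawler: falls back to the * group
    ("Googlebot", "https://example.com/private/notes"),     # blocked by the * group's rule
]:
    print(agent, url, parser.can_fetch(agent, url))
```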

“We’ve also heard from web publishers that they want greater choice and control over how their content is used for emerging generative AI use cases,” the company’s VP of Trust, Danielle Romain, writes in a blog post, as if this came as a surprise.

On one hand that is perhaps the best way to present this question, since consent is an important part of this equation and a positive choice to contribute is exactly what Google should be asking for.

On the other, the fact that Bard and its other models have already been trained on truly enormous amounts of data culled from users without their consent robs this framing of any authenticity.

The original article contains 381 words, the summary contains 190 words. Saved 50%. I’m a bot and I’m open source!

Your website can now opt out of training Google's Bard and future AIs | TechCrunch