Mack Grenfell
Much ado about nothing…

Jan 26, 2023
With the rise of tools like ChatGPT, which have lowered the barrier to entry for AI content generation, there’s one question I’ve been asked more than any other: will Google penalise me for using AI content?
A lot of the debate on this topic has centred on speculation rather than observation. This isn’t surprising, given how many of the people offering answers to this question have only recently begun dabbling in AI content.
As someone who has been in the AI content game for two years now, I wanted to give my take on why Google likely cares much less than you think about the use of AI-written content.
Ultimately, my take comes down to three main factors:
- Technical capability - It’s been very easy, for a very long time, to tell whether a given article is generated by a language model. There’s no technical challenge that Google is still grappling with, and Google certainly isn’t playing catch-up with OpenAI.
- History - Google has already launched multiple updates ostensibly aimed at removing AI-generated pages. And yet, of all the tens of thousands of pages I’ve launched for a whole range of startups, I haven’t seen any of them impacted by these updates, or by any manual penalty.
- Incentives - You could argue that ChatGPT’s popularity is partly down to how badly Google deals with long-tail queries. Given that high-quality AI content has the potential to improve Google’s ability to answer users’ long-tail queries, it’s hard to see why Google would crack down on it (especially given the perceived threat of ChatGPT).
Let’s take a look at each of these, and what they tell us about Google’s actual attitudes towards AI content.
Can Google detect AI content?
A lot of the takes that I’ve seen circulating around Google’s response to AI content have presumed that Google is playing technical catch-up, working on its own algorithms to detect whether pages are likely to have been generated by a language model.
In reality though, sufficiently capable models have been around for quite some time. GPT-2 detectors (models designed to tell whether a given piece of content was generated by GPT-2) still work perfectly well on the large majority of GPT-3-written text.
Given how crude a lot of GPT-3-generated articles are, I suspect that these detectors would be perfectly capable of letting Google tell whether a page’s content was AI-generated. With a sufficiently high confidence threshold and a tolerance for false negatives, Google could quite easily roll this technology out at scale.
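For illustration, here’s a minimal sketch of what this kind of threshold-based detection could look like, using the publicly released GPT-2 output detector checkpoint (roberta-base-openai-detector on Hugging Face). To be clear, this is a toy under my own assumptions, not Google’s actual pipeline; the label names depend on the checkpoint version, and the threshold value is illustrative.

```python
# A minimal sketch of threshold-based AI-content detection, using the
# publicly released GPT-2 output detector. Illustrative only: this is
# not Google's pipeline, and label names may vary by checkpoint version.
from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="openai-community/roberta-base-openai-detector",
)

# A high threshold trades false negatives (missed AI text) for a very
# low false-positive rate on human-written pages - the only trade-off
# that would make sense at web scale.
CONFIDENCE_THRESHOLD = 0.98

def looks_ai_generated(text: str) -> bool:
    """Flag text only when the detector is highly confident it's machine-written."""
    # The detector's context window is limited, so long pages get truncated.
    result = detector(text, truncation=True)[0]
    return result["label"] == "Fake" and result["score"] >= CONFIDENCE_THRESHOLD
```

The threshold is the important design choice here: set it high enough and the detector almost never misclassifies human writing, at the cost of letting some AI text slip through, which is exactly the tolerance for false negatives described above.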
The fact that Google hasn’t rolled anything like this out, though, suggests that there is far more at play in its decision-making than just technical capability. It’s not a question of ‘can they?’ but rather ‘would they?’
To understand the answer to this latter question, we have to look back at Google’s actions so far.
What has Google done about AI content so far?
In the second half of 2022, Google announced a number of search algorithm updates which many took to be aimed at the proliferation of AI-generated content. Chief amongst these was the August 25th helpful content update, whose release notes warn against those “using extensive automation to produce content on many topics”.
Whilst Google’s release notes don’t explicitly refer to AI or language models, the implication was clear: Google will down-rank any content that it believes has been generated by such methods. This implication is supported by traffic data from a number of sites that have clearly been built entirely with AI-generated content.
Bad AI in action
My favourite example of such a site is deletingsolutions.com. A quick look at its homepage reveals a random assortment of question-based articles, almost certainly based off a hastily pulled Ahrefs/Semrush report.
If you need convincing that their content is generated by AI, and one of the poorer models at that, take a look at their article on the length of a Pringles can. Scrolling through all the ads, I counted nine contradictory answers to the title’s question.
It won’t come as any surprise, then, that deletingsolutions.com’s organic search traffic nose-dived after Google’s August 25th update.
I think this sets an important standard for the level of content that Google is taking aim at. Remember, this domain’s traffic didn’t get hit because the content was obviously AI (Google is perfectly capable of detecting even high-quality AI content). Rather, it got hit because the content was incredibly low-grade, and didn’t contribute to a good search experience.
The sort of content that high-grade AI writers like byword.ai produce is drastically different from the example above. This is why, in my opinion, none of the AI content projects I’ve worked on over the last two years has been hit with any Google penalty, nor seen any change in traffic trajectory around the algorithm updates ostensibly targeted at AI content.
Put simply, Google cares far more about quality than it does about origin, which brings us to our next topic.
What incentive does Google have to penalise AI content?
A teacher worried about their students handing in AI-generated essays has an incentive to penalise those students. Their worry might be that by using AI assistance, their students aren’t properly learning the material they’ve been assigned. By penalising students who’ve obviously used AI assistance, the teacher is forcing those students to go and do the hard graft of learning the subject.
Google isn’t a teacher though, and it couldn’t care less about how you’ve produced the content that appears on its SERPs.
Whether you hire an industry expert to write your post for several hundred dollars, an outsourced writer in a low-income country for tens of dollars, or Byword for a few dollars, Google has no incentive to care about your method of production. It’s not trying to teach you a lesson; it simply cares about one thing: searcher experience.
If you’re using AI to produce content like the low-grade Pringles example above, Google will almost certainly penalise that on the grounds that it’s not something any reasonable user would get value from. If you’re using AI to create high-quality, long-tail content in areas where no relevant content exists today, then you’re likely making a positive contribution to searcher experience.
ChatGPT, filling in the gaps
One way to understand why users are flocking to ChatGPT is that it’s extremely good at synthesising long-tail content, where search engines like Google tend to fail. Ask both platforms for something extremely specific (say, a list of novel marketing ideas for an AI writing platform), and ChatGPT is highly likely to outperform Google.
These long-tail searches are precisely where I think AI writing tools like Byword can have the most impact. They’re able to produce long-tail content on a whole host of terms that would never be economical to cover with a human writer.
If you were Google, and had the potential to serve much better long-tail results to users by surfacing AI content, what would you do? Would you:
- Index and rank the content, because having more long-tail content to match to user queries provides a better search experience.
- Penalise the content, on no grounds other than the fact that it was AI-written.
Google is a business, not a moral adjudicator, and I strongly suspect that it’s committed to the first of these two options. Faced with a serious competitive threat from the likes of ChatGPT, it seems extremely unlikely to me that Google would penalise content purely on the basis of it being AI-written.
In summary, Google is unlikely to care
To round things up:
- Google isn’t playing technological catch-up with OpenAI or other language-model developers. It’s been fairly easy to detect AI content for some time.
- In all the time that it’s been easy to detect AI content, Google has only really gone after low-grade content. And it’s done this not because that content is AI-generated, but because it’s low-grade content that harms searcher experience.
- Google has no business incentive to care how you made your content, only that it contributes to a positive search experience.
Those, in essence, are the three key reasons why much of the current fear over publishing AI-generated content is overblown.
Looking forward
Of course, fear and speculation are likely to rage on over AI content for months and years to come. The next time you come across such ideas, ask yourself:
- Does the person promoting this speculation have sufficient experience and expertise in AI content generation to justify their claims?
- Are there actors with ulterior motives that might want to promote these claims? Human content marketing is still a massive industry, and I’ve heard first-hand stories of agencies employing PR contractors to run defensive anti-AI campaigns.
Rather than being at odds with Google’s incentives, AI has the potential to transform Google search into a long-tail knowledge base, and to massively improve searcher experience over the coming years.
Written by
Mack Grenfell
Mack is the founder of byword.ai, and has been writing about the intersection of AI & SEO since 2020.