November 25, 2019
by Brooks Manley / November 25, 2019
We use search engines all the time, but we aren’t always the best at asking questions.
Take this half-thought-out, poorly written search query as an example:
Thankfully, Google still serves us relevant results. How? Natural language processing (NLP).
NLP is a branch of artificial intelligence that aims to make sense of human language. It helps computers understand, interpret, and reproduce the characteristics of human language.
In an age ruled by technology, it’s important for computers to be able to understand us. NLP is an attempt to take our shoddy language inputs and turn them into something useful computers can comprehend.
This task wouldn’t be difficult if we all spoke like robots, but we don’t. We use slang, have different dialects, misuse grammar, and leave out punctuation. From a computer’s standpoint, it’s like being a beginner in a new language and venturing into the city center only to encounter phrases you’d never heard before. You wouldn’t know how to respond appropriately.
With the introduction of machine learning algorithms, computers can now process huge amounts of information to identify patterns and better understand human languages. This has numerous use cases in tech, many of which you see and use on a daily basis:
Fifteen percent of Google queries are brand new. In other words, the search engine has never before seen the combination of words that make up 15% of all searches.
This isn’t because we are searching for topics Google has never seen before. It’s due to the varying ways we combine words and ask questions, both in writing and voice search. Google and other search engines are able to answer the questions we aren’t able to form correctly.
“It’s our job to figure out what you’re searching for and surface helpful information from the web, no matter how you spell or combine the words in your query.”
Google is constantly trying to get better at understanding queries in order to serve relevant search results, and natural language processing plays a big role in that effort.
Unless you live on a secluded island, you probably heard the noise about BERT in late October. I’d be remiss not to touch on it. BERT (Bidirectional Encoder Representations from Transformers) is an NLP model that Google introduced in 2018 and began rolling out in Search in October 2019. BERT can consider the full context of a word by looking at the words that come both before and after it.
Put simply, BERT helps search engines better understand intent, most notably for longer searches that contain multiple prepositions.
For example, Google called out the query “2019 brazil traveler to usa need a visa” in a recent post. The word “to” is crucial here. Before BERT, Google would have returned results about US citizens traveling to Brazil. Post-BERT, Google can recognize that nuance and return a more relevant and helpful result.
To non-developers and those unfamiliar with AI, NLP can seem like a foreign and distant concept. Lucky for us, Google gives us some really helpful insight through a Cloud API product that offers a set of advanced NLP models. This API is available through Google Cloud and is used by a number of companies for a variety of applications.
Google offers an abbreviated free demo of its API.
Here’s how to check it out:
What does all of this have to do with SEO? Let’s start by looking at the four key aspects of Google’s Natural Language API: entity analysis, sentiment analysis, syntax analysis, and categories.
Put really simply, an entity is a thing. An entity can be a place, person, organization, idea, or concept. Entities address the relationships between things and help search engines like Google understand their relatedness.
For example, the following are entities:
When you search for “prime minister of canada,” you get the following result:
Search engines look for co-occurrence in order to establish the relationships between entities. The three entities listed above exist together frequently enough on the web that search engines are able to confidently serve you a single result to the query.
In addition to co-occurrence, Google looks at other factors like the notability and authority of the contributor to gauge the importance of relationships between entities.
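To make co-occurrence a little more concrete, here is a minimal sketch that counts how often pairs of entities show up in the same document. It is only an illustration of the idea, not how Google actually measures it. The sample documents, the entity names, and the `cooccurrence_counts` helper are all made up for this example, and simple substring matching stands in for real entity recognition.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, entities):
    """Count how many documents mention each pair of entities together."""
    counts = Counter()
    for doc in documents:
        text = doc.lower()
        # Naive matching (an assumption): an entity is "present" if its
        # lowercased name appears anywhere in the document text.
        present = sorted(e for e in entities if e.lower() in text)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Invented sample data, for illustration only.
docs = [
    "The prime minister of Canada works in Ottawa.",
    "Canada elects a prime minister after each federal election.",
    "Ottawa is the capital of Canada.",
]
entities = ["Canada", "Ottawa", "prime minister"]

pair_counts = cooccurrence_counts(docs, entities)
for pair, count in pair_counts.most_common():
    print(pair, count)
```

Pairs that appear together in more documents end up with higher counts, which is the rough intuition behind treating them as related.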
Of these four NLP models, entities have the largest implications for SEO. Google’s Natural Language API measures the salience of entities found in content. Salience is a score from 0 to 1 that indicates how important the entity is in the context of the whole text. The higher the score, the more salient the entity is.
Unfortunately, there’s no sure-fire way to optimize content for entity salience, nor are we certain entity salience means higher rankings. However, entity salience can be a guide for content optimization, particularly around satisfying user intent.
Pick the topic you’d like to rank for, copy and paste the top five competitor results into the Natural Language API, and check out their entity salience scores.
This will give you a clear picture of which entities Google deems important for the topic. Taking a deeper dive into the Wikipedia articles (when linked) will help you understand which topics and attributes your piece should cover.
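If you save the entity results for each competitor page, the comparison can be sketched in a few lines. The snippet below assumes each result is a dictionary shaped like the API’s entity analysis JSON response (an `entities` list whose items carry `name` and `salience`). The sample pages and the `average_salience` helper are hypothetical, and averaging is just one reasonable way to combine the scores.

```python
from collections import defaultdict


def average_salience(responses):
    """Average each entity's salience across several entity-analysis responses.

    A page that never mentions an entity contributes 0 for that entity.
    """
    if not responses:
        return []
    totals = defaultdict(float)
    for response in responses:
        # An entity can be listed more than once per page; keep its top score.
        best = {}
        for entity in response.get("entities", []):
            name = entity["name"].lower()
            best[name] = max(best.get(name, 0.0), entity.get("salience", 0.0))
        for name, salience in best.items():
            totals[name] += salience
    ranked = [(name, total / len(responses)) for name, total in totals.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hand-written sample responses (not real API output).
page_a = {"entities": [{"name": "SEO", "salience": 0.6},
                       {"name": "Atlanta", "salience": 0.3}]}
page_b = {"entities": [{"name": "seo", "salience": 0.4},
                       {"name": "links", "salience": 0.2}]}

ranked = average_salience([page_a, page_b])
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

Entities near the top of the list are the ones the competing pages treat as central, which makes them good candidates to cover in your own piece.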
Note that content isn’t the only implication for entities in search. Entities also mean a great deal for links, as they help search engines identify relevant and irrelevant connections. For example, a company looking to rank for SEO in Atlanta needs quality links from pages about SEO and pages about Atlanta in order to establish relatedness and topical authority. Because of entities, links from pages about puppies and New York will likely carry less weight for that brand.
Sentiment Analysis is another sub-field of NLP that attempts to identify opinions and emotions about an entity within a text.
Sentiment is scored from -1 (very negative feelings) to 1 (very positive feelings). Magnitude is a non-negative number that measures the overall strength of feeling in a text, counting both positive and negative emotion.
In the demo Google supplies, it’s no surprise that the only text that stands out is the reference to Android phones, which Sundar Pichai noted users loved. That statement had a magnitude of 1 and a score of 0.5.
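Score and magnitude are easier to read together than apart. The sketch below shows one way to turn the pair into a plain label. The cutoffs (`neutral_band` and `mixed_magnitude`) are arbitrary assumptions chosen for illustration, not values published by Google, and `describe_sentiment` is a hypothetical helper.

```python
def describe_sentiment(score, magnitude, neutral_band=0.25, mixed_magnitude=2.0):
    """Label a (score, magnitude) pair as positive, negative, mixed, or neutral.

    The thresholds are illustrative assumptions; tune them for your content.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1 and 1")
    if magnitude < 0:
        raise ValueError("magnitude cannot be negative")
    if score >= neutral_band:
        return "positive"
    if score <= -neutral_band:
        return "negative"
    # A score near zero with lots of emotion suggests positives and
    # negatives cancelling out; with little emotion, the text is neutral.
    return "mixed" if magnitude >= mixed_magnitude else "neutral"


# The Android statement from the demo: score 0.5, magnitude 1.
print(describe_sentiment(0.5, 1.0))
print(describe_sentiment(0.0, 4.0))
print(describe_sentiment(0.1, 0.0))
```

With these assumed cutoffs, the demo statement about Android phones comes out as positive, while a score near zero is split into mixed or neutral depending on how much emotion the text carries.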
In January 2018, in regard to featured snippets, Danny Sullivan stated:
“For instance, people who search for “are reptiles good pets” should get the same featured snippet as “are reptiles bad pets” since they are seeking the same information: how do reptiles rate as pets? However, the featured snippets we serve contradict each other.”
Though Google is still experimenting with what it should show for this query, it tells us that sentiment plays a big role in ranking. If you want to rank for a positive query (like “are reptiles good pets”), you should seek to write a post with a high sentiment score.
Additionally, it means the sentiment found around your backlinks and brand mentions is likely sending Google signals as well. Though there’s not much you can do to optimize for this, it makes the case for ensuring your company provides top-notch customer service and puts effort and authenticity into its brand.
Syntax analysis breaks a text down into its individual words and returns each word’s part of speech, mood, voice, and more.
Syntax gives us insight into how Google looks at our sentences and categorizes words for understanding.
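As a rough illustration of what that breakdown looks like, the sketch below summarizes a syntax result by part of speech. It assumes a dictionary shaped like the API’s syntax analysis JSON response, where each item in `tokens` has a `text.content` value and a `partOfSpeech.tag`. The sample tokens are hand-written rather than real API output, and `pos_tag_counts` is a hypothetical helper.

```python
from collections import Counter


def pos_tag_counts(syntax_response):
    """Count tokens by part-of-speech tag in a syntax-analysis response."""
    return Counter(
        token["partOfSpeech"]["tag"]
        for token in syntax_response.get("tokens", [])
    )


# Hand-written sample for the sentence "Users loved the phones."
sample = {
    "tokens": [
        {"text": {"content": "Users"}, "partOfSpeech": {"tag": "NOUN"}},
        {"text": {"content": "loved"}, "partOfSpeech": {"tag": "VERB"}},
        {"text": {"content": "the"}, "partOfSpeech": {"tag": "DET"}},
        {"text": {"content": "phones"}, "partOfSpeech": {"tag": "NOUN"}},
        {"text": {"content": "."}, "partOfSpeech": {"tag": "PUNCT"}},
    ]
}

tag_counts = pos_tag_counts(sample)
for tag, count in tag_counts.most_common():
    print(tag, count)
```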
Syntax analysis has far fewer implications for content and SEO than our other three models. Good grammar and clear sentence structure are key to satisfying readers, but it’s unlikely you would glean much more than that from syntax analysis.
In fact, if you use Grammarly or another AI writing assistant tool, it’s likely you’re already using syntax analysis to optimize your content.
Categories reveal how Google classifies text. Within the API demo, you will see the category along with a confidence score.
Our example is short, so we’re only seeing one category with a rather low confidence score. If you were to add a few paragraphs specifically about the phone, you may see the confidence score for Mobile & Wireless rise closer to 1.
It goes without saying that you would want your content to produce a high confidence score for the category and subcategories you wish to rank in.
As with entities, it’s unlikely there’s a tried-and-true process for optimizing for classification. However, categories can also serve as a great guide. I’d recommend running entity analysis on top-ranking pages before crafting content and using category classification as more of a post-writing check.
For instance, if you wrote an article on puppies that scored a 0.99 confidence score for /Autos & Vehicles/Motor Vehicles (By Type)/Motorcycles, it would be worth revisiting the topics and sub-topics you covered (or maybe just starting all the way over).
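That post-writing check can be sketched as a small helper. It assumes a dictionary shaped like the API’s classification JSON response (a `categories` list whose items carry `name` and `confidence`). The `category_check` function, the 0.5 cutoff, and the `/Pets & Animals` target are assumptions made for illustration.

```python
def category_check(response, target_prefix, min_confidence=0.5):
    """Return the most confident category under target_prefix, or None.

    min_confidence is an arbitrary cutoff chosen for this sketch.
    """
    matches = [
        category
        for category in response.get("categories", [])
        if category["name"].startswith(target_prefix)
        and category["confidence"] >= min_confidence
    ]
    return max(matches, key=lambda c: c["confidence"], default=None)


# Hand-written sample: the puppy article that was classified as motorcycles.
puppy_article = {
    "categories": [
        {
            "name": "/Autos & Vehicles/Motor Vehicles (By Type)/Motorcycles",
            "confidence": 0.99,
        }
    ]
}

result = category_check(puppy_article, "/Pets & Animals")
if result is None:
    print("Off target: revisit the topics and sub-topics covered.")
else:
    print(f"On target: {result['name']} ({result['confidence']:.2f})")
```

A `None` result means no category under the target reached the cutoff, which is the signal to go back and rework the piece.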
Though NLP doesn’t offer specific proven formulas for better ranking, it is likely the future of SEO. It’s important that SEOs have a baseline understanding of NLP models and how Google processes language. Keep your eyes on future models Google adds to its algorithm, and don’t be afraid to tiptoe into the waters of machine learning and AI. It’s where our industry is headed.
Brooks is a Digital Marketing Specialist and SEO Lead at Engenius, a marketing agency in Greenville, SC. When he’s not panicking about ranking drops and algorithm updates, you can find him watching NBA games, eating tacos, or blogging at Creative Primer.