What is TF*IDF and how can it be used as a new approach for content optimisation?

By Expert commentator, 10 Jun 2016

Using TF*IDF for keyword research, to assess your Term Weight and improve your SEO

Keyword Density (KD) is a term that is frowned upon, and rightly so! It produced content that didn’t read naturally, because you kept seeing the same word over and over again. At the time, however, Google’s algorithms weren’t as sophisticated as they are today, so you could get away with it without being penalised too badly. This meant it was once the main ranking factor for websites. Times have changed considerably: KD is now recognised as a minor ranking factor, and overusing a keyword is better known as keyword stuffing, which you will be penalised for!

OnPage.org has built a new tool for keyword research, designed to support a new method of relevant content creation based on TF*IDF. If you haven't come across TF*IDF before: Google has used it for a long time as a foundation of its ranking factors, and Cyrus Shepard of Moz rates it as one of his 7 Concepts of Advanced On-Page SEO.

The Mathematical Calculation behind TF*IDF

The idea behind term frequency has been used in the vector space model since the 1960s, and Google also seems to focus on term frequency rather than on simply counting keywords. The algorithms and calculations involved are more complex, but the result is a far more effective basis for a ranking factor.

“The field of information retrieval has come a long way in the last forty years, and has enabled easier and faster information discovery. In the early years there were many doubts raised regarding the simple statistical techniques used in the field. However, for the task of finding information, these statistical techniques have indeed proven to be the most effective ones so far.” (Amit Singhal, Google)

Term Frequency is easy to understand, though. First, let’s explain the meaning behind Term Frequency (TF) and Inverse Document Frequency (IDF), as this will clear a lot of things up later when you read the case study. The formula produces a score known as the “Term Weight”, which is used by information retrieval (IR) systems. (IR emerged in the 1950s and has advanced massively over the years, vastly increasing the amount of information that can be stored.) The term weight identifies the most important terms in a document, so there is no need to worry about stop-words such as “the”, “is”, “of” and “at” anymore. And because the term weight is purely a mathematical calculation, it can be applied in any language or country.

Let’s take it a bit deeper…

Term Frequency pretty much explains itself: it is how frequently a term is used in a SINGLE document. The TF therefore depends on both how many times the term is used in the document and the length of the document. Knowing this, you can tell whether you’re using the term too much or too little. Refer to the TF formula below:

TF= (No. of times the keyword appears in the document / Total No. of words in the document)
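To make the TF formula concrete, here is a minimal Python sketch (our own illustration, not OnPage.org’s code). It assumes a very simple tokeniser that lower-cases the text and splits it into single alphanumeric words, so a two-word phrase like “rental car” would need extra handling, for example splitting the text into word pairs. The names tokenize and term_frequency are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Assumed tokeniser: lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_frequency(term: str, document: str) -> float:
    """TF = times `term` appears in `document` / total number of words in it."""
    words = tokenize(document)
    if not words:
        return 0.0  # an empty document has no term frequency to speak of
    return Counter(words)[term.lower()] / len(words)


doc = "Rent a car today: our rental car fleet has a car for every trip."
print(round(term_frequency("car", doc), 3))  # 3 occurrences / 14 words -> 0.214
```

Because TF divides by the document length, the same three mentions of “car” would score lower in a longer page, which is exactly why the formula says more than a raw keyword count.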

This is where the calculation gets a bit more complicated. Inverse Document Frequency measures how important a term is across the whole CORPUS (a collection of written texts, for example the entire works of a particular author), based on how many of the corpus’s documents contain it. The rarer a term is across the corpus, the higher its IDF: this scales down terms that appear almost everywhere, such as stop-words, and scales up more distinctive terms, such as the ones you are trying to rank for.

IDF= log (Total No. of documents in the corpus / No. of documents that contain the term)

TF*IDF (Term Weight)= TF × IDF
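Putting the two formulas together, the Python sketch below computes IDF and TF*IDF over a tiny made-up corpus of three documents. It is an illustration under stated assumptions, not the exact variant any search engine or tool uses: it takes the natural logarithm (other bases only rescale the scores) and simply returns 0 for a term that no document contains, rather than using one of the smoothed IDF variants. The helper names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf(term: str, words: list[str]) -> float:
    """Term frequency: occurrences of `term` divided by the document's length."""
    return Counter(words)[term] / len(words) if words else 0.0


def idf(term: str, corpus: list[list[str]]) -> float:
    """log(documents in corpus / documents containing `term`); 0.0 if none do."""
    containing = sum(1 for words in corpus if term in words)
    if containing == 0:
        return 0.0
    return math.log(len(corpus) / containing)


def tf_idf(term: str, words: list[str], corpus: list[list[str]]) -> float:
    """Term weight of `term` in one tokenised document of the corpus."""
    return tf(term, words) * idf(term, corpus)


# A made-up three-document corpus, for illustration only.
corpus = [tokenize(text) for text in [
    "the cheapest car rental at the airport",
    "the best rental car deals in the city",
    "the history of the motor vehicle",
]]
for term in ["the", "car", "rental", "airport"]:
    print(term, round(tf_idf(term, corpus[0], corpus), 3))
```

Running it prints 0.0 for “the” (it appears in every document, so its IDF is log 1 = 0), 0.058 for both “car” and “rental” (each in two of the three documents) and 0.157 for “airport”, which only the first document mentions.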

In practice, though, applying this method across a whole corpus is only realistic with a tool, as we’re not all mathematicians, nor do we have the time to calculate the formula manually.

TF*IDF is one of the newest and most impressive features from OnPage.org, providing the support needed to write unique, relevant content for your readers. We all know the importance of that far too well by now! But are you taking unique content as seriously as you should be, and ranking for the most relevant and important terms as a result? When it comes to finding the right keywords to use, the phrase “keyword inspiration” sums it up perfectly: the tool lets you draw inspiration from the keywords shown in the results.

As previously mentioned, OnPage.org has built a tool that applies these formulas and techniques to provide accurate, real-time relevant terms to use in content marketing. The case study below describes the steps needed to use this tool effectively when carrying out keyword research.

Case Study:

Okay, so hypothetically speaking, I am the SEO specialist for a rental car company and my day-to-day business consists of writing content about car makes and models. I want to rank for terms like rent a car, rental car or car rental, so that when people search for a specific make or model plus rental, I appear in the top 10 of the SERPs. So I start the search with “rental car” as a two-word combination (single terms are possible too).

[Screenshot: TF*IDF search form]

As the screenshot above shows, first enter the keyword you want to rank for, followed by the language and country you are targeting. On the right, you can quickly load previous reports, so you can check back whenever needed without having to wait.

[Screenshot: TF*IDF analysis]

Here come the results (in orange), based on a document corpus made up of the top 15 pages in Google’s SERPs for the search term “rental car”. These are the most heavily weighted keywords across those 15 pages.
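OnPage.org doesn’t say exactly how it combines the 15 pages, so the following is only a guess at the idea: a Python sketch that averages each term’s TF*IDF across a set of pages (counting 0 for pages that don’t use it) and lists the highest-weighted terms. The three page texts are made-up stand-ins for real search results, and corpus_term_weights is a hypothetical helper.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def corpus_term_weights(pages: list[str]) -> dict[str, float]:
    """Hypothetical helper: average TF*IDF of each term over all pages.

    Pages that don't use a term contribute 0 to its average.
    """
    docs = [tokenize(page) for page in pages]
    n_docs = len(docs)
    # Document frequency: in how many pages each term appears at least once.
    doc_freq = Counter(term for words in docs for term in set(words))
    totals: Counter = Counter()
    for words in docs:
        for term, count in Counter(words).items():
            idf = math.log(n_docs / doc_freq[term])
            totals[term] += (count / len(words)) * idf
    return {term: total / n_docs for term, total in totals.items()}


# Made-up stand-ins for the text of top-ranking pages.
pages = [
    "cheap rental car at the airport, rental car deals daily",
    "compare rental car prices and book the car online",
    "the rental guide: insurance, deposit and airport pickup",
]
weights = corpus_term_weights(pages)
top = sorted(weights.items(), key=lambda item: item[1], reverse=True)[:5]
for term, weight in top:
    print(f"{term:<10} {weight:.3f}")
```

One side effect worth knowing: with IDF computed only within the set, a term that every page uses (here “rental”) scores zero, which is one reason a tool might compute IDF against a larger reference collection instead.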

[Screenshot: TF*IDF]

This is useful, but not ideal just yet, as it would be more useful to know which terms are of particular importance. Use the Proof keyword filter to narrow down the results to the more significant keywords and to eliminate stop-words, brand names and other terms that are not important for the query.
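The Proof keyword filter’s actual rules aren’t described here, but the basic idea of narrowing a weighted term list can be sketched in Python with two hypothetical exclusion lists, one for stop-words and one for brand names (the car rental brands are just examples), plus an optional minimum weight. The weights in the example are made up.

```python
# Hypothetical exclusion lists; the real "Proof keyword" filter's rules are not documented here.
STOP_WORDS = {"the", "a", "an", "and", "at", "is", "of", "for", "in", "on", "to"}
BRAND_NAMES = {"hertz", "avis", "sixt"}


def filter_terms(weights: dict[str, float], min_weight: float = 0.0) -> dict[str, float]:
    """Drop stop-words, brand names and terms whose weight is at or below `min_weight`."""
    return {
        term: weight
        for term, weight in weights.items()
        if term not in STOP_WORDS and term not in BRAND_NAMES and weight > min_weight
    }


# Made-up term weights, for illustration only.
weights = {"the": 0.0, "car": 0.057, "hertz": 0.041, "airport": 0.030, "and": 0.032}
print(filter_terms(weights))  # {'car': 0.057, 'airport': 0.03}
```

“and” shows why an explicit stop-word list can still help: in a small set it may not appear in every document, so IDF alone doesn’t push its weight to zero.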

Now it is time to compare the term frequency of your document or a competitor's document with the top 15 from above.

[Screenshot: TF*IDF]

When comparing your own URL or a competitor’s, you can see which keywords are weighted better in your document. Starting from the bottom of the bar graph, the dark blue area shows how common the term is in the analysed set, whereas the light blue section shows the average TF*IDF score across the complete set: the taller this bar, the more heavily the term is used by the other pages that contain it. So the goal here is to try to reach the green bar, so that your keywords are weighted correctly without appearing too spammy.
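The numbers behind a chart like this can be approximated as follows. This Python sketch reflects our reading of the graph, not the tool’s documented definitions: “how common” is taken to mean the share of reference pages containing the term, the set average is the mean TF*IDF across those pages, and your own page is scored using IDF values from the reference pages only. All page texts and the compare_page helper are hypothetical.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def compare_page(page: str, reference_pages: list[str],
                 terms: list[str]) -> list[tuple[str, float, float, float]]:
    """Return (term, coverage, set average TF*IDF, page TF*IDF) for each term.

    Coverage is the share of reference pages that contain the term; IDF is
    computed from the reference pages only (an assumption, see lead-in).
    """
    ref_docs = [tokenize(p) for p in reference_pages]
    if not ref_docs:
        raise ValueError("need at least one reference page")
    own = tokenize(page)
    n = len(ref_docs)
    rows = []
    for term in terms:
        df = sum(1 for words in ref_docs if term in words)
        idf = math.log(n / df) if df else 0.0
        set_avg = sum(words.count(term) / len(words) * idf
                      for words in ref_docs if words) / n
        own_score = own.count(term) / len(own) * idf if own else 0.0
        rows.append((term, df / n, set_avg, own_score))
    return rows


# Hypothetical reference pages and a hypothetical page of our own.
reference = [
    "cheap rental car at the airport, rental car deals daily",
    "compare rental car prices and book the car online",
    "the rental guide: insurance, deposit and airport pickup",
]
mine = "rent a car at the airport with full insurance"
for term, coverage, set_avg, own in compare_page(mine, reference, ["car", "airport", "insurance"]):
    print(f"{term:<10} coverage={coverage:.2f} set_avg={set_avg:.3f} mine={own:.3f}")
```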

Whilst TF*IDF will give you great keyword inspiration, it’s always useful to see how your competitors are doing it and whether they are using the relevant terms more frequently than you.

Again, use the Proof keyword filter to get more relevant keywords:

[Screenshot: TF*IDF]

To finish off my keyword research, and to make sure I’m using the right terms to help the overall performance of my page, I will use the Text Assistant. The content has already been written, but I want to check whether it is relevant for the search term and, if not, how I could improve it. Below, I simply selected the entire page with “CMD+A” and pasted it into the box. The text helper then shows any terms I should add, terms I could use more often and terms I use too much (which may appear spammy).
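A Text Assistant-style check can be sketched as a comparison between your text’s TF*IDF and a target score per term. This Python example is illustrative only: the idf and target values are made up (in practice they would come from an analysis of the top-ranking pages), and the 25% tolerance band that separates “use more”, “use less” and “ok” is an arbitrary assumption, not OnPage.org’s rule.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def text_assistant(text: str, idf: dict[str, float], target: dict[str, float],
                   tolerance: float = 0.25) -> dict[str, list[str]]:
    """Sort each target term into 'add', 'use more', 'use less' or 'ok'.

    `idf` and `target` (a per-term TF*IDF goal) are assumed inputs; the
    tolerance band around each goal is a hypothetical choice.
    """
    words = tokenize(text)
    counts = Counter(words)
    advice = {"add": [], "use more": [], "use less": [], "ok": []}
    for term, goal in target.items():
        score = counts[term] / len(words) * idf.get(term, 0.0) if words else 0.0
        if counts[term] == 0:
            advice["add"].append(term)
        elif score < goal * (1 - tolerance):
            advice["use more"].append(term)
        elif score > goal * (1 + tolerance):
            advice["use less"].append(term)
        else:
            advice["ok"].append(term)
    return advice


# Made-up IDF values and targets, for illustration only.
idf = {"car": 0.4, "airport": 0.4, "insurance": 1.1, "deposit": 1.1}
target = {"car": 0.06, "airport": 0.03, "insurance": 0.15, "deposit": 0.05}
text = "Rent a car at the airport. Our car insurance covers every car."
result = text_assistant(text, idf, target)
print(result)
# {'add': ['deposit'], 'use more': ['insurance'], 'use less': ['car'], 'ok': ['airport']}
```

Here “car” is flagged as overused (3 mentions in 12 words), mirroring the “terms that I use too much” case, while “deposit” is missing entirely.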

[Screenshot: TF*IDF Text Assistant]

If the text isn’t written yet, you can simply type in the box and the tool will analyse it as you go. The main advantage is that you can create text and then check it with TF*IDF to see where it could rank better.

In a Nutshell

This feature from OnPage.org is our unique Keyword Inspiration tool, invented and built to save you from mistakes such as keyword stuffing or creating irrelevant content. Take the time to do proper keyword research, as it will save you time and money, especially once you start ranking for the most relevant search terms. Search engines use the TF*IDF formula as a foundation for ranking your web pages and, ultimately, your domain, so work smart.

This is a post we've invited from a digital marketing specialist who has agreed to share their expertise, opinions and case studies. Their details are given at the end of the article.