Google and Canonical URL

In this post I will change the canonical URL in the <head>, just to see how Google and Googlebot behave with a URL that can only be discovered through the tag.

The canonical tag looks like this:

<link rel="canonical" href="https://www.url-to-crawl.com" /> 

I’ve changed the URL in this example on purpose, in order to make sure that Googlebot can discover this URL only through the rel=canonical tag.

I’ll keep you posted with the results!

Edit, 15 days later: Googlebot didn’t crawl the URL set in the canonical!

A slow page, just to check the impact on logs

It took a while to load this page, didn’t it?

It probably took a bit more than 10 seconds.

That’s normal: the idea is to check how Google’s SERPs behave with this slow page.

I’ve added a PHP snippet, using the “Insert PHP Code Snippet” WP plugin, to make this page take at least 10 seconds to load.

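The snippet itself is tiny; here is a minimal sketch of what it looks like (the exact code used on this page may differ slightly):

<?php
// Hold the server response so the page takes at least ~10 seconds to load
sleep(10);
?>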

Well, I had to add content to make this test work, so here it is!

To find this page during the test, the magic word is: drumblebassseo.

I’ve copied and pasted a few sentences below, found randomly on the web, just to give some credibility to this page:

The classic editor does not solve this issue nor does the tinymce help. I’m embedding HTML from Amazon. Amazon gives HTML for products for their affiliates.

And also this one:

A few people were expressing their frustration with inserting Amazon compliant images in WordPress.  The problem seems simple enough.  Take an image from Amazon and put it on your website.

Edit: I’ve removed the snippet that slowed the page down. I’ll check the results now!

Impact of image searches on logs for SEO

I recently noticed a huge difference between the traffic reported in the Search Console and what I could find in the log files using the Kelogs log analyzer. My first hypothesis is that Google Images could be at the origin of this difference, with a preload in the background. The second hypothesis is that the first result in Google Web is preloaded if the page is usually too slow.

Protocol to check the first hypothesis:

I’ll try to get the image below indexed for this website, even though it is hosted on Amazon, and I will check the impact on the log files live when I search for it in Google.

Let’s come back in a few days, once Googlebot has indexed this page and displays the image in Google Images (I’ll use “site:quentinadt.com” to check).

So here is the picture of someone swimming:

A woman crawling
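When I run the image search, I’ll watch the access log live; a minimal sketch of that check (the log path and image filename here are assumptions, not the real ones):

tail -f /var/log/nginx/access.log | grep "woman-crawling.jpg"   # hits on the test image
tail -f /var/log/nginx/access.log | grep "Googlebot-Image"      # Google's image crawler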

To check the second hypothesis, I’ll make a page which is very slow but ranks for a specific unique keyword.

Test of indexation of background images by Google

In this post, there is a background image. It is a picture of my 3-year-old MacBook Pro 2016, which is already dying… The screen is sometimes unusable, and most of the time it shows just one vertical line.

The idea is to check whether or not Google will index an image that is accessible only through a CSS background call.
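For reference, a minimal sketch of this kind of setup (the class name and image URL are placeholders, not the actual ones used on this page):

<style>
  .bg-test {
    /* The image URL appears only in this CSS rule, never in an <img> tag */
    background-image: url("https://www.example.com/dying-macbook.jpg");
    width: 800px;
    height: 500px;
  }
</style>
<div class="bg-test"></div>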

John Mueller, from Google, in 2018:

And as far as I know we don’t use CSS images at all for image search.

Source: https://www.seroundtable.com/google-image-search-css-25068.html

Let’s see if it is still true in 2020, with the switch to mobile-first indexing, evergreen Chrome, JS interpretation, etc.

Results of this SEO test in a few days!

Edit, one month later…

The image is still not indexed.
The current conclusion is that background images aren’t indexed by default.

Who knows this rare fish?

If you’ve read my last blog post, you will have seen that using the title of an image didn’t help it rank in the SERPs.

So I decided to push the test a little bit further, until I manage to get the image indexed.

So here is the picture of a very rare, endemic fish that you can find only in Croatian rivers. I’m not passionate about fish, but I thought it was an OK picture for testing the ranking capacities of the title element.

So again, I’ve added a title, but this time it contains what is most probably the name of this endemic fish.

Does the title of a picture help it rank in Google in 2020?

I recently read that Google uses the title of images to rank them in Google Images.

It would be kind of a little revolution, since the title has never been considered useful.

I decided to run my own test, just to make sure it also works for me.

So here is the picture; it’s a screenshot of a presentation shared on LinkedIn. There are 34 keywords in its title, and you can only find those keywords in the title.
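For clarity, the setup looks roughly like this (the URL and keywords are placeholders; the real title attribute holds the 34 test keywords):

<img src="https://www.example.com/linkedin-presentation.png"
     title="keyword1 keyword2 keyword3 … keyword34"
     alt="" />
<!-- The keywords appear only in the title attribute, nowhere else on the page -->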

This page will be updated with the results!

Edit 1: 3 days later, the page is indexed, but the keywords in the title can’t be found in Google. So currently, the title has zero impact.

Edit 2: 2 weeks later, still no results when I search for a keyword found in the title.

How to increase the accuracy of the “Site Speed” reports in Google Analytics

One line is enough to increase the accuracy of the Google Analytics web performance (user timings) reports:

ga('create', 'UA-XXXX-Y', {'siteSpeedSampleRate': 100}); // In theory, 100% of page views

You won’t actually get 100%, notably because some browsers don’t support user timing measurement. Nevertheless, you should easily get above 25%.

Source: https://developers.google.com/analytics/devguides/collection/analyticsjs/field-reference#siteSpeedSampleRate

What are the best SEO crawlers for huge websites?

You want to crawl millions of URLs and run a massive SEO audit?

Here are the best SEO crawlers for large to very large websites:

  • Desktop SEO crawlers (starting at roughly $200 / year / unlimited crawls)
    • Screaming Frog (Linux / Windows / Mac OS) -> The big plus: native features to cross-reference data from G.A., Search Console, MajesticSEO, Ahrefs… and you can cross-reference with your log files! Plan for ±60 GB of disk space per 1 million URLs crawled. Read this post to set up Screaming Frog on a Remote Desktop Ubuntu cloud instance.
    • Sitebulb (Windows / Mac OS) -> pretty rich! Interesting visualization of the internal link structure.
    • Hextrakt (Windows) -> URL segmentation is a real plus when it comes to analyzing big websites. Hextrakt does the job!
    • Xenu (Windows) -> only for very basic checkups, like 404s.
  • SaaS SEO crawlers (starting at $159+ / month for 2 million URLs crawled per month)
  • Open-source SEO crawlers (Python / Java, etc.)
    • Scrapy (see the minimal sketch after this list)
    • Crowl (an open-source crawler based on Scrapy)
    • Nutch
    • => These solutions aren’t profitable in most cases, since they require a lot of development and maintenance compared to a SaaS solution.
    • => Nevertheless, if you want to discover how a search engine works, you will learn a lot! 🙂
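To give an idea of what that looks like, here is a minimal Scrapy sketch of an SEO crawler (the domain, field names and settings are placeholder assumptions, not a production setup):

import scrapy
from scrapy.crawler import CrawlerProcess

class MiniSeoSpider(scrapy.Spider):
    # Minimal SEO spider: records URL, HTTP status and <title> of each page
    name = "mini_seo"
    allowed_domains = ["example.com"]  # placeholder: the site to audit
    start_urls = ["https://www.example.com/"]

    def parse(self, response):
        yield {
            "url": response.url,
            "status": response.status,
            "title": response.css("title::text").get(),
        }
        # Follow internal links to keep crawling the site
        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse)

process = CrawlerProcess(settings={"ROBOTSTXT_OBEY": True})
process.crawl(MiniSeoSpider)
process.start()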

Screaming Frog v9.0 / Docker / Debian 8 / Ubuntu Remote Desktop

You would like to set up a remote desktop with Screaming Frog to crawl huge websites (1+ million pages)?

You only need your crawler to run once a month (for a few hours or days)?

You have some basic technical skills and want to discover the power of the Cloud?

An Ubuntu remote desktop on an OVH cloud instance offers a good price-to-performance solution (€1.5 / $2 per day, or half that price with a monthly subscription).

What we will do to get an army of crawlers:

  1. Open an OVH account or login
  2. Create a new Cloud Project
  3. Create a Server (an instance)
  4. Specify in advance that we want Docker to be installed (it will make everything super simple to set up)
  5. Install a Docker container containing Ubuntu + a remote desktop based on noVNC
  6. Connect to Ubuntu with Chrome or any browser in one click 🙂
  7. Install Screaming Frog with 2 commands
  8. Create a snapshot in OpenStack (= OVH)
  9. Create as many servers containing Screaming Frog as you want, in just ONE click

Setup a new Cloud Instance

  • Go here and create an account / login: https://www.ovh.ie/public-cloud/instances/
  • Then here: https://www.ovh.com/manager/cloud/index.html
  • Order > Cloud Project > Fill the forms
  • In your project > Infrastructure > You can add a new server in “Actions” > Add Server
  • Take the 7GB RAM & 100GB SSD for this test.
  • You will need about 60 GB of disk per 1 million URLs crawled
  • Set up your SSH key (Google is your best friend for help, it’s OS-specific; see the sketch below)
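If you’ve never generated one, a minimal sketch on Linux/macOS (the key type and comment are just suggestions):

ssh-keygen -t ed25519 -C "ovh-crawler"   # generate the key pair
cat ~/.ssh/id_ed25519.pub                # paste this public key into the OVH interface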

Setup the server

  • Connect to the server with a terminal
    • ssh user@<IP of your server>
  • Then copy and paste each line one by one:
    • apt-get update
    • apt-get upgrade
    • sudo docker run -it --rm -p 6080:80 -p 5900:5900 -e VNC_PASSWORD=MyPassWordToReplaceByWhatYouWant dorowu/ubuntu-desktop-lxde-vnc

Setup Screaming Frog

  • Connect to https://IP-OF-YOUR-SERVER:6080/ with the password you used for “VNC_PASSWORD=”
  • Open a terminal in Ubuntu (in the NoVNC session – icon in the bottom left of the Ubuntu desktop)
  • Then copy and paste each line one by one:
    • sudo apt-get install screen wget
    • wget https://download.screamingfrog.co.uk/products/seo-spider/screamingfrogseospider_9.0_all.deb
    • dpkg -i screamingfrogseospider_9.0_all.deb
    • sudo apt-get -f install
  • Screaming Frog is now installed 🙂
  • You can try it here:
    • Bottom left icon > Internet > Screaming Frog SEO Spider
  • You will have to set the storage mode to disk to crawl huge websites
    • Screaming Frog > Configuration > System > Storage

Next step: create a snapshot of this in OVH (to be continued if you liked this article!)