Screaming Frog v9.0 / Docker / Debian 8 / Ubuntu Remote Desktop

Would you like to set up a remote desktop with Screaming Frog to crawl huge websites (1 million pages or more)?

Do you only need your crawler to run once a month, for a few hours or days?

Do you have some basic technical skills and want to discover the power of the Cloud?

An Ubuntu remote desktop on an OVH cloud instance offers a good price-to-performance ratio: about €1.50 (roughly $2) per day, or half that price with a monthly subscription.

What we will do to get an army of crawlers:

  1. Open an OVH account or log in
  2. Create a new Cloud Project
  3. Create a Server (an instance)
  4. Specify in advance that we want Docker to be installed (it will make everything super simple to setup)
  5. Run a Docker container containing Ubuntu + a remote desktop based on noVNC
  6. Connect to Ubuntu with Chrome or any browser in one click 🙂
  7. Install Screaming Frog with 2 commands
  8. Create a snapshot in OpenStack (= OVH)
  9. Create as many servers containing Screaming Frog as you want in just ONE click

Set up a new Cloud Instance

  • Go here and create an account / log in: https://www.ovh.ie/public-cloud/instances/
  • Then here: https://www.ovh.com/manager/cloud/index.html
  • Order > Cloud Project > Fill the forms
  • In your project > Infrastructure > You can add a new server in “Actions” > Add Server
  • Choose the instance with 7 GB of RAM and a 100 GB SSD for this test.
  • You will need about 60 GB of disk per 1 million URLs crawled
  • Set up your SSH key (Google is your best friend for help here, the steps are OS-specific)
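
The 60 GB per million URLs figure above can be turned into a quick sizing check before you pick an instance. This is only a sketch: it assumes disk use grows linearly with the number of crawled URLs, which is a simplification, and `estimate_disk_gb` is a hypothetical helper name.

```shell
# Rough disk sizing, assuming ~60 GB per 1,000,000 crawled URLs (figure from this post)
# and linear scaling (an assumption; real usage depends on the site).
GB_PER_MILLION_URLS=60

# estimate_disk_gb URL_COUNT -> prints the estimated size in GB, rounded up
estimate_disk_gb() {
  urls=$1
  echo $(( (urls * GB_PER_MILLION_URLS + 999999) / 1000000 ))
}
```

For example, `estimate_disk_gb 1500000` prints 90, so a 1.5 million URL crawl would nearly fill the 100 GB SSD.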

Set up the server

  • Connect to the server with a terminal
    • ssh user@IP-OF-YOUR-SERVER
  • Then copy and paste each line one by one:
    • sudo apt-get update
    • sudo apt-get upgrade
    • sudo docker run -it --rm -p 6080:80 -p 5900:5900 -e VNC_PASSWORD=MyPassWordToReplaceByWhatYouWant dorowu/ubuntu-desktop-lxde-vnc
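
Note that the `-it` and `--rm` options in the `docker run` line above tie the container to your SSH session and delete it, together with anything installed inside it, as soon as it stops. Below is a minimal sketch of a detached alternative, assuming you want the desktop to keep running after you log out. The function name `start_desktop` and the container name `sf-desktop` are illustrative choices, not something the image requires.

```shell
# Start the Ubuntu/noVNC desktop in the background.
# Usage: start_desktop 'MyPassWordToReplaceByWhatYouWant'
start_desktop() {
  vnc_password=$1
  if [ -z "$vnc_password" ]; then
    echo "usage: start_desktop VNC_PASSWORD" >&2
    return 1
  fi
  # -d runs the container detached; without --rm the container (and what you
  # install in it) survives a stop and can be restarted with:
  #   sudo docker start sf-desktop
  sudo docker run -d --name sf-desktop \
    -p 6080:80 -p 5900:5900 \
    -e VNC_PASSWORD="$vnc_password" \
    dorowu/ubuntu-desktop-lxde-vnc
}
```

The ports, the `VNC_PASSWORD` variable and the image name are the same as in the original command; only the run mode changes.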

Set up Screaming Frog

  • Connect to http://IP-OF-YOUR-SERVER:6080/ with the password you set in “VNC_PASSWORD=” (the command above maps plain HTTP on port 6080, not HTTPS)
  • Open a terminal in Ubuntu (in the NoVNC session – icon in the bottom left of the Ubuntu desktop)
  • Then copy and paste each line one by one:
    • sudo apt-get install screen wget
    • wget https://download.screamingfrog.co.uk/products/seo-spider/screamingfrogseospider_9.0_all.deb
    • sudo dpkg -i screamingfrogseospider_9.0_all.deb
    • sudo apt-get -f install
  • Screaming Frog is now installed 🙂
  • You can try it here:
    • Bottom left icon > Internet > Screaming Frog SEO Spider
  • You will have to set the storage mode to disk to crawl huge websites
    • Screaming Frog > Configuration > System > Storage
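
The install commands above can be wrapped in a small script so they are easy to repeat in a fresh container. This is a sketch: `sf_deb_url` and `install_screaming_frog` are hypothetical helper names, and the URL pattern is simply the 9.0 link from this post with the version turned into a parameter, so other versions may not follow it.

```shell
# Build the download URL for a given version (pattern taken from the 9.0 link
# in this post; assumed, not guaranteed, to hold for other versions).
sf_deb_url() {
  echo "https://download.screamingfrog.co.uk/products/seo-spider/screamingfrogseospider_${1}_all.deb"
}

# Download and install Screaming Frog; run this in the Ubuntu desktop terminal.
# Usage: install_screaming_frog [VERSION]   (defaults to 9.0)
install_screaming_frog() {
  version=${1:-9.0}
  deb="screamingfrogseospider_${version}_all.deb"
  sudo apt-get update &&
  sudo apt-get install -y screen wget &&
  wget -O "$deb" "$(sf_deb_url "$version")" || return 1
  # dpkg may stop on missing dependencies; apt-get -f install then pulls them in
  sudo dpkg -i "$deb" || sudo apt-get -f install -y
}
```

Calling `install_screaming_frog` with no argument reproduces the four manual commands, with an extra `apt-get update` first so the package lists in the container are current.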

Next step: create a snapshot of this server in OVH (to be continued if you liked this article!)

Here is what ChatGPT can tell you about Screaming Frog and Docker:

Screaming Frog Docker is a containerized version of the popular web crawling tool Screaming Frog SEO Spider. It is designed to run on Docker, a platform that allows for easier and more efficient deployment of applications in a virtualized environment. The containerization of the tool provides several benefits, such as easier portability, faster deployment, and simplified configuration management.

Using Screaming Frog Docker can help SEO professionals and website owners to quickly and easily crawl websites, identify technical issues, and improve their website’s search engine optimization (SEO) performance. With the power of Docker, it is possible to run Screaming Frog on any platform that supports Docker, such as Linux, Windows, or macOS.

Overall, Screaming Frog Docker is a convenient and powerful tool that can simplify the web crawling process and help improve website performance.


