Screaming Frog SEO Spider is a tool that helps us analyze some of the most important SEO factors of a website.
The creation of a website is often a task that involves several people with different roles. Even with the utmost attention to detail, it is possible that some kind of bug slips into the design or structure of the site. To discover these failures, it is desirable to have a tool that inspects all of the pages of our site and searches for failures or problems.
Screaming Frog SEO Spider is an ideal tool for analyzing and reporting a website's problems. The program is easy to use: we only need to adjust some basic settings and enter the URL of the site we want to analyze. After a few minutes or hours, depending on the size and depth of the site, we will get a report with useful information that we can filter and rearrange, so we can look for possible failures or errors in the website.
The first thing we must do is configure the “spider” (the name given to the program that will visit and collect information from the web pages that make up the site). Some of the options we can set are the following:
- Check External Links: if our site has links to other sites, this option will check that these links are not broken.
- Follow “nofollow” internal or external links: with this option, the spider will either follow “nofollow” links or ignore them. This option is very useful if we want to know the total number of “dofollow” pages that our site contains.
- Crawl subdomains: if our site has multiple subdomains and we want the spider to crawl them, we need to check this option.
- Ignore robots.txt file: useful if we have blocked certain areas of our site using the robots.txt file. If we want the spider to ignore that file and inspect all areas of the website, we need to check this option.
- Limit total number of pages to crawl: with this option we will limit the number of pages that will be crawled by the “spider”.
- Limit search depth: with this option we can set the spider to only crawl pages a few clicks away from the home page.
- Request authentication: if any of the web pages that make up the site are password-protected, checking this option makes the program ask us for a username and password so it can access and analyze the protected pages.
- Respect noindex: if we check this option, the spider will not crawl pages with the “noindex” meta tag. It is advisable to use this option if we want to know the total number of pages indexed by search engines like Google.
- Respect Canonical: if we check this option, the spider will not crawl pages containing a rel=”canonical” meta tag.
By activating the Respect noindex and Respect Canonical options, we will get a full report of the pages that will actually be indexed by search engines, since our spider will behave according to the rules that most Internet search engines follow. Our website may generate tens of thousands of pages, but if we set up the noindex and rel="canonical" meta tags correctly, the number of indexed pages will be reduced considerably, which helps increase the internal PageRank of important pages.
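As a rough illustration (not Screaming Frog's own code), the two signals these options respect can be detected in a page's HTML with a few lines of Python's standard library; the page below is a made-up example:

```python
from html.parser import HTMLParser

class IndexabilityParser(HTMLParser):
    """Collects the two signals a crawler checks to respect noindex/canonical."""
    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # <meta name="robots" content="noindex"> marks the page as non-indexable
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            if "noindex" in attrs.get("content", "").lower():
                self.noindex = True
        # <link rel="canonical" href="..."> points to the preferred URL
        if tag == "link" and attrs.get("rel", "").lower() == "canonical":
            self.canonical = attrs.get("href")

# Hypothetical page containing both tags
html = """<html><head>
<meta name="robots" content="noindex, follow">
<link rel="canonical" href="https://example.com/page">
</head><body></body></html>"""

parser = IndexabilityParser()
parser.feed(html)
print(parser.noindex)    # True
print(parser.canonical)  # https://example.com/page
```

A spider that honors these signals would skip reporting this page, which is exactly why the crawl count drops when the two options are enabled.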
Once the spider has finished crawling all the web pages, the software generates a list of each page inspected. This list includes the following information:
- URL: address of the web page analyzed
- Status code: reports the HTTP status code. Thanks to this, we can see which pages are not found (404) or which return a server error (500).
- Title and length: displays the page's title and the number of characters it contains.
- Meta Description and length: shows the meta description content and its length.
- H1, H2, and length: displays the text of both tags (if they exist) and their length.
- Size: the size, in bytes, of the page. It is important to know a page's size, since an oversized page will download slowly, and download speed is a factor in search engine rankings.
- Number of words: the number of words on the analyzed page. This factor is also important, because pages that contain only a few words are considered low quality by search engines.
- Level: the depth level of the analyzed page. For example, if a page is at level 1, it is just one click away from the site's index page. This field is useful for checking whether the pages we believe are important have a low level, because if they have a high level, search engines will interpret them as less important than the pages at lower levels.
- Inbound links: the number of internal pages that link to the analyzed page. This number is important, because we can see which pages receive the most links and check whether those are the most important pages of our site. If not, we can change the internal structure of our website to link to the pages we consider most important.
- Outbound links: the number of links from the analyzed page to other internal pages.
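A minimal sketch of how a few of these per-page fields could be computed once a page's HTML has been downloaded (this is an illustration in Python, not the tool's actual implementation, and the page is invented):

```python
from html.parser import HTMLParser

class PageReport(HTMLParser):
    """Extracts some of the per-page fields listed above from raw HTML."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.words = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        # crude word count over every visible text node
        self.words += len(data.split())

html = ("<html><head><title>Home</title></head>"
        "<body><h1>Welcome</h1><p>Two words here</p></body></html>")
report = PageReport()
report.feed(html)
print(report.title, len(report.title))   # title and its character length
print(report.words)                      # word count across text nodes
print(len(html.encode("utf-8")))         # page size in bytes
```

Fields like level, inbound links, and outbound links come from the crawl graph itself rather than from any single page, which is why the spider must finish the whole crawl before the report is complete.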
The program consists of different tabs that contain specific reports. For example, in the “Response Code” tab we can see HTTP status codes (200, 404, etc.) and filter and sort by response code. Each tab also includes information that is not displayed in the main listing, such as the response time of each crawled URL.
Screaming Frog SEO Spider also lets us upload a sitemap file, so we can check whether the URLs included in it are working properly. This option is very convenient: search engines are guided by the URLs contained in the sitemap when inspecting our site, so it is important that the sitemap contains no errors and that its URLs are valid.
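For instance, the URL list inside a standard sitemap.xml can be pulled out with Python's standard library and then requested one by one; a sketch with an inline, made-up sitemap:

```python
import xml.etree.ElementTree as ET

# A minimal sitemap fragment; in practice it would be fetched from the site.
sitemap_xml = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

# The sitemap protocol puts every element in this XML namespace
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
root = ET.fromstring(sitemap_xml)
urls = [loc.text for loc in root.findall("sm:url/sm:loc", ns)]
print(urls)
# Each URL could then be requested (e.g. with urllib.request) to confirm
# it returns a 200 status code rather than a 404 or a redirect.
```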
We can export the generated data to .csv files and import them into Excel or any other spreadsheet to manipulate the information.
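Because the export is plain CSV, it can also be processed without a spreadsheet at all. A small sketch, assuming illustrative column names such as “Address” and “Status Code” (the real export's headers may differ):

```python
import csv
import io

# A tiny stand-in for an exported report file
exported = io.StringIO(
    "Address,Status Code,Title Length\n"
    "https://example.com/,200,12\n"
    "https://example.com/old,404,0\n"
)

# Collect every URL the crawl flagged as not found
broken = [row["Address"] for row in csv.DictReader(exported)
          if row["Status Code"] == "404"]
print(broken)  # ['https://example.com/old']
```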
This software is developed in Java and is available for all major desktop operating systems: Windows, Mac, and Linux.
The software is paid, although it can be used indefinitely with certain restrictions on functionality, such as a limit of 500 URLs per site and the inability to configure the spider to our liking.