In our previous blog post Automate SEO factors testing using Behat - Part 1, we covered SEO factors - meta tags, keyword and image optimization. Here we are going to cover the remaining SEO factors.
A redirect forwards users and search engines from an old URL to the correct URL.
Types of Redirection:
- 301 Moved Permanently is a permanent redirect and is the best choice for preserving SEO ranking.
- 302 Found (formerly Moved Temporarily) is a temporary redirect that indicates the requested resource has been temporarily moved to a new URL.
- 307 Temporary Redirect is also a temporary redirect, similar to 302; the only difference is that the HTTP method remains the same in the redirected request.
Given I am on "/redirect.php"
Then the response status code should be 301
And I should be redirected to "/redirect/redirect.php"
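Outside Behat, the same expectation can be written as a small standalone check. The sketch below is only an illustration and not part of the extension: `check_redirect` and `REDIRECT_TRAITS` are hypothetical names, and the status code and `Location` value are passed in by hand instead of being fetched from a live server.

```python
from http import HTTPStatus
from urllib.parse import urljoin

# Redirect codes from the list above:
# (is it permanent?, is the request method guaranteed to be kept?)
REDIRECT_TRAITS = {
    HTTPStatus.MOVED_PERMANENTLY: (True, False),   # 301
    HTTPStatus.FOUND: (False, False),              # 302
    HTTPStatus.TEMPORARY_REDIRECT: (False, True),  # 307
}


def check_redirect(request_url, status, location, expected_status, expected_target):
    """Return a list of problems; an empty list means the redirect is as expected."""
    problems = []
    if status not in REDIRECT_TRAITS:
        problems.append(f"{status} is not one of the redirect codes covered here")
    if status != expected_status:
        problems.append(f"expected status {expected_status}, got {status}")
    # The Location header may be relative, so resolve both sides
    # against the URL that was requested before comparing them.
    actual = urljoin(request_url, location or "")
    wanted = urljoin(request_url, expected_target)
    if actual != wanted:
        problems.append(f"expected redirect to {wanted}, got {actual}")
    return problems


if __name__ == "__main__":
    print(check_redirect(
        "https://www.example.com/redirect.php",
        301,
        "/redirect/redirect.php",
        expected_status=301,
        expected_target="/redirect/redirect.php",
    ))
```

In a real check the status and `Location` header would come from an HTTP client configured not to follow redirects, otherwise only the final response is visible.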
The robots.txt file is a text file that tells search engine crawlers which pages of the site they may access. In a robots.txt file, we can specify allow or disallow rules for all user-agents or for specific user-agent(s). When the file contains a group of rules for a specific user-agent, that robot follows the rules specified for it. To ensure the robots.txt file is found, always place it at the root of the domain. If the robots.txt file is placed in a subdirectory, robots will not discover it and the entire site may be crawled.
User-agent: [user-agent name]
Disallow: [URL string not to be crawled]
Allow: [URL string to be crawled]
(Note: Only applicable for Googlebot)
Crawl-delay: [Time in seconds for the crawler to pause crawling the page]
Sitemap: [XML Sitemaps associated with the URL]
Given I am a "Googlebot" crawler
Then I should be able to crawl "/crawl-allowed"
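To see how such rules are evaluated, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the helper name `can_crawl` are assumptions made for this example; they do not come from the extension or from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content built from the directives listed above.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /crawl-allowed
Disallow: /private

User-agent: *
Disallow: /
"""


def can_crawl(robots_txt, user_agent, path):
    """Return True if the given user-agent may fetch the path under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    checks = [
        ("Googlebot", "/crawl-allowed"),
        ("Googlebot", "/private/report"),
        ("OtherBot", "/crawl-allowed"),
    ]
    for agent, path in checks:
        print(agent, path, can_crawl(ROBOTS_TXT, agent, path))
```

Note that the standard-library parser applies the first matching rule in a group, while Google documents a most-specific-match rule, so results can differ for overlapping Allow/Disallow paths.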
A sitemap is an XML file that gives search engines the list of URLs available for crawling on a particular website. A sitemap is most useful when a website is new with few links, when it has a large amount of content, or when it has rich media content. In such scenarios, a sitemap points Google to the pages that are more valuable and informative on the website.
Large websites may end up with many sitemaps. In that case, we can split large sitemaps and list them in a sitemap index. The following are the XML tags for a sitemap index file:
- sitemapindex - The parent tag of the file.
- sitemap - The parent tag for each sitemap listed in the file
- loc - The location of each sitemap
Given the sitemap "/sitemap/valid-sitemap.xml"
Then the sitemap should be valid
The extension "marcortola/behat-seo-contexts" also provides the following validations:
- Then the sitemap URLs should be alive
- Then the multilanguage sitemap should pass Google validation
- Then /^the sitemap should have ([0-9]+) children$/
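The sitemap index structure described above (sitemapindex, sitemap, loc) can be read with a few lines of code. This is a rough sketch using only the Python standard library; the sample XML, its URLs, and the function name `child_sitemaps` are made up for illustration and are not how the extension implements its steps.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap index listing two child sitemaps.
SITEMAP_INDEX = """<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.example.com/sitemap-pages.xml</loc></sitemap>
  <sitemap><loc>https://www.example.com/sitemap-blog.xml</loc></sitemap>
</sitemapindex>
"""


def child_sitemaps(index_xml):
    """Return the <loc> of every <sitemap> entry in a sitemap index document."""
    root = ET.fromstring(index_xml.encode("utf-8"))
    if root.tag != "{%s}sitemapindex" % NS["sm"]:
        raise ValueError("root element is not <sitemapindex>")
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS)]


if __name__ == "__main__":
    children = child_sitemaps(SITEMAP_INDEX)
    print(len(children), "children")
    for url in children:
        print(url)
```

Counting the returned list is roughly what a "should have N children" assertion needs to compare against.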
Schema.org is a collaborative effort between Google, Bing, Yandex, and Yahoo to create structured data markup. This markup gives search engines information about the page and enhances the rich results experience.
Given I am on homepage
Then the page HTML markup should be valid
HTTP Status code
An HTTP status code is a three-digit code that a server sends in response to a browser's request.
Common status code classes:
- 1xxs – Informational responses
- 2xxs – Success!
- 3xxs – Redirection
- 4xxs – Client errors. It is a good practice to return a 404 error page when the correct URL is not found.
- 5xxs – Server errors
We can use the existing step "Then the response status code should be 301" to validate the HTTP response code.
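The class of a status code is determined by its first digit, which is easy to express in code. The following is a small sketch; `status_class` and `STATUS_CLASSES` are illustrative names, not part of Behat or the extension.

```python
# First digit of the status code -> class name, as in the list above.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code):
    """Return the class name for a three-digit HTTP status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"{code} is not a valid HTTP status code")
    return STATUS_CLASSES[code // 100]


if __name__ == "__main__":
    for code in (200, 301, 404, 503):
        print(code, status_class(code))
```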
The hreflang attribute helps search engines show the correct version of a page based on a user's location and language preferences.
Format: <link rel="alternate" href="http://example.com" hreflang="en-us" />
- rel="alternate" - Indicates that content exists in alternate language(s).
- href - Specifies the URL of the content.
- hreflang="x" or "x-default" - Shows the relationship between web pages in the alternate languages.
- The format of "x" is a language code or a language-country code. It is used when a page exists in a particular language. For example, hreflang="es" or hreflang="es-mx". (Note: The language code always comes before the country code.)
- hreflang="x-default" is used when there is no language/region match for a page.
One of the rules is that hreflang tags must be bidirectional (reciprocal): when an English page links to a Spanish page, the Spanish page must link back to the English page.
<link rel="alternate" href="https://www.qed42.com" hreflang="en" />
Given I am on "/valid-hreflang.html"
Then the page hreflang markup should be valid
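The reciprocity rule can be checked mechanically once the hreflang links of each page are known. The sketch below uses Python's standard `html.parser`; the page URLs, the HTML snippets, and the names `HreflangParser`, `extract_alternates`, and `missing_return_links` are all assumptions made for this example, and it covers only the return-link rule, not full hreflang validation.

```python
from html.parser import HTMLParser


class HreflangParser(HTMLParser):
    """Collect hreflang -> href pairs from <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.alternates = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "alternate" in rel and attr.get("hreflang") and attr.get("href"):
            self.alternates[attr["hreflang"].lower()] = attr["href"]


def extract_alternates(html):
    parser = HreflangParser()
    parser.feed(html)
    parser.close()
    return parser.alternates


def missing_return_links(pages):
    """pages maps URL -> HTML. Return (source, target) pairs where target does not link back."""
    alternates = {url: extract_alternates(html) for url, html in pages.items()}
    missing = []
    for source, links in alternates.items():
        for target in links.values():
            if target == source or target not in alternates:
                continue  # self-reference, or a page we have no HTML for
            if source not in alternates[target].values():
                missing.append((source, target))
    return missing


# Hypothetical pages: the Spanish page forgets to link back to the English one.
EN_URL = "https://www.example.com/"
ES_URL = "https://www.example.com/es/"
PAGES = {
    EN_URL: '<link rel="alternate" href="%s" hreflang="en" />'
            '<link rel="alternate" href="%s" hreflang="es" />' % (EN_URL, ES_URL),
    ES_URL: '<link rel="alternate" href="%s" hreflang="es" />' % ES_URL,
}

if __name__ == "__main__":
    print(missing_return_links(PAGES))
```

Adding the English link to the Spanish page makes the returned list empty.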
Page Speed (page load time) is the measure of time taken to fully load content on a page.
Some of the ways to improve page speed:
- Leverage browser caching for images, CSS and JS
- Minimize Redirects
- Optimize images
The extension provides a performance context that covers testing HTML minification, CSS minification, JS minification, browser caching, and async or deferred JS loading. Below is the snippet for HTML minification.
Given I am on "/performance/html/minified.html"
Then HTML code should be minified
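To give an idea of what a minification check might look at, here is a deliberately simple heuristic. It is an assumption for illustration only and is not the algorithm the extension uses: `looks_minified` is a hypothetical helper that treats HTML as minified when it contains no comments and no line breaks between tags.

```python
import re


def looks_minified(html):
    """Heuristic: no HTML comments and no line breaks sitting between two tags."""
    has_comments = "<!--" in html
    has_line_breaks_between_tags = re.search(r">\s*\n\s*<", html) is not None
    return not (has_comments or has_line_breaks_between_tags)


# Hypothetical sample documents.
MINIFIED = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"
PRETTY = """<html>
  <head>
    <title>Demo</title>
  </head>
  <body>
    <!-- greeting -->
    <p>Hello</p>
  </body>
</html>
"""

if __name__ == "__main__":
    print(looks_minified(MINIFIED))
    print(looks_minified(PRETTY))
```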
A URL (Uniform Resource Locator) is a human-readable text that specifies the location of the webpage on the internet. A URL has the following basic format: protocol://domain-name.top-level-domain/path
- Protocol - The protocol determines how to communicate data between the server and a web browser when sending/retrieving resources. HTTP and HTTPS (secure) are two of the most common protocols.
- Domain-name - It is a unique identifier or name of the website.
- Top-Level Domain (TLD) - It is an extension to the domain name. For example, .com, .net, .edu, .org, etc.
- Path - Path is the exact location of the page/file on the website. The path includes specific folders and/or subfolders where the resource is located.
SEO best practices for URL optimization:
- Make the URL readable to humans and search engines
- Match URL with the page title and heading
- Use relevant page keywords in the URL
- Remove dynamic parameters from the URL
Good URL - https://www.example.com/seo/meta-tags
Bad URL - https://www.example.com/seo?id=54321
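The URL parts described above can be pulled apart with Python's standard `urllib.parse`, which also makes it easy to flag dynamic parameters. This is a naive sketch: `url_parts` and `has_dynamic_parameters` are hypothetical helpers, and the domain/TLD split simply takes the last two dot-separated labels, so it does not handle multi-part suffixes such as .co.uk.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Split a URL into the parts named above (naive domain/TLD split)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")
    return {
        "protocol": parts.scheme,
        "domain_name": labels[-2] if len(labels) >= 2 else host,
        "top_level_domain": labels[-1] if len(labels) >= 2 else "",
        "path": parts.path,
        "query": parts.query,
    }


def has_dynamic_parameters(url):
    """Return True if the URL carries a query string."""
    return bool(urlsplit(url).query)


if __name__ == "__main__":
    for url in ("https://www.example.com/seo/meta-tags",
                "https://www.example.com/seo?id=54321"):
        print(url_parts(url), has_dynamic_parameters(url))
```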
We have covered the major SEO factors that affect the ranking of the site in the SERP and how to automate them using Behat. Hope this blog was informational!
If you'd like to automate the SEO of your site, reach out to firstname.lastname@example.org.