Sep 22, 2020

Automate SEO factors testing using Behat - Part 1

Sonam Chaturvedi

SEO services are usually expensive and hard to find, since they require considerable time and technical expertise. However, a handful of SEO factors can be tested automatically with the right tools. In this blog, we will explore how to automate the testing of SEO factors using Behat.

Let’s understand what Search Engine Optimization is

Search Engine Optimization (SEO) is the process of applying a series of techniques to a website to improve its visibility on search engines for relevant keywords.

How to automate SEO factors using Behat?

To automate the testing of various SEO factors with Behat, we will use the “marcortola/behat-seo-contexts” extension together with our own custom contexts.

Let's understand the major SEO factors and how these can be automated.

| Meta Tags

Meta tags are the HTML tags that provide important information about the web page to the search engines. This metadata appears only in the page’s source code.

  1. Title tags - A title tag is an HTML element that specifies the title of a web page. The title tag is important for SEO because it appears as a clickable link in the search engine results page (SERP) and in browser tabs.

HTML Markup - <title>Drupal Development and Design</title>

Sample Scenario -

Given I am on homepage

Then the page title should be "Drupal Development and Design"

MetaContext method -

public function thePageTitleShouldBe(string $expectedTitle): void
{
    $this->assertTitleElementExists();
    $titleElement = $this->getTitleElement();
    Assert::notNull($titleElement);
    Assert::eq(
        $expectedTitle,
        $titleElement->getText(),
        sprintf('Title tag is not "%s"', $expectedTitle)
    );
}

private function getTitleElement(): ?NodeElement
{
    return $this->getSession()->getPage()->find('css', 'title');
}
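
To see what this comparison boils down to outside of Behat, here is a minimal standalone sketch that uses PHP's built-in DOM extension instead of Mink. The function name `extractPageTitle` is hypothetical and not part of the extension; the whitespace normalization is an assumption made so that stray spaces inside the tag do not cause false failures.

```php
<?php
declare(strict_types=1);

/**
 * Returns the whitespace-normalized <title> text, or null when the page has no title tag.
 * Hypothetical helper for illustration; not part of behat-seo-contexts.
 */
function extractPageTitle(string $html): ?string
{
    $document = new DOMDocument();
    // Collect parser warnings internally instead of printing them for sloppy markup.
    libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();

    $titleNode = $document->getElementsByTagName('title')->item(0);
    if ($titleNode === null) {
        return null;
    }

    // Collapse runs of whitespace and trim both ends before comparing.
    return trim((string) preg_replace('/\s+/', ' ', $titleNode->textContent));
}

$html = '<html><head><title>Drupal Development and Design </title></head><body></body></html>';
var_dump(extractPageTitle($html) === 'Drupal Development and Design');
```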

 

  2. Meta Description - The meta description is an HTML meta tag that provides a brief summary of a web page. It does not directly influence ranking, but it does affect the page's click-through rate (CTR).

HTML Markup - <meta name="description" content="A service open source company."/>

Sample Scenario - 

Given I am on homepage

Then the page meta description should be "A service open source company."

MetaContext method - 

public function thePageMetaDescriptionShouldBe(string $expectedMetaDescription): void
{
    $this->assertPageMetaDescriptionElementExists();
    $metaDescription = $this->getMetaDescriptionElement();
    Assert::notNull($metaDescription);
    Assert::eq(
        $expectedMetaDescription,
        $metaDescription->getAttribute('content'),
        sprintf('Meta description is not "%s"', $expectedMetaDescription)
    );
}

private function getMetaDescriptionElement(): ?NodeElement
{
    return $this->getSession()->getPage()->find(
        'xpath',
        '//head/meta[@name="description"]'
    );
}
 
  3. Canonical Tags - A canonical tag is an HTML link tag that tells search engines that a specific URL is the original copy of a page. Canonicalization is important because it helps you control duplicate content, for example when the same page is reachable under several URLs.

 HTML Markup - <link rel="canonical" href="http://example.com/" />

Sample Scenario -

Given I am on homepage

Then the page canonical should be "http://example.com/"

MetaContext method -

public function thePageCanonicalShouldBe(string $expectedCanonicalUrl): void
{
    $this->assertCanonicalElementExists();
    $canonicalElement = $this->getCanonicalElement();
    Assert::notNull($canonicalElement);
    Assert::eq(
        $this->toAbsoluteUrl($expectedCanonicalUrl),
        $canonicalElement->getAttribute('href'),
        sprintf('Canonical url should be "%s"', $this->toAbsoluteUrl($expectedCanonicalUrl))
    );
}

private function getCanonicalElement(): ?NodeElement
{
    return $this->getSession()->getPage()->find(
        'xpath',
        '//head/link[@rel="canonical"]'
    );
}
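
The method above relies on `$this->toAbsoluteUrl()`, which is not shown in this post. As a rough idea of what such a helper might do, here is a minimal standalone sketch. It is an assumption, not the extension's actual implementation: the `$baseUrl` parameter is hypothetical (in a Behat setup it would typically come from the configured base URL), and every relative path is treated as relative to the site root.

```php
<?php
declare(strict_types=1);

/**
 * Turns a relative path into an absolute URL by prefixing a base URL.
 * Hypothetical stand-in for the toAbsoluteUrl() helper used above.
 */
function toAbsoluteUrl(string $url, string $baseUrl): string
{
    // A URL that already has a scheme (http, https, ...) is returned untouched.
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (is_string($scheme)) {
        return $url;
    }

    // Join base and path with exactly one slash between them.
    return rtrim($baseUrl, '/') . '/' . ltrim($url, '/');
}

echo toAbsoluteUrl('/', 'http://example.com'), PHP_EOL;
echo toAbsoluteUrl('http://example.com/', 'http://example.com'), PHP_EOL;
```

With a helper like this, a scenario can state the expected canonical either as a full URL or as a path, and both forms are compared against the same absolute value.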
 
  4. Alternative Text Tag - Alt text is an HTML attribute that describes an image on a web page. It is displayed on the page when the image fails to load, and it helps search engines interpret the image and rank the website.

We have added the custom “AccessibilityContext” context below to validate the alt text attribute, as this is not supported by the extension.

HTML Markup - <img src="/themes/qed42/logo.svg" alt="Home" />

Sample Scenario -

Given I am on homepage

Then the images should have alt text

AccessibilityContext method -

public function theImagesShouldHaveAltText(): void
{
    // Check every image on the page, not just the first one.
    foreach ($this->getImageElements() as $imageElement) {
        $imageAlt = $imageElement->getAttribute('alt');
        Assert::notEmpty(
            $imageAlt,
            'Alt text is empty for image: ' . $imageElement->getOuterHtml()
        );
    }
}

/**
 * @return NodeElement[]
 */
private function getImageElements(): array
{
    // findAll() returns all matching elements; find() returns only the first match.
    return $this->getSession()->getPage()->findAll('css', 'img');
}
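
For a quick check outside of Behat, the same idea can be sketched with PHP's built-in DOM extension. The function name `findImagesWithoutAltText` is hypothetical; instead of failing on the first offender, this sketch collects the `src` of every image whose alt text is missing or blank, which can make a failure report easier to act on.

```php
<?php
declare(strict_types=1);

/**
 * Returns the src attribute of every <img> whose alt attribute is missing or blank.
 * Hypothetical helper for illustration; not part of behat-seo-contexts.
 *
 * @return string[]
 */
function findImagesWithoutAltText(string $html): array
{
    $document = new DOMDocument();
    libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();

    $offenders = [];
    foreach ($document->getElementsByTagName('img') as $image) {
        // getAttribute() returns '' for a missing attribute as well as for an empty one.
        if (trim($image->getAttribute('alt')) === '') {
            $offenders[] = $image->getAttribute('src');
        }
    }

    return $offenders;
}

$html = '<html><body><img src="/themes/qed42/logo.svg" alt="Home" /><img src="/banner.png" /></body></html>';
print_r(findImagesWithoutAltText($html));
```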
 
  5. Robots Meta Tag - The robots meta tag gives search engine bots instructions on how to crawl and index the content of a web page. It has four main values for crawlers:
    • follow - The bot will follow all the links on that web page
    • index - The bot will index the whole web page
    • nofollow - The bot will NOT follow any of the links on that web page
    • noindex - The bot will NOT index that web page

Note: When the robots tag is missing, crawlers treat the page as index, follow by default.

The extension only supports validating the index/noindex values of the robots tag. Therefore, we have added the custom context below to validate the follow/nofollow values.

 HTML Markup - <meta name="robots" content="INDEX, NOFOLLOW" />

 Sample Scenario -

Given I am on homepage

Then the page should be nofollow

MetaContext method -

public function thePageShouldBeNofollow(): void
    {
        $metaRobotsElement = $this->getMetaRobotsElement();
        Assert::notNull(
            $metaRobotsElement,
            'Meta robots does not exist.'
        );
        Assert::contains(
            strtolower($metaRobotsElement->getAttribute('content') ?? ''),
            'nofollow',
            sprintf(
                'Url %s is not nofollow: %s',
                $this->getCurrentUrl(),
                $metaRobotsElement->getHtml()
            )
        );
    }
private function getMetaRobotsElement(): ?NodeElement
    {
        return $this->getSession()->getPage()->find(
            'xpath',
            '//head/meta[@name="robots"]'
        );
    }
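
The check above looks for "nofollow" as a substring of the content attribute. A slightly stricter variant, sketched below under the assumption that directives are comma-separated, splits the value into individual directives and matches whole tokens. The function names `parseRobotsDirectives` and `hasRobotsDirective` are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Splits a robots meta content value such as "INDEX, NOFOLLOW" into lowercase directives.
 * Hypothetical helper for illustration.
 *
 * @return string[]
 */
function parseRobotsDirectives(string $content): array
{
    $directives = array_map('trim', explode(',', strtolower($content)));

    // Drop empty entries left behind by stray commas.
    return array_values(array_filter($directives, static fn (string $d): bool => $d !== ''));
}

/**
 * Tells whether the content value contains the given directive as a whole token.
 */
function hasRobotsDirective(string $content, string $directive): bool
{
    return in_array(strtolower($directive), parseRobotsDirectives($content), true);
}

var_dump(hasRobotsDirective('INDEX, NOFOLLOW', 'nofollow'));
var_dump(hasRobotsDirective('index, follow', 'nofollow'));
```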
 
  6. Open Graph Meta Tags - Open Graph tags control what content and URLs are displayed when a page is shared on Facebook. The four required properties for every page are:
    • og:title - The title of the page/content/object as it should appear on Facebook, e.g., "QED42".
    • og:type - The type of the object, e.g., "blog" or "article". Depending on the type you specify, other properties may also be required.
    • og:image - The URL of an image that represents the object. Images must be in PNG, JPEG, or GIF format and at least 50px by 50px.
    • og:url - The canonical URL of the object, which will be used as its permanent ID, e.g., "https://www.qed42.com".

The following properties are optional for any object and are generally recommended:

    • og:audio - A URL to an audio file to accompany this object.
    • og:description - A one to two sentence description of the object.
    • og:determiner - The word that appears before this object's title in a sentence. An enum of (a, an, the, "", auto). If auto is chosen, the consumer of your data should choose between "a" or "an". Default is "" (blank).
    • og:locale - The locale these tags are marked up in. Of the format language_TERRITORY. Default is en_US.
    • og:locale:alternate - An array of other locales this page is available in.
    • og:site_name - The name of the site which should be displayed for the overall site, e.g., "QED42".
    • og:video - A URL to a video file that complements this object.

 HTML Markup - <meta property="og:url" content="https://www.qed42.com" />

 Sample Scenario -

Given I am on homepage

Then the Facebook Open Graph data should satisfy full requirements

SocialContext method -

private function validateFacebookOpenGraphData(): void
{
    // og:url must be a valid URL and must match the page currently being tested.
    Assert::notEmpty(
        filter_var($this->getOGMetaContent('og:url'), FILTER_VALIDATE_URL)
    );
    Assert::eq(
        $this->getOGMetaContent('og:url'),
        $this->getCurrentUrl(),
        'OG meta og:url does not match expected url'
    );

    // getOGMetaContent() itself asserts that the tag exists and is not empty.
    $this->getOGMetaContent('og:title');
    $this->getOGMetaContent('og:description');

    // og:image must be a valid URL with an allowed file extension.
    Assert::notEmpty(
        filter_var($this->getOGMetaContent('og:image'), FILTER_VALIDATE_URL)
    );
    $pathInfo = pathinfo($this->getOGMetaContent('og:image'));
    Assert::keyExists($pathInfo, 'extension');
    if (isset($pathInfo['extension'])) {
        Assert::oneOf(
            $pathInfo['extension'],
            ['jpg', 'jpeg', 'png', 'gif'],
            'OG meta og:image has an invalid extension. Allowed are: jpg/jpeg, png, gif'
        );
    }
}

private function getOGMetaContent(string $property): string
{
    $ogMeta = $this->getSession()->getPage()->find(
        'xpath',
        sprintf('//head/meta[@property="%1$s" or @name="%1$s"]', $property)
    );
    Assert::notNull(
        $ogMeta,
        sprintf('Open Graph meta %s does not exist', $property)
    );
    Assert::notEmpty(
        $ogMeta->getAttribute('content'),
        sprintf('Open Graph meta %s should not be empty', $property)
    );

    return $ogMeta->getAttribute('content') ?? '';
}
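
To get a quick overview of which of the four required properties a page is missing, something like the following standalone sketch could be used. It relies on PHP's built-in DOM extension rather than Mink; the function name `findMissingOpenGraphProperties` is hypothetical, and the default list of required properties is simply the one given above.

```php
<?php
declare(strict_types=1);

/**
 * Returns the required Open Graph properties that are absent or have empty content.
 * Hypothetical helper for illustration; not part of behat-seo-contexts.
 *
 * @param string[] $required
 * @return string[]
 */
function findMissingOpenGraphProperties(
    string $html,
    array $required = ['og:title', 'og:type', 'og:image', 'og:url']
): array {
    $document = new DOMDocument();
    libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();

    $found = [];
    $xpath = new DOMXPath($document);
    foreach ($xpath->query('//meta[@property]') as $meta) {
        // Only count tags that actually carry a value.
        if (trim($meta->getAttribute('content')) !== '') {
            $found[] = $meta->getAttribute('property');
        }
    }

    return array_values(array_diff($required, $found));
}

$html = '<html><head>'
    . '<meta property="og:title" content="QED42" />'
    . '<meta property="og:url" content="https://www.qed42.com" />'
    . '</head><body></body></html>';
print_r(findMissingOpenGraphProperties($html));
```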
 
  7. Twitter Cards Tags - Twitter Cards tags control how content is displayed when a page is shared on Twitter. The main tags are:
    • twitter:card - Describes the type of content. The card type can be one of “summary”, “summary_large_image”, “app”, or “player”.
    • twitter:title - The title of the content.
    • twitter:description - A summary of the content.
    • twitter:url - The canonical URL of the content.
    • twitter:image - The URL of a unique image representing the content of the page.

 HTML Markup - <meta name="twitter:url" content="https://www.qed42.com" />

 Sample Scenario -

Given I am on homepage

Then the Twitter Open Graph data should satisfy full requirements

SocialContext method -

private function validateFullTwitterOpenGraphData(): void
{
    $this->validateTwitterOpenGraphData();

    // twitter:image must be a valid URL with an allowed file extension.
    Assert::notEmpty(
        filter_var($this->getOGMetaContent('twitter:image'), FILTER_VALIDATE_URL)
    );
    $pathInfo = pathinfo($this->getOGMetaContent('twitter:image'));

    Assert::keyExists($pathInfo, 'extension');
    if (isset($pathInfo['extension'])) {
        Assert::oneOf(
            $pathInfo['extension'],
            ['jpg', 'jpeg', 'webp', 'png', 'gif'],
            'OG meta twitter:image has an invalid extension. Allowed are: jpg/jpeg, png, webp, gif'
        );
    }

    // getOGMetaContent() itself asserts that the tag exists and is not empty.
    $this->getOGMetaContent('twitter:description');
    $this->getOGMetaContent('twitter:url');
}

Note: The getOGMetaContent() method used here is the same one shown in the Open Graph Meta Tags section.
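
One thing to be aware of: both image checks call pathinfo() on the full image URL. If the URL carries a query string (for example a cache-busting "?v=2"), the reported extension includes it and the check fails even for a valid image. A possible way around this, sketched below with hypothetical function names and example.com URLs, is to extract the URL path first and lowercase the extension.

```php
<?php
declare(strict_types=1);

/**
 * Returns the lowercase file extension of the path part of a URL, or null if there is none.
 * Hypothetical helper for illustration; not part of behat-seo-contexts.
 */
function imageUrlExtension(string $imageUrl): ?string
{
    // Look only at the path, so query strings and fragments are ignored.
    $path = parse_url($imageUrl, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }

    $extension = pathinfo($path, PATHINFO_EXTENSION);

    return $extension === '' ? null : strtolower($extension);
}

/**
 * Tells whether the image URL ends in one of the allowed extensions.
 *
 * @param string[] $allowed
 */
function hasAllowedImageExtension(string $imageUrl, array $allowed): bool
{
    return in_array(imageUrlExtension($imageUrl), $allowed, true);
}

var_dump(pathinfo('https://example.com/card.PNG?v=2', PATHINFO_EXTENSION));
var_dump(imageUrlExtension('https://example.com/card.PNG?v=2'));
var_dump(hasAllowedImageExtension('https://example.com/card.PNG?v=2', ['jpg', 'jpeg', 'png', 'gif']));
```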

  8. Header Tags - Header tags provide hierarchy and context for a page. The heading elements go from H1 (most important) to H6 (least important). Text in an H1 tag tells the search engine that it is the most important text on that page. This factor does not have much impact on the site's ranking on the SERP.
  9. Responsive Design Meta Tag - The viewport meta tag is used for responsive websites. It allows web designers to scale and size pages so that they display well on any device.
    • width=device-width: Sets the width of the page to follow the screen width of the device.
    • initial-scale=1: Sets the initial zoom level when the page is first loaded by the browser.

We have added the custom “UXContext” context below to validate the viewport tag, as this is not supported by the extension.

 HTML Markup - <meta name="viewport" content="width=device-width, initial-scale=1">

 Sample Scenario -

Given I am on "/ux/site-with-valid-viewport.html"

Then the site should be responsive

UXContext method -

public function theSiteShouldBeResponsive(): void
    {
        $viewportElement = $this->getViewportElement();
        Assert::notNull($viewportElement);
        $expectedViewportContent = "width=device-width, initial-scale=1";

        $viewportContent = $viewportElement->getAttribute('content');
        Assert::eq(
            $expectedViewportContent,
            $viewportContent,
            'Site does not support responsive design'
        );
    }

private function getViewportElement(): ?NodeElement
{
    return $this->getSession()->getPage()->find(
        'xpath',
        '//head/meta[@name="viewport"]'
    );
}
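
The method above compares the whole content string, so a page with different spacing, a different order, or extra settings would be reported as not responsive. If that is too strict for your site, a more tolerant check could parse the value into key/value pairs first. The sketch below is one possible approach; the function names `parseViewportContent` and `isResponsiveViewport` are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Parses a viewport content value such as "width=device-width, initial-scale=1"
 * into a lowercase key => value map. Hypothetical helper for illustration.
 *
 * @return array<string, string>
 */
function parseViewportContent(string $content): array
{
    $settings = [];
    foreach (explode(',', $content) as $pair) {
        $parts = explode('=', $pair, 2);
        // Skip fragments that are not key=value pairs.
        if (count($parts) === 2) {
            $settings[strtolower(trim($parts[0]))] = strtolower(trim($parts[1]));
        }
    }

    return $settings;
}

/**
 * Treats a viewport as responsive when width is device-width and initial-scale is 1.
 */
function isResponsiveViewport(string $content): bool
{
    $settings = parseViewportContent($content);

    return ($settings['width'] ?? null) === 'device-width'
        && isset($settings['initial-scale'])
        && (float) $settings['initial-scale'] === 1.0;
}

var_dump(isResponsiveViewport('width=device-width, initial-scale=1'));
var_dump(isResponsiveViewport('initial-scale=1.0,width=device-width'));
var_dump(isResponsiveViewport('width=1024'));
```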

| Keyword Optimization

Keyword optimization is the process of identifying and selecting keywords to be incorporated into a website's content, which will drive traffic from the search engines to the website. By analyzing the words searched, we can understand what the user needs.

We can achieve keyword optimization by using keywords in the title tag, meta description, URL, links, and image alt attributes. Refer to the Meta Tags section to see how keywords can be optimized using the various meta tags.
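
As a rough illustration of how keyword placement could be checked, the sketch below reports in which of these places a keyword appears. The function name `keywordPlacements` and the field labels are hypothetical; the values would come from the title, meta description, URL, and alt text of the page under test.

```php
<?php
declare(strict_types=1);

/**
 * Reports, per field, whether the keyword occurs in it (case-insensitive).
 * Hypothetical helper for illustration.
 *
 * @param array<string, string> $fields Field label => text found on the page
 * @return array<string, bool>
 */
function keywordPlacements(string $keyword, array $fields): array
{
    $placements = [];
    foreach ($fields as $label => $text) {
        $placements[$label] = stripos($text, $keyword) !== false;
    }

    return $placements;
}

$fields = [
    'title' => 'Drupal Development and Design',
    'meta description' => 'A service open source company.',
    'url' => 'http://example.com/',
    'image alt' => 'Home',
];
print_r(keywordPlacements('drupal', $fields));
```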

| Image Optimization

Image optimization involves two things:

  • Reducing the size of the image without losing quality
  • Giving the image an appropriate name and identity - the alt attribute

A high number of images on a web page increases page load time, and Google uses page load time as a ranking factor. Therefore, we need to keep an eye on the number, size, and quality of the images added to a web page.

We have covered the SEO factors of meta tags, keyword, content, and image optimization. Automation to validate image optimization is covered under the Page Speed SEO factor. Stay tuned for the next blog in this series to discover more SEO factors.

If you'd like to automate the SEO of your site, reach out to [email protected]