Keeping your HTML valid with Zend Framework, Tidy and Firebug


With Zend Framework there is an easy way to ensure that your applications always produce valid HTML. It involves a simple front controller plugin and PHP's Tidy extension.

Valid HTML is important for a great many reasons, the most important of which is consistency across all of your visitors' browsers. The first step to making sure that your site appears correctly in every browser is to ensure that your HTML is valid. Even if the goons at Microsoft continue to ignore the standards and do their own thing, once your HTML passes validation, fixing things for the various versions of Internet Explo(r|it)er is a far easier task, usually possible with a few extra styling rules in your CSS.

What is a front controller plugin

A front controller plugin is like an observer for the front controller. It provides several events that can be hooked at various stages of the dispatch cycle. These events are:

  • routeStartup: Called before routing takes place
  • routeShutdown: Called after routing has occurred, but before the dispatcher is invoked
  • dispatchLoopStartup: Called before the dispatch loop begins, essentially the same as routeShutdown
  • preDispatch: Called before an action is dispatched (before the controller is instantiated, so ahead of both its init() and its own preDispatch())
  • postDispatch: Called after an action is dispatched
  • dispatchLoopShutdown: Called when the dispatching is complete

A simple example of a front controller plugin is one that automatically changes the layout based on the active module.

For this, you might do something like the following:

class Lupi_Controller_Plugin_ModuleLayout extends Zend_Controller_Plugin_Abstract
{
    public function routeShutdown(Zend_Controller_Request_Abstract $request) {
        // Changes the Layout based on the module name
        $modulename = $request->getModuleName();
        $layout = Zend_Layout::getMvcInstance();
        $layout->setLayoutPath("../application/modules/{$modulename}/layouts")
                     ->setLayout($modulename);
    }
}
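
A plugin only takes effect once it is registered with the front controller. The following is a minimal sketch of one way to do that from the application bootstrap. It assumes Zend Framework 1 is on the include path, that the Lupi_ prefix is known to the autoloader, and that the bootstrap class is called Bootstrap; the _initPlugins method name is made up for illustration.

```php
<?php
// application/Bootstrap.php (assumed location and class name)
class Bootstrap extends Zend_Application_Bootstrap_Bootstrap
{
    // Hypothetical method name; any _init* method runs during bootstrapping.
    protected function _initPlugins()
    {
        // The front controller is a singleton, so this is the same
        // instance that later dispatches the request.
        $front = Zend_Controller_Front::getInstance();
        $front->registerPlugin(new Lupi_Controller_Plugin_ModuleLayout());
    }
}
```

The same call should work for any other class that extends Zend_Controller_Plugin_Abstract.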

What is Tidy

Tidy is a binding for the Tidy HTML clean and repair utility which allows you to not only clean and otherwise manipulate HTML documents, but also traverse the document tree.[source:http://uk.php.net/manual/en/intro.tidy.php]

The Tidy extension for PHP provides numerous functions for assisting a developer with coding performant, valid, and accessible HTML.

It does this through a variety of methods, but generally it works in three steps: configure, parse and repair.
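
Those steps can be sketched with the Tidy extension alone, outside Zend Framework. This is only a minimal illustration, assuming the tidy extension is loaded; the sample markup and the option values are examples rather than recommendations.

```php
<?php
// Minimal sketch of configure, parse and repair (assumes ext/tidy is loaded).

// 1. Configure: options are passed as an array keyed by Tidy option name.
$config = array(
    'output-xhtml'   => true,
    'show-body-only' => true,
    'wrap'           => 0, // 0 disables line wrapping
);

// 2. Parse: load the markup together with the configuration and encoding.
$tidy = new tidy();
$tidy->parseString('<p>Hello <b>world</p>', $config, 'utf8');

// 3. Repair: fix the parsed document, here by closing the unclosed <b>.
$tidy->cleanRepair();
$repaired = (string) $tidy;

echo $repaired, "\n";
```

Wrapped up as a Zend_Filter, the same steps look like this: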

class Lupi_Filter_Tidy implements Zend_Filter_Interface
{
    /**
     * @var tidy
     */
    protected $_tidy;

    /**
     * @var string
     */
    protected $_encoding = 'utf8'; // Tidy's own encoding name; 'UTF-8' is not accepted

    /**
     * @var array
     */
    protected $_config = array('indent' => true,
                                         'output-xhtml' => true,
                                         'wrap' => false,
                                         'show-body-only' => true);

    /**
     * Filter the content with Tidy.
     *
     * @return string
     */
    public function filter($content)
    {
        $tidy = $this->getTidy($content);
        $tidy->cleanRepair();
        return (string) $tidy;
    }

    /**
     * Gets the Tidy object
     */
    public function getTidy($string)
    {
        if (!is_string($string)) {
            throw new InvalidArgumentException('Expected string, got: ' . gettype($string));
        }

        if (null === $this->_tidy) {
            $this->_tidy = new tidy();
        }

        $this->_tidy->parseString($string, $this->_config, $this->_encoding);
        return $this->_tidy;
    }
}

The example above is a simple filter that cleans and repairs an HTML fragment, suitable for use in a Zend_Form where you wish to accept user input. (NOTE: Though Tidy does a damn good job of cleaning up input, it does not guarantee that the output will be XSS safe.)
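
To make the XSS caveat concrete, the sketch below runs a fragment with an inline event handler through Tidy using plain extension calls. It assumes the tidy extension is loaded, and the sample input is made up for illustration.

```php
<?php
// Sketch only (assumes ext/tidy is loaded): Tidy repairs markup, it does
// not sanitise it.
$config = array(
    'output-xhtml'   => true,
    'show-body-only' => true,
);

// Hypothetical user input: unclosed paragraph plus an inline script handler.
$dirty = '<p onclick="alert(1)">Click me';

$tidy = new tidy();
$tidy->parseString($dirty, $config, 'utf8');
$tidy->cleanRepair();
$clean = (string) $tidy;

// The missing </p> is added, but onclick is a legitimate attribute as far
// as Tidy is concerned, so it is left in place.
$handlerSurvived = (strpos($clean, 'onclick') !== false);

echo $clean, "\n";
```

For untrusted input you would still want a dedicated sanitising step in the filter chain alongside Tidy.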

HTML Tidy has a huge number of options available. One of my favourites is “bare”, which removes the MS Word specific attributes and styling that will ruin your output. (No matter how many times you tell your clients not to paste straight from Word into your editor, they will still do it!) I won’t even try to explain all the options here; instead, you can find a full list of the options available for Tidy on its SourceForge site.

Using Tidy with a Front Controller Plugin

OK, so you can use Tidy for filtering user input. What about using it to clean whole documents and ensure your output is always valid?

In the previous example, I set the “show-body-only” option, which forces Tidy to output only the body of the document. This is needed because Tidy would otherwise add a doctype and html, head and body tags around the user input. The option can also be set to auto, which should only return the complete document if Tidy detected a body tag in the input, but why risk the user sticking a body tag in there?
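
The difference is easy to see by repairing the same fragment twice. This is a rough sketch, assuming the tidy extension is loaded; lupi_tidy_repair() is a made-up helper name used only for this comparison.

```php
<?php
// Sketch only: compares Tidy output with and without show-body-only.
// Assumes ext/tidy is loaded; lupi_tidy_repair() is a hypothetical helper.
function lupi_tidy_repair($html, $bodyOnly)
{
    $config = array(
        'output-xhtml'   => true,
        'show-body-only' => $bodyOnly,
    );

    $tidy = new tidy();
    $tidy->parseString($html, $config, 'utf8');
    $tidy->cleanRepair();

    return (string) $tidy;
}

$input = '<p>Just a fragment';

// Full document: Tidy wraps the fragment in html, head and body tags.
$document = lupi_tidy_repair($input, false);

// Fragment: only the repaired contents of the body are returned.
$fragment = lupi_tidy_repair($input, true);

echo $document, "\n---\n", $fragment, "\n";
```

For user input headed into an existing page the fragment form is the one you want; for whole responses, as in the plugin that follows, the option is left off.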

For the next example, we have a front controller plugin that filters the HTML output of our application, so that we know we always have valid output.

<?php

class Lupi_Controller_Plugin_TidyOutput extends Zend_Controller_Plugin_Abstract
{
    /**
     * @var tidy|null
     */
    protected $_tidy;

    /**
     * @var array
     */
    protected static $_tidyConfig = array('indent'            => true,
                                          'indent-attributes' => true,
                                          'output-xhtml'      => true,
                                          'drop-proprietary-attributes' => true,
                                          'wrap'              => 120,
                                          );
    /**
     * @var string
     */
    protected static $_tidyEncoding = 'UTF8';

    public static function setConfig(array $config)
    {
        self::$_tidyConfig = $config;
    }

    public static function setEncoding($encoding)
    {
        if (!is_string($encoding)) {
            throw new InvalidArgumentException('Encoding must be a string');
        }
        self::$_tidyEncoding = $encoding;
    }

    protected function getTidy($string = null)
    {
        if (null === $this->_tidy) {
            if (null === $string) {
                $this->_tidy = new tidy();
            } else {
                $this->_tidy = tidy_parse_string($string,
                                                 self::$_tidyConfig,
                                                 self::$_tidyEncoding);
            }
        }
        return $this->_tidy;
    }

    public function dispatchLoopShutdown()
    {
        $response = $this->getResponse();
        $tidy     = $this->getTidy($response->getBody());
        $tidy->cleanRepair();
        $response->setBody((string) $tidy);
    }
}

When you are using this, it is a good idea to use full-page static caching where possible, so you're not fixing the same errors over and over again!

Instant feedback with Firebug + FirePHP for assisting development

OK, so now we have valid HTML, thanks to a filter. How does this help with actual development? Well, in short, it doesn’t, as we have no feedback about what it has actually fixed. So on to the next step: getting some nice reporting in a really handy manner. For this we will use FirePHP, so that all the information we need is sent to the console on every request. This information can even include automated accessibility testing (really handy for government-funded work, which usually has requirements on meeting accessibility standards).

So, first you need to set up your FirePHP logger. This is a simple task: add the following method to your application’s bootstrap:

protected function _initWildFire()
{
    //Don't use in production!
    if (APPLICATION_ENV != 'development') {
        return;
    }
    $this->bootstrap('db');
    $db = Zend_Db_Table::getDefaultAdapter();
    $profiler = new Zend_Db_Profiler_Firebug('All DB Queries');
    $profiler->setEnabled(true);
    $db->setProfiler($profiler);
    $writer = new Zend_Log_Writer_Firebug();
    $logger = new Zend_Log($writer);
    Zend_Registry::set('logger', $logger);
}

This sets up a logger and also does something else useful: it adds a profiler to the default database adapter, which will log all your queries for you. That is a little out of the scope of this post, but useful, so I left it in there for you.

The last three lines are the important bit. They set up the writer and log components which we will use to send messages to FirePHP, so we can see validation errors and warnings in our Firebug console! Setting the logger in the registry is also handy, as we can get our logger from anywhere without any hassle. You may also return it from the init method, and use the invoke args to get it.

Now for the really useful bit, the plugin itself.

class Lupi_Controller_Plugin_TidyOutput extends Zend_Controller_Plugin_Abstract
{
    /**
     * @var tidy|null
     */
    protected $_tidy;

    /**
     * @var array
     */
    protected static $_tidyConfig = array('indent'            => true,
                                          'indent-attributes' => true,
                                          'output-xhtml'      => true,
                                          'drop-proprietary-attributes' => true,
                                          'wrap'              => 120,
    );

    protected static $_diagnose = true;

    /**
     * @var string
     */
    protected static $_tidyEncoding = 'UTF8';

    /**
     * Switch diagnosing HTML mode
     */
    public static function setDiagnose($diagnose = true)
    {
        self::$_diagnose = (bool) $diagnose;
    }

    public static function setConfig(array $config)
    {
        self::$_tidyConfig = $config;
    }

    public static function setEncoding($encoding)
    {
        if (!is_string($encoding)) {
            throw new InvalidArgumentException('Encoding must be a string');
        }
        self::$_tidyEncoding = $encoding;
    }

    protected function getTidy($string = null)
    {
        if (null === $this->_tidy) {
            if (null === $string) {
                $this->_tidy = new tidy();
            } else {
                $this->_tidy = tidy_parse_string($string,
                                                 self::$_tidyConfig,
                                                 self::$_tidyEncoding);
            }
        }
        return $this->_tidy;
    }

    public function dispatchLoopShutdown()
    {
        $response = $this->getResponse();
        $tidy     = $this->getTidy($response->getBody());

        if ('development' === APPLICATION_ENV) {
            if (true === self::$_diagnose ) {
                $tidy->diagnose();
                $lines = array_reverse(explode("\n", $tidy->errorBuffer));
                array_shift($lines);
                foreach ($lines as $line) {
                    Zend_Registry::get('logger')->log($line, Zend_Log::INFO);
                }
            }
        }
        $tidy->cleanRepair();
        $response->setBody((string) $tidy);
    }
}

So, how does it work? Well, it’s essentially the same as the previous plugin, except that I have added a section to the dispatchLoopShutdown method and a static method to enable or disable the logging output (sometimes it can get in the way!).

The reason for splitting the diagnosis output by line and sending it over multiple log calls is that the Firebug console does not respect newline characters and instead tries to display everything on one line, making it hard to read. Splitting it over multiple entries makes things much tidier. Reversing the array also makes things a little easier to read!
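
If you want to see what the diagnosis buffer looks like before wiring it to FirePHP, the line splitting can be tried on its own. This is a sketch under a couple of assumptions: the tidy extension is loaded, the sample markup is invented, and echo stands in for the Zend_Log call. It also filters out blank lines rather than reversing and shifting the array as the plugin does.

```php
<?php
// Sketch only (assumes ext/tidy is loaded): split Tidy's diagnosis into
// one message per line, as the plugin does before logging.
$tidy = new tidy();
$tidy->parseString('<p>Hello <b>world</p>', array('output-xhtml' => true), 'utf8');
$tidy->diagnose();

// errorBuffer holds the messages separated by newlines; trim each line
// and drop the empty ones.
$lines = array_filter(
    array_map('trim', explode("\n", (string) $tidy->errorBuffer)),
    'strlen'
);

foreach ($lines as $line) {
    // Stand-in for Zend_Registry::get('logger')->log($line, Zend_Log::INFO)
    echo $line, PHP_EOL;
}
```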

In closing, here are some examples of the output for you:

An example of the Tidy output for valid xhtml

An example of the Tidy output with some errors in it

  1. #1 by Christoph Dorn on January 29, 2010 - 11:32 pm

    Nice writeup. Excellent use of FirePHP!

  2. #2 by PHP Gangsta on January 31, 2010 - 3:25 am

    Very nice code, I will use that in my current project. Thanks for that!

  3. #3 by ami on January 31, 2010 - 11:09 am

    Keeping your HTML valid with Zend Framework, Tidy and ZFDebug ?

  4. #4 by Lucas CORBEAUX on January 31, 2010 - 11:36 am

    Excellent code, but I think tidying html is a bit hazardous in a production environment.

    If we have a Javascript code which rely on an invalid attribute, tidying the code in development environment is really useful to fix it, but in production we simply break the Javascript feature.

  5. #5 by ryan on January 31, 2010 - 6:20 pm

    @Lucas Thanks for the positive comment!

    It is possible to set tidy to not strip proprietary attributes from your HTML, the list of settings available to tidy is really quite extensive!!

    It is worth mentioning though, that no javascript library worth its salt actually requires these invalid attributes, they are usually simply a shortcut to get things done quicker.

  6. #6 by ryan on January 31, 2010 - 6:22 pm

    @ami Sorry, but I have no plans to write this for ZFDebug, I really don’t see the point, FirePHP provides everything I need, without adding to the DOM. amongst other reasons, running tidy against the DOM with the ZFDebug code added to it could result in misleading results, and ZFDebug (last time I looked) is not really extensible at all.

  7. #7 by David Caunt on January 31, 2010 - 7:18 pm

    Nice post Ryan. Document validity is important when it comes to resolving layout bugs and this kind of automation is a nice time saver.

  8. #8 by arslan on February 1, 2010 - 11:52 am

    hi ryan

    in _initWildFire() i have figured out that we have to set dbtable default adapter again in order to make query profile working. Instead of getting db adapter like this $db = Zend_Db_Table::getDefaultAdapter(); we can get it using registry and can define _initDbTable function which will set default adapter for zend_Db_Table, we will always reflect changes in db table

  9. #9 by philipp on February 1, 2010 - 7:42 pm

    hi ryan!

    great idea and a nice article!
    In my case tidy tried to modify other output contexts (json, pdf, …) generated by zf, too. So this doesn’t work …
    Adding a small format-param check solves this problem in my case:
    if($this->getRequest()->getParam('format')) return;

    … but I’m sure there is a better solution!

  10. #10 by Romeo Adrian Cioaba on February 2, 2010 - 6:53 pm

    Bittarman,

    Great post :) I can’t wait to integrate it into my code :D

    Regards,
    mimir|on

  11. #11 by Jurian Sluiman on February 14, 2010 - 1:22 pm

    Hi,
    Great post. I see tidy has no support for html5, is that correct? All my html5 headers () is recognized as XHTML 1.0 Strict :/

  12. #12 by Ryan on February 18, 2010 - 8:07 pm

    Hi philipp,
    Thanks for the comment!
    You have raised a very good point. I will attach an update soon once I have mulled over a solution to automatically shut this off with for example the context switch.

  13. #13 by admin on February 18, 2010 - 8:08 pm

    Hello Jurian,

    as far as i know, this is correct. We have not yet embraced HTML5 where I work, so I have not had much chance to meddle with it yet either :(

  14. #14 by Christian on June 24, 2010 - 6:30 pm

    Simple idea with great impact. Thanks for this really great article!

    Bye
    Christian

  15. #15 by Joe Devon on November 4, 2010 - 9:19 am

  16. #16 by andrei on January 2, 2011 - 11:10 pm

    hmm. why to use FireBug instead of remote debugging in Zend Studio/Eclipse/NetBeans/Notepad++ ?

  17. #17 by Adrian on October 15, 2011 - 8:04 am

    Only just found this, was there any update re HTML 5 ?
