Back in the good old days, when HTML standards were a moving target, it didn't matter whether you closed the <p> tag correctly or kept presentation rules separate from your markup. Mismatched tags, missing attributes, poorly nested elements: the lack of widely adopted standards led to these and other mistakes, but because most browsers have built-in intelligence to work around them, most developers never noticed they existed.
Although browsers try to fix these errors themselves, that does not mean you can ignore them. For your web pages to behave consistently in all browsers, your HTML must conform to the rules and syntax defined in the W3C standards. Many tools, both online and offline, can help you meet this requirement. This article discusses one of them: the very cool HTML Tidy.
HTML Tidy is a free HTML checking tool. It is designed to check your HTML code and point out places where it does not fully comply with the standards published by the W3C. It can analyze an HTML file or a string containing HTML markup, and it can automatically make the modifications needed to bring the code into compliance with the relevant standards.
Install
HTML Tidy is free and runs on Windows, Macintosh and *NIX platforms. Binary versions are available for immediate use. If you are running a *NIX platform, you may prefer to compile and install it from source yourself. To do so, extract the source files to a temporary folder and perform a basic compile-install process, like this:
shell> cd /tmp/tidy/build/gmake
shell> make
shell> make install
When this process is complete, you should find a compiled Tidy binary at /tmp/tidy/bin/tidy. Copy this file to a system folder such as /usr/local/bin/ so it is easier to access. Now you are ready to use the tool.
Basic usage
Once the binary is installed, you can immediately start using it to check HTML code. Listing A shows a simple example:
Listing A:
shell> tidy -e -q index.html
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 2 column 1 - Warning: inserting missing 'title' element
line 4 column 1 - Warning: <body> proprietary attribute leftmargin
line 6 column 1 - Warning: <table> proprietary attribute height
line 6 column 1 - Warning: <table> lacks summary attribute
line 11 column 37 - Warning: <img> lacks alt attribute
line 15 column 1 - Warning: <table> lacks summary attribute
line 17 column 50 - Warning: <img> lacks alt attribute
In this example, Tidy found eight potential problems in the file and printed a warning for each one. Note that these are not serious errors, just warnings that some parts of the code are not quite correct.
You can automatically correct the original file by adding the -m ("modify") option to the command line:
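On a larger page the list of warnings gets long, and it can help to see which kinds of problem occur most often. The following is a minimal sketch, assuming a POSIX shell with sed, sort and uniq available. The function name summarize_tidy_report is invented for this example and is not part of Tidy, and the sample report is simply the text of Listing A.

```shell
#!/bin/sh
# summarize_tidy_report FILE
# Print each distinct warning message with the number of times it occurs,
# most frequent first. The function name is illustrative, not part of Tidy.
summarize_tidy_report() {
    # Keep only warning lines and drop the "line N column M - Warning: " prefix.
    sed -n 's/^line [0-9][0-9]* column [0-9][0-9]* - Warning: //p' "$1" |
        sort | uniq -c | sort -rn
}

# Sample report: the eight warnings from Listing A.
report=$(mktemp)
cat > "$report" <<'EOF'
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 2 column 1 - Warning: inserting missing 'title' element
line 4 column 1 - Warning: <body> proprietary attribute leftmargin
line 6 column 1 - Warning: <table> proprietary attribute height
line 6 column 1 - Warning: <table> lacks summary attribute
line 11 column 37 - Warning: <img> lacks alt attribute
line 15 column 1 - Warning: <table> lacks summary attribute
line 17 column 50 - Warning: <img> lacks alt attribute
EOF

summarize_tidy_report "$report"
```

With a real page, the same function could be pointed at a log file written with Tidy's -f option instead of the sample report.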
shell> tidy -m -q index.html
If you need to clean up a large website, you can use wildcards on the command line to process all the files in a folder (instead of just one):
shell> tidy -m -q *.html
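A wildcard such as *.html only matches files in a single folder. For a site with nested folders, one possible approach is to walk the tree with find and use Tidy's exit status (0 for success, 1 for warnings, 2 for errors) to decide which files to report. The sketch below assumes a POSIX shell. The names check_html_tree and TIDY_BIN are invented for this example and are not Tidy features. It only reports problems and does not modify any file.

```shell
#!/bin/sh
# check_html_tree DIR
# Run Tidy in report-only mode (-e -q) on every .html file under DIR and
# list the files that need attention. Both the function name and the
# TIDY_BIN override are hypothetical helpers, not part of Tidy itself.
check_html_tree() {
    find "$1" -type f -name '*.html' | while IFS= read -r file; do
        # Redirect stdin so the checker cannot consume the list of file names.
        "${TIDY_BIN:-tidy}" -e -q "$file" </dev/null >/dev/null 2>&1
        # Tidy exit status: 0 = clean, 1 = warnings, 2 = errors.
        case $? in
            0) ;;
            1) printf 'warnings: %s\n' "$file" ;;
            2) printf 'errors: %s\n' "$file" ;;
            *) printf 'could not run Tidy on: %s\n' "$file" ;;
        esac
    done
}

# Example call (left commented out so the sketch has no side effects):
# check_html_tree /var/www/site
```

Once the report looks reasonable, the files it lists can be corrected one at a time with the -m option described above.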
If you want Tidy to write the corrected page to a new file (rather than overwriting the original), use the -output option with a new file name, as in the following example:
shell> tidy -output index.html.new -q index.html
You can send all errors and warnings to a separate log file for later review with the -f ("file") option:
shell> tidy -f error.log index.html
Also note that if your HTML contains embedded PHP, ASP or JSP code, Tidy simply ignores it and leaves it in place. This means you can even run Tidy on server-side scripts to check their HTML portions. Here is an example:
shell> tidy -e -q processor.php
You can also run Tidy interactively by calling the program without any arguments. In this mode, Tidy waits for input from the console and checks it for errors. Listing B shows an example:
Listing B:
shell> tidy
<html>
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
<head>
<title>This is a test
</head>
line 3 column 1 - Warning: missing </title> before </head>
<body leftmargin=0>
<p>
This is a badly terminated paragraph
</body>
</html>
line 5 column 1 - Warning: <body> proprietary attribute leftmargin
Info: Document content looks like HTML Proprietary
3 warnings, 0 errors were found!
Note that in addition to warning you about errors, Tidy also prints the corrected version of the code at the end of the input:
<html>
<head>
<meta name="generator" content=
"HTML Tidy for Linux/x86 (vers 1 September 2005), see www.w3.org">
<title>This is a test</title>
</head>
<body leftmargin="0">
<p>This is a badly terminated paragraph</p>
</body>
</html>
Advanced applications
You can also control how Tidy modifies a file by passing specific options on the command line. For example, to have Tidy re-indent the code, add the -i ("indent") option:
shell> tidy -output new.html -i index.html
To replace <font> and other presentational elements with CSS style rules, use the -c ("clean") option:
shell> tidy -output new.html -c index.html
By default, Tidy writes all tags and attributes in HTML files in lowercase. If you want uppercase tags, add the -u ("upper") option, as shown in the following example:
shell> tidy -output new.html -c -u index.html
To wrap text at a specific line width, add the -w ("wrap") option followed by the width in columns, as shown in the following example:
shell> tidy -output new.html -w 40 index.html
You can convert an HTML document to a well-formed XHTML document by adding the -asxhtml option:
shell> tidy -output new.html -asxhtml index.html
The reverse operation is possible via the -ashtml option:
shell> tidy -output new.html -ashtml index.html
If you need to make extensive adjustments to Tidy's default options, it's best to put them in a separate configuration file that you can reference every time you call the program. Listing C shows an example of a configuration file:
Listing C:
bare: yes # remove proprietary HTML
doctype: auto # set the doctype
drop-empty-paras: yes # automatically delete empty <p> tags
fix-backslash: yes # replace \ by / in URLs
literal-attributes: yes # retain whitespace in attribute values
lower-literals: yes # convert attribute values to lower case
output-xhtml: yes # produce valid XHTML output
quote-ampersand: yes # replace & with &amp;
quote-marks: yes # replace " with &quot;
repeated-attributes: keep-last # use the last of duplicated attributes
indent: yes # automatically indent code
indent-spaces: 2 # number of spaces to indent by
wrap-php: no # wrap text contained in PHP tags
char-encoding: ascii # character encoding to use
tidy-mark: no # omit Tidy meta information in corrected code
When tidying a file, you can tell Tidy to use these settings by adding the -config option to the command line:
shell> tidy -output a.html -config config.tidy index.html
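If Tidy is called from a script, it can be convenient to create the configuration file in the same script so that the settings and the command stay together. The following is a rough sketch under that assumption, using four of the options from Listing C. The helper name write_tidy_config and the file names index.html and index.html.new are placeholders, and Tidy is only run if both the binary and the input file are present.

```shell
#!/bin/sh
# write_tidy_config FILE
# Write a small Tidy configuration to FILE. The helper name is illustrative;
# the options themselves are taken from Listing C.
write_tidy_config() {
    cat > "$1" <<'EOF'
output-xhtml: yes
indent: yes
indent-spaces: 2
tidy-mark: no
EOF
}

config=$(mktemp)
write_tidy_config "$config"

# Run Tidy with that configuration only if the binary and the input exist.
if command -v tidy >/dev/null 2>&1 && [ -f index.html ]; then
    tidy -q -config "$config" -output index.html.new index.html ||
        printf 'Tidy reported problems (exit status %s)\n' "$?"
fi
```

Because the corrected page goes to a new file, the original index.html is left untouched and the two versions can be compared before anything is replaced.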
You can get a list of configuration options with the -help-config option:
shell> tidy -help-config
...
quote-ampersand      Boolean  y/n, yes/no, t/f, true/false, 1/0
quote-marks          Boolean  y/n, yes/no, t/f, true/false, 1/0
quote-nbsp           Boolean  y/n, yes/no, t/f, true/false, 1/0
repeated-attributes  enum     keep-first, keep-last
replace-color        Boolean  y/n, yes/no, t/f, true/false, 1/0
show-body-only       Boolean  y/n, yes/no, t/f, true/false, 1/0
...
Or use the -show-config option to view a snapshot of the current configuration settings:
shell> tidy -show-config
...
show-body-only  Boolean  no
show-errors     Integer  6
show-warnings   Boolean  yes
slide-style     String
split           Boolean  no
...
Finally, you can use the -h option to get help from the command line:
shell> tidy -h
That's all for now. Hopefully you'll find Tidy a valuable tool for making your site fully compliant with W3C standards. The tips in this guide should give you an idea of how to control the way HTML Tidy processes your code, and help you use the tool more efficiently.