A thorough explanation of the relationship between web standards and SEO

Author：Eve Cole Update Time：2011-04-06 15:01:19

Two years ago, I started learning SEO. In my enthusiasm at the time, I visited all sorts of domestic forums and blogs and tried every method I came across, without knowing whether it was useful or useless. But as time went by and I studied more deeply, I grew suspicious of the various methods circulating on the Internet. Once the well-known methods (building links, writing advertorials, stuffing keywords, and so on) were exhausted, I was at the end of my rope: I still couldn't beat my competitors in rankings and traffic, and I was worn out. I had to reflect on deeper and more effective ways of doing SEO. After countless twists and turns, I returned to my old field of "programming and front-end development". Almost overnight it became clear: isn't what I am doing now the best SEO?

To be honest, my study has been rather insular. I have not reached the state of "the best SEO is no SEO", nor do I have much practical SEO experience. What I often think about is how to integrate my current job better into SEO. If I were to define SEO now, it would be: network + hardware + program + site structure + web standards + content + people. Many people on the Internet discuss the concept of "content is king" but ignore many other factors. Explaining all of these factors in detail would probably fill a very thick book. This article just wants to share the impact of web standards on SEO.

Text begins:

To understand the relationship between web standards and SEO, you must first understand what "web standards" are. I guess you have read plenty of explanations on the Internet and still feel confused. I don't want to copy a paragraph from the Internet for you, only for you to still not understand it in the end. To understand web standards, you have to start by building a basic web page:

For example: to write even the simplest web page, I must use HTML tags. If I want to emphasize text, I use the <strong> tag. If I want to change the text color, I add a <font color="color"> tag. If I want to start a new paragraph, I use the <p> tag. I can't use a made-up tag such as <jacu> to emphasize text, because no such tag exists and the browser cannot interpret it. So the W3C (the World Wide Web Consortium, a standards organization) stood up and said to Internet practitioners around the world: "Let's hear everyone's opinions and unify these tags: which ones can be used and which cannot. Then let's give each tag a unified, reasonable definition, so that everyone understands what it is for." After countless discussions, the first HTML standard was finally introduced. Subsequent revisions brought more web standards, from HTML 2.0 through HTML 4.01, to the XHTML 1.0/1.1 most commonly used in everyone's web pages, and XHTML 2.0, which has not yet been officially released. Each standard update is backward compatible. When we make web pages, there is usually a line like this at the top of the page:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

This declares your document type: the page is to be interpreted according to the XHTML 1.0 (Transitional) standard.

But later on, web page layouts became more and more complex. It was impossible to create a beautiful page by relying on HTML tags alone; other tools were needed. For example, offsetting a picture by 20px or spacing text by 5px is really difficult to achieve with HTML alone. At this point the W3C couldn't sit still anymore, so it stood up and called: "Let's define something more to achieve this." After countless discussions, the CSS 1 standard was released. With it, you can easily achieve offsets, spacing, and other effects. It later developed into CSS 2 and CSS 3. Everyone must follow this standard when defining styles with CSS.

Later, people discovered that relying solely on HTML and CSS was still not perfect: it lacked human-computer interaction and could not achieve dynamic effects. It would be even better if we could make things on the page move. So two more standards came into play: the ECMAScript standard (published by Ecma International), which specifies the scripting language's syntax, and the W3C's Document Object Model (DOM), which specifies the interface scripts use to manipulate the page. The commonly used JavaScript, for example, conforms to the ECMAScript standard.

OK, now everything seems perfect. With the HTML standards, the CSS standards, and the ECMAScript standard, we can finally make beautiful web pages. Taken together, these standards form the web standards. So what kind of web page conforms to web standards?

For example, suppose a piece of HTML is written like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title>demo</title>
</head>
<body>
<p><font color="#ff0000">Text content</font></p>

<img src="x.jpg" />
<dl>
<dt><h1>Title</h1></dt>
<dd>Content</dd>
<dd>Content</dd>
</dl>
<b>Content</b>
</body>
</html>

So does this code comply with web standards? Let's analyze it. In the first line, you declare that your document type is XHTML 1.0, which means all of your HTML tags must be written in compliance with that standard. In the first <p> tag in the body, the font tag and its color attribute are deprecated in this standard (the Transitional DTD still tolerates them, the Strict DTD does not), so this paragraph falls short of the standard. Next, look at the <img> tag: it has a src attribute, but the alt attribute is missing. In the XHTML 1.0 standard, img must define the alt attribute, so this code does not comply with the 1.0 standard. Then look at the dl tag: dt defines the title, and an <h1> tag is nested inside it. XHTML 1.0 does not allow <h1> to be nested in <dt>, so this does not meet the 1.0 standard either. Finally, look at the last <b> tag. Thank God, this tag finally complies with web standards. But the W3C's position is, roughly: we retain this tag for now, but we recommend the <strong> tag, which is more semantic, and a later standard may drop <b> as a standard tag. For the constraints of each HTML standard, please check the corresponding documents.
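The checks walked through above can be sketched in code. The following is a minimal Python sketch, not a real validator: it uses the standard-library `html.parser` module, and the names `MiniChecker` and `check`, together with the small rule sets, are assumptions made for illustration. It flags only three of the problems discussed: a deprecated tag, an <img> without alt, and a heading nested inside <dt>.

```python
from html.parser import HTMLParser

# Assumption: a tiny hand-picked subset of XHTML 1.0 rules, for illustration only.
DEPRECATED_TAGS = {"font", "center", "u", "s", "strike"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class MiniChecker(HTMLParser):
    """Collects a few of the problems discussed above (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tags
        self.problems = []   # human-readable findings

    def _check(self, tag, attrs):
        names = {name for name, _ in attrs}
        if tag in DEPRECATED_TAGS:
            self.problems.append(f"<{tag}> is deprecated")
        if tag == "img" and "alt" not in names:
            self.problems.append("<img> is missing alt")
        if tag in HEADINGS and self.open_tags and self.open_tags[-1] == "dt":
            self.problems.append(f"<{tag}> is nested inside <dt>")

    def handle_starttag(self, tag, attrs):
        self._check(tag, attrs)
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img ... /> are checked but never stay open.
        self._check(tag, attrs)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Pop back to the matching open tag.
            while self.open_tags.pop() != tag:
                pass


def check(html):
    """Return the list of problems found in an HTML fragment."""
    checker = MiniChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


SAMPLE = """<p><font color="#ff0000">Text content</font></p>
<img src="x.jpg" />
<dl>
<dt><h1>Title</h1></dt>
<dd>Content</dd>
</dl>
<b>Content</b>"""

for problem in check(SAMPLE):
    print(problem)
```

On the sample, the sketch reports the <font> tag, the missing alt, and the <h1> inside <dt>, while <b> passes. A real validator, such as the W3C's, checks the page against the full DTD rather than a handful of rules.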

Speaking of which, I think everyone understands now. This page does not even comply with the XHTML 1.0 standard, so it certainly does not comply with web standards. Whether a page conforms to web standards depends entirely on the version you declare. Yet this code is parsed normally in the browser because, as mentioned above, standards are backward compatible; the code simply does not conform to the standard you have declared. So how do I make this code conform? There are only two ways: 1. declare a looser document type (which may cause more trouble); 2. revise the code, for example by moving the color into the style attribute and adding an alt attribute to img. Of the two, we would rather choose the second.

One explanation circulating on the Internet says: web standards = div + css, and table layout cannot be used. After reading the above, it is not difficult to see that this notion confuses concepts and overgeneralizes. It cannot be said that pages laid out with tables do not comply with web standards: the W3C has never said that table layout violates the standards, and <table> has been a standard tag in every version. Although we all use divs for layout, we need to understand that a recommended practice is not the same thing as a standard.

As mentioned earlier, whether a page conforms to web standards depends on the version we declare when writing HTML/CSS/JS. For example, if my page declares XHTML 1.0, then my HTML should comply with XHTML 1.0. But that does not seem to be what happens. By my estimate, almost all web pages on the Internet (99.999%, I would guess) fail validation with errors of one kind or another. All pages on the W3C's official website, http://www.w3.org, do pass validation; interested readers can test this. At this point our article seems to have reached a dead end: if so many pages do not comply with web standards and still achieve good rankings and traffic, what is the connection between web standards and SEO? We have to start with HTML structure and parsing.

Web design emphasizes the separation of structure (HTML) and presentation (CSS). We can understand the two this way: the structure is a house, a frame made of reinforced concrete and bricks, and the presentation is the decoration of that structure, like laying floors and plastering and painting the walls. Without structure, presentation has nothing to present. This is why tags and attributes such as <font color="#ccc" size="12">text</font> are deprecated: relative to the structure they are presentation, and they belong in the presentation layer, that is, in CSS. If we use the font tag on an XHTML 1.0 Strict page, it will in fact still be parsed correctly because, as we said in the first part, the standards are backward compatible.

Now let's look at how browsers and search engines parse our HTML. Why talk about browsers here? Because in my opinion, search engines and browsers parse HTML in roughly the same way. After the page has been fetched, HTML parsing begins, and the whole page is eventually parsed into a DOM tree whose nodes have strict parent-child relationships; a browser then renders that tree for the user. For example, suppose I write the following code:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>title</title>
</head>
<body>
<div id="top">
<h1>This is the title<h1>
<img src="xx.jpg"/>
<p>This is a <strong>text</strong><p>
</div>
<div id="container">
<h2>This is another title</h2>
<p>This is another <strong>text</strong></p>
</div>
</body>
</html>

You can see that this is a piece of HTML under the XHTML 1.0 Transitional standard, and it contains several errors: the <h1> tag in the first div has no closing tag, the img has no alt attribute, and the <p> tag has no closing tag either. But if you open this code in a browser, it displays correctly: the <h1> tag works, the <p> tag works, and the picture is shown. It is surprising that code whose tags are not even correct is parsed properly by the browser. If the code contained no errors, the correct DOM structure would be as shown below (Figure 1).

[Figures: DOM tree diagrams]

Why can the browser parse incorrect code correctly, and even seem to "guess" the true intention behind it? The principle is that the browser uses a dictionary-matching mode and a tidy mode (as in HTML Tidy) when building the tag tree. Simply put, the browser matches every tag and attribute against a built-in dictionary. If the match succeeds, the tag is parsed directly; if it does not, tidy mode is enabled, which analyzes the erroneous code and repairs it. For example, the trailing <h1> and <p> tags in the code above are in effect treated as closing tags. If you write a tag pair such as <jiacu>text</jiacu>, it cannot be matched at all and cannot be repaired, so the invalid tag pair is discarded and only the text inside is kept. Of course, when the browser parses HTML into a DOM tree it does not change your HTML source; parsing is only an action. This is why, if we never validate our pages, we often do not notice these errors: the browser has already fixed them for us. Generally speaking, browsers are as tolerant of HTML errors as possible: what can be corrected is corrected, redundant tags or attributes that can be cleared are cleared, and tags that can be neither corrected nor cleared are removed to keep the page displaying normally.
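The dictionary matching described above can be illustrated with a short Python sketch. It follows the simplified model in this article, not the actual algorithm of any browser. It uses the standard-library `html.parser` module; `KNOWN_TAGS`, `UnknownTagStripper`, and `strip_unknown` are hypothetical names, and the dictionary is deliberately tiny.

```python
from html.parser import HTMLParser

# Assumption: a tiny stand-in for the browser's built-in tag dictionary.
KNOWN_TAGS = {"html", "head", "title", "body", "div", "p",
              "h1", "h2", "strong", "img"}


class UnknownTagStripper(HTMLParser):
    """Re-emits markup, dropping tags missing from KNOWN_TAGS but keeping their text."""

    def __init__(self):
        super().__init__()
        self.out = []  # pieces of the cleaned-up markup

    def handle_starttag(self, tag, attrs):
        if tag in KNOWN_TAGS:
            # Re-emit the start tag exactly as it was written.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags are emitted once, with no separate end tag.
        if tag in KNOWN_TAGS:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in KNOWN_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always kept, whether or not its tag was recognized.
        self.out.append(data)


def strip_unknown(html):
    """Return the markup with unrecognized tag pairs removed."""
    parser = UnknownTagStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(strip_unknown("<p>This is <jiacu>bold</jiacu> text</p>"))
```

The source string passed in is never modified; only the re-emitted copy differs, which mirrors the point that parsing does not change your HTML source.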

However, tidy mode is not omnipotent, and we cannot expect the browser to fix every error for us. As our pages nest deeper and deeper, with more and more tags and content, there are times when the browser cannot correct the tags, and the only thing it can do is "remove all tags within the error block and keep only the content".

From a search engine's perspective, the prerequisite for analyzing content is the same as for a browser: it must build a complete DOM tree. Only when this tree is complete can the search engine determine the contextual relationships within the page, which weighted tags (such as <strong> and <h1>) the page uses, where they are placed, and so on. However, when parsing, search engines place more emphasis on the concept of the "content block", that is, roughly one block per tag. Take the HTML example above again. While building the DOM tree, the search engine finds an error when it parses the <h1> tag in the first div, and another when it parses the <p> tag. To build the DOM tree correctly it enables tidy mode, but this time the mode may not repair the errors for you; it may work in units of "blocks" instead. It looks for the parent block (node) of the erroneous block (node), and if the parent also contains an error it keeps moving up. Once it reaches a parent block without an error, all erroneous tags in that block's child blocks and their descendants are removed; in other words, all erroneous tags inside <div id="top"> are removed. The DOM tree finally constructed is as shown in Figure 2 above (2011.4.5 revision: there is a small mistake in Figure 2; there is an img tag under the div tag on the left).
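How search engines actually handle malformed markup is not public, so the following Python sketch only illustrates the block idea hypothesized above: it reports which id-carrying blocks contain unclosed tags, and would therefore be the ones affected. `BlockErrorFinder` and `find_bad_blocks` are hypothetical names, and a simple tag stack over the standard-library `html.parser` module is an assumed stand-in for a real DOM builder.

```python
from html.parser import HTMLParser

VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class BlockErrorFinder(HTMLParser):
    """Reports ids of blocks that contain unclosed tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.stack = []       # open tags as (tag, id) pairs
        self.bad_blocks = []  # ids of blocks found to contain tag errors

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs).get("id")))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags cannot be left unclosed, so they are ignored here.
        pass

    def handle_endtag(self, tag):
        if tag not in [name for name, _ in self.stack]:
            return  # stray closing tag; ignored in this sketch
        skipped = 0
        while True:
            name, block_id = self.stack.pop()
            if name == tag:
                break
            skipped += 1  # this tag was opened inside the block but never closed
        if skipped:
            # Blame the nearest enclosing block that carries an id.
            owner = block_id or next(
                (bid for _, bid in reversed(self.stack) if bid), None)
            if owner and owner not in self.bad_blocks:
                self.bad_blocks.append(owner)


def find_bad_blocks(html):
    """Return the ids of blocks that contain unclosed tags."""
    finder = BlockErrorFinder()
    finder.feed(html)
    finder.close()
    return finder.bad_blocks


PAGE = """<div id="top">
<h1>This is the title<h1>
<img src="xx.jpg"/>
<p>This is a <strong>text</strong><p>
</div>
<div id="container">
<h2>This is another title</h2>
<p>This is another <strong>text</strong></p>
</div>"""

print(find_bad_blocks(PAGE))
```

For the example page, only the "top" block is reported; the "container" block is well formed and would keep its <h2> and <strong> tags under this model.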

In this way, we see that the <h1> and <strong> tags we carefully wrote have disappeared after parsing, and the "weight" of the entire block has shifted. From this principle of HTML parsing, we can easily draw some conclusions:

1. As the page gains more and more node levels, we must be particularly careful about tag nesting errors, such as leaving out a closing tag. The closer an error is to the top node, the more careful we must be. Such a mistake may be fatal for SEO.

2. No matter what layout you use, the fewer levels of node nesting, the better. First, it reduces the burden on search engines when parsing nodes. Second, it makes it easier for search engines to determine the (contextual) relationship between nodes, which matters for the weighting of keywords.

3. When an attribute of a tag can be replaced by CSS, move it to CSS whenever possible.

4. Both browsers and search engines tolerate HTML errors, but in my view, under the same external conditions, standards-compliant HTML is clearly more likely to obtain better rankings.
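As a rough way to keep an eye on conclusion 2, the deepest nesting level of a page can be measured. The Python sketch below assumes well-formed markup and uses the standard-library `html.parser` module; `DepthMeter` and `max_nesting_depth` are hypothetical names, and the article does not say what depth counts as too deep.

```python
from html.parser import HTMLParser

VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class DepthMeter(HTMLParser):
    """Tracks the deepest level of tag nesting (assumes well-formed markup)."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting level of the tag currently open
        self.max_depth = 0  # deepest level seen so far

    def _leaf(self):
        # A childless tag sits one level below the current depth.
        self.max_depth = max(self.max_depth, self.depth + 1)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            self._leaf()
            return
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def handle_startendtag(self, tag, attrs):
        self._leaf()

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def max_nesting_depth(html):
    """Return the deepest nesting level found in an HTML fragment."""
    meter = DepthMeter()
    meter.feed(html)
    meter.close()
    return meter.max_depth


print(max_nesting_depth("<div><p><strong>x</strong></p></div>"))
```

Comparing the number before and after a layout change shows whether the change added or removed nesting levels.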

It took me nearly four hours to write this article. Some parts are not covered thoroughly; I will address them in the third article.

Article source: Lightyear Forum (please indicate the source link and author when reprinting)

Author of the article: newyhj