Following the recent revelation that Qihoo 360's Comprehensive Search ignored the internationally accepted Robots protocol and crawled content from search engines such as Baidu and Google, leaking intranet information that many websites had barred search engines from crawling for security and privacy reasons, senior Internet observer Hong Bo pointed out that anyone doing search must abide by the generally recognized rules of the game in the search industry. Ignoring and wantonly violating those rules is the real unfair competition. If such behavior is not stopped in time by law and government regulation, it will cause chaos in the industry.
A search engine works by using a crawler (spider) program to automatically collect web pages on the Internet and extract the relevant information. For reasons of network security and privacy, each website sets up its own Robots protocol (a robots.txt file) to tell search engines clearly which content it is willing to have indexed and which content it is not. A search engine then crawls only within the permissions granted to it by that Robots protocol. Following the Robots protocol has become an international convention that all search engines are expected to observe. It is like visiting someone else's home: you knock first and get permission before entering the living room, and unless the owner invites you further, you do not walk into the inner rooms or wander around the house.
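The check a compliant crawler performs before fetching a page can be sketched in a few lines of Python with the standard library's `urllib.robotparser`. This is only an illustrative sketch: the robots.txt contents, the `ExampleBot` user agent, and the `example.com` URLs are assumptions made up for the example, not rules taken from any website mentioned here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site (assumed paths).
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
# A public page is allowed; a path under /admin/ is not.
print(may_crawl(parser, "ExampleBot", "https://example.com/index.html"))
print(may_crawl(parser, "ExampleBot", "https://example.com/admin/users"))
```

Nothing in the file itself enforces these rules. They take effect only if the crawler runs a check like this one and honors the result.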
So when 360 Comprehensive Search, launched just two weeks ago, ignored the Robots protocol and directly captured information and data it was not authorized to crawl, its approach was widely questioned by industry insiders.
It is understood that the Robots protocol of Baidu's websites does not authorize the 360 search crawler, yet 360 search ignored this setting and crawled without authorization. The content that many source websites forbid search engines to crawl mostly involves backend databases, user privacy, passwords and other information stored on the server. By ignoring the settings in a source website's robots.txt file, 360 can therefore cause private information on the server that should never be searchable to be indexed, or even displayed directly in search results.
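A robots.txt file can also grant access per crawler, which is the kind of setting described above. The sketch below is a hypothetical allowlist, not Baidu's actual file: the agent names `AllowedBot` and `UnlistedBot` and the URL are invented for illustration. It uses Python's standard `urllib.robotparser` to show that a crawler not named in the file falls under the catch-all rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical allowlist: one named crawler is admitted, all others are refused.
ALLOWLIST_ROBOTS_TXT = """\
User-agent: AllowedBot
Allow: /

User-agent: *
Disallow: /
"""


def allowed_agents(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Map each user agent to whether the rules allow it to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    ALLOWLIST_ROBOTS_TXT,
    ["AllowedBot", "UnlistedBot"],
    "https://example.com/page",
)
print(result)
```

Under these assumed rules, the unlisted crawler is refused for every path on the site, while the named crawler may fetch anything.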
Accused of violating the Robots protocol, Zhou Hongyi could no longer deny it, because the facts were plain, but he countered that Baidu's ban on the 360 crawler in its Robots protocol was itself unfair competition. Hong Bo responded that the Robots protocol gives websites the right to ban any search crawler, and that this has nothing to do with unfair competition. It is 360's disregard for the industry's established rules that is the real unfair competition.
"When doing search, you must abide by the generally recognized rules of the game in the search industry. Ignoring the rules and wantonly violating the rules is the real unfair competition." In Hong Bo's view, Baidu does not prohibit all crawlers from crawling the content of its Q&A, Zhidao, and Tieba services. It bans only irregular crawlers that pose potential security risks, which is a reasonable measure to protect market order and user privacy. He pointed out that Taobao also banned Baidu's crawler in 2008, and Baidu strictly abided by the Robots protocol and stopped crawling Taobao content. It did not violate the protocol on the pretext that Taobao was competing unfairly.
360 has always boasted that it approaches search in an innovative way. Hong Bo summed up his opinion in one sentence: "How can a search engine that doesn't even follow the basic rules of the game have the nerve to label itself 'innovative'? Maybe in Zhou Hongyi's dictionary, ignoring rules equals innovation." Hong Bo said that if such behavior is not stopped by law and government regulation in time, then what 360 illegally grabs today is Baidu's content, and tomorrow it could grab private content such as websites' community information at will. Other websites and search engines could follow suit, and Yitao, which is blocked by JD.com, could likewise capture its competitors' product information. By the same logic, the entire Internet industry would fall into chaos.