Last week I wrote an article on keeping your website synchronized with Sina's news data. Some readers were interested, so I decided to share the pseudo-original system mentioned in it and explain how it works. This system is also covered in my Sisyphus Workshop.
After all, a search engine is still a machine. Changing the title, replacing some words, shuffling some paragraphs, inserting some links and so on is enough to make an article look original to it. Similar pseudo-original tools already exist on the Internet, but they still require manual operation to generate the output, so I wanted to build a fully automatic, unattended pseudo-original system. Combined with an automatic collection program, this gives a collection -> storage -> pseudo-original pipeline in which the whole process runs unattended and in real time.
Back to the point: a good way to change words without affecting the meaning of the article is to replace them with synonyms. So the first step was to build a thesaurus. I searched the Internet for such a database without success, so I decided to collect one from a relevant website. I found that Kingsoft PowerWord met my requirements very well, and by collecting from it I built a vocabulary library with tens of thousands of entries.
Next comes the replacement itself: how should words be replaced, and which ones? My idea is to first segment the article into words, then look up the ones that are at least two Chinese characters long in the thesaurus and replace each one that is found. I implemented this in Python. In addition, to speed up synonym lookups, you can use key-value storage. Some of the key code is as follows:
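# Excerpt: cxn is assumed to be a MySQLdb-style database cursor and seg a
# Chinese word segmenter whose cut() returns a list; both are set up elsewhere.

def getnewword(text, mapping):
    # Look up the word's id in the thesaurus table.
    cxn.execute("select id from tool_words where name=%s limit 1", (text,))
    result = cxn.fetchone()
    if result is not None:
        # Pick one of its synonyms at random.
        cxn.execute("select name from tool_wordslike where wid=%s order by rand() limit 1", (result[0],))
        result4 = cxn.fetchone()
        if result4 is not None:
            mapping[text] = result4[0]

def cuttest(text, flag):
    mapping = {}
    wlist = seg.cut(text)
    wlist.reverse()
    result = ""
    for tmp in wlist:
        # Skip single-character tokens.
        if len(tmp) > 1:
            if flag == 1:
                getnewword(tmp, mapping)
            else:
                result += tmp + ";"
    if flag == 1:
        # Return the replacements as "word,synonym;" pairs.
        for k, v in mapping.items():
            result += k + "," + v + ";"
    return result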
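As a rough sketch of the key-value idea mentioned earlier, the thesaurus can be loaded once into an in-memory dictionary so that each lookup no longer needs a database query. Everything below is illustrative rather than the code from my system: the dictionary SYNONYMS, the function replace_words, and the sample entries are made up, and the text is assumed to be already segmented into a list of words.

```python
import random

# Hypothetical in-memory thesaurus: word -> list of synonyms.
# In practice this would be loaded once from the thesaurus tables.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "article": ["post"],
}


def replace_words(words, synonyms, rng=random):
    """Return a new list in which every word found in `synonyms`
    is swapped for one of its synonyms, chosen at random."""
    result = []
    for word in words:
        candidates = synonyms.get(word)
        # Skip single-character tokens and words without synonyms.
        if len(word) > 1 and candidates:
            result.append(rng.choice(candidates))
        else:
            result.append(word)
    return result


if __name__ == "__main__":
    print(replace_words(["a", "quick", "article"], SYNONYMS))
```

A plain dictionary is the simplest form of key-value storage; an external key-value store would follow the same lookup pattern, with the word as the key and its synonyms as the value.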
But after all, the pseudo-original system is only a program. It certainly cannot fully guarantee that the meaning stays appropriate or that the sentences read smoothly. It is mainly meant for those who specialize in running junk sites. Haha, I remember one article on my website that turned out quite funny after conversion: http://www.xxfsw.com/show24047.html , about the passing of Russian academician Ginzburg, winner of the Nobel Prize in Physics. After conversion, the wording for his passing had been replaced with a blunter synonym for death. I was speechless. Of course, besides synonym replacement there are also paragraph reversal, link insertion and so on. These are relatively easy to implement, so I won't go into details; choose according to your actual situation. Later I also thought of a way to show the pseudo-original content only to search engines while serving users the content as it was before conversion. This achieves the goal without affecting the user experience. However, I don't know how risky this is or whether Baidu would catch it in a manual review.
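For the paragraph reversal and link insertion mentioned above, a minimal sketch might look like this. It assumes paragraphs are separated by blank lines and that a link is inserted as a plain HTML anchor in its own paragraph; the function names reverse_paragraphs and insert_link and the example URL are hypothetical.

```python
def reverse_paragraphs(text):
    """Reverse the order of paragraphs (assumed to be separated by blank lines)."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return "\n\n".join(reversed(paragraphs))


def insert_link(text, url, label, position=1):
    """Insert an HTML link as its own paragraph after `position` paragraphs."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    link = '<a href="%s">%s</a>' % (url, label)
    # Clamp the position so it always falls inside the article.
    position = max(0, min(position, len(paragraphs)))
    paragraphs.insert(position, link)
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    article = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
    print(insert_link(reverse_paragraphs(article), "http://example.com/", "related"))
```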
And so, after all this effort, Baidu Spider comes to your site and is startled: Oops, I haven't seen this article before! Indexed.