In response to user feedback, we have improved this product and released a free "Personal Edition" so that you can try it out more easily.
V1.1 adds search rankings, search codes, index directory management, web page weight settings, and other functions that improve retrieval quality and search speed.
System introduction
K-PageSearch is a web search engine independently developed by Kwindsoft in 2007 and designed specifically for industry and specialized information retrieval. Main functional features: web spider, directional collection, text extraction, Chinese word segmentation, full-text index, relevance ranking, web page snapshot, related search, and bidding ranking. The backend database is Microsoft SQL Server, and the static search system design uses XML data islands to cache search results, which improves system stability and performance, saves server resources, and reduces system load.
Web spider
The K-wind spider component includes three major functional modules: link collection, web page analysis, and invalid web page scanning;
Automatically identifies web page encodings such as GB2312, BIG5, UTF-8, and Unicode;
File type verification prevents the collection of non-text files;
Collects dynamic web pages such as ASP, PHP, and JSP as well as static web pages such as HTML, SHTML, and XHTML;
Supports resuming an interrupted collection. If a collection is terminated by a system, network, or other failure, the system asks whether to "continue collection" or "end the task" the next time collection starts;
Collection task management lets you set up and schedule multiple collection tasks, which are run in sequence.
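The file type verification and encoding identification described above can be illustrated with a minimal sketch. It is not the K-wind spider's actual code: the function names, the list of candidate encodings, and the try-in-order strategy are all assumptions made for illustration.

```python
from typing import Optional, Tuple

# Assumed candidate order; the real component's detection logic is not documented.
CANDIDATE_ENCODINGS = ("utf-8", "gb2312", "big5")


def is_text_content_type(content_type: str) -> bool:
    """Return True if an HTTP Content-Type value describes a text document."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("text/") or media_type == "application/xhtml+xml"


def decode_page(raw: bytes) -> Optional[Tuple[str, str]]:
    """Try each candidate encoding in turn; return (encoding, text) or None."""
    # A UTF-16 byte order mark identifies a "Unicode" page directly.
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16", raw.decode("utf-16")
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return encoding, raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return None
```

Trying encodings in order is only a heuristic: GB2312 and BIG5 share byte ranges, so a BIG5 page can decode without error as GB2312. A production detector would likely also consult the charset declared in the HTTP header or the page's meta tag.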
Directional collection
Directional collection restricts the spider to specified web pages and to pages carrying specialized information. It is a key technique that vertical search engines use to improve content quality and relevance.
Link include keywords: keywords that the link must contain; for example: download|mp3|soft; use "|" to separate multiple keywords;
Link exclude keywords: keywords that the link must not contain; for example: download|mp3|soft; use "|" to separate multiple keywords;
Web page include keywords: keywords that the web page must contain; for example: K-wind|web page|search; use "|" to separate multiple keywords;
Web page exclude keywords: keywords that the web page must not contain; for example: K-wind|web page|search; use "|" to separate multiple keywords.
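As a rough illustration of how the four keyword settings could be applied, the sketch below parses a "|"-separated setting and tests a link or page text against it. The function names are hypothetical, and the matching rule is an assumption: a text passes if it contains at least one include keyword (when any are set) and none of the exclude keywords. The product may apply a different rule.

```python
from typing import List


def parse_keywords(spec: str) -> List[str]:
    """Split a "|"-separated keyword setting into a list of non-empty keywords."""
    return [part.strip() for part in spec.split("|") if part.strip()]


def passes_filter(text: str, include_spec: str = "", exclude_spec: str = "") -> bool:
    """Check a link or page text against include and exclude keyword settings.

    Assumption: the text passes if it contains at least one include keyword
    (when any are configured) and none of the exclude keywords.
    """
    lowered = text.lower()
    include = [k.lower() for k in parse_keywords(include_spec)]
    exclude = [k.lower() for k in parse_keywords(exclude_spec)]
    if include and not any(k in lowered for k in include):
        return False
    return not any(k in lowered for k in exclude)
```

The same function serves both links and page text: call it once with the link settings and the URL, and once with the web page settings and the extracted text.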
Text extraction
The text extraction component independently developed by Kwindsoft extracts the main body content of a web page and filters out information unrelated to the page's topic (advertisements, navigation, columns, and other non-body content). This ensures the quality of the collected web page information and improves retrieval relevance. For content pages, the accuracy of body text identification and extraction exceeds 80%.
Chinese word segmentation
The Chinese word segmentation component independently developed by Kwindsoft recognizes both Chinese and English words and filters out special symbols.
Word segmentation demonstration. Original text: Kwindsoft Search World! K-PageSearch★ A web search engine designed specifically for industry and specialized information retrieval. Main functional features: web spider, directional collection, text extraction, Chinese word segmentation, full-text index, relevance ranking, web page snapshot, related search, bidding ranking; the backend database uses Microsoft SQL Server, and the static search system design uses XML data islands to cache search results, improving system stability and performance, saving server resources, and reducing system load.
Word segmentation: Kwindsoft Search World KPageSearch is a web search engine designed specifically for industry-specific information retrieval. Main functions and features: Web spiders directional collection, text extraction, Chinese word segmentation, full-text indexing, relevance sorting, web snapshots, related searches, bidding rankings, background database using Microsoft SQL Server static search system Designed to use XML data islands to cache search results to improve system stability and performance, save server resources and reduce system burden
Full-text index
Full-text indexing is one of the key technologies of current search engines. This system uses the Microsoft SQL Server full-text engine. By indexing every word in the specified database content, full-text indexing enables powerful and fast retrieval.
Relevance sorting
The system ranks results by a computed relevance value. Results are sorted according to keyword weight and frequency of occurrence, which makes the search results more accurate.
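The exact relevance formula is not given here. One simple way to combine keyword weight and frequency is a weighted count, sketched below. The function names and the formula (weight multiplied by the number of occurrences, summed over the query keywords) are assumptions for illustration, not the system's documented calculation.

```python
from typing import Dict, List, Tuple


def relevance_score(tokens: List[str], keyword_weights: Dict[str, float]) -> float:
    """Sum weight * frequency over the query keywords found in a page's tokens."""
    return sum(weight * tokens.count(keyword)
               for keyword, weight in keyword_weights.items())


def rank_pages(pages: Dict[str, List[str]],
               keyword_weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (page_id, score) pairs sorted from most to least relevant."""
    scored = [(page_id, relevance_score(tokens, keyword_weights))
              for page_id, tokens in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Here `tokens` stands for the output of word segmentation on a page, so a keyword that appears more often, or carries a higher weight, moves its page up the list.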
Web page snapshot
What should you do if a search result does not open or opens slowly? The "web page snapshot" function can help. Web page snapshots are stored on the server in text format. If the original web page has been modified, deleted, or blocked, you can still use the snapshot to browse the content of the original page. Web page snapshots require a large amount of storage space, so the function can be turned on or off. When it is turned off, the system does not save web page snapshots.
Related searches
Related searches are keywords similar to the keyword being searched. They are calculated from the records of keywords that all users have searched in the past. When a keyword searched by a user meets the conditions, the system automatically records it and updates the statistics. You can click "More Related Searches" to view the search statistics for the keywords. Related searches help you find more valuable results faster.
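A minimal sketch of this record-and-count approach follows. The class name, the recording condition (any non-empty keyword), and the similarity rule (a past keyword counts as related if it contains the current keyword) are all assumptions; the text above does not specify the conditions the system actually uses.

```python
from collections import Counter
from typing import List


class SearchLog:
    """Counts past search keywords and suggests related ones (hypothetical)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, keyword: str) -> None:
        """Record one search; empty keywords are ignored."""
        keyword = keyword.strip().lower()
        if keyword:
            self.counts[keyword] += 1

    def related(self, keyword: str, limit: int = 5) -> List[str]:
        """Return other recorded keywords containing `keyword`, most searched first."""
        keyword = keyword.strip().lower()
        if not keyword:
            return []
        matches = [(k, n) for k, n in self.counts.items()
                   if keyword in k and k != keyword]
        # Sort by descending count, then alphabetically for a stable order.
        matches.sort(key=lambda item: (-item[1], item[0]))
        return [k for k, _ in matches[:limit]]
```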
Bidding ranking (PPC)
A convenient and practical professional bidding ranking system that ranks, by bid, the website rankings, right-side recommendations, and E-click information submitted by members. The higher the bid, the higher the ranking. After registering, members can manage all of their bidding information themselves in one place. The billing model is reasonable: when the same client IP clicks the same bidding entry multiple times in one day, it is billed only once. The system can set the minimum recharge amount and the amount charged per IP click. Members can recharge their accounts online in real time, or the system administrator can do so on their behalf.
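The once-per-day billing rule can be sketched by remembering which (IP, bidding entry, day) combinations have already been charged. The class and method names below are hypothetical, and keeping the data in memory is a simplification; the product itself stores its data in Microsoft SQL Server.

```python
from datetime import date
from typing import Dict, Set, Tuple


class ClickBilling:
    """Charges each (IP, bid entry, day) combination at most once (hypothetical)."""

    def __init__(self, price_per_click: float) -> None:
        self.price_per_click = price_per_click
        self.balances: Dict[str, float] = {}
        self._billed: Set[Tuple[str, str, date]] = set()

    def recharge(self, member: str, amount: float) -> None:
        """Add funds to a member's account."""
        self.balances[member] = self.balances.get(member, 0.0) + amount

    def register_click(self, member: str, bid_id: str, ip: str, day: date) -> bool:
        """Bill the member for a click; return False for an unbilled repeat click."""
        key = (ip, bid_id, day)
        if key in self._billed:
            return False
        self._billed.add(key)
        self.balances[member] = self.balances.get(member, 0.0) - self.price_per_click
        return True
```

A repeat click from the same IP on the same entry is billed again only on a different day, while a click from a different IP is billed immediately.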