Search Engine from Sun Yat-sen University
Yat-Search Engine is a text search engine that supports loading multiple files and offers precise (exact-match), fuzzy, and regular expression queries. The project aims to provide fast and accurate text search, with persistent storage, query logging, and a Chinese interface.
Note that Chinese text can only be searched through the "Fuzzy Search" option, and a "." symbol is required at the end.
This project is for learning and communication purposes only; please do not use it for commercial purposes. The original author is not responsible for any consequences arising from the use of this project.
The current version is still under development and some features may not be fully implemented yet. Code contributions and suggestions are welcome!
This is the major assignment for the "Data Structure and Algorithm" course of the School of Computer Science, Sun Yat-sen University, in the fall semester of 2024.
Current version: v1.4.0
└──Yat-Search-Engine
    ├── CMakeLists.txt
    ├── LICENSE
    ├── dat
    │   └── index.dat
    ├── include
    │   ├── RegexSearch.h
    │   └── TextSearchEngine.h
    ├── log
    │   └── query_log.txt
    ├── logo_pic
    │   ├── logo.txt
    │   └── logo_out.txt
    ├── readme.md
    ├── source-text
    │   ├── bible.txt
    │   ├── text1.txt
    │   ├── text2.txt
    │   └── otherfile...
    └── src
        ├── RegexSearch.cpp
        ├── TextSearchEngine.cpp
        └── main.cpp
Hash table contains query v1.0.0: CustomHash is used together with unordered_map in TextSearchEngine, so that the custom hash function optimizes query performance.
File reading v1.0.1
Multiple file support (completed) v1.1.0: main.cpp allows users to enter multiple file names, and the system loads these files one by one for indexing.
Support regular expressions (completed) v1.1.1: The RegexSearch class uses the C++ <regex> library to implement queries based on regular expressions, with a corresponding option in the main menu.
Precise query (completed) v1.1.2: The exactQuery method efficiently finds exactly matching keywords through unordered_map and displays the matching results.
Visual search results, imitating the way a compiler reports errors (completed) v1.2.0: ^ marks the position of the keyword in the sentence, similar to a compiler's error indication, to improve readability.
Unlimited queries, exit button (completed) v1.2.1
New hash function (completed) v1.3.1: The CustomHash function handles hash collisions more efficiently, further improving query performance.
Support Chinese (completed) v1.4.0
Performance optimization and preprocessing (completed) v1.3.1
User interface (completed) v1.2.2
Persistent storage (completed) v1.3.0
Log function, recording query history, etc. (completed) v1.3.2: Queries are recorded in query_log.txt so that users can review history and debug.
Support Chinese output (completed) v1.4.0
Build method changed from MinGW to CMake (completed) v1.4.0
Enter all to add all files (completed) v1.4.0
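The hash-table items in the list above pair a custom hash function with unordered_map for exact lookups. The following is a minimal sketch of that idea, not the project's actual code: Fnv1aHash is a hypothetical stand-in for CustomHash (64-bit FNV-1a is swapped in here because the real hash is not shown), and WordIndex, buildIndex and exactLookup are illustrative names. It assumes whitespace-separated words and does no punctuation or case handling.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the project's CustomHash: 64-bit FNV-1a.
struct Fnv1aHash {
    std::size_t operator()(const std::string& key) const noexcept {
        std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
        for (unsigned char c : key) {
            h ^= c;
            h *= 1099511628211ULL;  // FNV prime
        }
        return static_cast<std::size_t>(h);
    }
};

// Maps each word to the indices of the sentences that contain it.
using WordIndex =
    std::unordered_map<std::string, std::vector<std::size_t>, Fnv1aHash>;

// Builds the index from whitespace-separated words (assumption: no
// punctuation stripping or case folding).
WordIndex buildIndex(const std::vector<std::string>& sentences) {
    WordIndex index;
    for (std::size_t i = 0; i < sentences.size(); ++i) {
        std::istringstream words(sentences[i]);
        std::string word;
        while (words >> word) {
            std::vector<std::size_t>& hits = index[word];
            // Record each sentence only once, even if the word repeats.
            if (hits.empty() || hits.back() != i) hits.push_back(i);
        }
    }
    return index;
}

// Exact lookup: returns the matching sentence indices, or an empty vector.
std::vector<std::size_t> exactLookup(const WordIndex& index,
                                     const std::string& word) {
    auto it = index.find(word);
    return it == index.end() ? std::vector<std::size_t>{} : it->second;
}
```

Passing the hash type as the third template argument of unordered_map is what lets the container use it in place of std::hash. Each lookup then costs one hash computation plus a bucket probe on average, instead of a scan over every sentence.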
Run the program in a Linux environment; Chinese characters may appear garbled on Windows. Go to your working directory and clone the project locally:
git clone https://github.com/ouyangyipeng/Yat-Search-Engine.git
cd Yat-Search-Engine
Make sure you have CMake and a C++17-capable compiler installed, such as g++ or clang++. The CMakeLists.txt file already configures the compilation options, so the following commands are enough to build the project. Note that the default compiler is expected at /usr/bin/g++. To use a different compiler, modify the CMakeLists.txt file.
mkdir build # if the build folder does not exist yet
cd build
cmake ..
cmake --build .
./YatSearchEngine
Place the text file you want to search into the source-text folder and make sure the file is saved in .txt format.
Start the program: After you run the executable, the program displays the welcome interface and prompts you to press Enter to continue.
Load file: Enter the files to load as prompted. Enter one file name at a time (it must end with .txt), and enter done to finish the file selection.
Select query type:
Enter 1 for a precise query.
Enter 2 for a fuzzy query.
Enter 3 for a regular expression query.
Enter 4 to exit the program.
Enter query content: According to the selected query type, enter the corresponding keywords or regular expressions.
View the results: The query results will display matching sentences and their location markers.
Exit the program: Select the exit option; the program saves the index and records the log before exiting.
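The location markers mentioned in the "View the results" step can be illustrated with a short sketch. The helper names caretLine and caretLineRegex are hypothetical and are not taken from the project. The sketch assumes single-byte characters on a monospaced terminal; multi-byte UTF-8 text such as Chinese would need column-width handling that is not shown here.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Drops trailing spaces so the marker line ends at the last caret.
inline void trimRight(std::string& s) {
    s.erase(s.find_last_not_of(' ') + 1);
}

// Returns a line with '^' under every occurrence of keyword in sentence.
// Assumption: one byte per displayed column.
std::string caretLine(const std::string& sentence, const std::string& keyword) {
    std::string markers(sentence.size(), ' ');
    if (!keyword.empty()) {
        for (std::size_t pos = sentence.find(keyword);
             pos != std::string::npos;
             pos = sentence.find(keyword, pos + keyword.size())) {
            markers.replace(pos, keyword.size(), keyword.size(), '^');
        }
    }
    trimRight(markers);
    return markers;
}

// Same idea for a regular expression: marks every non-overlapping match.
std::string caretLineRegex(const std::string& sentence,
                           const std::regex& pattern) {
    std::string markers(sentence.size(), ' ');
    for (std::sregex_iterator it(sentence.begin(), sentence.end(), pattern), end;
         it != end; ++it) {
        const auto pos = static_cast<std::size_t>(it->position());
        const auto len = static_cast<std::size_t>(it->length());
        markers.replace(pos, len, len, '^');
    }
    trimRight(markers);
    return markers;
}
```

Printing the sentence and then the returned marker line directly beneath it gives the compiler-style indication: the carets line up under each match, and a sentence without a match yields an empty marker line.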
The query operation log is saved in the query_log.txt file. Users can view historical query records and operation logs through a text editor.
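For readers curious how such a log might be written, here is a minimal sketch under stated assumptions. The function name appendQueryLog, its parameters, and the line format (timestamp, query type, query text, hit count) are all hypothetical; the project's actual log format is not documented here.

```cpp
#include <chrono>
#include <cstddef>
#include <ctime>
#include <fstream>
#include <iomanip>
#include <string>

// Appends one line per query to logPath; returns false if the file cannot
// be opened or written. The line format is an assumption for illustration.
bool appendQueryLog(const std::string& logPath, const std::string& queryType,
                    const std::string& query, std::size_t hitCount) {
    std::ofstream out(logPath, std::ios::app);  // append, never truncate
    if (!out) return false;
    const std::time_t now =
        std::chrono::system_clock::to_time_t(std::chrono::system_clock::now());
    // std::localtime is not thread-safe; acceptable for a single-threaded CLI.
    out << std::put_time(std::localtime(&now), "%Y-%m-%d %H:%M:%S")
        << " [" << queryType << "] \"" << query << "\" hits=" << hitCount
        << '\n';
    return static_cast<bool>(out);
}
```

Opening the stream with std::ios::app keeps earlier entries intact, so the history accumulates across runs and the file stays readable in any text editor.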
Code contributions and suggestions are welcome! Please submit a Pull Request or give us your feedback in Issues.
This project is licensed under the MIT license. See the LICENSE file for details.