[!TIP]
If you have questions about how this project works or is implemented, or suggestions for improving it, contact me directly or open an issue in the repository.
This project implements a search engine for the Boost documentation, built with C++ and the Boost library. It walks through the whole process of building a search engine, from data preprocessing and index construction to query processing and result presentation. The technology stack covers C++, the Boost library, HTML, CSS, and JavaScript: the back end builds the index, and the front end handles user interaction. Beyond the basic functions, the project adds word frequency statistics, dynamically updated indexes, and priority sorting of search results to improve search efficiency and accuracy. The result is aimed at developers who need to find a specific technical document quickly while working with the Boost libraries.
Searching a large collection of documents and their contents is time-consuming. If every query traverses the documents one by one, the service can stay unresponsive for a long time. A faster way to organize and manage the data is needed, and building an index is the core of the solution.
An index attaches labels to documents so that a search can look up the label instead of scanning the documents. Managing labels costs far less than managing the documents themselves, which is the essential reason for building an index.
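To make the "label" idea concrete, here is a minimal sketch of an inverted index that maps each word to the documents containing it, along with a per-document word count used for ordering results. It is an illustration under stated assumptions, not the project's actual code: the names `Posting`, `InvertedIndex`, `Tokenize`, `AddDocument`, and `Search` are hypothetical, and the simple alphanumeric tokenizer stands in for a real word segmenter such as cppjieba.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// One posting: which document contains the word, and how many times.
struct Posting {
    std::size_t doc_id;
    int count;
};

// Hypothetical minimal inverted index: word -> list of postings.
class InvertedIndex {
public:
    // Splits text on non-alphanumeric characters and lowercases each word.
    // Assumption: a stand-in for a real segmenter such as cppjieba.
    static std::vector<std::string> Tokenize(const std::string& text) {
        std::vector<std::string> words;
        std::string current;
        for (unsigned char ch : text) {
            if (std::isalnum(ch)) {
                current.push_back(static_cast<char>(std::tolower(ch)));
            } else if (!current.empty()) {
                words.push_back(current);
                current.clear();
            }
        }
        if (!current.empty()) words.push_back(current);
        return words;
    }

    // Indexes one document and returns the id assigned to it (0, 1, 2, ...).
    std::size_t AddDocument(const std::string& text) {
        const std::size_t doc_id = doc_count_++;
        std::unordered_map<std::string, int> freq;  // word frequency in this document
        for (const std::string& word : Tokenize(text)) ++freq[word];
        for (const auto& entry : freq) {
            index_[entry.first].push_back(Posting{doc_id, entry.second});
        }
        return doc_id;
    }

    // Looks up a single word; results are ordered by descending count,
    // with ties broken by ascending document id.
    std::vector<Posting> Search(const std::string& word) const {
        const std::vector<std::string> tokens = Tokenize(word);
        if (tokens.empty()) return {};
        const auto it = index_.find(tokens.front());
        if (it == index_.end()) return {};
        std::vector<Posting> result = it->second;
        std::sort(result.begin(), result.end(),
                  [](const Posting& a, const Posting& b) -> bool {
                      if (a.count != b.count) return a.count > b.count;
                      return a.doc_id < b.doc_id;
                  });
        return result;
    }

private:
    std::size_t doc_count_ = 0;
    std::unordered_map<std::string, std::vector<Posting>> index_;
};
```

A query then touches only the posting list of the searched word rather than every document, which is where the speedup over one-by-one traversal comes from. A real ranking would likely weight title and body matches differently; the raw count used here is only the simplest possible ordering.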
Boost is a collection of C++ libraries that extend the C++ standard library. It is developed and maintained by the Boost community and is designed to work well alongside the standard library. The Boost website hosts a large amount of documentation, and a search engine helps locate the needed document accurately and quickly.
Backend: C/C++, C++11, STL, Boost, Jsoncpp, cppjieba, cpp-httplib
Frontend: HTML5, CSS, JavaScript, jQuery, Ajax
Effect:
Backend effects:
The project does not implement a crawler, so the data is downloaded to the local machine instead. Place the HTML files or directories in the data input directory shown below.
Specific steps can be found at: word.md-chapter
boost-search-engine/search-engine/data/input
[!NOTE] The environment I use is:
Linux ubuntu-linux-22-04-desktop 5.15.0-113-generic #123-Ubuntu SMP Mon Jun 10 08:16:46 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
Windows
Install CMake:
Install Visual Studio:
Install Boost library:
C:\Libraries\boost_1_75_0
cd C:\Libraries\boost_1_75_0
.\bootstrap.bat
.\b2.exe
Set BOOST_ROOT to the directory where Boost is installed.
Install jsoncpp:
vcpkg install jsoncpp
Configure CMake project:
Make sure CMake can find the libraries installed above (for example, through BOOST_ROOT).
macOS
Install Homebrew:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
Install CMake and dependent libraries:
brew install cmake boost jsoncpp
Configure CMake project:
mkdir build && cd build
cmake ..
make
Linux (Ubuntu, CentOS)
Ubuntu:
sudo apt-get update
sudo apt-get install cmake g++ libboost-all-dev libjsoncpp-dev
CentOS:
sudo yum install cmake gcc-c++ boost-devel jsoncpp-devel
Configure CMake project:
mkdir build && cd build
cmake ..
make
You can also use the Makefile to compile directly:
make
[!TIP]
- Make sure the paths are set correctly on all platforms, especially on Windows, where you may need to manually set the paths to some libraries.
- For different Linux distributions, the installation commands and available packages may be slightly different, so adjust them accordingly.
- When building on Windows with Visual Studio, make sure to select the correct architecture (x86 or x64) to match the version of the library.
Official Links:
https://github.com/yanyiwu/cppjieba
Link the cppjieba directory into the project's boost-search-engine/search-engine/include directory.
Enter the cppjieba directory.
Link the dict dictionary component and the limonp component into cppjieba.
Parse the data.
./parser
As shown in the figure, the run succeeds. If it fails, check the error message; a likely cause is an incorrect path configuration, which you can adjust in the code.
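For readers curious about what a parsing step of this kind typically does with each HTML file, the sketch below extracts the page title and strips tags to leave plain text for indexing. This is an assumption about the general technique, not a description of what `./parser` actually implements; the function names `ExtractTitle` and `StripTags` are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Returns the text between the first "<title>" and the following "</title>",
// or an empty string if either marker is missing.
std::string ExtractTitle(const std::string& html) {
    const std::string open = "<title>";
    const std::string close = "</title>";
    const std::size_t begin = html.find(open);
    if (begin == std::string::npos) return "";
    const std::size_t start = begin + open.size();
    const std::size_t end = html.find(close, start);
    if (end == std::string::npos) return "";
    return html.substr(start, end - start);
}

// Drops everything between '<' and '>' and turns newlines into spaces,
// so that the remaining text fits on one line per document.
std::string StripTags(const std::string& html) {
    std::string text;
    bool in_tag = false;
    for (char ch : html) {
        if (ch == '<') {
            in_tag = true;
        } else if (ch == '>') {
            in_tag = false;
        } else if (!in_tag) {
            text.push_back(ch == '\n' ? ' ' : ch);
        }
    }
    return text;
}
```

This simple state machine has known limits: it does not decode HTML entities, does not skip the contents of script or style elements, and joins text from adjacent elements without inserting a separator. A production parser would need to handle those cases.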
Start the service:
As shown in the figure, the startup is successful.
The server can also be run as a background process, for example:
nohup ./server > log/log.txt 2>&1 &
Other tools, such as tmux, work as well.
Open a browser and visit port 8081 on the server's IP address. The port number is set in ./src/server.cc.
The logging part can be further improved.