kumo search is an end-to-end search engine framework that supports full-text retrieval, inverted indexes, forward indexes, sorting, caching, index layering, an intervention system, feature collection, offline computing, a storage system, and other functions. kumo search runs on the EA (Elastic automic infrastructure architecture) platform and supports engineering automation, service governance, real-time data, and service degradation and disaster recovery across multiple data centers and multiple clusters.
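
For readers new to the terminology, the difference between a forward index and an inverted index can be sketched in a few lines of Python. This is a toy illustration with hypothetical names (`build_indexes`, `search`) and whitespace tokenization; it is not kumo search's data structure or API.

```python
from collections import defaultdict


def build_indexes(docs):
    """Build a forward index (doc id -> tokens) and an inverted index (token -> doc ids)."""
    forward = {}
    inverted = defaultdict(set)
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        forward[doc_id] = tokens
        for token in tokens:
            inverted[token].add(doc_id)
    return forward, inverted


def search(inverted, query):
    """Return the sorted ids of documents that contain every query token (AND semantics)."""
    tokens = query.lower().split()
    if not tokens:
        return []
    # .get() avoids creating empty entries for unknown tokens.
    postings = [inverted.get(token, set()) for token in tokens]
    return sorted(set.intersection(*postings))


docs = {1: "red running shoes", 2: "blue running jacket", 3: "red jacket"}
forward, inverted = build_indexes(docs)
print(search(inverted, "red jacket"))  # [3]
print(search(inverted, "running"))     # [1, 2]
```

The forward index answers "which terms does this document contain?", which is what sorting and feature collection typically need, while the inverted index answers "which documents contain this term?", which is what retrieval needs.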
As the Internet has developed, searching the entire web is no longer the only way to obtain information. Many vertical information services, such as e-commerce, social networking, and news, have their own search engines. These search engines share several characteristics: medium data volume, complex business logic, and high user-experience requirements. Building them requires substantial engineering and algorithm support. kumo search aims to provide an out-of-the-box search engine framework that helps users quickly build their own search engines. On this framework, users can write business logic in Python; through the project's AOT compiler, the framework automatically generates C++ code and a binary dynamic library, which is dynamically updated into the search engine. This enables rapid iteration of search engines.
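
As an illustration of the kind of business logic this workflow targets, the sketch below shows a small, type-annotated re-ranking function in plain Python. The function name, the boost table, and the scores are hypothetical, and the sketch does not use any kumo search or hercules API; it only assumes that simple, statically typed code of this shape is a reasonable candidate for ahead-of-time compilation.

```python
from typing import Dict, List, Tuple


def rerank(candidates: List[Tuple[int, float]],
           boosts: Dict[int, float],
           top_k: int) -> List[Tuple[int, float]]:
    """Apply per-document business boosts to relevance scores and keep the top_k results."""
    scored: List[Tuple[int, float]] = []
    for doc_id, relevance in candidates:
        boost = boosts.get(doc_id, 1.0)  # documents without a boost keep their score
        scored.append((doc_id, relevance * boost))
    # Highest score first; ties are broken by ascending doc id for stable output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


candidates = [(101, 0.5), (102, 0.75), (103, 0.25)]
boosts = {101: 2.0}
print(rerank(candidates, boosts, top_k=2))  # [(101, 1.0), (102, 0.75)]
```

Keeping such logic in a pure function with explicit inputs and outputs makes it easy to test offline before it is compiled and loaded into a running engine.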
No. | Project | Description | Notes / Status |
---|---|---|---|
1 | collie | Unified management of external header-only libraries such as json, toml, etc. | |
2 | turbo | Hashing, logging, container classes, string operations | |
3 | melon | RPC communication | |
4 | alkaid | File system encapsulation: local files, HDFS, S3, etc. | Unified file system API; unified API for zlib, lz4, zstd |
5 | mizar | Storage engine core based on rocksdb and toplingdb | The wisekey function is yet to be developed; for now, the official version of rocksdb is used |
6 | alioth (Yuheng) | In-memory tables | Under development |
7 | megrez (Tianquan) | Dataset reading and writing | hdf5, csv, and bin are done; a high-level C++ API is still to be wrapped |
8 | phekda | Unified vector engine access API (UnifiedIndex) that simplifies the interface | Supports snapshots and filter plug-ins |
9 | merak (Tianxuan) | Comprehensive search engine core | To be developed |
10 | dubhe (Tianshu) | NLP kernel | To be developed |
11 | flare | High-dimensional tensor computation on GPU and CPU, etc. | |
12 | theia | OpenGL-based graphics and image display; not usable on servers (no display device) | |
13 | dwarf | C++ kernel for the Jupyter protocol | |
14 | exodus | Jupyter applications such as hercules | Done |
15 | hercules | Python AOT compiler | |
16 | carbin | C++ package manager, CMake generator | Done |
17 | carbin-template | CMake template library | Done |
18 | carbin-recipes | Custom configurations for carbin's dependency libraries | Done |
19 | hadar | Search suggestion (suggest) service kernel | Nearly complete; commercial, not open source |
20 | neptune | EA front-end service | Under development |
No. | Project | Description | Status |
---|---|---|---|
1 | sirius | EA metadata server: service discovery, global clock service, global configuration service, global ID service | Done |
2 | polaris | Stand-alone vector engine service | Done |
3 | elnath | Stand-alone comprehensive search engine service | Under development |
4 | vega | Vector engine database, cluster edition | Done; commercial, not open source |
5 | arcturus | Comprehensive search engine, cluster edition | In development; commercial, not open source |
6 | pollux | Integrated engine business console | In development; commercial, not open source |
7 | capella | LTR (learning-to-rank) service | In development; commercial, not open source |
8 | aldebaran | Search suggestion (suggest) service, cluster edition | In development; commercial, not open source |
9 | nunki | NLP service | In development; commercial, not open source |
The half-hour series focuses on quickly building enterprise-level application services on the EA infrastructure, with an emphasis on hands-on practice: quick start, quick development, quick deployment, and quick iteration.
**This topic introduces the basics of search engines; how search architecture has evolved, been upgraded, and been redesigned as search technology and search business have developed; and the technical principles and implementation behind it.**
EA is the infrastructure for server-side applications. EA currently supports the CentOS and Ubuntu operating systems. macOS support is under development and we will do our best to provide it, although it has not been tested yet. To make compilation and IDE-based development easier, some later features may be checked for macOS compatibility. For basic environment deployment, see Installation and Use.
CI/CD for the EA system is managed with the carbin tool. carbin is a C++ package manager, CMake generator, and CI/CD tool. It can download third-party dependency libraries, generate a CMake build system, and compile and deploy projects. For usage details, see the carbin docs.
Feature | carbin | conda | cmake | CPM | conan | bazel |
---|---|---|---|---|---|---|
Usage complexity | easy | medium | hard | medium | hard | hard |
Installation difficulty | pip, easy | binary, easy | N/A, easy | cmake | pip, easy | binary, hard |
Dependency mode | source/binary | binary | source | source | source/binary | source |
Dependency tree | supported | supported | supported | supported | supported | supported |
Local source code | supported | N/A | supported | supported | N/A | supported |
Compatibility | good | medium | good | good | good | poor |
Speed | good | medium | poor | poor | good | poor |
conda is a good management tool, but I did not choose it: its compilation dependencies are relatively complex and its compilation options often cause problems, which makes it a poor fit for building C++ projects. The dependency management that comes with cmake is not suitable for large projects, because recompiling a project may cause the dependency libraries to be downloaded again, which makes builds take too long. CPM is a C++ package manager, but in the domestic network environment it downloads dependency libraries slowly, so it is likewise not suitable for large projects. conan is also a C++ package manager, and its slow dependency downloads make it unsuitable for large projects as well.
carbin is also well suited to managing C++ projects. It can quickly generate a CMake-based project management system that unifies the build process, option configuration, and the variable rules for installation and export after building. Projects in the EA system can find other projects and packages, and the objects they export, through find_package under these fixed rules. carbin is also suitable for use in any CMake-based project.
If you develop with Docker, EA provides a base ea_inf development container:
centos7-openssl11-python-310-gcc-9.3:
lijippy/ea_inf:c7_base_v1