Blast download - Blast source code download

This project has been taken over by Phalanx.

This project has not been maintained for a long time.

Blast

Blast is a full-text search and indexing server written in Go and built on top of Bleve.
It provides its functions through gRPC (HTTP/2 + Protocol Buffers) or a traditional RESTful API (HTTP/1.1 + JSON).
Blast implements the Raft consensus algorithm using hashicorp/raft. It achieves consensus across all nodes, ensuring that every change made to the system is made to a quorum of nodes, or to none at all. Blast makes it easy for programmers to develop search applications with advanced features.

Features

Full-text search and indexing
Faceted search
Spatial/geospatial search
Search result highlighting
Index replication
Bringing up a cluster
An easy-to-use HTTP API
CLI is available
Docker container image is available

Install build dependencies

Blast requires some C/C++ libraries if you need to enable cld2, icu, libstemmer or leveldb. The following sections describe how to satisfy these dependencies on particular platforms.

Ubuntu 18.10

$ sudo apt-get update
$ sudo apt-get install -y \
    libicu-dev \
    libstemmer-dev \
    libleveldb-dev \
    gcc-4.8 \
    g++-4.8 \
    build-essential

$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 80
$ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 80
$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 90
$ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.8 90

$ export GOPATH=${HOME}/go
$ mkdir -p ${GOPATH}/src/github.com/blevesearch
$ cd ${GOPATH}/src/github.com/blevesearch
$ git clone https://github.com/blevesearch/cld2.git
$ cd ${GOPATH}/src/github.com/blevesearch/cld2
$ git clone https://github.com/CLD2Owners/cld2.git
$ cd cld2/internal
$ ./compile_libs.sh
$ sudo cp *.so /usr/local/lib

macOS High Sierra Version 10.13.6

$ brew install \
    icu4c \
    leveldb

$ export GOPATH=${HOME}/go
$ go get -u -v github.com/blevesearch/cld2
$ cd ${GOPATH}/src/github.com/blevesearch/cld2
$ git clone https://github.com/CLD2Owners/cld2.git
$ cd cld2/internal
$ perl -p -i -e 's/soname=/install_name,/' compile_libs.sh
$ ./compile_libs.sh
$ sudo cp *.so /usr/local/lib

Build

Build Blast as follows:

$ mkdir -p ${GOPATH}/src/github.com/mosuka
$ cd ${GOPATH}/src/github.com/mosuka
$ git clone https://github.com/mosuka/blast.git
$ cd blast
$ make

If you omit GOOS or GOARCH, the binary is built for the platform you are using.
To target a specific platform, set the GOOS and GOARCH environment variables.

Linux

$ make GOOS=linux build

macOS

$ make GOOS=darwin build

Windows

$ make GOOS=windows build

Build with extensions

Blast supports some Bleve extensions (blevex). To build with them, set CGO_LDFLAGS, CGO_CFLAGS, CGO_ENABLED and BUILD_TAGS. For example, build with the ICU extension enabled as follows:

$ make GOOS=linux \
       BUILD_TAGS=icu \
       CGO_ENABLED=1 \
       build

Linux

$ make GOOS=linux \
       BUILD_TAGS="kagome icu libstemmer cld2" \
       CGO_ENABLED=1 \
       build

macOS

$ make GOOS=darwin \
       BUILD_TAGS="kagome icu libstemmer cld2" \
       CGO_ENABLED=1 \
       CGO_LDFLAGS="-L/usr/local/opt/icu4c/lib" \
       CGO_CFLAGS="-I/usr/local/opt/icu4c/include" \
       build

Build flags

Refer to the following table for the build flags of the supported Bleve extensions:

BUILD_TAGS	CGO_ENABLED	Description
cld2	1	Enable Compact Language Detector
kagome	0	Enable Japanese language analyser
icu	1	Enable ICU tokenizer and Thai language analyser
libstemmer	1	Enable language stemmer (Danish, German, English, Spanish, Finnish, French, Hungarian, Italian, Dutch, Norwegian, Portuguese, Romanian, Russian, Swedish, Turkish)

To enable a feature whose CGO_ENABLED is 1, install its dependencies as described in the Install build dependencies section above.

Binary

After a successful build, the binary is available here:

$ ls ./bin
blast

Test

To test your changes, run a command like the following:

$ make test

To target a specific platform, set the GOOS and GOARCH environment variables in the same way as for the build.

Package

To create a distribution package, run the following command:

$ make dist

Configure

Blast can change its startup options through configuration files, environment variables and command line arguments.
Refer to the following table for the options that can be configured.

CLI flag	Environment variable	Configuration file	Description
--config-file	-	-	config file. If omitted, blast.yaml is searched for in /etc and the home directory
--id	BLAST_ID	id	node ID
--raft-address	BLAST_RAFT_ADDRESS	raft_address	Raft server listen address
--grpc-address	BLAST_GRPC_ADDRESS	grpc_address	gRPC server listen address
--http-address	BLAST_HTTP_ADDRESS	http_address	HTTP server listen address
--data-directory	BLAST_DATA_DIRECTORY	data_directory	data directory that stores the index and the Raft logs
--mapping-file	BLAST_MAPPING_FILE	mapping_file	path to the index mapping file
--peer-grpc-address	BLAST_PEER_GRPC_ADDRESS	peer_grpc_address	listen address of an existing gRPC server in the cluster to join
--certificate-file	BLAST_CERTIFICATE_FILE	certificate_file	path to the client server TLS certificate file
--key-file	BLAST_KEY_FILE	key_file	path to the client server TLS key file
--common-name	BLAST_COMMON_NAME	common_name	certificate common name
--cors-allowed-methods	BLAST_CORS_ALLOWED_METHODS	cors_allowed_methods	CORS allowed methods (e.g. GET,PUT,DELETE,POST)
--cors-allowed-origins	BLAST_CORS_ALLOWED_ORIGINS	cors_allowed_origins	CORS allowed origins (e.g. http://localhost:8080,http://localhost:80)
--cors-allowed-headers	BLAST_CORS_ALLOWED_HEADERS	cors_allowed_headers	CORS allowed headers (e.g. content-type,x-some-key)
--log-level	BLAST_LOG_LEVEL	log_level	log level
--log-file	BLAST_LOG_FILE	log_file	log file
--log-max-size	BLAST_LOG_MAX_SIZE	log_max_size	maximum size of a log file in megabytes
--log-max-backups	BLAST_LOG_MAX_BACKUPS	log_max_backups	maximum number of log file backups
--log-max-age	BLAST_LOG_MAX_AGE	log_max_age	maximum age of a log file in days
--log-compress	BLAST_LOG_COMPRESS	log_compress	compress a log file
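
Every row of the table except the config file one follows the same naming pattern: the flag --raft-address maps to the environment variable BLAST_RAFT_ADDRESS and to the configuration key raft_address. The Go sketch below only illustrates that convention. The helper names envVarName and configKey are hypothetical and are not part of Blast.

```go
package main

import (
	"fmt"
	"strings"
)

// envVarName derives the environment variable for a CLI flag, following the
// pattern visible in the table (--raft-address -> BLAST_RAFT_ADDRESS).
// Hypothetical helper, not part of Blast.
func envVarName(flag string) string {
	name := strings.TrimPrefix(flag, "--")
	return "BLAST_" + strings.ToUpper(strings.ReplaceAll(name, "-", "_"))
}

// configKey derives the configuration file key for a CLI flag
// (--raft-address -> raft_address). Hypothetical helper, not part of Blast.
func configKey(flag string) string {
	return strings.ReplaceAll(strings.TrimPrefix(flag, "--"), "-", "_")
}

func main() {
	for _, flag := range []string{"--id", "--raft-address", "--log-max-size"} {
		fmt.Printf("%s\t%s\t%s\n", flag, envVarName(flag), configKey(flag))
	}
}
```

The pattern is handy when moving a working command line into environment variables, for example in a container definition.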

Start

Starting the server is easy:

$ ./bin/blast start \
              --id=node1 \
              --raft-address=:7000 \
              --http-address=:8000 \
              --grpc-address=:9000 \
              --data-directory=/tmp/blast/node1 \
              --mapping-file=./examples/example_mapping.json

You can get the node information with the following command:

$ ./bin/blast node | jq .

or the following URL:

$ curl -X GET http://localhost:8000/v1/node | jq .

The result of the above command is:

{
  "node": {
    "raft_address": ":7000",
    "metadata": {
      "grpc_address": ":9000",
      "http_address": ":8000"
    },
    "state": "Leader"
  }
}

Health check

You can check the health status of the node.

$ ./bin/blast healthcheck | jq .

Blast also provides the following REST APIs.

Liveness probe

This endpoint always returns 200 and should be used to check that the server is alive.

$ curl -X GET http://localhost:8000/v1/liveness_check | jq .

Readiness probe

This endpoint returns 200 when the server is ready to serve traffic (i.e. respond to queries).

$ curl -X GET http://localhost:8000/v1/readiness_check | jq .

Put a document

To put a document, run the following command:

$ ./bin/blast set 1 '
{
  "fields": {
    "title": "Search engine (computing)",
    "text": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload. The most public, visible form of a search engine is a Web search engine which searches for information on the World Wide Web.",
    "timestamp": "2018-07-04T05:41:00Z",
    "_type": "example"
  }
}
' | jq .

or use the RESTful API as follows:

$ curl -X PUT 'http://127.0.0.1:8000/v1/documents/1' --data-binary '
{
  "fields": {
    "title": "Search engine (computing)",
    "text": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload. The most public, visible form of a search engine is a Web search engine which searches for information on the World Wide Web.",
    "timestamp": "2018-07-04T05:41:00Z",
    "_type": "example"
  }
}
' | jq .

$ curl -X PUT 'http://127.0.0.1:8000/v1/documents/1' -H "Content-Type: application/json" --data-binary @./examples/example_doc_1.json
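
The same request can be built from an application. The Go sketch below constructs the PUT request with the standard library, assuming the endpoint and body shape shown in the curl examples. The type document and the function newPutRequest are hypothetical names, not part of a Blast client library. The request is only built and printed, so the example runs without a server.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// document mirrors the request body used above: a single "fields" object.
type document struct {
	Fields map[string]interface{} `json:"fields"`
}

// newPutRequest builds the PUT /v1/documents/<id> request for a document.
func newPutRequest(baseURL, id string, doc document) (*http.Request, error) {
	body, err := json.Marshal(doc)
	if err != nil {
		return nil, err
	}
	target := baseURL + "/v1/documents/" + url.PathEscape(id)
	req, err := http.NewRequest(http.MethodPut, target, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	doc := document{Fields: map[string]interface{}{
		"title":     "Search engine (computing)",
		"timestamp": "2018-07-04T05:41:00Z",
		"_type":     "example",
	}}
	req, err := newPutRequest("http://127.0.0.1:8000", "1", doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
	// To send it to a running node: resp, err := http.DefaultClient.Do(req)
}
```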

Get a document

To get a document, run the following command:

$ ./bin/blast get 1 | jq .

or use the RESTful API as follows:

$ curl -X GET 'http://127.0.0.1:8000/v1/documents/1' | jq .

You can see the result. The result of the above command is:

{
  "fields": {
    "_type": "example",
    "text": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload. The most public, visible form of a search engine is a Web search engine which searches for information on the World Wide Web.",
    "timestamp": "2018-07-04T05:41:00Z",
    "title": "Search engine (computing)"
  }
}

Search documents

To search documents, run the following command:

$ ./bin/blast search '
{
  "search_request": {
    "query": {
      "query": "+_all:search"
    },
    "size": 10,
    "from": 0,
    "fields": [
      "*"
    ],
    "sort": [
      "-_score"
    ]
  }
}
' | jq .

or use the RESTful API as follows:

$ curl -X POST 'http://127.0.0.1:8000/v1/search' --data-binary '
{
  "search_request": {
    "query": {
      "query": "+_all:search"
    },
    "size": 10,
    "from": 0,
    "fields": [
      "*"
    ],
    "sort": [
      "-_score"
    ]
  }
}
' | jq .

You can see the result. The result of the above command is:

{
  "search_result": {
    "facets": null,
    "hits": [
      {
        "fields": {
          "_type": "example",
          "text": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload. The most public, visible form of a search engine is a Web search engine which searches for information on the World Wide Web.",
          "timestamp": "2018-07-04T05:41:00Z",
          "title": "Search engine (computing)"
        },
        "id": "1",
        "index": "/tmp/blast/node1/index",
        "score": 0.09703538256409851,
        "sort": [
          "_score"
        ]
      }
    ],
    "max_score": 0.09703538256409851,
    "request": {
      "explain": false,
      "facets": null,
      "fields": [
        "*"
      ],
      "from": 0,
      "highlight": null,
      "includeLocations": false,
      "query": {
        "query": "+_all:search"
      },
      "search_after": null,
      "search_before": null,
      "size": 10,
      "sort": [
        "-_score"
      ]
    },
    "status": {
      "failed": 0,
      "successful": 1,
      "total": 1
    },
    "took": 171880,
    "total_hits": 1
  }
}

Delete a document

To delete a document, run the following command:

$ ./bin/blast delete 1

or use the RESTful API as follows:

$ curl -X DELETE 'http://127.0.0.1:8000/v1/documents/1'

Index documents in bulk

To index documents in bulk, run the following command:

$ ./bin/blast bulk-index --file ./examples/example_bulk_index.json

or use the RESTful API as follows:

$ curl -X PUT 'http://127.0.0.1:8000/v1/documents' -H "Content-Type: application/x-ndjson" --data-binary @./examples/example_bulk_index.json

Delete documents in bulk

To delete documents in bulk, run the following command:

$ ./bin/blast bulk-delete --file ./examples/example_bulk_delete.txt

or use the RESTful API as follows:

$ curl -X DELETE 'http://127.0.0.1:8000/v1/documents' -H "Content-Type: text/plain" --data-binary @./examples/example_bulk_delete.txt

Bringing up a cluster

Bringing up a Blast cluster is easy. The node started above is already running, but it is not fault tolerant. To increase fault tolerance, bring up 2 more data nodes like so:

$ ./bin/blast start \
              --id=node2 \
              --raft-address=:7001 \
              --http-address=:8001 \
              --grpc-address=:9001 \
              --peer-grpc-address=:9000 \
              --data-directory=/tmp/blast/node2 \
              --mapping-file=./examples/example_mapping.json

$ ./bin/blast start \
              --id=node3 \
              --raft-address=:7002 \
              --http-address=:8002 \
              --grpc-address=:9002 \
              --peer-grpc-address=:9000 \
              --data-directory=/tmp/blast/node3 \
              --mapping-file=./examples/example_mapping.json

The example above runs every Blast node on the same host, so each node must listen on different ports. This would not be necessary if each node ran on a different host.

The --peer-grpc-address flag instructs each new node to join an existing node, and each node learns about the cluster it is joining when it starts. You now have a 3-node cluster, which can tolerate the failure of 1 node. You can check the cluster with the following command:

$ ./bin/blast cluster | jq .

or use the RESTful API as follows:

$ curl -X GET 'http://127.0.0.1:8000/v1/cluster' | jq .

You can see the result in JSON format. The result of the above command is:

{
  "cluster": {
    "nodes": {
      "node1": {
        "raft_address": ":7000",
        "metadata": {
          "grpc_address": ":9000",
          "http_address": ":8000"
        },
        "state": "Leader"
      },
      "node2": {
        "raft_address": ":7001",
        "metadata": {
          "grpc_address": ":9001",
          "http_address": ":8001"
        },
        "state": "Follower"
      },
      "node3": {
        "raft_address": ":7002",
        "metadata": {
          "grpc_address": ":9002",
          "http_address": ":8002"
        },
        "state": "Follower"
      }
    },
    "leader": "node1"
  }
}
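
A monitoring script may want to extract the leader and the followers from this response. The Go sketch below decodes a shortened copy of the JSON above. The type clusterResponse and the function leaderAndFollowers are hypothetical names chosen for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// clusterResponse mirrors the parts of the cluster output used here.
type clusterResponse struct {
	Cluster struct {
		Nodes map[string]struct {
			RaftAddress string `json:"raft_address"`
			State       string `json:"state"`
		} `json:"nodes"`
		Leader string `json:"leader"`
	} `json:"cluster"`
}

// leaderAndFollowers returns the leader ID and the sorted IDs of all nodes
// whose state is "Follower".
func leaderAndFollowers(body []byte) (string, []string, error) {
	var resp clusterResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", nil, err
	}
	var ids []string
	for id, n := range resp.Cluster.Nodes {
		if n.State == "Follower" {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	return resp.Cluster.Leader, ids, nil
}

func main() {
	// Shortened copy of the cluster output shown above.
	body := []byte(`{"cluster":{"nodes":{
		"node1":{"raft_address":":7000","state":"Leader"},
		"node2":{"raft_address":":7001","state":"Follower"},
		"node3":{"raft_address":":7002","state":"Follower"}},
		"leader":"node1"}}`)
	leader, ids, err := leaderAndFollowers(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(leader, ids)
}
```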

An odd number of nodes, 3 or more, is recommended for the cluster. A single node cannot avoid data loss in failure scenarios, so avoid single-node deployments.

In the example above the nodes join the cluster at startup, but you can also join a node that was already started in standalone mode to the cluster later, as follows:
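
The sizing advice follows from majority arithmetic: Raft needs a strict majority of nodes (a quorum) to accept a change. The Go sketch below computes the quorum size and the number of failures a cluster survives. The function names are hypothetical. It shows that a fourth node adds no tolerance over three, which is why odd sizes are preferred.

```go
package main

import "fmt"

// quorum returns how many nodes must agree on a change in a cluster of
// n voting nodes: a strict majority.
func quorum(n int) int { return n/2 + 1 }

// tolerableFailures returns how many nodes can fail while a quorum remains.
func tolerableFailures(n int) int { return n - quorum(n) }

func main() {
	for _, n := range []int{1, 2, 3, 4, 5} {
		fmt.Printf("nodes=%d quorum=%d tolerable_failures=%d\n",
			n, quorum(n), tolerableFailures(n))
	}
}
```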

$ ./bin/blast join --grpc-address=:9000 node2 127.0.0.1:9001

or use the RESTful API as follows:

$ curl -X PUT 'http://127.0.0.1:8000/v1/cluster/node2' --data-binary '
{
  "raft_address": ":7001",
  "metadata": {
    "grpc_address": ":9001",
    "http_address": ":8001"
  }
}
'

To remove a node from the cluster, run the following command:

$ ./bin/blast leave --grpc-address=:9000 node2

or use the RESTful API as follows:

$ curl -X DELETE 'http://127.0.0.1:8000/v1/cluster/node2'

The following command indexes documents to any node in the cluster:

$ ./bin/blast set 1 '
{
  "fields": {
    "title": "Search engine (computing)",
    "text": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload. The most public, visible form of a search engine is a Web search engine which searches for information on the World Wide Web.",
    "timestamp": "2018-07-04T05:41:00Z",
    "_type": "example"
  }
}
' --grpc-address=:9000 | jq .

You can then get the document from the node specified in the command above as follows:

$ ./bin/blast get 1 --grpc-address=:9000 | jq .

The command returns the document indexed above. Every node returns the same output, which is shown after the next example.

You can also get the same document from the other nodes in the cluster as follows:

$ ./bin/blast get 1 --grpc-address=:9001 | jq .
$ ./bin/blast get 1 --grpc-address=:9002 | jq .

You can see the result. The result of the above commands is:

{
  "fields": {
    "_type": "example",
    "text": "A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload. The most public, visible form of a search engine is a Web search engine which searches for information on the World Wide Web.",
    "timestamp": "2018-07-04T05:41:00Z",
    "title": "Search engine (computing)"
  }
}

Docker

Build the Docker container image

You can build the Docker container image like so:

$ make docker-build

Pull the Docker container image from docker.io

You can also use the Docker container image already registered on docker.io like so:

$ docker pull mosuka/blast:latest

See https://hub.docker.com/r/mosuka/blast/tags/

Start on Docker

To run a Blast data node on Docker, start the node like so:

$ docker run --rm --name blast-node1 \
    -p 7000:7000 \
    -p 8000:8000 \
    -p 9000:9000 \
    -v $(pwd)/etc/blast_mapping.json:/etc/blast_mapping.json \
    mosuka/blast:latest start \
      --id=node1 \
      --raft-address=:7000 \
      --http-address=:8000 \
      --grpc-address=:9000 \
      --data-directory=/tmp/blast/node1 \
      --mapping-file=/etc/blast_mapping.json

You can run a command inside the Docker container as follows:

$ docker exec -it blast-node1 blast node --grpc-address=:9000

Securing Blast

Blast supports HTTPS access, ensuring that all communication between clients and a cluster is encrypted.

Generating a certificate and private key

One way to generate the necessary resources is via openssl. For example:

$ openssl req -x509 -nodes -newkey rsa:4096 -keyout ./etc/blast_key.pem -out ./etc/blast_cert.pem -days 365 -subj '/CN=localhost'
Generating a 4096 bit RSA private key
............................++
........++
writing new private key to 'key.pem'

Secure cluster example

Start the nodes with HTTPS enabled and node-to-node encryption. The commands below assume that the HTTPS X.509 certificate and key generated above are at ./etc/blast_cert.pem and ./etc/blast_key.pem respectively.

$ ./bin/blast start \
             --id=node1 \
             --raft-address=:7000 \
             --http-address=:8000 \
             --grpc-address=:9000 \
             --peer-grpc-address=:9000 \
             --data-directory=/tmp/blast/node1 \
             --mapping-file=./etc/blast_mapping.json \
             --certificate-file=./etc/blast_cert.pem \
             --key-file=./etc/blast_key.pem \
             --common-name=localhost

$ ./bin/blast start \
             --id=node2 \
             --raft-address=:7001 \
             --http-address=:8001 \
             --grpc-address=:9001 \
             --peer-grpc-address=:9000 \
             --data-directory=/tmp/blast/node2 \
             --mapping-file=./etc/blast_mapping.json \
             --certificate-file=./etc/blast_cert.pem \
             --key-file=./etc/blast_key.pem \
             --common-name=localhost

$ ./bin/blast start \
             --id=node3 \
             --raft-address=:7002 \
             --http-address=:8002 \
             --grpc-address=:9002 \
             --peer-grpc-address=:9000 \
             --data-directory=/tmp/blast/node3 \
             --mapping-file=./etc/blast_mapping.json \
             --certificate-file=./etc/blast_cert.pem \
             --key-file=./etc/blast_key.pem \
             --common-name=localhost

You can access the cluster by adding the certificate flags, as in the following commands:

$ ./bin/blast cluster --grpc-address=:9000 --certificate-file=./etc/blast_cert.pem --common-name=localhost | jq .

$ curl -X GET https://localhost:8000/v1/cluster --cacert ./etc/blast_cert.pem | jq .
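
An application can trust the self-signed certificate in the same way curl does with --cacert. The Go sketch below builds an HTTP client whose only trusted root is the given PEM file. The function name newClientWithCA is hypothetical, and the certificate path and URL are taken from the examples above. If the file or the server is missing, the program prints the error and exits.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"os"
)

// newClientWithCA returns an HTTP client that trusts only the certificates
// found in caPEM, similar to curl's --cacert option.
func newClientWithCA(caPEM []byte) (*http.Client, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no certificate found in PEM data")
	}
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{RootCAs: pool},
		},
	}, nil
}

func main() {
	// Path assumed from the openssl example above.
	caPEM, err := os.ReadFile("./etc/blast_cert.pem")
	if err != nil {
		fmt.Println("read certificate:", err)
		return
	}
	client, err := newClientWithCA(caPEM)
	if err != nil {
		fmt.Println("build client:", err)
		return
	}
	resp, err := client.Get("https://localhost:8000/v1/cluster")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```

Recent Go releases verify host names against the certificate's subjectAltName entries only, so a certificate that carries just a common name, like the one produced by the openssl command above, may be rejected by this client. Adding a SAN for localhost when generating the certificate avoids that.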
