Ambar is an open-source document search engine with automated crawling, OCR, tagging and instant full-text search.
Ambar offers a new way to integrate full-text document search into your workflow.
docker-compose file
Tutorial: Mastering Ambar Search Queries
English ambar_en, Russian ambar_ru, German ambar_de, Italian ambar_it, Polish ambar_pl, Chinese ambar_cn, CJK ambar_cjk
Ambar 2.0 supports only local file system crawling. If you need to crawl an SMB share or an FTP location, mount it using standard Linux tools. Crawling is automatic and needs no schedule, because the crawlers monitor file system events and automatically process new, changed and removed files.
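To illustrate what "process new, changed and removed files" means in practice, here is a minimal sketch that compares two directory snapshots. It is not Ambar's crawler code: it swaps event monitoring for simple polling with the Python standard library, and the names `take_snapshot` and `diff_snapshots` are hypothetical.

```python
import os
from typing import Dict, Set, Tuple

# A snapshot maps each file path to its (mtime_ns, size) signature.
Snapshot = Dict[str, Tuple[int, int]]


def take_snapshot(root: str) -> Snapshot:
    """Walk `root` and record a (modification time, size) pair per file."""
    snapshot: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                stat = os.stat(path)
            except OSError:
                continue  # the file vanished between listing and stat
            snapshot[path] = (stat.st_mtime_ns, stat.st_size)
    return snapshot


def diff_snapshots(
    old: Snapshot, new: Snapshot
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, changed, removed) paths between two snapshots."""
    added = set(new) - set(old)
    removed = set(old) - set(new)
    changed = {p for p in set(old) & set(new) if old[p] != new[p]}
    return added, changed, removed
```

Calling `take_snapshot` on a mounted share at two points in time and passing both results to `diff_snapshots` yields the three sets a crawler would need to act on. A real event-based crawler avoids the repeated directory walk, which matters on large trees.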
Ambar supports large files (>30MB)
Supported file types:
Notice: Ambar requires Docker to run
You can build Docker images by yourself
All the images required to run Ambar can be built locally. In general, each image can be built by navigating into the directory of the component in question, performing the required compilation steps and building the image like this:
# From project root
$ cd FrontEnd
$ docker build . -t <image_name>
The resulting image can be referred to by the name specified, and run by the containerization tooling of your choice.
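If you build several components, the same `docker build` call can be scripted. The following is a rough sketch under assumptions: the component list contains only the directories named in this README, the `local/ambar-` tag prefix is hypothetical, and any host-side compilation a component needs must already be done.

```python
import subprocess
from pathlib import Path
from typing import List

# Component directories mentioned in this README; extend as needed.
COMPONENTS = ["FrontEnd", "Pipeline", "LocalCrawler"]
# Hypothetical tag prefix, not an official Ambar image name.
TAG_PREFIX = "local/ambar-"


def build_command(project_root: str, component: str) -> List[str]:
    """Return the `docker build` argument list for one component directory."""
    context = Path(project_root) / component
    tag = TAG_PREFIX + component.lower()  # image names must be lowercase
    return ["docker", "build", "-t", tag, str(context)]


def build_all(project_root: str, dry_run: bool = True) -> List[List[str]]:
    """Print each build command, or run it when dry_run is False."""
    commands = [build_command(project_root, c) for c in COMPONENTS]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failed build
    return commands
```

Running `build_all(".")` from the project root only prints the commands. Pass `dry_run=False` to execute them once Docker is available.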
In order to use a local Dockerfile with docker-compose, simply change the image option to build, setting the value to the relative path of the directory containing the Dockerfile. Then run docker-compose build to build the relevant images. For example:
# docker-compose.yml from project root, referencing local Dockerfiles
pipeline0:
  build: ./Pipeline/
  image: chazu/ambar-pipeline
localcrawler:
  build: ./LocalCrawler/
Note that some of the components require compilation or other build steps to be performed on the host before the Docker images can be built. For example, for FrontEnd:
# Assuming a suitable version of node.js is installed (docker uses 8.10)
$ npm install
$ npm run compile
Then follow these instructions: https://ambar.cloud/docs/installation
Yes, it's fully open-source.
Yes, it is forever free and open-source.
Yes, it performs OCR on images (jpg, tiff, bmp, etc.) and PDFs. OCR is performed by the well-known open-source library Tesseract. We tuned it to achieve the best performance and quality on scanned documents. You can easily find all files on which OCR was performed with the tags:ocr query.
Supported languages: Eng, Rus, Ita, Deu, Fra, Spa, Pl, Nld.
Yes!
Yes, it can search through any PDF, even badly encoded ones or ones with scans inside. We did our best to make search over any kind of PDF document smooth.
It's limited by the amount of RAM on your machine; typically the limit is 500MB. It's an awesome result, as typical document management systems offer a 30MB maximum file size for processing.
Change Log
Privacy Policy
MIT License