v0.4.1 - Azure update

Telegram Archive Server

A Telegram group chat search and archiving bot suited to CJK (Chinese, Japanese, Korean) environments.

Feature overview

Group member authentication: only group members can search
Imports historical chat records and removes duplicates automatically
Uses MeiliSearch for search, which indexes Chinese text well
Runs OCR on images and includes the text in search results (new images only; historical images are not yet supported)
Simple web interface that can display avatars
Search results link back to the message in the chat

Demo

Chat authentication

Click the [Search] button to automatically authenticate and open the search interface.

Search interface

Click a result's timestamp link to jump to that message in the chat.

Deployment

Prerequisites

You need:

A Bot account, with its token obtained in advance
A server reachable from the public network; HTTPS is required
A supergroup; only supergroups are currently supported
A MeiliSearch instance, with or without an API key configured
A Redis instance (optional, but without it messages may be lost if the process exits abnormally or restarts)

Configuration

Download the .env.example file and configure it by following the comments inside.

You can save it as .env or set the same values as environment variables.

Run

HTTPS

TAS does not provide a built-in HTTPS service. Use Caddy or similar software as a reverse proxy in front of TAS.

With Docker

docker run -d --restart=always --env-file=.env quay.io/oott123/telegram-archive-server

Of course, you can also run it using Kubernetes or docker-compose.

Using Source Code

If you don't have Docker or prefer not to use it, you can build and deploy from source. In that case you also need:

git
Node.js 18

git clone https://github.com/oott123/telegram-archive-server.git
cd telegram-archive-server
# git checkout vX.X.X
cp .env.example .env
vim .env
yarn
yarn build
yarn start

Usage

Send /search in the group. The Bot may prompt you to set its Domain; if so, follow the prompts.

Get user avatar

Users must meet the following criteria in order for their avatar to appear in search results:

Have interacted with the Bot (sent a message, or authorized login)
The avatar set by the user is publicly visible

Indexing rules for new records

Because MeiliSearch is inefficient at indexing new messages as they arrive, messages enter the index only when any of the following conditions is met:

No new message has been received for 60 seconds
100 messages are waiting to be indexed
The main process receives SIGINT

If Redis is not used to persist the message queue, messages that have not yet been indexed may be lost when the program exits abnormally or the server restarts.
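The flush conditions above can be sketched as a small in-memory buffer. This is an illustrative sketch only, not the TAS implementation: the class name IndexBuffer, its constructor parameters, and the onFlush callback are hypothetical, and only the 60-second and 100-message thresholds come from the rules above.

```typescript
// Hypothetical sketch: none of these names come from the TAS source code.
class IndexBuffer<T> {
  private pending: T[] = [];
  private idleTimer: ReturnType<typeof setTimeout> | undefined = undefined;
  private onFlush: (batch: T[]) => void;
  private maxPending: number;
  private idleMs: number;

  constructor(onFlush: (batch: T[]) => void, maxPending = 100, idleMs = 60000) {
    this.onFlush = onFlush; // called with each batch that should be indexed
    this.maxPending = maxPending; // flush once this many messages are waiting
    this.idleMs = idleMs; // flush after this long without a new message
  }

  add(message: T): void {
    this.pending.push(message);
    if (this.pending.length >= this.maxPending) {
      this.flush();
      return;
    }
    // Restart the idle countdown on every new message.
    if (this.idleTimer !== undefined) clearTimeout(this.idleTimer);
    this.idleTimer = setTimeout(() => this.flush(), this.idleMs);
  }

  // A SIGINT handler would call this before the process exits.
  flush(): void {
    if (this.idleTimer !== undefined) {
      clearTimeout(this.idleTimer);
      this.idleTimer = undefined;
    }
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.onFlush(batch);
  }
}

const indexed: string[][] = [];
const buffer = new IndexBuffer<string>((batch) => { indexed.push(batch); });
buffer.add("hello");
buffer.flush(); // indexed now holds one batch containing "hello"
```

Because the pending array lives only in memory, anything added but not yet flushed disappears if the process dies, which is the loss that a Redis-backed queue avoids.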

Import old chat history

Currently only supergroup import is supported.

In the desktop client, click the three-dot button, choose Export chat history, wait for the export to complete, and take the resulting result.json.

Then run:

curl \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AUTH_IMPORT_TOKEN" \
  -XPOST -T result.json \
  http://localhost:3100/api/v1/import/fromTelegramGroupExport

This imports the records. Note that only records from a single group can be imported at a time.

OCR text recognition (TBD)

Enabling the OCR queue requires Redis (it can share an instance with the cache) and a configured third-party recognition service. The recognition process is as follows:

Recognition and storage can run on instances with different roles: the Bot instance downloads images and stores the recognized text, while the OCR instance only needs access to the OCR service.

This design lets maintainers run recognition offline in batches (for example, run the recognition service on a preemptible instance and shut it down once the queue is empty) to reduce recognition costs.

If you use a third-party cloud service, you can simply turn off the OCR queue, or enable the Bot and OCR roles in the same instance.

Recognition services

Google Cloud Vision

Refer to Google Cloud Vision text recognition documentation and Google Cloud Vision billing rules. The configuration is as follows:

OCR_DRIVER=google
OCR_ENDPOINT=eu-vision.googleapis.com # or us-vision.googleapis.com; determines where Google stores and processes the data
GOOGLE_APPLICATION_CREDENTIALS=/path/to/google/credentials.json # JSON credentials file downloaded from the GCP console

PaddleOCR

You need an instance of paddleocr-web. The configuration is as follows:

OCR_DRIVER=paddle-ocr-web
OCR_ENDPOINT=http://127.0.0.1:8980/api

Azure OCR

Create an Azure Vision resource and configure the resource information as follows:

OCR_DRIVER=azure
OCR_ENDPOINT=https://tas.cognitiveservices.azure.com
OCR_CREDENTIALS=000000000000000000000000000000000

Enable different roles

docker run [...] dist/main ocr,bot
# or
node dist/main ocr,bot
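The role argument is a comma-separated list. As a minimal sketch of how such a value could be split into individual roles (the function name parseRoles is hypothetical, and this is not taken from the TAS source; the set of roles TAS actually accepts beyond ocr and bot is not stated here):

```typescript
// Hypothetical helper: splits an argument such as "ocr,bot" into role names.
function parseRoles(arg: string | undefined): string[] {
  if (!arg) return [];
  const roles = arg
    .split(",")
    .map((role) => role.trim())
    .filter((role) => role.length > 0);
  // Drop repeated entries while keeping first-seen order.
  return roles.filter((role, index) => roles.indexOf(role) === index);
}

// In a Node.js entry point this could be called as parseRoles(process.argv[2]).
console.log(parseRoles("ocr,bot"));
```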

Development

DEBUG=app:*,grammy* yarn start:debug

Front-end development

After search authentication succeeds, the server redirects to $HTTP_UI_URL/index.html with the following URL parameters:

tas_server - server base URL, in the form http://localhost:3100/api/v1
tas_indexName - group identifier, in the form supergroup1234567890
tas_authKey - JWT issued by the server, which can be used as the MeiliSearch API key
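A custom front end needs to read these three values when the page loads. The following is a minimal sketch, assuming the parameters arrive in the query string; the TasParams interface, the readTasParams function, and the example host ui.example.com are hypothetical, and only the three parameter names come from the list above.

```typescript
// Hypothetical helper; only the three parameter names come from the README.
interface TasParams {
  server: string; // e.g. http://localhost:3100/api/v1
  indexName: string; // e.g. supergroup1234567890
  authKey: string; // JWT issued by the server
}

function readTasParams(pageUrl: string): TasParams {
  // Assumption: the parameters are in the query string, not the URL fragment.
  const query = new URL(pageUrl).searchParams;
  const need = (name: string): string => {
    const value = query.get(name);
    if (!value) throw new Error("missing URL parameter: " + name);
    return value;
  };
  return {
    server: need("tas_server"),
    indexName: need("tas_indexName"),
    authKey: need("tas_authKey"),
  };
}

const example = readTasParams(
  "https://ui.example.com/index.html" +
    "?tas_server=http%3A%2F%2Flocalhost%3A3100%2Fapi%2Fv1" +
    "&tas_indexName=supergroup1234567890" +
    "&tas_authKey=header.payload.signature",
);
console.log(example.server, example.indexName);
```

In a browser, the same function could be called with window.location.href.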