Big Whale is a distributed computing task scheduling system developed by Meiyou Big Data. It provides DAG execution scheduling for batch-processing tasks such as Spark and Flink, as well as status monitoring and scheduling for stream-processing tasks, and includes features such as duplicate-application detection and large-memory-application detection. The service is built on Spring Boot 2.0 and can be run directly after packaging.
Environment preparation
Java 1.8+
MySQL 5.1.0+
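As a quick sanity check before installing, a snippet along these lines can verify the Java requirement (a sketch only; it parses the version string printed by `java -version`):

```shell
#!/usr/bin/env bash
# Prints "OK" when the given Java version string satisfies 1.8+ (sketch).
java_ok() {
  case "$1" in
    1.8*|9*|1[0-9]*|2[0-9]*) echo "OK" ;;
    *) echo "too old" ;;
  esac
}

# Check the JVM on the PATH; note `java -version` prints to stderr.
ver=$(java -version 2>&1 | awk -F'"' '/version/ {print $2}')
echo "Java ${ver:-not found}: $(java_ok "$ver")"
```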
Install
1. Create database: big-whale
2. Run the database script: big-whale.sql
3. Configure the database account, password, and SMTP settings for the relevant Spring Boot environment.
4. Configuration: big-whale.properties
Configuration item description
ssh.user: SSH login user with script execution permissions (the platform runs all scripts as this unified user)
ssh.password: password for the SSH login user
dingding.enabled: whether to enable DingTalk alerts
dingding.watcher-token: DingTalk group robot token
yarn.app-memory-threshold: upper limit on Yarn application memory (unit: MB); -1 disables detection
yarn.app-white-list: Yarn application whitelist (applications on the list do not trigger an alarm even if their requested memory exceeds the limit)
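Taken together, the items above might look like the following in big-whale.properties. All values here are placeholders, and the comma-separated whitelist format is an assumption; the database and SMTP settings use the standard Spring Boot keys and belong in the corresponding environment configuration:

```properties
# SSH account the platform uses to run scripts (placeholder values)
ssh.user=bigwhale
ssh.password=change-me

# DingTalk alerting
dingding.enabled=true
dingding.watcher-token=your-robot-token

# Yarn checks: alert when an app requests more than 6144 MB; -1 disables
yarn.app-memory-threshold=6144
# Assumed comma-separated list of exempt application names
yarn.app-white-list=app-allowed-1,app-allowed-2
```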
5. Modify $FLINK_HOME/bin/flink (reference: flink). Because Flink can only read local jar packages when submitting a task, the script must download the jar package from HDFS and substitute the local path into the jar-path parameter before running the submit command.
6. Packaging: mvn clean package
Startup
1. Check whether port 17070 is occupied. If it is, stop the occupying process, or change the project's port configuration and repackage.
2. Copy big-whale.jar from the target directory and run: java -jar big-whale.jar
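The two steps above can be scripted roughly as follows (a sketch; the jar path and the use of bash's /dev/tcp probe are assumptions, and the launch line is left commented out):

```shell
#!/usr/bin/env bash
# Report whether port 17070 is free before launching the service.
port_free() {
  # A /dev/tcp connect succeeds only when something is listening (bash-specific).
  ! (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

if port_free 17070; then
  echo "port 17070 free"
  # nohup java -jar target/big-whale.jar > big-whale.log 2>&1 &
else
  echo "port 17070 in use; stop that process or change the port and repackage"
fi
```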
Initial configuration
1. Open: http://localhost:17070
2. Enter the account admin and password admin
3. Click Permission Management->User Management and change the current account's email address to a valid, existing address; otherwise email notifications will fail to send.
4. Add cluster
Cluster Management->Cluster Management->New
"Yarn management address" is the web UI address of the Yarn ResourceManager
"Package storage directory" is the HDFS path where uploaded packages are stored, e.g. /data/big-whale/storage
"Support Flink task proxy users", "Stream processing task blacklist", and "Batch processing task blacklist" are internally customized task-allocation rules; leave them blank
5. Add agent
Cluster management->Agent management->New
Multiple instances can be added (only IP addresses are supported; a port number can be specified, defaulting to 22). When a script is executed, one instance is picked at random; if it is unreachable, another is picked at random, and execution fails only when all instances are unreachable.
After binding to a cluster, the agent acts as one of the submitters of Spark and Flink tasks for that cluster.
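The selection rule described above can be sketched as a small bash function. This is only an illustration of the documented behavior, not the platform's actual code; reachability is probed with bash's /dev/tcp, and the example IPs are hypothetical:

```shell
#!/usr/bin/env bash
# Pick a reachable agent at random from host[:port] entries; fail if none responds.
pick_agent() {
  local entry host port
  for entry in $(printf '%s\n' "$@" | shuf); do
    host=${entry%%:*}
    port=${entry##*:}
    [ "$port" = "$entry" ] && port=22   # no port given: default to SSH port 22
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      echo "$entry"
      return 0
    fi
  done
  echo "no reachable agent" >&2
  return 1
}
```

Usage might look like `pick_agent 10.0.0.1 10.0.0.2:2222 || echo "execution failed"` (hypothetical addresses).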
6. Add cluster users
Cluster Management->Cluster User->Add
The semantics of this configuration are: the Yarn resource queue (--queue) and proxy user (--proxy-user) that a platform user may use on the selected cluster.
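For Spark, these two values typically end up on the submit command line. The sketch below only assembles such a command to show where they go; the queue name, proxy user, main class, and jar name are all hypothetical:

```shell
#!/usr/bin/env bash
# Values a cluster-user entry would supply (hypothetical examples).
QUEUE="root.etl"
PROXY_USER="etl_user"

# The platform would inject them as --queue and --proxy-user.
CMD="spark-submit --master yarn --queue ${QUEUE} --proxy-user ${PROXY_USER} --class com.example.Main app.jar"
echo "$CMD"
```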
7. Add calculation framework
Cluster Management->Computing Framework Management->Add
The submit commands for Spark or Flink tasks may differ within the same cluster. For example, Spark 1.6.0 submits with spark-submit, while Spark 2.1.0 submits with spark2-submit.