DataBand (data help) is a lightweight, one-stop big data platform. It covers rapid data collection and cleaning, task management, real-time stream and batch data analysis, data visualization, rapid data template development, an ETL tool set, and data science. We are committed to providing optimal solutions through smart applications, data analysis, and consulting services.
Storage
Distributed storage: HDFS, HBase
Row-oriented relational storage: MySQL, Oracle
Column storage: ClickHouse
Column family storage: HBase, Cassandra
Document stores: ElasticSearch, MongoDB
Computing
Computing engine: Presto, Hive
Stream processing: Storm, Flink
Data integration
Flume
Filebeat
Logstash
Front-end technology stack
Vue
Element UI
Backend technology stack
Spring Boot
Spring Cloud
MyBatis
Simulated data sources for generating big data test data (data preparation projects)
Data sources
databand-mock-api: API simulation tool that simulates business system APIs;
databand-mock-log: log simulation tool that generates large volumes of log data for debugging and testing, with support for Syslog, plain log files, CSV, JSON, MySQL writes, RPC writes, NetCat, etc.;
databand-mock-mq: log simulation tool that generates large volumes of log data for debugging and testing by writing to message queues such as RabbitMQ and Kafka;
databand-mock-hadoop: big data log simulation tool for HDFS and MapReduce;
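The kind of log simulation these mock modules describe can be sketched in a few lines. The example below is only an illustration, not DataBand's own code: the field names (`ts`, `level`, `user_id`, `path`), the value lists, and the function names are all assumptions made for the sketch. It produces fake access-log events and serializes them as JSON lines or CSV.

```python
import csv
import io
import json
import random
from datetime import datetime, timedelta, timezone

# Hypothetical value pools; a real generator would be configured per use case.
LEVELS = ["INFO", "WARN", "ERROR"]
PATHS = ["/index", "/login", "/cart", "/order"]
FIELDS = ["ts", "level", "user_id", "path"]


def mock_events(count, seed=0, start=datetime(2024, 1, 1, tzinfo=timezone.utc)):
    """Yield `count` fake access-log events, one per simulated second."""
    rng = random.Random(seed)  # seeded so repeated runs give identical data
    for i in range(count):
        yield {
            "ts": (start + timedelta(seconds=i)).isoformat(),
            "level": rng.choice(LEVELS),
            "user_id": rng.randint(1, 100),
            "path": rng.choice(PATHS),
        }


def to_json_lines(events):
    """Serialize events as newline-delimited JSON."""
    return "\n".join(json.dumps(e, sort_keys=True) for e in events)


def to_csv(events):
    """Serialize events as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(events)
    return buf.getvalue()
```

Seeding the generator keeps test data reproducible, which helps when the same mock input is replayed against different downstream jobs.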
Data collection and cleaning (collection and cleaning projects)
databand-etl-mysql_ods: collects and cleans MySQL data into the ODS temporary intermediate store (including Redis, Kafka, etc.);
databand-etl-mysql_olap: collects and cleans MySQL data into the OLAP data warehouse;
databand-etl-mysql_hadoop: collects and cleans MySQL data into Hadoop distributed storage;
databand-etl-logfile_ods: collects and cleans semi-structured log files (JSON, XML, log, and CSV data) into the ODS temporary intermediate store;
databand-etl-logfile_olap: collects and cleans semi-structured log file data into the OLAP data warehouse;
databand-etl-logfile_hadoop: collects and cleans log file data into Hadoop distributed storage;
databand-etl-mq_ods: collects data by consuming from MQ and loads it into the ODS store;
databand-etl-mq_olap: collects data by consuming from MQ and loads it into the OLAP store;
databand-etl-mq_hadoop: collects data by consuming from MQ and loads it into Hadoop;
databand-ml: data science project;
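As a rough illustration of the "collect and clean" step for semi-structured log files, the sketch below parses JSON or CSV lines, normalizes field types, and counts rejected rows. It is a minimal sketch under assumptions: the record schema (`ts`, `user_id`, `path`) and the function names are hypothetical and are not taken from the DataBand ETL modules.

```python
import csv
import io
import json


def parse_line(line, fmt, csv_fields=("ts", "user_id", "path")):
    """Parse one raw line into a dict, or return None if it is malformed."""
    line = line.strip()
    if not line:
        return None
    if fmt == "json":
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            return None
        return record if isinstance(record, dict) else None
    if fmt == "csv":
        row = next(csv.reader(io.StringIO(line)), None)
        if row is None or len(row) != len(csv_fields):
            return None
        return dict(zip(csv_fields, row))
    raise ValueError(f"unsupported format: {fmt}")


def clean(record):
    """Normalize field types; return None when a required field is unusable."""
    try:
        return {
            "ts": str(record["ts"]).strip(),
            "user_id": int(record["user_id"]),
            "path": str(record["path"]).strip().lower(),
        }
    except (KeyError, TypeError, ValueError):
        return None


def collect(lines, fmt):
    """Return (clean_rows, rejected_count) for an iterable of raw lines."""
    rows, rejected = [], 0
    for line in lines:
        parsed = parse_line(line, fmt)
        cleaned = clean(parsed) if parsed is not None else None
        if cleaned is None:
            rejected += 1  # malformed, blank, or missing a required field
        else:
            rows.append(cleaned)
    return rows, rejected
```

Returning the rejected count alongside the clean rows makes it easy to monitor data quality before rows are loaded into an ODS, OLAP, or Hadoop target.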
Data analysis jobs (scheduled job projects)
databand-job-springboot: scheduled job service that supports shell, Hive, Python, spark-sql, and Java JAR tasks.
databand-streamjob-springboot: streaming data job that consumes Kafka data and writes it to ClickHouse, MySQL, ES, etc.
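One way a scheduler can support several task types is to map each type to the command that runs it. The sketch below is an assumption about how such dispatch might look, not a description of databand-job-springboot: the `RUNNERS` table, the task type keys, and `build_command` are hypothetical names. The command-line flags used (`hive -f`, `spark-sql -f`, `java -jar`) are the standard ones for running a script file or a JAR.

```python
# Hypothetical mapping from task type to the command prefix that runs it.
RUNNERS = {
    "shell": ["sh"],
    "python": ["python"],
    "hive": ["hive", "-f"],
    "spark-sql": ["spark-sql", "-f"],
    "java-jar": ["java", "-jar"],
}


def build_command(task_type, target, args=()):
    """Return the argv list for running `target` as a task of `task_type`."""
    if task_type not in RUNNERS:
        raise ValueError(f"unsupported task type: {task_type}")
    # Building a list (not a shell string) avoids quoting problems.
    return [*RUNNERS[task_type], target, *args]
```

The resulting list can be passed to `subprocess.run(command, check=True)`, so a failed task surfaces as an exception that the scheduler can record.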
Data analysis portal (back-end management and front-end display projects)
databand-admin-ui: pure front-end UI project, separated from the back end, for data display (not yet developed);
databand-admin-thymeleaf: back-end management of permissions, relationships, and site configuration (front end and back end are not separated; under development), based on the Ruoyi framework;
databand-admin-api: data API service;
databand-admin-tools: BI toolset;
Real-time streaming data
databand-rt-flinkstreaming: Flink real-time data stream processing, mainly PV and UV statistics, covering basic usage of windows, aggregation, delay handling, watermarks, and checkpoints;
databand-rt-redis: cache storage used by real-time processing;
databand-rt-sparkstreaming: Spark real-time data stream processing with functionality similar to the Flink module, mainly using Structured Streaming;
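To make the PV/UV, window, and watermark concepts concrete, here is a small pure-Python sketch of event-time tumbling windows. It is a simplified model of what engines such as Flink do, not Flink code and not the DataBand implementation: the class name, the integer-second timestamps, and the rule "watermark = highest timestamp seen minus an allowed delay" are assumptions chosen for the illustration.

```python
from collections import defaultdict


class PvUvWindows:
    """Event-time tumbling windows counting page views (PV) and unique visitors (UV)."""

    def __init__(self, size, max_delay):
        self.size = size                # window length in seconds
        self.max_delay = max_delay      # tolerated out-of-orderness in seconds
        self.watermark = float("-inf")
        self.pv = defaultdict(int)      # window start -> page views
        self.users = defaultdict(set)   # window start -> distinct user ids
        self.dropped = 0                # events that arrived after their window closed

    def on_event(self, ts, user_id):
        """Process one event and return the (start, pv, uv) windows it closed."""
        start = ts - ts % self.size
        if start + self.size <= self.watermark:
            self.dropped += 1  # the window was already emitted, so the event is late
        else:
            self.pv[start] += 1
            self.users[start].add(user_id)
        # The watermark trails the highest timestamp seen by max_delay.
        self.watermark = max(self.watermark, ts - self.max_delay)
        return self._fire()

    def _fire(self):
        """Emit every window whose end is at or before the watermark."""
        ready = sorted(s for s in self.pv if s + self.size <= self.watermark)
        return [(s, self.pv.pop(s), len(self.users.pop(s))) for s in ready]

    def flush(self):
        """Close all remaining windows, for example at end of input."""
        self.watermark = float("inf")
        return self._fire()
```

With `size=10` and `max_delay=2`, an event at timestamp 12 moves the watermark to 10 and closes the window covering seconds 0 to 9; an event with timestamp 9 arriving afterwards is counted as dropped rather than changing an already emitted result.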