Awesome Distributed Systems
This repository contains a list of open source distributed system projects in various programming languages, which may be useful for better understanding how to build distributed services.
Databases
- (Golang) Jocko - a Kafka/distributed commit log service in Go. [Serf + Raft]
- (Golang) oklog - a distributed and coordination-free log management system for big ol' clusters [Archived]
- (Golang) elasticell - a distributed HA Redis-compatible NoSQL database with strong consistency and reliability
- (Erlang) CouchDB - a highly available, partition tolerant, eventually consistent document database. Supports master-master setups with automatic conflict detection.
- (Java) Apache HBase - the Hadoop database, a distributed, scalable, big data store. Useful when random, realtime read/write access to big data is needed
- (Golang) Tair - a high-performance and high-availability distributed fast-access memory (MDB)/persistent (LDB) storage service
- (Golang) immudb - an immutable database based on zero trust, Key/Value & SQL, tamperproof, data change history
- (Rust) toydb - distributed SQL database in Rust, written as a learning project
- (Rust) DB3 Network - a decentralized Firebase Firestore alternative
- (Python) ZODB - an ACID transactional object-oriented database
- (Golang) requiemdb - a permanent storage for OTEL data
Key-Value Databases
- (C) memcached - a high performance multithreaded event-based key/value cache store intended to be used in a distributed system
- (C) redis - an in-memory database with various value types that persists on disk
- (Rust) TiKV - a distributed transactional key-value database, originally created to complement TiDB
- (C++) leveldb - a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values
- (Golang) goleveldb - a LevelDB implemented in Golang
- (Golang) summitdb - an in-memory, NoSQL key/value database. It persists to disk, uses the Raft consensus algorithm, is ACID compliant, and is built on a transactional and strongly-consistent model. It supports custom indexes, geospatial data, JSON documents, and user-defined JS scripting
- (Python) pupdb - a simple file-based key-value database
- (Python) pickledb - an open source key-value store using Python's json module
- (C++) KeyDB - a faster drop-in multithreaded alternative to Redis
- (C++) Dragonfly - an in-memory data store fully compatible with Redis and Memcached and designed using modern algorithms
- (Golang) BadgerDB - an embeddable, persistent and fast key-value (KV) database written in pure Go
- (Golang) BuntDB - a low-level, in-memory, key/value store in pure Go. It persists to disk, is ACID compliant, and uses locking for multiple readers and a single writer. It supports custom indexes and geospatial data.
- (Rust) ConstDB - a redis-like cache store that implements CRDTs and active-active replications.
- (Golang) GhostDB - a distributed, in-memory, general purpose key-value data store that delivers microsecond performance at any scale
- (Dart) Hive - a lightweight and blazing fast key-value database written in pure Dart. Inspired by Bitcask
- (Golang) rosedb - a fast, stable, and embedded NoSQL database based on bitcask, supports a variety of data structures such as string, list, hash, set, and sorted set
- (Rust) PumpkinDB - an immutable Ordered Key-Value Database Engine
- (Golang) FlashDB - a simple, in-memory, key/value store in pure Go. It persists to disk, is ACID compliant, and uses locking for multiple readers and a single writer. It supports Redis-like operations for data structures like SET, SORTED SET, HASH and STRING
- (PHP) Lazer - a PHP flat file database based on JSON files
- (Golang) Scribble - a tiny JSON database in Golang
- (Golang) FlyDB - a high-performance KV storage engine based on the Bitcask paper that supports the Redis protocol and the corresponding data structures
- (Rust) Engula - a distributed key-value store, used as a cache, database, and storage engine
- (Golang) Dice - an extremely simple Golang-based in-memory KV store that speaks Redis dialect
Relational, SQL, NewSQL Databases
- (Golang) CockroachDB - a distributed fault-tolerant SQL database built on a transactional and strongly-consistent key-value store
- (Golang) YugabyteDB - a cloud native distributed SQL database for mission-critical applications
- (Golang) RQLite - a lightweight, distributed relational database, which uses SQLite as its storage engine
- (Golang) Kingbus - a distributed MySQL binlog store based on raft [Raft]
- (C++) YDB - an open-source distributed SQL database that combines high availability and scalability with strict consistency and ACID transactions
- (Golang) RadonDB - an open source, Cloud-native MySQL database for unlimited scalability and performance
NoSQL, Document Databases
- (C++) MongoDB - document database designed for ease of development and scaling
- (Golang) FerretDB - a proxy that converts MongoDB 6.0+ wire protocol queries to SQL, using PostgreSQL as a database engine
- (C#) LiteDB - NoSQL Document Store in a single data file
- (Python) tinydb - a lightweight document oriented database written in pure Python
- (PHP) SleekDB - a simple flat file NoSQL-like database implemented in PHP without any third-party dependencies that stores data in plain JSON files
- (Rust) BonsaiDB - an ACID, transactional KV or document dev-friendly database with configurable delayed on-disk data storing
- (Golang) CloverDB - a lightweight document-oriented NoSQL database written in pure Golang
Graph Databases
- (Java) neo4j - Graph Database
- (Python) edgedb - a graph-relational database
- (C++) nebula - a distributed, fast open-source graph database featuring horizontal scalability and high availability
- (Golang) EliasDB - a graph-based lightweight database
Time Series
- (Golang) VictoriaMetrics - fast, cost-effective monitoring solution and time series database
- (Golang) influxdb - scalable datastore for metrics, events, and real-time analytics
- (Java) trino - fast distributed SQL query engine for big data analytics
- (Java) Apache Doris - an easy-to-use, high performance and unified analytics database
- (Scala) FiloDB - Distributed, Prometheus-compatible, real-time, in-memory, massively scalable, multi-schema time series / event / operational database
- (Rust) ceresdb - high-performance, distributed, schema-less, cloud native time-series database that can handle both time-series and analytics workloads
- (Golang) tstorage - a lightweight local on-disk storage engine for time-series data with a straightforward API
- (Rust) CnosDB - a high-performance, high-compression, and easy-to-use open-source distributed time-series database. Used in fields such as IoT, industrial internet, connected cars, and IT operations
- (Golang) LinDB - a scalable, high performance, high availability distributed time series database
Column Databases
- (Java) Apache Cassandra - a highly-scalable partitioned row store. Rows are organized into tables with a required primary key
- (C++) scylladb - a real-time big data database that is API-compatible with Apache Cassandra and Amazon DynamoDB
- (Golang) FrostDB - an embeddable wide-column columnar database written in Go
Permission Databases
- (Golang) SpiceDB - a Google Zanzibar-inspired database system for creating and managing security-critical application permissions
- (Golang) Keto - a Google Zanzibar-inspired open source permission database that ships gRPC and REST APIs, NewSQL support, and an easy, granular permission language. Supports ACL and RBAC
Analytical Databases
- (C++) BaikalDB - a distributed HTAP MySQL-compatible database designed for petabyte scale
- (Golang) AresDB - a GPU-powered real-time analytics storage and query engine
Vectors
- (Rust) Qdrant - a vector similarity search engine and vector database
- (Golang) milvus - an open-source vector database built to power embedding similarity search and AI applications
- (Golang) Weaviate - an open source vector database that stores both objects and vectors
- (Golang) tobias-mayer/vector-db - a simple vector database that can be used to search for similar vectors in logarithmic time
- (Rust) DANNY - a decentralized vector database for building vector search applications
Gateways
- (Golang) Glide - an open reliable fast LLM/model gateway for rapid development of GenAI apps
- (Golang) Traefik - a cloud-native app proxy
- (Lua) Kong - a cloud-native feature-rich API gateway
- (Golang) Skipper - an HTTP router and reverse proxy for service composition
- (Golang) janus - a lightweight API gateway and management platform
- (Golang) Lura - ultra performance API gateway with middlewares
- (Python) MLflow Gateway - an LLM proxy
Locking
- (Golang) etcd - distributed reliable key-value store for the most critical data of a distributed system [Raft + gRPC]
- (Java) Apache Zookeeper - highly reliable distributed coordination
- (Golang) chubby - A (very simplified) implementation of Chubby, Google's distributed lock service
Streaming
- (Java) Kafka - a distributed, highly scalable, elastic, fault-tolerant, and secure event streaming platform
- (Python) faust - a distributed stream processing library that ports the ideas from Kafka Streams to Python
- (Golang) Liftbridge - lightweight, fault-tolerant message streams, implemented as a durable stream augmentation for the NATS messaging system
- (Rust) RisingWave - a distributed SQL database for stream processing, designed to reduce the complexity and cost of building real-time applications
Schedulers
- (Golang) dkron - a distributed, fault tolerant job scheduling system for cloud native environments
- (Python) Celery - a distributed task queue
- (Python) Apache Airflow - a platform to programmatically author, schedule, and monitor workflows
Queues
- (Golang) nsq - realtime fault tolerant distributed messaging platform designed to operate at scale, handling billions of messages per day
- (Golang) Sandglass - distributed, horizontally scalable, persistent, time ordered message queue
- (Golang) dnpipes - distributed version of Unix named pipes comparable to AWS SQS
- (PHP) GatewayWorker - distributed realtime messaging framework based on workerman
- (C++) ZeroMQ - abstraction of asynchronous message queues, multiple messaging patterns, message filtering (subscriptions), seamless access to multiple transport protocols and more
- (Java) Apache Pulsar - distributed pub-sub messaging platform with a very flexible messaging model and an intuitive client API
- (Java) Apache ActiveMQ - high performance Apache 2.0 licensed Message Broker
Search Engines
- (Java) ElasticSearch - distributed, RESTful search and analytics engine
- (Java) Apache Lucene - a high-performance, full featured text search engine library
- (Rust) MeiliSearch - Lightning Fast, Ultra Relevant, and Typo-Tolerant Search Engine
- (JS) FlexSearch - memory-flexible full-text search library
- (Golang) RiotSearch - distributed, simple and efficient full text search engine
- (C++) Typesense - fast, typo tolerant, fuzzy search engine
- (Rust) Sonic - fast, lightweight & schema-less search backend. An alternative to Elasticsearch that runs on a few MBs of RAM
File Systems
- (Golang) JuiceFS - Hadoop-compatible AWS S3-compatible high-performance POSIX file system
- (Golang) SeaweedFS - a simple Hadoop-compatible AWS S3-compatible distributed highly scalable distributed file system
- (C) GlusterFS - distributed storage that can scale to several petabytes
- (C++) LizardFS - highly reliable, scalable and efficient distributed file system. It spreads data over a number of physical servers, making it visible to an end user as a single file system.
Service Discovery
- (Golang) sleuth - master-less peer-to-peer autodiscovery and RPC between HTTP services that reside on the same network
Data Processing
- (Scala) Apache Spark - unified analytics engine for large-scale data processing
TerminusDB
- (Prolog) terminusdb - distributed database with a collaboration model
OS
- (C) HarveyOS - distributed operating system
Frameworks
- (Golang) Go Micro - framework for distributed systems development. Provides the core requirements for distributed systems development, including RPC and event-driven communication
- (Golang) ergo - port of Erlang/OTP approaches in Golang
- (Golang) gosiris - an actor framework for Golang
- (Python) cotyledon - a framework for defining long-running services. It provides handling of Unix signals, spawning of workers, supervision of child processes, daemon reloading, sd-notify, rate limiting for worker spawning, and more.
- (Java) atomix - fully featured framework for building fault-tolerant distributed systems [REST + Raft]
- (Kotlin) orbit - virtual actor framework for building distributed systems
- (JS) hemera - A Node.js microservices toolkit for the NATS messaging system [RPC]
- (Python) Tooz - a coordination API that centralizes the most common distributed primitives, such as group membership, lock service and leader election, helping developers build distributed applications
- (C++) Nebula - powerful framework for building highly concurrent, distributed, and resilient message-driven applications
- (Golang) Service Weaver - a framework for writing an application as a modular binary and deploying it as a set of microservices
- (Golang) Dapr - portable, serverless, event-driven runtime that works as a sidecar and makes it easy for developers to build resilient, stateless and stateful microservices
Components
- (Golang) Dragonboat - a high performance multi-group Raft consensus library in pure Go
- (Golang) Golimit - Uber ringpop based distributed and decentralized rate limiter
- (Python) Tenacity - general-purpose retrying library
- (Elixir) ex_hash_ring - pure Elixir consistent hash ring implementation based on the excellent C hash-ring lib
- (Elixir) raft - Raft consensus implementation
- (C++) NuRaft - Raft implementation derived from the cornerstone project
- (Python) Hyx - Lightweight fault tolerance primitives for your resilient and modern Python microservices
- (Python) Migdalor - a Kubernetes native peer discovery for Python asyncio nodes
- (Golang) skiplist - a Golang implementation of the skiplist data structure
- (Java) Waltz - a quorum-based distributed write-ahead log for replicating transactions
Other Resources
- awesome-scalability - Reading list for illustrating the patterns of scalable, reliable, and performant large-scale systems
- awesome-distributed-systems - curated list of awesome material on distributed systems
- awesome-database-learning - a list of learning materials to understand databases internals
- (C/C++) (Book) Build Your Own Redis with C/C++
- (C) (Article) Writing a sqlite clone from scratch in C
- Berkeley CS186: Introduction to Database Systems
- MIT 6.830: Database Systems