Alluxio (formerly Tachyon) is a virtual distributed storage system. It bridges the gap between computing frameworks and storage systems, allowing applications to connect to numerous storage systems through a common interface. The Alluxio project grew out of Tachyon, a research project at UC Berkeley's AMPLab, where it served as the data layer of the Berkeley Data Analytics Stack (BDAS).
Alluxio's main features include:
1. Flexible file API: Alluxio's native API is similar to the java.io.File class, providing InputStream and OutputStream interfaces and efficient support for memory-mapped I/O. We recommend using this API to get the full functionality and best performance of Alluxio.
2. Hadoop-compatible file system interface: Alluxio provides a file system interface compatible with Hadoop HDFS, so Hadoop MapReduce and Spark can use Alluxio in place of HDFS.
3. Pluggable underlying storage: Alluxio supports persisting in-memory data to an underlying storage system, and provides a common interface that simplifies connecting to different underlying storage systems. Currently, Alluxio supports Microsoft Azure Blob Store, Amazon S3, Google Cloud Storage, OpenStack Swift, GlusterFS, HDFS, MapR-FS, Ceph, NFS, Alibaba OSS, MinIO, and single-node local file systems. Support for additional storage systems is planned.
4. Tiered storage: In addition to memory, Alluxio can manage local storage such as SSDs and HDDs to accelerate data access. If more granular control is required, the tiered storage feature can automatically move data between tiers so that hot data stays on the faster tier. Custom policies can be applied, and pinning allows users to explicitly control where data is stored.
5. Unified namespace: Alluxio manages data across different storage systems efficiently by mounting them into a single namespace. In addition, transparent naming preserves the file names and directory hierarchy of objects when they are persisted to the underlying storage system.
6. Web UI: Users can browse the file system through the Web UI. In debug mode, administrators can also view detailed information about each file, including its storage location and checkpoint path.
7. Command line: Users can also interact with Alluxio through ./bin/alluxio fs, for example to copy data into and out of the file system.
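
The mounting and transparent naming described under "Unified namespace" can be pictured with a small sketch. It is a toy model for illustration only, not Alluxio's implementation or API: the class name MountTableSketch, its methods, and the example URIs in the usage note are all hypothetical. It assumes that a path in the unified namespace is resolved by the longest mount point that covers it, and that the rest of the path is carried over unchanged, so file names and directory hierarchy are preserved in the underlying storage.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy mount table (hypothetical; not Alluxio's real implementation or API). */
final class MountTableSketch {
    // Mount point in the unified namespace -> root URI in the underlying storage.
    private final Map<String, String> mounts = new TreeMap<>();

    /** Registers an underlying-storage root at the given namespace path. */
    void mount(String mountPoint, String ufsRoot) {
        mounts.put(stripTrailingSlash(mountPoint), stripTrailingSlash(ufsRoot));
    }

    /** Maps a namespace path to an underlying-storage URI via the longest matching mount point. */
    String resolve(String path) {
        String best = null;
        for (String mountPoint : mounts.keySet()) {
            // The root mount "/" is stored as "", so it covers every absolute path.
            boolean covers = path.equals(mountPoint) || path.startsWith(mountPoint + "/");
            if (covers && (best == null || mountPoint.length() > best.length())) {
                best = mountPoint;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("No mount point covers " + path);
        }
        // Keep the relative path as-is so names and hierarchy are preserved.
        return mounts.get(best) + path.substring(best.length());
    }

    private static String stripTrailingSlash(String s) {
        return s.endsWith("/") ? s.substring(0, s.length() - 1) : s;
    }
}
```

For example, with "/" mounted on hdfs://namenode:8020/alluxio and "/s3" mounted on s3://bucket/data (both made-up locations), the sketch resolves /s3/2024/part-0.csv to s3://bucket/data/2024/part-0.csv and /users/a.txt to hdfs://namenode:8020/alluxio/users/a.txt.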