About The Project
Latest updates on our blog
Example
Vision
Features
Getting Started
Usage and concepts
Roadmap
Contributing
License
While working with other Python-based tooling, frustrations arose around performance, stability, and ease of use. Thus, Swiftide was born. Swiftide's goal is to offer a fully fledged retrieval augmented generation library that is fast, easy to use, reliable, and easy to extend.
Part of the bosun.ai project, an upcoming platform for autonomous code improvement.
We <3 feedback: project ideas, suggestions, and complaints are very welcome. Feel free to open an issue or contact us on discord.
Great starting points are this readme, swiftide.rs, the examples folder, our blog at bosun.ai, and in-depth tutorials at swiftide-tutorial.
Caution
Swiftide is under heavy development and can have breaking changes while we work towards 1.0. Documentation here might fall short of all features and, despite our efforts, be slightly outdated. Expect bugs. We recommend always keeping an eye on our GitHub and API documentation. If you find an issue or have any kind of feedback, we'd love to hear from you in an issue.
Evaluate Swiftide pipelines with Ragas (2024-09-15)
Release - Swiftide 0.12 (2024-09-13)
Local code intel with Ollama, FastEmbed and OpenTelemetry (2024-09-04)
Release - Swiftide 0.9 (2024-09-02)
Bring your own transformers (2024-08-13)
Release - Swiftide 0.8 (2024-08-12)
Release - Swiftide 0.7 (2024-07-28)
Building a code question answering pipeline (2024-07-13)
Release - Swiftide 0.6 (2024-07-12)
Release - Swiftide 0.5 (2024-07-1)
Indexing a local code project, chunking into smaller pieces, enriching the nodes with metadata, and persisting into Qdrant:
indexing::Pipeline::from_loader(FileLoader::new(".").with_extensions(&["rs"]))
    .with_default_llm_client(openai_client.clone())
    .filter_cached(Redis::try_from_url(
        redis_url,
        "swiftide-examples",
    )?)
    .then_chunk(ChunkCode::try_for_language_and_chunk_size(
        "rust",
        10..2048,
    )?)
    .then(MetadataQACode::default())
    .then(move |node| my_own_thing(node))
    .then_in_batch(Embed::new(openai_client.clone()))
    .then_store_with(
        Qdrant::builder()
            .batch_size(50)
            .vector_size(1536)
            .build()?,
    )
    .run()
    .await?;
Querying for an example on how to use the query pipeline:
query::Pipeline::default()
    .then_transform_query(GenerateSubquestions::from_client(
        openai_client.clone(),
    ))
    .then_transform_query(Embed::from_client(
        openai_client.clone(),
    ))
    .then_retrieve(qdrant.clone())
    .then_answer(Simple::from_client(openai_client.clone()))
    .query("How can I use the query pipeline in Swiftide?")
    .await?;
You can find more examples in /examples.
Our goal is to create a fast, extendable platform for Retrieval Augmented Generation to further the development of automated AI applications, with an easy-to-use and easy-to-extend API.
tracing is supported for logging and tracing; see /examples and the tracing crate for more information.

Feature | Details |
---|---|
Supported Large Language Model providers | OpenAI (and Azure): all models and embeddings; AWS Bedrock: Anthropic and Titan; Groq: all models; Ollama: all models |
Loading data | Files; scraping; Fluvio; Parquet; other pipelines and streams |
Transformers and metadata generation | Generate questions and answers for both text and code (Hyde); summaries, titles and queries via an LLM; extract definitions and references with tree-sitter |
Splitting and chunking | Markdown; text (text_splitter); code (with tree-sitter) |
Storage | Qdrant; Redis; LanceDB |
Query pipeline | Similarity and hybrid search, query and response transformations, and evaluation |
Make sure you have the Rust toolchain installed. rustup is the recommended approach.
To use OpenAI, an API key is required. Note that by default async_openai uses the OPENAI_API_KEY environment variable.
Other integrations might have their own requirements.
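Since a missing key only surfaces once the first request is made, it can help to check for it at startup. The following is a minimal, std-only sketch of such a check; `resolve_api_key` is a hypothetical helper written for this illustration and is not part of Swiftide or async_openai.

```rust
use std::env;

/// Hypothetical helper: accepts the raw value of an environment variable
/// and rejects a missing or blank key with a readable message.
fn resolve_api_key(value: Option<String>) -> Result<String, String> {
    match value {
        Some(key) if !key.trim().is_empty() => Ok(key),
        _ => Err("OPENAI_API_KEY is not set or empty".to_string()),
    }
}

fn main() {
    // `env::var` returns an error when the variable is unset; `.ok()` maps that to `None`.
    match resolve_api_key(env::var("OPENAI_API_KEY").ok()) {
        Ok(_) => println!("OPENAI_API_KEY found"),
        Err(message) => eprintln!("{message}"),
    }
}
```

Keeping the check separate from the `env::var` call makes it easy to test without touching the process environment.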
Set up a new Rust project
Add swiftide
cargo add swiftide
Enable the features of integrations you would like to use in your Cargo.toml
Write a pipeline (see our examples and documentation)
Before building your streams, you need to enable and configure any integrations required. See /examples.
We have a lot of examples; please refer to /examples and the documentation.
Note
No integrations are enabled by default, as some are code heavy. We recommend that you cherry-pick the integrations you need. By convention, feature flags have the same name as the integration they represent.
An indexing stream starts with a Loader that emits Nodes. For instance, with the FileLoader each file is a Node.
You can then slice and dice, augment, and filter nodes. Each different kind of step in the pipeline requires different traits. This enables extension.
Nodes have a path, chunk and metadata. Currently metadata is copied over when chunking and always embedded when using the OpenAIEmbed transformer.
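To make the node model concrete, here is a small std-only sketch of a node with a path, a chunk, and metadata, plus a chunker that copies the metadata onto every piece it emits. `ToyNode` and `chunk_by_lines` are made up for this illustration; Swiftide's own node type and chunkers have more fields and different signatures.

```rust
use std::collections::BTreeMap;
use std::path::PathBuf;

/// Toy stand-in for a node: a path, a chunk of content, and metadata.
#[derive(Debug, Clone, PartialEq)]
struct ToyNode {
    path: PathBuf,
    chunk: String,
    metadata: BTreeMap<String, String>,
}

/// Splits a node's chunk into pieces of at most `max_lines` lines.
/// Every piece keeps the parent's path and gets a copy of its metadata.
fn chunk_by_lines(node: &ToyNode, max_lines: usize) -> Vec<ToyNode> {
    let lines: Vec<&str> = node.chunk.lines().collect();
    lines
        .chunks(max_lines.max(1)) // guard against a chunk size of zero
        .map(|piece| ToyNode {
            path: node.path.clone(),
            chunk: piece.join("\n"),
            metadata: node.metadata.clone(),
        })
        .collect()
}

fn main() {
    let mut metadata = BTreeMap::new();
    metadata.insert("language".to_string(), "rust".to_string());
    let file = ToyNode {
        path: PathBuf::from("src/lib.rs"),
        chunk: "fn a() {}\nfn b() {}\nfn c() {}".to_string(),
        metadata,
    };

    for piece in chunk_by_lines(&file, 2) {
        println!("{:?}: {:?} {:?}", piece.path, piece.chunk, piece.metadata);
    }
}
```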
(impl Loader): starting point of the stream, creates and emits Nodes
(impl NodeCache): filters cached nodes
(impl Transformer): transforms the node and puts it on the stream
(impl BatchTransformer): transforms multiple nodes and puts them on the stream
(impl ChunkerTransformer): transforms a single node and emits multiple nodes
(impl Storage): stores the nodes in a storage backend; this can be chained

Additionally, several generic transformers are implemented. They take implementers of SimplePrompt and EmbedModel to do their work.
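The division of labour between the step kinds listed above can be modelled with a few small traits. The sketch below is a deliberately simplified, synchronous toy: all `Toy*` names are invented for this illustration, nodes are held eagerly in a `Vec` rather than streamed, and caching and batching are left out. Only the builder method names mirror the ones used in the indexing example earlier in this readme.

```rust
/// Toy node that only carries text.
#[derive(Debug, Clone, PartialEq)]
struct ToyNode(String);

// One trait per kind of step (hypothetical, simplified signatures).
trait ToyLoader { fn load(&self) -> Vec<ToyNode>; }
trait ToyTransformer { fn transform(&self, node: ToyNode) -> ToyNode; }
trait ToyChunker { fn chunk(&self, node: ToyNode) -> Vec<ToyNode>; }
trait ToyStorage { fn store(&mut self, node: ToyNode); }

struct ToyPipeline { nodes: Vec<ToyNode> }

impl ToyPipeline {
    /// The loader is the starting point and emits the initial nodes.
    fn from_loader(loader: impl ToyLoader) -> Self {
        Self { nodes: loader.load() }
    }
    /// One node in, one node out.
    fn then(mut self, step: impl ToyTransformer) -> Self {
        self.nodes = self.nodes.into_iter().map(|n| step.transform(n)).collect();
        self
    }
    /// One node in, many nodes out.
    fn then_chunk(mut self, step: impl ToyChunker) -> Self {
        self.nodes = self.nodes.into_iter().flat_map(|n| step.chunk(n)).collect();
        self
    }
    /// Terminal step: hand every node to the storage backend.
    fn then_store_with(self, storage: &mut impl ToyStorage) {
        for node in self.nodes { storage.store(node); }
    }
}

struct StaticLoader(Vec<&'static str>);
impl ToyLoader for StaticLoader {
    fn load(&self) -> Vec<ToyNode> {
        self.0.iter().map(|s| ToyNode((*s).to_string())).collect()
    }
}

struct SplitWords;
impl ToyChunker for SplitWords {
    fn chunk(&self, node: ToyNode) -> Vec<ToyNode> {
        node.0.split_whitespace().map(|w| ToyNode(w.to_string())).collect()
    }
}

struct Uppercase;
impl ToyTransformer for Uppercase {
    fn transform(&self, node: ToyNode) -> ToyNode { ToyNode(node.0.to_uppercase()) }
}

#[derive(Default)]
struct MemoryStorage(Vec<ToyNode>);
impl ToyStorage for MemoryStorage {
    fn store(&mut self, node: ToyNode) { self.0.push(node); }
}

fn run_demo() -> Vec<String> {
    let mut storage = MemoryStorage::default();
    ToyPipeline::from_loader(StaticLoader(vec!["hello world", "swiftide"]))
        .then_chunk(SplitWords)
        .then(Uppercase)
        .then_store_with(&mut storage);
    storage.0.into_iter().map(|n| n.0).collect()
}

fn main() {
    println!("{:?}", run_demo()); // prints ["HELLO", "WORLD", "SWIFTIDE"]
}
```

Because each builder method only asks for the trait it needs, a new loader, chunker, or storage backend slots in without touching the pipeline itself.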
Warning
Because the pipeline is fast, chunking before adding metadata triggers rate limit errors on OpenAI very quickly, especially with faster models like 3.5-turbo. Be aware.
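A rough back-of-the-envelope calculation shows why the order matters. The sketch below assumes that a metadata transformer issues exactly one LLM request per node it sees, which is a simplification for illustration; the function names and the project size are made up.

```rust
/// Requests when metadata is generated once per file, before chunking
/// (the chunks then inherit the copied metadata).
fn requests_metadata_first(files: usize) -> usize {
    files
}

/// Requests when files are chunked first and metadata is generated
/// for every resulting chunk.
fn requests_chunk_first(files: usize, chunks_per_file: usize) -> usize {
    files * chunks_per_file
}

fn main() {
    let (files, chunks_per_file) = (200, 25); // made-up project size
    println!("metadata first: {} requests", requests_metadata_first(files));
    println!("chunk first:    {} requests", requests_chunk_first(files, chunks_per_file));
}
```

With these made-up numbers, chunking first multiplies the request count by the average number of chunks per file, and a fast pipeline sends those requests in a short burst.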
A query stream starts with a search strategy. In the query pipeline, a Query goes through several stages. Transformers and retrievers work together to get the right context into a prompt before generating an answer. Transformers and retrievers operate on different stages of the Query via a generic state machine. Additionally, the search strategy is generic over the pipeline, and retrievers need to be implemented specifically for each strategy.
That sounds like a lot, but tl;dr: the query pipeline is fully and strongly typed.
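One common way to get this kind of compile-time guarantee in Rust is the typestate pattern, where the current stage is a type parameter and each method is only defined for the stages in which it makes sense. The sketch below illustrates the idea with invented names (`ToyQuery`, `Pending`, `Retrieved`, `Answered`); it is not a copy of Swiftide's query types.

```rust
use std::marker::PhantomData;

// Hypothetical marker types, one per stage of the toy query.
struct Pending;
struct Retrieved;
struct Answered;

/// Toy query whose current stage is tracked in the type parameter.
struct ToyQuery<State> {
    text: String,
    documents: Vec<String>,
    answer: String,
    _state: PhantomData<State>,
}

impl ToyQuery<Pending> {
    fn new(text: &str) -> Self {
        ToyQuery {
            text: text.to_string(),
            documents: Vec::new(),
            answer: String::new(),
            _state: PhantomData,
        }
    }

    /// Query transformation: only available before retrieval.
    fn transformed(mut self, f: impl Fn(&str) -> String) -> Self {
        self.text = f(&self.text);
        self
    }

    /// Retrieval consumes the pending query and returns the next stage.
    fn retrieved(self, documents: Vec<String>) -> ToyQuery<Retrieved> {
        ToyQuery { text: self.text, documents, answer: self.answer, _state: PhantomData }
    }
}

impl ToyQuery<Retrieved> {
    /// Answering is only possible once documents have been retrieved.
    fn answered(self, answer: String) -> ToyQuery<Answered> {
        ToyQuery { text: self.text, documents: self.documents, answer, _state: PhantomData }
    }
}

impl ToyQuery<Answered> {
    fn answer(&self) -> &str {
        &self.answer
    }
}

fn run_toy_query(question: &str) -> String {
    let query = ToyQuery::<Pending>::new(question)
        .transformed(|q| q.to_lowercase())
        .retrieved(vec!["doc about pipelines".to_string()]);
    let answer = format!("{} -> {}", query.text, query.documents.join("; "));
    // Calling `.answered(..)` on a `ToyQuery<Pending>` would not compile.
    query.answered(answer).answer().to_string()
}

fn main() {
    println!("{}", run_toy_query("How Do I Query?"));
}
```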
Query pipelines can also be evaluated, e.g. with Ragas.
Similar to the indexing pipeline, each step is governed by simple traits, and closures implement these traits as well.
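Letting closures stand in for a trait is typically done with a blanket implementation over `Fn`. The following sketch shows the mechanism with a hypothetical `ToyTransformQuery` trait; the real Swiftide traits are async and operate on typed queries, so treat the signatures here as assumptions.

```rust
/// Hypothetical, simplified step trait.
trait ToyTransformQuery {
    fn transform(&self, query: String) -> String;
}

/// Blanket implementation: any matching closure or function is a step.
impl<F> ToyTransformQuery for F
where
    F: Fn(String) -> String,
{
    fn transform(&self, query: String) -> String {
        self(query)
    }
}

/// A regular struct can implement the same trait.
struct TrimWhitespace;

impl ToyTransformQuery for TrimWhitespace {
    fn transform(&self, query: String) -> String {
        query.trim().to_string()
    }
}

/// Accepts structs and closures alike.
fn apply(step: &impl ToyTransformQuery, query: String) -> String {
    step.transform(query)
}

fn demo(input: &str) -> String {
    let lowercase = |q: String| q.to_lowercase();
    let trimmed = apply(&TrimWhitespace, input.to_string());
    apply(&lowercase, trimmed)
}

fn main() {
    println!("{:?}", demo("  How Do I Query?  "));
}
```

A closure is handy for a one-off tweak, while a struct is the better fit once a step needs configuration or its own state.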
See the open issues for a full list of proposed features (and known issues).
If you want to get more involved with Swiftide, have questions or want to chat, you can find us on discord.
Swiftide is in a very early stage and we are aware that we lack features for the wider community. Contributions are very welcome.
If you have a great idea, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
If you just want to contribute (bless you!), see our issues or join us on Discord.
git checkout -b feature/AmazingFeature
git commit -m 'feat: Add some AmazingFeature'
git push origin feature/AmazingFeature
See CONTRIBUTING for more.
Distributed under the MIT License. See LICENSE for more information.