Automata is inspired by the theory that code is essentially a form of memory and that, when furnished with the right tools, AI can evolve real-time capabilities that could potentially lead to the creation of AGI. The word automata comes from the Greek word αὐτόματος, denoting "self-acting, self-willed, self-moving", and automata theory is the study of abstract machines and automata, as well as the computational problems that can be solved using them.
More information follows below.
Follow these steps to set up the Automata environment:
# Clone the repository
git clone git@github.com:emrgnt-cmplxty/automata.git && cd automata/
# Initialize git submodules
git submodule update --init
# Install poetry and the project
pip3 install poetry && poetry install
# Configure the environment and setup files
poetry run automata configure
Alternatively, you can run Automata with Docker. Pull the Docker image:
$ docker pull ghcr.io/emrgnt-cmplxty/automata:latest
Run the Docker image:
$ docker run --name automata_container -it --rm -e OPENAI_API_KEY=<your_openai_key> -e GITHUB_API_KEY=<your_github_key> ghcr.io/emrgnt-cmplxty/automata:latest
This will start a Docker container with Automata installed and open an interactive shell for you to use.
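Whichever route you take, the agent needs your API keys. As a quick sanity check, the sketch below reports which of the two variables passed to docker run above are missing or empty. It assumes the keys are read from the process environment under exactly the names OPENAI_API_KEY and GITHUB_API_KEY; if your setup supplies them some other way, this check does not apply. missing_keys is a hypothetical helper, not part of Automata.

```python
import os

# Names taken from the docker run command above. Reading them from the
# process environment is an assumption about your setup.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GITHUB_API_KEY")


def missing_keys(env=None):
    """Return the required keys that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required keys are set.")
```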
Windows users may need to install C++ support through Visual Studio's "Desktop development with C++" for certain dependencies.
Additionally, updating to gcc-11 and g++-11 may be required. On Ubuntu-based systems, this can be done by running the following commands:
# Adds the test toolchain repository, which contains newer versions of software
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
# Updates the list of packages on your system
sudo apt update
# Installs gcc-11 and g++-11 packages
sudo apt install gcc-11 g++-11
# Sets gcc-11 and g++-11 as the default gcc and g++ versions for your system
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-11 60 --slave /usr/bin/g++ g++ /usr/bin/g++-11
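To confirm which compiler versions are now first on your PATH, a small check like the one below can help. It is a sketch, not part of Automata's tooling: it assumes the compilers accept the -dumpversion flag (GCC does; it prints either just the major version, such as "11", or a full version such as "11.4.0", depending on how GCC was built), and the helper names are hypothetical.

```python
import shutil
import subprocess


def major_version(version):
    """Return the major number from a version string such as "11" or "11.4.0"."""
    return int(version.strip().split(".")[0])


def compiler_major_version(compiler="gcc"):
    """Return the major version of `compiler`, or None if it is not on PATH."""
    if shutil.which(compiler) is None:
        return None
    # -dumpversion prints only the version number, e.g. "11" or "11.4.0".
    result = subprocess.run(
        [compiler, "-dumpversion"], capture_output=True, text=True, check=True
    )
    return major_version(result.stdout)


if __name__ == "__main__":
    for name in ("gcc", "g++"):
        major = compiler_major_version(name)
        if major is None:
            print(f"{name}: not found")
        elif major < 11:
            print(f"{name}: version {major} (older than 11)")
        else:
            print(f"{name}: version {major}")
```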
SCIP indices are required to run Automata's search. These indices are used to build the code graph, which relates symbols across the codebase by their dependencies. New indices are periodically generated and uploaded for the Automata codebase, but programmers must generate them manually if necessary for their local development. If you encounter issues, we recommend referring to the instructions here.
# Install dependencies and run indexing on the local codebase
poetry run automata install-indexing
# Refresh the code embeddings (after making local changes)
poetry run automata run-code-embedding
# Refresh the documentation + embeddings
poetry run automata run-doc-embedding --embedding-level=2
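To make the idea of a code graph concrete, here is a minimal sketch, not Automata's implementation: symbols are nodes, each edge points from a symbol to a symbol it depends on, and a breadth-first search collects everything a symbol transitively depends on. The symbol names and edges are invented for illustration; in Automata the relationships come from the SCIP index, and its actual graph classes and APIs may differ.

```python
from collections import deque


def build_graph(edges):
    """Map each symbol to the set of symbols it depends on."""
    graph = {}
    for symbol, dependency in edges:
        graph.setdefault(symbol, set()).add(dependency)
    return graph


def transitive_dependencies(graph, start):
    """Breadth-first search for every symbol reachable from `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dependency in graph.get(node, ()):
            if dependency not in seen:
                seen.add(dependency)
                queue.append(dependency)
    return seen


# Made-up edges for illustration; real edges would come from the SCIP index.
edges = [
    ("agent", "config"),
    ("agent", "tools"),
    ("tools", "dependency_factory"),
    ("config", "tools"),
]
graph = build_graph(edges)
print(sorted(transitive_dependencies(graph, "agent")))
# ['config', 'dependency_factory', 'tools']
```

The seen set keeps the search from revisiting a symbol, so shared dependencies (tools is reached from both agent and config) are reported once and cycles cannot loop forever.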
The following commands show how to run the system with a trivial instruction. We recommend making your first run something of this sort, to confirm that the system is working as expected.
# Run a single agent w/ trivial instruction
poetry run automata run-agent --instructions="Return true" --model=gpt-3.5-turbo-0613
# Run a single agent w/ a non-trivial instruction
poetry run automata run-agent --instructions="Explain what AutomataAgent is and how it works, include an example to initialize an instance of AutomataAgent."
Automata works by combining large language models, such as GPT-4, with a vector database to form an integrated system capable of documenting, searching, and writing code. The process begins with the generation of comprehensive documentation and code instances. Combined with search capabilities, these form the foundation for Automata's self-coding potential.
Automata employs downstream tooling to execute advanced coding tasks, continually building its expertise and autonomy. This self-coding approach mirrors the work of an autonomous craftsman, whose tools and techniques are continually refined based on feedback and accumulated experience.
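The search side of this design rests on a simple idea: code and documentation are embedded as vectors, and a query is answered by retrieving the stored vectors closest to the query's vector. The sketch below illustrates that idea with cosine similarity and a brute-force scan over an in-memory dictionary. It is a minimal sketch under stated assumptions: the toy vectors and keys are invented, and the real system delegates storage and nearest-neighbour lookup to a vector database whose API and similarity metric are not shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, store, k=2):
    """Return the k keys in `store` whose vectors are most similar to `query`."""
    ranked = sorted(store, key=lambda key: cosine_similarity(query, store[key]), reverse=True)
    return ranked[:k]


# Toy 3-dimensional "embeddings"; real ones come from an embedding model
# and have far more dimensions.
store = {
    "embedding_docs": [1.0, 0.0, 0.0],
    "agent_docs": [0.0, 1.0, 0.0],
    "tool_docs": [0.7, 0.7, 0.0],
}
print(top_k([1.0, 0.1, 0.0], store))  # ['embedding_docs', 'tool_docs']
```

A linear scan is fine for a handful of vectors; a vector database exists precisely so this lookup stays fast when the store holds every symbol in a large codebase.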
Sometimes the best way to understand a complicated system is to start with a basic example. The following example shows how to run your own Automata agent. The agent is initialized with an instruction (here, to explain how embeddings are used by the codebase) and a set of tools, uses those tools to work on the instruction, and then returns the result.
from automata.config.base import OpenAIAutomataAgentConfigBuilder
from automata.agent import OpenAIAutomataAgent
from automata.singletons.dependency_factory import dependency_factory
from automata.singletons.py_module_loader import py_module_loader
from automata.tools.factory import AgentToolFactory
# Initialize the module loader to the local directory
py_module_loader.initialize()
# Construct the set of all dependencies that will be used to build the tools
toolkit_list = ["context-oracle"]
tool_dependencies = dependency_factory.build_dependencies_for_tools(toolkit_list)
# Build the tools
tools = AgentToolFactory.build_tools(toolkit_list, **tool_dependencies)
# Build the agent config
agent_config = (
    OpenAIAutomataAgentConfigBuilder.from_name("automata-main")
    .with_tools(tools)
    .with_model("gpt-4")
    .build()
)
# Initialize and run the agent
instructions = "Explain how embeddings are used by the codebase"
agent = OpenAIAutomataAgent(instructions, config=agent_config)
result = agent.run()
# Print the agent's final response
print(result)
The agent responds with an explanation along these lines:
Embeddings in this codebase are represented by classes such as SymbolCodeEmbedding and SymbolDocEmbedding. These classes store information about a symbol and its respective embeddings, which are vectors representing the symbol in high-dimensional space.
Examples of these classes are:
SymbolCodeEmbedding: a class used for storing embeddings related to the code of a symbol.
SymbolDocEmbedding: a class used for storing embeddings related to the documentation of a symbol.
Code example for creating an instance of 'SymbolCodeEmbedding':
import numpy as np
from automata.symbol_embedding.base import SymbolCodeEmbedding
from automata.symbol.parser import parse_symbol
symbol_str = 'scip-python python automata 75482692a6fe30c72db516201a6f47d9fb4af065 `automata.agent.agent_enums`/ActionIndicator#'
symbol = parse_symbol(symbol_str)
source_code = 'symbol_source'
vector = np.array([1, 0, 0, 0])
embedding = SymbolCodeEmbedding(symbol=symbol, source_code=source_code, vector=vector)
Code example for creating an instance of 'SymbolDocEmbedding':
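import numpy as np
from automata.symbol_embedding.base import SymbolDocEmbedding
from automata.symbol.parser import parse_symbol
# parse_symbol expects a SCIP symbol string, as in the previous example; replace it with a symbol from your own index
symbol = parse_symbol('scip-python python automata 75482692a6fe30c72db516201a6f47d9fb4af065 `automata.agent.agent_enums`/ActionIndicator#')
document = 'A document string containing information about the symbol.'
vector = np.random.rand(10)
symbol_doc_embedding = SymbolDocEmbedding(symbol, document, vector)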
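The symbol strings passed to parse_symbol in these examples follow the SCIP symbol format: a scheme, a package (manager, name, version), and a sequence of descriptors, each ending in a suffix that marks its kind (for example / for a namespace and # for a type). The sketch below decomposes the example symbol string to show that structure. It is a simplified, hypothetical helper rather than automata.symbol.parser.parse_symbol: it handles only global symbols and the namespace, type, term, meta and macro suffixes, and it ignores escaping rules such as doubled backticks.

```python
# Descriptor suffixes from the SCIP symbol grammar that this sketch supports.
SUFFIX_KINDS = {"/": "namespace", "#": "type", ".": "term", ":": "meta", "!": "macro"}


def split_symbol(symbol):
    """Split a global SCIP symbol into scheme, package fields, and descriptors."""
    scheme, manager, package, version, descriptors = symbol.split(" ", 4)
    return {
        "scheme": scheme,
        "manager": manager,
        "package": package,
        "version": version,
        "descriptors": descriptors,
    }


def parse_descriptors(text):
    """Return (name, kind) pairs; methods, parameters and type parameters are rejected."""
    parsed = []
    i = 0
    while i < len(text):
        if text[i] == "`":
            # Backtick-escaped name (doubled backticks inside it are not handled).
            end = text.index("`", i + 1)
            name = text[i + 1:end]
            i = end + 1
        else:
            start = i
            while i < len(text) and text[i] not in SUFFIX_KINDS and text[i] not in "([":
                i += 1
            name = text[start:i]
        if i >= len(text) or text[i] not in SUFFIX_KINDS:
            raise ValueError(f"unsupported descriptor at position {i} in {text!r}")
        parsed.append((name, SUFFIX_KINDS[text[i]]))
        i += 1
    return parsed


symbol = (
    "scip-python python automata 75482692a6fe30c72db516201a6f47d9fb4af065 "
    "`automata.agent.agent_enums`/ActionIndicator#"
)
parts = split_symbol(symbol)
print(parts["scheme"], parts["package"])  # scip-python automata
print(parse_descriptors(parts["descriptors"]))
# [('automata.agent.agent_enums', 'namespace'), ('ActionIndicator', 'type')]
```

The backticks around automata.agent.agent_enums are what let a name containing dots be treated as a single namespace descriptor instead of a chain of terms.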
If you want to contribute to Automata, be sure to review the contribution guidelines. This project adheres to Automata's code of conduct. By participating, you are expected to uphold this code.
We use GitHub Issues to track requests and bugs. Please see Automata Discussions for general questions and discussion, and please direct specific questions.
The Automata project strives to abide by generally accepted best practices in open-source software development.
The ultimate goal of the Automata project is to achieve a level of proficiency where it can independently design, write, test, and refine complex software systems. This includes the ability to understand and navigate large codebases, reason about software architecture, optimize performance, and even invent new algorithms or data structures when necessary.
While the complete realization of this goal is likely to be a complex and long-term endeavor, each incremental step towards it not only has the potential to dramatically increase the productivity of human programmers, but also to shed light on fundamental questions in AI and computer science.
Automata is licensed under the Apache License 2.0.
This project is an extension of an initial effort between emrgnt-cmplxty and maks-ivanov that began with this repository.