All daily activities in the human world are reproduced 1:1 in "Minecraft". A civilization composed of 100 billion AI agents actually looks like this.
The world’s first “AI agent civilization” has finally been revealed!
Two months ago, more than 1,000 agents collaborated in a virtual world to build their own economy, culture, religion and government...
Netizens exclaimed that it could be called a real-life version of "Westworld".
Now, this civilization has evolved again. What is the world like with a civilization of 100 billion AI agents?
They are both individuals and a group.
All activities in human daily life will be replicated 1:1 in "Minecraft".
Robert Yang, an alumnus of Peking University, shared the team’s latest research and introduced for the first time PIANO, the new “cognitive architecture” behind the project.
PIANO (Parallel Information Aggregation via Neural Orchestration) is an architecture that lets AI agents interact with multiple parties while keeping their multiple output streams coherent.
Project address: https://github.com/altera-al/project-sid
How can an AI agent think and act simultaneously, on multiple time scales, operating in both conscious and subconscious ways?
Like the keys of a piano, the modules stand for different parts of the brain. Played together, the keys produce beautiful chords; in an agent, the modules working together produce human-like qualities.
These agents go on to build a "civilization". Taxation, trade, government, nations, religion... all the daily activities of the human world also appear among the AI agents.
Moreover, AI agents can accurately infer the emotions of others, form friendships, and even make enemies. Some introverted agents, like socially anxious people, have fewer social connections than extroverted agents.
Some netizens said that we live in a simulation matrix and the future is now.
Next, let’s take a look at the “world” of AI agents. What is the overall picture?
Why is an AI agent civilization needed?
In order for AI agents to coexist with humans and integrate into our society, they need to be not only autonomous but also capable of collaboration.
In recent years, advances in large language models (LLMs) for reasoning and decision-making have significantly enhanced the autonomy of agents.
However, simply having autonomy is not enough. Agents must also coexist with humans and other agents in human civilization.
As the authors of the paper put it:
Measuring civilizational progress by the ability of agents to coexist and advance non-human civilizations represents the ultimate benchmark for the capabilities of artificial intelligence agents.
But building an AI civilization is not easy.
First, LLM-based agents often have difficulty maintaining a sense of reality in their actions and reasoning.
Even when equipped with modules for planning and reflection, agents often fall into repetitive behavior patterns or accumulate errors through hallucinations, preventing meaningful progress.
Second, agents that miscommunicate their thoughts and intentions can mislead other agents, leading to further hallucinations and loops. This miscommunication often occurs in groups of agents, leading to dysfunctional behavior and worsening the performance of individuals in the group.
Finally, current benchmarking of agents focuses on the performance of autonomous agents in various domains, such as web search, programming, search and query, and reasoning, and says little about how agents behave in groups.
So, what is the optimal solution for building an AI agent?
New PIANO architecture
The new PIANO architecture was designed to address these problems.
The PIANO architecture is a comprehensive, highly flexible intelligent agent design framework.
Among them, P represents the perception module, I represents the intelligent core, and A is the action module. One of the most striking features of the PIANO architecture is that it allows agents to think and act simultaneously.
This removes a limitation of traditional architectures, in which acting and thinking hold each other up.
In complex and ever-changing environments, agents face a variety of situations, including immediate threats that require rapid response, as well as thoughtful long-term planning.
To keep behavior coherent, the architecture introduces a cognitive controller (CC) module.
The cognitive controller (CC) module is like the "brain center" of the agent, responsible for making high-level decisions. It receives and synthesizes information from each module, turns that information into a single coordinated decision, and then converts the decision into the appropriate output in each output module.
It ensures harmonious collaboration between various modules and avoids inconsistencies caused by different modules working independently.
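The two principles described above, concurrent modules and a cognitive controller acting as a single decision point, can be sketched roughly as follows. This is only an illustration under assumptions, not the project's implementation: the names `AgentState`, `run_module` and `cognitive_controller`, the module names, the update periods, and the decision rule are all hypothetical.

```python
import threading
import time


class AgentState:
    """Shared state that every module reads from and writes to."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def write(self, key, value):
        with self._lock:
            self._data[key] = value

    def snapshot(self):
        with self._lock:
            return dict(self._data)


def run_module(state, name, produce, period, stop):
    """Run one module at its own pace until `stop` is set."""
    while not stop.is_set():
        state.write(name, produce())
        stop.wait(period)


def cognitive_controller(snapshot):
    """Hypothetical bottleneck: fuse module outputs into one decision."""
    if snapshot.get("perception") == "zombie nearby":
        intent = "retreat"  # an immediate threat overrides the slower goal
    else:
        intent = snapshot.get("goal", "idle")
    # The same intent drives every output stream, so speech and
    # movement cannot contradict each other.
    return {"speech": f"I am going to {intent}.", "action": intent}


state = AgentState()
stop = threading.Event()
modules = [
    ("perception", lambda: "zombie nearby", 0.01),  # fast reflex loop
    ("goal", lambda: "mine iron", 0.05),            # slower deliberation
]
threads = [
    threading.Thread(target=run_module, args=(state, name, fn, period, stop))
    for name, fn, period in modules
]
for t in threads:
    t.start()
time.sleep(0.2)  # let the modules run side by side for a moment
stop.set()
for t in threads:
    t.join()

decision = cognitive_controller(state.snapshot())
print(decision)
```

Each module loops on its own timescale, while only the controller turns the shared state into outputs. That is one simple way to picture how thinking and acting can proceed at the same time without the outputs drifting apart.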
Based on the above two architectural principles, the PIANO architecture system consists of 10 different modules that run concurrently. Its core modules include:
-Memory:
The memory module can be called the "treasury of wisdom" of the agent. Whether it is a short daily greeting, an in-depth technical discussion, or an emotional communication, every word and every change of tone is accurately stored.
In addition, the agent can not only remember the description of each step, but also the questions asked in the conversation, the order of answers, and the key points emphasized by both parties.
-Action awareness:
It's like a comprehensive physical examination system. Through this module, the agent can accurately grasp its own energy reserve and know how long the remaining power can support operation, or whether the fuel reserve is sufficient to complete the next stage of the task.
At the same time, it can monitor various components in real time, such as detecting whether the sensor is working properly, the flexibility of the mechanical joints, the computing speed of the data processing unit, etc. No tiny abnormality can escape its "eyes".
-Goal generation:
Drawing on the agent's accumulated experience and its interaction with the environment, this module keeps producing new goals that push the agent forward.
For example, in a multi-agent logistics scenario, an agent may notice that congestion often occurs in a certain area during cargo transportation (environmental interaction), and it may have previously taken part in optimizing transportation routes (past experience). The goal generation module may then produce a new goal: collaborate with other agents to design a new route plan that avoids the congested area.
This goal generation mechanism gives the agent the ability to actively explore and innovate, so that it not only passively performs preset tasks, but also actively expands its field of action according to the actual situation.
-Social awareness:
It opens the door for intelligent agents to understand and integrate into the group.
Put simply, it can respond quickly to specific behavioral cues.
For example, the social awareness module can recognize and interpret a simple gesture (raising an arm may mean asking for help or attracting attention) or a specific body posture (leaning forward slightly may indicate friendliness and attention).
Of course, if it recognizes the help signal from other agents, it can decide whether to provide help based on its own capabilities and the current task situation.
-Dialogue:
The dialogue module is the "language center" of the intelligent agent and the key to effective communication with the outside world.
The dialogue module has powerful syntax analysis and semantic understanding capabilities. It can accurately parse all types of language input it receives, whether it is concise and clear instructions, emotional expressions or complex and abstract conceptual descriptions.
Moreover, for vague or ambiguous language, it can also make reasonable inferences based on context and language habits.
In terms of language generation, the dialogue module can accurately express its thoughts based on the internal state and intentions of the agent.
-Skill execution:
The skill execution module is the direct executor of the interaction between the intelligent agent and the external environment. When the agent needs to perform a specific skill or action in the environment, the skill execution module coordinates the relevant parts in an orderly manner.
From single agent to multi-agent
Taking "Minecraft" as an example, researchers selected 1,000 items for evaluation in an attempt to observe and measure the progress of intelligent civilization.
Single agent
First, the performance of the agent is evaluated by how it acquires items in Minecraft.
The researchers set up 25 agents. Each started with an empty backpack, spawned far from the others, and could not communicate with them. All of them were assigned the role of "explorer", with the task of exploring and collecting items.
They spawn in different places, such as the surface, caves, or forests. Different spawn points mean different available resources, so the difficulty of collecting items also differs.
For example, an agent that spawns on a resource-rich surface may be surrounded by basic materials such as wood and stone, which are easy to turn into basic tools. An agent that spawns in a cave may find plenty of minerals, but it also faces dangers such as darkness and monsters, and it has to venture outside to find a wider variety of items.
The researchers found that agents with the complete PIANO architecture obtained an average of 17 different items after 30 minutes of play. However, their performance varied greatly, mainly because of differences in spawn location.
Some agents obtained fewer than 5 items, while the best-performing agents obtained 30-40 items, which is close to what a human player with some "Minecraft" experience achieves.
So, what is the upper limit of the development of a single agent?
Under the same conditions, the researchers increased the number of agents to 49 and let them play for four hours. Across repeated experiments, the number of different items collected by all agents combined leveled off at about one-third of all items in "Minecraft" (about 320 items).
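The measurement used here, the number of different items collected by all agents together, amounts to taking the union of the agents' inventories. A minimal sketch follows; the inventories are made up for illustration, and `TOTAL_ITEMS = 1000` is an assumption based on the 1,000 items mentioned earlier in the article.

```python
def unique_items(inventories):
    """Count distinct item types collected across all agents."""
    collected = set()
    for items in inventories:
        collected.update(items)
    return len(collected)


# Hypothetical inventories for three agents (not data from the study).
inventories = [
    {"oak_log", "stick", "wooden_pickaxe"},
    {"cobblestone", "stick", "coal"},
    {"oak_log", "wheat_seeds"},
]

TOTAL_ITEMS = 1000  # assumed size of the full item list

found = unique_items(inventories)
print(found)                # 6 distinct items; duplicates count once
print(found / TOTAL_ITEMS)  # share of the item list covered so far
print(320 / TOTAL_ITEMS)    # the reported plateau, roughly one-third
```

Because duplicates are counted only once, adding more agents raises the total only when someone finds an item that nobody else has, which is consistent with the count leveling off.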
Multi-agent
Multi-agent, as the name suggests, is a group of multiple agents that can communicate or compete with each other in the same environment.
Small groups:
In order for agents to cooperate and develop in a group, they must be able to understand the actions and thoughts of other agents. This ability to understand both themselves and others allows agents to adjust their behavior to the situation in a social environment.
For example, build trust when working with allies, and deal with competition and conflict when getting along with opponents. Researchers found through experiments that agents are not only socially capable but can form meaningful social relationships in large-scale simulations of up to 50 agents.
The researchers mainly studied the role and consciousness of the agent in the group through two sets of experiments.
-Can socially aware agents infer other people’s emotions through chatting?
In a chat experiment between three characters and an agent in "Minecraft", when the characters expressed emotional shifts such as love, then anger, then love again, the agent was fully able to follow these changes and react accordingly.
-Can the agent sense emotions and act accordingly?
In another experiment, the researchers examined how an agent behaved depending on how much a game character liked or disliked it. They found that the agent not only accurately inferred the character's intentions, but also used those intentions to decide on its own actions.
Society:
Subsequently, the researchers placed 50 agents in a randomly generated map of "Minecraft" and gave each agent a unique personality. They can move freely in this world and communicate with other agents at will.
In this free scenario, the researchers found that not only could the agent accurately judge the roles of other agents, but the more agents involved in the judgment and the longer they communicated, the more accurate the judgment.
In addition, in this experiment, the researchers also discovered several important phenomena:
-The importance of social modules:
If the social module is removed, relationships between agents stay relatively flat, which shows that the social module is essential for relationships to develop over the long term, whether for better or for worse.
-The impact of personality on social networks:
The researchers found that some agents had different social connection patterns based on their personalities.
For example, introverted agents receive significantly fewer connections than extroverted social agents, which shows that personality can also be reflected in large and complex social networks.
And, while most of the time the emotions are mutual, it's not always that way. An agent may have a favorable opinion of another agent that ignores it, just like the situation in the real world where interpersonal relationships are complex and not always mutual.
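One way to picture these observations is as a directed graph in which each edge records how one agent feels about another. The sketch below is an assumption-laden illustration: the agent names, the score scale from -1 to +1, and the `threshold` value are hypothetical, not taken from the study.

```python
from collections import Counter

# Hypothetical sentiment scores: likes[(a, b)] is how a feels about b,
# on an assumed scale from -1 (dislike) to +1 (like).
likes = {
    ("ana", "bo"): 0.8,
    ("bo", "ana"): 0.7,
    ("cy", "ana"): 0.6,
    ("ana", "cy"): 0.5,
    ("bo", "dee"): 0.4,  # dee never formed an opinion of bo
}


def incoming_connections(likes):
    """Number of agents that hold an opinion about each agent."""
    return Counter(target for (_, target) in likes)


def one_sided(likes, threshold=0.3):
    """Pairs where a likes b, but b does not like a back or ignores a."""
    return sorted(
        (a, b)
        for (a, b), score in likes.items()
        if score >= threshold and likes.get((b, a), 0.0) < threshold
    )


print(incoming_connections(likes))  # ana receives the most connections
print(one_sided(likes))             # [('bo', 'dee')]
```

Comparing incoming connection counts between introverted and extroverted agents, and listing one-sided pairs, corresponds to the two phenomena described above.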
Civilization is born
After the evolution from single agent to multi-agent, the next step is the birth of civilization.
To assess the agents' ability to advance civilization, the researchers evaluated how they behaved in several situations:
– Behavior of agents under collective rules (focusing on compliance and revision of tax laws)
– Cultural transmission, explored through the spontaneous generation of memes and the structured spread of a single religion
Each plays its part: specialized division of labor
It is the specialized division of labor among humans that drives the progress of civilization and the advancement of agriculture, governance, culture, and technology. To replicate these emergent qualities of civilization, agents must be able to specialize as well.
To this end, the authors propose three basic criteria for agent specialization:
First, agents should have autonomy in choosing and changing roles. Second, their specialization should emerge through interaction and experience, without explicit instructions or constraints. Finally, the roles they choose should be reflected in behavior consistent with that profession.
As shown in the figure below, researchers put intelligent agents in a village, and they will develop different professions on their own, such as farmers and engineers.
Removing social awareness leads to agents choosing more homogeneous roles that do not persist over time.
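How homogeneous a set of roles is can be quantified in several ways. The sketch below uses Shannon entropy, which is a choice made here for illustration and not necessarily the measure used by the researchers; the role lists are invented.

```python
import math
from collections import Counter


def role_entropy(roles):
    """Shannon entropy in bits of a role list; 0.0 means a single shared role."""
    counts = Counter(roles)
    n = len(roles)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


# Hypothetical villages of four agents each.
diverse = ["farmer", "engineer", "guard", "explorer"]
homogeneous = ["farmer", "farmer", "farmer", "farmer"]

print(role_entropy(diverse))      # 2.0
print(role_entropy(homogeneous))  # 0.0
```

Higher entropy means roles are spread more evenly across the village; a drop toward zero would correspond to the more homogeneous role choices described above.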
The following is the simulated distribution of the behavior of 30 agents in a village.
Comply with tax laws, change laws
Can AI agents make and modify their own laws?
Next, the researchers tested the agent by implementing a tax system. It was found that they not only complied with tax laws but also democratically voted to change tax rates based on public sentiment.
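A democratic change of the tax rate can be pictured as a simple majority vote. The following is a toy sketch only: the function `vote_on_tax`, the vote labels, the 5-point step, and the starting rate are all assumptions, and the article does not describe the actual voting procedure.

```python
def vote_on_tax(current_rate, votes, step=0.05):
    """Apply a simple-majority vote; each vote is 'raise', 'lower', or 'keep'."""
    majority = len(votes) / 2
    if votes.count("raise") > majority:
        return round(min(1.0, current_rate + step), 2)
    if votes.count("lower") > majority:
        return round(max(0.0, current_rate - step), 2)
    return current_rate  # no strict majority for change: the rate stays


# Hypothetical ballot from five agents.
votes = ["lower", "lower", "keep", "lower", "raise"]
new_rate = vote_on_tax(0.20, votes)
print(new_rate)  # 0.15
```

The rate changes only when more than half of the voters agree on a direction, so a split ballot leaves the law as it is.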
Religious spread varies from town to town
Finally, can AI agents develop their own culture?
The researchers looked specifically at the organic spread of memes, and tracked how agents formed a fictional religion and how it spread through the agents' social connections.
What is even more interesting is that rural areas and towns present different cultural patterns.
Peking University alumni start a business to build empathetic AI
Project Sid was launched because the Altera AI team hopes that, by exploring these questions, digital humans can ultimately be integrated seamlessly into human society.
Robert Yang is the co-founder and CEO of Altera.
Previously, he received PhD degrees in computational neuroscience from New York University and Yale University, and a bachelor's degree in physics from Peking University.
He was a professor in the Department of Brain and Cognitive Sciences and the Department of Electrical Engineering and Computer Science at MIT, and the leader of the MIT MetaConscious group.
In 2023, he closed his lab and left his tenure-track position at MIT to found Altera.
Although Altera's team is small, its talent density is extremely high:
It is composed of computational neuroscientists, physics Olympians, and engineers from MIT's Department of Electrical Engineering and Computer Science, Stanford's Natural Language Processing Group, Google X, Citadel, Supercell, and more.
This company, established more than half a year ago, received US$2 million in seed financing at the beginning of the year, led by Andreessen Horowitz.
Three months later, it raised another $9 million, led by former Google CEO Eric Schmidt’s First Spark Ventures, Patron VC, angel investor Mitch Lasky and others.
In May of this year, Altera opened a branch in Menlo Park and is committed to becoming the first supplier of smart consumer products.