Research teams from the Hong Kong University of Science and Technology and the University of Science and Technology of China have jointly developed GameGen-X, a diffusion transformer model that can generate and interactively control open-world game videos. GameGen-X not only generates game videos featuring novel characters, dynamic environments, and complex actions, but also adjusts game content in real time according to users' multimodal instructions (such as text and keyboard input), letting users experience the fun of designing a game themselves. The result marks a major breakthrough for AI in game development and opens up new possibilities for game content creation.
GameGen-X can generate open-world game videos on its own, simulating a range of game engine capabilities: creating novel characters, dynamic environments, complex actions, and diverse events. It can also interact with you, letting you experience what it feels like to be a game designer.
One of GameGen-X's highlights is its interactive controllability: it can predict and modify future content based on the current game clip, effectively simulating gameplay.
Users can steer the generated content through multimodal control signals, such as structured text instructions and keyboard input, and thereby control character interactions and scene content.
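The authors' exact mechanism is not reproduced here, but as a rough illustration, text and keyboard signals could be fused into a single conditioning vector along the following lines. This is a minimal sketch; the class name, key vocabulary, and dimensions are all hypothetical, not GameGen-X's actual implementation.

```python
# Hypothetical sketch: fusing multimodal control signals (text + keyboard)
# into one conditioning vector. Names and sizes are illustrative only.
import torch
import torch.nn as nn

KEYS = ["W", "A", "S", "D", "none"]  # assumed keyboard vocabulary

class ControlSignalEncoder(nn.Module):
    def __init__(self, text_dim=768, key_dim=64, out_dim=1024):
        super().__init__()
        self.key_embed = nn.Embedding(len(KEYS), key_dim)   # keyboard operation -> vector
        self.proj = nn.Linear(text_dim + key_dim, out_dim)  # fuse into one control embedding

    def forward(self, text_features, key_ids):
        # text_features: (batch, text_dim) from a pretrained text encoder
        # key_ids: (batch,) integer indices into KEYS
        key_features = self.key_embed(key_ids)
        fused = torch.cat([text_features, key_features], dim=-1)
        return self.proj(fused)

encoder = ControlSignalEncoder()
text_feat = torch.randn(2, 768)                          # stand-in for encoded text instructions
keys = torch.tensor([KEYS.index("W"), KEYS.index("A")])  # pressed keys per sample
control = encoder(text_feat, keys)                       # (2, 1024) conditioning vector
```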
To train GameGen-X, the researchers also built OGameData, the first large open-world game video dataset. It contains more than one million video clips drawn from over 150 games, each paired with an informative text description generated by GPT-4o.
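The captioning pipeline itself is not detailed in the article; the snippet below is only a hedged sketch of how a GPT-4o description for a sampled video frame might be requested through the OpenAI API. The prompt wording and frame-sampling strategy are assumptions, not the OGameData recipe.

```python
# Hedged sketch: asking GPT-4o to describe one frame sampled from a game clip.
# The prompt text and sampling details are assumptions for illustration.
import base64
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment

def caption_frame(jpeg_path: str) -> str:
    with open(jpeg_path, "rb") as f:
        frame_b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this game frame in one dense sentence: "
                         "characters, environment, actions, and notable events."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```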
GameGen-X is trained in two stages: foundation model pre-training and instruction fine-tuning. In the first stage, the model is pre-trained on text-to-video generation and video continuation tasks, enabling it to generate high-quality, long-sequence open-domain game videos.
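As a purely illustrative sketch of what such a stage-one objective could look like, assume a standard noise-prediction diffusion loss shared by both tasks, with video continuation conditioning on clean prefix frames. The `backbone`, noise schedule, and shapes are placeholders, not the released GameGen-X code.

```python
# Minimal, hypothetical sketch of stage-1 pre-training: one denoising step that
# covers both text-to-video and video-continuation training.
import torch
import torch.nn.functional as F

def add_noise(x, noise, t, num_steps=1000):
    # Simple linear alpha-bar schedule; real schedules differ.
    alpha_bar = (1.0 - t.float() / num_steps).view(-1, 1, 1, 1, 1)
    return alpha_bar.sqrt() * x + (1 - alpha_bar).sqrt() * noise

def pretrain_step(backbone, latents, text_emb, continuation=False):
    # latents: (batch, frames, channels, h, w) latent video from a video VAE
    noise = torch.randn_like(latents)
    t = torch.randint(0, 1000, (latents.size(0),), device=latents.device)
    noisy = add_noise(latents, noise, t)

    mask = torch.ones_like(latents)
    if continuation:
        # Video continuation: keep the first frames clean so the model learns
        # to extend an existing clip rather than generate from scratch.
        prefix = latents.size(1) // 4
        noisy[:, :prefix] = latents[:, :prefix]
        mask[:, :prefix] = 0  # no loss on the clean prefix frames

    pred = backbone(noisy, t, text_emb)        # predict the added noise
    return (mask * (pred - noise) ** 2).mean()
```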
In the second stage, to achieve interactive controllability, the researchers designed the InstructNet module, which integrates game-related experts for the multimodal control signals.
InstructNet lets the model adjust its latent representations according to user input, unifying character interaction and scene content control within video generation for the first time. During instruction fine-tuning, only InstructNet is updated while the pre-trained foundation model stays frozen, so the model gains interactive controllability without losing the diversity and quality of its generated video content.
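A concrete (and again purely illustrative) sketch of this freezing scheme: the fine-tuning optimizer only sees InstructNet's parameters, roughly as below. The module names are hypothetical stand-ins for the paper's components.

```python
# Hedged sketch of stage-2 fine-tuning: freeze the pre-trained foundation model
# and update only the (hypothetical) InstructNet.
import torch

def build_finetune_optimizer(base_model, instruct_net, lr=1e-4):
    # Freeze every parameter of the pre-trained foundation model.
    for p in base_model.parameters():
        p.requires_grad = False
    base_model.eval()

    # Only InstructNet, which injects control signals into the latents, trains.
    trainable = [p for p in instruct_net.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=lr)

# During fine-tuning, InstructNet would take the frozen backbone's latent
# features plus the user's control embedding and return adjusted latents, e.g.:
#   adjusted = latents + instruct_net(latents, control_embedding)
```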
Experimental results show that GameGen-X excels at generating high-quality game content and provides strong control over environments and characters, outperforming other open-source and commercial models.
Of course, this AI is still in its infancy, and there is a long way to go before it can truly replace game designers. But its emergence undoubtedly brings new possibilities to game development. It offers a fresh approach to designing and developing game content, demonstrating the potential of generative models as a complement to traditional rendering techniques and effectively integrating creative generation with interactive features, opening up new possibilities for future game development.
Project address: https://gamegen-x.github.io/
Although GameGen-X is still at an early stage of development, its strong performance in game video generation and interactive control points to broad prospects for AI in the gaming industry. In the future, GameGen-X is expected to become a capable assistant for game developers and help drive innovation across the game industry.