1. Introduction
Imagine a world where AI agents aren’t working together to achieve a common goal. Instead, each agent is out to win the game of Rock-Paper-Scissors. The mission of each agent is straightforward: defeat the others.
Can a machine strategize in a game of pure chance? And if it can, which model will emerge victorious?
In order to answer that very question, I built a multi-agent system to host fully automated Rock-Paper-Scissors tournaments, pitting various AI models against one another to see who comes out on top. From OpenAI’s cutting-edge models to Meta’s Llama and Anthropic’s Claude, each agent brings its own “personality” and decision-making quirks to the table.
This isn’t just an experiment in gaming; it’s also a showcase of the latest capabilities in multi-agent systems. With CrewAI and LangGraph, it is easy to create AI agents and combine them into complex workflows.
In our games, we will test the following AI models:
2. Architecture and Workflow
This project combines two popular frameworks: LangGraph for workflow orchestration and CrewAI for agent definitions:
- The workflow is built as a multi-agent system using LangGraph’s graph structure.
- Each AI agent is defined as a Crew using CrewAI.
The graph and crew definitions can be found in the src folder of the source code repository on GitHub.
Workflow:
- In each round, two player agents make their moves independently and in parallel. They have access to the history of previous rounds, allowing them to analyze patterns and decide on the best move.
- After the players make their moves, a judge agent determines the winner of the round.
- The system checks whether the criteria for determining the final winner have been met (e.g., reaching the specified number of rounds, or a player winning 3 out of 5 rounds).
- If the criteria are not met, another round begins.
- If the criteria are met, the final winner is announced and a post-game analysis is performed.
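
The round loop described above can be sketched in plain Python. This is a minimal illustration and not the project's actual code: it uses neither LangGraph nor CrewAI, and every name in it (make_random_player, judge, is_finished, play_match) is hypothetical. The players here are random stand-ins for the LLM-backed agents. The sketch assumes that tied rounds count toward the round limit, and it leaves out the post-game analysis step.

```python
import random
from concurrent.futures import ThreadPoolExecutor

MOVES = ("rock", "paper", "scissors")
# Each key beats the move stored as its value.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def make_random_player(seed):
    """Hypothetical stand-in for an LLM-backed player agent."""
    rng = random.Random(seed)

    def choose(history):
        # A real agent would look for patterns in `history`;
        # this placeholder ignores it and picks a move at random.
        return rng.choice(MOVES)

    return choose


def judge(move1, move2):
    """Return 'player1', 'player2', or 'tie' for a single round."""
    if move1 == move2:
        return "tie"
    return "player1" if BEATS[move1] == move2 else "player2"


def is_finished(scores, rounds_played, wins_needed, max_rounds):
    """Stop once the round limit is hit or a player has enough wins."""
    return rounds_played >= max_rounds or max(scores.values()) >= wins_needed


def play_match(player1, player2, wins_needed=3, max_rounds=5):
    """Run rounds until the stop criteria are met; return the outcome."""
    history = []
    scores = {"player1": 0, "player2": 0}
    with ThreadPoolExecutor(max_workers=2) as pool:
        while not is_finished(scores, len(history), wins_needed, max_rounds):
            # Both players get the same snapshot of past rounds and move
            # in parallel, so neither can see the other's current move.
            snapshot = tuple(history)
            future1 = pool.submit(player1, snapshot)
            future2 = pool.submit(player2, snapshot)
            move1, move2 = future1.result(), future2.result()

            # The judge decides the round; ties score no points
            # (assumption) but still count as a played round.
            result = judge(move1, move2)
            if result != "tie":
                scores[result] += 1
            history.append((move1, move2, result))

    if scores["player1"] == scores["player2"]:
        winner = "tie"
    else:
        winner = max(scores, key=scores.get)
    return winner, scores, history
```

Calling play_match(make_random_player(1), make_random_player(2)) returns the winner, the final scores, and the full round history. Swapping a random player for a function that calls a model would keep the rest of the loop unchanged, which is roughly the separation between orchestration and agent definition that the two frameworks provide.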
After running hundreds of matches, the results were interesting, and sometimes hilarious. Let’s look at what we discovered.