Appointment Time: 10/21/2020 @ 2:00pm
The main idea behind our project is to give our Minecraft character a buddy (Scout) that follows us around and performs helpful tasks, and to control Scout with hand signals made in front of a webcam. For example, a hand pointing forward can mean “go attack the monsters in front of me”, while two fingers up can mean “go hunt me some food”. The AI will process the webcam input and produce a behavior in the Minecraft game such as attacking a monster or grabbing items. This is useful for convenient, helpful tasks such as having Scout sentry an area while you farm in it, place torches in front of you and remove them when you are no longer using them, or backtrack anywhere so you never lose your house or get lost in a cave.
For the Computer Vision portion, a simple and efficient AI pipeline would be as follows: Webcam Image -> Simple Preprocessing -> Convolutional Autoencoder -> pass the latent vector to a simple classifier such as an SVM or KNN -> select the class with the highest probability.
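A minimal sketch of what this pipeline could look like is below, assuming PyTorch for the autoencoder and scikit-learn for the classifier (neither library is confirmed in the proposal, and the 64x64 grayscale input size and 64-dimensional latent vector are illustrative choices):

```python
import torch
import torch.nn as nn
from sklearn.svm import SVC

class ConvAutoencoder(nn.Module):
    """Small convolutional autoencoder over preprocessed 1x64x64 webcam frames."""
    def __init__(self, latent_dim=64):
        super().__init__()
        # Encoder: frame -> latent vector
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 16x32x32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 32x16x16
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, latent_dim),
        )
        # Decoder mirrors the encoder so the model can be trained on reconstruction loss
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def extract_latents(model, frames):
    """Encode a batch of preprocessed frames (N, 1, 64, 64) into latent vectors."""
    model.eval()
    with torch.no_grad():
        return model.encoder(frames).cpu().numpy()

def fit_command_classifier(model, labeled_frames, labels):
    """Fit a simple classifier (here an SVM) on latent vectors of labeled gesture frames."""
    clf = SVC(probability=True)
    clf.fit(extract_latents(model, labeled_frames), labels)
    return clf

def predict_command(model, clf, frame):
    """Return the max-probability command label for a single preprocessed frame."""
    probs = clf.predict_proba(extract_latents(model, frame.unsqueeze(0)))[0]
    return clf.classes_[probs.argmax()], probs.max()
```

The KNN variant would only swap `SVC(probability=True)` for `KNeighborsClassifier()`; the rest of the pipeline stays the same.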
Since the commands sent to the Scout are represented through hand signals from the user, one useful metric for the success of our project is the percentage of correctly executed commands before and after training. We will start with 5 easy commands, which the AI will attempt to classify without any training. Then we will feed in our training data and compare the AI’s predictions with those made before training. Since the AI’s actions will be based on its prediction of the command, we will focus first on creating a functional algorithm that achieves at least 90% classification accuracy after training. We will then work toward incorporating reinforcement learning, using the Scout’s actions as feedback. The training data will be generated by capturing frames from our laptop cameras of the user performing the command we want the Scout to learn. Based on that visual input, the AI should be able to accurately identify and carry out the command the user wishes to execute.
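One possible way to collect that training data is sketched below, assuming OpenCV for webcam access (the script, directory layout, and key bindings are hypothetical, not a finalized design): the user holds the gesture for one of the 5 starter commands and presses its number key to save a preprocessed frame under that label.

```python
import os
import cv2

# Hypothetical labels for the 5 starter commands, bound to keys 1-5
COMMANDS = {ord(str(i)): f"command_{i}" for i in range(1, 6)}

def collect_frames(out_dir="training_data"):
    cap = cv2.VideoCapture(0)          # laptop webcam
    counts = {label: 0 for label in COMMANDS.values()}
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imshow("gesture capture", frame)
        key = cv2.waitKey(1) & 0xFF
        if key == ord("q"):            # quit
            break
        if key in COMMANDS:
            label = COMMANDS[key]
            path = os.path.join(out_dir, label)
            os.makedirs(path, exist_ok=True)
            # Preprocess the same way the pipeline will: grayscale + downscale
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            small = cv2.resize(gray, (64, 64))
            cv2.imwrite(os.path.join(path, f"{counts[label]:04d}.png"), small)
            counts[label] += 1
    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    collect_frames()
```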
The base cases for the commands will be simple, distinct hand gestures such as numbers and a closed fist. After ensuring that these sanity cases work reliably, we will scale the project up by implementing more complex commands. To verify the integration between the computer vision algorithms and the code utilized by Malmo, we will implement very simple behaviors that visually showcase success, while also logging debug information about the events that occurred during the test run. This will let us isolate problems specific to Minecraft from issues with the identification capability of the algorithms. The stretch goal for the project is reinforcement learning, where the AI adapts in response to the user’s motions based on the user manually rewarding it. Other additional commands will vary in complexity and will introduce problems in areas not covered by the original target command set (for example, a cave-backtracking function will require its own AI algorithms and efficient data storage/retrieval techniques).
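A hedged sketch of that integration layer is below: each predicted command label is mapped to a simple Malmo behavior, and every prediction and sent command is logged so a failed test run can be traced to either the vision model or the Minecraft side. The label names and the one-command-per-label mapping are illustrative assumptions; only the `MalmoPython.AgentHost.sendCommand` call and the movement/attack command strings come from Malmo’s Python API.

```python
import logging
import MalmoPython  # Malmo's Python bindings

logging.basicConfig(filename="scout_debug.log", level=logging.DEBUG,
                    format="%(asctime)s %(levelname)s %(message)s")

# Simple behaviors for the sanity-check command set (hypothetical mapping)
COMMAND_ACTIONS = {
    "attack": "attack 1",
    "forward": "move 1",
    "stop": "move 0",
}

def execute_command(agent_host, label, confidence):
    """Translate a vision prediction into a Malmo command, logging every step."""
    logging.debug("vision predicted %r (p=%.2f)", label, confidence)
    action = COMMAND_ACTIONS.get(label)
    if action is None:
        logging.warning("no Malmo action mapped for label %r", label)
        return
    agent_host.sendCommand(action)  # Malmo continuous-movement command string
    logging.info("sent Malmo command %r for label %r", action, label)
```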