For those who are not familiar with GeoGuessr, it's a simple and fun game in which you are placed at a random location in the world on Google Maps and have to guess the location before a countdown runs out. The goal is to get your guess as close as possible to the real location. After playing this game with some friends, I started to think about how I could use the game concept to build something that would let me practice with generative AI, and that was how this project, "GenAI GeoGuesser", was born. In my version of the game you have to guess the country name based on hints generated by AI models. To help with the understanding, here are a few screenshots showcasing the game's workflow.
First, the user selects the desired hint modalities. You can choose any number of options among "Audio", "Text" and "Image", and you also have to select the number of hints that will be generated for each modality. For the example above you would get 1 hint for each of the 3 types.
The text hint will have a textual description of the country.
The image hint will be images that resemble the country.
Finally, the audio hint should be an audio clip or sound related to the country (in my experience the audio hints don't work as well as the other two).
All the models used to generate the hints above have parameters to fine-tune the generation process; you could generate longer text or audio hints, or even change the models. The repository has intuitive parameters to play with.
Once you finish evaluating all the hints and are ready to guess, type the guess in the "Country guess" field.
If the guess is wrong, you will get the correct country name and the distance between your guess and the correct place.
If the guess is correct, you will receive a congratulations message.
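The guess-checking step described above can be sketched in a few lines. This is only an illustration, not the code from the repository: the coordinates in `COUNTRY_COORDS`, the `check_guess` helper and the feedback messages are all hypothetical, and the distance is computed with the haversine (great-circle) formula, which is an assumption about how such a distance could be measured.

```python
import math

# Hypothetical (latitude, longitude) pairs in degrees, for illustration only.
COUNTRY_COORDS = {
    "brazil": (-10.0, -55.0),
    "portugal": (39.5, -8.0),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def check_guess(guess, answer):
    """Return a feedback message, ignoring case and surrounding whitespace."""
    guess_key = guess.strip().lower()
    answer_key = answer.strip().lower()
    if guess_key == answer_key:
        return "Congratulations, you guessed it!"
    # Without coordinates for both countries we can only reveal the answer.
    if guess_key not in COUNTRY_COORDS or answer_key not in COUNTRY_COORDS:
        return f"Wrong guess, the correct country was {answer}."
    distance = haversine_km(COUNTRY_COORDS[guess_key], COUNTRY_COORDS[answer_key])
    return f"Wrong guess, the correct country was {answer} ({distance:.0f} km away)."


print(check_guess("Portugal", "Brazil"))
```

Normalizing the guess before comparing it keeps inputs like " brazil " from being marked wrong because of capitalization or stray spaces.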
Now that you are familiar with the game's workflow, let's understand what is happening under the hood at each step.
The game starts with the country selection. Here I wanted to mimic the original GeoGuessr behavior, where you are more likely to be dropped into larger countries. For this reason, just randomly selecting a country would not be enough, since small countries would have the same chance as large ones. Luckily, I found the countryinfo library, which provides a list of countries and some metadata such as country area. Below you can see what the code looks like.
Selecting the country
from countryinfo import CountryInfo
import pandas as pd

country_list = list(CountryInfo().all().keys())
# build a dict with country:area pairs
country_df = {
    country: CountryInfo(country).area() for country in country_list
}
country_df = pd.DataFrame(country_df.items(), columns=["country", "area"])
# select a random country where the probability is proportional to the country's area
country = country_df.sample(n=1, weights="area")["country"].iloc[0]
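To see the effect of area weighting without installing anything, the same idea can be sketched with the standard library's `random.choices`. This is a stand-in for the pandas call above, not the project's code: the `AREAS` values are rough, hardcoded figures for illustration, and `pick_country` is a hypothetical helper.

```python
import random
from collections import Counter

# Approximate areas in square kilometers, hardcoded for illustration only.
AREAS = {
    "Russia": 17_100_000,
    "Brazil": 8_500_000,
    "Portugal": 92_000,
    "Monaco": 2,
}


def pick_country(areas, rng=random):
    """Draw one country with probability proportional to its area."""
    names = list(areas)
    weights = [areas[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded only so the demo is repeatable
counts = Counter(pick_country(AREAS, rng) for _ in range(10_000))
print(counts.most_common())
```

Over many draws the larger countries dominate the counts, while a tiny country such as Monaco is almost never selected, which is the behavior the weighted draw is meant to produce.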
Text hints
For the text hint generation step, I chose a Gemma model. The version with 2 billion parameters is able to generate high-quality text while still running fast enough not to disrupt the user experience. The Gemma models are a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-1.1-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-1.1-2b-it")
prompt = f"Describe the country {country} without mentioning its name"
input_ids = tokenizer(prompt, return_tensors="pt")
text_hint = model.generate(**input_ids)
# generate returns a batch of sequences: decode the first one and remove the echoed prompt
text_hint = (
    tokenizer.decode(text_hint[0], skip_special_tokens=True)
    .replace(prompt, "")
)
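Even when asked not to, a model can still mention the country name in its answer. One possible safeguard, which is my own assumption and not something the original code is shown to do, is to mask the name after removing the prompt. The `clean_text_hint` helper and the `***` mask below are hypothetical.

```python
import re


def clean_text_hint(generated, prompt, country, mask="***"):
    """Strip the echoed prompt and mask any mention of the country name."""
    hint = generated.replace(prompt, "").strip()
    # Case-insensitive match so "Brazil" and "brazil" are both masked.
    pattern = re.compile(re.escape(country), flags=re.IGNORECASE)
    return pattern.sub(mask, hint)


prompt = "Describe the country Brazil without mentioning its name"
raw_output = prompt + "\nBrazil is the largest country in South America."
print(clean_text_hint(raw_output, prompt, "Brazil"))
```

This simple substring match also masks derived words (for example "Brazilian" becomes "***ian") and does not catch alternative names or spellings, so it is a first line of defense rather than a complete filter.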
You can also run the text hint generation using Gemini models via Vertex AI to get faster and better-quality outputs (check the configs file).
from vertexai.generative_models import GenerativeModel

model = GenerativeModel("gemini-1.5-pro-preview-0409")
prompt = f"Describe the country {country} without mentioning its name"
responses = model.generate_content(prompt)
# extract the text from the output
text_hint = responses.candidates[0].content.parts[0].text
Image hints
For the image generation part, I chose the SDXL-Turbo model, a version of the popular Stable Diffusion model that can generate high-quality images with as little as a single inference step.
from diffusers import AutoPipelineForText2Image

model = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo")
prompt = f"An image related to the country {country}"
img_hints = model(prompt=prompt).images
Audio hints
To generate the audio hints we will be using the AudioLDM2 model. In my experiments with different audio generation models, this one had a good trade-off between speed and output quality for this specific use case.
from diffusers import AudioLDM2Pipeline

model = AudioLDM2Pipeline.from_pretrained("cvssp/audioldm2-music")
prompt = f"A sound that resembles the country of {country}"
audio_hints = model(prompt).audios
With this, we conclude the hint-generation process. As you can see, the Hugging Face libraries make our work quite easy here. The main complexity of this app was related to the actual workflow of the Streamlit app. That part is a bit out of scope for this article because it is more technical and specific to that framework, but if you are curious to understand it you can visit the Git repository of this project.
Continue learning
If you want to look into other fun use cases of generative AI applied to games, you might enjoy reading about my other project, Gemini Hangman.
To look into another project using multiple modalities of generative AI, check out my previous article on generating music clips with AI.