OpenAI recently launched GPT-4o, its first natively multimodal model, which offers highly efficient text, image, video, and audio analysis. This significantly broadens the use cases for generative AI models. In this blog, I aim to show how you can use this model through the API, which currently supports text and image inputs with text outputs. Although its full feature set is not yet available, OpenAI will be rolling the remaining features out soon.
- General text generation
- Text generation in JSON mode
- Image Understanding
- Function Calling
Initial Setup
Install and import the relevant libraries, and set up an environment variable with your OpenAI secret key.
pip install --upgrade openai python-dotenv --quiet
The first step is to set up the OpenAI client. To do this, you first need to create an environment variable that holds your secret key. Create a .env file with the OpenAI secret key stored as OPENAI_KEY=xyz.
Once that is done, you can access the key using dotenv.
import os

from openai import OpenAI
from dotenv import load_dotenv

# Load the variables defined in the .env file into the environment
load_dotenv()

# Set the API key and model name
MODEL = "gpt-4o"
api_key = os.getenv("OPENAI_KEY")
client = OpenAI(api_key=api_key)
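If the .env file is missing or the variable name is misspelled, os.getenv returns None, and the error that eventually follows may not make the cause obvious. Below is a minimal sketch of failing early with a clear message. The helper name require_env is hypothetical and is not part of the openai or dotenv packages.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    `require_env` is a hypothetical helper used only for illustration.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; "
            "add it to your .env file or export it in your shell."
        )
    return value
```

With this in place, api_key = require_env("OPENAI_KEY") could stand in for the os.getenv call in the setup code.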
General Text Generation
completion = client.chat.completions.create(
    model=MODEL,
    messages=[
        # The system message provides context to the model
        {"role": "system", "content": "You are a helpful assistant. Help me with my math homework!"},
        # The user message is what the model generates a response to
        {"role": "user", "content": "Hello! Could you solve 2+2?"},
    ],
)
print("Assistant: " + completion.choices[0].message.content)
Output:
Of course! 2 + 2 = 4. If you need help with anything else, feel free to ask!
Text Generation in JSON Mode
import json

completion = client.chat.completions.create(
    model=MODEL,
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "You are a trainer who always responds in JSON"},
        {"role": "user", "content": "Create a weekly workout routine for me"},
    ],
)
# The reply arrives as a JSON string; parse it into a Python dict
json.loads(completion.choices[0].message.content)
Output:
{'workoutRoutine': {'week': 1, 'days': {'Monday': {'muscleGroup': 'Chest and Triceps', 'exercises': [{'name': 'Bench Press', 'sets': 4, 'reps': 12}, {'name': 'Incline Dumbbell Press', 'sets': 4, 'reps': 12}, {'name': 'Tricep Dips', 'sets': 3, 'reps': 15}, {'name': 'Tricep Pushdown', 'sets': 3, 'reps': 15}]}, 'Tuesday': {'muscleGroup': 'Back and Biceps', 'exercises': [{'name': 'Pull-Ups', 'sets': 4, 'reps': 10}, {'name': 'Deadlifts', 'sets': 4, 'reps': 12}, {'name': 'Barbell Rows', 'sets': 4, 'reps': 12}, {'name': 'Bicep Curls', 'sets': 3, 'reps': 15}]}, ...
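JSON mode returns the payload as a string, so it still has to be parsed, and a reply that is cut off by a token limit may not be valid JSON. Here is a small sketch of defensive parsing. It uses hard-coded sample strings in place of a real API response, and parse_json_reply is a hypothetical helper name.

```python
import json
from typing import Any, Optional


def parse_json_reply(raw: str) -> Optional[dict[str, Any]]:
    """Parse a JSON-mode reply; return None if it is not a valid JSON object."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


# Sample string standing in for the message content returned by the API
sample = '{"workoutRoutine": {"week": 1, "days": {"Monday": {"muscleGroup": "Chest and Triceps"}}}}'
routine = parse_json_reply(sample)
print(routine["workoutRoutine"]["days"]["Monday"]["muscleGroup"])

# A reply that was cut off mid-object cannot be parsed
truncated = '{"workoutRoutine": {"week": 1, "days"'
print(parse_json_reply(truncated))
```

Returning None rather than raising keeps the calling code simple: it can retry the request or fall back to a default when parsing fails.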
Image Understanding
Using a local image
from IPython.display import Image, display, Audio, Markdown
import base64

IMAGE_PATH = "triangle.png"

# Preview the image for context
display(Image(IMAGE_PATH))

# Open the image file and encode it as a base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image = encode_image(IMAGE_PATH)

response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful assistant that responds in Markdown. Help me with my math homework!"},
        {"role": "user", "content": [
            {"type": "text", "text": "What's the area of the triangle?"},
            # The image is passed inline as a base64 data URL
            {"type": "image_url", "image_url": {
                "url": f"data:image/png;base64,{base64_image}"}
            }
        ]}
    ],
    temperature=0.0,
)
print(response.choices[0].message.content)
Output:
To find the area of the triangle, we can use the formula for the area of a right triangle:
\[ \text{Area} = \frac{1}{2} \times \text{base} \times \text{height} \]
In this triangle, the base is 20 cm and the height is 15 cm.
\[ \text{Area} = \frac{1}{2} \times 20 \, \text{cm} \times 15 \, \text{cm} \]
\[ \text{Area} = \frac{1}{2} \times 300 \, \text{cm}^2 \]
\[ \text{Area} = 150 \, \text{cm}^2 \]
So, the area of the triangle is \( 150 \, \text{cm}^2 \).
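The request above hard-codes image/png in the data URL. For other image formats, the MIME type can be guessed from the file name with the standard mimetypes module. The sketch below assumes a hypothetical helper named to_data_url and uses placeholder bytes rather than a real image file.

```python
import base64
import mimetypes


def to_data_url(filename: str, data: bytes) -> str:
    """Build a base64 data URL, guessing the MIME type from the file name."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {filename!r}")
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder bytes for illustration; a real call would pass the file's contents
url = to_data_url("triangle.png", b"not-a-real-image")
print(url.split(",")[0])  # prints: data:image/png;base64
```

The resulting string can be used as the "url" value of an image_url content part, in the same position as the f-string in the example above.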
Using a URL
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful assistant that responds in Markdown"},
        {"role": "user", "content": [
            {"type": "text", "text": "What do you see in this image? How would you describe the emotion shown?"},
            # A publicly reachable image URL can be passed directly
            {"type": "image_url", "image_url": {
                "url": "https://pbs.twimg.com/media/GNeb4-Ua8AAuaKp?format=png&name=small"}
            }
        ]}
    ],
    temperature=0.0,
)
print(response.choices[0].message.content)
Image:
Output:
The image shows a person smiling. The emotion conveyed appears to be happiness or contentment. The smile suggests a positive, good mood.
Function Calling
# Mock function to get NBA game scores
def get_nba_game_score(team):
    """Get the current score of an NBA game for a given team"""
    print('get_nba_game_score called')
    if "lakers" in team.lower():
        return json.dumps({"team": "Lakers", "score": "102", "opponent": "Warriors", "opponent_score": "98"})
    elif "bulls" in team.lower():
        return json.dumps({"team": "Bulls", "score": "89", "opponent": "Celtics", "opponent_score": "95"})
    else:
        return json.dumps({"team": team, "score": "N/A", "opponent": "N/A", "opponent_score": "N/A"})
Call the function through a tool when the model requests it:
def function_calling():
    # Step 1: Initialize the conversation with the user's message
    messages = [{"role": "user", "content": "What's the score of the Lakers game?"}]

    # Define the available tools (functions) the model can use
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_nba_game_score",
                "description": "Get the current score of an NBA game for a given team",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "team": {
                            "type": "string",
                            "description": "The name of the NBA team, e.g. Lakers, Bulls",
                        },
                    },
                    "required": ["team"],
                },
            },
        }
    ]

    # Step 2: Send the conversation context and available tools to the model
    response = client.chat.completions.create(
        model=MODEL,
        messages=messages,
        tools=tools,
        tool_choice="auto",  # auto is the default, but we'll be explicit
    )

    # Extract the response from the model
    response_message = response.choices[0].message
    tool_calls = response_message.tool_calls  # Check whether the model wants to call any tools

    # Step 3: Check whether any tool calls were requested by the model
    if tool_calls:
        # Define the available functions
        available_functions = {
            "get_nba_game_score": get_nba_game_score,
        }  # Only one function in this example, but it can be extended

        # Add the model's response to the conversation history
        messages.append(response_message)

        # Step 4: Call each function requested by the model
        for tool_call in tool_calls:
            function_name = tool_call.function.name
            function_to_call = available_functions[function_name]
            function_args = json.loads(tool_call.function.arguments)
            print(f"Tool call: {tool_call}")

            # Call the function with the extracted arguments
            function_response = function_to_call(
                team=function_args.get("team"),
            )

            # Add the function response to the conversation history
            messages.append(
                {
                    "tool_call_id": tool_call.id,
                    "role": "tool",
                    "name": function_name,
                    "content": function_response,
                }
            )

        # Step 5: Continue the conversation with the updated history
        second_response = client.chat.completions.create(
            model=MODEL,
            messages=messages,
        )  # Get a new response from the model, which can now see the function response
        return second_response

    # No tool call was requested, so return the model's direct answer
    return response

# Run the conversation and print the result
response = function_calling()
print(response.choices[0].message.content)
Output:
Tool call: ChatCompletionMessageToolCall(id='call_k2lcfdlVAcQ8PTUL1uwu3fYz', function=Function(arguments='{"team":"Lakers"}', name='get_nba_game_score'), type='function')
get_nba_game_score called
The current score for the Lakers game is Lakers 102, Warriors 98.
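The dispatch step in the example above trusts the model to name a known function and to send well-formed arguments. Below is a hedged sketch of a more defensive dispatcher. It runs against a hand-built stand-in for a tool call rather than a real API response; FakeToolCall, dispatch_tool_call, and the stubbed get_nba_game_score are illustrative names and not part of the OpenAI SDK.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class FakeToolCall:
    """Stand-in holding the two fields the dispatcher needs."""
    name: str
    arguments: str  # JSON-encoded arguments, as a model would send them


def get_nba_game_score(team: str) -> str:
    """Stubbed lookup that returns a fixed score as a JSON string."""
    return json.dumps({"team": team, "score": "102"})


AVAILABLE_FUNCTIONS: dict[str, Callable[..., str]] = {
    "get_nba_game_score": get_nba_game_score,
}


def dispatch_tool_call(call: FakeToolCall) -> str:
    """Run the requested function, returning a JSON error instead of raising."""
    func = AVAILABLE_FUNCTIONS.get(call.name)
    if func is None:
        return json.dumps({"error": f"unknown function: {call.name}"})
    try:
        args = json.loads(call.arguments)
        return func(**args)
    except (json.JSONDecodeError, TypeError) as exc:
        # Covers malformed JSON as well as missing or unexpected arguments
        return json.dumps({"error": f"bad arguments: {exc}"})


print(dispatch_tool_call(FakeToolCall("get_nba_game_score", '{"team": "Lakers"}')))
print(dispatch_tool_call(FakeToolCall("get_weather", "{}")))
```

Sending the error string back as the tool message content, instead of letting the exception propagate, gives the model a chance to see what went wrong and respond accordingly.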
Conclusion
Now that you've read through the entire article, you're ready to use the GPT-4o model for text generation, JSON mode, image understanding, and function calling through the OpenAI API. I plan to write another blog once audio and video support is added to the API. Until then, keep exploring and learning!