How to format inputs to ChatGPT models

ChatGPT is powered by gpt-3.5-turbo, OpenAI's most advanced model.

You can build your own applications with gpt-3.5-turbo using the OpenAI API.
Chat models take a series of messages as input, and return an AI-written message as output.
This guide illustrates the chat format with a few example API calls.
1. Import the openai library
# if needed, install and/or upgrade to the latest version of the OpenAI Python library
%pip install --upgrade openai
# import the OpenAI Python library for calling the OpenAI API
import openai
2. An example chat API call

A chat API call has two required inputs:
- model: the name of the model you want to use (e.g., gpt-3.5-turbo)
- messages: a list of message objects, where each object has at least two fields:
  - role: the role of the messenger (either system, user, or assistant)
  - content: the content of the message (e.g., Write me a beautiful poem)
Typically, a conversation will start with a system message, followed by alternating user and assistant messages, but you are not required to follow this format.
Let's look at an example chat API call to see how the chat format works in practice.
# Example OpenAI Python library request
MODEL = "gpt-3.5-turbo"
response = openai.ChatCompletion.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Knock knock."},
        {"role": "assistant", "content": "Who's there?"},
        {"role": "user", "content": "Orange."},
    ],
    temperature=0,
)
response
<OpenAIObject chat.completion id=chatcmpl-6wE0D7QM6dLRUPmN5Vm6YPwF1JNMR at 0x134f0c270> JSON: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Orange who?",
        "role": "assistant"
      }
    }
  ],
  "created": 1679334869,
  "id": "chatcmpl-6wE0D7QM6dLRUPmN5Vm6YPwF1JNMR",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 4,
    "prompt_tokens": 38,
    "total_tokens": 42
  }
}
As you can see, the response object has a few fields:
- id: the ID of the request
- object: the type of object returned (e.g., chat.completion)
- created: the timestamp of the request
- model: the full name of the model used to generate the response
- usage: the number of tokens used to generate the replies, counting prompt, completion, and total
- choices: a list of completion objects (only one, unless you set n greater than 1)
  - message: the message object generated by the model, with role and content
  - finish_reason: the reason the model stopped generating text (either stop, or length if the max_tokens limit was reached)
  - index: the index of the completion in the list of choices
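For example, here is a minimal sketch of reading a few of these fields off the response object above, using the same dict-style access as the rest of this guide (the values in the comments are illustrative and will differ from run to run):

# a minimal sketch of inspecting a few fields of the response above
print(response["id"])                           # request ID, e.g. "chatcmpl-..."
print(response["model"])                        # full model name, e.g. "gpt-3.5-turbo-0301"
print(response["usage"]["total_tokens"])        # prompt tokens + completion tokens
print(response["choices"][0]["finish_reason"])  # "stop", or "length" if max_tokens was hit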
Extract just the reply with:
response['choices'][0]['message']['content']
'Orange who?'
Even non-conversation-based tasks can fit into the chat format, by placing the instruction in the first user message.
For example, to ask the model to explain asynchronous programming in the style of the pirate Blackbeard, we can structure the conversation as follows:
# example with a system message
response = openai.ChatCompletion.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain asynchronous programming in the style of the pirate Blackbeard."},
    ],
    temperature=0,
)
print(response['choices'][0]['message']['content'])
Ahoy matey! Asynchronous programming be like havin' a crew o' pirates workin' on different tasks at the same time. Ye see, instead o' waitin' for one task to be completed before startin' the next, ye can assign tasks to yer crew and let 'em work on 'em simultaneously. This way, ye can get more done in less time and keep yer ship sailin' smoothly. It be like havin' a lookout keepin' watch while the cook be preparin' the next meal and the navigator be plottin' the course. Each pirate be doin' their own thing, but all workin' together to keep the ship runnin' smoothly. Arrr, that be asynchronous programming in a pirate's tongue!
# example without a system message
response = openai.ChatCompletion.create(
    model=MODEL,
    messages=[
        {"role": "user", "content": "Explain asynchronous programming in the style of the pirate Blackbeard."},
    ],
    temperature=0,
)
print(response['choices'][0]['message']['content'])
Ahoy mateys! Let me tell ye about asynchronous programming, arrr! Ye see, in the world of programming, sometimes we need to wait for certain tasks to be completed before moving on to the next one. But with asynchronous programming, we can keep the ship sailing while we wait for those tasks to finish. It's like having a crewmate scrubbing the deck while another is hoisting the sails. They're both working at the same time, but on different tasks. In programming, we use something called callbacks or promises to keep track of these tasks. So while one task is waiting for a response from the server, the rest of the code can keep running. It's a bit like navigating through a stormy sea. We need to be able to adjust our course and keep moving forward, even when we hit rough waters. And with asynchronous programming, we can do just that, me hearties!
3. Tips for instructing gpt-3.5-turbo-0301

Best practices for instructing models may change from model version to model version. The advice that follows applies to gpt-3.5-turbo-0301 and may not apply to future models.
System messages
The system message can be used to prime the assistant with different personalities or behaviors.
However, gpt-3.5-turbo-0301 does not generally pay as much attention to the system message as it does to user messages, and therefore we recommend placing important instructions in the user message instead.
# An example of a system message that primes the assistant to explain concepts in great depth
response = openai.ChatCompletion.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a friendly and helpful teaching assistant. You explain concepts in great depth using simple terms, and you give examples to help people learn. At the end of each explanation, you ask a question to check for understanding"},
        {"role": "user", "content": "Can you explain how fractions work?"},
    ],
    temperature=0,
)
print(response["choices"][0]["message"]["content"])
Sure! Fractions are a way of representing a part of a whole. The top number of a fraction is called the numerator, and it represents how many parts you have. The bottom number is called the denominator, and it represents how many parts make up the whole. For example, if you have a pizza that is cut into 8 equal slices, and you have eaten 3 of those slices, you can represent that as a fraction: 3/8. The numerator is 3 because you have eaten 3 slices, and the denominator is 8 because there are 8 slices in total. To add or subtract fractions, you need to have a common denominator. This means that the bottom numbers of the fractions need to be the same. For example, if you want to add 1/4 and 2/3, you need to find a common denominator. One way to do this is to multiply the denominators together: 4 x 3 = 12. Then, you need to convert each fraction so that the denominator is 12. To do this, you can multiply the numerator and denominator of each fraction by the same number. For example, to convert 1/4 to have a denominator of 12, you can multiply both the numerator and denominator by 3: 1/4 x 3/3 = 3/12. To convert 2/3 to have a denominator of 12, you can multiply both the numerator and denominator by 4: 2/3 x 4/4 = 8/12. Now that both fractions have a denominator of 12, you can add them together: 3/12 + 8/12 = 11/12. Do you have any questions about fractions?
# An example of a system message that primes the assistant to give brief, to-the-point answers
response = openai.ChatCompletion.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a laconic assistant. You reply with brief, to-the-point answers with no elaboration."},
        {"role": "user", "content": "Can you explain how fractions work?"},
    ],
    temperature=0,
)
print(response["choices"][0]["message"]["content"])
Fractions represent a part of a whole. They consist of a numerator (top number) and a denominator (bottom number) separated by a line. The numerator represents how many parts of the whole are being considered, while the denominator represents the total number of equal parts that make up the whole.
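Following the recommendation above, here is a minimal sketch that keeps the system message generic and moves the important instruction into the user message instead (the prompt wording is illustrative, not a tested recipe):

# a sketch of placing the key instruction in the user message rather than the system message
response = openai.ChatCompletion.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Reply with a brief, to-the-point answer with no elaboration. Can you explain how fractions work?"},
    ],
    temperature=0,
)
print(response["choices"][0]["message"]["content"])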
Few-shot prompting
In some cases, it's easier to show the model what you want rather than tell the model what you want.
One way to show the model what you want is with faked example messages.
For example:
# An example of a faked few-shot conversation to prime the model into translating business jargon to simpler speech
response = openai.ChatCompletion.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful, pattern-following assistant."},
        {"role": "user", "content": "Help me translate the following corporate jargon into plain English."},
        {"role": "assistant", "content": "Sure, I'd be happy to!"},
        {"role": "user", "content": "New synergies will help drive top-line growth."},
        {"role": "assistant", "content": "Things working well together will increase revenue."},
        {"role": "user", "content": "Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage."},
        {"role": "assistant", "content": "Let's talk later when we're less busy about how to do better."},
        {"role": "user", "content": "This late pivot means we don't have time to boil the ocean for the client deliverable."},
    ],
    temperature=0,
)
print(response["choices"][0]["message"]["content"])
We don't have enough time to complete everything perfectly for the client.
To help clarify that the example messages are not part of a real conversation, and shouldn't be referred back to by the model, you can instead set the name field of system messages to example_user and example_assistant.

Transforming the few-shot example above, we could write:
# The business jargon translation example, but with example names for the example messages
response = openai.ChatCompletion.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful, pattern-following assistant that translates corporate jargon into plain English."},
        {"role": "system", "name": "example_user", "content": "New synergies will help drive top-line growth."},
        {"role": "system", "name": "example_assistant", "content": "Things working well together will increase revenue."},
        {"role": "system", "name": "example_user", "content": "Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage."},
        {"role": "system", "name": "example_assistant", "content": "Let's talk later when we're less busy about how to do better."},
        {"role": "user", "content": "This late pivot means we don't have time to boil the ocean for the client deliverable."},
    ],
    temperature=0,
)
print(response["choices"][0]["message"]["content"])
This sudden change in plans means we don't have enough time to do everything for the client's project.
Not every attempt at engineering conversations will succeed at first.
If your first attempts fail, don't be afraid to experiment with different ways of priming or conditioning the model.
As an example, one developer discovered an increase in accuracy when they inserted a user message that said "Great job so far, these have been perfect" to help condition the model into providing higher quality responses.
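As a rough sketch of that conditioning trick (the wording and its reported benefit are anecdotal, so treat this as an experiment rather than a guarantee), you might insert an encouraging user message after the few-shot examples:

# a sketch of the anecdotal conditioning trick described above (results not guaranteed)
response = openai.ChatCompletion.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful, pattern-following assistant that translates corporate jargon into plain English."},
        {"role": "system", "name": "example_user", "content": "New synergies will help drive top-line growth."},
        {"role": "system", "name": "example_assistant", "content": "Things working well together will increase revenue."},
        {"role": "user", "content": "Great job so far, these have been perfect."},  # conditioning message
        {"role": "user", "content": "This late pivot means we don't have time to boil the ocean for the client deliverable."},
    ],
    temperature=0,
)
print(response["choices"][0]["message"]["content"])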
For more ideas on how to lift the reliability of the models, consider reading our guide on techniques to increase reliability. It was written for non-chat models, but many of its principles still apply.
4. Counting tokens
When you submit your request, the API transforms the messages into a sequence of tokens.
The number of tokens used affects:
- the cost of the request
- the time it takes to generate the response
- when the reply gets cut off from hitting the maximum token limit (4096 for gpt-3.5-turbo)
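As a quick illustration of how text maps to tokens, here is a minimal sketch that counts the tokens in a single string, assuming the tiktoken library is installed; the full per-message accounting is handled by the function below:

# a minimal sketch: count the tokens in a single string with tiktoken
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")  # gpt-3.5-turbo uses the cl100k_base encoding
tokens = encoding.encode("Knock knock. Who's there?")
print(tokens)       # a list of integer token IDs
print(len(tokens))  # the number of tokens in the string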
You can use the following function to count the number of tokens that a list of messages will use.
import tiktoken
def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0301"):
    """Returns the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print("Warning: model not found. Using cl100k_base encoding.")
        encoding = tiktoken.get_encoding("cl100k_base")
    if model == "gpt-3.5-turbo":
        print("Warning: gpt-3.5-turbo may change over time. Returning num tokens assuming gpt-3.5-turbo-0301.")
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0301")
    elif model == "gpt-4":
        print("Warning: gpt-4 may change over time. Returning num tokens assuming gpt-4-0314.")
        return num_tokens_from_messages(messages, model="gpt-4-0314")
    elif model == "gpt-3.5-turbo-0301":
        tokens_per_message = 4  # every message follows <im_start>{role/name}\n{content}<im_end>\n
        tokens_per_name = -1  # if there's a name, the role is omitted
    elif model == "gpt-4-0314":
        tokens_per_message = 3
        tokens_per_name = 1
    else:
        raise NotImplementedError(f"""num_tokens_from_messages() is not implemented for model {model}. See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens.""")
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 2  # every reply is primed with <im_start>assistant
    return num_tokens
# let's verify the function above matches the OpenAI API response
example_messages = [
    {
        "role": "system",
        "content": "You are a helpful, pattern-following assistant that translates corporate jargon into plain English.",
    },
    {
        "role": "system",
        "name": "example_user",
        "content": "New synergies will help drive top-line growth.",
    },
    {
        "role": "system",
        "name": "example_assistant",
        "content": "Things working well together will increase revenue.",
    },
    {
        "role": "system",
        "name": "example_user",
        "content": "Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage.",
    },
    {
        "role": "system",
        "name": "example_assistant",
        "content": "Let's talk later when we're less busy about how to do better.",
    },
    {
        "role": "user",
        "content": "This late pivot means we don't have time to boil the ocean for the client deliverable.",
    },
]
for model in ["gpt-3.5-turbo-0301", "gpt-4-0314"]:
    print(model)
    # example token count from the function defined above
    print(f"{num_tokens_from_messages(example_messages, model)} prompt tokens counted by num_tokens_from_messages().")
    # example token count from the OpenAI API
    response = openai.ChatCompletion.create(
        model=model,
        messages=example_messages,
        temperature=0,
        max_tokens=1,  # we're only counting input tokens here, so let's not waste tokens on the output
    )
    print(f'{response["usage"]["prompt_tokens"]} prompt tokens counted by the OpenAI API.')
    print()
gpt-3.5-turbo-0301
126 prompt tokens counted by num_tokens_from_messages().
126 prompt tokens counted by the OpenAI API.

gpt-4-0314
128 prompt tokens counted by num_tokens_from_messages().
128 prompt tokens counted by the OpenAI API.