Unlocking structured chatbot responses

Artificial Intelligence (or Machine Learning, as some prefer) has finally made its very public entrance, and millions of people are now tinkering with the tech - the modern equivalent of taking apart a radio to see how it works. But as any tinkerer knows, dissection is only the beginning. True understanding and advancement only come from a more hands-on approach.

So let us begin with a prompt, human to human, as it were. How do we leverage this new tech in the language of our own applications? Can we channel this liquid intelligence in a structured way?

Join us on our journey to answer just that.

Groundwork

“I have made this letter longer than usual because I have not had time to make it shorter.” - Blaise Pascal

One of the most significant areas of exploration has been in the space of prompts. The importance of prompts first came into focus with image-generation models like MidJourney, where it was simple to change the output image in predictable but powerful ways by adding style and resolution cues like “in the style of Salvador Dali”, “ultra photorealistic 4K”, or “volumetric light”.

The prompt continues its starring role in the most recent pure language models like GPT-3 and GPT-4. In fact, a little digging reveals that the key difference between the raw GPT-3/4 API and the consumer-facing ChatGPT app is that ChatGPT includes the previous questions and responses of the conversation when sending the current prompt to the API. It's essentially that simple.
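That context-carrying layer is easy to picture in code. Here's a minimal sketch using the OpenAI Python SDK of that era (pre-1.0); the model name and the chat helper are illustrative assumptions, not the actual ChatGPT implementation:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder credentials

# The accumulated conversation: [{"role": ..., "content": ...}, ...]
history = []

def chat(prompt):
    # The "ChatGPT layer": append the new prompt to the running history
    # and send the whole conversation to the raw GPT API each time.
    history.append({"role": "user", "content": prompt})
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # assumed model, for illustration
        messages=history,
    )
    reply = response["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply
```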

With this in mind, let’s look again at what we have with fresh eyes. We have a tool that can seemingly “understand” what we say even when we say it in an unstructured way. But what does it mean to “understand”? That’s a tough nut to crack. Let’s stop short of a full definition and assert that, in practical terms, “understanding” is present when a naturally expressed request to transform one thing into another is carried out satisfactorily.

A self-contained example in the context of ChatGPT can be made using the domain-specific language (DSL) Mermaid. Mermaid is a tiny programming-adjacent language that allows the author to concisely express visual diagrams in text. ChatGPT “understands” how to transform prompts into diagrams.


Prompt: Create a simple mermaid diagram showing a developer asking you to create a mermaid diagram, you responding with that mermaid diagram, the developer asking you to modify the mermaid diagram, and you doing so. Create a circular flow diagram.


ChatGPT: Here's a simple mermaid diagram showing a circular flow of a developer asking me to create a mermaid diagram, me responding with the diagram, the developer asking me to modify it, and me making the changes:

```mermaid
graph TD;
    A[Developer] --> B[ChatGPT];
    B --> C[Mermaid Diagram];
    C --> D[Developer];
    D --> E[Modification Request];
    E --> B;
```

In this diagram, the flow starts with the developer asking me, ChatGPT, to create a mermaid diagram (labeled "Mermaid Diagram" in the diagram). I create the diagram and present it to the developer. The developer then asks me to make a modification to the diagram (labeled "Modification Request" in the diagram). I make the changes and present the updated diagram to the developer again. The flow then loops back to the developer who can continue to request modifications until the diagram meets their needs.


We can visualize the diagram ChatGPT wrote by using a mermaid renderer; renderers are readily available as plugins for VS Code, Google Docs, and many other tools, as well as online at mermaid.live.

This is interesting. In truth, nothing different is happening here than when you ask ChatGPT to generate some Python code for you. However, Mermaid is a lot simpler than a Turing-complete language like Python.

When I imagine how the underlying models were “trained to code in Python”, I imagine a training set that includes a substantial portion of openly available code repositories, Stack Overflow questions and answers, and blogs. But for a little DSL like Mermaid? I can easily imagine a concise expression of the syntax and available options, with some extra bits that leverage all the vast training already performed generally.

Let’s combine what we’ve discussed so far: 

  • ChatGPT is a simple layer that bundles the current prompt with the previous dialog before sending it to GPT.

  • If GPT knows a DSL, then you can translate unstructured text into it.

Keep this in mind as we move forward.

Embedded Prompting

Plenty of digital ink has already been spilled over how to write effective prompts, and at the risk of repeating others and myself: a prompt can make or break the response. At its most general, the prompt provides context - it works to communicate your expectations about the response. Often we think about using the prompt to condition the content of the response, but it’s just as effective in conditioning the structure in which that content is embedded.

Consider the following exchange with ChatGPT:


Prompt: "position the car three units to the left of the boat"


ChatGPT: To position the car three units to the left of the boat, you would first need to establish a reference point. Let's assume that the boat is the reference point. 

Next, you would need to determine what units you are measuring in. For example, if you are measuring in meters, you would need to know how long one unit is. Let's assume that one unit is equal to one meter.

Finally, you would move the car to a point that is three units to the left of the boat. This means that you would move the car to a point that is three meters to the left of the boat, as measured from the boat's position.


This prompt is lacking in many respects, but mostly in the context that would make the response useful in any way. Consider instead:


Prompt: Given the following taxonomy and examples, transform the following prompt:

"position the car three units to the left of the boat"

Taxonomy:

{
  "verbs": [
    {
        "primary": "move",
        "synonyms": ["translate", "shift"]
    },
    {
        "primary": "scale",
        "synonyms": ["size up", "size down", "shrink", "blow up", "increase"]
    }
  ],
  "nouns": [
      { Name: "car object", Id: "23211221", position: [1, 0, 1] },
      { Name: "boat object", Id: "23943429", position: [-1, 0, -1] }    
    ]
}

Example Prompt1: "increase scale of the boat object by 15%"

Example Response1: { "verb": "scale", "noun": ["23943429"], "property": {"numberValue": 1.15 } }

Example Prompt2: "position the car one unit to the left of the boat"

Example Response2: { "verb": "move", "noun": ["23211221"], "property": {"vectorValue": [-2, 0, -1] } }


ChatGPT: Transformed Prompt: "Move the car object three units to the left of the boat object"

Response: { "verb": "move", "noun": ["23211221"], "property": {"vectorValue": [-4, 0, -1] } }

Explanation: Using the given taxonomy, we can identify the primary verb "move" and the two relevant nouns "car object" and "boat object". The prompt specifies the direction and distance, so we can use a vector value of [-3, 0, 0] to move the car object three units to the left of the boat object. However, since the initial position of the car object is [1, 0, 1] and the initial position of the boat object is [-1, 0, -1], we need to subtract 2 from the x-coordinate of the car object's position vector to get the final position vector of [-3, 0, -1]. Therefore, the response would be { "verb": "move", "noun": ["23211221"], "property": {"vectorValue": [-4, 0, -1] } }.

This is a relatively simple example, but it illustrates how we can skip any model training or fine-tuning by putting in a little work up front to condense our domain into a concise description. If we add to that description some canonical examples of the kinds of transformations we want performed on the input prompt, we’ll find the responses to be significantly more accurate and applicable. In a real application, that concise description would come straight from application state, as sketched below.
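Here's a minimal sketch of assembling such a prompt programmatically; the taxonomy dictionary and build_prompt helper are illustrative, mirroring the example above:

```python
import json

# Illustrative application state, mirroring the taxonomy above.
taxonomy = {
    "verbs": [
        {"primary": "move", "synonyms": ["translate", "shift"]},
        {"primary": "scale", "synonyms": ["size up", "size down", "shrink", "blow up", "increase"]},
    ],
    "nouns": [
        {"name": "car object", "id": "23211221", "position": [1, 0, 1]},
        {"name": "boat object", "id": "23943429", "position": [-1, 0, -1]},
    ],
}

# Canonical example transformations, as in the prompt above.
EXAMPLES = """Example Prompt1: "increase scale of the boat object by 15%"
Example Response1: { "verb": "scale", "noun": ["23943429"], "property": {"numberValue": 1.15 } }
Example Prompt2: "position the car one unit to the left of the boat"
Example Response2: { "verb": "move", "noun": ["23211221"], "property": {"vectorValue": [-2, 0, -1] } }"""

def build_prompt(user_prompt):
    # Serialize current application state into the embedded prompt.
    return (
        "Given the following taxonomy and examples, transform the following prompt:\n\n"
        f'"{user_prompt}"\n\n'
        f"Taxonomy:\n\n{json.dumps(taxonomy, indent=2)}\n\n{EXAMPLES}"
    )

print(build_prompt("position the car three units to the left of the boat"))
```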

One thing to notice in the exchange above is that we got ChatGPT to extract structured text, in the form of a JSON object, from our prompt - but it’s still embedding that JSON in the middle of free-form text. Experimenting with various prompts aimed at getting ChatGPT to respond with only the JSON led to some interesting results. While it was possible to make GPT respond with only the JSON content, we found that the accuracy of the responses plunged. It would seem that when ChatGPT shows its work, it produces better answers.

Maybe our math teachers were on to something.
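Letting the model show its work does mean our own code has to dig the JSON out of the surrounding prose. Here's a minimal sketch of that extraction; extract_json is an illustrative helper that assumes the first {...} span in the response is the payload:

```python
import json
import re

def extract_json(response_text):
    # Grab everything from the first '{' to the last '}' and let
    # json.loads() validate it. Real responses may need something
    # more defensive than this.
    match = re.search(r"\{.*\}", response_text, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

reply = 'Response: { "verb": "move", "noun": ["23211221"], "property": {"vectorValue": [-4, 0, -1] } }'
print(extract_json(reply))  # {'verb': 'move', 'noun': ['23211221'], ...}
```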

Putting It Together

We’ve got all the basic ingredients we need to start plugging GPT into our own products. There are many ways, from the perspective of UX, that we could do so, but we’re not too concerned with that aspect here. For simplicity’s sake, let’s assume we’ve got a text box where the user can type in what they want to have happen, i.e., their prompt.

What we want is for the user’s intent to be translated into the behavior of our application without further user involvement. Let’s put together everything we’ve learned so far.

First we need some basic functionality as a target for users’ intent. For demo purposes, let’s use a calculator app. It’s full-featured, so we can ‘add’, ‘subtract’, ‘multiply’, and ‘divide’. Here are our functions:

from functools import reduce

def add(arr):
    # Sum all values: add([2, 2]) -> 4
    return sum(arr)

def subtract(arr):
    # Subtract the remaining values from the first: subtract([10, 3]) -> 7
    return arr[0] - sum(arr[1:])

def multiply(arr):
    # Multiply all values together: multiply([3, 6]) -> 18
    return reduce(lambda x, y: x * y, arr)

def divide(arr):
    # Divide the first value by each remaining value: divide([12, 3, 2]) -> 2.0
    return reduce(lambda x, y: x / y, arr)

As above, we want a prompt template in which to embed the user’s prompts. There are many ways to attempt this, and some evolution of the prompt through trial and error is expected. Here’s an example that works well enough for our purposes:

Given the following python API and example transformations, transform the prompt and provide only the output array of operations and a brief explanation:

"{{{USER_PROMPT}}}"

Python API:

def add(arr):

def subtract(arr):

def multiply(arr):

def divide(arr):

Example 1

Prompt: "add 2 and 2"

Output: [{ "operation": "add", "values": [2, 2] }]

Example 2

Prompt: "add 3 to the product of 3 and 6"

Output: [{ "operation": "multiply", "values": [3, 6], "name": "step0" }, { "operation": "add", "values": ["step0", 3] }]
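In code, embedding the user's prompt into this template is plain string substitution. A minimal sketch, with PROMPT_TEMPLATE assumed to hold the template text above verbatim:

```python
# PROMPT_TEMPLATE is an illustrative name; it holds the template above.
PROMPT_TEMPLATE = '''Given the following python API and example transformations, transform the prompt and provide only the output array of operations and a brief explanation:

"{{{USER_PROMPT}}}"

Python API:

def add(arr):
def subtract(arr):
def multiply(arr):
def divide(arr):

Example 1
Prompt: "add 2 and 2"
Output: [{ "operation": "add", "values": [2, 2] }]

Example 2
Prompt: "add 3 to the product of 3 and 6"
Output: [{ "operation": "multiply", "values": [3, 6], "name": "step0" }, { "operation": "add", "values": ["step0", 3] }]
'''

def embed_prompt(user_prompt):
    # Swap the placeholder for the user's raw text before calling the API.
    return PROMPT_TEMPLATE.replace("{{{USER_PROMPT}}}", user_prompt)

embedded = embed_prompt("Subtract 3 from the product of 2 and 5")
```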


Here’s an example response from passing an embedded prompt of "Subtract 3 from the product of 2 and 5":

ChatGPT: Output: 

[{ "operation":"multiply", "values": [2, 5], "name": "step0" }, 
{"operation":"subtract", "values":["step0", 3]}]

Explanation: 

We need to perform the operation "Subtract 3 from the product of 2 and 5". 

First, we need to find the product of 2 and 5, which we can do using the `multiply` function. We pass in the values 2 and 5 to the `multiply` function, and assign the result to a variable called "step0". 

Next, we need to subtract 3 from the result of the previous step. We can do this using the `subtract` function. We pass in the variable "step0" and the value 3 to the `subtract` function, which gives us the final result. 

Therefore, the output array of operations is:

[{ "operation":"multiply", "values": [2, 5], "name": "step0" }, 
{"operation":"subtract", "values":["step0", 3]}]

Another example response, from passing an embedded prompt of "square 5":


ChatGPT: Output: 

[{ "operation":"multiply", "values": [5, 5] }]

Explanation: The prompt "square 5" is asking for the square of 5, which is equivalent to multiplying 5 by itself. Therefore, the output array of operations would simply be a single operation that multiplies 5 by 5 using the 'multiply' function from the provided Python API.


Now we just need to write some code that steps over the extracted operations and supplies the values. Some care needs to be taken in this example to substitute the result of each named step (step0, step1, etc.) for the associated placeholder string in subsequent operation descriptions.

While code like that can be fun to write, it’s typically less exciting to read, and since it’s not critical to understanding my point, I’ll mostly spare you the inconvenience. The essential ideas here are:

  1. Provide an API description or DSL summary in a prompt embedding.

  2. Provide an output transformation schema that you know how to walk.

  3. Provide example prompt transformations into outputs.

  4. Provide the actual prompt.

  5. Use a regex to extract the structured JSON embedded in the response.

  6. Validate and iterate over that JSON to orchestrate and supply inputs to application code (a sketch of these last two steps follows this list).
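For the curious, here's one minimal way those last two steps might look, reusing the calculator functions from above; run_operations and the placeholder handling are illustrative, not production code:

```python
import json
import re
from functools import reduce

def add(arr):
    return sum(arr)

def subtract(arr):
    return arr[0] - sum(arr[1:])

def multiply(arr):
    return reduce(lambda x, y: x * y, arr)

def divide(arr):
    return reduce(lambda x, y: x / y, arr)

OPERATIONS = {"add": add, "subtract": subtract, "multiply": multiply, "divide": divide}

def run_operations(response_text):
    # Step 5: pull the JSON array of operations out of the free-form response.
    match = re.search(r"\[.*\]", response_text, re.DOTALL)
    ops = json.loads(match.group(0))

    # Step 6: walk the operations in order, swapping placeholder strings
    # ("step0", "step1", ...) for the results of earlier named operations.
    # Unknown strings pass through unchanged.
    named_results = {}
    result = None
    for op in ops:
        values = [named_results.get(v, v) if isinstance(v, str) else v
                  for v in op["values"]]
        result = OPERATIONS[op["operation"]](values)
        if "name" in op:
            named_results[op["name"]] = result
    return result

response = '''[{ "operation":"multiply", "values": [2, 5], "name": "step0" },
{"operation":"subtract", "values":["step0", 3]}]'''
print(run_operations(response))  # 7
```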

In actual practice, one has to refine and home in on an embedded prompt that works for their context, including identifying the best set of example transformations to include within the prompt. Practically speaking, in these early days, some guidance encouraging users to keep their language simple would go a long way as well.

As a final piece of advice from my personal refinement journey: consider using these types of flows only when the desired behavior can be corrected or undone, or is low-impact. Allowing GPT to identify operations that can’t be undone seems problematic at this early stage of the technology. Also, don’t hook it up to your missile launcher.

Next Steps

We now know how to embed a first-order prompt inside a higher-order prompt to make the response structured according to our own idiosyncratic context. We have the broadest outline of the basic flow. There are ways to improve and embellish this. Here are some examples to get you started:

  • You can mix in Whisper or other voice-to-text solutions to take the keyboard out of the equation.

  • You can update your embedded prompt to allow ChatGPT to ask for clarifications, and when it does, follow different control-flow paths in your own code.

  • You can challenge ChatGPT responses in patterned ways to elicit corrections.

  • You can have ChatGPT itself summarize the salient bits within a longer exchange so as to provide greater context to future prompts.

  • You can make multiple requests to put a request through a series of transformations or categorizations, as if you’re using GPT to transition through a state machine.

  • You can allow the user to see and potentially cancel a complex request flow.

Enjoy, fellow tinkerers!

Jerome Meyers

Lead Engineer on the Cloud Hosted Services Team at Magnopus
