Takeaways from using Artificial Intelligence APIs
These are some insights I have taken as I have been using Gemini and GPT developer APIs.
It has been awhile since the consumer use cases of Artificial Intelligence have been in the spotlight of news, media, and popular culture. Many of AI’s distributors over the last year have ironed out and opened up their technology for external use to developers like me. For whatever reason, it was not something I immediately thought I needed to use. Despite this, the cultural interest in the technology drove my clients to encourage me to look into AI and incorporate it into the experiences I make for them. So, allow me to reveal some of the ways I use it, specifically Large Language Models, LLMs, like Google’s Gemini and OpenAI’s GPT-4o. I will also share the implications it has imposed on my own creative process.
Before AI: a numbers game
Typically in the type of design and artistic experiences I create, there are functions that require a inputs. It could be weather data, sound input from the microphone, or a dataset from a government institution. Regardless of how this input is selected, it is assigned by a number at the time of use. For instance, the location of weather data is determined by two numbers, the latitude and longitude. The amplitude of the microphone is a number that is modified to create a visual connection between a circle and sound. A dataset has many columns and rows. Numbers define which column or row to display that data. It is why I like to think the French word digital is numérique. AI changes this.
With AI and specifically LLMs, I am not bound to numbers but to text. I send text to the model and it responds to me with different text. If you use AI models in your daily life, you likely write human legible text and in turn receive human legible text. This is an effective and relatable way for humans to search. While LLMs attempt to predict your response as text, they actually do not know (or care for that matter) the language of that text. So, you can send anything that can be converted into text. For example an image can be encoded into text through a process called Base64 encoding1. If I send that to the model, it will respond back with its own image encoded as a series of text characters. So, what used to be strict definitions of numbers now is a more vague, but broader means to select and generate data through text.
How do I use AI?
Speaking strictly to the artistic process, I use AI as a replacement for something undefined or ill-defined. Here is an example. In my piece, Nostalgia For A Past Future, from 2009 the center shape is created by thousands of dots drawn to the screen every second to resemble movement. Based on the mouse’s position that can look like rivers flowing or electrons jittering. This is created via a noise function, a kind of random value. Personally, I do not know nor care what the precise angle is for each element drawn on the page in order to create movement. I care what the movement evokes. So, I have offloaded that knowledge to a noise function. Today, I could offload this ill-defined value to an AI model. I can ask the AI model, “For this point N, can you give me the angle and velocity it should move in?” If you ask ChatGPT this, it will give you a long answer. But, if you wrap your question with additional text and feed that to OpenAI’s API, you can restrict it’s output to only give you what you need. A couple of numbers to use to draw the next point in the list. Interestingly, I can further elaborate my question with style; like, “imagine this point moves like a river.” The effect is roughly the same, but the process is totally different.
I am still finding ways to apply this line of reasoning in my creative process. I will use it to fill in copy for a website I am building, or to generate an image that supports the main element I am creating. When working, my creative process gets held up in the question: how should I make this? AI has become a quick-and-dirty way to get a rough response and continue working.
The pros and cons of AI
As previously mentioned, I can use these AI APIs to continue working through the bigger picture of an idea while not getting bogged down by the “how to” of a specific detail. This can be beneficial while on a time crunch. But, it also absolves myself of thinking critically. Sometimes, the critical thinking is the key part to realizing the work. I worry that my laziness could get in the way of discovering a true gem. Then again, it can save me and has saved me a lot of time. I am conflicted.
To the credit of the LLM makers, it is extremely easy to drop in a prompt to almost every step in the creative process of building software. This allows for abundant opportunity to explore and test the efficacy of an AI in ways I could never use a noise function. The potential of this is incredibly compelling to me. However, in practice, LLMs are so complicated and layered that it is often difficult to know why or how it got to the answer it did. As someone who relies on critical thinking to make my work, this uncertainty diminishes its potential impact. Again, I am conflicted.
Lastly, I need to take some data science class to learn how to properly tune these models. I say this, because they frequently hallucinate. When working with an AI, I imagine they are an actor in a film and I am the director. I will pre-prompt them to be a certain kind of character. And they always break character.
What I can show
These findings may be interesting, but what can I show you today? Unfortunately, not a lot. Most of the work I have done is not in a state that can be shared. But, what I can say is that most of the work you will see from me in the future will have AI woven into it. It will not be the focus, nor will it be overt. But, it will be there, helping me flesh out an idea. Defining something I am trying to communicate incrementally more clear.
Until then, here is a prototype I made of an anime chatbot you can play with. It uses the same technology that powers ChatGPT and presents it in a game-like scenario. This is for a client (scrubbed of identifying and proprietary information) and weaves in some interesting animation technology from Japan with the Web Audio API to present a generic non-player character (NPC) you might find in a game, but totally unscripted.
How are you using AI? Do you use it to support the creation of art? Or is it itself the art? Curious to watch where this all goes,
—Jono
Definition of Base64. Wikipedia.
Good for you. Having a open mind so you can use all the tools avaliable makes for a better product and a less cumbersome way of getting their.