Take your agents to a new level of capability by allowing them to process files. The File Upload User Input is the gateway to creating agents that can read documents, transcribe audio, analyze videos, and much more.
This intermediate-level tutorial assumes you’re already familiar with basic agent creation and will focus on the powerful combination of file inputs with Advanced Steps.

The Key Point: The Connection Between File Upload + Advanced Step

Unlike a text input, which can be used directly in the prompt (or in the step), a “File Upload” input needs to connect to an Advanced Step. The workflow is a logical three-step sequence:
  • The user uploads a file (through the User Input).
  • An Advanced Step (such as “Audio Transcription” or “PDF Text Extraction”) processes that file and generates a result (text, for example).
  • The result of the Advanced Step is then used by the AI in the main prompt to generate the final response.
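The sequence above can be pictured as a small pipeline. The sketch below is only a conceptual model in Python, not Tess AI code: `UploadedFile`, `run_agent`, `fake_transcription`, and `translator_prompt` are hypothetical names invented for illustration, and the transcription is a stub rather than a real model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UploadedFile:
    """Hypothetical stand-in for the file a user uploads."""
    name: str
    content: bytes


def run_agent(
    uploaded: UploadedFile,
    advanced_step: Callable[[UploadedFile], str],
    build_prompt: Callable[[str], str],
) -> str:
    """Chain the three stages: upload -> Advanced Step -> main prompt."""
    step_result = advanced_step(uploaded)  # the step turns the file into text
    return build_prompt(step_result)       # the prompt only ever sees that text


def fake_transcription(file: UploadedFile) -> str:
    # Placeholder: a real step would call a speech-to-text model here.
    return f"[transcript of {file.name}]"


def translator_prompt(text: str) -> str:
    return f"Translate the following content:\n{text}"


prompt = run_agent(
    UploadedFile("talk.mp3", b"..."), fake_transcription, translator_prompt
)
print(prompt)
```

The point of the model is that the prompt builder never receives the file itself, only the text produced by the step in the middle.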
Our Example Project: The Media Translator Agent

To illustrate this combination, we’ll build an agent that works as a translator. It will receive an audio or video file, transcribe the content, and translate it into Portuguese or another language.

Initial Agent Setup

In AI Studio, start by creating a new Chat or Text Agent. The default “All LLM” selection is perfectly suitable for this example.

Configuring the Input and the Advanced Step

This is the most important stage. We’ll configure the two parts that will work together.
In “User Inputs”, add a new “File Upload” input with the variable: arquivo original. For the label, use: “Send your audio or video file”.
In “AI Steps”, search for the AI-Audio Transcription step, select the desired AI Model, and name the step transcribed text. In the file field, choose the arquivo-original variable to make it dynamic. With that, you’ve created a flow in which the user uploads the file and the step processes and transcribes it. Now it’s time to use the step result in the agent prompt!
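Notice that the names in this tutorial are typed with spaces (arquivo original, transcribed text) but referenced with hyphens (arquivo-original, transcribed-text). The exact rule the platform applies is not documented here, so the sketch below is only an assumption that reproduces those two examples: lowercase the name and join its words with hyphens. The function `to_variable_name` is hypothetical.

```python
import re


def to_variable_name(label: str) -> str:
    """Assumed rule: lowercase, and turn each run of other characters into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", label.strip().lower()).strip("-")


upload_variable = to_variable_name("arquivo original")
step_variable = to_variable_name("transcribed text")
print(upload_variable, step_variable)
```

If a variable does not show up where you expect it, compare the name you typed with the hyphenated form offered in the variable picker.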

Developing the Main Prompt

Now, we’ll tell the AI what to do with the text extracted by the Advanced Step. In the prompt field, we have:
Assume the persona of a Tess AI expert in transcription and content localization. Your mission is to process the text extracted from a media file and deliver a clear, professional result in two parts.
Part 1: Faithful Transcription
Create a section titled “## Original Transcription”.
In this section, present the exact text from the audio. The goal is maximum fidelity:
- Keep the original structure and punctuation.
- If a segment of the audio is unintelligible or uncertain, use the [inaudible] tag in the corresponding spot.
- Do not add, omit, or correct words.
Part 2: Natural Translation
Below the transcription, create a second section titled “## Translation to Portuguese (BR)”.
In this section, translate the text into Brazilian Portuguese. The focus here is naturalness and fluency:
- Avoid literal translations that sound robotic.
- Adapt the meaning and intent of the message to the target language, keeping the original tone (whether formal, casual, technical, etc.).
The final result must contain only these two sections, clearly separated by the titles. Do not include any introduction, commentary, or additional conclusion.
This will be done based on the following content: transcribed-text
IMPORTANT

Note that the prompt uses the variable that is the result of the Advanced Step, not the initial upload variable. This connection is what makes the entire flow work with files!
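To make the distinction concrete, here is a minimal Python sketch of prompt rendering. It is an illustration under stated assumptions, not the platform’s implementation: the `{{name}}` placeholder syntax, the `render` function, and the sample values are all hypothetical. Both variables exist in the flow, but only the step result holds text the model can read.

```python
import re

# Hypothetical template; the {{...}} placeholder syntax is an assumption.
PROMPT_TEMPLATE = (
    "Translate the transcript below into Brazilian Portuguese.\n"
    "Content: {{transcribed-text}}"
)


def render(template, variables):
    """Replace each {{name}} placeholder with its value, failing on unknown names."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"unknown variable: {name}")
        return variables[name]

    return re.sub(r"\{\{([a-z0-9-]+)\}\}", substitute, template)


variables = {
    "arquivo-original": "<binary media file>",              # raw upload, not readable text
    "transcribed-text": "Hello, and welcome to the show.",  # output of the Advanced Step
}

rendered = render(PROMPT_TEMPLATE, variables)
print(rendered)
```

Because the template references transcribed-text, the rendered prompt contains the transcript and never the raw upload.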

Saving and Testing

Click “Save” and then “Preview”. You’ll see an interface with a file upload button. Upload a short audio or video file in another language (the file size limit is 200 MB) and let the agent handle the rest!
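If you prepare test files with a script, a quick local size check can save a failed upload. The sketch below is a hypothetical helper, not part of the platform: `fits_upload_limit` is an invented name, and reading “200 MB” as 200 × 1024 × 1024 bytes is an assumption, since the exact byte count is not stated here.

```python
import os
import tempfile

# Assumption: the 200 MB limit is interpreted as 200 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 200 * 1024 * 1024


def fits_upload_limit(path, limit=MAX_UPLOAD_BYTES):
    """Return True when the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit


# Demonstration with a small temporary file standing in for a media file.
with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp:
    tmp.write(b"\0" * 1024)
    sample_path = tmp.name

ok = fits_upload_limit(sample_path)                  # 1 KiB against the default limit
too_big = fits_upload_limit(sample_path, limit=512)  # same file against a 512-byte limit
os.remove(sample_path)
print(ok, too_big)
```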
Mastering the connection between a “File Upload” input and an Advanced Step is the key to building agents that interact with the world beyond text. The translator example is just one of many possibilities. You can use the same principle to create agents that read PDFs, analyze reports, and much more, automating complex tasks intelligently.