AI Step | Extract Text from DOCX

The Extract Text from DOCX step isolates and extracts textual content from Microsoft Word (.docx) files, delivering a clean block of text ready to be processed by AI agents. With it, complex documents become accessible data without the need for specific software or manual intervention.

What is it

This step belongs to the Document Processing group — a category dedicated to transforming file formats into content usable by AI. In practice, Extract Text from DOCX:

Reads the internal structure of the .docx file
Extracts text from paragraphs, tables, lists, headers, and footers
Discards visual elements (images, charts, formatting)
Delivers a block of plain text in the agent’s context
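To make the idea of "reading the internal structure" concrete, here is a minimal sketch of the general technique, not the platform's actual implementation. It relies on the fact that a .docx file is a ZIP archive whose main body is stored in `word/document.xml`. The function names and the in-memory sample file are hypothetical, and the sketch reads only the main body (headers and footers live in separate parts of the archive).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_docx_text(docx_bytes: bytes) -> str:
    """Return the text of the main document body, one paragraph per line."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        xml_data = archive.read("word/document.xml")
    root = ET.fromstring(xml_data)
    lines = []
    # iter() walks the whole tree, so paragraphs inside table cells are included
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # A paragraph is split into runs; each run holds its text in <w:t>
        text = "".join(node.text or "" for node in paragraph.iter(f"{{{W_NS}}}t"))
        if text.strip():
            lines.append(text)
    return "\n".join(lines)


def build_sample_docx() -> bytes:
    """Build a tiny in-memory archive for demonstration (hypothetical sample,
    it contains only the part this sketch reads)."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Service agreement</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Payment due in </w:t></w:r><w:r><w:t>30 days</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    return buffer.getvalue()


sample_text = extract_docx_text(build_sample_docx())
print(sample_text)
```

Formatting such as bold or color is stored in separate run properties, which is why it drops out naturally when only the text nodes are collected.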

Where to find it

Go to AI Studio
Click on Add AI Step
In Select Step Category, choose Document Processing
Select Extract Text from DOCX

How to use?

Configuration fields:

Field	Required	Description
Step Name	Yes	Internal step name. Use only alphanumeric characters. Used to reference the result in other steps or prompts
File URL	Yes	Direct public URL of the .docx file, or a user file input variable (e.g., `{{docxfile}}`)
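The two field rules above can be checked before a step is saved. The sketch below is only an illustration: `validate_step_config` is a hypothetical helper, not part of the platform, and the patterns follow the table literally (the platform may accept other characters or variable formats).

```python
import re
from urllib.parse import urlparse

# Assumption: "alphanumeric" means ASCII letters and digits only
STEP_NAME_PATTERN = re.compile(r"[A-Za-z0-9]+")
# Assumption: input variables look like {{name}}, as in the {{docxfile}} example
VARIABLE_PATTERN = re.compile(r"\{\{\s*\w+\s*\}\}")


def validate_step_config(step_name: str, file_url: str) -> list[str]:
    """Return a list of problems; an empty list means both fields look valid."""
    problems = []
    if not STEP_NAME_PATTERN.fullmatch(step_name):
        problems.append("Step Name must contain only alphanumeric characters")
    if VARIABLE_PATTERN.fullmatch(file_url):
        # A variable is resolved at run time, so there is nothing more to check
        return problems
    parsed = urlparse(file_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("File URL must be a direct http(s) URL or a {{variable}}")
    return problems


print(validate_step_config("contractText", "{{docxfile}}"))
print(validate_step_config("contract text", "https://example.com/contract.docx"))
```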

About the Output

The generated result is a continuous block of plain text containing all content extracted from the document.

What is extracted:

Paragraphs
List items
Table data (linearized)
Headers and footers

What is NOT extracted:

Images and photos
Charts and other graphic elements
Visual formatting (colors, bold, italics, fonts)

Important: Tables are read in a linear format, following the order of the cells. A well-structured prompt helps the agent correctly interpret tabular data extracted this way.
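As an illustration of what "linearized" means, the sketch below flattens a small table row by row and then regroups it. The exact separator the step emits between cells is not documented here, so one cell per line is an assumption, and both function names are hypothetical.

```python
def linearize_table(rows: list[list[str]]) -> str:
    """Flatten a table into one cell per line, row by row, left to right."""
    return "\n".join(cell.strip() for row in rows for cell in row)


def rebuild_rows(linear_text: str, column_count: int) -> list[list[str]]:
    """Regroup linearized cells into rows when the column count is known."""
    cells = linear_text.split("\n")
    return [cells[i:i + column_count] for i in range(0, len(cells), column_count)]


table = [
    ["Item", "Qty", "Price"],
    ["Widget", "2", "10.00"],
    ["Gadget", "1", "25.50"],
]
linear = linearize_table(table)
print(linear)
print(rebuild_rows(linear, 3))
```

The regrouping only works because the column count is known, which is why stating the expected columns in the prompt gives the agent what it needs to interpret the flattened cells.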

Deeper explanation

The step works as a document decoding layer.

Flow

.docx file (URL or variable) → Step extracts plain text
↓
Content enters the context → Agent uses it to analyze, summarize, or extract data

The output should be treated as raw data injected into the prompt. The quality of the analysis depends directly on:

Organization of the original document
Clarity of the prompt that uses the result
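The idea of treating the output as raw data injected into the prompt can be sketched outside the platform as follows. `build_prompt` and the delimiter lines are hypothetical; how the platform actually places the text in the agent's context is not described here.

```python
def build_prompt(instruction: str, step_name: str, extracted_text: str) -> str:
    """Combine an instruction with extracted text, marking where the data starts and ends."""
    return (
        f"{instruction}\n\n"
        f"--- Begin text from step {step_name} ---\n"
        f"{extracted_text.strip()}\n"
        f"--- End text from step {step_name} ---"
    )


prompt = build_prompt(
    "Summarize the payment terms in the contract below.",
    "contractText",  # hypothetical Step Name
    "  Clause 4: payment is due within 30 days of delivery.\n",
)
print(prompt)
```

Keeping the instruction separate from the delimited document text makes it easier for the model to tell the task apart from the data it operates on.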

Practical examples

Analysis of contracts and commercial proposals

Prompt:
“Analyze the extracted contract. Identify risk clauses, summarize payment terms, and extract client data.”

Usage:

Legal contracts or commercial proposals in .docx
Agent identifies critical points without manual reading

Automated resume screening

Prompt:
“Extract the candidate’s skills, experience, and education. Compare with the job requirements below and evaluate the fit.”

Usage:

CVs submitted in .docx
Agent classifies and summarizes profiles automatically

Internal report summarization

Prompt:
“Summarize the main points of this report in up to 5 executive bullet points.”

Usage:

Monthly reports, meeting notes, or management documents

Structured data extraction

Prompt:
“Extract from the document: company name, tax ID, total value, delivery deadline, and technical lead.”

Usage:

Standardized documents with fixed fields
Feed CRM or spreadsheets automatically

Best practices

Prefer well-structured documents: clear headings, paragraphs, and organized tables improve extraction accuracy
Reference the step in the prompt: use the Step Name to indicate where the data comes from. Example: “Based on the data from step extracao_contrato…”
Guide the agent about tables: mention in the prompt that tables may appear linearized so the model interprets them correctly
Combine with other steps: e.g., Extract Text → analysis → Google Drive (save result)
Avoid very long documents: files with many pages may exceed the agent’s context window
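For the last point, a rough size check before sending a document can help. The sketch below is a generic approach, not a platform feature: the four-characters-per-token ratio is only a common rule of thumb for English text, the real limit depends on the model in use, and both function names are hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; the ratio is a heuristic, not a tokenizer."""
    return math.ceil(len(text) / chars_per_token)


def split_into_chunks(text: str, max_chars: int) -> list[str]:
    """Split on line breaks so each chunk stays within max_chars where possible.

    A single paragraph longer than max_chars is kept whole in its own chunk.
    """
    chunks, current = [], ""
    for paragraph in text.split("\n"):
        candidate = f"{current}\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


document_text = "aaaa\nbbbb\ncccc"
print(estimate_tokens(document_text))
print(split_into_chunks(document_text, max_chars=9))
```

Each chunk could then be summarized separately and the partial summaries combined in a later step.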

Important notes

The step runs before user interaction
The file URL must be public and accessible
Visual elements are completely ignored during extraction
The output is raw text, without visual formatting

Extract Text from DOCX removes the barrier between Word documents and artificial intelligence. With it, contracts, resumes, reports, and manuals become processable data in seconds, enabling analysis, summarization, and automated extraction without any manual intervention.
