The Extract Text from DOCX step isolates and extracts textual content from Microsoft Word (.docx) files, delivering a clean block of text ready to be processed by AI agents. With it, complex documents become accessible data without the need for specific software or manual intervention.

What it is

This step belongs to the Document Processing group, a category dedicated to transforming file formats into content that AI can use. In practice, Extract Text from DOCX:
  • Reads the internal structure of the .docx file
  • Extracts text from paragraphs, tables, lists, headers, and footers
  • Discards visual elements (images, charts, formatting)
  • Delivers a block of plain text in the agent’s context
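
For intuition, the kind of extraction described above can be sketched with Python's standard library alone. This is a minimal illustration, not the step's actual implementation: it relies on the fact that a .docx file is a ZIP archive whose main body is conventionally stored in word/document.xml, and the function name extract_docx_text is hypothetical. Headers and footers, which live in separate parts of the archive, are left out for brevity.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(data: bytes) -> str:
    """Return the body text of a .docx file, one paragraph per line.

    Illustrative only: headers, footers, footnotes, and other parts of the
    archive are not read here.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    lines = []
    # iter() walks in document order, so paragraphs inside table cells
    # come out cell by cell, which is what linearizes a table.
    for paragraph in root.iter(f"{W}p"):
        text = "".join(node.text or "" for node in paragraph.iter(f"{W}t"))
        if text.strip():
            lines.append(text)
    return "\n".join(lines)
```

Images and charts carry no text runs, so a walk like this skips them naturally, and formatting attributes are never read at all.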

Where to find it

  1. Go to AI Studio
  2. Click on Add AI Step
  3. In Select Step Category, choose Document Processing
  4. Select Extract Text from DOCX

How to use it

Configuration fields:
Field | Required | Description
Step Name | Yes | Internal step name. Use only alphanumeric characters. Used to reference the result in other steps or prompts
File URL | Yes | Direct public URL of the .docx file or a user file input variable (e.g., {{docxfile}})

About the Output

The generated result is a continuous block of plain text containing all content extracted from the document.

What is extracted:

  • Paragraphs
  • List items
  • Table data (linearized)
  • Headers and footers

What is NOT extracted:

  • Images and photos
  • Charts and other graphic elements
  • Visual formatting (colors, bold, italics, fonts)
Important: Tables are read in a linear format, following the order of the cells. A well-structured prompt helps the agent correctly interpret tabular data extracted this way.
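
To make the linear format concrete, here is a small sketch of how a table might look once flattened, and how knowing the column count lets a reader (or a prompt) regroup the cells. The exact separator the step uses is not documented here, so one cell per line is an assumption, and both function names are hypothetical.

```python
def linearize_table(rows: list[list[str]]) -> str:
    """Flatten a table row by row, left to right, one cell per line."""
    return "\n".join(cell for row in rows for cell in row)


def regroup_cells(linear_text: str, columns: int) -> list[list[str]]:
    """Rebuild rows from linearized cells, given the known column count."""
    cells = linear_text.split("\n")
    return [cells[i:i + columns] for i in range(0, len(cells), columns)]


table = [
    ["Item", "Quantity", "Price"],
    ["Laptop", "2", "1500"],
    ["Mouse", "5", "20"],
]
flat = linearize_table(table)
print(flat)                    # cells in reading order, one per line
print(regroup_cells(flat, 3))  # the original three rows
```

In a prompt, the equivalent hint is to state the column names and their count, for example "the table has three columns: Item, Quantity, Price".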

Deeper explanation

The step works as a document decoding layer: it converts the .docx file into plain text before the agent starts working with it.

Flow

.docx file (URL or variable) → Step extracts plain text → Content enters the context → Agent uses it to analyze, summarize, or extract data
The output should be treated as raw data injected into the prompt. The quality of the analysis depends directly on:
  • Organization of the original document
  • Clarity of the prompt that uses the result
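
As a rough illustration of "raw data injected into the prompt", the sketch below substitutes a step's output into a prompt template. The {{name}} placeholder syntax mirrors the variable example shown in the configuration fields, but whether step results are referenced this way is an assumption. The function render_prompt and the step name contractText are hypothetical.

```python
import re


def render_prompt(template: str, step_outputs: dict[str, str]) -> str:
    """Replace {{name}} placeholders with the matching step output.

    Unknown placeholders are left untouched so missing data is easy to spot.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return step_outputs.get(name, match.group(0))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


outputs = {"contractText": "Payment is due within 30 days of delivery."}
template = (
    "Analyze the extracted contract below and summarize the payment terms.\n"
    "--- extracted text ---\n"
    "{{contractText}}\n"
    "--- end ---"
)
print(render_prompt(template, outputs))
```

Wrapping the extracted text in clear delimiters, as the template does, is one way to help the model tell the instructions apart from the document content.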

Practical examples

Prompt:
“Analyze the extracted contract. Identify risk clauses, summarize payment terms, and extract client data.”
Usage:
  • Legal contracts or commercial proposals in .docx
  • Agent identifies critical points without manual reading

Prompt:
“Extract the candidate’s skills, experience, and education. Compare with the job requirements below and evaluate the fit.”
Usage:
  • CVs submitted in .docx
  • Agent classifies and summarizes profiles automatically

Prompt:
“Summarize the main points of this report in up to 5 executive bullet points.”
Usage:
  • Monthly reports, meeting notes, or management documents

Prompt:
“Extract from the document: company name, tax ID, total value, delivery deadline, and technical lead.”
Usage:
  • Standardized documents with fixed fields
  • Feed CRM or spreadsheets automatically

Best practices

  • Prefer well-structured documents: clear headings, paragraphs, and organized tables improve extraction accuracy
  • Reference the step in the prompt: use the Step Name to indicate where the data comes from. Example: “Based on the data from step extracao_contrato…”
  • Guide the agent about tables: mention in the prompt that tables may appear linearized so the model interprets them correctly
  • Combine with other steps: e.g., Extract Text → analysis → Google Drive (save result)
  • Avoid very long documents: files with many pages may exceed the agent’s context window
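
For the last point, a quick pre-check can flag documents that are likely to exceed the context window. This is a rough sketch: the four-characters-per-token ratio is a common rule of thumb for English text rather than a real tokenizer, and the 8,000-token budget is a placeholder, not a documented limit of any agent.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; real tokenizers vary by model and language."""
    return int(len(text) / chars_per_token) + 1


def fits_context(text: str, token_budget: int = 8000) -> bool:
    """True if the estimated token count stays within the (assumed) budget."""
    return estimate_tokens(text) <= token_budget


short_doc = "Payment is due within 30 days. " * 10
long_doc = "Payment is due within 30 days. " * 5000

print(fits_context(short_doc))  # True
print(fits_context(long_doc))   # False
```

When a document fails a check like this, splitting it into sections or extracting only the relevant part before analysis is usually safer than sending everything at once.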

Important notes

  • The step runs before user interaction
  • The file URL must be public and accessible
  • Visual elements are completely ignored during extraction
  • The output is raw text, without visual formatting
Extract Text from DOCX removes the barrier between Word documents and artificial intelligence. With it, contracts, resumes, reports, and manuals become processable data in seconds, enabling analysis, summarization, and automated extraction without any manual intervention.