# AI Step | Extract Text from DOCX

The Extract Text from DOCX step extracts the textual content of Microsoft Word (.docx) files and delivers it as a clean block of text ready to be processed by AI agents. With it, complex documents become accessible data without the need for specific software or manual intervention.

### What it is

This step belongs to the Document Processing group, a category dedicated to transforming file formats into content usable by AI.

In practice, Extract Text from DOCX:

* Reads the internal structure of the .docx file
* Extracts text from paragraphs, tables, lists, headers, and footers
* Discards visual elements (images, charts, formatting)
* Delivers a block of plain text in the agent’s context
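
To make the idea of "reading the internal structure" concrete, here is a minimal sketch in Python. It is not the step's actual implementation, which this page does not describe. It relies only on the fact that a .docx file is a ZIP archive whose main text conventionally lives in `word/document.xml`, with paragraphs stored as `w:p` elements and text runs as `w:t` elements. The function names and the sample document are hypothetical, and headers and footers (stored in separate parts of the archive) are left out for brevity.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements inside word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_docx_text(data: bytes) -> str:
    """Return the plain text of the main document part, one paragraph per line."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    # Paragraphs inside table cells are also <w:p>, so they are visited in order.
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # Formatting (<w:rPr>) is skipped; only the text of <w:t> runs is kept.
        text = "".join(node.text or "" for node in paragraph.iter(f"{{{W_NS}}}t"))
        if text:
            lines.append(text)
    return "\n".join(lines)


def make_sample_docx() -> bytes:
    """Build a tiny in-memory archive with a bold heading and one sentence (demo only)."""
    document_xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Contract</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Payment due in </w:t></w:r><w:r><w:t>30 days.</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


sample_text = extract_docx_text(make_sample_docx())
print(sample_text)
```

The bold marker on the heading disappears and the two runs of the second paragraph are joined, which mirrors the behavior described in the list above: text is kept, formatting is discarded.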

### Where to find it

1. Go to AI Studio
2. Click on Add AI Step
3. In Select Step Category, choose Document Processing
4. Select Extract Text from DOCX

<Frame>
  <img src="https://mintcdn.com/tess-dfe1edf0/8b4GWdZxIeR2fmow/images/image-197.png?fit=max&auto=format&n=8b4GWdZxIeR2fmow&q=85&s=ce951ec22d0c2de8340e18859db2a144" alt="Image" width="472" height="477" data-path="images/image-197.png" />
</Frame>

### How to use it

Configuration fields:

| Field     | Required | Description                                                                                                  |
| :-------- | :------- | :----------------------------------------------------------------------------------------------------------- |
| Step Name | Yes      | Internal step name. Use only alphanumeric characters. Used to reference the result in other steps or prompts |
| File URL  | Yes      | Direct public URL of the .docx file or a user file input variable (e.g., `{{docxfile}}`)                     |

### About the Output

The generated result is a continuous block of plain text containing all content extracted from the document.

<Columns cols={2}>
  <Column>
    <Card title="What is extracted:">
      * Paragraphs
      * List items
      * Table data (linearized)
      * Headers and footers
    </Card>
  </Column>

  <Column>
    <Card title="What is NOT extracted:">
      * Images and photos
      * Charts and other visual elements
      * Visual formatting (colors, bold, italics, fonts)
    </Card>
  </Column>
</Columns>

<Warning>
  Important:

  Tables are read in a linear format, following the order of the cells. A well-structured prompt helps the agent correctly interpret tabular data extracted this way.
</Warning>
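
As an illustration of what linearization means, the sketch below flattens a small table in cell order and then regroups it once the column count is known. The exact separator the step places between cells is not documented here, so one cell per line is an assumption, and `linearize` and `regroup` are hypothetical helper names.

```python
def linearize(table: list[list[str]]) -> str:
    """Flatten a table row by row, left to right (assumed: one cell per line)."""
    return "\n".join(cell for row in table for cell in row)


def regroup(linear_text: str, columns: int) -> list[list[str]]:
    """Rebuild rows from linearized cells when the number of columns is known."""
    cells = linear_text.split("\n")
    return [cells[i:i + columns] for i in range(0, len(cells), columns)]


rows = [
    ["Item", "Qty", "Price"],
    ["Widget", "2", "10.00"],
    ["Gadget", "1", "25.50"],
]

linear_text = linearize(rows)
print(linear_text)

# Knowing there are 3 columns is enough to recover the original rows.
restored = regroup(linear_text, columns=3)
print(restored == rows)
```

This is why it helps to tell the agent how many columns a table has and what they mean: the row boundaries are no longer visible in the linearized text.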

### Deeper explanation

The step works as a document decoding layer.

<Card title="Flow">
  .docx file (URL or variable) → Step extracts plain text

  ↓

  Content enters the context → Agent uses it to analyze, summarize, or extract data
</Card>

The output should be treated as raw data injected into the prompt. The quality of the analysis depends directly on:

* Organization of the original document
* Clarity of the prompt that uses the result

***

### Practical examples

<AccordionGroup>
  <Accordion title="Analysis of contracts and commercial proposals">
    Prompt:\
    "Analyze the extracted contract. Identify risk clauses, summarize payment terms, and extract client data."

    Usage:

    * Legal contracts or commercial proposals in .docx
    * Agent identifies critical points without manual reading
  </Accordion>

  <Accordion title="Automated resume screening">
    Prompt:\
    "Extract the candidate's skills, experience, and education. Compare with the job requirements below and evaluate the fit."

    Usage:

    * CVs submitted in .docx
    * Agent classifies and summarizes profiles automatically
  </Accordion>

  <Accordion title="Internal report summarization">
    Prompt:\
    "Summarize the main points of this report in up to 5 executive bullet points."

    Usage:

    * Monthly reports, meeting notes, or management documents
  </Accordion>

  <Accordion title="Structured data extraction">
    Prompt:\
    "Extract from the document: company name, tax ID, total value, delivery deadline, and technical lead."

    Usage:

    * Standardized documents with fixed fields
    * Feed CRM or spreadsheets automatically
  </Accordion>
</AccordionGroup>

<Tip>
  **Best practices**

  * Prefer well-structured documents: clear headings, paragraphs, and organized tables improve extraction accuracy
  * Reference the step in the prompt: use the Step Name to indicate where the data comes from. Example: *"Based on the data from step `extracao_contrato`..."*
  * Guide the agent about tables: mention in the prompt that tables may appear linearized so the model interprets them correctly
  * Combine with other steps: e.g., Extract Text → analysis → Google Drive (save result)
  * Avoid very long documents: files with many pages may exceed the agent’s context window
</Tip>
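
For the last point, a rough size check can flag documents that are likely too long. The sketch below is only a heuristic: the ratio of about 4 characters per token is a common rule of thumb for English text, not a figure from this platform, and the context window and reserved budget are made-up example values. Real token counts depend on the model's tokenizer.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate token count (assumption: roughly 4 characters per token)."""
    return math.ceil(len(text) / chars_per_token)


def fits_context(text: str, context_window: int, reserved: int) -> bool:
    """Check whether the text fits after reserving room for the prompt and the answer."""
    return estimate_tokens(text) <= context_window - reserved


short_report = "x" * 4_000    # stands in for about 1,000 tokens of extracted text
long_manual = "x" * 40_000    # stands in for about 10,000 tokens of extracted text

# Hypothetical budget: an 8,000-token window with 2,000 tokens held back.
print(fits_context(short_report, context_window=8_000, reserved=2_000))
print(fits_context(long_manual, context_window=8_000, reserved=2_000))
```

When a document does not fit, splitting it into sections and processing each one separately is a common workaround.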

### Important notes

* The step runs before user interaction
* The file URL must be public and accessible
* Visual elements are completely ignored during extraction
* The output is raw text, without visual formatting
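
Before configuring the File URL, it can be useful to confirm that the link really serves a .docx file to an anonymous client. The sketch below is a hypothetical pre-check, not part of the step: `fetch_public_file` downloads the URL without credentials, and `looks_like_docx` verifies that the bytes form a ZIP archive containing `word/document.xml`. A private link often returns an HTML sign-in page instead, which fails this check.

```python
import io
import urllib.request
import zipfile


def fetch_public_file(url: str, timeout: float = 10.0) -> bytes:
    """Download a file without credentials, as any anonymous client would."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def looks_like_docx(data: bytes) -> bool:
    """True if the bytes are a ZIP archive that contains word/document.xml."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False
    with zipfile.ZipFile(buffer) as archive:
        return "word/document.xml" in archive.namelist()


# A sign-in page served in place of the file is not a ZIP archive.
login_page = b"<html><body>Sign in to continue</body></html>"
print(looks_like_docx(login_page))  # False
```

A typical use would be `looks_like_docx(fetch_public_file(url))` with your own file URL, run from a machine that is not signed in to the storage service.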

Extract Text from DOCX removes the barrier between Word documents and artificial intelligence. With it, contracts, resumes, reports, and manuals become plain text that an agent can analyze, summarize, or mine for data without manual copying.
