Skip to main content

Example: Document Analyzer

In this guide, we'll create a pipeline that analyzes a document and extracts the main topics.

Pipeline Configuration

First of all, we need to create a new pipeline. Since this pipeline will be used to analyze documents, we need to edit the input format to allow files upload.

To do this, click on the 'Edit Pipeline' button on the top right corner of the screen and in the 'Input Format' field, select the 'File' option and 'Save'.

Document Analyzer Step 1

Document Converting Layer

Afterward, we need to add our first layer to the pipeline. This layer will be used to convert the file to an image, so that it will be easier to elaborate for our AI assistants.

Click on the 'Add Layer' button at the upper center of the screen and give it a name of your choice (e.g. 'Document Converting Layer').

PDF Reader Actor

Then, we need to add an actor that will convert the file to an image. Click on the 'Add Actor' button at the bottom right corner of the layer and select the 'PDF Reader' actor.

In the 'General Information' tab, give it a name of your choice (e.g. 'Document Converter') and select the 'File' option in the 'Input Format' field.

Document Analyzer Step 2 Document Analyzer Step 3

In the 'Specifications' tab, we can configure the PDF Reader actor custom parameters:

  • Start Page: The number of the page to start reading from.
  • End Page: The number of the page to finish reading.
  • Max Pages: The maximum number of pages to read.
  • Output Format: The format of the output image:
    • Text: A text file with the content of the document.
    • Image: An image of the document.
    • Text and Image: A text file with the content of the document and an image of the document.
    • Screenshot per page: A screenshot of each page of the document.
Document Analyzer Step 4

For this example, we'll select the 'Screenshot per page' option.
Afterward, click on the 'Next' button to go to the 'Summary' tab and click on the 'Next' button again to save the actor.

If you have done everything correctly, you should see the following screen:

Document Analyzer Step 5

Description Creator Layer

Let's create the second layer of our pipeline. This layer will be used to generate various descriptions of the document. In this example we'll generate three descriptions using three different LLMs to get different perspectives, you can use more or less LLMs depending on your needs.

Descriptor Actors

Let's add an actor to the layer. Click on the 'Add Actor' button at the bottom right corner of the layer and select the 'LLM' actor.

In the 'General Information' tab, give it a name of your choice (e.g. "Descriptor <LLM Name>") and in 'Input Format' select the 'File' option, since the previous layer will output a file.

In the 'Specifications' tab, we can configure the LLM actor behavior. Selects Custom Prompt and let's use a simple prompt to tell the actor to describe the document:

Describe the image in the context as best as you can. The description must be exhaustive and professional. The image is a business document.
Document Analyzer Step 6 Document Analyzer Step 7

Lastly choose the LLM you want to use (e.g. 'GPT 4o Mini') and click on the 'Next' button to go to the 'Summary' tab and click on the 'Next' button again to save the actor.

Since we want multiple descriptions, we can add other actors with the same configuration but with different LLMs. To do this simply 'Duplicate' the actor how many times you want and change the LLM to the one you want to use.

To duplicate an actor, hover over the actor and click on the 'Duplicate' button on the top right corner of the actor.

Document Analyzer Step 8 Document Analyzer Step 9

If you have done everything correctly, you should see the following screen:

Document Analyzer Step 10

Description Finalization Layer

Let's add a last layer, where we'll elaborate all the descriptions into a single, more structured description.

Description Finalizer Actor

Add a new LLM actor to the layer, give it a name of your choice (e.g. 'Description Finalizer') and in the 'Specifications' tab, select the Custom Prompt option.

Prompt

In the 'Prompt' field let's give the actor directions on what data to use, let's use the following prompt:

Analyze the documents descriptions received.

<first_description> {{ACTOR-descriptor-gpt}} </first_description>
<second_description> {{ACTOR-descriptor-claude}} </second_description>
<third_description> {{ACTOR-descriptor-gemini}} </third_description>

These are multiple descriptions of the same document, analyze them and make a unique, exhaustive and professional description.
The reason there are multiple description is so that you can extract the best possible document description from multiple interpretations.

Actor Reference via ID

Make sure to change the various ids 'ACTOR-descriptor-model' to the ids of your descriptor actors.

To easily find the ids of your actors, you can click on the 'See all' text in the placeholder guide section.

Document Analyzer Step 11

Then click on the actor you want the id of. After that the id will appear in the placeholder guide, and you can press the copy button to copy it.

Document Analyzer Step 12

System Prompt

In the 'System Prompt' field we'll give the actor more information about how to structure the description it will output:

<instructions>
<identity>
You are a professional assistant that specializes in analyzing documents and in writing reports about them. Your reports are always professional, exhaustive and precise.
</identity>
<role>
You will receive multiple reviews of the same document. Your job is to analyze all the reviews and craft the best possible report about that document. The reason there are multiple reviews is so that you can create the best possible report from multiple review interpretations.
</role>
<report_structure>
The report **MUST** have the following structure:
{
"title": "Here goes the title of the document",
"topics": ["topic 1", "topic 2"],
"language": "The language of the document",
"description": "Here goes the description of the document"
}
<report_structure>
<report_output_example>
{
"title": "Graduates of X University 2025!",
"topics": ["school", "education"],
"language": "EN",
"description": "Here is the list of graduates of the year 2025 from X University..."
}
</report_output_example>
</instructions>

Afterward, select the desired LLM and click on the 'Next' button to go to the 'Summary' tab, click on the 'Next' button again to save the actor.

Document Analyzer Step 13

Testing the Pipeline

Now the only thing left to do is to test the pipeline. Click on the 'Play' button at the top right corner of the page, upload a file and click on the 'Test' button.

Document Analyzer Step 14 Document Analyzer Step 15