2️⃣Agent

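Data Agents are LLM-powered knowledge workers in LlamaIndex that can intelligently perform a variety of tasks over your data, in both "read" and "write" capacities. A data agent can:

Perform automated search and retrieval over different types of data: unstructured, semi-structured, and structured.
Call external service APIs in a structured fashion, process the response, and store it for later use.

In that sense, agents go a step beyond query engines: they can not only "read" from a static data source but also dynamically ingest and modify data through a variety of tools.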

OpenAI Agent
OpenAI Assistant Agent
ReAct Agent
Function Calling Agents
Additional Agents
Custom Agents
Lower-Level Agent API

The examples below walk through two representative agents, OpenAIAgent and ReActAgent.

OpenAIAgent

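OpenAIAgent is an OpenAI (function calling) agent. It uses the OpenAI function API to reason about whether to use a tool and then returns a response to the user. It supports both a flat list of tools and retrieval over tools.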
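To make the function-calling idea concrete, the sketch below shows in plain Python, without any OpenAI or LlamaIndex calls, roughly what a tool looks like when it is described as a function schema and how a call chosen by the model could be dispatched. The names `TOOLS`, `tool_schema`, `dispatch`, and the stub `lookup_2020` are hypothetical and exist only for illustration; the real agent handles these steps internally.

```python
import json


def lookup_2020(input: str) -> str:
    """Hypothetical stand-in for a query engine over the 2020 report."""
    return f"(answer about '{input}' from the 2020 report)"


# Registry mapping tool names to callables (illustrative only).
TOOLS = {"2020-executive-summary": lookup_2020}


def tool_schema(name: str, description: str) -> dict:
    """Describe a single-argument tool in a JSON-schema style."""
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": {"input": {"type": "string"}},
            "required": ["input"],
        },
    }


def dispatch(name: str, arguments_json: str) -> str:
    """Run the tool the model selected, using its JSON-encoded arguments."""
    args = json.loads(arguments_json)
    return TOOLS[name](**args)


schema = tool_schema("2020-executive-summary", "U.S. government financial report, 2020")
result = dispatch("2020-executive-summary", '{"input": "net operating cost"}')
print(result)
```

The model only ever sees the schema; it replies with a tool name and a JSON argument string, which the calling code is responsible for executing.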

Step 1: Install and Setup

#%pip install -q llama_index pypdf

import logging, sys, os
import nest_asyncio
from dotenv import load_dotenv  

nest_asyncio.apply()

!echo "OPENAI_API_KEY=<Your OpenAI Key>" >> .env
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")
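Note that `>>` appends to `.env`, so re-running the cell adds another line each time, and `os.getenv` silently returns `None` when the key is missing. A minimal sketch of a fail-fast check follows; the helper name `require_env` and the rule that treats a value starting with `<` as an unfilled placeholder are assumptions, not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    # Treat a missing value or an unfilled "<...>" placeholder as an error.
    if not value or value.startswith("<"):
        raise RuntimeError(f"{name} is not set; add it to .env before continuing")
    return value
```

With this in place, `api_key = require_env("OPENAI_API_KEY")` would stop the notebook at setup time rather than at the first agent call.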

Download the U.S. government's financial reports for 2020, 2021, and 2022.

!mkdir reports
!wget https://www.fiscal.treasury.gov/files/reports-statements/financial-report/2020/executive-summary-2020.pdf -O ./reports/2020-executive-summary.pdf
!wget https://www.fiscal.treasury.gov/files/reports-statements/financial-report/2021/executive-summary-2021.pdf -O ./reports/2021-executive-summary.pdf
!wget https://www.fiscal.treasury.gov/files/reports-statements/financial-report/2022/executive-summary-2022.pdf -O ./reports/2022-executive-summary.pdf

mkdir: cannot create directory ‘reports’: File exists
--2024-04-20 00:10:23--  https://www.fiscal.treasury.gov/files/reports-statements/financial-report/2020/executive-summary-2020.pdf
Resolving www.fiscal.treasury.gov (www.fiscal.treasury.gov)... 166.123.218.167, 2610:108:4100:100c::8:118
Connecting to www.fiscal.treasury.gov (www.fiscal.treasury.gov)|166.123.218.167|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2323072 (2.2M) [application/pdf]
Saving to: ‘./reports/2020-executive-summary.pdf’

./reports/2020-exec 100%[===================>]   2.21M  91.1KB/s    in 31s     

2024-04-20 00:10:55 (73.5 KB/s) - ‘./reports/2020-executive-summary.pdf’ saved [2323072/2323072]

--2024-04-20 00:10:56--  https://www.fiscal.treasury.gov/files/reports-statements/financial-report/2021/executive-summary-2021.pdf
Resolving www.fiscal.treasury.gov (www.fiscal.treasury.gov)... 166.123.218.167, 2610:108:4100:100c::8:118
Connecting to www.fiscal.treasury.gov (www.fiscal.treasury.gov)|166.123.218.167|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1001902 (978K) [application/pdf]
Saving to: ‘./reports/2021-executive-summary.pdf’

./reports/2021-exec 100%[===================>] 978.42K  77.9KB/s    in 16s     

2024-04-20 00:11:12 (63.0 KB/s) - ‘./reports/2021-executive-summary.pdf’ saved [1001902/1001902]

--2024-04-20 00:11:12--  https://www.fiscal.treasury.gov/files/reports-statements/financial-report/2022/executive-summary-2022.pdf
Resolving www.fiscal.treasury.gov (www.fiscal.treasury.gov)... 166.123.218.167, 2610:108:4100:100c::8:118
Connecting to www.fiscal.treasury.gov (www.fiscal.treasury.gov)|166.123.218.167|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1042072 (1018K) [application/pdf]
Saving to: ‘./reports/2022-executive-summary.pdf’

./reports/2022-exec 100%[===================>]   1018K  53.0KB/s    in 19s     

2024-04-20 00:11:33 (52.5 KB/s) - ‘./reports/2022-executive-summary.pdf’ saved [1042072/1042072]
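Outside a notebook, or where `wget` is unavailable, the same downloads could be done with the Python standard library. The sketch below reuses the URL pattern and destination names from the commands above; the function names `report_targets` and `download_reports` are hypothetical, and the download call is left commented out so the block runs without network access.

```python
from pathlib import Path
from urllib.request import urlretrieve

BASE_URL = "https://www.fiscal.treasury.gov/files/reports-statements/financial-report"


def report_targets(years, out_dir="reports"):
    """Build (url, destination) pairs following the URL pattern used above."""
    return [
        (
            f"{BASE_URL}/{year}/executive-summary-{year}.pdf",
            Path(out_dir) / f"{year}-executive-summary.pdf",
        )
        for year in years
    ]


def download_reports(years, out_dir="reports"):
    """Download each report unless it is already present on disk."""
    Path(out_dir).mkdir(exist_ok=True)  # no error if the directory exists
    for url, dest in report_targets(years, out_dir):
        if not dest.exists():
            urlretrieve(url, dest)


# download_reports([2020, 2021, 2022])  # uncomment to fetch the files
```

Using `mkdir(exist_ok=True)` also avoids the "File exists" message seen in the output above when the cell is re-run.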

Step 2: Load data & define OpenAIAgent

import os

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.tools import QueryEngineTool
from llama_index.agent.openai import OpenAIAgent

query_engine_tools = []

# Build one query engine tool per PDF in the reports directory.
for filename in os.listdir("reports"):
    if filename.endswith(".pdf"):
        file_path = os.path.join("reports", filename)
        tool_name = filename[:-4]  # file name without the ".pdf" extension

        # SimpleDirectoryReader opens the file itself, so no open() call is needed.
        documents = SimpleDirectoryReader(input_files=[file_path]).load_data()
        print(f"Loaded {len(documents)} documents from {filename}")
        print(tool_name)

        index = VectorStoreIndex.from_documents(documents)
        query_engine = index.as_query_engine(similarity_top_k=5)
        query_engine_tool = QueryEngineTool.from_defaults(
            query_engine=query_engine,
            name=tool_name,
            description=f"Provides information about the U.S. government financial report. {tool_name}",
        )
        query_engine_tools.append(query_engine_tool)

agent = OpenAIAgent.from_tools(query_engine_tools, verbose=True)

Loaded 11 documents from executive_summary_2020.pdf
executive_summary_2020
Loaded 10 documents from 2022-executive-summary.pdf
2022-executive-summary
Loaded 11 documents from 2020-executive-summary.pdf
2020-executive-summary
Loaded 11 documents from executive-summary-2021.pdf
executive-summary-2021
Loaded 11 documents from executive_summary_2021.pdf
executive_summary_2021
Loaded 10 documents from executive-summary-2022.pdf
executive-summary-2022
Loaded 11 documents from 2021-executive-summary.pdf
2021-executive-summary
Loaded 10 documents from executive_summary_2022.pdf
executive_summary_2022
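The listing above contains eight PDFs for three report years, which appear to be copies of the same reports saved under different names, so the agent ends up with several near-duplicate tools. One possible way to keep a single file per year is sketched below; the helper `one_pdf_per_year` and the rule "first name in sorted order wins" are assumptions made for illustration, not part of the original notebook.

```python
import re


def one_pdf_per_year(filenames):
    """Keep the first PDF (in sorted order) for each four-digit year in the name."""
    chosen = {}
    for name in sorted(filenames):
        if not name.endswith(".pdf"):
            continue
        match = re.search(r"20\d{2}", name)
        if match and match.group(0) not in chosen:
            chosen[match.group(0)] = name
    return chosen


names = [
    "executive_summary_2020.pdf",
    "2022-executive-summary.pdf",
    "2020-executive-summary.pdf",
    "executive-summary-2021.pdf",
]
selected = one_pdf_per_year(names)
print(selected)
```

The resulting values could then replace `os.listdir("reports")` as the list of files to index.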

Step 3: Execute Queries

from IPython.display import Markdown

response = agent.chat("Can you compare and contrast the government's total net operating cost over the three years and tell me which year had the highest cost?")
display(Markdown(f"<b>{response}</b>"))

Added user message to memory: Can you compare and contrast the government's total net operating cost over the three years and tell me which year had the highest cost?
=== Calling Function ===
Calling function: 2020-executive-summary with args: {"input": "total_operating_costs"}
Got output: $3.8 trillion
========================

=== Calling Function ===
Calling function: 2021-executive-summary with args: {"input": "total_operating_costs"}
Got output: $7.4 trillion
========================

=== Calling Function ===
Calling function: 2022-executive-summary with args: {"input": "total_operating_costs"}
Got output: $9.1 trillion
========================

Comparing the government's total net operating cost over the three years gives the following:

2020: $3.8 trillion
2021: $7.4 trillion
2022: $9.1 trillion

The year with the highest operating cost is 2022.

ReActAgent

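ReAct is short for Reasoning and Acting, and was first introduced in the paper ReAct: Synergizing Reasoning and Acting in Language Models.

The ReAct agent in LlamaIndex is an agent-based chat mode built on top of query engines over your data. It is one of LlamaIndex's main chat engines. For each chat interaction, the agent enters a reasoning and acting loop:

First, decide whether to use a query engine tool, which tool to use, and what input to give it.
Query with the chosen query engine tool and observe its output.
Based on the output, decide whether to repeat the process or give a final response.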
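The loop described above can be sketched in a few lines of plain Python. This is an illustration of the control flow only, not the LlamaIndex implementation: the LLM is replaced by a hypothetical `scripted_policy` function, and the tools are stubs that return canned strings.

```python
def react_loop(question, tools, policy, max_steps=5):
    """Alternate between choosing an action and observing its result."""
    history = []  # (tool_name, tool_input, observation) triples
    for _ in range(max_steps):
        step = policy(question, history)  # "reason": pick the next move
        if step["type"] == "answer":
            return step["text"], history
        observation = tools[step["tool"]](step["input"])  # "act", then observe
        history.append((step["tool"], step["input"], observation))
    return "No answer within the step limit.", history


# Hypothetical tools: canned strings standing in for query engines.
tools = {
    "report_a": lambda q: "cost A is 3",
    "report_b": lambda q: "cost B is 5",
}


def scripted_policy(question, history):
    """Stand-in for the LLM: query every tool once, then answer."""
    used = {tool for tool, _, _ in history}
    for name in tools:
        if name not in used:
            return {"type": "tool", "tool": name, "input": question}
    return {"type": "answer", "text": "; ".join(obs for _, _, obs in history)}


answer, trace = react_loop("Which cost is higher?", tools, scripted_policy)
print(answer)
```

The `max_steps` cap is a design choice in this sketch: it guarantees the loop terminates even if the policy never produces a final answer.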

Step 1: Install and Setup

#%pip install -q llama_index pypdf

import logging, sys, os
import nest_asyncio
from dotenv import load_dotenv  

nest_asyncio.apply()

!echo "OPENAI_API_KEY=<Your OpenAI Key>" >> .env
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")

Save the files with underscores rather than dashes in their names; otherwise, the ReAct agent's chat completion does not work.
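If renaming the files is inconvenient, one alternative may be to leave the file names alone and normalize only the tool names derived from them. The helper below is a hypothetical sketch of that idea; whether it is sufficient for the ReAct agent has not been verified here.

```python
import re
from pathlib import Path


def tool_name_from_file(filename: str) -> str:
    """Derive a tool name that uses only letters, digits, and underscores."""
    stem = Path(filename).stem  # drop the extension
    return re.sub(r"[^0-9A-Za-z_]", "_", stem)


print(tool_name_from_file("2021-executive-summary.pdf"))  # prints 2021_executive_summary
```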

Step 2: Load data & define ReActAgent

import os

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-0613")

query_engine_tools = []

# Build one query engine tool per PDF in the reports directory.
for filename in os.listdir("reports"):
    if filename.endswith(".pdf"):
        file_path = os.path.join("reports", filename)
        tool_name = filename[:-4]  # file name without the ".pdf" extension

        # SimpleDirectoryReader opens the file itself, so no open() call is needed.
        documents = SimpleDirectoryReader(input_files=[file_path]).load_data()
        print(f"Loaded {len(documents)} documents from {filename}")
        print(tool_name)

        index = VectorStoreIndex.from_documents(documents)
        query_engine = index.as_query_engine(similarity_top_k=5)
        query_engine_tool = QueryEngineTool(
            query_engine=query_engine,
            metadata=ToolMetadata(
                name=tool_name,
                description=(
                    f"Provides information about the U.S. government financial report. {tool_name}"
                ),
            ),
        )
        query_engine_tools.append(query_engine_tool)

react_agent = ReActAgent.from_tools(query_engine_tools, llm=llm, verbose=True)

Loaded 11 documents from executive_summary_2020.pdf
executive_summary_2020
Loaded 10 documents from 2022-executive-summary.pdf
2022-executive-summary
Loaded 11 documents from 2020-executive-summary.pdf
2020-executive-summary
Loaded 11 documents from executive-summary-2021.pdf
executive-summary-2021
Loaded 11 documents from executive_summary_2021.pdf
executive_summary_2021
Loaded 10 documents from executive-summary-2022.pdf
executive-summary-2022
Loaded 11 documents from 2021-executive-summary.pdf
2021-executive-summary
Loaded 10 documents from executive_summary_2022.pdf
executive_summary_2022

Step 3: Execute Queries

from IPython.display import Markdown

response = react_agent.chat("Can you compare and contrast the government's final net operating cost and tell me which year had the highest cost?")
display(Markdown(f"<b>{response}</b>"))

Thought: The user is asking for a comparison of the final net operating costs of the government and wants to know which year had the highest costs. I can use the executive summary tools to gather information on the net operating costs for different years.
Action: executive_summary_2020
Action Input: {'input': 'net operating costs'}
Observation: The net operating costs increased by $2.4 trillion during FY 2020 to $3.8 trillion. This increase was primarily driven by various factors such as responses to the COVID-19 pandemic, including substantial increases in costs across multiple agencies, changes in assumptions related to federal employee and veteran benefits programs, and increases in benefit expenses from social insurance programs.
Thought: The executive summary for 2020 provides information on the net operating costs of the government. According to the summary, the net operating costs increased by $2.4 trillion during FY 2020 to $3.8 trillion. This increase was primarily driven by various factors such as responses to the COVID-19 pandemic, including substantial increases in costs across multiple agencies, changes in assumptions related to federal employee and veteran benefits programs, and increases in benefit expenses from social insurance programs.

To compare the net operating costs for other years, I can use the executive summary tools for those years as well.
Action: executive_summary_2021
Action Input: {'input': 'net operating costs'}
Observation: $3.1 trillion
Thought: The executive summary for 2021 provides information on the net operating costs of the government. According to the summary, the net operating costs for that year were $3.1 trillion.

To compare the net operating costs for other years, I can use the executive summary tools for those years as well.
Action: executive_summary_2022
Action Input: {'input': 'net operating costs'}
Observation: The net operating costs increased by $1.1 trillion (34.8 percent) during the fiscal year to $4.2 trillion.
Thought: The executive summary for 2022 provides information on the net operating costs of the government. According to the summary, the net operating costs increased by $1.1 trillion (34.8 percent) during the fiscal year to $4.2 trillion.

Now that I have information on the net operating costs for the years 2020, 2021, and 2022, I can compare them to determine which year had the highest costs.
Answer: Among the years 2020, 2021, and 2022, the year with the highest net operating costs was 2022, with a total of $4.2 trillion.

Among the years 2020, 2021, and 2022, the year with the highest net operating costs was 2022, with a total of $4.2 trillion.
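The final comparison can be checked by hand from the three observations in the trace. The sketch below parses the dollar figures and picks the largest; the helper `parse_trillions` and its rule of taking the last "$X trillion" figure in each observation as the year's total are assumptions that happen to fit the wording of these particular observations.

```python
import re


def parse_trillions(text: str) -> float:
    """Return the last '$X trillion' figure in a string, in trillions of dollars."""
    amounts = re.findall(r"\$(\d+(?:\.\d+)?) trillion", text)
    if not amounts:
        raise ValueError(f"no trillion-dollar amount in: {text!r}")
    return float(amounts[-1])


# First sentence of each observation from the ReAct trace above.
observations = {
    2020: "The net operating costs increased by $2.4 trillion during FY 2020 to $3.8 trillion.",
    2021: "$3.1 trillion",
    2022: "The net operating costs increased by $1.1 trillion (34.8 percent) during the fiscal year to $4.2 trillion.",
}

costs = {year: parse_trillions(text) for year, text in observations.items()}
highest_year = max(costs, key=costs.get)
print(highest_year, costs[highest_year])  # prints: 2022 4.2
```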
