In RAG business scenarios, data loading is a critical first step. LlamaIndex defines a Data Connectors interface and provides a range of implementations for loading data from different sources and in different formats.
The framework ships with a set of built-in Data Connectors that developers can use directly, without downloading anything from LlamaHub.
The following code shows how to read data from a web page.
from llama_index.core import SummaryIndex
from llama_index.readers.web import SimpleWebPageReader  # pip install llama-index-readers-web

documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["http://paulgraham.com/worked.html"]
)
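Passing `html_to_text=True` tells the reader to strip HTML markup before building documents. As an illustration only (this is not `SimpleWebPageReader`'s actual implementation), a minimal tag-stripping conversion can be sketched with the standard library:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects only the text content of an HTML document, skipping all tags."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def html_to_text(html: str) -> str:
    """Return the plain text of an HTML string (illustrative sketch)."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(part.strip() for part in parser.parts if part.strip())

out = html_to_text("<html><body><h1>What I Worked On</h1><p>Before college</p></body></html>")
print(out)
```

In practice the real reader does considerably more (whitespace normalization, link handling), but the idea is the same: the index is built from plain text, not raw HTML.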
You can also fetch additional connectors from LlamaHub at runtime. For example, the following code downloads the MarkdownReader loader and uses it to read a Markdown file.

from pathlib import Path
from llama_index.core import download_loader

MarkdownReader = download_loader("MarkdownReader")
loader = MarkdownReader()
documents = loader.load_data(file=Path('./README.md'))
LlamaIndex: Data Connectors
import os
from dotenv import load_dotenv

!echo "OPENAI_API_KEY=<Your OpenAI Key>" >> .env  # write your OpenAI API key to .env

load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")
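`load_dotenv` simply reads `KEY=VALUE` lines from the `.env` file into the process environment, which is why `os.getenv` can then find the key. A rough sketch of that behavior (illustrative only, not python-dotenv's actual code, using a throwaway `demo.env` file):

```python
import os

def load_env_file(path: str) -> None:
    """Parse KEY=VALUE lines into os.environ, skipping blanks and comments."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip()

# Demo: write a throwaway env file and load it
with open("demo.env", "w") as f:
    f.write("OPENAI_API_KEY=sk-demo-123\n")
load_env_file("demo.env")
print(os.getenv("OPENAI_API_KEY"))  # sk-demo-123
```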
Following the LlamaIndex tutorials repository (https://github.com/Anil-matcha/LlamaIndex-tutorials), load its documents into LlamaIndex and query them with a query engine.
!git clone https://github.com/Anil-matcha/LlamaIndex-tutorials.git
Cloning into 'LlamaIndex-tutorials'...
remote: Enumerating objects: 16, done.
remote: Counting objects: 100% (16/16), done.
remote: Compressing objects: 100% (15/15), done.
remote: Total 16 (delta 3), reused 4 (delta 1), pack-reused 0
Unpacking objects: 100% (16/16), 8.04 KiB | 1.15 MiB/s, done.
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(
    input_dir="./LlamaIndex-tutorials",
    required_exts=[".md"],
    recursive=True,
)
docs = reader.load_data()
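Conceptually, `input_dir` plus `recursive=True` and `required_exts` amounts to a recursive directory walk that keeps only files with the listed extensions. A standalone sketch of that selection logic (illustrative only, not SimpleDirectoryReader's real code):

```python
import tempfile
from pathlib import Path

def collect_files(input_dir, required_exts, recursive=True):
    """Return files under input_dir whose extension is in required_exts."""
    root = Path(input_dir)
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(p for p in candidates if p.is_file() and p.suffix in required_exts)

# Demo on a throwaway directory tree
root = Path(tempfile.mkdtemp())
(root / "sub").mkdir()
(root / "README.md").write_text("# readme")
(root / "notes.txt").write_text("notes")
(root / "sub" / "guide.md").write_text("# guide")

found = collect_files(root, [".md"])
print([p.name for p in found])  # ['README.md', 'guide.md']
```

Only the two `.md` files survive; `notes.txt` is filtered out even though it sits in the same tree.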
[Document(id_='e06e478c-8581-4ff8-b3ba-59be370e8ffc', embedding=None, metadata={'file_path': '/home/kubwa/kubwai/13-LlamaIndex/LlamaIndex-Tutorials/04_Data_Connectors/LlamaIndex-tutorials/README.md', 'file_name': 'README.md', 'file_type': 'text/markdown', 'file_size': 455, 'creation_date': '2024-04-15', 'last_modified_date': '2024-04-15'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='\n\nLlamaIndex tutorials\n\nOverview and tutorials of the LlamaIndex Library\n\n', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
Document(id_='3eb8990d-ae3e-4940-9f23-809934e30e33', embedding=None, metadata={'file_path': '/home/kubwa/kubwai/13-LlamaIndex/LlamaIndex-Tutorials/04_Data_Connectors/LlamaIndex-tutorials/README.md', 'file_name': 'README.md', 'file_type': 'text/markdown', 'file_size': 455, 'creation_date': '2024-04-15', 'last_modified_date': '2024-04-15'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='\n\nGetting Started\n\nVideos coming soon https://www.youtube.com/@AnilChandraNaiduMatcha\n.Subscribe to the channel to get latest content\n\nFollow Anil Chandra Naidu Matcha on twitter for updates\n\nJoin our discord server for support https://discord.gg/FBpafqbbYF\n\n', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
Document(id_='195d642a-f821-4d8f-82ee-10e3059633a7', embedding=None, metadata={'file_path': '/home/kubwa/kubwai/13-LlamaIndex/LlamaIndex-Tutorials/04_Data_Connectors/LlamaIndex-tutorials/README.md', 'file_name': 'README.md', 'file_type': 'text/markdown', 'file_size': 455, 'creation_date': '2024-04-15', 'last_modified_date': '2024-04-15'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='\n\nAlso check\n\nLlamaIndex Course\n\n', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')]
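Each `Document` above carries a `metadata` dict plus exclusion lists (`excluded_embed_metadata_keys`, `excluded_llm_metadata_keys`) that control which fields get prepended to the text for embedding or LLM calls, via `metadata_template` and `text_template`. A rough, illustrative reconstruction of that formatting (not LlamaIndex's actual code):

```python
def format_for_embedding(text, metadata, excluded_keys,
                         metadata_template="{key}: {value}",
                         text_template="{metadata_str}\n\n{content}",
                         metadata_separator="\n"):
    """Render a document's text with only the non-excluded metadata fields."""
    metadata_str = metadata_separator.join(
        metadata_template.format(key=k, value=v)
        for k, v in metadata.items() if k not in excluded_keys
    )
    return text_template.format(metadata_str=metadata_str, content=text)

rendered = format_for_embedding(
    text="LlamaIndex tutorials",
    metadata={"file_path": "/tmp/README.md", "file_name": "README.md", "file_size": 455},
    excluded_keys=["file_name", "file_size"],
)
print(rendered)
# file_path: /tmp/README.md
#
# LlamaIndex tutorials
```

This is why, in the output above, fields like `file_size` and `creation_date` are listed in the exclusion lists: they are stored as metadata but kept out of the text the embedding model sees.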
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()
response = query_engine.query("What is LlamaIndex?")
print(response)
LlamaIndex is a library that provides an overview and tutorials for users.
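Under the hood, `as_query_engine()` embeds the question, retrieves the most similar document chunks by vector similarity, and passes them to the LLM as context. The retrieval step can be sketched with toy vectors (illustrative only; the real embeddings come from the OpenAI API and have far more dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-dimensional "embeddings" for three chunks of the README
chunks = {
    "overview": [0.9, 0.1, 0.0],
    "getting started": [0.2, 0.8, 0.1],
    "discord link": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.0]  # pretend embedding of "What is LlamaIndex?"

# Retrieve the chunk whose embedding is closest to the query
best = max(chunks, key=lambda name: cosine_similarity(query_vec, chunks[name]))
print(best)  # overview
```

The chunk nearest to the query wins, which is why the engine's answer above is grounded in the repository's overview text rather than, say, the Discord link.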
response = query_engine.query("What do the LlamaIndex tutorials provide?")
print(response)
The LlamaIndex tutorials provide an overview and tutorials of the LlamaIndex Library.
Alternatively, you can load the same repository's README with the MarkdownReader downloaded from LlamaHub and query it in the same way.
from pathlib import Path
from llama_index.core import download_loader

MarkdownReader = download_loader("MarkdownReader")
loader = MarkdownReader()
documents = loader.load_data(file=Path('./LlamaIndex-tutorials/README.md'))
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("Which version of the framework does the LlamaIndex getting-started tutorial use?")
print(response)
The LlamaIndex tutorials use a specific version of a framework.