5️⃣LangChain: Advanced Techniques

Advanced Techniques

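Among LangChain's main modules, the chunking and embedding methods used for data processing matter a great deal. There are also several approaches to prompt templates on the query side and to splitters on the retrieval side.

Let's implement examples of the most commonly used advanced techniques in code.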

import os
from dotenv import load_dotenv

# Write the key to .env first (replace the placeholder), then load it.
!echo "OPENAI_API_KEY=<Your_OpenAI_Key>" >> .env
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")

# Download the sample files. The /raw/ URLs return the text itself, not GitHub's HTML page,
# and -O writes each download to the given file path.
!mkdir -p data
!wget https://github.com/Coding-Crashkurse/Udemy-Advanced-LangChain/raw/main/data/food.txt -O ./data/food.txt
!wget https://github.com/Coding-Crashkurse/Udemy-Advanced-LangChain/raw/main/data/founder.txt -O ./data/founder.txt
!wget https://github.com/Coding-Crashkurse/Udemy-Advanced-LangChain/raw/main/data/restaurant.txt -O ./data/restaurant.txt

Chunking

with open("./data/restaurant.txt") as f:
    raw_data = f.read()

Standard Chunking

%pip install --upgrade --quiet langchain-text-splitters tiktoken
Note: you may need to restart the kernel to use updated packages.
from langchain_text_splitters import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=200,
    chunk_overlap=20,
    length_function=len,
    is_separator_regex=False,
)
docs = text_splitter.split_text(raw_data)
print(docs)
print(len(docs))
Created a chunk of size 325, which is longer than the specified 200
Created a chunk of size 327, which is longer than the specified 200
Created a chunk of size 291, which is longer than the specified 200
Created a chunk of size 374, which is longer than the specified 200
Created a chunk of size 289, which is longer than the specified 200


['In the charming streets of Palermo, tucked away in a quaint alley, stood Chef Amico, a restaurant that was more than a mere eatery—it was a slice of Sicilian heaven. Founded by Amico, a chef whose name was synonymous with passion and creativity, the restaurant was a mosaic of his life’s journey through the flavors of Italy.', 'Chef Amico’s doors opened to a world where the aromas of garlic and olive oil were as welcoming as a warm embrace. The walls, adorned with photos of Amico’s travels and family recipes, spoke of a rich culinary heritage. The chatter and laughter of patrons filled the air, creating a symphony as delightful as the dishes served.', "One evening, as the sun cast a golden glow over the city, a renowned food critic, Elena Rossi, stepped into Chef Amico. Her mission was to uncover the secret behind the restaurant's growing fame. She was greeted by Amico himself, whose eyes sparkled with the joy of a man who loved his work.", 'Elena was led to a table adorned with a simple, elegant setting. The first course was Caponata, a melody of eggplant, capers, and sweet tomatoes, which danced on her palate. Next came the Risotto al Nero di Seppia, a dish that told the tale of Sicily’s love affair with the sea. Each spoonful was a revelation, the rich flavors of squid ink harmonizing with the creamy rice.', 'The final masterpiece was Cannoli, the crown jewel of Sicilian desserts. As Elena savored the sweet ricotta filling, encased in a perfectly crisp shell, she realized that Chef Amico wasn’t just about the food. It was about the stories, the traditions, and the heart poured into every dish.', 'Leaving the restaurant, Elena knew her review would sing praises not just of the food, but of the soul of Chef Amico—a place where every dish was a journey through Sicily, and every bite, a taste of Amico’s dream come true.']
6
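The warnings above appear because the splitter only cuts at the separator ("\n"): a paragraph longer than chunk_size contains no newline to cut at, so it is emitted whole. Below is a minimal sketch of that split-then-merge behavior in plain Python. `split_on_separator` is a hypothetical helper written for illustration, not LangChain's implementation, and it ignores chunk_overlap for brevity.

```python
def split_on_separator(text, separator, chunk_size):
    """Split on `separator`, then merge neighbors while they fit in chunk_size.

    Simplified illustration only: a single piece longer than chunk_size
    cannot be cut any further, so it is kept as an oversized chunk.
    """
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate          # still fits, keep merging
        else:
            if current:
                chunks.append(current)   # flush what we have so far
            current = piece              # may itself exceed chunk_size
    if current:
        chunks.append(current)
    return chunks


# Toy input: one 30-character "paragraph" and two short ones.
sample = "a" * 30 + "\n" + "b" * 10 + "\n" + "c" * 5
chunks = split_on_separator(sample, "\n", chunk_size=20)
print([len(c) for c in chunks])  # [30, 16]
```

The first chunk is longer than the requested 20 characters for the same reason the restaurant paragraphs exceed 200: there is no separator inside it.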
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=20,
    length_function=len,
    is_separator_regex=False,
)
docs = text_splitter.split_text(raw_data)
print(docs)
print(len(docs))
['In the charming streets of Palermo, tucked away in a quaint alley, stood Chef Amico, a restaurant', 'Amico, a restaurant that was more than a mere eatery—it was a slice of Sicilian heaven. Founded by', 'heaven. Founded by Amico, a chef whose name was synonymous with passion and creativity, the', 'and creativity, the restaurant was a mosaic of his life’s journey through the flavors of Italy.', 'Chef Amico’s doors opened to a world where the aromas of garlic and olive oil were as welcoming as', 'as welcoming as a warm embrace. The walls, adorned with photos of Amico’s travels and family', 'travels and family recipes, spoke of a rich culinary heritage. The chatter and laughter of patrons', 'laughter of patrons filled the air, creating a symphony as delightful as the dishes served.', 'One evening, as the sun cast a golden glow over the city, a renowned food critic, Elena Rossi,', "Elena Rossi, stepped into Chef Amico. Her mission was to uncover the secret behind the restaurant's", "the restaurant's growing fame. She was greeted by Amico himself, whose eyes sparkled with the joy", 'with the joy of a man who loved his work.', 'Elena was led to a table adorned with a simple, elegant setting. The first course was Caponata, a', 'was Caponata, a melody of eggplant, capers, and sweet tomatoes, which danced on her palate. Next', 'on her palate. Next came the Risotto al Nero di Seppia, a dish that told the tale of Sicily’s love', 'of Sicily’s love affair with the sea. Each spoonful was a revelation, the rich flavors of squid ink', 'of squid ink harmonizing with the creamy rice.', 'The final masterpiece was Cannoli, the crown jewel of Sicilian desserts. As Elena savored the sweet', 'savored the sweet ricotta filling, encased in a perfectly crisp shell, she realized that Chef Amico', 'that Chef Amico wasn’t just about the food. 
It was about the stories, the traditions, and the heart', 'and the heart poured into every dish.', 'Leaving the restaurant, Elena knew her review would sing praises not just of the food, but of the', 'food, but of the soul of Chef Amico—a place where every dish was a journey through Sicily, and', 'through Sicily, and every bite, a taste of Amico’s dream come true.']
24
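In the output above, each chunk begins with the tail of the previous one (for example, "Amico, a restaurant" closes the first chunk and opens the second). That repetition is what chunk_overlap controls. The sketch below shows the idea with a character-level sliding window. `sliding_window_chunks` is a hypothetical helper; the real RecursiveCharacterTextSplitter cuts at separators such as spaces and newlines, so its overlaps fall on word boundaries rather than at fixed character offsets.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Cut `text` into windows of chunk_size that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break                        # this window already reached the end
        start += step
    return chunks


windows = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(windows)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

A larger overlap keeps more context across chunk boundaries at the cost of more chunks to embed and store.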

Semantic Chunking

  1. standard SemanticChunker

  2. breakpoint_threshold_type=['percentile', 'standard_deviation', 'interquartile']

#%pip install langchain_experimental
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai.embeddings import OpenAIEmbeddings
text_splitter = SemanticChunker(
    OpenAIEmbeddings()
)
text_splitter_breakpoint = SemanticChunker(
    OpenAIEmbeddings(), 
    breakpoint_threshold_type="standard_deviation" # or 'interquartile'
)
docs = text_splitter.split_text(raw_data)
print(docs)
print(len(docs))
["In the charming streets of Palermo, tucked away in a quaint alley, stood Chef Amico, a restaurant that was more than a mere eatery—it was a slice of Sicilian heaven. Founded by Amico, a chef whose name was synonymous with passion and creativity, the restaurant was a mosaic of his life’s journey through the flavors of Italy. Chef Amico’s doors opened to a world where the aromas of garlic and olive oil were as welcoming as a warm embrace. The walls, adorned with photos of Amico’s travels and family recipes, spoke of a rich culinary heritage. The chatter and laughter of patrons filled the air, creating a symphony as delightful as the dishes served. One evening, as the sun cast a golden glow over the city, a renowned food critic, Elena Rossi, stepped into Chef Amico. Her mission was to uncover the secret behind the restaurant's growing fame. She was greeted by Amico himself, whose eyes sparkled with the joy of a man who loved his work.", 'Elena was led to a table adorned with a simple, elegant setting. The first course was Caponata, a melody of eggplant, capers, and sweet tomatoes, which danced on her palate. Next came the Risotto al Nero di Seppia, a dish that told the tale of Sicily’s love affair with the sea. Each spoonful was a revelation, the rich flavors of squid ink harmonizing with the creamy rice. The final masterpiece was Cannoli, the crown jewel of Sicilian desserts. As Elena savored the sweet ricotta filling, encased in a perfectly crisp shell, she realized that Chef Amico wasn’t just about the food. It was about the stories, the traditions, and the heart poured into every dish. Leaving the restaurant, Elena knew her review would sing praises not just of the food, but of the soul of Chef Amico—a place where every dish was a journey through Sicily, and every bite, a taste of Amico’s dream come true.']
2
docs = text_splitter_breakpoint.split_text(raw_data)
print(docs)
print(len(docs))
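The breakpoint_threshold_type options differ in how they decide that two neighboring sentences are "far enough apart" to start a new chunk. The sketch below illustrates the percentile variant as I understand it: measure the cosine distance between consecutive sentence embeddings and split wherever the distance exceeds a chosen percentile of all distances. The helper names and the 2-dimensional toy vectors are assumptions made for illustration. The real SemanticChunker embeds sentences with the model you pass in and may differ in its details.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def percentile(values, pct):
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    rank = (pct / 100) * (len(ordered) - 1)
    low, high = math.floor(rank), math.ceil(rank)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def semantic_groups(sentences, vectors, pct=95):
    """Group consecutive sentences, splitting where the embedding distance spikes."""
    if len(sentences) < 2:
        return list(sentences)
    distances = [
        cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)
    ]
    threshold = percentile(distances, pct)
    groups = [[sentences[0]]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            groups.append([sentence])    # large jump: start a new chunk
        else:
            groups[-1].append(sentence)  # similar topic: extend the current chunk
    return [" ".join(group) for group in groups]


# Toy data: two sentences about food, two about parking (vectors are made up).
sentences = [
    "The pasta was fresh.",
    "The sauce was rich.",
    "Parking is behind the building.",
    "The lot closes at midnight.",
]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
groups = semantic_groups(sentences, vectors, pct=95)
print(groups)
```

With these toy vectors the only large distance is between the second and third sentences, so the text is split into a food chunk and a parking chunk.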

Hugging Face Embeddings

with open("./data/restaurant.txt") as f:
    raw_data = f.read()
from langchain_text_splitters import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=200,
    chunk_overlap=20,
    length_function=len,
    is_separator_regex=False,
)
texts = text_splitter.split_text(raw_data)
texts
Created a chunk of size 325, which is longer than the specified 200
Created a chunk of size 327, which is longer than the specified 200
Created a chunk of size 291, which is longer than the specified 200
Created a chunk of size 374, which is longer than the specified 200
Created a chunk of size 289, which is longer than the specified 200

['In the charming streets of Palermo, tucked away in a quaint alley, stood Chef Amico, a restaurant that was more than a mere eatery—it was a slice of Sicilian heaven. Founded by Amico, a chef whose name was synonymous with passion and creativity, the restaurant was a mosaic of his life’s journey through the flavors of Italy.',
 'Chef Amico’s doors opened to a world where the aromas of garlic and olive oil were as welcoming as a warm embrace. The walls, adorned with photos of Amico’s travels and family recipes, spoke of a rich culinary heritage. The chatter and laughter of patrons filled the air, creating a symphony as delightful as the dishes served.',
 "One evening, as the sun cast a golden glow over the city, a renowned food critic, Elena Rossi, stepped into Chef Amico. Her mission was to uncover the secret behind the restaurant's growing fame. She was greeted by Amico himself, whose eyes sparkled with the joy of a man who loved his work.",
 'Elena was led to a table adorned with a simple, elegant setting. The first course was Caponata, a melody of eggplant, capers, and sweet tomatoes, which danced on her palate. Next came the Risotto al Nero di Seppia, a dish that told the tale of Sicily’s love affair with the sea. Each spoonful was a revelation, the rich flavors of squid ink harmonizing with the creamy rice.',
 'The final masterpiece was Cannoli, the crown jewel of Sicilian desserts. As Elena savored the sweet ricotta filling, encased in a perfectly crisp shell, she realized that Chef Amico wasn’t just about the food. It was about the stories, the traditions, and the heart poured into every dish.',
 'Leaving the restaurant, Elena knew her review would sing praises not just of the food, but of the soul of Chef Amico—a place where every dish was a journey through Sicily, and every bite, a taste of Amico’s dream come true.']
%pip install -U sentence-transformers
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "paraphrase-MiniLM-L6-v2"
)

embeddings_huggingface = model.encode(texts)
embeddings_huggingface[0]
array([ 0.08134533,  0.427597  , -0.07879403,  0.05424361, -0.2728025 ,
       -0.26916075,  0.13213104, -0.08318138,  0.0815519 ,  0.00868148,
        0.2659209 , -0.11565454, -0.10010716,  0.20750193,  0.27180114,
       -0.30432186,  0.20329821, -0.10343803,  0.24912746,  0.10948042,
       -0.04937839, -0.09979149, -0.06548927,  0.2105276 ,  0.43719673,
       -0.19460969,  0.42836696,  0.31816912, -0.15296954, -0.10471585,
        0.24475577, -0.16168419,  0.18021908, -0.01962745, -0.19171213,
        0.28202245, -0.0416555 , -0.23106173,  0.16600938,  0.03676724,
        0.12425852,  0.16848373, -0.07740648, -0.26863122,  0.17550932,
       -0.14738299,  0.1470242 ,  0.12241286, -0.00885579, -0.16056983,
       -0.49815607,  0.0507556 ,  0.05124085, -0.06534179, -0.11084051,
       -0.01177608, -0.06595471,  0.3867669 , -0.01074637,  0.19357428,
        0.17053784, -0.47737792,  0.00470324,  0.00618902, -0.18907   ,
        0.00097326, -0.02502042, -0.3091907 , -0.6151345 , -0.12231572,
        0.10477588, -0.6650125 ,  0.0316483 ,  0.09899287,  0.1479276 ,
       -0.56450975, -0.0263247 , -0.31436023, -0.65265834,  0.22300957,
        0.19723296,  0.04794945, -0.4260626 ,  0.46166334, -0.03314395,
       -0.11607559, -0.356182  , -0.17051262,  0.2496264 , -0.18243243,
        0.2850962 , -0.106864  , -0.07401578, -0.28775948,  0.2303263 ,
       -0.1114395 , -0.43627632,  0.38926345, -0.16553438, -0.00830238,
       -0.01730877,  0.20682257,  0.11142192, -0.15938535,  0.34006354,
        0.20519672,  0.19022591,  0.12897532,  0.13632947,  0.16460864,
       -0.4090085 ,  0.19973317, -0.06375565, -0.45881853, -0.01712096,
        0.37517488,  0.04865366, -0.17216855,  0.18913826,  0.03342902,
        0.19140169, -0.02504501, -0.14154929,  0.5253965 , -0.31537458,
        0.26272824,  0.33709618, -0.01892356, -0.24211894,  0.11812719,
       -0.14618546,  0.38282657,  0.6569913 ,  0.10368019,  0.08892129,
        0.03050717, -0.03167247, -0.01797551, -0.16755357, -0.32359177,
       -0.33592445,  0.30851632, -0.19579889, -0.14112341,  0.29071486,
       -0.06961236,  0.21554048, -0.1693802 , -0.0752824 , -0.19103043,
        0.12374664,  0.17899701,  0.16886067,  0.20892084,  0.18136163,
        0.19504008, -0.123354  ,  0.15059449,  0.01775168,  0.03850172,
       -0.02808238,  0.01961013, -0.12869866,  0.34482145, -0.13494396,
       -0.17800179, -0.552189  ,  0.23084322, -0.2511283 ,  0.27158052,
       -0.23707153, -0.15630963, -0.02438921,  0.15898606, -0.48405322,
        0.29148158,  0.29493475,  0.5270614 ,  0.08418659, -0.21038903,
       -0.00191476,  0.33085576,  0.05810625,  0.08025777, -0.014285  ,
        0.23497272,  0.31899098,  0.13624567, -0.02228187,  0.40321255,
       -0.1804232 ,  0.35070148, -0.0258134 ,  0.17755735,  0.19297588,
        0.30828866,  0.3798975 ,  0.01838099,  0.1577268 , -0.07104518,
       -0.21378273, -0.0182809 , -0.17458735,  0.03942712, -0.21328509,
       -0.0315671 ,  0.37557587,  0.3688756 , -0.00300597,  0.13808648,
        0.04212682,  0.20179522, -0.0115846 ,  0.39699334,  0.5844212 ,
        0.18426289, -0.01053573,  0.22717938, -0.23566341,  0.09141976,
        0.220176  , -0.25730625, -0.08219474, -0.16517852,  0.3363021 ,
       -0.31946358, -0.1293436 , -0.15203278, -0.02271065, -0.34284008,
       -0.42678756, -0.12641244, -0.14131054,  0.02875809, -0.41570312,
       -0.00932696,  0.26857644, -0.22024237, -0.12900148,  0.4017534 ,
        0.1957088 , -0.27122462, -0.06648885,  0.32019958, -0.24556471,
        0.0210088 ,  0.06891371, -0.02413491, -0.24415857,  0.21761823,
       -0.06906353,  0.27917317, -0.22907484,  0.09660839, -0.5865401 ,
       -0.01903726,  0.11703739, -0.13185282, -0.15879811,  0.54581887,
       -0.32341397, -0.04961459, -0.2207956 , -0.03648186, -0.11043948,
       -0.17644817, -0.1990808 ,  0.38304183,  0.20172624,  0.22999325,
       -0.10156554, -0.43439794, -0.21372046, -0.37203625, -0.00974425,
        0.36596346, -0.06666319, -0.09986738, -0.02570056, -0.15350246,
       -0.39371505,  0.24003291, -0.33391774,  0.09798936,  0.02447831,
        0.03299262,  0.03249498, -0.00438718,  0.11788495, -0.10881496,
       -0.2792974 ,  0.08191813, -0.22744942,  0.41535154,  0.17875122,
        0.02301301, -0.04724361, -0.16311364, -0.43263713, -0.04223166,
       -0.15925723,  0.44740802, -0.1621015 ,  0.12942769,  0.0103466 ,
       -0.08263364, -0.19251053,  0.05942644, -0.08237252, -0.14838162,
       -0.09152927,  0.04262289,  0.01303752,  0.5252298 ,  0.17595072,
        0.297124  ,  0.16882291,  0.02264774,  0.19584496, -0.10643198,
        0.16330953, -0.14373715, -0.14615664,  0.18865511, -0.11343541,
       -0.44412965,  0.15941487, -0.11583465, -0.42000803,  0.5067833 ,
       -0.39780855,  0.52253836, -0.30004343,  0.41683283, -0.053866  ,
       -0.2161384 ,  0.25314254,  0.12023956, -0.14360043,  0.2007724 ,
        0.49751407,  0.26580822,  0.04299141, -0.5589946 , -0.10787092,
       -0.34045342, -0.35965258, -0.18885045, -0.00186023, -0.2593788 ,
       -0.24509603,  0.06214124, -0.16173457, -0.109694  , -0.18151298,
       -0.05646194, -0.35842207, -0.18353085,  0.16491622, -0.7815522 ,
       -0.23033474, -0.20105045,  0.01088465, -0.04395587,  0.06272687,
        0.42887372,  0.40137774,  0.26135635,  0.06240474,  0.27876386,
       -0.4448833 ,  0.28086808,  0.373704  , -0.13709994, -0.05792329,
       -0.26814342,  0.15053092,  0.17046294,  0.34323964, -0.04583586,
        0.14495076, -0.02778706, -0.17749774, -0.21106538], dtype=float32)
len(embeddings_huggingface[0])
384
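Each text is now represented by a 384-dimensional vector. Retrieval typically works by comparing such vectors, most often with cosine similarity. The sketch below ranks a few documents against a query this way. The 3-dimensional vectors are made-up stand-ins, and `rank_by_similarity` is a hypothetical helper; with the model above you would pass rows of `embeddings_huggingface` and an encoded query instead.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings.
query_vec = [1.0, 0.0, 0.0]
doc_vecs = [
    [0.9, 0.1, 0.0],   # points almost the same way as the query
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.5, 0.5, 0.0],   # somewhere in between
]
ranking = rank_by_similarity(query_vec, doc_vecs)
print([index for index, _ in ranking])  # [0, 2, 1]
```

Cosine similarity depends only on the angle between vectors, so scaling an embedding up or down does not change its rank.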

Queries: HYDE_PROMPT

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI
from langchain_community.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter


loader = DirectoryLoader("./data", glob="**/*.txt")
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=120,
    chunk_overlap=20,
    length_function=len,
    is_separator_regex=False,
)
chunks = text_splitter.split_documents(docs)

embedding_function = OpenAIEmbeddings()
model = ChatOpenAI()

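# Note: this indexes the three whole documents in `docs`; pass `chunks` instead
# to index the smaller splits created above.
db = Chroma.from_documents(
    docs, 
    embedding_function
)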
retriever = db.as_retriever()
from langchain.prompts import PromptTemplate
from langchain_core.runnables import RunnableLambda
import re

query = "Who is the owner of the restaurant?"


QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five
    different versions of the given user question to retrieve relevant documents from a vector
    database. By generating multiple perspectives on the user question, your goal is to help
    the user overcome some of the limitations of distance-based similarity search.
    Provide these alternative questions like this:
    <<question1>>
    <<question2>>
    Only provide the query, no numbering.
    Original question: {question}""",
)

def split_and_clean_text(input_text):
    return [item for item in re.split(r"<<|>>", input_text) if item.strip()]
model = ChatOpenAI()
rephrase_chain = (
    QUERY_PROMPT | model | StrOutputParser() | RunnableLambda(split_and_clean_text)
)
list_of_questions = rephrase_chain.invoke(query)
docs = [retriever.get_relevant_documents(q) for q in list_of_questions]
Number of requested results 4 is greater than number of elements in index 3, updating n_results = 3
def flatten_and_unique_documents(documents):
    flattened_docs = [doc for sublist in documents for doc in sublist]

    unique_docs = []
    unique_contents = set()
    for doc in flattened_docs:
        if doc.page_content not in unique_contents:
            unique_docs.append(doc)
            unique_contents.add(doc.page_content)

    return unique_docs
flatten_and_unique_documents(documents=docs)
[Document(page_content="In the charming streets of Palermo, tucked away in a quaint alley, stood Chef Amico, a restaurant that was more than a mere eatery—it was a slice of Sicilian heaven. Founded by Amico, a chef whose name was synonymous with passion and creativity, the restaurant was a mosaic of his life’s journey through the flavors of Italy.\n\nChef Amico’s doors opened to a world where the aromas of garlic and olive oil were as welcoming as a warm embrace. The walls, adorned with photos of Amico’s travels and family recipes, spoke of a rich culinary heritage. The chatter and laughter of patrons filled the air, creating a symphony as delightful as the dishes served.\n\nOne evening, as the sun cast a golden glow over the city, a renowned food critic, Elena Rossi, stepped into Chef Amico. Her mission was to uncover the secret behind the restaurant's growing fame. She was greeted by Amico himself, whose eyes sparkled with the joy of a man who loved his work.\n\nElena was led to a table adorned with a simple, elegant setting. The first course was Caponata, a melody of eggplant, capers, and sweet tomatoes, which danced on her palate. Next came the Risotto al Nero di Seppia, a dish that told the tale of Sicily’s love affair with the sea. Each spoonful was a revelation, the rich flavors of squid ink harmonizing with the creamy rice.\n\nThe final masterpiece was Cannoli, the crown jewel of Sicilian desserts. As Elena savored the sweet ricotta filling, encased in a perfectly crisp shell, she realized that Chef Amico wasn’t just about the food. It was about the stories, the traditions, and the heart poured into every dish.\n\nLeaving the restaurant, Elena knew her review would sing praises not just of the food, but of the soul of Chef Amico—a place where every dish was a journey through Sicily, and every bite, a taste of Amico’s dream come true.", metadata={'source': 'data/restaurant.txt'}),
 Document(page_content='In the heart of the old quarter of Palermo, amidst the bustling market stalls and the echoes of lively street life, Amico was born into a family where food was more than sustenance—it was the language of love. Raised in the warmth of his Nonna Lucia\'s kitchen, young Amico was captivated by the symphony of flavors and aromas that danced in the air, a testament to his family’s Sicilian heritage.\n\nAmico\'s life was deeply entwined with the vibrant essence of Sicilian cuisine. In the rustic kitchen where his Nonna conjured culinary magic, Amico found his calling. These formative years, filled with the rhythmic chopping of fresh herbs and the sizzling of rich tomato sauces, laid the foundation of his passion for cooking.\n\nThe Journey to Chef Amico\n\nFrom a young age, Amico was immersed in the art of Sicilian cooking. His days were punctuated by visits to the bustling markets of Palermo, where he learned to choose the freshest fish from the Mediterranean and the ripest fruits kissed by the Sicilian sun. These experiences not only sharpened his culinary skills but also deepened his respect for the land and its bounty.\n\nAs he grew, so did his desire to explore beyond the shores of Sicily. Venturing through Italy, Amico worked alongside renowned chefs, each teaching him a new facet of Italian cuisine. From the rolling hills of Tuscany to the romantic canals of Venice, he absorbed the diverse regional flavors, techniques, and traditions that would later influence his unique culinary style.\n\nCreating Chef Amico’s Restaurant\n\nReturning to Palermo with a vision, Amico opened the doors to "Chef Amico," a restaurant that was a culmination of his travels and a tribute to his Sicilian roots. Nestled in a quaint corner of the city, the restaurant quickly gained fame for its authentic flavors and Amico’s innovative twists on traditional recipes.\n\nAt Chef Amico, every dish told a story. 
The menu, a tapestry of Sicilian classics and modern Italian cuisine, reflected Amico’s journey and his commitment to excellence. Patrons were not just diners; they were part of an extended family, welcomed with the same warmth and joy that Amico had experienced in his Nonna’s kitchen.\n\nPhilosophy of Hospitality\n\nFor Amico, hospitality was an art form. He believed that a meal was a celebration, a moment to pause and relish life’s simple pleasures. His restaurant was a haven where strangers became friends over plates of arancini and glasses of Nero d’Avola. The atmosphere he fostered was one of comfort and camaraderie, a place where every guest left with a full stomach and a happy heart.\n\nContinuing the Legacy\n\nToday, Chef Amico stands as a landmark in Palermo, a testament to Amico’s dedication and love for his craft. His spirit of generosity and passion for food extends beyond the restaurant’s walls. He mentors young chefs, shares his knowledge at culinary workshops, and supports local farmers and producers.\n\nAmico’s legacy is not just in the dishes he creates but in the community he nurtures. His story is a tribute to the power of food to connect us, to share our stories, and to celebrate the richness of life. Chef Amico is more than a restaurant; it\'s a home, built on a lifetime of love, learning, and the flavors of Sicily.', metadata={'source': 'data/founder.txt'}),
 Document(page_content='Margherita Pizza; $12; Classic with tomato, mozzarella, and basil; Main Dish\n\nSpaghetti Carbonara; $15; Creamy pasta with pancetta and parmesan; Main Dish\n\nBruschetta; $8; Toasted bread with tomato, garlic, and olive oil; Appetizer\n\nCaprese Salad; $10; Fresh tomatoes, mozzarella, and basil; Salad\n\nLasagna; $14; Layered pasta with meat sauce and cheese; Main Dish\n\nTiramisu; $9; Coffee-flavored Italian dessert; Dessert\n\nGelato; $7; Traditional Italian ice cream; Dessert\n\nRisotto Milanese; $16; Creamy saffron-infused rice dish; Main Dish\n\nPolenta; $11; Cornmeal dish, often served as a side; Side Dish\n\nOsso Buco; $20; Braised veal shanks with vegetables and broth; Main Dish\n\nRavioli; $13; Stuffed pasta with cheese or meat filling; Main Dish\n\nMinestrone Soup; $9; Vegetable soup with pasta or rice; Soup\n\nProsecco; $8; Italian sparkling white wine; Drink\n\nChianti; $10; Dry red wine from Tuscany; Drink\n\nFocaccia; $6; Oven-baked Italian bread; Side Dish\n\nCalamari; $12; Fried squid rings with marinara sauce; Appetizer\n\nEspresso; $4; Strong Italian coffee; Drink\n\nCannoli; $8; Sicilian pastry with sweet ricotta filling; Dessert\n\nArancini; $10; Fried rice balls stuffed with cheese or meat; Appetizer\n\nPanna Cotta; $9; Creamy Italian dessert with caramel or fruit; Dessert\n\nNegroni; $12; Cocktail with gin, vermouth, and Campari; Drink\n\nAperol Spritz; $10; Aperitif cocktail with Aperol, prosecco, and soda; Drink\n\nGnocchi; $14; Potato-based pasta served with various sauces; Main Dish\n\nPanzanella; $9; Bread and tomato salad; Salad\n\nCarpaccio; $15; Thinly sliced raw beef with arugula and parmesan; Appetizer\n\nAffogato; $7; Espresso poured over gelato; Dessert\n\nBiscotti; $5; Crunchy Italian almond biscuits; Dessert\n\nVitello Tonnato; $18; Thin slices of veal with a creamy tuna sauce; Main Dish\n\nCrostini; $7; Small toasted bread with toppings; Appetizer\n\nZabaglione; $10; Light custard dessert made with egg 
yolks; Dessert\n\nFrittata; $12; Italian-style omelette; Main Dish\n\nSaltimbocca; $19; Veal wrapped in prosciutto and sage; Main Dish\n\nLimoncello; $8; Italian lemon liqueur; Drink\n\nGrappa; $9; Italian grape-based brandy; Drink\n\nSangiovese; $11; Medium-bodied red wine; Drink\n\nRibollita; $10; Tuscan bread and vegetable soup; Soup\n\nTortellini; $14; Ring-shaped pasta filled with meat or cheese; Main Dish\n\nPanettone; $15; Traditional Italian Christmas bread; Dessert\n\nInsalata Mista; $8; Mixed green salad with Italian dressing; Salad\n\nCacio e Pepe; $13; Pasta with cheese and pepper; Main Dish\n\nItalian Soda; $5; Carbonated water with flavored syrup; Drink\n\nAmericano; $6; Coffee with added hot water; Drink\n\nFrutti di Mare; $22; Seafood pasta with mixed shellfish; Main Dish\n\nCaponata; $9; Eggplant dish with capers, olives, and celery; Side Dish\n\nAmaretto Sour; $10; Cocktail with amaretto, lemon juice, and sugar; Drink\n\nBranzino; $21; Mediterranean sea bass, usually grilled or baked; Main Dish\n\nPorchetta; $18; Savory, fatty, and moist boneless pork roast; Main Dish\n\nMontepulciano Wine; $12; Full-bodied red wine; Drink\n\nBresaola; $14; Air-dried, salted beef served as an appetizer; Appetizer\n\nPesto Pasta; $12; Pasta with traditional basil pesto sauce; Main Dish', metadata={'source': 'data/food.txt'})]
HYDE_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. 
    Your task is to generate five hypothetical answers to the user's question. 
    These answers should offer diverse perspectives or interpretations to support a comprehensive understanding of the query. 
    Present the hypothetical answers as follows:

    Hypothetical Answer 1: <<Answer considering a specific perspective>>
    Hypothetical Answer 2: <<Answer from a different angle>>
    Hypothetical Answer 3: <<Answer exploring an alternative possibility>>
    Hypothetical Answer 4: <<Answer providing a contrasting viewpoint>>
    Hypothetical Answer 5: <<Answer that includes a unique insight>>

    Note: Present only the hypothetical answers, without numbering, to provide a range of potential interpretations or solutions related to the query.
    Original question: {question}""",
)
hyde_chain = (
    HYDE_PROMPT | model | StrOutputParser() | RunnableLambda(split_and_clean_text)
)
list_of_questions = hyde_chain.invoke(query)
list_of_questions
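['- The owner of the restaurant could be the person who founded it, or the person who currently runs it.\n- When we say owner, it could mean the actual proprietor, or the person in charge who leads the business.\n- The concept of a restaurant owner can cover various aspects such as management, ownership, and operation.\n- The concept of an owner can be interpreted differently depending on the situation. For example, a family-owned restaurant and a corporate-owned restaurant can mean different things.\n- The owner of a restaurant can often be seen as a key figure who shapes its identity and direction.']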
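The result above is a single list element: the model answered with "-" bullets instead of `<<...>>` delimiters, so `split_and_clean_text` had nothing to split on and all five answers are sent to the retriever as one query. A more tolerant parser could fall back to splitting by line when no delimiters are found. The sketch below is one way to do that; `parse_delimited_items` is a hypothetical helper, not part of LangChain.

```python
import re


def parse_delimited_items(text):
    """Extract <<...>> items; if none are present, fall back to one item per line."""
    items = [m.strip() for m in re.findall(r"<<(.*?)>>", text, flags=re.DOTALL)]
    items = [item for item in items if item]
    if items:
        return items
    # Fallback: treat each non-empty line as an item, dropping bullets or numbering.
    cleaned_lines = []
    for line in text.splitlines():
        cleaned = re.sub(r"^\s*(?:[-*•]|\d+[.)])\s*", "", line).strip()
        if cleaned:
            cleaned_lines.append(cleaned)
    return cleaned_lines


delimited = parse_delimited_items("<<Who owns it?>>\n<<Who founded it?>>")
bulleted = parse_delimited_items("- first answer\n- second answer")
print(delimited)  # ['Who owns it?', 'Who founded it?']
print(bulleted)   # ['first answer', 'second answer']
```

Because it takes a string and returns a list, it could replace `split_and_clean_text` inside `RunnableLambda(...)` in either chain.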
docs = [retriever.get_relevant_documents(q) for q in list_of_questions]
flatten_and_unique_documents(documents=docs)
Number of requested results 4 is greater than number of elements in index 3, updating n_results = 3





[Document(page_content="In the charming streets of Palermo, tucked away in a quaint alley, stood Chef Amico, a restaurant that was more than a mere eatery—it was a slice of Sicilian heaven. Founded by Amico, a chef whose name was synonymous with passion and creativity, the restaurant was a mosaic of his life’s journey through the flavors of Italy.\n\nChef Amico’s doors opened to a world where the aromas of garlic and olive oil were as welcoming as a warm embrace. The walls, adorned with photos of Amico’s travels and family recipes, spoke of a rich culinary heritage. The chatter and laughter of patrons filled the air, creating a symphony as delightful as the dishes served.\n\nOne evening, as the sun cast a golden glow over the city, a renowned food critic, Elena Rossi, stepped into Chef Amico. Her mission was to uncover the secret behind the restaurant's growing fame. She was greeted by Amico himself, whose eyes sparkled with the joy of a man who loved his work.\n\nElena was led to a table adorned with a simple, elegant setting. The first course was Caponata, a melody of eggplant, capers, and sweet tomatoes, which danced on her palate. Next came the Risotto al Nero di Seppia, a dish that told the tale of Sicily’s love affair with the sea. Each spoonful was a revelation, the rich flavors of squid ink harmonizing with the creamy rice.\n\nThe final masterpiece was Cannoli, the crown jewel of Sicilian desserts. As Elena savored the sweet ricotta filling, encased in a perfectly crisp shell, she realized that Chef Amico wasn’t just about the food. It was about the stories, the traditions, and the heart poured into every dish.\n\nLeaving the restaurant, Elena knew her review would sing praises not just of the food, but of the soul of Chef Amico—a place where every dish was a journey through Sicily, and every bite, a taste of Amico’s dream come true.", metadata={'source': 'data/restaurant.txt'}),
 Document(page_content='In the heart of the old quarter of Palermo, amidst the bustling market stalls and the echoes of lively street life, Amico was born into a family where food was more than sustenance—it was the language of love. Raised in the warmth of his Nonna Lucia\'s kitchen, young Amico was captivated by the symphony of flavors and aromas that danced in the air, a testament to his family’s Sicilian heritage.\n\nAmico\'s life was deeply entwined with the vibrant essence of Sicilian cuisine. In the rustic kitchen where his Nonna conjured culinary magic, Amico found his calling. These formative years, filled with the rhythmic chopping of fresh herbs and the sizzling of rich tomato sauces, laid the foundation of his passion for cooking.\n\nThe Journey to Chef Amico\n\nFrom a young age, Amico was immersed in the art of Sicilian cooking. His days were punctuated by visits to the bustling markets of Palermo, where he learned to choose the freshest fish from the Mediterranean and the ripest fruits kissed by the Sicilian sun. These experiences not only sharpened his culinary skills but also deepened his respect for the land and its bounty.\n\nAs he grew, so did his desire to explore beyond the shores of Sicily. Venturing through Italy, Amico worked alongside renowned chefs, each teaching him a new facet of Italian cuisine. From the rolling hills of Tuscany to the romantic canals of Venice, he absorbed the diverse regional flavors, techniques, and traditions that would later influence his unique culinary style.\n\nCreating Chef Amico’s Restaurant\n\nReturning to Palermo with a vision, Amico opened the doors to "Chef Amico," a restaurant that was a culmination of his travels and a tribute to his Sicilian roots. Nestled in a quaint corner of the city, the restaurant quickly gained fame for its authentic flavors and Amico’s innovative twists on traditional recipes.\n\nAt Chef Amico, every dish told a story. 
The menu, a tapestry of Sicilian classics and modern Italian cuisine, reflected Amico’s journey and his commitment to excellence. Patrons were not just diners; they were part of an extended family, welcomed with the same warmth and joy that Amico had experienced in his Nonna’s kitchen.\n\nPhilosophy of Hospitality\n\nFor Amico, hospitality was an art form. He believed that a meal was a celebration, a moment to pause and relish life’s simple pleasures. His restaurant was a haven where strangers became friends over plates of arancini and glasses of Nero d’Avola. The atmosphere he fostered was one of comfort and camaraderie, a place where every guest left with a full stomach and a happy heart.\n\nContinuing the Legacy\n\nToday, Chef Amico stands as a landmark in Palermo, a testament to Amico’s dedication and love for his craft. His spirit of generosity and passion for food extends beyond the restaurant’s walls. He mentors young chefs, shares his knowledge at culinary workshops, and supports local farmers and producers.\n\nAmico’s legacy is not just in the dishes he creates but in the community he nurtures. His story is a tribute to the power of food to connect us, to share our stories, and to celebrate the richness of life. Chef Amico is more than a restaurant; it\'s a home, built on a lifetime of love, learning, and the flavors of Sicily.', metadata={'source': 'data/founder.txt'}),
 Document(page_content='Margherita Pizza; $12; Classic with tomato, mozzarella, and basil; Main Dish\n\nSpaghetti Carbonara; $15; Creamy pasta with pancetta and parmesan; Main Dish\n\nBruschetta; $8; Toasted bread with tomato, garlic, and olive oil; Appetizer\n\nCaprese Salad; $10; Fresh tomatoes, mozzarella, and basil; Salad\n\nLasagna; $14; Layered pasta with meat sauce and cheese; Main Dish\n\nTiramisu; $9; Coffee-flavored Italian dessert; Dessert\n\nGelato; $7; Traditional Italian ice cream; Dessert\n\nRisotto Milanese; $16; Creamy saffron-infused rice dish; Main Dish\n\nPolenta; $11; Cornmeal dish, often served as a side; Side Dish\n\nOsso Buco; $20; Braised veal shanks with vegetables and broth; Main Dish\n\nRavioli; $13; Stuffed pasta with cheese or meat filling; Main Dish\n\nMinestrone Soup; $9; Vegetable soup with pasta or rice; Soup\n\nProsecco; $8; Italian sparkling white wine; Drink\n\nChianti; $10; Dry red wine from Tuscany; Drink\n\nFocaccia; $6; Oven-baked Italian bread; Side Dish\n\nCalamari; $12; Fried squid rings with marinara sauce; Appetizer\n\nEspresso; $4; Strong Italian coffee; Drink\n\nCannoli; $8; Sicilian pastry with sweet ricotta filling; Dessert\n\nArancini; $10; Fried rice balls stuffed with cheese or meat; Appetizer\n\nPanna Cotta; $9; Creamy Italian dessert with caramel or fruit; Dessert\n\nNegroni; $12; Cocktail with gin, vermouth, and Campari; Drink\n\nAperol Spritz; $10; Aperitif cocktail with Aperol, prosecco, and soda; Drink\n\nGnocchi; $14; Potato-based pasta served with various sauces; Main Dish\n\nPanzanella; $9; Bread and tomato salad; Salad\n\nCarpaccio; $15; Thinly sliced raw beef with arugula and parmesan; Appetizer\n\nAffogato; $7; Espresso poured over gelato; Dessert\n\nBiscotti; $5; Crunchy Italian almond biscuits; Dessert\n\nVitello Tonnato; $18; Thin slices of veal with a creamy tuna sauce; Main Dish\n\nCrostini; $7; Small toasted bread with toppings; Appetizer\n\nZabaglione; $10; Light custard dessert made with egg 
yolks; Dessert\n\nFrittata; $12; Italian-style omelette; Main Dish\n\nSaltimbocca; $19; Veal wrapped in prosciutto and sage; Main Dish\n\nLimoncello; $8; Italian lemon liqueur; Drink\n\nGrappa; $9; Italian grape-based brandy; Drink\n\nSangiovese; $11; Medium-bodied red wine; Drink\n\nRibollita; $10; Tuscan bread and vegetable soup; Soup\n\nTortellini; $14; Ring-shaped pasta filled with meat or cheese; Main Dish\n\nPanettone; $15; Traditional Italian Christmas bread; Dessert\n\nInsalata Mista; $8; Mixed green salad with Italian dressing; Salad\n\nCacio e Pepe; $13; Pasta with cheese and pepper; Main Dish\n\nItalian Soda; $5; Carbonated water with flavored syrup; Drink\n\nAmericano; $6; Coffee with added hot water; Drink\n\nFrutti di Mare; $22; Seafood pasta with mixed shellfish; Main Dish\n\nCaponata; $9; Eggplant dish with capers, olives, and celery; Side Dish\n\nAmaretto Sour; $10; Cocktail with amaretto, lemon juice, and sugar; Drink\n\nBranzino; $21; Mediterranean sea bass, usually grilled or baked; Main Dish\n\nPorchetta; $18; Savory, fatty, and moist boneless pork roast; Main Dish\n\nMontepulciano Wine; $12; Full-bodied red wine; Drink\n\nBresaola; $14; Air-dried, salted beef served as an appetizer; Appetizer\n\nPesto Pasta; $12; Pasta with traditional basil pesto sauce; Main Dish', metadata={'source': 'data/food.txt'})]

Retriever: Parent/Child Splitter

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores.chroma import Chroma
from langchain_openai import ChatOpenAI
from langchain_community.document_loaders.directory import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from dotenv import load_dotenv
import os

# Load every .txt file under ./data (restaurant.txt, founder.txt, food.txt).
loader = DirectoryLoader("./data", glob="**/*.txt")
docs = loader.load()

model = ChatOpenAI()

# The vector store holds the embeddings of the small child chunks.
vectorstore = Chroma(
    collection_name="full_documents",
    embedding_function=OpenAIEmbeddings()
)

from langchain.storage import InMemoryStore
from langchain.retrievers import ParentDocumentRetriever

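# The docstore keeps the larger parent chunks, keyed by generated IDs.
docstore = InMemoryStore()
# Child chunks (up to 250 characters) are embedded and searched.
child_splitter = RecursiveCharacterTextSplitter(chunk_size=250)
# Parent chunks (up to 600 characters) are what the retriever returns.
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=600)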

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)
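# ids=None lets the retriever generate an ID for each parent chunk.
retriever.add_documents(docs, ids=None)
# Count the parent chunks now stored in the docstore.
len(list(docstore.yield_keys()))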
22
# The query below is Korean for "Who is the owner?"
retriever.get_relevant_documents(
    "누가 주인이야?"
)
[Document(page_content="One evening, as the sun cast a golden glow over the city, a renowned food critic, Elena Rossi, stepped into Chef Amico. Her mission was to uncover the secret behind the restaurant's growing fame. She was greeted by Amico himself, whose eyes sparkled with the joy of a man who loved his work.", metadata={'source': 'data/restaurant.txt'})]
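
The result above is a whole parent chunk, not the small child chunk that matched the query. The following is a minimal, dependency-free sketch of that parent/child idea, not the LangChain implementation: `ToyParentChildIndex`, `split_fixed`, and `tokens` are hypothetical names introduced only for illustration, fixed-width slicing stands in for `RecursiveCharacterTextSplitter`, and word overlap stands in for embedding similarity.

```python
import re
import uuid


def split_fixed(text: str, size: int) -> list[str]:
    """Cut text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def tokens(text: str) -> set[str]:
    """Lowercased word set; a toy stand-in for an embedding (assumption)."""
    return set(re.findall(r"[a-z]+", text.lower()))


class ToyParentChildIndex:
    """Hypothetical illustration of parent/child retrieval."""

    def __init__(self, parent_size: int = 600, child_size: int = 250):
        self.parent_size = parent_size
        self.child_size = child_size
        # Plays the role of the docstore: parent id -> parent text.
        self.parents: dict[str, str] = {}
        # Plays the role of the vector store: (child text, parent id).
        self.children: list[tuple[str, str]] = []

    def add(self, text: str) -> None:
        # Split into parents first, then split each parent into children
        # that remember which parent they came from.
        for parent in split_fixed(text, self.parent_size):
            parent_id = str(uuid.uuid4())
            self.parents[parent_id] = parent
            for child in split_fixed(parent, self.child_size):
                self.children.append((child, parent_id))

    def search(self, query: str, k: int = 1) -> list[str]:
        # Rank the small children by word overlap with the query,
        # then return the distinct parents of the best matches.
        q = tokens(query)
        ranked = sorted(
            self.children,
            key=lambda item: len(q & tokens(item[0])),
            reverse=True,
        )
        seen: list[str] = []
        for _, parent_id in ranked:
            if parent_id not in seen:
                seen.append(parent_id)
            if len(seen) == k:
                break
        return [self.parents[pid] for pid in seen]
```

Calling `index.add(text)` and then `index.search("some query")` matches against the small pieces but returns the larger piece that contains the best match. This is the trade-off the parent/child setup aims for: small chunks for precise matching, larger chunks for context passed to the model.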
