3️⃣DSPy with LangChain

DSPy: Compiling chains from LangChain

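One of the most powerful features in DSPy is its optimizers. A DSPy optimizer can take any LM system and tune its prompts (or the LM weights) to maximize an objective you define.

Optimizers can improve the quality of your LM system and adapt your code to a new LM or new data. The goal is to introduce structure and modularity in place of tedious work such as (i) manual prompt engineering, (ii) designing complex pipelines for synthetic data generation, and (iii) designing complex pipelines for fine-tuning.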

# Install the dependencies if needed.
%pip install -U dspy-ai
%pip install -U openai jinja2
%pip install -U langchain langchain-community langchain-openai langchain-core

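Normally, we use DSPy optimizers with DSPy modules. Here, however, we worked with Harrison Chase so that DSPy can also optimize chains built with the LangChain library.

This short tutorial shows how this proof-of-concept feature works. It does not yet give you the full power of DSPy or LangChain, but we will expand it if there is high demand.

Turning this into a full integration would benefit all users. LangChain users would be able to optimize any chain with any DSPy optimizer. DSPy users would be able to "export" any DSPy program into LCEL, which supports streaming, tracing, and other rich production-oriented features.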

1. Setting Up

First, let's import dspy and configure the default retrieval model in it.

import dspy

from dspy.evaluate.evaluate import Evaluate
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

colbertv2 = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')

dspy.configure(rm=colbertv2)

Next, let's import langchain, enable an LLM cache, and define the LLM and the retriever. The DSPy modules for interacting with LangChain runnables, namely LangChainPredict and LangChainModule, are imported in step 3.

from langchain_openai import OpenAI
from langchain.globals import set_llm_cache
from langchain.cache import SQLiteCache

set_llm_cache(SQLiteCache(database_path="cache.db"))

llm = OpenAI(model_name="gpt-3.5-turbo-instruct", temperature=0)
retrieve = lambda x: dspy.Retrieve(k=5)(x["question"]).passages

2. Defining a chain as a LangChain expression

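As an example, let's tackle the following task.

Task: Build a RAG system for generating informative tweets.

  • Input: A factoid question, which may be fairly complex.

  • Output: An engaging tweet that correctly answers the question from the retrieved information.

To illustrate this, we'll use LangChain's expression language (LCEL). Any prompt will do here, since DSPy will optimize the final prompt.

With that in mind, let's keep it to the bare bones: "Given {context}, answer the question {question} as a tweet."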

# From LangChain, import standard modules for prompting.
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Just a simple prompt for this task. It's fine if it's complex too.
prompt = PromptTemplate.from_template("Given {context}, answer the question `{question}` as a tweet.")

# This is how you'd normally build a chain with LCEL. This chain does retrieval then generation (RAG).
vanilla_chain = RunnablePassthrough.assign(context=retrieve) | prompt | llm | StrOutputParser()
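
To make the `|` composition above concrete, here is a minimal pure-Python sketch of the pipe pattern. It is not LangChain's implementation: `Step`, `assign`, `fake_retrieve`, `fake_prompt`, and `fake_llm` are hypothetical stand-ins, chosen so the example runs without network access or API keys.

```python
class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|` chaining."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # `a | b` yields a new step that feeds a's output into b.
        return Step(lambda x: other.invoke(self.invoke(x)))

    def invoke(self, x):
        return self.fn(x)


def assign(**fns):
    """Keep the input dict and add computed keys (the idea behind `RunnablePassthrough.assign`)."""
    return Step(lambda d: {**d, **{key: fn(d) for key, fn in fns.items()}})


# Hypothetical stand-ins for the retriever, the prompt template, and the LLM.
def fake_retrieve(d):
    return [f"passage about {d['question']}"]

fake_prompt = Step(lambda d: f"Given {d['context']}, answer the question `{d['question']}` as a tweet.")
fake_llm = Step(lambda text: f"TWEET<{text}>")
to_str = Step(str.strip)

toy_chain = assign(context=fake_retrieve) | fake_prompt | fake_llm | to_str
result = toy_chain.invoke({"question": "Who wrote Hamlet?"})
print(result)
```

Each `|` only wires one step's output into the next step's input, which is why a `prompt | llm` pair can later be swapped for a single wrapper without touching the rest of the chain.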

3. Converting the chain into a DSPy module

Our goal is to optimize this prompt so that we get a better tweet generator. DSPy optimizers can help, but they only work on DSPy modules!

For this reason, we created two new modules in DSPy: LangChainPredict and LangChainModule.

# From DSPy, import the modules that know how to interact with LangChain LCEL.
from dspy.predict.langchain import LangChainPredict, LangChainModule

# This is how to wrap it so it behaves like a DSPy program.
# Just replace every pattern like `prompt | llm` with `LangChainPredict(prompt, llm)`.
zeroshot_chain = RunnablePassthrough.assign(context=retrieve) | LangChainPredict(prompt, llm) | StrOutputParser()
zeroshot_chain = LangChainModule(zeroshot_chain)  # then wrap the chain in a DSPy module.

4. Trying the module

How well does our LangChainModule do on this task? Well, we can ask it to generate a tweet for the following question.

question = "In what region was Eddy Mazzoleni born?"

zeroshot_chain.invoke({"question": question})
' Eddy Mazzoleni, Italian professional cyclist, was born in Bergamo, Italy on July 29, 1973. #cyclist #Italy #Bergamo'

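Ah, that sounds about right! (It is technically not perfect: we asked for the region, not the city. We can do better below.)

Inspecting questions and answers by hand is very important for getting a feel for your system. However, a good system designer always benchmarks their work iteratively to quantify progress!

To do this, we need two things: the metric we want to maximize and a (small) dataset of examples for our system.

Is there a predefined metric for good tweets? Do we need to label 100,000 tweets by hand? Probably not. Until you have data from production, though, it is easy to do something reasonable!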

5. Evaluating the module

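To get started, we'll define a simple metric and borrow a number of questions from a QA dataset to use for tuning here.

What makes a good tweet? We don't know, but in the spirit of iterative development, let's start simple!

We'll say that a good tweet has three properties: it should (1) be factually correct, (2) be grounded in real sources, and (3) be engaging.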
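
The three properties can be sketched as a scoring function. The contents of `tweet_metric.py` are not shown in this tutorial, so the block below is only a heuristic stand-in built on plain string checks: the name `toy_tweet_metric`, its arguments, and its scoring rule are all assumptions made for illustration.

```python
def toy_tweet_metric(answer, context, tweet):
    """Score a tweet in [0, 1] using crude string heuristics (illustrative only)."""
    gold = answer.lower()

    # (1) Correct: the gold answer appears in the tweet.
    correct = gold in tweet.lower()

    # (2) Faithful: the gold answer is supported by at least one retrieved passage.
    faithful = any(gold in passage.lower() for passage in context)

    # (3) Engaging: fits in a tweet and carries at least one hashtag.
    engaging = len(tweet) <= 280 and "#" in tweet

    # A wrong tweet scores zero; otherwise average the three checks.
    if not correct:
        return 0.0
    return (correct + faithful + engaging) / 3


# Hypothetical inputs for a quick check.
context = ["Bergamo is a city in the Lombardy region of Italy."]
good_tweet = "Eddy Mazzoleni was born in Bergamo, in Italy's Lombardy region. #Lombardy"
score = toy_tweet_metric("Lombardy", context, good_tweet)
print(score)
```

Gating on correctness first means a catchy but wrong tweet cannot collect partial credit, which matters because the optimizer will chase whatever the metric rewards.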

# We took the liberty to define this metric and load a few examples from a standard QA dataset.
# Let's import them from `tweet_metric.py` in the same directory that contains this notebook.
from tweet_metric import metric, trainset, valset, devset

# We loaded 200, 50, and 150 examples for training, validation (tuning), and development (evaluation), respectively.
# You could load less (or more) and, chances are, the right DSPy optimizers will work well for many problems.
len(trainset), len(valset), len(devset)
(200, 50, 150)
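
How `tweet_metric.py` builds these three splits is not shown here. As a rough sketch, disjoint splits of the same sizes could be carved from any list of question/answer pairs as follows; `split_examples` and the placeholder `qa_pairs` data are hypothetical.

```python
import random


def split_examples(examples, n_train=200, n_val=50, n_dev=150, seed=0):
    """Shuffle once with a fixed seed, then carve out three disjoint splits."""
    needed = n_train + n_val + n_dev
    if len(examples) < needed:
        raise ValueError(f"need {needed} examples, got {len(examples)}")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    dev = shuffled[n_train + n_val:needed]
    return train, val, dev


# Hypothetical placeholder data: 400 question/answer pairs.
qa_pairs = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(400)]
toy_train, toy_val, toy_dev = split_examples(qa_pairs)
print(len(toy_train), len(toy_val), len(toy_dev))
```

Keeping the development split disjoint from the training and validation splits is what makes the later evaluation a test on unseen questions.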

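Is this the right metric, and are these the most representative questions? Not necessarily. But they give us a starting point that we can iterate on systematically!

Note: the dataset does not actually contain any tweets! It only has questions and answers. That's fine, because our metric takes care of evaluating outputs in tweet form.

Now, let's evaluate the unoptimized "zero-shot" version of our chain, converted from our LangChain LCEL object.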

evaluate = Evaluate(metric=metric, devset=devset, num_threads=8, display_progress=True, display_table=5)
evaluate(zeroshot_chain)
Average Metric: 63.999999999999986 / 150  (42.7%)
42.67

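Our zeroshot_chain scores about 43% on the 150 questions in the devset.

The table above shows a few examples. For instance:

  • Question: Who was a producer who produced albums for both rock bands Juke Karten and Thirty Seconds to Mars?

  • Tweet: Brian Virtue, who has worked with bands like Jane's Addiction and Velvet Revolver, produced albums for both Juke Kartel and Thirty Seconds to Mars... [truncated]

  • Metric: 1.0 (A tweet that is correct, faithful, and engaging!*)

Footnote: * At least according to our metric, which is itself just a DSPy program, so it too can be optimized if you'd like! That is a topic for another notebook, though.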

6. Optimizing the module

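DSPy has many optimizers, but the de facto default at the moment is BootstrapFewShotWithRandomSearch.

If you're curious how it works: this optimizer runs your program (in this case, zeroshot_chain) on questions from trainset. Every time the program runs, DSPy records the inputs and outputs of each LM call. These records are called traces, and this particular optimizer keeps track of the "good" traces (that is, the ones the metric likes). The optimizer then looks for good ways to use these traces as automatic few-shot examples, trying them out to maximize the average metric on valset.

There are many ways to self-generate (bootstrap) examples, and also many ways to optimize their selection (here, random search). That is why DSPy has several other optimizers.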
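
The loop described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not DSPy's implementation, which is more involved: `bootstrap_random_search`, `toy_program`, `toy_metric`, and the tiny datasets are all hypothetical.

```python
import random


def bootstrap_random_search(program, metric, trainset, valset,
                            max_demos=3, num_candidates=3, seed=0):
    """Collect good traces on trainset, then random-search over demo subsets on valset."""
    rng = random.Random(seed)

    # 1. Bootstrap: run the zero-shot program and keep the traces the metric likes.
    good_traces = []
    for example in trainset:
        output = program(example["question"], demos=[])
        if metric(example, output):
            good_traces.append({"question": example["question"], "output": output})

    def avg_score(demos):
        scores = [metric(ex, program(ex["question"], demos=demos)) for ex in valset]
        return sum(scores) / len(valset)

    # 2. Random search: compare the zero-shot baseline with random demo subsets
    #    and keep whichever scores best on valset.
    best_demos, best_score = [], avg_score([])
    for _ in range(num_candidates):
        k = min(max_demos, len(good_traces))
        demos = rng.sample(good_traces, k)
        score = avg_score(demos)
        if score > best_score:
            best_demos, best_score = demos, score
    return best_demos, best_score


# Toy task: upper-case the question. The fake "LM" only manages it zero-shot for
# even-length questions, but always succeeds once it has seen any demo.
def toy_program(question, demos):
    if demos or len(question) % 2 == 0:
        return question.upper()
    return question


def toy_metric(example, output):
    return output == example["question"].upper()


toy_trainset = [{"question": q} for q in ["a", "bb", "ccc", "dddd"]]
toy_valset = [{"question": q} for q in ["eee", "ffff"]]
best_demos, best_score = bootstrap_random_search(toy_program, toy_metric, toy_trainset, toy_valset)
print(len(best_demos), best_score)
```

Note that the demos are the program's own successful outputs rather than hand-written examples, which is what "bootstrapping" refers to here.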

# Set up the optimizer. We'll use very minimal hyperparameters for this example.
# Just do random search with ~3 attempts, and in each attempt, bootstrap <= 3 traces.
optimizer = BootstrapFewShotWithRandomSearch(metric=metric, max_bootstrapped_demos=3, num_candidate_programs=3)

# Now use the optimizer to *compile* the chain. This could take 5-10 minutes, unless it's cached.
optimized_chain = optimizer.compile(zeroshot_chain, trainset=trainset, valset=valset)
Average Metric: 26.0 / 50  (52.0%)
Score: 52.0 for set: [16]
Scores so far: [44.67, 44.67, 54.0, 50.0, 51.33, 52.0]
Best score: 54.0
Average of max per entry across top 1 scores: 0.54
Average of max per entry across top 2 scores: 0.5733333333333335
Average of max per entry across top 3 scores: 0.6133333333333334
Average of max per entry across top 5 scores: 0.64
Average of max per entry across top 8 scores: 0.64
Average of max per entry across top 9999 scores: 0.64
6 candidate programs found.

7. Evaluating the optimized chain

How good is this? Not every optimization run magically improves results on unseen examples, so let's check!

Let's first ask the same question as above.

question = "In what region was Eddy Mazzoleni born?"

optimized_chain.invoke({"question": question})
' Eddy Mazzoleni was born in Bergamo, a city in the Lombardy region of Italy. #EddyMazzoleni #Italy #Lombardy'

Nice. Anecdotally, this looks a bit more precise than the answer from zeroshot_chain. But now let's do a proper evaluation!

evaluate(optimized_chain)
Average Metric: 78.66666666666667 / 150  (52.4%)

52.44

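We started at about 43% with zeroshot_chain and now reach about 52% with optimized_chain. On the unrounded scores (42.67% to 52.44%), that is a relative improvement of about 23%.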
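
The relative improvement can be checked directly from the two `Average Metric` lines printed by `evaluate`. The snippet below is plain arithmetic on those reported totals; rounding the scores to 43% and 52% before dividing gives roughly 21% instead.

```python
# Totals copied from the two `Average Metric` lines above.
zeroshot_total, optimized_total, n = 63.999999999999986, 78.66666666666667, 150

zeroshot_pct = 100 * zeroshot_total / n
optimized_pct = 100 * optimized_total / n

absolute_gain = optimized_pct - zeroshot_pct        # percentage points
relative_gain = 100 * absolute_gain / zeroshot_pct  # percent of the baseline

print(f"{zeroshot_pct:.2f}% -> {optimized_pct:.2f}%")
print(f"absolute: {absolute_gain:.2f} points, relative: {relative_gain:.1f}%")
```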

8. Inspecting the optimized chain in action

prompt, output = dspy.settings.langchain_history[-4]

print('PROMPT:\n\n', prompt)
print('\n\nOUTPUT:\n\n', output)
PROMPT:

 Essential Instructions: Respond to the provided question based on the given context in the style of a tweet, which typically requires a concise and engaging answer within the character limit of a tweet (280 characters).

---

Follow the following format.

Context: ${context}
Question: ${question}
Tweet Response: ${tweet_response}

---

Context:
[1] «Candace Kita | Kita's first role was as a news anchor in the 1991 movie "Stealth Hunters". Kita's first recurring television role was in Fox's "Masked Rider", from 1995 to 1996. She appeared as a series regular lead in all 40 episodes. Kita also portrayed a frantic stewardess in a music video directed by Mark Pellington for the British group, Catherine Wheel, titled, "Waydown" in 1995. In 1996, Kita also appeared in the film "Barb Wire" (1996) and guest starred on "The Wayans Bros.". She also guest starred in "Miriam Teitelbaum: Homicide" with "Saturday Night Live" alumni Nora Dunn, "Wall To Wall Records" with Jordan Bridges, "Even Stevens", "Felicity" with Keri Russell, "V.I.P." with Pamela Anderson, "Girlfriends", "The Sweet Spot" with Bill Murray, and "Movies at Our House". She also had recurring roles on the FX spoof, "Son of the Beach" from 2001 to 2002, ABC-Family's "Dance Fever" and Oxygen Network's "Running with Scissors". Kita also appeared in the films "Little Heroes" (2002) and "Rennie's Landing" (2001).»
[2] «Jilly Kitzinger | Jilly Kitzinger is a fictional character in the science fiction series "Torchwood", portrayed by American actress Lauren Ambrose. The character was promoted as one of five new main characters to join "Torchwood" in its fourth series, "" (2011), as part of a new co-production between "Torchwood"' s British network, BBC One, and its American financiers on US premium television network Starz. Ambrose appears in seven of the ten episodes, and is credited as a "special guest star" throughout. Whilst reaction to the serial was mixed, Ambrose' portrayal was often singled out by critics for particular praise and in 2012 she received a Saturn Award nomination for Best Supporting Actress on Television.»
[3] «Candace Brown | Candace June Brown (born June 15, 1980) is an American actress and comedian best known for her work on shows such as "Grey's Anatomy", "Desperate Housewives", "Head Case", The "Wizards Of Waverly Place". In 2011, she joined the guest cast for "Torchwood"' s fourth series' "", airing on BBC One in the United Kingdom and premium television network Starz.»
[4] «Candace Elaine | Candace Elaine is a Canadian actress who has become a naturalized American citizen. Born 1972 in Edmonton, Alberta, Canada, Elaine is an accomplished dancer, fashionista, and stage and film actor. She most recently appeared opposite Stone Cold Steve Austin, Michael Shanks, and Michael Jai White in the action feature "Tactical Force", playing the role of Ilya Kalashnikova.»
[5] «Amy Steel | Amy Steel (born Alice Amy Steel; May 3, 1960) is an American film and television actress. She is best known for her roles as Ginny Field in "Friday the 13th Part 2" (1981) and Kit Graham in "April Fool's Day" (1986). She has starred in films such as "Exposed" (1983), "Walk Like a Man" (1987), "What Ever Happened to Baby Jane? " (1991), and "Tales of Poe" (2014). Steel has had numerous guest appearances on several television series, such as "Family Ties" (1983), "The A-Team" (1983), "Quantum Leap" (1990), and "China Beach" (1991), as well as a starring role in "The Powers of Matthew Star" (1982–83).»
Question: which American actor was Candace Kita guest starred with
Tweet Response: Candace Kita has guest starred with many American actors, including Nora Dunn, Jordan Bridges, Keri Russell, Pamela Anderson, and Bill Murray. #CandaceKita #gueststar #Americanactors

---

Context:
[1] «The Victorians | The Victorians - Their Story In Pictures is a 2009 British documentary series which focuses on Victorian art and culture. The four-part series is written and presented by Jeremy Paxman and debuted on BBC One at 9:00pm on Sunday 15 February 2009.»
[2] «Victorian (comics) | The Victorian is a 25-issue comic book series published by Penny-Farthing Press and starting in 1999. The brainchild of creator Trainor Houghton, the series included a number of notable script writers and illustrators, including Len Wein, Glen Orbik and Howard Chaykin.»
[3] «The Great Victorian Collection | The Great Victorian Collection, published in 1975, is a novel by Northern Irish-Canadian writer Brian Moore. Set in Carmel, California, it tells the story of a man who dreams that the empty parking lot he can see from his hotel window has been transformed by the arrival of a collection of priceless Victoriana on display in a vast open-air market. When he awakes he finds that he can no longer distinguish the dream from reality.»
[4] «Victorian People | Victorian People: A Reassessment of Persons and Themes, 1851-1867 is a book by the historian Asa Briggs originally published in 1955. It is part of a trilogy that also incorporates "Victorian Cities" and "Victorian Things".»
[5] «The Caxtons | The Caxtons: A Family Picture is an 1849 Victorian novel by Edward Bulwer-Lytton that was popular in its time.»
Question: The Victorians - Their Story In Pictures is a documentary series written by an author born in what year?
Tweet Response: The Victorians - Their Story In Pictures is a 2009 British documentary series written and presented by Jeremy Paxman, who was born in 1950. #Victorian #documentary #JeremyPaxman

---

Context:
[1] «Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to the martial art of taekwondo, and is published in the United States of America. While the title suggests that it focuses on taekwondo exclusively, the magazine also covers other Korean martial arts. "Tae Kwon Do Times" has published articles by a wide range of authors, including He-Young Kimm, Thomas Kurz, Scott Shaw, and Mark Van Schuyver.»
[2] «Kwon Tae-man | Kwon Tae-man (born 1941) was an early Korean hapkido practitioner and a pioneer of the art, first in Korea and then in the United States. He formed one of the earliest dojang's for hapkido in the United States in Torrance, California, and has been featured in many magazine articles promoting the art.»
[3] «Hee Il Cho | Cho Hee Il (born October 13, 1940) is a prominent Korean-American master of taekwondo, holding the rank of 9th "dan" in the martial art. He has written 11 martial art books, produced 70 martial art training videos, and has appeared on more than 70 martial arts magazine covers. Cho won several national and international competitions as a taekwondo competitor, and has appeared in several films, including "Fight to Win", "Best of the Best", "Bloodsport II", and "Bloodsport III". He founded the Action International Martial Arts Association (AIMAA) in 1980, and is its President. Cho is a member of both "Black Belt" magazine's Hall of Fame and "Tae Kwon Do Times" magazine's Hall of Fame.»
[4] «West Coast Magazine | West Coast Magazine (1987–1998). was a three times a year Scottish literary publication consisting of poetry, short fiction, articles, essays and reviews. Founding editors were Gordon Giles, Kenny MacKenzie and Joe Murray. The proof issue appeared in October 1987 and contained some articles and poems that did not appear in official issues. West Coast Magazine (WCM) was initially funded by East Glasgow Gear Project and Glasgow City Council; ultimately funded by the Scottish Arts Council.»
[5] «Southwest Art | Southwest Art is a magazine published by F+W that specializes in fine art depicting artwork of the American Southwest.»
Question: Which magazine has published articles by Scott Shaw, Tae Kwon Do Times or Southwest Art?
Tweet Response: Tae Kwon Do Times has published articles by Scott Shaw, along with other notable authors in the martial arts world. #TaeKwonDo #MartialArts #Magazine

---

Context:
[1] «Scott Lowell | Scott Lowell (born February 22, 1965 in Denver, Colorado) is an American actor best known for his role as Ted Schmidt on the Showtime drama "Queer as Folk".»
[2] «Ted Schmidt | Theodore "Ted" Schmidt is a fictional character from the American Showtime television drama series "Queer as Folk", played by Scott Lowell. Fellow show cast member Peter Paige, who plays Emmett Honeycutt originally auditioned for the role. Lowell was cast and he stated that he had an instant connection with the character. "Queer as Folk" is based on the British show of the same name and Ted is loosely based on the character Phil Delaney, played by Jason Merrells. Phil was killed off in that series, whereas show creator Daniel Lipman decided to develop the character into a full-time role for the US version.»
[3] «Chris Lowell | Christopher Lowell (born October 17, 1984) is an American television actor. He played the role of Stosh "Piz" Piznarski in the CW noir drama "Veronica Mars" and the character William "Dell" Parker in the ABC "Grey's Anatomy" spin-off "Private Practice".»
[4] «Kevin Schmidt | Kevin Gerard Schmidt (born August 16, 1988) is an American actor, known best for his role as Henry in "Cheaper by the Dozen" and its sequel and as Noah Newman in "The Young and the Restless". Schmidt also starred on Cartoon Network's first live-action scripted television series, "Unnatural History". Schmidt also co-created, starred in, produced, and directed a cult web-series, "Poor Paul". Schmidt continues to write, direct, and act, and has also participated in humanitarian organizations. Schmidt is president of the Conscious Human Initiative, a non-profit entity that intends to alleviate malnutrition worldwide. He played Ryan in .»
[5] «Frederick Koehler | Frederick Koehler (born June 16, 1975) is an American actor best known for his role as Chip Lowell on "Kate & Allie" as well as Andrew Schillinger on the HBO drama "Oz". He is distinguished for appearing much younger than his chronological age (e.g., appearing about 20 years old when he was actually 38).»
Question: What show is an American-Canadian drama starring Scott Lowell playing Ted Schmidt?
Tweet Response:


OUTPUT:

 Prediction(
    tweet_response=' Scott Lowell played Ted Schmidt on the American-Canadian drama "Queer as Folk". #ScottLowell #TedSchmidt #QueerAsFolk'
)
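
The printed prompt follows a visible pattern: an instruction, a format description, several worked examples separated by `---`, and finally the new input with an empty `Tweet Response:` for the LM to complete. A minimal sketch of assembling a prompt in that shape is given below. `render_prompt` and its arguments are hypothetical, and this is not DSPy's internal template code.

```python
def render_prompt(instructions, demos, context, question):
    """Assemble instruction, format block, few-shot demos, and the final open query."""

    def render_context(passages):
        # Number passages the way the printed prompt does: [1] «...»
        return "\n".join(f"[{i}] «{p}»" for i, p in enumerate(passages, start=1))

    def render_example(passages, q, tweet=None):
        lines = ["Context:", render_context(passages), f"Question: {q}"]
        lines.append("Tweet Response:" if tweet is None else f"Tweet Response: {tweet}")
        return "\n".join(lines)

    format_block = (
        "Follow the following format.\n\n"
        "Context: ${context}\nQuestion: ${question}\nTweet Response: ${tweet_response}"
    )
    blocks = [instructions, format_block]
    blocks += [render_example(d["context"], d["question"], d["tweet"]) for d in demos]
    blocks.append(render_example(context, question))  # left open for the LM
    return "\n\n---\n\n".join(blocks)


# Hypothetical placeholder demo and query.
toy_demos = [{"context": ["passage A"], "question": "demo question?", "tweet": "demo tweet #demo"}]
prompt_text = render_prompt("Answer as a tweet.", toy_demos, ["passage B"], "new question?")
print(prompt_text)
```

Seen this way, the three worked examples in the output above are bootstrapped traces that the optimizer selected, placed ahead of the new question as few-shot demonstrations.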
