DROP EXTENSION IF EXISTS vectors;
CREATE EXTENSION vectors;
Create the database and load documents
This section shows how to use pgvecto.rs in LangChain to search for the most similar vectors. First, create a text loader and a text splitter to split the text into chunks. The Markdown file pgvecto.rs-docs/src/getting-started/overview.md serves as the example here.
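To make the chunking step concrete, here is a minimal pure-Python sketch of a separator-based splitter. It is not LangChain's implementation: the function name `split_into_chunks`, the `"\n\n"` separator, and the greedy merging rule are assumptions made for illustration. It also shows one way a splitter can end up with a chunk longer than the requested size: a single paragraph that exceeds `chunk_size` is kept whole rather than cut mid-sentence.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, separator: str = "\n\n") -> list[str]:
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole (a policy assumed for this sketch).
    """
    chunks: list[str] = []
    current = ""
    for piece in text.split(separator):
        if not piece.strip():
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate  # the piece still fits in the current chunk
        else:
            if current:
                chunks.append(current)
            current = piece  # start a new chunk; it may itself exceed chunk_size
    if current:
        chunks.append(current)
    return chunks


# Hypothetical input: two short paragraphs followed by one oversized paragraph.
sample = "a" * 600 + "\n\n" + "b" * 600 + "\n\n" + "c" * 1181
for chunk in split_into_chunks(sample, chunk_size=1000):
    note = "(longer than chunk_size)" if len(chunk) > 1000 else ""
    print(len(chunk), note)
```

Keeping oversized paragraphs intact trades a strict size limit for chunks that remain readable on their own, which tends to matter more for retrieval quality.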
# PGVecto.rs needs the connection string to the database.
# We will load it from the environment variables.
import os

PORT = os.getenv("DB_PORT", 5432)
HOST = os.getenv("DB_HOST", "localhost")
USER = os.getenv("DB_USER", "postgres")
PASS = os.getenv("DB_PASS", "mysecretpassword")
DB_NAME = os.getenv("DB_NAME", "postgres")

# Run tests with shell:
URL = "postgresql+psycopg://{username}:{password}@{host}:{port}/{db_name}".format(
    port=PORT,
    host=HOST,
    username=USER,
    password=PASS,
    db_name=DB_NAME,
)

# The pgvectors Module will try to create a table with the name of the collection.
# So, make sure that the collection name is unique and the user has the permission to create a table.
COLLECTION_NAME = "state_of_the_union_test"

db = PGVecto_rs.from_documents(
    embedding=embeddings,
    documents=docs,
    collection_name=COLLECTION_NAME,
    db_url=URL,
)
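If the password contains characters such as `@` or `/`, a plain `str.format` call produces a URL that cannot be parsed correctly. The sketch below is one possible way to guard against that, assuming the same `DB_*` environment variables and defaults as above. The helper name `build_db_url` and its `env` parameter are hypothetical and are not part of pgvecto.rs or LangChain.

```python
import os
from urllib.parse import quote_plus


def build_db_url(env=None) -> str:
    """Build a SQLAlchemy-style Postgres URL from DB_* variables, escaping the credentials."""
    env = os.environ if env is None else env
    user = quote_plus(env.get("DB_USER", "postgres"))
    password = quote_plus(env.get("DB_PASS", "mysecretpassword"))
    host = env.get("DB_HOST", "localhost")
    port = int(env.get("DB_PORT", 5432))  # fail early on a non-numeric port
    db_name = env.get("DB_NAME", "postgres")
    return f"postgresql+psycopg://{user}:{password}@{host}:{port}/{db_name}"


# Passing a plain dict instead of os.environ keeps the example reproducible.
print(build_db_url({"DB_PASS": "p@ss/word"}))
```

Calling `build_db_url()` with no argument reads from the real environment, so the result can be passed as `db_url` in the same way as `URL`.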
Query index
Finally, you can search for the most similar chunks in LangChain.
query = "What is pgvecto.rs"
docs_with_score = db.similarity_search_with_score(query)

for doc, score in docs_with_score:
    print("-" * 80)
    print("Score: ", score)
    print(doc.page_content)
    print("-" * 80)
> Created a chunk of size 1181, which is longer than the specified 1000
--------------------------------------------------------------------------------
Score: 0.25059962
# Overview
An introduction to the pgvecto.rs.
## What is pgvecto.rs
pgvecto.rs is a Postgres extension that provides vector similarity search functions. It is written in Rust and based on [pgrx](https://github.com/tcdi/pgrx). It is currently in the beta status, we invite you to try it out in production and provide us with feedback. Read more at [📝our launch blog](https://modelz.ai/blog/pgvecto-rs).
## Why use pgvecto.rs
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Score: 0.29536954
- 💃 **Easy to use**: pgvecto.rs is a Postgres extension, which means that you can use it directly within your existing database. This makes it easy to integrate into your existing workflows and applications.
- 🔗 **Async indexing**: pgvecto.rs's index is asynchronously constructed by the background threads and does not block insertions and always ready for new queries.
- 🥅 **Filtering**: pgvecto.rs supports filtering. You can set conditions when searching or retrieving points. This is the missing feature of other postgres extensions.
- 🧮 **Quantization**: pgvecto.rs supports scalar quantization and product qutization up to 64x.
- 🦀 **Rewrite in Rust**: Rust's strict compile-time checks ensure memory safety, reducing the risk of bugs and security issues commonly associated with C extensions.
## Comparison with pgvector
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Score: 0.35845917
More details at [📝`pgvecto.rs` vs. pgvector](/faqs/comparison-pgvector.md).
## Quick start
For new users, we recommend using the [Docker image](https://hub.docker.com/r/tensorchord/pgvecto-rs) to get started quickly.
...
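The scores in the output grow as the chunks become less relevant to the query, which is what a distance metric would produce: smaller means closer. The sketch below illustrates that ranking with an exhaustive search, assuming squared Euclidean distance. It is not how pgvecto.rs searches its index, and the names `squared_l2` and `brute_force_search` as well as the two-dimensional vectors are made up for the example.

```python
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; 0.0 means the vectors are identical."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_search(
    query_vec: Sequence[float],
    store: List[Tuple[str, Sequence[float]]],
    k: int = 4,
) -> List[Tuple[str, float]]:
    """Return the k (text, distance) pairs closest to query_vec, nearest first."""
    scored = [(text, squared_l2(query_vec, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical 2-dimensional "embeddings"; real embeddings have far more dimensions.
store = [
    ("Quick start", [0.0, 1.0]),
    ("What is pgvecto.rs", [1.0, 0.0]),
    ("Why use pgvecto.rs", [0.8, 0.6]),
]
for text, score in brute_force_search([1.0, 0.0], store, k=2):
    print(f"{score:.2f}  {text}")
```

An exhaustive scan like this costs time proportional to the number of stored vectors, which is the cost a vector index is meant to avoid.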
Initialize existing database
Above, we created a vector store from scratch. However, you will often want to work with an existing vector store. To do so, you can initialize it directly.
db = PGVecto_rs(
    embedding=embeddings,
    collection_name=COLLECTION_NAME,
    # OpenAI embedding has 1536 dimensions.
    dimension=1536,
    db_url=URL,
)
You can then add vectors to the store and query them:
> (Document(page_content='foo'), 0.0)
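Two details are worth spelling out here: the declared `dimension` has to match the length of every vector the embedding model produces, and querying with text that is already stored gives a distance of 0.0 because the two vectors are identical. The in-memory sketch below illustrates both points under stated assumptions. `ToyVectorStore` and `toy_embed` are hypothetical stand-ins, not the pgvecto.rs or LangChain API, and the hash-based embedding carries no semantic meaning.

```python
import hashlib
from typing import List, Tuple


def toy_embed(text: str, dimension: int = 8) -> List[float]:
    """Hypothetical deterministic embedding: SHA-256 bytes scaled to [0, 1] (dimension <= 32)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(dimension)]


class ToyVectorStore:
    """In-memory stand-in for a vector table with a fixed dimension."""

    def __init__(self, dimension: int) -> None:
        self.dimension = dimension
        self.rows: List[Tuple[str, List[float]]] = []

    def add(self, text: str, vector: List[float]) -> None:
        # Reject vectors whose length differs from the declared dimension.
        if len(vector) != self.dimension:
            raise ValueError(f"expected {self.dimension} dimensions, got {len(vector)}")
        self.rows.append((text, vector))

    def nearest(self, vector: List[float]) -> Tuple[str, float]:
        """Return the stored text closest to vector and its squared Euclidean distance."""
        scored = (
            (text, sum((a - b) ** 2 for a, b in zip(vector, row)))
            for text, row in self.rows
        )
        return min(scored, key=lambda pair: pair[1])


store = ToyVectorStore(dimension=8)
store.add("foo", toy_embed("foo"))
print(store.nearest(toy_embed("foo")))  # identical vectors, so the distance is 0.0
```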
RAG pipeline with LlamaIndex and pgvecto.rs
LlamaIndex is a simple, flexible data framework for connecting custom data sources to large language models (LLMs). pgvecto.rs provides a LlamaIndex integration for searching for the most similar vectors.
Next, we will build a RAG pipeline with pgvecto.rs and LlamaIndex.
Install dependencies
The LlamaIndex integration requires a few dependencies:
%pip install llama-index "pgvecto_rs[sdk]"
You can start a Postgres instance with the pgvecto.rs extension in a Docker container. If you already did this above, skip the next step.
# set Logging to DEBUG for more detailed outputs
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")