Example listing. This is a demonstration of how products appear on Synthosy — it is not a real product for sale.


UK Legal Corpus Embeddings — 500k Document Chunks


Description

Pre-computed vector embeddings for 500,000 text chunks drawn from public UK legal sources, packaged for use in retrieval-augmented generation (RAG) pipelines.


**Coverage:**

- UK case law (Supreme Court, Court of Appeal, High Court — 2000–2024)

- UK statutes and statutory instruments (all currently in force)

- Legal commentary from open-access journals

- GDPR and data protection guidance from the Information Commissioner's Office (ICO)


**Technical details:**

- Embedding model: text-embedding-3-large (OpenAI)

- Dimensions: 3072

- Chunk size: 512 tokens with 50-token overlap

- Format: Parquet files, with a prebuilt FAISS index included

- Total size: ~8.2 GB
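The listing gives the chunk parameters but not the tokenizer or the splitting code. Here is a minimal sketch, assuming a plain sliding window over a list of token IDs. Each 512-token window starts 462 tokens (512 - 50) after the previous one, so neighbouring chunks share 50 tokens. The name `chunk_tokens` is hypothetical. The sketch does not align boundaries with sentences or sections, which the real pipeline may or may not do.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_tokens(tokens: Sequence[T], size: int = 512, overlap: int = 50) -> List[List[T]]:
    """Hypothetical helper (a plain sliding window, assumed for illustration).

    Splits a token sequence into windows of at most `size` tokens. Each window
    starts `size - overlap` tokens after the previous one, so consecutive
    windows share `overlap` tokens.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not tokens:
        return []
    step = size - overlap  # 462 with the listing's 512/50 figures
    chunks: List[List[T]] = []
    start = 0
    while True:
        end = min(start + size, len(tokens))
        chunks.append(list(tokens[start:end]))
        if end == len(tokens):  # this window reached the end of the sequence
            break
        start += step
    return chunks


# Dummy token IDs standing in for real tokenizer output.
chunks = chunk_tokens(list(range(1000)))
print(len(chunks))                      # 3
print([(c[0], c[-1]) for c in chunks])  # [(0, 511), (462, 973), (924, 999)]
```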
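As a rough check on the size figure: a 3072-dimensional vector stored as 32-bit floats takes 12,288 bytes (4 bytes per value, the layout a flat FAISS index uses). One full copy of the 500,000 vectors therefore comes to

$$500{,}000 \times 3072 \times 4\ \text{bytes} = 6.144 \times 10^{9}\ \text{bytes} \approx 6.1\ \text{GB}\ (\approx 5.7\ \text{GiB})$$

The Parquet files and the FAISS index both carry the embeddings, so two full float32 copies would already exceed 12 GB. The stated ~8.2 GB total therefore suggests that at least one copy is compressed or held at reduced precision. For comparison, 16-bit floats would halve the per-copy figure to about 3.1 GB. The listing does not say which applies.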


**What you get:**

- FAISS index for fast similarity search

- Parquet files (chunk text + metadata + embeddings)

- Python retrieval script (works with LangChain and LlamaIndex)

- Example RAG pipeline notebook
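The listing does not document the retrieval script or say which FAISS index type is included. The sketch below uses dependency-free Python to show the computation an exact inner-product index (such as FAISS's `IndexFlatIP`) performs on L2-normalised vectors: score every stored vector against the query and keep the top k. The row layout, the field names, and the toy 3-dimensional vectors are hypothetical stand-ins for the 3072-dimensional Parquet records. With the real files, the query would need to be embedded with the same model (text-embedding-3-large) so that it lives in the same vector space.

```python
import math
from typing import Dict, List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def top_k(query: Sequence[float], vectors: Sequence[Sequence[float]], k: int = 3) -> List[Tuple[int, float]]:
    """Exact brute-force search: score every stored vector against the query
    and return (row_index, cosine_similarity) pairs, best first."""
    q = normalize(query)
    scored = []
    for i, v in enumerate(vectors):
        u = normalize(v)
        scored.append((i, sum(a * b for a, b in zip(q, u))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical row layout standing in for the Parquet records (text + embedding).
rows: List[Dict] = [
    {"text": "chunk about contract formation", "embedding": [0.9, 0.1, 0.0]},
    {"text": "chunk about data protection", "embedding": [0.0, 0.2, 0.9]},
    {"text": "chunk about employment law", "embedding": [0.1, 0.9, 0.1]},
]
query_vec = [0.8, 0.2, 0.1]  # in practice: the embedded user question

for idx, score in top_k(query_vec, [r["embedding"] for r in rows], k=2):
    print(f"{score:.3f}  {rows[idx]['text']}")
```

A pure-Python loop over 500,000 vectors of 3072 dimensions would be slow. FAISS performs the same exact search with optimised vectorised code, or an approximate search if a non-flat index type is used.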


**Use cases:** Legal AI assistants, contract analysis, compliance tools, legal research
