Easily Build Customized LLMs

Python
OpenAI
LLMs
Jupyter
OBBBA
Author

Eileen Murphy

Published

July 15, 2025

Introduction

One use case for localized LLMs is to help digest new bills, policies, and court decisions in an efficient and expedient manner. Journalists frequently encounter brand-new material before it becomes public, and they cannot store such a document on an outside server.

In this model, you add a complex, detailed, and involved document: in this case, the summary of the "One Big Beautiful Bill Act," which some might call the "One Big Bad Bill Act." Whatever you call it, we'll refer to it as the OBBBA. The model does a pretty good job, and you can see that how you structure your query makes a difference in the response. Some prompts will be better than others.

For best results, I have found that adding other analyses improves the responses and context. Here, though, we use just the OBBBA summary and text from the government website, both to demonstrate how this can be used and to test whether it would be comprehensive enough for journalists.

This demo uses a small, inexpensive model from OpenAI to run the queries. Since the subject matter is so narrow, we do not need a big model, just one big enough to be fairly responsive to our queries.

Adding more PDFs will produce richer, more comprehensive output. Refining your queries or prompts will also improve the results.

Python Packages Required

# Install packages with pip in the terminal if missing
#!pip install python-dotenv
from dotenv import load_dotenv
#pip install duckdb
import duckdb
#pip install llama-index-core
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
import os #built-in package
#pip install openai
import openai
import textwrap #built-in package
#pip install llama-index-vector-stores-duckdb
import llama_index.vector_stores.duckdb
#pip install llama-index-embeddings-openai
from llama_index.embeddings.openai import OpenAIEmbedding
#pip install llama-index-llms-openai
from llama_index.llms.openai import OpenAI
#pip install gradio
import gradio as gr

This script creates a vector store every time it runs, so we delete the old file before storing the new vectors.

file_path = 'persist/my_vector_store.duckdb'

# Check if the file exists
if os.path.exists(file_path):
  # Delete the file
  os.remove(file_path)
  print("File deleted successfully")
else:
  print("File doesn't exist - first run - it's all good")
File deleted successfully

Load OpenAI Key

from dotenv import load_dotenv

# Read the key from secrets/.env rather than the default .env location
load_dotenv(dotenv_path="secrets/.env")

api_key = os.getenv('OPENAI_API_KEY')

from openai import OpenAI
client = OpenAI(api_key=api_key)

Import the indexing packages and store the index in DuckDB

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores.duckdb import DuckDBVectorStore
from llama_index.core import StorageContext

# Persist the vectors to persist/my_vector_store.duckdb
vector_store = DuckDBVectorStore("my_vector_store.duckdb", persist_dir="persist/")
# Load every document in the local OBBBA folder (the PDFs)
documents = SimpleDirectoryReader("/Users/Eileen/Desktop/GoData/Blog/posts/LLM_Demo/OBBBA/").load_data()

Create the index on the PDFs

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

Deployment

LLM_Demo is deployed here: https://godata-llm-demo.hf.space

Or scroll below and try it out!

from IPython.display import IFrame

# Embed the deployed demo in the page
# Parameters: src, width, height
iframe = IFrame(src="https://godata-llm-demo.hf.space", width=1000, height=5000)
# Display the iframe
iframe