Security Now Transcript Query Engine

This Python program builds a retrieval-augmented generation (RAG) pipeline over the transcripts of Steve Gibson's Security Now podcast. You query a range of years and get one coherent, LLM-generated answer based on the indexed transcript content.

How it works

Loads and parses text transcripts with chunked context windows
Creates a per-year vector index
Runs the query against each year's index
Optionally prints intermediate (per-year) results
Synthesizes a final summary with another call to the model
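
The flow above can be sketched as a small loop. This is only an illustration of the per-year query followed by a synthesis step, not the code in sn_rag_engine.py: the function names and the two callables (`query_year`, `synthesize`) are hypothetical, and the stand-ins below replace the real index lookups and model calls so the sketch runs without an API key.

```python
from typing import Callable, Dict, Iterable


def answer_over_years(
    query: str,
    years: Iterable[int],
    query_year: Callable[[int, str], str],
    synthesize: Callable[[str, Dict[int, str]], str],
    show_intermediate: bool = True,
) -> str:
    """Query each year separately, then merge the per-year answers."""
    per_year: Dict[int, str] = {}
    for year in years:
        # In the real program this would be a retrieval + LLM call
        # against that year's vector index.
        per_year[year] = query_year(year, query)
        if show_intermediate:
            print(f"[{year}] {per_year[year]}")
    # A second model call condenses the per-year answers into one reply.
    return synthesize(query, per_year)


# Hypothetical stand-ins for the index query and the summarizing model call.
def fake_query_year(year: int, query: str) -> str:
    return f"answer for {year}"


def fake_synthesize(query: str, per_year: Dict[int, str]) -> str:
    return " | ".join(per_year[y] for y in sorted(per_year))


final = answer_over_years(
    "What did Steve say about VPNs?",
    range(2016, 2019),
    fake_query_year,
    fake_synthesize,
    show_intermediate=False,
)
print(final)
```

Querying each year on its own keeps every index small, and the final call only has to read a handful of short per-year answers.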

How to Use

1. Install Dependencies

pip install llama-index openai python-dotenv

2. Download Transcripts

You can get Security Now transcripts from:

https://www.grc.com/securitynow.htm

To download a range of .txt transcripts (e.g., episodes 480 to 500):

mkdir -p transcripts
cd transcripts
for i in {480..500}; do wget "https://www.grc.com/sn/sn-${i}.txt"; done
cd ..
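
If wget is not available, the same download can be sketched in Python. This is a rough equivalent of the loop above, not part of the project: the function names are hypothetical, the URL pattern is copied from the wget command, and other numbering schemes (for example, zero padding for early episodes) are not handled.

```python
import urllib.request
from pathlib import Path

# URL pattern taken from the wget loop; everything else here is illustrative.
BASE_URL = "https://www.grc.com/sn/sn-{num}.txt"


def transcript_urls(first: int, last: int) -> list[str]:
    """Return the transcript URLs for episodes first..last, inclusive."""
    return [BASE_URL.format(num=n) for n in range(first, last + 1)]


def download_transcripts(first: int, last: int, dest: str = "transcripts") -> None:
    """Download each transcript into dest, skipping files already present."""
    out = Path(dest)
    out.mkdir(parents=True, exist_ok=True)
    for url in transcript_urls(first, last):
        target = out / url.rsplit("/", 1)[-1]
        if target.exists():
            continue
        urllib.request.urlretrieve(url, str(target))


# Example (performs network requests, so it is left commented out):
# download_transcripts(480, 500)
```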

Each .txt file should include a line starting with DATE: in this format:

DATE:		May 26, 2015

The sort script reads this line to place each transcript in its per-year directory.
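
To check whether a transcript has a usable DATE: line, something like the following could be used. It is a minimal sketch that assumes the "Month day, year" format shown above; `transcript_year` is a hypothetical helper, and sortfiles.py may parse the line differently.

```python
import re
from datetime import datetime
from typing import Optional

# Matches a line that starts with "DATE:" followed by tabs or spaces.
DATE_LINE = re.compile(r"^DATE:[ \t]*(.+)$", re.MULTILINE)


def transcript_year(text: str) -> Optional[int]:
    """Return the year from the DATE: line, or None if it is missing or malformed."""
    match = DATE_LINE.search(text)
    if match is None:
        return None
    try:
        # %B expects an English month name under the default locale.
        return datetime.strptime(match.group(1).strip(), "%B %d, %Y").year
    except ValueError:
        return None


print(transcript_year("DATE:\t\tMay 26, 2015\n"))
```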

3. Sort Transcripts by Year

Use the included sortfiles.py script to organize .txt transcript files into year folders based on the date contained in each file. The script will only process files with a .txt extension and organize them into folders for each year between 2015 and 2025.

By default, the script looks for transcripts in the ./transcripts directory. If your transcripts are located elsewhere, you can specify the target directory using the --transcripts-dir parameter. For example:

python sortfiles.py --transcripts-dir /path/to/your/transcripts

With no argument, the script uses ./transcripts:

python sortfiles.py

After it runs, transcript files are moved into subdirectories such as ./transcripts/2015, ./transcripts/2016, etc.
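
For readers who want to see what this step amounts to, here is a sketch of the sorting logic as described above (only .txt files, years 2015 to 2025, one folder per year). It is not the contents of sortfiles.py: `sort_by_year` and its parameters are hypothetical, and the demo runs in a temporary directory with a made-up file name.

```python
import re
import shutil
import tempfile
from pathlib import Path

# Captures the four-digit year at the end of the DATE: line.
YEAR_IN_DATE = re.compile(r"^DATE:.*?(\d{4})[ \t\r]*$", re.MULTILINE)


def sort_by_year(transcripts_dir="./transcripts", first_year=2015, last_year=2025):
    """Move top-level .txt files into <transcripts_dir>/<year>/ folders."""
    root = Path(transcripts_dir)
    moved = {}
    for path in sorted(root.glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        match = YEAR_IN_DATE.search(text)
        if match is None:
            continue  # no DATE: line, leave the file where it is
        year = int(match.group(1))
        if not first_year <= year <= last_year:
            continue  # outside the supported range
        target_dir = root / str(year)
        target_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(target_dir / path.name))
        moved[path.name] = year
    return moved


# Demo in a throwaway directory so no real transcripts are touched.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sn-example.txt").write_text("DATE:\t\tMay 26, 2015\n", encoding="utf-8")
    (Path(tmp) / "notes.md").write_text("not a transcript", encoding="utf-8")
    moved = sort_by_year(tmp)
    landed = (Path(tmp) / "2015" / "sn-example.txt").exists()
    untouched = (Path(tmp) / "notes.md").exists()
print(moved, landed, untouched)
```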

4. Set Up API Key

Create a .env file containing your OpenAI API key:

OPENAI_API_KEY=your_openai_api_key_here
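
A quick way to confirm the key is being picked up is sketched below. It assumes the program loads .env with python-dotenv (the package is in the install step, but the loading code itself is not shown here); `get_openai_key` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

try:
    from dotenv import load_dotenv  # from the python-dotenv package
except ImportError:  # keeps the sketch usable if the package is missing
    load_dotenv = None


def get_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key, or raise if it is missing or still the placeholder."""
    if env is None:
        if load_dotenv is not None:
            # Reads .env and sets variables that are not already defined.
            load_dotenv()
        env = os.environ
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key or key == "your_openai_api_key_here":
        raise RuntimeError("OPENAI_API_KEY is missing; set it in .env or the environment.")
    return key


# Example with an explicit mapping instead of the real environment:
print(get_openai_key({"OPENAI_API_KEY": "sk-test"}))
```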

5. Run the Program

python sn_cli.py \
  -sy 2016 \
  -ey 2018 \
  -q "What did Steve say about VPNs?" \
  --hide-intermediate  # Optional: Hide intermediate year responses

Flags

Flag	Long Form	Description
`-sy`	`--start-year`	Starting year for querying transcripts (>= 2015)
`-ey`	`--end-year`	Ending year for querying transcripts (<= 2025)
`-q`	`--query`	Your natural language query
	`--hide-intermediate`	Hide per-year intermediate responses (shown by default)
	`--transcripts-dir`	Directory containing transcript files (default: ./transcripts)
	`--index-dir`	Directory containing index files (default: ./index)
`-d`	`--debug`	Print LLM internal debug prompts
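
The flag table maps naturally onto argparse. The sketch below is one possible definition, not the parser in sn_cli.py: which flags are required, the help strings, and the range check are assumptions based on the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Query Security Now transcripts by year range.")
    parser.add_argument("-sy", "--start-year", type=int, required=True,
                        help="starting year (>= 2015)")
    parser.add_argument("-ey", "--end-year", type=int, required=True,
                        help="ending year (<= 2025)")
    parser.add_argument("-q", "--query", required=True,
                        help="natural language query")
    parser.add_argument("--hide-intermediate", action="store_true",
                        help="hide per-year intermediate responses")
    parser.add_argument("--transcripts-dir", default="./transcripts",
                        help="directory containing transcript files")
    parser.add_argument("--index-dir", default="./index",
                        help="directory containing index files")
    parser.add_argument("-d", "--debug", action="store_true",
                        help="print LLM internal debug prompts")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed validation, derived from the bounds listed in the table.
    if args.start_year < 2015 or args.end_year > 2025 or args.start_year > args.end_year:
        parser.error("years must satisfy 2015 <= start-year <= end-year <= 2025")
    return args


args = parse_args(["-sy", "2016", "-ey", "2018",
                   "-q", "What did Steve say about VPNs?", "--hide-intermediate"])
print(args.start_year, args.end_year, args.hide_intermediate)
```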

Notes

Make sure transcripts are organized into folders like ./transcripts/2015, ./transcripts/2016, etc.
Each year is indexed on first use, and the resulting index is cached.
Delete the folders under ./index to force the indexes to be rebuilt.
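
The caching behavior described in these notes follows a build-or-load pattern, sketched below. It assumes one subfolder per year under the index directory, which is a guess; `get_year_index` and the stand-in `fake_build` and `fake_load` functions are hypothetical and replace the real index construction and loading.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Any, Callable


def get_year_index(
    year: int,
    index_dir: Path,
    build: Callable[[int, Path], Any],
    load: Callable[[Path], Any],
) -> Any:
    """Load the cached index for a year, or build and persist it on first use."""
    persist_dir = Path(index_dir) / str(year)
    if persist_dir.is_dir() and any(persist_dir.iterdir()):
        return load(persist_dir)
    persist_dir.mkdir(parents=True, exist_ok=True)
    return build(year, persist_dir)


# Stand-ins: a real build would embed the transcripts and write the index files.
def fake_build(year: int, persist_dir: Path) -> str:
    (persist_dir / "index.json").write_text("{}", encoding="utf-8")
    return "built"


def fake_load(persist_dir: Path) -> str:
    return "loaded"


with tempfile.TemporaryDirectory() as tmp:
    index_dir = Path(tmp) / "index"
    results = [get_year_index(2016, index_dir, fake_build, fake_load)]     # first use
    results.append(get_year_index(2016, index_dir, fake_build, fake_load))  # cached
    shutil.rmtree(index_dir / "2016")                                      # delete the folder
    results.append(get_year_index(2016, index_dir, fake_build, fake_load))  # rebuilt
print(results)
```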

License

GNU Affero General Public License v3.0
