🎙️🗄️ AI Blog Search
This blog is now home to dozens of posts (and dozens of readers)! I sometimes want to quickly find an old post, or let readers find one, without having to open the GitHub repository and use the search function there.
So I added a new search function directly into the blog, powered by Cloudflare Workers AI and our vector database, Vectorize.
🎯 I have a few goals for this project:
- Implement an essentially zero-cost search function in my blog.
- Make the search fuzzy enough that you do not need exact terms.
- Deliver the search experience and results within my blog homepage.
- Deploy this all inside of my existing Cloudflare Pages project without additional Workers or other auxiliary services.
🗺️ This walkthrough covers how to:
- Use Cloudflare Vectorize to store vectors for each blog post.
- Add a front-end flow to my Lumen-themed Gatsby blog.
- Implement the search API as a Cloudflare Pages function.
⏲️ Time to complete: ~30 minutes
👔 I work there. I work at Cloudflare. Several of my posts on this blog that discuss Cloudflare focus on building things with Cloudflare Workers. I’m a Workers customer and pay my invoice to use it.
Vectorize the Posts
Cloudflare’s Vectorize product is a globally distributed vector database that enables you to build full-stack, AI-powered applications with Cloudflare Workers. I recently set it up to add vector-embedded search for my personal blog posts. I wound up building a simple UI that allows me to test the search as well as view the associated metadata for the entries.
Vector Embedding
I have written posts on this blog for years, and I continue to add new ones, so I introduced a GitHub Action in the blog repository that takes new posts and sends them to Vectorize. It also has a flow that lets me add old posts manually.
The manual option is tedious, but I did not want all posts sent. You can adapt it to batch posts if you’d like.
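If you do want to batch posts, here is a rough sketch of how the upload bodies could be prepared. It assumes the upload goes through Vectorize's REST upsert endpoint, which takes newline-delimited JSON with one vector per line. This is not the code from my action: the function name, the post fields (`slug`, `title`, `url`, `embedding`), and the batch size are all hypothetical, and the embeddings are assumed to have been generated already by an embedding model.

```python
import json


def to_ndjson_batches(posts, batch_size=100):
    """Yield NDJSON request bodies, each holding at most batch_size vectors.

    Each post is a dict with hypothetical keys: slug, title, url, embedding.
    """
    for start in range(0, len(posts), batch_size):
        batch = posts[start:start + batch_size]
        # One JSON object per line: the vector ID, its values, and metadata.
        yield "\n".join(
            json.dumps({
                "id": post["slug"],
                "values": post["embedding"],
                "metadata": {"title": post["title"], "url": post["url"]},
            })
            for post in batch
        )
```

Each yielded string would be the body of one upsert request, so five posts with `batch_size=2` produce three request bodies.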
Local Review
This repository contains a script that lets you start reviewing your vector metadata without deploying the application as a whole.
- Copy or download the Python script linked above.
- Replace `TOKEN` with your actual Cloudflare bearer token (it needs read/write permissions for Vectorize).
- Replace `account-id` and `index-name` in the URL with your account ID and the name of your Vectorize index.
The script makes two requests, so be sure to replace them in both.
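One way to avoid missing a replacement is to define the values once and build both URLs from them. The sketch below is not the linked script. It assumes the Vectorize v2 REST path shown in `BASE_URL`, and it guesses that the two requests are a `query` and a `get_by_ids` lookup; check both against the Cloudflare API documentation before relying on them. The helper name `build_request` and the example payloads are made up for illustration.

```python
import json
import urllib.request

# Placeholders from the steps above; swap in your own values.
TOKEN = "TOKEN"
ACCOUNT_ID = "account-id"
INDEX_NAME = "index-name"

# Assumed base path for the Vectorize REST API (v2); verify before use.
BASE_URL = (
    "https://api.cloudflare.com/client/v4/accounts/"
    f"{ACCOUNT_ID}/vectorize/v2/indexes/{INDEX_NAME}"
)


def build_request(operation, payload):
    """Build (but do not send) an authenticated POST for one index operation."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{operation}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical pair of requests. The query vector is a stand-in; a real one
# must have the same number of dimensions as the index.
query_request = build_request(
    "query", {"vector": [0.0, 0.0, 0.0], "topK": 4, "returnMetadata": "all"}
)
lookup_request = build_request("get_by_ids", {"ids": ["example-post"]})

# To actually send one: urllib.request.urlopen(query_request)
```

Because both requests are built from the same three constants, there is only one place to edit.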
Implement the UI
I will not go into too much detail here mostly because I finished this project during my son’s morning nap and the time constraint meant I leaned heavily on an LLM to write the .scss and .tsx files to bring this to life in the blog. The interesting part about this is the vector DB and the search, anyway.
I’m going to move quickly through this section and just call out the pull requests and files if you want to copy these for your own usage.
Search Toggle Pull Request
- Most of the UI functionality is implemented in this commit to these files.
- I added a search-toggle component for the toggle and input field.
- I modified the sidebar-author component which houses the search-toggle experience.
- I made some modifications to the theme-switcher component, which the search toggle is meant to mimic.
Search Results Pull Request
- This implements the results UI.
- I have it show only the top four matches because I don’t think pagination inside of a page that already has pagination is useful. And if the embedding model is good enough, those first four should get you what you want.
- Some of the site-wide styles needed to change to handle the rounded corners and focus indicators.
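The four-result cap boils down to a sort and a slice. Here is a minimal sketch, assuming each match is a dict with a `score` key and that a higher score means a closer match (true for cosine similarity, but not for a Euclidean distance metric). The function name and the sample data are hypothetical, and this is not the code from the pull request. A Vectorize query also accepts a `topK` value, so the same cap can be applied when querying instead.

```python
def top_matches(matches, limit=4):
    """Return the highest-scoring matches, best first, capped at limit."""
    ranked = sorted(matches, key=lambda match: match["score"], reverse=True)
    return ranked[:limit]


# Made-up results for illustration only.
sample = [
    {"id": "post-a", "score": 0.62},
    {"id": "post-b", "score": 0.91},
    {"id": "post-c", "score": 0.48},
    {"id": "post-d", "score": 0.77},
    {"id": "post-e", "score": 0.55},
]
best = top_matches(sample)
```

With the sample above, `best` holds four entries starting with `post-b`, and `post-c` is dropped.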
Deploy the Search Function
I adapted the scripts in my explorer from earlier into a Pages function inside of the project.
Unlike the first playground app, I used the bindings available in the Pages UI to set the variables I need (`VECTORIZE` and `AI`). Implementing it this way, inside of the project, means I do not have to worry about tokens or managing a separate deployment.
What’s next?
You can begin using this in the blog right now!
I am curious about how well the embedding model holds up. Will future models improve enough that it is worth blowing away my current Vector DB and redoing it with new embeddings? Probably.