knowledge-base
togo knowledge base: crawl, chunk, embed, and hybrid-search content for RAG
```sh
togo install togo-framework/knowledge-base
```

Turns the thin ai-firecrawl, ai-crawlee, and ai-searxng data-source drivers into a real knowledge base: ingest documents, chunk and embed them, run hybrid search (keyword + vector, fused by reciprocal rank fusion, RRF), and crawl sources on a schedule with content-hash change detection. Collections keep tenants and topics isolated from one another.
Usage
```go
kb, _ := knowledgebase.FromKernel(k) // k is the togo kernel

// Ingest chunks and embeds the document and returns (doc, changed).
// Re-ingesting identical content is a no-op (dedupe / change detection).
kb.Ingest(ctx, knowledgebase.Document{
	URL: "/docs/intro", Title: "Intro", Text: "...", Collection: "docs",
})

// Hybrid search (keyword + vector, RRF-merged).
hits := kb.Search(knowledgebase.Query{Text: "how do I install the cli", TopK: 5, Collection: "docs"})
for _, h := range hits {
	fmt.Println(h.Score, h.Title, h.Snippet)
}
```
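Ingest splits each document into chunks before embedding them, but this README does not state the chunking rule. The sketch below assumes fixed-size word windows with overlap, a common choice; `chunkWords` and its size and overlap values are hypothetical and only illustrate the idea.

```go
package main

import (
	"fmt"
	"strings"
)

// chunkWords splits text into windows of at most size words; each window
// starts size-overlap words after the previous one, so neighbours share
// overlap words. Hypothetical helper, not the plugin's chunker.
func chunkWords(text string, size, overlap int) []string {
	if size <= 0 || overlap < 0 || overlap >= size {
		panic("chunkWords: need size > 0 and 0 <= overlap < size")
	}
	words := strings.Fields(text)
	var chunks []string
	step := size - overlap
	for start := 0; start < len(words); start += step {
		end := start + size
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, strings.Join(words[start:end], " "))
		if end == len(words) {
			break // this window reached the end; skip a redundant tail chunk
		}
	}
	return chunks
}

func main() {
	for i, c := range chunkWords("one two three four five six seven", 3, 1) {
		fmt.Println(i, c)
	}
}
```

The overlap keeps a sentence that straddles a boundary retrievable from either neighbouring chunk, at the cost of embedding a few words twice.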
Scheduled crawl + change detection
```go
kb.AddSource(
	knowledgebase.Source{Name: "blog", URL: "https://site.com/blog", Collection: "blog", Cron: "@daily"},
	func(ctx context.Context, url string) (knowledgebase.Document, error) {
		// Wire ai-firecrawl / ai-crawlee here; fetchMarkdown stands in for your own fetch helper.
		return fetchMarkdown(ctx, url)
	},
)
changed, _ := kb.Crawl(ctx, "blog") // ingests only if the content changed
```
Pair with the scheduler plugin to run kb.Crawl on each source's cron.
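Content-hash change detection compares a hash of freshly fetched content with the hash recorded at the last ingest and skips the work when they match. A minimal sketch, assuming SHA-256 over the document text keyed by URL; the plugin's hash function and exactly which fields it hashes are assumptions, and `hashTracker` is a hypothetical type.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashTracker remembers the last content hash seen for each URL.
// Hypothetical type illustrating content-hash change detection.
type hashTracker struct {
	seen map[string]string // URL -> hex-encoded SHA-256 of the text
}

func newHashTracker() *hashTracker {
	return &hashTracker{seen: map[string]string{}}
}

// changed reports whether text differs from what was last recorded for url,
// and records the new hash when it does. Identical content is a no-op.
func (t *hashTracker) changed(url, text string) bool {
	sum := sha256.Sum256([]byte(text))
	h := hex.EncodeToString(sum[:])
	if t.seen[url] == h {
		return false
	}
	t.seen[url] = h
	return true
}

func main() {
	t := newHashTracker()
	fmt.Println(t.changed("/blog", "v1")) // first sight
	fmt.Println(t.changed("/blog", "v1")) // same content
	fmt.Println(t.changed("/blog", "v2")) // edited content
}
```

Hashing makes the comparison cheap and lets a scheduled crawl run often without re-chunking and re-embedding unchanged pages.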
Embeddings
Ships a deterministic local embedder (hashing bag-of-words) so search works offline and tests are reproducible. For semantic quality, swap in a real embedder:
```go
kb.WithEmbedder(myAIEmbedder) // e.g. backed by the ai plugin's Embed
```
REST API
| Method | Path | Description |
|---|---|---|
| POST | /api/kb/ingest | ingest a {url,title,text,collection} document |
| GET | /api/kb/search?q=&collection= | hybrid search |
| GET | /api/kb/documents?collection= | list documents |
| GET | /api/kb/sources | list crawl sources |
Configuration
No environment variables are required. The store is a bounded in-memory index; a database or vector store can be swapped in through the store seam. For a persistent pgvector + BM25 backend, see rag-postgres.
<div align="center"> <h3>Premium sponsors</h3> <p> <a href="https://id8media.com"><strong>ID8 Media</strong></a> · <a href="https://one-studio.co"><strong>One Studio</strong></a> </p> <p><sub>Support togo — <a href="https://github.com/sponsors/fadymondy">become a sponsor</a>.</sub></p> </div>