# vectors
Vector search capability on a store resource. Adds semantic search to a resource by configuring an embedding model, chunking strategy, and vector backend. The resource’s fields automatically become payload indexes in the vector database, so you can combine semantic similarity search with structured metadata filtering.
## When to use
Use `vectors` when you need to:
- Search over unstructured text using semantic similarity (e.g., “find documents about contract termination clauses”)
- Build retrieval-augmented generation (RAG) pipelines
- Combine structured queries (filter by category, date) with semantic search (find similar content)
- Enable `recall` steps to retrieve relevant context from a knowledge base
If you only need structured data queries (SQL-style), use `read` actions without `vectors`. If you only need vector search without any structured fields, consider a store with `source: qdrant`.
## Syntax
```
resource <resource_name>
  vectors
    source: qdrant
    embedding_model: "<model_name>"
    chunking: <strategy>
    chunk_size: <number>
    chunk_overlap: <number>
```

## Parameters
| Parameter | Required | Description |
|---|---|---|
| `source` | Yes | Vector backend. Currently `qdrant` is the only supported option. |
| `embedding_model` | Yes | The model used to generate embeddings. Examples: `"nomic-embed-text"` (local via Ollama), `"text-embedding-3-small"` (OpenAI), `"voyage-3"` (Voyage AI). |
| `chunking` | No | How text is split into chunks before embedding. One of `semantic`, `paragraph`, or `fixed`. Default: `semantic`. |
| `chunk_size` | No | Target size of each chunk in tokens. Default varies by strategy. |
| `chunk_overlap` | No | Number of tokens shared between adjacent chunks, preserving context across chunk boundaries. Default: `200`. |
## Chunking strategies
| Strategy | Description | Best for |
|---|---|---|
| `semantic` | Splits on semantic boundaries (topic shifts, section breaks). Uses an LLM or heuristic to find natural break points. | Long-form documents, articles, reports |
| `paragraph` | Splits on paragraph boundaries (double newlines). | Well-structured text with clear paragraphs |
| `fixed` | Splits at a fixed token count. | Uniform chunk sizes, code, logs |
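The interplay of `chunk_size` and `chunk_overlap` is easiest to see with the `fixed` strategy. Below is a minimal Python sketch of the idea only, not Mashin's implementation: real chunking counts model tokens, whereas this operates on a pre-split token list. Each chunk starts `chunk_size - chunk_overlap` tokens after the previous one, so adjacent chunks share exactly `chunk_overlap` tokens.

```python
def fixed_chunks(
    tokens: list[str], chunk_size: int = 512, chunk_overlap: int = 200
) -> list[list[str]]:
    """Fixed-size chunking with overlap (illustrative stand-in for the
    `fixed` strategy; real chunking counts model tokens, not words)."""
    step = chunk_size - chunk_overlap  # each chunk starts this many tokens after the last
    return [
        tokens[i : i + chunk_size]
        for i in range(0, max(len(tokens) - chunk_overlap, 1), step)
    ]


# A 2,600-token document with chunk_size=1000 and chunk_overlap=200 yields
# three chunks starting at tokens 0, 800, and 1600.
doc = [f"tok{i}" for i in range(2600)]
print([len(c) for c in fixed_chunks(doc, chunk_size=1000, chunk_overlap=200)])
# -> [1000, 1000, 1000]
```

Larger overlaps improve context continuity at retrieval time but increase the number of chunks stored and embedded.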
## Examples
### Resource with vector search
```
machine knowledge_base

stores
  store docs
    source: managed

resource document
  id as uuid, is primary_key
  title as text, is required
  content as text, is required
  category as text
  timestamps

  create add_document
    accept: [title, content, category]

  read by_category
    argument category as text, is required
    filter: category == arg(category)

  vectors
    source: qdrant
    embedding_model: "nomic-embed-text"
    chunking: semantic
    chunk_size: 1000
    chunk_overlap: 200
```

When a document is created, its `content` field is automatically chunked, embedded, and stored in Qdrant. The `title`, `category`, and other fields become payload attributes in the vector index, enabling filtered similarity search.
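Under the hood, a filtered similarity search combines a query vector with a payload filter in a single Qdrant request. The sketch below uses the `qdrant-client` Python library to show the shape of such a request; the collection name, the filter value, and the placeholder vector are assumptions for illustration, not what Mashin actually generates.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(host="localhost", port=6333)

# Placeholder: in practice this is the embedding of the search text,
# produced by the configured model (nomic-embed-text is 768-dimensional).
query_vector = [0.0] * 768

hits = client.search(
    collection_name="documents",  # assumed collection name
    query_vector=query_vector,
    # Payload filter over a resource field: restrict results by category.
    query_filter=Filter(
        must=[FieldCondition(key="category", match=MatchValue(value="guides"))]
    ),
    limit=5,
)
```

In Mashin you never write this query by hand; a `recall` step expresses the same retrieval declaratively, as the next example shows.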
### RAG pipeline using vectors
```
machine contract_search

stores
  store legal
    source: managed

resource contract
  id as uuid, is primary_key
  department as text, is required
  classification as text, is required
  body as text, is required
  expiry_date as date
  timestamps

  create ingest
    accept: [department, classification, body, expiry_date]

  vectors
    source: qdrant
    embedding_model: "text-embedding-3-small"
    chunking: paragraph
    chunk_size: 500
    chunk_overlap: 100

accepts
  query as text, is required
  department as text

responds with
  answer as text
  sources as list

implements
  recall find_relevant
    collection: "legal-contracts"
    query: input.query
    filter: {department: input.department}
    limit: 5

  ask answer, using: "anthropic:claude-sonnet-4-6"
    with task "Answer using ONLY these sources. Cite each claim. Sources: ${steps.find_relevant.results} Question: ${input.query}"
    returns
      answer as text
      sources as list
```

### Local embeddings for offline use
```
resource memo
  id as uuid, is primary_key
  subject as text, is required
  body as text, is required
  timestamps

  vectors
    source: qdrant
    embedding_model: "nomic-embed-text"
    chunking: fixed
    chunk_size: 512
```

Using `nomic-embed-text` with a local Ollama instance means embeddings are generated on-device: no API keys, no network calls, and it works offline. On desktop, the Mashin app bundles Ollama and Qdrant as sidecars.
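As a point of reference, generating an embedding against a local Ollama instance is a single HTTP call to its documented `/api/embeddings` endpoint. The Python sketch below is illustrative only (the prompt text is made up, and the port is just Ollama's default); it is not part of the Mashin API.

```python
import requests

# Ask the local Ollama sidecar (default port 11434) for an embedding.
response = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": "Quarterly planning memo"},
    timeout=30,
)
response.raise_for_status()

# Ollama returns {"embedding": [...]}; nomic-embed-text is 768-dimensional.
embedding = response.json()["embedding"]
print(len(embedding))
```

Mashin handles this step internally when a memo is created or queried; the sketch just makes clear why no external network access is involved.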
## Governance
Vector operations are governed effects:
- Embedding generation is a `memory` capability. If the embedding model is cloud-hosted (OpenAI, Voyage), it involves a network call governed by the interpreter.
- Vector storage (inserting embeddings) is a `memory` capability recorded in the behavioral ledger.
- Similarity search via `recall` steps is a governed `memory` capability with full provenance: every retrieval records which chunks were returned, their similarity scores, and which document they came from.
- Provenance chain: document hash, chunk hash, embedding model, retrieval query, and answer are all linked in the behavioral ledger. This is the foundation for auditable RAG.
Local embedding models (via Ollama) do not require network governance but are still recorded in the ledger.
## Translations
| Language | Keyword |
|---|---|
| English | vectors |
| Spanish | vectores |
| French | vecteurs |
| German | Vektoren |
| Japanese | ベクトル |
| Chinese | 向量 |
| Korean | 벡터 |