Semantic Search in OutSystems Developer Cloud
Understanding the trade-offs and how to optimize retrieval quality with query rewriting, reranking, and custom chunking

ODC now lets you add semantic search to your entities with a few clicks. But if you stop there, your retrieval quality will likely disappoint you. Here's why — and what to do about it.
OutSystems Developer Cloud (ODC) has a built-in semantic search mechanism that works directly on top of your entities. You select which entities and text attributes should be searchable, and ODC takes care of the rest. No third-party search service is required.
That convenience comes with trade-offs. In this article, I'll explain how semantic search works, walk you through what ODC supports and where it falls short, and then show how you can dramatically improve retrieval quality.
Why Semantic Search Matters
Instead of matching keywords, semantic search captures the meaning behind a query and returns results based on conceptual relevance. This is essential for chatbots, recommendation engines, and any scenario where users express their intent in natural language.
How Semantic Search Works
Traditional keyword search matches exact words. If you search for "restart device", it finds documents containing those words. Semantic search works differently — it converts text into numerical vectors (embeddings) that represent meaning, and then finds other vectors that are close in that meaning space.
This is powerful. A search for "How do I restart the device?" will match a document titled "Reset procedure" even though neither word overlaps. The embedding model understands that "restart" and "reset" express a similar intent.
But this is also where things get tricky. Let me show you some examples.
When Similarity Works Well
These pairs are semantically different in wording but close in meaning — exactly what you want semantic search to find:
| Query | Matches With | Why It Works |
|---|---|---|
| "How do I restart the device?" | "Reset procedure for your equipment" | Different words, same intent |
| "The screen is black" | "Display not showing any output" | Symptom described differently |
| "Cancel my subscription" | "How to end my membership" | Synonyms and paraphrases |
When Similarity Misleads
Here's where it gets dangerous. These pairs have high cosine similarity — they look close in vector space — but their meaning is fundamentally different:
| Text A | Text B | Similarity | The Problem |
|---|---|---|---|
| "How do I enable two-factor authentication?" | "How do I disable two-factor authentication?" | Very high | Opposite intent, almost identical embedding |
| "Error 503 on the Kiox 300" | "Error 404 on the Kiox 300" | Very high | Different error codes, nearly the same vector |
| "The device won't turn on" | "The device will turn on" | Very high | Negation barely moves the embedding |
| "Delivery takes 2 days" | "Delivery takes 14 days" | Very high | Numbers are poorly distinguished |
| "Python 2.7 end of life" | "Python 3.12 release notes" | High | Version numbers are semantically compressed |
These examples illustrate a fundamental limitation of dense vector embeddings. The models are trained on natural language patterns, and they're excellent at capturing general meaning. But they compress fine-grained differences — negations, specific numbers, error codes, version identifiers — into nearly identical regions of the vector space.
Why this matters: This is one of the main reasons why production RAG systems use hybrid search (dense + sparse vectors). Sparse vectors excel at exact term matching and would easily distinguish "Error 503" from "Error 404". ODC doesn't support hybrid search today, which makes the optimization techniques later in this article even more important.
Cosine Similarity in a Nutshell
When the system compares two embeddings, it uses cosine similarity — a score between -1 and 1, where 1 means identical direction in vector space. In practice, similarity scores for related text pairs typically fall between 0.3 and 0.95, depending on the embedding model and content type. A difference of 0.02 in similarity score can be the difference between a relevant and an irrelevant result, yet the misleading pairs above often score within that margin of the correct result.
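For intuition, here is the computation in a few lines of Python. The vectors are toy values I made up for illustration; real embedding models produce hundreds of dimensions:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" for two opposite-intent questions.
enable_2fa = np.array([0.81, 0.42, 0.33, 0.25])
disable_2fa = np.array([0.80, 0.44, 0.31, 0.27])  # negated intent, near-identical vector

print(cosine_similarity(enable_2fa, disable_2fa))  # ~0.999
```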
This is why retrieval alone isn't enough. You need additional mechanisms — query rewriting and reranking — to catch what vector similarity misses. We'll get to those later.
How ODC Implements Semantic Search
ODC's semantic search is tightly integrated with its entity model. Here are the key components:
- Index Ingestion — You configure one or more text attributes of an entity for indexing. ODC chunks the content, generates embeddings via an embedding model, and stores them in PostgreSQL using the pgvector extension.
- Search Index — The vector database holds the chunked, embedded data.
- Retrieve — At query time, the user's input is embedded and compared against the index using dense vector similarity.
- Augment & Generate — Retrieved chunks can be used in RAG pipelines (e.g., via Agent Workbench or custom applications) to ground LLM responses.
A note on embedding models: The quality of your embeddings depends heavily on the underlying model. ODC does not currently disclose which embedding model is used or allow you to bring your own. This limits your ability to evaluate its strengths and weaknesses for your specific domain — particularly for specialized terminology or non-English content.
At a high level, ODC's semantic search covers three dimensions: understanding the intent behind a query, recognizing contextual relationships between words, and grasping meaning through synonyms, paraphrases, and linguistic associations. These are the core strengths of any embedding-based retrieval system — and also where its limitations begin, as we've seen above.
Chunking Strategies in OutSystems
Chunking is the process of splitting large documents into smaller, meaningful sections so they can be efficiently embedded, searched, and retrieved. Without chunking, you'd embed entire records as a single vector — burying the meaning of individual sections in noise.
Good chunking ensures that retrieval is accurate and focused, irrelevant text doesn't overwhelm the model, hallucinations are reduced, and the system can handle large, mixed-topic content.
Instead of sending a 200-page product manual to the LLM, chunking retrieves only the section about "Resetting device settings" — so the model gives a precise answer, not noise from the entire document.
Let's look at the four chunking strategies available in ODC.
Fixed-Size Chunking
This one splits text into equally sized pieces based on a maximum character count, with a configurable overlap between chunks.
It's very simple and extremely fast, and it works even with unstructured or messy documents. The problem is that it splits sentences and concepts mid-word or mid-thought. A sentence like "hold the On/Off button for 10 seconds until the Bosch logo appears" can be cut right in the middle. This produces meaningless or noisy chunks that result in very poor semantic retrieval quality.
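A minimal sketch of the mechanism, so you can see exactly how the cut happens:

```python
def fixed_size_chunks(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Slide a fixed character window over the text. The overlap softens,
    but does not prevent, mid-sentence cuts."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

manual = "To reset, hold the On/Off button for 10 seconds until the Bosch logo appears."
for chunk in fixed_size_chunks(manual, size=40, overlap=8):
    print(repr(chunk))  # first chunk ends right after "for 10" -- mid-instruction
```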
Best practice: Fixed-size chunking is the weakest strategy for semantic search. It's acceptable only as a baseline or when content is completely unstructured and no better option is feasible. For anything with natural sentence or paragraph boundaries, avoid it.
Sentence-Based Chunking
You define how many sentences a chunk may contain, along with a maximum character count and overlap. This respects natural sentence boundaries and produces more coherent chunks than fixed-size splitting.
There are some things to be aware of though. Sentence detection depends on language. ODC has dictionaries for 15 languages (English, German, French, etc.) that handle nuances like acronyms and abbreviations. For unsupported languages, only punctuation is used — leading to incorrect sentence splits. If the system doesn't identify the language, it defaults to English.
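For unsupported languages, the fallback behaves roughly like this naive sketch:

```python
import re

def sentence_chunks(text: str, max_sentences: int = 3) -> list[str]:
    """Group consecutive sentences into chunks. Splitting on punctuation
    alone mis-handles abbreviations such as 'e.g.' or 'z.B.'."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(sentences[i:i + max_sentences])
            for i in range(0, len(sentences), max_sentences)]
```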
More importantly, sentence-based chunking does not respect paragraph or section boundaries. A chunk may contain the last sentence of one topic and the first sentence of the next. Headings, lists, and tables are not treated differently from body text.
Best practice: Sentence-based chunking is a step up from fixed-size, but it only works well for homogeneous, well-punctuated prose in supported languages. Structured documents with headings, tables, or mixed content types will suffer.
Recursive Chunking
This approach defines character limits and overlaps while prioritizing a hierarchy of specific characters as delimiters (e.g., headings → paragraphs → sentences). The splitter tries the highest-level separator first and only falls back to smaller ones when chunks exceed the size limit.
It aligns chunks with natural document structure, which makes it great for manuals, structured PDFs, and HTML content. The downside is that it fails on documents with broken or inconsistent structure. If headings are missing or formatting is irregular, the chunker degrades to something close to fixed-size behavior. It also doesn't merge semantically related content across sections. If "Causes" and "Resolution" live in different sections, they end up in different chunks — even though they belong to the same concept.
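Conceptually, the splitter works like this simplified sketch (production implementations also merge small adjacent pieces back up toward the size limit, which I omit here):

```python
def recursive_chunks(text: str,
                     separators: tuple = ("\n\n", "\n", ". "),
                     max_chars: int = 500) -> list[str]:
    """Split on the coarsest separator first; recurse with finer
    separators only for pieces that still exceed the limit."""
    if len(text) <= max_chars:
        return [text]
    if not separators:
        # Nothing left to split on: degrade to fixed-size behavior.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    head, *rest = separators
    pieces = [p for p in text.split(head) if p.strip()]
    chunks = []
    for piece in pieces:
        chunks.extend(recursive_chunks(piece, tuple(rest), max_chars))
    return chunks
```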
Best practice: Recursive chunking is the best general-purpose strategy available in ODC. It works well for structured content but cannot handle cross-section reasoning or poorly formatted input.
Smart Chunking (Default)
This is ODC's default method. It combines recursive chunking with default separators, automatically adapting to the content found in searchable fields. No configuration required — it just works.
That said, it is still fundamentally recursive chunking with automated separator selection, so it inherits all the same limitations. It has no semantic understanding and cannot detect that "causes", "symptoms", and "reset procedure" belong to the same conceptual unit. Since the separators are chosen automatically, it can also be difficult to predict or debug how content is being split.
Best practice: Smart chunking is a sensible default, but don't assume it's optimal. For critical RAG applications, always evaluate whether recursive chunking with custom separators gives you better results.
Alternative Chunking Strategies
In addition to the built-in chunking strategies, several more advanced approaches have emerged in the RAG space and are widely used in production systems. Understanding these methods highlights the limitations of ODC's built-in options and can guide you in implementing your own custom chunking solutions.
Semantic Chunking
Instead of splitting text by characters, sentences, or structural markers, semantic chunking groups text by meaning. It uses an embedding model to measure how similar consecutive sentences or paragraphs are to each other. When the similarity drops significantly, it creates a chunk boundary.
This means a troubleshooting article where "symptoms", "causes", and "resolution" flow naturally into each other would stay in one chunk — because the meaning is connected. Recursive chunking would split them into separate chunks based on their headings, losing that connection.
Semantic chunking is especially valuable for poorly structured documents where headings are missing or inconsistent. It doesn't rely on formatting at all — only on what the text actually means.
Implementation complexity: Moderate — requires an embedding model call for each sentence pair during ingestion. Can be built in ODC by calling an external embedding API in your ingestion pipeline.
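A sketch of the boundary detection, reusing the cosine_similarity helper from earlier; embed stands for whichever embedding API you call during ingestion:

```python
def semantic_chunks(sentences: list[str], embed, threshold: float = 0.75) -> list[str]:
    """Start a new chunk wherever similarity between consecutive
    sentences drops below the threshold, i.e., where the topic shifts."""
    chunks, current = [], [sentences[0]]
    prev_vec = embed(sentences[0])
    for sentence in sentences[1:]:
        vec = embed(sentence)
        if cosine_similarity(prev_vec, vec) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
        prev_vec = vec
    chunks.append(" ".join(current))
    return chunks
```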
Adaptive Chunking
Adaptive chunking dynamically adjusts the chunk size based on content complexity. Dense, technical paragraphs get smaller chunks so that each embedding captures a focused idea. Simple, straightforward passages get larger chunks to avoid fragmenting content unnecessarily.
Think of a product manual where one section is a simple feature overview and the next is a detailed troubleshooting flow with multiple conditions. Fixed or recursive chunking would use the same granularity for both. Adaptive chunking would produce larger chunks for the overview and smaller, more focused chunks for the troubleshooting steps.
This prevents information loss in complex sections and reduces noise in simple ones.
Implementation complexity: Moderate to high — requires heuristics or a model to assess content density and adjust chunk sizes dynamically. Can be implemented with rule-based logic or a lightweight LLM call per section.
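One crude rule-based variant, purely illustrative (the signals and weights are assumptions to tune against your own content):

```python
def adaptive_max_chars(paragraph: str, base: int = 600) -> int:
    """Shrink the chunk size for dense technical text. Long words and
    many digits/identifiers are cheap proxies for information density."""
    words = paragraph.split() or [""]
    avg_word_len = sum(len(w) for w in words) / len(words)
    digit_ratio = sum(ch.isdigit() for ch in paragraph) / max(len(paragraph), 1)
    density = (avg_word_len / 5.0) + digit_ratio * 10.0
    return max(int(base / max(density, 1.0)), 150)

# Feed the result into recursive_chunks(section, max_chars=...) per section.
```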
Context-Enriched Chunking
With all the strategies above, each chunk stands on its own. It has no awareness of what came before or after it. Context-enriched chunking solves this by adding a brief summary of neighboring chunks to each chunk.
For example, if chunk 3 contains a resolution step, context-enriched chunking would prepend something like: "The previous section described error 503 occurring when the device loses network connectivity during a firmware update." This way, the embedding of chunk 3 captures not just the resolution itself but also the problem it relates to.
This is critical for multi-step reasoning, where the answer to a user's question spans multiple sections. Without context enrichment, the retriever might find the resolution chunk but miss the connection to the specific error that caused it.
Implementation complexity: Moderate — requires a post-processing step after initial chunking that generates summaries of neighboring chunks (typically via an LLM call) and prepends them. Straightforward to implement but adds ingestion latency and cost.
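A post-processing sketch; summarize stands for a short LLM summarization call of your choosing:

```python
def enrich_with_context(chunks: list[str], summarize) -> list[str]:
    """Prepend a one-sentence summary of the preceding chunk so each
    embedding also captures what the passage relates to."""
    enriched = []
    for i, chunk in enumerate(chunks):
        prefix = f"Context: {summarize(chunks[i - 1])}\n" if i > 0 else ""
        enriched.append(prefix + chunk)
    return enriched
```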
AI-Driven Chunking
AI-driven chunking uses an LLM to read the entire document and decide where the most meaningful breakpoints are. Instead of following rules (split at headings, split every N sentences), the LLM identifies conceptual units the way a human reader would.
This is the most expensive strategy — it requires an LLM call during ingestion for every document — but it produces the most intuitive, human-like chunks. It's particularly useful for mixed-source documents where structure, formatting, and content types vary wildly and no single rule-based strategy fits.
Implementation complexity: High — requires a full LLM processing call for every document during ingestion. Significantly increases ingestion time and cost. Best reserved for high-value content where retrieval quality is critical.
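The heart of such a pipeline is the prompt. A sketch, not a production-tested template:

```python
CHUNKING_PROMPT = """Read the document below and identify the most
meaningful breakpoints, the way a human reader would perceive new
conceptual units. Ignore superficial formatting. Return a JSON list
of character offsets where new chunks should begin.

Document:
{document}"""

# Cut the text at the returned offsets, with a size sanity check so a
# single bad offset cannot produce an oversized chunk.
```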
Key takeaway: The absence of these strategies means that out of the box, the quality of your ODC semantic search is limited by how well the four built-in chunking methods fit your content. For heterogeneous data — a mix of FAQs, troubleshooting guides, product specs, and legal text — no single built-in method will perform well across all content types.
Custom Chunks in OutSystems
ODC does allow you to disable the built-in chunking on semantic search attributes and implement your own chunking logic.
This means you can:
- Build a custom chunking pipeline in your application that preprocesses text before it is written to the entity.
- Apply different chunking strategies to different content types — e.g., recursive chunking for structured manuals, sentence-based chunking for FAQs, and a custom semantic grouping for troubleshooting guides.
- Implement any of the advanced strategies listed above by calling an LLM during ingestion to determine optimal chunk boundaries.
- Store the pre-chunked text in your entity attributes so that ODC only handles the embedding and indexing — not the splitting.
This shifts the chunking responsibility from ODC's built-in mechanism to your application, giving you full control over chunk quality at the cost of additional development effort and ingestion complexity.
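Combining the sketches from earlier, such a pipeline might dispatch by content type like this (embed again stands for your embedding API client):

```python
import re

def chunk_for_ingestion(content_type: str, text: str) -> list[str]:
    """Route each content type to the chunker that suits it, before the
    pre-chunked text is written to the searchable entity attribute."""
    if content_type == "manual":            # structured, heading-rich
        return recursive_chunks(text, max_chars=500)
    if content_type == "faq":               # short, well-punctuated prose
        return sentence_chunks(text, max_sentences=2)
    if content_type == "troubleshooting":   # meaning-driven grouping
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        return semantic_chunks(sentences, embed)
    return recursive_chunks(text)           # sensible default
```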
Best practice: For production RAG applications with diverse or complex content, seriously consider disabling the default chunkers and implementing a custom ingestion pipeline. The built-in methods are convenient for prototyping and simple use cases, but a tailored chunking strategy will consistently deliver better retrieval quality.
In advanced RAG systems, the highest quality comes from a hybrid ingestion pipeline that applies different chunking methods to different content types. With custom chunking in ODC, this is achievable — it just requires you to build and maintain that pipeline yourself.
Dense-Only Vectors: A Significant Limitation
Chunking determines what goes into your vectors. But the type of vector itself also has a major impact on retrieval quality.
ODC semantic search uses only dense vector embeddings — compact numerical vectors that capture semantic meaning.
Dense vectors are great at understanding paraphrases and synonyms. "How do I restart the device?" matches "Reset procedure" — that kind of thing. They work well for natural language queries.
Where they struggle is exact term matching. Searching for error code "503" or product name "Kiox 300" may return semantically similar but factually wrong results. Domain-specific terms like "HANA", "SAP", or "OML" may not be well represented in the embedding model's training data. Dense embeddings compress numbers and codes into a meaning space that doesn't distinguish "Error 503" from "Error 510".
In production RAG systems, the best practice is Hybrid Search — combining dense vector search (semantic similarity) with sparse vector search (keyword and exact-match relevance). A typical starting point uses a weighted formula:
Final Score = 0.7 × Dense Score + 0.3 × Sparse Score
Note that this weighting is use-case-dependent and should be tuned for your specific data and query patterns. The 0.7/0.3 split is a commonly cited baseline, not a universal rule.
This ensures that both semantic relevance and exact matching influence the ranking. A query like "How do I fix error 503 on the Kiox 300?" benefits from dense search understanding the intent ("fixing an issue", "troubleshooting") and sparse search matching the exact terms "503" and "Kiox 300".
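In code, the blend and its effect on the Kiox example look like this (all scores are made up for illustration):

```python
def hybrid_score(dense: float, sparse: float, alpha: float = 0.7) -> float:
    """Weighted blend of semantic and keyword relevance; tune alpha."""
    return alpha * dense + (1 - alpha) * sparse

# Dense similarity alone would rank the wrong error code first;
# the sparse exact match on "503" flips the ranking.
doc_error_503 = hybrid_score(dense=0.82, sparse=0.95)  # 0.859
doc_error_404 = hybrid_score(dense=0.84, sparse=0.10)  # 0.618
```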
ODC does not support sparse vectors or hybrid search. This means that queries containing specific identifiers, codes, or product names may return less precise results than a hybrid system would.
What's Available and What's Not
Now that you've seen how chunking works and where dense-only vectors fall short, here's a summary of what ODC semantic search supports today:
| Capability | Status |
|---|---|
| Dense vector embeddings | ✅ Supported |
| Sparse vector embeddings | ❌ Not supported |
| Hybrid search (dense + sparse) | ❌ Not supported |
| Custom embedding models | ❌ Not supported |
| Indexing of text attributes | ✅ Supported (no binary, numeric, or image data) |
| Fixed-size chunking | ✅ Supported |
| Sentence-based chunking | ✅ Supported |
| Recursive chunking | ✅ Supported |
| Smart chunking (default) | ✅ Supported |
| Semantic chunking | ❌ Not supported (custom build) |
| Adaptive chunking | ❌ Not supported (custom build) |
| Context-enriched chunking | ❌ Not supported (custom build) |
| AI-driven chunking | ❌ Not supported (custom build) |
Optimizing Retrieval: Query Rewriting and Reranking
Given the information above, the question becomes: how can you improve retrieval quality within ODC? The answer lies in two techniques that you can implement in your application logic.
Query Rewriting
Query rewriting is the process of transforming a user's original query into a better, clearer, or more complete version before sending it to the retrieval system.
In RAG, the answer quality heavily depends on what you retrieve. If the query is unclear or incomplete, the retriever may miss relevant documents, retrieve irrelevant content, or show overly broad results. This is especially critical with dense-only search, where the embedding of a vague query lands in a broad, non-specific region of the vector space.
Consider a chatbot conversation:
User: "How do I reset it?"
The system cannot know what "it" refers to. Sending this raw query to semantic search will produce poor results because the embedding of "How do I reset it?" is far too generic.
An LLM rewrites the query using the conversation history to produce a self-contained, specific query:
Rewritten Query: "How do I reset the OutSystems Developer Cloud redeployment pipeline when it is stuck in pending state?"
This rewritten query produces a far more precise embedding that lands much closer to the relevant chunks in the vector space.
Query rewriting can fix ambiguity, expand missing context, add synonyms, convert conversational questions into standalone ones, and turn fragments into full queries.
How to Implement in ODC
Since ODC doesn't provide built-in query rewriting, you implement it as a pre-processing step in your application (a minimal sketch follows the steps below):
1. Capture the conversation history.
2. Before calling the semantic search action, send the user's latest message along with the conversation history to an LLM with a prompt like: "Rewrite the following user question as a standalone, self-contained search query. Use the conversation history to resolve any ambiguous references. Return only the rewritten query."
3. Use the rewritten query as the input to ODC's semantic search.
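Here is that sketch; llm stands for whatever completion client your application exposes (for example through a REST integration), and semantic_search for the search action you call afterward:

```python
def rewrite_query(history: list[str], latest: str, llm) -> str:
    """Turn a conversational fragment into a standalone search query
    before it is sent to semantic search."""
    prompt = (
        "Rewrite the following user question as a standalone, "
        "self-contained search query. Use the conversation history to "
        "resolve any ambiguous references. Return only the rewritten query.\n\n"
        "History:\n" + "\n".join(history) +
        "\n\nLatest question: " + latest
    )
    return llm(prompt).strip()

# rewritten = rewrite_query(history, "How do I reset it?", llm)
# results = semantic_search(rewritten)
```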
This is a lightweight LLM call (a few tokens in, a few tokens out) that can dramatically improve retrieval relevance at minimal cost.
Best practice: Query rewriting is the single highest-impact, lowest-cost optimization you can make for your RAG pipeline. Implement it even for simple use cases.
Reranking
Reranking is a post-retrieval optimization step where an additional model — often a cross-encoder or LLM-powered relevance scorer — evaluates and reorders the initially retrieved documents to ensure the most relevant items appear at the top.
Semantic search retrieves candidates based on embedding similarity, which is a fast but rough approximation. The initial ranking often contains results that are topically related but don't answer the question, results that are semantically similar but factually irrelevant, and truly relevant results buried below mediocre ones.
This problem is amplified in ODC because there's no hybrid search, the chunking strategies are limited, and dense-only retrieval can surface conceptually similar but wrong content.
How It Works (Two-Stage Retrieval)
Stage 1 — Retrieval (fast, broad): ODC semantic search retrieves a broad set of candidate chunks (e.g., top 10–20 results).
Stage 2 — Rerank (precise, slower): A more powerful model (cross-encoder or LLM) evaluates each candidate together with the user query and assigns a refined relevance score.
Here's an example. User query: "How do I fix error 503 on the Kiox 300?"
| Stage 1 Retrieval (unranked) | Stage 2 Reranked |
|---|---|
| Reset procedure for the Kiox 300 | Error 503 explanation — directly answers the question |
| Error 503 — system process blocked | Reset procedure for Kiox 300 — part of the solution |
| Firmware update instructions | Firmware update instructions — somewhat related |
| Battery charging calibration steps | — (filtered out) |
| Warranty disclaimer | — (filtered out) |
The reranker pushes the most relevant result to the top and filters out noise — something the initial dense vector similarity alone could not accomplish.
Reranking greatly improves precision, lowers token costs (you pass fewer, better chunks to the LLM for generation), reduces hallucination, and is especially valuable for technical and repetitive domains where many chunks are topically similar but only a few are actually useful.
How to Implement in ODC
1. Over-retrieve: Configure your semantic search to return more results than you ultimately need (e.g., retrieve 15, use 3–5).
2. Call a reranking model: After retrieval, send the query and the retrieved chunks to a reranking API. Options include:
   - Cohere Rerank API — Purpose-built reranking model. Fast, cost-effective, and consistent. Recommended as a first choice.
   - Cross-encoder models (e.g., via Azure AI or a custom endpoint) — High accuracy, good for domain-specific tuning.
   - LLM-as-reranker — Use a prompt that asks the LLM to score each chunk's relevance to the query on a scale of 1–10. This works but is slower, more expensive per call, and less deterministic than dedicated reranking models. Use it only when a purpose-built reranker isn't available.
3. Sort and filter: Reorder the results by the new relevance score and take only the top N.
4. Pass to generation: Use the reranked chunks as context for the LLM response.
A simple LLM-based reranking prompt:
"Given the following user query and a list of text passages, rate each passage's relevance to the query on a scale of 0 to 10. Return only the passage IDs and their scores."
Putting It All Together
Here's the recommended architecture for high-quality semantic search in ODC: rewrite the user's query with an LLM, run ODC semantic search with over-retrieval, rerank the candidates with a stronger model, and pass only the top chunks to generation.
This pipeline compensates for most of the limitations:
| Limitation | Mitigation |
|---|---|
| No hybrid search | Query rewriting adds explicit terms; reranking catches exact-match relevance |
| Dense-only vectors | Reranking with a cross-encoder evaluates query-document pairs more precisely |
| Limited chunking strategies | Over-retrieval + reranking filters out noisy chunks |
| No built-in query expansion | Query rewriting expands and clarifies intent |
Important caveat: Query rewriting and reranking serve as mitigations, not as comprehensive solutions to these limitations. For highly accurate retrieval, a more advanced approach incorporating both sparse and dense vectors, along with hybrid search, is necessary.
Summary
ODC's built-in semantic search is a significant step forward — it removes the need for a third-party vector database and simplifies the developer experience. For high-accuracy RAG applications, though, be aware of its constraints.
Know your content. Understand the structure and diversity of the data you're indexing. Choose the chunking strategy that fits — don't blindly accept the default.
Use recursive or smart chunking for structured content. If your entity data contains well-structured text with natural section boundaries, recursive chunking will outperform fixed-size and sentence-based approaches.
Avoid fixed-size chunking unless you're dealing with completely unstructured, messy text and have no better option.
Consider custom chunking. While built-in chunking strategies are convenient, custom chunking can significantly enhance accuracy and quality — especially for heterogeneous content.
Implement query rewriting. A simple LLM call before retrieval can transform a vague conversational query into a precise search input. This is your highest-impact optimization.
Implement reranking. Over-retrieve and then rerank. This compensates for the lack of hybrid search and the limitations of dense-only retrieval. It's especially critical for technical domains with specific terminology. Prefer dedicated reranking models over LLM-as-reranker for cost, speed, and consistency.
Acknowledge the limitations you can't change. Currently, ODC lacks support for sparse vectors, hybrid search, and custom embedding models. Navigate these constraints by employing the pre- and post-retrieval optimizations mentioned earlier. For high-accuracy requirements, an external tech stack remains necessary.
Monitor and iterate. Collect user feedback on search quality. The gap between "good enough" and "production-grade" is almost always closed through iterative refinement of chunking parameters, query rewriting prompts, and reranking thresholds.
If you've implemented any of these optimizations in your ODC projects, I'd love to hear about your results and experiences. Let's connect on LinkedIn.