MariaDB FTS vs. Elasticsearch: Choosing Your Backend Search Solution
In the development journey of a Backend Developer, the "Search" problem is always a significant milestone. Everything usually starts with a simple LIKE %keyword% statement. But when the data grows to millions of records and users start complaining, "Why can't the system find anything when I misspell one letter?", that's when you need a more professional solution.
MariaDB Full-Text Search vs. Elasticsearch: When to go "All-in"?
In the development journey of a Backend Developer, the "Search" problem is always a significant milestone. Everything usually starts with a simple LIKE %keyword% statement. But when the data grows to millions of records and users start complaining, "Why can't the system find anything when I misspell one letter?", that's when you need a more professional solution.
The question is: Should you leverage MariaDB's built-in Full-text Search (FTS) or accept the complexity of implementing Elasticsearch?
This article will delve into the architecture, pros and cons, and real-world scenarios to help you make the most informed choice.
1. MariaDB Full-Text Search: The Power of Simplicity
Many people often underestimate the search capabilities of relational databases (RDBMS). In fact, since version 10.0.5+, MariaDB has supported extremely powerful Full-text Search on the InnoDB engine (which supports ACID).
1.1. How it works: Inverted Index in SQL
Instead of scanning every row of data (Table Scan), MariaDB creates an Inverted Index. It breaks text strings into individual words (tokens), removes meaningless words (stop-words), and stores their positions.
1.2. Search Modes
MariaDB provides three main modes that you need to master:
- Natural Language Mode: The default mode. It calculates relevance based on the frequency of the keyword's appearance.
- Boolean Mode: Allows the use of operators like
+(must be present),-(must not be present),*(wildcard). - Query Expansion: Performs a broader search based on related words (useful when the dataset is small).
1.3. Code Example: Practical Implementation
Suppose you have an articles table with millions of posts:
-- 1. Create a Full-text Index
ALTER TABLE articles ADD FULLTEXT(title, content);
-- 2. Advanced search with Boolean Mode
-- Find articles that must contain 'Backend', may contain 'Nodejs', but MUST NOT contain 'PHP'
SELECT id, title,
MATCH(title, content) AGAINST('+Backend Nodejs -PHP' IN BOOLEAN MODE) AS score
FROM articles
WHERE MATCH(title, content) AGAINST('+Backend Nodejs -PHP' IN BOOLEAN MODE)
ORDER BY score DESC;
1.4. The "unbeatable" advantages of MariaDB
- Data Consistency (ACID): This is its greatest strength. When you
UPDATEa row, the subsequentSELECTstatement will immediately see the latest result. There is no synchronization delay. - Zero Operational Cost: You don't need to set up an extra server, spend more RAM on a new service, or worry about maintaining the connection between two systems.
- Relational Support: You can easily filter by
user_id,status = 'published', orJOINwith thecategoriestable in a single query.
2. Elasticsearch: When Search becomes the soul of the product
If MariaDB is an included "feature", then Elasticsearch is a specialized "platform". Built on top of Apache Lucene, ES doesn't see data in terms of rows/columns, but as Documents (JSON).
2.1. Distributed Architecture and High Availability
ES is designed to run on a cluster of multiple Nodes. Data is divided into Shards and backed up across Replicas.
2.2. Analysis Capabilities (Analysis Pipeline)
What makes ES "intelligent" lies in its Analysis process:
- Character Filters: Remove HTML characters, convert special characters.
- Tokenizer: Splits a string into words (can use a Vietnamese Plugin to tokenize compound words like "học sinh" instead of "học" and "sinh").
- Token Filters: Convert to lowercase, handle synonyms, or create N-grams to support Autocomplete.
2.3. Advanced Scoring with the BM25 algorithm
Unlike MariaDB which uses a simple TF-IDF, ES uses BM25. It better handles the "term frequency saturation" problem (a term appearing too many times doesn't necessarily mean the document is significantly more relevant).
2.4. Code Example: Extremely flexible Query DSL
Let's see how ES handles a complex search request (Multi-match with weighting and fuzzy search):
GET /blog_index/_search
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "Backnd tối ưu",
"fields": ["title^5", "summary^2", "content"],
"fuzziness": "AUTO", -- Automatically corrects typos (Backnd -> Backend)
"operator": "and"
}
}
],
"filter": [
{ "term": { "status": "active" } }
]
}
},
"highlight": {
"fields": { "content": {} } -- Returns the text snippet containing the keyword for highlighting
}
}
3. Head-to-Head Comparison: Which one is for you?
| Criteria | MariaDB FTS | Elasticsearch |
|---|---|---|
| Data Freshness | Real-time (Immediate) | Near Real-time (~1s delay) |
| Typo Tolerance (Fuzzy) | Very limited (requires manual Trigram/Levenshtein) | Natively supported (Fuzziness, Suggest) |
| Speed with Large Data | Slows down as data exceeds > 10GB | Maintains excellent speed due to distribution |
| Language Processing | Basic | Very powerful (Plugins for Vietnamese, Japanese, Korean...) |
| Analytics | Difficult | Very powerful (Aggregation, Histogram) |
| RAM Usage | Economical | Very high (Requires at least 4-8GB to run stably) |
4. In-depth Analysis: When should you switch systems?
This is the most crucial part for Senior/Lead Developers when designing architecture. Adding Elasticsearch to a system is not just about "installing another app"; it's about changing the entire data flow.
4.1. Scenario 1: Stick with MariaDB (Keep it Simple)
You should continue using MariaDB if:
- Internal Admin Systems (Back-office): Where employees only need to find an exact order code or customer name.
- Highly Structured Data: You need to perform many calculations and filter by complex logical conditions where search results are only a small part.
- Limited Resources: Your team doesn't have a dedicated person for DevOps/Infrastructure. Operating a failing ES Cluster (OOM, Red Cluster) will put immense pressure on the Backend team.
4.2. Scenario 2: When Elasticsearch is a must
You must consider ES when:
- Search is the "face" of your product: For example, an e-commerce site or a food discovery app. If a user types "piza" and doesn't get "pizza", you lose a customer.
- Massive Log Data: You need to search and analyze billions of log lines per day.
- Advanced Autocomplete Needs: As the user types each character, the system must suggest results instantly with a latency of < 50ms.
4.3. The pain called "Data Sync"
When using ES, you must face the problem: How to keep the data in ES consistent with MariaDB?
- Dual Write: The application writes to both places. Disadvantage: Prone to inconsistency if one side fails.
- Delayed Job: Write to the DB, then push a job to a Queue to update ES. Disadvantage: Has a noticeable delay.
- CDC (Change Data Capture): Use a tool like Debezium to listen to MariaDB's Binlog and automatically push changes to ES. This is the most professional solution but also the most difficult to set up.
5. Conclusion: "Boring Architecture" is often the right choice
As Backend Developers, we are often drawn to shiny new technologies. However, a good architecture is not the one with the most technologies, but the one that solves the problem at the lowest cost.
- Advice from experience: Start with MariaDB Full-text Search. Optimize indexes, fine-tune queries, and use a Cache (Redis) to reduce the load on the DB. Only when you feel that MariaDB has truly become a "bottleneck" in terms of both performance and features, should you begin your journey with Elasticsearch.