You’ll receive an email confirming your submission.
Our team will contact you within 24–72 hours, depending on the complexity of your request.
By submitting, you agree to our [Privacy Policy] and consent to receive updates or consultation support from Open Reach Tech.
Please select the privacy consent checkbox.

components..title

components..description

components..title

components..description

You’ll receive an email confirming your submission.
Our team will contact you within 24–72 hours, depending on the complexity of your request.
By submitting, you agree to our [Privacy Policy] and consent to receive updates or consultation support from Open Reach Tech.
Please select the privacy consent checkbox.

MariaDB FTS vs. Elasticsearch: Choosing Your Backend Search Solution

In the development journey of a Backend Developer, the "Search" problem is always a significant milestone. Everything usually starts with a simple LIKE %keyword% statement. But when the data grows to millions of records and users start complaining, "Why can't the system find anything when I misspell one letter?", that's when you need a more professional solution.

Banner of MariaDB FTS vs. Elasticsearch: Choosing Your Backend Search Solution

MariaDB Full-Text Search vs. Elasticsearch: When to go "All-in"?

In the development journey of a Backend Developer, the "Search" problem is always a significant milestone. Everything usually starts with a simple LIKE %keyword% statement. But when the data grows to millions of records and users start complaining, "Why can't the system find anything when I misspell one letter?", that's when you need a more professional solution.

The question is: Should you leverage MariaDB's built-in Full-text Search (FTS) or accept the complexity of implementing Elasticsearch?

This article will delve into the architecture, pros and cons, and real-world scenarios to help you make the most informed choice.


1. MariaDB Full-Text Search: The Power of Simplicity

Many people often underestimate the search capabilities of relational databases (RDBMS). In fact, since version 10.0.5+, MariaDB has supported extremely powerful Full-text Search on the InnoDB engine (which supports ACID).

1.1. How it works: Inverted Index in SQL

Instead of scanning every row of data (Table Scan), MariaDB creates an Inverted Index. It breaks text strings into individual words (tokens), removes meaningless words (stop-words), and stores their positions.

1.2. Search Modes

MariaDB provides three main modes that you need to master:

  1. Natural Language Mode: The default mode. It calculates relevance based on the frequency of the keyword's appearance.
  2. Boolean Mode: Allows the use of operators like + (must be present), - (must not be present), * (wildcard).
  3. Query Expansion: Performs a broader search based on related words (useful when the dataset is small).

1.3. Code Example: Practical Implementation

Suppose you have an articles table with millions of posts:

-- 1. Create a Full-text Index
ALTER TABLE articles ADD FULLTEXT(title, content);

-- 2. Advanced search with Boolean Mode
-- Find articles that must contain 'Backend', may contain 'Nodejs', but MUST NOT contain 'PHP'
SELECT id, title, 
       MATCH(title, content) AGAINST('+Backend Nodejs -PHP' IN BOOLEAN MODE) AS score
FROM articles
WHERE MATCH(title, content) AGAINST('+Backend Nodejs -PHP' IN BOOLEAN MODE)
ORDER BY score DESC;

1.4. The "unbeatable" advantages of MariaDB

  • Data Consistency (ACID): This is its greatest strength. When you UPDATE a row, the subsequent SELECT statement will immediately see the latest result. There is no synchronization delay.
  • Zero Operational Cost: You don't need to set up an extra server, spend more RAM on a new service, or worry about maintaining the connection between two systems.
  • Relational Support: You can easily filter by user_id, status = 'published', or JOIN with the categories table in a single query.

2. Elasticsearch: When Search becomes the soul of the product

If MariaDB is an included "feature", then Elasticsearch is a specialized "platform". Built on top of Apache Lucene, ES doesn't see data in terms of rows/columns, but as Documents (JSON).

2.1. Distributed Architecture and High Availability

ES is designed to run on a cluster of multiple Nodes. Data is divided into Shards and backed up across Replicas.

2.2. Analysis Capabilities (Analysis Pipeline)

What makes ES "intelligent" lies in its Analysis process:

  1. Character Filters: Remove HTML characters, convert special characters.
  2. Tokenizer: Splits a string into words (can use a Vietnamese Plugin to tokenize compound words like "học sinh" instead of "học" and "sinh").
  3. Token Filters: Convert to lowercase, handle synonyms, or create N-grams to support Autocomplete.

2.3. Advanced Scoring with the BM25 algorithm

Unlike MariaDB which uses a simple TF-IDF, ES uses BM25. It better handles the "term frequency saturation" problem (a term appearing too many times doesn't necessarily mean the document is significantly more relevant).

2.4. Code Example: Extremely flexible Query DSL

Let's see how ES handles a complex search request (Multi-match with weighting and fuzzy search):

GET /blog_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "Backnd tối ưu", 
            "fields": ["title^5", "summary^2", "content"],
            "fuzziness": "AUTO", -- Automatically corrects typos (Backnd -> Backend)
            "operator": "and"
          }
        }
      ],
      "filter": [
        { "term": { "status": "active" } }
      ]
    }
  },
  "highlight": {
    "fields": { "content": {} } -- Returns the text snippet containing the keyword for highlighting
  }
}

3. Head-to-Head Comparison: Which one is for you?

CriteriaMariaDB FTSElasticsearch
Data FreshnessReal-time (Immediate)Near Real-time (~1s delay)
Typo Tolerance (Fuzzy)Very limited (requires manual Trigram/Levenshtein)Natively supported (Fuzziness, Suggest)
Speed with Large DataSlows down as data exceeds > 10GBMaintains excellent speed due to distribution
Language ProcessingBasicVery powerful (Plugins for Vietnamese, Japanese, Korean...)
AnalyticsDifficultVery powerful (Aggregation, Histogram)
RAM UsageEconomicalVery high (Requires at least 4-8GB to run stably)

4. In-depth Analysis: When should you switch systems?

This is the most crucial part for Senior/Lead Developers when designing architecture. Adding Elasticsearch to a system is not just about "installing another app"; it's about changing the entire data flow.

4.1. Scenario 1: Stick with MariaDB (Keep it Simple)

You should continue using MariaDB if:

  • Internal Admin Systems (Back-office): Where employees only need to find an exact order code or customer name.
  • Highly Structured Data: You need to perform many calculations and filter by complex logical conditions where search results are only a small part.
  • Limited Resources: Your team doesn't have a dedicated person for DevOps/Infrastructure. Operating a failing ES Cluster (OOM, Red Cluster) will put immense pressure on the Backend team.

4.2. Scenario 2: When Elasticsearch is a must

You must consider ES when:

  • Search is the "face" of your product: For example, an e-commerce site or a food discovery app. If a user types "piza" and doesn't get "pizza", you lose a customer.
  • Massive Log Data: You need to search and analyze billions of log lines per day.
  • Advanced Autocomplete Needs: As the user types each character, the system must suggest results instantly with a latency of < 50ms.

4.3. The pain called "Data Sync"

When using ES, you must face the problem: How to keep the data in ES consistent with MariaDB?

  1. Dual Write: The application writes to both places. Disadvantage: Prone to inconsistency if one side fails.
  2. Delayed Job: Write to the DB, then push a job to a Queue to update ES. Disadvantage: Has a noticeable delay.
  3. CDC (Change Data Capture): Use a tool like Debezium to listen to MariaDB's Binlog and automatically push changes to ES. This is the most professional solution but also the most difficult to set up.

5. Conclusion: "Boring Architecture" is often the right choice

As Backend Developers, we are often drawn to shiny new technologies. However, a good architecture is not the one with the most technologies, but the one that solves the problem at the lowest cost.

  • Advice from experience: Start with MariaDB Full-text Search. Optimize indexes, fine-tune queries, and use a Cache (Redis) to reduce the load on the DB. Only when you feel that MariaDB has truly become a "bottleneck" in terms of both performance and features, should you begin your journey with Elasticsearch.