MySQL and the use of Cache (NoSQL)

I recently did a survey to understand how people are using cache with databases—specifically with MySQL. The result, I confess, was very different from what I expected.

31% of MySQL users use MemCache. 19% use Query Cache, MySQL’s native cache. This means that 50% use some kind of cache, which makes a lot of sense to me.

A surprising shock was those who said they “don’t know what cache is for” and those who “can’t say if they use it or not.” We’re talking about 19% of the sample! Well, no one is born knowing this. I came to the conclusion that it would be really cool to ramp up the creation of articles on the subject.

Is cache for everyone? No, it’s not. Obviously, every type of application can benefit from a cache. However, sometimes the benefit brought by caching is so small that it’s not worth using, since some new lines of code would be needed in the application. Certainly, some will disagree with this paragraph. But after so many years of consulting, I’ve concluded that this is an inexorable truth.

For those 19% who don’t know what cache is, we need to avoid the (very common) confusion with buffer. A buffer is a volatile memory area intended to reduce latency problems between two resources. Getting boring? Don’t understand? I know fancy words only get in the way. So, let’s go. Which is faster: processor or disk? Processor, right? Good! Imagine the processor needs to process a large amount of data and write the result to disk. The processor can process data blocks much faster than the disk can write. Here come the buffers! Since the disk can’t keep up, the processor processes the data and wants to get rid of it to keep processing. So it sends it to the disk… actually, these data go to the disk controller buffers, and then to the disk’s own buffer. This exists to prevent the processor from stopping to wait for the slow disk to write and say, “OK, send me more data.” So the data sits in buffers waiting for the disk to ask for it. This works for reading and writing. I used the disk example, but buffers have many applications: write buffers, read buffers, disk buffers, network buffers, print buffers, etc. A buffer is always used with fast devices (RAM, flash, etc.) and can be implemented in hardware or software.

Cache and buffer have such subtle peculiarities and differences that they are often confused. Cache has exactly the same characteristics as a buffer. Both are temporary, volatile memory. The difference lies in their use.

While a buffer is mostly used during processing of information, cache is used to store the result of processing for a certain amount of time.

Examples of Buffers inside MySQL

In MySQL, there are global and session buffers. Typical examples of buffers implemented in software are these MySQL session buffers:

a) Read Buffer (variable read_buffer_size): A buffer used when a SELECT (read) is done sequentially, from the first record to the last record of a table;

b) Read Random Buffer (variable read_rnd_buffer_size): Another read buffer, but used when an index is used and the table rows will be read randomly;

c) Bulk Insert Buffer (variable bulk_insert_buffer_size): This one is a write buffer, unlike the previous two which are read buffers. This buffer is used by MySQL when a large number of INSERTs are demanded. Note: used by LOAD DATA command, mysqlimport (command line tool), and INSERT commands like INSERT…VALUES… (row 1), (row 2), (row n).

All these buffers exist because it takes much longer for MySQL to interact with storage devices than with RAM and the processor (latency).

Example of Cache inside MySQL:

Are we following so far? I hope so! In class, I like to thoroughly cover this topic because once understood, you never forget.

Cache (which is a first cousin to buffer) is used to store processing results so that these results can be made instantly available, very quickly, and with minimal overhead (processing cost).

Caches are usually NoSQL structures. Very simple. They store results as (key, result) pairs. Where:

– Key: Usually a hash (like MD5, for example), which uniquely identifies the stored result;

– Result: The data stored in the cache.

MySQL has a NoSQL Cache called Query Cache. It works as follows: every SELECT command that meets the Query Cache configuration criteria is stored inside the Query Cache. The configurations and possibilities regarding what is stored in the Query Cache or not, how to enable or disable Query Cache, how to purge expired data (invalidate data), are many and beyond the scope of this article.

What do we need to know about Query Cache? First, how it stores data. The SELECT to be stored is “hashed,” that is, the entire SELECT command, from FROM, JOIN, WHERE to LIMIT, is processed by an algorithm that converts the SQL line into a unique key. Then, the hash of the SELECT and its result are stored.

Imagine the SELECT * FROM countries_table. This SELECT, processed by the hash algorithm, will be transformed into 310905b644f0844c78037c104716e6bd, which is the key. This key is stored along with the SELECT result, which, consequently, is all data in the countries table.

Every identical SELECT sent to the server will not cause processing or disk access to retrieve data. The data stored in Query Cache will be returned. Since there is no disk, no processing, just a NoSQL search inside a CACHE, the result delivery is very… very fast and with a MySQL cost close to zero.

Cautions with Query Cache:

Hash for SELECT * FROM countries_table => 310905b644f0844c78037c104716e6bd

Hash for sELECT * FROM countries_table => 6ce0a48206e12beb563ad7f2f823fe4b

Note that changing a single letter from S to s already causes a different hash. This means a different key. Different keys, two records or entries in the Query Cache.

Query Cache is an LRU (Least Recently Used) cache. That is, when space needs to be freed for new entries, the cache evicts the least used, least read records. The algorithm for this is complex, intelligent, and very efficient, although it causes fragmentation inside the cache.

Another very smart, but also a curse of Query Cache, is how stored data invalidation occurs.

Imagine a SELECT on inventory was done. The result is Caipirinha = 10. So, there are 10 caipirinhas in stock. And beware, alcohol should never run out of stock! If that happens, it’s a stack overflow! Next, another identical SELECT is requested. The result returned from Cache is: Yes, we have 10 caipirinhas. But what if all were sold?

Thinking about this, every time a table undergoes UPDATE, DELETE, or even INSERT, all data from that table stored in the Query Cache are invalidated and will no longer be returned. Of course, the space is freed for new storage.

MemCache: The Best Cache and NoSQL for MySQL

MemCache is the external cache and also the natural NoSQL for MySQL. So much so that, in the survey I conducted, it’s worth remembering: 31% of MySQL users use MemCache.

It’s a cache implemented via software in RAM through a daemon. Extremely fast, efficient, robust, and on the road since 2003. Initially developed by Brad Fitzpatrick for LiveJournal. Today it has a very active community. MemCache is FOSS — Free Open Source Software. That means it’s free and open-source, feel free to jump in.

I’ve used MemCache since 2005, I’m part of the community (with very little contribution, mainly due to lack of time), and I can testify about it: It’s damn good! Oh, you dirty minds.

Currently, I have it installed at more than 50 clients. Some are very high-demand applications, and MemCache’s behavior is remarkable, sometimes even scary in its perfection.

Extremely simple to use, basically it has 2 commands:

– GET (key): Searches the cache for the given key. If it exists, returns the stored value (usually the result of a SELECT, if we’re talking about MySQL);

– SET (key, what_to_store, time_to_live): Stores in the cache. Key, in MySQL terms, is usually an MD5 of the entire SELECT command; what_to_store is the SELECT result, and time_to_live is how long this information should live.

This is the best Cache, and the best NoSQL I know. Simple, robust, fast, and can handle any abuse.

MemCache is such a serious Cache/NoSQL that starting from MySQL version 5.6, it will be incorporated into MySQL. Oracle wouldn’t do this if MemCache wasn’t truly serious.

Parallels between Query Cache and MemCache

– Both are NoSQL, yes sir. In Query Cache, this isn’t obvious because it’s automatic. Once configured, every SELECT (meeting the configuration criteria) is automatically stored in Query Cache. Before executing any SELECT on a table, it checks Query Cache if that SELECT was already executed and stored. All transparent to the developer.

– Query Cache is based on LRU, and cached data are automatically invalidated upon any table changes. This is perfect because it prevents serving outdated or incorrect cached information. The downside is that any table change, even if not related to cached rows, causes total invalidation of all Query Cache entries for that table. In other words, if Query Cache holds all sales for São Paulo state, hypothetically: WHERE state = ‘SP’, a new sale in Amazonas state will invalidate all São Paulo cache entries. Pretty bad, right? Not to mention it generates overhead.

– MemCache is based on TTL (time to live), which defines how long the stored content lives. For example, when you store something in MemCache, you define how long that data should live, its expiration time. Until expiration, the data is served. This means MemCache should NOT be used for inventory, transactions, etc. In other words, MemCache is only for static or rarely updated data.

– Query Cache only stores MySQL query (SELECT) results; MemCache can store literally any information in (key, value) NoSQL format. In MemCache you can store query results but also photos, images, PDFs, etc.

Cache: A Lifesaver for Databases in General?

I have clients running a single MySQL server with 28,000 queries per second, no problems, and no cache at all. None. This is a homemade ERP application with a very well-modeled database. That’s why I insist that the success or failure of an application starts way back, at entity definition and database modeling.

On the other hand, I have clients who can’t get past 500 queries per second without issues, often using Query Cache at least.

The truth behind this story is that Query Cache only makes sense for some tables in the database, those with very little updating. My biggest annoyance with Query Cache is that you can’t manage it properly. You can’t know, for example, which queries are inside. Even though I understand how it works, I still see it as a big, black, ugly Pandora’s box. Ultimately, it’s also a resource consumer for MySQL.

MemCache, and here I’m not defending it blindly, I like the fact that it’s external to MySQL and saves its life. I don’t like TTL, but truth be told, it’s fantastic when cache existence is defined along with the company’s business rules. MemCache should only be used for very static things. Some practical examples for MemCache use:

– Tables containing ZIP codes, neighborhoods, cities, states, countries, historical data, encyclopedias, D-1 data (data older than one day that can no longer be changed);

– Business can define that products, promotions, product categories are only available after 5 minutes, so these tables have TTL = 5 minutes;

– Business can define that e-commerce storefronts, user profile updates on social sites, are updated every minute, so TTL = 1 minute.

Uncle says:

Use Cache, Query Cache, MemCache, Jboss Cache, whatever. Use it! In applications with intense database use, caches can take care of MySQL’s performance and health.

Use them without any moderation!

Schedule a meeting here

Visit our Blog

Learn more about databases

Learn about monitoring with advanced tools

Have questions about our services? Visit our FAQ

Want to see how we’ve helped other companies? Check out what our clients say in these testimonials!

Discover the History of HTI Tecnologia

Institucional

Sustentação

Monitoramento

Consultoria

Contact

Siga-nos nas redes sociais

Acesse a página

Compromisso com a Inovação

Foco na Segurança

Parcerias Estratégicas

Soluções Personalizadas

Expertise Comprovada

Acesse a página

Acesse a página

Acesse a página

Sustentação

Monitoramento

Consultoria

Acesse a página

MySQL and the use of Cache (NoSQL)

Examples of Buffers inside MySQL

Compartilhar:

Institucional

Sustentação

Monitoramento

Consultoria

Contact

Siga-nos nas redes sociais

Quem somos

Acesse a página

Compromisso com a Inovação

Foco na Segurança

Parcerias Estratégicas

Soluções Personalizadas

Expertise Comprovada

Nossa História

Acesse a página

Blog

Acesse a página

FAQ

Acesse a página

Sustentação

Monitoramento

Consultoria

Testemunhos

Acesse a página