During our November 26, 2025, Office Hours livestream, a viewer brought up an excellent question. What does the alphabet soup of UTF-8, UTF8MB3, UTF8MB4, etc. mean? It really does look like a jumble of technical jargon at first glance, but it’s actually straightforward to understand.
These names describe character sets, which determine how your database stores text and which characters it can hold. If you have ever encountered those acronyms in phpMyAdmin or app configs and wondered what they mean, then look no further. We’ll break it all down in this post.
The main question
During the livestream, a viewer asked Nathan this question:
“UTF8 is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in the future. Please consider using UTF8MB4 in order to be unambiguous. Are you familiar with it? Best practices?”
As Nathan aptly put it, it’s alphabet soup. It means nothing to the everyday person. However, it’s a crucial piece of how computers, and consequently databases, understand and store text.
What is UTF-8?
Before we dive deeper into the topic, we must first understand what UTF-8 means. The dash is essential.
UTF-8 is a character encoding: a standardized way of storing the letters, numbers, punctuation, and symbols defined by Unicode inside a computer. That matters because computers don’t store letters as letters.
For example, “Hello” isn’t stored as H-E-L-L-O. A computer doesn’t understand letters the same way we do. However, it understands numbers. Because of that, it needs a character set like UTF-8 to store text.
The character set determines which numbers correspond to which characters. To picture it, think of a giant grid of every character in the Unicode standard. Letters, numbers, punctuation, symbols, and even emojis each have their own numerical value.
So, from the computer’s point of view, “Hello” looks like this in Unicode: U+0048 U+0065 U+006C U+006C U+006F.
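To make that concrete, here is a minimal Python sketch (Python is our choice for illustration, not something from the livestream) that prints the Unicode numbers for “Hello” and the UTF-8 bytes that actually get stored.

```python
text = "Hello"

# Unicode code points: the number assigned to each character.
code_points = [f"U+{ord(ch):04X}" for ch in text]
print(" ".join(code_points))  # U+0048 U+0065 U+006C U+006C U+006F

# UTF-8 bytes: how those numbers are written to disk or sent over the wire.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes.hex(" "))  # 48 65 6c 6c 6f
```

For plain English letters, the code point and the UTF-8 byte happen to be the same number, which is why each letter takes only one byte.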
What is UTF8MB4 then?
On the other hand, you have UTF8MB4, without the dash. You have most likely seen this spelled with lowercase letters in phpMyAdmin, but for clarity, we’ll continue using uppercase throughout this post.
That’s important because there is a significant difference between UTF-8 and UTF8MB4. UTF-8 is the character encoding standard, defined by Unicode, and supports 1-4 bytes per character. It also supports all Unicode characters, including emojis and rare scripts. It’s the standard used across browsers, files, APIs, operating systems, and so on.
However, UTF8MB4 is not the latest version of UTF-8. We understand why it may seem that way. Instead, UTF8MB4 is MySQL’s latest version of its implementation of the standard.
In simpler terms, when MySQL first added UTF-8 support, it didn’t go with the full version, so to speak. Instead, it implemented only the 1-3 byte portion of UTF-8 (now called UTF8MB3), leaving out the 4-byte range. That left the database unable to store any character that needs four bytes, which includes most emojis.
That is, until UTF8MB4 came along, and MySQL databases could finally store up to 4 bytes per character and fully utilize the Unicode standard.
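The byte counts are easy to check for yourself. The sketch below is illustrative Python, not MySQL code, and the helper names are ours. It measures how many UTF-8 bytes a few sample characters need and whether they would fit within the old 3-byte limit.

```python
# Sample characters written as escapes so the exact code points are unambiguous.
samples = [
    "A",           # Latin letter
    "\u00e9",      # e with acute accent
    "\u20ac",      # euro sign
    "\U0001F600",  # grinning face emoji
]

def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))

def fits_in_utf8mb3(text: str) -> bool:
    """True if every character needs at most 3 bytes in UTF-8."""
    return all(utf8_length(ch) <= 3 for ch in text)

for ch in samples:
    print(f"{ch!r}: {utf8_length(ch)} byte(s), fits in UTF8MB3: {fits_in_utf8mb3(ch)}")
```

The first three characters need 1, 2, and 3 bytes respectively. The emoji needs 4, which is exactly the range UTF8MB3 leaves out.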
Why does using UTF8MB4 matter?
After all that technical explanation, the reason UTF8MB4 is important is very simple. If your MySQL database doesn’t use it, your website may not be able to show certain characters. Have you ever been on a site and encountered empty boxes or rows of question marks where content should be? That’s most likely because the site’s database couldn’t recognize, and therefore couldn’t store or serve, those characters. It’s like when you are missing an emoji pack on your phone.
But why should you bother if your site is purely in English and doesn’t support emojis, for example? Because sooner or later, someone is going to paste in a symbol, and it will appear as corrupted text instead. That looks neither professional nor pretty.
Furthermore, UTF8MB4 supports all languages, is future-proof, and works great with many of the most popular modern frameworks (WordPress, Laravel, Drupal, etc.).
The alphabet soup is telling you how to improve your site
In the end, switching to UTF8MB4 in your database is a good idea. You can check whether your database uses UTF8MB4 by going to phpMyAdmin (or whichever management tool you use) and looking at each table’s collation. If it starts with utf8mb4 (for example, utf8mb4_unicode_ci), then you are all set.
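If you have many tables, clicking through them one by one gets tedious. Here is a rough sketch of how the check could be automated. The query reads MySQL’s information_schema.TABLES view, while the database name, the sample rows, and the helper functions are hypothetical placeholders for illustration.

```python
# A query you could paste into phpMyAdmin's SQL tab or any MySQL client.
# 'your_database' is a placeholder; replace it with your own schema name.
TABLE_COLLATION_QUERY = """
SELECT TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database';
"""

def charset_from_collation(collation: str) -> str:
    """Collation names start with their character set, e.g. 'utf8mb4_unicode_ci'."""
    return collation.split("_", 1)[0]

def tables_needing_upgrade(rows):
    """Return the names of tables whose collation is not utf8mb4-based.

    Rows are (table_name, collation) pairs; views report no collation and are skipped.
    """
    return [
        name
        for name, collation in rows
        if collation and charset_from_collation(collation) != "utf8mb4"
    ]

# Hypothetical result rows, shaped like the output of the query above.
sample_rows = [
    ("wp_posts", "utf8mb4_unicode_ci"),
    ("wp_old_plugin", "utf8_general_ci"),
    ("legacy_log", "latin1_swedish_ci"),
]

print(tables_needing_upgrade(sample_rows))  # ['wp_old_plugin', 'legacy_log']
```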
Depending on your setup, you might need to do things differently, but upgrading will only benefit your website. Of course, it is a change to your database, so always back it up beforehand.
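One common way to upgrade a table is MySQL’s ALTER TABLE ... CONVERT TO CHARACTER SET statement. The sketch below only builds the statements as text so you can review them before running anything. The table names are hypothetical, the default collation is just one reasonable choice, and your platform may already provide its own upgrade routine.

```python
def conversion_statement(table: str, collation: str = "utf8mb4_unicode_ci") -> str:
    """Build (but do not run) a statement converting one table to utf8mb4."""
    if not collation.startswith("utf8mb4_"):
        raise ValueError("collation must belong to the utf8mb4 character set")
    # Quote the table name with backticks, doubling any backtick inside it.
    quoted = "`" + table.replace("`", "``") + "`"
    return f"ALTER TABLE {quoted} CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation};"

# Hypothetical table names; review the output and back up before running it.
for table in ["wp_old_plugin", "legacy_log"]:
    print(conversion_statement(table))
```

Generating the statements first, rather than executing them directly, gives you a chance to test them on a staging copy.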
Switching ensures your site can store and display all modern characters, avoids database errors related to them, and handles all manner of scripts without breaking. It’s a small thing with a huge impact.
And if you have a similar question, or any question regarding running an agency, a tricky client issue, or hosting, register for Office Hours and have it answered live.
FAQ
What is "collation," and how does it relate to the character set I choose?
Character set and collation are related but distinct settings that are often confused.
The character set (e.g., utf8mb4) defines which characters can be stored. Collation defines how those characters are compared and sorted. For example, whether "é" and "e" are treated as equal in a search, or whether comparisons are case-sensitive.
For utf8mb4, a commonly recommended collation is utf8mb4_unicode_ci, although MySQL 8.0 defaults to the newer utf8mb4_0900_ai_ci. Getting the collation wrong can cause search queries to return unexpected results or sort content in the wrong order, even if all the characters display correctly.
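To get a feel for the difference, here is a loose Python analogy. It is not MySQL’s actual collation algorithm; it only imitates the general idea of an accent-insensitive, case-insensitive comparison versus a strict one, and the function names are ours.

```python
import unicodedata

def binary_equal(a: str, b: str) -> bool:
    """Strict comparison: every character must match exactly."""
    return a == b

def loose_key(text: str) -> str:
    """Strip accents and ignore case, roughly like an accent- and case-insensitive collation."""
    decomposed = unicodedata.normalize("NFD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.casefold()

def loose_equal(a: str, b: str) -> bool:
    return loose_key(a) == loose_key(b)

print(binary_equal("caf\u00e9", "CAFE"))  # False
print(loose_equal("caf\u00e9", "CAFE"))   # True

words = ["zebra", "\u00e9clair", "apple"]
print(sorted(words))                 # strict order puts the accented word last
print(sorted(words, key=loose_key))  # loose order sorts it between the other two
```

The same stored text gives different search and sort results depending only on the comparison rules, which is exactly what a collation controls.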
Can switching to UTF8MB4 affect database performance or storage size?
For most WordPress sites, the performance difference is negligible, but worth understanding. Because UTF8MB4 allows up to 4 bytes per character (versus the 3-byte cap of MySQL's old utf8/UTF8MB3), indexed string columns can take up slightly more space.
On older MySQL versions, this could require enabling the innodb_large_prefix setting to avoid index length errors during migration. On modern MySQL (5.7.7+) and current MariaDB releases, the larger index prefix is enabled by default.
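The arithmetic behind those index length errors is simple. The sketch below assumes the 767-byte index prefix limit that applied to older InnoDB row formats and a typical VARCHAR(255) column. The variable names are ours.

```python
# Index prefix limit on older InnoDB setups (COMPACT/REDUNDANT row formats).
OLD_INDEX_LIMIT_BYTES = 767

COLUMN_LENGTH_CHARS = 255  # a typical VARCHAR(255) column

# MySQL reserves the maximum bytes per character when sizing an index.
utf8mb3_index_bytes = COLUMN_LENGTH_CHARS * 3
utf8mb4_index_bytes = COLUMN_LENGTH_CHARS * 4

print(utf8mb3_index_bytes, utf8mb3_index_bytes <= OLD_INDEX_LIMIT_BYTES)  # 765 True
print(utf8mb4_index_bytes, utf8mb4_index_bytes <= OLD_INDEX_LIMIT_BYTES)  # 1020 False

# Longest utf8mb4 column that still fits under the old limit.
max_indexable_chars = OLD_INDEX_LIMIT_BYTES // 4
print(max_indexable_chars)  # 191
```

That 191 figure is why you may see indexed columns declared as VARCHAR(191) in schemas that were designed to work on older servers.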
If my site already displays content fine, does that mean my database is already using UTF8MB4?
Not necessarily. A site can appear to display content correctly even with the older UTF8MB3 encoding, as long as no one has ever submitted a 4-byte character.
That includes most emojis and certain less common scripts. The problem only surfaces when that first 4-byte character is inserted, at which point MySQL either silently truncates the data or throws an error, depending on its configuration. The absence of visible problems today is not a reliable indicator, so it’s worth checking proactively in phpMyAdmin.
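Here is a rough Python simulation of those two outcomes. It is not MySQL code; the function name and the strict flag are hypothetical stand-ins for the server's configuration, and the truncation rule (cutting the text at the first 4-byte character) is our simplified model of the behavior.

```python
def simulate_utf8mb3_insert(text: str, strict: bool) -> str:
    """Imitate storing text in a 3-byte-only column.

    Characters above U+FFFF are the ones that need 4 bytes in UTF-8.
    """
    for position, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            if strict:
                raise ValueError(f"Incorrect string value at position {position}")
            # Non-strict: keep only what came before the unsupported character.
            return text[:position]
    return text

comment = "Thanks for the help \U0001F600 see you next week"

print(repr(simulate_utf8mb3_insert(comment, strict=False)))  # 'Thanks for the help '

try:
    simulate_utf8mb3_insert(comment, strict=True)
except ValueError as error:
    print("Rejected:", error)
```

In the non-strict case, everything after the emoji is lost without any error reaching the visitor, which is why the problem can go unnoticed for a long time.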
Does the encoding need to match at every level, or is it enough to just set it on the database?
It needs to match at every level of the stack: the database, the individual tables, the individual columns, and the connection between your application and the database.
A common gotcha is updating the database-level default to utf8mb4 but leaving older tables or columns in their original encoding. When there's a mismatch, MySQL will attempt to convert characters on the fly, which can introduce encoding errors or question marks even when the underlying data is fine.
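The effect of a mismatch is easy to reproduce outside of MySQL. The Python sketch below is only an analogy for what happens when one layer of the stack speaks a different encoding than another. It uses Python's own codecs, not a database connection.

```python
original = "caf\u00e9 \U0001F600"  # "café" followed by an emoji

# Case 1: the receiving side cannot represent a character, so it substitutes "?".
as_latin1 = original.encode("latin-1", errors="replace")
print(as_latin1)  # b'caf\xe9 ?'

# Case 2: correct UTF-8 bytes are read back with the wrong encoding.
misread = "caf\u00e9".encode("utf-8").decode("latin-1")
print(misread)  # cafÃ©

# Case 3: both sides agree on UTF-8, and the text survives the round trip.
round_trip = original.encode("utf-8").decode("utf-8")
print(round_trip == original)  # True
```

In MySQL, the connection's character set is usually set through the client library's charset option or with a SET NAMES utf8mb4 statement, so that is the layer to check if the tables are already correct but question marks still appear.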