Tag sequence - the mechanism behind sub-national flag emoji

Last updated: 2026-05-13·~4 min

🏴󠁧󠁢󠁥󠁮󠁧󠁿 (England) is not a single character. It's a black flag base, five invisible tag characters spelling "gbeng," and a cancel-tag terminator. Seven codepoints for one flag.

Tag sequences are the Unicode mechanism for representing flags below the country level - regions, states, and constituent countries within larger nations. They're more complex than the Regional Indicator system used for country flags, and that complexity is one of the main reasons support is patchy across platforms.

Definition

A tag sequence is a Unicode emoji construction that uses Tag Characters (U+E0020 to U+E007F) to encode an identifier alongside a base emoji. The general structure is: base emoji + sequence of tag characters + cancel tag (U+E007F). For sub-national flags, the base emoji is the black flag (🏴 U+1F3F4) and the tag characters spell out an ISO 3166-2 subdivision code in lowercase with the hyphen removed.

The structure of a sub-national flag

Take the England flag 🏴󠁧󠁢󠁥󠁮󠁧󠁿 as an example. It's composed of:

  1. 🏴 (U+1F3F4) - black flag base
  2. Tag g (U+E0067) - lowercase "g" as a tag character
  3. Tag b (U+E0062) - lowercase "b"
  4. Tag e (U+E0065) - lowercase "e"
  5. Tag n (U+E006E) - lowercase "n"
  6. Tag g (U+E0067) - lowercase "g"
  7. Cancel tag (U+E007F) - terminator

The five tag characters between the black flag and the cancel tag spell "gbeng," the tag form of England's ISO 3166-2 code, GB-ENG. "GB" is the country (United Kingdom) and "ENG" is the subdivision. Combined, they identify "England within the UK."
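The construction above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `subdivision_flag` is hypothetical, and it assumes each tag character is simply the matching ASCII character offset by 0xE0000, which is how the Tags block is laid out.

```python
BLACK_FLAG = "\U0001F3F4"  # U+1F3F4, the black flag base
CANCEL_TAG = "\U000E007F"  # U+E007F, the cancel-tag terminator
TAG_OFFSET = 0xE0000       # tag character = ASCII code point + 0xE0000


def subdivision_flag(code: str) -> str:
    """Build a flag tag sequence from a code such as 'gbeng' or 'GB-ENG'.

    Hypothetical helper for illustration only.
    """
    code = code.lower().replace("-", "")
    if not code.isascii() or not code.isalnum():
        raise ValueError(f"not a plausible subdivision code: {code!r}")
    tags = "".join(chr(TAG_OFFSET + ord(ch)) for ch in code)
    return BLACK_FLAG + tags + CANCEL_TAG


england = subdivision_flag("GB-ENG")
print([f"U+{ord(ch):04X}" for ch in england])
# ['U+1F3F4', 'U+E0067', 'U+E0062', 'U+E0065', 'U+E006E', 'U+E0067', 'U+E007F']
```

The printed list matches the seven entries in the breakdown: one base, five tag letters, one cancel tag.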

Why this design

The Regional Indicator system used for country flags works only for ISO 3166-1 country codes (A-Z pairs). Sub-national entities don't fit that two-letter scheme. Unicode could have assigned a separate codepoint for each subdivision flag, but there are thousands of subdivisions globally, and dedicating codepoints to each would consume the supplementary plane quickly. Tag sequences solve this by encoding the identifier as a string of invisible tag characters.

The trade-off is complexity: each flag is 6+ codepoints instead of 2, and platforms must implement both the Regional Indicator and tag sequence mechanisms to support all flags.
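To make the size difference concrete, the sketch below builds a country flag from Regional Indicators and a subdivision flag from tag characters, then compares their lengths. The helper names `country_flag` and `subdivision_flag` are hypothetical; the only facts assumed are that Regional Indicator letters start at U+1F1E6 for "A" and that tag characters sit at ASCII + 0xE0000.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def country_flag(alpha2: str) -> str:
    """Two Regional Indicator symbols for a two-letter country code (illustrative helper)."""
    alpha2 = alpha2.upper()
    if len(alpha2) != 2 or not alpha2.isascii() or not alpha2.isalpha():
        raise ValueError(f"expected a two-letter code, got {alpha2!r}")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in alpha2)


def subdivision_flag(code: str) -> str:
    """Black flag + tag characters + cancel tag (illustrative helper)."""
    tags = "".join(chr(0xE0000 + ord(c)) for c in code.lower())
    return "\U0001F3F4" + tags + "\U000E007F"


uk = country_flag("GB")
england = subdivision_flag("gbeng")
print(len(uk), len(england))  # 2 7
```

Python's `len` counts codepoints, so the two numbers are the per-flag codepoint counts for each mechanism.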

Officially supported flags

Unicode's Recommended for General Interchange (RGI) list is conservative: only flags meeting widespread support criteria are recommended. Currently, only three subdivision flags are RGI:

  • 🏴󠁧󠁢󠁥󠁮󠁧󠁿 - England (gbeng)
  • 🏴󠁧󠁢󠁳󠁣󠁴󠁿 - Scotland (gbsct)
  • 🏴󠁧󠁢󠁷󠁬󠁳󠁿 - Wales (gbwls)

Other subdivisions (US states, Canadian provinces, German Länder, Japanese prefectures, etc.) can technically be encoded as tag sequences, but they aren't RGI and aren't supported by major platforms. Sending one to most users will display as a bare black flag, in some renderers followed by tofu boxes for the tag characters.
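A program that receives such a string can decode the tag characters back into a subdivision code and check it against the three RGI entries before deciding how to display it. The following is a rough sketch under stated assumptions: the names `parse_subdivision_flag` and `is_rgi_subdivision_flag` are hypothetical, the RGI set is hard-coded from the list above, and only tag digits and tag lowercase letters are accepted.

```python
BLACK_FLAG = "\U0001F3F4"
CANCEL_TAG = "\U000E007F"
TAG_OFFSET = 0xE0000
RGI_SUBDIVISIONS = {"gbeng", "gbsct", "gbwls"}  # taken from the list above


def parse_subdivision_flag(text: str):
    """Return the subdivision code if text is exactly one flag tag sequence, else None."""
    if len(text) < 3 or text[0] != BLACK_FLAG or text[-1] != CANCEL_TAG:
        return None
    letters = []
    for ch in text[1:-1]:
        ascii_cp = ord(ch) - TAG_OFFSET
        # Accept only tag digits (0-9) and tag lowercase letters (a-z).
        if not (0x30 <= ascii_cp <= 0x39 or 0x61 <= ascii_cp <= 0x7A):
            return None
        letters.append(chr(ascii_cp))
    return "".join(letters)


def is_rgi_subdivision_flag(text: str) -> bool:
    return parse_subdivision_flag(text) in RGI_SUBDIVISIONS


england = "\U0001F3F4\U000E0067\U000E0062\U000E0065\U000E006E\U000E0067\U000E007F"
# "usca" (California) is a well-formed sequence but not RGI.
california = "\U0001F3F4\U000E0075\U000E0073\U000E0063\U000E0061\U000E007F"

print(parse_subdivision_flag(england), is_rgi_subdivision_flag(england))        # gbeng True
print(parse_subdivision_flag(california), is_rgi_subdivision_flag(california))  # usca False
```

A check like this lets an application substitute a text label for non-RGI sequences instead of relying on the platform's fallback rendering.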

Why support is patchy

  • Implementation complexity: Vendors must parse multi-codepoint sequences and look up subdivision codes
  • Asset proliferation: Each supported flag needs its own glyph asset; supporting hundreds of subdivisions is expensive
  • Political sensitivity: Recognizing a subdivision flag can be politically charged; vendors are cautious
  • Limited demand: Most users don't request sub-national flags; the ROI for vendors is low

The result is that even on modern Apple, Google, Samsung, and Microsoft devices, only the three RGI subdivisions reliably render. Other tag sequences typically fall back to the black flag glyph, sometimes followed by visible squares for the tag characters.

Tag characters: a brief history

Tag Characters (the Tags block, U+E0000 to U+E007F) were originally added in Unicode 3.1 for "language tagging" - a now-deprecated mechanism for marking language transitions in plain text. They sat largely unused for over a decade before being repurposed for emoji tag sequences, with the three subdivision flags arriving in Emoji 5.0 (2017). This repurposing is why the codepoint range looks somewhat alien for emoji use; it's the legacy of an older feature that found new life.

Practical considerations

  • Don't rely on subdivision flags outside the three RGI ones: rendering will fail on most devices
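  • Length limits are hit quickly: each RGI subdivision flag is 7 codepoints, which is 14 UTF-16 code units or 28 UTF-8 bytes, because every codepoint in the sequence lies outside the Basic Multilingual Plane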
  • Copy-paste survives: tag sequences are part of standard Unicode and survive between modern systems
  • Screen readers vary: announcement quality for subdivision flags ranges from "flag of England" (when supported) to "black flag" plus the tag letters spelled out individually
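
The length figures for a single flag can be checked directly. This is a small sketch using only Python's built-in string encoding; the variable names are illustrative.

```python
# England flag: black flag + tag g, b, e, n, g + cancel tag
england = "\U0001F3F4\U000E0067\U000E0062\U000E0065\U000E006E\U000E0067\U000E007F"

codepoints = len(england)                          # Python strings count codepoints
utf16_units = len(england.encode("utf-16-le")) // 2  # 2 bytes per UTF-16 code unit
utf8_bytes = len(england.encode("utf-8"))

print(codepoints, utf16_units, utf8_bytes)  # 7 14 28
```

Environments that measure strings in UTF-16 code units (JavaScript's `length`, for example) will therefore report 14 for what the user sees as one flag.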

Common misconceptions

  • ❌ "All sub-national flags are emoji" → ✅ Only England, Scotland, and Wales are RGI; others are technically encodable but not supported
  • ❌ "Tag sequences are the same as Regional Indicators" → ✅ Different mechanism; Regional Indicators are paired letters, tag sequences are longer strings
  • ❌ "Adding more subdivision flags is just a font asset issue" → ✅ Vendors weigh asset cost, political considerations, and user demand together

Related terms

  • Regional Indicator - the simpler flag mechanism for country-level flags
  • Codepoint - the unit each tag character occupies
  • ZWJ - the zero width joiner (U+200D), a different sequence mechanism that joins separate visible emoji into a single glyph rather than attaching invisible tag characters to a base
