What is GSM-7 Character Encoding? SMS Character Limits Explained

Learn what GSM-7 encoding is, how it affects SMS character limits, and why understanding character encoding is essential for cost-effective text messaging.

Written by: Victor Villalobos | Reviewed by: Jennifer Villalobos | December 18, 2025 | 8 min read

GSM-7 is a character encoding standard used in SMS (Short Message Service) that allows up to 160 characters per message segment. Developed by the European Telecommunications Standards Institute (ETSI), GSM-7 is the default encoding for text messages and plays a crucial role in determining SMS costs and delivery.

Understanding GSM-7 Encoding

GSM-7 uses 7 bits to represent each character, which is where the name comes from. Packing characters into 7 bits, rather than the full 8-bit byte in which ASCII text is normally stored (and which UTF-8 uses as its minimum), lets more characters fit within the fixed payload of an SMS.

The Math Behind 160 Characters

An SMS message has a maximum payload of 140 bytes, or 1120 bits. With GSM-7 encoding:

```text
1120 bits ÷ 7 bits per character = 160 characters
```

This is why the standard SMS length is 160 characters when using GSM-7 compatible text.
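The arithmetic above depends on 7-bit values being packed back to back, with no spare bit per character. As a rough illustration, the sketch below packs septets into bytes. `packSeptets` is a hypothetical helper name, and the input is assumed to already be GSM-7 code values (which match ASCII for unaccented letters and digits).

```javascript
// Packs 7-bit values into bytes, least significant bits first.
function packSeptets(septets) {
  const octets = [];
  let buffer = 0; // bits waiting to be written
  let bits = 0;   // number of valid bits in buffer
  for (const septet of septets) {
    buffer |= (septet & 0x7f) << bits;
    bits += 7;
    while (bits >= 8) {
      octets.push(buffer & 0xff);
      buffer >>>= 8;
      bits -= 8;
    }
  }
  if (bits > 0) octets.push(buffer & 0xff); // final partial byte
  return octets;
}

// 160 septets fill exactly the 140-byte payload.
console.log(packSeptets(new Array(160).fill(0x41)).length); // 140
```

A 161st character would need a 141st byte, which is why the single-segment limit sits at exactly 160.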

The GSM-7 Character Set

The GSM-7 basic character set includes 128 characters optimized for Western European languages:

Standard Characters (7 bits each)

| Category | Characters |
| --- | --- |
| Uppercase letters | A-Z |
| Lowercase letters | a-z |
| Numbers | 0-9 |
| Common punctuation | . , : ; ! ? |
| Special characters | @ £ $ ¥ # % & |
| Symbols | ( ) < > = + - / |
| Whitespace | Space, newline |

Extended Characters (14 bits each)

Some characters require an escape sequence, effectively using 2 character slots:

| Character | Description | Counts as |
| --- | --- | --- |
| { | Left curly bracket | 2 characters |
| } | Right curly bracket | 2 characters |
| [ | Left square bracket | 2 characters |
| ] | Right square bracket | 2 characters |
| ~ | Tilde | 2 characters |
| \ | Backslash | 2 characters |
| ^ | Caret | 2 characters |
| € | Euro sign | 2 characters |
| \| | Pipe | 2 characters |

Important: Using extended characters reduces your effective message length. A message with 10 euro signs (€) uses 20 character slots, leaving only 140 for other text.
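To make the slot arithmetic concrete, here is a minimal sketch. It assumes the text is already GSM-7 compatible. `GSM7_EXTENSION` and `gsm7Slots` are hypothetical names, and the map also includes form feed, which belongs to the same extension table although it is rarely typed.

```javascript
// GSM-7 extension table: each character is sent as the escape septet 0x1B
// followed by the code below, so it occupies two 7-bit slots.
const GSM7_EXTENSION = new Map([
  ["\f", 0x0a], ["^", 0x14], ["{", 0x28], ["}", 0x29], ["\\", 0x2f],
  ["[", 0x3c], ["~", 0x3d], ["]", 0x3e], ["|", 0x40], ["€", 0x65],
]);

// Number of 7-bit slots a GSM-7-compatible string occupies.
function gsm7Slots(text) {
  let slots = 0;
  for (const ch of text) {
    slots += GSM7_EXTENSION.has(ch) ? 2 : 1;
  }
  return slots;
}

const euros = "€".repeat(10);
console.log(gsm7Slots(euros));       // 20
console.log(160 - gsm7Slots(euros)); // 140
```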

GSM-7 vs UCS-2 Encoding

When a message contains even one character outside the GSM-7 set, the entire message is sent as UCS-2 (a 16-bit Unicode encoding) instead:

| Encoding | Bits per character | Characters per SMS |
| --- | --- | --- |
| GSM-7 | 7 bits | 160 characters |
| UCS-2 | 16 bits | 70 characters |

Characters That Trigger UCS-2

Common characters that force UCS-2 encoding:

  • Emojis - All emojis require Unicode
  • Chinese, Japanese, Korean - CJK characters
  • Arabic, Hebrew - Right-to-left scripts
  • Cyrillic - Russian, Ukrainian, etc.
  • Smart quotes - “ ” ‘ ’ (curly quotes)
  • Special symbols - ™ © ® and many others
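A simple way to see which encoding a message will need is to check each character against the GSM-7 set. The sketch below is one possible approach, not a carrier-grade implementation. The names `GSM7_BASIC`, `GSM7_EXTENDED`, `GSM7_ALL`, and `detectEncoding` are made up for illustration, and national language shift tables are deliberately ignored.

```javascript
// GSM-7 basic character set, followed by the characters from the extension table.
const GSM7_BASIC =
  "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?" +
  "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà";
const GSM7_EXTENDED = "\f^{}\\[~]|€";
const GSM7_ALL = new Set(GSM7_BASIC + GSM7_EXTENDED);

// Returns "UCS-2" as soon as one character falls outside GSM-7.
function detectEncoding(text) {
  for (const ch of text) {
    if (!GSM7_ALL.has(ch)) return "UCS-2";
  }
  return "GSM-7";
}

console.log(detectEncoding("Your code is 1234")); // GSM-7
console.log(detectEncoding("Привет"));            // UCS-2
console.log(detectEncoding("It’s ready"));        // UCS-2 (curly apostrophe)
```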

The Cost Impact

Switching to UCS-2 can more than double your SMS costs:

Example: A 140-character message with one emoji
  • Without emoji (GSM-7): 1 SMS segment
  • With emoji (UCS-2): 3 SMS segments (the emoji takes 2 UCS-2 code units, so the message is 142 units; multipart UCS-2 segments hold 67 units each, and 142 ÷ 67 rounds up to 3)

This is why character encoding awareness is critical for SMS marketing campaigns.

Concatenated Messages (Long SMS)

When messages exceed the single-segment limit, they're split into multiple parts:

GSM-7 Concatenation

| Segments | Characters per segment | Total characters |
| --- | --- | --- |
| 1 | 160 | 160 |
| 2 | 153 | 306 |
| 3 | 153 | 459 |
| 4 | 153 | 612 |

The reduction to 153 characters occurs because 7 characters are reserved for the User Data Header (UDH), which tells the receiving device how to reassemble the message.
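For readers curious where the 7 characters go, the sketch below builds the common 6-byte concatenation header (the variant with an 8-bit reference number) and converts its size to septets. `buildConcatUdh` is a hypothetical helper, and messaging platforms normally generate this header for you.

```javascript
// Builds the 6-byte User Data Header used for concatenated SMS
// with an 8-bit reference number.
function buildConcatUdh(reference, totalParts, partNumber) {
  return Uint8Array.from([
    0x05,             // UDHL: five header bytes follow
    0x00,             // IEI: concatenated short message, 8-bit reference
    0x03,             // IEDL: three data bytes follow
    reference & 0xff, // same value in every part of one message
    totalParts,       // total number of parts
    partNumber,       // position of this part, starting at 1
  ]);
}

const header = buildConcatUdh(0x2a, 3, 1);
// 48 header bits round up to a whole number of 7-bit slots.
const septetsUsed = Math.ceil((header.length * 8) / 7);
console.log(septetsUsed);       // 7
console.log(160 - septetsUsed); // 153
```

The same 6 bytes cost 3 of the 70 two-byte UCS-2 characters, which is where the 67-character figure for multipart Unicode messages comes from.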

UCS-2 Concatenation

| Segments | Characters per segment | Total characters |
| --- | --- | --- |
| 1 | 70 | 70 |
| 2 | 67 | 134 |
| 3 | 67 | 201 |
| 4 | 67 | 268 |
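The two tables reduce to one small calculation, sketched here. `LIMITS` and `segmentCount` are illustrative names. The `units` argument is assumed to be the number of 7-bit slots for GSM-7 (extended characters counted twice) or the number of UTF-16 code units for UCS-2 (most emoji count as two).

```javascript
// Per-segment capacity for a single SMS and for each part of a multipart SMS.
const LIMITS = {
  "GSM-7": { single: 160, multipart: 153 },
  "UCS-2": { single: 70, multipart: 67 },
};

// Estimates how many segments a message of the given size needs.
function segmentCount(units, encoding) {
  const { single, multipart } = LIMITS[encoding];
  if (units <= single) return 1;
  return Math.ceil(units / multipart);
}

console.log(segmentCount(160, "GSM-7")); // 1
console.log(segmentCount(161, "GSM-7")); // 2
console.log(segmentCount(70, "UCS-2"));  // 1
console.log(segmentCount(71, "UCS-2"));  // 2
console.log(segmentCount(135, "UCS-2")); // 3
```

Treat the result as an estimate: a real splitter will not break an escape sequence or a surrogate pair across two segments, so a message sitting right at a boundary can occasionally need one more segment.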

Common GSM-7 Pitfalls

1. Invisible Character Substitution

Word processors and some applications automatically replace characters:

| Typed | Auto-replaced | Encoding Impact |
| --- | --- | --- |
| " | “ or ” | GSM-7 → UCS-2 |
| ' | ‘ or ’ | GSM-7 → UCS-2 |
| - | – (en-dash) | GSM-7 → UCS-2 |
| ... | … (ellipsis) | GSM-7 → UCS-2 |

Solution: Always compose SMS in plain text editors or use SMS-specific tools that sanitize input.

2. Copy-Paste from Documents

Copying text from Microsoft Word, Google Docs, or email clients often introduces:

  • Smart quotes and apostrophes
  • Non-breaking spaces
  • Hidden formatting characters
  • Em/en dashes

3. Emoji Insertion

A single emoji can convert your entire message to UCS-2, reducing capacity from 160 to 70 characters.

4. Locale-Specific Characters

Characters common in certain languages but outside GSM-7:

| Language | Problematic Characters |
| --- | --- |
| Polish | ą ć ę ł ń ó ś ź ż |
| Turkish | ğ ı İ ş |
| Portuguese | ã õ |
| German | ẞ (capital sharp s; lowercase ß is in the basic set) |

GSM-7 National Language Extensions

The 3GPP defined national language extensions to support additional characters while maintaining 7-bit efficiency:

  • Turkish - Adds ğ, Ğ, ı, İ, ş, Ş, ç (Ç is already in the basic set)
  • Spanish - Adds á, í, ó, ú, Á, Í, Ó, Ú (é, ü, ñ, Ñ, ¿, ¡ are already in the basic set)
  • Portuguese - Adds ã, Ã, õ, Õ, â, ê, ô, etc.

However, carrier support varies significantly. Many carriers don't support national language shifts, causing character corruption or message failure.

Best Practices for SMS Character Encoding

1. Validate Before Sending

Always check message encoding before transmission:

```javascript
// Returns true if every character is in the GSM-7 basic set or its extension table.
function isGsm7Compatible(text) {
  const gsm7Chars =
    /^[A-Za-z0-9 \r\n@£$¥èéùìòÇØøÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ!"#¤%&'()*+,\-./:;<=>?¡ÄÖÑܧ¿äöñüà^{}\\\[~\]|€]*$/;
  return gsm7Chars.test(text);
}
```

2. Character Count Accurately

Account for extended characters:

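```javascript
// Counts GSM-7 character slots: extended characters need an escape, so they count twice.
function countGsm7Characters(text) {
  const extendedChars = /[€\[\]{}~\\^|]/g;
  const extended = (text.match(extendedChars) || []).length;
  return text.length + extended;
}
```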

3. Sanitize User Input

Replace problematic characters before sending:

```javascript
function sanitizeForGsm7(text) {
  return text
    .replace(/[“”]/g, '"') // Smart quotes to straight
    .replace(/[‘’]/g, "'") // Smart apostrophes
    .replace(/–/g, "-") // En-dash to hyphen
    .replace(/—/g, "-") // Em-dash to hyphen
    .replace(/…/g, "...") // Ellipsis to periods
    .replace(/\u00A0/g, " "); // Non-breaking space to regular space
}
```

4. Warn Users About Encoding Changes

If your application accepts user input for SMS, show real-time feedback:

  • Current character count
  • Number of SMS segments
  • Encoding type (GSM-7 or UCS-2)
  • Characters causing encoding switch
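The four pieces of feedback listed above can come from a single pass over the text. The following is a minimal sketch under stated assumptions: `analyzeSms` and the constant names are invented for illustration, national language shift tables are ignored, and the segment figure is an estimate because real splitters avoid breaking escape sequences and surrogate pairs.

```javascript
const GSM7_CHARSET = new Set(
  "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?" +
  "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà"
);
const GSM7_ESCAPED = new Set("\f^{}\\[~]|€");

// Returns the data a compose box would need for live feedback.
function analyzeSms(text) {
  const offending = [];
  let slots = 0;
  for (const ch of text) {
    if (GSM7_CHARSET.has(ch)) slots += 1;
    else if (GSM7_ESCAPED.has(ch)) slots += 2; // escape + character
    else offending.push(ch);
  }
  const encoding = offending.length === 0 ? "GSM-7" : "UCS-2";
  // UCS-2 capacity is counted in UTF-16 code units, which is what .length returns.
  const characterCount = encoding === "GSM-7" ? slots : text.length;
  const [single, multipart] = encoding === "GSM-7" ? [160, 153] : [70, 67];
  const segments =
    characterCount <= single ? 1 : Math.ceil(characterCount / multipart);
  return { encoding, characterCount, segments, offending: [...new Set(offending)] };
}

console.log(analyzeSms("Total: 10€"));
// { encoding: 'GSM-7', characterCount: 11, segments: 1, offending: [] }
console.log(analyzeSms("Sale ends today – 20% off").offending);
// [ '–' ]
```

Showing the `offending` list next to the counter lets users see exactly which character pushed the message into UCS-2.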

5. Consider Transliteration

For international messages, consider transliterating non-GSM-7 characters:

| Original | Transliterated |
| --- | --- |
| café | cafe |
| naïve | naive |
| ñ | n |
| ü | u |

This maintains GSM-7 encoding at the cost of some linguistic accuracy.
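One common way to produce the substitutions in the table is Unicode normalization: decompose each accented letter into a base letter plus combining marks, then drop the marks. The sketch below uses the standard `String.prototype.normalize`; `stripDiacritics` is a hypothetical helper name.

```javascript
// Decompose accented letters (NFD), then remove combining marks (U+0300 to U+036F).
function stripDiacritics(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

console.log(stripDiacritics("café"));  // cafe
console.log(stripDiacritics("naïve")); // naive
console.log(stripDiacritics("ñ"));     // n
console.log(stripDiacritics("ü"));     // u
```

This approach strips every accent, including ones GSM-7 can represent (é, ñ, ü), so a production version might only transliterate characters that fail the GSM-7 check. Letters with no decomposition, such as Polish ł, pass through unchanged and need an explicit mapping.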

SMS Encoding and Messaging APIs

Modern messaging APIs handle encoding automatically, but understanding how it works helps you optimize message length and cost:

Zavu API Example

When sending via API, the platform:

  • Detects message encoding automatically
  • Calculates segment count
  • Applies appropriate pricing
  • Returns encoding info in response

```json
{
  "message": {
    "id": "msg_abc123",
    "encoding": "GSM-7",
    "segments": 1,
    "characterCount": 142
  }
}
```

The Future of SMS Encoding

While GSM-7 remains the standard, the industry is evolving:

RCS (Rich Communication Services)

RCS removes the SMS segment limits described above, supporting:

  • Much longer text messages
  • Full Unicode support
  • Rich media (images, videos)
  • Read receipts and typing indicators

Fallback Strategies

Smart messaging platforms use:

  • RCS first - If supported by carrier and device
  • WhatsApp/Telegram - For rich messaging needs
  • SMS - As universal fallback with encoding optimization

Testing GSM-7 Compatibility

Before launching SMS campaigns, test with:

Online Tools

  • GSM-7 character validators
  • SMS length calculators
  • Encoding detection tools

Device Testing

Send test messages to actual devices across:

  • Different carriers
  • Various phone models
  • Multiple countries

Conclusion

Understanding GSM-7 encoding is essential for anyone working with SMS messaging. The 160-character limit, extended character costs, and UCS-2 fallback directly impact message delivery and costs.

Key takeaways:

  • GSM-7 allows 160 characters per single SMS
  • Extended characters count double (€, [, ], etc.)
  • Non-GSM-7 characters trigger UCS-2, reducing capacity to 70 characters
  • Concatenated messages lose characters to headers
  • Always sanitize text before sending to avoid unexpected encoding switches

By optimizing for GSM-7 compatibility, you can reduce SMS costs and make delivery more predictable across carriers and devices.
