What is GSM-7 Character Encoding? SMS Character Limits Explained
GSM-7 is a character encoding standard used in SMS (Short Message Service) that allows up to 160 characters per message segment. Developed by the European Telecommunications Standards Institute (ETSI), GSM-7 is the default encoding for text messages and plays a crucial role in determining SMS costs and delivery.
Understanding GSM-7 Encoding
GSM-7 uses 7 bits to represent each character, which is where the name comes from. Packing characters at 7 bits each is more compact than storing them in 8-bit bytes, as ASCII and single-byte UTF-8 text usually are, so more characters fit within the limited payload of an SMS.
The Math Behind 160 Characters
An SMS message has a maximum payload of 140 octets, or 1120 bits. With GSM-7 encoding:
```text
1120 bits ÷ 7 bits per character = 160 characters
```
This is why the standard SMS length is 160 characters when using GSM-7 compatible text.
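The arithmetic can be checked by actually packing 7-bit values into octets. The following is a minimal sketch, assuming the input is already an array of GSM-7 septet values (0 to 127); `packSeptets` is a hypothetical helper name, not part of any library.

```javascript
// Pack 7-bit values (septets) into 8-bit octets, low-order bits first,
// which is how GSM-7 text is laid out in the SMS payload.
function packSeptets(septets) {
  const octets = [];
  let buffer = 0; // bits waiting to be written out
  let bitCount = 0; // number of valid bits currently in buffer
  for (const septet of septets) {
    buffer |= (septet & 0x7f) << bitCount;
    bitCount += 7;
    while (bitCount >= 8) {
      octets.push(buffer & 0xff);
      buffer >>= 8;
      bitCount -= 8;
    }
  }
  if (bitCount > 0) octets.push(buffer & 0xff); // flush the leftover bits
  return octets;
}

// 160 septets fill exactly 140 octets; one more septet needs a 141st octet.
console.log(packSeptets(new Array(160).fill(0x41)).length);
console.log(packSeptets(new Array(161).fill(0x41)).length);
```

Because 160 × 7 = 1120 bits = 140 × 8 bits, a full GSM-7 message uses the payload with no bits left over.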
The GSM-7 Character Set
The GSM-7 basic character set includes 128 characters optimized for Western European languages:
Standard Characters (7 bits each)
| Category | Characters |
|---|---|
| Uppercase letters | A-Z |
| Lowercase letters | a-z |
| Numbers | 0-9 |
| Common punctuation | . , : ; ! ? |
| Special characters | @ £ $ ¥ # % & |
| Symbols | ( ) < > = + - / |
| Whitespace | Space, newline |
Extended Characters (14 bits each)
Some characters require an escape sequence, effectively using 2 character slots:
| Character | Description | Counts as |
|---|---|---|
| { | Left curly bracket | 2 characters |
| } | Right curly bracket | 2 characters |
| [ | Left square bracket | 2 characters |
| ] | Right square bracket | 2 characters |
| ~ | Tilde | 2 characters |
| \ | Backslash | 2 characters |
| ^ | Caret | 2 characters |
| € | Euro sign | 2 characters |
| \| | Pipe | 2 characters |
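To make the "2 character slots" concrete, here is a small sketch of how each extended character maps to an escape septet (0x1B) followed by a second septet. The code values are those of the GSM-7 extension table; the names `GSM7_EXTENSION` and `extendedSeptets` are illustrative assumptions, not an existing API.

```javascript
// Each extended character is sent as ESC (0x1B) plus one more septet.
const GSM7_ESCAPE = 0x1b;
const GSM7_EXTENSION = new Map([
  ["^", 0x14], ["{", 0x28], ["}", 0x29], ["\\", 0x2f],
  ["[", 0x3c], ["~", 0x3d], ["]", 0x3e], ["|", 0x40], ["€", 0x65],
]);

// Returns the two septets for an extended character, or null otherwise.
function extendedSeptets(char) {
  const code = GSM7_EXTENSION.get(char);
  return code === undefined ? null : [GSM7_ESCAPE, code];
}

console.log(extendedSeptets("€")); // two septets: escape, then the euro code
console.log(extendedSeptets("A")); // null: "A" is in the basic set
```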
GSM-7 vs UCS-2 Encoding
When a message contains characters outside the GSM-7 set, the encoding automatically switches to UCS-2 (Unicode):
| Encoding | Bits per character | Characters per SMS |
|---|---|---|
| GSM-7 | 7 bits | 160 characters |
| UCS-2 | 16 bits | 70 characters |
Characters That Trigger UCS-2
Common characters that force UCS-2 encoding:
- Emojis - All emojis require Unicode
- Chinese, Japanese, Korean - CJK characters
- Arabic, Hebrew - Right-to-left scripts
- Cyrillic - Russian, Ukrainian, etc.
- Smart quotes - “ ” ‘ ’ (curly quotes)
- Special symbols - ™ © ® and many others
The Cost Impact
Switching to UCS-2 can more than double your SMS costs:
Example: a 140-character message
- Without emoji (GSM-7): 1 SMS segment
- With one emoji (UCS-2): 3 SMS segments (at 67 characters per concatenated segment, 140 ÷ 67 rounds up to 3)
Concatenated Messages (Long SMS)
When messages exceed the single-segment limit, they're split into multiple parts:
GSM-7 Concatenation
| Segments | Characters per segment | Total characters |
|---|---|---|
| 1 | 160 | 160 |
| 2 | 153 | 306 |
| 3 | 153 | 459 |
| 4 | 153 | 612 |
UCS-2 Concatenation
| Segments | Characters per segment | Total characters |
|---|---|---|
| 1 | 70 | 70 |
| 2 | 67 | 134 |
| 3 | 67 | 201 |
| 4 | 67 | 268 |
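The per-segment figures drop because each part of a concatenated message carries a header. With the common 6-octet header, 1120 - 48 = 1072 bits remain, which holds 153 septets or 67 UCS-2 characters. The sketch below turns the tables above into a segment counter; `LIMITS` and `countSegments` are hypothetical names, and the input is assumed to be a length already measured in septets (GSM-7) or 16-bit code units (UCS-2).

```javascript
// Capacity per segment, taken from the two tables above.
const LIMITS = {
  "GSM-7": { single: 160, multi: 153 },
  "UCS-2": { single: 70, multi: 67 },
};

// units: message length in septets (GSM-7) or 16-bit code units (UCS-2).
function countSegments(units, encoding) {
  const { single, multi } = LIMITS[encoding];
  if (units <= single) return 1; // fits in one segment, no header needed
  return Math.ceil(units / multi); // every part carries a header
}

console.log(countSegments(160, "GSM-7")); // 1
console.log(countSegments(161, "GSM-7")); // 2
console.log(countSegments(140, "UCS-2")); // 3
```

This is a simplification: it ignores the case where a two-slot character (an escaped GSM-7 character or an emoji) would straddle a segment boundary, which can push a message into one more segment than the formula predicts.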
Common GSM-7 Pitfalls
1. Invisible Character Substitution
Word processors and some applications automatically replace characters:
| Typed | Auto-replaced | Encoding Impact |
|---|---|---|
| " | “ or ” | GSM-7 → UCS-2 |
| ' | ‘ or ’ | GSM-7 → UCS-2 |
| - | – (en-dash) | GSM-7 → UCS-2 |
| ... | … (ellipsis) | GSM-7 → UCS-2 |
2. Copy-Paste from Documents
Copying text from Microsoft Word, Google Docs, or email clients often introduces:
- Smart quotes and apostrophes
- Non-breaking spaces
- Hidden formatting characters
- Em/en dashes
3. Emoji Insertion
A single emoji can convert your entire message to UCS-2, reducing capacity from 160 to 70 characters.
4. Locale-Specific Characters
Characters common in certain languages but outside GSM-7:
| Language | Problematic Characters |
|---|---|
| Polish | ą ć ę ł ń ó ś ź ż |
| Turkish | ğ ı İ ş |
| Portuguese | ã õ |
| German | ẞ (capital sharp s; lowercase ß is in the GSM-7 basic set) |
GSM-7 National Language Extensions
3GPP defines national language extensions (shift tables) to support additional characters while maintaining 7-bit efficiency:
- Turkish - Adds ğ, Ğ, ı, İ, ş, Ş, ç (Ç is already in the basic set)
- Spanish - Adds á, í, ó, ú, Á, Í, Ó, Ú (é, ü, ñ, Ñ, ¿, ¡ are already in the basic set)
- Portuguese - Adds ã, Ã, õ, Õ, â, ê, ô, etc.
Best Practices for SMS Character Encoding
1. Validate Before Sending
Always check message encoding before transmission:
```javascript
// Returns true when every character is in the GSM-7 basic or extended set.
function isGsm7Compatible(text) {
  const gsm7Chars =
    /^[A-Za-z0-9 \r\n@£$¥èéùìòÇØøÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ!"#¤%&'()*+,\-.\/:;<=>?¡ÄÖÑܧ¿äöñüà^{}\\\[~\]|€]*$/;
  return gsm7Chars.test(text);
}
```
2. Count Characters Accurately
Account for extended characters:
```javascript
// Counts GSM-7 character slots: extended characters take two slots each.
function countGsm7Characters(text) {
  const extendedChars = /[€\[\]{}~\\^|]/g;
  const extended = (text.match(extendedChars) || []).length;
  return text.length + extended;
}
```
3. Sanitize User Input
Replace problematic characters before sending:
```javascript
function sanitizeForGsm7(text) {
  return text
    .replace(/[\u201C\u201D]/g, '"') // Smart quotes to straight
    .replace(/[\u2018\u2019]/g, "'") // Smart apostrophes
    .replace(/\u2013/g, "-") // En-dash to hyphen
    .replace(/\u2014/g, "-") // Em-dash to hyphen
    .replace(/\u2026/g, "...") // Ellipsis to periods
    .replace(/\u00A0/g, " "); // Non-breaking space to regular space
}
```
4. Warn Users About Encoding Changes
If your application accepts user input for SMS, show real-time feedback:
- Current character count
- Number of SMS segments
- Encoding type (GSM-7 or UCS-2)
- Characters causing encoding switch
5. Consider Transliteration
For international messages, consider transliterating non-GSM-7 characters:
| Original | Transliterated |
|---|---|
| naïve | naive |
| São Paulo | Sao Paulo |
| ł | l |
| ş | s |
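One common way to transliterate is to decompose each character with Unicode normalization and drop the combining marks. The sketch below assumes that approach; `transliterate` and `NO_DECOMPOSITION` are hypothetical names, and the fallback map covers only a few letters that have no canonical decomposition, so it is far from complete.

```javascript
// Letters that NFD does not split into base letter + combining mark.
const NO_DECOMPOSITION = { "ł": "l", "Ł": "L", "ı": "i" };

function transliterate(text) {
  return text
    .normalize("NFD") // e.g. "ï" becomes "i" + combining diaeresis
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .replace(/[łŁı]/g, (ch) => NO_DECOMPOSITION[ch]);
}

console.log(transliterate("naïve"));
console.log(transliterate("Łódź"));
```

Note that this strips accents from every character, including ones GSM-7 already supports. A more careful version would transliterate only the characters that fail a GSM-7 check.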
SMS Encoding and Messaging APIs
Modern messaging APIs handle encoding automatically, but understanding it helps you optimize message length and cost.
Zavu API Example
When sending via API, the platform reports the encoding, segment count, and character count for each message, for example:
```json
{
  "message": {
    "id": "msg_abc123",
    "encoding": "GSM-7",
    "segments": 1,
    "characterCount": 142
  }
}
```
The Future of SMS Encoding
While GSM-7 remains the standard, the industry is evolving:
RCS (Rich Communication Services)
RCS (Rich Communication Services) removes character limitations entirely, supporting:
- Unlimited text length
- Full Unicode support
- Rich media (images, videos)
- Read receipts and typing indicators
Fallback Strategies
Smart messaging platforms use:
Testing GSM-7 Compatibility
Before launching SMS campaigns, test with:
Online Tools
- GSM-7 character validators
- SMS length calculators
- Encoding detection tools
Device Testing
Send test messages to actual devices across:
- Different carriers
- Various phone models
- Multiple countries
Conclusion
Understanding GSM-7 encoding is essential for anyone working with SMS messaging. The 160-character limit, extended character costs, and UCS-2 fallback directly impact message delivery and costs.
Key takeaways:
- GSM-7 allows 160 characters per single SMS
- Extended characters count double (€, [, ], etc.)
- Non-GSM-7 characters trigger UCS-2, reducing capacity to 70 characters
- Concatenated messages lose characters to headers
- Always sanitize text before sending to avoid unexpected encoding switches