Character Encoding Converter

Convert text between different character encodings: UTF-8, UTF-16, ASCII, and Latin-1. View byte representations and detect encoding issues.

How to Use the Character Encoding Converter

Single Mode

  1. Select your source encoding (From Encoding) and target encoding (To Encoding)
  2. Enter or paste your text in the input area
  3. The conversion happens automatically in real time
  4. Enable "Show byte representation" to see hexadecimal bytes
  5. Enable "Show encoding warnings" to see incompatible characters
  6. Click the "Copy" button to copy the result to your clipboard
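
The "Show byte representation" option in step 4 can be approximated outside the browser. The following is a minimal Python sketch, not the tool's own code (the tool runs in JavaScript and its implementation is not shown here); `hex_bytes` is a hypothetical helper name.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of `text` as space-separated uppercase hex pairs."""
    # bytes.hex() accepts a separator argument in Python 3.8 and later.
    return text.encode(encoding).hex(" ").upper()


print(hex_bytes("café", "utf-8"))      # 63 61 66 C3 A9
print(hex_bytes("café", "latin-1"))    # 63 61 66 E9
print(hex_bytes("café", "utf-16-be"))  # 00 63 00 61 00 66 00 E9
```

The same four characters produce three different byte sequences, which is why the source and target encodings must both be specified.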

Batch Mode

  1. Switch to "Batch" mode using the mode toggle
  2. Select your source and target encodings
  3. Upload a TXT or CSV file, or enter multiple texts (one per line)
  4. Click "Process Batch" to convert all items at once
  5. View results in a detailed table with success/error indicators
  6. Download results as TXT or CSV
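
A batch conversion with per-item success/error indicators could look roughly like the sketch below. It assumes each item arrives as raw bytes in the source encoding; `convert_batch` and the result field names are hypothetical and are not taken from the tool.

```python
def convert_batch(items, source, target):
    """Convert each bytes item from `source` to `target` without raising."""
    results = []
    for raw in items:
        try:
            converted = raw.decode(source).encode(target)
            results.append({"input": raw, "ok": True, "output": converted, "error": ""})
        except (UnicodeDecodeError, UnicodeEncodeError) as exc:
            # Record the failure and continue with the remaining items.
            results.append({"input": raw, "ok": False, "output": b"", "error": str(exc)})
    return results


items = [s.encode("utf-8") for s in ["hello", "naïve", "日本"]]
for row in convert_batch(items, "utf-8", "latin-1"):
    print("OK   " if row["ok"] else "ERROR", row["output"] or row["error"])
```

Here the first two items convert, while the third fails because its characters have no Latin-1 representation.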

Common Use Cases

  • Debugging encoding issues: Fix "mojibake" (garbled text) in web applications and databases
  • Legacy data migration: Convert old ASCII/Latin-1 data to modern UTF-8
  • ASCII validation: Test if text can be safely stored in ASCII-only systems
  • Internationalization testing: Verify how text appears in different encodings
  • Web development: Ensure proper encoding for HTML, XML, and JSON data
  • API integration: Convert data between systems with different encoding requirements

Understanding Character Encodings

ASCII (7-bit)

ASCII (American Standard Code for Information Interchange) uses 7 bits to represent 128 characters (0-127). It includes English letters, numbers, punctuation, and control characters. ASCII is the most compatible encoding but only supports basic English characters. Use ASCII when you need maximum compatibility with legacy systems or when you're certain your text only contains English characters.
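
Because ASCII uses only 7 bits, a byte sequence is pure ASCII exactly when no byte has its high bit set. A small Python sketch of that check (`is_ascii_bytes` is a hypothetical helper name):

```python
def is_ascii_bytes(data: bytes) -> bool:
    """True if every byte fits in 7 bits (0-127), so the high bit is never set."""
    return all(byte < 0x80 for byte in data)


print(is_ascii_bytes(b"Hello, world!"))          # True
print(is_ascii_bytes("café".encode("latin-1")))  # False: é is byte 0xE9
print("Hello, world!".isascii())                 # True (str.isascii, Python 3.7+)
```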

Latin-1 / ISO-8859-1 (8-bit)

Latin-1 extends ASCII to 8 bits, representing 256 characters (0-255). It includes Western European characters like à, é, ñ, and ü. Latin-1 was commonly used in older web pages and databases. It's suitable for most Western European text but cannot represent characters from other scripts, emoji, or even the euro sign (€).

UTF-8 (Variable: 1-4 bytes)

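UTF-8 is the dominant encoding on the web. It's backwards-compatible with ASCII (ASCII characters use the same bytes in UTF-8) and can represent any Unicode character, including emoji, Chinese, Arabic, and more. UTF-8 uses 1 byte for ASCII characters, 2 bytes for most other European and Middle Eastern letters (accented Latin, Greek, Cyrillic, Hebrew, Arabic), 3 bytes for the rest of the Basic Multilingual Plane (including most Chinese, Japanese, and Korean characters), and 4 bytes for everything else, such as most emoji. It's the recommended encoding for modern web applications and data storage.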
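
The variable width is easy to observe directly. A short Python sketch (`utf8_width` is a hypothetical helper name):

```python
def utf8_width(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


for ch in ["A", "é", "€", "😀"]:
    print(f"U+{ord(ch):04X}  {utf8_width(ch)} byte(s)  {ch.encode('utf-8').hex(' ')}")
# U+0041  1 byte(s)  41
# U+00E9  2 byte(s)  c3 a9
# U+20AC  3 byte(s)  e2 82 ac
# U+1F600  4 byte(s)  f0 9f 98 80
```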

UTF-16 (Variable: 2-4 bytes)

UTF-16 is used internally by JavaScript, Java, and Windows. It uses 2 bytes for characters in the Basic Multilingual Plane, which covers most characters in everyday use, and 4 bytes (a surrogate pair) for all others, including most emoji. UTF-16 is more compact than UTF-8 for text dominated by characters that need 3 bytes in UTF-8, such as Chinese, Japanese, and Korean, but it takes twice the space for English text. It's less common on the web but important for Windows file systems and programming languages.
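
The 2-byte versus 4-byte split can be seen by listing the 16-bit code units for a character. This is a Python sketch using the big-endian codec without a byte order mark; `utf16_units` is a hypothetical helper name.

```python
def utf16_units(ch: str):
    """Return the UTF-16 code units (big-endian, no BOM) for one character."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


print([hex(u) for u in utf16_units("A")])   # ['0x41']
print([hex(u) for u in utf16_units("中")])  # ['0x4e2d']
print([hex(u) for u in utf16_units("😀")])  # ['0xd83d', '0xde00']
```

Note that Python's plain "utf-16" codec also prepends a byte order mark when encoding, so byte counts from that codec are 2 higher than the figures implied here.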

Common Encoding Scenarios

ASCII → UTF-8

Perfect conversion. All ASCII characters (0-127) have identical byte values in UTF-8. No data loss.

UTF-8 → ASCII

Lossy conversion. Non-ASCII characters (accents, emoji, non-English text) cannot be represented and will cause errors. Only use when you're certain the text contains only ASCII characters.
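
How the failure surfaces depends on the error policy. The Python sketch below shows the standard library's built-in handlers; the tool's own behavior may differ, and `to_ascii` is a hypothetical helper name.

```python
def to_ascii(text: str, errors: str = "strict") -> bytes:
    """Encode text as ASCII using one of Python's standard error handlers."""
    return text.encode("ascii", errors=errors)


text = "Crème brûlée"

try:
    to_ascii(text)  # "strict" raises on the first non-ASCII character
except UnicodeEncodeError as exc:
    print(f"cannot encode {text[exc.start]!r} at index {exc.start}")

print(to_ascii(text, "replace"))           # b'Cr?me br?l?e'
print(to_ascii(text, "ignore"))            # b'Crme brle'
print(to_ascii(text, "backslashreplace"))  # b'Cr\\xe8me br\\xfbl\\xe9e'
```

Only "strict" makes the data loss visible as an error; the other handlers silently change the text.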

Latin-1 → UTF-8

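Lossless conversion. Every Latin-1 byte (0-255) maps to the Unicode code point with the same number, so all of them have a UTF-8 representation. Bytes 0-127 stay as a single byte; bytes 128-255 become two bytes in UTF-8.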
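
A minimal Python sketch of this direction, assuming the input really is Latin-1 (`latin1_to_utf8` is a hypothetical helper name):

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode Latin-1 bytes as UTF-8; decoding Latin-1 never fails."""
    return data.decode("latin-1").encode("utf-8")


print(latin1_to_utf8(b"caf\xe9"))  # b'caf\xc3\xa9'

# All 256 possible byte values convert: 128 stay 1 byte, 128 grow to 2 bytes.
print(len(latin1_to_utf8(bytes(range(256)))))  # 384
```

Because every byte value is valid Latin-1, this conversion also "succeeds" on data that is actually in some other encoding, producing garbled output rather than an error.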

UTF-8 → Latin-1

Lossy conversion. Characters with code points above 255 (emoji, Asian characters, and symbols such as €) cannot be represented. Works well for text that uses only Western European characters.
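
Before converting, it can help to list exactly which characters will not survive. A Python sketch of such a check (`unrepresentable` is a hypothetical helper name, and this is not the tool's warning logic):

```python
def unrepresentable(text: str, encoding: str = "latin-1"):
    """Return (index, character, code point) for each character the encoding lacks."""
    problems = []
    for index, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            problems.append((index, ch, f"U+{ord(ch):04X}"))
    return problems


print(unrepresentable("Price: 10 € for a jalapeño"))
# [(10, '€', 'U+20AC')]
```

The ñ passes because it exists in Latin-1; only the euro sign is reported.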

UTF-8 ↔ UTF-16

Lossless conversion. Both encodings support the full Unicode character set. Choose based on your system requirements (UTF-8 for web, UTF-16 for Windows/Java).
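
A round trip confirms that nothing is lost in either direction. This Python sketch uses little-endian UTF-16 without a byte order mark, which is an assumption; the helper names are hypothetical.

```python
def utf8_to_utf16(data: bytes) -> bytes:
    """Transcode UTF-8 bytes to UTF-16 (little-endian, no BOM)."""
    return data.decode("utf-8").encode("utf-16-le")


def utf16_to_utf8(data: bytes) -> bytes:
    """Transcode UTF-16 (little-endian, no BOM) bytes back to UTF-8."""
    return data.decode("utf-16-le").encode("utf-8")


original = "Hello, 世界 😀".encode("utf-8")
as_utf16 = utf8_to_utf16(original)

assert utf16_to_utf8(as_utf16) == original  # round trip is exact
print(len(original), len(as_utf16))         # 18 24
```

The content is identical either way; only the size differs, and which encoding is smaller depends on the mix of characters.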

Frequently Asked Questions

What's the difference between ASCII and UTF-8?

ASCII is a 7-bit encoding supporting only 128 characters (English letters, numbers, basic punctuation, and control characters). UTF-8 is a variable-length encoding that can represent every Unicode code point (more than 1.1 million are available), covering all languages and emoji. UTF-8 is backwards-compatible with ASCII, meaning any ASCII text is valid UTF-8.

Why do I see strange characters in my text?

This is called "mojibake" and happens when text is encoded in one encoding but decoded in another. For example, UTF-8 text decoded as Latin-1 will show strange characters. To fix it, identify the original encoding and convert correctly. Our tool shows warnings when characters can't be represented in the target encoding.
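
The UTF-8-read-as-Latin-1 case can be reproduced and reversed in a few lines. This Python sketch assumes the garbled text has not been further altered (for example by a second wrong conversion); `fix_mojibake` is a hypothetical helper name.

```python
def fix_mojibake(garbled: str, wrong: str = "latin-1", right: str = "utf-8") -> str:
    """Undo a decode with the wrong encoding by re-encoding and decoding correctly."""
    return garbled.encode(wrong).decode(right)


original = "café"
garbled = original.encode("utf-8").decode("latin-1")  # the mistake
print(garbled)                # cafÃ©
print(fix_mojibake(garbled))  # café
```

The two bytes of é (C3 A9) were each shown as a separate Latin-1 character, which is the typical signature of this kind of mojibake.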

When should I use UTF-8 vs UTF-16?

Use UTF-8 for web applications, APIs, JSON, XML, and file storage. It's the web standard and efficient for English text. Use UTF-16 when required by your platform (Windows file system, Java/JavaScript strings) or when text is dominated by East Asian characters, which take 3 bytes in UTF-8 but 2 bytes in UTF-16.

How do I fix encoding issues in my data?

First, identify the encoding the data is actually stored in and the encoding you need it to be in. Use our converter to transform the text. For databases, use tools like MySQL's CONVERT() function. For files, use text editors that support encoding conversion or command-line tools like iconv. Always back up your data before converting.
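
For files, a scripted conversion might look like the following Python sketch. It assumes the source encoding is already known and writes to a new file so the original stays intact; `convert_file` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def convert_file(src: Path, dst: Path, source_encoding: str,
                 target_encoding: str = "utf-8") -> None:
    """Decode `src` strictly, then write it to `dst` in the target encoding."""
    # Working with bytes avoids any newline translation by the platform.
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode(target_encoding))


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "legacy.txt"
    dst = Path(tmp) / "converted.txt"
    src.write_bytes("café".encode("latin-1"))
    convert_file(src, dst, "latin-1")
    print(dst.read_bytes())  # b'caf\xc3\xa9'
```

Strict decoding means a wrongly guessed multi-byte source encoding usually fails with an error instead of silently corrupting the output, although a single-byte guess such as Latin-1 will accept any input.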

Is my data stored or transmitted?

No, all conversions happen locally in your browser using JavaScript. Your text never leaves your device, ensuring complete privacy and security.