Character Sets

We say a string is composed of characters. A character could be a letter, digit, space or punctuation mark. Historically, the exact characters one could type often depended on conventions set by the equipment manufacturer.

Binary Coded Decimal or BCD, originally a 4-bit code for the decimal digits, was extended to 6 bits to include upper case characters and some punctuation.

CDC Display Code was a code similar to BCD but with substitutions of some mathematical punctuation.

Extended Binary Coded Decimal Interchange Code or EBCDIC was an 8-bit code introduced with the successful IBM System/360 architecture.

American Standard Code for Information Interchange or ASCII was a 7-bit code embraced in telecommunications and minicomputers.
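
The same characters receive different numeric codes under different conventions. The following is a minimal Python sketch of that point, assuming the standard library's cp037 codec (one of many EBCDIC code pages, chosen here only as a representative) stands in for EBCDIC:

```python
# Encode the same text under ASCII and under one EBCDIC code page.
# Assumption: cp037 (EBCDIC US/Canada) is used as a representative
# EBCDIC variant; other EBCDIC code pages differ in some positions.
text = "A1 z"

ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")

# Show each character next to its code in both character sets.
for ch, a, e in zip(text, ascii_bytes, ebcdic_bytes):
    print(f"{ch!r}: ASCII 0x{a:02X}  EBCDIC 0x{e:02X}")
```

Every ASCII code fits in 7 bits (below 0x80), while the EBCDIC codes for letters and digits use the full 8 bits.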

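Finally, Unicode subsumes ASCII, whose 128 characters are the first 128 Unicode code points, and extends it to 16 bits and beyond, with code points running up to U+10FFFF.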

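UTF-8 is a cleverly designed variable-length encoding of Unicode that stores each 7-bit ASCII character in a single 8-bit byte with the high bit clear. Bytes with the high bit set form sequences of two to four bytes that encode all other Unicode characters.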
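
To illustrate how UTF-8 uses the eighth bit, here is a minimal Python sketch. The helper name utf8_summary and the sample characters are illustrative choices, not part of any standard:

```python
def utf8_summary(ch):
    """Return (code point, UTF-8 encoded bytes) for a single character."""
    return ord(ch), ch.encode("utf-8")

# Samples: an ASCII letter, e-acute, the euro sign, and an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    cp, encoded = utf8_summary(ch)
    # Print each byte in binary so the leading bit patterns are visible.
    bits = " ".join(f"{b:08b}" for b in encoded)
    print(f"U+{cp:04X} -> {len(encoded)} byte(s): {bits}")
```

In the output, the ASCII letter occupies one byte whose high bit is 0. Each multi-byte sequence starts with a byte beginning 110, 1110 or 11110 (for two, three or four bytes), followed by continuation bytes beginning 10.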