The encoding has traditionally been either ASCII, one of its many derivatives such as ISO/IEC 646, or sometimes EBCDIC. Traditionally, no other encodings were used in plain text files, which contain neither (character-based) structural tags such as heading marks nor typographic markers such as boldface or italics.
Unicode is today gradually replacing the older ASCII derivatives, which are limited to 7- or 8-bit codes. It will probably serve much the same purposes, while permitting almost any human language as well as important punctuation and symbols such as mathematical relations (≠ ≤ ≥ ≈) and multiplication signs (× •), which are not included in the very rudimentary and incomplete ASCII set.
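The difference between the two repertoires can be illustrated with a minimal Python sketch. The helper name fits_in_ascii is hypothetical, and UTF-8 is chosen here only as one common Unicode encoding; the text above does not prescribe a particular one.

```python
# Symbols mentioned above that are absent from 7-bit ASCII.
symbols = "≠ ≤ ≥ ≈ × •"


def fits_in_ascii(text):
    """Return True if every character of text can be encoded as ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


# A Unicode encoding such as UTF-8 can represent the same symbols,
# at the cost of using more than one byte for each non-ASCII character.
utf8_bytes = symbols.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

print(fits_in_ascii("plain text"))
print(fits_in_ascii(symbols))
print(round_trip == symbols)
```

Under these assumptions the ordinary English phrase passes the ASCII check, the symbol string fails it, and the UTF-8 round trip preserves the symbols unchanged.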
The purpose of using plain text today is primarily a "lowest common denominator" independence from programs that require their own special encoding or formatting (with due sacrifices and limitations). Plain text files can be opened, read, and edited with most text editors. Examples include Notepad (Windows), edit (DOS), ed, vi or vim (Unix, Linux), SimpleText (Mac OS), and TextEdit (Mac OS X). Other computer programs are also capable of reading and importing plain text.
Plain text can also be processed by simple computer tools such as the line-printing commands type (DOS and Windows) and cat (Unix).
Plain text files are almost universal in programming; a source code file containing instructions in a programming language is almost always a plain text file. Plain text is also commonly used for configuration files, which are read for saved settings at the startup of a program.
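As an illustration of how a program might read saved settings from a plain text configuration file, the following Python sketch parses a simple "key = value" layout with "#" comment lines. The layout, the function name parse_settings, and the sample settings are all assumptions made for this example; real configuration formats vary widely.

```python
def parse_settings(text):
    """Parse 'key = value' lines into a dict, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


# Hypothetical configuration text, as it might be stored on disk.
example = "# editor settings\nfont_size = 12\ntheme=dark\n"
settings = parse_settings(example)
print(settings)
```

Because the file is plain text, the same settings can be inspected or changed with any text editor, without the program that wrote them.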
Plain text is a way to represent generic text without attributes such as fonts, subscripts, and boldface; due to this simplicity, it is readable and processable by almost any computer program. In a sense, HTML, SGML, and XML files can be regarded as plain text, since no control codes (see below) are used, although these formats do include real structural tags. To the SGML or XML author, these tags are "human readable", since an author who knows the format understands the structure by reading it. This illustrates a complication in the use of terms within computer science: what counts as plain text is a relative point of view.
In ASCII, the codes below SPACE (= 32 = 20H) are not intended as displayable characters, but instead as control characters. They are used for a variety of interpreted meanings; for example, the code NUL (= 0, sometimes denoted Ctrl-@) is used as the string end marker in the programming language C and its successors. Most troublesome of these are the codes LF (= LINE FEED = 10 = 0AH) and CR (= CARRIAGE RETURN = 13 = 0DH). Windows and OS/2 use the sequence CR,LF to represent a newline, while Unix and its relatives use just LF, and Classic Mac OS (but not Mac OS X) uses just CR. This was once a slight problem when transferring files between Windows and Unix systems, but today most computer programs handle it seamlessly.
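The three newline conventions can be reconciled with a small amount of code. The following Python sketch, with the hypothetical helper names detect_newline_style and normalize_newlines, works on raw bytes and converts CR,LF and lone CR line endings to LF; it is one possible approach, not a description of how any particular program does it.

```python
def detect_newline_style(data):
    """Report which newline convention appears in data (CR,LF is checked first)."""
    if b"\r\n" in data:
        return "CRLF"
    if b"\r" in data:
        return "CR"
    if b"\n" in data:
        return "LF"
    return "none"


def normalize_newlines(data):
    """Convert CR,LF and lone CR line endings to a single LF."""
    # Replace the two-byte sequence first so it does not turn into two LFs.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


windows_text = b"first\r\nsecond\r\n"
classic_mac_text = b"first\rsecond\r"
unix_text = b"first\nsecond\n"

print(detect_newline_style(windows_text))
print(normalize_newlines(windows_text) == unix_text)
print(normalize_newlines(classic_mac_text) == unix_text)
```

The order of the two replacements matters: handling the lone CR first would split each CR,LF pair into two separate line breaks.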
See also
- E-text
- MIME Content-type
- Formatted text
- Filename extension
- File format
- Binary file
- Text file
- Editor wars
- File system
- Configuration file
- Source code