prev
l
next
Run length encoding

What is RLE ?
Principle of RLE

CompuServe standard for RLE file format
MS Windows standard for RLE file format


Books
References
Related links

 

Run length encoding

What is RLE ?

Run-length encoding (RLE) is a very simple form of data compression encoding. It is based on simple principle of encoding data. This principle is to every stream which is formed of the same data values (repeating values is called a run) i.e sequence of repeated data values is replaced with count number and a single value. This intuitive principle works best on certain data types in which sequences of repeated data values can be noticed; RLE is usually applied to the files that a contain large number of consecutive occurrences of the same byte pattern.

RLE may be used on any kind of data regardless of its content, but data which is being compressed by RLE determines how good compression ratio will be achieved. So RLE is used on text files which contains multiple spaces for indention and formatting paragraphs, tables and charts. Digitized signals also consist of unchanged streams so such signals can also be compressed by RLE. A good example of such signal are monochrome images, and questionable compression would be probably achieved if such compression was used on continous-tone (photographic) images.
Fair compression ratio may be achieved if RLE is applied on computer generated color images.

RLE is a lossless type of compression and cannot achieve great compression ratios, but a good point of that compression is that it can be easily implemented and quickly executed.

back to top


Principle of RLE

As it was said above basic RLE principle is that the run of characters is replaced with the number of the same characters and a single character.
Example:
before:                           R T A A A A S D E E E E E

after RLE compression: R T *4A S D *5E
In the above example each stream of the same characters is being replaced with the number of characters, single character and character *, at the first sight usage of * may seem redundant but characters * which represents control character (which may variously implemented) are necessary because some streams consist of the same characters or number of character is too small, so this special character represents that next characters represents the number of repetition and character after the number is the character which will be repeated. If control character or a run of control characters (in our example *) are in a stream which is being encoded then one extra character is coded. It is important to realize that the encoding process is effective only if there are sequences of 4 or more repeating characters because three characters are used to conduct RLE so for instance coding two repeating characters would lead to expansion and coding three repeating characters wouldn't cause compression or expansion. The decoding process is also very simple, if there aren't control characters the stream is just copied and if control character occurred then it must be removed and appropriate character is copied in a defined number of times. It can be noticed that the process of decoding control characters don't lead to any special procedures.
It is important to point out that there are many different run-length encoding schemes. The above example has just been used to demonstrate the basic principle of RLE encoding. In a particular case the implementation of RLE depends on what type of data is being compressed.

In the following paragraph the process of encoding using flow chart will be shown.

CTRL
COUNT
CHAR

Fig 1. Format of three byte code word

CTRL - control character which is used to indicate compression
COUNT- number of counted characters in stream of the same characters
CHAR - repeating characters

Fig 2. RLE - flow chart

Process of RLE starts with initialization of character counter, repetition counter and a variable which represents the current character (Ch), then if all characters in file have been processed encoding ends. If there are more characters then Ch variable is being stored in temporary variable (if ChCount equals 1), else actual character is being compared to the previous character and then result of that comparison leads to repetition counter increment or to another comparison in which it is being tested if the number of consecutive characters is greater than four in other words does the stream is just copied or coded according to code word shown in Fig 1.

back to top


Examples of RLE implementations

RLE algorithms are parts of various image compression techniques like BMP, PCX, TIFF, and is also used in PDF file format, but RLE also exists as separate compression technique and file format.

CompuServe standard for RLE file format

CompuServe RLE file format standard was formed in the 80's and defines the compression for 1-bit images.
Header sequence in RLE file represents Graphic Mode Control, control is initiated when program runs onto a sequence of three characters, those characters are
ASCII ESC (HEX 1B), ASCII G(HEX 47) and the third character is ASCII H (HEX 48) or M (HEX 4D). Third character represents resolution, there are two possible graphics modes, and those are high resolution graphic mode (256 x 192 pixels) represented by sequence <ESC><G><H> and medium resolution graphic mode (128 x 96 pixels)which is represented by <ESC><G><M> sequence.
After header sequence ,data sequence starts, basic data sequence consists of a pair of run length encoded ASCII characters. The first number represents number of the background (turned off) pixels and the second character is the number of foreground (turned on) pixels. Each number of a pair represents the count number of pixels plus 32 decimal, i.e. from each number 32 is substracted and that number represents how many next pixels will be turned on or turned off depending on what number of pair we observe. Usually it is used ASCII ~(HEX 7E, DEC 126) as highest possible value, because some terminals interpret ASCII character 7F HEX as <RUBOUT>, because RLE file format was used as file which was to show graphic on terminals. Previous facts lead to conlussion that in each byte we can denote repetition of 94 pixels (126 - 32). For example pair <D><'> (HEX: 44 27, DECIMAL 68 39) means next 68 (decimal) pixels are turned off and then 39 (decimal) pixels are turned on.

Data in file is written in such a way that if the last pixel set was on position 254 then the next pixel will be on the first position in next line i.e. pictures are being drawn from up to down. Let's illustrate this with an example; if the last pixel set on line was on position 252 and data sequence consists of pair 21 hex, 27hex i.e one background pixel and seven foreground pixels then following pixel is turned off, then the following two pixels of current line are turned on, and then the rest of five pixels turned on, on the beginning of the next line.
The ending sequence for RLE standard consists of three characters <ESC><G><N>, <ESC> is a control character which ends the graphic display. Basic convention is that control character shouldn't effect the display. All control characters should be ignored besides <ESC> and <BEL> characters, <BEL> can be optionally used, so in some cases RLE file ending sequence consists of <BEL><ESC><G><N>. In other words end of RLE file according to standard is <ESC><G><H> or <BEL><ESC><G><N>.


Example - RLE file (all numbers are in ASCII HEX format):
1B 47 48 7E 20 7E 20 7E 20 7E 20 ....
 .     .    .    .    .    .    .    .    .     .    .
 .    41 36  .    .   .    .    .    .     .    .
 .     .    .    .    .   .    .    .    .     .    .
 .     .    .   07 1B 47 4E  

1B 47 48 - is header <ESC><G><H> and represents high resolution, first data sequence pair 20hex, 7Ehex means that first 94dec pixels are all turned on, the second data sequence is the same so second 94d pixels are also turned on (the first 188d pixels are turned on so far), and so on. Then somewhere in the file pairs 41h 36h occurs which means that next 33d pixels are turned off and after that 22d pixels are turned off, etc. Last four character are the ending sequence which was described above.

Click to download samples of CompuServe RLE images

back to top


MS Windows standard for RLE file format

MS Windows standard for RLE have the same file format as well-known BMP file format, but it's RLE format is defined only for 4-bit and 8-bit colour images.
Two types of RLE compression is used 4bit RLE and 8bit RLE as expected the first type is used for 4-bit images, second for 8-bit images.

4bit RLE
Compression sequence consists of two bytes, first byte (if not zero) determines number of pixels which will be drawn. The second byte specifies two colors, high-order 4 bits (upper 4 bits) specifies the first color, low-order 4bits specifies the second color this means that after expansion 1st, 3rd and other odd pixels will be in color specified by high-order bits, while even 2nd, 4th and other even pixels will be in color specified by low-order bits. If first byte is zero then the second byte specifies escape code. (See table below)

Second byte
Definition
0
End-of-line
1
End-of-Rle(Bitmap)
2
Following two bytes defines offset in x and y direction (x is right,y is up). The skipped pixels get color zero.
>=3
when expanding following >=3 nibbles (4bits) are just copied from compressed file, file/memory pointer must be on 16bit boundary so adequate number of zeros follows

Table 1. Definition of escape codes(the first byte of compression sequence is 0)

Examples for 4bit RLE:

Compressed data
Expanded data
06 52
5 2 5 2 5 2
08 1B
1 B 1 B 1 B 1 B
00 06 83 14 34
8 3 1 4 3 4
00 02 09 06
Move 9 positions right and 6 up
00 00
End-of -line
04 22
2 2 2 2
00 01
End-of-RLE(Bitmap)

8bit RLE
Sequence when compressing is also formed from 2 bytes, the first byte (if not zero) is a number of consecutive pixels which are in color specified by the second byte.
Same as 4bit RLE if the first byte is zero the second byte defines escape code, escape codes 0, 1, 2, have same meaning as described in Table 1. while if escape code is >=3 then when expanding the following >=3 bytes will be just copied from the compressed file, if escape code is 3 or other greater odd number then zero follows to ensure 16bit boundary.

Examples for 8bit RLE

Compressed data
Expanded data
06 52 52 52 52 52 52 52
08 1B 1B 1B 1B 1B 1B 1B 1B 1B
00 03 83 14 34 83 14 34
00 02 09 06 Move 9 positions right and 6 up
00 00 End-of -line
04 2A 2A 2A 2A 2A
00 01 End-of-RLE(Bitmap)

back to top


Books

back to top


References

back to top


Related Links

back to top