|
Run
length encoding
What
is RLE ?
Run-length
encoding (RLE) is a very simple form of data compression
encoding. It is based on simple principle of encoding
data. This principle is to every stream which is formed
of the same data values (repeating values is called a
run) i.e sequence of repeated data values is replaced
with count number and a single value. This intuitive principle
works best on certain data types in which sequences of
repeated data values can be noticed; RLE is usually applied
to the files that a contain large number of consecutive
occurrences of the same byte pattern.
RLE may be used on any kind
of data regardless of its content, but data which is being
compressed by RLE determines how good compression ratio
will be achieved. So RLE is used on text files which contains
multiple spaces for indention and formatting paragraphs,
tables and charts. Digitized signals also consist of unchanged
streams so such signals can also be compressed by RLE.
A good example of such signal are monochrome images, and
questionable compression would be probably achieved if
such compression was used on continous-tone (photographic)
images.
Fair compression ratio may be achieved if RLE is applied
on computer generated color images.
RLE is a lossless type of
compression and cannot achieve great compression ratios,
but a good point of that compression is that it can be
easily implemented and quickly executed.
back to top
Principle
of RLE
As it was said above
basic RLE principle is that the run of characters is replaced
with the number of the same characters and a single character.
Example:
before: R
T A A A A S D E E E E E
after RLE compression: R T *4A S D
*5E
In the above example each stream of the same characters
is being replaced with the number of characters, single
character and character *, at the first sight usage of
* may seem redundant but characters * which represents
control character (which may variously implemented) are
necessary because some streams consist of the same characters
or number of character is too small, so this special character
represents that next characters represents the number
of repetition and character after the number is the character
which will be repeated. If control character or a run
of control characters (in our example *) are in a stream
which is being encoded then one extra character is coded.
It is important to realize that the encoding process is
effective only if there are sequences of 4 or more repeating
characters because three characters are used to conduct
RLE so for instance coding two repeating characters would
lead to expansion and coding three repeating characters
wouldn't cause compression or expansion. The decoding
process is also very simple, if there aren't control characters
the stream is just copied and if control character occurred
then it must be removed and appropriate character is copied
in a defined number of times. It can be noticed that the
process of decoding control characters don't lead to any
special procedures.
It is important to point out that there are many different
run-length encoding schemes. The above example has just
been used to demonstrate the basic principle of RLE encoding.
In a particular case the implementation of RLE depends
on what type of data is being compressed.
In the following paragraph
the process of encoding using flow chart will be shown.
Fig 1.
Format of three byte code word
CTRL - control character which is used to indicate compression
COUNT- number of counted characters in stream of the same
characters
CHAR - repeating characters
Fig 2. RLE
- flow chart
Process of RLE starts with
initialization of character counter, repetition counter
and a variable which represents the current character
(Ch), then if all characters in file have been processed
encoding ends. If there are more characters then Ch variable
is being stored in temporary variable (if ChCount equals
1), else actual character is being compared to the previous
character and then result of that comparison leads to
repetition counter increment or to another comparison
in which it is being tested if the number of consecutive
characters is greater than four in other words does the
stream is just copied or coded according to code word
shown in Fig 1.
back to top
Examples
of RLE implementations
RLE algorithms are parts
of various image compression techniques like BMP, PCX,
TIFF, and is also used in PDF file format, but RLE also
exists as separate compression technique and file format.
CompuServe standard
for RLE file format
CompuServe RLE file
format standard was formed in the 80's and defines the
compression for 1-bit images.
Header sequence in RLE file represents Graphic Mode Control,
control is initiated when program runs onto a sequence
of three characters, those characters are
ASCII ESC (HEX 1B), ASCII G(HEX 47) and the third character
is ASCII H (HEX 48) or M (HEX 4D). Third character represents
resolution, there are two possible graphics modes, and
those are high resolution graphic mode (256 x 192
pixels) represented by sequence <ESC><G><H>
and medium resolution graphic mode (128 x 96 pixels)which
is represented by <ESC><G><M> sequence.
After header sequence ,data sequence starts, basic data
sequence consists of a pair of run length encoded ASCII
characters. The first number represents number of the
background (turned off) pixels and the second character
is the number of foreground (turned on) pixels. Each number
of a pair represents the count number of pixels plus 32
decimal, i.e. from each number 32 is substracted and that
number represents how many next pixels will be turned
on or turned off depending on what number of pair we observe.
Usually it is used ASCII ~(HEX 7E, DEC 126) as highest
possible value, because some terminals interpret ASCII
character 7F HEX as <RUBOUT>, because RLE file format
was used as file which was to show graphic on terminals.
Previous facts lead to conlussion that in each byte we
can denote repetition of 94 pixels (126 - 32). For example
pair <D><'> (HEX: 44 27, DECIMAL 68 39) means
next 68 (decimal) pixels are turned off and then 39 (decimal)
pixels are turned on.
Data in file is written in such a way
that if the last pixel set was on position 254 then the
next pixel will be on the first position in next line
i.e. pictures are being drawn from up to down. Let's illustrate
this with an example; if the last pixel set on line was
on position 252 and data sequence consists of pair 21
hex, 27hex i.e one background pixel and seven foreground
pixels then following pixel is turned off, then the following
two pixels of current line are turned on, and then the
rest of five pixels turned on, on the beginning of the
next line.
The ending sequence for RLE standard consists of three
characters <ESC><G><N>, <ESC>
is a control character which ends the graphic display.
Basic convention is that control character shouldn't effect
the display. All control characters should be ignored
besides <ESC> and <BEL> characters, <BEL>
can be optionally used, so in some cases RLE file ending
sequence consists of <BEL><ESC><G><N>.
In other words end of RLE file according to standard is
<ESC><G><H> or <BEL><ESC><G><N>.
Example - RLE file (all numbers are in ASCII HEX
format):
1B 47 48 7E 20 7E 20 7E 20 7E 20 ....
. . . . . .
. . .
. .
. 41 36 . .
. . .
. . .
. . . . .
. . .
. .
.
. . .
07 1B 47 4E
1B 47 48 - is header
<ESC><G><H> and represents high resolution,
first data sequence pair
20hex, 7Ehex means that first 94dec
pixels are all turned on, the second data sequence is
the same so second 94d pixels are also turned on (the
first 188d pixels are turned on so far), and so on. Then
somewhere in the file pairs 41h 36h occurs which means
that next 33d pixels are turned off and after that 22d
pixels are turned off, etc. Last four character are the
ending sequence which was described above.

Click to download samples of CompuServe RLE images
back to top
MS
Windows standard for RLE file format
MS Windows standard for RLE
have the same file format as well-known BMP file format,
but it's RLE format is defined only for 4-bit and 8-bit
colour images.
Two types of RLE compression is used 4bit RLE and 8bit
RLE as expected the first type is used for 4-bit images,
second for 8-bit images.
4bit RLE
Compression sequence consists
of two bytes, first byte (if not zero) determines number
of pixels which will be drawn. The second byte specifies
two colors, high-order 4 bits (upper 4 bits) specifies
the first color, low-order 4bits specifies the second
color this means that after expansion 1st, 3rd and other
odd pixels will be in color specified by high-order bits,
while even 2nd, 4th and other even pixels will be in color
specified by low-order bits. If first byte is zero then
the second byte specifies escape code. (See table below)
|
Second
byte
|
Definition
|
|
0
|
End-of-line
|
|
1
|
End-of-Rle(Bitmap)
|
|
2
|
Following
two bytes defines offset in x and y direction (x
is right,y is up). The skipped pixels get color
zero.
|
|
>=3
|
when
expanding following >=3 nibbles (4bits) are just
copied from compressed file, file/memory pointer
must be on 16bit boundary so adequate number of
zeros follows
|
Table 1.
Definition of escape codes(the first byte of compression
sequence is 0)
Examples for 4bit RLE:
|
Compressed
data
|
Expanded
data
|
|
06 52
|
5 2
5 2 5 2
|
|
08 1B
|
1 B
1 B 1 B 1 B
|
|
00 06
83 14 34
|
8 3
1 4 3 4
|
|
00 02
09 06
|
Move
9 positions right and 6 up
|
|
00 00
|
End-of
-line
|
|
04 22
|
2 2
2 2
|
|
00 01
|
End-of-RLE(Bitmap)
|
8bit RLE
Sequence when compressing is also formed from 2 bytes,
the first byte (if not zero) is a number of consecutive
pixels which are in color specified by the second byte.
Same as 4bit RLE if the first byte is zero the second
byte defines escape code, escape codes 0, 1, 2, have same
meaning as described in Table 1. while if escape code
is >=3 then when expanding the following >=3 bytes
will be just copied from the compressed file, if escape
code is 3 or other greater odd number then zero follows
to ensure 16bit boundary.
Examples for 8bit
RLE
|
Compressed
data
|
Expanded
data
|
| 06
52 |
52
52 52 52 52 52 |
| 08
1B |
1B
1B 1B 1B 1B 1B 1B 1B |
| 00
03 83 14 34 |
83
14 34 |
| 00
02 09 06 |
Move
9 positions right and 6 up |
| 00
00 |
End-of
-line |
| 04
2A |
2A
2A 2A 2A |
| 00
01 |
End-of-RLE(Bitmap) |
back to top
Books
back to top
References
back to top
Related Links
back to top |