Many scientific, engineering, and commercial applications call for operations on real numbers. In many cases a fixed-point numerical representation suffices; however, this approach is not always feasible, since the range such applications require may exceed what a fixed-point format can attain. Floating-point numbers have instead proven to be an effective approach: they offer the advantage of a wide dynamic range, but they are more difficult to implement, less precise for the same number of digits, and subject to round-off errors.
The floating-point numerical representation is similar to scientific notation, except that the radix point of the stored significand is fixed, usually just to the right of the leftmost (most significant) digit. The actual position of the represented number's radix point is indicated by a separate exponent field. Because the exponent can place the radix point anywhere within the representable range, numbers with a "floating" radix point can span a wide dynamic range of magnitudes while maintaining suitable precision.
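The split between a fixed-position significand and a separate exponent field can be made concrete by unpacking the bit fields of a double-precision value. The sketch below (a Python illustration, not part of the original text) extracts the sign, biased exponent, and fraction of an IEEE-754 double:

```python
import struct

def decompose(x: float):
    """Split an IEEE-754 double into its sign, biased exponent, and fraction bits."""
    # Reinterpret the 8-byte double as a 64-bit unsigned integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                  # 1 sign bit
    exponent = (bits >> 52) & 0x7FF    # 11-bit biased exponent (bias = 1023)
    fraction = bits & ((1 << 52) - 1)  # 52-bit fraction (implicit leading 1)
    return sign, exponent, fraction

# 6.5 = 1.101 (binary) * 2**2, so the biased exponent is 1023 + 2 = 1025
# and the fraction field holds the bits 101 followed by zeros.
sign, exp, frac = decompose(6.5)
```

Here the radix point of the significand is implicitly fixed after the leading 1 bit; only the exponent field moves the effective radix point of the represented value.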
The IEEE standardized the floating-point numerical representation for computers in 1985 with the IEEE-754 standard [1]. The standard specifies the encoding of the bits and precisely defines the behavior of arithmetic operations, minimizing calculation anomalies while permitting different implementations. Since the 1950s, binary arithmetic has predominated in computer operations because it is simple to implement in electronic circuits. Consequently, the heavy utilization of binary floating-point numbers has made support for the IEEE binary floating-point standard effectively mandatory across computer architectures. More importantly, the standard allows architectures to exchange numerical data reliably, since conforming numbers share the same representation.
Although binary encoding is prevalent in computer systems, decimal arithmetic is becoming increasingly important and indispensable, because binary arithmetic cannot always satisfy the robustness and precision requirements of many current applications. Unfortunately, many architectures still resort to software routines to emulate operations on decimal numbers or, worse yet, perform the computation in binary and then round the result to the required precision. In either case, binary approximations can drop bits that are essential to the value being represented, potentially causing severe harm in sensitive applications.
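The precision problem described above can be demonstrated in a few lines: many decimal fractions, such as 0.1, have no exact binary floating-point representation, so errors surface even in trivial arithmetic. The sketch below (an illustration, using Python's built-in binary floats and the standard-library `decimal` module as an example of software decimal emulation) shows the discrepancy:

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so the rounding errors in the approximations accumulate.
binary_sum = 0.1 + 0.2
print(binary_sum == 0.3)   # the comparison fails in binary arithmetic

# Decimal arithmetic represents the same values exactly,
# so the sum matches the expected result.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))
```

This is exactly the class of error that motivates hardware support for decimal arithmetic: the binary result is off by a few units in the last place, which is unacceptable in, for example, financial calculations.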
Author: Ivan Dario Castellanos
Source: Oklahoma State University