Binary Data Series Part 3: Floating Point Numbers
This article is part of a multi-part series on binary data. If your knowledge of positive and negative integers and binary numbers is a bit flaky, I'd recommend that you read part 1 and part 2 before you continue.
In this article we will discuss how fractions are represented in binary and some limitations you must keep in mind when developing software that requires very precise calculations.
To start things off, consider how we represent fractional numbers in the decimal system.
Thousands | Hundreds | Tens | Ones | . | Tenth | Hundredth | Thousandth | Ten-Thousandth | Hundred-Thousandth |
10^3 | 10^2 | 10^1 | 10^0 | . | 10^-1 | 10^-2 | 10^-3 | 10^-4 | 10^-5 |
1 | 9 | 8 | 1 | . | 0 | 8 | 4 | 2 | 7 |
Each position to the right of the decimal point represents some fraction of 10: each position is 10 times smaller than the one before it.
This is very much how floats are represented in binary.
Eight | Four | Two | One | . | One-Half | One-Fourth | One-Eighth | One-Sixteenth | One-Thirty-Second |
2^3 | 2^2 | 2^1 | 2^0 | . | 2^-1 | 2^-2 | 2^-3 | 2^-4 | 2^-5 |
0 | 1 | 0 | 1 | . | 0 | 1 | 1 | 0 | 0 |
Converting binary floats to decimal fractions works the same way as converting integers: multiply each bit by its positional value and add the results. Take the number from the table above:
101.011
- The first digit on the left is 1. It's in the 2^2 position, making 1 * 2^2 = 4
- The next digit is 0. It's in the 2^1 position, meaning 0 * 2^1 = 0
- The next digit is 1. It's in the 2^0 position, meaning 1 * 2^0 = 1
- The next digit is 0. It's in the 2^-1 position, meaning 0 * 2^-1 = 0
- The next digit is 1. It's in the 2^-2 position, meaning 1 * 2^-2 = 0.25
- The next digit is 1. It's in the 2^-3 position, meaning 1 * 2^-3 = 0.125
Adding these numbers together we get 4 + 0 + 1 + 0 + 0.25 + 0.125 = 5.375
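Here is a minimal sketch of that process in Python; the function binary_to_decimal is my own illustration rather than anything from a standard library:

```python
def binary_to_decimal(bits: str) -> float:
    """Convert a binary string such as '101.011' to its decimal value."""
    integer_part, _, fraction_part = bits.partition(".")
    value = 0.0
    # Integer bits occupy positions 2^0, 2^1, ... counting from the right.
    for position, bit in enumerate(reversed(integer_part)):
        value += int(bit) * 2 ** position
    # Fractional bits occupy positions 2^-1, 2^-2, ... counting from the left.
    for position, bit in enumerate(fraction_part, start=1):
        value += int(bit) * 2 ** -position
    return value

print(binary_to_decimal("101.011"))  # 5.375
```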
Now suppose we have 12.25 and we want to convert it to binary. Here's how it's done:
First convert the integer part to binary:
12 therefore becomes 1100
Next we deal with the fractional part of the number. For this we repeatedly multiply the fractional part by 2 until nothing is left, each time writing a binary 1 when the product is greater than or equal to 1 and a binary 0 when it is less than 1. Whenever the product reaches 1 or more, we drop the whole part before the next multiplication:
Multiplication by 2 | Product | Digit |
0.25 * 2 | 0.5 | 0 |
0.5 * 2 | 1.0 | 1 |
The fractional part of the number is .01. Combine this with the integer part and we get 1100.01
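Here is a small Python sketch of the repeated multiplication method. The helper fraction_to_binary is my own, and the max_bits limit simply stops the loop for fractions whose binary expansion never terminates (more on those shortly):

```python
def fraction_to_binary(fraction: float, max_bits: int = 16) -> str:
    """Convert the fractional part of a number to binary digits."""
    digits = ""
    while fraction != 0 and len(digits) < max_bits:
        fraction *= 2
        if fraction >= 1:
            digits += "1"
            fraction -= 1  # drop the whole part before the next multiplication
        else:
            digits += "0"
    return digits

# 12.25: the integer part 12 is 1100, the fractional part 0.25 is 01
print(bin(12)[2:] + "." + fraction_to_binary(0.25))  # 1100.01
```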
Great, now that we've gone over how to convert floating point numbers from decimal to binary, let's talk about how they are stored in a computer.
The IEEE Floating Point Standard
Most computer systems today follow the IEEE 754 standard for representing real numbers. In the technical standard, floating point numbers are represented using 3 components. These are:
- The sign bit
- The exponent
- The mantissa
I'll break each of these down and explain what they mean.
Do you remember learning about scientific notation in primary school? In case you haven't heard of it before or need a refresher, scientific notation is a method for writing very large or very small numbers. The numbers 890,000,000 and 0.00000051, for example, are written in scientific notation as 8.9 x 10^8 and 5.1 x 10^-7. A number in scientific notation is expressed as a number between 1 and 10 multiplied by a power of 10.
Binary real numbers can also be expressed in scientific notation. A binary number in scientific notation is expressed as a number between 1 and 2 (a 1 followed by a binary fraction) multiplied by a power of two. Let's look at the decimal number 85.125 and express that in binary scientific notation:
First we convert 85.125 to binary. Following the steps outlined above, we get 1010101.001. Now to express this in scientific notation we move the point six places to the left and multiply by 2^6. The result is 1.010101001 x 2^6.
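As a quick sanity check, the binary mantissa 1.010101001 works out to 1 + 2^-2 + 2^-4 + 2^-6 + 2^-9 = 1.330078125 in decimal, and multiplying it by 2^6 takes us back to the original number:

```python
mantissa = 1 + 2**-2 + 2**-4 + 2**-6 + 2**-9  # 1.010101001 in binary
print(mantissa * 2**6)                         # 85.125
```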
If this number were stored as a single precision 32-bit float, here's what it would look like:
Sign bit | Exponent | Mantissa |
0 | 10000101 | 01010100100000000000000 |
Let's unpack things a bit further…
The sign bit represents the sign of the number. If the float is positive then this bit will be 0; otherwise it is 1.
The exponent is 8 bits in length and represents the exponent part of the expression. You may have noticed that the exponent is not 00000110, which is 6 in binary. This is because 127 was added to the number. The 127 serves as a bias and allows the computer to represent both positive and negative exponents.
The mantissa is 23 bits in length and represents the part of the number in scientific notation that is to the right of the point. The leading 1 to the left of the point is implied, so it doesn't need to be stored.
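If you'd like to check the table above for yourself, here is a small sketch using Python's struct module; packing the number as a big-endian 32-bit float and slicing the bits into the three fields is my own illustration:

```python
import struct

# Pack 85.125 as a big-endian single precision float and read the raw bits.
raw, = struct.unpack(">I", struct.pack(">f", 85.125))
bits = f"{raw:032b}"

print(bits[0])     # sign bit:  0
print(bits[1:9])   # exponent:  10000101 (133, i.e. 6 plus the bias of 127)
print(bits[9:])    # mantissa:  01010100100000000000000
```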
This number can also be represented as a double precision 64-bit floating point number. Here's what it would look like:
Sign bit | Exponent | Mantissa |
0 | 10000000101 | 0101010010000000000000000000000000000000000000000000 |
As you can see, the double precision floating point type uses 1 bit to represent the sign of the number, 11 bits for the exponent with a bias of 1023, and 52 bits for the mantissa. Just as with the different integer types covered in the last article, single precision and double precision floating point numbers can store different ranges of real numbers.
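The same kind of check works for the 64-bit encoding (again, just a sketch using Python's struct module):

```python
import struct

# Pack 85.125 as a big-endian double precision float and read the raw bits.
raw, = struct.unpack(">Q", struct.pack(">d", 85.125))
bits = f"{raw:064b}"
print(bits[0], bits[1:12], bits[12:], sep=" | ")
# 0 | 10000000101 | 0101010010000000000000000000000000000000000000000000
```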
The range a single precision float can represent is approximately +/- 3.4 x 10^38, while the range of a double precision float is approximately +/- 1.8 x 10^308.
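If you're curious, you can peek at these limits from Python: sys.float_info describes the double precision type it uses, and the single precision maximum can be computed from the format itself:

```python
import sys

# Largest finite double precision value, roughly 1.8 x 10^308.
print(sys.float_info.max)        # 1.7976931348623157e+308

# Largest finite single precision value, roughly 3.4 x 10^38,
# computed from the format: (2 - 2^-23) * 2^127.
print((2 - 2**-23) * 2.0**127)   # 3.4028234663852886e+38
```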
With that large a range, you might think a computer can accurately represent any number you give it. But you would be mistaken. Allow me to explain.
The Limits of Floating Point Numbers
I'll begin my explanation by first demonstrating the conversion of the real number 0.4 into binary.
Multiplication by 2 | Product | Digit |
0.4 * 2 | 0.8 | 0 |
0.8 * 2 | 1.6 | 1 |
0.6 * 2 | 1.2 | 1 |
0.2 * 2 | 0.4 | 0 |
0.4 * 2 | 0.8 | 0 |
0.8 * 2 | 1.6 | 1 |
0.6 * 2 | 1.2 | 1 |
… | … | … |
As you can see, the binary representation of 0.4 repeats infinitely: 0.011001100110011001100110011…
The numbers 0.3 and 0.1 are other examples where you will see this behavior. I'll leave it to you to convert them to binary and see for yourself.
There are many decimal fractions like these that cannot be represented exactly in binary. As a consequence, what you often get when you type a floating point number into a computer is not the exact number, but an approximation of it.
That is why when you run certain calculations on a computer, like 0.1 + 0.2, you get 0.30000000000000004 instead of exactly 0.3.
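You can see this behaviour for yourself; here is what Python shows (the exact digits printed depend on the language and the formatting you ask for, but the underlying approximation is the same):

```python
# The stored value of 0.1 is only the closest double to one tenth.
print(f"{0.1:.20f}")     # 0.10000000000000000555
print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False
```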
In many cases, the approximation isn't an issue: we lop off the trailing digits we don't need and call it 0.3. But it really becomes a problem in applications that must make very precise calculations. Financial applications are one case where precise calculations are crucial.
So what can be done in cases like these? In these situations, developers turn to decimal data types, which are designed to represent decimal numbers exactly in a computer system. In the case of finance applications, developers may also choose to represent money in cents rather than in dollars, allowing them to use an integer data type instead of a floating point type to store the amounts.
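As a rough sketch of both approaches in Python (the prices here are invented purely for illustration), the standard library's decimal module provides one such decimal type, and plain integers handle the cents-based approach:

```python
from decimal import Decimal

# A decimal type stores base-10 digits, so tenths and hundredths are exact.
# Constructing from strings avoids smuggling in a binary approximation.
print(Decimal("0.10") + Decimal("0.20"))  # 0.30

# Alternatively, represent money as integer cents and skip floats entirely.
price_cents = 1099   # $10.99 (made-up value)
tax_cents = 88       # $0.88 (made-up value)
total_cents = price_cents + tax_cents
print(f"${total_cents // 100}.{total_cents % 100:02d}")  # $11.87
```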
That's all, folks!
I hope you've enjoyed the binary data series. Feel free to leave a comment if you have any questions.