
Binary representation of single precision floating point numbers

Author: Jean-David Gadina <macmade(at)eosgarden.com>
Source: IEEE Standard for Floating-Point Arithmetic - IEEE 754
Copyright (C) Jean-David Gadina.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled GNU Free Documentation License.
 
 

Table of contents

  1. Theory
  2. Example
  3. Special numbers
    1. Denormalized numbers
    2. Zero
    3. Infinity
    4. NaN
  4. Range
    1. Normalized numbers
    2. Denormalized numbers
  5. C code example

1. Theory

Single precision floating point numbers are usually called 'float' or 'real'. They are 4 bytes (32 bits) long, and are packed the following way, from left to right:
  • Sign: 1 bit
  • Exponent: 8 bits
  • Mantissa: 23 bits
Sign     Exponent     Mantissa
1 bit    8 bits       23 bits
X        XXXX XXXX    XXX XXXX XXXX XXXX XXXX XXXX
The sign indicates if the number is positive or negative (zero for positive, one for negative).
The real exponent is computed by subtracting 127 from the value of the exponent field. It's the exponent of the number as it is expressed in scientific notation.
The full mantissa, sometimes also called the significand, should be considered as a 24-bit value. As we are using scientific notation, there is an implicit leading bit (sometimes called the hidden bit), always set to 1, as a number written in scientific notation never starts with a 0.
For instance, you won't say 0.123 · 10^5 but 1.23 · 10^4.
The conversion is performed the following way:
(-1)^S · 1.M · 2^(E - 127)
Where S is the sign, M the mantissa, and E the exponent.
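
The three fields can be pulled out of a float with a few shifts and masks. The following is only a minimal sketch: the FloatFields type and the splitFloat function are names made up for this illustration, and the code assumes that the platform's float is a 32 bits IEEE 754 single precision number.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three fields (illustration only) */
typedef struct
{
    uint32_t sign;     /* 1 bit */
    uint32_t exponent; /* 8 bits, still biased by 127 */
    uint32_t mantissa; /* 23 bits, without the implicit leading bit */
}
FloatFields;

/* Splits a float into its sign, exponent and mantissa fields */
/* Assumes that float is a 32 bits IEEE 754 single precision number */
FloatFields splitFloat( float value )
{
    FloatFields fields;
    uint32_t    bits;

    /* Copies the raw bytes of the float into an integer of the same size */
    memcpy( &bits, &value, sizeof( bits ) );

    /* Bit 0, left to right */
    fields.sign = bits >> 31;

    /* Bits 1 to 8, left to right */
    fields.exponent = ( bits >> 23 ) & 0xFF;

    /* Bits 9 to 31, left to right */
    fields.mantissa = bits & 0x7FFFFF;

    return fields;
}
```

Under that assumption, splitFloat( 5.75f ) should give a sign of 0, an exponent field of 129 and a mantissa field of 0x380000.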

2. Example

Take, for instance, the bit pattern 0100 0000 1011 1000 0000 0000 0000 0000, which is 0x40B80000 in hexadecimal.
Hex     4    0    B    8    0    0    0    0
Bin     0100 0000 1011 1000 0000 0000 0000 0000

Sign    Exponent     Mantissa
0       1000 0001    (1) 011 1000 0000 0000 0000 0000
  • The sign is 0, so the number is positive.
  • The exponent field is 1000 0001, which is 129 in decimal. The real exponent value is then 129 - 127, which is 2.
  • The mantissa, with the leading 1 bit, is 1011 1000 0000 0000 0000 0000.
The final representation of the number in the binary scientific notation is:
(-1)^0 · 1.0111 · 2^2
Mathematically, this means:
1 · ( 1 · 2^0 + 0 · 2^-1 + 1 · 2^-2 + 1 · 2^-3 + 1 · 2^-4 ) · 2^2
( 2^0 + 2^-2 + 2^-3 + 2^-4 ) · 2^2
2^2 + 2^0 + 2^-1 + 2^-2
4 + 1 + 0.5 + 0.25
The floating point value is then 5.75.
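
The computation above can be checked with a few lines of C. This is a sketch rather than a reference implementation: decodeNormalized is a hypothetical helper name, it only handles normalized numbers (exponent field between 1 and 254), and it relies on the standard ldexp function, which multiplies a value by a power of two.

```c
#include <math.h>
#include <stdint.h>

/* Decodes a normalized single precision bit pattern */
/* Only valid when the exponent field is between 1 and 254 */
double decodeNormalized( uint32_t bits )
{
    /* Sign field */
    int sign = ( int )( bits >> 31 );

    /* Real exponent: exponent field minus 127 */
    int exponent = ( int )( ( bits >> 23 ) & 0xFF ) - 127;

    /* Mantissa field, with the implicit leading bit added (24 bits) */
    uint32_t mantissa = ( bits & 0x7FFFFF ) | 0x800000;

    /* The 24 bits integer is 1.M shifted left by 23 bits, */
    /* so the power of two to apply is the real exponent minus 23 */
    double value = ldexp( ( double )mantissa, exponent - 23 );

    return ( sign == 0 ) ? value : -value;
}
```

With the pattern of the example, decodeNormalized( 0x40B80000 ) evaluates 0xB80000 · 2^(2 - 23), which is 5.75.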

3. Special numbers

Depending on the value of the exponent field, some numbers can have special values. They can be:
  • Denormalized numbers
  • Zero
  • Infinity
  • NaN (not a number)

3.1. Denormalized numbers

If the value of the exponent field is 0 and the value of the mantissa field is greater than 0, then the number has to be treated as a denormalized number.
In such a case, the exponent is not -127, but -126, and the implicit leading bit is not 1 but 0.
That allows smaller numbers to be represented.
The scientific notation for a denormalized number is:
(-1)^S · 0.M · 2^-126
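
As an illustration, a denormalized pattern could be decoded as follows. This is a sketch under the assumption that the caller has already checked that the exponent field is 0; decodeDenormalized is a hypothetical name, not a standard function.

```c
#include <math.h>
#include <stdint.h>

/* Decodes a denormalized single precision bit pattern */
/* Only valid when the exponent field is 0 */
double decodeDenormalized( uint32_t bits )
{
    /* Sign field */
    int sign = ( int )( bits >> 31 );

    /* Mantissa field, used as is: the implicit leading bit is 0 */
    uint32_t mantissa = bits & 0x7FFFFF;

    /* 0.M is the 23 bits integer M divided by 2^23, */
    /* so 0.M times 2^-126 is M times 2^( -126 - 23 ) */
    double value = ldexp( ( double )mantissa, -126 - 23 );

    return ( sign == 0 ) ? value : -value;
}
```

For example, the pattern 0x00000001 has only the last mantissa bit set, and decodes to 2^-149, the smallest positive value a single precision number can hold.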

3.2. Zero

If the exponent and the mantissa fields are both 0, then the final number is zero. The sign bit is still taken into account, even if it makes little sense mathematically, allowing a positive or a negative zero.
Note that zero can be considered as a denormalized number. In that case, it would be 0 · 2^-126, which is zero.
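
The two zeros can be observed from C. The sketch below assumes a 32 bits IEEE 754 float; floatFromBits and hasSignBit are helper names introduced only for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Builds a float from a raw 32 bits pattern */
/* Assumes that float is a 32 bits IEEE 754 single precision number */
float floatFromBits( uint32_t bits )
{
    float value;

    memcpy( &value, &bits, sizeof( value ) );

    return value;
}

/* Returns 1 if the sign bit of the float is set, 0 otherwise */
int hasSignBit( float value )
{
    uint32_t bits;

    memcpy( &bits, &value, sizeof( bits ) );

    return ( int )( bits >> 31 );
}
```

With these helpers, floatFromBits( 0x00000000 ) and floatFromBits( 0x80000000 ) compare as equal with the == operator, while hasSignBit returns 0 for the first and 1 for the second.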

3.3. Infinity

If the value of the exponent field is 255 (all 8 bits are set) and if the value of the mantissa field is 0, the number is an infinity, either positive or negative, depending on the sign bit.

3.4. NaN

If the value of the exponent field is 255 (all 8 bits are set) and if the value of the mantissa field is not 0, then the value is not a number. The sign bit has no meaning in such a case.
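
The rules of this section can be summarized as a small classification function. This is a minimal sketch: the FloatKind type and the classifyBits function are hypothetical names, not part of any standard library.

```c
#include <stdint.h>

/* Hypothetical list of the possible kinds of numbers (illustration only) */
typedef enum
{
    FLOAT_ZERO,
    FLOAT_DENORMALIZED,
    FLOAT_NORMALIZED,
    FLOAT_INFINITY,
    FLOAT_NAN
}
FloatKind;

/* Classifies a single precision bit pattern from its exponent and mantissa fields */
FloatKind classifyBits( uint32_t bits )
{
    uint32_t exponent = ( bits >> 23 ) & 0xFF;
    uint32_t mantissa = bits & 0x7FFFFF;

    if( exponent == 0 )
    {
        /* Exponent field is 0: zero or denormalized number */
        return ( mantissa == 0 ) ? FLOAT_ZERO : FLOAT_DENORMALIZED;
    }

    if( exponent == 255 )
    {
        /* Exponent field has all bits set: infinity or NaN */
        return ( mantissa == 0 ) ? FLOAT_INFINITY : FLOAT_NAN;
    }

    /* Any other exponent field: regular normalized number */
    return FLOAT_NORMALIZED;
}
```

The sign bit is deliberately ignored here, as it does not change the kind of number: 0x7F800000 and 0xFF800000 are both infinities.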

4. Range

The range depends on whether the number is normalized or not. Below are the ranges for both cases:

4.1. Normalized numbers

  • Min: ±1.1754943508223E-38 / ±1.00000000000000000000000 · 2^-126
  • Max: ±3.4028234663853E+38 / ±1.11111111111111111111111 · 2^127

4.2. Denormalized numbers

  • Min: ±1.4012984643248E-45 / ±0.00000000000000000000001 · 2^-126
  • Max: ±1.1754942106924E-38 / ±0.11111111111111111111111 · 2^-126
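
These limits can be rebuilt from their bit patterns and compared with the constants of the standard float.h header. The sketch below assumes a 32 bits IEEE 754 float. The function names are made up for this illustration, while FLT_MIN (smallest positive normalized value, exponent field 1 with a zero mantissa) and FLT_MAX (largest finite value) are standard macros.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Builds a float from a raw 32 bits pattern */
/* Assumes that float is a 32 bits IEEE 754 single precision number */
float floatFromBits( uint32_t bits )
{
    float value;

    memcpy( &value, &bits, sizeof( value ) );

    return value;
}

/* Exponent field 1, mantissa field 0 */
float smallestNormalized( void )
{
    return floatFromBits( 0x00800000 );
}

/* Exponent field 254, all mantissa bits set */
float largestNormalized( void )
{
    return floatFromBits( 0x7F7FFFFF );
}

/* Exponent field 0, only the last mantissa bit set */
float smallestDenormalized( void )
{
    return floatFromBits( 0x00000001 );
}

/* Exponent field 0, all mantissa bits set */
float largestDenormalized( void )
{
    return floatFromBits( 0x007FFFFF );
}
```

On such a platform, smallestNormalized() is expected to equal FLT_MIN and largestNormalized() to equal FLT_MAX, while both denormalized values lie strictly between 0 and FLT_MIN.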

5. C code example

Below is an example of a C program that converts a 32 bits binary pattern to the float value it represents:
/* System includes */
#include <stdlib.h>
#include <stdio.h>
#include <math.h>

/* Definition of the boolean data type */
typedef enum { FALSE, TRUE } boolean;

/**
 * Converts an integer to its float representation
 *
 * This function converts a 32 bits integer to a single precision floating point
 * number, as specified by the IEEE Standard for Floating-Point Arithmetic
 * (IEEE 754). This standard can be found at the following address:
 * {@link http://ieeexplore.ieee.org/servlet/opac?punumber=4610933}
 *
 * @param   unsigned long   The integer to convert to a floating point value
 * @return  float           The floating point number
 * @author  Jean-David Gadina <macmade@eosgarden.com>
 */
float binaryToFloat( unsigned long binary )
{
    /* Gets the sign field */
    /* Bit 0, left to right */
    boolean sign = ( ( binary >> 31 ) & 0x1 ) ? TRUE : FALSE;

    /* Gets the exponent field */
    /* Bits 1 to 8, left to right */
    /* Stored as a signed integer, as the real exponent can be negative */
    signed int exponent = ( signed int )( ( binary >> 23 ) & 0xFF );

    /* Gets the mantissa field */
    /* Bits 9 to 31, left to right */
    unsigned long mantissa = ( binary & 0x7FFFFF );

    /* Storage for the return value */
    float floatValue = 0;

    /* Counter */
    signed int i = 0;

    /* Checks the values of the exponent and the mantissa fields to handle special numbers */
    if( exponent == 0 && mantissa == 0 )
    {
        /* Zero - No need for a computation even if it can be considered as a denormalized number */
        /* The sign is kept, allowing a positive or a negative zero */
        return ( sign == FALSE ) ? 0.0f : -0.0f;
    }
    else if( exponent == 255 && mantissa == 0 )
    {
        /* Infinity - The INFINITY macro is defined in math.h since C99 */
        return ( sign == FALSE ) ? INFINITY : -INFINITY;
    }
    else if( exponent == 255 && mantissa != 0 )
    {
        /* Not a number - The NAN macro is defined in math.h since C99 */
        return NAN;
    }
    else if( exponent == 0 && mantissa != 0 )
    {
        /* Denormalized number - Exponent is fixed to -126 */
        exponent = -126;
    }
    else
    {
        /* Computes the real exponent */
        exponent = exponent - 127;

        /* Adds the implicit bit to the mantissa */
        mantissa = mantissa | 0x800000;
    }

    /* Process the 24 bits of the mantissa */
    for( i = 0; i > -24; i-- )
    {
        /* Checks if the current bit is set */
        if( mantissa & ( 1UL << ( i + 23 ) ) )
        {
            /* Adds the value for the current bit */
            /* This is done by computing two raised to the power of the exponent plus the bit position */
            /* (negative if it's after the implicit bit, as we are using scientific notation) */
            floatValue += ( float )pow( 2.0, i + exponent );
        }
    }

    /* Returns the final float value */
    return ( sign == FALSE ) ? floatValue : -floatValue;
}

/**
 * C main() function
 *
 * @return  int     The exit status
 */
int main( void )
{
    /* Prints 5.750000, the value computed in the example section */
    printf( "%f\n", binaryToFloat( 0x40B80000 ) );

    return EXIT_SUCCESS;
}

