Implementing the reals
The most common way to implement "real" numbers on computers nowadays is by using floating-point numbers. They are defined by three values, the mantissa $c$, the exponent $E$, and a single sign bit $s$, so that any number is represented as $$x = (-1)^{s} \times c \times b^E$$
with $b$ the basis used, generally either $2$ or $10$. These days, most computers use the IEEE Standard for Binary Floating-Point Arithmetic, which uses the basis $b = 2$, the exponent as a signed integer, and a mantissa expressed as a binary decimal number
$$c = c_0.c_1c_2c_3...$$ or in other words $$c = \sum_{i = 0}^n c_i 2^{-i}$$There are many more characteristics to the IEEE 754 standard, but these are the basic mathematical notions involved for it. It has many benefits, hence its common use, but also suffer from a few drawbacks :
- It suffers from rounding problems due to the limited amount of precision of the mantissa
- Floating-point numbers are not distributed evenly throughout the number line, as each power of $2$ contains the same amount of numbers, meaning that there will be the same amount of values between for instance $0$ and $2^{-63}$ as there are between $2^{63}$ and $2^{64}$.
- Many values are redundant.
An alternative to floating-point arithmetics is fixed-point arithmetics, which is simply the same representation as integers, extended to decimals, so that two arrays of bits $c$ and $d$ will represent the number
$$x = \sum_{i = 0}^n 2^{c_i} + \sum_{i = 0}^m 2^{-d_i}$$Unlike floating-point numbers, fixed-point numbers are evenly distributed and have no redundant values, although they trade this for a much shorter precision. While a 32 bits floating-point numbers, with 8 bits dedicated to the exponent, will range from order of $2^{-126}$ to $2^{127}$, a fixed-point number with 16 bits dedicated to the integer part and 16 to decimals, will range between the orders of $2^{-15}$ and $2^{15}$. They also still suffer from the same rounding problem.
1. Rationals as pairs of integers
The simplest way to implement rationals exactly is to represent them simply as the ratio of two integers, one signed and one unsigned.struct fraction
{
int numerator;
unsigned int denominator;
};
{
numerator;
denominator;
};
fraction +(fraction a, fraction b)
{
return fraction(
a.denominator * b.numerator + a.numerator * b.denominator ,
a.denominator * b.denominator);
};
fraction -(fraction a, fraction b)
{
return fraction(
a.denominator * b.numerator - a.numerator * b.denominator ,
a.denominator * b.denominator);
};
fraction *(fraction a, fraction b)
{
return fraction(
a.numerator* b.numerator,
a.denominator * b.denominator);
};
fraction /(fraction a, fraction b)
{
return fraction(
a.numerator* b.denominator,
a.denominator * b.numerator);
};
stuff
int fraction::gcd(int a, int b)
{
return b?gcd(b, a%b):a;
}
void fraction::reduce()
{
int c = gcd(this->num, this->den);
this->num /= c;
this->den /= c;
if(this->den < 0){this->den *= -1; this->num *= -1;}
}
stuff