## design – Which is preferred: subclass double or create extension methods to test (relative) equality due to floating point differences?

I am writing financial calculation software using .NET C#, which needs to be blazingly fast. There is a lot of fractional math. So using decimal type is pretty much out of the question, given its poor speed relative to using double. But of course double has its problems testing for equality, with floating point rounding issues.

My options seem to be subclassing double and overriding ==, < and >; versus creating extension methods for double equivalent to these. My tendency is to go with the latter – less code to change and maybe it will be less confusing to others reading the code later? Is there another option? What are other good reasons to choose one over the other?
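
Note that in C#, `double` is an alias for the `System.Double` struct, and structs cannot be inherited from. That leaves static or extension methods, or a wrapper struct. Whichever form is chosen, the comparison usually combines a relative tolerance with an absolute floor for values near zero. Below is a minimal C++ sketch of such helpers. The names `approx_equal`, `approx_less` and `approx_greater` are hypothetical, and the default tolerances are placeholders rather than values tuned for financial calculations.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helpers; the default tolerances are placeholders, not recommendations.
// True when a and b differ by at most rel_tol times the larger magnitude,
// or by at most abs_tol (the absolute floor matters for values near zero).
inline bool approx_equal(double a, double b, double rel_tol = 1e-9, double abs_tol = 1e-12) {
    if (a == b) return true;                                   // exact match, incl. equal infinities
    if (!std::isfinite(a) || !std::isfinite(b)) return false;  // NaN or unequal infinities
    const double diff = std::fabs(a - b);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(rel_tol * scale, abs_tol);
}

// "Definitely less than": a < b and the two are not approximately equal.
inline bool approx_less(double a, double b, double rel_tol = 1e-9, double abs_tol = 1e-12) {
    return a < b && !approx_equal(a, b, rel_tol, abs_tol);
}

// "Definitely greater than", expressed through approx_less.
inline bool approx_greater(double a, double b, double rel_tol = 1e-9, double abs_tol = 1e-12) {
    return approx_less(b, a, rel_tol, abs_tol);
}
```

One design consideration: a tolerance-based comparison is not transitive (a can be close to b and b close to c while a is not close to c). That is one reason to keep it as a named function rather than redefining `==`.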

## computer architecture – Is machine epsilon the largest relative error in representing a number as a floating point number?
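
With C++'s definition, `FLT_EPSILON` (2^-23) is the gap between 1 and the next `float`. Under the default round-to-nearest mode, the relative error of representing a real number in the normal range is at most half of that (2^-24, often called the unit roundoff). Texts that define machine epsilon as this unit roundoff therefore state the bound as epsilon itself. The C++ sketch below checks the bound empirically for `float`, using `double` as the exact reference. The function name, the seed, and the sampled range are arbitrary choices for illustration.

```cpp
#include <cfloat>
#include <cmath>
#include <random>

// Unit roundoff for float under round-to-nearest: half of FLT_EPSILON, i.e. 2^-24.
constexpr double kFloatUnitRoundoff = FLT_EPSILON / 2.0;

// Largest relative error observed when rounding random doubles in [lo, hi) to float.
// Assumes 0 < lo < hi and a range of normal floats. Because f and x are within a
// factor of 2 of each other, (f - x) is exact in double (Sterbenz lemma), so only
// the final division is rounded.
double max_float_rounding_error(int samples, double lo, double hi, unsigned seed = 42) {
    std::mt19937_64 gen(seed);
    std::uniform_real_distribution<double> dist(lo, hi);
    double worst = 0.0;
    for (int i = 0; i < samples; ++i) {
        const double x = dist(gen);
        const float f = static_cast<float>(x);  // round to the nearest float
        const double rel = std::fabs(static_cast<double>(f) - x) / x;
        if (rel > worst) worst = rel;
    }
    return worst;
}
```

For example, `max_float_rounding_error(200000, 1.0, 1.0e6)` stays at or below `kFloatUnitRoundoff` while coming close to it. The bound does not carry over to subnormal numbers, whose spacing is fixed rather than relative, so their relative error can be much larger.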

## regex – Early Stages of floating point class template in C++

I’m currently designing a class template to represent scientific notation, or a floating-point number system. There are currently four distinct types: BIN, DEC, OCT, and HEX. The user could easily add more.

This class uses some features of modern C++ (C++17 or later), such as scoped enums, lambdas, string_view, and regular expressions. With that said, here is the current state of my class:

fpn.h

```cpp
#pragma once

#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <cmath>
#include <iostream>
#include <map>
#include <optional>
#include <regex>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>

enum class BaseTy {
    BIN = 2,
    OCT = 8,
    DEC = 10,
    HEX = 16
};

static const std::regex bin_regex(R"((-?[01]*)\.?([01]*))");
static const std::regex oct_regex(R"((-?[0-7]*)\.?([0-7]*))");
static const std::regex dec_regex(R"((-?[0-9]*)\.?([0-9]*))");
static const std::regex hex_regex(R"((-?[0-9a-fA-F]*)\.?([0-9a-fA-F]*))");

static const std::regex bin_exp_regex(R"(-?[01]*)");
static const std::regex oct_exp_regex(R"(-?[0-7]*)");
static const std::regex dec_exp_regex(R"(-?[0-9]*)");
static const std::regex hex_exp_regex(R"(-?[0-9a-fA-F]*)");

static const std::map<BaseTy, std::regex> ValidDigitSets = {
    {BaseTy::BIN, bin_regex},
    {BaseTy::OCT, oct_regex},
    {BaseTy::DEC, dec_regex},
    {BaseTy::HEX, hex_regex}
};

static const std::map<BaseTy, std::regex> ValidExponentSets = {
    {BaseTy::BIN, bin_exp_regex},
    {BaseTy::OCT, oct_exp_regex},
    {BaseTy::DEC, dec_exp_regex},
    {BaseTy::HEX, hex_exp_regex}
};

static const auto match = [](const std::string& digits, const std::regex& regex) -> bool {
    std::smatch base_match;
    return std::regex_match(digits, base_match, regex);
};

template<BaseTy BASE = BaseTy::DEC>
class Fpn {
public:
    const uint16_t Base = static_cast<uint16_t>(BASE);

public:
    std::string digits_{ "0" };
    int64_t integral_value_{ 0 };
    uint64_t decimal_value_{ 0 };
    int64_t exponent_{1};
    size_t decimal_location_{1};

    Fpn() = default;
    Fpn(const std::string_view digit_sequence, const std::string_view exponent = "") {
        std::cmatch digit_match;

        if (!std::regex_match(digit_sequence.data(), digit_match, ValidDigitSets.at(BASE))) {
            throw std::runtime_error("invalid digit sequence");
        }
        // Assert to make sure that the input has the correct character sets
        assert(
            (match(exponent.data(), ValidDigitSets.at(BASE))) &&
            "invalid character sequence entered"
        );

        // if exponent is empty we can treat this as a value raised to the 1st power
        exponent_ = exponent.empty() ? 1 : std::stoi(exponent.data(), nullptr, static_cast<int>(BASE));
        if (exponent_ == 0) {
            digits_ = { "1" };
            integral_value_ = 1;
            decimal_value_ = 0;
            decimal_location_ = 1;
            return;
        }

        // Set the digits_ member value...
        digits_ = digit_sequence.data();

        decimal_location_ = digit_match[1].length();
        if (digit_match[1].length() != 0) {
            integral_value_ = std::stoi(digit_match[1].str().c_str(), nullptr, static_cast<int>(BASE));
        }
        if (digit_match[2].length() != 0) {
            decimal_value_ = std::stoi(digit_match[2].str().c_str(), nullptr, static_cast<int>(BASE));
        }
    }

    template <typename OS>
    friend OS& operator << (OS& os, const Fpn& fpn) {
        return os << "digits{" << fpn.digits_ << "}\n\t"
            << "integral = " << fpn.integral_value_ << "\n\t"
            << "decimal = " << fpn.decimal_value_ << "\n\t"
            << "dec loc = " << fpn.decimal_location_ << "\n\t"
            << "exponent = " << fpn.exponent_ << "\n\n";
    }
};
```



Here is the driver program that uses the above class:

main.cpp

```cpp
#include "fpn.h"

#include <cstdlib>

int main() {
    try {
        Fpn large("314195.2");
        Fpn small("1.24");
        Fpn sample("420");

        std::cout << "large:\n\t" << large;
        std::cout << "small:\n\t" << small;
        std::cout << "sample:\n\t" << sample;
        std::cout << "bin:\n\t" << Fpn<BaseTy::BIN>("10.11");
        std::cout << "octal:\n\t" << Fpn<BaseTy::OCT>("17");
        std::cout << "hex:\n\t" << Fpn<BaseTy::HEX>("2A");
    }
    catch (const std::exception& e) {
        std::cerr << e.what() << std::endl;
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```


Here is the generated output for some basic cases:

Output

```
large:
    digits{314195.2}
    integral = 314195
    decimal = 2
    dec loc = 6
    exponent = 1

small:
    digits{1.24}
    integral = 1
    decimal = 24
    dec loc = 1
    exponent = 1

sample:
    digits{420}
    integral = 420
    decimal = 0
    dec loc = 3
    exponent = 1

bin:
    digits{10.11}
    integral = 2
    decimal = 3
    dec loc = 2
    exponent = 1

octal:
    digits{17}
    integral = 15
    decimal = 0
    dec loc = 2
    exponent = 1

hex:
    digits{2A}
    integral = 42
    decimal = 0
    dec loc = 2
    exponent = 1
```


I have not tested the exponent part just yet and I have not implemented any of the arithmetic or logical operators. However, the class appears to be working as expected.

The overall concept of this class is that it is not tied to a floating-point number in one specific number system, such as decimal. The idea is that it should be flexible enough for any number system… Log2 - Binary, Log8 - Octal, Log10 - Decimal, Log16 - Hexadecimal, LogX - Polynomial, etc.

There are four predefined types in this library. However, the user can extend it to any LogX number system by adding their type to the enumerated class and updating the predefined regular expressions and their corresponding static const maps. The class itself should automatically handle the rest.
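
Extending the library currently means writing two new regular expressions per base by hand and adding them to both maps. One possible alternative, sketched below in C++, is to generate the digit character class from the base value itself. `digit_class`, `make_digit_regex` and `make_exponent_regex` are hypothetical names, and the sketch assumes bases from 2 to 36 whose digits are `0-9` followed by letters.

```cpp
#include <algorithm>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical helper: regex character-class body for the digits of a base in [2, 36],
// e.g. base 2 -> "0-1", base 16 -> "0-9a-fA-F".
std::string digit_class(int base) {
    if (base < 2 || base > 36) throw std::invalid_argument("base must be in [2, 36]");
    std::string cls = "0-";
    cls += static_cast<char>('0' + std::min(base, 10) - 1);     // last decimal digit
    if (base > 10) {
        const char last = static_cast<char>('a' + base - 11);  // last letter digit
        cls += "a-";
        cls += last;
        cls += "A-";
        cls += static_cast<char>(last - 'a' + 'A');
    }
    return cls;
}

// Same layout as the hand-written patterns: group 1 = signed integral digits,
// group 2 = fractional digits, with an optional literal '.' between them.
std::regex make_digit_regex(int base) {
    const std::string digits = "[" + digit_class(base) + "]*";
    return std::regex("(-?" + digits + ")\\.?(" + digits + ")");
}

// Integral-only exponent pattern, matching the current restriction on exponents.
std::regex make_exponent_regex(int base) {
    return std::regex("-?[" + digit_class(base) + "]*");
}
```

With helpers like these, a new base could in principle be supported by adding its enumerator and building both patterns from `static_cast<int>(BASE)`, rather than editing the regex constants and both maps.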

As for the structure of the representation of the numbering systems, the following applies for all types:

The integral part is to the left of the `.`, and the decimal (fractional) part runs from the right of the `.` up to the `^`; together they form the first string_view passed into the constructor. Everything to the right of the `^` is the exponent, which is passed as the second string_view.

Here are some examples:

The class calculates the integral and decimal values as well as the position of the decimal point. The exponent is handled separately, and I have yet to implement populating its internal member variable. The class is at an early stage, so it is far from complete. This is where I currently stand on the design decisions…
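
One corner case relevant here: storing the fractional digits as an integer (`decimal_value_`) loses their positions. `1.05` and `1.5` both produce `decimal = 5` with the same `dec loc`, and in the bin example the fraction `11` is stored as 3, although its positional value is 0.75. Below is a minimal C++ sketch of evaluating the digit string positionally instead. `digit_value` and `evaluate` are hypothetical helpers, and the sign and exponent are deliberately left out.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical helper: numeric value of a digit character ('0'-'9', 'a'-'z', 'A'-'Z'),
// rejected if it is not a valid digit in the given base.
int digit_value(char c, int base) {
    const unsigned char u = static_cast<unsigned char>(c);
    int d = 36;  // sentinel: not a digit in any base up to 36
    if (std::isdigit(u)) d = c - '0';
    else if (std::isalpha(u)) d = std::tolower(u) - 'a' + 10;
    if (d >= base) throw std::invalid_argument(std::string("invalid digit: ") + c);
    return d;
}

// Hypothetical helper: value of an unsigned "integral.fraction" string in the given base.
double evaluate(const std::string& digits, int base) {
    const auto dot = digits.find('.');
    const std::string integral = digits.substr(0, dot);  // whole string if there is no '.'
    const std::string fraction = (dot == std::string::npos) ? "" : digits.substr(dot + 1);

    double value = 0.0;
    for (char c : integral)                               // Horner's rule, left to right
        value = value * base + digit_value(c, base);

    double frac = 0.0;
    for (auto it = fraction.rbegin(); it != fraction.rend(); ++it)  // right to left
        frac = (frac + digit_value(*it, base)) / base;

    return value + frac;
}
```

Here `evaluate("10.11", 2)` gives 2.75 and `evaluate("1.05", 10)` differs from `evaluate("1.5", 10)`, which an integer fraction field alone cannot distinguish.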

From a design perspective, I am interested in your thoughts, concerns, and feedback.

• Are there any code smells?
• Does it seem to be modular and portable?
• Is it generic and reusable?
• Does it express intent and is it readable?
• Does it exhibit the ability to be computationally efficient?
• What kind of improvements can be made?
• Are there any corner cases or gotchas that I’m missing or overlooking?
• Even just personal feedback, opinions, or suggestions are welcomed.

These are the things I’d like to have answered or looked at before I continue to add any “functionality” to this class.

Edit

Note: Outside of the class I have two sets of regular expressions, one for the sequence of digits and another for the exponent. Since this is in the early stages of development, I’m restricting exponents to integral values to keep things simple. In a future version, I may extend this so that exponents can also be floating-point values, such as 12.34^5.6. In that case, only the first set of regular expressions would be needed, and the second set might be considered redundant. This is just an additional note for the reader.

## Validity of an Algorithm for Testing Two Floating Point Numbers for Equality

This question is related to the epsilon- (or delta- if you prefer) test for floating point equality. But my question is not how to do it. Instead I have a related algorithm for testing equality, and I would like feedback as to what might be wrong with it.

The idea is to take the quotient of two floating point numbers and compare it to one. This eliminates the difficulty of choosing a value for epsilon. I have tested it extensively and am happy with the results. Here is a sample implementation in Java.

```java
public class DoubleEquals
{
    public static final double EPSILON = 1e-14;

    public static boolean equals( double param1, double param2 )
    {
        // Accounts for 0, +/-INFINITY;
        // if both values are NaN the result will be false
        boolean result = param1 == param2;
        if ( !result )
        {
            double quot = param1 / param2;
            result = quot > 0 && (1 - quot) < EPSILON;
        }

        return result;
    }
}
```
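
One detail worth checking in the test above: `(1 - quot) < EPSILON` has no absolute value, so every positive quotient greater than 1 passes. `equals(2.0, 1.0)` returns true, and so does `equals(1.0, 0.0)`, because `1.0 / 0.0` is positive infinity. The C++ sketch below keeps the same structure but compares `std::fabs(1.0 - quot)`. `quotient_equals` is a hypothetical name, and the sketch does not address other questions such as the choice of `EPSILON` itself.

```cpp
#include <cmath>

// Quotient-based equality test with |1 - quot| instead of (1 - quot).
// Exact equality covers +/-0 and equal infinities; NaN never compares equal.
bool quotient_equals(double a, double b, double epsilon = 1e-14) {
    if (a == b) return true;
    const double quot = a / b;  // may be +/-infinity or NaN; both fail the checks below
    return quot > 0 && std::fabs(1.0 - quot) < epsilon;
}
```

With the absolute value, an infinite quotient gives `fabs(1.0 - quot) == infinity`, so division by zero no longer reports equality.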

## IEEE 754 floating point addition gives the wrong result

I want to add two IEEE 754 single-precision numbers. I followed the steps for adding two IEEE 754 numbers, but the result is not correct.
```
Number 1:
S: 0
E: 01111111
M: 11111111111111111111111

Number 2:
S: 0
E: 01111111
M: 00000000000000000000000
```

Here is my calculation:

The site http://weitz.de/ieee/ gives this result:

```
S: 0
E: 10000000
M: 10000000000000000000000
```

In my calculation, the mantissa is 01111… Why?
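
A likely explanation (the hand calculation itself is not shown) is normalization without rounding. Both operands have biased exponent 127, so the significands 1.111…1 and 1.000…0 are added directly, giving 10.111…1. Normalizing shifts this right by one place and raises the exponent to 128. The bit shifted out is 1, and IEEE 754's default mode, round to nearest with ties to even, treats this as an exact tie. Because the kept fraction 0111…1 ends in an odd bit, it rounds up to 1000…0, so the sum is exactly 3.0. The C++ sketch below reproduces both variants on the raw bit fields. The helper names are hypothetical, the addition helper handles only two positive normal floats with equal exponents, and the code assumes `float` is IEEE 754 binary32.

```cpp
#include <cstdint>
#include <cstring>

// Assumes float is IEEE 754 binary32 (1 sign bit, 8 exponent bits, 23 fraction bits).
struct Fields { std::uint32_t sign, exponent, fraction; };

Fields decode(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // reinterpret the bit pattern safely
    return { bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
}

float from_fields(std::uint32_t sign, std::uint32_t exponent, std::uint32_t fraction) {
    const std::uint32_t bits = (sign << 31) | ((exponent & 0xFFu) << 23) | (fraction & 0x7FFFFFu);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical helper: stored fraction of the sum of two positive normal floats that
// share the same biased exponent. The result's exponent is always one higher, because
// two significands in [1, 2) add up to a value in [2, 4).
std::uint32_t add_same_exponent_fraction(std::uint32_t frac_a, std::uint32_t frac_b,
                                         bool round_to_even) {
    const std::uint32_t sig_a = frac_a | 0x800000u;  // restore the hidden leading 1
    const std::uint32_t sig_b = frac_b | 0x800000u;
    const std::uint32_t sum = sig_a + sig_b;         // 25-bit significand 1x.xxx...
    const std::uint32_t shifted_out = sum & 1u;      // bit lost when renormalizing
    std::uint32_t sig = sum >> 1;                    // 24-bit significand 1.xxx...
    // Only one bit is shifted out, so shifted_out == 1 is always an exact tie:
    // ties-to-even rounds up only when the kept last bit is odd. sig + 1 stays within
    // 24 bits, because the largest possible sum (0x1FFFFFE) is even and never rounds.
    if (round_to_even && shifted_out && (sig & 1u)) sig += 1;
    return sig & 0x7FFFFFu;                          // drop the hidden bit again
}
```

With `round_to_even = false`, the operands from the question give the fraction `0x3FFFFF` (0111…1), matching the hand calculation; with `true` they give `0x400000` (1000…0), matching the site's result.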

## floating point – Is $x$ in the working range of $F(3,2)$?

Consider $x = (0.1001)_b$ and $F(3,2)$, the set of all floating point numbers with 3 digits in the mantissa and 2 digits in the exponent.

Is $x$ in the working range of $F(3,2)$?

I am quite confused: does $F(3,2)$ mean $(0.111)_b \times 2^{(11)_b} = (111)_b$?
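
Under one common convention (an assumption, since textbooks differ), $F(3,2)$ uses base 2, a normalized three-digit mantissa $(0.d_1d_2d_3)_2$ with $d_1 = 1$, and a signed two-digit binary exponent, so $e \in \{-3, \dots, 3\}$. If the working range means the magnitudes between the smallest and largest positive normalized numbers, the bounds and $x$ work out as follows:

$$
\begin{aligned}
x_{\max} &= (0.111)_2 \times 2^{(11)_2} = \tfrac{7}{8} \times 2^{3} = (111)_2 = 7,\\
x_{\min} &= (0.100)_2 \times 2^{-(11)_2} = \tfrac{1}{2} \times 2^{-3} = (0.0001)_2 = \tfrac{1}{16},\\
x &= (0.1001)_2 = \tfrac{1}{2} + \tfrac{1}{16} = \tfrac{9}{16}.
\end{aligned}
$$

Under these assumptions, $(0.111)_2 \times 2^{(11)_2} = (111)_2$ is the largest element of $F(3,2)$. Since $\tfrac{1}{16} \le \tfrac{9}{16} \le 7$, $x$ lies within the working range. It is not itself an element of $F(3,2)$, because it needs four significant binary digits. It sits exactly midway between its neighbours $(0.100)_2 \times 2^0 = \tfrac{1}{2}$ and $(0.101)_2 \times 2^0 = \tfrac{5}{8}$, and both chopping and round-half-to-even map it to $(0.100)_2 = \tfrac{1}{2}$.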