
Machine FP partial invariance issue

Invariance issue

In computer representation:

“a + b + c” and “a + c + b” are not the same!
(And the same holds for multiplication.)

Hallelujah! I finally got that simple fact, after so many years of working in the IT industry and software development! Well, I kind of knew this, but never took it seriously until recently.
If you guys are curious how an ape dealt with its banana task, and if you are as late to this as I am, read below.

Floating point machine representation

Usually a floating point number is represented as follows:
v = m * b^e

Where

m is the mantissa, an integer with a limited range. For example, for decimal numbers it could be in the range from 0 to 99. For 24-bit binary numbers it is in the range from 0 to (2^24 - 1), that is from 0 to 16777215.
b is the base, an integer value, usually b = 2,
e is the exponent, an integer that can take both negative and positive values.
For example, in decimal representation 0.5 is represented as:
0.5 = 5 * 10^-1 (here m=5, b=10, e=-1)
In binary, 0.5 is 2^-1 (m=1, b=2, e=-1)

Some people know that storing bigger numbers requires more memory. But higher precision also requires more memory: we need a wider mantissa, and thus more bits to store it.

Integer vs float

When working with regular integers we also have data loss and overflow issues, yet we are able to control them. We keep in mind the minimum and maximum possible integer results, and thus know when an overflow might happen.
Floating point numbers are different. AFAIK no sane people track mantissa overflow, except perhaps in some really rare cases. So here it is better to assume it just happens all the time.

Inevitable data loss

It is impossible to store numbers with infinite precision, and thus data loss is inevitable. It’s obvious, but easy to miss if you have never dealt with such cases.
We can’t work with the exact real number “N”…
We are only able to work with its nearest machine floating point representation, fp(N), or:
N* = fp(N)

For a mantissa in the range 0 .. 999, and assuming that digits which do not fit are simply truncated, we get the following errors.
The number 9999 will be stored as
v = fp(9999) = 999e+1 = 9990
(here we lost the rightmost “9”)

and the number 1.001 will be stored just as
v = fp(1.001) = 1
(here we lost the rightmost “1”)

a + b + c

Actually v = a + b + c is performed in two steps:
Step 1: x = a + b
Step 2: v = x + c
Or with respect to fp transformation:
Step 1: x = fp(a + b)
Step 2: v = fp(x + c)
By changing the order of the summands, we in fact change what we are going to lose at each step. So by swapping b and c we get a different data loss, and therefore a different final result.

Examples

Let’s demonstrate it with the following example.
  • the mantissa can store up to 2 decimal digits, and is thus in the range 0 .. 99,
  • the base is 10,
  • the exponent can be anything, it doesn’t really matter here,
  • digits that do not fit into the mantissa are truncated, not rounded.
Let’s use values:
a = 99 (m=99, e = 0)
b = 10 (m=1, e = 1)
c = 1 (m=1, e = 0)
And consider the difference of “a+b+c” and “a+c+b”:
a + b + c:
fp(a+b) = fp(99+10) = fp(109) = 100
v = fp( fp(a+b) + c ) = fp(100 + 1) = fp(101) = 100

a + c + b:
fp(a+c) = fp(99+1) = fp(100) = 100
v = fp( fp(a+c) + b ) = fp(100 + 10) = fp(110) = 110
Unbelievable for regular people, but so obvious to programmers (and yet unbelievable):
(a + b + c = 100) ≠ (a + c + b = 110)

Well, to be more correct:
( fp(a + b + c) = 100 ) ≠ ( fp(a + c + b) = 110)

Upd:

As one possible solution, a wider mantissa should be used for the intermediate result; only after all operands have contributed to the result may it be truncated to an fp number with a narrower mantissa.
If the items have a mantissa of N bits, then

  • for a sum of M+1 items the result should have an (M+N)-bit mantissa,
  • for a product of M items the result should have an (M*N)-bit mantissa.

A real example written in C is below.


example.c

#include <stdio.h>

// Helper declarations; scroll down for the implementation
float getAllOnes(unsigned bits);
unsigned getMantissaBits(void);

int main(void) {

    // Determine mantissa size in bits
    unsigned mantissaBits = getMantissaBits();

    // If the mantissa had only 3 bits, we would get:
    // a = 0b10  m=1,   e=1
    // b = 0b110 m=11,  e=1
    // c = 0b111 m=111, e=0

    float a = 2,
          b = getAllOnes(mantissaBits) - 1,
          c = b + 1;

    float ab = a + b;
    float ac = a + c;

    float abc = a + b + c;
    float acb = a + c + b;

    printf("\n"
           "FP partial invariance issue demo:\n"
           "\n"
           "mantissa size = %u bits\n"
           "\n"
           "a = %.1f\n"
           "b = %.1f\n"
           "c = %.1f\n"
           "(a+b) result: %.1f\n"
           "(a+c) result: %.1f\n"
           "(a + b + c) result: %.1f\n"
           "(a + c + b) result: %.1f\n"
           "---------------------------------\n"
           "diff(a + b + c, a + c + b) = %.1f\n\n",
           mantissaBits,
           a, b, c,
           ab, ac,
           abc, acb,
           abc - acb);

    return 0;
}

// Helpers

// Returns the number whose lowest `bits` bits are all ones, as a float.
// Only valid for bits < 32 (unsigned is assumed to be 32 bits wide).
float getAllOnes(unsigned bits) {
    return (float)((1u << bits) - 1u);
}

unsigned getMantissaBits(void) {

    unsigned sz = 1;
    unsigned maxBits = 31; // shifting by 32 or more would be undefined
    float allOnes = 1;

    // Grow the all-ones value until adding 1 no longer changes it
    for (; sz != maxBits &&
           allOnes + 1 != allOnes;
         allOnes = getAllOnes(++sz)
    ) {}

    return sz - 1;
}

Output

FP partial invariance issue demo:

mantissa size = 24 bits

a = 2.0
b = 16777214.0
c = 16777215.0
(a+b) result: 16777216.0
(a+c) result: 16777216.0
(a + b + c) result: 33554432.0
(a + c + b) result: 33554430.0
---------------------------------
diff(a + b + c, a + c + b) = 2.0


LLVM, MergeFunctions pass


**** MergeFunctions pass ****

Sometimes code contains functions that do exactly the same thing even though they are not equal at the binary level. This can happen for several reasons, mainly the use of templates and automatic code generators. Though sometimes the user could write the same thing twice 🙂
The main purpose of the pass is to recognize equal functions and merge them.

*** MergeFunctions, main fields and runOnModule ***

There are two key fields in the class:
FnSet – the set of all unique functions. It keeps items that couldn’t be merged with each other.
Deferred – the merging process can affect bodies of functions that are already in FnSet. These functions should be checked again. In that case we remove them from FnSet and mark them to be analyzed again: put them into the Deferred list.

** runOnModule **

The algorithm is pretty simple:
1. Put all of the module’s functions into the worklist.
2. Scan the worklist’s functions twice: first enumerate only strong functions, then only weak functions:
2.1. (inside the loop body) Take a function from the worklist and try to insert it into FnSet: check whether it is equal to one of the functions in FnSet. If there is an equal function in FnSet, merge the function taken from the worklist with that equal function from FnSet. Otherwise add the function from the worklist to FnSet.
3. After worklist scanning and merging are complete, check the Deferred list. If it is not empty, refill the worklist with the contents of the Deferred list and do step 2 again; otherwise (Deferred is empty) exit from the method.

* Narrative structure *

The article consists of two parts. The first part describes the comparison procedure itself. The second one describes the merging process.
The description is in top-down form. Top-level methods are described first, while the terminal ones are at the end, in the tail of each part.
A few more words about top-level and complex object comparison. Comparison of complex objects (function, basic block, etc.) is mostly based on the comparison results of their sub-objects. So, again, if the reader sees a reference to a method that hasn’t been described yet, its description will follow a bit below.

*** Merge Functions Pass Comparison Algorithm, “compare” method ***

Comparison starts in the “FunctionComparator::compare” method.
1. The first parts to be compared are function attributes and some properties that fall outside the term “attributes” but could still make functions different without changing their bodies. This part of the comparison is done with the simple == operator (e.g. F1->hasGC() == F2->hasGC()). Here is the full list of function properties compared at this stage:
* Attributes (those returned by the Function::getAttributes() method).
* GC: for equivalence, RHS and LHS should both be either without GC or with the same one.
* Section: like GC, RHS and LHS should be defined in the same section.
* Variable arguments: LHS and RHS should both be either with or without var-args.
* Calling convention should be the same.
2. Function type.
Checked by the FunctionComparator::isEquivalentType(Type*, Type*) method. It checks the return type and the parameter types; the method itself is described later.
3. Associate function formal parameters with each other. Then, while comparing function bodies, if we see a use of the 1st argument of the LEFT function, we want to see the 1st argument of RIGHT used at the same place; otherwise the functions are different. This is done by the “FunctionComparator::enumerate(const Value*, const Value*)” method (described a bit later).
4. Function body comparison. As it is written in the method comments:
“We do a CFG-ordered walk since the actual ordering of the blocks in the linked list is immaterial. Our walk starts at the entry block for both functions, then takes each block from each terminator in order. As an artifact, this also means that unreachable blocks are ignored.”
So, using this walk we get BBs from LEFT and RIGHT in the same order, and compare them with the “FunctionComparator::compare(const BasicBlock*, const BasicBlock*)” method.
We also associate BBs with each other as we did with function formal arguments: if we meet a reference to basic block “A” in LHS, we want to see a reference to “A`” in RHS at the same place, and “A`” ought to be associated with “A”. Otherwise the functions are different.

** FunctionComparator::isEquivalentType **

Let’s describe this comparison in several steps.
0. If the left type is a pointer, try to coerce it to an integer type. This can be done if its address space is 0, or if address spaces are ignored altogether. Do the same for the right type.
1. Return true if the left and right types are equal:
“if (LeftTy == RightTy) return true;”
2. If the types are of different kinds (different type IDs), return “false”.
The cases below apply when the type IDs are the same.
3. If the types are vectors or integers, return the result of comparing their pointers.
4. Check by its ID whether the left type belongs to the following group (call it the equivalent-group):
* Void
* Float
* Double
* X86_FP80
* FP128
* PPC_FP128
* Label
* Metadata
The method treats LEFT and RIGHT as equal (returns true), since in that case it is enough to see equal IDs. Note that if left belongs to this group while right doesn’t, or right simply has a different type ID, we return “false”.
5. If left and right are pointers, they are equal if and only if they belong to the same address space.
6. If the left type is complex (structure, function, array, or anything else) and the right type is of the same kind, both LEFT and RIGHT are expanded and their element types are checked in the same way.
The method treats them as equal if they are of the same kind and all their element types are equal as well.
7. Otherwise the method returns “false”. Even if the types have the same TypeID, we can’t treat them as equal. In fact there are no other cases now, so this is the place to put an llvm_unreachable call.
A special note about the case with pointers and integers. It is a source of false positives now. Consider the following case on a 32-bit machine:
void foo0(i32 addrspace(1)* %p)
void foo1(i32 addrspace(2)* %p)
void foo2(i32 %p)
Here: foo0 != foo1, while
foo1 == foo2 and foo0 == foo2.
As you can see, it breaks transitivity. That means the result depends on the order in which functions are presented in the module. The following order causes merging of foo0 and foo1:
foo2, foo0, foo1
First foo0 will be merged with foo2, and foo0 will be erased. Then foo1 will be merged with foo2.
This case looks like a bug and it is under discussion now (see PR17925).

** FunctionComparator::enumerate(const Value*, const Value*) **

The main purpose is to associate a Value from the left with a Value from the right. If we see a use of Value “A” on the left, we expect to see a use of “A`” on the right at the same place, and we also expect “A`” to be associated with “A”.
The method returns “true” if the values are already associated (implicitly by their nature, or explicitly by helper structures in the MergeFunctions pass).
The method returns “false” if the values could not be associated. It indicates to the caller that the things being compared cannot be equal.
We associate (we use “enumerate” for):
* Function arguments. The i-th argument of the left function is associated with the i-th argument of the right function.
* BasicBlock instances. In the basic-block enumeration loop we associate the i-th BasicBlock from LEFT with the i-th BasicBlock from RIGHT.
* Instructions.
* Instruction operands. Note that here we can meet a Value we have never seen before. In this case it is neither a function argument, nor a BasicBlock, nor an Instruction. It is a global value, which means it is a constant (at least, that is what is supposed to be seen here). The method accepts as equal only the following:
* Constants that are of the same type and value,
* A right constant that can be losslessly bit-casted to the left one.
Otherwise the method returns “false”.
Below is the detailed description of the method body.
The method does the following four things:
1. If the left Value is the left/right Function instance, then the right Value should be the right/left Function instance. If so, return true.
Note that we return true for self-reference and for cross-reference; in the example below fact0 is equal to fact1, and ping is equal to pong as well:
// self-reference
unsigned fact0(unsigned n) { return n > 1 ? n * fact0(n-1) : 1; }
unsigned fact1(unsigned n) { return n > 1 ? n * fact1(n-1) : 1; }
// cross-reference
unsigned ping(unsigned n) { return n!= 0 ? pong(n-1) : 0; }
unsigned pong(unsigned n) { return n!= 0 ? ping(n-1) : 0; }
Though the ping-pong case is pretty rare in real life.
Otherwise we go to the next stage.
2. If the left Value is a constant, the method returns true in these cases:
* The right one is the same constant,
* Both LEFT and RIGHT are null values of the same type (it invokes the isEquivalentType method),
* RIGHT can be losslessly bit-casted to LEFT.
Otherwise the method returns “false”.
3. If left is an InlineAsm instance, the right should be the same instance; if so, we return true. Otherwise we return false.
4. Explicit association of L (left value) and R (right value).
Now follow the logic. We can only associate values that were not associated before, that is, values new to us. Since “enumerate” is called for values that stand at the same place in their functions, we meet them for the first time at the same place. This is important.
The MergeFunctions pass has two helper data structures:
* id_map – a map that keeps track of all associated values, with the left value as the key and its associated right value as the value.
* seen_values – the set of right values for which there was already an attempt to create an association.
At this stage the method checks id_map[L].
* If it is not null, L is already associated with something; the result of the id_map[L] == R comparison is returned in this case.
* If it is null, then we see this value for the first time. If R has not been associated yet (seen_values.insert(R) returns “true”), we make the association: set R as the value for id_map[L].
Otherwise there is still no association for L, but R was associated before, so the method returns “false”.

*** compare(const BasicBlock*, const BasicBlock*) ***

Compares two BasicBlock instances.
It enumerates instructions from the left BB and the right BB.
1. It associates the left instruction with the right one, using the “enumerate” method.
2. If left is a GEP, it compares them using the isEquivalentGEP method, since we have some optimizations for this case.
3. Otherwise the method ensures that LEFT and RIGHT perform the same operation (isEquivalentOperation) and that their operands are equal: each left operand should be properly associated with the right one, and it should be of the same type (isEquivalentType).

*** isEquivalentGEP ***

Compares two GEPs.
There is an optimization for the case where the offset is given by constant values for both the left and the right GEP. We calculate the final offset for both of them using the accumulateConstantOffset method. If we get the same offset for left and right, return true.
Otherwise we don’t know what the final offset is, so we compare the GEPs’ operands (as we do for all other instructions).

*** isEquivalentOperation ***

Compares instruction opcodes and some important operation properties.
It returns false in any of the following cases:
* opcodes are different,
* numbers of operands are different,
* operation types are different,
* optional operation flags are different (checked by the hasSameSubclassOptionalData method),
* operand types are different,
* also, for some particular instructions it checks equivalence of some significant attributes (`load`, `store`, `cmp`, `call`, `invoke`; see the method contents for the full list).
For example, for `load` the left and right instructions should have the same alignment.

*** Merging process, mergeTwoFunctions ***

Once MergeFunctions finds that the current function (“G”) is equal to one that was analyzed before (function “F”), it calls mergeTwoFunctions(Function*, Function*).
The operation affects the FnSet contents in the following way: “F” stays in FnSet. “G”, being equal to “F”, is not added to FnSet. Calls of “G” are replaced with something else, which changes the bodies of the callers. So functions that call “G” are put into the Deferred set, removed from FnSet, and analyzed again.
The approach is as follows:
1. We can use aliases and both “F” and “G” are weak. This is the most desirable case. We make both of them aliases to a third strong function “H”. Actually “H” is “F”; see below how it is done. In the case when we can just replace “G” with “F” everywhere, we can use the replaceAllUsesWith operation.
2. “F” could not be overridden, while “G” could. It would be good to do the following: after merging, the places where the overridable function was used still use an overridable stub.
So we try to make “G” an alias to “F”, or create an overridable tail call wrapper around “F” and replace “G” with that call.
3. Neither “F” nor “G” could be overridden. We can’t use RAUW. We can just change the callers: call “F” instead of “G”. That’s what replaceDirectCallers does.
Below is the detailed description of the body.

** If “F” may be overridden **

As follows from the mayBeOverridden comments: “whether the definition of this global may be replaced by something non-equivalent at link time”. If so, that’s OK: we can use an alias to “F” instead of “G”, or change the call instructions themselves.

* HasGlobalAliases, removeUsers *

First consider the case when we have global aliases of one function name to another. Our purpose is to make both of them aliases to a third strong function. Though if we keep “F” alive and without major changes, we can leave it in FnSet. Let’s try to combine these two goals.
Do a stub replacement of “F” itself with an alias to “F”.
1. Create a stub function “H” with the same name and attributes as function “F”. It takes the maximum alignment of “F” and “G”.
2. Replace all uses of function “F” with uses of function “H”. This is in fact a two-step procedure. First of all, we must take into account that all functions from which “F” is called will change, since we change the call argument (from “F” to “H”). So we must review these caller functions again after this procedure. We remove the callers from FnSet; that’s why we call “removeUsers(F)”.
2.1. Inside removeUsers(Value* V) we go through all values that use value “V” (or “F” in our context). If the value is an instruction, we go to the function that holds this instruction and mark it as to-be-analyzed-again (put it into the Deferred set); we also remove the caller from FnSet.
2.2. Now we can do the replacement: call F->replaceAllUsesWith(H).
3. Get rid of “G”, and get rid of “H”.
4. Set “F” linkage to private. Make it strong 🙂

* No global aliases, replaceDirectCallers *

If global aliases are not supported, we call replaceDirectCallers. It just goes through all calls of “G” and replaces them with calls of “F”. If you look into the method you will see that it scans all uses of “G” too, and if the use is a callee (the user is a call instruction and “G” is what is being called), we replace it with a use of “F”.

** If “F” could not be overridden, writeThunkOrAlias **

We call writeThunkOrAlias(Function *F, Function *G). Here we first try to replace “G” with an alias to “F”. The following conditions are essential:
* the target should support global aliases,
* the address of “G” itself should not be significant, not named and not referenced anywhere,
* the function should come with external, local or weak linkage.
Otherwise we write a thunk: a wrapper that has “G”’s interface and calls “F”, so “G” can be replaced with this wrapper.

* writeAlias(Function *F, Function *G) *

As follows from the LLVM reference:
“Aliases act as “second name” for the aliasee value”. So we just want to create a second name for “F” and use it instead of “G”:
1. create the global alias itself (“GA”),
2. adjust the alignment of “F” so that it is the max of its current alignment and “G”’s alignment;
3. replace uses of “G”:
3.1. first mark all callers of “G” as to-be-analyzed-again, using the removeUsers method (see the chapter above),
3.2. call G->replaceAllUsesWith(GA).
4. Get rid of “G”.

* writeThunk(Function *F, Function *G) *

As it is written in the method comments:
“Replace G with a simple tail call to bitcast(F). Also replace direct uses of G with bitcast(F). Deletes G.”
In general it does the same as usual when we want to replace a callee, except for the first point:
1. We generate a tail call wrapper around “F”, but with an interface that allows using it instead of “G”.
2. “As usual”: removeUsers and then replaceAllUsesWith.
3. Get rid of “G”.

*** That’s it. ***

If you have some questions or additions, please let me know.