Category Archives: Информатика

Compiling OpenCV on mac OS 10.13.6 with CUDA

Just a little hint for myself:

cmake -DCMAKE_BUILD_TYPE=Release \
-DOPENCV_EXTRA_MODULES_PATH=<path to opencv contrib repo>/modules \
<path to opencv source>

Important things here:

  • OPENCV_CUDA_FORCE_BUILTIN_CMAKE_MODULE – is required, due to some weird staff in cmake (cmake internal FindCUDA.cmake seems to be wrong), this comment on opencv github by alalek was extemely usefull and provided us with right hint.
  • WITH_PROTOBUF is also required, otherwise it fails with crazy libprotobuf linker errors.
  • OPENCV_EXTRA_SHARED_LINKER_FLAGS=”-lomp” If you’re using clang 10 or older, you’ll need to to add “-lomp” to linker flags manually, because it ignores “-fopenmp” flag which cmake passes to clang driver in this case. Look’s like it falls just here:

  • Rest of flags looks pretty obvious
Please follow and like us:

C++ sql-like Select example (imperfect)

I just would like to keep it here…

May be there is better implementation? Spent on it 30 mins, have no more time today.

// select.cpp

#include <vector>
#include <iostream>

      typename OutputCollectionT,
      typename OutputItemT = typename OutputCollectionT::value_type
  class Select {
          typename InputCollectionT,
          typename InputItemT = typename InputCollectionT::value_type
      class From {
          static OutputCollectionT Do(
              const InputCollectionT &Input,
              std::function<OutputItemT(const InputItemT &In)> Selector
          ) {
            OutputCollectionT Out;
            for (const InputItemT &In : Input) {
            return Out;

struct A {
    int P1;
    int P2;
    int P3;

struct B {
    int P1;
    int P3;

int main() {
  std::vector<A> aa = {
      {1, 2, 3},
      {2, 3, 4},
      {3, 4, 5}

  auto bb = Select<std::vector<B>>::From<std::vector<A>>::Do(
      aa, [=] (const A &a) -> B {
        return { a.P1, a.P3 };

  for (auto &b : bb) {
    << "P1:" << b.P1 << ", "
    << "P3:" << b.P3 << "\n";

  return 0;
Please follow and like us:

Edge detection shader for text

Hi there! I’m working on text rendering for my small Bird OSD project.

So I want to add contours to the text, so it could be visible whatever background it is rendered on (bright or dark).

For example I want to enhance text rendering for cases like this:

(Ugh… My eyes suffer!)

Into this one:

Assuming we have 1-component color on input which consists only of alpha channel, I want to mark as edge alpha values around 0.5.

Below is my shader which works, and in fact above are screenshots with its demonstraction. It still has some limits though. Edge radius  can’t take values ended with .5 due to special rounding case for N*0.5 values. If you use it, and find more issues, please let me know.

// The inpute textures
uniform sampler2D uTexture;
varying vec2 vTexCoord;  // Interpolated texture coordinate per fragment.

uniform float uOpacity;

uniform float uWidth;

uniform float uHeight;

// If foreground value is higher than threshold, than edge is zero for this pixel
const float NO_EDGE_TRESHOLD = 0.5;

// Edge radius, works fine in range from 0.6 to 2.0
// Please don't use N*0.5 values, since it has special rounding rules
// and pixel at the left may be in is not the same distance comparing to the right.
const float EDGE_RADIUS = 1.;

const vec3 EDGE = vec3(0., 0., 0.);

// Detects whether we should put edge value in the center.
// Not that if the center is foreground value = 1, then there is
// no need in edge
// (in practice we also admit some values below 1,
// determined by threshold).
// We work with fonts, not the regular image, so
// edge is a function from average of two pixels (not the difference):
// f(l, r) = edge((l + r) / 2)
//    assuming edge should be max, when input value is "k" (belongs to range (0, 1) )
// So, how 'edge' function is defined?
// (see picture if formulaes are difficult)
//    edge(v) = (1./k) * x, if x <= k && x > 0
//              otherwise it is line which goes through p1[x,y] = [1, k] and p2[x,y] = [0, 1]
//  Y ^
//    | p[y=1,x=k]
//    |  /\
//    | /  \
//    |/    \
//    ----------->
//    0  k  1    X
//    y=edge(x) formulae
// in case when k = 0.5 then
// f(l, r) = 1 - |l + r - 1|
// Let's use this case!
// params:
//    left - left foreground value
//    center - center foreground value
//    right - right foreground value
// returns:
//    edge value.
float getEdge(float left, float center, float right) {
    if (center > NO_EDGE_TRESHOLD)
        return 0.;

    if (center > left && center > right)
        return 0.;

    float ledge = 1. - abs(left + center - 1.);
    float redge = 1. - abs(right + center - 1.);

    return max(ledge, redge);

float getNeighbour(float row, float col) {
    float dx = EDGE_RADIUS / uWidth;
    float dy = EDGE_RADIUS / uHeight;

    float texX = clamp(vTexCoord.x + col * dx, 0. + dx/2., 1. - dx/2.);
    float texY = clamp(vTexCoord.y + row * dy, 0. + dx/2., 1. - dx/2.);

    return texture2D(uTexture, vec2(texX, texY)).a;

float calcEdge(float centerValue) {
    // Nighbour pixels:
    // neighbour[i][j] is neighbour with X = x + (j-1) * DX; Y = y + (i-1) * DY;
    // neighbour[0][0] is neighbour with X = x - DX; Y = y - DY;
    float neighbour_0[3];
    float neighbour_1[3];
    float neighbour_2[3];

    for (int j = 0; j != 3; ++j)
        neighbour_0[j] = getNeighbour(-1., float(j-1));

    for (int j = 0; j != 3; ++j)
        neighbour_1[j] = getNeighbour(0., float(j-1));

    for (int j = 0; j != 3; ++j)
        neighbour_2[j] = getNeighbour(1., float(j-1));

    float horEdge = getEdge(neighbour_1[0], centerValue, neighbour_1[2]);
    float vertEdge = getEdge(neighbour_0[1], centerValue, neighbour_2[1]);
    float ltrbEdge = getEdge(neighbour_0[0], centerValue, neighbour_2[2]);
    float rtlbEdge = getEdge(neighbour_0[2], centerValue, neighbour_2[0]);

    return max( max(horEdge, vertEdge), max(ltrbEdge, rtlbEdge) );

vec4 calcFinalValue(vec3 foreground, float foregroundValue, float edgeValue) {
#if 1
    float sumFgEdge = foregroundValue + edgeValue;
    vec3 color = vec3(
        foreground * foregroundValue / sumFgEdge +
        EDGE * edgeValue / sumFgEdge);

    return vec4(color.r, color.g, color.b, min(sumFgEdge, 1. * uOpacity));

    //return vec4(EDGE, edgeValue);
    return vec4(foreground.r, foreground.g, foreground.b, foregroundValue);

// The entry point for our fragment shader.
void main()
    vec4 texColor = texture2D(uTexture, vTexCoord);

    float foregroundValue = texColor.a;
    float edgeValue = calcEdge(foregroundValue);

    gl_FragColor = calcFinalValue(
        vec3(texColor.r, texColor.g, texColor.b),
Please follow and like us:

The Bird Project

Bird OSD


You have Raspberry Camera and you need FPV, but you can’t fight 100-200ms latency? Then there is a solution.

Bird OSD turns your Raspberry PI into FPV stream source with OSD overlay.


Since raspberry has Video Composite Output, you can then cast raspberrian screen just like a regular FPV signal over FPV transmitter module!

Raspberry Pi works on broadcomm SoC  with VideoCore processor so that means we can apply OSD overlay to camera stream with really low realtime latencies.

X server is not requried

Bird OSD is a systemd service, it uses raspivid app to grab camera image, and it uses own bird-osd GLES2 application to apply overlay with sensor data on it.

So finally you should see something like this:

(GPS was broken, sorry, still can’t demonstrate in real fly)

Another pic from FPV goggles:


  1. RPI device with sensors board (navio2 is ok)
  2. Raspberry Camera connected to it.
  3. Something sending MAVLink data to ardupilot, arducopter, whatever)

How to install

Download .deb package onto your raspberry device:

$ wget

And then install it:

$ sudo dpkg -i bird-osd_1.1.2_armhf.deb

Then you should target MAVLink channel to

E.g. for arducopter:

$ sudo nano /etc/default/arducopter 

Ensure you have string like this:

TELEM1="-A udp:"

Or like this:

TELEM2="-C udp:"

In case you modified /etc/default/arducopterconfig, then you should restart service:

$ sudo systemctl restart arducopter

Finally you should start bird-osdservice with this command:

sudo systemctl start bird-osd

Then on monitor connected to your raspberryyou should see whatever your camera sees + overlay with sensors data!

It is still very first version:

  1. I only tested it on RPI 3, I added dependency to raspividand to bash:
    libraspberrypi-bin (>= 1.20180417-1), bash (>= 4.4-5)

    Perhaps dependency versions are higher then it really needs, just had no opportunity to test it on another envs.

  2. Do not to blame me guys for not opening sources. There are such a mess, need to sort them first.
  3. It still consumes too much of CPU time. After holidays I’ll work a bit on optimizations. It uses text atlas, but still builds text layout dynamically. It should render every static text to texture; per profiling survey results, it should improve performance on 30-40% (since most of text labels are static).
  4. Any proposals are welcome.

How enable or disable service

If you want to enable bird-osdon boot, you should run:

$ sudo systemctl enable bird-osd

This command disables service:

$ sudo systemctl disable bird-osd

How to uninstall

And this command removes bird-osdfrom you raspberry device:

$ dpkg -P bird-osd

Relevant topics

Edge detection for text – simple edge detection shader for text-like foreground drawings

Please follow and like us:

Cross compilation for Raspberry from Sierra

In very short

If you need to compile something for raspberry just run this:

path/to/clang --target=arm-linux-gnueabihf --sysroot=/some/path/arm-linux-gnueabihf/sysroot my-happy-program.c  -fuse-ld=lld

In command above “arm-linux-gnueabihf” – is my target triple.

If you don’t like LLVM or just need GCC, read Yuzhou Cheng’s article . Or lookup in nets something like “cross compilation for raspberry”. This may help. Below we describe how to do it with LLVM.


We assume that reader knows how to deal with command line. If not, don’t worry, it’s ok, not to know some things in our life. Feel free and just ask questions in comments.

Let’s start

Root FS

Of course you still need rootfs. And also you perhaps need gcc binutils, but perhaps you would like to use ones provided by llvm infrastructure. But. You don’t have to build it, just get it, e.f. from Linux package. But actually I’m looking for solution how to make it enough just to mount my raspberry rootfs.

How to get LLVM

At current moment there are precompiled binaries for Mac OS (go to “Pre-Built Binaries” paragraph):

Or for version 7.0.0 you may run this in terminal:

$ wget

Compiling LLVM from sources

Don’t worry this is a bit different from building gcc. Difference is in statistics fact, that it usually successful and you can really drink cup of coffee.


Below are few brew commands which adds all dependencies we need.

$ brew install swig
$ brew install cmake

Get sources

Get LLVM, Clang, LLD and LLDB sources, once again same link:

Or for 7.0.0:

1. Extract LLVM sources.

2. Extract LLD into llvm/tools/lld

3. Extract LLDB into llvm/tools/lldb

4. Most tricky part: lldb needs to be code signed. This article describes how to to that. Actually you should find it in your lldb sources dir, in lldb/docs/code-signing.txt.

5. Create some binary dir, let say “llvm.darwin-x86_64”, and cd into it.

6. Compile

cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release <path to llvm sources>

make -j<num-parallel jobs, for me it is 8>

7. Test it.

make -j8 check

8. Use it!

Post scriptum

Optionally you may use legacy binutils. In this case install them with brew:

$ brew install arm-linux-gnueabihf-binutils

But I prefer just to use single solution.

CMake Toolchain

Below is my cmake toolchain file, which uses clang (built from sources). Hope it will be useful for you.



# Custom toolchain-specific definitions for your project

# There we go!
# Below, we specify toolchain itself!

SET(TARGET_TRIPLE arm-linux-gnueabihf)

# Specify your target rootfs mount point on your compiler host machine

# Specify clang paths
SET(LLVM_DIR /Users/stepan/projects/shared/toolchains/llvm-7.0.darwin-release-x86_64/install)
SET(CLANG ${LLVM_DIR}/bin/clang)
SET(CLANGXX ${LLVM_DIR}/bin/clang++)

# Specify compiler (which is clang)

# Specify binutils

SET (CMAKE_AR      "${LLVM_DIR}/bin/llvm-ar" CACHE FILEPATH "Archiver")
SET (CMAKE_NM      "${LLVM_DIR}/bin/llvm-nm" CACHE FILEPATH "NM")
SET (CMAKE_OBJDUMP "${LLVM_DIR}/bin/llvm-objdump" CACHE FILEPATH "Objdump")
SET (CMAKE_RANLIB  "${LLVM_DIR}/bin/llvm-ranlib" CACHE FILEPATH "ranlib")

# You may use legacy binutils though.
#SET(BINUTILS /usr/local/Cellar/arm-linux-gnueabihf-binutils/2.31.1)

# Specify sysroot (almost same as rootfs)

# Specify lookup methods for cmake

# Sometimes you also need this:

# Specify raspberry triple
set(CROSS_FLAGS "--target=${TARGET_TRIPLE}")

# Specify other raspberry related flags

# Gather and distribute flags specified at prev steps.

# Use clang linker. Why?
# Well, you may install custom arm-linux-gnueabihf binutils,
# but then, you also need to recompile clang, with customized triple;
# otherwise clang will try to use host 'ld' for linking,
# so... use clang linker.

Sometimes you need to run “cmake” twice, for first compilation gives you this:

error: invalid linker name in argument '-fuse-ld=lld;-fuse-ld=lld'

I have no idea why that happens. Rerunning cmake really helps.

Ok, that’s it.

Message me if you feel lonely dude, I’m still on it, it will try to help!

Please follow and like us:

Machine FP partial invariance issue

Invariance issue

In computer representation:

“a + b + c” and “a + c + b” is not the same!
(and not the samefor multiplicationas well).

Hallelujah! I finally got that simple fact! After so many years of working in IT industry and software development!Well, Ikind of knew this, but never took it seriously until recently
If you guys are curious how ape dealt with getting bananatask
If you are same late as I am, read bellow.

Floating point machine representation

Usuallyfloating point number is represented as follows:
v = m * (be)


m– is the mantissa, an integer with limited range. For example, for decimal numbers it could be in range from 0 to 99. For 24 bit binary numbers it is in range from 0 to (224-1), or from 0 to 16777215.
b– is the base, usually b = 2, an integer value,
e– is exponent, integer, it could take both negative and positive values.
For example in decimal numbers representation 0.5 is represented as:
0.5 = 5 * 10-1 (here m=5, b=10, e=-1)
For binary numbers 0.5 is 2-1 (m=1, b=2, e=-1)

Some people know, that in order to store bigger numbers we need more space in memory. But bigger precision also requires more memory, for we need mantissa of greater width, and thus we also need more bits to store it.

Integer vs float

While working with regular integer numbers we also having data loss and overflow issues, and yet we’re able to control it. We keep in mind minimum and maximum possible integer results, and this know when overflow might happen.
Floating point numbers is different. AFAIK no sane people control mantissa overflow, except perhaps some really rare cases. So here, better to think it just happens all the time.

Inevitable data loss

It is impossible to store numbers with infinite precision, and thus, data loss is inevitable. It’s obvious, but easy to miss if you had never dealt with some cases.
We can’t work with exact real number “N”…
We only able to work with its nearest machine floating pointrepresentation, fp(N) or:
N* = fp(N)

For mantissa in range 0 .. 999 we have next errors.
Number9999will be stored as
v = fp(9999) = 999e+1 = 9990
(here we lost info about most right “9”)

and number1.001will be stored just as
v = fp(1.001)=1
(here we lost info about most right “1”)

a + b + c

Actually v = a + b + c is performed in two steps:
Step 1: x = a + b
Step 2: v = x + c
Or with respect to fp transformation:
Step 1: x = fp(a + b)
Step 2: v = fp(x + c)
By changing the order of sum components, we in fact change what we’re going to loss on each step. And by changing order of band c we get different data loss, just like a final result.


Let’s demonstrate it on the next example.
  • mantissa can store up to 2 decimal digits, and thus in range 0 .. 99.
  • base is 10.
  • exponent could be any, for it doesn’t matter here really.
Let’s use values:
a = 99 (m=99, e = 0)
b = 10 (m=1, e = 1)
c = 1 (m=1, e = 0)
And consider the difference of “a+b+c” and “a+c+b”:
a + b +c:
fp(a+b) = fp(99+10) = fp(109) = 100
v = fp( fp(a+b) + c ) = fp(100 + 1) = fp(101) = 100

a + c + b:
fp(a+c) = fp(99+1) = fp(100) = 100
v = fp( fp(a+c) + b ) = fp(100 + 10) = fp(110) = 110
Unbelievable for regular people, but so obvious to programmers (and yet unbelievable):
(a + b + c = 100) ≠ (a + c + b = 110)

Well, to be more correct:
( fp(a + b + c) = 100 ) ≠ ( fp(a + c + b) = 110)


As one of solutions, wider mantissa should be used for result, and only after all operation items participated in result, it then may be truncated to fp number with thinner mantissa.
If items have mantissa of N bits, then

  • for sum of M+1 items result should have M+N  bits mantissa,
  • for multiplication of M items result should have M*N bits mantissa.

Real example written on C is below.



// Helpers declaration, for implementation scroll down
float getAllOnes(unsigned bits);
unsigned getmantissasaBits();

int main() {

// Determine mantissasa size in bits
unsigned mantissasaBits = getmantissasaBits();

// Considering mantissasa has only 3 bits, we would then need:
// a = 0b10 m=1, e=1
// b = 0b110 m=11, e=1
// c = 0b1000 m=1, e=3

float a = 2,
b = getAllOnes(mantissasaBits) - 1,
c = b + 1;

float ab = a + b;
float ac = a + c;

float abc = a + b + c;
float acb = a + c + b;

"FP partial invariance issue demo:n"
"mantissasa size = %i bitsn"
"a = %.1fn"
"b = %.1fn"
"c = %.1fn"
"(a+b) result: %.1fn"
"(a+c) result: %.1fn"
"(a + b + c) result: %.1fn"
"(a + c + b) result: %.1fn"
"diff(a + b + c, a + c + b) = %.1fnn",
a, b, c,
ab, ac,
abc, acb,
abc - acb);

return 1;

// Helpers

float getAllOnes(unsigned bits) {
return (unsigned)((1 << bits) - 1);

unsigned getmantissasaBits() {

unsigned sz = 1;
unsigned unbeleivableHugeSize = 1024;
float allOnes = 1;

for (;sz != unbeleivableHugeSize &&
allOnes + 1 != allOnes;
allOnes = getAllOnes(++sz)
) {}

return sz-1;


FP partial invariance issue demo:

mantissasa size = 24 bits

a = 2.0
b = 16777214.0
c = 16777215.0
(a+b) result: 16777216.0
(a+c) result: 16777216.0
(a + b + c) result: 33554432.0
(a + c + b) result: 33554430.0
diff(a + b + c, a + c + b) = 2.0

Please follow and like us:

LLVM, MergeFunctions pass


**** MergeFunctions pass ****

Sometimes code contains functions that does exactly the same things even though they are non-equal on binary level. It could happen due to several reasons: mainly the usage of templates and automatic code generators. Though sometimes user itself could write same thing twise 🙂
The main purpose of pass is to recognize equal functions and merge them.

*** MergeFunctions, main fields and runOnModule ***

The are two key fields in class:
FnSet – the set of all unique functions. It keeps items that couldn’t be merged with each other.
Deferred – merging process can affect bodies of functions that are in FnSet already. These functions should be checked again. In that case we remove them from FnSet, and mark then as to be analyzed again: put them into Deferred list.

** runOnModule **

The algorithm is pretty simple:
1. Put all module’s functions into worklist.
2. Scan worklist’s functions twice: first enumerate only strong functions and then only weak functions:
2.1. (inside loop body) Take function from worklist and try to insert it into FnSet: check whether it equal to one of functions in FnSet. If there is equal function in FnSet: merge function token from worklist with that equal function from FnSet. Otherwise add function from worklist to FnSet.
3. After worklist scanning and merging operations complete, check Deferred list. If it is not empty: refill worklist contents with Deferred list and do 2 again, or exit from method otherwise (Deferred is empty).

* Narrative structure *

Article consists of two parts. First part describes comparison procedure itself. The second one describes the merging process.
Description will be in top-down form. First, top-level methods will be described. While the terminal ones will be at the end, in the tail of each part.
Few more words about top-level and complex objects comparison. Complex objects comparison function, basic-block, etc) is mostly based on its sub-objects comparison results. So, again, if reader will see the reference to method that wasn’t described yet, he will find its description a bit below.

*** Merge Functions Pass Comparison Algorithm, “compare” method ***

Comparison starts in “FunctionComparator::compare” method.
1. First parts to be compared are function attributes and some properties that outsides “attributes” term, but still could make function different without changing its body. This part of comparison is done within simple == operator (e.g. F1->hasGC() == F2->hasGC()). There are full list of function properties to be compared on this stage:
* Attributes (those are returned by Function::getAttributes() method).
* GC, for equivalence, RHS and LHS should be both either without GC or with the same one.
* Section, like a GC, RHS and LHS should be defined in the same section.
* Variable arguments. If LHS and RHS should be both either with or without var-args.
* Calling convention should be the same.
2. Function type.
Checked by FunctionComparator::isEquivalentType(Type*, Type*) method. It checks return type and parameters type; the method itself will be described later.
3. Associate function formal parameters with each other. Then during stage of function bodies, if we see usage of 1st argument from LEFT function, we want to see it in RIGHT at the same place, otherwise functions are different. This is done by “FunctionComparator::enumerate(const Value*, const Value*)” method (will be described a bit later).
4. Function body comparison. As its written in method comments:
“We do a CFG-ordered walk since the actual ordering of the blocks in the linked list is immaterial. Our walk starts at the entry block for both functions, then takes each block from each terminator in order. As an artifact, this also means that unreachable blocks are ignored.”
So, using this walk we get BBs from LEFT and RIGHT in the same order, and compare them by “FunctionComparator::compare(const BasicBlock*, const BasicBlock*)” method.
We also associate BBs with each other like we did with function formal arguments: Then if we meet reference to basic block “A” in LHS, we want to see reference to “A`” in RHS at the same place, and “A`” ought to be associated with “A”. Otherwise functions are different.

** FunctionComparator::isEquivalentType **

Let’s describe this comparison in six steps.
0. If left type is pointer, try to coerce it to integer type. It could be done if its address space is 0, or if address spaces are ignored at all. Do the same thing for right type.
1. It returns true if left and right types are equal:
“if (LeftTy == RightTy) return true;”
2. If types are of different kind (different type IDs). Return “false”.
Below cases when we have same type IDs goes.
3. If types are vectors or integers, return result of its pointers comparison.
4. Check left type by its ID, whether it belongs to the next group (call it equivalent-group):
* Void
* Float
* Double
* X86_FP80
* FP128
* PPC_FP128
* Label
* Metadata
Method treats LEFT and RIGHT as equals (return true). Since in that case its enough to see equivalent IDs. Note, if left belongs to this group, while right doesn’t, or right just has different typeID we return “false”.
5. Left and right are pointers, then they are equal if and only if they belongs to the same address space.
6. If left type is complex (structure, function or array, whatever else), and if right type is of the same kind.
Then both LEFT and RIGHT will be expanded and their element types will be checked with the same way.
Method treats them as equal if they are of the same kind and all their element types are equal as well.
7. Otherwise method returns “false”. Even if types has the same TypeID, we can’t treat them as equals. Instead there are now other cases, and its point to put llvm_unreachable call.
Special note about case with pointers and integers. Its a point of false-positive now. Consider next case on 32bit machine:
void foo0(i32 addrespace(1)* %p)
void foo1(i32 addrespace(2)* %p)
void foo2(i32 %p)
Here: foo0 != foo1, while
foo1 == foo2 and foo0 == foo2.
As you can see it breaks transitivity. That means that result depends on order of how functions are presented in module. Next order causes merging of foo0 and foo1:
foo2, foo0, foo1
First foo0 will be merged with foo2, foo0 will be erased. Second foo1 will be merged with foo2.
This case looks like a bug and it is under discussion now (see PR17925).

** FunctionComparator::enumerate(const Value*, const Value*) **

Main purpose is to associate Value from left with Value from right. If we see usage of Value “A” at left, we expect to see usage of “A`” at right, at the same place, and we also expect to see “A`” associated with “A”.
Method returns “true” if values are associated already (implicitly by its nature, or explicitly by helper structures in MergeFunction pass).
Method returns “false” if values could not be associated. It indicates to caller-side, that things are being compared could not be equal.
We associate (we use “enumerate” for):
* Function arguments. i-th argument from left function associated with i-th argument from right function.
* BasicBlock instances. In basic-block enumeration loop we associate i-th BasicBlock from LEFT with i-th BasicBlock from RIGHT.
* Instructions.
* Instruction operands. Note, we can meet Value here we have never seen before. In this case it is not function argument nor BasicBlock, nor Instruction. It is global value. That means it is constant (its supposed to be seen here, at least). Method only accepts as equal next:
* Constants that are of the same type and value
* Right constant could be losslessly bit-casted to the left.
Otherwise method returns “false”.
Below is the detailed method body description.
Method performs next four things:
1. If left Value is left/right Function instance, then right Value should be right/left Function instance. If so: return true.
Note we return true for self-reference, and for cross-reference, in example below fact0 is equal to fact1, and ping is equal to pong as well:
// self-reference
unsigned fact0(unsigned n) { return n > 1 ? n * fact0(n-1) : 1; }
unsigned fact1(unsigned n) { return n > 1 ? n * fact1(n-1) : 1; }
// cross-reference
unsigned ping(unsigned n) { return n!= 0 ? pong(n-1) : 0; }
unsigned pong(unsigned n) { return n!= 0 ? ping(n-1) : 0; }
Though, the ping-pong case is pretty seldom in real live.
Otherwise we go to next stage.
2. If left Value is constant. Method returns true in cases:
* Right one is the same constant,
* Both LEFTt and RIGHT are null values of the same type (it invokes isEquivalentType method),
* RIGHT could be losslessly bit-casted to the LEFT.
Otherwise method returns “false”.
3. If left is InlineAsm instance. The right should be the same instance then; if so: we return true. Otherwise return false.
4. Explicit association of L (left value) and R (right value).
Now follow the logic. We can associate values were not associated before. New values for us. Since “enumerate” is called for values that stays at the same place of their functions, we met them first at the same place. It is important.
MergeFunction pass has two helper data structures:
* id_map – is map of format map. Keeps track of all associated values. With left value as a key.
* seen_values – set of right values for whom there already was attempt to create an association.
On this stage method checks id_map[L].
* If it is not null, L is already associated with something, the result of id_map[L] == R comparison is returned in this case.
* If it is null, then we see this value first time, if R was not associated yet (seen_values.insert(R) returns “true”), we do the association: setup R as value for id_map[L].
Otherwise: there is still no association for L, but R was associated before, so method returns “false”.

*** compare(const BasicBlock*, const BasicBlock*) ***

Compares two BasicBlock instances.
It enumerates instructions from left BB and right BB.
1. It associates left instruction with right one, using “enumerate” method.
2. If left is GEP, it compares them using isEquivalentGEP method. Since we have some optimizations for this case.
3. Otherwise method ensures that LEFT and RIGHT performs the same operation (isEquivalentOperation) and its operands are equal: left operand should be properly associated with right one, and it should be of the same type (isEquivalentType).

*** isEquivalentGEP ***

Compares two GEPs.
There is an optimization for case, where offset is provided by constant values for both left and right GEPs. We calculate final offset for both of them using accumulateConstantOffset method. If we got same offset for left and right: return true.
Otherwise we don’t know what the final offset is. Compare GEP’s operands (as we do for all other instructions).

*** isEquivalentOperation ***

Compares instruction opcodes and some important operation properties.
It returns false in one of next cases:
* opcodes are different,
* number of operands are different,
* operation types are different,
* operation optional flags are different (checked by hasSameSubclassOptionalData method),
* operand types are different.
* Also for some particular instructions it checks equivalence of some significant attributes (`load`, `store`, `cmp`, `call, `invoke`, see method contents for full list).
For example for `load` left and right should be with the same alignment.

*** Merging process, mergeTwoFunctions ***

Once MergeFunctions found that current function (“G”) is equal to one that were analyzed before (function “F”) it calls mergeTwoFunctions(Function*, Function*).
Operation affects FnSet contents with next way: “F” will stay in FnSet. “G” being equal to “F” will not be added to FnSet. Calls of “G” would be replaced with something else. It changes bodies of callers. So, functions that calls “G” would be put into Deferred set and removed from FnSet, and analyzed again.
The approach is next:
1. If we can use alias and both of “F” and “G” are weak. It is most wished case. We make both of them with aliases to third strong function “H”. Actually “H” is “F”. See below how its made. In case when we can just replace “G” with “F” everywhere, we can use replaceAllUsesWith operation.
2. “F” could not be overriden, while “G” could. It would be good to do the next: after merging the places where overridable function where used, still use overridable stub.
So try to make “G” alias to “F”, or create overridable tail call wrapper around “F” and replace “G” with that call.
3. Neither “F” nor “G” could be overridden. We can’t use RAUW. We can just change the callers: call “F” instead of “G”. That’s what replaceDirectCallers does.
Below is detailed body description.

** If “F” may be overriden **

As follows from mayBeOverridden comments: “whether the definition of this global may be replaced by something non-equivalent at link time”. If so, thats ok: we can use alias to “F” instead of “G” or change call instructions itself.

* HasGlobalAliases, removeUsers *

First consider the case when we have global aliases of one function name to another. Our purpose is make both of them with aliases to third strong function. Though if we keep “F” alive and without major changes we can leave it in FnSet. Try to combine these two goals.
Do stub replacement of “F” itself with an alias to “F”.
1. Create stub function “H”, with the same name and attributes like function “F”. It takes maximum alignment of “F” and “G”.
2. Replace all uses of function “F” with uses of function “H”. It is two steps procedure instead. First of all, we must take into account, all functions from whom “F” is called, will be changed: since we change the call argument (from “F” to “H”). If so we must to review these caller functions again after this procedure. We remove callers from FnSet, that’s why we call “removeUsers(F)”.
2.1. Inside removeUsers(Value* V) we go through the all values that use value “V” (or “F” in our context). If value is instruction, we go to function that holds this instruction and mark it as to-be-analyzed-again (put to Deferred set), we also remove caller from FnSet.
2.2. Now we can do the replacement: call F->replaceAllUsesWith(H).
3. Get rid of “G”, and get rid of “H”.
4. Set “F” linkage to private. Make it strong 🙂

* No global aliases, replaceDirectCallers *

If global aliases are not supported. We call replaceDirectCallers then. Just go through all calls of “G” and replace it with calls of “F”. If you look into method you will see that it scans all uses of “G” too, and if use is callee (if user is call instruction and “G” is used as what to be called), we replace it with use of “F”.

** If “F” could not be overriden, writeThunkOrAlias **

We call writeThunkOrAlias(Function *F, Function *G). Here we try to replace “G” with alias to “F” first. Next conditions are essential:
* target should support global aliases,
* the address itself of “G” should be not significant, not named and not referenced anywhere,
* function should come with external, local or weak linkage.
Otherwise we write thunk: some wrapper that has “G”s interface and calls “F”, so “G” could be replaced with this wrapper.

* writeAlias(Function *F, Function *G) *

As follows from llvm reference:
“Aliases act as “second name” for the aliasee value”. So we just want to create second name for “F” and use it instead of “G”:
1. create global alias itself (“GA”),
2. adjust alignment of “F” so it must be max of current and “G”s alignment;
3. replace uses of “G”:
3.1. first mark all callers of “G” as to-be-analyzed-again, using removeUsers method (see chapter above),
3.2. call G->replaceAllUsesWith(GA).
4. Get rid of “G”.

* writeThunk(Function *F, Function *G) *

As it written in method comments:
“Replace G with a simple tail call to bitcast(F). Also replace direct uses of G with bitcast(F). Deletes G.”
In general it does the same as usual when we want to replace callee, except the first point:
1. We generate tail call wrapper around “F”, but with interface that allows use it instead of “G”.
2. “As-usual”: removeUsers and replaceAllUsesWith then.
3. Get rid of “G”.

*** That’s it. ***

If you have some questions or additions, please let me know.
Please follow and like us: