python – How to speed up this numba-based decision tree?

To improve the performance of this code, I have accelerated some fragments with numba. Frustratingly, the decision tree's performance is still poor: in the code below it is nearly 70 times slower than the scikit-learn implementation (220 ms vs 3 ms). How can I modify this code to achieve better performance?

```
import time

import numpy as np
from numba import njit
from sklearn.base import BaseEstimator
from sklearn.datasets import make_hastie_10_2
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


class Node:
    def __init__(self, left, right, rule):
        self.left = left
        self.right = right
        self.feature = rule[0]
        self.threshold = rule[1]


class Leaf:
    def __init__(self, value):
        """
        `value` is an array of class probabilities if classifier is True, else
        the mean of the region
        """
        self.value = value


class DecisionTree(BaseEstimator):
    def __init__(
            self,
            classifier=True,
            max_depth=None,
            n_feats=None,
            criterion="entropy",
            seed=None,
            regularized=False,
    ):
        """
        A decision tree model for regression and classification problems.

        Parameters
        ----------
        classifier : bool
            Whether to treat target values as categorical (classifier =
            True) or continuous (classifier = False). Default is True.
        max_depth: int or None
            The depth at which to stop growing the tree. If None, grow the tree
            until all leaves are pure. Default is None.
        n_feats : int
            Specifies the number of features to sample on each split. If None,
            use all features on each split. Default is None.
        criterion : {'mse', 'entropy', 'gini'}
            The error criterion to use when calculating splits. When
            `classifier` is False, valid entries are {'mse'}. When `classifier`
            is True, valid entries are {'entropy', 'gini'}. Default is
            'entropy'.
        seed : int or None
            Seed for the random number generator. Default is None.
        regularized : bool
            If True, multiply the impurity gains of features already used in
            the tree by `self.factor`, favoring reuse of features. Default is
            False.
        """
        if seed is not None:
            np.random.seed(seed)

        self.depth = 0
        self.root = None

        self.n_feats = n_feats
        self.criterion = criterion
        self.classifier = classifier
        self.max_depth = max_depth if max_depth else np.inf

        self.factor = 1.2
        self.regularized = regularized
        self.feature_importances_ = None

        if not classifier and criterion in ("gini", "entropy"):
            raise ValueError(
                "{} is a valid criterion only when classifier = True.".format(criterion)
            )
        if classifier and criterion == "mse":
            raise ValueError("`mse` is a valid criterion only when classifier = False.")

    def fit(self, X, Y):
        """
        Fit a binary decision tree to a dataset.

        Parameters
        ----------
        X : :py:class:`ndarray <numpy.ndarray>` of shape `(N, M)`
            The training data of `N` examples, each with `M` features
        Y : :py:class:`ndarray <numpy.ndarray>` of shape `(N,)`
            An array of integer class labels for each example in `X` if
            self.classifier = True, otherwise the set of target values for
            each example in `X`.
        """
        self.n_classes = max(Y) + 1 if self.classifier else None
        self.feature_importances_ = np.zeros(X.shape[1])
        self.used_variables = np.zeros(X.shape[1])
        self.n_feats = X.shape[1] if not self.n_feats else min(self.n_feats, X.shape[1])
        self.root = self._grow(X, Y)
        self.feature_importances_ = self.feature_importances_ / np.sum(self.feature_importances_)

    def predict(self, X):
        """
        Use the trained decision tree to classify or predict the examples in `X`.

        Parameters
        ----------
        X : :py:class:`ndarray <numpy.ndarray>` of shape `(N, M)`
            The training data of `N` examples, each with `M` features

        Returns
        -------
        preds : :py:class:`ndarray <numpy.ndarray>` of shape `(N,)`
            The integer class labels predicted for each example in `X` if
            self.classifier = True, otherwise the predicted target values.
        """
        return np.array([self._traverse(x, self.root) for x in X])

    def predict_class_probs(self, X):
        """
        Use the trained decision tree to return the class probabilities for the
        examples in `X`.

        Parameters
        ----------
        X : :py:class:`ndarray <numpy.ndarray>` of shape `(N, M)`
            The training data of `N` examples, each with `M` features

        Returns
        -------
        preds : :py:class:`ndarray <numpy.ndarray>` of shape `(N, n_classes)`
            The class probabilities predicted for each example in `X`.
        """
        assert self.classifier, "`predict_class_probs` undefined for classifier = False"
        return np.array([self._traverse(x, self.root, prob=True) for x in X])

    def _grow(self, X, Y, cur_depth=0):
        # if all labels are the same, return a leaf
        if len(set(Y)) == 1:
            if self.classifier:
                prob = np.zeros(self.n_classes)
                prob[Y[0]] = 1.0
            return Leaf(prob) if self.classifier else Leaf(Y[0])

        # if we have reached max_depth, return a leaf
        if cur_depth >= self.max_depth:
            v = np.mean(Y, axis=0)
            if self.classifier:
                v = np.bincount(Y, minlength=self.n_classes) / len(Y)
            return Leaf(v)

        cur_depth += 1
        self.depth = max(self.depth, cur_depth)

        N, M = X.shape
        feat_idxs = np.random.choice(M, self.n_feats, replace=False)

        # greedily select the best split according to `criterion`
        feat, thresh, best_gain = _segment(self.criterion, self.regularized, self.used_variables, self.factor,
                                           X, Y, feat_idxs)
        self.feature_importances_[feat] += len(X) * best_gain
        self.used_variables[feat] = 1

        l = np.argwhere(X[:, feat] <= thresh).flatten()
        r = np.argwhere(X[:, feat] > thresh).flatten()

        # grow the children that result from the split
        left = self._grow(X[l, :], Y[l], cur_depth)
        right = self._grow(X[r, :], Y[r], cur_depth)
        return Node(left, right, (feat, thresh))

    def _traverse(self, X, node, prob=False):
        if isinstance(node, Leaf):
            if self.classifier:
                return node.value if prob else node.value.argmax()
            return node.value
        if X[node.feature] <= node.threshold:
            return self._traverse(X, node.left, prob)
        return self._traverse(X, node.right, prob)


@njit()
def _segment(criterion, regularized, used_variables, factor, X, Y, feat_idxs):
    """
    Find the optimal split rule (feature index and split threshold) for the
    data according to `self.criterion`.
    """
    best_gain = -np.inf
    split_idx, split_thresh = None, None
    for i in feat_idxs:
        vals = X[:, i]
        levels = np.unique(vals)
        thresholds = (levels[:-1] + levels[1:]) / 2 if len(levels) > 1 else levels
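        # note: each _impurity_gain call below rescans `vals` and `Y` in full
        # (argwhere + bincount), so this comprehension costs O(n) per candidate
        # threshold, i.e. roughly O(n^2) per feature at each node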
        gains = np.array([_impurity_gain(criterion, Y, t, vals) for t in thresholds])

        if regularized and used_variables[i]:
            gains = gains * factor

        if gains.max() > best_gain:
            split_idx = i
            best_gain = gains.max()
            split_thresh = thresholds[gains.argmax()]

    return split_idx, split_thresh, best_gain


@njit()
def _impurity_gain(criterion, Y, split_thresh, feat_values):
    """
    Compute the impurity gain associated with a given split.

    IG(split) = loss(parent) - weighted_avg(loss(left_child), loss(right_child))
    """
    if criterion == "entropy":
        parent_loss = entropy(Y)
    elif criterion == "gini":
        parent_loss = gini(Y)
    elif criterion == "mse":
        parent_loss = mse(Y)
    else:
        raise Exception

    # generate split
    left = np.argwhere(feat_values <= split_thresh).flatten()
    right = np.argwhere(feat_values > split_thresh).flatten()

    if len(left) == 0 or len(right) == 0:
        return 0

    # compute the weighted avg. of the loss for the children
    n = len(Y)
    n_l, n_r = len(left), len(right)
    if criterion == "entropy":
        e_l, e_r = entropy(Y[left]), entropy(Y[right])
    elif criterion == "gini":
        e_l, e_r = gini(Y[left]), gini(Y[right])
    else:
        raise Exception

    child_loss = (n_l / n) * e_l + (n_r / n) * e_r

    # impurity gain is difference in loss before vs. after split
    ig = parent_loss - child_loss
    return ig


@njit()
def mse(y):
    """
    Mean squared error for decision tree (ie., mean) predictions
    """
    return np.mean((y - np.mean(y)) ** 2)


@njit()
def entropy(y):
    """
    Entropy of a label sequence
    """
    hist = np.bincount(y)
    ps = hist / np.sum(hist)
    # return -np.sum((p * np.log2(p) for p in ps if p > 0))
    return -np.array([p * np.log2(p) for p in ps if p > 0]).sum()


@njit()
def gini(y):
    """
    Gini impurity (local entropy) of a label sequence
    """
    hist = np.bincount(y)
    N = np.sum(hist)
    return 1 - np.array([(i / N) ** 2 for i in hist]).sum()


if __name__ == '__main__':
    n_samples = 1000
    X, y = make_hastie_10_2(n_samples=n_samples)
    y[y < 0] = 0
    X = X.astype(np.float64)
    y = y.astype(np.int64)
    test_size = 0.2
    print(X.shape)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size)

    ts = time.time()
    tree = DecisionTree(seed=0, criterion='gini', max_depth=3)
    tree.fit(X_train, y_train)
    print(accuracy_score(y_train, tree.predict(X_train)))
    print(accuracy_score(y_test, tree.predict(X_test)))
    te = time.time()
    print('%2.2f ms' % ((te - ts) * 1000))

    ts = time.time()
    tree = DecisionTree(seed=0, criterion='gini', max_depth=3)
    tree.fit(X_train, y_train)
    print(accuracy_score(y_train, tree.predict(X_train)))
    print(accuracy_score(y_test, tree.predict(X_test)))
    te = time.time()
    print('%2.2f ms' % ((te - ts) * 1000))

    ts = time.time()
    tree = DecisionTreeClassifier(random_state=0, max_depth=3)
    tree.fit(X_train, y_train)
    print(accuracy_score(y_train, tree.predict(X_train)))
    print(accuracy_score(y_test, tree.predict(X_test)))
    te = time.time()
    print('%2.2f ms' % ((te - ts) * 1000))
```
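
For reference, scikit-learn's splitter sorts each feature once and sweeps over the sorted values, updating class counts incrementally, instead of rescanning the whole node for every candidate threshold as `_impurity_gain` does above. A minimal sketch of that idea for the Gini criterion (my own illustration, not part of the question's code; `best_gini_split_1d` is a hypothetical helper):

```
import numpy as np
from numba import njit


@njit()
def best_gini_split_1d(vals, Y, n_classes):
    """Find the best Gini split for one feature in a single sorted sweep."""
    order = np.argsort(vals)
    sv, sy = vals[order], Y[order]
    n = len(sy)
    # class counts for the whole node, plus a running count for the left child
    total = np.zeros(n_classes)
    for c in sy:
        total[c] += 1.0
    left = np.zeros(n_classes)
    parent = 1.0 - np.sum((total / n) ** 2)
    best_gain, best_thresh = -np.inf, 0.0
    for i in range(n - 1):
        left[sy[i]] += 1.0
        if sv[i] == sv[i + 1]:  # splits are only possible between distinct values
            continue
        n_l = i + 1.0
        n_r = n - n_l
        right = total - left
        g_l = 1.0 - np.sum((left / n_l) ** 2)
        g_r = 1.0 - np.sum((right / n_r) ** 2)
        gain = parent - (n_l / n) * g_l - (n_r / n) * g_r
        if gain > best_gain:
            best_gain = gain
            best_thresh = (sv[i] + sv[i + 1]) / 2.0
    return best_gain, best_thresh
```

This replaces the O(n) `argwhere`/`bincount` work per threshold with O(1) incremental count updates, bringing the per-feature cost down to roughly O(n log n) for the sort instead of O(n²).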

Constructing a Minimum Spanning Tree

Suppose that I have $n$ nodes and $m$ undirected weighted edges.

I also have $k$ sequences, where the $i$-th sequence is a permutation of $n-1$ integers (representing the nodes) from $\{2, \dots, n\}$. Each sequence is guaranteed to be unique, and there are no duplicate integers within each sequence.

A sequence is satisfied if, for every node $u$ in the sequence, there exists an edge $(u, v)$ such that $v$ has already been “added” earlier in the sequence.
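
To make the condition concrete, here is a small checker (my own illustration, not from the question; it assumes node $1$ counts as already added at the start, per the statement below, and that `edges` is given as a set of frozensets):

```
def is_satisfied(sequence, edges):
    """sequence: a permutation of nodes 2..n; edges: set of frozenset({u, v})."""
    added = {1}  # assumption: construction starts from node 1
    for u in sequence:
        # u must have an edge to some node that was added before it
        if not any(frozenset((u, v)) in edges for v in added):
            return False
        added.add(u)
    return True
```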

I am required to construct a minimum spanning tree that satisfies all $k$ sequences starting from the node labelled $1$.

Is there an efficient way to solve this?

Proof that “local” minimum spanning tree is “global” minimum spanning tree

I’m trying to understand a proof from the book “Graph Theory with Application to Engineering & Computer Science” by Narsingh Deo.

The chapter is about trees in undirected graphs.

A bit of terminology so that you can understand the theorem and the beginning of the proof from the book:

The author calls minimum spanning trees, shortest spanning trees.

The author calls a branch of a spanning tree any edge of the tree.

A chord of a spanning tree is any edge of the underlying graph that is not in the tree.

A fundamental circuit associated to a spanning tree is a circuit formed by adding one of its chords to the spanning tree (for the author a “circuit” is a closed path with no repetition of vertices; it is what most other sources I’ve read call a cycle). So a fundamental circuit associated to a spanning tree is actually a cycle formed by adding one of its chords to the spanning tree.

The distance between two spanning trees $S$ and $T$ of the same graph (regarding $S$ and $T$ as sets of edges) is $|S \setminus T|$ (which happens to be equal to $|T \setminus S|$). For example, two spanning trees that differ in exactly one edge are at distance one.

There’s a step in the proof of Theorem 3-16 that I don’t understand.

Theorem 3-16:

A spanning tree $T$ (of a given weighted connected graph $G$) is a shortest spanning tree (of $G$) if and only if there exists no other spanning tree (of $G$) at a distance of one from $T$ whose weight is smaller than that of $T$.

Proof:

Let $T_1$ be a spanning tree in G satisfying the hypothesis of the theorem (i.e. there is no spanning tree at a distance of one from $T_1$ which is shorter than $T_1$). The proof will be completed by showing that if $T_2$ is a shortest spanning tree (different from $T_1$) in G, the weight of $T_1$ will also be equal to that of $T_2$. Let $T_2$ be a shortest spanning tree in G. Clearly, $T_2$ must also satisfy the hypothesis of the theorem (otherwise there will be a spanning tree shorter than $T_2$ at a distance of one from $T_2$, violating the assumption that $T_2$ is shortest).

Consider an edge $e$ in $T_2$ which is not in $T_1$. Adding $e$ to $T_1$ forms a fundamental circuit with branches in $T_1$. Some, but not all, of the branches in $T_1$ that form the fundamental circuit with $e$ may also be in $T_2$; each of these branches in $T_1$ has a weight smaller than or equal to that of $e$, because of the assumption on $T_1$. Amongst all those edges in this circuit which are not in $T_2$ at least one, say $b_j$, must form some fundamental circuit (with respect to $T_2$) containing $e$.

I’m stuck at the last sentence I just quoted:

“Amongst all those edges in this circuit which are not in $T_2$ at least one, say $b_j$, must form some fundamental circuit (with respect to $T_2$) containing $e$.”

I don’t see why, among the fundamental circuits formed by those edges, there should necessarily be one that contains $e$. Why is that?

custom roms – Q: How to build lineage os 15.1 device tree from existing device tree?

I want to build a LineageOS 15.1 ROM, but I am new to this. Can someone help me make an existing device tree (which supports Treble) work for LineageOS 15.1? It has no vendorsetup.sh, so the device does not show up when building the ROM.

Link to existing device tree:

https://github.com/bartcubbins/device_sony_tulip-mainline

ai – Behavior tree: why?

Taken from a reddit thread: “Most of the non-blackboard AI functions are really buggy anyways, and I was only able to get error tolerant AI using the behavior tree.”

  • this indicates that when using behavior trees, it is harder to make bad errors in your AI

Taken from a reddit thread: “In my (admittedly very limited) experience, it’s biggest benefit is that it’s a tool designed around creating AI specifically. It makes you structure your logic in a way that is visually easier to troubleshoot in simulation as your AI gets more complicated.. but logically I don’t think it does anything magical that couldn’t be done without it.”

  • this also indicates the same, and additionally that the more complex your AI becomes, the more behavior trees shine.

(Source thread: “Advantages of using behavior tree? So far it’s more work than it’s worth”, from r/unrealengine)

Efficient insertion of deterministic acyclic finite state automaton (DAFSA) into a suffix tree?

I have a problem where I can easily construct a set of deterministic acyclic finite state automata (DAFSA) and I need a full-text search index, e.g. a suffix tree, over all the words encoded by these automata.

Is there an efficient algorithm for inserting all the words encoded in a DAFSA into a suffix tree? By efficient I mean that the complexity must not depend on the total number of words encoded in the DAFSA.
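
For contrast, the naive approach this rules out would enumerate every encoded word and insert each one individually; its cost grows with the number of words, which can be exponential in the size of the automaton. A sketch of that enumeration (my own illustration; the `state.is_final` / `state.transitions` interface is hypothetical):

```
def enumerate_words(state, prefix=""):
    """Yield every word encoded by a DAFSA, walking from `state`."""
    if state.is_final:  # a final state marks the end of a word
        yield prefix
    for label, nxt in state.transitions.items():
        yield from enumerate_words(nxt, prefix + label)
```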

data structures – What exactly is the difference between a Balanced Binary Search Tree and an AVL tree?

I’m learning some data structures and I cannot figure out the difference between a balanced BST and an AVL tree. From my understanding, an AVL tree is a balanced tree where the height difference between the left and right subtrees of every node is at most 1.

But then what about the Balanced Binary Search Tree? What’s the difference here?

Let’s take an example:

I have a sorted array X = {1,2,3,4,5}. If I build a balanced binary search tree from X, I get the preorder output 3 1 2 4 5. However, if I insert the values of X one by one into an AVL tree, I get the preorder output 2 1 4 3 5.
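
For reference, the first output is what you get by building the tree top-down from the sorted array, always taking the lower-middle element as the root. A minimal sketch (my own illustration, using tuples as nodes):

```
def build_balanced(a, lo=0, hi=None):
    """Build a height-balanced BST from sorted list `a` as (value, left, right) tuples."""
    if hi is None:
        hi = len(a) - 1
    if lo > hi:
        return None
    mid = (lo + hi) // 2  # lower middle: 3 becomes the root, then 1, 2, ...
    return (a[mid], build_balanced(a, lo, mid - 1), build_balanced(a, mid + 1, hi))


def preorder(node):
    if node is None:
        return []
    val, left, right = node
    return [val] + preorder(left) + preorder(right)


print(preorder(build_balanced([1, 2, 3, 4, 5])))  # [3, 1, 2, 4, 5]
```

The AVL output differs because AVL insertion processes the values one at a time and only rotates when a node’s balance factor leaves $\{-1, 0, 1\}$.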

I notice that an AVL tree is always self-balancing, but it seems that a balanced BST is not necessarily an AVL tree.

I am quite confused about this. Is there something I am missing here?

bitcoin core – Do full nodes store the complete merkle tree or do they regenerate it when creating a merkle proof?

I understand what the merkle root is for. And I understand that blocks don’t store the merkle tree.

Question 1) Is there any place that the complete merkle trees get stored? I don’t mean the merkle root hashes since I know they are in the block headers.

Question 2) Let’s say a full node wants to prove to a light node that a specific transaction is in block J. How does the full node produce the merkle branch it sends to the light node? Does it loop through the block’s transactions again to recompute the hashes and then send the interior node hashes, or do full nodes already have the complete merkle tree (the whole tree, with every internal hash) stored somewhere?
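
For reference, the branch can always be regenerated from the transaction hashes alone. A simplified sketch (my own illustration; it uses a single SHA-256 rather than Bitcoin’s exact double-SHA-256 and byte-order rules, but keeps Bitcoin’s duplicate-last-hash rule for odd-sized levels):

```
import hashlib


def merkle_branch(leaf_hashes, index):
    """Collect the sibling hashes proving membership of leaf `index`."""
    level, branch = list(leaf_hashes), []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        branch.append(level[index ^ 1])  # the sibling of the tracked node
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
        index //= 2
    return branch  # the interior hashes sent to the light node
```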