decision problem – Randomized Polynomial time algorithm for bipartite perfect matching

I know that deciding whether a given bipartite graph has a perfect matching is in RP. Specifically, given a bipartite graph, one can show that there is a polynomial-time randomised algorithm that answers “there is no perfect matching” with probability 1 when there is no perfect matching, and answers “there is a perfect matching” with probability at least 1/2 when there is a perfect matching.

My question is: how can I use the above algorithm to find an actual perfect matching (with the same one-sided error), rather than only deciding whether one exists?

I am familiar with the standard “decision to search” reduction for NP; I tried to do something similar here, but did not succeed in bounding the error probability. Here is my attempt:

Given a bipartite graph G = (V, E), use the above algorithm to decide whether there is a perfect matching. If the answer is no, then answer “No”. If the answer is yes, consider some edge e = (u, v) in E, and remove u and v from the graph to get the graph G’. Then ask again whether there is a perfect matching in G’. If the answer is yes, then a perfect matching M’ of G’ together with the edge (u, v) forms a perfect matching of G. Otherwise, if the answer is no, then we know that the edge (u, v) is not in any perfect matching of G, so we remove the edge e from G (u and v remain in the graph) and continue recursively.
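
For concreteness, here is a minimal Python sketch of that attempt, assuming a hypothetical decision routine `has_pm(V, E)` that stands in for the randomised algorithm above; the names and graph representation are illustrative only.

```python
def find_perfect_matching(V, E, has_pm):
    """Sketch of the proposed reduction; has_pm(V, E) is the decision
    procedure (possibly randomised, with one-sided error).
    Assume it treats the empty graph as trivially matched."""
    if not has_pm(V, E):
        return None                       # answer "No"
    V, E = list(V), list(E)
    matching = []
    while E:
        u, v = E[0]                       # consider some edge e = (u, v)
        V2 = [w for w in V if w not in (u, v)]
        E2 = [(a, b) for (a, b) in E if a not in (u, v) and b not in (u, v)]
        if has_pm(V2, E2):                # e extends to a perfect matching
            matching.append((u, v))
            V, E = V2, E2
        else:
            # With a deterministic oracle, e lies on no perfect matching, so drop it.
            # With a randomised oracle this branch can also fire on a false "no",
            # which is exactly the error the question asks how to bound.
            E = E[1:]
    return matching if not V else None    # perfect iff every vertex got matched
```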

Things that I know:

  1. If the algorithm that decides whether there is a perfect matching is deterministic (not randomised), then my suggestion actually produces a perfect matching of G, and it does so by applying the algorithm at most |E| times.

  2. I know that if a language has a randomised algorithm with one-sided error 1/2, then by independent repetition the error can be reduced to $1/2^k$, where $k$ is the length of the input (a small sketch of this repetition trick follows the list).
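
For reference, a tiny sketch of that repetition trick (one-sided error, so a single “yes” is trusted); `decide` is a stand-in for the randomised decision procedure:

```python
def amplified_decide(decide, instance, k):
    # Run the one-sided-error algorithm k times independently: a "no" instance
    # is never accepted, and a "yes" instance is missed with probability <= 2**-k.
    return any(decide(instance) for _ in range(k))
```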

I did not succeed in showing that my suggestion works. That is, I could not show that it produces a perfect matching with probability at least 1/2 when there is a perfect matching. (When there is no perfect matching, it clearly answers “No” with probability 1.)

Thanks

decision tree – Rummy Card Game Bot

In my beginner’s computer science class, we are asked to create a bot for a variant of rummy. It has to make sensible moves, as it needs to beat other very simple bots. I wanted to know how you would go about coding such a program. From what I’ve seen in a few Google searches, people seem to rely heavily on decision trees, but I feel like there must be a more efficient way of dealing with this kind of problem (without completely ditching the trees). I’m not looking for any code; I’d just like to learn how to approach this kind of problem.

Thank you

python – How to speed up this numba based decision tree?

To improve performance, I have sped up parts of this code using numba. Frustratingly, the decision tree is still slow: in the code below it is nearly 70 times slower than the scikit-learn implementation (220 ms vs. 3 ms). How can I modify this code to achieve better performance?

```python
import time

import numpy as np
from numba import njit
from sklearn.base import BaseEstimator
from sklearn.datasets import make_hastie_10_2
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


class Node:
    def __init__(self, left, right, rule):
        self.left = left
        self.right = right
        self.feature = rule[0]
        self.threshold = rule[1]


class Leaf:
    def __init__(self, value):
        """
        `value` is an array of class probabilities if classifier is True, else
        the mean of the region
        """
        self.value = value


class DecisionTree(BaseEstimator):
    def __init__(
            self,
            classifier=True,
            max_depth=None,
            n_feats=None,
            criterion="entropy",
            seed=None,
            regularized=False,
    ):
        """
        A decision tree model for regression and classification problems.

        Parameters
        ----------
        classifier : bool
            Whether to treat target values as categorical (classifier =
            True) or continuous (classifier = False). Default is True.
        max_depth: int or None
            The depth at which to stop growing the tree. If None, grow the tree
            until all leaves are pure. Default is None.
        n_feats : int
            Specifies the number of features to sample on each split. If None,
            use all features on each split. Default is None.
        criterion : {'mse', 'entropy', 'gini'}
            The error criterion to use when calculating splits. When
            `classifier` is False, valid entries are {'mse'}. When `classifier`
            is True, valid entries are {'entropy', 'gini'}. Default is
            'entropy'.
        seed : int or None
            Seed for the random number generator. Default is None.
        """
        if seed:
            np.random.seed(seed)

        self.depth = 0
        self.root = None

        self.n_feats = n_feats
        self.criterion = criterion
        self.classifier = classifier
        self.max_depth = max_depth if max_depth else np.inf

        self.factor = 1.2
        self.regularized = regularized
        self.feature_importances_ = None

        if not classifier and criterion in ("gini", "entropy"):
            raise ValueError(
                "{} is a valid criterion only when classifier = True.".format(criterion)
            )
        if classifier and criterion == "mse":
            raise ValueError("`mse` is a valid criterion only when classifier = False.")

    def fit(self, X, Y):
        """
        Fit a binary decision tree to a dataset.

        Parameters
        ----------
        X : :py:class:`ndarray <numpy.ndarray>` of shape `(N, M)`
            The training data of `N` examples, each with `M` features
        Y : :py:class:`ndarray <numpy.ndarray>` of shape `(N,)`
            An array of integer class labels for each example in `X` if
            self.classifier = True, otherwise the set of target values for
            each example in `X`.
        """
        self.n_classes = max(Y) + 1 if self.classifier else None
        self.feature_importances_ = np.zeros(X.shape[1])
        self.used_variables = np.zeros(X.shape[1])
        self.n_feats = X.shape[1] if not self.n_feats else min(self.n_feats, X.shape[1])
        self.root = self._grow(X, Y)
        self.feature_importances_ = self.feature_importances_ / np.sum(self.feature_importances_)

    def predict(self, X):
        """
        Use the trained decision tree to classify or predict the examples in `X`.

        Parameters
        ----------
        X : :py:class:`ndarray <numpy.ndarray>` of shape `(N, M)`
            The training data of `N` examples, each with `M` features

        Returns
        -------
        preds : :py:class:`ndarray <numpy.ndarray>` of shape `(N,)`
            The integer class labels predicted for each example in `X` if
            self.classifier = True, otherwise the predicted target values.
        """
        return np.array([self._traverse(x, self.root) for x in X])

    def predict_class_probs(self, X):
        """
        Use the trained decision tree to return the class probabilities for the
        examples in `X`.

        Parameters
        ----------
        X : :py:class:`ndarray <numpy.ndarray>` of shape `(N, M)`
            The training data of `N` examples, each with `M` features

        Returns
        -------
        preds : :py:class:`ndarray <numpy.ndarray>` of shape `(N, n_classes)`
            The class probabilities predicted for each example in `X`.
        """
        assert self.classifier, "`predict_class_probs` undefined for classifier = False"
        return np.array([self._traverse(x, self.root, prob=True) for x in X])

    def _grow(self, X, Y, cur_depth=0):
        # if all labels are the same, return a leaf
        if len(set(Y)) == 1:
            if self.classifier:
                prob = np.zeros(self.n_classes)
                prob[Y[0]] = 1.0
            return Leaf(prob) if self.classifier else Leaf(Y[0])

        # if we have reached max_depth, return a leaf
        if cur_depth >= self.max_depth:
            v = np.mean(Y, axis=0)
            if self.classifier:
                v = np.bincount(Y, minlength=self.n_classes) / len(Y)
            return Leaf(v)

        cur_depth += 1
        self.depth = max(self.depth, cur_depth)

        N, M = X.shape
        feat_idxs = np.random.choice(M, self.n_feats, replace=False)

        # greedily select the best split according to `criterion`
        feat, thresh, best_gain = _segment(self.criterion, self.regularized, self.used_variables, self.factor,
                                           X, Y, feat_idxs)
        self.feature_importances_[feat] += len(X) * best_gain
        self.used_variables[feat] = 1

        l = np.argwhere(X[:, feat] <= thresh).flatten()
        r = np.argwhere(X[:, feat] > thresh).flatten()

        # grow the children that result from the split
        left = self._grow(X[l, :], Y[l], cur_depth)
        right = self._grow(X[r, :], Y[r], cur_depth)
        return Node(left, right, (feat, thresh))

    def _traverse(self, X, node, prob=False):
        if isinstance(node, Leaf):
            if self.classifier:
                return node.value if prob else node.value.argmax()
            return node.value
        if X[node.feature] <= node.threshold:
            return self._traverse(X, node.left, prob)
        return self._traverse(X, node.right, prob)


@njit()
def _segment(criterion, regularized, used_variables, factor, X, Y, feat_idxs):
    """
    Find the optimal split rule (feature index and split threshold) for the
    data according to `self.criterion`.
    """
    best_gain = -np.inf
    split_idx, split_thresh = None, None
    for i in feat_idxs:
        vals = X[:, i]
        levels = np.unique(vals)
        thresholds = (levels[:-1] + levels[1:]) / 2 if len(levels) > 1 else levels
        gains = np.array([_impurity_gain(criterion, Y, t, vals) for t in thresholds])

        if regularized and used_variables[i]:
            gains = gains * factor

        if gains.max() > best_gain:
            split_idx = i
            best_gain = gains.max()
            split_thresh = thresholds[gains.argmax()]

    return split_idx, split_thresh, best_gain


@njit()
def _impurity_gain(criterion, Y, split_thresh, feat_values):
    """
    Compute the impurity gain associated with a given split.

    IG(split) = loss(parent) - weighted_avg(loss(left_child), loss(right_child))
    """
    if criterion == "entropy":
        parent_loss = entropy(Y)
    elif criterion == "gini":
        parent_loss = gini(Y)
    elif criterion == "mse":
        parent_loss = mse(Y)
    else:
        raise Exception

    # generate split
    left = np.argwhere(feat_values <= split_thresh).flatten()
    right = np.argwhere(feat_values > split_thresh).flatten()

    if len(left) == 0 or len(right) == 0:
        return 0

    # compute the weighted avg. of the loss for the children
    n = len(Y)
    n_l, n_r = len(left), len(right)
    if criterion == "entropy":
        e_l, e_r = entropy(Y[left]), entropy(Y[right])
    elif criterion == "gini":
        e_l, e_r = gini(Y[left]), gini(Y[right])
    else:
        raise Exception

    child_loss = (n_l / n) * e_l + (n_r / n) * e_r

    # impurity gain is difference in loss before vs. after split
    ig = parent_loss - child_loss
    return ig


@njit()
def mse(y):
    """
    Mean squared error for decision tree (ie., mean) predictions
    """
    return np.mean((y - np.mean(y)) ** 2)


@njit()
def entropy(y):
    """
    Entropy of a label sequence
    """
    hist = np.bincount(y)
    ps = hist / np.sum(hist)
    # return -np.sum([p * np.log2(p) for p in ps if p > 0])
    return -np.array([p * np.log2(p) for p in ps if p > 0]).sum()


@njit()
def gini(y):
    """
    Gini impurity (local entropy) of a label sequence
    """
    hist = np.bincount(y)
    N = np.sum(hist)
    return 1 - np.array([(i / N) ** 2 for i in hist]).sum()


if __name__ == '__main__':
    n_samples = 1000
    X, y = make_hastie_10_2(n_samples=n_samples)
    y[y < 0] = 0
    X = X.astype(np.float64)  # np.float / np.int are removed in recent NumPy
    y = y.astype(np.int64)
    test_size = 0.2
    print(X.shape)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size)

    ts = time.time()
    tree = DecisionTree(seed=0, criterion='gini', max_depth=3)
    tree.fit(X_train, y_train)
    print(accuracy_score(y_train, tree.predict(X_train)))
    print(accuracy_score(y_test, tree.predict(X_test)))
    te = time.time()
    print('%2.2f ms' % ((te - ts) * 1000))

    ts = time.time()
    tree = DecisionTree(seed=0, criterion='gini', max_depth=3)
    tree.fit(X_train, y_train)
    print(accuracy_score(y_train, tree.predict(X_train)))
    print(accuracy_score(y_test, tree.predict(X_test)))
    te = time.time()
    print('%2.2f ms' % ((te - ts) * 1000))

    ts = time.time()
    tree = DecisionTreeClassifier(random_state=0, max_depth=3)
    tree.fit(X_train, y_train)
    print(accuracy_score(y_train, tree.predict(X_train)))
    print(accuracy_score(y_test, tree.predict(X_test)))
    te = time.time()
    print('%2.2f ms' % ((te - ts) * 1000))
```

If a poly-time solution exists for an NP-Complete (Decision) problem, then there exists a poly-time solution for the NP-hard (Optimization) flavor?

In this particular example, the following works:
Let $A$ be the algorithm for deciding whether some graph $G$ has a TSP route of cost at most $b$.
We start by finding the optimal value $b^\ast$ using $A$ and binary search over an appropriate range (say $\left(0, \sum_{e \in E(G)} w(e)\right)$) in polynomial time.

Following this, we start modifying the instance graph $G$: we set an edge’s weight to some large value (e.g. $1 + b^\ast$) so that it can never be picked in a route of optimal cost $b^\ast$, and using $A$ we check whether the resulting graph still has a route of cost $b^\ast$.
If that is the case, we keep the modification; otherwise we undo it.
In the latter case we now know that any such route (in the current graph) must include the edge we tampered with, so we store that edge.
Successively applying this to all edges, we find at least one such edge per pass over $E(G)$, and after at most $|V(G)|$ passes we will have found a TSP route of cost $b^\ast$ consisting of the edges we stored.
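
A rough Python sketch of this reduction, assuming a hypothetical decision oracle `A(w, b)` over nonnegative integer edge weights `w` and assuming $G$ has at least one tour; the names and data layout are illustrative only.

```python
def optimal_tour_from_decision(n, edges, w, A):
    """n: number of vertices, edges: list of edge keys,
    w: dict edge -> nonnegative integer weight,
    A(w, b): True iff the weighted graph has a tour of cost at most b."""
    # Step 1: binary search for the optimal cost b* over [0, total weight].
    lo, hi = 0, sum(w.values())
    while lo < hi:
        mid = (lo + hi) // 2
        if A(w, mid):
            hi = mid
        else:
            lo = mid + 1
    b_star = lo

    # Step 2: try to block each edge; an edge whose blocking destroys all
    # optimal tours lies on every remaining optimal tour, so we store it.
    w = dict(w)
    stored = set()
    for _ in range(n):                       # at most |V(G)| passes, as above
        for e in edges:
            if w[e] > b_star or e in stored:
                continue                     # already blocked or already stored
            old, w[e] = w[e], b_star + 1     # too expensive to ever be used
            if not A(w, b_star):             # blocking e kills all optimal tours
                w[e] = old                   # undo the modification
                stored.add(e)
        if len(stored) == n:                 # a tour has exactly n edges
            break
    return b_star, stored                    # stored edges form an optimal tour
```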

Hence, in the second step we call $A$ a polynomial number of times ($O(|V(G)| \cdot |E(G)|)$, or if you prefer $O(|V(G)|^3)$); the first step also makes only polynomially many calls, and all of our other operations can easily be done in polynomial time as well.

Therefore, we can solve the optimization problem in polynomial time if we have a polynomial-time algorithm for the decision variant.

This approach also works for other problems (with slight variations for the second part); in fact I am not aware of a problem where this general strategy fails. I am actually pretty sure that one can use it to recover an accepting path of a polynomially-bounded NTM which can then be used to solve optimization variants of all $\mathsf{NP}$-complete problems.

machine learning – What does the decision boundary of XOR problem look like?

My textbook walks through an example of solving the XOR problem in machine learning using a two-dimensional RBF network. It does this by setting the centers of the two basis functions at (0,0) and (1,1). Afterwards, it challenges us to imagine what the decision boundary of the RBF network would look like if the two centers were instead placed at one sample from each class (class 0 could be (0,0) and class 1 could be (1,0)). Since there are only two classes, wouldn’t this result in essentially a straight line, or would it look like something else? Is there any reason why we would want to do this?
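
To make the setup concrete, here is a small, purely illustrative sketch (the Gaussian width and the least-squares readout are my assumptions, not the textbook’s) that places the two centers at one sample from each class and prints the predicted class over a coarse grid, which shows the shape of the resulting boundary:

```python
import numpy as np

# The four XOR points: (0,0) and (1,1) are class 0, (0,1) and (1,0) are class 1.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

centers = np.array([[0.0, 0.0], [1.0, 0.0]])   # one sample from each class
gamma = 1.0                                    # assumed RBF width parameter

def phi(points):
    # Gaussian basis functions around the two centers, plus a bias column.
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.hstack([np.exp(-gamma * d2), np.ones((len(points), 1))])

# Linear (least-squares) readout on top of the two basis functions.
w, *_ = np.linalg.lstsq(phi(X), y, rcond=None)

# Predicted class over a grid; the 0.5 level set of phi(x) @ w is the boundary.
xx, yy = np.meshgrid(np.linspace(-0.5, 1.5, 9), np.linspace(-0.5, 1.5, 9))
grid = np.c_[xx.ravel(), yy.ravel()]
print((phi(grid) @ w >= 0.5).astype(int).reshape(xx.shape))
```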

java – Pattern / solution for Boolean decision making chains

I need a solution for a decision-making chain. There are a number of criteria that may each return true, false, or be inconclusive. Illustrative (non-functional) Java code would look like this:

Boolean res = nullValuesCheck(fieldValue, node);
if (res != null) {
    return res;
}
res = typeCheck(node);
if (res != null) {
    return res;
}
res = dictCheck(dict, fieldValue);
if (res != null) {
    return res;
}
return finalCheck(fieldName, fieldValue); //also returns a Boolean

I’m considering creating an extended predicate that would return a nullable Boolean instead of boolean, so that an inconclusive result could be returned.
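
As a sketch of that idea (written in Python just for brevity; the same shape works in Java with `Optional<Boolean>` or a custom functional interface), each check returns a conclusive true/false or an “inconclusive” marker, and the chain returns the first conclusive result. The helper names below are hypothetical:

```python
from typing import Callable, Optional

# A check returns True or False when conclusive, or None when inconclusive.
TriStateCheck = Callable[[], Optional[bool]]

def evaluate(checks, default=False):
    """Return the first conclusive result in the chain, else the default."""
    for check in checks:
        result = check()
        if result is not None:
            return result
    return default

# Hypothetical usage mirroring the chain in the question:
# evaluate([
#     lambda: null_values_check(field_value, node),
#     lambda: type_check(node),
#     lambda: dict_check(dictionary, field_value),
#     lambda: final_check(field_name, field_value),   # always conclusive
# ])
```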

I tried googling, but found no apparent solution (I got a lot of mishits on simple Java problems instead). I am wondering whether there is a pattern, or maybe a library, that handles this problem properly. The problem seems generic and simple enough that someone must have solved it already; I don’t want to reinvent the wheel.

probability question on decision making

A person wants to know whether stock A or stock B will perform better next quarter. He seeks the opinions of two advisors. He knows that each advisor will choose the better stock with probability p, independently of the other. His strategy is as follows: if the two advisors agree, buy the stock on which they agree. If they disagree, toss a fair coin to decide which stock to buy. Evaluate his strategy.

I think that when the two advisors choose the same stock, the variance of that probability p will be lower, but what if they choose different stocks? Please give me a direction, thank you.
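
Not a full solution, just one possible direction: condition on whether the advisors agree. If I am counting the cases correctly, the probability that the better stock ends up being bought is

$$\Pr[\text{better stock bought}] = \underbrace{p^2}_{\text{both right}}\cdot 1 \;+\; \underbrace{2p(1-p)}_{\text{they disagree}}\cdot \tfrac{1}{2} \;+\; \underbrace{(1-p)^2}_{\text{both wrong}}\cdot 0 \;=\; p,$$

which is worth comparing with the probability of success when simply following a single advisor.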

customs and immigration – I-539 filed; decision pending while Green card application interview is also pending

In March 2020, my wife traveled to the USA on her valid B2 visa to visit me while her green card application was pending. In April 2020, we received an update from the NVC that my wife’s documents had been accepted and that the case was now with the Mumbai embassy, which would call to schedule an interview. However, due to COVID, the Mumbai embassy stopped scheduling any visa interviews. We then filed an I-539 to extend her stay while we wait for the Mumbai embassy to open up.

My question is: if she leaves the US before her I-539 is approved or denied, will she be able to re-enter the USA on her B-2 visa, or would that visa be voided? Note: her I-94 expires on Sept 3rd, 2020.

Also, would a denial of the I-539, or an overstay past her I-94 date, be considered during her interview at the Mumbai embassy?

Facts:

- She came to the USA on March 4th, 2020
- I-94 expires on Sept 3rd
- I-539 applied on Aug 7th
- USCIS confirmed that they will make a decision by mid-Nov 2020 or mid-Jan 2021

Thanks in advance for all your help!

complexity theory – What to say if a decision problem is not NP?

Let $O$ be an optimisation problem and $P$ its decision variant.
I know that $O$ is said to be $\mathsf{NP}$-hard if $P$ is $\mathsf{NP}$-complete.
However, what can we say about $O$ when the decision problem $P$ is no longer in the class $\mathsf{NP}$, and thus no longer $\mathsf{NP}$-complete (the decision algorithm is pseudo-polynomial and there are no better alternatives)?