machine learning – Does Linear Discriminant Analysis perform dimensionality reduction before classification?

I’m trying to understand exactly what LDA does when used as a classifier. I’ve understood how the dimensionality reduction works, and I’ve understood that the classification task is carried out by applying Bayes’ theorem, but I still can’t figure out whether LDA performs both operations when used as a classification algorithm.

Is it correct to say that LDA as a classifier performs dimensionality reduction by itself and then applies Bayes’ theorem for classification?

If that makes any difference, I’ve used LDA in Python from the sklearn library.
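One way to probe this empirically: in scikit-learn, `LinearDiscriminantAnalysis` exposes classification (`predict`, `predict_proba`) and projection (`transform`) as separate methods, and `predict` can be called without calling `transform` first. The sketch below only shows what each method returns; the synthetic data, class means and seed are assumptions made for illustration.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): 3 classes, 5 features.
means = np.array([[0, 0, 0, 0, 0],
                  [3, 0, 0, 0, 0],
                  [0, 3, 0, 0, 0]], dtype=float)
X = np.vstack([rng.normal(m, 1.0, size=(50, 5)) for m in means])
y = np.repeat([0, 1, 2], 50)

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

labels = lda.predict(X)        # classification: one label per sample
proba = lda.predict_proba(X)   # posterior class probabilities
Z = lda.transform(X)           # projection: at most n_classes - 1 columns

print(labels.shape, proba.shape, Z.shape)
```

With three classes the projected data has two columns, while the labels come straight from `predict` on the original five features.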

Classification of isometries of hyperbolic 3-space

Denote the upper half-space by $\mathcal{H}_{3}=\Bbb{C}\times (0,\infty)$. A point $P \in \mathcal{H}_{3}$ is given as $P=(z, t)=(x, y, t)=z+t j$, where $z=x+i y$ and $j=(0,0,1)$. The group $PSL_{2}(\mathbb{C})$ has a natural action on $\mathcal{H}_{3}$. Let $M=\left(\begin{array}{cc}\alpha & \beta \\ \gamma & \delta\end{array}\right)$
and $P=z+t j$ a point in $\mathcal{H}_{3}$. Then $PSL_{2}(\mathbb{C})$ acts on $\mathcal{H}_{3}$ via linear fractional transformations as follows:
$$M \times P \rightarrow \frac{\alpha P+\beta}{\gamma P+\delta}$$

More explicitly, we have $M(z+t j)=z^{*}+t^{*} j \in \mathcal{H}_{3}$, where
$$z^{*}=\frac{(\alpha z+\beta)(\bar{\gamma} \bar{z}+\bar{\delta})+\alpha \bar{\gamma} t^{2}}{|\gamma z+\delta|^{2}+|\gamma|^{2} t^{2}}$$
$$t^{*}=\frac{t}{|\gamma z+\delta|^{2}+|\gamma|^{2} t^{2}}$$

I want to prove the following three statements. Let $M\in PSL_2(\Bbb{C})$.

  • $M$ has a fixed point in $\mathcal{H}_{3}$ $\Longleftrightarrow \operatorname{Tr}(M) \in(-2,2)$
  • $M$ has no fixed point in $\mathcal{H}_{3}$ and a single fixed point on $\partial \mathcal{H}_3$ $\Longleftrightarrow \operatorname{Tr}(M)=\pm 2$
  • $M$ has no fixed point in $\mathcal{H}_3$ and exactly 2 fixed points on $\partial \mathcal{H}_3$ $\Longleftrightarrow \operatorname{Tr}(M) \notin[-2,2]$ (and $M$ is hyperbolic $\Longleftrightarrow \operatorname{Tr}(M) \in \mathbb{R} \backslash[-2,2]$).

My attempt:
If $M$ fixes a point $P$ then we have the following equations

$$\frac{(\alpha z+\beta)(\bar{\gamma} \bar{z}+\bar{\delta})+\alpha \bar{\gamma} t^{2}}{|\gamma z+\delta|^{2}+|\gamma|^{2} t^{2}}=z$$ and $$\frac{t}{|\gamma z+\delta|^{2}+|\gamma|^{2} t^{2}}=t$$

Since $t>0$, this gives $|\gamma z+\delta|^{2}+|\gamma|^{2} t^{2}=1$ and $(\alpha z+\beta)(\bar{\gamma} \bar{z}+\bar{\delta})+\alpha \bar{\gamma} t^{2}=z$. From these two equations I am not able to solve for $z$ and $t$. I want to see the relation between the trace and the fixed points. I would be very glad if someone could relate the trace of $M$ to the fixed points of $M$ starting from my equations. Thanks a lot.
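As a sanity check on the explicit formulas for $z^{*}$ and $t^{*}$, here is a small numerical sketch. It applies three hand-picked matrices (my own illustrative choices, one for each trace regime) and shows how each moves points of $\mathcal{H}_3$. The function name `act` and the variable names are hypothetical, and this is only an experiment, not a proof.

```python
import math

def act(M, z, t):
    """Apply M = ((a, b), (c, d)) to the point z + t*j using the explicit formulas."""
    a, b, c, d = (complex(v) for row in M for v in row)
    z = complex(z)
    denom = abs(c * z + d) ** 2 + abs(c) ** 2 * t ** 2
    z_star = ((a * z + b) * (c * z + d).conjugate() + a * c.conjugate() * t ** 2) / denom
    t_star = t / denom
    return z_star, t_star

def trace(M):
    return complex(M[0][0]) + complex(M[1][1])

theta = 0.7  # arbitrary angle, an assumption for illustration
elliptic = ((math.cos(theta), -math.sin(theta)),
            (math.sin(theta), math.cos(theta)))   # trace 2*cos(theta), inside (-2, 2)
parabolic = ((1, 1), (0, 1))                      # trace 2
hyperbolic = ((2, 0), (0, 0.5))                   # trace 2.5, real and outside [-2, 2]

print(trace(elliptic), act(elliptic, 0, 1.0))        # the point j = (0, 1) is fixed
print(trace(parabolic), act(parabolic, 0.3 + 0.2j, 1.5))   # z shifts by 1, t unchanged
print(trace(hyperbolic), act(hyperbolic, 1 + 1j, 1.0))     # both z and t scale by 4
```

The rotation matrix fixes $j$. The translation $z\mapsto z+1$ preserves $t$ but moves every $z$, so it fixes only $\infty$. The dilation multiplies $t$ by 4, so it has no fixed point in $\mathcal{H}_3$ and fixes only $0$ and $\infty$ on the boundary. Since the trace is invariant under conjugation, these normal forms suggest reducing the general case to them.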

pr.probability – Bayes risk of binary classification problem with conditionally independent covariates

In the setting of this problem, $\eta(\vec{x})$ is $P(Y=1|\vec{X}=\vec{x})$, with $Y \in \{0,1\}$ and $\vec{X} \in \mathbb{R}^d$. Since the true probability is known, the classification rule is simply $\eta(\vec{x})>0.5 \Rightarrow \hat{Y}=1$. The risk of misclassification is, as usual, $E(\min(\eta(\vec{X}),1-\eta(\vec{X})))$. However, I am asked to prove that if the $X_i$ are iid conditional on $Y$, i.e. $X_i|Y \sim p(X|Y)$ $\forall X_i$, then
$$R^*_d \leqslant e^{-cd}$$
I have tried everything I could think of but have achieved very little. I see the intuition (if the features are all conditionally independent, adding more of them adds new information to the estimation and allows for a lower minimal error), but I have no idea how to prove it formally. Big thanks in advance to anybody who is able to crack it!
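One standard route, which may or may not be the one the exercise intends, uses $\min(a,b)\leqslant\sqrt{ab}$. Writing $\pi=P(Y=1)$ (notation introduced here) and using the product form of the class-conditional densities, this gives $R^*_d \leqslant \sqrt{\pi(1-\pi)}\,\rho^{d}$ with $\rho=\int\sqrt{p(x|0)\,p(x|1)}\,dx$. So $c=-\log\rho$ works, and it is positive whenever the two conditional distributions differ. The sketch below checks this bound numerically in a toy case that is entirely my assumption: Bernoulli features with parameters 0.3 and 0.6 and equal priors.

```python
from math import comb, sqrt

def bayes_risk(d, p0, p1, prior1=0.5):
    """Exact Bayes risk when X_1..X_d are iid Bernoulli(p_y) given Y = y."""
    risk = 0.0
    for k in range(d + 1):  # k = number of ones among the d features
        w1 = prior1 * p1 ** k * (1 - p1) ** (d - k)
        w0 = (1 - prior1) * p0 ** k * (1 - p0) ** (d - k)
        risk += comb(d, k) * min(w0, w1)  # comb(d, k) vectors share this k
    return risk

def bhattacharyya_bound(d, p0, p1, prior1=0.5):
    """Upper bound sqrt(pi * (1 - pi)) * rho**d from min(a, b) <= sqrt(a * b)."""
    rho = sqrt(p0 * p1) + sqrt((1 - p0) * (1 - p1))
    return sqrt(prior1 * (1 - prior1)) * rho ** d

for d in (1, 5, 10, 20, 40):
    print(d, bayes_risk(d, 0.3, 0.6), bhattacharyya_bound(d, 0.3, 0.6))
```

In this toy case both columns shrink with $d$, and the exact risk stays below the geometric bound.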

classification – LSTM : What should I do if I am always getting an output too close to one value?

The usual starting point is that if the score is above 0.5, classify it as ham, otherwise as spam. If most emails are ham, then it makes sense that most emails give you a score above 0.5, so you have not said anything that indicates there is a problem.

This approach assumes that the proportion of ham vs spam in the training set is the same as the proportion at test time.

If that doesn’t work, one standard approach is to choose a threshold, and everything with a score above the threshold is treated as ham, everything below as spam. A standard way to set a threshold is, after you’ve trained the LSTM, choose the optimal threshold based on the training set (i.e., that maximizes the accuracy on the training set, etc.), or on a validation set.
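A minimal sketch of that threshold search, assuming label 1 means ham and that you have held-out scores with known labels. The function name `best_threshold` and the toy numbers are made up for illustration.

```python
def best_threshold(scores, labels):
    """Return (threshold, accuracy) maximising accuracy of the rule: score > threshold => ham (1)."""
    # Candidate thresholds: every observed score, plus one below all scores
    # (which classifies everything as ham).
    candidates = [min(scores) - 1.0] + sorted(set(scores))
    best_thr, best_acc = None, -1.0
    for thr in candidates:
        correct = sum((s > thr) == bool(y) for s, y in zip(scores, labels))
        acc = correct / len(scores)
        if acc > best_acc:
            best_thr, best_acc = thr, acc
    return best_thr, best_acc

# Toy validation set in which every score is close to 1.
val_scores = [0.91, 0.95, 0.97, 0.99, 0.93, 0.98]
val_labels = [0, 0, 1, 1, 0, 1]   # 1 = ham, 0 = spam

print(best_threshold(val_scores, val_labels))
```

On this toy data the fixed 0.5 cut-off labels everything as ham, while the searched threshold separates the two groups even though all scores are close to 1.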

design – looking for source code or algorithm to do accelerometer only sleep classification

I need to find an off-the-shelf implementation, or at least an algorithm, that uses raw accelerometer data to determine sleep classifications (wake, light, deep, REM).


I tested SLEEPPY, a Python accelerometer-only, wrist-worn sleep classifier from GitHub, but it limits classification to sleep and awake periods and does not identify sleep states (light, deep (N1 N2 N3 N4) or REM).

Problem of classification of algebraic varieties

In Section 6, “Nonsingular Curves”, of “Algebraic Geometry” by Robin Hartshorne, I read:

“In considering the problem of classification of algebraic varieties, we can
formulate several subproblems, based on the idea that a nonsingular projective
variety is the best kind:

(a) classify varieties up to birational equivalence;

(b) within each birational equivalence class, find a nonsingular
projective variety;

(c) classify the nonsingular projective varieties in a given birational equivalence class.

In general, all three problems are very $\color{red}{difficult}$.”

In what sense are these problems difficult? What are the obstacles to proving results about them? Are these problems still open?

python – optimize binary classification method for speed

I have the following code for determining TP, TN, FP and FN values for binary classification given two sparse vectors as input (using the sparse library):

import numpy as np


def confused(sys1, ann1):
    # True Positive (TP): we predict a label of 1 (positive), and the true label is 1.
    TP = np.sum(np.logical_and(ann1 == 1, sys1 == 1))

    # True Negative (TN): we predict a label of 0 (negative), and the true label is 0.
    TN = np.sum(np.logical_and(ann1 == 0, sys1 == 0))

    # False Positive (FP): we predict a label of 1 (positive), but the true label is 0.
    FP = np.sum(np.logical_and(ann1 == 0, sys1 == 1))

    # False Negative (FN): we predict a label of 0 (negative), but the true label is 1.
    FN = np.sum(np.logical_and(ann1 == 1, sys1 == 0))
    return TP, TN, FP, FN

I’m trying to find a way to optimize this for speed. This is based on how-to-compute-truefalse-positives-and-truefalse-negatives-in-python-for-binary-classification-problems where my addition was to add the sparse arrays to optimize for memory usage, since the input vectors for the current problem I am trying to solve have over 7.9 M elements, and the positive cases (i.e., 1), are few and far between wrt the negative cases (i.e., 0).

I’ve done profiling of my code and about half the time is spent in this method.
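One possible speed-up, sketched under the assumption that both inputs contain only 0s and 1s: compute TP once and derive the other three counts from the totals, so the data is scanned three times instead of performing four element-wise comparisons and four `logical_and` passes. The name `confused_counts` is hypothetical, and the sketch uses dense NumPy arrays. Whether the same arithmetic is faster with the sparse library would need to be profiled.

```python
import numpy as np

def confused_counts(sys1, ann1):
    """Return (TP, TN, FP, FN) for 0/1 prediction and annotation vectors."""
    sys1 = np.asarray(sys1).astype(bool)
    ann1 = np.asarray(ann1).astype(bool)
    n = sys1.size

    tp = int(np.count_nonzero(sys1 & ann1))   # predicted 1 and truly 1
    pred_pos = int(np.count_nonzero(sys1))    # all predicted positives
    true_pos = int(np.count_nonzero(ann1))    # all actual positives

    fp = pred_pos - tp        # predicted 1 but truly 0
    fn = true_pos - tp        # predicted 0 but truly 1
    tn = n - tp - fp - fn     # everything else
    return tp, tn, fp, fn

print(confused_counts([1, 0, 1, 0, 0, 1], [1, 1, 0, 0, 0, 1]))
```

The return order matches `confused` above (TP, TN, FP, FN), so it can be swapped in for comparison in the profiler.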

topology – Classification of the Summit supercomputer

Can we classify a supercomputer in more than one group? For example, Flynn’s classification, classification according to topology, classification according to memory access.
For the Summit supercomputer, I have found no sources on these topics, except for Flynn’s classification. How can I classify it in other categories, for example by topology?

machine learning – Tweet Classification into topics- What to do with data

Good evening,
First of all, I want to apologize if the title is misleading.
I have a dataset of around 60000 tweets, with their date and time as well as the username. I need to classify them into topics. I am working on topic modelling with LDA, choosing the number of topics (correctly, I hope) with this R package, which calculates the value of three metrics (“CaoJuan2009”, “Arun2010”, “Deveaud2014”). Since I am very new to this, I have a few questions that might be obvious to some of you, but whose answers I can’t find online.

  1. I have removed, before cleaning the data (removing mentions, stopwords, weird characters, numbers etc), all duplicate instances (having all three columns in common), in order to avoid them influencing the results of topic modelling. Is this right?

  2. Should I, for the same reason mentioned before, remove also all retweets?

  3. Until now, I thought about classifying using the “per-document-per-topic” probability. If I get rid of so many instances, do I have to classify them based on the “per-word-per-topic” probability?

  4. Do I have to divide the dataset into testing and training? I thought that is a thing only in supervised training, since I cannot really use the testing dataset to measure quality of classification.

  5. Another goal would be to classify twitterers based on the topic they are most passionate about. Do you have any idea how to implement this?

Thank you all very much in advance.
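For question 5, one simple option is to add up the per-document topic probabilities of each user's tweets and take the topic with the largest total. This is a sketch under assumptions: the function name and the toy numbers are made up, and `doc_topic` stands for whatever per-document topic matrix your LDA fit returns, one row per tweet.

```python
def dominant_topic_per_user(usernames, doc_topic):
    """Map each user to the index of the topic with the largest summed probability."""
    totals = {}
    for user, probs in zip(usernames, doc_topic):
        if user not in totals:
            totals[user] = [0.0] * len(probs)
        for k, p in enumerate(probs):
            totals[user][k] += p
    # argmax over topics for every user
    return {user: max(range(len(t)), key=t.__getitem__) for user, t in totals.items()}

# Toy example: three tweets, three topics (illustrative numbers only).
users = ["alice", "bob", "alice"]
doc_topic = [
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.4, 0.5, 0.1],
]
print(dominant_topic_per_user(users, doc_topic))
```

Summing and averaging give the same winning topic for a given user, so the sketch sums.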