mathematics – Arranging points in a 2D grid so that successively added points approximate a square shape

So, I ran into a problem with placing points in a grid while always trying to fit them into a square. Here are some examples (the numbers represent the order of placement):

Placing 1 point:

0

Placing 2 points:

0 1

Placing 3 points:

0 1
2

Placing 7 points:

0 1 4
2 3 6
5

Placing 32 points:

0  1  4  9  16 25
2  3  6  11 18 27
5  7  8  13 20 29
10 12 14 15 22 31
17 19 21 23 24 
26 28 30

Placing 36 points:

0  1  4  9  16 25
2  3  6  11 18 27
5  7  8  13 20 29
10 12 14 15 22 31
17 19 21 23 24 33
26 28 30 32 34 35

I need a way of finding the coordinates (x, y) = (column, row) of a point from its index. For example:

f(0) = 0, 0
f(1) = 1, 0
f(2) = 0, 1
f(3) = 1, 1
f(4) = 2, 0
f(5) = 0, 2
f(6) = 2, 1
f(7) = 1, 2
f(8) = 2, 2
...
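The following is a minimal Python sketch of one closed form. It is inferred from the examples and is not stated in the question. Indices k^2 to (k+1)^2 - 1 fill the L-shaped shell where max(x, y) = k. Within that shell they alternate between column k (even offsets, top to bottom) and row k (odd offsets, left to right), and they end at the corner (k, k). The sketch uses (x, y) = (column, row), which matches the listed f(3) through f(8). Under this convention f(1) = (1, 0) and f(2) = (0, 1). The helper `grid` is hypothetical and only redraws the examples so the formula can be checked.

```python
from math import isqrt


def f(n):
    """Return the (x, y) = (column, row) cell of the point with placement index n >= 0."""
    k = isqrt(n)         # shell k holds indices k*k .. (k+1)**2 - 1, the cells with max(x, y) == k
    offset = n - k * k   # position inside the shell: 0 .. 2k
    if offset % 2 == 0:
        return (k, offset // 2)   # even offsets go down column k; offset 2k is the corner (k, k)
    return (offset // 2, k)       # odd offsets go along row k


def grid(count):
    """Lay out indices 0 .. count-1 row by row, as in the examples (count >= 1)."""
    size = isqrt(count - 1) + 1                  # side of the smallest square holding them
    cells = {f(i): i for i in range(count)}
    rows = [[cells[(x, y)] for x in range(size) if (x, y) in cells]
            for y in range(size)]
    return [row for row in rows if row]          # drop rows that are still empty


for row in grid(32):
    print(" ".join(f"{i:2d}" for i in row))
```

Because each shell is a contiguous block of indices, the whole computation is one integer square root plus a parity test. No loop over earlier points is needed.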

binary search – Algorithm to find approximate position of element from a noisy sorted list

Let f(n) be a static function that, for a given n, returns only “lower” or “higher” by comparing against a hidden number x.

In a sorted list l = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20), binary search finds the position of the element in O(log n) time.

Now consider a list that is only approximately sorted (sorted with some noise):
l = (1, 2, 4, 3, 5, 6, 9, 7, 8, 11, 10, 12, 13, 16, 15, 14, 17, 20, 19, 18)

I’m looking for an algorithm that converges quickly to the approximate position of the hidden x. I do have an algorithm (a modified binary search that relaxes the boundaries on each side and runs in O(log n)), but I would like to hear fresh perspectives from other people.

Constraint:
Sorting the list is not allowed, because the dataset cannot be sorted in a practical way.

The motivation for this question: I’m writing a guessing game that uses a list of publicly sourced words sorted by popularity. The computer’s objective is to guess how many words the player knows by asking as few questions as possible, with the player answering “know” or “don’t know”, so it’s important for the algorithm to converge quickly.
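One observation may help, under an assumption the question does not state. Suppose every element sits at most d positions away from its place in the sorted list; d = 2 for the noisy example list above. Then a plain binary search already lands within d of the true position t (the number of elements below x). Every index below t - d answers “lower” and every index at or above t + d answers “higher”, so the boundary that the search converges to cannot fall outside [t - d, t + d]. Below is a minimal Python sketch. `boundary_search` and `is_below` are hypothetical names, and `is_below(j)` plays the role of f: in the game, “does the player know word j?”.

```python
def boundary_search(n, is_below):
    """Binary search for the boundary between True and False answers.

    is_below(j) answers "is element j below the hidden x?" (in the game:
    "does the player know word j?"). Indices -1 and n act as a virtual
    True and False, so is_below is only ever called on 0 .. n-1.
    """
    lo, hi = -1, n
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_below(mid):
            lo = mid   # boundary lies to the right of mid
        else:
            hi = mid   # boundary lies at mid or to its left
    return hi          # estimated number of elements below x (words known)


l = [1, 2, 4, 3, 5, 6, 9, 7, 8, 11, 10, 12, 13, 16, 15, 14, 17, 20, 19, 18]
print(boundary_search(len(l), lambda j: l[j] < 15))  # 13; exactly 14 elements are below 15
```

This asks at most ceil(log2(n + 1)) questions, the same as an exact binary search. The within-d guarantee depends on the displacement bound, though. If individual elements can be arbitrarily far out of place, the result can be off by more.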

statistics – Binomial distr. vs normal distr. (approximate)

Is there a specific reason why we don’t have to convert the sample proportion into the number of items with the desired characteristic when using the normal distribution to approximate the binomial distribution?

I guess the exact question is: why is it the mean of the sample proportion that equals the population proportion, and not the mean of the number of items with the desired characteristic?
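For reference, here are the standard facts behind the approximation. $X$ is the number of items with the characteristic in a sample of size $n$, and $\hat{p} = X/n$ is the sample proportion; both symbols are introduced here:

$$
\begin{aligned}
X &\sim \mathrm{Bin}(n, p), & \mathrm{E}[X] &= np, & \mathrm{Var}(X) &= np(1 - p), \\
\hat{p} &= \frac{X}{n}, & \mathrm{E}[\hat{p}] &= p, & \mathrm{Var}(\hat{p}) &= \frac{p(1 - p)}{n},
\end{aligned}
$$

$$
\frac{X - np}{\sqrt{np(1 - p)}} = \frac{n(\hat{p} - p)}{\sqrt{np(1 - p)}} = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}}.
$$

Both statements hold at once: the mean count is $np$ and the mean proportion is $p$. Dividing by $n$ rescales the mean and the standard deviation by the same factor, so the standardized value, and with it the normal-approximation probability, is the same on either scale. A continuity correction of $\pm 0.5$ on the count likewise becomes $\pm \frac{1}{2n}$ on the proportion.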

plotting – How to plot (approximate) solutions to system of equations?

I have the following system of equations, for $x, y, m$ real numbers and $g, h$ complex numbers:

$$
\begin{aligned}
(x^2 - x) g^2 + (2 m x - x - m + 1/2) g + (m^2 - m) &= 0 \\
(y^2 - y) h^2 + (-2 m y + y + m - 1/2) h + (m^2 - m) &= 0 \\
\lvert g \rvert^2 &= \lvert h \rvert^2 \\
(g + \overline{g})^2 + (h + \overline{h})^2 &= 4 \lvert g \rvert^2
\end{aligned}
$$

(Sorry for the LaTeX, but the Stack Exchange system wouldn’t let me post and kept thinking this was code…)

I am looking for a way to plot the points $(x, y)$ for which there exist $m, g, h$ solving this system of equations (and more general ones of this type: the first two equations will be higher-degree polynomials, while the last two will always be the same). The method needs to be able to handle equations involving absolute values and conjugates of complex numbers. In this case the solution set is the two lines $y = x$ and $y = 1 - x$ in the square $(0, 1) \times (0, 1)$.

I have tried functions such as Eliminate, Resolve, Reduce, and ParametricRegion, but these approaches are often very sensitive to the domain (if the domain is not “correct”, the plot is simply blank), and they also handle absolute values and conjugates poorly.

I would settle for something numerical, but I haven’t seen any Mathematica function that can numerically determine the solutions of a system of equations when the solution is not unique (i.e., when the solution set is a curve rather than isolated points). If there were some way to sample many $x, y, m, g, h$, record which ones “almost” solve the system, and then plot those $x, y$, that could work. Even if the curve is not sharp, just getting an idea of its shape would be good enough.

The numerical method would also need to handle a similar system whose solution set is a hyperbola instead of two lines:

$$
\begin{aligned}
(x^2 - x) g^2 + (2 m x - x - m + 1/2) g + (m^2 - m) &= 0 \\
(y^2 - 2 y) h^2 + (-2 m y + y + 2 m - 1) h + (m^2 - m) &= 0 \\
\lvert g \rvert^2 &= \lvert h \rvert^2 \\
(g + \overline{g})^2 + (h + \overline{h})^2 &= 4 \lvert g \rvert^2
\end{aligned}
$$
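The last two equations are the same in both systems, so it may help to see what they say on their own. Write $g = a + ib$ and $h = c + id$ with $a, b, c, d$ real (this notation is introduced here). The fourth equation pins down $c^2$, and given that, the third pins down $d^2$:

$$
\begin{aligned}
(g + \overline{g})^2 + (h + \overline{h})^2 = 4 \lvert g \rvert^2
&\iff 4a^2 + 4c^2 = 4(a^2 + b^2) \iff c^2 = b^2, \\
\lvert g \rvert^2 = \lvert h \rvert^2
&\iff a^2 + b^2 = c^2 + d^2 \iff d^2 = a^2 \quad (\text{given } c^2 = b^2),
\end{aligned}
$$

so together they are equivalent to $h \in \{\, i g,\ -i g,\ i \overline{g},\ -i \overline{g} \,\}$. A numerical search could therefore scan only the real parameters $(x, y, m)$: solve the first equation for $g$, then check whether one of these four values of $h$ satisfies the second equation. One caveat: when $m = 0$ or $m = 1$, the constant terms vanish and $g = h = 0$ solves the system for every $(x, y)$. That trivial root presumably has to be excluded.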

numerics – Approximate integer factorization

Suppose we would like to compute an approximate prime factorization of a large integer x in the sense that the difference of x and its approximation is minimized. A naive way to state the problem in Mathematica is as follows.

n = 50;
x = RandomInteger[10^100];
vars = Map[Symbol["z" <> ToString[#]] &, Range[n]];
f = Abs[Times@@Map[Prime[#]^Symbol["z" <> ToString[#]] &, Range[n]] - x];
cons = Map[# >= 0 &, vars];
NMinimize[{f, cons}, vars \[Element] Integers]

The result is not very convincing and simply increasing n does not help. Do you have a better way to solve the problem?
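An exact reference solution can help when judging NMinimize's output on smaller cases. Below is a minimal Python sketch, not a method from the question, that enumerates every product of the first k primes up to 2x and keeps the one closest to x. The bound 2x is safe because some power of two lies in [x, 2x), which is already closer to x than any y > 2x. The number of candidates explodes as x and k grow, so this is only feasible for modest inputs, not for x ≈ 10^100 with 50 primes. The function names are hypothetical.

```python
def first_primes(k):
    """The first k primes, by trial division against the primes found so far."""
    primes = []
    candidate = 2
    while len(primes) < k:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def nearest_smooth(x, k):
    """Exact minimiser of |y - x| over y = p_1^e_1 * ... * p_k^e_k with all e_i >= 0 (x, k >= 1)."""
    primes = first_primes(k)
    limit = 2 * x   # a power of two lies in [x, 2x), so no y > 2x can be closer
    best = 1

    def visit(start, value):
        # value uses only primes[:start]; extend it with higher primes in ascending order
        nonlocal best
        if abs(value - x) < abs(best - x):
            best = value
        for i in range(start, len(primes)):
            v = value * primes[i]
            if v > limit:
                break           # primes are ascending, so later ones overshoot as well
            while v <= limit:   # exponent 1, 2, ... for primes[i]
                visit(i + 1, v)
                v *= primes[i]

    visit(0, 1)
    return best


print(nearest_smooth(97, 3))  # 96 = 2**5 * 3
```

Each smooth number is generated exactly once, because its prime factors are always added in increasing order. This is what makes the search exhaustive rather than heuristic.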

optimization – Approximate a large linear span with a small one

The subject is outside my field of expertise (I’m not sure if the tags are correct – feedback is appreciated).

The claim below could be true or false. If false I’d like to replace it with its best true approximation. If true I’d like to prove it.

First of all, I’d like to know what I have to study to understand my own question.

Claim. Let $E \subseteq \mathbb{R}^N$ and let $\Vert \cdot \Vert$ be the 1-norm. Then
$$
\forall \varepsilon \ \exists n = n(\varepsilon) \ \exists e_1, \dots, e_n \in E \ \forall f \in \langle E \rangle \ \exists g \in \langle e_1, \dots, e_n \rangle \quad \Vert f - g \Vert < \varepsilon \Vert f \Vert.
$$

Note that $n$ depends on neither $N$ nor $E$.
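Here is a quick test case for the claim as literally stated. Take $E$ to be the standard basis $\{u_1, \dots, u_N\}$ of $\mathbb{R}^N$ (an illustrative choice, not part of the question) and $f = u_1 + \dots + u_N$, so $\Vert f \Vert_1 = N$. Any choice $e_j = u_{i_j}$, $j = 1, \dots, n$, spans only $n$ coordinate directions, so every $g \in \langle e_1, \dots, e_n \rangle$ is zero on the remaining coordinates:

$$
\Vert f - g \Vert_1 \;\geq\; \sum_{i \notin \{i_1, \dots, i_n\}} \lvert f_i - g_i \rvert \;\geq\; N - n \;=\; \Bigl(1 - \frac{n}{N}\Bigr) \Vert f \Vert_1 .
$$

With $N \geq 2n$ the error is therefore at least $\tfrac{1}{2} \Vert f \Vert_1$. This suggests the claim cannot hold for $\varepsilon \leq \tfrac{1}{2}$ without some restriction on $E$ or letting $n$ depend on $N$.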

algorithms – Approximate duplicate sampling from a stream

The following question (in two parts) comes from a homework sheet for CS170, taught at UC Berkeley in the fall 2019 semester by professors Vazirani and Tal.

Design an algorithm that takes in a stream $z_1, \dots, z_M$ of $M$ integers in $\mathbb{Z}_n$ and at any time $t$ can output a uniformly random element in $z_1, \dots, z_t$. Your algorithm may use at most polynomial in $\lg n$ and $\lg M$ space.

This is an instance of Reservoir Sampling with reservoir size of $1$.
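Below is a minimal Python sketch of part (a) with a reservoir of size 1. It stores only a counter (O(lg M) bits) and one element (O(lg n) bits). The t-th item replaces the held item with probability 1/t. Item i is taken at step i with probability 1/i and survives each later step j with probability 1 - 1/j. The product telescopes to 1/t, so every prefix element is equally likely. The class name `Reservoir1` and its methods are hypothetical.

```python
import random


class Reservoir1:
    """Reservoir sampling with a reservoir of size 1."""

    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.count = 0      # t, the number of items seen so far
        self.sample = None  # the single held item

    def push(self, z):
        self.count += 1
        # keep the new item with probability 1/t
        if self.rng.randrange(self.count) == 0:
            self.sample = z

    def query(self):
        """A uniformly random element of the prefix z_1 .. z_t seen so far."""
        return self.sample


res = Reservoir1()
for z in [3, 1, 4, 1, 5]:
    res.push(z)
    print(res.query())
```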

For a stream $S = z_1, \dots, z_{2n}$ of $2n$ integers in $[n]$, we call $j \in [n]$ a duplicate element if it occurs more than once. Design an algorithm that takes in $S$ as input and with probability at least $1 - \frac{1}{n}$ outputs a duplicate element. Your algorithm may use at most polynomial in $\lg n$ space.

Attempts

Consider the naive approach of consuming the entire stream and sampling. At least $n$ elements of the stream have duplicates. Since each of the $2n$ elements in the stream is equally likely to be sampled, the probability of sampling a duplicate element is at least $\frac{n}{2n} = \frac{1}{2}$, which falls far short of the required $1 - \frac{1}{n}$.

On the other hand, reservoir sampling produces a simple random sample, so maybe we could model the reservoir as sampling from a multiset and compute the probability of there being a duplicate using a multinomial distribution? However, the multinomial distribution requires independent trials.

I don’t see a path forward for part (b) of the problem yet. Any hints?

approximation – What is a term for a problem that is hard to approximate within a factor $c$?

Let $f$ be a maximization problem. If there is a reduction from SAT to the following problem: “given an integer $c$, decide if there is an $x$ for which $f(x) \geq c$”, then $f$ is NP-hard. Suppose there is a reduction from SAT to the following problem:

Given an integer $c$, return “yes” if there is an $x$ for which $f(x) \geq 3c$, and return “no” if there is no $x$ for which $f(x) \geq c$ (otherwise, the reply is undefined).

This means that, unless P=NP, there is no 3-factor approximation to $f$. What complexity class does $f$ belong to?
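The step from the reduction to “no 3-factor approximation” can be spelled out. Suppose, hypothetically, that a polynomial-time algorithm $A$ returned some $x_A$ with $f(x_A) \geq \mathrm{OPT}/3$, where $\mathrm{OPT} = \max_x f(x)$ (this notation is introduced here). Then:

$$
\begin{aligned}
\mathrm{OPT} \geq 3c &\implies f(x_A) \geq \mathrm{OPT}/3 \geq c, \\
\mathrm{OPT} < c &\implies f(x_A) \leq \mathrm{OPT} < c.
\end{aligned}
$$

Answering “yes” exactly when $f(x_A) \geq c$ therefore decides the gap problem and, through the reduction, SAT. Such an $A$ can exist only if P = NP, assuming $f(x)$ itself can be evaluated in polynomial time.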

Initially, I thought it must be APX-hard, since it is hard to approximate. But Wikipedia defines a problem as APX-hard if “there is a PTAS reduction from every problem in APX to that problem”, and I am not sure the two notions are equivalent.

algorithms – Efficient way to find key points on spline to approximate it with line strip

Given a spline, what is an efficient way to find (approximately) the smallest number of key points, and their positions, needed to approximate the spline with a line strip so that the largest distance between the line strip and the spline at any point is at most d?

Here is a visual example. I’m looking to find a computationally efficient way to find the points in the green circles that I use as beginnings and endings for the lines.

[Figure: example spline, with the desired key points circled in green]

My idea is to walk along the spline, summing up the distance traveled. As I walk, I test the last position as a key-point candidate. Once the distance traveled along the spline and the length of the line to the key-point candidate differ by more than a certain percentage, I use the last candidate that was still below that percentage. I’m uncertain about two things. First, is there a more efficient way? Second, a percentage deviation in length is not the same as a distance (e.g., in cm). On a relatively long, relatively straight part of the spline, the length deviation I might accept could still allow a fairly large distance between the spline and the line somewhere along the way, couldn’t it?
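Below is a minimal Python sketch of the same greedy walk. Instead of a length ratio, it tests the distance criterion from the question directly: the current line is extended to the next sample only while every sample it skips stays within d of it. It assumes the spline has already been sampled densely into a list of 2D points (`samples`, hypothetical, as are the function names). Distances are checked only at those samples, so sampling density matters. The greedy choice is not guaranteed to give the minimum number of key points, and the `all(...)` check makes it quadratic per segment in the worst case. The Ramer–Douglas–Peucker algorithm is a well-known alternative that works with the same kind of distance tolerance.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (2D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    # parameter of the orthogonal projection of p, clamped onto the segment
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def greedy_key_points(samples, d):
    """Indices of key points such that every sample lies within d of its line."""
    keys = [0]
    start, n = 0, len(samples)
    while start < n - 1:
        end = start + 1
        # extend the current line while all samples it skips stay within d of it
        while end + 1 < n and all(
            point_segment_distance(samples[k], samples[start], samples[end + 1]) <= d
            for k in range(start + 1, end + 1)
        ):
            end += 1
        keys.append(end)
        start = end
    return keys


# usage: a sine wave sampled 500 times stands in for the spline
samples = [(t / 50, math.sin(t / 50)) for t in range(500)]
keys = greedy_key_points(samples, d=0.01)
print(len(keys), "key points")
```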