algorithms – Proving a tighter upper bound (big-O) for this problem


Motivation

So the other day I had fun providing a new solution to this famous question. In the analysis I showed that my little algorithm has space complexity O(k) and Ω(log(k)). However, my rough intuition says that we should be able to prove a tighter bound of O(min(k, log(n))), but I was unable to do so. Moreover, it seems like an interesting problem in the general case.


Finding the space complexity is equivalent to answering this question:

Data:

An array of size k, whose elements e are distinct integers satisfying 0 <= e < n, for some n.

Events:

Find the average (rounded down) of the elements.
Randomly discard either all the elements greater than the average, or all the elements less than or equal to the average.
Repeat the same steps on the reduced array.
Keep repeating until the array has size 1.
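
To make the process concrete, here is a minimal Python sketch of it; the function name iterations_until_singleton and the fixed seed are my own choices, not part of the original problem statement:

    import random

    def iterations_until_singleton(arr, rng=random.Random(0)):
        """Count how many times the average is computed before the
        array shrinks to size 1."""
        arr = list(arr)
        count = 0
        while len(arr) > 1:
            avg = sum(arr) // len(arr)           # average, rounded down
            count += 1
            low = [e for e in arr if e <= avg]   # elements <= average
            high = [e for e in arr if e > avg]   # elements > average
            # Because the elements are distinct, both sides are always
            # non-empty, so either random choice leaves a valid array.
            arr = rng.choice([low, high])
        return count

For example, iterations_until_singleton(range(1024)) returns 10 regardless of the random choices, matching log2(1024).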

My Question:

Is the number of times the average is computed O(min(k, log(n)))? And if so, how do we prove it? (Note that we don’t care about the time it takes to calculate the average or to remove the elements. This is because this question is deliberately designed to be equivalent to my space-complexity problem.)


My thoughts:

It seems really intuitive that it’s O(min(k, log(n))): if k = n, the array is exactly 0, 1, ..., n-1, so taking the average always splits the elements in half and the process takes about log2(n) steps. However, I can’t seem to prove that it doesn’t sometimes perform worse when k < n.
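
A quick sanity check of the k = n intuition (the concrete value of n is my own illustration):

    # k = n case: the elements are exactly 0..n-1, the floored average
    # is (n-1)//2, so both sides hold ~n/2 elements and the size halves
    # at every step, giving about log2(n) iterations in total.
    n = 1024
    arr = list(range(n))
    avg = sum(arr) // len(arr)            # 511
    low, high = arr[:avg + 1], arr[avg + 1:]
    print(len(low), len(high))            # 512 512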

Thinking about this a bit, we can see that outliers are what cause bad performance, so I try to imagine a worst-case scenario:

array = (1, 2, ..., (x-2), (x-1), (n-1)), where (x-1) < average < (n-1)

In this scenario the worst case reduces the array to size x-1 (discarding the elements greater than the average leaves the run 1, ..., x-1, while discarding the elements <= the average leaves only the outlier). However, increasing k by adding any number of elements e with (x-1) < e < (n-1) still leaves the worst-case array size for the next iteration at x-1.
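
Here is the construction played out for concrete (made-up) values of n and x, showing that one step leaves at most x-1 elements:

    n, x = 10**6, 1000
    arr = list(range(1, x)) + [n - 1]     # (1, 2, ..., x-1, n-1); k = x
    avg = sum(arr) // len(arr)            # dragged above x-1 by the outlier
    low = [e for e in arr if e <= avg]    # the run 1..x-1 survives here
    high = [e for e in arr if e > avg]    # only the outlier n-1 survives here
    print(avg, len(low), len(high))       # 1499 999 1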

Moreover:

average = (x(x-1)/2 + n-1)/x

Thinking about the complexity for a given n: the construction requires (x-1) < average, so the worst case is roughly

x = average = (x(x-1)/2 + n-1)/x

=> x^2 = x(x-1)/2 + n - 1
=> x^2/2 + x/2 = n - 1
=> x = O(sqrt(n))

where x represents the largest possible sub-array size.
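
As a sanity check on this algebra (my own verification, not part of the original argument), the quadratic x^2/2 + x/2 = n - 1 solves to x = (-1 + sqrt(8n - 7))/2, which grows like sqrt(2n):

    from math import sqrt

    # x^2/2 + x/2 = n - 1  =>  x^2 + x - (2n - 2) = 0
    # quadratic formula: x = (-1 + sqrt(8n - 7)) / 2 = Theta(sqrt(n))
    for n in (10**4, 10**6, 10**8):
        x = (-1 + sqrt(8 * n - 7)) / 2
        print(n, round(x), round(x / sqrt(n), 3))   # ratio -> sqrt(2) ~ 1.414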

So in this artificial “worst case” scenario, we get O(sub-array size) = O(min(k, sqrt(n))), which loosely suggests that the overall complexity is O(min(k, log(sqrt(n)))) = O(min(k, log(n))).

However, this argument is quite informal, and I’m not sure how to show that this scenario genuinely represents the worst case.