Wednesday, August 26, 2009

Longest Multipermutation

Here's an algorithmic problem for your enjoyment:

Define a multipermutation to be a sequence of integers that contains all numbers from 1 to some K at least once. Examples include (1,3,1,2,2), or (1,4,3,2); but not (1,2,3,5,5).

Given an array of n integers, find the longest multipermutation in O(n) time.
I saw a recent paper doing this in superlinear time, and of course I had to find a linear-time algorithm :) Can you also find one?

17 comments:

Unknown said...
This comment has been removed by the author.
Anonymous said...

are there any constraints on the space requirement? seems like with O(n) space it should be easy.

Anonymous said...

Your question seems ill-defined. What is a multi-permutation on an array? You define it on an integer K, not an array.

Unknown said...
This comment has been removed by the author.
hasan said...

Call the array A. Label all elements greater than n to (n+2)

Preprocess A for constant time range max queries. For each 1 repeat this:

Create a list R of first occurrences of 2,3,4.. to the right of the 1 we are considering.

Create a list L of first occurrences of 2,3,4.. to the right of the 1 we are considering.

For i=2 to n if rmq(pos[current 1], L[i]) is smaller than rmq(pos[current 1], R[i]) add to stack left i otherwise right i (if there is)

Check after each step if we formed a valid multipermutation by one more rmq query.

Extend to the left and to the right all valid multipermutations for the 1 we are considering. (this takes amortized linear time.)

Anonymous said...

Sorry, I was not clear enough.

The goal is to find the longest
contiguous subsequence of the array, which is a multipermutation.

-mihai

hasan said...

oh by the way, we stop creating lists L and R when we see another 1.

This way each segment between two 1's is traced over at most twice. hence linear time.

--hasan

Shaddin said...

Assuming I understand the question correctly, this can be done by repeated application of linear time median. The key observation is that it suffices to find K (as in the statement of the question). This can be done as follows. The algorithm below is not linear time, but I will explain how to make it linear time afterwards.

1- Find the median m, and partition elements into a left half and right half around the median. This can be done in linear time.

2- If all 1,...,m-1 are in the entire array, then recurse on the RHS. We would like this to run in linear time in the length of the current sub-array, which is not the case (we are checking the entire original array). I will explain how to fix this below.

3- Else, recurse on the LHS

(Base case is when we have a single element, in which case this element is K)


If we can make step 2 run in linear time (in the size of the current sub-array), we would be done since we are halving the size of the array at each recursive step.

How do you make this linear time? Notice that we don't need to check that 1,...,m-1 are in the array, because this duplicates work. Instead, we keep track of m': the largest value of m for which the if condition yielded an affirmative, and simply check that m' ... m-1 are in the LHS. This is linear time. Done, I think..

Shaddin said...

Oh, nevermind you said _contiguous_ subsequence. My solution above was simply for a subsequence.

Did you edit your comment? I could swear the word contiguous was not there the first time i read it :)

Anonymous said...

If I have correctly understood the problem statement then you could just:
- Find in O(n) time and space the minimum integer x in [1..n] not present in the array
- Scan the array and identify each maximal subarray of elements < x. This take O(n) time.
- For each such subarray, check if they are multipermutations and take the longest, overall O(n) time and space.

Mihai said...

Anon, what happens if none of the subarrays (containing elements less than x) are multipermutations? Recurse? This is no longer linear time.

Hasan, I don't follow your algorithm.

Shaddin, can you edit comments in blogger? I never found the feature..

Anonymous said...

I do not understand the question. You want the longest multipermutation that is a (contiguous) substring of the original?

Dan said...

I think I came up with basically the same algorithm as hasan, but it doesn't require preprocessing for range max queries, and maybe it will be simpler to understand (or not).

The idea is to maintain subarrays which are multipermutations, not necessarily maximal - which I'll call "green" subarrays - and subarrays surrounding each green one which must be contained in any larger multipermutation - I'll call these "yellow" subarrays. Keep growing these (occasionally turning yellow subarrays green when there is an actual multipermutation) until the whole array is covered either yellow or green, and then the largest green one is the answer.

Call the array A with entries A[1]..A[n].

I'll first present the case where there is only one "1" in the array. Note that any multipermutation must contain this "1", and also that it is itself a length-1 multipermutation. Say this "1" occurs at index k in the array. Then, for i=2..n, store the following:
-L[i], the largest integer j<k such that A[j]=i
-LM[i] := max(A[L[i]..k])
-R[i], the smallest integer j>k such that A[j]=i
-RM[i] := max(A[k..R[i]])

Clearly, these four arrays can be computed in linear time in a single pass left and right from index k.

We will maintain a single "green" subarray, which is itself a subarray of a single "yellow" subarray. Initially, these are both just the single "1" at A[k]. Also maintain:
-The largest integer in the yellow subarray, initially 1; call it a.
-The smallest integer not present in the yellow subarray, initially 2; call it b

Now repeat the following steps:
-(Grow yellow subarray) Compare LM[b] and RM[b]. If LM[b] is smaller, then grow the yellow subarray to the left to index L[b] - which must not already be in the yellow subarray by the definition of b. Update a and b as each new element is added (updating b in constant time can be done with some more bookkeeping). If RM[b] is smaller, do the opposite. If LM[b]=RM[b], then grow in both directions.
-(Check for multipermutation) If b>a, change the whole yellow subarray to green and expand in both directions to include all adjacent integers ≤a.

This process terminates after at most n steps (any number larger than n can't be part of a multipermutation), and the amortized cost over all steps is O(n), since the yellow array only grows. At the end, the green array is the largest multipermutation.

For the more general case (more than one "1" in the array), we start initially with each "1" being its own green (and yellow) array, and grow them all simultaneously, merging the yellow and then green arrays when they meet. Constructing the arrays becomes a bit trickier to do in linear time, but is possible by using the fact that any multipermutation on a sub-interval of length m can only contain integers ≤m.

Anonymous said...

Let me describe an algorithm to find all maximal contiguous multipermutations in an array A[1..n]. Assume that the maximum value contained in A is at most n. Let <_t be the total order of the elements of A induced by a stable sort of A. Define the two arrays:

- L[i] = max{j | j precedes i and A[j] >_t A[i]}
- R[i] = min{j | j follows i and A[j] >_t A[i]}

L[i] (R[i]) remains undefined whenever the corresponding set of indices is empty. Both L and R can be computed in linear time using a classic stack-based algorithm.

The algorithm mantains a collection of disjoint sets of indices of A and, for each set S, the value min(S) = min{A[e]| e in S}. All the required operations can be supported in O(1) time using a simple table based representation. Here is the pseudo-code:


Add set {m}, where m is the index of maximum element;

Scan elements of A\{m} in descending order of <_t {

c = index of current element;

//at least one among L[c], R[c] is defined
x = (A[L[c]] <_t A[R[c]])? L[c]: R[c];

//x has been already inserted
Locate set S containing x;

if(A[c] >= min(S)-1) {
Insert c into S and Update min(S);
} else { Add new set {c}; }

}

At the end of the algorithm the set of contiguous maximal multiperms of A coincides with the collection of sets having minimum = 1.

Mihai said...

@Dan: First I should apologize that I did not have the energy to think about your solution more. It sounds quite different from the ones I've thought about, but it seems there are gaps which are non-obvious to fill out.

A friend of mine who tried to understand it wrote back:

"I think it could be correct, in principle. But I'm missing several details to get O(n) running time, especially as Dan skips too quickly the main case of several 1s (for a single 1, the solution is correct). I think merging those subarrays could work out. But it seems you need the following data structure query: find the largest index containing element X, where the index is <= some K. Such queries seem to require superconstant time (predecessor search). These queries appear necessary when you merge two subarrays, and their endpoints change."

Mihai said...

Anon, are you sure you have the proper data-structure support to make this algorithm O(n)?

Anonymous said...

You just need an array reporting for each element the id of the containing set, and another array associating each id with the min of the set. O(n) space and O(1) time for each operation.