Saturday, 24 October 2015

Sort elements by frequency | Set 1

Print the elements of an array in the decreasing frequency if 2 numbers have same frequency then print the one which came first.

Examples:
Input:  arr[] = {2, 5, 2, 8, 5, 6, 8, 8}
Output: arr[] = {8, 8, 8, 2, 2, 5, 5, 6}

Input: arr[] = {2, 5, 2, 6, -1, 9999999, 5, 8, 8, 8}
Output: arr[] = {8, 8, 8, 2, 2, 5, 5, 6, -1, 9999999}


METHOD 1 (Use Sorting)
  1) Use a sorting algorithm to sort the elements O(nlogn)    
  2) Scan the sorted array and construct a 2D array of element and count O(n).
  3) Sort the 2D array according to count O(nlogn).
Example:
  Input 2 5 2 8 5 6 8 8

  After sorting we get
  2 2 5 5 6 8 8 8

  Now construct the 2D array as
  2, 2
  5, 2
  6, 1
  8, 3

  Sort by count
  8, 3
  2, 2
  5, 2
  6, 1
How to maintain order of elements if frequency is same?
The above approach doesn’t make sure order of elements if frequency is same. To handle this, we should use indexes in step 3, if two counts are same then we should first process(or print) the element with lower index. In step 1, we should store the indexes instead of elements.
  Input 5  2  2  8  5  6  8  8

  After sorting we get
  Element 2 2 5 5 6 8 8 8
  Index   1 2 0 4 5 3 6 7

  Now construct the 2D array as
  Index, Count
  1,      2
  0,      2
  5,      1
  3,      3

  Sort by count (consider indexes in case of tie)
  3, 3
  0, 2
  1, 2
  5, 1
  
  Print the elements using indexes in the above 2D array.
Below is C++ implementation of above approach.
// Sort elements by frequency. If two elements have same
// count, then put the elements that appears first
#include<bits/stdc++.h>
using namespace std;
 
// Used for sorting
struct ele
{
    int count, index, val;
};
 
// Used for sorting by value
bool mycomp(struct ele a, struct ele b) {
    return (a.val < b.val);
}
 
// Used for sorting by frequency. And if frequency is same,
// then by appearance
bool mycomp2(struct ele a, struct ele b) {
    if (a.count != b.count) return (a.count < b.count);
    else return a.index > b.index;
}
 
void sortByFrequency(int arr[], int n)
{
    struct ele element[n];
    for (int i = 0; i < n; i++)
    {
        element[i].index = i;    /* Fill Indexes */
        element[i].count = 0;    /* Initialize counts as 0 */
        element[i].val = arr[i]; /* Fill values in structure
                                     elements */
    }
 
    /* Sort the structure elements according to value,
       we used stable sort so relative order is maintained. */
    stable_sort(element, element+n, mycomp);
 
    /* initialize count of first element as 1 */
    element[0].count = 1;
 
    /* Count occurrences of remaining elements */
    for (int i = 1; i < n; i++)
    {
        if (element[i].val == element[i-1].val)
        {
            element[i].count += element[i-1].count+1;
 
            /* Set count of previous element as -1 , we are
               doing this because we'll again sort on the
               basis of counts (if counts are equal than on
               the basis of index)*/
            element[i-1].count = -1;
 
            /* Retain the first index (Remember first index
               is always present in the first duplicate we
               used stable sort. */
            element[i].index = element[i-1].index;
        }
 
        /* Else If previous element is not equal to current
          so set the count to 1 */
        else element[i].count = 1;
    }
 
    /* Now we have counts and first index for each element so now
       sort on the basis of count and in case of tie use index
       to sort.*/
    stable_sort(element, element+n, mycomp2);
    for (int i = n-1, index=0; i >= 0; i--)
        if (element[i].count != -1)
           for (int j=0; j<element[i].count; j++)
                arr[index++] = element[i].val;
}
 
// Driver program
int main()
{
    int arr[] = {2, 5, 2, 6, -1, 9999999, 5, 8, 8, 8};
    int n = sizeof(arr)/sizeof(arr[0]);
 
    sortByFrequency(arr, n);
 
    for (int i=0; i<n; i++)
       cout << arr[i] << " ";
    return 0;
}
Output:
8 8 8 2 2 5 5 6 -1 9999999 
Thanks to Gaurav Ahirwar for providing above implementation.


METHOD 2(Use BST and Sorting)
1. Insert elements in BST one by one and if an element is already present then increment the count of the node. Node of the Binary Search Tree (used in this approach) will be as follows.
struct tree
{
  int element;
  int first_index /*To handle ties in counts*/
  int count;
}BST;
2.Store the first indexes and corresponding counts of BST in a 2D array.
3 Sort the 2D array according to counts (and use indexes in case of tie).
Time Complexity: O(nlogn) if a Self Balancing Binary Search Tree is used. This is implemented in Set 2.


METHOD 3(Use Hashing and Sorting)
Using a hashing mechanism, we can store the elements (also first index) and their counts in a hash. Finally, sort the hash elements according to their counts.
Please write comments if you find anything incorrect, or you want to share more information about the topic discussed above

No comments:

Post a Comment