Sorting Algorithms
Sorting algorithms can be stable or unstable. A stable sorting algorithm preserves the relative order of records with equal keys: elements that compare as equal appear in the sorted output in the same order as in the input.
Unstable sorting algorithms make no such guarantee; two or more objects with equal keys may appear in a different relative order after sorting. For example, when sorting the pairs (2, 'a'), (1, 'b'), (2, 'c') by their first component, a stable sort always produces (1, 'b'), (2, 'a'), (2, 'c').
Sorting Algorithm | Best Time Complexity | Average Time Complexity | Worst Time Complexity | Space Complexity | Stable Sort |
---|---|---|---|---|---|
Bubble Sort | O(n) | O(n^2) | O(n^2) | O(1) | Yes |
Selection Sort | O(n^2) | O(n^2) | O(n^2) | O(1) | No |
Insertion Sort | O(n) | O(n^2) | O(n^2) | O(1) | Yes |
Merge Sort | O(n log n) | O(n log n) | O(n log n) | O(n) | Yes |
Quick Sort | O(n log n) | O(n log n) | O(n^2) | O(log n) | No |
Bucket Sort | O(n+k) | O(n+k) | O(n^2) | O(n) | Yes |
Radix Sort | O(nk) | O(nk) | O(nk) | O(n+k) | Yes |
Counting Sort | O(n+k) | O(n+k) | O(n+k) | O(n+k) | Yes |
Heap Sort | O(n log n) | O(n log n) | O(n log n) | O(1) | No |
Cycle Sort | O(n^2) | O(n^2) | O(n^2) | O(1) | No |
- n is the number of elements being sorted.
- k is the range of the input (for non-comparison-based sorts).
- Space complexity considers auxiliary space.
Bubble Sort
Bubble Sort repeatedly steps through the list, comparing adjacent elements and swapping them if they are in the wrong order.
- Complexity: Best: O(n) (when the list is already sorted and an early-exit check is used), Average and Worst: O(n^2)
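A minimal Python sketch (the function name and in-place style are illustrative); the `swapped` flag provides the early exit that yields the O(n) best case:

```python
def bubble_sort(arr):
    n = len(arr)
    for i in range(n - 1):
        swapped = False
        # After pass i, the last i elements are in their final positions.
        for j in range(n - 1 - i):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:  # No swaps: already sorted, giving the O(n) best case.
            break
    return arr
```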
Selection Sort
Selection Sort divides the input into two parts: a sorted sublist and an unsorted sublist.
On each pass, the smallest element in the unsorted sublist is found and swapped to the end of the sorted sublist. This is repeated until the list is fully sorted.
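A minimal Python sketch of the find-minimum-and-swap pass:

```python
def selection_sort(arr):
    n = len(arr)
    for i in range(n - 1):
        # Find the smallest element in the unsorted suffix arr[i:].
        min_idx = i
        for j in range(i + 1, n):
            if arr[j] < arr[min_idx]:
                min_idx = j
        # Swap it to the end of the sorted prefix.
        arr[i], arr[min_idx] = arr[min_idx], arr[i]
    return arr
```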
Insertion Sort
Insertion Sort segments the list into sorted and unsorted parts, like selection sort. It iterates over the unsorted sublist and inserts each element into its correct position in the sorted sublist.
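A minimal Python sketch of the shift-and-insert loop:

```python
def insertion_sort(arr):
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        # Shift larger elements of the sorted prefix one slot to the right.
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key  # Insert the element into its correct position.
    return arr
```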
Merge Sort
Merge Sort is a divide-and-conquer algorithm that divides the input array into two halves, calls itself for the two halves, and then merges the two sorted halves. The merge function is key to its performance, efficiently combining two sorted arrays into a single sorted array.
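A minimal Python sketch that returns a new sorted list; `merge` is the combining step described above, and the `<=` comparison is what makes the sort stable:

```python
def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    return merge(left, right)

def merge(left, right):
    # Combine two sorted lists into one sorted list.
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal keys in original order.
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result
```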
Quick Sort
Quicksort is a divide-and-conquer algorithm that sorts an array by selecting a 'pivot' element and partitioning the other elements into two sub-arrays, according to whether they are less than or greater than the pivot. The steps can be summarized as:
- Divide: Choose a pivot and partition the array such that elements less than the pivot are on the left, and elements greater than the pivot are on the right.
- Conquer: Recursively apply the same process to the sub-arrays.
- Combine: Since the algorithm sorts in place, no explicit merge step is needed like in merge sort.
Choice of Pivot
The choice of the pivot can significantly affect the performance of Quicksort. Common strategies include:
- First Element: Choose the first element as the pivot.
- Last Element: Choose the last element as the pivot.
- Random Element: Choose a random element as the pivot to reduce the chance of worst-case complexity.
- Median: Ideally, choose the median of the array as the pivot, though this is computationally expensive to determine.
- Median of Three: Choose the median of the first, middle, and last elements as the pivot. This is a good compromise, offering better performance without significant overhead; a small sketch follows.
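As a small illustration of the last strategy, a median-of-three selector might look like this (a sketch; it returns the index of the median of `arr[lo]`, `arr[mid]`, and `arr[hi]`):

```python
def median_of_three(arr, lo, hi):
    # Pick the index holding the median of the first, middle, and last elements.
    mid = (lo + hi) // 2
    candidates = [(arr[lo], lo), (arr[mid], mid), (arr[hi], hi)]
    candidates.sort()
    return candidates[1][1]  # index of the median value
```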
Hoare Partition Scheme
Hoare's partitioning algorithm is an efficient way of partitioning an array into two parts around a pivot. It works by initializing two pointers (one at the start and one at the end of the array) and moving them towards each other until they find an inversion (a pair of elements where one is greater than the pivot and the other is smaller). These elements are then swapped.
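A minimal Python sketch, using the first element as the pivot, plus a small quicksort driver for context; note that Hoare's scheme returns a partition boundary rather than the pivot's final index:

```python
def hoare_partition(arr, lo, hi):
    pivot = arr[lo]  # first element as pivot
    i, j = lo - 1, hi + 1
    while True:
        i += 1
        while arr[i] < pivot:  # advance until an element >= pivot
            i += 1
        j -= 1
        while arr[j] > pivot:  # retreat until an element <= pivot
            j -= 1
        if i >= j:
            return j  # boundary: everything <= j is <= everything after j
        arr[i], arr[j] = arr[j], arr[i]  # swap the inversion

def quick_sort(arr, lo=0, hi=None):
    if hi is None:
        hi = len(arr) - 1
    if lo < hi:
        p = hoare_partition(arr, lo, hi)
        quick_sort(arr, lo, p)      # the boundary index stays on the left side
        quick_sort(arr, p + 1, hi)
    return arr
```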
Lomuto Partition Scheme
The Lomuto partition scheme is simpler but less efficient than Hoare's scheme. It works by choosing the last element as the pivot, then moving all elements smaller than the pivot to the left of it and all greater elements to the right.
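A minimal Python sketch; unlike Hoare's scheme, the returned index is the pivot's final position, so the recursion excludes it:

```python
def lomuto_partition(arr, lo, hi):
    pivot = arr[hi]  # last element as pivot
    i = lo - 1       # boundary of the "less than pivot" region
    for j in range(lo, hi):
        if arr[j] < pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    # Move the pivot to its final slot between the two regions.
    arr[i + 1], arr[hi] = arr[hi], arr[i + 1]
    return i + 1     # recurse on (lo, i) and (i + 2, hi)
```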
Bucket Sort
Bucket Sort is a distribution sort that divides the input into several groups, called "buckets," and then sorts these buckets individually using a suitable sorting algorithm or recursively applying Bucket Sort.
It is especially useful for data that is uniformly distributed over a known range, e.g. floating-point values in [0, 1), since the elements then spread evenly across the buckets.
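A minimal Python sketch, assuming values uniformly distributed in [0, 1); `num_buckets` and the use of the built-in `sorted` for each bucket are illustrative choices (insertion sort per bucket is also common):

```python
def bucket_sort(arr, num_buckets=10):
    # Assumes all values lie in [0, 1), so int(x * num_buckets) is a valid index.
    buckets = [[] for _ in range(num_buckets)]
    for x in arr:
        buckets[int(x * num_buckets)].append(x)
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))  # sort each bucket individually
    return result
```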
Counting Sort
Counting Sort is a non-comparative sorting algorithm suitable for sorting items with integer keys. It operates by counting the number of objects that have each distinct key value and using arithmetic on those counts to determine the positions of each key value in the output sequence.
- Non-Comparative: Unlike other sorting algorithms, Counting Sort does not compare the elements of the array to sort them.
- Stable Sort: Preserves the relative order of records with equal keys.
- Efficient for Small Range of Keys: Best suited when the range of potential items (k) is not significantly greater than the number of items (n).
Algorithm Steps
- Count Occurrences: Count the occurrences of each unique element.
- Cumulative Count: Transform the count array to store the cumulative sum of counts.
- Place the Elements: Place each element in its correct position based on the cumulative count and decrease the count by one for each placement.
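A minimal Python sketch of the three steps above, assuming non-negative integer keys; iterating the input in reverse during placement is what preserves stability:

```python
def counting_sort(arr):
    if not arr:
        return arr
    k = max(arr)  # assumes non-negative integer keys
    count = [0] * (k + 1)
    for x in arr:               # 1. count occurrences of each key
        count[x] += 1
    for i in range(1, k + 1):   # 2. cumulative counts give final positions
        count[i] += count[i - 1]
    output = [0] * len(arr)
    for x in reversed(arr):     # 3. place elements; reverse order keeps it stable
        count[x] -= 1
        output[count[x]] = x
    return output
```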
Radix Sort
Radix Sort is a non-comparative integer sorting algorithm that sorts data with integer keys by grouping keys by the individual digits which share the same significant position and value. It processes integer keys from least significant digit to most significant digit.
- Non-Comparative: Does not compare elements directly.
- Stable Sort: Maintains the relative order of records with equal keys.
- Efficient for Bounded Keys: Performs well when the number of digits in the keys is small relative to the number of elements.
Algorithm Steps
- Get the Maximum Number: To know the number of digits.
- Counting Sort for Each Digit: Apply counting sort to sort elements based on the current digit.
- Repeat for Each Digit: Continue, digit position by digit position, until the most significant digit of the largest number has been processed.
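A minimal Python sketch (least-significant-digit first, base 10) that applies a stable per-digit counting sort; the digit extraction `(x // exp) % 10` assumes non-negative integers:

```python
def radix_sort(arr):
    if not arr:
        return arr
    max_val = max(arr)
    exp = 1
    while max_val // exp > 0:   # one pass per digit of the largest number
        arr = counting_sort_by_digit(arr, exp)
        exp *= 10
    return arr

def counting_sort_by_digit(arr, exp):
    count = [0] * 10
    for x in arr:
        count[(x // exp) % 10] += 1
    for i in range(1, 10):
        count[i] += count[i - 1]
    output = [0] * len(arr)
    for x in reversed(arr):     # stability here is essential for radix sort
        d = (x // exp) % 10
        count[d] -= 1
        output[count[d]] = x
    return output
```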
Heap Sort
Heap Sort is a comparison-based sorting algorithm that builds a heap from the input data, then repeatedly extracts the maximum element from the heap and rebuilds the heap until all elements are sorted.
- In-Place Sorting: Operates directly on the input array, using only a constant amount of extra space for the heap operations.
- Not Stable: Does not maintain the relative order of records with equal keys.
- Efficient for Large Data Sets: Offers good performance on large data sets.
Algorithm Steps
- Build a Max Heap: Rearrange the array to form a max heap, where the largest element is at the root.
- Extract Elements from Heap: Repeatedly move the root of the heap to the end of the array, reduce the size of the heap by one, and then rebuild the heap.
- Repeat Until Sorted: Continue the extraction process until all elements are sorted.
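A minimal Python sketch with an iterative sift-down helper (the helper name is illustrative):

```python
def heap_sort(arr):
    n = len(arr)
    # Build a max heap by sifting down every internal node.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(arr, i, n)
    # Repeatedly move the max (root) to the end and shrink the heap.
    for end in range(n - 1, 0, -1):
        arr[0], arr[end] = arr[end], arr[0]
        sift_down(arr, 0, end)
    return arr

def sift_down(arr, root, size):
    # Restore the max-heap property for the subtree rooted at `root`.
    while True:
        largest = root
        left, right = 2 * root + 1, 2 * root + 2
        if left < size and arr[left] > arr[largest]:
            largest = left
        if right < size and arr[right] > arr[largest]:
            largest = right
        if largest == root:
            return
        arr[root], arr[largest] = arr[largest], arr[root]
        root = largest
```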
Cycle Sort
Cycle Sort is an in-place, comparison-based sorting algorithm that is theoretically optimal in terms of the total number of writes to the original array. It's particularly useful where memory write or swap operations are costly.
- In-Place Sorting: Does not require additional space for sorting, except for variable storage.
- Not Stable: May change the relative order of equal elements.
- Write-Efficient: Minimizes the number of memory writes, which is its main advantage.
Algorithm Steps
- Find Cycles: Start from the first element and consider it as the starting point of a cycle. Then, find the correct position for this element by counting the number of elements that are smaller.
- Rotate Cycle: Once the correct position is found, place the element there and repeat the process for all elements to complete the cycle.
- Repeat: Continue until the end of the array is reached.
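A minimal Python sketch; it sorts in place and returns the number of writes, the quantity Cycle Sort minimizes:

```python
def cycle_sort(arr):
    writes = 0
    n = len(arr)
    for start in range(n - 1):
        item = arr[start]
        # Find the item's final position by counting smaller elements.
        pos = start
        for i in range(start + 1, n):
            if arr[i] < item:
                pos += 1
        if pos == start:          # item is already in place
            continue
        while arr[pos] == item:   # skip past duplicates
            pos += 1
        arr[pos], item = item, arr[pos]  # one write, displacing a new item
        writes += 1
        # Rotate the rest of the cycle until we return to the start.
        while pos != start:
            pos = start
            for i in range(start + 1, n):
                if arr[i] < item:
                    pos += 1
            while arr[pos] == item:
                pos += 1
            arr[pos], item = item, arr[pos]
            writes += 1
    return writes
```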
External Memory Sorting
External memory sorting is a technique used when the size of the input data exceeds the available memory capacity. In such scenarios, traditional in-memory sorting algorithms become impractical, requiring the use of disk-based sorting techniques to process large datasets efficiently.
Challenges of Large Datasets
- Limited Memory: The primary challenge is the limited amount of available memory, which prevents loading the entire dataset into RAM for sorting.
- Disk I/O Overhead: Disk I/O operations are significantly slower than memory operations, leading to performance bottlenecks when reading and writing data from/to disk.
Multiway Merge-Sort
Multiway merge-sort is a disk-based sorting algorithm that efficiently sorts large datasets by dividing the input into smaller chunks that can fit into memory, sorting them in-memory, and then merging the sorted chunks to produce the final sorted output.
Algorithm Overview
- Initial Partitioning: Divide the input dataset into multiple smaller chunks or blocks that can fit into memory.
- In-Memory Sorting: Load each block into memory and sort it with an in-memory sorting algorithm (e.g., quicksort or heapsort).
- Multiway Merge: Merge the sorted blocks using a multiway merge algorithm, such as a k-way merge, which merges k sorted sequences efficiently (see the sketch after this list).
- Output Generation: As the merge proceeds, write the sorted output to disk.
- Final Merge: If the number of blocks exceeds the memory capacity, perform additional merge passes until all blocks are merged into a single sorted output.
Complexity
- Time Complexity: The time complexity of multiway merge-sort depends on the number of merge passes and the efficiency of the in-memory sorting algorithm; it is typically dominated by disk I/O, so the total number of disk transfers \(T_{\text{disk}}\) is often used as the measure of performance. For \(N\) elements and \(M\) available memory blocks, the total number of disk transfers can be approximated as \(T_{\text{disk}} \approx 2N \log_M N\).
- Space Complexity: The space complexity is determined by the number of blocks that can be loaded into memory at once and the size of each block; it is often proportional to the available memory capacity.