Back to Course

Algorithm

0% Complete
0/82 Steps
  1. Getting Started with Algorithm
    What is an Algorithm?
  2. Characteristics of Algorithm
    1 Topic
  3. Analysis Framework
  4. Performance Analysis
    3 Topics
  5. Mathematical Analysis
    2 Topics
  6. Sorting Algorithm
    Sorting Algorithm
    10 Topics
  7. Searching Algorithm
    6 Topics
  8. Fundamental of Data Structures
    Stacks
  9. Queues
  10. Graphs
  11. Trees
  12. Sets
  13. Dictionaries
  14. Divide and Conquer
    General Method
  15. Binary Search
  16. Recurrence Equation for Divide and Conquer
  17. Finding the Maximum and Minimum
  18. Merge Sort
  19. Quick Sort
  20. Stassen’s Matrix Multiplication
  21. Advantages and Disadvantages of Divide and Conquer
  22. Decrease and Conquer
    Insertion Sort
  23. Topological Sort
  24. Greedy Method
    General Method
  25. Coin Change Problem
  26. Knapsack Problem
  27. Job Sequencing with Deadlines
  28. Minimum Cost Spanning Trees
    2 Topics
  29. Single Source Shortest Paths
    1 Topic
  30. Optimal Tree Problem
    1 Topic
  31. Transform and Conquer Approach
    1 Topic
  32. Dynamic Programming
    General Method with Examples
  33. Multistage Graphs
  34. Transitive Closure
    1 Topic
  35. All Pairs Shortest Paths
    6 Topics
  36. Backtracking
    General Method
  37. N-Queens Problem
  38. Sum of Subsets problem
  39. Graph Coloring
  40. Hamiltonian Cycles
  41. Branch and Bound
    2 Topics
  42. 0/1 Knapsack problem
    2 Topics
  43. NP-Complete and NP-Hard Problems
    1 Topic
Lesson 30 of 43
In Progress

Optimal Tree Problem

Suppose we have to encode a text that comprises symbols from some n-symbol alphabet by assigning to each of the text’s symbols some sequence of bits called the codeword. For example, we can use a fixed-length encoding that assigns to each symbol a bit string of the same length m (m ≥ log2 n). This is exactly what the standard ASCII code does. One way of getting a coding scheme that yields a shorter bit string on the average is based on the old idea of assigning shorter codewords to more frequent symbols and longer codewords to less frequent symbols. This idea was used, in particular, in the telegraph code invented in the mid-19th century by Samuel Morse. In that code, frequent letters such as e (.) and a (.−) are assigned short sequences of dots and dashes while infrequent letters such as q (− − .−) and z (− − ..) have longer ones.

Variable-length encoding, which assigns codewords of different lengths to different symbols, introduces a problem that fixed-length encoding does not have. Namely, how can we tell how many bits of an encoded text represent the first (or, more generally, the ith) symbol? To avoid this complication, we can limit ourselves to the so-called prefix-free (or simply prefix) codes. In a prefix code, no codeword is a prefix of a codeword of another symbol. Hence, with such an encoding, we can simply scan a bit string until we get the first group of bits that is a codeword for some symbol, replace these bits by this symbol, and repeat this operation until the bit string’s end is reached.

If we want to create a binary prefix code for some alphabet, it is natural to associate the alphabet’s symbols with leaves of a binary tree in which all the left edges are labeled by 0 and all the right edges are labeled by 1. The codeword of a symbol can then be obtained by recording the labels on the simple path from the root to the symbol’s leaf. Since there is no simple path to a leaf that continues to another leaf, no codeword can be a prefix of another codeword; hence, any such tree yields a prefix code.

Among the many trees that can be constructed in this manner for a given alphabet with known frequencies of the symbol occurrences, how can we construct a tree that would assign shorter bit strings to high-frequency symbols and longer ones to low-frequency symbols? It can be done by the following greedy algorithm, invented by David Huffman while he was a graduate student at MIT.

Lesson Content
0% Complete 0/1 Steps

New Report

Close