Problem:
Given a list of words, write a function that returns the most frequently used words, using a dictionary.
Example:
Input: ["test", "testing", "retest", "testing"]
Output: ["testing"]
Constraints:
– The words are not necessarily unique or lower-case.
– The list of words may be empty, in which case the function should return an empty list.
– Assume that the words in the list are all strings and longer than one character.
Understanding the problem
This challenge here requires you to write a function that takes in a list of words, and then finds the word(s) that have the highest number of occurrences in that list. Please bear in mind that the words may not be unique and the list may be empty, the function should take this into account and return an empty list. Furthermore, all of the words must be strings, with a length longer than one letter.
To give an example, if the input list is [“test”, “testing”, “retest”, “testing”], the output should be [“testing”], as it appears twice in the list, while the others only appear once.
How to solve this problem?
In order to solve this problem, let’s make sure we have a foundational understanding of Python functions, data structures like dictionaries, iterating through an array of words to count their occurrences, as well as comparing two values. With these skills under our belt, we’ll be well on our way to solving this!
The Benefits of Data Structures
Data structures are useful tools that help us keep our data organised in an easily accessible way. For example, there’s a structure called a ‘dictionary’, which lets you store values under certain keys. So if you have the key of ‘name’, for instance, you could store the associated value of ‘Bob’ – then if you ever want to look up ‘Bob’, you can use the key of ‘name’!
Efficient Word Counting
You can quickly go through a list of words with a for loop. In the process, look and see if the current word has already been recorded in a dictionary. If it’s already there, add one to the count. If it’s not already there, it’s easy – just add it to the dictionary and give it a starting count of one. By checking out each word in the list this way, you can easily count all of the occurrences of any given word.
Using the Comparison Operator
Just a friendly reminder that you can easily compare two values with the use of the comparison operator which is either ‘>’ or ‘<‘. This operator will help you decide if one value is bigger or smaller than the other. For instance, if you have x and y, you can just write an if/else statement to compare them as you can see in the example.
if x > y:
# Code that runs if x is greater than y
else:
# Code that runs if y is greater than x
Let’s take a look at how we can solve this problem using a dictionary and a for loop. We will store each word as a key and the number of occurrences of that word as the value in the dictionary. Then, while looping through all the words, we will check if the current word is already in the dictionary. If it is, we will increment its count by one. But if not, we’ll add it to the dictionary with a count of one. To check the highest occurring word or words, we will create a variable and set it to 0.
As we loop through the dictionary, we will compare the count for each word with the variable. If it’s greater, we will set the variable to the current word’s count and also assign it to the variable for most frequent words. In case two or more words have the same number of occurrences, we’ll assign them both to the most frequent words variable. After we have completed the loop, we need to return the most frequent words variable which will contain the word or words that appeared the most number of times in the list.
# Function to find the most frequent words in a list
def most_frequent_words(words):
# Dictionary to store words and corresponding occurrence
word_count = {}
# Iterate through words list
for word in words:
# Check if word is already present in the dictionary
if word in word_count:
# If yes, increment count by 1
word_count[word] += 1
# If no, set count to 1
else:
word_count[word] = 1
# Variables to store highest occurrence and corresponding words
max_count = 0
most_frequent_words = []
# Iterate through dictionary
for word in word_count:
# Compare current word's count to maximum count
if word_count[word] > max_count:
# If count is greater, set max_count to current word's count
max_count = word_count[word]
most_frequent_words = [word]
elif word_count[word] == max_count:
# If count is equal, add word to the most_frequent_words list
most_frequent_words.append(word)
# Return most frequent words
return most_frequent_words
Linear Time Complexity
The time complexity of this code is O(n), where n is the length of the list of words. Basically we look through the list of words just one time and store them and their respective amount of instances in a dictionary. Afterwards, we go through the dictionary to find out which words have the highest count. We only look through the list and dictionary one time which results in linear time complexity.
Linear Space Complexity
As you can see, the space complexity of this code is O(n) where n is the length of the list of words. This is because the dictionary we use to store the words and their counts takes up the same amount of space as the total number of words in the list. Basically, all we need is one single data structure to keep track of all the elements, so the overall space complexity is still linear.