Remove Duplicates in an Unsorted Linked List in Python

by Kal Bartal

Problem: 

Given an unsorted linked list of integers, remove any duplicated nodes and return a reference to the head of the updated linked list.

Constraints:

The linked list should not be sorted in place, and you are not allowed to allocate extra memory. Your algorithm should have a run time complexity of O(n).

Example:

Input:

[1, 7, 3, 2, 3, 7, 1]

Output:

[1, 7, 3, 2]

Understanding the problem

This problem requires you to find and remove any duplicate elements in an unsorted linked list. The linked list should not be sorted in place and you are not allowed to allocate extra memory. Therefore, you will need to come up with an algorithm that scans the linked list, detects any duplicates, and then removes them from the list.

For example, given the following input of [1, 7, 3, 2, 3, 7, 1], the expected output should be [1, 7, 3, 2]. Notice that any duplication of 1 or 7 has been removed, resulting in the unique elements of the list being returned as the output. 

Your algorithm should have a run time complexity of O(n) – which means that it should be able to work through the list of elements quickly, rapidly detect any duplicates, and then efficiently remove them. 

Solving this problem

To solve this problem, you must have a basic knowledge of linked lists and their associated operations. You will also need to familiarise yourself with the concept of run time complexity and how it relates to your solution. 

While you are not allowed to allocate extra memory, you will need to understand the logic of how to utilise the existing memory of a linked list in order to detect duplicates. Also, the algorithm should be efficient and have a run time complexity of O(n). 

Associated Operations on Linked Lists

Linked lists are data structures comprising of a sequence of interconnected nodes. Each node stores a value, and additionally has a pointer pointing to the node containing the next value. The nodes in a linked list are internally connected through these pointers. 

The associated operations are the set of operations that can be performed on linked lists, given their structure. These operations can include: 

  • Inserting a node
  • Removing a node 
  • Finding a node with a given value 
  • Traversing the list to access individual nodes

For example, if you had a linked list of the numbers [1, 2, 3], each node would contain a single value and a pointer pointing to the following node. This sequence of nodes and their respective pointers would internally connect them together. 

If you wanted to add the number 4 to this list, you could utilise the insert operation and add the node containing 4 after the node containing 3. This would result in the new linked list of [1, 2, 3, 4]

As you can see, the operations of a linked list are useful in manipulating the order and content of the individual nodes. 

Examples

  • Inserting a node
  • Removing a node 
  • Finding a node with a given value 
  • Traversing the list to access individual nodes

Run Time Complexity

  • Efficiency of an algorithm and duration of task completion
  • Time to process input (O notation)
  • Linear time with increasing elements (O(n))

Run time complexity refers to how efficient an algorithm is and how long it takes for the algorithm to complete its task. It is measured by the amount of time needed for the algorithm to process its input, and is commonly represented in the Big-O notation.

For instance, if an algorithm takes 10 seconds to process 10 elements, it would have a run time complexity of O(10), meaning it takes 10 seconds to process 10 elements. On the other hand, an algorithm with a run time complexity of O(n) would take linear time as the amount of elements increase, i.e. if an algorithm takes 10 seconds to process 10 elements, it would take 20 seconds to process 20 elements.

For this particular problem, you are expected to develop an algorithm with a run time complexity of O(n), or linear time. This means that no matter how many elements are in the linked list, the algorithm should take linear time to detect duplicates and remove them.

Detecting Duplicates in a Linked List 

  • Track existing memory of the linked list
  • Compare each node’s value to the values of the nodes that follow it
  • Modify pointer of node before duplicate to remove it from the list

The logic of utilising the existing memory of a linked list to detect duplicates is to keep track of each node’s value and pointer. As you traverse the list, compare each node’s value to the values of the nodes that come after it. If a duplicate is found, you can remove it by modifying the node’s pointer before it.

For example, if the linked list you are traversing contains the elements [1, 7, 4, 5, 4], then you would go through each node and compare the subsequent nodes to the node you are currently at. In this case, when you reach the node containing the number 4, you would compare the next node’s value to see if it is a duplicate. In this case, the subsequent node does contain the same value of 4 so it can be identified as a duplicate. 

Now, you can simply modify the pointer of the node containing the number 4 so that it points to the node containing the number 5, instead of the duplicate node containing 4. This therefore removes the duplicate from the linked list, leaving you with [1, 7, 4, 5] as the output.

Examples:

  • If list is [1, 7, 4, 5, 4], compare the 4th node’s value to the subsequent one (the ‘4’)
  • Modify pointer of 4th node so it points to the 5th instead of the duplicate node

Removing Duplicates in an Unsorted Linked List in Python

Here is a Python solution to this problem which has a run time complexity of O(n)

# Definition for singly-linked list
class Node:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


# Function to remove duplicates
def removeDuplicates(head):
    # Check if the linked list is empty
    if head is None:
        return head

    # Set the current node to the head of the list
    current_node = head

    # Loop through the linked list and compare each node to the subsequent node
    while current_node is not None:
        # Set the subsequent node to the node after the current node
        subsequent_node = current_node.next

        # Store the reference of the current node
        previous_node = current_node

        # Loop through subsequent nodes and compare the value to the current node
        while subsequent_node is not None:
            # If a duplicate is found
            if current_node.val == subsequent_node.val:
                # Store the reference of the subsequent node
                temp = subsequent_node

                # Point the previous node to the node after the duplicate
                previous_node.next = temp.next

                # Move the subsequent node one position forward
                subsequent_node = temp.next

            # If no duplicate is found, move the previous and subsequent node one position forward
            else:
                previous_node = subsequent_node
                subsequent_node = subsequent_node.next

        # Set the current node to the node one position ahead
        current_node = current_node.next

    # Return the reference to the head of the updated linked list
    return head

We define a simple Node class which contains a value and a reference to the subsequent node. We then create a function to remove duplicates which takes in the head of the linked list as a parameter. 

First, we check if the linked list is empty, and if it is, return the head.

We then set the current node to the head of the list and begin looping through the linked list. We set the subsequent node to the node after the current node, and also store the reference of the current node as the previous node. 

Once we have these references set, we begin looping through the subsequent nodes and compare their values to the value of the current node. 

If we find a duplicate, we store the reference of the subsequent node, remove the duplicate by modifying the pointer of the current node, and then move the subsequent node one position forward.

If no duplicate is found, we move the previous and subsequent node one position forward and start comparing the next node.

Once the comparison is done, we set the current node to the node one position ahead and begin looping through the list again until we reach the end. 

Finally, we return the reference to the head of the updated linked list.

Analysis of Time Complexity

  • O(n): Takes linear time to complete task
  • Double loop: Look at nodes/subsequent nodes, inner loop runs n times
  • Checks for duplicates, removes/moves subsequent node

The time complexity of this code is O(n). This means that it takes linear time to complete the task – no matter how many elements there are in the linked list, the algorithm will take the same amount of time to complete its task. 

Specifically, this algorithm uses a double loop iterating through the linked list in order to detect any duplicates and then remove them. The outer loop looks at each node in the linked list, and the inner loop looks at the subsequent nodes for comparisons. 

The inner loop then runs for a total of n times, where n is the number of elements in the linked list. During each iteration, the algorithm checks if there is a duplicate and either removes it or moves the subsequent node forward.

Therefore, the time complexity of this algorithm is O(n). The total number of operations it takes to complete the task depends on the number of elements in the linked list, and increases linearly as the number of elements increases.

O(1) Space Complexity 

  • Space complexity is O(1)
  • No extra memory needed
  • Memory consumption is constant

The space complexity of this code is O(1). This means that the algorithm requires a fixed amount of memory in order to complete its task. 

Specifically, this algorithm does not allocate any extra memory for it to execute. All it does is to iterate through the existing linked list and modify the references of the nodes to either detect duplicates or delete them. This means that the total memory consumption is constant regardless of the number of elements in the list.

Therefore, the space complexity of this algorithm is O(1). 

Final Thoughts

Requirements for Removing Duplicates in a Linked List:

  • Understand linked list structure
  • Concept of run time complexity
  • Algorithm efficiency

This challenge requires you to come up with an algorithm to remove duplicates in an unsorted linked list while keeping run time complexity and memory consumption to a minimum. It is a great problem to work on to get a deeper understanding of linked lists, the concept of run time complexity, and the importance of algorithm efficiency. 

Also, you have to think carefully about how to structure your code and how to utilise the existing memory of a linked list in order to detect duplicates. It is certainly a fun and challenging problem that can help you to hone your coding skills.

Related Posts

Leave a Comment