# Hash Tables: Finding Duplicates in an Array | Day #19

## **Introduction**

Finding duplicates in an array is a common problem in software development, with applications ranging from detecting repeated data entries in **databases** to validating **user inputs** in forms. In large datasets, identifying duplicates efficiently is crucial, as traditional approaches like brute force comparison can be too slow.

A **hash table** provides a highly efficient way to solve this problem. By leveraging the **O(1)** time complexity of hash table operations, we can quickly track elements and identify duplicates with minimal overhead. If you’re new to hash tables, be sure to check out our YouTube video on [*What is a hash table?*](https://youtu.be/t77ZBjxDPcg?si=5jk1mmyCHmc1FXj3) for more context.

In this guide, we’ll explore how hash tables can efficiently find duplicates, walk through implementation in **Python** and **JavaScript**, and cover **real-world use cases** and edge cases.

---

%[https://youtu.be/mqWBl1zCinM?si=cWdNGD-fH2OO17gl] 

## **How Hash Tables Help Identify Duplicates**

A **hash table** stores data as **key-value pairs**, where each key corresponds to an element and maps to a specific memory location through a **hash function**. This structure allows for **constant time lookups** and insertions, making it ideal for tracking elements in an array as you traverse it.

### **Logic Behind the Approach**

To find duplicates, follow these steps:

1. Initialize an **empty hash table**.
    
2. Traverse the array element by element.
    
3. For each element:
    
    * If the element is **already in the hash table**, it’s a duplicate.
        
    * If the element is **not in the hash table**, add it.
        

This algorithm ensures that every element is checked only **once**, making it very efficient.

---

## **Algorithm Overview**

Here’s a step-by-step breakdown:

1. **Initialize an empty hash table** (can be an object or dictionary).
    
2. **Iterate through the array**.
    
3. **Check if the element exists** in the hash table:
    
    * If it exists, **add it to the duplicates list**.
        
    * If it doesn’t exist, **store it in the hash table**.
        
4. **Return the duplicates list** at the end.
    

Using a hash table allows us to avoid costly nested loops and detect duplicates in **O(n)** time.

---

## **Code Implementation**

Below are code examples in **Python** and **JavaScript** to demonstrate how to implement this solution.

### **Python Code Example**

```python
def find_duplicates(arr):
    hash_table = {}  # Initialize an empty hash table
    duplicates = []  # Store found duplicates

    for num in arr:
        if num in hash_table:
            duplicates.append(num)  # Add duplicate to the list
        else:
            hash_table[num] = True  # Insert the element in the hash table

    return duplicates

# Test the function
print(find_duplicates([1, 2, 3, 2, 4, 3]))  # Output: [2, 3]
```

### **JavaScript Code Example**

```javascript
function findDuplicates(arr) {
    const hashTable = {};  // Initialize an empty hash table
    const duplicates = [];  // Store found duplicates

    arr.forEach(num => {
        if (hashTable[num]) {
            duplicates.push(num);  // Add duplicate to the list
        } else {
            hashTable[num] = true;  // Insert the element in the hash table
        }
    });

    return duplicates;
}

// Test the function
console.log(findDuplicates([1, 2, 3, 2, 4, 3]));  // Output: [2, 3]
```

Both examples demonstrate a simple approach using hash tables to find duplicates in **O(n)** time. You can read more about the time complexity of hash tables in our detailed guide on [*How hash tables work*](https://hojaleaks.com/introduction-to-hash-tables-a-beginner-friendly-guide-day-18#heading-how-hash-tables-work).

---

## **Time Complexity Analysis**

Using a hash table for finding duplicates provides **optimal performance**.

* **Time Complexity:**
    
    * **Insertion** and **lookup** in a hash table take **O(1)** on average.
        
    * Traversing the array takes **O(n)**, where n is the size of the array.
        
    * Therefore, the overall time complexity is **O(n)**.
        
* **Space Complexity:**
    
    * A hash table requires **O(n)** space in the worst case to store all elements.
        

This makes hash tables an ideal solution for large datasets where performance is critical.

---

## **Edge Cases**

It’s essential to handle special scenarios to ensure your code works in all situations:

1. **Empty Arrays:**
    
    * Input: `[]`
        
    * Output: `[]`
        
    * Explanation: No elements mean no duplicates.
        
2. **Arrays with Unique Elements:**
    
    * Input: `[1, 2, 3, 4]`
        
    * Output: `[]`
        
    * Explanation: All elements are unique, so no duplicates are found.
        
3. **Arrays with Large Datasets:**
    
    * When handling large arrays, ensure that your **hash function distributes elements evenly** to avoid **collisions**. For more on this, see our article on hash table collision handling.
        

---

## **Real-World Use Cases**

Hash tables are extensively used in practical applications for **duplicate detection**, including:

* **E-commerce platforms:** Detecting repeated product listings in carts or catalogs.
    
* **Web forms:** Ensuring users don't submit duplicate registrations or responses.
    
* **Database management:** Identifying redundant entries to maintain data integrity.
    

These use cases highlight how hash tables provide efficient and practical solutions in everyday programming tasks.

---

## **Conclusion**

Using hash tables to find duplicates in an array is a **fast and efficient** approach, especially for large datasets. By leveraging the **O(1)** lookup time, you can quickly determine if an element has been seen before and avoid expensive nested loops.

If you’re interested in exploring more about hash tables, check out:

* [What is a hash table?](https://hojaleaks.com/introduction-to-hash-tables-a-beginner-friendly-guide-day-18)
    
* [How hash tables work](https://hojaleaks.com/introduction-to-hash-tables-a-beginner-friendly-guide-day-18#heading-how-hash-tables-work)
    
* [Hash Tables vs. Other Data Structures](https://hojaleaks.com/introduction-to-hash-tables-a-beginner-friendly-guide-day-18#heading-hash-tables-vs-other-data-structures)
    

---

## **Call to Action**

Now that you’ve learned how to use hash tables to find duplicates, try implementing the solution from scratch with **different datasets** and explore how the performance varies. You can also experiment with **alternative methods** like sorting-based duplicate detection.

Feel free to **share your results and questions** in the comments section, and dive deeper into the world of **hash tables**!