One of the most common issues you can face in Airtable is data duplication. But how do we handle it?
The most common source of data duplication are forms. Respondents sometimes submit the form twice, just because they were not sure whether the first time was successful. This, in turn, creates duplicate records in Airtable.
Let’s explore three ways of handling the issue, which are different in difficulty, flexibility, and degree of automation:
Method | 1. Automations | 2. Scripting | 3. Dedupe [Article Part 2] |
---|---|---|---|
Data quantity | <100 records | >100 records | <100 record |
Handling complexity (e.g.,merging & actions) | low | high | medium |
When to use | Incoming records from forms | Can be implemented in any setting | Will run on existing records only |
The easiest way to handle data duplication in Airtable is with the use of Airtable Automations. With this system, you will be able to mark any new entry as duplicated based on the existing data. The main requirement for this deduplication method is a unique identifier field (column).
The unique identifier field is nothing but a value that has to be unique for each of the records. This can be an employee number, email address or a Student ID number. As long as it is unique, it can be used to remove duplicate values. In the following example, I will use an email field as such a unique identifier.
To begin, we need to create an Automation. For that, we need to click on the “automations” layer which is available on the top left corner of your Airtable base. Once you click on it, you will have the option to click on the “Create automation” on the bottom left corner of your screen. Now let’s go through the Automation setup step by step
The first step is to select a trigger which will prompt your automation to run. Let’s select the “When record matches condition” trigger. This option will trigger as soon as a record matches the condition we will define in step #2.
For the trigger to work, it needs a couple of input parameters. Firstly, select the table which contains the records you want to de-duplicate. Then you set the condition to trigger, when the unique identifier is not empty.
Our example is displayed in the screenshot below. The deduplication automation will run when the email field is “not empty”.
Please note: this trigger will run as soon as the cell value is changed. Therefore, it works best with form submissions, but it is not flexible enough to handle duplication through manual entry in grid view.
Next thing, we select the action we would like to perform. In our case, we would like to identify all the records that have the same identifier. To do that, we use the “Find Records” action.
The screenshot below shows how we set up this action to filter for duplicates in our Airtable:
This action will then pass on a list of all the records that have the same email address to the next action
Before we can mark the item as duplicate, we need to have an intermediate step where we check how many records there are for this specific email. We do this by inserting a conditional action.
Our goal is to trigger the action, if there is more than one record with the same email address. Luckily, the previous action already selected all the records with the same email address. We just need to extract the information and put it into a conditional statement.
When we create the condition, we are automatically prompted to insert a statement:
Once this is done, the conditional actions will run, every time the length of the records with the same email is larger than one.
If the new record is added, we need to mark them as such. For that, we use the “Update Records” action. In the table we create a field (column) called “Duplicate”. When the record is a duplicate, we want the field value to be “true”
The screenshot below shows how we set up this action to mark duplicates in our Airtable:
Now you will need to activate the automation by toggling the button on the top left corner to “on”. This will then automatically tag all the incoming records as duplicates. To further handle the duplicates, you can insert additional automations or manually filter the grid view in the “Data layer” by the “Duplicate” field. Then, you will have to sort out the duplicates manually.
Some things that you need to keep in mind are that for this option to work, there must be a unique identifier field. Furthermore, this method only works with new records, as the trigger is only activated once a record is changed.
Furthermore, handling the duplicates after the tagging with this method is still manual. Nevertheless, for some easier applications with limited records it might be the perfect solution. The next solution is more complex, but it overcomes many of the drawbacks of this simple method
This method of de-duplication is more difficult but also more flexible. It is more appropriate for environments where there isn’t necessarily one single field as a unique identifier, or the actions to be taken with the duplicate values are more complicated (e.g., combine records automatically). This method will be using the scripting extension to handle duplication, allowing you to delete, update or perform any other operation on the duplicate records.
In our example, we will be using the same starting table as for the simple method A (automation). Thus, the actions our script performs are still quite simple, but there are now limits to what can be done. If you are not experienced with Java Script, or you would just like to find out what is possible through the Airtable API an Airtable consultant can best answer your questions. Monino Solutions offers free sessions to answer quick questions. If you are interested feel free to sign up here.
Our sample script linked below works with the same email field as already used in method A to remove the duplicates. Below you can find a step by step instruction on how to run the sample script.
Firstly, make sure that you are on the data layer of your base (top left corner right next to the base name). Go to the “Extensions” option on the top right corner of your Airtable base. This will open the Extension sidebar with an option to click on “Add an extension” on the top right corner. After you click that, please select “Scripting” from the marketplace pop-up.
Once the extension is added, it will offer you a “Edit code button”. If you open it it will give you an entry field for the code. Here you can write your own code, or insert sample code and tailor it to your needs. Click here to get the code for the de-duplication sample script from our github page.
Now that you have the sample code in your extension, please make sure to adapt it to your data structure. For example:
In adjusting this kind of script is where Airtable developers from Monino Solutions can bring the most value. If you are not experienced with Java Script, an Airtable developer could make up for the cost just through the time savings it generates for you.
Below you can find a more detailed explanation of the script.
The script can look intimidating at first, nevertheless, it eliminates the duplicates in a very similar way as method A.
This is just an example, we can work together on customising your script in many ways, depending on your specific business needs. If you found this article helpful, don’t forget to bookmark our website for future, useful articles. Feel free to also schedule a first, introductory call to discuss your specific needs and how we can help you achieve them.
If you are interested in learning about the third method, Airtable’s Dedupe Extension (Method C), read the second part of this article click here.
Click here to schedule a free initial consultation session to discuss your specific needs.