# Analytics Calculations: Match+Index 3: Finding Duplicates

josephmckeown
Moderator, Lightspeed Staff Posts:

**59**moderator
We can use Analytics to help identify instances of duplicate Products, Customers, or pretty much any dimension on a report. Say we want to find duplicate Products. We could start by running the

Let's also add System ID as a dimension, to distinguish instances of Products with the same name from one another...

Then, to rule out case-sensitivity, let's use the "Upper" calculation from our cell-based functions to convert all content in the Product Description to upper case...

So when saving, everything is presented in capital letters...

The next thing we want to do is ask Analytics to find the

And we are going to ask Analytics to match the upper case descriptions...

...

...this should show us the row that each Description first shows up on, and we should expect to find that in

...the first instance is identical to the row.

Most cases.

It's the ones that don't match their row that we want to find now.

Let's save our first version of the report...

So now, let's ask Analytics to identify where the first instance is

Let's create an IF statement. If the row is the same as the first instance, return a 0, if not, return a 1.

So we'll start the IF statement...

Copy+paste the "First Instance" calculation...

and compare it to the row...

So that when we save...

We get a new calculation highlighting duplicates over others. Again, let's save this report as new...

Now we don't want to just identify the duplicates. We want to pull them out in a meaningful way. To start this, let's have the duplicates rise in sequence. We could do this by using the running_total function from our math functions...

and ask it to add the 1s from the duplicates...

so that when we save...

...with each new duplicate on the list comes a new number in sequence.

Let's save our work so far as a new report...

So with each new duplicate comes a new integer: 1, 2, 3, etc. Does this remind you of any other sequence? If you guessed rows, you're right!

That means we can ask Analytics, by row, to find

We'll start it with brackets...

First, we'll ask it to look at the Row...

...then, we'll copy the running total of duplicates...

...and ask it to look at this sequence...

Now look at the numbers that show up on our table...

It tells us that the first duplicate is in row 298, that the second duplicate is in row 409, that the third is in row 700, and so on...

We're almost done, but once again, let's save our work as new...

Finally, let's use the Index function to pull up these Descriptions so that we can now find them in Lightspeed!

We'll start our Index function with brackets...

Then ask it to look in the Descriptions column...

...for our matched position...

...of rising duplicates.

So now when we save...

We get a list of only duplicated Products!

Let's save our work again as new.

There are a few cosmetic things we can do now to tighten up the report. First, we can remove all but perhaps the final two calculations...

So that gives us more real estate to play with...

Also, let's hide the first set of dimensions from the Visualization...

This is an important step because the Dimensions on the left-hand-side have

Finally, we could also ask Analytics to look for the System IDs for duplicate products, just by copy+pasting our final calculation...

...but substituting the index column from Description for the the System ID

This way, when we save...

We get one report that finds the results we need for our next steps in Lightspeed!

*"Dusty Inventory"*report, and removing the filters for "Is Dusty" and "Quantity on Hand is greater than zero"Let's also add System ID as a dimension, to distinguish instances of Products with the same name from one another...

Then, to rule out case-sensitivity, let's use the "Upper" calculation from our cell-based functions to convert all content in the Product Description to upper case...

So when saving, everything is presented in capital letters...

The next thing we want to do is ask Analytics to find the

*first*instance of each Description. We can do this by using the "Match" calculation from our table-based functions. I'll start by creating the calculation in brackets...And we are going to ask Analytics to match the upper case descriptions...

...

*to*the upper case descriptions......this should show us the row that each Description first shows up on, and we should expect to find that in

*most*cases......the first instance is identical to the row.

Most cases.

It's the ones that don't match their row that we want to find now.

Let's save our first version of the report...

So now, let's ask Analytics to identify where the first instance is

*not*the same as the row ID, this means that the first instance is higher on the table, and the current instance therefore, a duplicate.Let's create an IF statement. If the row is the same as the first instance, return a 0, if not, return a 1.

So we'll start the IF statement...

Copy+paste the "First Instance" calculation...

and compare it to the row...

So that when we save...

We get a new calculation highlighting duplicates over others. Again, let's save this report as new...

Now we don't want to just identify the duplicates. We want to pull them out in a meaningful way. To start this, let's have the duplicates rise in sequence. We could do this by using the running_total function from our math functions...

and ask it to add the 1s from the duplicates...

so that when we save...

...with each new duplicate on the list comes a new number in sequence.

Let's save our work so far as a new report...

So with each new duplicate comes a new integer: 1, 2, 3, etc. Does this remind you of any other sequence? If you guessed rows, you're right!

That means we can ask Analytics, by row, to find

*where*on the table Duplicate 1 occurs, again, using the Match function.We'll start it with brackets...

First, we'll ask it to look at the Row...

...then, we'll copy the running total of duplicates...

...and ask it to look at this sequence...

Now look at the numbers that show up on our table...

It tells us that the first duplicate is in row 298, that the second duplicate is in row 409, that the third is in row 700, and so on...

We're almost done, but once again, let's save our work as new...

Finally, let's use the Index function to pull up these Descriptions so that we can now find them in Lightspeed!

We'll start our Index function with brackets...

Then ask it to look in the Descriptions column...

...for our matched position...

...of rising duplicates.

So now when we save...

We get a list of only duplicated Products!

Let's save our work again as new.

There are a few cosmetic things we can do now to tighten up the report. First, we can remove all but perhaps the final two calculations...

So that gives us more real estate to play with...

Also, let's hide the first set of dimensions from the Visualization...

This is an important step because the Dimensions on the left-hand-side have

*no bearing*to the results on the right...Finally, we could also ask Analytics to look for the System IDs for duplicate products, just by copy+pasting our final calculation...

...but substituting the index column from Description for the the System ID

This way, when we save...

We get one report that finds the results we need for our next steps in Lightspeed!

**Note, this process will work only for the first 5,000 rows of results in Analytics. Remember you can filter results by starting letters of Description, so that if duplicates are not in the same Category or Vendor, you can still identify them!**Calculations Home*Post edited by josephmckeown on June 2018

Joseph McKeown

Lightspeed Analytics and reporting consultant

Lightspeed HQ

Lightspeed Analytics and reporting consultant

Lightspeed HQ

#### Recent Discussions

#### Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

## Quick Links

#### Popular Tags

- api 145
- eCom 78
- retail 77
- eCommerce 44
- integrations 33
- Lightspeed Analytics 31
- Lightspeed 31
- sales 30
- support 21
- apps 21
- ecom api 20
- shipping 20
- Product Development 19
- Lightspeed Payments 16
- theme 16
- Custom Field 14
- workarounds 14
- producten 14
- product 14
- Online Store 13
- receipts 13
- Restaurant 13
- tips 13
- themes 13
- omnichannel 12