I looked into the summary statistics I shared previously and was confused about the difference between “Distinct” and “Unique”, since the two words mean almost the same thing. In one of the datasets, the name column has a “Distinct” value of 740. This means the dataset contains 740 different names.
Then why is the “Unique” value not 740 as well?
Here is the explanation I found, in layman's terms: “Distinct” is the total number of different values, regardless of how many times each one appears in the dataset. A name that appears in the list multiple times counts as 1 distinct value.
The “Unique” value, on the other hand, is the total number of values that appear only once. So there are 740 distinct names in the dataset, and 485 of them have exactly 1 record.
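To make the two definitions concrete, here is a minimal sketch in plain Python. The list of names is made up for illustration and is not from my dataset, and the counting logic is only my reading of what the summary reports, not the tool's actual implementation.

```python
from collections import Counter

# Hypothetical sample data, not taken from the real dataset.
names = ["Alice", "Bob", "Alice", "Carol", "Dave", "Bob", "Alice"]

# Count how many times each name appears.
counts = Counter(names)

# "Distinct": number of different values, however often each one appears.
distinct_count = len(counts)

# "Unique": number of values that appear exactly once.
unique_count = sum(1 for n in counts.values() if n == 1)

print(distinct_count, unique_count)  # prints "4 2"
```

Alice and Bob appear more than once, so they count towards “Distinct” but not towards “Unique”. Only Carol and Dave count towards both.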
How to handle this depends on the business use case. For example, my project specifies that when a client name appears more than once, we keep the first record. After that data cleaning, the “Distinct” and “Unique” values in the summary statistics will be the same.
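A rough sketch of the "keep the first record" rule, again in plain Python with made-up records. The `(name, record_id)` layout is an assumption for illustration only. In pandas, `DataFrame.drop_duplicates` with `keep="first"` should do the equivalent job on a real table.

```python
from collections import Counter

# Hypothetical records as (client name, record id) pairs.
records = [("Alice", 1), ("Bob", 2), ("Alice", 3), ("Carol", 4), ("Bob", 5)]

# Keep only the first record seen for each client name.
seen = set()
cleaned = []
for name, record_id in records:
    if name not in seen:
        seen.add(name)
        cleaned.append((name, record_id))

# Recompute both statistics on the cleaned data.
counts = Counter(name for name, _ in cleaned)
distinct_count = len(counts)
unique_count = sum(1 for n in counts.values() if n == 1)

print(cleaned)                       # [('Alice', 1), ('Bob', 2), ('Carol', 4)]
print(distinct_count, unique_count)  # prints "3 3"
```

Once every name has exactly one record, every distinct name is also a unique one, which is why the two numbers match after cleaning.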