Hacking the engagement ring industrial complex

I’d do anything for love, including using “big data” techniques to analyze engagement ring prices for my girlfriend.  #Hooraybigdata!  In reality,  I used rudimentary tech skills to build a webcrawler that scraped publicly available pricing information off two major retailers’ websites.  Then I analyzed the data to get the best price.  This is my story.

Everybody seems to have an opinion about engagement rings.  Diamond industry marketers prey upon our fear and general lack of information to get us to spend more than would otherwise be prudent.  While an engagement ring can be a nice symbol of your commitment, your choice of engagement ring is a personal decision that is YOURS to make alone, based on your values and budget.  Don’t let anybody tell you otherwise.  It is certainly NOT a sound financial investment.  Like a car, it depreciates as soon as you buy it.

The good news is that diamonds are a commodity, albeit an expensive one.  At the retail level, the market is extremely competitive, a fact that you can use to cut through the marketing bullshit and get a better price.  While your local jeweler may know more than you do about diamonds, he doesn’t control the price of diamonds.  Only the De Beers cartel can do that.  Armed with a little bit of information, you can probably get yourself a better deal.

Step 1: Gather the Data

I needed good data about pricing.  Luckily there are many websites out there with thousands of diamonds to choose from.  Blue Nile and diamondpriceguru are two good examples.  The pricing on these sites is extremely transparent: you enter a couple of parameters and you instantly have access to thousands of diamond prices.  I had heard anecdotally that Blue Nile was the least expensive of any site.  Is this true?  I set out to find out.

I used my rudimentary technical skills to build a webcrawler.  There are many webcrawlers out there, but I decided to use import.io for its simplicity.  After an hour of messing around, I had a dataset of about 5,700 diamonds with data on the 4 Cs (Cut, Color, Clarity, and Carat) from two websites, Blue Nile and James Allen.

It’s worth noting that my budget limited me to the 0.75-1.5 carat range so I didn’t collect data beyond this range.

The 4 Cs Explained

  • Cut: The cut of a diamond determines its sparkle and fire.  The conventional wisdom says that cut is the factor you want to splurge on because it can be readily observed with the naked eye.
  • Color:  The less color, the more you pay.  Very difficult to tell with the naked eye.
  • Clarity:  Almost every diamond has tiny imperfections called inclusions.  Super difficult to see with the naked eye.
  • Carat:  A measure of weight, not size.  This is the most expensive factor of any diamond.

Step 2: Analyze the Data

I fed my dataset into SPSS statistical software for analysis.


    • N=5,731
    • 69 % James Allen
    • 31 % Blue Nile
    • Range: 0.75-1.5 carat
    • Price per carat: mean $6,314 / standard deviation $2,768

The high standard deviation tells me that diamond prices vary a lot.  Variability is good.  Since my goal is to get the best diamond at the best price, a wide range of prices means a better chance of finding a good deal.
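To put a number on "high," here is a minimal sketch that uses only the two figures reported above.  The variable names are mine, and the one-standard-deviation band is just a rough yardstick, not something SPSS reported.

```python
# Figures reported in the summary above (price per carat, in dollars).
mean_ppc = 6314
std_ppc = 2768

# Coefficient of variation: the standard deviation as a fraction of the mean.
cv = std_ppc / mean_ppc
print(f"Coefficient of variation: {cv:.1%}")

# Band covering one standard deviation either side of the mean.
low, high = mean_ppc - std_ppc, mean_ppc + std_ppc
print(f"Mean +/- 1 SD: ${low:,} to ${high:,} per carat")
```

A spread of roughly 44% of the mean is what makes shopping around worthwhile.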

I wanted to understand two things:  (1) what are the major drivers of diamond prices, and (2) If two retailers are selling largely similar commodities, can one be systematically more expensive than the other?

I used regression analysis to answer part 1.  Below is the output of my regression model.

[Image: Model Summary (SPSS regression summary)]

[Image: Regression Coefficients (SPSS regression output)]

I used the 4 Cs, plus the retailer, to predict the price per carat of a diamond.  The high R-squared means that this model is pretty good at explaining price.  The .000 values in the Sig. column indicate that every variable is statistically significant.
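For readers without SPSS, the same kind of fit can be sketched with ordinary least squares in Python.  This is not my actual analysis: the rows below are synthetic and noise-free, generated from the coefficients quoted in the worked example further down, and NumPy is assumed to be installed.  It only shows that a least-squares fit hands back coefficients of this form.

```python
import numpy as np

# Coefficients quoted in the post, in this order:
# constant, cut, color, clarity, carat, source.
true_coefs = np.array([7977.0, -297.0, -704.0, -800.0, 5526.0, 992.0])

# Synthetic stand-in for the scraped dataset (NOT the real data).
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    np.ones(n),                  # intercept column
    rng.integers(1, 4, n),       # cut coded 1..3
    rng.integers(1, 8, n),       # color coded 1..7
    rng.integers(1, 8, n),       # clarity coded 1..7
    rng.uniform(0.75, 1.5, n),   # carat weight within my budget range
    rng.integers(0, 2, n),       # source: 0 = Blue Nile, 1 = James Allen
])
price_per_carat = X @ true_coefs  # noise-free, so the fit is exact

# Ordinary least squares: find coefficients minimizing squared error.
fitted, *_ = np.linalg.lstsq(X, price_per_carat, rcond=None)

names = ["constant", "cut", "color", "clarity", "carat", "source"]
for name, value in zip(names, fitted):
    print(f"{name:>8}: {value:10.2f}")
```

With real, noisy data the fitted values would only approximate the underlying relationship, which is what R-squared and the Sig. column are there to summarize.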

Data Coding:

Cut             Color    Clarity     Source
Ideal (1)       D (1)    FL (1)      Blue Nile (0)
Excellent (2)   E (2)    VVS1 (2)    James Allen (1)
Very good (3)   F (3)    VVS2 (3)
                G (4)    VS1 (4)
                H (5)    VS2 (5)
                I (6)    SI1 (6)
                J (7)    SI2 (7)
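If you want to reproduce this coding outside SPSS, the table translates into a few lookup dictionaries.  This is only a sketch; the names CUT, COLOR, CLARITY, SOURCE, and encode are mine, not anything SPSS produces.

```python
# Lookup tables mirroring the coding table above.
CUT = {"Ideal": 1, "Excellent": 2, "Very good": 3}
COLOR = {grade: code for code, grade in enumerate("DEFGHIJ", start=1)}
CLARITY = {
    grade: code
    for code, grade in enumerate(
        ["FL", "VVS1", "VVS2", "VS1", "VS2", "SI1", "SI2"], start=1
    )
}
SOURCE = {"Blue Nile": 0, "James Allen": 1}


def encode(cut, color, clarity, source):
    """Translate a diamond's text labels into the numeric codes."""
    return CUT[cut], COLOR[color], CLARITY[clarity], SOURCE[source]


print(encode("Excellent", "D", "VVS1", "Blue Nile"))  # (2, 1, 2, 0)
```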

How would you use this model to predict the price of a 1.2 carat diamond with an excellent cut, D color, and VVS1 clarity from Blue Nile?  Using the coefficients above, the predicted price would be:

$7,977 constant – 2 x $297 (excellent cut) – 1 x $704 (D color) – 2 x $800 (VVS1 clarity) + 1.2 x $5,526 (carat) + 0 x $992 (source) = $11,710 per carat.  Multiplying by 1.2 carats gives a predicted price of $14,052.  A quick online search of Blue Nile shows we are in the ballpark.
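The same arithmetic as a small Python function, in case you want to plug in other diamonds.  The coefficients are the ones quoted above; the function name and argument layout are my own, and the inputs are the numeric codes from the Data Coding table.

```python
# Coefficients as quoted in the worked example (dollars per carat).
COEFFICIENTS = {
    "constant": 7977,
    "cut": -297,
    "color": -704,
    "clarity": -800,
    "carat": 5526,
    "source": 992,
}


def predict_price(cut, color, clarity, carat, source):
    """Return (price per carat, total price) for numerically coded inputs."""
    c = COEFFICIENTS
    per_carat = (
        c["constant"]
        + c["cut"] * cut
        + c["color"] * color
        + c["clarity"] * clarity
        + c["carat"] * carat
        + c["source"] * source
    )
    # The model predicts price per carat, so scale by weight for the total.
    return per_carat, per_carat * carat


# Excellent cut (2), D color (1), VVS1 clarity (2), 1.2 carats, Blue Nile (0).
per_carat, total = predict_price(cut=2, color=1, clarity=2, carat=1.2, source=0)
print(f"${per_carat:,.0f} per carat, ${total:,.0f} total")
# $11,710 per carat, $14,052 total
```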

A quick note on cut: I read online that this is the only feature worth splurging on, since it’s the only one you can really see.  My data backs up the cost side of that advice: each step in cut grade moves the price per carat by only $297, the smallest coefficient of the 4 Cs, so cut is the least expensive one to upgrade.  You are better off getting the best cut than a diamond with fewer flaws.

Next, I wanted to test the conventional wisdom that Blue Nile is the least expensive source for diamonds.  The truth is I don’t have enough data to answer this question, since I only looked at 2 stores.  However, I do have one store to compare it to, James Allen.  Since they both sell pretty much the same thing online, and as far as I can tell neither provides a higher level of service than the other, I guessed that they would be about the same price.

To test this, I did a “difference of means test.”  This test is used to determine whether two things are the same, on average.  What is the likelihood that Blue Nile and James Allen have the same prices on average, given my data?  According to the test, extremely unlikely, as evidenced by the .000 Sig. figure.  Not only are they not equal, but James Allen is significantly more expensive than Blue Nile, on average.  My regression output backs this up: the source coefficient puts James Allen about $992 per carat above Blue Nile, holding the 4 Cs constant.

[Image: Difference of Means (SPSS output)]
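For the curious, here is roughly what a difference of means test computes.  This sketch uses Welch’s unequal-variance version, which is an assumption on my part about which row of the SPSS output to read, and the two price lists are made up for illustration.  They are not rows from my dataset.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom) for mean(a) - mean(b)."""
    # Squared standard error of each sample mean.
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical price-per-carat samples, in dollars (illustration only).
james_allen = [6900, 7200, 6500, 7400, 7100, 6800]
blue_nile = [6000, 6300, 5800, 6400, 6100, 5900]

t, df = welch_t(james_allen, blue_nile)
print(f"t = {t:.2f} with about {df:.1f} degrees of freedom")
```

The larger the t statistic in absolute value, the less plausible it is that the two stores share the same average price.  SPSS turns that statistic into the Sig. figure for you.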

Concluding Thoughts

  • Diamonds are an expensive commodity
  • Prices can vary considerably, even for two similar diamonds from different stores
  • Go for cut over anything else.  I crunched the numbers to prove it.
  • You don’t need to go to these lengths to price diamonds (although it’s fun)
  • Do your research online and use it to negotiate a better deal

In the end, I decided not to get a diamond at all.  Who wants to spend a lot of money on a commodity anyway?  Instead, I got a 2 carat sapphire from jewelry startup teamanco. It’s bigger than I could have afforded in a diamond, and very special.  She loves it.


