"A bottle of wine contains more philosophy than all the books in the world"
- Louis Pasteur.
Tasting and enjoying wine is one thing; understanding what makes a wine taste good is another.
One of the joys of analytics is that you can pick one of your favorite themes, explore the reasons behind it, and uncover factors that are not always evident.
With that in mind, today's post is about Wine & Analytics.
That's right! Let's do the following exercise, using a very exploratory approach:
- Evaluate the influence of physicochemical properties on wine quality and type. Are there any differences between red and white wines?
- Does the perception of overall quality (measured by the number of points Wine Enthusiast awards each wine) vary according to wine type?
To address the first point, we use a sample of 1599 red wines and 4898 white wines collected in the study by Cortez et al. For the second point, we scraped data from Wine Enthusiast Magazine (http://www.winemag.com/) and kept only the reviews of Portuguese wines. From a total of 4875 Portuguese wines reviewed by connoisseurs, we took a subset of 1134 Douro wines to perform our analysis.
1. Evaluate the influence of physicochemical properties on quality and type. Are there any differences between Red and White Wines?
One of the goals of this post is to evaluate the influence of physicochemical properties on wine quality and type. For this, we used a sample of red and white variants of the Portuguese “Vinho Verde” wine: 1599 red wines and 4898 white wines. For more details, check the source: P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.
The target variable, quality, is a sensory score between 0 and 10 given by specialists. Let's check the distribution of the quality ratings.
Although this is a complex relationship to study, let's see whether we can draw some interesting conclusions from this dataset.
Regarding the physicochemical properties of red wine, the ones most correlated with quality are alcohol (r = 0.476) and volatile acidity (r = -0.391).
Red Wine Correlation Matrix
Alcohol level seems to be the physicochemical characteristic most related to quality. Volatile acidity, measured in grams of acetic acid per litre (g/L), gives wine an unpleasant vinegar taste when present at high levels. For red wine, quality tends to increase as volatile acidity decreases.
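The coefficients quoted above can be read as Pearson correlations between each property and the quality score (the post does not name the method, so Pearson is an assumption). A minimal sketch in plain Python follows; the toy rows are made up for illustration and are not taken from the Cortez et al. data. With pandas, `df.corr()["quality"]` would give Pearson coefficients for every numeric column at once.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Covariance and variances without the 1/(n-1) factor, which cancels out in r
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def correlations_with_quality(rows, target="quality"):
    """Correlate each property with the target column, strongest (by |r|) first."""
    quality = [row[target] for row in rows]
    features = [name for name in rows[0] if name != target]
    r = {f: pearson_r([row[f] for row in rows], quality) for f in features}
    return dict(sorted(r.items(), key=lambda item: abs(item[1]), reverse=True))

# Hypothetical toy rows (not real measurements)
rows = [
    {"alcohol": 9.0, "volatile acidity": 0.8, "quality": 4},
    {"alcohol": 10.0, "volatile acidity": 0.5, "quality": 5},
    {"alcohol": 11.0, "volatile acidity": 0.7, "quality": 6},
    {"alcohol": 12.0, "volatile acidity": 0.3, "quality": 7},
]
print(correlations_with_quality(rows))
```

Sorting by absolute value puts strong negative correlations, like volatile acidity for red wine, next to strong positive ones such as alcohol.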
White Wine Correlation Matrix
Regarding white wine, quality is again most correlated with alcohol (r = 0.436), and it also shows a noticeable correlation with density (r = -0.307). Interestingly, volatile acidity does not seem to be strongly related to quality in white wine. As for density, a wine's density (g/cm^3) depends on its percentage of alcohol and its amount of sugar: more sugar makes a wine denser, while more alcohol, which is lighter than water, makes it less dense. This is consistent with the alcohol result: in our sample, perceived quality decreases as density increases.
We grouped the quality ratings into classes and compared the trend for white and red wines: low (below 5), medium (5 and 6), and high (above 6). Using these classes, we can see a clear distinction in alcohol level between the high-quality wines and the remaining ones. The differences between groups were statistically significant (p-value < 0.05) for both white and red wines.
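The quality classes described above can be sketched as follows. Quality scores are integers from 0 to 10, so "medium" covers exactly 5 and 6. The helper names are hypothetical and the toy rows are invented. The post does not say which test produced the p-values; a one-way ANOVA such as `scipy.stats.f_oneway` over the per-class alcohol values is one common option, not necessarily the one used here.

```python
from collections import defaultdict
from statistics import mean

def quality_class(score):
    """Bucket a 0-10 quality score: low (< 5), medium (5-6), high (> 6)."""
    if score < 5:
        return "low"
    if score <= 6:
        return "medium"
    return "high"

def mean_by_class(rows, feature="alcohol"):
    """Average a physicochemical property within each quality class."""
    groups = defaultdict(list)
    for row in rows:
        groups[quality_class(row["quality"])].append(row[feature])
    return {cls: mean(values) for cls, values in groups.items()}

# Hypothetical toy rows (not real measurements)
rows = [
    {"quality": 4, "alcohol": 9.0},
    {"quality": 6, "alcohol": 10.0},
    {"quality": 8, "alcohol": 12.0},
    {"quality": 7, "alcohol": 13.0},
]
print(mean_by_class(rows))  # {'low': 9.0, 'medium': 10.0, 'high': 12.5}
```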
2. Does the perception of overall quality vary according to wine type?
As mentioned, for this second point we scraped data from Wine Enthusiast Magazine and selected a sample of n = 1134 Portuguese Douro wine reviews.
This dataset is divided into 913 red wine reviews (~80%) and 221 white wine reviews (~20%).
The points awarded range from a minimum of 81 to a perfect score of 100, with an average of 89.14 points.
Red wines show greater variation in points, but it also seems that the most highly rated wines are red.
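A sketch of the per-type summary behind these observations, using the sample standard deviation as the measure of variation (an assumption; the post does not say how variation was measured). The review dictionaries and their `type`/`points` keys are a hypothetical layout for the scraped data, and the values are invented.

```python
from statistics import mean, stdev

def summarize_points(reviews):
    """Count, range, mean and sample standard deviation of points per wine type."""
    by_type = {}
    for review in reviews:
        by_type.setdefault(review["type"], []).append(review["points"])
    return {
        wine_type: {
            "n": len(points),
            "min": min(points),
            "max": max(points),
            "mean": mean(points),
            "stdev": stdev(points),  # needs at least two reviews per type
        }
        for wine_type, points in by_type.items()
    }

# Hypothetical reviews (not real scraped data)
reviews = [
    {"type": "red", "points": 85},
    {"type": "red", "points": 90},
    {"type": "red", "points": 95},
    {"type": "white", "points": 88},
    {"type": "white", "points": 89},
]
for wine_type, stats in summarize_points(reviews).items():
    print(wine_type, stats)
```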
What is the Relationship Between Price and Points Given?
This is a complex relationship that several studies have addressed (Ashenfelter (2008), Shewbridge (1998), Reuter (2000)). Most of these studies are what economists call "hedonic pricing" analyses. That is, the "price of good or service depends both on internal and external factors".
According to Miu (2001), when looking for a good wine, the uninformed consumer, who has limited knowledge of wine types and quality, will often set a price floor on the amount he is willing to pay. Our results show a positive relationship between the points given and the wine price (r = 0.6).
From this chart we can identify some outliers, or should we say, overpriced wines. Let's filter out the wines priced above 100 and see how it looks.
Splitting the analysis by type, the correlation coefficient is similar for both types of wine (r = 0.59 and r = 0.61).
Still, it becomes clearer that some wines seem over-priced and others under-priced given their quality level, especially in the white wine trend.
Of course, this is a first, simplistic approach to a complex problem, but we can see that there is some variation in price within a given quality score. This is a clue that perceived quality depends on more than price alone.
Price and quality are related, but only to a degree. For further analysis, it would be interesting to add extra information, such as a regional effect, grape variety, alcohol level, etc.
Additional Exploration - Common Words
Taking advantage of having the reviews extracted, we decided to go a step further and do a quick exploration of the most common words in the reviews of red and white wines.
OK, it's evident that the two types differ in this respect as well, which makes total sense from both the consumer and the expert perspective. ;)
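Counting the most common words can be sketched with `collections.Counter`, run once per wine type to compare reds and whites. The tokenizer (lowercase, letters and apostrophes only) and the short stopword list are simplifying assumptions; a fuller stopword list would give cleaner results. The sample descriptions are invented.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword list
STOPWORDS = {"a", "an", "and", "the", "of", "with", "this", "is", "it", "its",
             "in", "to", "for", "on", "by"}

def top_words(descriptions, n=10, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword tokens across review texts."""
    counts = Counter()
    for text in descriptions:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(token for token in tokens if token not in stopwords)
    return counts.most_common(n)

# Hypothetical review texts (not real Wine Enthusiast reviews)
red_reviews = [
    "Black fruits and firm tannins.",
    "Ripe black fruits with a structured, tannic finish.",
]
print(top_words(red_reviews, n=3))
```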
We divided the ratings into high (above 95), medium (90 to 95), and low (the rest), just for the sake of understanding, and found an interesting result regarding the number of words an expert spent on each wine and the points given.
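The points classes and the word count per review can be sketched as follows. Words are counted with a plain whitespace split, an assumption since the post does not say how words were counted; the field names and sample reviews are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def points_class(points):
    """Bucket Wine Enthusiast points: high (> 95), medium (90-95), low (< 90)."""
    if points > 95:
        return "high"
    if points >= 90:
        return "medium"
    return "low"

def mean_words_by_class(reviews):
    """Average number of words per review description in each points class."""
    groups = defaultdict(list)
    for review in reviews:
        groups[points_class(review["points"])].append(len(review["description"].split()))
    return {cls: mean(counts) for cls, counts in groups.items()}

# Hypothetical reviews (not real Wine Enthusiast text)
reviews = [
    {"points": 97, "description": "Dense, complex and profound, with layers of dark fruit."},
    {"points": 92, "description": "Ripe fruit and firm tannins."},
    {"points": 86, "description": "Light and fruity."},
]
print(mean_words_by_class(reviews))
```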
Is this also telling us something? Are the points given related to the number of words the connoisseur puts into a review? Do these conclusions hold for other wine samples? Hmm, more things to explore in future posts, maybe ;) using our portal.
But one thing is sure: the analytics behind this theme is a world with many layers to explore.
Speaking of which... how are the brands being reviewed positioned in this analysis? How can we use analytics to choose a bottle of wine?
See you later,