
Statistical Nerds League (SNL)

This is getting intense. I love it.

I had a thought about rating compression after discussing some of my scores with a few individuals.

I'm currently grading my great shaves in the 9.0 range for closeness. Some have commented that my shaves fall in the 7.5 to 8.0 closeness range on @T Bone's scale.

This got me thinking about rating compression and headroom at the upper end of the scale. I'm wondering if some of us perceive rating steps (7, 8, 9) as being linear ones while others perceive them as being logarithmic.

For subjective assessments like comfort, whether people rate logarithmically may be a consideration. I seem to think logarithmically for comfort.

My statistical knowledge is primitive, and it's been decades since I've thought about this beyond basic concepts.

I'm just throwing it out there.

... Thom
I think that it varies greatly. We like to think linearly because it is easier to picture, but things do not end up that way. Besides, what is a linear shave rating scale even? Neither mine nor anyone else's can probably be classified as an objective linear scale. Even a simple five step scale like 1,2,3,4,5 - (no shave, SAS, CCS, DFS, BBS) is not linear, and we will naturally get rating compression when our shaving skill improves and the great majority of our shaves are at the upper end. Then we may break up DFS and BBS into many more levels. Well, now it is even less linear.

Your comment regarding a logarithmic scale made me chuckle. Maybe we should have ratings in dB. We can set the reference on SAS. dBSAS = 20 * log10(<rating>/<SAS rating>), that ought to do it!

SAS = 1
CCS = 10
DFS = 100
BBS = 1000

Now we can have a lot of levels! Today I had a 572 shave rating -> 55.1 dBSAS 🤣🤣
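
For anyone who wants to play along, here is a minimal sketch of that dBSAS arithmetic. The function name `db_sas` is made up for illustration, and it assumes "log" above means the base-10 logarithm, as is usual for decibels.

```python
import math

def db_sas(rating, sas_rating=1):
    """Express a shave rating in decibels relative to the SAS reference rating."""
    return 20 * math.log10(rating / sas_rating)

# The four anchor levels from the post, plus the 572 example.
for label, rating in [("SAS", 1), ("CCS", 10), ("DFS", 100), ("BBS", 1000), ("today", 572)]:
    print(f"{label}: {db_sas(rating):.1f} dBSAS")
```

Each tenfold step in rating adds 20 dBSAS, which is the whole point of the joke: the top of the scale gets plenty of headroom.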
 
Wohooo Thom you got the #100 post!
 
Jim,

I have read your design version 3 and your structure is impressive. I see some duplications but you already mentioned those were deliberate.

I have asked the moderators in the past whether it was possible to get a download of the SOTD thread to build that database, but unfortunately the structure of the forum doesn’t allow that without significant effort on their part.

Your approach is the next best thing and a daunting task to say the least.

You do pose the million dollar question of whether people will indeed use it, as the sheer number of variables to be assessed is extensive, even when populated. I am all for data analysis and I like it even for its own sake, because in all honesty I have never actually done anything with my statistics other than checking razor usage. I have favourite blades, but for the past year I have been blade agnostic and simply shave with the blade of the week, so to speak.


It definitely might, yes, and it would be an idea to get these handled if we can, but I also think circumstances might be very different, even if minutely, between shaves purely because of the operator. For example I load for 15 counts before face lathering. Is my count even throughout? Or did I inadvertently speed up this morning? Do I apply consistent and constant pressure when loading or even when face lathering?

So I see much benefit in finding internal validity in the data. Given the subjective nature of YMMV I can also hypothesise that IRR (inter-rater reliability) might not be applicable because the underlying events are less similar than expected. ICC (the intraclass correlation coefficient) tries to counter that effect by calculating IRR based on one observation. There is a paper available on the calculations. You might want to check it out to see whether it makes sense for shaving data.

This would be super cool. You could do HLM on individual or razor (I think) and if we populate the database from across the globe it would make even more possible.

Just some first thoughts.

Cheers,

Guido
Thank you for those very thoughtful insights. Doing qualitative research is always a shot in the dark (as seen by the irreproducibility of about half the psychology research papers). I console myself with the thought that in the worst case I will have shown that the wet shaving world is chaotic as opposed to complex (which I think the techniques I've identified can handle) and will save others from wasting time.

If a sufficient number of people decide to use the diary (I have to make it really appealing), at a shave a day the law of large numbers kicks in and the analysis gets better (across 1,000 observations, 1 to 2 second variations in beard soak time become immaterial).

You make a comment that matches an observation I have had while lurking on the boards. As shavers become more experienced, they become more indifferent towards razor and blade. Not that they do not have a preference, it's just an increasingly insignificant choice. Kind of like the Zen "Chop wood, carry water" with equanimity towards axe and bucket.
 
I'm wondering if some of us perceive rating steps (7, 8, 9) as being linear ones while others perceive them as being logarithmic.

Even a simple five step scale like 1,2,3,4,5 - (no shave, SAS, CCS, DFS, BBS) is not linear and we will have rating compression naturally when we improve our shaving skill and the great majority of our shaves are at the upper end.
I believe we are using what is known as a Likert scale, after Rensis Likert, who did research on the distances between scores on a linear scale - we assume equal distances where in actuality these are likely to be different. It is mostly used in questionnaires but I think it might also apply here.

I have some linear progressions in my scoring from DFS+ to DFS+++ but other than that I try to minimize having decimals (a rule I violate by the way).

What does come to my mind is that I would suggest keeping it simple, for lightweight analysis and some substantiation of thoughts and hunches, rather than going for analysis paralysis and beating the fun out of shaving for the sake of accuracy.

I would love a regression model to match razor and blade but I wonder how long I will continue using it once it’s in place.

Cheers,

Guido
 
Yes, they are Likert scales. They work best when each level has a defined measurable meaning. If you look at the shave diary information model you'll see I avoid BBS et al and give specific measurable tests for each level. It's one way to handle the relativism that pops up over time.
 
Hi All

I've posted V4 of the Shave Diary information model here. It is likely final for first round of development. Feel free to take a look and ping me if you see anything majorly wrong. Thanks for all the great input.

 
Ok - so I let my inner geek come out and play some more.

Not only does our data suffer from "hard to compare to each other"-ness, it is also majorly skewed - because I don't know about you, but my shave results tend not to be normally distributed, which I consider to be a great thing in this case! That does pose problems, however, if you want to do additional analyses, like Jim @Stikeyoda is trying to achieve with regressions and modelling.

Just to give you an idea, this is the graph of my end results (i.e. BBS, DFS, et cetera):

1716488670004.png


It is negatively skewed, meaning I have more observations to the right than I do to the left of my average / mean. Z-scores will make it somewhat easier, but as the Z-score revolves around the mean, the skewness of my population is not (fully) addressed by such a transformation.

Fortunately there are other techniques to achieve that. It is a bit of trial and error, because you need to check what it does to your distribution. Technically I should be looking at residuals, but I am avoiding statistical software as much as I can at this point - it is for fun and not to complete a second PhD (I am perfectly fine with the one I already have). One that worked on my data set of 651 entries was using an inverse formula (which is related to the negative skewness - if your population is positively skewed, so towards the left rather than right, don't invert) - namely:

Transformed score = 1/ ((MAX(H2:H651)+1)-H2), where

H2:H651 is the range of the scores you are transforming and H2 is the individual score you want to transform.
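
Outside a spreadsheet, the same formula can be sketched in a few lines of Python. This is only an illustration: the function name `reflected_inverse` and the sample scores are made up, and the list stands in for the H2:H651 column.

```python
def reflected_inverse(scores):
    """Apply 1 / ((max + 1) - score) to every score, mirroring the spreadsheet formula."""
    top = max(scores)
    # (top + 1) - s is at least 1 for every score, so there is no division by zero.
    return [1 / ((top + 1) - s) for s in scores]

# Hypothetical end-result scores, bunched at the top like a left-skewed diary.
scores = [6, 8, 9, 9, 10, 10, 10]
for original, transformed in zip(scores, reflected_inverse(scores)):
    print(original, round(transformed, 3))
```

Because the reflection and the inverse each flip the order once, a higher original score still ends up with a higher transformed score.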

What this formula does is basically offsetting your individual score to the maximum score in your population, thus making it relative to that. If I apply that my graph is as follows:

1716489141438.png


Still not ideal, but much more spread. If I then add my weights to the distribution (Quality*3, Comfort*2 and Effectiveness*4) using these inversed scores, I get the following graph:

1716489355351.png


A much, much more even spread across my datapoints, making it easier to do statistical analyses and even regressions if I wanted to. Funny thing is that now my spreadsheet tells me what my gut already knew (my current top 5 razors - I selected only those with 10 shaves or more):
  1. Blackbird
  2. Lambda Athena
  3. Karve OC C
  4. Mühle Rocca R94
  5. Ti95
In terms of blade performance, my current top 5 is:
  1. Rubie Plus
  2. Gillette Stainless
  3. Nacet
  4. KCG
  5. GSB

Or was I just tossing my data around to get to this list?! Haha! Either way, I had fun!

Cheers,

Guido.
 
I think this is another way to go, since we certainly do not have normally distributed data. It is heavily left skewed. I graphed some histograms of my data and indeed it is also heavily skewed. I had to read up a bit on transformations to try to make it more like a normal distribution, and it was a pretty cool read, especially on the different degrees of skewness: sqrt, log10, ln and inverse transformations, and also doing a reflection on left skewed data. If I read your transformation correctly, you performed a reflected inverse transformation.

I guess one way to check whether you did well is how close the mean and median of a dataset are to each other. If they are far apart, that is an indication that you are skewed to one side or the other.
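
A rough sketch of that check, using only the Python standard library. The function name `skew_summary` and the sample scores are made up; the third number it returns is the usual moment-based skewness coefficient, which I have added as a second opinion next to the mean/median comparison.

```python
from statistics import mean, median, pstdev

def skew_summary(scores):
    """Return (mean, median, skewness); negative skewness means a longer left tail."""
    m = mean(scores)
    med = median(scores)
    sd = pstdev(scores)
    if sd == 0:
        return m, med, 0.0
    # Average cubed z-score: the population (moment) skewness coefficient.
    skew = sum(((x - m) / sd) ** 3 for x in scores) / len(scores)
    return m, med, skew

# Hypothetical scores bunched at the upper end of the scale.
m, med, skew = skew_summary([5, 8, 9, 9, 10, 10, 10])
print(f"mean={m:.2f} median={med} skewness={skew:.2f}")
```

A mean below the median together with negative skewness points to left skew; after a successful transformation both gaps should shrink towards zero.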

When you combined your data you did that after the transformation I assume?
 
I'm looking at different statistical techniques as they can influence how I set up and structure the data in the shared shave diary I am working on. Some of what I have come up with may help.

Subjective ratings data like the shave diary responses often tend to be skewed; they are not evenly distributed around the mean, as you have discovered. Here are some statistical techniques well-suited for analyzing skewed data:

Non-Parametric Tests: These tests don't rely on assumptions about the underlying data distribution, making them robust to skewness.

Examples:

Mann-Whitney U test (compares two independent groups) -
The Mann-Whitney U test assesses whether ratings in one of two independent groups tend to be higher than in the other (often read as a difference in medians). In your case:

Independent Variable (Grouping Factor):
You can use this test to compare two groups based on a binary characteristic, such as:
  • Shaving Soap Type (e.g., "Soap" vs. "Cream")
  • Blade Coating (e.g., "Coated" vs. "Uncoated")
  • Water Softener Usage (e.g., "Yes" vs. "No")
  • Plate (e.g., OC vs. SB)
  • Any other binary variable you collect (remember anything can be turned into a binary variable - this and not this).
Remember, the two groups must be independent (a user's response in one group shouldn't influence their response in the other group).
Dependent Variable (Outcome): You can apply the Mann-Whitney U test to any ordinal rating scale (smoothness, comfort, overall satisfaction).
Example Scenario (from what I am working on):
Suppose you want to determine if there's a significant difference in "Overall Satisfaction" ratings between when you used pre-shave oil and when you didn't.
  1. Group Data: Divide your shave diary entries into two groups: "Pre-Shave Oil Used" and "No pre-shave oil"
  2. Rank Data: Assign ranks to all the "Overall Satisfaction" ratings across both groups combined.
  3. Calculate Test Statistic (U): The Mann-Whitney U test calculates a test statistic (U) based on the sum of ranks in each group.
  4. Determine Significance: Compare the calculated U value to critical values from the Mann-Whitney U distribution table (or use software) to determine if the difference is statistically significant.

Interpreting Results:
Null Hypothesis: The null hypothesis is that there's no difference in "Overall Satisfaction" ratings between the two groups.
Significant Result: If the p-value is less than your chosen significance level (e.g., 0.05), you can reject the null hypothesis, concluding that there's a statistically significant difference in satisfaction ratings between the two groups.

Advantages of Mann-Whitney U Test:
Non-Parametric: Suitable for ordinal data (Likert scales - which is why I am using them) and doesn't assume normal distribution (which we know our data won't be).
Robust to Outliers: Less sensitive to outliers than parametric tests like t-tests.
Ease of Interpretation: The results are easy to understand and communicate.

Considerations:
Independence: The two groups must be independent (not paired data).
Sample Size: The test works best with a reasonable sample size (at least 10-15 observations per group).
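
The group / rank / calculate / determine steps above can be sketched in plain Python. Treat this as an illustration rather than a reference implementation: the function names and the ratings are made up, and the p-value uses the normal approximation with a tie correction and no continuity correction, which is only reasonable at roughly the sample sizes mentioned above.

```python
from math import erfc, sqrt

def midranks(values):
    """1-based ranks over all values; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Return (U_a, U_b, two-sided p) using a tie-corrected normal approximation."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    combined = list(a) + list(b)
    ranks = midranks(combined)                 # rank both groups together
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2   # U from the rank sum of group a
    u2 = n1 * n2 - u1
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    sigma = sqrt(n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1))))
    z = (u1 - n1 * n2 / 2) / sigma if sigma > 0 else 0.0
    return u1, u2, erfc(abs(z) / sqrt(2))

# Hypothetical "Overall Satisfaction" ratings, invented for illustration.
with_oil = [8, 9, 7, 9, 8, 9, 10, 8, 9, 9]
without_oil = [7, 8, 6, 8, 7, 9, 7, 8, 6, 7]
u_a, u_b, p = mann_whitney_u(with_oil, without_oil)
print(f"U={min(u_a, u_b)} p={p:.4f}")
```

If SciPy is available, `scipy.stats.mannwhitneyu` does the same job and handles small samples more carefully.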

Kruskal-Wallis test (compares three or more groups) -
Let's say you want to investigate if different razor types lead to significantly different smoothness ratings (WTG or ATG, for example). You would do the following:
  1. Group Data: Divide your shave diary entries into groups based on razor type (e.g., SE Razor group, DE Razor group).
  2. Rank Data: Assign ranks to all the smoothness ratings across all groups combined.
  3. Calculate Test Statistic (H): The Kruskal-Wallis test calculates a test statistic (H) based on the sum of ranks within each group.
  4. Determine Significance: Compare the calculated H value to a critical value from the chi-squared distribution to determine if the differences between groups are statistically significant.

Interpreting Results:
Null Hypothesis: The null hypothesis is that there's no difference in smoothness ratings across razor types.
Significant Result: If the p-value is less than your chosen significance level (e.g., 0.05), you can reject the null hypothesis, concluding that there are significant differences in smoothness ratings between at least two of the razor types.

Post-Hoc Tests:
If you find a significant result, you'll need to conduct post-hoc tests (like Dunn's test or Conover-Iman test) to determine which razor types differ significantly.
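
Here is a rough plain-Python sketch of the H calculation described above. The function names, the three razor groups and their ratings are all invented for illustration. The p-value line only works for exactly three groups, because a chi-squared distribution with 2 degrees of freedom has the simple tail exp(-H/2); for other group counts you would look H up in a chi-squared table or use software.

```python
from math import exp

def midranks(values):
    """1-based ranks over all values; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Return (H, degrees of freedom) with the standard correction for ties."""
    combined = [v for g in groups for v in g]
    n = len(combined)
    ranks = midranks(combined)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction > 0:
        h /= correction
    return h, len(groups) - 1

# Hypothetical smoothness ratings for three razor types.
se = [8, 9, 9, 7, 8, 9]
de = [7, 8, 8, 9, 7, 8]
cartridge = [6, 7, 6, 8, 7, 6]
h, df = kruskal_wallis_h([se, de, cartridge])
print(f"H={h:.3f} df={df} p={exp(-h / 2):.4f}")  # exp(-H/2) is valid for df == 2 only
```

If SciPy is available, `scipy.stats.kruskal` returns H and the p-value for any number of groups.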

Wilcoxon signed-rank test (compares paired samples) - might be informative. More applicable to me as you would need pairs of individuals, but you could use yourself on bad days versus good days (I know I'm like a different person then :) )

Transformations:

You already applied an inverse transformation; you might also want to consider the following:
Log transformation (for right-skewed data; left-skewed scores like ours need to be reflected first)
Square root transformation (for moderately right-skewed data)
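
A small sketch of how these could be tried on left-skewed ratings. The helper names and the scores are made up. Note that reflecting flips the order, so after the transform a lower value means a better shave.

```python
import math

def reflect(scores):
    """Mirror scores around the maximum so a left skew becomes a right skew (minimum 1)."""
    top = max(scores)
    return [(top + 1) - s for s in scores]

def log_transform(values):
    """Base-10 log; values must be positive, which reflect() guarantees."""
    return [math.log10(v) for v in values]

def sqrt_transform(values):
    """Square root, a milder correction than the log."""
    return [math.sqrt(v) for v in values]

# Hypothetical left-skewed ratings.
scores = [5, 8, 9, 9, 10, 10, 10]
reflected = reflect(scores)
print(reflected)
print([round(v, 3) for v in log_transform(reflected)])
print([round(v, 3) for v in sqrt_transform(reflected)])
```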

Generalized Linear Models (GLMs):

GLMs extend linear regression to handle non-normal response variables.
For my Likert-scale data, ordinal logistic regression would be appropriate. I suspect your data is somewhat similar.
This model accounts for the ordered nature of the ratings and doesn't assume normality.

Bootstrapping (if you've got a lot of time on your hands):

Bootstrapping is a resampling technique that can provide robust estimates of standard errors and confidence intervals even with skewed data. It involves repeatedly sampling your data with replacement and recalculating the statistic of interest.
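
As a minimal sketch of the idea, here is a percentile bootstrap for the median using only the standard library. The function name, its defaults (2000 resamples, a fixed seed for repeatability) and the ratings are my own choices for illustration, not anything agreed in this thread.

```python
import random
from statistics import median

def bootstrap_ci(data, stat=median, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical comfort ratings.
ratings = [7, 8, 8, 9, 9, 9, 10, 10, 9, 8, 10, 9]
low, high = bootstrap_ci(ratings)
print(f"median={median(ratings)} 95% CI=({low}, {high})")
```

On a modern laptop a few thousand resamples of a diary-sized data set run in well under a second, so the time cost is smaller than it sounds.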

Hope this gives you some ideas.
 
Here is the current state of the Shared Shave Diary Design.

There will be two parts. First is the Firestore database where there is a Research Collection which contains all the product level information (razors, plates, blades, brushes, software, etc.) and a Persons Collection which contains the user's details, inventory and personal shave diary. I've attached the current design of the Firestore database.

Whenever a user posts a diary entry, their diary is immediately updated and an anonymized duplicate is entered into a SharedDiaryMemo Collection. The personal diary is correct in real time and contains links to the Research Collection. The SharedDiaryMemo Collection contains an expanded record where links are replaced with the corresponding text in a comma delimited format. All comment fields are quoted (quotes will not be allowed in comment fields). Probably once a week a cron job will read the SharedDiaryMemo Collection and post those records/documents to a flat file in Google Cloud Storage which anyone will have read access to for research or AI training.

Users will have a hash key that they can use to identify their entries in the public flat file so they can analyze their shaves versus others or just pull their shaves for their own analysis.
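
To make the idea concrete, here is one possible sketch of the hash key and the flat-file record. None of this is taken from the design document: the HMAC-SHA256 approach, the field names, the secret and the sample entry are all assumptions for illustration. It quotes every text field rather than only the comments, which is a superset of what is described above.

```python
import csv
import hashlib
import hmac
import io

def user_hash_key(user_id, secret):
    """Derive a stable pseudonymous key; without the secret it cannot be recomputed from the ID."""
    return hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()

def to_flat_record(entry, key):
    """Build one comma-delimited line with text fields quoted and numbers left bare."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    comment = entry["comment"].replace('"', "")  # quotes are not allowed in comments
    writer.writerow([key, entry["date"], entry["razor"], entry["blade"],
                     entry["closeness"], comment])
    return buf.getvalue().rstrip("\n")

# Hypothetical user, secret and diary entry.
key = user_hash_key("some-user-id", b"server-side-secret")
entry = {"date": "2024-05-23", "razor": "Blackbird", "blade": "Nacet",
         "closeness": 9, "comment": "Great shave, no irritation"}
print(to_flat_record(entry, key))
```

A user who knows their key can then filter the public file on the first column to pull out only their own shaves.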

I would appreciate anyone looking at the current design and commenting (what's left out).
 

Attachments

  • Firestore Design Document v5.pdf
    162.2 KB · Views: 7
Version 7 is finished and I am pretty much ready to start implementing. This is pretty complete. I really would appreciate any reviews and comments that anybody can spare - both for the content and the design. Thanks in advance!
 

Attachments

  • Firestore Design Document v7.pdf
    250 KB · Views: 3
Jim,

Looks very complete and thorough. I am not a coder (at least not since my teens and that was only BASIC), but I can follow the translation from your previous tables to this document. I only have one observation at this point. The price chart on the bottom of page 4 (pertaining to brushes) is a copy of the blade one. I think you either meant to adjust the price points and still need to do that, or possibly meant to copy the razor price table instead.

Good luck!!

Guido
 
Thanks for taking the time! It was late so you're likely correct. They're just meant to be starting points, as they will update as people add items to their inventory. I do need to make that one a better start.
 