Creating a Shared Razor & Blade Research Database Shave Journal

Hi All, This started as a study of Chinese blades. As I worked on how to gather and analyze the data, I realized that with a little more work, it could be turned into something more valuable as a research tool and more useful to members of Badger & Blade.

Attached is a PDF diagram of the second cut of a Shave Research Database that emerged from the China blade project. At a high level, there are 5 tables:
  • Diary - a transactional table that records individual instances of a shaving experience
  • Blade - detailed information about specific razor blades
  • Razor - detailed information about specific razors
  • Company - detailed information about the companies that produce the razors and blades in their respective tables
  • Person - detailed information about the environmental conditions of the shaving experience (age, beard, skin, water, etc.) but no other personal information
Whenever a new person comes into the database, they create their own user ID, which is then one-way hashed to a key (Generated ID in the diagram) that is used to record and retrieve their specific information in the database. This lets anyone access the Diary table for research purposes while still letting specific individuals maintain a private personal diary.
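
As a rough illustration of the Generated ID idea, here is a minimal sketch that derives a stable one-way key from a user-chosen ID. The salt value, the normalization step, and the function name `generated_id` are assumptions made for this example and are not taken from the design document.

```python
import hashlib

# Hypothetical application-wide salt (an assumption for this sketch);
# a real deployment would keep it secret and out of source control.
APP_SALT = b"shave-diary-demo-salt"


def generated_id(user_id: str) -> str:
    """Derive a stable one-way key from a user-chosen ID."""
    # Normalizing case and whitespace is an assumption, so that
    # "BadgerFan" and " badgerfan " map to the same diary.
    normalized = user_id.strip().lower().encode("utf-8")
    digest = hashlib.pbkdf2_hmac("sha256", normalized, APP_SALT, 100_000)
    return digest.hex()
```

The same ID always yields the same key, so it can be used for lookups, while the key itself cannot be reversed into the ID. A slow key-derivation function is used here rather than a plain hash to make guessing IDs more expensive.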

Three regular monthly reports, available to anyone, will be posted for download from the Diary:
  • Razor Summary of Outcome Distributions - a spreadsheet sorted by razors recorded in the Diary and the outcomes recorded. It is provided as a sheet so people can sort by razor, skin, and beard parameters, regardless of the blade being used.
  • Blade Summary of Outcome Distributions - the same as the above, but by blade regardless of the razor.
  • Razor by Blade Summary of Outcome Distributions - a combination of the above.
Additionally, those who contribute may download their personal spreadsheet of diary transactions and scroll through it anytime. Query capability will depend upon final implementation.
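
The three summaries could all come from one grouping routine. The sketch below assumes each diary row carries a `razor`, a `blade`, and a single `outcome` score from 1 to 5; those field names and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def outcome_distribution(diary_rows, group_fields):
    """Count how often each outcome score (1-5) occurs per group."""
    table = defaultdict(Counter)
    for row in diary_rows:
        key = tuple(row[field] for field in group_fields)
        table[key][row["outcome"]] += 1
    # One fixed-width list of counts per group, ordered by score 1..5.
    return {
        key: [counts[score] for score in range(1, 6)]
        for key, counts in sorted(table.items())
    }


# Made-up sample rows, for illustration only.
diary = [
    {"razor": "Razor A", "blade": "Blade X", "outcome": 4},
    {"razor": "Razor A", "blade": "Blade Y", "outcome": 5},
    {"razor": "Razor B", "blade": "Blade X", "outcome": 4},
    {"razor": "Razor A", "blade": "Blade X", "outcome": 4},
]

by_razor = outcome_distribution(diary, ["razor"])
by_blade = outcome_distribution(diary, ["blade"])
by_razor_and_blade = outcome_distribution(diary, ["razor", "blade"])
```

Adding skin or beard parameters to `group_fields` would give the finer breakdowns mentioned for the spreadsheet.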

Contributors to the shave research database will also have a blade inventory capability.

Individuals can contribute their experiences without creating a personal diary and a user ID. They can optionally complete all of the personal information each time they contribute to the diary. Those with a user ID will have their "environment" information (skin, beard, etc.) automatically loaded, and their selection lists for razors and blades will be drawn from their inventory.

I would appreciate any feedback you may have on the attached high-level design. Specifically, what am I missing that would be useful to you both in terms of a real shave diary and in terms of what you would like to know/learn about people's shaving experiences with products? I also need input into what needs to be recorded in terms of pre-shave and post-shave activities and products.

Right now, I am experimenting with Google Forms + Sheets and SharePoint + PowerApps. I would appreciate any recommendations here as well.

Any techies out there, feel free to jump in!

Thanks
 

Attachments

  • Shave Research Database Design V2.pdf
    3.6 MB · Views: 48
A few thoughts on this:
- What do you envision as the primary research goals for someone using the dataset? I think this would be one way to identify gaps in the data and its structure.

- Are you hoping this would also allow individuals to catalog their shave of the day down to exact equipment and products? If so, there are typically some inherent challenges in reaching a model that works for most people in the way they prefer. In my experience with applications like this, you eventually run into the tension between how the platform wants to organize and record data and the varying ways an individual prefers to catalogue and organize data, and at what granularity.

- Do you see this offering a way for data contributors to ask questions and get reports about their personal history of shaves (e.g., how many times did I use blade X, or what was my most used soap in year Y)? I also find that this becomes a challenge once you move past simple queries. Inevitably you discover that the platform can't answer the question because the data was not captured or organized in a way to easily provide the answer. This is the primary reason I normally just build my own database: I have complete control over data structuring and running complex queries. But I also know most people will not be comfortable doing that.

- How are you thinking about data attributes that can change over time? One of the problems you might encounter is how to store transactional data that is related to some of your current fixed tables. For example, what if someone moves and needs to change the attributes of their water, or maybe they just traveled somewhere for a few weeks? Or maybe their skin sensitivity changes as they age. How would someone look back in time and see the right values for a given period of shave transactions? You may want to consider using a document-oriented or NoSQL-type database that employs a strategy of data denormalization. Essentially, you include multiple copies of data alongside the other data that needs it, which may seem counterintuitive if you are coming from a more traditional structured/SQL-style database setup. Like all things, there is a tradeoff that has to be evaluated, and other techniques can be used.

- If this were a clinical-grade research database, you would probably want to do something more sophisticated to anonymize contributor entries than a one-to-one relationship to a single hashed ID for every contributor's transaction. But I don't think that is necessarily critical here, as long as you are doing something to anonymize IDs during dataset reporting or retrieval.

- Eventually this has to be hosted somewhere and that comes at some cost, at least at moderate scale. Would data contributors cover these costs or did you see some other mechanism?
 
Great input.

I agonized over private data and the public availability of shave events. The published analysis data will have no identifiable data (not even the one-way user ID hash), so the IRB is cool with that. There will be age, beard parameters, skin parameters, etc. (I am still at a university, wearing belt and suspenders and passing it all through the IRB.) The IRB and I are discussing whether age is really relevant and whether, along with everything else, it gives enough information to back into an identification (not sure why anyone would want to). The private personal shave diary only has the one-way ID hash in it. The idea is that anyone can create their own ID but not link it to anything like an email or mobile phone number (i.e., no links back to the real world). Since the only thing on the personal side is inventory and shave records, I'm not sure it's worth having people remember one more password, and since I don't have a phone, email, or IP address, there is no real way to do two-factor authentication. The only reason I can imagine someone would want to break in is to mess with somebody, and then they would need to know the non-hashed ID. Theoretically, someone could pound on the login trying ID combinations (a lot of work for no gain), but I could put in something to block any IP after a number of unsuccessful tries. Any other thoughts?
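
The idea of blocking an IP after repeated failed tries could look roughly like the sketch below. The class name `LoginThrottle`, the limit of 5 failures, and the 15-minute window are all assumptions chosen for illustration; it keeps state in memory only, which a hosted service would likely replace with shared storage.

```python
import time


class LoginThrottle:
    """Block an address after too many failed logins within a time window."""

    def __init__(self, max_failures=5, window_seconds=900.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behavior can be tested
        self._failures = {}  # address -> timestamps of recent failures

    def _recent(self, address):
        # Drop failures that have aged out of the window.
        cutoff = self.clock() - self.window_seconds
        recent = [t for t in self._failures.get(address, []) if t > cutoff]
        self._failures[address] = recent
        return recent

    def is_blocked(self, address):
        return len(self._recent(address)) >= self.max_failures

    def record_failure(self, address):
        self._recent(address).append(self.clock())
```

A login handler would call `is_blocked` before checking the ID and `record_failure` whenever the lookup fails. The block lifts on its own once the old failures fall outside the window.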

Your point about data structure and how people would use it is the crux of my design issues.

My links between tables are combinatorial IDs. For example, I use an RPSH (RazorPlateSettingHandle) instead of a razor ID. In publishing the public database, that would convert to the razor manufacturer, razor, plate (with gap and exposure settings) or setting (for adjustable), handle manufacturer, handle (everything defaults to "from the factory" if the user has not changed anything). Likewise, brushes have a BKD##L##ID - brush handle manufacturer, knot = manufacturer + name = content, diameter in mm, loft in mm. Again, "from the factory" defaults. I need to work more on content with all the hybrid knots appearing.
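
To make the combinatorial ID idea concrete, here is a small sketch of a razor link key that falls back to "from the factory" and can be expanded into readable columns for the public file. The field names, the `|` separator, and the helper `expand` are assumptions for illustration; the actual RPSH encoding is not specified in this thread.

```python
from dataclasses import dataclass, asdict

FACTORY = "from the factory"


@dataclass(frozen=True)
class RazorSetup:
    razor_maker: str
    razor: str
    plate_or_setting: str = FACTORY  # default when the user changed nothing
    handle_maker: str = FACTORY
    handle: str = FACTORY

    def key(self) -> str:
        """Join the parts into one link key (the format is an assumption)."""
        return "|".join(asdict(self).values())


def expand(key: str) -> dict:
    """Turn a link key back into readable columns for the public file."""
    names = ["razor_maker", "razor", "plate_or_setting", "handle_maker", "handle"]
    return dict(zip(names, key.split("|")))
```

This only works if no component contains the separator character, so a real key format would need an escaping rule or a safer delimiter.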

I am reworking the Company table (source of manufacturer information) to make it more conducive to recording artisans and supporting their one-off creations. One of the data gurus on the IRB pointed out that if you know the artisan and could look up their one-offs, you could theoretically identify who was behind a hashed ID. I am still thinking that one through.

I am going to make a CSV dump of the diary for individuals who want to do their own in-depth analysis. My goal is to provide as simple an input as possible and "standard" outputs (hardware/software rankings by outcome, outcomes by hardware/software, inventory state, others if people suggest them).

"how are you thinking about data attributes that can change over time? " - this was the big issue for me (also, a lot of software changes formulations). The diary itself is actually a flat file of all the information pulled from the "non-transactional" tables. Say someone adds a water softener; all the transactions after that point in time will reflect that, but those before it will not. This breaks every normalization rule, but the "real" purpose of the public database is to feed statistical programs and AIs. A user can load a spreadsheet or database from the CSV if they want to do their own sophisticated analysis; otherwise, they have the canned summaries (which I suspect will cover 80% of potential users) and their inventory information.

Thank you so much for taking the time to comment!
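
The flat-file diary described above amounts to copying the shaver's current environment into every entry at the time of the shave. A minimal sketch, assuming a profile dictionary with hypothetical `water` and `skin_sensitivity` fields:

```python
import copy


def log_shave(diary, profile, razor, blade, overrides=None):
    """Append a diary entry carrying a snapshot of the current profile."""
    entry = {"razor": razor, "blade": blade}
    # Copy, do not link: later profile changes must not rewrite history.
    entry.update(copy.deepcopy(profile))
    # One-off exceptions (e.g. travelling) apply to this entry only.
    entry.update(overrides or {})
    diary.append(entry)
    return entry


profile = {"water": "hard", "skin_sensitivity": 3}
diary = []

log_shave(diary, profile, "Razor A", "Blade X")
profile["water"] = "softened"  # the water softener is installed
log_shave(diary, profile, "Razor A", "Blade X")
# diary[0] still records hard water; diary[1] records softened water
```

The cost is repeated data in every row, which is the tradeoff accepted here in exchange for a file that statistical tools can read without joins.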
 
I am looking at having the diary generate a post for B&B SOTD. I would need to add bowls to the inventory, as well as alum and styptic sticks. I could generate the upfront list of pre-shave materials, razor, blade, brush, post-shave materials, and passes that you could cut and paste into the B&B post. You would have to add any pictures or commentary. Would that be valuable to you?
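
As a sketch of what the generated SOTD text might look like, the function below turns one diary entry into a list that can be pasted into a post. The entry field names (`preshave`, `postshave`, `passes`, and so on) and the line layout are assumptions, not the planned format.

```python
def format_sotd(entry):
    """Build a plain-text shave-of-the-day list from one diary entry."""
    labels = [
        ("preshave", "Pre-shave"),
        ("razor", "Razor"),
        ("blade", "Blade"),
        ("brush", "Brush"),
        ("postshave", "Post-shave"),
        ("passes", "Passes"),
    ]
    lines = ["SOTD " + entry["date"]]
    for field, label in labels:
        value = entry.get(field)
        if not value:
            continue  # skip anything the user did not record
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(item) for item in value)
        lines.append(f"{label}: {value}")
    return "\n".join(lines)
```

Pictures and commentary would still be added by hand after pasting.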
 
The added context was helpful. I can see where you are going with the data strategy given the different data consumer profiles and related use cases. I still think you (or whoever is implementing) may bump into a few scaling pains with data contributors consuming some of the basic functionality through a presumed app. But you should be able to migrate to alternative strategies if and when those challenges arise.
 
Design Version 3

Latest update. Version 3 of the complete database is attached. In addition to razors and blades, it now has brushes, Pre-Shave Accoutrements, and Post-Shave Accoutrements. It also has an inventory system for total usage and usage since last purchase. It is a limited inventory system, as it only really tracks blades (deducting every time a new blade is used, adding when the user indicates a purchase). For other inventory items, it tracks cost and instances of use in shaves. The main purpose is to give the user an easy selection list of what they shave with so they don't have to type much.
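
The blade side of the inventory can be pictured as a small counter per blade type. This is a sketch only; the class name `BladeStock` and its fields are hypothetical, and resetting "usage since last purchase" on every purchase is an assumption about the intended behavior.

```python
from dataclasses import dataclass


@dataclass
class BladeStock:
    on_hand: int = 0
    total_used: int = 0
    used_since_purchase: int = 0

    def purchase(self, quantity):
        """Add newly bought blades and restart the since-purchase counter."""
        self.on_hand += quantity
        self.used_since_purchase = 0

    def use_new_blade(self):
        """Deduct one blade when a fresh blade is loaded."""
        if self.on_hand <= 0:
            raise ValueError("no blades of this type left in inventory")
        self.on_hand -= 1
        self.total_used += 1
        self.used_since_purchase += 1
```

Reusing an already loaded blade on a later shave would not call `use_new_blade`, so only fresh blades reduce the count.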

There are 4 parts to the database. There is a hardware section covering razors, blades, and brushes. I will initially populate it, but when anyone adds something not already in it, it will update. Details about the hardware can be added by the user, though some updates will trigger me to edit/review. For example, I will validate any BESS sharpness entries and fill in any entries that are not complete. Also, the brushes section is set up to handle the case where the handle and knot are from separate suppliers. I am still working on generalizable performance parameters for knots, but the first cut is in the design. Razors can be "regular", discrete adjustable (alternate plates), or continuously adjustable (dial). The knot data is also set up to identify "named" knots where the producer has multiple knots of the same hair but processes them differently; Oumo is a good example. I have directly called out image and comment storage, as how I do that will depend on which software I use for the implementation.

The software section is broken into Pre-Shave, Lubricants, and Post-Shave software. "Inventory" is enabled, but only to track total usage over time, last purchase date and cost, usage since last purchase.

The personal section covers details about the shaver, the shaver's environment, and the shaver's process. Again, the intent is that once it is entered at the beginning, all that data will default into the diary and only has to be entered for a specific shave when there is an exception (e.g., you are traveling).

The fourth part is the diary itself. It is a transaction log of every shave entered by every user of the system. The only link to the diary user is a one way encrypted user name that the user picks for themselves. The personal section keeps a log of all the transaction IDs of a specific user so it appears to be a private database to them and no data can be tied back to the specific shaver except by the specific shaver (one way encrypted user name).

There are a couple of reasons for this. First, what I get out of it is a (hopefully) large database of shaving events that I can use to analyze individual hardware, software, and all their combinations in terms of outcomes for skin and beard types. I also hope to use the diary to train an AI to advise individuals of options that might work best for them, both individually and in combination. For that reason, you will see that almost all "subjective" data has been normalized to a 5-point Likert scale to simplify the statistical work and to get consistency in the data.

Second, the full diary is used to update the performance data for the hardware and software to aid users in future purchase decisions.

Non-users can get a comma-delimited version of the diary with transaction IDs set to zero. Users can get the same thing if they are into geek stuff. Users can also get a comma-delimited version with just their diary entries, and they can request a full dump with their diary entries identified (transaction ID set to 1 and all others set to 0) so they can compare their observations with everyone else's.
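
The export rule above (all transaction IDs zeroed, or a user's own rows marked with 1) could be sketched as follows. The column names and the function `export_diary` are assumptions for this example.

```python
import csv
import io


def export_diary(rows, own_transaction_ids=frozenset()):
    """Write the diary as CSV, replacing each transaction ID with a 0/1 flag."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    for row in rows:
        out = dict(row)  # leave the stored row untouched
        is_own = row["transaction_id"] in own_transaction_ids
        out["transaction_id"] = 1 if is_own else 0
        writer.writerow(out)
    return buffer.getvalue()
```

Calling it with no second argument gives the public version; passing the set of a user's own transaction IDs gives the full dump with their entries identified.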

I am just starting on the UI as much is dictated by how I implement. The intent is to have both app and web interfaces. Because most of the data is a value of 1-5, most of the interface will be checkboxes or pull down selectors.

Would really appreciate any comments or suggestions on the design attached as well as the data being collected. I have a prototype running in Notion at the moment but that is not viable to open to others. I have pretty much narrowed implementation down to either Firestore or Azure. Any comments or suggestions on that would be appreciated as well.

The intent is to run this for free for 5 years to collect the data, after which I will freely give it away to any person or organization that would like to take it over, maybe even a Wikipedia type model.

Thanks in advance for any input you may have.
 

Attachments

  • Shave Diary V3.2.pdf
    7.3 MB · Views: 26
It wasn't apparent to me if in your model it was possible to associate multiple razors and blades with the same shave. I occasionally read about people using one razor for the main passes and then another one for clean up.
 
Yes - I do that myself. I also use adjustables, and hybrids (Everyday Stinger and Seygus Zepplin 7/9). You can have up to three per shave. If you need more you can use the comments, but it wouldn't show up in the analysis. The only reason I limit it to three is the analysis file is going to be just a flat file and I don't want it to be too large.
 
May I ask what Azure resource type you were considering for this? Also, are you envisioning any build and deployment automation?
 
Hmm, it just dawned on me that you should be able to flag the razor/blade used by pass. Into the next version.
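
One way to picture per-pass flagging together with the three-per-shave limit of the flat file is to spread the passes over fixed columns and push anything beyond the third into the comment. Treating each of the three slots as a pass is an assumption of this sketch, as are the column names.

```python
MAX_SLOTS = 3  # the flat analysis file keeps at most three razor/blade pairs


def flatten_passes(passes):
    """Spread per-pass razor/blade pairs across fixed columns."""
    row = {}
    for slot in range(1, MAX_SLOTS + 1):
        used = passes[slot - 1] if slot <= len(passes) else {}
        row[f"razor_{slot}"] = used.get("razor", "")
        row[f"blade_{slot}"] = used.get("blade", "")
    # Extra pairs survive only as free text, so they stay out of the analysis.
    overflow = passes[MAX_SLOTS:]
    row["comment"] = "; ".join(f"{p['razor']} / {p['blade']}" for p in overflow)
    return row
```

Fixed columns keep every row the same width, which is what keeps the analysis file flat.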
 
The Azure costs are getting steep (this is a give away) and when I give the whole system away (maybe B&B will want it) I don't want to stick whoever takes over with a lot of code. Right now I am exploring Airtable and Caspio so I can keep everything in one environment and hopefully stay away from much code. If I go with Azure it would be SQL, Web App Services, and Blob storage (pictures). Haven't settled on the forms/programming front end yet.
 
I won't bog down the thread as I imagine you are trying to focus on closing out the data modeling. But I might suggest some alternative approaches when it comes time to implementation, along with devops strategies, given the realities of what sounds like a solo maintainer situation and the goals you mentioned above. My apologies if this is all stuff you are experienced with in real world deployments. I've just been down this road a few times :)
 
Firm believer in team coding 😃
 
Latest update - much thanks for the observations and comments especially from the "Nerds". This is probably close to final in terms of the first round of development. Please take a look and let me know if you see anything glaringly wrong.
 

Attachments

  • Shave Diary V4.pdf
    7.7 MB · Views: 16
After a long day, I've settled on the implementation platform. It will be Google's Firestore and Adalo. Being NoSQL, Firestore will make it easier to adjust the information model as I learn from early users. It is also the cheapest production option (since I am funding this myself for the first few years). Adalo reduces the amount of coding and will make it easier to pass the system on to someone else later. Also, Adalo gives me both web and mobile interfaces.

I am now going dark as I need to learn both of these and likely rework the information/data model to reflect Firestore's document orientation.

If anyone has any experience with these two and would like to help I would be truly appreciative. Until then I will be in geek mode and give updates if something significant is happening. Again, thanks to all for the input that helped to get this far.
 
Hi Jim,

Thanks again for the work you are putting into this. I got behind the laptop to have a better look.

  • Blade list, razor list and brush list look fine to me. Usual line items I would say. In terms of the options behind these lists - will users have the opportunity to add new options or is it a fixed list? (I like the unintentional typo for Plissoft by the way - it now sounds like a knot you are likely to never use! :lol:) - O, wait, I see the "Other". Ok check.
  • I can also follow Lubricants and Pre-Shave Lists. I understand the presence or necessity even of ingredients lists, but that might also be harder to trace as not all producers are that transparent about these lists if transparent at all.
  • The designations for Lather Quality might result in biased results.
    • As they are stated now, I can have poor quality with extreme satisfaction - if I understand correctly.
    • There are different camps regarding thickness of lather and some would rate a runny watery lather as excellent whereas I might be inclined to rate that lower.
    • Perhaps the number is all that is needed to categorize the lather as a sort of dummy value - or - users might need the opportunity in their personal list to identify their preferences (I know this is a can of worms) - or - lather quality may need two entries, one for type (rather than quality) and one for rating (being quality).
  • I see no arrow from skin feel to satisfaction or other designations. Is this field necessary given the other table you have on Hydrate/Soften (bottom left) which I find very useful?
  • On beard condition I wonder if splitting cheeks and neck would be beneficial as I have read reports by others that cheeks are no problem whereas neck causes challenges. Your designations sound right to me.
  • Of course mileages may vary, but I use effectiveness rather than aggressiveness in labelling blades or razors.
  • Finally, I get the staying away from end result ratings. People are very used to using them though, so I would suggest to include them anyway even if you do not use them in your modeling.

Again, interesting endeavour to say the least. I am curious about insights from the model and simply curious whether this could actually work, yes or no. Happy to think along if needed.

Good luck!!

Cheers,

Guido.

PS: I just read you were going dark so maybe my post is in vain.
 
Not in vain and even more appreciated. I wrestled with the whole BBS, DFS, SAS, etc. thing. Maybe a color slider so people could use it, and there are some algorithms that look at continuous values (slider) versus discrete (Likert scale) and can infer more reliable values (think distributions of values between the Likert values). The math is past my union card but I have friends, so good idea.
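
For reference, the simplest link between a slider and the 5-point scale is equal-width binning, sketched below. This is not the inference approach alluded to above; the 0-100 slider range and the function name are assumptions. Storing the raw slider value alongside the bin would leave the door open for more refined analysis later.

```python
def slider_to_likert(value, low=0.0, high=100.0, points=5):
    """Map a continuous slider reading onto a 1..points Likert score."""
    if not low <= value <= high:
        raise ValueError("slider value out of range")
    width = (high - low) / points
    bucket = int((value - low) // width) + 1
    # The top of the range would otherwise land in a bucket of its own.
    return min(bucket, points)
```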

Agree on cheeks and necks (next iteration before implementation - damn you).

I understand your lather comments. I need to ponder that a little more (double damn you :facep:)

Thank you very much
 
Just an update. The learning curve on Firestore/Firebase has been a little more than expected (or maybe my 70-year-old brain is less receptive than expected). It's also my lack of familiarity with NoSQL databases. As I learn, I redesign. Anyway, it is starting to look realistic with the advantages of Firestore, meaning I can implement it more quickly once I have the new design figured out. There are some interesting AI capabilities here, too, so my idea of a shaving advisor is getting legs.
 
Here is the current state of the Shared Shave Diary Design.

There will be two parts. First is the Firestore database, where there is a Research Collection which contains all the product-level information (razors, plates, blades, brushes, software, etc.) and a Persons Collection which contains the user's details, inventory, and personal shave diary. I've attached the current design of the Firestore database.

Whenever a user posts a diary entry, their diary is immediately updated and an anonymized duplicate is entered into a SharedDiaryMemo Collection. The personal diary is correct in real time and contains links to the Research Collection. The SharedDiaryMemo Collection contains an expanded record where links are replaced with the corresponding text in a comma-delimited format. All comment fields are quoted (quotes will not be allowed in comment fields). Probably once a week, a cron job will read the SharedDiaryMemo Collection and post those records/documents to a flat file in Google Cloud Storage, which anyone will have read access to for research or AI training.
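
The expansion step can be sketched without any Firestore calls by using a plain dictionary as a stand-in for the Research Collection. All names here (`RESEARCH`, `razor_ref`, `user_key`, `to_shared_record`) are hypothetical, and dropping the user key entirely is an assumption about what "anonymized" means. The sketch quotes every text field, which includes the comment fields.

```python
import csv
import io

# Hypothetical stand-in for the Research Collection: link -> display text.
RESEARCH = {
    "razors/r1": "Razor A",
    "blades/b1": "Blade X",
}


def to_shared_record(entry, research=RESEARCH):
    """Return one comma-delimited line with links resolved and no user data."""
    comment = entry.get("comment", "")
    if '"' in comment:
        raise ValueError("quotes are not allowed in comment fields")
    buffer = io.StringIO()
    # QUOTE_NONNUMERIC quotes every text field and leaves numbers bare.
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerow([
        research[entry["razor_ref"]],
        research[entry["blade_ref"]],
        entry["outcome"],
        comment,
    ])
    return buffer.getvalue().rstrip("\n")
```

A weekly job would then append these lines to the public flat file.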

Users will have a hash key that they can use to identify their entries in the public flat file so they can analyze their shaves versus others or just pull their shaves for their own analysis.

I would appreciate anyone looking at the current design and commenting (what's left out).
 

Attachments

  • Firestore Design Document v5.pdf
    162.2 KB · Views: 9
Version 7 is finished and I am pretty much ready to start implementing. This is pretty complete. I really would appreciate any reviews and comments that anybody can spare, both on the content and the design. Thanks in advance!
 

Attachments

  • Firestore Design Document v7.pdf
    250 KB · Views: 12