Creating a Shared Razor & Blade Research Database Shave Journal

Hi all, this started as a study of Chinese blades. As I worked on how to gather and analyze the data, I realized that with a little more work, it could be turned into something more valuable as a research tool and more useful to members of Badger & Blade.

Attached is a PDF diagram of the second cut of a Shave Research Database that emerged from the China blade project. At a high level, there are 5 tables:
  • Diary - a transactional table that records individual instances of a shaving experience
  • Blade - detailed information about specific razor blades
  • Razor - detailed information about specific razors
  • Company - detailed information about the companies that produce the razors and blades in their respective tables
  • Person - detailed information about the environmental conditions of the shaving experience (age, beard, skin, water, etc.) but no other personal information
Whenever a new person comes into the database, they create their own user ID, which is then one-way hashed to a key (Generated ID in the diagram) used to record and retrieve their specific information in the database. This allows anyone to access the Diary table for research purposes while letting specific individuals maintain a private personal diary.
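To make the Generated ID idea concrete, here is a minimal sketch of deriving a key from a user-chosen ID. It assumes a keyed hash (HMAC-SHA256) and a site-wide secret; the names `SITE_PEPPER` and `generated_id` are hypothetical, and nothing here is taken from the attached design.

```python
import hashlib
import hmac

# Hypothetical site-wide secret (an assumption, not part of the design);
# it would be kept outside the database so a leaked table alone is not
# enough to test guessed user IDs.
SITE_PEPPER = b"replace-with-a-long-random-secret"


def generated_id(user_id: str) -> str:
    """Derive the database key from a user-chosen ID with a keyed one-way hash."""
    # Normalizing case and whitespace is an assumption made for convenience.
    normalized = user_id.strip().lower()
    digest = hmac.new(SITE_PEPPER, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()
```

The same ID always maps to the same key, so a contributor can retrieve their rows, but the stored key cannot be reversed into the ID they typed.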

Three regular monthly reports for anyone will be posted for download from the Diary:
  • Razor Summary of Outcome Distributions - a spreadsheet sorted by razors recorded in the Diary and the outcomes recorded. It is provided as a sheet so people can sort by razor, skin, and beard parameters, regardless of the blade being used.
  • Blade Summary of Outcome Distributions - the same as the above, but by blade regardless of the razor.
  • Razor by Blade Summary of Outcome Distributions - a combination of the above.
Additionally, those who contribute may download their personal spreadsheet of diary transactions and scroll through it anytime. Query capability will depend upon final implementation.
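As a rough illustration of the three summaries, the sketch below counts outcomes per razor, per blade, and per razor/blade pair. The diary rows, column names, and outcome labels are made up for the example and are not from the design.

```python
from collections import Counter, defaultdict

# Hypothetical diary rows; column names and the outcome scale are assumptions.
DIARY = [
    {"razor": "Razor A", "blade": "Blade X", "outcome": "good"},
    {"razor": "Razor A", "blade": "Blade Y", "outcome": "poor"},
    {"razor": "Razor A", "blade": "Blade X", "outcome": "good"},
    {"razor": "Razor B", "blade": "Blade X", "outcome": "fair"},
]


def outcome_distribution(rows, keys):
    """Count outcomes per group, where a group is the tuple of values for `keys`."""
    summary = defaultdict(Counter)
    for row in rows:
        group = tuple(row[k] for k in keys)
        summary[group][row["outcome"]] += 1
    return {group: dict(counts) for group, counts in summary.items()}


by_razor = outcome_distribution(DIARY, ["razor"])
by_blade = outcome_distribution(DIARY, ["blade"])
by_razor_and_blade = outcome_distribution(DIARY, ["razor", "blade"])
```

Adding skin or beard columns to `keys` would give the finer breakdowns people might otherwise build by sorting the spreadsheet.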

Contributors to the shave research database will also have a blade inventory capability.

Individuals can contribute their experiences without creating a personal diary and a user ID. They can optionally complete all of the personal information each time they contribute to the diary. Those with a user ID will have their "environment" information (skin, beard, etc.) loaded automatically, and their selection lists for razors and blades will be drawn from their inventory.

I would appreciate any feedback you may have on the attached high-level design. Specifically, what am I missing that would be useful to you both in terms of a real shave diary and in terms of what you would like to know/learn about people's shaving experiences with products? I also need input into what needs to be recorded in terms of pre-shave and post-shave activities and products.

Right now, I am experimenting with Google Forms + Sheets and SharePoint + PowerApps. I would appreciate any recommendations here as well.

Any techies out there, feel free to jump in!

Thanks
 

Attachments

  • Shave Research Database Design V2.pdf
A few thoughts on this:
- what do you envision would be the primary research goals for someone using the dataset? I think this would be one way to consider gaps in data and structure of it

- are you hoping this would also allow for individuals to catalog their shave of the day down to exact equipment and products? If so, there are typically some inherent challenges in reaching a model that works for most people in the way they prefer. In my experience with applications like this, you eventually run into the tension between how the platform wants to organize and record data and the varying ways an individual prefers to catalog and organize data, and at what granularity.

- are you seeing this offer a way for data contributors to ask questions and get reporting about their personal history of shaves (e.g. how many times did I use blade X, or what was my most used soap in year Y)? I also find that this becomes a challenge once you move past simple queries. Inevitably you discover that the platform can't answer the question because the data was not captured or organized in a way that easily provides the answer. This is the primary reason I normally just build my own database: I have complete control over data structuring and running complex queries. But I also know most people will not be comfortable doing that.

- how are you thinking about data attributes that can change over time? One of the problems you might encounter is how to store transactional data that is related to some of your current fixed tables. For example, what if someone moves and needs to change the attributes of their water, or maybe they just traveled somewhere for a few weeks? Or maybe their skin sensitivity changes as they age. How would someone look back in time and see the right values for a given period of shave transactions? You may want to consider a document-oriented or NoSQL type database that employs a strategy of data denormalization. Essentially, you store copies of data alongside the other data that needs it, which may seem counterintuitive if you are coming from a more traditional structured/SQL style database setup. Like all things, there is a tradeoff that has to be evaluated, and other techniques can be used.

- if this were a clinical-grade research database, you would probably want to do something more sophisticated to anonymize contributor entries than a one-to-one relationship between every contributor's transactions and a single hashed ID. But I don't think that is critical here as long as you are doing something to anonymize IDs during dataset reporting or retrieval.

- Eventually this has to be hosted somewhere and that comes at some cost, at least at moderate scale. Would data contributors cover these costs or did you see some other mechanism?
 
Great input.

I agonized over private data and the public availability of shave events. The published analysis data will have no identifiable data (not even the one-way user ID hash), so the IRB is cool with that. There will be age, beard parameters, skin parameters, etc. (I am still at a university, wearing belt and suspenders and passing it all through the IRB.) The IRB and I are discussing whether age is really relevant and whether, along with everything else, it gives enough information to back into an identification (not sure why anyone would want to).

The private personal shave diary only has the one-way hash of the ID in it. The idea is that anyone can create their own ID but not link it to anything like an email or mobile phone number (i.e., no links back to the real world). Since the only thing on the personal side is inventory and shave records, I'm not sure it's worth having people remember one more password, and since I don't have a phone, email, or IP address, there is no real way to do 2FA. The only reason I can imagine someone would want to break in is to mess with somebody, and then they would need to know the non-hashed ID. Theoretically, someone could pound on the login trying ID combinations (a lot of work for no gain), but I could put in something to kill any IP after a number of unsuccessful tries. Any other thoughts?
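For the "kill any IP after a number of unsuccessful tries" idea, something like the sketch below might be enough. The threshold, the time window, and the `LoginThrottle` name are all assumptions, and the counts are held in memory only, so no IP address would need to be written to the database.

```python
import time
from collections import defaultdict, deque

MAX_FAILURES = 5      # assumed threshold, not from the design
WINDOW_SECONDS = 900  # assumed window (15 minutes), not from the design


class LoginThrottle:
    """Block an IP once it has too many failed logins inside a sliding window."""

    def __init__(self, max_failures=MAX_FAILURES, window=WINDOW_SECONDS):
        self.max_failures = max_failures
        self.window = window
        self._failures = defaultdict(deque)  # ip -> timestamps of recent failures

    def _prune(self, ip, now):
        # Drop failures that have aged out of the window.
        recent = self._failures[ip]
        while recent and now - recent[0] > self.window:
            recent.popleft()

    def record_failure(self, ip, now=None):
        now = time.monotonic() if now is None else now
        self._prune(ip, now)
        self._failures[ip].append(now)

    def is_blocked(self, ip, now=None):
        now = time.monotonic() if now is None else now
        self._prune(ip, now)
        return len(self._failures[ip]) >= self.max_failures
```

The login handler would call `is_blocked` before checking an ID and `record_failure` whenever the hashed ID is not found. The `now` argument exists only so the behavior can be exercised without waiting.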

Your point about data structure and how people would use it is the crux of my design issues.

My links between tables are combinatorial IDs. For example, I use an RPSH (RazorPlateSettingHandle) instead of a razor ID. In publishing the public database, that would convert to the razor manufacturer, razor, plate (with gap and exposure settings) or setting (for adjustable), handle manufacturer, handle (everything defaults to "from the factory" if the user has not changed anything). Likewise, brushes have a BKD##L##ID - brush handle manufacturer, knot = manufacturer + name = content, diameter in mm, loft in mm. Again, "from the factory" defaults. I need to work more on content with all the hybrid knots appearing.
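A combinatorial ID of this kind could be sketched as follows. The field names, the `|` separator, and the `factory` marker are hypothetical choices for illustration; the real RPSH encoding may look quite different.

```python
from dataclasses import dataclass

FACTORY = "factory"  # assumed marker for "from the factory"


@dataclass(frozen=True)
class RazorSetup:
    razor_maker: str
    razor: str
    plate: str = FACTORY         # base plate, for razors with swappable plates
    setting: str = FACTORY       # dial setting, for adjustables
    handle_maker: str = FACTORY
    handle: str = FACTORY

    def rpsh(self) -> str:
        """Join the parts into one combinatorial key (hypothetical encoding)."""
        parts = (self.razor_maker, self.razor, self.plate,
                 self.setting, self.handle_maker, self.handle)
        return "|".join(p.strip().lower().replace(" ", "-") for p in parts)


def expand_rpsh(key: str) -> dict:
    """Split a key back into named columns for the published dataset."""
    names = ("razor_maker", "razor", "plate", "setting", "handle_maker", "handle")
    return dict(zip(names, key.split("|")))
```

For example, `RazorSetup("Acme", "Model One").rpsh()` yields a key in which every unchanged part falls back to the factory default. A real encoding would also need to handle part names that contain the separator.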

I am reworking the Company table (source of manufacturer information) to make it more conducive to recording artisans and supporting their one-off creations. One of the data gurus on the IRB pointed out that if you know the artisan and could look up their one-offs, you could theoretically identify who was behind a hashed ID. I am still thinking that one through.

I am going to make a CSV dump of the diary for individuals who want to do their own in-depth analysis. My goal is to provide as simple an input as possible and "standard" outputs (hardware/software rankings by outcome, outcomes by hardware/software, inventory state, and others if people suggest them).

"how are you thinking about data attributes that can change over time?" - this was the big issue for me (a lot of software also changes formulations). The diary itself is actually a flat file of all the information pulled from the "non-transactional" tables. Say someone adds a water softener; all the transactions after that point in time will reflect that, but those before it will not. I broke every normalization rule, but the "real" purpose of the public database is to feed statistical programs and AIs. A user can load a spreadsheet or database from the CSV if they want to do their own sophisticated analysis; otherwise, they have the canned summaries (which I suspect will cover 80% of potential users) and their inventory information.
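The flat-file approach can be sketched in a few lines: each diary row copies the person's current attributes at the time of the shave, so a later change never rewrites history. The column names, dates, and values below are hypothetical.

```python
import csv
import io
from datetime import date

# Hypothetical current "environment" attributes for one contributor.
profile = {"water": "hard", "skin": "sensitive"}

diary = []


def record_shave(day, razor, blade, outcome):
    """Append a flat diary row that copies the profile as it stands on that day."""
    diary.append({"date": day.isoformat(), "razor": razor, "blade": blade,
                  "outcome": outcome, **profile})


record_shave(date(2024, 1, 5), "Razor A", "Blade X", "fair")
profile["water"] = "soft"  # e.g. a water softener is installed
record_shave(date(2024, 2, 1), "Razor A", "Blade X", "good")


def to_csv(rows):
    """Dump the flat diary to CSV text for spreadsheets or statistical tools."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The first row keeps the hard-water value and the second picks up the softened water, which is the behavior described above. The cost is repeated data in every row, which matters little for a file meant to feed analysis tools.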

Thank you so much for taking the time to comment!
 
I am looking at having the diary generate a post for B&B SOTD. I would need to add bowls to the inventory, as well as alum and styptic sticks. I could generate the upfront list of preshave materials, razor, blade, brush, postshave materials, and passes that you could cut and paste into the B&B post. You would have to add any pictures or commentary. Would that be valuable to you?
 
The added context was helpful. I can see where you are going with the data strategy given the different data consumer profiles and related use cases. I still think you (or whoever is implementing) may bump into a few scaling pains as data contributors consume some of the basic functionality through a presumed app. But you should be able to migrate to alternative strategies if and when those challenges arise.
 