Consumer Physics Developers Site | Reply To: Categorization of users/data on the database

July 22, 2015 at 6:45 am #1571

Keymaster

sakrelaasta wrote:

I am not sure I have fully understood how the database will work, so please excuse me (and correct me) if everything I mention is wrong.

So I was thinking… It has been mentioned many times that the power of the SCIO is the people who do scans and add them (with the necessary extra information) to the database.

The good thing is that there will be a lot of people adding data.
The bad thing is that there will be a lot of people adding data!!!

What I mean: can I “trust” every scan and every piece of information that everyone adds? Probably not. And not because the user does a lazy job!

Let’s say there is a database for edible oils, and I add some data too…
– One day I go to the supermarket, buy a bottle of olive oil from a big company, scan it, and add all the data the bottle gives about the oil. But in reality the data on the label is not for this specific oil; it is an average of all the measurements the company made across all the oil it has. So… maybe this scan + data is not the best for the database.
– The second day I take a sample from my small (home) olive oil production, scan it, and take it to a chemistry lab to be analyzed. This data will be much, much better.

So should these two scans have the same “weight” in the database?

What I think is that maybe there should be a categorization of the users who add data, according to their background. For example:
~ Normal users
~ Users with a background in chemistry (or something similar)
~ Users who used a specific measurement in a lab
~ Users who “have labs” (academic or private lab companies)
~ Users/labs that have ISO

This way, after a user scans a new object to see the concentration of “chemical A” (let’s say its concentration of chemical A is 31%), they can choose:

“OK… let’s see if I accept only the “lab data” or “academic data”. Hmmm, they are trustworthy, but there are only 5 different samples in the database and none of them appears to be close to my scan (their measurements are 5%, 7%, 15%, 50%, 52% and 60%)… not very accurate. I know that in my sample the concentration of “chemical A” is between 15% and 50%, but that is not good enough. Let’s add the “measurements in a lab”. Ah, nice! Now there are 50 samples, and many of them are around 22, 24, 30, 35%; many of them are close to my spectrum!!! Great!! That is perfect! My concentration is 31%”

Sorry for the long topic… but I couldn’t wait any longer until it arrives to check it myself!!

Nikos

Thanks Nikos,

You raise an important topic and, in general, your observation is correct. Collecting data from a community is not trivial.

Our plan is to roll out community data collection gradually, in increasing levels of complexity, and to establish tools such as those you mention.

As a starting point (Q4’15-Q1’16), we will collect data that is relatively easy to validate, for example the identification of pills. We intend to ask users to scan medications and classify them. A ‘voting’ system will be implemented, so that we accept a scan only if enough users submit similar scans of the same medication. Outliers will not be included.
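
To make the ‘voting’ idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the actual implementation: the `Scan` record, the `accept_by_vote` function, the use of an element-wise median as the consensus spectrum, and the `min_users` and `max_distance` thresholds are all hypothetical choices made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import dist
from statistics import median


@dataclass(frozen=True)
class Scan:
    """Hypothetical record for one community submission."""
    user: str        # contributor id (assumed field)
    label: str       # medication name chosen by the user
    spectrum: tuple  # spectrum as a fixed-length tuple of floats (assumed format)


def accept_by_vote(scans, min_users=3, max_distance=0.1):
    """Return {label: [inlier scans]} for labels that enough users agree on.

    min_users and max_distance are illustrative thresholds, not real values.
    """
    by_label = defaultdict(list)
    for scan in scans:
        by_label[scan.label].append(scan)

    accepted = {}
    for label, group in by_label.items():
        # The element-wise median spectrum acts as the consensus for this label.
        consensus = tuple(
            median(values) for values in zip(*(s.spectrum for s in group))
        )
        # Scans far from the consensus (Euclidean distance) count as outliers.
        inliers = [s for s in group if dist(s.spectrum, consensus) <= max_distance]
        # Count distinct users so that one person cannot vote several times.
        if len({s.user for s in inliers}) >= min_users:
            accepted[label] = inliers
    return accepted
```

With this sketch, a medication label is accepted only once at least `min_users` different contributors have submitted spectra close to the consensus; a scan that differs strongly from the rest is left out, and a label with too few agreeing scans is not accepted at all.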

As we learn more, we will build trust in the quality of data coming from specific contributors, and over time increase the complexity and variety of the data we collect from the community.

Hagai