Classification model not working despite high correlation
- This topic has 6 replies, 2 voices, and was last updated 8 years, 6 months ago by Ayelet.
May 10, 2016 at 7:03 am #3174 guoweizhang@lanpengkj.com Participant
Please see the attached file. My classification model shows a pretty high correlation (I only used 12 samples instead of the recommended 40). But when I test this model, the results are inconsistent and, in many cases, incorrect. Can you help me understand why?
I thought that as long as the correlation is high, the model should be fairly accurate…
May 10, 2016 at 7:36 am #3177 Ayelet Keymaster
The reason for that is the size of the database.
I would recommend adding more samples to broaden your database, then evaluating the model’s performance again.
Ayelet
May 11, 2016 at 8:03 am #3275 guoweizhang@lanpengkj.com Participant
Ayelet, the correlation is actually very high despite the small size of the database. Why would the performance be weak if the correlation is high?
May 11, 2016 at 9:58 am #3277 guoweizhang@lanpengkj.com Participant
Ayelet, I added more samples to the database. It now contains 48 samples (12 samples for each of 4 brands). I still have a fairly high correlation (F1=0.96), but when I test the model, the results are not accurate, not even close. Please see the attached file. I would appreciate it if you could point me in the right direction.
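A note on the F1 figure quoted in this post: F1 is not a correlation but the harmonic mean of precision and recall, computed for each class from the predictions on the scans being evaluated. The thread does not say how the tool averages it across brands, so the minimal sketch below assumes macro-averaging (a plain mean over classes); the function names are illustrative and not part of any SCiO API.

```python
from collections import Counter

def f1_per_class(y_true, y_pred):
    """Return {label: F1} from paired true and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but the scan was not p
            fn[t] += 1  # a true t that was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)

# Two brands, four scans, one mistake: F1(A) = 2/3, F1(B) = 4/5
print(f"{macro_f1(['A', 'A', 'B', 'B'], ['A', 'B', 'B', 'B']):.2f}")  # prints 0.73
```

An F1 of 0.96 therefore only says that the model labels the evaluated scans well; whether that carries over to new samples depends on which scans were used for the evaluation.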
May 15, 2016 at 11:12 am #3306 Ayelet Keymaster
Hi Guoweiz,
The problem may have a number of causes.
We would like to ask you for some more information:
1. Have you scanned the new samples at different temperatures? (specifically, temperatures different from those measured when collecting the database)
2. Have you used actually different samples, or did you take multiple scans of identical samples and treat them as new samples?
3. Did you use the solid sample holder?
A high correlation on a small number of samples can also result from over-fitting (in over-fitting, a statistical model describes random error or noise instead of the underlying relationship).
Therefore, even if the classification model seems successful, it is not robust.
May I ask which samples you scanned? Can you please add a photo of the samples in the data collection?
Ayelet
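The over-fitting point above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the "spectra" are pure random noise with no brand information, and a 1-nearest-neighbour classifier stands in for whatever algorithm the modelling tool actually uses. With only 12 training scans, the model still reproduces its own training labels perfectly while doing no better than chance on new scans.

```python
import random

def nearest_label(x, train):
    """1-nearest-neighbour: label of the closest training scan (squared Euclidean)."""
    closest = min(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(x, item[0])))
    return closest[1]

def accuracy(points, train):
    """Fraction of (features, label) points whose nearest training scan shares the label."""
    hits = sum(nearest_label(x, train) == y for x, y in points)
    return hits / len(points)

def noise_scans(n, n_features, brands, rng):
    """Synthetic scans: random features paired with a randomly chosen brand label."""
    return [([rng.gauss(0, 1) for _ in range(n_features)], rng.choice(brands))
            for _ in range(n)]

rng = random.Random(0)
brands = ["A", "B", "C", "D"]
train = noise_scans(12, 50, brands, rng)   # small database, as in the first post
new = noise_scans(400, 50, brands, rng)    # scans the model has never seen

train_acc = accuracy(train, train)  # each scan is its own nearest neighbour
new_acc = accuracy(new, train)      # features carry no signal, so roughly chance

print(f"accuracy on the scans used to build the model: {train_acc:.2f}")
print(f"accuracy on new scans: {new_acc:.2f}")
```

The first number is always 1.00 by construction, which is why a score measured on the scans that built the model says little about robustness.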
May 16, 2016 at 2:34 am #3313 guoweizhang@lanpengkj.com Participant
Thanks Ayelet. Below are answers to your questions.
1. Have you scanned new samples at different temperatures? (specifically different temperatures than measured when collecting the database)
Answer: I scanned the collection samples and the test samples under the same conditions. The temperature might be slightly different, as I did not try to control it exactly.
2. Have you used actual different samples or scanned multiple scans for identical samples and treated them as new samples?
Answer: I used the same physical sample for 3 data collection samples (the physical sample is just positioned differently in each data collection sample). So my data collection has 48 samples, but they actually come from 16 different physical samples.
3. Did you use the solid sample holder?
Answer: I used the solid sample holder. I’m attaching pictures. I first crushed the rice samples into powder and filled the solid sample holder with 4 grams of powder.
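Answer 2 points to a likely reason the reported score looks better than the real-world results: if repositioned scans of the same powder end up on both sides of a validation split, the model is partly tested on samples it has already seen. The sketch below is a hedged illustration on synthetic data, not the tool's actual validation procedure. It assumes 16 physical samples with 3 scans each, a sample-specific signature unrelated to brand, and a 1-nearest-neighbour classifier as a stand-in model; all names are hypothetical.

```python
import random

def nearest_label(x, train):
    """Brand of the closest training scan (squared Euclidean distance)."""
    closest = min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(x, s["x"])))
    return closest["brand"]

rng = random.Random(1)
brands = ["A", "B", "C", "D"]
scans = []
for sample_id in range(16):                      # 16 physical samples, 4 per brand
    brand = brands[sample_id % 4]
    base = [rng.gauss(0, 1) for _ in range(10)]  # signature of this powder, unrelated to brand
    for _ in range(3):                           # 3 repositioned scans of the same powder
        x = [v + rng.gauss(0, 0.05) for v in base]
        scans.append({"x": x, "brand": brand, "sample": sample_id})

def cv_accuracy(scans, same_fold):
    """Score each scan with a model built without it and without anything same_fold groups with it."""
    hits = 0
    for held in scans:
        train = [s for s in scans if s is not held and not same_fold(s, held)]
        hits += nearest_label(held["x"], train) == held["brand"]
    return hits / len(scans)

# Leave one scan out: its two sibling scans stay in the training set.
per_scan = cv_accuracy(scans, lambda s, held: False)
# Leave one physical sample out: all three of its scans are held out together.
per_sample = cv_accuracy(scans, lambda s, held: s["sample"] == held["sample"])

print(f"leave-one-scan-out accuracy:   {per_scan:.2f}")
print(f"leave-one-sample-out accuracy: {per_sample:.2f}")
```

In this synthetic setup the per-scan score is high only because each held-out scan is matched to a sibling scan of the same powder, while the per-sample score shows that the model has learned nothing about brand. Holding out whole physical samples gives an estimate closer to what testing on genuinely new samples will show.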
May 29, 2016 at 8:08 am #3426 Ayelet Keymaster
Hi,
Since you used only 16 physical samples, the problem is most likely the database size after all.
As I mentioned before, I would recommend adding more samples to the data collection.
Keep us updated!
Ayelet