May 10, 2016 at 7:03 am #3174
Please see the attached file. My classification model shows a pretty high correlation (I only used 12 samples instead of the recommended 40). But when I test this model, the results are inconsistent and, in a lot of cases, incorrect. Can you help me understand why?
I thought as long as the correlation is high, the model should be fairly accurate…
May 10, 2016 at 7:36 am #3177
The reason for that is the size of the database.
I would recommend adding more samples to broaden your database, then evaluating the model's performance again.
Ayelet
May 11, 2016 at 8:03 am #3275
Ayelet, the correlation is actually very high despite the small size of the database. Why would the performance be weak if the correlation is high?
May 11, 2016 at 9:58 am #3277
Ayelet, I added more samples to the database. It now contains 48 samples (12 samples for each of 4 brands). I still have a fairly high correlation (F1=0.96), but when I test the model, the results are not accurate, not even close. Please see the attached file. I would appreciate it if you could point me in the right direction.
May 15, 2016 at 11:12 am #3306
The problem may have a number of causes.
We would like to ask you for some more information:
1. Have you scanned the new samples at different temperatures? (Specifically, temperatures different from those measured when collecting the database.)
2. Did you use physically different samples, or did you scan the same samples multiple times and treat those scans as new samples?
3. Did you use the solid sample holder?
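To illustrate why question 2 matters: repeated scans of one physical sample are nearly identical, so they add little new information, and if they end up both in the database and in the test set, the score can look better than it really is. Below is a minimal sketch of keeping all scans of one sample on the same side of a split. The record fields (`sample_id`, `scan`, `brand`) and the helper `split_by_sample` are hypothetical names for illustration, not part of any app or export format mentioned in this thread.

```python
import random


def split_by_sample(scans, test_fraction=0.25, seed=0):
    """Split scans so that all scans of one physical sample stay together."""
    # Work on unique physical samples, not on individual scans.
    sample_ids = sorted({scan["sample_id"] for scan in scans})
    rng = random.Random(seed)
    rng.shuffle(sample_ids)

    # Hold out a fraction of the samples (at least one) for testing.
    n_test = max(1, round(len(sample_ids) * test_fraction))
    test_ids = set(sample_ids[:n_test])

    train = [s for s in scans if s["sample_id"] not in test_ids]
    test = [s for s in scans if s["sample_id"] in test_ids]
    return train, test


# Hypothetical data: 8 physical samples, each scanned 3 times.
scans = [
    {"sample_id": f"sample_{i}", "scan": k, "brand": f"brand_{i % 4}"}
    for i in range(8)
    for k in range(3)
]

train, test = split_by_sample(scans)
train_ids = {s["sample_id"] for s in train}
test_ids = {s["sample_id"] for s in test}
print(len(train), "training scans;", len(test), "test scans")
print("samples shared between the two sets:", train_ids & test_ids)
```

This sketch does not balance brands between the two sets; with only a few samples per brand you would probably want to do that by hand.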
A high correlation on a small number of samples can also result from over-fitting (in over-fitting, a statistical model describes random error or noise instead of the underlying relationship).
Therefore, even if the classification model seems successful, it is not robust.
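The over-fitting effect described above can be reproduced with a small simulation. This is only a sketch under stated assumptions: the "spectra" are pure random noise with 300 made-up features, there are 4 classes with 3 samples each, and the classifier is a simple nearest-centroid rule chosen for illustration, not the algorithm the app actually uses. Because the data contain no real signal, any good score on the training samples comes from fitting noise.

```python
import random


def make_noise_dataset(rng, n_per_class, n_classes, n_features):
    """Build 'spectra' of pure Gaussian noise; labels carry no real signal."""
    X, y = [], []
    for label in range(n_classes):
        for _ in range(n_per_class):
            X.append([rng.gauss(0.0, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def fit_centroids(X, y, n_classes):
    """Average the training rows of each class into one centroid per class."""
    n_features = len(X[0])
    sums = [[0.0] * n_features for _ in range(n_classes)]
    counts = [0] * n_classes
    for row, label in zip(X, y):
        counts[label] += 1
        for j, value in enumerate(row):
            sums[label][j] += value
    return [[s / counts[c] for s in sums[c]] for c in range(n_classes)]


def predict(centroids, row):
    """Return the index of the centroid closest to the row (squared distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(range(len(centroids)), key=lambda c: sq_dist(centroids[c]))


def accuracy(centroids, X, y):
    hits = sum(predict(centroids, row) == label for row, label in zip(X, y))
    return hits / len(y)


rng = random.Random(0)
N_CLASSES, N_FEATURES = 4, 300  # assumed values, for illustration only

# 12 training samples (3 per class) and 400 fresh, independent test samples.
X_train, y_train = make_noise_dataset(rng, 3, N_CLASSES, N_FEATURES)
X_test, y_test = make_noise_dataset(rng, 100, N_CLASSES, N_FEATURES)

centroids = fit_centroids(X_train, y_train, N_CLASSES)
train_acc = accuracy(centroids, X_train, y_train)
test_acc = accuracy(centroids, X_test, y_test)
print(f"accuracy on the samples used to build the model: {train_acc:.2f}")
print(f"accuracy on new samples: {test_acc:.2f}")
```

With so few samples and so many features, the score on the training samples should come out near perfect, while the score on new samples stays around chance level (about 0.25 for 4 classes). A high score measured on the database itself therefore says little about performance on new scans.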
May I ask which samples you scanned? Can you please add a photo of the samples to the data collection?
Ayelet
May 16, 2016 at 2:34 am #3313
May 29, 2016 at 8:08 am #3426
Since you only used 16 samples, most likely the problem is indeed the database size.
As I mentioned before, I would recommend adding more samples to the data collection.
Keep us updated!