Consumer Physics Developers Site | Topic: Classification model not working despite high correlation

This topic has 6 replies, 2 voices, and was last updated 8 years, 2 months ago by Ayelet.

May 10, 2016 at 7:03 am #3174

guoweizhang@lanpengkj.com
Participant

Please see the attached file. My classification model shows a pretty high correlation (I only used 12 samples instead of the recommended 40). But when I test the model, the results are inconsistent and, in many cases, incorrect. Can you help me understand why?

I thought as long as the correlation is high, the model should be fairly accurate…


May 10, 2016 at 7:36 am #3177

Ayelet
Keymaster

The reason for that is the size of the database.

I would recommend adding more samples to broaden your database, then evaluating the model’s performance again.

Ayelet

May 11, 2016 at 8:03 am #3275

guoweizhang@lanpengkj.com
Participant

Ayelet, the correlation is actually very high despite the small size of the database. Why would the performance be weak if the correlation is high?

May 11, 2016 at 9:58 am #3277

guoweizhang@lanpengkj.com
Participant

Ayelet, I added more samples to the database. It now contains 48 samples (12 samples for each of 4 brands). I still have a fairly high correlation (F1=0.96), but when I test the model, the results are not accurate, not even close. Please see the attached file. I would appreciate it if you could point me in the right direction.


May 15, 2016 at 11:12 am #3306

Ayelet
Keymaster

Hi Guoweiz,

The problem could have a number of causes.

We would like to ask you for some more information:

1. Have you scanned new samples at different temperatures (specifically, temperatures different from those measured when collecting the database)?

2. Did you use physically different samples, or did you scan the same samples multiple times and record those scans as new samples?

3. Did you use the solid sample holder?

A high correlation on a small number of samples can also result from over-fitting (in over-fitting, a statistical model describes random error or noise instead of the underlying relationship).

Therefore, even if the classification model seems successful, it is not robust.
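To illustrate the over-fitting point, here is a minimal sketch on purely synthetic data. It is not the SCiO algorithm or API: the nearest-centroid classifier, the feature count, and all names are assumptions chosen for illustration. The features are random noise with no relation to the labels, yet a model built from 12 samples can still label its own training data almost perfectly.

```python
import random

def make_noise(n_samples, n_features, rng):
    """Rows of pure Gaussian noise: the features carry no class information."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for _ in range(n_samples)]

def fit_centroids(X, y):
    """Average the training rows of each class (nearest-centroid model)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(centroids, row):
    """Return the label whose centroid is closest to the row."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def accuracy(centroids, X, y):
    hits = sum(predict(centroids, row) == label for row, label in zip(X, y))
    return hits / len(y)

rng = random.Random(0)      # fixed seed so the run is repeatable
N_FEATURES = 200            # hypothetical number of spectral points
N_CLASSES = 4               # e.g. four brands

# 12 training samples (3 per class) and 400 fresh samples, all pure noise.
train_X = make_noise(12, N_FEATURES, rng)
train_y = [i % N_CLASSES for i in range(12)]
test_X = make_noise(400, N_FEATURES, rng)
test_y = [i % N_CLASSES for i in range(400)]

centroids = fit_centroids(train_X, train_y)
train_acc = accuracy(centroids, train_X, train_y)
test_acc = accuracy(centroids, test_X, test_y)
print(f"accuracy on the training samples: {train_acc:.2f}")
print(f"accuracy on new samples:          {test_acc:.2f}")
```

Because the labels are unrelated to the features, the score on new samples should sit near chance (about 0.25 for four classes), while the score on the training samples stays high. A high score measured on the same small database the model was built from therefore says little about how it will behave on new scans.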

May I ask which samples you scanned? Can you please add a photo of the samples in the data collection?

Ayelet

May 16, 2016 at 2:34 am #3313

guoweizhang@lanpengkj.com
Participant

Thanks Ayelet. Below are the answers to your questions.

1. Have you scanned new samples in different temperatures? (specifically different temperatures than measured when collecting the database). I SCANNED THE COLLECTION SAMPLES AND TEST SAMPLES UNDER THE SAME CONDITION. THE TEMPERATURE MIGHT BE SLIGHTLY DIFFERENT AS I DID NOT TRY TO CONTROL THE TEMPERATURE EXACTLY.

2. Have you used actual different samples or scanned multiple scans for identical samples and referred them as new samples? I USED THE SAME PHYSICAL SAMPLE FOR 3 DATA COLLECTION SAMPLES (THE PHYSICAL SAMPLE IS JUST POSITIONED DIFFERENTLY IN EACH DATA COLLECTION SAMPLE). SO IN MY DATA COLLECTION, I HAVE 48 SAMPLES. THAT’S ACTUALLY FROM 16 DIFFERENT PHYSICAL SAMPLES.

3. Did you use the solid sample holder? I USED THE SOLID SAMPLE HOLDER. I’M ATTACHING PICTURES. I FIRST CRUSHED THE RICE SAMPLES INTO POWDERS AND FILLED THE SOLID SAMPLE HOLDER WITH 4 GRAMS OF POWDER.


May 29, 2016 at 8:08 am #3426

Ayelet
Keymaster

Hi,

Since you only used 16 physical samples, the problem is most likely the database size.

As I mentioned before, I would recommend adding more samples to the data collection.
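One way to see why 48 scans of 16 physical samples can score well and still fail on new material is to compare two validation schemes. The sketch below uses synthetic data only: the nearest-neighbour classifier, the noise levels, and every name are assumptions for illustration, not the SCiO model or API. It assumes that repeat scans of one physical sample are nearly identical, while different physical samples of the same brand vary a lot.

```python
import random

rng = random.Random(1)      # fixed seed so the run is repeatable
N_FEATURES = 10             # hypothetical number of spectral features
N_BRANDS = 4
SAMPLES_PER_BRAND = 4       # 16 physical samples in total
SCANS_PER_SAMPLE = 3        # 48 scans in total

# Assumption: brands differ only slightly compared with sample-to-sample spread.
brand_centers = [[rng.gauss(0.0, 0.3) for _ in range(N_FEATURES)]
                 for _ in range(N_BRANDS)]

scans = []                  # each entry: (features, brand, physical_sample_id)
sample_id = 0
for brand in range(N_BRANDS):
    for _ in range(SAMPLES_PER_BRAND):
        # One physical sample: brand center plus a large sample-specific offset.
        sample = [c + rng.gauss(0.0, 1.0) for c in brand_centers[brand]]
        for _ in range(SCANS_PER_SAMPLE):
            # Repositioning the same sample changes the scan only a little.
            scan = [v + rng.gauss(0.0, 0.05) for v in sample]
            scans.append((scan, brand, sample_id))
        sample_id += 1

def nearest_brand(query, candidates):
    """1-nearest-neighbour: brand of the closest candidate scan."""
    def sq_dist(candidate):
        return sum((a - b) ** 2 for a, b in zip(query, candidate[0]))
    return min(candidates, key=sq_dist)[1]

def held_out_accuracy(scans, hold_out_whole_sample):
    """Classify each scan using the others as the reference database."""
    hits = 0
    for i, (features, brand, sid) in enumerate(scans):
        if hold_out_whole_sample:
            # Remove every scan of the same physical sample.
            reference = [s for s in scans if s[2] != sid]
        else:
            # Remove only this one scan; its replicates stay in the database.
            reference = [s for j, s in enumerate(scans) if j != i]
        hits += nearest_brand(features, reference) == brand
    return hits / len(scans)

per_scan_acc = held_out_accuracy(scans, hold_out_whole_sample=False)
per_sample_acc = held_out_accuracy(scans, hold_out_whole_sample=True)
print(f"leave one scan out:            {per_scan_acc:.2f}")
print(f"leave one physical sample out: {per_sample_acc:.2f}")
```

When only a single scan is held out, its near-identical replicates remain in the database, so the score mostly reflects the model recognising the same physical sample again. Holding out all scans of a physical sample is closer to testing on a genuinely new sample, and under these assumptions the score drops accordingly.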

Keep us updated!

Ayelet