Classification model not working despite high correlation

  • #3174

    Please see the attached file. My classification model shows a pretty high correlation (I only used 12 samples instead of the recommended 40). But when I test this model, the results are inconsistent and in many cases incorrect. Can you help me understand why?

     

    I thought as long as the correlation is high, the model should be fairly accurate…

    #3177
    Ayelet
    Keymaster

    The reason for that is the size of the database.

    I would recommend adding more samples to broaden your database, then evaluating the model’s performance again.

     

    Ayelet

    #3275

    Ayelet, the correlation is actually very high despite the small size of the database.  Why would the performance be weak if the correlation is high?

     

    #3277

    Ayelet, I added more samples to the database. It now contains 48 samples (12 samples for each of 4 brands). I still have a fairly high correlation (F1=0.96), but when I test the model, the results are not accurate, not even close. Please see the attached file. I would appreciate it if you could point me in the right direction.

     

    #3306
    Ayelet
    Keymaster

    Hi Guoweiz,

     

    The problem may have a number of causes.

    We would like to ask you for some more information:

    1. Have you scanned the new samples at different temperatures? (specifically, temperatures different from those measured when collecting the database)

    2. Did you use physically different samples, or take multiple scans of the same sample and treat them as new samples?

    3. Did you use the solid sample holder?

     

    A high correlation on a small number of samples can also be the result of over-fitting (in over-fitting, a statistical model describes random error or noise instead of the underlying relationship).

    Therefore, even if the classification model seems successful, it is not robust.
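
    To illustrate the over-fitting point, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the "scans" are pure random noise with brand labels attached arbitrarily, and the 1-nearest-neighbour classifier is a stand-in, not the algorithm the modelling tool actually uses. With only 12 training samples the model scores perfectly on the data it was built from, yet does no better than guessing on new scans.

```python
import random

rng = random.Random(0)
BRANDS = ["A", "B", "C", "D"]

def noise_scan(n_features=50):
    # Hypothetical "spectrum": random numbers with no relation to the brand.
    return [rng.gauss(0.0, 1.0) for _ in range(n_features)]

def nearest_label(x, train):
    # 1-nearest-neighbour: copy the label of the closest training scan.
    closest = min(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(x, item[0])))
    return closest[1]

def accuracy(points, train):
    hits = sum(nearest_label(x, train) == y for x, y in points)
    return hits / len(points)

# 12 training scans (3 per brand) and 400 fresh scans, all pure noise.
train = [(noise_scan(), BRANDS[i % 4]) for i in range(12)]
new = [(noise_scan(), BRANDS[i % 4]) for i in range(400)]

train_acc = accuracy(train, train)  # each scan is its own nearest neighbour
new_acc = accuracy(new, train)      # labels carry no signal, so expect about 1 in 4

print(f"accuracy on the training scans: {train_acc:.2f}")
print(f"accuracy on new scans:          {new_acc:.2f}")
```

    The score measured on the training data says nothing about robustness here; only the score on scans the model has never seen does.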

     

    May I ask which samples you scanned? Could you please add a photo of the samples in the data collection?

     

    Ayelet

     

     

     

     

    #3313

    Thanks Ayelet,  Below are answers to your questions.

    1. Have you scanned new samples at different temperatures? (specifically different temperatures than measured when collecting the database).  I SCANNED THE COLLECTION SAMPLES AND TEST SAMPLES UNDER THE SAME CONDITIONS.  THE TEMPERATURE MIGHT BE SLIGHTLY DIFFERENT AS I DID NOT TRY TO CONTROL THE TEMPERATURE EXACTLY.

    2. Have you used actual different samples or scanned multiple scans for identical samples and referred to them as new samples?  I USED EACH PHYSICAL SAMPLE FOR 3 DATA COLLECTION SAMPLES (THE PHYSICAL SAMPLE IS JUST POSITIONED DIFFERENTLY FOR EACH ONE).  SO MY DATA COLLECTION HAS 48 SAMPLES, WHICH ACTUALLY COME FROM 16 DIFFERENT PHYSICAL SAMPLES.

    3. Did you use the solid sample holder? I USED THE SOLID SAMPLE HOLDER.  I’M ATTACHING PICTURES.  I FIRST CRUSHED THE RICE SAMPLES INTO POWDER AND FILLED THE SOLID SAMPLE HOLDER WITH 4 GRAMS OF POWDER.

    #3426
    Ayelet
    Keymaster

    Hi,

     

    Since you used only 16 distinct physical samples, the problem is most likely the database size after all.

    As I mentioned before, I would recommend adding more samples to the data collection.
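
    The effect of counting repeat scans as separate samples can be sketched in Python. This is a toy simulation under stated assumptions, not the tool's own validation procedure: each of 16 hypothetical physical samples gets a random "fingerprint", the brand has no influence on it at all, the 3 scans per sample differ only by small noise, and a 1-nearest-neighbour classifier stands in for the real model. Holding out one scan at a time looks excellent because the other two scans of the same physical sample remain in the training set; holding out a whole physical sample shows the real picture.

```python
import random

rng = random.Random(1)
BRANDS = ["A", "B", "C", "D"]
N_FEATURES = 20

# 16 physical samples (4 per brand), each scanned 3 times -> 48 scans.
scans = []  # entries are (features, brand, physical_sample_id)
for sample_id in range(16):
    brand = BRANDS[sample_id % 4]
    # Per-sample fingerprint; deliberately independent of the brand.
    fingerprint = [rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)]
    for _ in range(3):
        # Repositioning the sample only adds a little scan-to-scan noise.
        features = [f + rng.gauss(0.0, 0.05) for f in fingerprint]
        scans.append((features, brand, sample_id))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(features, training):
    # 1-nearest-neighbour: copy the brand of the closest training scan.
    return min(training, key=lambda s: sq_dist(features, s[0]))[1]

def held_out_accuracy(group_aware):
    hits = 0
    for i, (features, brand, sample_id) in enumerate(scans):
        if group_aware:
            # Leave out every scan of the same physical sample.
            training = [s for s in scans if s[2] != sample_id]
        else:
            # Leave out only this one scan.
            training = [s for j, s in enumerate(scans) if j != i]
        hits += predict(features, training) == brand
    return hits / len(scans)

naive_acc = held_out_accuracy(group_aware=False)
grouped_acc = held_out_accuracy(group_aware=True)
print(f"leave-one-scan-out accuracy:   {naive_acc:.2f}")
print(f"leave-one-sample-out accuracy: {grouped_acc:.2f}")
```

    In this sketch the brand is unpredictable by construction, yet the scan-level score is still high. A model should be judged on physical samples it has never seen, which is also why the count that matters is 16, not 48.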

     

    Keep us updated!

     

    Ayelet
