Optimizing a Retinal Disease Image Recognition Model
Introduction
This is the second part of my series on creating a better model for the MURED dataset. The first part, which provides context and explains the goals of the research, is available at https://josedavidlomelin.blogspot.com/2022/11/introduction-to-my-research-project.html. My goal with the research was to create a more effective model for the MURED dataset. Additionally, the MURED dataset was made from very high quality images, which does not always reflect real world conditions; to account for this, I also created an image noise generator that simulates those conditions. Lastly, building a better CNN model, even if it is only better on a few metrics, would imply that Transformer models are not always the best option for certain image recognition tasks. There is an ongoing debate over which type of model is better for certain image recognition tasks [1], and this would show that more complex models are not necessary in order to get similar accuracy.
New Model Changes
In the first part, I described the initial model architecture that was going to be used. However, due to Colab memory limitations and other issues, including low accuracy and a model that was hard to train, I decided to go with a much simpler model. The model has the following architecture.
[Figure: New Model Architecture]
Before, four models were being combined in different ways, but after many iterations I found that the simpler the model, the better it was at learning. Lastly, I increased the number of fully connected layers from two to four, adding many more units. Although there are a few smaller changes I did not go over, and some are not shown in the graphic, those were by far the changes that made the biggest difference in model performance.
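To make this concrete, below is a minimal Keras sketch of a simple CNN ending in four fully connected layers. Since the full architecture only appears in the graphic, the filter sizes, unit counts, input shape, and number of output classes here are illustrative assumptions, not the exact values used.

```python
# A minimal sketch of a simple CNN ending in four fully connected layers.
# Filter sizes, unit counts, input shape, and class count are illustrative
# assumptions; the real architecture is the one shown in the graphic above.
from tensorflow.keras import layers, models
from tensorflow.keras.metrics import AUC

NUM_CLASSES = 19  # assumed label count after removing the "OTHER" class

model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(224, 224, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    # Four fully connected layers, up from the original two.
    layers.Dense(1024, activation="relu"),
    layers.Dense(512, activation="relu"),
    layers.Dense(256, activation="relu"),
    layers.Dense(128, activation="relu"),
    # Sigmoid output: each retinal disease label is predicted independently.
    layers.Dense(NUM_CLASSES, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=[AUC()])
```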
Data Augmentation, Noise and Images
Data augmentation, or the practice of creating "new" data from existing data, was a crucial part of increasing the model's metrics. Although the Transformer based model utilizes data augmentation, many of the augmentations used do not reflect real world conditions. For example, a vertical flip and a rotation of up to 30 degrees were applied to the dataset as augmentations. There is no reason for a retinal scan to be taken at a 30 degree angle, much less upside down. Additionally, most of their augmentations were applied at a rate of 0.3 or 0.5, while my model's augmentations were applied across the entire dataset. The augmentations I used include width, height, and zoom shifts, to account for slightly differing eye sizes and inconsistent pictures; a 50% horizontal flip, to train the model on both right and left eyes; and a brightness range, samplewise centering, and samplewise standard normalization, to account for inconsistent lighting between scans (a sketch of this setup appears below). In addition, the "OTHER" class was removed from the training set, minimizing the number of diseases with a disproportionately small sample size.
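The augmentation names above map directly onto options of Keras's ImageDataGenerator, so a minimal sketch of the setup might look like the following; the specific shift, zoom, and brightness ranges are assumptions, since only the augmentation types are described here.

```python
# Sketch of the described augmentations via Keras's ImageDataGenerator.
# The exact ranges (0.1 shifts, 0.9-1.1 brightness) are illustrative
# assumptions; only the augmentation types come from the write-up.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    width_shift_range=0.1,              # slightly off-center scans
    height_shift_range=0.1,
    zoom_range=0.1,                     # slightly differing eye sizes
    horizontal_flip=True,               # flips ~50% of images: left/right eyes
    brightness_range=(0.9, 1.1),        # inconsistent lighting between scans
    samplewise_center=True,             # zero-mean each image
    samplewise_std_normalization=True,  # unit variance per image
)
```

Every image drawn from this generator is shifted, zoomed, and normalized, which is what applying augmentations "across the entire dataset" means in practice, in contrast to the 0.3 or 0.5 application rates mentioned above.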
[Figure: Eye Scans Before Image Augmentations]
[Figure: Eye Scans After Image Augmentations (samplewise_std_normalization set to False)]
[Figure: Eye Scans After Image Augmentations with Noise Function]
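The noise function itself is not listed in this post, but a hypothetical sketch of real-world-style noise could combine additive sensor noise with an occasional slight blur, as below; the noise types and magnitudes here are my assumptions, not the actual generator used.

```python
# Hypothetical real-world-style noise: additive Gaussian sensor noise plus
# an occasional slight defocus blur. Noise types and magnitudes are assumed.
import numpy as np
from scipy.ndimage import gaussian_filter

def add_realistic_noise(image, noise_std=0.02, blur_prob=0.3, rng=None):
    """Noise an image with pixel values in [0, 1]; returns the same shape."""
    rng = rng or np.random.default_rng()
    noisy = image + rng.normal(0.0, noise_std, image.shape)  # sensor noise
    if rng.random() < blur_prob:                             # slight defocus
        noisy = gaussian_filter(noisy, sigma=(1.0, 1.0, 0.0))  # blur H and W only
    return np.clip(noisy, 0.0, 1.0)
```

A function like this can be passed to ImageDataGenerator through its preprocessing_function argument so that the noise is applied alongside the augmentations above.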
Model Metrics
The main goal of this research was to create a more effective model for the MURED dataset while better accounting for real life conditions. Before comparing the two models, it is important to keep in mind that my metrics were taken with the addition of random noise and augmentations that closely reflect the real world, which the Transformer model did not have. The main issue with the Transformer model, outlined in the first research blog, was low precision, visible both in its mean average precision (mAP) and in its F1 score: its mAP was 0.685 and its F1 score was 0.573. Although not specified, I assume those scores were computed on the dataset's validation data. In comparison, as of February 14, 2023, our best model's mAP is 0.733, an improvement in precision. However, our model's F1 score is 0.4614, a little worse than the Transformer model's. Our AUC was also very close to theirs: 0.9233 versus 0.962. Lastly, the ML Score, defined in the previous MURED research paper as ML Score = (ML mAP + ML AUC) / 2, favors our model: the Transformer based model's maximum ML Score was 0.824, while the CNN based model's was 0.834.
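As a quick sanity check of the formula, the Transformer model's reported ML Score can be reproduced from its individual metrics:

```python
# ML Score as defined in the MURED paper: the mean of multi-label mAP and AUC.
def ml_score(ml_map: float, ml_auc: float) -> float:
    return (ml_map + ml_auc) / 2

print(ml_score(0.685, 0.962))  # 0.8235, which rounds to the reported 0.824
```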
Conclusion
These results are significant because, thanks to the improved image augmentations and gains in the headline metrics, our model improves on the Transformer based model. Although not perfect, the model is an important step in using machine learning to help doctors diagnose retinal diseases, because it shows that RNNs and Transformers are not necessary for the task. Those models typically need more space and more data to be effective, and their main advantage is that they can "understand context" in an image; that matters less here, since most eyes have a very similar structure to one another. Additionally, the model was trained to account for variations in retinal scans through data augmentation. In conclusion, all of the goals set initially were achieved: the CNN based model beat the Transformer model on the headline metrics (mAP and ML Score), a data augmentation pipeline and noise generator that reflect the real world were built, and the results demonstrate that Transformer and RNN based image recognition models are not always better than CNN based ones.
Citations
[1] https://becominghuman.ai/transformers-in-vision-e2e87b739feb
[2] https://analyticsindiamag.com/a-comparison-of-4-popular-transfer-learning-models/