EDCST: Enhanced Density-Aware Cross-Scale Transformer for Robust Object Classification under Atmospheric Fog Conditions

Authors

  • Fiston Oshasha, Commissariat Général à l'Énergie Atomique (CGEA/CREN-K), https://orcid.org/0009-0009-5447-2760
  • Saint Jean Djungu, CRIA (Center for Research in Applied Computing), Kinshasa, D.R. Congo; Commissariat General for Atomic Energy, Regional Center for Nuclear Studies of Kinshasa, P.O. Box 868, University of Kinshasa Campus, D.R. Congo; University of Kinshasa
  • Alidor Mbayandjambe, University of Kinshasa
  • Franklin Mwamba, Research Institute of Health Sciences
  • Jirince Biaba, Hanoi University of Science and Technology, Vietnam
  • Frey Sylvestre, University of Kinshasa
  • Tege Simboni Simboni, Department of Computer Management, Higher Pedagogical Institute of Isiro, Isiro, D.R. Congo
  • Nathanaël Kasoro, University of Kinshasa
  • Blaise Muhala, University of Kinshasa

DOI:

https://doi.org/10.14232/analecta.2025.3-4.15-38

Keywords:

EDCST, object classification, fog conditions

Abstract

Atmospheric fog poses a critical challenge for computer vision systems in autonomous driving, surveillance, and robotics, where reliable object classification is essential. Under severe fog, classification accuracy can degrade by over 50%, and most existing approaches rely on separate defogging steps, which limits their applicability in real-time settings. This study introduces the Enhanced Density-Aware Cross-Scale Transformer (EDCST), a novel architecture designed for direct object classification under foggy conditions without requiring prior defogging. To support model training and evaluation, we developed a physics-based simulation framework that generates four fog types (uniform, gradient, patchy, and adaptive) across nine intensity levels. EDCST leverages 384-dimensional embeddings, eight transformer layers, and twelve attention heads, and is trained using curriculum learning and OneCycleLR scheduling. On CODaN-Fog (15,500 images at 224×224 resolution), EDCST achieves 84.4% accuracy on clean images and retains 74.2% accuracy under severe fog (80% intensity), outperforming baseline transformers by 15.8%. Class-wise sensitivity analysis reveals that larger objects, such as vehicles and animals, maintain over 75% classification performance, while smaller objects are more affected. Patchy fog causes the greatest accuracy drop (19.1%), followed by adaptive (8.9%) and uniform fog (6.8%). The model converges in 100 epochs within 513 minutes. This work introduces a real-time-capable classification framework that eliminates defogging requirements and maintains strong performance under diverse fog conditions, making it highly suitable for safety-critical vision applications.
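The fog simulation framework is described above only at a high level. For reference, the sketch below shows how uniform fog is commonly synthesized under the standard atmospheric scattering model, I(x) = J(x)t(x) + A(1 − t(x)) with transmission t(x) = exp(−βd(x)); the function name, the intensity-to-β mapping, and the constant airlight value are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of physics-based uniform fog synthesis. The mapping from
# intensity to the scattering coefficient beta and the airlight value are
# assumptions for illustration.
import numpy as np

def add_uniform_fog(image: np.ndarray, intensity: float,
                    airlight: float = 0.9) -> np.ndarray:
    """Fog an RGB image with values in [0, 1] using the scattering model."""
    beta = 3.0 * intensity                 # assumed intensity-to-beta mapping
    depth = np.ones(image.shape[:2])       # uniform fog: constant scene depth
    t = np.exp(-beta * depth)[..., None]   # transmission map t(x)
    return image * t + airlight * (1.0 - t)

# Nine intensity levels spanning clean (0%) to severe fog (80%)
levels = np.linspace(0.0, 0.8, 9)
```

The gradient, patchy, and adaptive variants would replace the constant depth map with spatially varying fields, for example a vertical ramp or smoothed random noise.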
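The reported hyperparameters (384-dimensional embeddings, eight transformer layers, twelve attention heads) fix the encoder capacity even though the density-aware cross-scale blocks themselves are not detailed on this page. A plain ViT-style stand-in with the same dimensions might look as follows; this is a sketch for scale, not the authors' architecture, and the patch size and class count are assumptions.

```python
# ViT-style encoder matching the reported EDCST capacity (384-dim tokens,
# 8 layers, 12 heads). A generic PyTorch stand-in, not the paper's model.
import torch
import torch.nn as nn

class FoggyClassifier(nn.Module):
    def __init__(self, num_classes: int = 10,   # CODaN-style 10-class setup (assumed)
                 img_size: int = 224, patch: int = 16,
                 dim: int = 384, depth: int = 8, heads: int = 12):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        x = self.embed(x).flatten(2).transpose(1, 2) + self.pos  # (B, N, D)
        x = self.encoder(x)
        return self.head(x.mean(dim=1))  # mean-pool tokens, then classify
```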
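Training combines curriculum learning with OneCycleLR over a 100-epoch budget. The sketch below reuses the FoggyClassifier stand-in above, with dummy tensors in place of the CODaN-Fog loader; the linear fog-intensity ramp is an assumed curriculum, not the paper's exact schedule.

```python
# Hedged sketch of curriculum training with OneCycleLR: fog intensity is
# ramped up over epochs so the model sees easy (clear) images first.
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-in data; in practice this would be the CODaN-Fog training set.
data = TensorDataset(torch.rand(64, 3, 224, 224), torch.randint(0, 10, (64,)))
train_loader = DataLoader(data, batch_size=16, shuffle=True)

model = FoggyClassifier()                       # sketch defined above
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)
EPOCHS = 100                                    # 100-epoch budget as reported
sched = torch.optim.lr_scheduler.OneCycleLR(
    opt, max_lr=1e-3, epochs=EPOCHS, steps_per_epoch=len(train_loader))
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(EPOCHS):
    # Curriculum: ramp maximum fog intensity toward the severe 80% level
    # over the first half of training (assumed schedule).
    max_fog = 0.8 * min(1.0, epoch / (0.5 * EPOCHS))
    t = math.exp(-3.0 * max_fog)                # uniform-fog transmission
    for images, labels in train_loader:
        fogged = images * t + 0.9 * (1.0 - t)   # same scattering model as above
        opt.zero_grad()
        loss_fn(model(fogged), labels).backward()
        opt.step()
        sched.step()                            # OneCycleLR advances per batch
```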

Published

2025-12-28

How to Cite

Oshasha, F., Djungu, S. J., Mbayandjambe, A., Mwamba, F., Biaba, J., Sylvestre, F., Simboni Simboni, T., Kasoro, N., & Muhala, B. (2025). EDCST: Enhanced Density-Aware Cross-Scale Transformer for Robust Object Classification under Atmospheric Fog Conditions. Analecta Technica Szegedinensia, 19(3-4), 15–38. https://doi.org/10.14232/analecta.2025.3-4.15-38

Section

Articles