Obstacle Detection for Drone Flight Path
A project completed as the culminating experience for the Master's degree in Data Analytics at San Jose State University. Developed by Shrey Agarwal, Ibrahim Khalid, Sung Won Lee, and Justin Wang from Fall 2024 to Spring 2025 under the advisement of Dr. Simon Shim and Mr. Venkat Iyer.
As the world comes to rely more on autonomous control, the need for better object detection and labeling grows. Autonomous vehicles still suffer from a high rate of false classifications and slow inference times, which can lead to avoidable accidents. Our goal in this project is to create a prototype model that is both fast and accurate for obstacle detection on drones, a comparatively under-researched domain. By doing so, we can improve autonomous drone operation for use in warehousing, delivery, and more.
One of the primary benefits of drones is the small operating footprint they require, combined with their speed and agility. For the same reason, they need robust obstacle detection capabilities so that fast flight in tight spaces does not end in a collision.
The overall system is designed to give the end user full control over dataset definitions and model training through a friendly web interface. The admin user operates the interface, which in turn sets off a series of events that result in the selected model being trained on the selected dataset. Once training is complete, the admin user can, in principle, download the model weights onto a drone system. Because the AWS S3 bucket is public, the trained models can also be pulled into Jupyter notebooks for further fine-tuning.
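For example, a developer could pull trained weights from the public bucket into a notebook with anonymous S3 access. A minimal sketch, assuming boto3 is installed; the bucket and object key names here are hypothetical placeholders, not the project's actual values:

```python
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Anonymous client: the bucket is public, so no credentials are needed.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

# Hypothetical bucket and key, for illustration only.
s3.download_file(
    Bucket="drone-obstacle-models",
    Key="yolo-dce/best.pt",
    Filename="best.pt",
)
```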
This system was primarily developed using the YOLO [7] framework and the VisDrone [8] dataset.
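A minimal training sketch, assuming the Ultralytics YOLO package, which bundles a `VisDrone.yaml` dataset config that fetches the dataset on first use; the epoch count and image size are illustrative defaults, not the project's tuned settings:

```python
from ultralytics import YOLO

# Start from pretrained YOLOv8 weights and fine-tune on VisDrone.
model = YOLO("yolov8s.pt")
model.train(data="VisDrone.yaml", epochs=50, imgsz=640)

# Evaluate on the validation split and report accuracy.
metrics = model.val()
print(metrics.box.map)  # mAP50-95
```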
The goal of our research project is to train and test various models, including custom models developed from scratch. Our model modifications are drawn from papers in our literature review. The main models we develop and test are:
- YOLO DCE
- YOLO MHSA BiFPN
- YOLO BiFPN ResNet
- YOLO-LITE
An et al. [1] found that removing the last two Conv and C2f blocks streamlined the model and improved performance. Ye et al. [2] proposed a unique convolution-based transformer and a multi-head self-attention module to better detect occluded objects. Based on these methods, we try to improve occluded-object performance and inference time in YOLOv8; the streamlining idea is sketched below.
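A conceptual PyTorch sketch of that streamlining: a toy stack of downsampling stages standing in for YOLOv8 backbone blocks, with the last two dropped. This illustrates the idea only and is not the DCE-YOLOv8 architecture from [1]:

```python
import torch
import torch.nn as nn

def conv_stage(c_in, c_out):
    # Stand-in for a YOLOv8 Conv/C2f stage (stride-2 downsampling).
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(),
    )

full_backbone = nn.Sequential(
    conv_stage(3, 32),
    conv_stage(32, 64),
    conv_stage(64, 128),
    conv_stage(128, 256),  # dropped in the lightweight variant
    conv_stage(256, 512),  # dropped in the lightweight variant
)

# Streamlined variant: remove the last two stages, as in [1].
lite_backbone = full_backbone[:-2]

x = torch.randn(1, 3, 640, 640)
print(full_backbone(x).shape)  # torch.Size([1, 512, 20, 20])
print(lite_backbone(x).shape)  # torch.Size([1, 128, 80, 80])
```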
The model incorporates a Multi-Head Self-Attention (MHSA) layer at the final backbone stage of YOLOv8 to efficiently capture global context while preserving spatial detail, and employs a Bidirectional Feature Pyramid Network (BiFPN) to learn adaptive weights over multi-scale features. The original findings were proposed by Zhang et al. [3].
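A minimal sketch of the MHSA piece, assuming PyTorch: the final backbone feature map is flattened into a token sequence, passed through `nn.MultiheadAttention` with a residual connection, and reshaped back so the spatial layout is preserved. The channel and head counts are illustrative:

```python
import torch
import torch.nn as nn

class MHSABlock(nn.Module):
    """Self-attention over a CNN feature map, preserving its spatial shape."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)      # (B, H*W, C)
        normed = self.norm(tokens)
        attended, _ = self.attn(normed, normed, normed)
        tokens = tokens + attended                 # residual connection
        return tokens.transpose(1, 2).reshape(b, c, h, w)

# E.g. applied to the 20x20 map at the end of a YOLOv8-style backbone.
feat = torch.randn(1, 512, 20, 20)
print(MHSABlock(512)(feat).shape)  # torch.Size([1, 512, 20, 20])
```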
This model uses ResNet50 layers as its backbone, since Liu et al. [4] suggest it is well suited to extracting features of distant and low-level obstacles, and its residual connections improve gradient flow. A BiFPN neck and YOLO detection heads are then used, as in Li et al. [5], for their enhanced feature fusion and bidirectional information flow, which improve detection performance.
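A minimal sketch of the two pieces, assuming PyTorch and torchvision: multi-scale features pulled from a ResNet50 backbone, plus BiFPN-style "fast normalized fusion," where ReLU-clamped learnable weights blend two feature maps. The tapped layers and channel widths are illustrative choices:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50
from torchvision.models.feature_extraction import create_feature_extractor

# Pull C3/C4/C5 feature maps out of a ResNet50 backbone.
backbone = create_feature_extractor(
    resnet50(weights=None),
    return_nodes={"layer2": "c3", "layer3": "c4", "layer4": "c5"},
)

class FastNormalizedFusion(nn.Module):
    """BiFPN-style weighted sum of two same-shaped feature maps."""

    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.ones(2))

    def forward(self, a, b):
        w = torch.relu(self.w)        # keep fusion weights non-negative
        w = w / (w.sum() + 1e-4)      # normalize so they sum to ~1
        return w[0] * a + w[1] * b

feats = backbone(torch.randn(1, 3, 640, 640))
c4 = feats["c4"]                                                # (1, 1024, 40, 40)
up_c5 = nn.functional.interpolate(feats["c5"], scale_factor=2)  # match c4 size
lateral = nn.Conv2d(2048, 1024, 1)(up_c5)                       # match c4 channels
print(FastNormalizedFusion()(c4, lateral).shape)  # torch.Size([1, 1024, 40, 40])
```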
Pedoeem and Huang [6] developed a YOLOv2-based model that reduced both the number of layers and their widths, with the aim of running on edge devices without GPUs. They tested over 15 different configurations.
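A minimal sketch of that shrinking strategy, assuming PyTorch: a width multiplier scales every layer's channel count, so a single knob trades accuracy for edge-device speed. The base widths and multiplier are illustrative, not one of the configurations tested in [6]:

```python
import torch
import torch.nn as nn

def tiny_backbone(width_mult: float = 1.0) -> nn.Sequential:
    """YOLO-LITE-style shallow backbone; width_mult shrinks every layer."""
    widths = [max(1, int(w * width_mult)) for w in (16, 32, 64, 128)]
    layers, c_in = [], 3
    for c_out in widths:
        layers += [
            nn.Conv2d(c_in, c_out, 3, padding=1, bias=False),
            nn.BatchNorm2d(c_out),
            nn.LeakyReLU(0.1),
            nn.MaxPool2d(2),
        ]
        c_in = c_out
    return nn.Sequential(*layers)

full = tiny_backbone(1.0)
lite = tiny_backbone(0.5)   # half-width variant for non-GPU devices
n_params = lambda m: sum(p.numel() for p in m.parameters())
print(n_params(full), n_params(lite))  # lite has roughly a quarter the parameters
```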
In summary, this research presented a comprehensive approach to developing a lightweight, real-time obstacle detection and labeling application for drone flight paths, motivated by the growing use of UAVs across many situations and the need for safety and autonomy in their operation. It focused on enhancing small-obstacle detection through advanced object detection models, custom model development, and an end-to-end deployment pipeline.
Our team successfully built and tested several custom YOLOv8-based models. Many of these custom models showed similar or better results compared to established object detection models such as YOLOv11, Detectron2, and RT-DETR.
The project demonstrates the feasibility of deploying fast and accurate obstacle detection models, enabling end applications such as warehouse automation, delivery drones, and disaster response.
- [1] An, J., Hee Lee, D., Dwisnanto Putro, M., & Kim, B. W. (2024). DCE-YOLOv8: Lightweight and accurate object detection for drone vision. http://dx.doi.org/10.1109/ACCESS.2024.3481410
- [2] Ye, T., Qin, W., Zhao, Z., Gao, X., Deng, X., & Ouyang, Y. (2023). Real-Time Object Detection Network in UAV-Vision Based on CNN and Transformer. https://doi.org/10.1109/TIM.2023.3241825
- [3] Zhang, Z., Lu, X., Cao, G., Yang, Y., Jiao, L., & Liu, F. (2021). ViT-YOLO: Transformer-Based YOLO for Object Detection. https://openaccess.thecvf.com/content/ICCV2021W/VisDrone/papers/Zhang_ViT-YOLOTransformer-Based_YOLO_for_Object_Detection_ICCVW_2021_paper.pdf
- [4] Liu, W., Qiang, J., Li, X., Guan, P., & Du, Y. (2022). UAV image small object detection based on composite backbone network. https://doi.org/10.1155/2022/7319529
- [5] Li, N., Ye, T., Zhou, Z., Gao, C., & Zhang, P. (2024). Enhanced YOLOv8 with BiFPN-SimAM for precise defect detection in miniature capacitors. https://doi.org/10.3390/app14010429
- [6] Pedoeem, J., & Huang, R. (2018). YOLO-LITE: A Real-Time Object Detection Algorithm Optimized for Non-GPU Computers. https://doi.org/10.1109/BigData.2018.8621865
- [7] Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You Only Look Once: Unified, Real-Time Object Detection. https://arxiv.org/abs/1506.02640
- [8] Zhu, P., Wen, L., Bian, X., Ling, H., & Hu, Q. (2018). Vision Meets Drones: A Challenge. https://arxiv.org/abs/1804.07437