Research Article

Detection of moving fish schools using reinforcement learning technique

Year 2025, Volume: 42, Issue: 1, 21-26, 08.03.2025
https://doi.org/10.12714/egejfas.42.1.03

Abstract

This study aims to contribute to the fishing sector by determining the locations of moving fish schools. Using the Q-Learning algorithm, a reinforcement learning technique, areas where fish schools are frequently observed were marked so that autonomous ships could reach them faster. The region of interest was divided into small cells, each cell was assigned reward and penalty points, and the cells where fish schools are abundant were identified. In addition, the fish density matrix of the region was extracted by the autonomous systems. The algorithm can also be updated automatically according to fish species and fishing bans: a separate Q-gain matrix was kept for each target species, and autonomous ships moved according to the corresponding gain matrix. As a result, recognizing the region allowed autonomous ships to achieve substantial savings in time and travel cost when finding or following fish schools.
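
The method summarized above can be illustrated with a short sketch. The following Python snippet is a minimal, hypothetical example of grid-based Q-learning in the spirit of the abstract, not the authors' implementation: the grid size, reward values, start cell, and hyperparameters are all assumptions chosen for clarity, and the per-species Q-gain matrices and fishing-ban updates are omitted.

    import random

    # Minimal Q-learning sketch. ASSUMPTIONS for illustration only: a 4x5 grid,
    # two fish-school cells, a fixed starting port, and arbitrary hyperparameters.
    # This is not the paper's implementation.
    ROWS, COLS = 4, 5
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2          # assumed learning parameters

    # Reward grid: +10 where a fish school is assumed to be, -1 per move to
    # model travel cost, so shorter routes toward abundant cells score higher.
    REWARD = [[-1] * COLS for _ in range(ROWS)]
    REWARD[2][4] = 10
    REWARD[0][3] = 10

    Q = {((r, c), a): 0.0
         for r in range(ROWS) for c in range(COLS) for a in range(len(ACTIONS))}

    def step(state, action):
        """Apply a move, clamping at the grid border; return (next_state, reward)."""
        dr, dc = ACTIONS[action]
        r = min(max(state[0] + dr, 0), ROWS - 1)
        c = min(max(state[1] + dc, 0), COLS - 1)
        return (r, c), REWARD[r][c]

    for episode in range(500):
        state = (0, 0)                        # ship departs from an assumed port cell
        for _ in range(50):
            if random.random() < EPSILON:     # explore a random heading
                action = random.randrange(len(ACTIONS))
            else:                             # exploit the best known heading
                action = max(range(len(ACTIONS)), key=lambda a: Q[(state, a)])
            nxt, reward = step(state, action)
            best_next = max(Q[(nxt, a)] for a in range(len(ACTIONS)))
            # Q-learning update (Watkins & Dayan, 1992)
            Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                           - Q[(state, action)])
            state = nxt
            if REWARD[state[0]][state[1]] > 0:    # school reached: end episode
                break

    # Per-cell maximum Q-value: a gain matrix in the spirit of the paper's
    # density matrix, with higher values along routes toward abundant cells.
    for r in range(ROWS):
        print([round(max(Q[((r, c), a)] for a in range(len(ACTIONS))), 1)
               for c in range(COLS)])

Under these assumptions, the printed matrix ranks cells by expected gain, showing how reward and penalty points over a small grid can flag the fish-abundant regions described above.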

Ethics Statement

For this type of study, formal consent is not required.

References

  • Angiuli, A., Fouque, J.P., & Laurière, M. (2022). Unified reinforcement Q-learning for mean field game and control problems. Mathematics of Control, Signals, and Systems, 34(2), 217-271. https://doi.org/10.1007/s00498-021-00310-1
  • Aydındağ Bayrak, E., Kırcı, P., Ensari, T., Seven, E., & Dağtekin, M. (2022). Diagnosing breast cancer using machine learning methods (in Turkish with English abstract). Journal of Intelligent Systems: Theory and Applications, 5(1), 35-41. https://doi.org/10.38016/jista.966517
  • Barto, A.G., Bradtke, S.J., & Singh, S.P. (1995). Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1-2), 81-138. https://doi.org/10.1016/0004-3702(94)00011-O
  • Chapman, D., & Kaelbling, L.P. (1991). Input generalization in delayed reinforcement learning: An algorithm and performance comparisons. Proceedings of the 1991 International Joint Conference on Artificial Intelligence, pp. 726-731, Sydney, Australia.
  • Christiano, P.F., Leike, J., Brown, T., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems, 30. https://doi.org/10.48550/arXiv.1706.03741
  • D'Eramo, C., Cini, A., Nuara, A., Pirotta, M., Alippi, C., Peters, J., & Restelli, M. (2021). Gaussian approximation for bias reduction in Q-learning. Journal of Machine Learning Research, 22(277), 1-51.
  • D'Eramo, C., Nuara, A., Pirotta, M., & Restelli, M. (2017). Estimating the maximum expected value in continuous reinforcement learning problems. In Proceedings of the AAAI Conference on Artificial Intelligence, 31(1), 1846-1846.
  • Dayan, P. (1993). Improving generalization for temporal difference learning: The successor representation. Neural Computation, 5(4), 613-624. https://doi.org/10.1162/neco.1993.5.4.613
  • Devlin, S., Yliniemi, L., Kudenko, D., & Tumer, K. (2014). Potential-based difference rewards for multiagent reinforcement learning. In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-Agent Systems, pp. 165-172.
  • Elallid, B.B., Benamar, N., Hafid, A.S., Rachidi, T., & Mrani, N. (2022). A comprehensive survey on the application of deep and reinforcement learning approaches in autonomous driving. Journal of King Saud University-Computer and Information Sciences, 34(9), 7366-7390. https://doi.org/10.1016/j.jksuci.2022.03.013
  • Everitt, T., Krakovna, V., Orseau, L., Hutter, M., & Legg, S. (2017). Reinforcement learning with a corrupted reward channel. Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), 4705-4713.
  • Gümüş, E. (2016). Q-Learning Algoritması ile Labirentte Yol Bulmak. 7(2), 1-23. https://github.com/emrahgumus/java-q-learning-labirent.git (Accessed: 10.09.2024)
  • Jogunola, O., Adebisi, B., Ikpehai, A., Popoola, S.I., Gui, G., Gačanin, H., & Ci, S. (2021). Consensus algorithms and deep reinforcement learning in energy market: A review. IEEE Internet of Things Journal, 8(6), 4211-4227. https://doi.org/10.1109/JIOT.2020.3032162
  • Jones, G.L., & Qin, Q. (2022). Markov chain Monte Carlo in practice. Annual Review of Statistics and Its Application, 9(1), 557-578. https://doi.org/10.1146/annurev-statistics-040220-090158
  • Jordan, M.I., & Mitchell, T.M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255-260. https://doi.org/10.1126/science.aaa8415
  • Kober, J., Bagnell, J.A., & Peters, J. (2013). Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, 32(11), 1238-1274. https://doi.org/10.1177/0278364913495721
  • Lin, L.J. (1992). Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8, 293-321. https://doi.org/10.1007/BF00992699
  • Liu, H., Bi, W., Teo, K.L., & Liu, N. (2019). Dynamic optimal decision making for manufacturers with limited attention based on sparse dynamic programming. Journal of Industrial & Management Optimization, 15(2). https://doi.org/10.3934/jimo.2018050
  • Meng, T.L., & Khushi, M. (2019). Reinforcement learning in financial markets. Data, 4(3), 110. https://doi.org/10.3390/data4030110
  • Nykjaer, K. (2012). Q Learning Library. https://kunuk.wordpress.com/2012/01/14/q-learning-library-example-with-csharp/ (Accessed: 11.09.2024)
  • Pandey, P., Pandey, D., & Kumar, S. (2010). Reinforcement learning by comparing immediate reward. International Journal of Computer Science and Information Security, 8(5). https://doi.org/10.48550/arXiv.1009.2566
  • Parisotto, E. (2021). Meta reinforcement learning through memory. Doctoral dissertation, Carnegie Mellon University, Pittsburgh.
  • Van Seijen, H., Fatemi, M., Romoff, J., Laroche, R., Barnes, T., & Tsang, J. (2017). Hybrid reward architecture for reinforcement learning. Advances in Neural Information Processing Systems, 30. ISBN: 9781510860964.
  • Wang, J., Liu, Y., & Li, B. (2020). Reinforcement learning with perturbed rewards. In Proceedings of the AAAI Conference on Artificial Intelligence, 34(4), 6202-6209. https://doi.org/10.1609/aaai.v34i04.6086
  • Watkins, C.J.C.H. (1989). Learning from delayed rewards. Doctoral dissertation, King's College, Cambridge, UK.
  • Watkins, C.J.C.H., & Dayan, P. (1992). Q-learning. Machine Learning, 8, 279-292. https://doi.org/10.1007/BF00992698

Details

Primary Language: English
Subjects: Fisheries Management
Section: Articles
Authors

Mehmet Yaşar Bayraktar (ORCID: 0000-0003-3182-120X)

Publication Date: March 8, 2025
Submission Date: September 5, 2024
Acceptance Date: January 15, 2025
Published in Issue: Year 2025, Volume: 42, Issue: 1

How to Cite

APA Bayraktar, M. Y. (2025). Detection of moving fish schools using reinforcement learning technique. Ege Journal of Fisheries and Aquatic Sciences, 42(1), 21-26. https://doi.org/10.12714/egejfas.42.1.03