Automatic library categorization

Abstract

Software ecosystems contain several types of artefacts, such as libraries, documentation and source code files. Recent studies show that the Maven software ecosystem alone already contains over 2.8 million artefacts and over 70,000 libraries. Given the size of the ecosystem, selecting a suitable library is a challenge for its users. The MVNRepository website offers a category-based search functionality as a solution. However, not all of the libraries have been categorised, which leads to incomplete search results. This work proposes an approach to the automatic categorisation of libraries through machine learning classifiers trained on class and method names. Our preliminary results show that the approach is accurate, suggesting that large-scale applications may be feasible.
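
To make the idea of classifying a library from its class and method names more concrete, the following is a minimal sketch and not the classifier used in the paper. It assumes identifiers are split into lowercase word tokens and fed to a small multinomial naive Bayes model written with the Python standard library only. The names split_identifier and NameClassifier, the category labels and the training examples are all hypothetical and chosen purely for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def split_identifier(name):
    """Split a class or method name such as 'parseJsonArray' into lowercase tokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


class NameClassifier:
    """Tiny multinomial naive Bayes over identifier tokens (illustrative only)."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # category -> token frequencies
        self.doc_counts = Counter()               # category -> number of libraries
        self.vocabulary = set()

    def fit(self, libraries):
        # libraries: iterable of (list of class/method names, category label)
        for names, category in libraries:
            tokens = [t for n in names for t in split_identifier(n)]
            self.token_counts[category].update(tokens)
            self.doc_counts[category] += 1
            self.vocabulary.update(tokens)
        return self

    def predict(self, names):
        tokens = [t for n in names for t in split_identifier(n)]
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, -math.inf
        for category, counts in self.token_counts.items():
            # Log prior plus Laplace-smoothed log likelihood of each token.
            score = math.log(self.doc_counts[category] / total_docs)
            total_tokens = sum(counts.values())
            for token in tokens:
                score += math.log((counts[token] + 1) / (total_tokens + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Hypothetical training data: names extracted from already categorised libraries.
TRAINING = [
    (["JsonParser", "parseJsonArray", "writeJsonObject"], "json"),
    (["JsonReader", "readJsonTree", "toJsonString"], "json"),
    (["Logger", "logWarning", "setLogLevel"], "logging"),
    (["LogAppender", "appendLogEvent", "flushLog"], "logging"),
]

classifier = NameClassifier().fit(TRAINING)
predicted = classifier.predict(["JsonWriter", "writeJsonValue"])
print(predicted)
```

In a realistic setting the training examples would come from libraries that already carry a category on MVNRepository, and the model would then be applied to the uncategorised ones. Any other text classifier could be substituted for the naive Bayes model in this sketch.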

Publication
In the International Workshop on Software Health, SoHeal 2020, Seoul, South Korea
Camilo Velázquez-Rodríguez
Postdoctoral Researcher

My research interests include software engineering, artificial intelligence for code, and mining software repositories.