
Automatic Classification of Algorithm Citation Functions in Scientific Literature

One application of artificial intelligence (AI) technology is the creation of automated systems that manage large amounts of data in place of humans. The use of AI in digital library research has gained a lot of attention. Its main purpose is to help users find the information they need in a data warehouse precisely, conveniently and efficiently.

An algorithm is a procedure used for solving a problem or performing a computation. Most research in computer science and related disciplines revolves around developing, evaluating, and applying algorithms, which are often proposed and published in scholarly documents and journals.

To create and publish new algorithms, researchers build on previous algorithms in a variety of ways, including extensions, direct usage, and mentions. A graph depicting how each algorithm is used across all academic papers would be extremely useful in digital library research. Indicators of the influence and generalizability of each algorithm could also be derived from it. Furthermore, such a graph could be used to study the evolution of algorithms from the past to the present.
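As a rough illustration of such a graph, the sketch below stores each algorithm citation as a record and counts how often each algorithm is extended, used, or mentioned. The citation records, the function names, and the influence score (citations that use or extend an algorithm) are all assumptions made for this example; the article does not define how influence is measured.

```python
from collections import Counter, defaultdict

# Hypothetical citation records: (citing paper, cited algorithm, citation function).
CITATIONS = [
    ("paper_A", "PageRank", "usage"),
    ("paper_B", "PageRank", "extension"),
    ("paper_C", "PageRank", "mention"),
    ("paper_C", "k-means", "usage"),
    ("paper_D", "k-means", "mention"),
]


def build_usage_graph(citations):
    """Map each algorithm to a Counter of the ways it was cited."""
    graph = defaultdict(Counter)
    for _paper, algorithm, function in citations:
        graph[algorithm][function] += 1
    return graph


def influence(graph, algorithm):
    """Toy influence score (an assumption): citations that use or extend the algorithm."""
    counts = graph[algorithm]
    return counts["usage"] + counts["extension"]


graph = build_usage_graph(CITATIONS)
for name in sorted(graph):
    print(name, dict(graph[name]), "influence =", influence(graph, name))
```

Counting mentions separately from usage and extension is one possible design choice: it keeps passing references from inflating the score of an algorithm that few papers actually build on.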

This research was conducted by Assoc. Prof. Dr. Suppawong Tuarob, Head of the Machine Intelligence and Knowledge Engineering Research Group, and Prof. Dr. Peter Haddawy, Deputy Dean for Research Development. It presents an AI technique that automatically identifies how an algorithm is used from its citation context. Each algorithm citation is categorized into one of three types: Extension, Direct Usage, and Mention. The authors propose a set of heterogeneous ensemble machine-learning methods, in which two base classifiers trained with heterogeneous feature types are combined to automatically identify the algorithm usage relationship. The proposed heterogeneous ensemble methods achieve best average F1 scores of 74.9% and 90.5% for fine-grained and binary algorithm citation function classification, respectively.
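The general idea of combining two base classifiers that see different kinds of features can be sketched as follows. This is not the authors' implementation: the two base classifiers are hand-written stand-ins, the feature types (citation text and section name) are assumed for illustration, and averaging the class probabilities is only one possible way to combine them.

```python
LABELS = ("extension", "usage", "mention")


def text_classifier(context):
    """Stand-in for a model trained on textual features (hypothetical keyword rules)."""
    text = context.lower()
    if "extend" in text or "improve" in text:
        return {"extension": 0.7, "usage": 0.2, "mention": 0.1}
    if "we use" in text or "we apply" in text:
        return {"extension": 0.1, "usage": 0.7, "mention": 0.2}
    return {"extension": 0.1, "usage": 0.2, "mention": 0.7}


def section_classifier(section):
    """Stand-in for a model trained on a different feature type (assumed: section name)."""
    if section == "method":
        return {"extension": 0.4, "usage": 0.5, "mention": 0.1}
    return {"extension": 0.1, "usage": 0.2, "mention": 0.7}


def ensemble_predict(context, section):
    """Average the two probability estimates and return the most likely label."""
    p_text = text_classifier(context)
    p_section = section_classifier(section)
    averaged = {label: (p_text[label] + p_section[label]) / 2 for label in LABELS}
    return max(averaged, key=averaged.get)


print(ensemble_predict("We extend the PageRank algorithm [3].", "method"))
print(ensemble_predict("We use k-means [5] to cluster documents.", "method"))
print(ensemble_predict("Related approaches include PageRank [3].", "related work"))
```

In a real system each stand-in would be replaced by a trained model, and the combination rule would be chosen by evaluating it on labeled citation contexts.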

This research was published in 2020 in IEEE Transactions on Knowledge and Data Engineering, a leading Q1 international academic journal ranked in the top 5% in Information Systems. For those who are interested in the full article, please visit: https://ieeexplore.ieee.org/document/8700263

Did you know? This research has been further developed into an application called “AlgoExplorer” by Mr. Chanathip Pornprasit and Mr. Thanadon Boonkeard, students of Mahidol University, supervised by Assoc. Prof. Dr. Suppawong Tuarob. AlgoExplorer won third prize in the Data Science and Artificial Intelligence Application category at the 18th Thailand IT Contest Festival, held on 13-15 March 2019.