“Matcha”: A Code Improvement Recommendation Tool for Software Developers

“Siamese” was developed from the dissertation “Code Similarity and Clone Search in Large-Scale Source Code Data”, which received a 2021 outstanding dissertation award from the National Research Council of Thailand (NRCT). It was created to detect toxic code that contains outdated or copyrighted snippets. Building on its success, Dr. Chaiyong Ragkhitwetsagul and his co-researcher, Mr. Matheus Paixao of the State University of Ceara (UECE) in Brazil, are now developing “Matcha”. Dr. Ragkhitwetsagul is an instructor at Mahidol University’s Faculty of Information and Communication Technology. “Matcha” is an ongoing research project that recommends better code to software developers.

Dr. Ragkhitwetsagul explained that he developed “Matcha” because most application and website developers search for code in online communities such as Stack Overflow. As a crowdsourcing platform, Stack Overflow is a convenient place to ask questions, look for answers, and share coding tips and knowledge. However, the answers developers find are not always the best ones. Fortunately, code in Stack Overflow answers is continually edited and updated. “Matcha” therefore checks whether a developer’s code is similar to code in Stack Overflow answers and recommends the best version of that answer, if one exists.

“Stack Overflow is a crowdsourcing platform that gathers coding questions and answers from people who are interested in writing code. The answers are collective intelligence by nature, with coders helping each other find the best answer. For example, if someone asks about a database connection, other developers will suggest several solutions or point out other problems that remain in the code. Over time, people may post an edited version of the code in an existing answer or write a new answer altogether.”

Dr. Ragkhitwetsagul added that access to the Internet, including crowdsourcing platforms, has transformed how software developers work. Searching Stack Overflow for code that solves a problem helps developers reach the right answer faster than digging through books or official documentation alone. These advantages have made the platform popular and indispensable for developers.

“In the past, it might have taken weeks to find an answer, but with the Internet, Google, and Stack Overflow, developers have changed how they work. Now it may take them 15 to 30 minutes to find a solution to a problem. When things don’t go as expected, they google the problem to see whether others have faced the same situation. Because of its question-and-answer format, Stack Overflow leads them to a matching question and possibly the right answer. That answer is easier to apply than one found in a book, because books don’t give solutions to specific problems.”

“Matcha” builds on “Siamese” for its code recommendation mechanism. Within seconds, “Siamese” compares the developer’s code with the accepted code answers on Stack Overflow, a massive source of code. “Matcha” then checks whether the developer’s code matches the latest version posted on Stack Overflow. Dr. Ragkhitwetsagul hopes that “Matcha” will reduce coding problems and improve code quality in software development.
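The check described above can be illustrated with a minimal Python sketch. This is not the actual implementation of “Matcha” or “Siamese”. It rests on several assumptions. Stack Overflow answer revisions are held in a plain dictionary that maps a hypothetical answer ID to a list of code revisions, oldest first. Similarity is a simple token-set Jaccard score with an arbitrary threshold of 0.7, which stands in for the clone search that “Siamese” performs. The function and class names (`recommend_update`, `Recommendation`) are invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical tokenizer: identifiers/keywords, plus single punctuation characters.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokens(code: str) -> set[str]:
    """Split code into a set of tokens (order and repetition are ignored)."""
    return set(TOKEN_RE.findall(code))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


@dataclass
class Recommendation:
    answer_id: int          # hypothetical Stack Overflow answer ID
    matched_revision: int   # index of the older revision the snippet resembles
    latest_code: str        # newest revision of that answer, offered as the update
    similarity: float


def recommend_update(
    snippet: str, answers: dict[int, list[str]], threshold: float = 0.7
) -> Optional[Recommendation]:
    """Return a recommendation if the snippet best matches an outdated revision.

    Assumption: each list in `answers` is ordered oldest -> newest revision.
    """
    snippet_tokens = tokens(snippet)
    best = None  # (similarity, answer_id, revision_index)
    for answer_id, revisions in answers.items():
        for rev_idx, code in enumerate(revisions):
            sim = jaccard(snippet_tokens, tokens(code))
            # ">=" lets a later revision win ties, so an up-to-date copy is not flagged.
            if sim >= threshold and (best is None or sim >= best[0]):
                best = (sim, answer_id, rev_idx)
    if best is None:
        return None  # no sufficiently similar answer found
    sim, answer_id, rev_idx = best
    revisions = answers[answer_id]
    if rev_idx == len(revisions) - 1:
        return None  # the snippet already matches the latest version
    return Recommendation(answer_id, rev_idx, revisions[-1], sim)


if __name__ == "__main__":
    # Hypothetical answer with two revisions of a database-connection snippet.
    answers = {
        101: [
            "conn = db.connect(host)\ncursor = conn.cursor()",
            "with db.connect(host) as conn:\n    cursor = conn.cursor()",
        ]
    }
    print(recommend_update("conn = db.connect(host)\ncursor = conn.cursor()", answers))
```

In this sketch, a developer snippet copied from the first revision of answer 101 is flagged, and the newer revision is returned as the recommended update. A snippet that already matches the latest revision produces no recommendation.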

“I think it’s normal to look for code on the Internet. It’s not a harmful thing. But most of the time, developers don’t know whether the code they’re copying is the latest version. Because information on the Internet is updated in real time, copying code from Stack Overflow has both advantages and disadvantages. Developers must use it wisely.”

Figures:

Figure 1: Overview of the three phases of the “Matcha” study:

  1. Building a dataset of accepted answers from Stack Overflow
  2. Selecting GitHub projects to analyze and finding the recommended code snippets
  3. Evaluating the code snippets recommended by “Matcha” by presenting them to developers

Figure 2: An example of code edits on Stack Overflow answers