NLP-based approaches and tools have been proposed to improve the efficiency of software engineers, processes, and products by automatically processing natural language artifacts (issues, emails, commits, etc.).
We believe that accurate tools are becoming increasingly necessary to improve Software Engineering (SE) processes. Two important processes are (i) issue management and prioritization and (ii) code comment classification, in which developers have to understand, classify, prioritize, and assign incoming issues and code comments reported by end-users and other developers.
We are pleased to announce the third edition of the NLBSE tool competition, held at NLBSE'24, on issue report classification and, for the second time, on code comment classification: two important tasks in issue and code comment management and prioritization.
You are invited to participate in one or both tool competitions.
The issue report classification competition consists of building and assessing a set of multi-class classification models to classify issue reports as belonging to one category representing the type of information they convey.
We provide a dataset of 3,000 issue reports, labeled as bugs, enhancements, or questions, extracted from 5 real open-source projects. You are invited to leverage this dataset to evaluate your proposed approach and compare your results against our baselines (based on Sentence Transformers).
You must train, tune and evaluate your multi-class classification models using the provided training and test sets. To access these datasets as well as the competition's rules and baselines, please check out our repository.
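As a starting point, the sketch below shows one possible way to build a simple multi-class classifier on top of sentence embeddings. It is not the official baseline; the file names and column names (issues_train.csv, issues_test.csv, title, body, label) are assumptions made for illustration, and the actual data format is documented in the competition repository.

```python
# Minimal sketch of a possible submission pipeline, not the official baseline.
# File and column names below are hypothetical; see the repository for the real schema.
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

train = pd.read_csv("issues_train.csv")  # hypothetical file name
test = pd.read_csv("issues_test.csv")    # hypothetical file name

# Encode title and body together with a general-purpose sentence encoder.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
X_train = encoder.encode((train["title"].fillna("") + " " + train["body"].fillna("")).tolist())
X_test = encoder.encode((test["title"].fillna("") + " " + test["body"].fillna("")).tolist())

# Train a simple multi-class classifier on the embeddings.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, train["label"])
predictions = clf.predict(X_test)
```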
The submissions will be ranked based on the average multi-class F1 score achieved by the proposed classifiers on the test sets, as indicated in the papers.
The submission with the highest average F1 score will be the winner of the competition.
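For reference, a macro-averaged F1 score can be computed with scikit-learn as shown below on toy labels; the exact evaluation procedure and reporting format required for submissions are described in the competition repository.

```python
from sklearn.metrics import f1_score

# Toy ground-truth and predicted labels, for illustration only.
y_true = ["bug", "enhancement", "question", "bug"]
y_pred = ["bug", "enhancement", "bug", "bug"]

# Macro averaging takes the unweighted mean of the per-class F1 scores.
print(f1_score(y_true, y_pred, average="macro"))
```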
Compared to the 2023 version of the issue report competition, we have made the following changes:
The issue report classification competition is organized by: Rafael Kallis (rk@rafaelkallis.com) and Giuseppe Colavito (giuseppe.colavito@uniba.it).
The code comment classification competition consists of building and testing a set of binary classifiers to classify class comment sentences as belonging to one or more categories representing the types of information they convey.
For the competition, we provide a dataset of 82,089 class comment sentences and 19 categories, together with a baseline classifier based on a Transformer model. Participants will propose classifiers for this task, aiming to outperform this baseline.
You must train, tune, and evaluate your classification model using the provided training and test sets. Detailed instructions about the competition (data, rules, baseline, results, etc.) can be found in our repository and notebook.
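To illustrate the task, the hedged sketch below trains one binary classifier per category on top of sentence embeddings in a one-vs-rest setup. It is not the provided baseline; the file names and column layout (comments_train.csv, comments_test.csv, a comment_sentence text column plus one 0/1 column per category) are assumptions, and the actual schema is described in the repository and notebook.

```python
# Minimal one-vs-rest sketch over sentence embeddings, not the official baseline.
# File and column names are hypothetical; see the repository/notebook for the real schema.
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

train = pd.read_csv("comments_train.csv")  # hypothetical file name
test = pd.read_csv("comments_test.csv")    # hypothetical file name
categories = [c for c in train.columns if c != "comment_sentence"]  # 19 binary label columns

encoder = SentenceTransformer("all-MiniLM-L6-v2")
X_train = encoder.encode(train["comment_sentence"].tolist())
X_test = encoder.encode(test["comment_sentence"].tolist())

# One binary classifier per category, all sharing the same sentence encoder.
clf = MultiOutputClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_train, train[categories].values)
predictions = clf.predict(X_test)
```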
Compared to the 2023 version of the code comment classification competition, we have made the following changes:
The code comment classification competition is organized by: Pooja Rani (rani@ifi.uzh.ch), Oscar Chaparro (oscarch@wm.edu), Luca Pascarella (lpascarella@ethz.ch), and Ali Al-Kaswan (a.al-kaswan@tudelft.nl).
To participate in either competition, you must train, tune, and evaluate your models using the provided training and test sets of the respective competition.
Additionally, you must write a paper (2-4 pages) describing:
Submit the paper by the deadline using our submission form.
All submissions must conform to the ICSE'24 formatting and submission instructions and do not need to be double-blind.
Participation in both competitions is allowed, but requires a distinct paper for each submission.
Submissions will be evaluated and accepted based on correctness and reproducibility, defined by the following criteria:
The accepted submissions will be published in the workshop proceedings.
Participants will submit a set of multi-class classifiers and the submissions will be ranked based on the average F1 score achieved by the proposed classifiers on the issue report test set, as indicated in the papers.
The submission with the highest F1 score will be the winner of the issue report classification competition.
Since participants will submit a set of binary classifiers (based on a single DL model -- see more details in our notebook), we will use a formula to rank the competition submissions and determine a winner.
The formula, specified in our notebook, accounts for: (1) the overall averaged F1 score achieved by the classifiers, and (2) the average runtime the proposed model takes to predict the category of the test set instances.
The submission with the highest score, determined by our formula, will be the winner of the competition.
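Continuing the code comment classification sketch above, the snippet below shows one straightforward way to measure the average prediction time per test instance, the second factor in the ranking; the official timing protocol and the exact ranking formula are defined in the competition notebook.

```python
import time

# Time the predictions of the classifier from the sketch above over the whole test set.
start = time.perf_counter()
predictions = clf.predict(X_test)
elapsed = time.perf_counter() - start

# Average seconds per test instance, reported here in milliseconds.
avg_runtime = elapsed / len(X_test)
print(f"average prediction time: {avg_runtime * 1000:.2f} ms per instance")
```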
Since you will be using the dataset and possibly the original work behind the dataset, please cite the following references in your paper:
@inproceedings{nlbse2024,
author={Kallis, Rafael and Colavito, Giuseppe and Al-Kaswan, Ali and Pascarella, Luca and Chaparro, Oscar and Rani, Pooja},
title={The NLBSE'24 Tool Competition},
booktitle={Proceedings of The 3rd International Workshop on Natural Language-based Software Engineering (NLBSE'24)},
year={2024}
}
Please cite if participating in the issue report classification competition:
@article{kallis2020tickettagger,
author={Kallis, Rafael and Di Sorbo, Andrea and Canfora, Gerardo and Panichella, Sebastiano},
title={Predicting issue types on GitHub},
journal={Science of Computer Programming},
volume={205},
pages={102598},
year={2021},
issn={0167-6423},
doi={10.1016/j.scico.2020.102598},
url={https://www.sciencedirect.com/science/article/pii/S0167642320302069}
}
@inproceedings{kallis2019tickettagger,
author = {Kallis, Rafael and Di Sorbo, Andrea and Canfora, Gerardo and Panichella, Sebastiano},
title = {Ticket Tagger: Machine Learning Driven Issue Classification},
booktitle = {2019 IEEE International Conference on Software Maintenance and Evolution, ICSME 2019, Cleveland, OH, USA, September 29 - October 4, 2019},
pages = {406--409},
publisher = {IEEE},
year = {2019},
doi = {10.1109/ICSME.2019.00070},
}
@inproceedings{colavito2023few,
author={Colavito, Giuseppe and Lanubile, Filippo and Novielli, Nicole},
booktitle={2023 IEEE/ACM 2nd International Workshop on Natural Language-Based Software Engineering (NLBSE)},
title={Few-Shot Learning for Issue Report Classification},
year={2023},
pages={16-19},
doi={10.1109/NLBSE59153.2023.00011}
}
Please cite if participating in the code comment classification competition:
@article{rani2021,
title={How to identify class comment types? A multi-language approach for class comment classification},
author={Rani, Pooja and Panichella, Sebastiano and Leuenberger, Manuel and Di Sorbo, Andrea and Nierstrasz, Oscar},
journal={Journal of Systems and Software},
volume={181},
pages={111047},
year={2021},
publisher={Elsevier}
}
@inproceedings{AlKaswan2023,
author={Al-Kaswan, Ali and Izadi, Maliheh and Van Deursen, Arie},
booktitle={2023 IEEE/ACM 2nd International Workshop on Natural Language-Based Software Engineering (NLBSE)},
title={STACC: Code Comment Classification using SentenceTransformers},
year={2023},
pages={28-31}
}
@inproceedings{pascarella2017,
title={Classifying code comments in Java open-source software systems},
author={Pascarella, Luca and Bacchelli, Alberto},
booktitle={2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR)},
year={2017},
organization={IEEE}
}
Paper submission: December 9, 2023
Notification: December 29, 2023
Camera-ready: January 25, 2024
All dates are Anywhere on Earth (AoE).
The authors of the best accepted papers (research and tool papers) will be invited to develop and submit a software tool to the NLBSE'24 special issue in the Software Track of the Science of Computer Programming journal.