Software & Algorithms
Requirements Modeling and Reuse using Ontology-driven Recommendations
This page contains the files required for reproducing the results of the paper:
Requirements Modeling and Reuse using Ontology-driven Recommendations
submitted to the Special Issue: Model-driven data-intensive Enterprise Information Systems in the Journal of Enterprise Information Systems
The ontologies for the static and dynamic view of software projects are given below:
- Static Ontology, available here in OWL format and documented fully in deliverable D3.1 of S-CASE
- Dynamic Ontology, available here in OWL format and documented fully in deliverable D3.2 of S-CASE
The evaluation involves a set of software projects from various sources.
For these projects, the following elements are provided:
- Functional requirements, available here in RQS format (RQS files can be processed and/or transformed to other formats using the Requirements Editor of S-CASE)
- UML diagrams, including Use Case diagrams and Activity diagrams in XMI format (the files follow the format of the Papyrus UML tool and can be processed with any XML parser)
Finally, more detailed information about the methods used in this paper can be found in deliverable D2.4 of S-CASE.
Requirements Dataset for Specification Extraction
Mapping functional requirements to specifications is one of the more challenging tasks of the software development process. An interesting line of work involves using Semantic Role Labeling techniques to automate this task. The effectiveness of such approaches in requirements engineering scenarios has to be assessed using realistic datasets with functional requirements. In this context, we provide the dataset we have crafted, so that researchers can evaluate their systems in requirements-to-specifications scenarios and reproduce the findings of:
Themistoklis Diamantopoulos, Michael Roth, Andreas Symeonidis, and Ewan Klein,
Software Requirements as an Application Domain for Natural Language Processing,
which has been submitted to the Journal of Language Resources and Evaluation.
You may find the dataset here.
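To illustrate the kind of output a requirements-to-specifications system targets, the toy function below extracts agent/action/object roles from a functional requirement using a simple pattern. Real approaches use trained Semantic Role Labeling models; this pattern-based sketch (with an invented requirement template) only illustrates the target representation.

```python
import re

def extract_roles(requirement):
    """Toy extraction of agent / action / object roles from a functional
    requirement of the form '<agent> must be able to <action> <object>'.
    This is a hand-written pattern, not an actual SRL model."""
    m = re.match(
        r"(?i)^the\s+(?P<agent>.+?)\s+(?:must|shall|should)\s+"
        r"(?:be able to\s+)?(?P<action>\w+)\s+(?P<object>.+?)\.?$",
        requirement.strip(),
    )
    return m.groupdict() if m else None

roles = extract_roles("The user must be able to create an account.")
print(roles)  # {'agent': 'user', 'action': 'create', 'object': 'an account'}
```

A trained SRL system would produce the same kind of role assignments, but robustly across arbitrary phrasings rather than a fixed template.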
Reusability Dataset for Component Reuse
The problem of reusing software components has led to the creation of several specialized source code recommendation systems. These systems, however, do not usually assess the reusability of the retrieved components, i.e. the extent to which each component can be reused. In this context, we provide the code reuse quality dataset we have annotated, so that researchers can evaluate their systems and reproduce the findings of:
Themistoklis Diamantopoulos, Klearchos Thomopoulos, and Andreas Symeonidis. QualBoa: Quality-aware Recommendations of Source Code Components, which has been submitted to the Mining Challenge of the 13th International Conference on Mining Software Repositories (MSR 2016).
The annotated dataset and the component retrieval query are available here.
Dataset for Test-Driven Reuse Recommendation Systems
Nikolaos Katirtzis, Themistoklis Diamantopoulos, and Andreas Symeonidis, Mantissa: A Recommendation System for Test-Driven Code Reuse, which has been submitted to the International Journal on Software Tools for Technology Transfer.
You may find the dataset here.
Dataset for Software Component Reuse
Themistoklis Diamantopoulos and Andreas Symeonidis, AGORA: A Search Engine for Source Code Reuse, which has been submitted to the SoftwareX Journal.
You may find the dataset here.
RDOTE
RDOTE (Relational Database to Ontology Transformation Engine) is a user-friendly and powerful framework for transforming relational databases into semantic web data. Users can connect to multiple databases and create mappings to their ontology schemata. Visit RDOTE's homepage on SourceForge.
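The kind of mapping RDOTE automates can be sketched in a few lines: a relational row becomes an instance of an ontology class, and each column becomes a datatype property. The namespace, table, and property names below are invented for illustration and do not reflect RDOTE's actual API or mapping language.

```python
# Hypothetical namespace for the illustration (not from RDOTE).
ONTOLOGY = "http://example.org/onto#"

def row_to_triples(table, primary_key, row):
    """Map one relational row to Turtle triples: the row becomes an
    instance of a class named after the table, and every non-key column
    becomes a datatype property of that instance."""
    subject = f"<{ONTOLOGY}{table}/{row[primary_key]}>"
    triples = [f"{subject} a <{ONTOLOGY}{table.capitalize()}> ."]
    for column, value in row.items():
        if column != primary_key:
            triples.append(f'{subject} <{ONTOLOGY}{column}> "{value}" .')
    return triples

for triple in row_to_triples("employee", "id",
                             {"id": 7, "name": "Alice", "dept": "R&D"}):
    print(triple)
```

In practice such mappings are expressed declaratively (e.g. in an R2RML-style mapping document) rather than hard-coded, which is precisely the gap tools like RDOTE fill.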
Dataset for Software Bug Detection
Locating software bugs is a difficult task, especially when they do not lead to crashes. Current research on automating non-crashing bug detection involves collecting function call traces, representing them as graphs, and reducing the graphs before applying a subgraph mining algorithm. A ranking of potentially buggy functions is then derived using frequency statistics for each node (function) in the correct and incorrect sets of traces. Although most existing techniques are effective, they do not achieve scalability. Additionally, in most cases it is difficult to find and reuse datasets containing software bugs. In this context, we provide the dataset we have crafted, so that researchers can test their approaches and reproduce the findings of:

Themistoklis Diamantopoulos and Andreas Symeonidis, "Towards Scalable Bug Localization using the Edit Distance of Call Traces", to be presented at the Eighth International Conference on Software Engineering Advances (ICSEA 2013), October 27 – November 1, 2013, Venice, Italy.

You may find the dataset (along with a readme file) here.

Following our new paper: Themistoklis Diamantopoulos and Andreas Symeonidis, "Localizing Software Bugs using the Edit Distance of Call Traces", which has been submitted to the International Journal On Advances in Software, we provide a revised version of our dataset with different types of bugs. You can find the revised dataset here.
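The core primitive named in the paper titles, the edit distance between call traces, can be sketched as a plain Levenshtein distance over sequences of function names. This is a textbook dynamic-programming sketch for illustration, not the paper's optimized implementation, and the example traces are invented.

```python
def trace_edit_distance(trace_a, trace_b):
    """Levenshtein distance between two call traces, each given as a
    sequence of function names. dp[i][j] holds the minimum number of
    insertions, deletions, and substitutions needed to turn the first
    i calls of trace_a into the first j calls of trace_b."""
    m, n = len(trace_a), len(trace_b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if trace_a[i - 1] == trace_b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[m][n]

passing = ["main", "parse", "validate", "store"]
failing = ["main", "parse", "store"]
print(trace_edit_distance(passing, failing))  # 1 (the failing run skips 'validate')
```

Comparing failing traces against nearby passing ones in this way highlights the calls whose presence or absence distinguishes the two sets, which is what the ranking of potentially buggy functions builds on.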
Supervised LCS for Multi-label Classification
In recent years, multi-label classification has attracted a significant body of research, motivated by real-life applications, such as text classification and medical diagnoses. Although sparsely studied in this context, Learning Classifier Systems are naturally well-suited to multi-label classification problems, whose search space typically involves multiple highly specific niches.
This is the motivation behind our work, which introduces a generalized multi-label rule format – allowing for flexible label-dependency modeling, with no need for explicit knowledge of which correlations to search for – and uses it as a guide for further adapting the general Michigan-style supervised Learning Classifier System framework.
The integration of the aforementioned rule format and framework adaptations results in a novel algorithm for multi-label classification, namely the Multi-Label Supervised Learning Classifier System (MLS-LCS). MLS-LCS has been studied through a set of properly defined artificial problems and has also been thoroughly evaluated on a set of multi-label datasets, where it was found competitive with other state-of-the-art multi-label classification methods.
The current implementation corresponds to the version of the MLS-LCS algorithm originally presented in:
- Allamanis, M., Tzima, F. A., & Mitkas, P. A. (2013). Effective Rule-Based Multi-label Classification with Learning Classifier Systems. In M. Tomassini, A. Antonioni, F. Daolio, and P. Buesser, editors, Adaptive and Natural Computing Algorithms, Lecture Notes in Computer Science, Volume 7824, pages 466–476, Springer Berlin Heidelberg, 2013.
and further improved in
- Tzima, F.A., Allamanis, M., Filotheou, A., & Mitkas, P. A. (Under review). Inducing Generalized Multi-Label Rules with Learning Classifier Systems. Evolutionary Computation.
More information on MLS-LCS can be found here.
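The flavor of a generalized multi-label rule can be illustrated with ternary strings over {'0', '1', '#'}, where '#' means "don't care" in both the condition and the label assignment, so a rule can commit to some labels while abstaining on others. The rules and instance below are invented, and this sketches only the rule semantics, not the full MLS-LCS learning framework.

```python
def matches(condition, instance):
    """A rule's condition matches an instance if every non-'#' bit agrees."""
    return all(c == '#' or c == x for c, x in zip(condition, instance))

def predict(rules, instance):
    """Combine matching rules by majority vote per label: each rule votes
    only on the labels it explicitly models ('0' or '1'); '#' labels abstain."""
    n_labels = len(rules[0][1])
    votes = [[0, 0] for _ in range(n_labels)]   # [votes for 0, votes for 1]
    for condition, labels in rules:
        if matches(condition, instance):
            for k, bit in enumerate(labels):
                if bit != '#':
                    votes[k][int(bit)] += 1
    return ''.join('1' if v1 > v0 else '0' for v0, v1 in votes)

rules = [("1#0", "1#"),   # attr0=1 and attr2=0 -> label0 on, label1 unspecified
         ("##0", "#1")]   # attr2=0 -> label1 on, label0 unspecified
print(predict(rules, "100"))  # "11"
```

Allowing '#' on the label side is what lets the population model only the label correlations that actually pay off, instead of forcing every rule to predict the full label vector.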
S.Co.R.E. (Source Code Rating Estimator)
The popularity of open source software repositories and the highly adopted paradigm of software reuse have led to the development of several tools that aspire to assess the quality of source code. S.Co.R.E. is a source code quality estimation system that relates quality with source code metrics. The rationale behind S.Co.R.E. is that the popularity of software components, as perceived by developers, can be considered an indicator of software quality. S.Co.R.E. uses code quality evaluation models to decide whether a given source code component is of high quality (i.e., exceeds minimum quality thresholds). If so, the quality is quantified by computing a quality score. S.Co.R.E. can also be used as a tool for detecting bad coding practices.
The context of our work is described in:
Michail Papamichail, Themistoklis Diamantopoulos, and Andreas Symeonidis. User-Perceived Source Code Quality Estimation based on Static Analysis Metrics, which has been submitted to the 2016 IEEE International Conference on Software Quality, Reliability & Security (QRS 2016).
S.Co.R.E. can be found here.
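The two-step idea behind S.Co.R.E., first screening a component against minimum quality thresholds and then quantifying its quality with a score, can be sketched as below. The metric names, thresholds, and weights are invented for illustration and are not the models or static analysis metrics used in the paper.

```python
# Invented screening thresholds and weights (not S.Co.R.E.'s actual models).
THRESHOLDS = {"comment_density": 0.10,   # minimum acceptable values
              "test_coverage": 0.50}
MAXIMA = {"cyclomatic_complexity": 15}   # maximum acceptable values
WEIGHTS = {"comment_density": 0.3, "test_coverage": 0.5,
           "cyclomatic_complexity": 0.2}

def quality_score(metrics):
    """Return None if any quality threshold is violated (the component is
    rejected at the screening step); otherwise return a weighted score in
    [0, 1], where complexity contributes inversely."""
    for metric, minimum in THRESHOLDS.items():
        if metrics[metric] < minimum:
            return None
    for metric, maximum in MAXIMA.items():
        if metrics[metric] > maximum:
            return None
    score = (WEIGHTS["comment_density"] * metrics["comment_density"]
             + WEIGHTS["test_coverage"] * metrics["test_coverage"]
             + WEIGHTS["cyclomatic_complexity"]
               * (1 - metrics["cyclomatic_complexity"]
                      / MAXIMA["cyclomatic_complexity"]))
    return round(score, 3)

print(quality_score({"comment_density": 0.2, "test_coverage": 0.8,
                     "cyclomatic_complexity": 6}))  # 0.58
```

The screening step lets the tool double as a bad-coding-practices detector: any component returning None has violated at least one threshold.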