Themistoklis Diamantopoulos

Postdoctoral Researcher
Aristotle University of Thessaloniki
Department of Electrical and Computer Engineering
54124, Thessaloniki

Tel: +30 2310 99 6365
Fax: +30 2310 99 6398
Email: thdiaman (at) issel [dot] ee [dot] auth [dot] gr

For more information, you can visit my personal website.

LinkedIn | Twitter

Professional Experience

11/2013 – present Research Assistant, Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Greece, European research projects: S-CASE, SEAF
10/2012 – 10/2013 Research Assistant, Centre for Research and Technology Hellas – Information Technologies Institute, European research project: eCOMPASS

Education

2013-2018 PhD, Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Greece. PhD thesis: “Mining Software Engineering Data for Software Reuse”
2011-2012 MSc in Computer Science, School of Informatics, University of Edinburgh, United Kingdom. MSc thesis: “Ad hoc Team Formation: Using Machine Learning Techniques to Cooperate without Pre-Coordination”
2006-2011 Diploma in Electrical and Computer Engineering, Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Greece. Diploma thesis: “Design and Development of Auction Algorithms with Application to the Power TAC Competition”

Research Interests

  • Data Mining
  • Software Engineering

Foreign Languages

  • English: Excellent
  • Spanish: Conversational
  • French: Conversational

2020

Conference Papers

Nikolaos L. Tsakiridis, Themistoklis Diamantopoulos, Andreas L. Symeonidis, John B. Theocharis, Athanasios Iossifides, Periklis Chatzimisios, George Pratos and Dimitris Kouvas
"Versatile Internet of Things for Agriculture: An eXplainable AI Approach"
International Conference on Artificial Intelligence Applications and Innovations, 2020 Jun

The increase of the adoption of IoT devices and the contemporary problem of food production have given rise to numerous applications of IoT in agriculture. These applications typically comprise a set of sensors that are installed in open fields and measure metrics, such as temperature or humidity, which are used for irrigation control systems. Though useful, most contemporary systems have high installation and maintenance costs, and they do not offer automated control or, if they do, they are usually not interpretable, and thus cannot be trusted for such critical applications. In this work, we design Vital, a system that incorporates a set of low-cost sensors, a robust data store, and most importantly an explainable AI decision support system. Our system outputs a fuzzy rule-base, which is interpretable and allows fully automating the irrigation of the fields. Upon evaluating Vital in two pilot cases, we conclude that it can be effective for monitoring open-field installations.
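The interpretable fuzzy rule-base idea from the abstract can be sketched as follows. This is an illustrative toy example, not the Vital system's actual rule base: the membership ranges, rule set, and output durations are all hypothetical.

```python
# Hedged sketch of a fuzzy rule-base for irrigation control. All thresholds
# and rule outputs below are hypothetical, chosen only for illustration.

def trimf(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def irrigation_minutes(moisture, temperature):
    """Map soil moisture (%) and temperature (C) to irrigation minutes."""
    # Fuzzify the inputs (hypothetical membership ranges).
    dry = trimf(moisture, -1, 0, 40)
    wet = trimf(moisture, 30, 100, 101)
    hot = trimf(temperature, 20, 40, 41)

    # Human-readable rules: (firing strength, output duration in minutes).
    rules = [
        (min(dry, hot), 45.0),   # IF dry AND hot THEN irrigate long
        (dry, 25.0),             # IF dry THEN irrigate medium
        (wet, 0.0),              # IF wet THEN do not irrigate
    ]
    total = sum(strength for strength, _ in rules)
    if total == 0:
        return 0.0
    # Weighted-average defuzzification over the fired rules.
    return sum(s * out for s, out in rules) / total
```

Because each rule reads as an IF-THEN statement over named linguistic terms, the resulting controller stays interpretable, which is the property the abstract emphasizes for trust in critical applications.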

@conference{AIAI2020,
author={Nikolaos L. Tsakiridis and Themistoklis Diamantopoulos and Andreas L. Symeonidis and John B. Theocharis and Athanasios Iossifides and Periklis Chatzimisios and George Pratos and Dimitris Kouvas},
title={Versatile Internet of Things for Agriculture: An eXplainable AI Approach},
booktitle={International Conference on Artificial Intelligence Applications and Innovations},
year={2020},
month={06},
date={2020-06-06},
url={https://issel.ee.auth.gr/wp-content/uploads/2020/05/AIAI2020.pdf},
keywords={Internet of Things;Precision Irrigation;eXplainable AI},
abstract={The increase of the adoption of IoT devices and the contemporary problem of food production have given rise to numerous applications of IoT in agriculture. These applications typically comprise a set of sensors that are installed in open fields and measure metrics, such as temperature or humidity, which are used for irrigation control systems. Though useful, most contemporary systems have high installation and maintenance costs, and they do not offer automated control or, if they do, they are usually not interpretable, and thus cannot be trusted for such critical applications. In this work, we design Vital, a system that incorporates a set of low-cost sensors, a robust data store, and most importantly an explainable AI decision support system. Our system outputs a fuzzy rule-base, which is interpretable and allows fully automating the irrigation of the fields. Upon evaluating Vital in two pilot cases, we conclude that it can be effective for monitoring open-field installations.}
}

Themistoklis Diamantopoulos, Nikolaos Oikonomou and Andreas Symeonidis
"Extracting Semantics from Question-Answering Services for Snippet Reuse"
Fundamental Approaches to Software Engineering, pp. 119-139, Springer International Publishing, Cham, 2020 Apr

Nowadays, software developers typically search online for reusable solutions to common programming problems. However, forming the question appropriately, and locating and integrating the best solution back to the code can be tricky and time consuming. As a result, several mining systems have been proposed to aid developers in the task of locating reusable snippets and integrating them into their source code. Most of these systems, however, do not model the semantics of the snippets in the context of source code provided. In this work, we propose a snippet mining system, named StackSearch, that extracts semantic information from Stack Overflow posts and recommends useful and in-context snippets to the developer. Using a hybrid language model that combines Tf-Idf and fastText, our system effectively understands the meaning of the given query and retrieves semantically similar posts. Moreover, the results are accompanied with useful metadata using a named entity recognition technique. Upon evaluating our system in a set of common programming queries, in a dataset based on post links, and against a similar tool, we argue that our approach can be useful for recommending ready-to-use snippets to the developer.
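The hybrid lexical/semantic retrieval described above can be sketched roughly as follows. This is not the StackSearch implementation: the `HybridRanker` class is hypothetical, and a character-trigram bag stands in for a real fastText model.

```python
# Hedged sketch: rank posts by a weighted mix of Tf-Idf (lexical) similarity
# and embedding (semantic) similarity. char_ngrams is a crude stand-in for a
# fastText-style subword embedding; everything here is illustrative only.
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def char_ngrams(text, n=3):
    """Toy subword 'embedding': bag of character trigrams."""
    t = text.lower()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

class HybridRanker:
    def __init__(self, posts, embed, alpha=0.5):
        self.posts = posts
        self.embed = embed      # stand-in for a fastText-style model
        self.alpha = alpha      # lexical vs semantic weight
        docs = [tokenize(p) for p in posts]
        n = len(docs)
        df = Counter(t for d in docs for t in set(d))
        self.idf = {t: math.log(n / df[t]) + 1.0 for t in df}
        self.vecs = [self._tfidf(d) for d in docs]

    def _tfidf(self, toks):
        tf = Counter(toks)
        return {t: c * self.idf.get(t, 1.0) for t, c in tf.items()}

    def rank(self, query):
        q_lex = self._tfidf(tokenize(query))
        q_sem = self.embed(query)
        scored = []
        for post, vec in zip(self.posts, self.vecs):
            score = (self.alpha * cosine(q_lex, vec)
                     + (1 - self.alpha) * cosine(q_sem, self.embed(post)))
            scored.append((score, post))
        return [p for _, p in sorted(scored, reverse=True)]

posts = [
    "how to read a file in java",
    "how to sort a list in python",
    "connect to a database with jdbc",
]
ranker = HybridRanker(posts, embed=char_ngrams)
best = ranker.rank("read file java")[0]
```

The key design point mirrored from the abstract is the combination: the lexical score rewards exact term overlap, while the (sub)word-level score tolerates paraphrases the Tf-Idf model would miss.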

@conference{FASE2020,
author={Themistoklis Diamantopoulos and Nikolaos Oikonomou and Andreas Symeonidis},
title={Extracting Semantics from Question-Answering Services for Snippet Reuse},
booktitle={Fundamental Approaches to Software Engineering},
pages={119-139},
publisher={Springer International Publishing},
address={Cham},
year={2020},
month={04},
date={2020-04-17},
url={https://link.springer.com/content/pdf/10.1007/978-3-030-45234-6_6.pdf},
doi={https://doi.org/10.1007/978-3-030-45234-6_6},
isbn={978-3-030-45234-6},
keywords={Code Search;Snippet Mining;Code Semantic Analysis;Question-Answering Systems},
abstract={Nowadays, software developers typically search online for reusable solutions to common programming problems. However, forming the question appropriately, and locating and integrating the best solution back to the code can be tricky and time consuming. As a result, several mining systems have been proposed to aid developers in the task of locating reusable snippets and integrating them into their source code. Most of these systems, however, do not model the semantics of the snippets in the context of source code provided. In this work, we propose a snippet mining system, named StackSearch, that extracts semantic information from Stack Overflow posts and recommends useful and in-context snippets to the developer. Using a hybrid language model that combines Tf-Idf and fastText, our system effectively understands the meaning of the given query and retrieves semantically similar posts. Moreover, the results are accompanied with useful metadata using a named entity recognition technique. Upon evaluating our system in a set of common programming queries, in a dataset based on post links, and against a similar tool, we argue that our approach can be useful for recommending ready-to-use snippets to the developer.}
}

Themistoklis Diamantopoulos, Michail D. Papamichail, Thomas Karanikiotis, Kyriakos C. Chatzidimitriou and Andreas L. Symeonidis
"Employing Contribution and Quality Metrics for Quantifying the Software Development Process"
The 17th International Conference on Mining Software Repositories (MSR 2020), 2020 Jun

The full integration of online repositories in the contemporary software development process promotes remote work and remote collaboration. Apart from the apparent benefits, online repositories offer a deluge of data that can be utilized to monitor and improve the software development process. Towards this direction, we have designed and implemented a platform that analyzes data from GitHub in order to compute a series of metrics that quantify the contributions of project collaborators, both from a development as well as an operations (communication) perspective. We analyze contributions in an evolutionary manner throughout the projects' lifecycle and track the number of coding violations generated, this way aspiring to identify cases of software development that need closer monitoring and (possibly) further actions to be taken. In this context, we have analyzed the 3000 most popular Java GitHub projects and provide the data to the community.

@conference{MSR2020,
author={Themistoklis Diamantopoulos and Michail D. Papamichail and Thomas Karanikiotis and Kyriakos C. Chatzidimitriou and Andreas L. Symeonidis},
title={Employing Contribution and Quality Metrics for Quantifying the Software Development Process},
booktitle={The 17th International Conference on Mining Software Repositories (MSR 2020)},
year={2020},
month={06},
date={2020-06-29},
url={https://issel.ee.auth.gr/wp-content/uploads/2020/05/MSR2020.pdf},
keywords={mining software repositories;contribution analysis;DevOps;GitHub issues;code violations},
abstract={The full integration of online repositories in the contemporary software development process promotes remote work and remote collaboration. Apart from the apparent benefits, online repositories offer a deluge of data that can be utilized to monitor and improve the software development process. Towards this direction, we have designed and implemented a platform that analyzes data from GitHub in order to compute a series of metrics that quantify the contributions of project collaborators, both from a development as well as an operations (communication) perspective. We analyze contributions in an evolutionary manner throughout the projects' lifecycle and track the number of coding violations generated, this way aspiring to identify cases of software development that need closer monitoring and (possibly) further actions to be taken. In this context, we have analyzed the 3000 most popular Java GitHub projects and provide the data to the community.}
}

Vasileios Matsoukas, Themistoklis Diamantopoulos, Michail D. Papamichail and Andreas L. Symeonidis
"Towards Analyzing Contributions from Software Repositories to Optimize Issue Assignment"
Proceedings of the 2020 IEEE International Conference on Software Quality, Reliability and Security (QRS), IEEE, Vilnius, Lithuania, 2020 Jul

Most software teams nowadays host their projects online and monitor software development in the form of issues/tasks. This process entails communicating through comments and reporting progress through commits and closing issues. In this context, assigning new issues, tasks or bugs to the most suitable contributor largely improves efficiency. Thus, several automated issue assignment approaches have been proposed, which however have major limitations. Most systems focus only on assigning bugs using textual data, are limited to projects explicitly using bug tracking systems, and may require manually tuning parameters per project. In this work, we build an automated issue assignment system for GitHub, taking into account the commits and issues of the repository under analysis. Our system aggregates feature probabilities using a neural network that adapts to each project, thus not requiring manual parameter tuning. Upon evaluating our methodology, we conclude that it can be efficient for automated issue assignment.
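The aggregation step described above can be illustrated with a minimal sketch. The paper trains a neural network that learns the aggregation weights per project; here fixed weights stand in, and the feature names and values are hypothetical.

```python
# Hedged sketch (not the authors' system): combine per-contributor feature
# probabilities into a single assignment score. The weights below are fixed
# for illustration; in the paper a per-project neural network learns them.

def assign_issue(candidates, weights):
    """candidates: {name: {feature: probability}}; returns the best assignee."""
    def score(features):
        return sum(weights[f] * features.get(f, 0.0) for f in weights)
    return max(candidates, key=lambda name: score(candidates[name]))

# Hypothetical feature probabilities extracted from commits and issues.
candidates = {
    "alice": {"text_similarity": 0.8, "recent_commits": 0.3, "issue_history": 0.6},
    "bob":   {"text_similarity": 0.4, "recent_commits": 0.9, "issue_history": 0.2},
}
weights = {"text_similarity": 0.5, "recent_commits": 0.2, "issue_history": 0.3}
assignee = assign_issue(candidates, weights)
```

Replacing the fixed `weights` with a small trained network is what lets the approach adapt to each project without manual parameter tuning.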

@conference{QRS2020IssueAssignment,
author={Vasileios Matsoukas and Themistoklis Diamantopoulos and Michail D. Papamichail and Andreas L. Symeonidis},
title={Towards Analyzing Contributions from Software Repositories to Optimize Issue Assignment},
booktitle={Proceedings of the 2020 IEEE International Conference on Software Quality, Reliability and Security (QRS)},
publisher={IEEE},
address={Vilnius, Lithuania},
year={2020},
month={07},
date={2020-07-31},
url={https://issel.ee.auth.gr/wp-content/uploads/2020/07/QRS2020IssueAssignment.pdf},
keywords={GitHub issues;automated issue assignment;issue triaging},
abstract={Most software teams nowadays host their projects online and monitor software development in the form of issues/tasks. This process entails communicating through comments and reporting progress through commits and closing issues. In this context, assigning new issues, tasks or bugs to the most suitable contributor largely improves efficiency. Thus, several automated issue assignment approaches have been proposed, which however have major limitations. Most systems focus only on assigning bugs using textual data, are limited to projects explicitly using bug tracking systems, and may require manually tuning parameters per project. In this work, we build an automated issue assignment system for GitHub, taking into account the commits and issues of the repository under analysis. Our system aggregates feature probabilities using a neural network that adapts to each project, thus not requiring manual parameter tuning. Upon evaluating our methodology, we conclude that it can be efficient for automated issue assignment.}
}

2019

Journal Articles

Michail D. Papamichail, Themistoklis Diamantopoulos and Andreas L. Symeonidis
"Software Reusability Dataset based on Static Analysis Metrics and Reuse Rate Information"
Data in Brief, 2019 Dec

The widely adopted component-based development paradigm considers the reuse of proper software components as a primary criterion for successful software development. As a result, various research efforts are directed towards evaluating the extent to which a software component is reusable. Prior efforts follow expert-based approaches, however the continuously increasing open-source software initiative allows the introduction of data-driven alternatives. In this context we have generated a dataset that harnesses information residing in online code hosting facilities and introduces the actual reuse rate of software components as a measure of their reusability. To do so, we have analyzed the most popular projects included in the maven registry and have computed a large number of static analysis metrics at both class and package levels using SourceMeter tool [2] that quantify six major source code properties: complexity, cohesion, coupling, inheritance, documentation and size. For these projects we additionally computed their reuse rate using our self-developed code search engine, AGORA [5]. The generated dataset contains analysis information regarding more than 24,000 classes and 2,000 packages, and can, thus, be used as the information basis towards the design and development of data-driven reusability evaluation methodologies. The dataset is related to the research article entitled "Measuring the Reusability of Software Components using Static Analysis Metrics and Reuse Rate Information".

@article{PAPAMICHAIL2019104687,
author={Michail D. Papamichail and Themistoklis Diamantopoulos and Andreas L. Symeonidis},
title={Software Reusability Dataset based on Static Analysis Metrics and Reuse Rate Information},
journal={Data in Brief},
year={2019},
month={12},
date={2019-12-31},
url={https://reader.elsevier.com/reader/sd/pii/S235234091931042X?token=9CDEB13940390201A35D26027D763CACB6EE4D49BFA9B920C4D32B348809F1F6A7DE309AA1737161C7E5BF1963BBD952},
doi={https://doi.org/10.1016/j.dib.2019.104687},
keywords={developer-perceived reusability;code reuse;static analysis metrics;Reusability assessment},
abstract={The widely adopted component-based development paradigm considers the reuse of proper software components as a primary criterion for successful software development. As a result, various research efforts are directed towards evaluating the extent to which a software component is reusable. Prior efforts follow expert-based approaches, however the continuously increasing open-source software initiative allows the introduction of data-driven alternatives. In this context we have generated a dataset that harnesses information residing in online code hosting facilities and introduces the actual reuse rate of software components as a measure of their reusability. To do so, we have analyzed the most popular projects included in the maven registry and have computed a large number of static analysis metrics at both class and package levels using SourceMeter tool [2] that quantify six major source code properties: complexity, cohesion, coupling, inheritance, documentation and size. For these projects we additionally computed their reuse rate using our self-developed code search engine, AGORA [5]. The generated dataset contains analysis information regarding more than 24,000 classes and 2,000 packages, and can, thus, be used as the information basis towards the design and development of data-driven reusability evaluation methodologies. The dataset is related to the research article entitled \"Measuring the Reusability of Software Components using Static Analysis Metrics and Reuse Rate Information\".}
}

Michail D. Papamichail, Themistoklis Diamantopoulos and Andreas L. Symeonidis
"Measuring the Reusability of Software Components using Static Analysis Metrics and Reuse Rate Information"
Journal of Systems and Software, pp. 110423, 2019 Sep

Nowadays, the continuously evolving open-source community and the increasing demands of end users are forming a new software development paradigm; developers rely more on reusing components from online sources to minimize the time and cost of software development. An important challenge in this context is to evaluate the degree to which a software component is suitable for reuse, i.e. its reusability. Contemporary approaches assess reusability using static analysis metrics by relying on the help of experts, who usually set metric thresholds or provide ground truth values so that estimation models are built. However, even when expert help is available, it may still be subjective or case-specific. In this work, we refrain from expert-based solutions and employ the actual reuse rate of source code components as ground truth for building a reusability estimation model. We initially build a benchmark dataset, harnessing the power of online repositories to determine the number of reuse occurrences for each component in the dataset. Subsequently, we build a model based on static analysis metrics to assess reusability from six different properties: complexity, cohesion, coupling, inheritance, documentation and size. The evaluation of our methodology indicates that our system can effectively assess reusability as perceived by developers.
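The core modeling idea, fitting an estimator that maps static analysis metrics to an observed reuse rate rather than to expert labels, can be sketched minimally. This is not the paper's actual model; the two metric features and the target values below are hypothetical, and plain least squares stands in for the authors' estimation pipeline.

```python
# Hedged sketch: fit a linear model from static analysis metrics to a
# reuse-rate ground truth. Data and features below are hypothetical.

def fit_linear(X, y, lr=0.01, epochs=3000):
    """Plain stochastic gradient descent least squares."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = b + sum(wj * xj for wj, xj in zip(w, xi)) - yi
            b -= lr * err
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
    return w, b

def predict(w, b, x):
    return b + sum(wj * xj for wj, xj in zip(w, x))

# Rows: [complexity, documentation] metric scores (normalized, hypothetical).
# Targets: log-scaled reuse rates observed from online repositories.
X = [[0.2, 0.9], [0.8, 0.2], [0.5, 0.5], [0.9, 0.8]]
y = [3.5, 1.6, 2.5, 2.0]
w, b = fit_linear(X, y)
```

The point of the sketch is the supervision source: the targets `y` come from measured reuse occurrences, so no expert has to set metric thresholds.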

@article{PAPAMICHAIL2019110423,
author={Michail D. Papamichail and Themistoklis Diamantopoulos and Andreas L. Symeonidis},
title={Measuring the Reusability of Software Components using Static Analysis Metrics and Reuse Rate Information},
journal={Journal of Systems and Software},
pages={110423},
year={2019},
month={09},
date={2019-09-17},
url={https://issel.ee.auth.gr/wp-content/uploads/2019/09/2019mpapamicJSS.pdf},
doi={https://doi.org/10.1016/j.jss.2019.110423},
issn={0164-1212},
publisherurl={https://www.sciencedirect.com/science/article/pii/S0164121219301979},
keywords={developer-perceived reusability;code reuse;static analysis metrics;reusability estimation},
abstract={Nowadays, the continuously evolving open-source community and the increasing demands of end users are forming a new software development paradigm; developers rely more on reusing components from online sources to minimize the time and cost of software development. An important challenge in this context is to evaluate the degree to which a software component is suitable for reuse, i.e. its reusability. Contemporary approaches assess reusability using static analysis metrics by relying on the help of experts, who usually set metric thresholds or provide ground truth values so that estimation models are built. However, even when expert help is available, it may still be subjective or case-specific. In this work, we refrain from expert-based solutions and employ the actual reuse rate of source code components as ground truth for building a reusability estimation model. We initially build a benchmark dataset, harnessing the power of online repositories to determine the number of reuse occurrences for each component in the dataset. Subsequently, we build a model based on static analysis metrics to assess reusability from six different properties: complexity, cohesion, coupling, inheritance, documentation and size. The evaluation of our methodology indicates that our system can effectively assess reusability as perceived by developers.}
}

2019

Conference Papers

Themistoklis Diamantopoulos, Maria-Ioanna Sifaki and Andreas L. Symeonidis
"Towards Mining Answer Edits to Extract Evolution Patterns in Stack Overflow"
16th International Conference on Mining Software Repositories, 2019 Mar

The current state of practice dictates that in order to solve a problem encountered when building software, developers ask for help in online platforms, such as Stack Overflow. In this context of collaboration, answers to question posts often undergo several edits to provide the best solution to the problem stated. In this work, we explore the potential of mining Stack Overflow answer edits to extract common patterns when answering a post. In particular, we design a similarity scheme that takes into account the text and code of answer edits and clusters edits according to their semantics. Upon applying our methodology, we provide frequent edit patterns and indicate how they could be used to answer future research questions. Our evaluation indicates that our approach can be effective for identifying commonly applied edits, thus illustrating the transformation path from the initial answer to the optimal solution.
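The text-plus-code similarity scheme can be illustrated with a minimal sketch. This is not the paper's scheme: the equal weighting, the `difflib` ratio, and the greedy clustering below are all simplifying assumptions.

```python
# Hedged sketch (not the authors' scheme): score answer-edit similarity from
# the text and code parts, then greedily group similar edits. Weights and the
# similarity threshold are hypothetical.
from difflib import SequenceMatcher

def edit_similarity(edit_a, edit_b, w_text=0.5, w_code=0.5):
    """Each edit is a (text, code) pair; returns a score in [0, 1]."""
    text_sim = SequenceMatcher(None, edit_a[0], edit_b[0]).ratio()
    code_sim = SequenceMatcher(None, edit_a[1], edit_b[1]).ratio()
    return w_text * text_sim + w_code * code_sim

def cluster_edits(edits, threshold=0.6):
    """Greedy clustering: attach each edit to the first similar cluster."""
    clusters = []
    for edit in edits:
        for cluster in clusters:
            if edit_similarity(edit, cluster[0]) >= threshold:
                cluster.append(edit)
                break
        else:
            clusters.append([edit])
    return clusters

edits = [
    ("fixed typo in explanation", "x = 1"),
    ("fixed typos in explanation", "x = 1"),
    ("added null check", "if (obj != null) { obj.run(); }"),
]
clusters = cluster_edits(edits)
```

Scoring text and code separately matters because two edits can share almost identical prose while making entirely different code changes, and vice versa.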

@conference{Diamantopoulos2019,
author={Themistoklis Diamantopoulos and Maria-Ioanna Sifaki and Andreas L. Symeonidis},
title={Towards Mining Answer Edits to Extract Evolution Patterns in Stack Overflow},
booktitle={16th International Conference on Mining Software Repositories},
year={2019},
month={03},
date={2019-03-15},
url={https://issel.ee.auth.gr/wp-content/uploads/2019/03/MSR2019.pdf},
abstract={The current state of practice dictates that in order to solve a problem encountered when building software, developers ask for help in online platforms, such as Stack Overflow. In this context of collaboration, answers to question posts often undergo several edits to provide the best solution to the problem stated. In this work, we explore the potential of mining Stack Overflow answer edits to extract common patterns when answering a post. In particular, we design a similarity scheme that takes into account the text and code of answer edits and clusters edits according to their semantics. Upon applying our methodology, we provide frequent edit patterns and indicate how they could be used to answer future research questions. Our evaluation indicates that our approach can be effective for identifying commonly applied edits, thus illustrating the transformation path from the initial answer to the optimal solution.}
}

2019

Inproceedings Papers

Michail D. Papamichail, Themistoklis Diamantopoulos, Vasileios Matsoukas, Christos Athanasiadis and Andreas L. Symeonidis
"Towards Extracting the Role and Behavior of Contributors in Open-source Projects"
Proceedings of the 14th International Conference on Software Technologies - Volume 1: ICSOFT, pp. 536-543, SciTePress, 2019 Jul

Lately, the popular open source paradigm and the adoption of agile methodologies have changed the way software is developed. Effective collaboration within software teams has become crucial for building successful products. In this context, harnessing the data available in online code hosting facilities can help towards understanding how teams work and optimizing the development process. Although there are several approaches that mine contributions’ data, they usually view contributors as a uniform body of engineers, and focus mainly on the aspect of productivity while neglecting the quality of the work performed. In this work, we design a methodology for identifying engineer roles in development teams and determine the behaviors that prevail for each role. Using a dataset of GitHub projects, we perform clustering against the DevOps axis, thus identifying three roles: developers that are mainly preoccupied with code commits, operations engineers that focus on task assignment and acceptance testing, and the lately popular role of DevOps engineers that are a mix of both. Our analysis further extracts behavioral patterns for each role, this way assisting team leaders in knowing their team and effectively directing responsibilities to achieve optimal workload balancing and task allocation.
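The clustering along a DevOps axis can be sketched with one-dimensional k-means. This is not the paper's pipeline: the contributor values below (share of coding events versus operations events) and the choice of k-means with k=3 are illustrative assumptions.

```python
# Hedged sketch: cluster contributors along a "DevOps axis" (share of commit
# events among all contribution events) with 1-D k-means, k=3. Data is
# hypothetical.

def kmeans_1d(values, centers, iters=20):
    """Tiny 1-D k-means; returns final centers and the grouped values."""
    groups = [[] for _ in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[idx].append(v)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical share of commit events per contributor.
dev_axis = [0.95, 0.9, 0.88, 0.5, 0.55, 0.45, 0.1, 0.05, 0.12]
centers, groups = kmeans_1d(dev_axis, centers=[0.0, 0.5, 1.0])
# groups[2] ~ developers, groups[1] ~ DevOps engineers, groups[0] ~ operations
```

The three resulting clusters map naturally onto the roles named in the abstract: commit-heavy developers, issue-heavy operations engineers, and the mixed DevOps profile in between.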

@inproceedings{icsoft19devops,
author={Michail D. Papamichail and Themistoklis Diamantopoulos and Vasileios Matsoukas and Christos Athanasiadis and Andreas L. Symeonidis},
title={Towards Extracting the Role and Behavior of Contributors in Open-source Projects},
booktitle={Proceedings of the 14th International Conference on Software Technologies - Volume 1: ICSOFT},
pages={536-543},
publisher={SciTePress},
organization={INSTICC},
year={2019},
month={07},
date={2019-07-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2019/08/ICSOFT_DevOps.pdf},
doi={https://doi.org/10.5220/0007966505360543},
isbn={978-989-758-379-7},
abstract={Lately, the popular open source paradigm and the adoption of agile methodologies have changed the way software is developed. Effective collaboration within software teams has become crucial for building successful products. In this context, harnessing the data available in online code hosting facilities can help towards understanding how teams work and optimizing the development process. Although there are several approaches that mine contributions’ data, they usually view contributors as a uniform body of engineers, and focus mainly on the aspect of productivity while neglecting the quality of the work performed. In this work, we design a methodology for identifying engineer roles in development teams and determine the behaviors that prevail for each role. Using a dataset of GitHub projects, we perform clustering against the DevOps axis, thus identifying three roles: developers that are mainly preoccupied with code commits, operations engineers that focus on task assignment and acceptance testing, and the lately popular role of DevOps engineers that are a mix of both. Our analysis further extracts behavioral patterns for each role, this way assisting team leaders in knowing their team and effectively directing responsibilities to achieve optimal workload balancing and task allocation.}
}

Kyriakos C. Chatzidimitriou, Michail D. Papamichail, Themistoklis Diamantopoulos, Napoleon-Christos Oikonomou and Andreas L. Symeonidis
"npm Packages as Ingredients: A Recipe-based Approach"
Proceedings of the 14th International Conference on Software Technologies - Volume 1: ICSOFT, pp. 544-551, SciTePress, 2019 Jul

The sharing and growth of open source software packages in the npm JavaScript (JS) ecosystem has been exponential, not only in numbers but also in terms of interconnectivity, to the extent that often the size of dependencies has become more than the size of the written code. This reuse-oriented paradigm, often attributed to the lack of a standard library in node and/or in the micropackaging culture of the ecosystem, yields interesting insights on the way developers build their packages. In this work we view the dependency network of the npm ecosystem from a “culinary” perspective. We assume that dependencies are the ingredients in a recipe, which corresponds to the produced software package. We employ network analysis and information retrieval techniques in order to capture the dependencies that tend to co-occur in the development of npm packages and identify the communities that have evolved as the main drivers for npm’s exponential growth.
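The co-occurrence analysis at the heart of the recipe metaphor can be sketched briefly. This is not the paper's analysis; the package names and dependency lists below are hypothetical.

```python
# Hedged sketch: count which npm dependencies ("ingredients") co-occur across
# packages ("recipes") to surface commonly combined pairs. Data is hypothetical.
from collections import Counter
from itertools import combinations

def cooccurring_pairs(packages):
    """packages: {name: [dependencies]}; returns a Counter of sorted pairs."""
    pairs = Counter()
    for deps in packages.values():
        for a, b in combinations(sorted(set(deps)), 2):
            pairs[(a, b)] += 1
    return pairs

packages = {
    "app-a": ["express", "lodash", "debug"],
    "app-b": ["express", "debug"],
    "app-c": ["react", "lodash"],
}
top_pair, count = cooccurring_pairs(packages).most_common(1)[0]
```

On a real dump of the npm registry, the resulting pair counts define a weighted co-occurrence graph on which community detection can then be run, which is the network-analysis step the abstract refers to.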

@inproceedings{icsoft19npm,
author={Kyriakos C. Chatzidimitriou and Michail D. Papamichail and Themistoklis Diamantopoulos and Napoleon-Christos Oikonomou and Andreas L. Symeonidis},
title={npm Packages as Ingredients: A Recipe-based Approach},
booktitle={Proceedings of the 14th International Conference on Software Technologies - Volume 1: ICSOFT},
pages={544-551},
publisher={SciTePress},
organization={INSTICC},
year={2019},
month={07},
date={2019-07-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2019/08/ICSOFT_NPMRecipes.pdf},
doi={https://doi.org/10.5220/0007966805440551},
isbn={978-989-758-379-7},
abstract={The sharing and growth of open source software packages in the npm JavaScript (JS) ecosystem has been exponential, not only in numbers but also in terms of interconnectivity, to the extent that often the size of dependencies has become more than the size of the written code. This reuse-oriented paradigm, often attributed to the lack of a standard library in node and/or in the micropackaging culture of the ecosystem, yields interesting insights on the way developers build their packages. In this work we view the dependency network of the npm ecosystem from a “culinary” perspective. We assume that dependencies are the ingredients in a recipe, which corresponds to the produced software package. We employ network analysis and information retrieval techniques in order to capture the dependencies that tend to co-occur in the development of npm packages and identify the communities that have evolved as the main drivers for npm’s exponential growth.}
}

Christos Psarras, Themistoklis Diamantopoulos and Andreas Symeonidis
"A Mechanism for Automatically Summarizing Software Functionality from Source Code"
Proceedings of the 2019 IEEE International Conference on Software Quality, Reliability and Security (QRS), pp. 121-130, IEEE, Sofia, Bulgaria, 2019 Jul

When developers search online to find software components to reuse, they usually first need to understand the container projects/libraries, and subsequently identify the required functionality. Several approaches identify and summarize the offerings of projects from their source code, however they often require that the developer has knowledge of the underlying topic modeling techniques; they do not provide a mechanism for tuning the number of topics, and they offer no control over the top terms for each topic. In this work, we use a vectorizer to extract information from variable/method names and comments, and apply Latent Dirichlet Allocation to cluster the source code files of a project into different semantic topics. The number of topics is optimized based on their purity with respect to project packages, while topic categories are constructed to provide further intuition and Stack Exchange tags are used to express the topics in more abstract terms.
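The vectorization step that feeds the topic model (extracting terms from identifiers and comments) can be sketched as follows. This is only the preprocessing stage, not the LDA itself, and the splitting heuristics are illustrative assumptions rather than the paper's exact vectorizer.

```python
# Hedged sketch of the vectorization step: turn identifiers in a source
# snippet into lowercase terms that a topic model (e.g. LDA) can consume.
import re
from collections import Counter

def split_identifier(name):
    """Split a camelCase identifier into lowercase terms."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return [p.lower() for p in parts]

def term_vector(source):
    """Crude bag of terms from identifiers in a source snippet."""
    terms = []
    for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        for part in token.split("_"):          # handle snake_case too
            terms.extend(split_identifier(part))
    return Counter(terms)
```

Each file's `term_vector` becomes one document in the corpus; LDA then groups files whose identifier vocabularies share latent topics, which is the clustering step the abstract describes.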

@inproceedings{QRS2019,
author={Christos Psarras and Themistoklis Diamantopoulos and Andreas Symeonidis},
title={A Mechanism for Automatically Summarizing Software Functionality from Source Code},
booktitle={Proceedings of the 2019 IEEE International Conference on Software Quality, Reliability and Security (QRS)},
pages={121-130},
publisher={IEEE},
address={Sofia, Bulgaria},
year={2019},
month={07},
date={2019-07-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2019/08/QRS2019.pdf},
abstract={When developers search online to find software components to reuse, they usually first need to understand the container projects/libraries, and subsequently identify the required functionality. Several approaches identify and summarize the offerings of projects from their source code, however they often require that the developer has knowledge of the underlying topic modeling techniques; they do not provide a mechanism for tuning the number of topics, and they offer no control over the top terms for each topic. In this work, we use a vectorizer to extract information from variable/method names and comments, and apply Latent Dirichlet Allocation to cluster the source code files of a project into different semantic topics. The number of topics is optimized based on their purity with respect to project packages, while topic categories are constructed to provide further intuition and Stack Exchange tags are used to express the topics in more abstract terms.}
}

2018

Inproceedings Papers

Kyriakos C. Chatzidimitriou, Michail Papamichail, Themistoklis Diamantopoulos, Michail Tsapanos and Andreas L. Symeonidis
"npm-miner: An Infrastructure for Measuring the Quality of the npm Registry"
MSR ’18: 15th International Conference on Mining Software Repositories, pp. 4, ACM, Gothenburg, Sweden, 2018 May

As the popularity of the JavaScript language is constantly increasing, one of the most important challenges today is to assess the quality of JavaScript packages. Developers often employ tools for code linting and for the extraction of static analysis metrics in order to assess and/or improve their code. In this context, we have developed npm-miner, a platform that crawls the npm registry and analyzes the packages using static analysis tools in order to extract detailed quality metrics as well as high-level quality attributes, such as maintainability and security. Our infrastructure includes an index that is accessible through a web interface, while we have also constructed a dataset with the results of a detailed analysis for 2000 popular npm packages.

@inproceedings{Chatzidimitriou2018MSR,
author={Kyriakos C. Chatzidimitriou and Michail Papamichail and Themistoklis Diamantopoulos and Michail Tsapanos and Andreas L. Symeonidis},
title={npm-miner: An Infrastructure for Measuring the Quality of the npm Registry},
booktitle={MSR ’18: 15th International Conference on Mining Software Repositories},
pages={4},
publisher={ACM},
address={Gothenburg, Sweden},
year={2018},
month={05},
date={2018-05-28},
url={http://issel.ee.auth.gr/wp-content/uploads/2018/03/msr2018.pdf},
doi={10.1145/3196398.3196465},
abstract={As the popularity of the JavaScript language is constantly increasing, one of the most important challenges today is to assess the quality of JavaScript packages. Developers often employ tools for code linting and for the extraction of static analysis metrics in order to assess and/or improve their code. In this context, we have developed npm-miner, a platform that crawls the npm registry and analyzes the packages using static analysis tools in order to extract detailed quality metrics as well as high-level quality attributes, such as maintainability and security. Our infrastructure includes an index that is accessible through a web interface, while we have also constructed a dataset with the results of a detailed analysis for 2000 popular npm packages.}
}

Themistoklis Diamantopoulos, Georgios Karagiannopoulos and Andreas Symeonidis
"CodeCatch: Extracting Source Code Snippets from Online Sources"
IEEE/ACM 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE), pp. 21-27, ACM, 2018 May


@inproceedings{Diamantopoulos2018,
author={Themistoklis Diamantopoulos and Georgios Karagiannopoulos and Andreas Symeonidis},
title={CodeCatch: Extracting Source Code Snippets from Online Sources},
booktitle={IEEE/ACM 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE)},
pages={21-27},
publisher={ACM},
year={2018},
month={05},
date={2018-05-01},
url={https://issel.ee.auth.gr/wp-content/uploads/2018/11/RAISE2018.pdf},
doi={10.1145/3194104.3194107}
}

Michail Papamichail, Themistoklis Diamantopoulos, Ilias Chrysovergis, Philippos Samlidis and Andreas Symeonidis
"User-Perceived Reusability Estimation based on Analysis of Software Repositories"
Proceedings of the 2018 Workshop on Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE), 2018 Mar

The popularity of open-source software repositories has led to a new reuse paradigm, where online resources can be thoroughly analyzed to identify reusable software components. Obviously, assessing the quality and specifically the reusability potential of source code residing in open software repositories poses a major challenge for the research community. Although several systems have been designed towards this direction, most of them do not focus on reusability. In this paper, we define and formulate a reusability score by employing information from GitHub stars and forks, which indicate the extent to which software components are adopted/accepted by developers. Our methodology involves applying and assessing different state-of-the-practice machine learning algorithms, in order to construct models for reusability estimation at both class and package levels. Preliminary evaluation of our methodology indicates that our approach can successfully assess reusability, as perceived by developers.
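The stars/forks-based score described above can be sketched as follows; the log scaling and the equal weighting of the two signals are illustrative assumptions, not the formulation used in the paper.

```python
import math

def reusability_score(stars, forks, max_stars, max_forks):
    """Map a repository's popularity to a [0, 1] reusability proxy.

    log1p dampens the heavy skew of star/fork distributions; the 50/50
    weighting of the two signals is an illustrative assumption.
    """
    s = math.log1p(stars) / math.log1p(max_stars)
    f = math.log1p(forks) / math.log1p(max_forks)
    return 0.5 * s + 0.5 * f

# A moderately popular repository, normalized against hypothetical maxima
print(reusability_score(1500, 300, 50000, 10000))
```

Such a score can then serve as the target variable when training class- and package-level estimation models on static analysis metrics.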

@inproceedings{Papamichail2018MaLTeSQuE,
author={Michail Papamichail and Themistoklis Diamantopoulos and Ilias Chrysovergis and Philippos Samlidis and Andreas Symeonidis},
title={User-Perceived Reusability Estimation based on Analysis of Software Repositories},
booktitle={Proceedings of the 2018 Workshop on Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE)},
year={2018},
month={03},
date={2018-03-20},
url={https://issel.ee.auth.gr/wp-content/uploads/2019/08/maLTeSQuE.pdf},
publisher's url={https://www.researchgate.net/publication/324106989_User-Perceived_Reusability_Estimation_based_on_Analysis_of_Software_Repositories},
abstract={The popularity of open-source software repositories has led to a new reuse paradigm, where online resources can be thoroughly analyzed to identify reusable software components. Obviously, assessing the quality and specifically the reusability potential of source code residing in open software repositories poses a major challenge for the research community. Although several systems have been designed towards this direction, most of them do not focus on reusability. In this paper, we define and formulate a reusability score by employing information from GitHub stars and forks, which indicate the extent to which software components are adopted/accepted by developers. Our methodology involves applying and assessing different state-of-the-practice machine learning algorithms, in order to construct models for reusability estimation at both class and package levels. Preliminary evaluation of our methodology indicates that our approach can successfully assess reusability, as perceived by developers.}
}

2018

Inbooks

Valasia Dimaridou, Alexandros-Charalampos Kyprianidis, Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis
"Assessing the User-Perceived Quality of Source Code Components using Static Analysis Metrics"
Chapter 1, pp. 25, Springer, 2018 Jan

Nowadays, developers tend to adopt a component-based software engineering approach, reusing own implementations and/or resorting to third-party source code. This practice is in principle cost-effective, however it may also lead to low quality software products, if the components to be reused exhibit low quality. Thus, several approaches have been developed to measure the quality of software components. Most of them, however, rely on the aid of experts for defining target quality scores and deriving metric thresholds, leading to results that are context-dependent and subjective. In this work, we build a mechanism that employs static analysis metrics extracted from GitHub projects and defines a target quality score based on repositories’ stars and forks, which indicate their adoption/acceptance by developers. Upon removing outliers with a one-class classifier, we employ Principal Feature Analysis and examine the semantics among metrics to provide an analysis on five axes for source code components (classes or packages): complexity, coupling, size, degree of inheritance, and quality of documentation. Neural networks are thus applied to estimate the final quality score given metrics from these axes. Preliminary evaluation indicates that our approach effectively estimates software quality at both class and package levels.

@inbook{Dimaridou2018,
author={Valasia Dimaridou and Alexandros-Charalampos Kyprianidis and Michail Papamichail and Themistoklis Diamantopoulos and Andreas Symeonidis},
title={Assessing the User-Perceived Quality of Source Code Components using Static Analysis Metrics},
chapter={1},
pages={25},
publisher={Springer},
year={2018},
month={01},
date={2018-01-01},
url={https://issel.ee.auth.gr/wp-content/uploads/2019/08/ccis_book_chapter.pdf},
publisher's url={https://www.researchgate.net/publication/325627162_Assessing_the_User-Perceived_Quality_of_Source_Code_Components_Using_Static_Analysis_Metrics},
abstract={Nowadays, developers tend to adopt a component-based software engineering approach, reusing own implementations and/or resorting to third-party source code. This practice is in principle cost-effective, however it may also lead to low quality software products, if the components to be reused exhibit low quality. Thus, several approaches have been developed to measure the quality of software components. Most of them, however, rely on the aid of experts for defining target quality scores and deriving metric thresholds, leading to results that are context-dependent and subjective. In this work, we build a mechanism that employs static analysis metrics extracted from GitHub projects and defines a target quality score based on repositories’ stars and forks, which indicate their adoption/acceptance by developers. Upon removing outliers with a one-class classifier, we employ Principal Feature Analysis and examine the semantics among metrics to provide an analysis on five axes for source code components (classes or packages): complexity, coupling, size, degree of inheritance, and quality of documentation. Neural networks are thus applied to estimate the final quality score given metrics from these axes. Preliminary evaluation indicates that our approach effectively estimates software quality at both class and package levels.}
}

2017

Journal Articles

Themistoklis Diamantopoulos, Michael Roth, Andreas Symeonidis and Ewan Klein
"Software requirements as an application domain for natural language processing"
Language Resources and Evaluation, pp. 1-30, 2017 Feb

Mapping functional requirements first to specifications and then to code is one of the most challenging tasks in software development. Since requirements are commonly written in natural language, they can be prone to ambiguity, incompleteness and inconsistency. Structured semantic representations allow requirements to be translated to formal models, which can be used to detect problems at an early stage of the development process through validation. Storing and querying such models can also facilitate software reuse. Several approaches constrain the input format of requirements to produce specifications, however they usually require considerable human effort in order to adopt domain-specific heuristics and/or controlled languages. We propose a mechanism that automates the mapping of requirements to formal representations using semantic role labeling. We describe the first publicly available dataset for this task, employ a hierarchical framework that allows requirements concepts to be annotated, and discuss how semantic role labeling can be adapted for parsing software requirements.

@article{Diamantopoulos2017,
author={Themistoklis Diamantopoulos and Michael Roth and Andreas Symeonidis and Ewan Klein},
title={Software requirements as an application domain for natural language processing},
journal={Language Resources and Evaluation},
pages={1-30},
year={2017},
month={02},
date={2017-02-27},
url={http://rdcu.be/tpxd},
doi={10.1007/s10579-017-9381-z},
abstract={Mapping functional requirements first to specifications and then to code is one of the most challenging tasks in software development. Since requirements are commonly written in natural language, they can be prone to ambiguity, incompleteness and inconsistency. Structured semantic representations allow requirements to be translated to formal models, which can be used to detect problems at an early stage of the development process through validation. Storing and querying such models can also facilitate software reuse. Several approaches constrain the input format of requirements to produce specifications, however they usually require considerable human effort in order to adopt domain-specific heuristics and/or controlled languages. We propose a mechanism that automates the mapping of requirements to formal representations using semantic role labeling. We describe the first publicly available dataset for this task, employ a hierarchical framework that allows requirements concepts to be annotated, and discuss how semantic role labeling can be adapted for parsing software requirements.}
}

Themistoklis Diamantopoulos and Andreas Symeonidis
"Enhancing requirements reusability through semantic modeling and data mining techniques"
Enterprise Information Systems, pp. 1-22, 2017 Dec

Enhancing the requirements elicitation process has always been of added value to software engineers, since it expedites the software lifecycle and reduces errors in the conceptualization phase of software products. The challenge posed to the research community is to construct formal models that are capable of storing requirements from multimodal formats (text and UML diagrams) and promote easy requirements reuse, while at the same time being traceable to allow full control of the system design, as well as comprehensible to software engineers and end users. In this work, we present an approach that enhances requirements reuse while capturing the static (functional requirements, use case diagrams) and dynamic (activity diagrams) view of software projects. Our ontology-based approach allows for reasoning over the stored requirements, while the mining methodologies employed detect incomplete or missing software requirements, this way reducing the effort required for requirements elicitation at an early stage of the project lifecycle.

@article{Diamantopoulos2017EIS,
author={Themistoklis Diamantopoulos and Andreas Symeonidis},
title={Enhancing requirements reusability through semantic modeling and data mining techniques},
journal={Enterprise Information Systems},
pages={1-22},
year={2017},
month={12},
date={2017-12-17},
url={https://issel.ee.auth.gr/wp-content/uploads/2019/08/EIS2017.pdf},
doi={10.1080/17517575.2017.1416177},
publisher's url={https://doi.org/10.1080/17517575.2017.1416177},
abstract={Enhancing the requirements elicitation process has always been of added value to software engineers, since it expedites the software lifecycle and reduces errors in the conceptualization phase of software products. The challenge posed to the research community is to construct formal models that are capable of storing requirements from multimodal formats (text and UML diagrams) and promote easy requirements reuse, while at the same time being traceable to allow full control of the system design, as well as comprehensible to software engineers and end users. In this work, we present an approach that enhances requirements reuse while capturing the static (functional requirements, use case diagrams) and dynamic (activity diagrams) view of software projects. Our ontology-based approach allows for reasoning over the stored requirements, while the mining methodologies employed detect incomplete or missing software requirements, this way reducing the effort required for requirements elicitation at an early stage of the project lifecycle.}
}

2017

Inproceedings Papers

Valasia Dimaridou, Alexandros-Charalampos Kyprianidis, Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis
"Towards Modeling the User-perceived Quality of Source Code using Static Analysis Metrics"
Proceedings of the 12th International Conference on Software Technologies - Volume 1: ICSOFT, pp. 73-84, SciTePress, 2017 Jul

Nowadays, software has to be designed and developed as fast as possible, while maintaining quality standards. In this context, developers tend to adopt a component-based software engineering approach, reusing own implementations and/or resorting to third-party source code. This practice is in principle cost-effective, however it may lead to low quality software products. Thus, measuring the quality of software components is of vital importance. Several approaches that use code metrics rely on the aid of experts for defining target quality scores and deriving metric thresholds, leading to results that are highly context-dependent and subjective. In this work, we build a mechanism that employs static analysis metrics extracted from GitHub projects and defines a target quality score based on repositories’ stars and forks, which indicate their adoption/acceptance by the developers’ community. Upon removing outliers with a one-class classifier, we employ Principal Feature Analysis and examine the semantics among metrics to provide an analysis on five axes for a source code component: complexity, coupling, size, degree of inheritance, and quality of documentation. Neural networks are used to estimate the final quality score given metrics from all of these axes. Preliminary evaluation indicates that our approach can effectively estimate software quality.

@inproceedings{icsoft17,
author={Valasia Dimaridou and Alexandros-Charalampos Kyprianidis and Michail Papamichail and Themistoklis Diamantopoulos and Andreas Symeonidis},
title={Towards Modeling the User-perceived Quality of Source Code using Static Analysis Metrics},
booktitle={Proceedings of the 12th International Conference on Software Technologies - Volume 1: ICSOFT},
pages={73-84},
publisher={SciTePress},
year={2017},
month={07},
date={2017-07-26},
url={http://issel.ee.auth.gr/wp-content/uploads/2017/08/ICSOFT.pdf},
doi={10.5220/0006420000730084},
slideshare={https://www.slideshare.net/isselgroup/towards-modeling-the-userperceived-quality-of-source-code-using-static-analysis-metrics},
abstract={Nowadays, software has to be designed and developed as fast as possible, while maintaining quality standards. In this context, developers tend to adopt a component-based software engineering approach, reusing own implementations and/or resorting to third-party source code. This practice is in principle cost-effective, however it may lead to low quality software products. Thus, measuring the quality of software components is of vital importance. Several approaches that use code metrics rely on the aid of experts for defining target quality scores and deriving metric thresholds, leading to results that are highly context-dependent and subjective. In this work, we build a mechanism that employs static analysis metrics extracted from GitHub projects and defines a target quality score based on repositories’ stars and forks, which indicate their adoption/acceptance by the developers’ community. Upon removing outliers with a one-class classifier, we employ Principal Feature Analysis and examine the semantics among metrics to provide an analysis on five axes for a source code component: complexity, coupling, size, degree of inheritance, and quality of documentation. Neural networks are used to estimate the final quality score given metrics from all of these axes. Preliminary evaluation indicates that our approach can effectively estimate software quality.}
}

2016

Journal Articles

Christoforos Zolotas, Themistoklis Diamantopoulos, Kyriakos Chatzidimitriou and Andreas Symeonidis
"From requirements to source code: a Model-Driven Engineering approach for RESTful web services"
Automated Software Engineering, pp. 1-48, 2016 Sep

During the last few years, the REST architectural style has drastically changed the way web services are developed. Due to its transparent resource-oriented model, the RESTful paradigm has been incorporated into several development frameworks that allow rapid development and aspire to automate parts of the development process. However, most of the frameworks lack automation of essential web service functionality, such as authentication or database searching, while the end product is usually not fully compliant to REST. Furthermore, most frameworks rely heavily on domain specific modeling and require developers to be familiar with the employed modeling technologies. In this paper, we present a Model-Driven Engineering (MDE) engine that supports fast design and implementation of web services with advanced functionality. Our engine provides a front-end interface that allows developers to design their envisioned system through software requirements in multimodal formats. Input in the form of textual requirements and graphical storyboards is analyzed using natural language processing techniques and semantics, to semi-automatically construct the input model for the MDE engine. The engine subsequently applies model-to-model transformations to produce a RESTful, ready-to-deploy web service. The procedure is traceable, ensuring that changes in software requirements propagate to the underlying software artefacts and models. Upon assessing our methodology through a case study and measuring the effort reduction of using our tools, we conclude that our system can be effective for the fast design and implementation of web services, while it allows easy wrapping of services that have been engineered with traditional methods to the MDE realm.

@article{2016ZolotasASE,
author={Christoforos Zolotas and Themistoklis Diamantopoulos and Kyriakos Chatzidimitriou and Andreas Symeonidis},
title={From requirements to source code: a Model-Driven Engineering approach for RESTful web services},
journal={Automated Software Engineering},
pages={1-48},
year={2016},
month={09},
date={2016-09-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2016/09/ReqsToCodeMDE.pdf},
doi={10.1007/s10515-016-0206-x},
abstract={During the last few years, the REST architectural style has drastically changed the way web services are developed. Due to its transparent resource-oriented model, the RESTful paradigm has been incorporated into several development frameworks that allow rapid development and aspire to automate parts of the development process. However, most of the frameworks lack automation of essential web service functionality, such as authentication or database searching, while the end product is usually not fully compliant to REST. Furthermore, most frameworks rely heavily on domain specific modeling and require developers to be familiar with the employed modeling technologies. In this paper, we present a Model-Driven Engineering (MDE) engine that supports fast design and implementation of web services with advanced functionality. Our engine provides a front-end interface that allows developers to design their envisioned system through software requirements in multimodal formats. Input in the form of textual requirements and graphical storyboards is analyzed using natural language processing techniques and semantics, to semi-automatically construct the input model for the MDE engine. The engine subsequently applies model-to-model transformations to produce a RESTful, ready-to-deploy web service. The procedure is traceable, ensuring that changes in software requirements propagate to the underlying software artefacts and models. Upon assessing our methodology through a case study and measuring the effort reduction of using our tools, we conclude that our system can be effective for the fast design and implementation of web services, while it allows easy wrapping of services that have been engineered with traditional methods to the MDE realm.}
}

2016

Conference Papers

Themistoklis Diamantopoulos, Klearchos Thomopoulos and Andreas L. Symeonidis
"QualBoa: Reusability-aware Recommendations of Source Code Components"
IEEE/ACM 13th Working Conference on Mining Software Repositories, 2016 May

Contemporary software development processes involve finding reusable software components from online repositories and integrating them to the source code, both to reduce development time and to ensure that the final software project is of high quality. Although several systems have been designed to automate this procedure by recommending components that cover the desired functionality, the reusability of these components is usually not assessed by these systems. In this work, we present QualBoa, a recommendation system for source code components that covers both the functional and the quality aspects of software component reuse. Upon retrieving components, QualBoa provides a ranking that involves not only functional matching to the query, but also a reusability score based on configurable thresholds of source code metrics. The evaluation of QualBoa indicates that it can be effective for recommending reusable source code.
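The configurable-threshold idea can be illustrated with a toy scorer (the metric names and threshold values below are hypothetical, not QualBoa's actual configuration):

```python
# Hypothetical thresholds; QualBoa's actual metrics and values differ.
THRESHOLDS = {
    "cyclomatic_complexity": (10, "max"),   # should not exceed 10
    "lines_of_code": (200, "max"),          # should not exceed 200
    "comment_density": (0.1, "min"),        # should be at least 0.1
}

def reusability(metrics):
    """Fraction of configured thresholds that a component satisfies."""
    satisfied = 0
    for name, (limit, kind) in THRESHOLDS.items():
        value = metrics[name]
        satisfied += value <= limit if kind == "max" else value >= limit
    return satisfied / len(THRESHOLDS)

component = {"cyclomatic_complexity": 7, "lines_of_code": 150, "comment_density": 0.05}
print(reusability(component))  # meets 2 of the 3 thresholds
```

A retrieved component's final rank would then combine such a score with its functional match to the query.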

@conference{2016DiamantopoulosIEEE/ACM,
author={Themistoklis Diamantopoulos and Klearchos Thomopoulos and Andreas L. Symeonidis},
title={QualBoa: Reusability-aware Recommendations of Source Code Components},
booktitle={IEEE/ACM 13th Working Conference on Mining Software Repositories},
year={2016},
month={05},
date={2016-05-14},
url={http://issel.ee.auth.gr/wp-content/uploads/2016/06/QualBoa-Reusability-aware-Recommendations-of-Source-Code-Components.pdf},
abstract={Contemporary software development processes involve finding reusable software components from online repositories and integrating them to the source code, both to reduce development time and to ensure that the final software project is of high quality. Although several systems have been designed to automate this procedure by recommending components that cover the desired functionality, the reusability of these components is usually not assessed by these systems. In this work, we present QualBoa, a recommendation system for source code components that covers both the functional and the quality aspects of software component reuse. Upon retrieving components, QualBoa provides a ranking that involves not only functional matching to the query, but also a reusability score based on configurable thresholds of source code metrics. The evaluation of QualBoa indicates that it can be effective for recommending reusable source code.}
}

Themistoklis Diamantopoulos, Antonis Noutsos and Andreas L. Symeonidis
"DP-CORE: A Design Pattern Detection Tool for Code Reuse"
6th International Symposium on Business Modeling and Software Design (BMSD), Rhodes, Greece, 2016

In order to maintain, extend or reuse software projects one has to primarily understand what a system does and how well it does it. And, while in some cases information on system functionality exists, information covering the non-functional aspects is usually unavailable. Thus, one has to infer such knowledge by extracting design patterns directly from the source code. Several tools have been developed to identify design patterns, however most of them are limited to compilable and in most cases executable code, they rely on complex representations, and do not offer the developer any control over the detected patterns. In this paper we present DP-CORE, a design pattern detection tool that defines a highly descriptive representation to detect known and define custom patterns. DP-CORE is flexible, identifying exact and approximate pattern versions even in non-compilable code. Our analysis indicates that DP-CORE provides an efficient alternative to existing design pattern detection tools.

@conference{2016DiamantopoulosSBMSD,
author={Themistoklis Diamantopoulos and Antonis Noutsos and Andreas L. Symeonidis},
title={DP-CORE: A Design Pattern Detection Tool for Code Reuse},
booktitle={6th International Symposium on Business Modeling and Software Design (BMSD)},
address={Rhodes, Greece},
year={2016},
month={00},
date={2016-00-00},
url={http://issel.ee.auth.gr/wp-content/uploads/2016/09/DP-CORE.pdf},
abstract={In order to maintain, extend or reuse software projects one has to primarily understand what a system does and how well it does it. And, while in some cases information on system functionality exists, information covering the non-functional aspects is usually unavailable. Thus, one has to infer such knowledge by extracting design patterns directly from the source code. Several tools have been developed to identify design patterns, however most of them are limited to compilable and in most cases executable code, they rely on complex representations, and do not offer the developer any control over the detected patterns. In this paper we present DP-CORE, a design pattern detection tool that defines a highly descriptive representation to detect known and define custom patterns. DP-CORE is flexible, identifying exact and approximate pattern versions even in non-compilable code. Our analysis indicates that DP-CORE provides an efficient alternative to existing design pattern detection tools.}
}

2016

Inproceedings Papers

Michail Papamichail, Themistoklis Diamantopoulos and Andreas L. Symeonidis
"User-Perceived Source Code Quality Estimation based on Static Analysis Metrics"
2016 IEEE International Conference on Software Quality, Reliability and Security (QRS), Vienna, Austria, 2016 Aug

The popularity of open source software repositories and the highly adopted paradigm of software reuse have led to the development of several tools that aspire to assess the quality of source code. However, most software quality estimation tools, even the ones using adaptable models, depend on fixed metric thresholds for defining the ground truth. In this work we argue that the popularity of software components, as perceived by developers, can be considered as an indicator of software quality. We present a generic methodology that relates quality with source code metrics and estimates the quality of software components residing in popular GitHub repositories. Our methodology employs two models: a one-class classifier, used to rule out low quality code, and a neural network, that computes a quality score for each software component. Preliminary evaluation indicates that our approach can be effective for identifying high quality software components in the context of reuse.

@inproceedings{2016PapamichailIEEE,
author={Michail Papamichail and Themistoklis Diamantopoulos and Andreas L. Symeonidis},
title={User-Perceived Source Code Quality Estimation based on Static Analysis Metrics},
booktitle={2016 IEEE International Conference on Software Quality, Reliability and Security (QRS)},
address={Vienna, Austria},
year={2016},
month={08},
date={2016-08-03},
url={http://issel.ee.auth.gr/wp-content/uploads/2017/01/User-Perceived-Source-Code-Quality-Estimation-based-on-Static-Analysis-Metrics.pdf},
slideshare={http://www.slideshare.net/isselgroup/userperceived-source-code-quality-estimation-based-on-static-analysis-metrics},
abstract={The popularity of open source software repositories and the highly adopted paradigm of software reuse have led to the development of several tools that aspire to assess the quality of source code. However, most software quality estimation tools, even the ones using adaptable models, depend on fixed metric thresholds for defining the ground truth. In this work we argue that the popularity of software components, as perceived by developers, can be considered as an indicator of software quality. We present a generic methodology that relates quality with source code metrics and estimates the quality of software components residing in popular GitHub repositories. Our methodology employs two models: a one-class classifier, used to rule out low quality code, and a neural network, that computes a quality score for each software component. Preliminary evaluation indicates that our approach can be effective for identifying high quality software components in the context of reuse.}
}

2015

Conference Papers

Themistoklis Diamantopoulos and Andreas Symeonidis
"Employing Source Code Information to Improve Question-Answering in Stack Overflow"
The 12th Working Conference on Mining Software Repositories (MSR 2015), pp. 454-457, Florence, Italy, 2015 May

Nowadays, software development has been greatly influenced by question-answering communities, such as Stack Overflow. A new problem-solving paradigm has emerged, as developers post problems they encounter that are then answered by the community. In this paper, we propose a methodology that allows searching for solutions in Stack Overflow, using the main elements of a question post, including not only its title, tags, and body, but also its source code snippets. We describe a similarity scheme for these elements and demonstrate how structural information can be extracted from source code snippets and compared to further improve the retrieval of questions. The results of our evaluation indicate that our methodology is effective in recommending similar question posts, allowing community members to search without fully forming a question.
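As a rough illustration of such a similarity scheme, the sketch below combines per-element similarities of a question post (title, tags, body, code snippet) into a single score. The weights, field names, and the ratio-based text similarity are illustrative assumptions, not the measures used in the paper.

```python
from difflib import SequenceMatcher

def text_sim(a: str, b: str) -> float:
    """Ratio-based similarity of two text fields, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def tag_sim(tags_a: set, tags_b: set) -> float:
    """Jaccard similarity of two tag sets."""
    if not tags_a and not tags_b:
        return 0.0
    return len(tags_a & tags_b) / len(tags_a | tags_b)

def post_similarity(query: dict, post: dict,
                    weights=(0.3, 0.2, 0.3, 0.2)) -> float:
    """Weighted combination of title, tag, body, and snippet similarities."""
    w_title, w_tags, w_body, w_code = weights
    return (w_title * text_sim(query["title"], post["title"])
            + w_tags * tag_sim(set(query["tags"]), set(post["tags"]))
            + w_body * text_sim(query["body"], post["body"])
            + w_code * text_sim(query["code"], post["code"]))
```

An identical post scores 1.0 (the weights sum to one), so ranking candidate posts by this score yields the most similar questions first.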

@conference{2015DiamantopoulosMSR,
author={Themistoklis Diamantopoulos and Andreas Symeonidis},
title={Employing Source Code Information to Improve Question-Answering in Stack Overflow},
booktitle={The 12th Working Conference on Mining Software Repositories (MSR 2015)},
pages={454-457},
address={Florence, Italy},
year={2015},
month={05},
date={2015-05-01},
url={http://issel.ee.auth.gr/wp-content/uploads/MSR2015.pdf},
abstract={Nowadays, software development has been greatly influenced by question-answering communities, such as Stack Overflow. A new problem-solving paradigm has emerged, as developers post problems they encounter that are then answered by the community. In this paper, we propose a methodology that allows searching for solutions in Stack Overflow, using the main elements of a question post, including not only its title, tags, and body, but also its source code snippets. We describe a similarity scheme for these elements and demonstrate how structural information can be extracted from source code snippets and compared to further improve the retrieval of questions. The results of our evaluation indicate that our methodology is effective in recommending similar question posts, allowing community members to search without fully forming a question.}
}


Themistoklis Diamantopoulos and Andreas Symeonidis
"Towards Interpretable Defect-Prone Component Analysis using Genetic Fuzzy Systems"
IEEE/ACM 4th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE), pp. 32-38, Florence, Italy, 2015 May

The problem of Software Reliability Prediction has attracted the attention of several researchers in recent years. Various classification techniques proposed in the current literature involve the use of metrics drawn from version control systems in order to classify software components as defect-prone or defect-free. In this paper, we create a novel genetic fuzzy rule-based system to efficiently model the defect-proneness of each component. The system uses a Mamdani-Assilian inference engine and models the problem as a one-class classification task. System rules are constructed using a genetic algorithm, where each chromosome represents a rule base (Pittsburgh approach). The parameters of our fuzzy system and the operators of the genetic algorithm are designed with regard to producing interpretable output. Thus, the output offers not only effective classification, but also a comprehensive set of rules that can be easily visualized to extract useful conclusions about the metrics of the software.
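To make the ingredients concrete, the sketch below evaluates a tiny Mamdani-style rule base on a component's metric vector. The metric names, membership ranges, and the single rule are hypothetical illustrations; in the paper the rule base is evolved by a genetic algorithm (one chromosome per rule base, Pittsburgh approach), not hand-written.

```python
def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function with peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical linguistic terms over two metrics; names and ranges
# are illustrative assumptions, not taken from the paper.
TERMS = {
    "loc":   {"low": (0, 100, 400), "high": (100, 400, 700)},
    "churn": {"low": (0, 5, 30),    "high": (5, 30, 55)},
}

def firing_strength(rule, metrics):
    """Mamdani-style AND: minimum membership over the rule's antecedents."""
    return min(tri(metrics[m], *TERMS[m][term]) for m, term in rule)

# A Pittsburgh-style chromosome encodes a whole rule base; here a single
# hand-written rule: IF loc is high AND churn is high THEN defect-prone.
rule_base = [
    [("loc", "high"), ("churn", "high")],
]

def defect_proneness(metrics):
    """Degree of defect-proneness: max firing strength over the rule base."""
    return max(firing_strength(rule, metrics) for rule in rule_base)
```

Because each rule is a readable conjunction of linguistic terms, the evolved rule base can be inspected directly, which is the interpretability argument the abstract makes.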

@inproceedings{2015DiamantopoulosRAISE,
author={Themistoklis Diamantopoulos and Andreas Symeonidis},
title={Towards Interpretable Defect-Prone Component Analysis using Genetic Fuzzy Systems},
booktitle={IEEE/ACM 4th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE)},
pages={32-38},
address={Florence, Italy},
year={2015},
month={05},
date={2015-05-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2017/01/Towards-Interpretable-Defect-Prone-Component-Analysis-using-Genetic-Fuzzy-Systems-.pdf},
abstract={The problem of Software Reliability Prediction has attracted the attention of several researchers in recent years. Various classification techniques proposed in the current literature involve the use of metrics drawn from version control systems in order to classify software components as defect-prone or defect-free. In this paper, we create a novel genetic fuzzy rule-based system to efficiently model the defect-proneness of each component. The system uses a Mamdani-Assilian inference engine and models the problem as a one-class classification task. System rules are constructed using a genetic algorithm, where each chromosome represents a rule base (Pittsburgh approach). The parameters of our fuzzy system and the operators of the genetic algorithm are designed with regard to producing interpretable output. Thus, the output offers not only effective classification, but also a comprehensive set of rules that can be easily visualized to extract useful conclusions about the metrics of the software.}
}

2014

Journal Articles

Themistoklis Diamantopoulos and Andreas Symeonidis
"Localizing Software Bugs using the Edit Distance of Call Traces"
International Journal On Advances in Software, 7 (1), pp. 277-288, 2014 Oct

Automating the localization of software bugs that do not lead to crashes is a difficult task that has drawn the attention of several researchers. Several popular methods follow the same approach; function call traces are collected and represented as graphs, which are subsequently mined using subgraph mining algorithms in order to provide a ranking of potentially buggy functions (nodes). Recent work has indicated that the scalability of state-of-the-art methods can be improved by reducing the graph dataset using tree edit distance algorithms. The call traces that are closer to each other, but belong to different sets, are the ones that are most significant in localizing bugs. In this work, we further explore the task of selecting the most significant traces, by proposing different call trace selection techniques, based on the Stable Marriage problem, and testing their effectiveness against current solutions. Upon evaluating our methods on a real-world dataset, we show that our methodology is scalable and effective enough to be applied in dynamic bug detection scenarios.
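The core idea — keeping only the cross-set trace pairs that lie closest to each other — can be sketched as follows. For brevity this uses plain sequence (Levenshtein) edit distance over call traces and a simple smallest-distance selection, rather than the tree edit distance algorithms and Stable Marriage-based selection techniques of the paper.

```python
from itertools import product

def edit_distance(a, b):
    """Levenshtein distance between two call traces (lists of function names)."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))  # single-row dynamic programming table
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,              # delete from a
                        dp[j - 1] + 1,          # insert into a
                        prev + (a[i - 1] != b[j - 1]))  # substitute / match
            prev = cur
    return dp[n]

def closest_pairs(correct, incorrect, k=2):
    """Keep the k cross-set trace pairs with the smallest edit distance.

    Traces that are similar yet land in different (correct/incorrect)
    sets are the most informative for localizing the bug.
    """
    pairs = sorted(product(range(len(correct)), range(len(incorrect))),
                   key=lambda p: edit_distance(correct[p[0]], incorrect[p[1]]))
    return pairs[:k]
```

Restricting the subsequent graph mining to the traces in these closest pairs is what shrinks the dataset and yields the scalability gain described above.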

@article{2014DiamantopoulosIJAS,
author={Themistoklis Diamantopoulos and Andreas Symeonidis},
title={Localizing Software Bugs using the Edit Distance of Call Traces},
journal={International Journal On Advances in Software},
volume={7},
number={1},
pages={277 - 288},
year={2014},
month={10},
date={2014-10-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2017/01/Localizing-Software-Bugs-using-the-Edit-Distance-of-Call-Traces.pdf},
abstract={Automating the localization of software bugs that do not lead to crashes is a difficult task that has drawn the attention of several researchers. Several popular methods follow the same approach; function call traces are collected and represented as graphs, which are subsequently mined using subgraph mining algorithms in order to provide a ranking of potentially buggy functions (nodes). Recent work has indicated that the scalability of state-of-the-art methods can be improved by reducing the graph dataset using tree edit distance algorithms. The call traces that are closer to each other, but belong to different sets, are the ones that are most significant in localizing bugs. In this work, we further explore the task of selecting the most significant traces, by proposing different call trace selection techniques, based on the Stable Marriage problem, and testing their effectiveness against current solutions. Upon evaluating our methods on a real-world dataset, we show that our methodology is scalable and effective enough to be applied in dynamic bug detection scenarios.}
}

2014

Conference Papers

Michael Roth, Themistoklis Diamantopoulos, Ewan Klein and Andreas L. Symeonidis
"Software Requirements: A new Domain for Semantic Parsers"
Proceedings of the ACL 2014 Workshop on Semantic Parsing (SP14), pp. 50-54, Baltimore, Maryland, USA, 2014 Jun

Software requirements are commonly written in natural language, making them prone to ambiguity, incompleteness and inconsistency. By converting requirements to formal semantic representations, emerging problems can be detected at an early stage of the development process, thus reducing the number of ensuing errors and the development costs. In this paper, we treat the mapping from requirements to formal representations as a semantic parsing task. We describe a novel data set for this task that involves two contributions: first, we establish an ontology for formally representing requirements; and second, we introduce an iterative annotation scheme, in which formal representations are derived through step-wise refinements.
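A toy example of mapping a requirement sentence to a formal representation may help fix intuitions. The pattern and the (agent, action, object) triple below are illustrative assumptions only; the paper derives its representations from an ontology via semantic parsing, not from a regular expression.

```python
import re

# Toy grammar for requirements of the form
# "The <agent> must be able to <action> a/an/the <object>."
PATTERN = re.compile(
    r"the\s+(?P<agent>\w+)\s+must\s+be\s+able\s+to\s+"
    r"(?P<action>\w+)\s+(?:an?|the)\s+(?P<object>\w+)",
    re.IGNORECASE,
)

def parse_requirement(text: str):
    """Map a requirement sentence to an (agent, action, object) triple,
    or None if the sentence does not match the toy pattern."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return tuple(m.group(g).lower() for g in ("agent", "action", "object"))
```

Even this crude mapping shows why formalization helps: two requirements that parse to the same triple are redundant, and a requirement that fails to parse flags a sentence worth reviewing.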

@inproceedings{roth2014software,
author={Michael Roth and Themistoklis Diamantopoulos and Ewan Klein and Andreas L. Symeonidis},
title={Software Requirements: A new Domain for Semantic Parsers},
booktitle={Proceedings of the ACL 2014 Workshop on Semantic Parsing (SP14)},
pages={50-54},
address={Baltimore, Maryland, USA},
year={2014},
month={06},
date={2014-06-01},
url={http://www.aclweb.org/anthology/W/W14/W14-24.pdf#page=62},
abstract={Software requirements are commonly written in natural language, making them prone to ambiguity, incompleteness and inconsistency. By converting requirements to formal semantic representations, emerging problems can be detected at an early stage of the development process, thus reducing the number of ensuing errors and the development costs. In this paper, we treat the mapping from requirements to formal representations as a semantic parsing task. We describe a novel data set for this task that involves two contributions: first, we establish an ontology for formally representing requirements; and second, we introduce an iterative annotation scheme, in which formal representations are derived through step-wise refinements.}
}

2013

Book Chapters

Themistoklis Diamantopoulos, Andreas Symeonidis and Anthonios Chrysopoulos
"Designing robust strategies for continuous trading in contemporary power markets"
Agent-Mediated Electronic Commerce. Designing Trading Strategies and Mechanisms for Electronic Markets, pp. 30-44, Springer Berlin Heidelberg, 2013 Jan

In contemporary energy markets participants interact with each other via brokers that are responsible for the proper energy flow to and from their clients (usually in the form of long-term or short-term contracts). Power TAC is a realistic simulation of a real-life energy market, aiming towards providing a better understanding and modeling of modern energy markets, while boosting research on innovative trading strategies. Power TAC models brokers as software agents, competing against each other in Double Auction environments, in order to increase their client base and market share. Current work discusses such a broker agent architecture, striving to maximize its own profit. Within the context of our analysis, Double Auction markets are treated as microeconomic systems and, based on state-of-the-art price formation strategies, the following policies are designed: an adaptive price formation policy, a policy for forecasting energy consumption that employs Time Series Analysis primitives, and two shout update policies, a rule-based policy that acts rather hastily, and one based on Fuzzy Logic. The results are quite encouraging and will certainly call for future research.

@incollection{2013DiamantopoulosAMEC-DTSMEM,
author={Themistoklis Diamantopoulos and Andreas Symeonidis and Anthonios Chrysopoulos},
title={Designing robust strategies for continuous trading in contemporary power markets},
booktitle={Agent-Mediated Electronic Commerce. Designing Trading Strategies and Mechanisms for Electronic Markets},
pages={30-44},
publisher={Springer Berlin Heidelberg},
year={2013},
month={01},
date={2013-01-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2017/01/Designing-Robust-Strategies-for-Continuous-Trading-in-Contemporary-Power-Markets.pdf},
doi={10.1007/978-3-642-40864-9_3},
abstract={In contemporary energy markets participants interact with each other via brokers that are responsible for the proper energy flow to and from their clients (usually in the form of long-term or short-term contracts). Power TAC is a realistic simulation of a real-life energy market, aiming towards providing a better understanding and modeling of modern energy markets, while boosting research on innovative trading strategies. Power TAC models brokers as software agents, competing against each other in Double Auction environments, in order to increase their client base and market share. Current work discusses such a broker agent architecture, striving to maximize its own profit. Within the context of our analysis, Double Auction markets are treated as microeconomic systems and, based on state-of-the-art price formation strategies, the following policies are designed: an adaptive price formation policy, a policy for forecasting energy consumption that employs Time Series Analysis primitives, and two shout update policies, a rule-based policy that acts rather hastily, and one based on Fuzzy Logic. The results are quite encouraging and will certainly call for future research.}
}

2013

Conference Papers

Themistoklis Diamantopoulos and Andreas L. Symeonidis
"Towards Scalable Bug Localization using the Edit Distance of Call Traces"
The Eighth International Conference on Software Engineering Advances (ICSEA 2013), pp. 45-50, Venice, Italy, 2013 Oct

Locating software bugs is a difficult task, especially if they do not lead to crashes. Current research on automating non-crashing bug detection dictates collecting function call traces and representing them as graphs, and reducing the graphs before applying a subgraph mining algorithm. A ranking of potentially buggy functions is derived using frequency statistics for each node (function) in the correct and incorrect set of traces. Although most existing techniques are effective, they do not achieve scalability. To address this issue, this paper suggests reducing the graph dataset in order to isolate the graphs that are significant in localizing bugs. To this end, we propose the use of tree edit distance algorithms to identify the traces that are closer to each other, while belonging to different sets. The scalability of two proposed algorithms, an exact and a faster approximate one, is evaluated using a dataset derived from a real-world application. Finally, although the main scope of this work lies in scalability, the results indicate that there is no compromise in effectiveness.

@inproceedings{2013DiamantopoulosICSEA,
author={Themistoklis Diamantopoulos and Andreas L. Symeonidis},
title={Towards Scalable Bug Localization using the Edit Distance of Call Traces},
booktitle={The Eighth International Conference on Software Engineering Advances (ICSEA 2013)},
pages={45-50},
address={Venice, Italy},
year={2013},
month={10},
date={2013-10-27},
url={https://www.thinkmind.org/download.php?articleid=icsea_2013_2_30_10250},
keywords={Load Forecasting},
abstract={Locating software bugs is a difficult task, especially if they do not lead to crashes. Current research on automating non-crashing bug detection dictates collecting function call traces and representing them as graphs, and reducing the graphs before applying a subgraph mining algorithm. A ranking of potentially buggy functions is derived using frequency statistics for each node (function) in the correct and incorrect set of traces. Although most existing techniques are effective, they do not achieve scalability. To address this issue, this paper suggests reducing the graph dataset in order to isolate the graphs that are significant in localizing bugs. To this end, we propose the use of tree edit distance algorithms to identify the traces that are closer to each other, while belonging to different sets. The scalability of two proposed algorithms, an exact and a faster approximate one, is evaluated using a dataset derived from a real-world application. Finally, although the main scope of this work lies in scalability, the results indicate that there is no compromise in effectiveness.}
}