Michail Papamichail

PhD Candidate

Aristotle University of Thessaloniki
Department of Electrical and Computer Engineering
54124 Thessaloniki – GREECE

Tel: +30 2310 99 6365
Fax: +30 2310 99 6398
Email: mpapamic (at) issel [dot] ee [dot] auth [dot] gr
LinkedIn

Education

12/2015 – today: PhD candidate, Electrical and Computer Engineering Department, Aristotle University of Thessaloniki, Greece. PhD Thesis: “Application of artificial intelligence and data mining techniques for software quality assessment”.
09/2010 – 11/2015: Diploma of Electrical and Computer Engineering, Electrical and Computer Engineering Department, Aristotle University of Thessaloniki, Greece. Diploma Thesis: “Design and development of a source code quality estimation system using static analysis metrics and machine learning techniques”.

Professional Experience

02/2016 – today: Research Associate, Electrical and Computer Engineering Department, Aristotle University of Thessaloniki, Greece. EU-funded project: Mobile-Age (http://issel.ee.auth.gr/mobile-age/)
06/2015 – 12/2015: Application/Technical Consultant at Veltio (Oracle RPAS solutions consultant)
06/2014 – 09/2014: Paid internship at the University of California, Irvine, in the Secure Systems and Software Laboratory (SSL) under Professor Michael Franz

Teaching Experience

10/2016 – today: Teaching assistant for “Pattern Recognition”, Electrical and Computer Engineering Department, Aristotle University of Thessaloniki, Greece

Research interests

  • Source Code Quality
  • Software Engineering
  • Data Mining
  • Machine Learning

Languages

  • English: Proficient (Cambridge and Michigan Proficiency)
  • German: Conversational (Goethe-Zertifikat B1: Zertifikat Deutsch (ZD))

Memberships

  • Member of the Technical Chamber of Greece

Publications

2022

Conference Papers

Georgios Kalantzis, Gerasimos Papakostas, Thomas Karanikiotis, Michail Papamichail and Andreas Symeonidis
"A Heuristic Approach towards Continuous Implicit Authentication"
2022 IEEE International Joint Conference on Biometrics (IJCB), pp. 1-7, IEEE, 2022 Oct

Smartphones nowadays handle large amounts of sensitive user information, since users exchange undisclosed information on an everyday basis. This generates the need for more effective authentication mechanisms, deviating from the traditional ones. In this direction, many research approaches are targeted towards continuous implicit authentication, on the basis of modelling the constant interaction of the user with the device. These approaches yield promising results, however certain improvements can be made by exploiting the sequential order of the predictions and the known performance metrics. In this work, we propose a heuristics algorithm, which, given a series of predictions from any continuous implicit authentication model, can exploit the sequential order in order to fix any false predictions and improve the accuracy of the smartphone security system. Preliminary evaluation on several axes indicates that our approach can effectively improve any CIA model and achieve significantly better results.

@conference{ijcb2022karanikiotis,
author={Georgios Kalantzis and Gerasimos Papakostas and Thomas Karanikiotis and Michail Papamichail and Andreas Symeonidis},
title={A Heuristic Approach towards Continuous Implicit Authentication},
booktitle={2022 IEEE International Joint Conference on Biometrics (IJCB)},
pages={1-7},
publisher={IEEE},
year={2022},
month={10},
date={2022-10-01},
url={https://ieeexplore.ieee.org/abstract/document/10007940},
doi={10.1109/IJCB54206.2022.10007940},
issn={2474-9699},
isbn={978-1-6654-6394-2},
abstract={Smartphones nowadays handle large amounts of sensitive user information, since users exchange undisclosed information on an everyday basis. This generates the need for more effective authentication mechanisms, deviating from the traditional ones. In this direction, many research approaches are targeted towards continuous implicit authentication, on the basis of modelling the constant interaction of the user with the device. These approaches yield promising results, however certain improvements can be made by exploiting the sequential order of the predictions and the known performance metrics. In this work, we propose a heuristics algorithm, which, given a series of predictions from any continuous implicit authentication model, can exploit the sequential order in order to fix any false predictions and improve the accuracy of the smartphone security system. Preliminary evaluation on several axes indicates that our approach can effectively improve any CIA model and achieve significantly better results.}
}
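
For illustration, the following minimal Python sketch shows the kind of sliding-window smoothing such a heuristic can apply to a stream of authentication predictions; the window size and majority-vote rule are hypothetical choices for the example, not the exact algorithm of the paper.

from collections import deque

# Illustrative only: smooth a stream of binary authenticity predictions with a
# sliding-window majority vote, so that isolated false accepts/rejects are
# corrected by their neighbours (window size is a hypothetical parameter).
def smooth_predictions(preds, window=5):
    buf = deque(maxlen=window)
    smoothed = []
    for p in preds:
        buf.append(p)
        smoothed.append(1 if 2 * sum(buf) > len(buf) else 0)
    return smoothed

raw = [1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0]   # 1 = legitimate user
print(smooth_predictions(raw))               # isolated flips are corrected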

2021

Journal Articles

Michail D. Papamichail and Andreas L. Symeonidis
"Data-Driven Analytics towards Software Sustainability: The Case of Open-Source Multimedia Tools on Cultural Storytelling"
Sustainability, 13, (3), pp. 1079, 2021 Jan

@article{papamichail2021data,
author={Michail D. Papamichail and Andreas L. Symeonidis},
title={Data-Driven Analytics towards Software Sustainability: The Case of Open-Source Multimedia Tools on Cultural Storytelling},
journal={Sustainability},
volume={13},
number={3},
pages={1079},
year={2021},
month={01},
date={2021-01-21},
url={https://www.mdpi.com/2071-1050/13/3/1079},
doi={10.3390/su13031079}
}

Thomas Karanikiotis, Michail D. Papamichail and Andreas L. Symeonidis
"Analyzing Static Analysis Metric Trends towards Early Identification of Non-Maintainable Software Components"
Sustainability, 13, (22), 2021 Nov

Nowadays, agile software development is considered a mainstream approach for software with fast release cycles and frequent changes in requirements. Most of the time, high velocity in software development implies poor software quality, especially when it comes to maintainability. In this work, we argue that ensuring the maintainability of a software component is not the result of a one-time only (or few-times only) set of fixes that eliminate technical debt, but the result of a continuous process across the software’s life cycle. We propose a maintainability evaluation methodology, where data residing in code hosting platforms are being used in order to identify non-maintainable software classes. Upon detecting classes that have been dropped from their project, we examine the progressing behavior of their static analysis metrics and evaluate maintainability upon the four primary source code properties: complexity, cohesion, inheritance and coupling. The evaluation of our methodology upon various axes, both qualitative and quantitative, indicates that our approach can provide actionable and interpretable maintainability evaluation at class level and identify non-maintainable components around 50% ahead of the software life cycle. Based on these results, we argue that the progressing behavior of static analysis metrics at a class level can provide valuable information about the maintainability degree of the component in time.

@article{su132212848,
author={Thomas Karanikiotis and Michail D. Papamichail and Andreas L. Symeonidis},
title={Analyzing Static Analysis Metric Trends towards Early Identification of Non-Maintainable Software Components},
journal={Sustainability},
volume={13},
number={22},
year={2021},
month={11},
date={2021-11-20},
url={https://www.mdpi.com/2071-1050/13/22/12848},
doi={10.3390/su132212848},
issn={2071-1050},
abstract={Nowadays, agile software development is considered a mainstream approach for software with fast release cycles and frequent changes in requirements. Most of the time, high velocity in software development implies poor software quality, especially when it comes to maintainability. In this work, we argue that ensuring the maintainability of a software component is not the result of a one-time only (or few-times only) set of fixes that eliminate technical debt, but the result of a continuous process across the software’s life cycle. We propose a maintainability evaluation methodology, where data residing in code hosting platforms are being used in order to identify non-maintainable software classes. Upon detecting classes that have been dropped from their project, we examine the progressing behavior of their static analysis metrics and evaluate maintainability upon the four primary source code properties: complexity, cohesion, inheritance and coupling. The evaluation of our methodology upon various axes, both qualitative and quantitative, indicates that our approach can provide actionable and interpretable maintainability evaluation at class level and identify non-maintainable components around 50% ahead of the software life cycle. Based on these results, we argue that the progressing behavior of static analysis metrics at a class level can provide valuable information about the maintainability degree of the component in time.}
}
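
As a toy illustration of the idea of tracking metric trends across a component's history, the Python sketch below fits a least-squares slope to a made-up complexity series and flags a steadily degrading class; both the data and the threshold are assumptions for the example.

import numpy as np

# Illustrative only: the slope of a static analysis metric across releases as a
# simple trend signal; values and threshold are placeholders.
def metric_trend(values):
    x = np.arange(len(values))
    slope, _ = np.polyfit(x, values, 1)   # least-squares line fit
    return slope

complexity_history = [4, 5, 5, 7, 9, 12, 15]   # metric value per release
if metric_trend(complexity_history) > 1.0:     # hypothetical threshold
    print("class shows a degrading complexity trend")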

2021

Inbooks

Thomas Karanikiotis, Michail D. Papamichail and Andreas L. Symeonidis
"Multilevel Readability Interpretation Against Software Properties: A Data-Centric Approach"
van Sinderen, Marten, Maciaszek, Leszek A. and Fill, Hans-Georg (Eds.), Communications in Computer and Information Science, vol. 1447, pp. 203-226, Springer International Publishing, Cham, 2021 Jul

Given the wide adoption of the agile software development paradigm, where efficient collaboration as well as effective maintenance are of utmost importance, the need to produce readable source code is evident. To that end, several research efforts aspire to assess the extent to which a software component is readable. Several metrics and evaluation criteria have been proposed; however, they are mostly empirical or rely on experts who are responsible for determining the ground truth and/or set custom thresholds, leading to results that are context-dependent and subjective. In this work, we employ a large set of static analysis metrics along with various coding violations towards interpreting readability as perceived by developers. Unlike already existing approaches, we refrain from using experts and we provide a fully automated and extendible methodology built upon data residing in online code hosting facilities. We perform static analysis at two levels (method and class) and construct a benchmark dataset that includes more than one million methods and classes covering diverse development scenarios. After performing clustering based on source code size, we employ Support Vector Regression in order to interpret the extent to which a software component is readable against the source code properties: cohesion, inheritance, complexity, coupling, and documentation. The evaluation of our methodology indicates that our models effectively interpret readability as perceived by developers against the above mentioned source code properties.

@inbook{icsoft2020BookChapter,
author={Thomas Karanikiotis and Michail D. Papamichail and Andreas L. Symeonidis},
title={Multilevel Readability Interpretation Against Software Properties: A Data-Centric Approach},
editor={van Sinderen, Marten and Maciaszek, Leszek A. and Fill, Hans-Georg},
volume={1447},
pages={203-226},
publisher={Springer International Publishing},
series={Communications in Computer and Information Science},
address={Cham},
year={2021},
month={07},
date={2021-07-21},
url={https://doi.org/10.1007/978-3-030-83007-6_10},
doi={10.1007/978-3-030-83007-6_10},
isbn={978-3-030-83007-6},
abstract={Given the wide adoption of the agile software development paradigm, where efficient collaboration as well as effective maintenance are of utmost importance, the need to produce readable source code is evident. To that end, several research efforts aspire to assess the extent to which a software component is readable. Several metrics and evaluation criteria have been proposed; however, they are mostly empirical or rely on experts who are responsible for determining the ground truth and/or set custom thresholds, leading to results that are context-dependent and subjective. In this work, we employ a large set of static analysis metrics along with various coding violations towards interpreting readability as perceived by developers. Unlike already existing approaches, we refrain from using experts and we provide a fully automated and extendible methodology built upon data residing in online code hosting facilities. We perform static analysis at two levels (method and class) and construct a benchmark dataset that includes more than one million methods and classes covering diverse development scenarios. After performing clustering based on source code size, we employ Support Vector Regression in order to interpret the extent to which a software component is readable against the source code properties: cohesion, inheritance, complexity, coupling, and documentation. The evaluation of our methodology indicates that our models effectively interpret readability as perceived by developers against the above mentioned source code properties.}
}
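
A minimal sketch of the pipeline shape described above (size-based clustering followed by per-cluster Support Vector Regression); the features, data, and cluster count are hypothetical placeholders, not the paper's benchmark setup.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVR

rng = np.random.default_rng(0)
loc = rng.integers(5, 500, size=(200, 1))   # lines of code per method (made up)
X = rng.random((200, 4))                    # e.g. complexity/coupling/documentation metrics
y = rng.random(200)                         # readability ground truth (placeholder)

# cluster methods by size, then train one regressor per cluster
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(loc)
models = {c: SVR().fit(X[clusters == c], y[clusters == c]) for c in set(clusters)}

c = clusters[0]                             # score a method with its cluster's model
print(models[c].predict(X[:1]))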

2020

Journal Articles

Michail D. Papamichail and Andreas L. Symeonidis
"A Generic Methodology for Early Identification of Non-Maintainable Source Code Components through Analysis of Software Releases"
Information and Software Technology, 118, pp. 106218, 2020 Feb

Contemporary development approaches consider that time-to-market is of utmost importance and assume that software projects are constantly evolving, driven by the continuously changing requirements of end-users. This practically requires an iterative process where software is changing by introducing new or updating existing software/user features, while at the same time continuing to support the stable ones. In order to ensure efficient software evolution, the need to produce maintainable software is evident. In this work, we argue that non-maintainable software is not the outcome of a single change, but the consequence of a series of changes throughout the development lifecycle. To that end, we define a maintainability evaluation methodology across releases and employ various information residing in software repositories, so as to decide on the maintainability of software. Upon using the dropping of packages as a non-maintainability indicator (accompanied by a series of quality-related criteria), the proposed methodology involves using one-class-classification techniques for evaluating maintainability at a package level, on four different axes each targeting a primary source code property: complexity, cohesion, coupling, and inheritance. Given the qualitative and quantitative evaluation of our methodology, we argue that apart from providing accurate and interpretable maintainability evaluation at package level, we can also identify non-maintainable components at an early stage. This early stage is in many cases around 50% of the software package lifecycle. Based on our findings, we conclude that modeling the trending behavior of certain static analysis metrics enables the effective identification of non-maintainable software components and thus can be a valuable tool for the software engineers.

@article{ISTmaintainabilityPaper,
author={Michail D. Papamichail and Andreas L. Symeonidis},
title={A Generic Methodology for Early Identification of Non-Maintainable Source Code Components through Analysis of Software Releases},
journal={Information and Software Technology},
volume={118},
pages={106218},
year={2020},
month={02},
date={2020-02-01},
url={https://issel.ee.auth.gr/wp-content/uploads/2020/06/ISTmaintainabilityPaper.pdf},
doi={10.1016/j.infsof.2019.106218},
issn={0950-5849},
keywords={static analysis metrics;Software releases;maintainability evaluation;software quality;trend analysis},
abstract={Contemporary development approaches consider that time-to-market is of utmost importance and assume that software projects are constantly evolving, driven by the continuously changing requirements of end-users. This practically requires an iterative process where software is changing by introducing new or updating existing software/user features, while at the same time continuing to support the stable ones. In order to ensure efficient software evolution, the need to produce maintainable software is evident. In this work, we argue that non-maintainable software is not the outcome of a single change, but the consequence of a series of changes throughout the development lifecycle. To that end, we define a maintainability evaluation methodology across releases and employ various information residing in software repositories, so as to decide on the maintainability of software. Upon using the dropping of packages as a non-maintainability indicator (accompanied by a series of quality-related criteria), the proposed methodology involves using one-class-classification techniques for evaluating maintainability at a package level, on four different axes each targeting a primary source code property: complexity, cohesion, coupling, and inheritance. Given the qualitative and quantitative evaluation of our methodology, we argue that apart from providing accurate and interpretable maintainability evaluation at package level, we can also identify non-maintainable components at an early stage. This early stage is in many cases around 50% of the software package lifecycle. Based on our findings, we conclude that modeling the trending behavior of certain static analysis metrics enables the effective identification of non-maintainable software components and thus can be a valuable tool for the software engineers.}
}
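
To make the one-class-classification idea concrete, here is a minimal Python sketch: a model is trained only on metric vectors of packages assumed maintainable, and outliers are flagged as candidates for non-maintainability. All data and the nu parameter are hypothetical.

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
maintainable = rng.normal(0, 1, size=(300, 4))   # complexity/cohesion/coupling/inheritance
candidates = rng.normal(3, 1, size=(5, 4))       # packages under evaluation

clf = OneClassSVM(nu=0.05, gamma="scale").fit(maintainable)
print(clf.predict(candidates))                   # -1 = outlier, i.e. possibly non-maintainable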

2020

Conference Papers

Thomas Karanikiotis, Michail D. Papamichail, Kyriakos C. Chatzidimitriou, Napoleon-Christos I. Oikonomou, Andreas L. Symeonidis, and Sashi K. Saripalle
"Continuous Implicit Authentication through Touch Traces Modelling"
20th International Conference on Software Quality, Reliability and Security (QRS), pp. 111-120, 2020 Nov

Nowadays, the continuously increasing use of smartphones as the primary way of dealing with day-to-day tasks raises several concerns mainly focusing on privacy and security. In this context and given the known limitations and deficiencies of traditional authentication mechanisms, a lot of research efforts are targeted towards continuous implicit authentication on the basis of behavioral biometrics. In this work, we propose a methodology towards continuous implicit authentication that refrains from the limitations imposed by small-scale and/or controlled environment experiments by employing a real-world application used widely by a large number of individuals. Upon constructing our models using Support Vector Machines, we introduce a confidence-based methodology, in order to strengthen the effectiveness and the efficiency of our approach. The evaluation of our methodology on a set of diverse scenarios indicates that our approach achieves good results both in terms of efficiency and usability.

@inproceedings{ciaQRS2020,
author={Thomas Karanikiotis and Michail D. Papamichail and Kyriakos C. Chatzidimitriou and Napoleon-Christos I. Oikonomou and Andreas L. Symeonidis and Sashi K. Saripalle},
title={Continuous Implicit Authentication through Touch Traces Modelling},
booktitle={20th International Conference on Software Quality, Reliability and Security (QRS)},
pages={111-120},
year={2020},
month={11},
date={2020-11-04},
url={https://cassiopia.ee.auth.gr/index.php/s/suNwCr8hXVdmJFp/download},
keywords={Implicit Authentication;Smartphone Security;Touch Traces Modelling;Support Vector Machines},
abstract={Nowadays, the continuously increasing use of smartphones as the primary way of dealing with day-to-day tasks raises several concerns mainly focusing on privacy and security. In this context and given the known limitations and deficiencies of traditional authentication mechanisms, a lot of research efforts are targeted towards continuous implicit authentication on the basis of behavioral biometrics. In this work, we propose a methodology towards continuous implicit authentication that refrains from the limitations imposed by small-scale and/or controlled environment experiments by employing a real-world application used widely by a large number of individuals. Upon constructing our models using Support Vector Machines, we introduce a confidence-based methodology, in order to strengthen the effectiveness and the efficiency of our approach. The evaluation of our methodology on a set of diverse scenarios indicates that our approach achieves good results both in terms of efficiency and usability.}
}
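
The confidence-based step can be pictured with the following Python sketch: an SVM is trained on touch-trace features and decisions are only taken when the decision-function magnitude exceeds a threshold, deferring uncertain traces. The features, data, and the 0.5 threshold are assumptions for illustration.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.random((400, 6))             # touch-trace features (placeholder)
y = rng.integers(0, 2, size=400)     # 1 = device owner

clf = SVC(kernel="rbf").fit(X, y)
scores = clf.decision_function(X[:10])
decisions = ["defer" if abs(s) < 0.5 else ("accept" if s > 0 else "reject")
             for s in scores]
print(decisions)                     # low-confidence traces are deferred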

Thomas Karanikiotis, Michail D. Papamichail, Giannis Gonidelis, Dimitra Karatza and Andreas L. Symeonidis
"A Data-driven Methodology towards Interpreting Readability against Software Properties"
Proceedings of the 15th International Conference on Software Technologies - ICSOFT, pp. 61-72, SciTePress, 2020 Jan

In the context of collaborative, agile software development, where effective and efficient software maintenance is of utmost importance, the need to produce readable source code is evident. Towards this direction, several approaches aspire to assess the extent to which a software component is readable. Most of them rely on experts who are responsible for determining the ground truth and/or set custom evaluation criteria, leading to results that are context-dependent and subjective. In this work, we employ a large set of static analysis metrics along with various coding violations towards interpreting readability as perceived by developers. In an effort to provide a fully automated and extendible methodology, we refrain from using experts; rather we harness data residing in online code hosting facilities towards constructing a dataset that includes more than one million methods that cover diverse development scenarios. After performing clustering based on source code size, we employ Support Vector Regression in order to interpret the extent to which a software component is readable on three axes: complexity, coupling, and documentation. Preliminary evaluation on several axes indicates that our approach effectively interprets readability as perceived by developers against the aforementioned three primary source code properties.

@inproceedings{karanikiotisICSOFT2020,
author={Thomas Karanikiotis and Michail D. Papamichail and Giannis Gonidelis and Dimitra Karatza and Andreas L. Symeonidis},
title={A Data-driven Methodology towards Interpreting Readability against Software Properties},
booktitle={Proceedings of the 15th International Conference on Software Technologies - ICSOFT},
pages={61-72},
publisher={SciTePress},
organization={INSTICC},
year={2020},
month={01},
date={2020-01-20},
url={https://doi.org/10.5220/0009891000610072},
doi={10.5220/0009891000610072},
issn={2184-2833},
isbn={978-989-758-443-5},
keywords={Developer-perceived Readability;Readability Interpretation;Size-based Clustering;Support Vector Regression.},
abstract={In the context of collaborative, agile software development, where effective and efficient software maintenance is of utmost importance, the need to produce readable source code is evident. Towards this direction, several approaches aspire to assess the extent to which a software component is readable. Most of them rely on experts who are responsible for determining the ground truth and/or set custom evaluation criteria, leading to results that are context-dependent and subjective. In this work, we employ a large set of static analysis metrics along with various coding violations towards interpreting readability as perceived by developers. In an effort to provide a fully automated and extendible methodology, we refrain from using experts; rather we harness data residing in online code hosting facilities towards constructing a dataset that includes more than one million methods that cover diverse development scenarios. After performing clustering based on source code size, we employ Support Vector Regression in order to interpret the extent to which a software component is readable on three axes: complexity, coupling, and documentation. Preliminary evaluation on several axes indicates that our approach effectively interprets readability as perceived by developers against the aforementioned three primary source code properties.}
}

Themistoklis Diamantopoulos, Michail D. Papamichail, Thomas Karanikiotis, Kyriakos C. Chatzidimitriou and Andreas L. Symeonidis
"Employing Contribution and Quality Metrics for Quantifying the Software Development Process"
The 17th International Conference on Mining Software Repositories (MSR 2020), 2020 Jun

The full integration of online repositories in the contemporary software development process promotes remote work and remote collaboration. Apart from the apparent benefits, online repositories offer a deluge of data that can be utilized to monitor and improve the software development process. Towards this direction, we have designed and implemented a platform that analyzes data from GitHub in order to compute a series of metrics that quantify the contributions of project collaborators, both from a development as well as an operations (communication) perspective. We analyze contributions in an evolutionary manner throughout the projects' lifecycle and track the number of coding violations generated, this way aspiring to identify cases of software development that need closer monitoring and (possibly) further actions to be taken. In this context, we have analyzed the 3000 most popular Java GitHub projects and provide the data to the community.

@conference{MSR2020,
author={Themistoklis Diamantopoulos and Michail D. Papamichail and Thomas Karanikiotis and Kyriakos C. Chatzidimitriou and Andreas L. Symeonidis},
title={Employing Contribution and Quality Metrics for Quantifying the Software Development Process},
booktitle={The 17th International Conference on Mining Software Repositories (MSR 2020)},
year={2020},
month={06},
date={2020-06-29},
url={https://issel.ee.auth.gr/wp-content/uploads/2020/05/MSR2020.pdf},
keywords={mining software repositories;contribution analysis;DevOps;GitHub issues;code violations},
abstract={The full integration of online repositories in the contemporary software development process promotes remote work and remote collaboration. Apart from the apparent benefits, online repositories offer a deluge of data that can be utilized to monitor and improve the software development process. Towards this direction, we have designed and implemented a platform that analyzes data from GitHub in order to compute a series of metrics that quantify the contributions of project collaborators, both from a development as well as an operations (communication) perspective. We analyze contributions in an evolutionary manner throughout the projects' lifecycle and track the number of coding violations generated, this way aspiring to identify cases of software development that need closer monitoring and (possibly) further actions to be taken. In this context, we have analyzed the 3000 most popular Java GitHub projects and provide the data to the community.}
}

Vasileios Matsoukas, Themistoklis Diamantopoulos, Michail D. Papamichail and Andreas L. Symeonidis
"Towards Analyzing Contributions from Software Repositories to Optimize Issue Assignment"
Proceedings of the 2020 IEEE International Conference on Software Quality, Reliability and Security (QRS), IEEE, Vilnius, Lithuania, 2020 Jul

Most software teams nowadays host their projects online and monitor software development in the form of issues/tasks. This process entails communicating through comments and reporting progress through commits and closing issues. In this context, assigning new issues, tasks or bugs to the most suitable contributor largely improves efficiency. Thus, several automated issue assignment approaches have been proposed, which however have major limitations. Most systems focus only on assigning bugs using textual data, are limited to projects explicitly using bug tracking systems, and may require manually tuning parameters per project. In this work, we build an automated issue assignment system for GitHub, taking into account the commits and issues of the repository under analysis. Our system aggregates feature probabilities using a neural network that adapts to each project, thus not requiring manual parameter tuning. Upon evaluating our methodology, we conclude that it can be efficient for automated issue assignment.

@conference{QRS2020IssueAssignment,
author={Vasileios Matsoukas and Themistoklis Diamantopoulos and Michail D. Papamichail and Andreas L. Symeonidis},
title={Towards Analyzing Contributions from Software Repositories to Optimize Issue Assignment},
booktitle={Proceedings of the 2020 IEEE International Conference on Software Quality, Reliability and Security (QRS)},
publisher={IEEE},
address={Vilnius, Lithuania},
year={2020},
month={07},
date={2020-07-31},
url={https://issel.ee.auth.gr/wp-content/uploads/2020/07/QRS2020IssueAssignment.pdf},
keywords={GitHub issues;automated issue assignment;issue triaging},
abstract={Most software teams nowadays host their projects online and monitor software development in the form of issues/tasks. This process entails communicating through comments and reporting progress through commits and closing issues. In this context, assigning new issues, tasks or bugs to the most suitable contributor largely improves efficiency. Thus, several automated issue assignment approaches have been proposed, which however have major limitations. Most systems focus only on assigning bugs using textual data, are limited to projects explicitly using bug tracking systems, and may require manually tuning parameters per project. In this work, we build an automated issue assignment system for GitHub, taking into account the commits and issues of the repository under analysis. Our system aggregates feature probabilities using a neural network that adapts to each project, thus not requiring manual parameter tuning. Upon evaluating our methodology, we conclude that it can be efficient for automated issue assignment.}
}
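
A minimal sketch of the aggregation idea: per-feature probabilities for a candidate assignee (e.g. from issue-text similarity and commit history) are combined by a small neural network that learns project-specific weights. The feature layout and synthetic labels below are assumptions for illustration, not the paper's actual features.

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)
# each row: [P(assignee | issue text), P(assignee | touched files), P(assignee | recent activity)]
X = rng.random((500, 3))
y = (X @ np.array([0.6, 0.3, 0.1]) + 0.1 * rng.random(500) > 0.55).astype(int)

net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000, random_state=0).fit(X, y)
print(net.predict_proba(X[:3]))   # learned, project-specific combination of the features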

2019

Journal Articles

Michail Papamichail, Kyriakos Chatzidimitriou, Thomas Karanikiotis, Napoleon-Christos Oikonomou, Andreas Symeonidis and Sashi Saripalle
"BrainRun: A Behavioral Biometrics Dataset towards Continuous Implicit Authentication"
Data, 4, (2), 2019 May

The widespread use of smartphones has dictated a new paradigm, where mobile applications are the primary channel for dealing with day-to-day tasks. This paradigm is full of sensitive information, making security of utmost importance. To that end, and given the traditional authentication techniques (passwords and/or unlock patterns) which have become ineffective, several research efforts are targeted towards biometrics security, while more advanced techniques are considering continuous implicit authentication on the basis of behavioral biometrics. However, most studies in this direction are performed “in vitro” resulting in small-scale experimentation. In this context, and in an effort to create a solid information basis upon which continuous authentication models can be built, we employ the real-world application “BrainRun”, a brain-training game aiming at boosting cognitive skills of individuals. BrainRun embeds a gestures capturing tool, so that the different types of gestures that describe the swiping behavior of users are recorded and thus can be modeled. Upon releasing the application at both the “Google Play Store” and “Apple App Store”, we construct a dataset containing gestures and sensors data for more than 2000 different users and devices. The dataset is distributed under the CC0 license and can be found at the EU Zenodo repository.

@article{Papamichail2019,
author={Michail Papamichail and Kyriakos Chatzidimitriou and Thomas Karanikiotis and Napoleon-Christos Oikonomou and Andreas Symeonidis and Sashi Saripalle},
title={BrainRun: A Behavioral Biometrics Dataset towards Continuous Implicit Authentication},
journal={Data},
volume={4},
number={2},
year={2019},
month={05},
date={2019-05-03},
url={https://res.mdpi.com/data/data-04-00060/article_deploy/data-04-00060.pdf?filename=&attachment=1},
doi={10.3390/data4020060},
issn={2306-5729},
abstract={The widespread use of smartphones has dictated a new paradigm, where mobile applications are the primary channel for dealing with day-to-day tasks. This paradigm is full of sensitive information, making security of utmost importance. To that end, and given the traditional authentication techniques (passwords and/or unlock patterns) which have become ineffective, several research efforts are targeted towards biometrics security, while more advanced techniques are considering continuous implicit authentication on the basis of behavioral biometrics. However, most studies in this direction are performed “in vitro” resulting in small-scale experimentation. In this context, and in an effort to create a solid information basis upon which continuous authentication models can be built, we employ the real-world application “BrainRun”, a brain-training game aiming at boosting cognitive skills of individuals. BrainRun embeds a gestures capturing tool, so that the different types of gestures that describe the swiping behavior of users are recorded and thus can be modeled. Upon releasing the application at both the “Google Play Store” and “Apple App Store”, we construct a dataset containing gestures and sensors data for more than 2000 different users and devices. The dataset is distributed under the CC0 license and can be found at the EU Zenodo repository.}
}

Michail D. Papamichail, Themistoklis Diamantopoulos and Andreas L. Symeonidis
"Software Reusability Dataset based on Static Analysis Metrics and Reuse Rate Information"
Data in Brief, 2019 Dec

The widely adopted component-based development paradigm considers the reuse of proper software components as a primary criterion for successful software development. As a result, various research efforts are directed towards evaluating the extent to which a software component is reusable. Prior efforts follow expert-based approaches, however the continuously increasing open-source software initiative allows the introduction of data-driven alternatives. In this context we have generated a dataset that harnesses information residing in online code hosting facilities and introduces the actual reuse rate of software components as a measure of their reusability. To do so, we have analyzed the most popular projects included in the maven registry and have computed a large number of static analysis metrics at both class and package levels using SourceMeter tool [2] that quantify six major source code properties: complexity, cohesion, coupling, inheritance, documentation and size. For these projects we additionally computed their reuse rate using our self-developed code search engine, AGORA [5]. The generated dataset contains analysis information regarding more than 24,000 classes and 2,000 packages, and can, thus, be used as the information basis towards the design and development of data-driven reusability evaluation methodologies. The dataset is related to the research article entitled "Measuring the Reusability of Software Components using Static Analysis Metrics and Reuse Rate Information".

@article{PAPAMICHAIL2019104687,
author={Michail D. Papamichail and Themistoklis Diamantopoulos and Andreas L. Symeonidis},
title={Software Reusability Dataset based on Static Analysis Metrics and Reuse Rate Information},
journal={Data in Brief},
year={2019},
month={12},
date={2019-12-31},
url={https://reader.elsevier.com/reader/sd/pii/S235234091931042X?token=9CDEB13940390201A35D26027D763CACB6EE4D49BFA9B920C4D32B348809F1F6A7DE309AA1737161C7E5BF1963BBD952},
doi={10.1016/j.dib.2019.104687},
keywords={developer-perceived reusability;code reuse;static analysis metrics;Reusability assessment},
abstract={The widely adopted component-based development paradigm considers the reuse of proper software components as a primary criterion for successful software development. As a result, various research efforts are directed towards evaluating the extent to which a software component is reusable. Prior efforts follow expert-based approaches, however the continuously increasing open-source software initiative allows the introduction of data-driven alternatives. In this context we have generated a dataset that harnesses information residing in online code hosting facilities and introduces the actual reuse rate of software components as a measure of their reusability. To do so, we have analyzed the most popular projects included in the maven registry and have computed a large number of static analysis metrics at both class and package levels using SourceMeter tool [2] that quantify six major source code properties: complexity, cohesion, coupling, inheritance, documentation and size. For these projects we additionally computed their reuse rate using our self-developed code search engine, AGORA [5]. The generated dataset contains analysis information regarding more than 24,000 classes and 2,000 packages, and can, thus, be used as the information basis towards the design and development of data-driven reusability evaluation methodologies. The dataset is related to the research article entitled \"Measuring the Reusability of Software Components using Static Analysis Metrics and Reuse Rate Information\".}
}

Michail D. Papamichail, Themistoklis Diamantopoulos and Andreas L. Symeonidis
"Measuring the Reusability of Software Components using Static Analysis Metrics and Reuse Rate Information"
Journal of Systems and Software, pp. 110423, 2019 Sep

Nowadays, the continuously evolving open-source community and the increasing demands of end users are forming a new software development paradigm; developers rely more on reusing components from online sources to minimize the time and cost of software development. An important challenge in this context is to evaluate the degree to which a software component is suitable for reuse, i.e. its reusability. Contemporary approaches assess reusability using static analysis metrics by relying on the help of experts, who usually set metric thresholds or provide ground truth values so that estimation models are built. However, even when expert help is available, it may still be subjective or case-specific. In this work, we refrain from expert-based solutions and employ the actual reuse rate of source code components as ground truth for building a reusability estimation model. We initially build a benchmark dataset, harnessing the power of online repositories to determine the number of reuse occurrences for each component in the dataset. Subsequently, we build a model based on static analysis metrics to assess reusability from five different properties: complexity, cohesion, coupling, inheritance, documentation and size. The evaluation of our methodology indicates that our system can effectively assess reusability as perceived by developers.

@article{PAPAMICHAIL2019110423,
author={Michail D. Papamichail and Themistoklis Diamantopoulos and Andreas L. Symeonidis},
title={Measuring the Reusability of Software Components using Static Analysis Metrics and Reuse Rate Information},
journal={Journal of Systems and Software},
pages={110423},
year={2019},
month={09},
date={2019-09-17},
url={https://issel.ee.auth.gr/wp-content/uploads/2019/09/2019mpapamicJSS.pdf},
doi={10.1016/j.jss.2019.110423},
issn={0164-1212},
publisher's url={https://www.sciencedirect.com/science/article/pii/S0164121219301979},
keywords={developer-perceived reusability;code reuse;static analysis metrics;reusability estimation},
abstract={Nowadays, the continuously evolving open-source community and the increasing demands of end users are forming a new software development paradigm; developers rely more on reusing components from online sources to minimize the time and cost of software development. An important challenge in this context is to evaluate the degree to which a software component is suitable for reuse, i.e. its reusability. Contemporary approaches assess reusability using static analysis metrics by relying on the help of experts, who usually set metric thresholds or provide ground truth values so that estimation models are built. However, even when expert help is available, it may still be subjective or case-specific. In this work, we refrain from expert-based solutions and employ the actual reuse rate of source code components as ground truth for building a reusability estimation model. We initially build a benchmark dataset, harnessing the power of online repositories to determine the number of reuse occurrences for each component in the dataset. Subsequently, we build a model based on static analysis metrics to assess reusability from five different properties: complexity, cohesion, coupling, inheritance, documentation and size. The evaluation of our methodology indicates that our system can effectively assess reusability as perceived by developers.}
}
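
The core move of the paper, using the mined reuse rate instead of expert labels, can be sketched as a plain regression problem; the random forest below is one possible model choice and all data are synthetic placeholders, not the paper's benchmark.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)
metrics = rng.random((1000, 6))           # complexity/cohesion/coupling/inheritance/documentation/size
reuse_rate = rng.poisson(5, size=1000)    # reuse occurrences mined from repositories (placeholder)

# regress the log-scaled reuse rate on static analysis metrics
model = RandomForestRegressor(random_state=0).fit(metrics, np.log1p(reuse_rate))
print(np.expm1(model.predict(metrics[:3])))   # estimated reuse rate for new components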

2019

Conference Papers

Kyriakos C Chatzidimitriou, Michail D Papamichail, Napoleon-Christos I Oikonomou, Dimitrios Lampoudis and Andreas L Symeonidis
"Cenote: A Big Data Management and Analytics Infrastructure for the Web of Things"
IEEE/WIC/ACM International Conference on Web Intelligence, pp. 282-285, ACM, 2019 Oct

In the era of Big Data, Cloud Computing and Internet of Things, most of the existing, integrated solutions that attempt to solve their challenges are either proprietary, limit functionality to a predefined set of requirements, or hide the way data are stored and accessed. In this work we propose Cenote, an open source Big Data management and analytics infrastructure for the Web of Things that overcomes the above limitations. Cenote is built on component-based software engineering principles and provides an all-inclusive solution based on components that work well individually.

@inproceedings{Chatzidimitriou:2019:CBD:3350546.3352531,
author={Kyriakos C Chatzidimitriou and Michail D Papamichail and Napoleon-Christos I Oikonomou and Dimitrios Lampoudis and Andreas L Symeonidis},
title={Cenote: A Big Data Management and Analytics Infrastructure for the Web of Things},
booktitle={IEEE/WIC/ACM International Conference on Web Intelligence},
pages={282-285},
publisher={ACM},
year={2019},
month={10},
date={2019-10-17},
url={http://doi.acm.org/10.1145/3350546.3352531},
doi={10.1145/3350546.3352531},
keywords={Internet of Things;analytics;apache kafka;apache storm;cockroachdb;infrastructure;restful api;web of things},
abstract={In the era of Big Data, Cloud Computing and Internet of Things, most of the existing, integrated solutions that attempt to solve their challenges are either proprietary, limit functionality to a predefined set of requirements, or hide the way data are stored and accessed. In this work we propose Cenote, an open source Big Data management and analytics infrastructure for the Web of Things that overcomes the above limitations. Cenote is built on component-based software engineering principles and provides an all-inclusive solution based on components that work well individually.}
}

Michail D. Papamichail, Themistoklis Diamantopoulos, Vasileios Matsoukas, Christos Athanasiadis and Andreas L. Symeonidis
"Towards Extracting the Role and Behavior of Contributors in Open-source Projects"
Proceedings of the 14th International Conference on Software Technologies - Volume 1: ICSOFT, pp. 536-543, SciTePress, 2019 Jul

Lately, the popular open source paradigm and the adoption of agile methodologies have changed the way software is developed. Effective collaboration within software teams has become crucial for building successful products. In this context, harnessing the data available in online code hosting facilities can help towards understanding how teams work and optimizing the development process. Although there are several approaches that mine contributions’ data, they usually view contributors as a uniform body of engineers, and focus mainly on the aspect of productivity while neglecting the quality of the work performed. In this work, we design a methodology for identifying engineer roles in development teams and determine the behaviors that prevail for each role. Using a dataset of GitHub projects, we perform clustering against the DevOps axis, thus identifying three roles: developers that are mainly preoccupied with code commits, operations engineers that focus on task assignment and acceptance testing, and the lately popular role of DevOps engineers that are a mix of both. Our analysis further extracts behavioral patterns for each role, this way assisting team leaders in knowing their team and effectively directing responsibilities to achieve optimal workload balancing and task allocation.

@inproceedings{icsoft19devops,
author={Michail D. Papamichail and Themistoklis Diamantopoulos and Vasileios Matsoukas and Christos Athanasiadis and Andreas L. Symeonidis},
title={Towards Extracting the Role and Behavior of Contributors in Open-source Projects},
booktitle={Proceedings of the 14th International Conference on Software Technologies - Volume 1: ICSOFT},
pages={536-543},
publisher={SciTePress},
organization={INSTICC},
year={2019},
month={07},
date={2019-07-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2019/08/ICSOFT_DevOps.pdf},
doi={10.5220/0007966505360543},
isbn={978-989-758-379-7},
abstract={Lately, the popular open source paradigm and the adoption of agile methodologies have changed the way software is developed. Effective collaboration within software teams has become crucial for building successful products. In this context, harnessing the data available in online code hosting facilities can help towards understanding how teams work and optimizing the development process. Although there are several approaches that mine contributions’ data, they usually view contributors as a uniform body of engineers, and focus mainly on the aspect of productivity while neglecting the quality of the work performed. In this work, we design a methodology for identifying engineer roles in development teams and determine the behaviors that prevail for each role. Using a dataset of GitHub projects, we perform clustering against the DevOps axis, thus identifying three roles: developers that are mainly preoccupied with code commits, operations engineers that focus on task assignment and acceptance testing, and the lately popular role of DevOps engineers that are a mix of both. Our analysis further extracts behavioral patterns for each role, this way assisting team leaders in knowing their team and effectively directing responsibilities to achieve optimal workload balancing and task allocation.}
}
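
The role-extraction step can be illustrated as clustering contributors on a development-vs-operations activity plane; the activity counts and the choice of k-means with three clusters are assumptions for the example, not the paper's exact setup.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
commits = rng.integers(0, 300, size=60)        # development activity per contributor
issue_events = rng.integers(0, 300, size=60)   # operations activity (task assignment, testing)
X = StandardScaler().fit_transform(np.c_[commits, issue_events])

roles = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(roles[:10])   # cluster ids to be interpreted as developer / operations / DevOps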

Kyriakos C. Chatzidimitriou, Michail D. Papamichail, Themistoklis Diamantopoulos, Napoleon-Christos Oikonomou and Andreas L. Symeonidis
"npm Packages as Ingredients: A Recipe-based Approach - Volume 1: ICSOFT"
Proceedings of the 14th International Conference on Software Technologies, pp. 544-551, SciTePress, 2019 Jul

The sharing and growth of open source software packages in the npm JavaScript (JS) ecosystem has been exponential, not only in numbers but also in terms of interconnectivity, to the extent that often the size of dependencies has become more than the size of the written code. This reuse-oriented paradigm, often attributed to the lack of a standard library in node and/or in the micropackaging culture of the ecosystem, yields interesting insights on the way developers build their packages. In this work we view the dependency network of the npm ecosystem from a “culinary” perspective. We assume that dependencies are the ingredients in a recipe, which corresponds to the produced software package. We employ network analysis and information retrieval techniques in order to capture the dependencies that tend to co-occur in the development of npm packages and identify the communities that have evolved as the main drivers for npm’s exponential growth.

@inproceedings{icsoft19npm,
author={Kyriakos C. Chatzidimitriou and Michail D. Papamichail and Themistoklis Diamantopoulos and Napoleon-Christos Oikonomou and Andreas L. Symeonidis},
title={npm Packages as Ingredients: A Recipe-based Approach},
booktitle={Proceedings of the 14th International Conference on Software Technologies - Volume 1: ICSOFT},
pages={544-551},
publisher={SciTePress},
organization={INSTICC},
year={2019},
month={07},
date={2019-07-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2019/08/ICSOFT_NPMRecipes.pdf},
doi={10.5220/0007966805440551},
isbn={978-989-758-379-7},
abstract={The sharing and growth of open source software packages in the npm JavaScript (JS) ecosystem has been exponential, not only in numbers but also in terms of interconnectivity, to the extent that often the size of dependencies has become more than the size of the written code. This reuse-oriented paradigm, often attributed to the lack of a standard library in node and/or in the micropackaging culture of the ecosystem, yields interesting insights on the way developers build their packages. In this work we view the dependency network of the npm ecosystem from a “culinary” perspective. We assume that dependencies are the ingredients in a recipe, which corresponds to the produced software package. We employ network analysis and information retrieval techniques in order to capture the dependencies that tend to co-occur in the development of npm packages and identify the communities that have evolved as the main drivers for npm’s exponential growth.}
}
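
The "ingredients" analogy boils down to co-occurrence counting over dependency lists; the tiny Python sketch below shows the raw signal on made-up packages (real recipe mining would run over the whole registry and add network analysis on top).

from itertools import combinations
from collections import Counter

# dependencies ("ingredients") of a few hypothetical npm packages
recipes = [
    {"express", "lodash", "debug"},
    {"express", "debug"},
    {"react", "lodash"},
]

cooccur = Counter()
for deps in recipes:
    cooccur.update(combinations(sorted(deps), 2))
print(cooccur.most_common(3))   # dependency pairs that tend to appear together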

2018

Conference Papers

Kyriakos C. Chatzidimitriou, Michail Papamichail, Themistoklis Diamantopoulos, Michail Tsapanos and Andreas L. Symeonidis
"npm-miner: An Infrastructure for Measuring the Quality of the npm Registry"
MSR ’18: 15th International Conference on Mining Software Repositories, pp. 4, ACM, Gothenburg, Sweden, 2018 May

As the popularity of the JavaScript language is constantly increasing, one of the most important challenges today is to assess the quality of JavaScript packages. Developers often employ tools for code linting and for the extraction of static analysis metrics in order to assess and/or improve their code. In this context, we have developed npm-miner, a platform that crawls the npm registry and analyzes the packages using static analysis tools in order to extract detailed quality metrics as well as high-level quality attributes, such as maintainability and security. Our infrastructure includes an index that is accessible through a web interface, while we have also constructed a dataset with the results of a detailed analysis for 2000 popular npm packages.

@inproceedings{Chatzidimitriou2018MSR,
author={Kyriakos C. Chatzidimitriou and Michail Papamichail and Themistoklis Diamantopoulos and Michail Tsapanos and Andreas L. Symeonidis},
title={npm-miner: An Infrastructure for Measuring the Quality of the npm Registry},
booktitle={MSR ’18: 15th International Conference on Mining Software Repositories},
pages={4},
publisher={ACM},
address={Gothenburg, Sweden},
year={2018},
month={05},
date={2018-05-28},
url={http://issel.ee.auth.gr/wp-content/uploads/2018/03/msr2018.pdf},
doi={10.1145/3196398.3196465},
abstract={As the popularity of the JavaScript language is constantly increasing, one of the most important challenges today is to assess the quality of JavaScript packages. Developers often employ tools for code linting and for the extraction of static analysis metrics in order to assess and/or improve their code. In this context, we have developed npm-miner, a platform that crawls the npm registry and analyzes the packages using static analysis tools in order to extract detailed quality metrics as well as high-level quality attributes, such as maintainability and security. Our infrastructure includes an index that is accessible through a web interface, while we have also constructed a dataset with the results of a detailed analysis for 2000 popular npm packages.}
}

Michail Papamichail, Themistoklis Diamantopoulos, Ilias Chrysovergis, Philippos Samlidis and Andreas Symeonidis
"User-Perceived Reusability Estimation based on Analysis of Software Repositories"
Proceedings of the 2018 Workshop on Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE), 2018 Mar

The popularity of open-source software repositories has led to a new reuse paradigm, where online resources can be thoroughly analyzed to identify reusable software components. Obviously, assessing the quality and specifically the reusability potential of source code residing in open software repositories poses a major challenge for the research community. Although several systems have been designed towards this direction, most of them do not focus on reusability. In this paper, we define and formulate a reusability score by employing information from GitHub stars and forks, which indicate the extent to which software components are adopted/accepted by developers. Our methodology involves applying and assessing different state-of-the-practice machine learning algorithms, in order to construct models for reusability estimation at both class and package levels. Preliminary evaluation of our methodology indicates that our approach can successfully assess reusability, as perceived by developers.

@inproceedings{Papamichail2018MaLTeSQuE,
author={Michail Papamichail and Themistoklis Diamantopoulos and Ilias Chrysovergis and Philippos Samlidis and Andreas Symeonidis},
title={User-Perceived Reusability Estimation based on Analysis of Software Repositories},
booktitle={Proceedings of the 2018 Workshop on Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE)},
year={2018},
month={03},
date={2018-03-20},
url={https://issel.ee.auth.gr/wp-content/uploads/2019/08/maLTeSQuE.pdf},
publisher's url={https://www.researchgate.net/publication/324106989_User-Perceived_Reusability_Estimation_based_on_Analysis_of_Software_Repositories},
abstract={The popularity of open-source software repositories has led to a new reuse paradigm, where online resources can be thoroughly analyzed to identify reusable software components. Obviously, assessing the quality and specifically the reusability potential of source code residing in open software repositories poses a major challenge for the research community. Although several systems have been designed towards this direction, most of them do not focus on reusability. In this paper, we define and formulate a reusability score by employing information from GitHub stars and forks, which indicate the extent to which software components are adopted/accepted by developers. Our methodology involves applying and assessing different state-of-the-practice machine learning algorithms, in order to construct models for reusability estimation at both class and package levels. Preliminary evaluation of our methodology indicates that our approach can successfully assess reusability, as perceived by developers.}
}
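
One plausible way to turn stars and forks into a bounded score, in the spirit of the paper, is log scaling with a weighted combination; the weights and normalisation constant below are hypothetical, not the paper's exact formulation.

import math

def reusability_score(stars, forks, w_stars=0.6, w_forks=0.4):
    # log-scaled popularity, normalised against a large reference repository
    s = math.log1p(stars) / math.log1p(100_000)
    f = math.log1p(forks) / math.log1p(100_000)
    return w_stars * min(s, 1.0) + w_forks * min(f, 1.0)

print(round(reusability_score(3500, 800), 3))   # score in [0, 1]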

2018

Inbooks

Valasia Dimaridou, Alexandros-Charalampos Kyprianidis, Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis
"Assessing the User-Perceived Quality of Source Code Components using Static Analysis Metrics"
Chapter 1, pp. 25, Springer, 2018 Jan

Nowadays, developers tend to adopt a component-based software engineering approach, reusing own implementations and/or resorting to third-party source code. This practice is in principle cost-effective, however it may also lead to low quality software products, if the components to be reused exhibit low quality. Thus, several approaches have been developed to measure the quality of software components. Most of them, however, rely on the aid of experts for defining target quality scores and deriving metric thresholds, leading to results that are context-dependent and subjective. In this work, we build a mechanism that employs static analysis metrics extracted from GitHub projects and defines a target quality score based on repositories’ stars and forks, which indicate their adoption/acceptance by developers. Upon removing outliers with a one-class classifier, we employ Principal Feature Analysis and examine the semantics among metrics to provide an analysis on five axes for source code components (classes or packages): complexity, coupling, size, degree of inheritance, and quality of documentation. Neural networks are thus applied to estimate the final quality score given metrics from these axes. Preliminary evaluation indicates that our approach effectively estimates software quality at both class and package levels.

@inbook{Dimaridou2018,
author={Valasia Dimaridou and Alexandros-Charalampos Kyprianidis and Michail Papamichail and Themistoklis Diamantopoulos and Andreas Symeonidis},
title={Assessing the User-Perceived Quality of Source Code Components using Static Analysis Metrics},
chapter={1},
pages={25},
publisher={Springer},
year={2018},
month={01},
date={2018-01-01},
url={https://issel.ee.auth.gr/wp-content/uploads/2019/08/ccis_book_chapter.pdf},
publisher's url={https://www.researchgate.net/publication/325627162_Assessing_the_User-Perceived_Quality_of_Source_Code_Components_Using_Static_Analysis_Metrics},
abstract={Nowadays, developers tend to adopt a component-based software engineering approach, reusing own implementations and/or resorting to third-party source code. This practice is in principle cost-effective, however it may also lead to low quality software products, if the components to be reused exhibit low quality. Thus, several approaches have been developed to measure the quality of software components. Most of them, however, rely on the aid of experts for defining target quality scores and deriving metric thresholds, leading to results that are context-dependent and subjective. In this work, we build a mechanism that employs static analysis metrics extracted from GitHub projects and defines a target quality score based on repositories’ stars and forks, which indicate their adoption/acceptance by developers. Upon removing outliers with a one-class classifier, we employ Principal Feature Analysis and examine the semantics among metrics to provide an analysis on five axes for source code components (classes or packages): complexity, coupling, size, degree of inheritance, and quality of documentation. Neural networks are thus applied to estimate the final quality score given metrics from these axes. Preliminary evaluation indicates that our approach effectively estimates software quality at both class and package levels.}
}

2017

Conference Papers

Valasia Dimaridou, Alexandros-Charalampos Kyprianidis, Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis
"Towards Modeling the User-perceived Quality of Source Code using Static Analysis Metrics"
Proceedings of the 12th International Conference on Software Technologies - Volume 1: ICSOFT, pp. 73-84, SciTePress, 2017 Jul

Nowadays, software has to be designed and developed as fast as possible, while maintaining quality standards. In this context, developers tend to adopt a component-based software engineering approach, reusing own implementations and/or resorting to third-party source code. This practice is in principle cost-effective, however it may lead to low quality software products. Thus, measuring the quality of software components is of vital importance. Several approaches that use code metrics rely on the aid of experts for defining target quality scores and deriving metric thresholds, leading to results that are highly context-dependent and subjective. In this work, we build a mechanism that employs static analysis metrics extracted from GitHub projects and defines a target quality score based on repositories’ stars and forks, which indicate their adoption/acceptance by the developers’ community. Upon removing outliers with a one-class classifier, we employ Principal Feature Analysis and examine the semantics among metrics to provide an analysis on five axes for a source code component: complexity, coupling, size, degree of inheritance, and quality of documentation. Neural networks are used to estimate the final quality score given metrics from all of these axes. Preliminary evaluation indicates that our approach can effectively estimate software quality.
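Principal Feature Analysis itself can be approximated as follows: represent each metric by its PCA loadings, cluster the loading vectors, and keep one representative metric per cluster. This is one common formulation of PFA, assumed here for illustration; the function name and all data are hypothetical.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def principal_feature_analysis(X, n_features):
    # Each row of the transposed loading matrix describes one original
    # metric across all principal components; cluster these rows and
    # keep the metric closest to each cluster centroid.
    loadings = PCA().fit(X).components_.T   # rows = metrics
    km = KMeans(n_clusters=n_features, n_init=10, random_state=0).fit(loadings)
    selected = []
    for c in range(n_features):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(loadings[members] - km.cluster_centers_[c], axis=1)
        selected.append(int(members[np.argmin(dists)]))
    return sorted(selected)

rng = np.random.default_rng(2)
X = rng.random((300, 24))  # 24 candidate static analysis metrics
print(principal_feature_analysis(X, n_features=5))  # e.g. one per axis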

@inproceedings{icsoft17,
author={Valasia Dimaridou and Alexandros-Charalampos Kyprianidis and Michail Papamichail and Themistoklis Diamantopoulos and Andreas Symeonidis},
title={Towards Modeling the User-perceived Quality of Source Code using Static Analysis Metrics},
booktitle={Proceedings of the 12th International Conference on Software Technologies - Volume 1: ICSOFT},
pages={73-84},
publisher={SciTePress},
year={2017},
month={07},
date={2017-07-26},
url={http://issel.ee.auth.gr/wp-content/uploads/2017/08/ICSOFT.pdf},
doi={10.5220/0006420000730084},
slideshare={https://www.slideshare.net/isselgroup/towards-modeling-the-userperceived-quality-of-source-code-using-static-analysis-metrics},
abstract={Nowadays, software has to be designed and developed as fast as possible, while maintaining quality standards. In this context, developers tend to adopt a component-based software engineering approach, reusing own implementations and/or resorting to third-party source code. This practice is in principle cost-effective, however it may lead to low quality software products. Thus, measuring the quality of software components is of vital importance. Several approaches that use code metrics rely on the aid of experts for defining target quality scores and deriving metric thresholds, leading to results that are highly context-dependent and subjective. In this work, we build a mechanism that employs static analysis metrics extracted from GitHub projects and defines a target quality score based on repositories’ stars and forks, which indicate their adoption/acceptance by the developers’ community. Upon removing outliers with a one-class classifier, we employ Principal Feature Analysis and examine the semantics among metrics to provide an analysis on five axes for a source code component: complexity, coupling, size, degree of inheritance, and quality of documentation. Neural networks are used to estimate the final quality score given metrics from all of these axes. Preliminary evaluation indicates that our approach can effectively estimate software quality.}
}

2016

Conference Papers

Michail Papamichail, Themistoklis Diamantopoulos and Andreas L. Symeonidis
"User-Perceived Source Code Quality Estimation based on Static Analysis Metrics"
2016 IEEE International Conference on Software Quality, Reliability and Security (QRS), Vienna, Austria, 2016 Aug

The popularity of open source software repositories and the highly adopted paradigm of software reuse have led to the development of several tools that aspire to assess the quality of source code. However, most software quality estimation tools, even the ones using adaptable models, depend on fixed metric thresholds for defining the ground truth. In this work we argue that the popularity of software components, as perceived by developers, can be considered as an indicator of software quality. We present a generic methodology that relates quality with source code metrics and estimates the quality of software components residing in popular GitHub repositories. Our methodology employs two models: a one-class classifier, used to rule out low quality code, and a neural network that computes a quality score for each software component. Preliminary evaluation indicates that our approach can be effective for identifying high quality software components in the context of reuse.
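The two-model pipeline can be sketched as a gate plus a scorer. OneClassSVM and MLPRegressor are assumed stand-ins, and returning a zero score for rejected components is an illustrative convention rather than the paper's exact rule.

import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X_train = rng.random((600, 12))  # metrics from popular GitHub repositories
y_train = rng.random(600)        # popularity-derived quality targets

gate = OneClassSVM(nu=0.1).fit(X_train)  # model 1: rules out low quality code
scorer = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                      random_state=3).fit(X_train, y_train)  # model 2

def quality(component_metrics):
    # Reject components the one-class classifier considers outliers;
    # otherwise return the neural network's quality estimate.
    x = np.atleast_2d(component_metrics)
    if gate.predict(x)[0] == -1:
        return 0.0
    return float(scorer.predict(x)[0])

print(quality(X_train[0]))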

@inproceedings{2016PapamichailIEEE,
author={Michail Papamichail and Themistoklis Diamantopoulos and Andreas L. Symeonidis},
title={User-Perceived Source Code Quality Estimation based on Static Analysis Metrics},
booktitle={2016 IEEE International Conference on Software Quality, Reliability and Security (QRS)},
address={Vienna, Austria},
year={2016},
month={08},
date={2016-08-03},
url={http://issel.ee.auth.gr/wp-content/uploads/2017/01/User-Perceived-Source-Code-Quality-Estimation-based-on-Static-Analysis-Metrics.pdf},
slideshare={http://www.slideshare.net/isselgroup/userperceived-source-code-quality-estimation-based-on-static-analysis-metrics},
abstract={The popularity of open source software repositories and the highly adopted paradigm of software reuse have led to the development of several tools that aspire to assess the quality of source code. However, most software quality estimation tools, even the ones using adaptable models, depend on fixed metric thresholds for defining the ground truth. In this work we argue that the popularity of software components, as perceived by developers, can be considered as an indicator of software quality. We present a generic methodology that relates quality with source code metrics and estimates the quality of software components residing in popular GitHub repositories. Our methodology employs two models: a one-class classifier, used to rule out low quality code, and a neural network that computes a quality score for each software component. Preliminary evaluation indicates that our approach can be effective for identifying high quality software components in the context of reuse.}
}