Michail Papamichail

papamichailPhD Candidate

Aristotle University of Thessaloniki
Department of Electrical and Computer Engineering
54124 Thessaloniki – GREECE

Tel: +30 2310 99 6365
Fax: +30 2310 99 6398
Email: mpapamic (at) issel [dot] ee [dot] auth [dot] gr
LinkedIn

Education

12/2015 – today PhD candidate Electrical and Computer Engineering Department Aristotle University of Thessaloniki, Greece PhD Thesis: “Application of artificaial intelligence and data mining techniques for software quality assessment”
09/2010 – 11/2015 Diploma of Electrical and Computer Engineering Electrical and Computer Engineering Department Aristotle University of Thessaloniki, Greece Diploma Thesis: “Design and development of a source code quality estimation system using static analysis metrics and machine learning techniques”.

Professional Experience

02/2016 – today Research Associate Electrical and Computer Engineering Department Aristotle University of Thessaloniki, Greece EU-funded Project: Mobile-Age (http://issel.ee.auth.gr/mobile-age/ )
06/2015 – 12/2015 Application/Technical Consultant in Veltio. (Oracle RPAS solutions consultant)
06/2014 – 09/2014 Paid internship position in the University of California, Irvine in the Secure Systems and Software Laboratory (SSL) under professor Michael Franz

Teaching Experience

10/2016 – today Teaching assistant for “Pattern Recognition”, Electrical and Computer Engineering Department, Aristotle University of Thessaloniki, Greece

Research interests

  • Source Code Quality
  • Software Engineering
  • Data Mining
  • Machine Learning

Languages

  • English: Proficient (Cambridge and Michigan Proficiency)
  • German: Conversational (Goethe-Zertifikats B1: Zertifikat Deutsch (ZD))

Memberships

  • Member of the Technical Chamber of Greece

Puplications

2019

Journal Articles

Michail Papamichail, Kyriakos Chatzidimitriou, Thomas Karanikiotis, Napoleon-Christos Oikonomou, Andreas Symeonidis and Sashi Saripalle
"BrainRun: A Behavioral Biometrics Dataset towards Continuous Implicit Authentication"
Data, 4, (2), 2019 May

The widespread use of smartphones has dictated a new paradigm, where mobile applications are the primary channel for dealing with day-to-day tasks. This paradigm is full of sensitive information, making security of utmost importance. To that end, and given the traditional authentication techniques (passwords and/or unlock patterns) which have become ineffective, several research efforts are targeted towards biometrics security, while more advanced techniques are considering continuous implicit authentication on the basis of behavioral biometrics. However, most studies in this direction are performed “in vitro” resulting in small-scale experimentation. In this context, and in an effort to create a solid information basis upon which continuous authentication models can be built, we employ the real-world application “BrainRun”, a brain-training game aiming at boosting cognitive skills of individuals. BrainRun embeds a gestures capturing tool, so that the different types of gestures that describe the swiping behavior of users are recorded and thus can be modeled. Upon releasing the application at both the “Google Play Store” and “Apple App Store”, we construct a dataset containing gestures and sensors data for more than 2000 different users and devices. The dataset is distributed under the CC0 license and can be found at the EU Zenodo repository.

@article{Papamichail2019,
author={Michail Papamichail and Kyriakos Chatzidimitriou and Thomas Karanikiotis and Napoleon-Christos Oikonomou and Andreas Symeonidis and Sashi Saripalle},
title={BrainRun: A Behavioral Biometrics Dataset towards Continuous Implicit Authentication},
journal={Data},
volume={4},
number={2},
year={2019},
month={05},
date={2019-05-03},
url={https://res.mdpi.com/data/data-04-00060/article_deploy/data-04-00060.pdf?filename=&attachment=1},
doi={http://10.3390/data4020060},
issn={2306-5729},
abstract={The widespread use of smartphones has dictated a new paradigm, where mobile applications are the primary channel for dealing with day-to-day tasks. This paradigm is full of sensitive information, making security of utmost importance. To that end, and given the traditional authentication techniques (passwords and/or unlock patterns) which have become ineffective, several research efforts are targeted towards biometrics security, while more advanced techniques are considering continuous implicit authentication on the basis of behavioral biometrics. However, most studies in this direction are performed “in vitro” resulting in small-scale experimentation. In this context, and in an effort to create a solid information basis upon which continuous authentication models can be built, we employ the real-world application “BrainRun”, a brain-training game aiming at boosting cognitive skills of individuals. BrainRun embeds a gestures capturing tool, so that the different types of gestures that describe the swiping behavior of users are recorded and thus can be modeled. Upon releasing the application at both the “Google Play Store” and “Apple App Store”, we construct a dataset containing gestures and sensors data for more than 2000 different users and devices. The dataset is distributed under the CC0 license and can be found at the EU Zenodo repository.}
}

Michail D. Papamichail, Themistoklis Diamantopoulos and Andreas L. Symeonidis
"Software Reusability Dataset based on Static Analysis Metrics and Reuse Rate Information"
Data in Brief, 2019 Dec

The widely adopted component-based development paradigm considers the reuse of proper software components as a primary criterion for successful software development. As a result, various research efforts are directed towards evaluating the extent to which a software component is reusable. Prior efforts follow expert-based approaches, however the continuously increasing open-source software initiative allows the introduction of data-driven alternatives. In this context we have generated a dataset that harnesses information residing in online code hosting facilities and introduces the actual reuse rate of software components as a measure of their reusability. To do so, we have analyzed the most popular projects included in the maven registry and have computed a large number of static analysis metrics at both class and package levels using SourceMeter tool [2] that quantify six major source code properties: complexity, cohesion, coupling, inheritance, documentation and size. For these projects we additionally computed their reuse rate using our self-developed code search engine, AGORA [5]. The generated dataset contains analysis information regarding more than 24,000 classes and 2,000 packages, and can, thus, be used as the information basis towards the design and development of data-driven reusability evaluation methodologies. The dataset is related to the research article entitled "Measuring the Reusability of Software Components using Static Analysis Metrics and Reuse Rate Information

@article{PAPAMICHAIL2019104687,
author={Michail D. Papamichail and Themistoklis Diamantopoulos and Andreas L. Symeonidis},
title={Software Reusability Dataset based on Static Analysis Metrics and Reuse Rate Information},
journal={Data in Brief},
year={2019},
month={12},
date={2019-12-31},
url={https://reader.elsevier.com/reader/sd/pii/S235234091931042X?token=9CDEB13940390201A35D26027D763CACB6EE4D49BFA9B920C4D32B348809F1F6A7DE309AA1737161C7E5BF1963BBD952},
doi={https://doi.org/10.1016/j.dib.2019.104687},
keywords={developer-perceived reusability;code reuse;static analysis metrics;Reusability assessment},
abstract={The widely adopted component-based development paradigm considers the reuse of proper software components as a primary criterion for successful software development. As a result, various research efforts are directed towards evaluating the extent to which a software component is reusable. Prior efforts follow expert-based approaches, however the continuously increasing open-source software initiative allows the introduction of data-driven alternatives. In this context we have generated a dataset that harnesses information residing in online code hosting facilities and introduces the actual reuse rate of software components as a measure of their reusability. To do so, we have analyzed the most popular projects included in the maven registry and have computed a large number of static analysis metrics at both class and package levels using SourceMeter tool [2] that quantify six major source code properties: complexity, cohesion, coupling, inheritance, documentation and size. For these projects we additionally computed their reuse rate using our self-developed code search engine, AGORA [5]. The generated dataset contains analysis information regarding more than 24,000 classes and 2,000 packages, and can, thus, be used as the information basis towards the design and development of data-driven reusability evaluation methodologies. The dataset is related to the research article entitled \"Measuring the Reusability of Software Components using Static Analysis Metrics and Reuse Rate Information}
}

Michail D. Papamichail , Themistoklis Diamantopoulos and Andreas L. Symeonidis
Journal of Systems and Software, pp. 110423, 2019 Sep

Nowadays, the continuously evolving open-source community and the increasing demands of end users are forming a new software development paradigm; developers rely more on reusing components from online sources to minimize the time and cost of software development. An important challenge in this context is to evaluate the degree to which a software component is suitable for reuse, i.e. its reusability. Contemporary approaches assess reusability using static analysis metrics by relying on the help of experts, who usually set metric thresholds or provide ground truth values so that estimation models are built. However, even when expert help is available, it may still be subjective or case-specific. In this work, we refrain from expert-based solutions and employ the actual reuse rate of source code components as ground truth for building a reusability estimation model. We initially build a benchmark dataset, harnessing the power of online repositories to determine the number of reuse occurrences for each component in the dataset. Subsequently, we build a model based on static analysis metrics to assess reusability from five different properties: complexity, cohesion, coupling, inheritance, documentation and size. The evaluation of our methodology indicates that our system can effectively assess reusability as perceived by developers.

@article{PAPAMICHAIL2019110423,
author={Michail D. Papamichail and Themistoklis Diamantopoulos and Andreas L. Symeonidis},
title={Measuring the Reusability of Software Components using Static Analysis Metrics and Reuse Rate Information},
journal={Journal of Systems and Software},
pages={110423},
year={2019},
month={09},
date={2019-09-17},
url={https://issel.ee.auth.gr/wp-content/uploads/2019/09/2019mpapamicJSS.pdf},
doi={https://doi.org/10.1016/j.jss.2019.110423},
issn={0164-1212},
publisher's url={https://www.sciencedirect.com/science/article/pii/S0164121219301979},
keywords={developer-perceived reusability;code reuse;static analysis metrics;reusability estimation},
abstract={Nowadays, the continuously evolving open-source community and the increasing demands of end users are forming a new software development paradigm; developers rely more on reusing components from online sources to minimize the time and cost of software development. An important challenge in this context is to evaluate the degree to which a software component is suitable for reuse, i.e. its reusability. Contemporary approaches assess reusability using static analysis metrics by relying on the help of experts, who usually set metric thresholds or provide ground truth values so that estimation models are built. However, even when expert help is available, it may still be subjective or case-specific. In this work, we refrain from expert-based solutions and employ the actual reuse rate of source code components as ground truth for building a reusability estimation model. We initially build a benchmark dataset, harnessing the power of online repositories to determine the number of reuse occurrences for each component in the dataset. Subsequently, we build a model based on static analysis metrics to assess reusability from five different properties: complexity, cohesion, coupling, inheritance, documentation and size. The evaluation of our methodology indicates that our system can effectively assess reusability as perceived by developers.}
}

2019

Inproceedings Papers

Kyriakos C Chatzidimitriou, Michail D Papamichail, Napoleon-Christos I Oikonomou, Dimitrios Lampoudis and Andreas L Symeonidis
"Cenote: A Big Data Management and Analytics Infrastructure for the Web of Things"
IEEE/WIC/ACM International Conference on Web Intelligence, pp. 282-285, ACM, 2019 Oct

In the era of Big Data, Cloud Computing and Internet of Things, most of the existing, integrated solutions that attempt to solve their challenges are either proprietary, limit functionality to a predefined set of requirements, or hide the way data are stored and accessed. In this work we propose Cenote, an open source Big Data management and analytics infrastructure for the Web of Things that overcomes the above limitations. Cenote is built on component-based software engineering principles and provides an all-inclusive solution based on components that work well individually.

@inproceedings{Chatzidimitriou:2019:CBD:3350546.3352531,
author={Kyriakos C Chatzidimitriou and Michail D Papamichail and Napoleon-Christos I Oikonomou and Dimitrios Lampoudis and Andreas L Symeonidis},
title={Cenote: A Big Data Management and Analytics Infrastructure for the Web of Things},
booktitle={IEEE/WIC/ACM International Conference on Web Intelligence},
pages={282-285},
publisher={ACM},
year={2019},
month={10},
date={2019-10-17},
url={http://doi.acm.org/10.1145/3350546.3352531},
doi={http://10.1145/3350546.3352531},
keywords={Internet of Things;analytics;apache kafka;apache storm;cockroachdb;infrastructure;restful api;web of things},
abstract={In the era of Big Data, Cloud Computing and Internet of Things, most of the existing, integrated solutions that attempt to solve their challenges are either proprietary, limit functionality to a predefined set of requirements, or hide the way data are stored and accessed. In this work we propose Cenote, an open source Big Data management and analytics infrastructure for the Web of Things that overcomes the above limitations. Cenote is built on component-based software engineering principles and provides an all-inclusive solution based on components that work well individually.}
}

Michail D. Papamichail, Themistoklis Diamantopoulos, Vasileios Matsoukas, Christos Athanasiadis and Andreas L. Symeonidis
"Towards Extracting the Role and Behavior of Contributors in Open-source Projects"
Proceedings of the 14th International Conference on Software Technologies - Volume 1: ICSOFT, pp. 536-543, SciTePress, 2019 Jul

Lately, the popular open source paradigm and the adoption of agile methodologies have changed the way soft-ware is developed. Effective collaboration within software teams has become crucial for building successful products. In this context, harnessing the data available in online code hosting facilities can help towards understanding how teams work and optimizing the development process. Although there are several approaches that mine contributions’ data, they usually view contributors as a uniform body of engineers, and focus mainlyon the aspect of productivity while neglecting the quality of the work performed. In this work, we design a methodology for identifying engineer roles in development teams and determine the behaviors that prevail for each role. Using a dataset of GitHub projects, we perform clustering against the DevOps axis, thus identifying three roles: developers that are mainly preoccupied with code commits, operations engineers that focus on task assignment and acceptance testing, and the lately popular role of DevOps engineers that are a mix of both.Our analysis further extracts behavioral patterns for each role, this way assisting team leaders in knowing their team and effectively directing responsibilities to achieve optimal workload balancing and task allocati

@inproceedings{icsoft19devops,
author={Michail D. Papamichail and Themistoklis Diamantopoulos and Vasileios Matsoukas and Christos Athanasiadis and Andreas L. Symeonidis},
title={Towards Extracting the Role and Behavior of Contributors in Open-source Projects},
booktitle={Proceedings of the 14th International Conference on Software Technologies - Volume 1: ICSOFT},
pages={536-543},
publisher={SciTePress},
organization={INSTICC},
year={2019},
month={07},
date={2019-07-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2019/08/ICSOFT_DevOps.pdf},
doi={http://10.5220/0007966505360543},
isbn={978-989-758-379-7},
abstract={Lately, the popular open source paradigm and the adoption of agile methodologies have changed the way soft-ware is developed. Effective collaboration within software teams has become crucial for building successful products. In this context, harnessing the data available in online code hosting facilities can help towards understanding how teams work and optimizing the development process. Although there are several approaches that mine contributions’ data, they usually view contributors as a uniform body of engineers, and focus mainlyon the aspect of productivity while neglecting the quality of the work performed. In this work, we design a methodology for identifying engineer roles in development teams and determine the behaviors that prevail for each role. Using a dataset of GitHub projects, we perform clustering against the DevOps axis, thus identifying three roles: developers that are mainly preoccupied with code commits, operations engineers that focus on task assignment and acceptance testing, and the lately popular role of DevOps engineers that are a mix of both.Our analysis further extracts behavioral patterns for each role, this way assisting team leaders in knowing their team and effectively directing responsibilities to achieve optimal workload balancing and task allocati}
}

Kyriakos C. Chatzidimitriou, Michail D. Papamichail, Themistoklis Diamantopoulos, Napoleon-Christos Oikonomou and Andreas L. Symeonidis
"npm Packages as Ingredients: A Recipe-based Approach - Volume 1: ICSOFT"
Proceedings of the 14th International Conference on Software Technologies, pp. 544-551, SciTePress, 2019 Jul

The sharing and growth of open source software packages in the npm JavaScript (JS) ecosystem has beenexponential, not only in numbers but also in terms of interconnectivity, to the extend that often the size of de-pendencies has become more than the size of the written code. This reuse-oriented paradigm, often attributedto the lack of a standard library in node and/or in the micropackaging culture of the ecosystem, yields interest-ing insights on the way developers build their packages. In this work we view the dependency network of thenpm ecosystem from a “culinary” perspective. We assume that dependencies are the ingredients in a recipe,which corresponds to the produced software package. We employ network analysis and information retrievaltechniques in order to capture the dependencies that tend to co-occur in the development of npm packages andidentify the communities that have been evolved as the main drivers for npm’s exponential grow.

@inproceedings{icsoft19npm,
author={Kyriakos C. Chatzidimitriou and Michail D. Papamichail and Themistoklis Diamantopoulos and Napoleon-Christos Oikonomou and Andreas L. Symeonidis},
title={npm Packages as Ingredients: A Recipe-based Approach - Volume 1: ICSOFT},
booktitle={Proceedings of the 14th International Conference on Software Technologies},
pages={544-551},
publisher={SciTePress},
organization={INSTICC},
year={2019},
month={07},
date={2019-07-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2019/08/ICSOFT_NPMRecipes.pdf},
doi={http://10.5220/0007966805440551},
isbn={978-989-758-379-7},
abstract={The sharing and growth of open source software packages in the npm JavaScript (JS) ecosystem has beenexponential, not only in numbers but also in terms of interconnectivity, to the extend that often the size of de-pendencies has become more than the size of the written code. This reuse-oriented paradigm, often attributedto the lack of a standard library in node and/or in the micropackaging culture of the ecosystem, yields interest-ing insights on the way developers build their packages. In this work we view the dependency network of thenpm ecosystem from a “culinary” perspective. We assume that dependencies are the ingredients in a recipe,which corresponds to the produced software package. We employ network analysis and information retrievaltechniques in order to capture the dependencies that tend to co-occur in the development of npm packages andidentify the communities that have been evolved as the main drivers for npm’s exponential grow.}
}

2018

Inproceedings Papers

Kyriakos C. Chatzidimitriou, Michail Papamichail, Themistoklis Diamantopoulos, Michail Tsapanos and Andreas L. Symeonidis
"npm-miner: An Infrastructure for Measuring the Quality of the npm Registry"
MSR ’18: 15th International Conference on Mining Software Repositories, pp. 4, ACM, Gothenburg, Sweden, 2018 May

As the popularity of the JavaScript language is constantly increasing, one of the most important challenges today is to assess the quality of JavaScript packages. Developers often employ tools for code linting and for the extraction of static analysis metrics in order to assess and/or improve their code. In this context, we have developed npn-miner, a platform that crawls the npm registry and analyzes the packages using static analysis tools in order to extract detailed quality metrics as well as high-level quality attributes, such as maintainability and security. Our infrastructure includes an index that is accessible through a web interface, while we have also constructed a dataset with the results of a detailed analysis for 2000 popular npm packages.

@inproceedings{Chatzidimitriou2018MSR,
author={Kyriakos C. Chatzidimitriou and Michail Papamichail and Themistoklis Diamantopoulos and Michail Tsapanos and Andreas L. Symeonidis},
title={npm-miner: An Infrastructure for Measuring the Quality of the npm Registry},
booktitle={MSR ’18: 15th International Conference on Mining Software Repositories},
pages={4},
publisher={ACM},
address={Gothenburg, Sweden},
year={2018},
month={05},
date={2018-05-28},
url={http://issel.ee.auth.gr/wp-content/uploads/2018/03/msr2018.pdf},
doi={https:%20//doi.org/10.1145/3196398.3196465},
abstract={As the popularity of the JavaScript language is constantly increasing, one of the most important challenges today is to assess the quality of JavaScript packages. Developers often employ tools for code linting and for the extraction of static analysis metrics in order to assess and/or improve their code. In this context, we have developed npn-miner, a platform that crawls the npm registry and analyzes the packages using static analysis tools in order to extract detailed quality metrics as well as high-level quality attributes, such as maintainability and security. Our infrastructure includes an index that is accessible through a web interface, while we have also constructed a dataset with the results of a detailed analysis for 2000 popular npm packages.}
}

Michail Papamichail, Themistoklis Diamantopoulos, Ilias Chrysovergis, Philippos Samlidis and Andreas Symeonidis
Proceedings of the 2018 Workshop on Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE), https://www.researchgate.net/publication/324106989_User-Perceived_Reusability_Estimation_based_on_Analysis_of_Software_Repositories, 2018 Mar

The popularity of open-source software repositories has led to a new reuse paradigm, where online resources can be thoroughly analyzed to identify reusable software components. Obviously, assessing the quality and specifically the reusability potential of source code residing in open software repositories poses a major challenge for the research community. Although several systems have been designed towards this direction, most of them do not focus on reusability. In this paper, we define and formulate a reusability score by employing information from GitHub stars and forks, which indicate the extent to which software components are adopted/accepted by developers. Our methodology involves applying and assessing different state-of-the-practice machine learning algorithms, in order to construct models for reusability estimation at both class and package levels. Preliminary evaluation of our methodology indicates that our approach can successfully assess reusability, as perceived by developers.

@inproceedings{Papamichail2018MaLTeSQuE,
author={Michail Papamichail and Themistoklis Diamantopoulos and Ilias Chrysovergis and Philippos Samlidis and Andreas Symeonidis},
title={User-Perceived Reusability Estimation based on Analysis of Software Repositories},
booktitle={Proceedings of the 2018 Workshop on Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE)},
publisher={https://www.researchgate.net/publication/324106989_User-Perceived_Reusability_Estimation_based_on_Analysis_of_Software_Repositories},
year={2018},
month={03},
date={2018-03-20},
url={https://issel.ee.auth.gr/wp-content/uploads/2019/08/maLTeSQuE.pdf},
publisher's url={https://www.researchgate.net/publication/324106989_User-Perceived_Reusability_Estimation_based_on_Analysis_of_Software_Repositories},
abstract={The popularity of open-source software repositories has led to a new reuse paradigm, where online resources can be thoroughly analyzed to identify reusable software components. Obviously, assessing the quality and specifically the reusability potential of source code residing in open software repositories poses a major challenge for the research community. Although several systems have been designed towards this direction, most of them do not focus on reusability. In this paper, we define and formulate a reusability score by employing information from GitHub stars and forks, which indicate the extent to which software components are adopted/accepted by developers. Our methodology involves applying and assessing different state-of-the-practice machine learning algorithms, in order to construct models for reusability estimation at both class and package levels. Preliminary evaluation of our methodology indicates that our approach can successfully assess reusability, as perceived by developers.}
}

2018

Inbooks

Valasia Dimaridou, Alexandros-Charalampos Kyprianidis, Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis
Charpter:1, pp. 25, Springer, 2018 Jan

Nowadays, developers tend to adopt a component-based software engineering approach, reusing own implementations and/or resorting to third-party source code. This practice is in principle cost-effective, however it may also lead to low quality software products, if the components to be reused exhibit low quality. Thus, several approaches have been developed to measure the quality of software components. Most of them, however, rely on the aid of experts for defining target quality scores and deriving metric thresholds, leading to results that are context-dependent and subjective. In this work, we build a mechanism that employs static analysis metrics extracted from GitHub projects and defines a target quality score based on repositories’ stars and forks, which indicate their adoption/acceptance by developers. Upon removing outliers with a one-class classifier, we employ Principal Feature Analysis and examine the semantics among metrics to provide an analysis on five axes for source code components (classes or packages): complexity, coupling, size, degree of inheritance, and quality of documentation. Neural networks are thus applied to estimate the final quality score given metrics from these axes. Preliminary evaluation indicates that our approach effectively estimates software quality at both class and package levels.

@inbook{Dimaridou2018,
author={Valasia Dimaridou and Alexandros-Charalampos Kyprianidis and Michail Papamichail and Themistoklis Diamantopoulos and Andreas Symeonidis},
title={Assessing the User-Perceived Quality of Source Code Components using Static Analysis Metrics},
chapter={1},
pages={25},
publisher={Springer},
year={2018},
month={01},
date={2018-01-01},
url={https://issel.ee.auth.gr/wp-content/uploads/2019/08/ccis_book_chapter.pdf},
publisher's url={https://www.researchgate.net/publication/325627162_Assessing_the_User-Perceived_Quality_of_Source_Code_Components_Using_Static_Analysis_Metrics},
abstract={Nowadays, developers tend to adopt a component-based software engineering approach, reusing own implementations and/or resorting to third-party source code. This practice is in principle cost-effective, however it may also lead to low quality software products, if the components to be reused exhibit low quality. Thus, several approaches have been developed to measure the quality of software components. Most of them, however, rely on the aid of experts for defining target quality scores and deriving metric thresholds, leading to results that are context-dependent and subjective. In this work, we build a mechanism that employs static analysis metrics extracted from GitHub projects and defines a target quality score based on repositories’ stars and forks, which indicate their adoption/acceptance by developers. Upon removing outliers with a one-class classifier, we employ Principal Feature Analysis and examine the semantics among metrics to provide an analysis on five axes for source code components (classes or packages): complexity, coupling, size, degree of inheritance, and quality of documentation. Neural networks are thus applied to estimate the final quality score given metrics from these axes. Preliminary evaluation indicates that our approach effectively estimates software quality at both class and package levels.}
}

2017

Inproceedings Papers

Valasia Dimaridou, Alexandros-Charalampos Kyprianidis, Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis
"Towards Modeling the User-perceived Quality of Source Code using Static Analysis Metrics"
Proceedings of the 12th International Conference on Software Technologies - Volume 1: ICSOFT, pp. 73-84, SciTePress, 2017 Jul

Nowadays, software has to be designed and developed as fast as possible, while maintaining quality standards. In this context, developers tend to adopt a component-based software engineering approach, reusing own implementations and/or resorting to third-party source code. This practice is in principle cost-effective, however it may lead to low quality software products. Thus, measuring the quality of software components is of vital importance. Several approaches that use code metrics rely on the aid of experts for defining target quality scores and deriving metric thresholds, leading to results that are highly context-dependent and subjective. In this work, we build a mechanism that employs static analysis metrics extracted from GitHub projects and defines a target quality score based on repositories’ stars and forks, which indicate their adoption/acceptance by the developers’ community. Upon removing outliers with a one-class classifier, we employ Principal Feature Analysis and exam ine the semantics among metrics to provide an analysis on five axes for a source code component: complexity, coupling, size, degree of inheritance, and quality of documentation. Neural networks are used to estimate the final quality score given metrics from all of these axes. Preliminary evaluation indicates that our approach can effectively estimate software quality.

@inproceedings{icsoft17,
author={Valasia Dimaridou and Alexandros-Charalampos Kyprianidis and Michail Papamichail and Themistoklis Diamantopoulos and Andreas Symeonidis},
title={Towards Modeling the User-perceived Quality of Source Code using Static Analysis Metrics},
booktitle={Proceedings of the 12th International Conference on Software Technologies - Volume 1: ICSOFT},
pages={73-84},
publisher={SciTePress},
year={2017},
month={07},
date={2017-07-26},
url={http://issel.ee.auth.gr/wp-content/uploads/2017/08/ICSOFT.pdf},
doi={http://10.5220/0006420000730084},
slideshare={https://www.slideshare.net/isselgroup/towards-modeling-the-userperceived-quality-of-source-code-using-static-analysis-metrics},
abstract={Nowadays, software has to be designed and developed as fast as possible, while maintaining quality standards. In this context, developers tend to adopt a component-based software engineering approach, reusing own implementations and/or resorting to third-party source code. This practice is in principle cost-effective, however it may lead to low quality software products. Thus, measuring the quality of software components is of vital importance. Several approaches that use code metrics rely on the aid of experts for defining target quality scores and deriving metric thresholds, leading to results that are highly context-dependent and subjective. In this work, we build a mechanism that employs static analysis metrics extracted from GitHub projects and defines a target quality score based on repositories’ stars and forks, which indicate their adoption/acceptance by the developers’ community. Upon removing outliers with a one-class classifier, we employ Principal Feature Analysis and exam ine the semantics among metrics to provide an analysis on five axes for a source code component: complexity, coupling, size, degree of inheritance, and quality of documentation. Neural networks are used to estimate the final quality score given metrics from all of these axes. Preliminary evaluation indicates that our approach can effectively estimate software quality.}
}

2016

Inproceedings Papers

Michail Papamichail, Themistoklis Diamantopoulos and Andreas L. Symeonidis
"User-Perceived Source Code Quality Estimation based on Static Analysis Metrics"
2016 IEEE International Conference on Software Quality, Reliability and Security (QRS), Vienna, Austria, 2016 Aug

The popularity of open source software repositories and the highly adopted paradigm of software reuse have led to the development of several tools that aspire to assess the quality of source code. However, most software quality estimation tools, even the ones using adaptable models, depend on fixed metric thresholds for defining the ground truth. In this work we argue that the popularity of software components, as perceived by developers, can be considered as an indicator of software quality. We present a generic methodology that relates quality with source code metrics and estimates the quality of software components residing in popular GitHub repositories. Our methodology employs two models: a one-class classifier, used to rule out low quality code, and a neural network, that computes a quality score for each software component. Preliminary evaluation indicates that our approach can be effective for identifying high quality software components in the context of reuse.

@inproceedings{2016PapamichailIEEE,
author={Michail Papamichail and Themistoklis Diamantopoulos and Andreas L. Symeonidis},
title={User-Perceived Source Code Quality Estimation based on Static Analysis Metrics},
booktitle={2016 IEEE International Conference on Software Quality, Reliability and Security (QRS)},
address={Vienna, Austria},
year={2016},
month={08},
date={2016-08-03},
url={http://issel.ee.auth.gr/wp-content/uploads/2017/01/User-Perceived-Source-Code-Quality-Estimation-based-on-Static-Analysis-Metrics.pdf},
slideshare={http://www.slideshare.net/isselgroup/userperceived-source-code-quality-estimation-based-on-static-analysis-metrics},
abstract={The popularity of open source software repositories and the highly adopted paradigm of software reuse have led to the development of several tools that aspire to assess the quality of source code. However, most software quality estimation tools, even the ones using adaptable models, depend on fixed metric thresholds for defining the ground truth. In this work we argue that the popularity of software components, as perceived by developers, can be considered as an indicator of software quality. We present a generic methodology that relates quality with source code metrics and estimates the quality of software components residing in popular GitHub repositories. Our methodology employs two models: a one-class classifier, used to rule out low quality code, and a neural network, that computes a quality score for each software component. Preliminary evaluation indicates that our approach can be effective for identifying high quality software components in the context of reuse.}
}