Tel: +30 2310 99 6365
Fax: +30 2310 99 6398
Email: thdiaman (at) issel [dot] ee [dot] auth [dot] gr
For more information about me you can visit my personal website.
|11/2013 – today||Research Associate in Electrical and Computer Engineering Department Aristotle University of Thessaloniki, Greece, EU-funded Projects: S-CASE, SEAF
|10/2012 – 10/2013||Research Associate in Centre for Research and Technology Hellas – Information Technologies Institute, EU-funded Project: eCOMPASS|
|2013-2018||PhD in Electrical and Computer Engineering Department, Aristotle University of Thessaloniki, Greece PhD Thesis: “Mining Software Engineering Data for Software Reuse”|
|2011-2012||Master of Science in Computer Science in School of Informatics, University of Edinburgh, United Kingdom, MSc Thesis: “Ad hoc Team Formation: Using Machine Learning Techniques to Cooperate without Pre-Coordination”|
|2006-2011||Diploma of Electrical and Computer Engineering in Electrical and Computer Engineering Department, Aristotle University of Thessaloniki, Greece, Diploma Thesis: “Design and Analysis of Auction Algorithms applied to the Power TAC Competition”|
- Data Mining
- Software Engineering
- Greek: Native
- English: Proficient
- Spanish: Conversational
- French: Conversational
Michail D. Papamichail, Themistoklis Diamantopoulos and Andreas L. Symeonidis
"Software Reusability Dataset based on Static Analysis Metrics and Reuse Rate Information"
Data in Brief, 2019 Dec
The widely adopted component-based development paradigm considers the reuse of proper software components as a primary criterion for successful software development. As a result, various research efforts are directed towards evaluating the extent to which a software component is reusable. Prior efforts follow expert-based approaches, however the continuously increasing open-source software initiative allows the introduction of data-driven alternatives. In this context we have generated a dataset that harnesses information residing in online code hosting facilities and introduces the actual reuse rate of software components as a measure of their reusability. To do so, we have analyzed the most popular projects included in the maven registry and have computed a large number of static analysis metrics at both class and package levels using SourceMeter tool  that quantify six major source code properties: complexity, cohesion, coupling, inheritance, documentation and size. For these projects we additionally computed their reuse rate using our self-developed code search engine, AGORA . The generated dataset contains analysis information regarding more than 24,000 classes and 2,000 packages, and can, thus, be used as the information basis towards the design and development of data-driven reusability evaluation methodologies. The dataset is related to the research article entitled "Measuring the Reusability of Software Components using Static Analysis Metrics and Reuse Rate Information
Michail D. Papamichail , Themistoklis Diamantopoulos and Andreas L. Symeonidis
"Measuring the Reusability of Software Components using Static Analysis Metrics and Reuse Rate Information"
Journal of Systems and Software, pp. 110423, 2019 Sep
Nowadays, the continuously evolving open-source community and the increasing demands of end users are forming a new software development paradigm; developers rely more on reusing components from online sources to minimize the time and cost of software development. An important challenge in this context is to evaluate the degree to which a software component is suitable for reuse, i.e. its reusability. Contemporary approaches assess reusability using static analysis metrics by relying on the help of experts, who usually set metric thresholds or provide ground truth values so that estimation models are built. However, even when expert help is available, it may still be subjective or case-specific. In this work, we refrain from expert-based solutions and employ the actual reuse rate of source code components as ground truth for building a reusability estimation model. We initially build a benchmark dataset, harnessing the power of online repositories to determine the number of reuse occurrences for each component in the dataset. Subsequently, we build a model based on static analysis metrics to assess reusability from five different properties: complexity, cohesion, coupling, inheritance, documentation and size. The evaluation of our methodology indicates that our system can effectively assess reusability as perceived by developers.
Kyriakos C. Chatzidimitriou, Michail Papamichail, Themistoklis Diamantopoulos, Michail Tsapanos and Andreas L. Symeonidis
"npm-miner: An Infrastructure for Measuring the Quality of the npm Registry"
MSR ’18: 15th International Conference on Mining Software Repositories, pp. 4, ACM, Gothenburg, Sweden, 2018 May
Themistoklis Diamantopoulos, Georgios Karagiannopoulos and Andreas Symeonidis
"CodeCatch: Extracting Source Code Snippets from Online Sources"
IEEE/ACM 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE), pp. 21-27, https://dl.acm.org/ft_gateway.cfm?id=3194107&ftid=1982571&dwn=1&CFID=87644405&CFTOKEN=833260e7cb501a7d-48967D35-AFC5-4678-82812B13D64D3DD3, 2018 May
Michail Papamichail, Themistoklis Diamantopoulos, Ilias Chrysovergis, Philippos Samlidis and Andreas Symeonidis
Proceedings of the 2018 Workshop on Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE), https://www.researchgate.net/publication/324106989_User-Perceived_Reusability_Estimation_based_on_Analysis_of_Software_Repositories, 2018 Mar
The popularity of open-source software repositories has led to a new reuse paradigm, where online resources can be thoroughly analyzed to identify reusable software components. Obviously, assessing the quality and specifically the reusability potential of source code residing in open software repositories poses a major challenge for the research community. Although several systems have been designed towards this direction, most of them do not focus on reusability. In this paper, we define and formulate a reusability score by employing information from GitHub stars and forks, which indicate the extent to which software components are adopted/accepted by developers. Our methodology involves applying and assessing different state-of-the-practice machine learning algorithms, in order to construct models for reusability estimation at both class and package levels. Preliminary evaluation of our methodology indicates that our approach can successfully assess reusability, as perceived by developers.
Valasia Dimaridou, Alexandros-Charalampos Kyprianidis, Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis
Charpter:1, pp. 25, Springer, 2018 Jan
Nowadays, developers tend to adopt a component-based software engineering approach, reusing own implementations and/or resorting to third-party source code. This practice is in principle cost-effective, however it may also lead to low quality software products, if the components to be reused exhibit low quality. Thus, several approaches have been developed to measure the quality of software components. Most of them, however, rely on the aid of experts for defining target quality scores and deriving metric thresholds, leading to results that are context-dependent and subjective. In this work, we build a mechanism that employs static analysis metrics extracted from GitHub projects and defines a target quality score based on repositories’ stars and forks, which indicate their adoption/acceptance by developers. Upon removing outliers with a one-class classifier, we employ Principal Feature Analysis and examine the semantics among metrics to provide an analysis on five axes for source code components (classes or packages): complexity, coupling, size, degree of inheritance, and quality of documentation. Neural networks are thus applied to estimate the final quality score given metrics from these axes. Preliminary evaluation indicates that our approach effectively estimates software quality at both class and package levels.
Themistoklis Diamantopoulos, Michael Roth, Andreas Symeonidis and Ewan Klein
"Software requirements as an application domain for natural language processing"
Language Resources and Evaluation, pp. 1-30, 2017 Feb
Mapping functional requirements first to specifications and then to code is one of the most challenging tasks in software development. Since requirements are commonly written in natural language, they can be prone to ambiguity, incompleteness and inconsistency. Structured semantic representations allow requirements to be translated to formal models, which can be used to detect problems at an early stage of the development process through validation. Storing and querying such models can also facilitate software reuse. Several approaches constrain the input format of requirements to produce specifications, however they usually require considerable human effort in order to adopt domain-specific heuristics and/or controlled languages. We propose a mechanism that automates the mapping of requirements to formal representations using semantic role labeling. We describe the first publicly available dataset for this task, employ a hierarchical framework that allows requirements concepts to be annotated, and discuss how semantic role labeling can be adapted for parsing software requirements.
Themistoklis Diamantopoulos and Andreas Symeonidis
Enterprise Information Systems, pp. 1-22, 2017 Dec
Enhancing the requirements elicitation process has always been of added value to software engineers, since it expedites the software lifecycle and reduces errors in the conceptualization phase of software products. The challenge posed to the research community is to construct formal models that are capable of storing requirements from multimodal formats (text and UML diagrams) and promote easy requirements reuse, while at the same time being traceable to allow full control of the system design, as well as comprehensible to software engineers and end users. In this work, we present an approach that enhances requirements reuse while capturing the static (functional requirements, use case diagrams) and dynamic (activity diagrams) view of software projects. Our ontology-based approach allows for reasoning over the stored requirements, while the mining methodologies employed detect incomplete or missing software requirements, this way reducing the effort required for requirements elicitation at an early stage of the project lifecycle.