Completed diploma thesis, Pericles Mitkas



Konstantinos Koukoutegos
Aggregating Differential Gene Expression Analysis Workflows Using Containers and Machine Learning
The need for software that can be reproduced easily, without the considerable effort of installing a plethora of packages and their dependencies, has been crucial since the early days of Computer Science. Programmers and researchers who collaborate remotely often expend great effort to keep up with each other while carrying out specific tasks. At the same time, providing systematic knowledge produced by automated procedures is equally important. Researchers spend much of their valuable time executing procedures outside their area of interest in order to retrieve the necessary results, when they should instead focus on using their expertise to draw safe and accurate conclusions about the systems under investigation. The present diploma thesis focuses on developing methods that address these problems. Specifically: • Docker technology is used to execute major RNA Sequencing pipelines in an automated manner while returning useful results. • We attempt to develop algorithms that unify the results produced by different DNA analysis tools, and we use machine learning models as reliable predictors for the robust classification of gene Differential Expression. • Using the Shiny R library, we attempt to create a user-friendly interface that allows users to interact with the resulting data and to use the algorithms mentioned above.
Maria - Christina Maniou
Development of an Algorithm for Correlation of Chromosomal Locations with Functional Biological Processes Using Large-Scale Data Analysis
High Throughput Sequencing (HTS) technologies continue to improve and become more accessible every day, increasing the amount of information available in the field of Bioinformatics. Combining the biological knowledge stored in public databases with that of the examined dataset has proven to be an efficient way of drawing useful conclusions about the functional biological processes of the data. In this thesis, an algorithm is proposed for correlating chromosomal locations with functional biological processes, and it is implemented with emphasis on the functionality of TADs. This is achieved by enriching the dataset with features (GO Terms, KEGG Pathways, TFs) from external databases and by applying statistical methods to find the significant results. The algorithm implements three enrichment analysis scenarios. In this way, an aggregated analysis of the dataset is performed and, at the same time, the importance of TADs as functional regions in the human genome is investigated, as they contain terms of related biological processes. The algorithm was applied successfully to a real large-scale biological dataset. Regarding the structure of the thesis, basic principles of Bioinformatics, some biological terms and the statistical background are presented first, as they are useful for understanding the process that takes place. Then, the automated workflow that was implemented is explained step by step, and finally the output files created are presented. The designed algorithm, together with a sample input dataset and the expected output files, is available at the following link:
Melpomeni Seraki
Software Design and Development Aiming to Optimize Large-Scale Bio-Data Analysis Using k-mers
Advances in computer technology allow for the analysis of part of the problems that arise in the field of border biology. Due to the increase in computing power, and especially to advanced graphics technology, it is possible to display the configurations of the structure of the biological limits on the computer screen. Attempts are still being made to create algorithmic methods for the production of boundary structures based on sequential data. The large amount of data produced in molecular biology, and in particular in the field of genomic sequencing (DNA sequencing), is a major challenge for scientists working on algorithm design and analysis. In particular, it facilitates the search for solutions to problems such as gene recognition, the definition of the structure of encoded proteins, the discovery of the mechanisms by which proteins perform their biological function, and the acquisition of knowledge about the role of non-coding regions of DNA in the morphology and expression of genes. This thesis focuses on software design to optimize the parameters that describe the genetic material of each organism through k-mers analysis. The analysis of k-mers (substrings of length k of DNA sequence data) is a key component of many bioinformatics methods, including genome and transcript assembly, transgenomic sequencing and sequence error correction. For this reason, an algorithm is being developed that is capable of 'cutting' the genetic material into the k-mers characteristic of the sample for various values of k, and that can model this information as descriptive data for each sequence. The created models are then used in machine learning algorithms to evaluate each value of k and to select the best k. The algorithm being developed is a novel and promising approach both to the problem of separating organisms in transgenomic data and to the study of changes in k-mers distributions as the value of k changes over a range of values.
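The 'cutting' step described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names kmer_counts and kmer_profile are hypothetical, and the choice of relative frequencies as the descriptive data for each sequence is an assumption made only for illustration.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in a sequence."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    # A sequence of length n contains n - k + 1 overlapping k-mers
    # (none at all when the sequence is shorter than k).
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_profile(sequence: str, k: int) -> dict:
    """Turn raw k-mer counts into relative frequencies (assumed feature form)."""
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


if __name__ == "__main__":
    # Build one profile per candidate value of k for a toy sequence.
    for k in (2, 3):
        print(k, kmer_profile("ACGTACGT", k))
```

Profiles built this way for several values of k could then be passed to a classifier so that the candidate values of k can be compared, which is the evaluation step the abstract mentions.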


Sotirios - Filippos Tsarouxis
A Multi-metric Algorithm for Hierarchical Clustering of Same-length Protein Sequences
The identification of meaningful groups of proteins has always been a major area of interest for structural and functional genomics. Successful protein clustering can lead to significant insight, assisting both in tracing the evolutionary history of the respective molecules and in identifying potential functions and interactions of novel sequences. In the present diploma thesis, we propose a clustering algorithm for same-length sequences, which allows the construction of a subset hierarchy and facilitates the identification of the underlying patterns for any given subset. The algorithm was applied to a real-world dataset consisting of clonotypic immunoglobulin (IG) sequences from chronic lymphocytic leukemia (CLL) patients, showing promising results. First, we analyze the basic elements of bioinformatics and proteins that are necessary for understanding the problem. We then describe the metrics, the statistical measures and the structures used for the development of the algorithm. Afterwards, we present the proposed hierarchical clustering algorithm, the various visualization techniques and the application developed in order to produce an interactive and user-friendly tool. Our R Shiny application is publicly available online. In addition, we present the algorithm’s results and compare them with related algorithms. The thesis ends with the conclusions and the future work that can be done to improve the results.
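The abstract does not list the metrics the algorithm combines. As a minimal sketch of why the same-length constraint is convenient, the Python example below uses the Hamming distance, which is an assumption chosen for illustration and not necessarily one of the thesis metrics. The function names hamming and pairwise_distances and the toy sequences are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two same-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_distances(sequences: list) -> dict:
    """Map each index pair (i, j) with i < j to its Hamming distance."""
    return {
        (i, j): hamming(sequences[i], sequences[j])
        for i, j in combinations(range(len(sequences)), 2)
    }


if __name__ == "__main__":
    toy = ["CARDY", "CARGY", "CTRGW"]  # made-up same-length sequences
    print(pairwise_distances(toy))
```

Because all sequences share one length, positions can be compared directly without alignment, and a distance table like this one could serve as input to a hierarchical clustering step.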
Dionisis Theodoropoulos
Optimization of Time of Use Electricity Tariffs via applying Particle Filters
Vasiliki Strouthopoulou
Smart System for Data Analysis and Services for Solar Panels
Nowadays, the need to save energy and economic resources and to reduce pollutants has led to the development of Smart Grids. Smart Grids upgrade conventional networks with a range of smart technologies that aim to manage electricity better. Better management means sufficient rather than excess production, proper distribution, reliable yield, economical consumption and energy storage. Household consumption accounts for a significant share of a country's total consumption. Therefore, detailed information on the consumption of household loads is necessary to manage demand and improve service delivery for both Network Operators and consumers. A significant reduction in energy wastage can be achieved through Appliance Load Monitoring (ALM) and by informing consumers about it. Consumers can be informed about the consumption of each appliance and its effect on overall consumption. In this way they learn to manage consumption better, for their own benefit and for that of the Network. To this end, much research is being carried out on consumer awareness, information and education. Advisory systems can speed up and improve this process. The purpose of this diploma thesis is to construct such an advisory system. More specifically, the aim of the system is to provide consumers with advice and suggestions regarding the energy production of their installed photovoltaic system. The obstacle encountered by the implementation is the difficulty of calculating production, as the system receives a time series of the total household consumption, which is the combined result of production and consumption. The implementation proposes a method for estimating production and then calculates the net consumption. On these results, it applies a series of computational methods to draw useful conclusions about production, consumption, the installation’s performance and the external factors that affect its performance.
In the end, these conclusions are gathered and presented to consumers in an appropriate form, so that they can better understand the operation of the installation and use it more efficiently. The method has significant margins for extension and improvement.
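The relationship between the metered series, production and consumption can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the estimation method of the thesis: it assumes that each meter reading equals consumption minus production for the same interval, and that a production estimate is already available. The function name estimate_consumption is hypothetical.

```python
def estimate_consumption(meter_kwh: list, estimated_production_kwh: list) -> list:
    """Recover household consumption per interval from metered values.

    Assumed convention: meter = consumption - production, so
    consumption = meter + production. Negative results, which can only
    come from estimation error, are clamped to zero.
    """
    if len(meter_kwh) != len(estimated_production_kwh):
        raise ValueError("both series must cover the same intervals")
    return [
        max(0.0, meter + production)
        for meter, production in zip(meter_kwh, estimated_production_kwh)
    ]


if __name__ == "__main__":
    meter = [1.0, -0.5, 0.25]      # made-up readings, negative = export
    production = [0.5, 1.5, 0.0]   # made-up production estimates
    print(estimate_consumption(meter, production))
```

The quality of the recovered consumption depends entirely on the production estimate, which is the difficult step the abstract refers to.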
Ifigeneia Theodoridou
Event Detection in Large Scale Image Collections
This thesis deals with the problem of event detection in large scale image collections. The greatest challenge of this problem is the definition of a similarity metric that can identify image pairs referring to the same event. Given an effective similarity metric, it is possible to achieve good results on the overall problem by applying a state-of-the-art clustering algorithm. Therefore, this thesis focuses on defining a similarity metric suitable for the identification of image pairs referring to the same event, using only the text descriptions of the images in the collection. As a first step, a text is modeled as a vector that encodes its underlying semantics and syntactic structure. Then, the probability of two images belonging to the same event is estimated by leveraging the underlying vector representations. To obtain representations of textual documents, it is first necessary to derive vector representations of the words that make up the vocabulary of the documents. Those word vectors, also known as word embeddings, must then be fused into a single vector at the level of the document. In the context of this thesis, a state-of-the-art word embeddings model, termed word2vec, was used to obtain word embeddings. Word2vec learns a language model using a shallow neural network, and word embeddings are a by-product of that learning task. Next, different neural network architectures were used to fuse word embeddings. More precisely, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), as well as combinations of these models, were investigated. It is worth mentioning that the unified representation of texts is learned simultaneously with the same-event model. To evaluate the effectiveness of the proposed models, the accuracy of the same-event classification was considered.
A thorough evaluation was conducted on two datasets provided and used in the 2013 and 2014 editions of the Social Event Detection task of the MediaEval workshop. According to the results, the combination of CNN and RNN architectures on top of pre-trained word embeddings achieves remarkable results, although most of the proposed models produce comparable results.


Aggelos Kaltsikis
Parallel implementation of MCL using Cloud resources
Eva Kotta
Semi-automatically populating ontologies from software requirements
Vassilis Choutas
Deep Reinforcement Learning Agents with Adaptive Attention
Antonia Tsalla
Advanced Clustering Techniques for Segmentation of Low Voltage Consumers
In contemporary electricity markets, distribution and energy services have been liberalized, leading suppliers to operate in a highly competitive environment. To keep their existing customers’ satisfaction high and to reach new ones, suppliers must provide personalized energy services tailored to each customer or group of customers with similar needs. The ultimate goal is the smart design of energy tariffs through a process of creating consumer profiles (Load Profiling). Thus, it is very important to segment customers into groups with similar energy behavior. This can provide numerous benefits to both energy service providers (reduced consumption at peak times, improved power management) and consumers (understanding of their energy behavior, reduced costs, etc.). The subject of this diploma thesis is the analysis and assessment of two algorithms for residential consumer segmentation: the Gravitational Search Algorithm, a state-of-the-art algorithm used in similar fields and applications, and the Improved Gravitational Search Algorithm, a variation implemented in the context of this diploma thesis. The first set of experiments targeted the choice of the optimal parameters, the selection of the most appropriate distance metric and the optimal number of clusters for each algorithm, for the consumer segmentation problem at hand. The available data originated from three pilot case scenarios of various European Research Programs, and two different input vectors were used. Next, a set of comparison experiments with other commonly used clustering algorithms, such as K-Means, Hierarchical Clustering and Fuzzy C-Means, was carried out.
Based on significant assessment metrics and a comparison of the results, several meaningful conclusions were drawn on the coherence and cohesion of the resulting consumer groups, as well as on the efficiency of the proposed algorithms.
Dimitrios Sotirakis
Real-Time Energy Disaggregation via Advanced Pattern Recognition Techniques
In recent years, the significant increase in the end use of electrical appliances, especially in urban societies, has underlined the importance of controlling overall electrical energy consumption, both to facilitate financial control and to reinforce society’s ecological profile. Until now, small-scale consumers have been informed about their consumption through their electricity bills, which makes the information partial and late. For this purpose, the term "Real-Time Energy Data Disaggregation" is introduced, which is associated with high energy consuming devices, namely ovens, washing machines, etc. Through this process, consumers can be notified of their overall energy consumption, as well as of each appliance’s percentage share. In the context of this thesis, a set of advanced pattern recognition techniques is used to achieve real-time energy data disaggregation, which is well known as NIALM (Non-Intrusive Appliance Load Monitoring). To this end, two novel approaches are introduced (one containing a pre-training phase and one without), which are based on ANFIS (Adaptive Neuro-Fuzzy Inference System). The experimental procedure is then presented, where the results are assessed with the help of appropriate metrics. More specifically, the appliance end uses detected by the presented approaches are compared with the real data to show the appropriateness and correctness of the implemented system. These results highlight the ability to achieve the desired accuracy and validity, emphasizing the usefulness for the household consumer.


Georgios Zisopoulos
Identification of event-related messages in social media by leveraging Information Retrieval techniques
Michail Perdikidis
Knowledge Extraction through Graph Mining in Bioinformatics
Themistoklis Makedas
Development of a Multi-Agent, Trust and Reputation System for the Evaluation and Selection of Electricity Energy Supplier
Kotouza Maria
Segmentation of Low Voltage consumers using Clustering techniques


Mpourtzoudis Stefanos
A System for Automatic Event Detection on Social Media Data Leveraging Sentiment Analysis
Olga Vrousgou
Design and Implementation of a Grid Computing Framework For Enabling Large-Scale Comparative Genomics Processes
Kosmas Kyriakidis
Procedural level generation for Infinite Mario Bros using genetic algorithms
Konstantinos Raptis
Real-time Online Betting Analysis - The THMMY Case-Study


Eleutherios Xatzipetrou
Automatic News Extraction from Social Media
Kyriaki Kaza
Techniques for the estimation and modeling of organisms distance based on proteins comparison data


Elena Ntagka
Software Agent Optimization for the Energy Market using Particle Swarm Optimization Techniques
Alexandros Philotheou
Multi-label Classification with Learning Classifier Systems
Evangelos Skartados
Multi-label Classification with Learning Classifier Systems
Athanasios Salamanis
Named Entity Recognition and Disambiguation
Konstantinos Anastasiou
Autonomous Battle Tactics Generation for Real Time Strategy Games Using Neuroevolution
Athanasios Nakopoulos
Development of an Autonomous Poker Agent Software using Reinforcement Learning Techniques


Dimitrios Vitsios
Identifying Evolution of Organisms in Metabolic Pathways
Christos Chatzichristos
Correlation Platform of Phylogenetic Profiles with Metabolic Pathways, in Order to Extract Evolutionary Motives
George Matzoulas
Use of Hierarchical Architecture Structure to Develop an Intelligent Autonomous Agent Software for the Game PacMan
Dimitrios Kotsopoulos
Advanced Classification Methods for Pattern Detection and Prediction in Egg Production Data
Georgios Antoniou
Suggestions of Tweets Based on User’s Interests


Konstantinos Gounitsiotis
Development of an E-Voting System
Miltiadis Allamanis
Multi-label Classification with Learning Classifier Systems
Theodoros Markos
R.- An Engine for Connecting Relational Databases with Ontologies
Emmanouil Sxoinas
Data Collection from Multiple Social Media


Zinovia Alepidou
Tag Recommendation Algorithms for Collaborative Tagging Systems
Georgios Lazaridis
Implementation of the Strategy of a General Game Player
Ioannis Sarafis
Development of an Autonomous Poker Agent Software using Reinforcement Learning Techniques
Konstantina Gemenetzi
A System for the Collection and Analysis of Social Media Data
Michael Tsapanos
Zeroth Classifier Systems for Real Time Strategy Games