Task Importance Assessment based on Project Management Data

The contemporary software development process dictates the use of issue tracking systems. These systems allow the enrichment of issues with semantic characteristics that optimize the software development process. One of these characteristics is the importance of each issue, which in turn affects the priority of the issue with respect to the development sprints, as well as the expected duration for the completion of the issue itself. Since there are no clear instructions on how the level of importance is defined, it is left to the personal judgement of each developer. Consequently, issue reports with erroneous importance values arise frequently, which complicates the management of the software project under examination. Researchers have attempted to automate this assessment through the design and development of recommender systems. However, these systems specialize in predicting the importance of software bugs only, rather than the issues of a software project in general; they fail to filter the information used according to the time frame to which it belongs; and they do not consider the fact that the classes of importance are ordinal. The current diploma thesis extends previous work in order to tackle the aforementioned aspects not covered by other research. More specifically, it proposes a system capable of automating the process of assigning importance values to issue reports by utilizing the information available in issue tracking systems. For the implementation of this system, a dataset exported from the Jira platform was used. For each issue report, the attributes title, description, type, and assignee id are extracted. Subsequently, a multifactorial approach is followed, which entails the design and development of three distinct machine learning models, aggregated into a final model. In this context, four different types of prediction models were investigated. In particular, KNN and SVR models were used, along with two neural networks, which were trained for every project. The purpose of developing the different models was to find the optimal configuration by comparing their respective results. Results show that our system can successfully serve as a basis for defining the proper value for the importance of issues.
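As an illustration of the aggregation step described above, the following is a minimal sketch assuming toy features and scikit-learn base models; the stacking aggregator, feature shapes, and hyperparameters are assumptions for demonstration, not the thesis architecture (which also includes the neural networks and the ordinal handling).

```python
# Hedged sketch: aggregate the predictions of two base models (KNN, SVR)
# with a final model, in the spirit of the multifactorial approach above.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVR
from sklearn.linear_model import LogisticRegression

X_train, y_train = np.random.rand(100, 8), np.random.randint(1, 6, 100)  # toy data
X_test = np.random.rand(10, 8)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
svr = SVR().fit(X_train, y_train)  # regression view of the ordinal importance scale

# Stack each base model's prediction as a feature for the final aggregator
# (training-set predictions reused here only for brevity).
stacked_train = np.column_stack([knn.predict(X_train), svr.predict(X_train)])
aggregator = LogisticRegression(max_iter=1000).fit(stacked_train, y_train)

stacked_test = np.column_stack([knn.predict(X_test), svr.predict(X_test)])
print(aggregator.predict(stacked_test))  # final importance estimates
```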
Bug Fix Time Classification on Open Source Repositories

Nowadays, software development teams follow modern principles regarding the software development life cycle and use many tools, such as version control systems, bug tracking systems, etc., in order to improve their productivity. The popularity and intensive use of such tools and systems have generated a large amount of information regarding every stage of the software development process. By utilizing and analyzing these data, we can extract valuable information and build tools that contribute to the field of qualitative software development. New trends around software development processes aim at proper distribution of tasks within the team, flexibility in dynamic situations, and the development of a timetable that corresponds to reality. In large open source organizations, this can be achieved through the analysis of software development methods and the design of systems that automate relevant processes. This diploma thesis proposes an end-to-end system that contributes to the research on bug fix time prediction by applying information retrieval techniques. More precisely, the designed system collects and analyzes data from GitHub repositories and classifies software issues according to their predicted fix time. Our approach is multilevel, taking into consideration the title, description, assignee, and labels of a bug report. A subsystem is designed for each of these features. The subsystems analyze previous data and generate a score that represents the probability of membership of each issue in every class. Finally, classification is performed by a neural network that aggregates each subsystem's scores. Moreover, data processing techniques are used in order to cope with the particularities exhibited by the datasets of open source software repositories. The proposed system is trained and evaluated on a dataset that consists of 11,099 issues from 26 large Java repositories on GitHub. Experiments show that our system achieves satisfactory performance, especially in binary classification, where high evaluation metric scores are observed.
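As a hedged illustration of one such per-feature subsystem, the sketch below vectorizes issue titles with TF-IDF and emits per-class probability scores that a downstream aggregator could consume; the class binning, sample titles, and model choice are assumptions, not the thesis configuration.

```python
# Hedged sketch of a "title" subsystem: text in, per-class scores out.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

titles = ["NPE in parser on empty input", "Add dark theme", "Crash when saving file"]
fix_time_class = [0, 1, 0]  # e.g. 0 = fast fix, 1 = slow fix (illustrative binary split)

subsystem = make_pipeline(TfidfVectorizer(), LogisticRegression())
subsystem.fit(titles, fix_time_class)

# Per-class scores that a downstream neural network could aggregate with the
# scores of the description/assignee/labels subsystems.
print(subsystem.predict_proba(["Crash on startup"]))
```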
Applying Data Mining Techniques on Software Repositories to Extract Design and Evolution Patterns

Close collaboration between software developers is considered essential in order to build innovative software projects. For this reason, there are several online program-hosting platforms, which enable their users to watch each other's changes, recommendations and comments towards the improvement and evolution of code. These platforms also manage different versions of the software code, so that developers can revert to previous ones if desired. All the modifications performed at a given time by a member of the software development team are bundled in a commit, where the main reasons behind them are also recorded. Consequently, these series of changes include a lot of useful information about the way a software project evolves. Applying data mining techniques to public software repositories and the data discussed above could unveil common bug fixes, systematic edits, frequent types of changes in a project's architecture, and frequently-used design patterns, either known or unknown. An extensive bibliographic survey of this domain reveals that the majority of scientific efforts have focused on bug fixes and systematic edits, ignoring more coarse-grained (high-level) code evolution or design patterns. In this context, this dissertation tries to extract the relationships between the classes of an object-oriented program, while also seeking to monitor the way they evolve over time. To achieve these goals, this diploma thesis adapts a Relationship Extractor tool based on Abstract Syntax Tree (AST) analysis of some of the most popular software projects on the GitHub platform. After analyzing and processing those syntax trees, useful information is extracted concerning the operation, the abstraction level, and the inheritance of classes. This information is then modeled as graphs (with classes as nodes and the connections between them as edges). These steps are executed not only for the latest version of a project, but also for each commit, with a view to extracting the difference in relationships between the versions of a project before and after the specific commit. Finally, gSpan, a frequent-subgraph mining algorithm, is applied in order to detect code design and evolution patterns used by the software community worldwide.
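The snippet below is a minimal Python analogue of the class-graph construction step (the thesis uses its own Relationship Extractor on GitHub projects): it parses source code into an AST and records inheritance relationships as edges of a graph, the kind of representation that gSpan would later mine for frequent subgraphs. The toy source and edge labels are illustrative assumptions.

```python
# Hedged sketch: classes as nodes, inheritance as edges, extracted from an AST.
import ast
import networkx as nx

source = """
class Base: pass
class Child(Base): pass
class Other(Child): pass
"""

graph = nx.DiGraph()
for node in ast.walk(ast.parse(source)):
    if isinstance(node, ast.ClassDef):
        graph.add_node(node.name)
        for base in node.bases:  # inheritance edges: class -> parent class
            if isinstance(base, ast.Name):
                graph.add_edge(node.name, base.id, relation="inherits")

print(list(graph.edges(data=True)))  # input for frequent-subgraph mining
```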
Detection Of Abnormal User Behavior In Web Applications Using Sequence Classification Machine Learning Techniques

In recent years the internet and its applications have developed rapidly, and their use occupies an ever larger part of people's daily lives. Today, the internet is a basic and necessary means of communication, entertainment, information, shopping, and many other functions that are now performed through it. Unfortunately, alongside these features, illegal activities have also increased, such as defrauding other users, accessing confidential and secret information, promoting certain products, and even taking websites offline, since hackers exploit vulnerabilities in the security of web applications and systems. Cybersecurity focuses on the development of protection systems and methods that aim to detect and identify an impending cyber-attack, thus contributing drastically to protection against malicious actions. The field of Machine Learning, on the other hand, focuses on developing techniques that allow a computing system to "think" and "decide", rather than merely execute commands explicitly dictated to it by the programmer. Machine Learning is widely used in various domains, including cybersecurity, with which this dissertation deals. In the context of this dissertation, a system was modeled and developed that receives necessary and useful information about user behavior in an online e-commerce application and, after storing and processing the data in a specific way, feeds it into a sequence classification machine learning model that characterizes the user behavior as either benign or malicious.
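A minimal sketch of such a sequence classifier follows, assuming user sessions are already encoded as sequences of integer action IDs; the Keras architecture, layer sizes, and action vocabulary are illustrative assumptions rather than the system described above.

```python
# Hedged sketch: classify a fixed-length session of encoded user actions
# as benign (0) or malicious (1) with a recurrent sequence model.
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

num_actions, seq_len = 50, 20  # toy vocabulary of page/click events, window size
X = np.random.randint(0, num_actions, size=(200, seq_len))  # toy sessions
y = np.random.randint(0, 2, size=200)                       # toy benign/malicious labels

model = Sequential([
    Embedding(num_actions, 16),
    LSTM(32),                    # summarizes the whole action sequence
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=2, verbose=0)
print(model.predict(X[:1]))      # probability the session is malicious
```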
Design and development of a Machine Learning based attack detection system for web applications

The increasing use of web applications and the popularity of Software-as-a-Service have created room for major vulnerability issues in systems which until recently were "running" in restricted networks: information (sensitive or not) is now available on the internet. As a consequence, applying appropriate software security procedures is the only way to protect it. Security checks must be performed at many different layers, such as the network layer, the OS layer, and the application layer. In light of this, the objective of this diploma thesis is the design and development of a system that detects possible security attacks using machine learning algorithms. The goal is to use machine learning algorithms to distinguish "good" from "bad" behaviors at the application layer. The analysis will be dynamic (at runtime) and a decision mechanism will be developed.
Design and implementation of an Automation Mechanism for the configuration of robotic devices for the Gazebo simulator

In the age of rapid technological development, robotic systems are used throughout the spectrum of modern life, and the automation achieved through them yields larger and faster production at relatively lower costs. However, robots often behave inconsistently during testing, given the complexity of the systems and the large variability of the environment. Robotic simulations provide the solution to this problem, as they offer a low-cost, easily accessible virtual robot development environment. They are used to quickly evaluate the design of a robot, simulate virtual sensors, provide a reduced model for model predictive controllers and an architecture for real-world control of robots, and so on. Robotic simulations take place in special software: robotic simulators. A robotics simulator is used to create an application for a physical robot without depending on the actual machine, thus saving cost and time. In some cases, these applications can be transferred to the physical robot (or rebuilt) without modification. One of the most popular uses of robotic simulators is the 3D modeling and rendering of a robot and its environment. This type of robotics software features a virtual robot, which is capable of mimicking the motion of a real robot in a real situation. Some robotic simulators even employ physics engines for a more realistic motion output. There is a large number of robotics simulators, each serving different or similar purposes. However, while robotics simulators offer a wealth of benefits, the need to produce high-quality applications and software has become more pressing than ever. Increasing productivity, reducing errors (debugging), auditing, verifying, and maintaining software play a crucial role in the quality of the final product. Modern software is very complex, as it often consists of hundreds of lines of code, distributed across many different files, and depends on numerous libraries. Changing a single line of code can affect the functionality of the entire system and result in errors, which is very likely, since most software requires a large number of people to develop it. The solution to this quality problem comes from automation in software engineering. By utilizing it, we can produce software that is tested and ready to use, in less time and at reduced cost, thus achieving increased productivity and quality. It is as if, in a way, the complexity of the system is eliminated, as the degree of dependence on the human factor is reduced. One advantage of automation is that it also gives non-expert engineers the opportunity to determine the operation of a piece of software, omitting extra steps in its development that would otherwise be necessary. (continues in full text)
Continuous Implicit Authentication of smartphone users based on behavioral analysis

The increasing popularity of smartphones has raised serious safety concerns. This is due to the fact that these devices hold sensitive personal and often professional information, while existing authentication schemes have proven inefficient. Password patterns and PIN codes, in particular, can easily be acquired by attackers with shoulder-surfing techniques, while all widely-employed user authentication mechanisms, in general, offer one-time authentication, leaving the device unprotected after the login stage. In this thesis, a continuous and implicit authentication (CIA) approach is introduced that can act as a complementary authentication method. This approach is supplemented by a methodology for personalising authentication criteria by analysing how different users behave based on the context of the screen they are browsing. This last addition serves as the greatest contribution of this thesis in the field of continuous and implicit authentication, since not many ways of optimizing authentication schemes have been explored yet. As a means of pursuing the aforementioned goals, a behavioral biometrics dataset containing several users' gestures was utilized. Two types of gestures, swipes and taps, were examined with respect to how they can serve as a way of distinguishing users. One-Class SVM played a key role in developing this methodology, as it allows training with only one user's gestures, something that can be deployed in real-life scenarios. The problem of determining the behavioral variance that each user exhibits (based on the context of the screen he/she is browsing) was handled as a clustering problem, addressed by the k-means algorithm. The method proved to be efficient, especially when analysing swipe gestures, and the incorporation of contextual-behavioral information can offer substantial improvements in user authentication schemes.
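The sketch below illustrates the two building blocks named above, One-Class SVM and k-means, in a toy setting: gestures are clustered into contexts and a one-class model is trained per cluster on the owner's data alone. The three feature columns and all hyperparameters are assumptions for demonstration.

```python
# Hedged sketch: per-context one-class models trained only on the owner's gestures.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.cluster import KMeans

owner_swipes = np.random.rand(300, 3)    # toy [duration, length, speed] rows
unknown_swipes = np.random.rand(5, 3)

# Context personalization: cluster the owner's gestures (e.g. per screen context).
contexts = KMeans(n_clusters=3, n_init=10).fit_predict(owner_swipes)

# One model per context cluster, trained exclusively on that user's own data.
models = {c: OneClassSVM(nu=0.1, gamma="scale").fit(owner_swipes[contexts == c])
          for c in np.unique(contexts)}

print(models[0].predict(unknown_swipes))  # +1 genuine, -1 anomalous
```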
Domain specific language for controlling sensors and actuators in IoT devices, using model driven engineering approaches

The Internet of Things (IoT) has been growing at an exponential rate in the last couple of years. Every year new devices enter human daily life, waiting to be controlled. Controlling software must be developed to interact with these devices, and new applications can be built on top of them. Many people cannot experience the true advantages of IoT, as they are unable to build applications since they lack the required technological background. Model-Driven Engineering (MDE) can help these people, as it solves software engineering problems using models of the physical and virtual world. There are few attempts to use MDE in the world of IoT, and even fewer that try to help the technologically untrained build IoT applications. This diploma thesis proposes tools to model IoT devices and the connections between them. In addition, it provides a textual grammar for the definition of those models. Further, it develops a library for driving IoT devices through a common API. Finally, using automated source code generation, it proposes a way of controlling these devices through a Raspberry Pi and communication endpoints.
On analyzing the importance of Google Lighthouse performance metrics

The Internet has become an integral part of humans' everyday life: an indispensable source of information, a means of socialization, and a channel for providing services and purchasing and selling products. The plethora of available websites providing similar or different services has created a new reality where each user can find sites that fulfill their needs. Therefore, sites of similar content and services focus on optimizing User Experience to attract more users. In particular, User Experience refers to user interactions with a website and focuses on the overall experience a site provides. There are various factors that influence User Experience. This thesis employs Google Lighthouse, an automated tool for measuring the quality of web pages, and explores the features that influence performance metrics pertaining to User Experience. Specifically, 85 features were extracted from a dataset of 200K websites, the data resulting from Google Lighthouse reports. These features quantitatively describe the composition, structure, and resources of each web page. After a regression model was used to predict the performance metric scores, as defined by the simulation software, an analysis and extraction of the most important features used by the model was performed. The ultimate objective of the thesis is to enable a front-end website developer to prioritize and focus on those features that improve Google Lighthouse's performance metric scores, thereby improving user experience.
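A hedged sketch of this predict-then-rank pipeline follows, with a random forest regressor standing in for the thesis's regression model; the two named features (of the 85) and all data are illustrative.

```python
# Hedged sketch: fit a regressor on page features, then rank them by importance.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

feature_names = ["num_scripts", "total_image_bytes"]  # 2 of the 85, for illustration
X = np.random.rand(500, 2)
performance_score = np.random.rand(500)               # toy Lighthouse performance scores

model = RandomForestRegressor(n_estimators=100).fit(X, performance_score)

# Rank features by their contribution to the prediction.
for name, imp in sorted(zip(feature_names, model.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```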
Basketball data analytics via Machine Learning techniques using the REMEDES system

Data science, although it has long existed, dominates nowadays and may well do the same in the future. The existence of huge storage space and powerful processors capable of managing correspondingly sized databases has enabled information to be collected in every workplace, from the medical and engineering sectors to the arts and sports. In this diploma thesis we focus on the field of professional sport, and in particular basketball. Initially, basic knowledge about the sport is presented, some of the information collection tools are mentioned, and we analyze the importance and role of data in training, in the preparation of athletes, and in the decisions of coaches. Subsequently, having collected data using the REMEDES system on a specific set of basketball drills, a sports performance evaluation system was developed using the Python programming language and various data preprocessing and machine learning techniques. The purpose of this system is to evaluate, as representatively as possible, non-athletes, athletes, and basketball players on a specific set of basketball-oriented drills. During the analysis, and through careful monitoring of the results, we have drawn some very interesting conclusions, which are presented and interpreted in this report.
Development of an automatic procedure for Continuous Integration

In recent years there has been rapid growth in the field of cloud computing, which has aroused the interest of many companies; demand is constantly growing, as is the number of providers offering these services. However, despite the fact that the use of cloud computing has been established and offers many advantages, various challenges arise, such as data security. A key element of the software development process is the frequent testing of the application in order to ensure quality and minimize bugs, which is achieved through Continuous Integration (CI) systems. Upon successful execution of the automated tests, the latest version of the code is deployed in a pre-production (staging) or production environment automatically through Continuous Deployment (CD) and Continuous Delivery (CDE). The purpose of this thesis is to compare cloud providers and then develop a method that simplifies the usage of a CI + CD/CDE system. Our approach also integrates static code analysis and evaluation. The CI and CD/CDE processes are implemented through GitLab, an open source software platform, with ready-to-use pipelines (templates) supporting Node.js and Django web applications, while static analysis is performed through Code Quality, which is embedded in GitLab and based on the Code Climate tool. The automatic installation of the prerequisites for the application deployment, in other words the server setup, and the first deployment are performed through the Ansible software configuration management tool. Moreover, the user is given the capability to deploy the app on the Heroku cloud platform without the need to use Ansible. The outcome of the thesis is aimed primarily at students or software developers with little experience who want to get involved and take their first steps with GitLab CI.
A graphical application development methodology for remote robots in the context of cyber-physical systems

Just as the Internet has transformed the way people interact with information, cyber-physical systems are transforming the way people interact with computational systems. Cyber-physical systems integrate sensing, computation, control, and networking into physical objects, connecting them to the Internet and to each other. A typical example of such systems is robotic systems, as they combine interaction with the environment and computational abilities. Even though robotics is closely tied to the manufacturing industry, in recent years it has branched out into other fields, such as medicine and autonomous exploration, and even into aspects of our daily life, such as domestic use. Growth of a similar scale can be seen in the Internet of Things (IoT) domain, where everyday objects are equipped with sensors to collect data from the environment and are able to connect to the Internet to share these data. We envision that, due to the mobility offered by robotic systems, their integration with IoT would enable better interaction with the environment, and simultaneously allow robots to make decisions based on data from other devices. To make this possible, certain limitations must be overcome. On one hand, it is especially important to have the ability to control and monitor the robot remotely. Unfortunately, the Robot Operating System (ROS), the most widespread middleware for robotics development, restricts the management of the robot to the local network. On the other hand, it is desirable for users to be able to create their applications without extensive robotics and programming knowledge. This thesis focuses on developing a system to address the aforementioned limitations. To establish the communication between the robot and the remote computer, the RabbitMQ message broker is used. At the same time, application development and the integration of the robot with the IoT world are accomplished through Node-RED, a tool for building applications for IoT systems through a graphical interface, thus simplifying the programming procedure. Furthermore, various use cases are presented, which showcase the capabilities of the system for developing robotic applications as part of the IoT.
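To make the broker link concrete, here is a minimal robot-side sketch using the pika client for RabbitMQ: it publishes a pose message to a queue that a remote consumer or Node-RED flow could read outside the local ROS network. The queue name and payload fields are illustrative assumptions.

```python
# Hedged sketch: publish robot state over RabbitMQ for a remote consumer.
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="robot.telemetry")  # illustrative queue name

# In the real system this would be fed by a ROS subscriber callback.
pose = {"x": 1.2, "y": 0.4, "theta": 0.0}
channel.basic_publish(exchange="",
                      routing_key="robot.telemetry",
                      body=json.dumps(pose))
connection.close()
```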
Knowledge Distillation into BiLSTM Networks for the Compression of the Greek-BERT Model

In recent years, pre-trained language models, such as BERT, have achieved state-of-the-art results in several natural language processing tasks. However, these models are typically characterized by a large number of parameters and high demands on memory and processing power. Therefore, their use in limited-resource environments, such as on-the-edge applications, is often difficult. Within the context of this diploma thesis, various techniques for knowledge distillation into simple BiLSTM models are investigated, with the aim of compressing the Greek-BERT model. The term "knowledge distillation" refers to a set of techniques for transferring knowledge from a large and complex model to a smaller one. Greek-BERT is a monolingual BERT language model, which has proven to be very efficient in various natural language processing problems in Modern Greek. For this purpose, GloVe word embeddings for Modern Greek, which were not previously available, are trained and evaluated. GloVe is trained on a huge corpus of texts in Modern Greek, totalling over 30GB. In order to make a fair comparison, the text corpus was crawled from the same web sources used for the pre-training of Greek-BERT. The models are evaluated on the XNLI dataset and on a text classification dataset from the newspaper "Makedonia". In order to maximize knowledge transfer from Greek-BERT into the BiLSTM models, a data augmentation algorithm is developed, which is based on the GloVe word embeddings. It is shown that this process significantly improves the performance of the models, especially for small datasets. Experiments indicate that knowledge distillation can improve the performance of simple BiLSTM models for natural language understanding in Modern Greek. The final single-layer model is 28.6x faster, achieving 96.0% of the performance of Greek-BERT in text classification tasks and 86.9% in NLI tasks. The two-layer model is 10.7x faster, achieving 88.4% of the performance of Greek-BERT in NLI tasks.
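The following PyTorch fragment sketches the core distillation objective in a hedged form: the student is trained against the teacher's logits (here with an MSE term, one common choice) alongside the hard labels. Tensor shapes and the weighting alpha are illustrative assumptions, not the thesis's exact loss.

```python
# Hedged sketch of a distillation loss: imitate the teacher, fit the labels.
import torch
import torch.nn.functional as F

batch, num_classes = 8, 3
teacher_logits = torch.randn(batch, num_classes)  # stand-in for frozen Greek-BERT outputs
student_logits = torch.randn(batch, num_classes, requires_grad=True)  # BiLSTM outputs
labels = torch.randint(0, num_classes, (batch,))

alpha = 0.5  # illustrative balance between teacher imitation and hard labels
loss = (alpha * F.mse_loss(student_logits, teacher_logits)
        + (1 - alpha) * F.cross_entropy(student_logits, labels))
loss.backward()  # gradients flow only into the student
print(loss.item())
```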
Optimal route planning of autonomous vehicles in dynamic environments

Autonomous driving has been developing and evolving rapidly in recent years. In the industrial sector, many companies want to establish themselves first in the market and create the "ideal" autonomous vehicle. Improving citizens' safety, reducing travel time, and keeping traffic flowing smoothly drive the need for efficient and effective autonomous driving solutions. In an ideal scenario, people will be able to cross roads without paying attention to passing vehicles, car accidents will diminish, and traffic lights and other road signs will no longer be necessary, as cars will be able to exchange information with each other through a communication network. The development of such technology, though, is quite complex, as it requires handling many random and unpredictable conditions. To develop such a solution, it is necessary for the autonomous vehicle to have an excellent perception of the surrounding space, adapt to it, and be able to respond instantly to any changes in the surrounding environment. The vehicle must navigate safely on the road and respond to static and dynamic obstacles. In addition, it should evaluate decision scenarios and then, according to the circumstances, select the appropriate response. Thus, autonomous vehicles need to be equipped with special sensors that identify and map the surrounding area of the vehicle, and armed with complex control and decision-making systems and appropriate behavioral prediction systems. This dissertation focuses on the design and development of such an autonomous driving system. The purpose of the system is to safely navigate the vehicle from a starting point to a final destination, in a city filled with vehicles and pedestrians, while at the same time calculating the best and shortest route in compliance with traffic rules. The system has been developed in the form of a modular, ego-only system. The software is written in the Python programming language on top of the ROS middleware. The Carla simulator was selected, as it offers cities, cars, and the desired physics for conducting the experiments. The developed system consists of the individual subsystems of 1) construction of the global path, 2) perception, 3) behavior prediction, 4) construction of local paths, 5) behavior selection, and 6) control of the kinematic behavior of the vehicle. These subsystems communicate with each other in order to achieve autonomous driving. In the global path construction subsystem, a directed graph of the map is created and the A* algorithm is used to search for the optimal route. (continues in full text)
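As a toy illustration of that global-path step, the sketch below runs A* over a small directed road graph with Euclidean edge weights and a straight-line heuristic via networkx; the node coordinates and topology are invented for the example.

```python
# Hedged sketch: A* search for the optimal route on a directed road graph.
import networkx as nx

roads = nx.DiGraph()
coords = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (2, 1)}  # toy map
for u, v in [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]:
    (x1, y1), (x2, y2) = coords[u], coords[v]
    roads.add_edge(u, v, weight=((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5)

def heuristic(u, v):
    # Admissible heuristic: straight-line distance to the goal.
    (x1, y1), (x2, y2) = coords[u], coords[v]
    return ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5

print(nx.astar_path(roads, "A", "D", heuristic=heuristic, weight="weight"))
```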
Understanding the importance of demographic background for the website aesthetics through deep learning techniques

Web pages nowadays constitute the most popular source of information, business, and entertainment. Inarguably, aesthetics comprise an integral part of the design of a website, playing a multidimensional role. Web aesthetics support the content and functionality of a website, while at the same time striving to pique the interest of the targeted user categories. The objective of this diploma thesis is to investigate and highlight the importance of demographic characteristics when evaluating web design aesthetics, through the use of deep learning algorithms. To this end, two different approaches have been applied. The first approach concerns the training of three different convolutional neural network (CNN) architectures, AlexNet, VGG16, and Xception, on the available dataset. AlexNet has been re-evaluated on this set and provides reliable results, while VGG16 is presented as an improved solution. Xception, in turn, is a contemporary architecture that is tested for the first time on this dataset and has surpassed the results in the literature. The second approach involves splitting the dataset by demographic group and training convolutional networks for each group separately. In this way, the respective models can capture the aesthetic preferences of each demographic group. These models are merged using various ensemble methods, and the best one is selected for the evaluation and comparison of the findings. In the experiments performed, comparisons are made between the models of each approach, and various relevant examples are presented for better understanding. The purpose of this thesis is to point out the determinant role and importance of demographic characteristics, while also highlighting the contribution of advanced deep learning algorithms to achieving reliable predictive results on subjective issues, such as website aesthetics.
Punctual fault identification through Machine Learning techniques

The technology uprising within the 4th industrial revolution has led to the modernization of the maintenance field and the migration from preventive to predictive maintenance through machine learning methods and techniques. This diploma thesis aims, through research on classical and state-of-the-art algorithms in the timeseries anomaly detection and classification domain, at the development of a user-friendly and accurate fault identification tool. To achieve this, it is essential to identify the most suitable machine learning techniques and consequently implement, adjust, and evaluate them in a real industrial environment.
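As one hedged example of a classical baseline from this domain (not necessarily a technique selected in the thesis), the sketch below slides a window over a signal with an injected punctual fault and scores each window with Isolation Forest; the window size and contamination rate are illustrative knobs.

```python
# Hedged sketch: windowed Isolation Forest as a timeseries anomaly baseline.
import numpy as np
from sklearn.ensemble import IsolationForest

signal = np.sin(np.linspace(0, 20, 500))
signal[300:310] += 3.0                    # injected punctual fault

window = 10
windows = np.lib.stride_tricks.sliding_window_view(signal, window)
scores = IsolationForest(contamination=0.05).fit_predict(windows)
print(np.where(scores == -1)[0][:5])      # indices of windows flagged anomalous
```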
Design and Implementation of a Mechanism that automates the generation of Software Systems capable of Deductive Reasoning

Today, the development of technology and its utilization in all areas of human life creates the need for software that is easily customizable, presentable, reliable, and economical, and that solves many types of problems. Model-Driven Engineering (MDE), i.e. software development based on models, with automatic production of code from these models and the ability to display the software graphically, combined with Automated Reasoning techniques, meets the above needs. In the current diploma thesis, all the above techniques were utilized for the construction of a complete software tool on the Eclipse platform. More specifically, in the framework of Model-Driven Engineering (MDE), a metamodel was constructed, which constitutes the core of the system and incorporates terms from the field of Logic. Building on this, a graphical interface was created in the Sirius environment, which allows the interested party to graphically construct the desired model. The construction of the model is done in the form of equations, correctly formulated according to the standards of First-Order Logic (FOL). From this model, Java code is automatically generated which, utilizing functions and objects of the TweetyProject library, is properly configured to be a valid input for the built-in prover of the same library, which can perform logical checks by means of Automated Reasoning. Some additional functions written in Java complete the software tool of this diploma thesis, making it capable of being used by various mechanisms that automatically produce systems, in order to check the validity of the systems under design without the need to implement additional software that draws logical conclusions.
Domain specific language for asynchronous message-driven architectures

The introduction of new technologies in the domain of the Internet of Things (IoT), combined with their extensive use, has raised some concerns for developers. One of these concerns the interoperability of systems, caused by the heterogeneity of the various protocols and communication interfaces. This makes it increasingly difficult to develop and maintain applications and systems composed of multiple devices and entities. A major factor in facing these challenges can be Model-Driven Engineering (MDE), mainly because it raises the level of abstraction so that the user avoids addressing the details and restrictions of the specific domain. Moreover, it speeds up the software development process and improves its quality by allowing the design and development of reusable code. For now, though, the presence of MDE in IoT is still insignificant. In this sense, the present Diploma Thesis describes a model-based solution for interoperability in message-driven IoT systems. The result of this research is MECO (Modeling Entities and COmmunications), a Domain-Specific Language (DSL) that allows users to design such systems without any significant programming knowledge. Moreover, there is a Model-to-Text transformation to automatically generate software that implements the communications, and a Model-to-Model transformation that generates documentation diagrams and files that improve the monitoring of the described systems.
Python metaprogramming in linear time language for automated runtime verification with graph neural networks

The term runtime logic verification defines a field that ranges from verifying software for compliance with a set of specifications to assuring the adoption of good coding practices. Under this scope, we created lovpy, a novel metaprogramming library for Python that introduces the capabilities of runtime logic verification to its ecosystem. Expected behavior is defined using the intuitive specification language Gherkin, while using the library requires no code modifications. For its implementation we utilized a broad set of tools, ranging from the domains of graph theory, formal language theory, and temporal logic to deep learning, with a specific focus on graph neural networks. We also provided the mathematical foundation for a new type of graph, designed for representing temporal specifications. Based on it, we defined a set of mathematically proven logic algorithms. (continues in full text)
Mining Source Code Change Patterns from Open-Source Repositories

Nowadays, there is rapid growth in open-source version control systems and repositories. A large number of new software projects are implemented, developed, and maintained through these systems. This way, software engineers can collaborate directly with each other, organize effectively, and maintain an up-to-date history of the project's evolution. Therefore, the volume of information stored is significant, and harnessing it can lead to the development of smart and efficient systems. Within the context of this diploma thesis, a machine learning system is developed which stores, processes, and groups source code changes that have taken place during the development stage, with the goal of extracting source code change patterns. These patterns can act as recommendations for new projects, in order to optimize code development and/or fix potential bugs found repeatedly in project repositories. The proposed methodology was applied on the GitHub code hosting platform. GitHub tracks changes to the source code files contained in a repository. These changes are represented as Abstract Syntax Trees (ASTs), so that a similarity metric for the algorithmic structure can be calculated. Additionally, their semantic similarity is calculated, and thus the final clustering of source code changes is possible. Clusters that meet specific criteria contain patterns of source code changes that can be used to provide recommendations for new software projects.
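To give a feel for the structural-similarity ingredient, the sketch below compares the before and after versions of a changed snippet through their AST dumps with a generic sequence-similarity ratio; difflib here is a stand-in assumption, not the dedicated metric used in the thesis.

```python
# Hedged sketch: crude structural similarity between two versions of a change.
import ast
import difflib

before = "def add(a, b):\n    return a + b"
after = "def add(a, b):\n    return abs(a) + abs(b)"

# Compare the serialized ASTs rather than the raw text, so that purely
# cosmetic differences (spacing, comments) do not affect the score.
sim = difflib.SequenceMatcher(
    None, ast.dump(ast.parse(before)), ast.dump(ast.parse(after))
).ratio()
print(f"structural similarity: {sim:.2f}")  # feeds the clustering of changes
```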
Model-driven development for low-consumption real-time IoT devices

The Internet of Things (IoT) is a field that is evolving rapidly, especially in recent years. There is the possibility of developing ever more applications that prove useful for many people, whether they concern simple functions in automation systems or larger-scale applications in industry. Therefore, more and more people want to work in this field. The process of developing an IoT system involves code development to control the system's devices. In fact, in most cases fast response is of the utmost importance, so low-level code development is required, as well as the use of real-time operating systems (RTOS). Also, due to the great heterogeneity of IoT devices on the market, it is necessary to understand the capabilities each device can offer, in order to make the appropriate choice of one tailored to the needs of the system to be implemented. These requirements may seem complicated to some users, especially to people who are technologically untrained, i.e. do not have the necessary programming skills, but still want to build an IoT system, e.g. for personal use. As a result, a large portion of people who want to get involved with IoT are discouraged from doing so. Model-Driven Engineering (MDE) solves the problems faced by those who want to get involved with IoT, and also simplifies the software production process in general, as it raises the development of IoT systems to a more abstract, more user-friendly level. Through this diploma thesis, one is given the opportunity to describe IoT devices using models, through two domain-specific languages (DSLs) developed for the description of devices and the connections between them. From the models, a Model-to-Text (M2T) transformation is performed for automated code generation for a variety of IoT devices, adapted to the characteristics the user wishes them to have. The generated software for controlling the IoT devices implements the process of taking measurements from sensors and sending them to a broker, as well as the process of controlling actuators through the broker. It consists of low-level code, as it has been designed according to the requirements of a real-time operating system, RIOT. Finally, a Model-to-Model (M2M) transformation takes place in order to produce diagrams that provide a visualization, and thus a better understanding by users, of the wiring and intercommunication of their system.
Design and development of a tool for automating scenario production of digital assistants

The development of advanced Artificial Intelligence techniques in recent years has allowed digital assistant technologies to emerge. From customer service centers to medical diagnostics, digital assistants find application in many areas and are used daily. More and more companies are trying to integrate them into their operations, and the technologies behind them are constantly evolving. In addition, open source technologies bring digital assistant tools closer to developers, allowing them to experiment. One such tool is Rasa, an open source technology for creating industrial-level digital assistants with Artificial Intelligence. However, the use of Rasa requires a high level of programming expertise. As digital assistants become more and more necessary in everyday applications, this know-how barrier limits the number of people who can work with them. The present Diploma Thesis focuses on the development of an easy-to-use scenario creation tool for Rasa, with the aim of rapidly creating digital assistants. Using Python, and specifically the Django framework, it presents the implementation of a full-stack application, from views and resource paths to models and back-end processes. This application makes it easy to create and edit digital assistants by automating most Rasa features. The application is then exercised by creating digital assistants, both simple and complex. First, the design of the scenarios and the stories the conversation will follow is presented, and then they are implemented in the system. Finally, the assistants are tested and the result is evaluated through example conversations. According to the results, the application can successfully create digital assistants that contain the basic components of Rasa. However, as digital assistants become more complex, some human intervention becomes necessary for the desired functionality to be implemented. Thus, although the application works as intended in simple and complex scenarios, when the operator needs something quite demanding in complexity, programming knowledge is still necessary.
Development of a graphical interface of an autonomous vehicle for driving behavior parameterization and remote controlling

The transportation of people has always been an integral part of daily life. For this reason, the first efforts to manufacture a car began in the 18th century. Over the years, industry constructed ever more modern cars to offer people easy and comfortable transportation. In recent years, leading companies and universities have been conducting research aimed at creating autonomous vehicles, which will change the way traditional cars work. Among the main problems to which self-driving technology will contribute drastically are the saving of significant time in people's daily lives, the reduction of road accidents and consequently safer transportation, fuel economy, and the reduction of environmental pollution. To date, a fully autonomous vehicle that operates without any human intervention has not yet been constructed. When this technology becomes a reality, the vehicle will have a full and precise perception of the external environment's conditions, will make the right decision every time, and will be able to exchange information with other vehicles in order to cooperate for better operation of the overall traffic. However, a lot of effort and research is still demanded in the sector of autonomous driving to create a vehicle that successfully responds to the numerous scenarios and conditions that occur in traffic. The implementation of such a system requires solving the problems of perceiving the external environment, choosing the right behavior, and making a safe and smooth transition to the final destination by obeying the traffic rules and avoiding dynamic and static obstacles. Solving the aforementioned problems requires suitable equipment, including state-of-the-art sensors, which will take the environment's measurements as input. The measurements will be analyzed by a central processing unit and, finally, the right decision will be made. (continues in full text)
Image Inpainting Detection through Artificial Intelligence Techniques

Image inpainting is the process of repairing an area in an image from which part of the semantic information is missing, resulting in a lack of semantic continuity. Image inpainting was initially designed to effectively repair damaged areas in images. However, it was quickly put to use for forgery and deception. In recent years, methods of applying image inpainting through artificial intelligence techniques have emerged and achieved high-quality results, producing images where the presence of inpainting is almost impossible to detect with the human eye. Therefore, it is of critical importance to develop a method that detects the affected areas in an inpainted image. For this reason, the present thesis focuses on the study of image inpainting detection methods and the implementation of an artificial neural network capable of detecting areas where an image has been tampered with by inpainting. A total of eight convolutional neural networks, based on two state-of-the-art architectures, were trained and tested. The training process was based on two configuration sets (10 and 50 epochs respectively), adopting binary cross entropy (BCE) as the loss function. Furthermore, it was also studied to what extent a training dataset consisting of images inpainted in semantic areas aids image inpainting detection more than one whose images have been inpainted in random-form areas. For this reason, two training sets were created: the first consists of images with random-form inpainting masks, while the second consists of images with semantic masks (objects). To evaluate the trained models, a test set consisting of both forms of masks was created, in order to give an objective interpretation of the results. The aim is to train a model capable of producing a predicted mask Mo as output, given an image I as input. Finally, two commonly used pixel-wise metrics, IoU and AUC, were adopted to evaluate the performance. The metrics were calculated using the ground truth Mg and the predicted mask Mo, making a one-to-one comparison of their corresponding pixels. The study showed that models trained with a set of images tampered in random areas (random masks) achieve better results compared to models trained with a set of images tampered in semantic areas (semantic masks).
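The pixel-wise IoU described above reduces to a few lines of array arithmetic; the sketch below computes it for toy 4x4 versions of the ground-truth mask Mg and predicted mask Mo.

```python
# Hedged sketch: pixel-wise IoU between ground truth and predicted masks.
import numpy as np

Mg = np.array([[0, 0, 1, 1]] * 4, dtype=bool)  # toy ground-truth inpainted region
Mo = np.array([[0, 1, 1, 1]] * 4, dtype=bool)  # toy model prediction

intersection = np.logical_and(Mg, Mo).sum()
union = np.logical_or(Mg, Mo).sum()
print(f"IoU = {intersection / union:.3f}")
```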
Analyzing code bugs based on method call graphs

The increasing size and complexity of modern software projects often leads to the appearance of runtime errors (crashes), for instance due to coding inaccuracies or unforeseen use cases. Since errors affect software usability, quickly dealing with them has become an important maintenance task tied to the success of software projects. At the same time, processes for parsing user feedback, for example by dedicated teams, to understand errors or other bugs and initiate maintenance operations can prove time consuming. To mitigate the associated costs, an emerging trend is to automate (parts of) error understanding with machine learning systems, for example systems that perform automatic tagging. In this thesis, we focus on understanding errors through extracted latent representations; these can be fed into machine learning systems to predict error qualities, such as recommending which tags errors should obtain. To achieve this, existing approaches in the broader scope of automated bug understanding make use of natural language processing techniques, such as word embeddings, to understand feedback texts. However, in the case of errors, we propose that the available stack traces leading up to crashing code segments also capture useful coding semantics, in the form of paths within function call graphs. Thus, we investigate whether graph embeddings, extracted from error stack traces, can be used to obtain a better understanding of errors. To test our hypothesis, we developed a system that extracts latent error representations of software projects by combining textual and stack trace embeddings. To verify that these improve error understanding compared to using textual features only, we experimented on three popular GitHub software projects, where we extracted error representations and used them to predict error tags (e.g. high priority) with neural network predictors. We found that, given a robust choice of predictor and enough example errors to train from, our approach improves text-based tagging by a significant margin across popular recommendation system measures.
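A hedged sketch of the fused representation follows: a text embedding of the report is concatenated with a graph embedding of its stack-trace path and fed to a small neural tag predictor. The dimensions, random stand-in embeddings, and scikit-learn predictor are illustrative assumptions.

```python
# Hedged sketch: concatenate text and graph embeddings, then predict tags.
import numpy as np
from sklearn.neural_network import MLPClassifier

n_errors = 120
text_emb = np.random.rand(n_errors, 50)   # stand-in for word embeddings of reports
graph_emb = np.random.rand(n_errors, 32)  # stand-in for call-graph path embeddings
tags = np.random.randint(0, 2, n_errors)  # e.g. 1 = "high priority"

X = np.hstack([text_emb, graph_emb])      # latent error representation
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300).fit(X, tags)
print(clf.predict(X[:3]))                 # predicted tags for the first reports
```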
Implementation of a full stack tool in Kubernetes environment to automate the application of filters on messages using message broker technology

The transition of internet technologies to microservice architectures and the development of the Internet of Things (IoT) have significantly increased the need for new methods of efficient communication between heterogeneous and distributed systems. Brokered messaging methodologies work better than REST (Representational State Transfer) and RPC (Remote Procedure Call) technologies in producer-consumer (messaging) communication systems where both high-throughput transmission of large volumes of data and the decoupling of producer and consumer subsystems are desirable. A lightweight and reliable technology that offers the benefits of brokered messaging is RabbitMQ. By using it, complex and efficient systems can be built under conditions of asynchronous communication, unreliable networks, and big data application environments. This dissertation focuses on the full-stack development of a tool which uses brokered messaging technology to apply filters to the messaging of a system. The automation of these functions through the tool makes the effects of the involved technologies accessible to users, regardless of their experience with the specific technologies. Messaging is carried out via a RabbitMQ server, which implements the brokered messaging technology. Finally, to facilitate the management of the entire system, it was deployed in the context of Kubernetes, which offers automated orchestration of the parts of the system. For the establishment of the Kubernetes environment, minikube was chosen, as it offers easy and fast creation of a Kubernetes environment. System performance was tested for different values of message input load and number of applied filters. The measured parameters refer to the message entry rate, the message consumption rate, the rate of message logging to the database, and the number of messages stored in the broker queues. From the experiments it is concluded that it is particularly important to select the appropriate number of applied filters according to the available processing power and memory resources of the system.
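A minimal sketch of one filter stage in this style follows, using the pika client for RabbitMQ: messages are consumed from an input queue, checked against a predicate, and republished downstream if they pass. The queue names and predicate are illustrative assumptions, not the tool's actual configuration.

```python
# Hedged sketch: a single message-filter stage between two RabbitMQ queues.
import pika

def size_filter(body: bytes) -> bool:
    # Illustrative predicate: drop oversized messages.
    return len(body) < 1024

def on_message(channel, method, properties, body):
    if size_filter(body):
        channel.basic_publish(exchange="", routing_key="filtered.out", body=body)
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="raw.in")
channel.queue_declare(queue="filtered.out")
channel.basic_consume(queue="raw.in", on_message_callback=on_message)
channel.start_consuming()
```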