Michail Papamichail, Kyriakos Chatzidimitriou, Thomas Karanikiotis, Napoleon-Christos Oikonomou, Andreas Symeonidis and Sashi Saripalle
Data, 4, (2), 2019 May
The widespread use of smartphones has dictated a new paradigm, where mobile applications are the primary channel for dealing with day-to-day tasks. This paradigm is full of sensitive information, making security of utmost importance. To that end, and given the traditional authentication techniques (passwords and/or unlock patterns) which have become ineffective, several research efforts are targeted towards biometrics security, while more advanced techniques are considering continuous implicit authentication on the basis of behavioral biometrics. However, most studies in this direction are performed “in vitro” resulting in small-scale experimentation. In this context, and in an effort to create a solid information basis upon which continuous authentication models can be built, we employ the real-world application “BrainRun”, a brain-training game aiming at boosting cognitive skills of individuals. BrainRun embeds a gestures capturing tool, so that the different types of gestures that describe the swiping behavior of users are recorded and thus can be modeled. Upon releasing the application at both the “Google Play Store” and “Apple App Store”, we construct a dataset containing gestures and sensors data for more than 2000 different users and devices. The dataset is distributed under the CC0 license and can be found at the EU Zenodo repository.
Christoforos Zolotas, Kyriakos C. Chatzidimitriou and Andreas L. Symeonidis
Enterprise Information Systems, pp. 1-27, 2018 Mar
In the modern business world it is increasingly often that Enterprises opt to bring their business model online, in their effort to reach out to more end users and increase their customer base. While transitioning to the new model, enterprises consider securing their data of pivotal importance. In fact, many efforts have been introduced to automate this ‘webification’ process; however, they all fall short in some aspect: a) they either generate only the security infrastructure, assigning implementation to the developers, b) they embed mainstream, less powerful authorisation schemes, or c) they disregard the merits of the dominating REST architecture and adopt less suitable approaches. In this paper we present RESTsec, a Low-Code platform that supports rapid security requirements modelling for Enterprise Services, abiding by the state of the art ABAC authorisation scheme. RESTsec enables the developer to seamlessly embed the desired access control policy and generate the service, the security infrastructure and the code. Evaluation shows that our approach is valid and can help developers deliver secure by design enterprise services in a rapid and automated manner.
George Mamalakis, Christos Diou, Andreas L. Symeonidis and Leonidas Georgiadis
"Of daemons and men: reducing false positive rate in intrusion detection systems with file system footprint analysis"
Neural Computing and Applications, 2018 May
In this work, we propose a methodology for reducing false alarms in file system intrusion detection systems, by taking into account the daemon’s file system footprint. More specifically, we experimentally show that sequences of outliers can serve as a distinguishing characteristic between true and false positives, and we show how analysing sequences of outliers can lead to lower false positive rates, while maintaining high detection rates. Based on this analysis, we developed an anomaly detection filter that learns outlier sequences using k-nearest neighbours with normalised longest common subsequence. Outlier sequences are then used as a filter to reduce false positives on the FI2DS file system intrusion detection system. This filter is evaluated on both overlapping and non-overlapping sequences of outliers. In both cases, experiments performed on three real-world web servers and a honeynet show that our approach achieves significant false positive reduction rates (up to 50 times), without any degradation of the corresponding true positive detection rates.
Eleni Nisioti, Kyriakos C. Chatzidimitriou and Andreas L. Symeonidis
"ICML 2018 AutoML WorkshopPredicting hyperparameters from meta-features in binary classification problems"
AutoML, Stockholm, Sweden, 2018 Jul
The presence of computationally demanding problems and the current inability to auto-matically transfer experience from the application of past experiments to new ones delaysthe evolution of knowledge itself. In this paper we present the Automated Data Scientist1,a system that employs meta-learning for hyperparameter selection and builds a rich ensem-ble of models through forward model selection in order to automate binary classificationtasks. Preliminary evaluation shows that the system is capable of coping with classificationproblems of medium complexity.
Sotirios-Filippos Tsarouchis, Maria Th. Kotouza, Fotis E. Psomopoulos and Pericles A. Mitkas
IFIP International Conference on Artificial Intelligence Applications and Innovations, pp. 189-199, Springer, Cham, 2018 May
The identification of meaningful groups of proteins has always been a major area of interest for structural and functional genomics. Successful protein clustering can lead to significant insight, assisting in both tracing the evolutionary history of the respective molecules as well as in identifying potential functions and interactions of novel sequences. Here we propose a clustering algorithm for same-length sequences, which allows the construction of subset hierarchy and facilitates the identification of the underlying patterns for any given subset. The proposed method utilizes the metrics of sequence identity and amino-acid similarity simultaneously as direct measures. The algorithm was applied on a real-world dataset consisting of clonotypic immunoglobulin (IG) sequences from Chronic lymphocytic leukemia (CLL) patients, showing promising results.
Kyriakos C. Chatzidimitriou, Michail Papamichail, Themistoklis Diamantopoulos, Michail Tsapanos and Andreas L. Symeonidis
"npm-miner: An Infrastructure for Measuring the Quality of the npm Registry"
MSR ’18: 15th International Conference on Mining Software Repositories, pp. 4, ACM, Gothenburg, Sweden, 2018 May
Themistoklis Diamantopoulos, Georgios Karagiannopoulos and Andreas Symeonidis
IEEE/ACM 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE), pp. 21-27, 2018 May
Anastasios Dimanidis, Kyriakos C. Chatzidimitriou and Andreas L. Symeonidis
"A Natural Language Driven Approach for Automated Web API Development: Gherkin2OAS"
WWW ’18 Companion: The 2018 Web Conference Companion, pp. 6, Lyon, France, 2018 Apr
Speeding up the development process of Web Services, while adhering to high quality software standards is a typical requirement in the software industry. This is why industry specialists usually suggest \"driven by\" development approaches to tackle this problem. In this paper, we propose such a methodology that employs Specification Driven Development and Behavior Driven Development in order to facilitate the phases of Web Service requirements elicitation and specification. Furthermore, we introduce gherkin2OAS, a software tool that aspires to bridge the aforementioned development approaches. Through the suggested methodology and tool, one may design and build RESTful services fast, while ensuring proper functionality.
Maria Th. Kotouza, Konstantinos N. Vavliakis, Fotis E. Psomopoulos and Pericles A. Mitkas
5th International Conference on Big Data Computing Applications and Technologies, pp. 191-197, IEEE/ACM, Zurich, Switzerland, 2018 Dec
Item clustering is commonly used for dimensionality reduction, uncovering item similarities and connections, gaining insights of the market structure and recommendations. Hierarchical clustering methods produce a hierarchy structure along with the clusters that can be useful for managing item categories and sub-categories, dealing with indirect competition and new item categorization as well. Nevertheless, baseline hierarchical clustering algorithms have high computational cost and memory usage. In this paper we propose an innovative scalable hierarchical clustering framework, which overcomes these limitations. Our work consists of a binary tree construction algorithm that creates a hierarchy of the items using three metrics, a) Identity, b) Similarity and c) Entropy, as well as a branch breaking algorithm which composes the final clusters by applying thresholds to each branch of the tree. ?he proposed framework is evaluated on the popular MovieLens 20M dataset achieving significant reduction in both memory consumption and computational time over a baseline hierarchical clustering algorithm.
Panagiotis G. Mousouliotis, Konstantinos L. Panayiotou, Emmanouil G. Tsardoulias, Loukas P. Petrou and Andreas L. Symeonidis
MOCAST, 2018 Mar
FPGAs are commonly used to accelerate domain-specific algorithmic implementations, as they can achieve impressive performance boosts, are reprogrammable and exhibit minimal power consumption. In this work, the SqueezeNet DCNN is accelerated using an SoC FPGA in order for the offered object recognition resource to be employed in a robotic application. Experiments are conducted to investigate the performance and power consumption of the implementation in comparison to deployment on other widely-used computational systems. thanks you!
Michail Papamichail, Themistoklis Diamantopoulos, Ilias Chrysovergis, Philippos Samlidis and Andreas Symeonidis
Proceedings of the 2018 Workshop on Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE), 2018 Mar
The popularity of open-source software repositories has led to a new reuse paradigm, where online resources can be thoroughly analyzed to identify reusable software components. Obviously, assessing the quality and specifically the reusability potential of source code residing in open software repositories poses a major challenge for the research community. Although several systems have been designed towards this direction, most of them do not focus on reusability. In this paper, we define and formulate a reusability score by employing information from GitHub stars and forks, which indicate the extent to which software components are adopted/accepted by developers. Our methodology involves applying and assessing different state-of-the-practice machine learning algorithms, in order to construct models for reusability estimation at both class and package levels. Preliminary evaluation of our methodology indicates that our approach can successfully assess reusability, as perceived by developers.
Emmanouil G. Tsardoulias, Konstantinos L. Panayiotou, Christoforos Zolotas, Alexandros Philotheou, Anreas L. Symeonidis and Loukas Petrou
"From classical to cloud robotics: Challenges and potential"
3rd International Workshop on Microsystems, Sindos Campus, ATEI Thessaloniki, Greece, 2018 Dec
Nowadays, a rapid transition from the classical robotic systems to more modern concepts like Cloud or IoT robotics is being experienced. The current paper brieﬂy overviews the beneﬁts robots can have, as parts of the increasingly interconnected world.
Konstantinos N. Vavliakis, Maria Th. Kotouza, Andreas L. Symeonidis and Pericles A. Mitkas
"Recommendation Systems in a Conversational Web"
Proceedings of the 14th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST,, pp. 68-77, SciTePress, 2018 Jan
In this paper we redefine the concept of Conversation Web in the context of hyper-personalization. We argue that hyper-personalization in the WWW is only possible within a conversational web where websites and users continuously “discuss” (interact in any way). We present a modular system architecture for the conversational WWW, given that adapting to various user profiles and multivariate websites in terms of size and user traffic is necessary, especially in e-commerce. Obviously there cannot be a unique fit-to-all algorithm, but numerous complementary personalization algorithms and techniques are needed. In this context, we propose PRCW, a novel hybrid approach combining offline and online recommendations using RFMG, an extension of RFM modeling. We evaluate our approach against the results of a deep neural network in two datasets coming from different online retailers. Our evaluation indicates that a) the proposed approach outperforms current state-of-art methods in small-medium datasets and can improve performance in large datasets when combined with other methods, b) results can greatly vary in different datasets, depending on size and characteristics, thus locating the proper method for each dataset can be a rather complex task, and c) offline algorithms should be combined with online methods in order to get optimal results since offline algorithms tend to offer better performance but online algorithms are necessary for exploiting new users and trends that turn up.
Valasia Dimaridou, Alexandros-Charalampos Kyprianidis, Michail Papamichail, Themistoklis Diamantopoulos and Andreas Symeonidis
Charpter:1, pp. 25, Springer, 2018 Jan
Nowadays, developers tend to adopt a component-based software engineering approach, reusing own implementations and/or resorting to third-party source code. This practice is in principle cost-effective, however it may also lead to low quality software products, if the components to be reused exhibit low quality. Thus, several approaches have been developed to measure the quality of software components. Most of them, however, rely on the aid of experts for defining target quality scores and deriving metric thresholds, leading to results that are context-dependent and subjective. In this work, we build a mechanism that employs static analysis metrics extracted from GitHub projects and defines a target quality score based on repositories’ stars and forks, which indicate their adoption/acceptance by developers. Upon removing outliers with a one-class classifier, we employ Principal Feature Analysis and examine the semantics among metrics to provide an analysis on five axes for source code components (classes or packages): complexity, coupling, size, degree of inheritance, and quality of documentation. Neural networks are thus applied to estimate the final quality score given metrics from these axes. Preliminary evaluation indicates that our approach effectively estimates software quality at both class and package levels.