Fotis E. Psomopoulos

Post-Doctoral Researcher

Aristotle University of Thessaloniki
Department of Electrical and Computer Engineering
54124 Thessaloniki – GREECE

Tel: +30 2310 99 6349
Fax: +30 2310 99 6398
Email: fpsom(at) issel [dot] ee [dot] auth [dot] gr

LinkedIn | Twitter

Education

2010-today Postdoctoral research
Institute of Agrobiotechnology (INA)
Centre for Research and Technology Hellas (CERTH), Greece
Postdoctoral research topic: “Phylogenetic Data Analysis”
2004-2010 PhD student
Electrical and Computer Engineering Department
Aristotle University of Thessaloniki, Greece
PhD Thesis Title: “Parallel Data Mining and Analysis Algorithms in a Grid environment and applications in Bioinformatics”
1999-2004 Diploma of Electrical and Computer Engineering
Electrical and Computer Engineering Department
Aristotle University of Thessaloniki, Greece
Grade: 8.17/10
Diploma Dissertation Title: “A Finite State Automata based data-mining algorithm for interesting rules extraction, with application in protein classification”

Professional Experience

Apr 2010 – today Software Development, Institute of Agrobiotechnology (INA)
Centre for Research and Technology Hellas (CERTH).
Director: Prof. C. A. Ouzounis
Jun 2005 – today Software Development, Informatics and Telematics Institute (ITI)
(founding member of the Centre for Research and Technology Hellas (CERTH))
under the auspices of the Greek General Secretariat of Research and Technology.
Director: Prof. P. A. Mitkas
Jul 2004 – today Software Development, Intelligent Systems and Software Engineering Laboratory (ISSEL)
Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Greece.
Director: Prof. P. A. Mitkas.
Jun 2003 – Sep 2003 Undergraduate Trainee at Intracom S.A.,
Software Design & Development for the IST-2001 Project “e-Sharing”

Academic Experience

Aristotle University of Thessaloniki

Teaching Experience
Feb 2007 Lectures on postgraduate semester course in “Databases and knowledge mining”
Advanced European MSc Program MUNDUS – ERASMUS
Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Greece.
October 2005 – today Teaching Assistant on 5th and 7th semester course “Data Structures”
(Teaching: Prof. P. A. Mitkas)
Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Greece.
June 2005 – today Project Thesis Co-supervisor (Supervisor: Prof. P. A. Mitkas)
Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Greece.
March 2005 – May 2005 Seminar Lectures on Java Programming (40 hrs.)
undergraduate student seminars “Computer Use and Programming”
Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Greece.
R&D Experience
Mar 2010 – today MICROME – A Knowledge-Based Bioinformatics Framework for Microbial Pathway Genomics
EU Framework Programme 7 Collaborative Project.
Scientific Advisor: Prof. C. A. Ouzounis.
Sep 2005 – Apr 2009 ASSIST – ASsociation Studies assisted by Inference and Semantic Technologies
European Commission IST Program, 2005-2007.
Scientific Advisor: Prof. P. A. Mitkas.
Jul 2004 – Sep 2010 EPEAEK II (nationally funded) – Operational Program for Education and Initial Vocational Training,
Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Greece.
Software Development, Research project: “e-THMMY”.
Scientific Advisor: Prof. P. A. Mitkas.

Research Interests

  • Bioinformatics
  • Grid Computing
  • Distributed / Parallel algorithms
  • Data Mining

Technical skills

  • Programming languages: Java SE, Java EE, C/C++/C#, Perl, Pascal, Visual Basic, Fortran
  • Parallel Programming frameworks: CUDA 2.0, MPI
  • Relational Database Management Systems programming/administration: MS SQL Server, Oracle 9i, MySQL, Postgresql
  • Web development: ASP.NET, HTML, JSP, PHP
  • Rapid Web development/Web authoring: JOOMLA, Drupal
  • Data processing and simulation frameworks: MATLAB, R, Mathematica
  • Grid Computing Framework: Globus Toolkit, gLite (member of HellasGrid / EGEE since 02/2006)

Foreign Languages

– English: Excellent (Cambridge Proficiency, Michigan Proficiency)
– German: Good (Goethe Institut Mittelstufe)

Honors & Awards

  • Academic Excellence Award: for the 5th academic year (2003 – 2004), Technical Chamber of Greece (TEE), Athens, March 2007. Grade: 8.83/10
  • Student Grant: 1st European Summer School in Knowledge Discovery for Ubiquitous Computing, Dept. Computer Science VIII, University of Dortmund, 14-16 September, 2006.
  • Undergraduate degree ranking: 9th (July 2004). Grade: 8.17 / 10, Department of Electrical and Computer Engineering, Faculty of Engineering, Aristotle University of Thessaloniki, Greece

Memberships

  • Member of the Editorial Review Board of the International Journal of Systems Biology and Biomedicine Technologies (IJSBBT). (12/2010 – 12/2013)
  • Member of the Hellenic Society for Computational Biology and Bioinformatics (HSCBB). (10/2006 – today)
  • IEEE and Computer Society Member. (11/2003 – today)
  • Member of TEE, Technical Chamber of Greece. (11/2004 – today)
  • Honorary member of “Apeiresios”, the Postgraduate Student Union. (12/2004 – today)

2019

Conference Papers

Maria Kotouza, Fotis Psomopoulos and Periklis A. Mitkas
"A Dockerized String Analysis Workflow for Big Data"
New Trends in Databases and Information Systems, pp. 564-569, Springer International Publishing, Cham, 2019 Sep

Nowadays, a wide range of sciences are moving towards the Big Data era, producing large volumes of data that require processing for new knowledge extraction. Scientific workflows are often the key tools for solving problems characterized by computational complexity and data diversity, whereas cloud computing can effectively facilitate their efficient execution. In this paper, we present a generative big data analysis workflow that can provide analytics, clustering, prediction and visualization services to datasets coming from various scientific fields, by transforming input data into strings. The workflow consists of novel algorithms for data processing and relationship discovery, that are scalable and suitable for cloud infrastructures. Domain experts can interact with the workflow components, set their parameters, run personalized pipelines and have support for decision-making processes. As case studies in this paper, two datasets consisting of (i) Documents and (ii) Gene sequence data are used, showing promising results in terms of efficiency and performance.

@inproceedings{Kotouza19NTDIS,
author={Maria Kotouza and Fotis Psomopoulos and Periklis A. Mitkas},
title={A Dockerized String Analysis Workflow for Big Data},
booktitle={New Trends in Databases and Information Systems},
pages={564-569},
publisher={Springer International Publishing},
address={Cham},
year={2019},
month={09},
date={2019-09-01},
doi={https://doi.org/10.1007/978-3-030-30278-8_55},
isbn={978-3-030-30278-8},
url={https://link.springer.com/chapter/10.1007%2F978-3-030-30278-8_55},
abstract={Nowadays, a wide range of sciences are moving towards the Big Data era, producing large volumes of data that require processing for new knowledge extraction. Scientific workflows are often the key tools for solving problems characterized by computational complexity and data diversity, whereas cloud computing can effectively facilitate their efficient execution. In this paper, we present a generative big data analysis workflow that can provide analytics, clustering, prediction and visualization services to datasets coming from various scientific fields, by transforming input data into strings. The workflow consists of novel algorithms for data processing and relationship discovery, that are scalable and suitable for cloud infrastructures. Domain experts can interact with the workflow components, set their parameters, run personalized pipelines and have support for decision-making processes. As case studies in this paper, two datasets consisting of (i) Documents and (ii) Gene sequence data are used, showing promising results in terms of efficiency and performance.}
}

2018

Conference Papers

Sotirios-Filippos Tsarouchis, Maria Th. Kotouza, Fotis E. Psomopoulos and Pericles A. Mitkas
"A Multi-metric Algorithm for Hierarchical Clustering of Same-Length Protein Sequences"
IFIP International Conference on Artificial Intelligence Applications and Innovations, pp. 189-199, Springer, Cham, 2018 May

The identification of meaningful groups of proteins has always been a major area of interest for structural and functional genomics. Successful protein clustering can lead to significant insight, assisting in both tracing the evolutionary history of the respective molecules as well as in identifying potential functions and interactions of novel sequences. Here we propose a clustering algorithm for same-length sequences, which allows the construction of subset hierarchy and facilitates the identification of the underlying patterns for any given subset. The proposed method utilizes the metrics of sequence identity and amino-acid similarity simultaneously as direct measures. The algorithm was applied on a real-world dataset consisting of clonotypic immunoglobulin (IG) sequences from Chronic lymphocytic leukemia (CLL) patients, showing promising results.

@inproceedings{2018Tsarouchis,
author={Sotirios-Filippos Tsarouchis and Maria Th. Kotouza and Fotis E. Psomopoulos and Pericles A. Mitkas},
title={A Multi-metric Algorithm for Hierarchical Clustering of Same-Length Protein Sequences},
booktitle={IFIP International Conference on Artificial Intelligence Applications and Innovations},
pages={189-199},
publisher={Springer},
address={Cham},
year={2018},
month={05},
date={2018-05-22},
doi={https://doi.org/10.1007/978-3-319-92016-0_18},
isbn={978-3-319-92016-0},
abstract={The identification of meaningful groups of proteins has always been a major area of interest for structural and functional genomics. Successful protein clustering can lead to significant insight, assisting in both tracing the evolutionary history of the respective molecules as well as in identifying potential functions and interactions of novel sequences. Here we propose a clustering algorithm for same-length sequences, which allows the construction of subset hierarchy and facilitates the identification of the underlying patterns for any given subset. The proposed method utilizes the metrics of sequence identity and amino-acid similarity simultaneously as direct measures. The algorithm was applied on a real-world dataset consisting of clonotypic immunoglobulin (IG) sequences from Chronic lymphocytic leukemia (CLL) patients, showing promising results.}
}

Maria Th. Kotouza, Konstantinos N. Vavliakis, Fotis E. Psomopoulos and Pericles A. Mitkas
"A Hierarchical Multi-Metric Framework for Item Clustering"
5th International Conference on Big Data Computing Applications and Technologies, pp. 191-197, IEEE/ACM, Zurich, Switzerland, 2018 Dec

Item clustering is commonly used for dimensionality reduction, uncovering item similarities and connections, gaining insights of the market structure and recommendations. Hierarchical clustering methods produce a hierarchy structure along with the clusters that can be useful for managing item categories and sub-categories, dealing with indirect competition and new item categorization as well. Nevertheless, baseline hierarchical clustering algorithms have high computational cost and memory usage. In this paper we propose an innovative scalable hierarchical clustering framework, which overcomes these limitations. Our work consists of a binary tree construction algorithm that creates a hierarchy of the items using three metrics, a) Identity, b) Similarity and c) Entropy, as well as a branch breaking algorithm which composes the final clusters by applying thresholds to each branch of the tree. The proposed framework is evaluated on the popular MovieLens 20M dataset achieving significant reduction in both memory consumption and computational time over a baseline hierarchical clustering algorithm.

@inproceedings{KotouzaVPM18,
author={Maria Th. Kotouza and Konstantinos N. Vavliakis and Fotis E. Psomopoulos and Pericles A. Mitkas},
title={A Hierarchical Multi-Metric Framework for Item Clustering},
booktitle={5th International Conference on Big Data Computing Applications and Technologies},
pages={191-197},
publisher={IEEE/ACM},
address={Zurich, Switzerland},
year={2018},
month={12},
date={2018-12-17},
url={http://issel.ee.auth.gr/wp-content/uploads/2019/02/BDCAT_2018_paper_24_Proceedings.pdf},
doi={https://doi.org/10.1109/BDCAT.2018.00031},
abstract={Item clustering is commonly used for dimensionality reduction, uncovering item similarities and connections, gaining insights of the market structure and recommendations. Hierarchical clustering methods produce a hierarchy structure along with the clusters that can be useful for managing item categories and sub-categories, dealing with indirect competition and new item categorization as well. Nevertheless, baseline hierarchical clustering algorithms have high computational cost and memory usage. In this paper we propose an innovative scalable hierarchical clustering framework, which overcomes these limitations. Our work consists of a binary tree construction algorithm that creates a hierarchy of the items using three metrics, a) Identity, b) Similarity and c) Entropy, as well as a branch breaking algorithm which composes the final clusters by applying thresholds to each branch of the tree. The proposed framework is evaluated on the popular MovieLens 20M dataset achieving significant reduction in both memory consumption and computational time over a baseline hierarchical clustering algorithm.}
}

2017

Journal Articles

Athanassios M. Kintsakis, Fotis E. Psomopoulos, Andreas L. Symeonidis and Pericles A. Mitkas
"Hermes: Seamless delivery of containerized bioinformatics workflows in hybrid cloud (HTC) environments"
SoftwareX, 6, pp. 217-224, 2017 Sep

Hermes introduces a new “describe once, run anywhere” paradigm for the execution of bioinformatics workflows in hybrid cloud environments. It combines the traditional features of parallelization-enabled workflow management systems and of distributed computing platforms in a container-based approach. It offers seamless deployment, overcoming the burden of setting up and configuring the software and network requirements. Most importantly, Hermes fosters the reproducibility of scientific workflows by supporting standardization of the software execution environment, thus leading to consistent scientific workflow results and accelerating scientific output.

@article{SOFTX89,
author={Athanassios M. Kintsakis and Fotis E. Psomopoulos and Andreas L. Symeonidis and Pericles A. Mitkas},
title={Hermes: Seamless delivery of containerized bioinformatics workflows in hybrid cloud (HTC) environments},
journal={SoftwareX},
volume={6},
pages={217-224},
year={2017},
month={09},
date={2017-09-19},
url={http://www.sciencedirect.com/science/article/pii/S2352711017300304},
doi={https://doi.org/10.1016/j.softx.2017.07.007},
keywords={Bioinformatics;hybrid cloud;scientific workflows;distributed computing},
abstract={Hermes introduces a new “describe once, run anywhere” paradigm for the execution of bioinformatics workflows in hybrid cloud environments. It combines the traditional features of parallelization-enabled workflow management systems and of distributed computing platforms in a container-based approach. It offers seamless deployment, overcoming the burden of setting up and configuring the software and network requirements. Most importantly, Hermes fosters the reproducibility of scientific workflows by supporting standardization of the software execution environment, thus leading to consistent scientific workflow results and accelerating scientific output.}
}

Cezary Zielinski, Maciej Stefanczyk, Tomasz Kornuta, Maksym Figat, Wojciech Dudek, Wojciech Szynkiewicz, Wlodzimierz Kasprzak, Jan Figat, Marcin Szlenk, Tomasz Winiarski, Konrad Banachowicz, Teresa Zielinska, Emmanouil G. Tsardoulias, Andreas L. Symeonidis, Fotis E. Psomopoulos, Athanassios M. Kintsakis, Pericles A. Mitkas, Aristeidis Thallas, Sofia E. Reppou, George T. Karagiannis, Konstantinos Panayiotou, Vincent Prunet, Manuel Serrano, Jean-Pierre Merlet, Stratos Arampatzis, Alexandros Giokas, Lazaros Penteridis, Ilias Trochidis, David Daney and Miren Iturburu
"Variable structure robot control systems: The RAPP approach"
Robotics and Autonomous Systems, 94, pp. 226-244, 2017 May

This paper presents a method of designing variable structure control systems for robots. As the on-board robot computational resources are limited, but in some cases the demands imposed on the robot by the user are virtually limitless, the solution is to produce a variable structure system. The task dependent part has to be exchanged, however the task governs the activities of the robot. Thus not only exchange of some task-dependent modules is required, but also supervisory responsibilities have to be switched. Such control systems are necessary in the case of robot companions, where the owner of the robot may demand from it to provide many services.

@article{Zielnski2017,
author={Cezary Zielinski and Maciej Stefanczyk and Tomasz Kornuta and Maksym Figat and Wojciech Dudek and Wojciech Szynkiewicz and Wlodzimierz Kasprzak and Jan Figat and Marcin Szlenk and Tomasz Winiarski and Konrad Banachowicz and Teresa Zielinska and Emmanouil G. Tsardoulias and Andreas L. Symeonidis and Fotis E. Psomopoulos and Athanassios M. Kintsakis and Pericles A. Mitkas and Aristeidis Thallas and Sofia E. Reppou and George T. Karagiannis and Konstantinos Panayiotou and Vincent Prunet and Manuel Serrano and Jean-Pierre Merlet and Stratos Arampatzis and Alexandros Giokas and Lazaros Penteridis and Ilias Trochidis and David Daney and Miren Iturburu},
title={Variable structure robot control systems: The RAPP approach},
journal={Robotics and Autonomous Systems},
volume={94},
pages={226-244},
year={2017},
month={05},
date={2017-05-05},
url={http://www.sciencedirect.com/science/article/pii/S0921889016306248},
doi={https://doi.org/10.1016/j.robot.2017.05.002},
keywords={robot controllers;variable structure controllers;cloud robotics;RAPP},
abstract={This paper presents a method of designing variable structure control systems for robots. As the on-board robot computational resources are limited, but in some cases the demands imposed on the robot by the user are virtually limitless, the solution is to produce a variable structure system. The task dependent part has to be exchanged, however the task governs the activities of the robot. Thus not only exchange of some task-dependent modules is required, but also supervisory responsibilities have to be switched. Such control systems are necessary in the case of robot companions, where the owner of the robot may demand from it to provide many services.}
}

2016

Journal Articles

Michael Chatzidimopoulos, Fotis Psomopoulos, Emmanouil Malandrakis, Ioannis Ganopoulos, Panagiotis Madesis, Evangelos Vellios and Pavlidis Drogoudi
"Comparative Genomics of Botrytis cinerea Strains with Differential Multi-Drug Resistance"
Frontiers in Plant Science, 2016 Apr

Botrytis cinerea is a ubiquitous fungus difficult to control because it possesses a variety of attack modes, diverse hosts as inoculum sources, and it can survive as mycelia and/or conidia or for extended periods as sclerotia in crop debris. For these reasons the use of any single control measure is unlikely to succeed and a combination of cultural practices with the application of site-specific synthetic compounds provides the best protection for the crops (Williamson et al., 2007). However, the chemical control has been adversely affected by the development of fungicide resistance. The selection of resistant individuals in a fungal population subjected to selective pressure due to fungicides is an evolutionary mechanism that promotes advantageous genotypes (Walker et al., 2013). High levels of resistance to site-specific fungicides are commonly associated with point mutations. For example, the mutations G143A, H272R, and F412S leading to changes in the target proteins CytB, SdhB, and Erg27 are conferring resistance of the pathogen to the chemical classes of QoIs, SDHIs, and hydroxyanilides, respectively (Leroux, 2007). Multidrug resistance is another mechanism associated with resistance in B. cinerea which involves mutations leading to overexpression of individual transporters such as ABC and MFS (Kretschmer et al., 2009). This mechanism is associated with low levels of resistance to multiple fungicides including the anilinopyrimidines and phenylpyrroles. However, a subdivision of gray mold populations was found to be more tolerant to these two classes of fungicides (Leroch et al., 2013). Previous reports have clearly demonstrated that the resistance to anilinopyrimidines has a qualitative, disruptive pattern, and is monogenically controlled (Chapeland et al., 1999).
In order to elucidate the mechanism of the resistance, the whole genome of three different samples (gene pools) was sequenced, each containing DNA of 10 selected strains of the same genotype regarding resistance to seven different classes of fungicides including anilinopyrimidines. This report presents the publicly available genomic data.

@article{2016ChatzidimopoulosFPS,
author={Michael Chatzidimopoulos and Fotis Psomopoulos and Emmanouil Malandrakis and Ioannis Ganopoulos and Panagiotis Madesis and Evangelos Vellios and Pavlidis Drogoudi},
title={Comparative Genomics of Botrytis cinerea Strains with Differential Multi-Drug Resistance},
journal={Frontiers in Plant Science},
year={2016},
month={04},
date={2016-04-28},
url={http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4849417/pdf/fpls-07-00554.pdf},
abstract={Botrytis cinerea is a ubiquitous fungus difficult to control because it possesses a variety of attack modes, diverse hosts as inoculum sources, and it can survive as mycelia and/or conidia or for extended periods as sclerotia in crop debris. For these reasons the use of any single control measure is unlikely to succeed and a combination of cultural practices with the application of site-specific synthetic compounds provides the best protection for the crops (Williamson et al., 2007). However, the chemical control has been adversely affected by the development of fungicide resistance. The selection of resistant individuals in a fungal population subjected to selective pressure due to fungicides is an evolutionary mechanism that promotes advantageous genotypes (Walker et al., 2013). High levels of resistance to site-specific fungicides are commonly associated with point mutations. For example, the mutations G143A, H272R, and F412S leading to changes in the target proteins CytB, SdhB, and Erg27 are conferring resistance of the pathogen to the chemical classes of QoIs, SDHIs, and hydroxyanilides, respectively (Leroux, 2007). Multidrug resistance is another mechanism associated with resistance in B. cinerea which involves mutations leading to overexpression of individual transporters such as ABC and MFS (Kretschmer et al., 2009). This mechanism is associated with low levels of resistance to multiple fungicides including the anilinopyrimidines and phenylpyrroles. However, a subdivision of gray mold populations was found to be more tolerant to these two classes of fungicides (Leroch et al., 2013). Previous reports have clearly demonstrated that the resistance to anilinopyrimidines has a qualitative, disruptive pattern, and is monogenically controlled (Chapeland et al., 1999).
In order to elucidate the mechanism of the resistance, the whole genome of three different samples (gene pools) was sequenced, each containing DNA of 10 selected strains of the same genotype regarding resistance to seven different classes of fungicides including anilinopyrimidines. This report presents the publicly available genomic data.}
}

Sofia E. Reppou, Emmanouil G. Tsardoulias, Athanassios M. Kintsakis, Andreas Symeonidis, Pericles A. Mitkas, Fotis E. Psomopoulos, George T. Karagiannis, Cezary Zielinski, Vincent Prunet, Jean-Pierre Merlet, Miren Iturburu and Alexandros Gkiokas
"RAPP: A robotic-oriented ecosystem for delivering smart user empowering applications for older people"
Journal of Social Robotics, pp. 15, 2016 Jun

It is a general truth that increase of age is associated with a level of mental and physical decline but unfortunately the former are often accompanied by social exclusion leading to marginalization and eventually further acceleration of the aging process. A new approach in alleviating the social exclusion of older people involves the use of assistive robots. As robots rapidly invade everyday life, the need of new software paradigms in order to address the user’s unique needs becomes critical. In this paper we present a novel architectural design, the RAPP [a software platform to deliver smart, user empowering robotic applications (RApps)] framework that attempts to address this issue. The proposed framework has been designed in a cloud-based approach, integrating robotic devices and their respective applications. We aim to facilitate seamless development of RApps compatible with a wide range of supported robots and available to the public through a unified online store.

@article{2016ReppouJSR,
author={Sofia E. Reppou and Emmanouil G. Tsardoulias and Athanassios M. Kintsakis and Andreas Symeonidis and Pericles A. Mitkas and Fotis E. Psomopoulos and George T. Karagiannis and Cezary Zielinski and Vincent Prunet and Jean-Pierre Merlet and Miren Iturburu and Alexandros Gkiokas},
title={RAPP: A robotic-oriented ecosystem for delivering smart user empowering applications for older people},
journal={Journal of Social Robotics},
pages={15},
year={2016},
month={06},
date={2016-06-18},
url={http://issel.ee.auth.gr/wp-content/uploads/2017/01/RAPP-A-Robotic-Oriented-Ecosystem-for-Delivering-Smart-User-Empowering-Applications-for-Older-People.pdf},
doi={https://doi.org/10.1007/s10515-016-0206-x},
abstract={It is a general truth that increase of age is associated with a level of mental and physical decline but unfortunately the former are often accompanied by social exclusion leading to marginalization and eventually further acceleration of the aging process. A new approach in alleviating the social exclusion of older people involves the use of assistive robots. As robots rapidly invade everyday life, the need of new software paradigms in order to address the user’s unique needs becomes critical. In this paper we present a novel architectural design, the RAPP [a software platform to deliver smart, user empowering robotic applications (RApps)] framework that attempts to address this issue. The proposed framework has been designed in a cloud-based approach, integrating robotic devices and their respective applications. We aim to facilitate seamless development of RApps compatible with a wide range of supported robots and available to the public through a unified online store.}
}

Emmanouil Tsardoulias, Athanassios Kintsakis, Konstantinos Panayiotou, Aristeidis Thallas, Sofia Reppou, George Karagiannis, Miren Iturburu, Stratos Arampatzis, Cezary Zielinski, Vincent Prunet, Fotis Psomopoulos, Andreas Symeonidis and Pericles Mitkas
"Towards an integrated robotics architecture for social inclusion – The RAPP paradigm"
Cognitive Systems Research, pp. 1-8, 2016 Sep

Scientific breakthroughs have led to an increase in life expectancy, to the point where senior citizens comprise an ever increasing percentage of the general population. In this direction, the EU funded RAPP project “Robotic Applications for Delivering Smart User Empowering Applications” introduces socially interactive robots that will not only physically assist, but also serve as a companion to senior citizens. The proposed RAPP framework has been designed aiming towards a cloud-based integrated approach that enables robotic devices to seamlessly deploy robotic applications, relieving the actual robots from computational burdens. The Robotic Applications (RApps) developed according to the RAPP paradigm will empower consumer social robots, allowing them to adapt to versatile situations and materialize complex behaviors and scenarios. The RAPP pilot cases involve the development of RApps for the NAO humanoid robot and the ANG-MED rollator targeting senior citizens that (a) are technology illiterate, (b) have been diagnosed with mild cognitive impairment or (c) are in the process of hip fracture rehabilitation. Initial results establish the robustness of RAPP in addressing the needs of end users and developers, as well as its contribution in significantly increasing the quality of life of senior citizens.

@article{2016TsardouliasCSR,
author={Emmanouil Tsardoulias and Athanassios Kintsakis and Konstantinos Panayiotou and Aristeidis Thallas and Sofia Reppou and George Karagiannis and Miren Iturburu and Stratos Arampatzis and Cezary Zielinski and Vincent Prunet and Fotis Psomopoulos and Andreas Symeonidis and Pericles Mitkas},
title={Towards an integrated robotics architecture for social inclusion – The RAPP paradigm},
journal={Cognitive Systems Research},
pages={1-8},
year={2016},
month={09},
date={2016-09-03},
url={http://issel.ee.auth.gr/wp-content/uploads/2016/09/COGSYS_2016_R1.pdf},
abstract={Scientific breakthroughs have led to an increase in life expectancy, to the point where senior citizens comprise an ever increasing percentage of the general population. In this direction, the EU funded RAPP project “Robotic Applications for Delivering Smart User Empowering Applications” introduces socially interactive robots that will not only physically assist, but also serve as a companion to senior citizens. The proposed RAPP framework has been designed aiming towards a cloud-based integrated approach that enables robotic devices to seamlessly deploy robotic applications, relieving the actual robots from computational burdens. The Robotic Applications (RApps) developed according to the RAPP paradigm will empower consumer social robots, allowing them to adapt to versatile situations and materialize complex behaviors and scenarios. The RAPP pilot cases involve the development of RApps for the NAO humanoid robot and the ANG-MED rollator targeting senior citizens that (a) are technology illiterate, (b) have been diagnosed with mild cognitive impairment or (c) are in the process of hip fracture rehabilitation. Initial results establish the robustness of RAPP in addressing the needs of end users and developers, as well as its contribution in significantly increasing the quality of life of senior citizens.}
}

Aliki Xanthopoulou, Fotis Psomopoulos, Ioannis Ganopoulos, Maria Manioudaki, Athanasios Tsaftaris, Irini Nianiou-Obeidat and Panagiotis Madesis
"De novo transcriptome assembly of two contrasting pumpkin cultivars"
Genomics Data, pp. 200-201, 2016 Jan

Cucurbita pepo (squash, pumpkin, gourd), a worldwide-cultivated vegetable of American origin, is extremely variable in fruit characteristics. However, the information associated with genes and genetic markers for pumpkin is very limited. In order to identify new genes and to develop genetic markers, we performed a transcriptome analysis (RNA-Seq) of two contrasting pumpkin cultivars. Leaves and female flowers of cultivars,

@article{2016XanthopoulouGD,
author={Aliki Xanthopoulou and Fotis Psomopoulos and Ioannis Ganopoulos and Maria Manioudaki and Athanasios Tsaftaris and Irini Nianiou-Obeidat and Panagiotis Madesis},
title={De novo transcriptome assembly of two contrasting pumpkin cultivars},
journal={Genomics Data},
pages={200-201},
year={2016},
month={01},
date={2016-01-15},
url={http://issel.ee.auth.gr/wp-content/uploads/2017/01/De-novo-transcriptome-assembly-of-two-contrasting-pumpkin-cultivars.pdf},
abstract={Cucurbita pepo (squash, pumpkin, gourd), a worldwide-cultivated vegetable of American origin, is extremely variable in fruit characteristics. However, the information associated with genes and genetic markers for pumpkin is very limited. In order to identify new genes and to develop genetic markers, we performed a transcriptome analysis (RNA-Seq) of two contrasting pumpkin cultivars. Leaves and female flowers of cultivars,}
}

2016

Conference Papers

Fotis Psomopoulos, Athanassios Kintsakis and Pericles Mitkas
"A pan-genome approach and application to species with photosynthetic capabilities"
15th European Conference on Computational Biology, The Hague, Netherlands, 2016 Sep

The abundance of genome data being produced by the new sequencing techniques is providing the opportunity to investigate gene diversity at a new level. A pan-genome analysis can provide the framework for estimating the genomic diversity of the data set at hand and give insights towards the understanding of its observed characteristics. Currently, there exist several tools for pan-genome studies, mostly focused on prokaryote genomes and their respective attributes. Here we provide a systematic approach for constructing the groups inherently associated with a pan-genome analysis, using the complete proteome data of photosynthetic genomes as the driving case study. As opposed to similar studies, the presented method requires a complete information system (i.e. complete genomes) in order to produce meaningful results. The method was applied to 95 genomes with photosynthetic capabilities, including cyanobacteria and green plants, as retrieved from UniProt and Plaza. Due to the significant computational requirements of the analysis, we utilized the Federated Cloud computing resources provided by the EGI infrastructure. The analysis ultimately produced 37,680 protein families, with a core genome comprising of 102 families. An investigation of the families’ distribution revealed two underlying but expected subsets, roughly corresponding to bacteria and eukaryotes. Finally, an automated functional annotation of the produced clusters, through assignment of PFAM domains to the participating protein sequences, allowed the identification of the key characteristics present in the core genome, as well as of selected multi-member families.

@inproceedings{2016PsomopoulosECCB,
author={Fotis Psomopoulos and Athanassios Kintsakis and Pericles Mitkas},
title={A pan-genome approach and application to species with photosynthetic capabilities},
booktitle={15th European Conference on Computational Biology},
address={The Hague, Netherlands},
year={2016},
month={09},
date={2016-09-01},
abstract={The abundance of genome data being produced by the new sequencing techniques is providing the opportunity to investigate gene diversity at a new level. A pan-genome analysis can provide the framework for estimating the genomic diversity of the data set at hand and give insights towards the understanding of its observed characteristics. Currently, there exist several tools for pan-genome studies, mostly focused on prokaryote genomes and their respective attributes. Here we provide a systematic approach for constructing the groups inherently associated with a pan-genome analysis, using the complete proteome data of photosynthetic genomes as the driving case study. As opposed to similar studies, the presented method requires a complete information system (i.e. complete genomes) in order to produce meaningful results. The method was applied to 95 genomes with photosynthetic capabilities, including cyanobacteria and green plants, as retrieved from UniProt and Plaza. Due to the significant computational requirements of the analysis, we utilized the Federated Cloud computing resources provided by the EGI infrastructure. The analysis ultimately produced 37,680 protein families, with a core genome comprising of 102 families. An investigation of the families’ distribution revealed two underlying but expected subsets, roughly corresponding to bacteria and eukaryotes. Finally, an automated functional annotation of the produced clusters, through assignment of PFAM domains to the participating protein sequences, allowed the identification of the key characteristics present in the core genome, as well as of selected multi-member families.}
}

Emmanouil Stergiadis, Athanassios Kintsakis, Fotis Psomopoulos and Pericles A. Mitkas
"A scalable Grid Computing framework for extensible phylogenetic profile construction"
12th International Conference on Artificial Intelligence Applications and Innovations, pp. 455-462, Thessaloniki, Greece, 2016 Sep

Current research in Life Sciences without doubt has been established as a Big Data discipline. Beyond the expected domain-specific requirements, this perspective has put scalability as one of the most crucial aspects of any state-of-the-art bioinformatics framework. Sequence alignment and construction of phylogenetic profiles are common tasks evident in a wide range of life science analyses as, given an arbitrarily large volume of genomes, they can provide useful insights on the functionality and relationships of the involved entities. This process is often a computational bottleneck in existing solutions, due to its inherent complexity. Our proposed distributed framework manages to perform both tasks with significant speed-up by employing Grid Computing resources provided by EGI in an efficient and optimal manner. The overall workflow is both fully automated, thus making it user friendly, and fully detached from the end user's terminal, since all computations take place on Grid worker nodes.

@inproceedings{2016Stergiadis,
author={Emmanouil Stergiadis and Athanassios Kintsakis and Fotis Psomopoulos and Pericles A. Mitkas},
title={A scalable Grid Computing framework for extensible phylogenetic profile construction},
booktitle={12th International Conference on Artificial Intelligence Applications and Innovations},
pages={455-462},
address={Thessaloniki, Greece},
year={2016},
month={09},
date={2016-09-02},
abstract={Current research in Life Sciences without doubt has been established as a Big Data discipline. Beyond the expected domain-specific requirements, this perspective has put scalability as one of the most crucial aspects of any state-of-the-art bioinformatics framework. Sequence alignment and construction of phylogenetic profiles are common tasks evident in a wide range of life science analyses as, given an arbitrarily large volume of genomes, they can provide useful insights on the functionality and relationships of the involved entities. This process is often a computational bottleneck in existing solutions, due to its inherent complexity. Our proposed distributed framework manages to perform both tasks with significant speed-up by employing Grid Computing resources provided by EGI in an efficient and optimal manner. The overall workflow is both fully automated, thus making it user friendly, and fully detached from the end user's terminal, since all computations take place on Grid worker nodes.}
}

2015

Journal Articles

Alfonso M Duarte, Fotis Psomopoulos, Christophe Blanchet, Alexandre M Bonvin, Manuel Corpas, Alain Franc, Rafael C Jimenez, Jesus M de Lucas, Tommi Nyrönen, Gergely Sipos and Stephanie B Suhr
"Future opportunities and trends for e-infrastructures and life sciences: going beyond the grid to enable life science data analysis"
Frontiers in Genetics, Vol. 6, No. 197, 2015 Jun

With the increasingly rapid growth of data in life sciences we are witnessing a major transition in the way research is conducted, from hypothesis-driven studies to data-driven simulations of whole systems. Such approaches necessitate the use of large-scale computational resources and e-infrastructures, such as the European Grid Infrastructure (EGI). EGI, one of the key enablers of the digital European Research Area, is a federation of resource providers set up to deliver sustainable, integrated and secure computing services to European researchers and their international partners. Here we aim to provide the state of the art of Grid/Cloud computing in EU research as viewed from within the field of life sciences, focusing on key infrastructures and projects within the life sciences community. Rather than focusing purely on the technical aspects underlying the currently provided solutions, we outline the design aspects and key characteristics that can be identified across major research approaches. Overall, we aim to provide significant insights into the road ahead by establishing ever-strengthening connections between EGI as a whole and the life sciences community.

@article{2015DuarteFG,
author={Alfonso M Duarte and Fotis Psomopoulos and Christophe Blanchet and Alexandre M Bonvin and Manuel Corpas and Alain Franc and Rafael C Jimenez and Jesus M de Lucas and Tommi Nyrönen and Gergely Sipos and Stephanie B Suhr},
title={Future opportunities and trends for e-infrastructures and life sciences: going beyond the grid to enable life science data analysis},
journal={Frontiers in Genetics},
volume={6},
number={197},
year={2015},
month={06},
date={2015-06-23},
url={http://issel.ee.auth.gr/wp-content/uploads/2017/01/Future-opportunities-and-trends-for-e-infrastructures-and-life-sciences-going-beyond-the-grid-to-enable-life-science-data-analysis.pdf},
abstract={With the increasingly rapid growth of data in life sciences we are witnessing a major transition in the way research is conducted, from hypothesis-driven studies to data-driven simulations of whole systems. Such approaches necessitate the use of large-scale computational resources and e-infrastructures, such as the European Grid Infrastructure (EGI). EGI, one of the key enablers of the digital European Research Area, is a federation of resource providers set up to deliver sustainable, integrated and secure computing services to European researchers and their international partners. Here we aim to provide the state of the art of Grid/Cloud computing in EU research as viewed from within the field of life sciences, focusing on key infrastructures and projects within the life sciences community. Rather than focusing purely on the technical aspects underlying the currently provided solutions, we outline the design aspects and key characteristics that can be identified across major research approaches. Overall, we aim to provide significant insights into the road ahead by establishing ever-strengthening connections between EGI as a whole and the life sciences community.}
}

Dimitrios Vitsios, Fotis Psomopoulos, Pericles Mitkas and Christos Ouzounis
"Inference of pathway decomposition across multiple species through gene clustering"
International Journal on Artificial Intelligence Tools, 24, pp. 25, 2015 Feb

In the wake of gene-oriented data analysis in large-scale bioinformatics studies, focus in research is currently shifting towards the analysis of the functional association of genes, namely the metabolic pathways in which genes participate. The goal of this paper is to attempt to identify the core genes in a specific pathway, based on a user-defined selection of genomes. To this end, a novel algorithm has been developed that uses data from the KEGG database, and through the application of the MCL clustering algorithm, identifies clusters that correspond to different “layers” of genes, either on a phylogenetic or a functional level. The algorithm

@article{2015vitsiosIJAIT,
author={Dimitrios Vitsios and Fotis Psomopoulos and Pericles Mitkas and Christos Ouzounis},
title={Inference of pathway decomposition across multiple species through gene clustering},
journal={International Journal on Artificial Intelligence Tools},
volume={24},
pages={25},
year={2015},
month={02},
date={2015-02-23},
url={http://www.worldscientific.com/doi/pdf/10.1142/S0218213015400035},
abstract={In the wake of gene-oriented data analysis in large-scale bioinformatics studies, focus in research is currently shifting towards the analysis of the functional association of genes, namely the metabolic pathways in which genes participate. The goal of this paper is to attempt to identify the core genes in a specific pathway, based on a user-defined selection of genomes. To this end, a novel algorithm has been developed that uses data from the KEGG database, and through the application of the MCL clustering algorithm, identifies clusters that correspond to different “layers” of genes, either on a phylogenetic or a functional level. The algorithm}
}

2015

Conference Papers

Fotis Psomopoulos, Olga Vrousgou and Pericles A. Mitkas
"Large-scale modular comparative genomics: the Grid approach"
23rd Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) / 14th European Conference on Computational Biology (ECCB), 2015 Jul

@conference{2015PsomopoulosAICISMB,
author={Fotis Psomopoulos and Olga Vrousgou and Pericles A. Mitkas},
title={Large-scale modular comparative genomics: the Grid approach},
booktitle={23rd Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) / 14th European Conference on Computational Biology (ECCB)},
year={2015},
month={07},
date={2015-07-26},
url={http://issel.ee.auth.gr/wp-content/uploads/2017/01/Large-scale-modular-comparative-genomics-the-Grid-approach.pdf}
}

Olga Vrousgou, Fotis Psomopoulos and Pericles Mitkas
"A grid-enabled modular framework for efficient sequence analysis workflows"
16th International Conference on Engineering Applications of Neural Networks, Island of Rhodes, 2015 Oct

In the era of Big Data in Life Sciences, efficient processing and analysis of vast amounts of sequence data is becoming an ever daunting challenge. Among such analyses, sequence alignment is one of the most commonly used procedures, as it provides useful insights on the functionality and relationship of the involved entities. Sequence alignment is one of the most common computational bottlenecks in several bioinformatics workflows. We have designed and implemented a time-efficient distributed modular application for sequence alignment, phylogenetic profiling and clustering of protein sequences, by utilizing the European Grid Infrastructure. The optimal utilization of the Grid with regards to the respective modules, allowed us to achieve significant speedups to the order of 1400%.

@conference{2015VrousgouICEANN,
author={Olga Vrousgou and Fotis Psomopoulos and Pericles Mitkas},
title={A grid-enabled modular framework for efficient sequence analysis workflows},
booktitle={16th International Conference on Engineering Applications of Neural Networks},
address={Island of Rhodes},
year={2015},
month={10},
date={2015-10-22},
url={http://issel.ee.auth.gr/wp-content/uploads/2017/01/A-Grid-Enabled-Modular-Framework-for-Efficient-Sequence-Analysis-Workflows.pdf},
abstract={In the era of Big Data in Life Sciences, efficient processing and analysis of vast amounts of sequence data is becoming an ever daunting challenge. Among such analyses, sequence alignment is one of the most commonly used procedures, as it provides useful insights on the functionality and relationship of the involved entities. Sequence alignment is one of the most common computational bottlenecks in several bioinformatics workflows. We have designed and implemented a time-efficient distributed modular application for sequence alignment, phylogenetic profiling and clustering of protein sequences, by utilizing the European Grid Infrastructure. The optimal utilization of the Grid with regards to the respective modules, allowed us to achieve significant speedups to the order of 1400%.}
}

2014

Conference Papers

Fotis Psomopoulos, Emmanouil Tsardoulias, Alexandros Giokas, Cezary Zielinski, Vincent Prunet, Ilias Trochidis, David Daney, Manuel Serrano, Ludovic Courtes, Stratos Arampatzis and Pericles A. Mitkas
"RAPP System Architecture, Assistance and Service Robotics in a Human Environment"
International Conference on Intelligent Robots and Systems (IEEE/RSJ), Chicago, Illinois, 2014 Sep

Robots are fast becoming a part of everyday life. This rise can be evidenced both through the public news and announcements, as well as in recent literature in the robotics scientific communities. This expanding development requires new paradigms in producing the necessary software to allow for the users

@conference{2014PsomopoulosIEEE/RSJ,
author={Fotis Psomopoulos and Emmanouil Tsardoulias and Alexandros Giokas and Cezary Zielinski and Vincent Prunet and Ilias Trochidis and David Daney and Manuel Serrano and Ludovic Courtes and Stratos Arampatzis and Pericles A. Mitkas},
title={RAPP System Architecture, Assistance and Service Robotics in a Human Environment},
booktitle={International Conference on Intelligent Robots and Systems (IEEE/RSJ)},
address={Chicago, Illinois},
year={2014},
month={09},
date={2014-09-14},
url={http://issel.ee.auth.gr/wp-content/uploads/2016/02/RAPP-System-Architecture-Assistance-and-Service-Robotics-in-a-Human-Environment.pdf},
abstract={Robots are fast becoming a part of everyday life. This rise can be evidenced both through the public news and announcements, as well as in recent literature in the robotics scientific communities. This expanding development requires new paradigms in producing the necessary software to allow for the users}
}

2013

Journal Articles

Fotis E. Psomopoulos, Pericles A. Mitkas and Christos A. Ouzounis
"Detection of Genomic Idiosyncrasies Using Fuzzy Phylogenetic Profiles"
PLoS ONE, 2013 Jan

Phylogenetic profiles express the presence or absence of genes and their homologs across a number of reference genomes. They have emerged as an elegant representation framework for comparative genomics and have been used for the genome-wide inference and discovery of functionally linked genes or metabolic pathways. As the number of reference genomes grows, there is an acute need for faster and more accurate methods for phylogenetic profile analysis with increased performance in speed and quality. We propose a novel, efficient method for the detection of genomic idiosyncrasies, i.e. sets of genes found in a specific genome with peculiar phylogenetic properties, such as intra-genome correlations or inter-genome relationships. Our algorithm is a four-step process where genome profiles are first defined as fuzzy vectors, then discretized to binary vectors, followed by a de-noising step, and finally a comparison step to generate intra- and inter-genome distances for each gene profile. The method is validated with a carefully selected benchmark set of five reference genomes, using a range of approaches regarding similarity metrics and pre-processing stages for noise reduction. We demonstrate that the fuzzy profile method consistently identifies the actual phylogenetic relationship and origin of the genes under consideration for the majority of the cases, while the detected outliers are found to be particular genes with peculiar phylogenetic patterns. The proposed method provides a time-efficient and highly scalable approach for phylogenetic stratification, with the detected groups of genes being either similar to their own genome profile or different from it, thus revealing atypical evolutionary histories.

@article{2013PsomopoulosPlosOne,
author={Fotis E. Psomopoulos and Pericles A. Mitkas and Christos A. Ouzounis},
title={Detection of Genomic Idiosyncrasies Using Fuzzy Phylogenetic Profiles},
journal={PLoS ONE},
year={2013},
month={01},
date={2013-01-14},
url={http://issel.ee.auth.gr/wp-content/uploads/2015/06/journal.pone_.0052854.pdf},
abstract={Phylogenetic profiles express the presence or absence of genes and their homologs across a number of reference genomes. They have emerged as an elegant representation framework for comparative genomics and have been used for the genome-wide inference and discovery of functionally linked genes or metabolic pathways. As the number of reference genomes grows, there is an acute need for faster and more accurate methods for phylogenetic profile analysis with increased performance in speed and quality. We propose a novel, efficient method for the detection of genomic idiosyncrasies, i.e. sets of genes found in a specific genome with peculiar phylogenetic properties, such as intra-genome correlations or inter-genome relationships. Our algorithm is a four-step process where genome profiles are first defined as fuzzy vectors, then discretized to binary vectors, followed by a de-noising step, and finally a comparison step to generate intra- and inter-genome distances for each gene profile. The method is validated with a carefully selected benchmark set of five reference genomes, using a range of approaches regarding similarity metrics and pre-processing stages for noise reduction. We demonstrate that the fuzzy profile method consistently identifies the actual phylogenetic relationship and origin of the genes under consideration for the majority of the cases, while the detected outliers are found to be particular genes with peculiar phylogenetic patterns. The proposed method provides a time-efficient and highly scalable approach for phylogenetic stratification, with the detected groups of genes being either similar to their own genome profile or different from it, thus revealing atypical evolutionary histories.}
}

2012

Journal Articles

Fotis Psomopoulos, Victoria Siarkou, Nikolas Papanikolaou, Ioannis Iliopoulos, Athanasios Tsaftaris, Vasilis Promponas and Christos Ouzounis
"The Chlamydiales Pangenome Revisited: Structural Stability and Functional Coherence"
Genes, Vol. 3, No. 2, pp. 291-319, 2012 May

The entire publicly available set of 37 genome sequences from the bacterial order Chlamydiales has been subjected to comparative analysis in order to reveal the salient features of this pangenome and its evolutionary history. Over 2,000 protein families are detected across multiple species, with a distribution consistent to other studied pangenomes. Of these, there are 180 protein families with multiple members, 312 families with exactly 37 members corresponding to core genes, 428 families with peripheral genes with varying taxonomic distribution and finally 1,125 smaller families. The fact that, even for smaller genomes of Chlamydiales, core genes represent over a quarter of the average protein complement, signifies a certain degree of structural stability, given the wide range of phylogenetic relationships within the group. In addition, the propagation of a corpus of manually curated annotations within the discovered core families reveals key functional properties, reflecting a coherent repertoire of cellular capabilities for Chlamydiales. We further investigate over 2,000 genes without homologs in the pangenome and discover two new protein sequence domains. Our results, supported by the genome-based phylogeny for this group, are fully consistent with previous analyses and current knowledge, and point to future research directions towards a better understanding of the structural and functional properties of Chlamydiales.

@article{2012PsomopoulosGenes,
author={Fotis Psomopoulos and Victoria Siarkou and Nikolas Papanikolaou and Ioannis Iliopoulos and Athanasios Tsaftaris and Vasilis Promponas and Christos Ouzounis},
title={The Chlamydiales Pangenome Revisited: Structural Stability and Functional Coherence},
journal={Genes},
volume={3},
number={2},
pages={291-319},
year={2012},
month={05},
date={2012-05-16},
url={http://issel.ee.auth.gr/wp-content/uploads/2017/01/The-Chlamydiales-Pangenome-Revisited-Structural-Stability-and-Functional-Coherence.pdf},
doi={10.3390/genes3020291},
abstract={The entire publicly available set of 37 genome sequences from the bacterial order Chlamydiales has been subjected to comparative analysis in order to reveal the salient features of this pangenome and its evolutionary history. Over 2,000 protein families are detected across multiple species, with a distribution consistent to other studied pangenomes. Of these, there are 180 protein families with multiple members, 312 families with exactly 37 members corresponding to core genes, 428 families with peripheral genes with varying taxonomic distribution and finally 1,125 smaller families. The fact that, even for smaller genomes of Chlamydiales, core genes represent over a quarter of the average protein complement, signifies a certain degree of structural stability, given the wide range of phylogenetic relationships within the group. In addition, the propagation of a corpus of manually curated annotations within the discovered core families reveals key functional properties, reflecting a coherent repertoire of cellular capabilities for Chlamydiales. We further investigate over 2,000 genes without homologs in the pangenome and discover two new protein sequence domains. Our results, supported by the genome-based phylogeny for this group, are fully consistent with previous analyses and current knowledge, and point to future research directions towards a better understanding of the structural and functional properties of Chlamydiales.}
}

2012

Conference Papers

Dimitrios M. Vitsios, Fotis E. Psomopoulos, Pericles A. Mitkas and Christos A. Ouzounis
"Multi-genome Core Pathway Identification Through Gene Clustering"
1st Workshop on Algorithms for Data and Text Mining in Bioinformatics (WADTMB 2012) in conjunction with the 8th AIAI, Halkidiki, Greece, 2012 Sep

In the wake of gene-oriented data analysis in large-scale bioinformatics studies, focus in research is currently shifting towards the analysis of the functional association of genes, namely the metabolic pathways in which genes participate. The goal of this paper is to attempt to identify the core genes in a specific pathway, based on a user-defined selection of genomes. To this end, a novel methodology has been developed that uses data from the KEGG database, and through the application of the MCL clustering algorithm, identifies clusters that correspond to different “layers” of genes, either on a phylogenetic or a functional level. The algorithm’s complexity, evaluated experimentally, is presented and the results on a characteristic case study are discussed.

@inproceedings{2012VitsiosWADTMB,
author={Dimitrios M. Vitsios and Fotis E. Psomopoulos and Pericles A. Mitkas and Christos A. Ouzounis},
title={Multi-genome Core Pathway Identification Through Gene Clustering},
booktitle={1st Workshop on Algorithms for Data and Text Mining in Bioinformatics (WADTMB 2012) in conjunction with the 8th AIAI},
address={Halkidiki, Greece},
year={2012},
month={09},
date={2012-09-27},
url={http://issel.ee.auth.gr/wp-content/uploads/2017/01/Multi-genome-Core-Pathway-Identification-through-Gene-Clustering.pdf},
abstract={In the wake of gene-oriented data analysis in large-scale bioinformatics studies, focus in research is currently shifting towards the analysis of the functional association of genes, namely the metabolic pathways in which genes participate. The goal of this paper is to attempt to identify the core genes in a specific pathway, based on a user-defined selection of genomes. To this end, a novel methodology has been developed that uses data from the KEGG database, and through the application of the MCL clustering algorithm, identifies clusters that correspond to different “layers” of genes, either on a phylogenetic or a functional level. The algorithm’s complexity, evaluated experimentally, is presented and the results on a characteristic case study are discussed.}
}

2011

Conference Papers

Dimitrios Vitsios, Fotis E. Psomopoulos, Pericles A. Mitkas and Christos A. Ouzounis
"Detecting Species Evolution Through Metabolic Pathways"
6th Conference of the Hellenic Society for Computational Biology & Bioinformatics (HSCBB11), pp. 16, Patra, Greece, 2011 Oct

The emergence and evolution of metabolic pathways represented a crucial step in molecular and cellular evolution. With the current advances in genomics and proteomics, it has become imperative to explore the impact of gene evolution as reflected in the metabolic signature of each genome (Zhang et al. (2006)). To this end a methodology is presented, which applies a clustering algorithm to genes from different species participating in the same pathway.

@inproceedings{PsomopoulosHSCBB11,
author={Dimitrios Vitsios and Fotis E. Psomopoulos and Pericles A. Mitkas and Christos A. Ouzounis},
title={Detecting Species Evolution Through Metabolic Pathways},
booktitle={6th Conference of the Hellenic Society for Computational Biology \& Bioinformatics (HSCBB11)},
pages={16},
address={Patra, Greece},
year={2011},
month={10},
date={2011-10-07},
url={http://issel.ee.auth.gr/wp-content/uploads/2016/02/Detecting-species-evolution-through-metabolic-pathways..pdf},
abstract={The emergence and evolution of metabolic pathways represented a crucial step in molecular and cellular evolution. With the current advances in genomics and proteomics, it has become imperative to explore the impact of gene evolution as reflected in the metabolic signature of each genome (Zhang et al. (2006)). To this end a methodology is presented, which applies a clustering algorithm to genes from different species participating in the same pathway.}
}

2010

Journal Articles

Fotis E. Psomopoulos and Pericles A. Mitkas
"Bioinformatics algorithm development for Grid environments"
Journal of Systems and Software, 83(7), 2010 Jul

A Grid environment can be viewed as a virtual computing architecture that provides the ability to perform higher throughput computing by taking advantage of many computers geographically dispersed and connected by a network. Bioinformatics applications stand to gain in such a distributed environment in terms of increased availability, reliability and efficiency of computational resources. There is already considerable research in progress toward applying parallel computing techniques on bioinformatics methods, such as multiple sequence alignment, gene expression analysis and phylogenetic studies. In order to cope with the dimensionality issue, most machine learning methods either focus on specific groups of proteins or reduce the size of the original data set and/or the number of attributes involved. Grid computing could potentially provide an alternative solution to this problem, by combining multiple approaches in a seamless way. In this paper we introduce a unifying methodology coupling the strengths of the Grid with the specific needs and constraints of the major bioinformatics approaches. We also present a tool that implements this process and allows researchers to assess the computational needs for a specific task and optimize the allocation of available resources for its efficient completion.

@article{2010PsomopoulosJOSAS,
author={Fotis E. Psomopoulos and Pericles A. Mitkas},
title={Bioinformatics algorithm development for Grid environments},
journal={Journal of Systems and Software},
volume={83},
number={7},
year={2010},
month={07},
date={2010-07-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2016/02/Bioinformatics-algorithm-development-for-Grid-environments.pdf},
keywords={Bioinformatics;Data analysis;Grid computing;Protein classification;Semi-automated tool;Workflow design},
abstract={A Grid environment can be viewed as a virtual computing architecture that provides the ability to perform higher throughput computing by taking advantage of many computers geographically dispersed and connected by a network. Bioinformatics applications stand to gain in such a distributed environment in terms of increased availability, reliability and efficiency of computational resources. There is already considerable research in progress toward applying parallel computing techniques on bioinformatics methods, such as multiple sequence alignment, gene expression analysis and phylogenetic studies. In order to cope with the dimensionality issue, most machine learning methods either focus on specific groups of proteins or reduce the size of the original data set and/or the number of attributes involved. Grid computing could potentially provide an alternative solution to this problem, by combining multiple approaches in a seamless way. In this paper we introduce a unifying methodology coupling the strengths of the Grid with the specific needs and constraints of the major bioinformatics approaches. We also present a tool that implements this process and allows researchers to assess the computational needs for a specific task and optimize the allocation of available resources for its efficient completion.}
}

2010

Conference Papers

Kyriakos C. Chatzidimitriou, Fotis E. Psomopoulos and Pericles A. Mitkas
"Grid-enabled parameter initialization for high performance machine learning tasks"
5th EGEE User Forum, pp. 113-114, 2010 Apr

In this work we use the NeuroEvolution of Augmenting Topologies (NEAT) methodology for optimising Echo State Networks (ESNs), in order to achieve high performance in machine learning tasks. The large parameter space of NEAT, the many variations of ESNs and the stochastic nature of evolutionary computation, which requires many evaluations for statistically valid conclusions, promote the Grid as a viable solution for robustly evaluating the alternatives and deriving significant conclusions.

@inproceedings{2010ChatzidimitriouEGEEForum,
author={Kyriakos C. Chatzidimitriou and Fotis E. Psomopoulos and Pericles A. Mitkas},
title={Grid-enabled parameter initialization for high performance machine learning tasks},
booktitle={5th EGEE User Forum},
pages={113-114},
year={2010},
month={04},
date={2010-04-14},
url={http://issel.ee.auth.gr/wp-content/uploads/2016/02/Grid-enabled-parameter-initialization-for-high-performance-machine-learning-tasks.pdf},
keywords={Neuroevolution;Parameter optimisation},
abstract={In this work we use the NeuroEvolution of Augmenting Topologies (NEAT) methodology for optimising Echo State Networks (ESNs), in order to achieve high performance in machine learning tasks. The large parameter space of NEAT, the many variations of ESNs and the stochastic nature of evolutionary computation, which requires many evaluations for statistically valid conclusions, promote the Grid as a viable solution for robustly evaluating the alternatives and deriving significant conclusions.}
}

Fotis E. Psomopoulos and Pericles A. Mitkas
"Multi Level Clustering of Phylogenetic Profiles"
BioInformatics and BioEngineering (BIBE), 2010 IEEE International Conference, pp. 308-309, Freiburg, Germany, 2010 May

The prediction of gene function from genome sequences is one of the main issues in Bioinformatics. Most computational approaches are based on the similarity between sequences to infer gene function. However, the availability of several fully sequenced genomes has enabled alternative approaches, such as phylogenetic profiles. Phylogenetic profiles are vectors which indicate the presence or absence of a gene in other genomes. The main concept of phylogenetic profiles is that proteins participating in a common structural complex or metabolic pathway are likely to evolve in a correlated fashion. In this paper, a multi level clustering algorithm of phylogenetic profiles is presented, which aims to detect inter- and intra-genome gene clusters.

@conference{2010PsomopoulosBIBE,
author={Fotis E. Psomopoulos and Pericles A. Mitkas},
title={Multi Level Clustering of Phylogenetic Profiles},
booktitle={BioInformatics and BioEngineering (BIBE), 2010 IEEE International Conference},
pages={308-309},
address={Freiburg, Germany},
year={2010},
month={05},
date={2010-05-31},
url={http://issel.ee.auth.gr/wp-content/uploads/2016/02/Multi-Level-Clustering-of-Phylogenetic-Profiles.pdf},
keywords={Algorithm;Clustering;Phylogenetic profiles},
abstract={The prediction of gene function from genome sequences is one of the main issues in Bioinformatics. Most computational approaches are based on the similarity between sequences to infer gene function. However, the availability of several fully sequenced genomes has enabled alternative approaches, such as phylogenetic profiles. Phylogenetic profiles are vectors which indicate the presence or absence of a gene in other genomes. The main concept of phylogenetic profiles is that proteins participating in a common structural complex or metabolic pathway are likely to evolve in a correlated fashion. In this paper, a multi level clustering algorithm of phylogenetic profiles is presented, which aims to detect inter- and intra-genome gene clusters.}
}

Fotis E. Psomopoulos, Pericles A. Mitkas and Christos A. Ouzounis
"Clustering of discrete and fuzzy phylogenetic profiles"
5th Conference of the Hellenic Society For Computational Biology and Bioinformatics - HSCBB, pp. 58, Alexandroupoli, Greece, 2010 Oct

Phylogenetic profiles have long been a focus of interest in computational genomics. Encoding the subset of organisms that contain a homolog of a gene or protein, phylogenetic profiles are originally defined as binary vectors of n entries, where n corresponds to the number of target genomes. It is widely accepted that similar profiles, especially those not connected by sequence similarity, correspond to a correlated pattern of functional linkage. To this end, our study presents two methods of phylogenetic profile data analysis, aiming at detecting genes with peculiar, unique characteristics. Genes with similar phylogenetic profiles are likely to have similar structure or function, such as participating in a common structural complex or in a common pathway. Our two methods aim at detecting those outlier profiles of “interesting” genes, or groups of genes, with different characteristics from their parent genome.

@inproceedings{2010PsomopoulosHSCBB,
author={Fotis E. Psomopoulos and Pericles A. Mitkas and Christos A. Ouzounis},
title={Clustering of discrete and fuzzy phylogenetic profiles},
booktitle={5th Conference of the Hellenic Society For Computational Biology and Bioinformatics - HSCBB},
pages={58},
address={Alexandroupoli, Greece},
year={2010},
month={10},
date={2010-10-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2016/02/Clustering-of-discrete-and-fuzzy-phylogenetic-profiles.pdf},
keywords={Computational genomics},
abstract={Phylogenetic profiles have long been a focus of interest in computational genomics. Encoding the subset of organisms that contain a homolog of a gene or protein, phylogenetic profiles are originally defined as binary vectors of n entries, where n corresponds to the number of target genomes. It is widely accepted that similar profiles, especially those not connected by sequence similarity, correspond to a correlated pattern of functional linkage. To this end, our study presents two methods of phylogenetic profile data analysis, aiming at detecting genes with peculiar, unique characteristics. Genes with similar phylogenetic profiles are likely to have similar structure or function, such as participating in a common structural complex or in a common pathway. Our two methods aim at detecting those outlier profiles of “interesting” genes, or groups of genes, with different characteristics from their parent genome.}
}
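
The binary-vector definition of a phylogenetic profile in the abstract above can be illustrated with a small sketch. This is not the authors' method: the gene names, the six-genome profiles and the choice of Jaccard similarity are all assumptions made for illustration. It only shows how presence/absence vectors can be compared and how a profile unlike the rest might be flagged as an outlier.

```python
def jaccard(p, q):
    """Jaccard similarity of two binary presence/absence profiles."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    # Two all-zero profiles are treated as identical (an arbitrary convention).
    return both / either if either else 1.0


# Hypothetical profiles over n = 6 target genomes (1 = homolog present).
profiles = {
    "geneA": [1, 1, 0, 1, 0, 1],
    "geneB": [1, 1, 0, 1, 0, 1],
    "geneC": [1, 0, 0, 1, 0, 1],
    "geneD": [0, 0, 1, 0, 1, 0],
}


def mean_similarity(name):
    """Average similarity of one profile to every other profile."""
    others = [q for other, q in profiles.items() if other != name]
    return sum(jaccard(profiles[name], q) for q in others) / len(others)


scores = {name: mean_similarity(name) for name in profiles}
# The profile least similar to all others is the outlier candidate.
outlier = min(scores, key=scores.get)
print(outlier, scores)
```

In this toy data set, geneD shares no genomes with the other three genes, so it receives the lowest mean similarity.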

Fani A. Tzima, Fotis E. Psomopoulos and Pericles A. Mitkas
"An investigation of the effect of clustering-based initialization on Learning Classifiers Systems"
5th EGEE User Forum, pp. 111-112, 2010 Apr

Strength-based Learning Classifier Systems (LCS) are machine learning systems designed to tackle both sequential and single-step decision tasks by coupling a gradually evolving population of rules with a reinforcement component. ZCS-DM, a Zeroth-level Classifier System for Data Mining, is a novel algorithm in this field, recently shown to be very effective in several benchmark classification problems. In this paper, we evaluate the effect of clustering-based initialization on the algorithm’s performance, utilizing the EGEE infrastructure as a robust framework for an efficient parameter sweep.

@inproceedings{2010TzimaEGEEForum,
author={Fani A. Tzima and Fotis E. Psomopoulos and Pericles A. Mitkas},
title={An investigation of the effect of clustering-based initialization on Learning Classifiers Systems},
booktitle={5th EGEE User Forum},
pages={111-112},
year={2010},
month={04},
date={2010-04-01},
keywords={Algorithm Optimization;Parameter Sweep},
abstract={Strength-based Learning Classifier Systems (LCS) are machine learning systems designed to tackle both sequential and single-step decision tasks by coupling a gradually evolving population of rules with a reinforcement component. ZCS-DM, a Zeroth-level Classifier System for Data Mining, is a novel algorithm in this field, recently shown to be very effective in several benchmark classification problems. In this paper, we evaluate the effect of clustering-based initialization on the algorithm’s performance, utilizing the EGEE infrastructure as a robust framework for an efficient parameter sweep.}
}
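
As a rough illustration of what a parameter sweep involves, the sketch below enumerates every combination of a small parameter grid as an independent job description. The parameter names and values are hypothetical and are not taken from the ZCS-DM study; submission of the jobs to a Grid infrastructure is outside the scope of the example.

```python
from itertools import product

# Hypothetical parameter grid; names and values are illustrative only.
grid = {
    "n_clusters": [2, 4, 8],
    "learning_rate": [0.01, 0.1],
    "seed": [0, 1, 2],
}


def sweep(grid):
    """Yield one dict per combination of parameter values."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


# Each job is independent of the others, so the jobs could be run in any order.
jobs = list(sweep(grid))
print(len(jobs), jobs[0])
```

Because no job depends on the result of another, this kind of workload maps naturally onto many separate worker nodes.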

2009

Journal Articles

Fotis E. Psomopoulos, Pericles A. Mitkas, Christos S. Krinas and Ioannis N. Demetropoulos
"A grid-enabled algorithm yields figure-eight molecular knot"
Molecular Simulation, 35, (9), pp. 725-736, 2009 Jun

The recently proposed general molecular knotting algorithm and its associated package, MolKnot, introduce programming into certain sections of stereochemistry. This work reports the G-MolKnot procedure that was deployed over the grid infrastructure; it applies a divide-and-conquer approach to the problem by splitting the initial search space into multiple independent processes and, combining the results at the end, yields significant improvements with regards to the overall efficiency. The algorithm successfully detected the smallest ever reported alkane configured to an open-knotted shape with four crossings.

@article{2009PsomopoulosMS,
author={Fotis E. Psomopoulos and Pericles A. Mitkas and Christos S. Krinas and Ioannis N. Demetropoulos},
title={A grid-enabled algorithm yields figure-eight molecular knot},
journal={Molecular Simulation},
volume={35},
number={9},
pages={725-736},
year={2009},
month={06},
date={2009-06-17},
url={http://issel.ee.auth.gr/wp-content/uploads/2016/02/A-grid-enabled-algorithm-yields-Figure-Eight-molecular-knot.pdf},
keywords={data decomposition;figure-eight molecular knot;knot theory;stereochemistry},
abstract={The recently proposed general molecular knotting algorithm and its associated package, MolKnot, introduce programming into certain sections of stereochemistry. This work reports the G-MolKnot procedure that was deployed over the grid infrastructure; it applies a divide-and-conquer approach to the problem by splitting the initial search space into multiple independent processes and, combining the results at the end, yields significant improvements with regards to the overall efficiency. The algorithm successfully detected the smallest ever reported alkane configured to an open-knotted shape with four crossings.}
}
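
The divide-and-conquer pattern described in the abstract above (split the search space into independent processes, then combine the results) can be sketched generically as follows. This is not the G-MolKnot code: the integer search space and the divisibility test in `search_chunk` are toy stand-ins, and a local thread pool is assumed in place of Grid jobs.

```python
from concurrent.futures import ThreadPoolExecutor


def split(start, stop, parts):
    """Split [start, stop) into `parts` contiguous, non-overlapping chunks."""
    size, extra = divmod(stop - start, parts)
    chunks, lo = [], start
    for i in range(parts):
        # The first `extra` chunks take one additional element each.
        hi = lo + size + (1 if i < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


def search_chunk(chunk):
    """Toy stand-in for one independent job searching a sub-range."""
    lo, hi = chunk
    return [x for x in range(lo, hi) if x % 5 == 0 and x % 7 == 0]


def run(start, stop, parts):
    """Search every chunk independently, then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partial = list(pool.map(search_chunk, split(start, stop, parts)))
    return sorted(x for part in partial for x in part)


hits = run(0, 100, 8)
print(hits)
```

The merged result is the same however many chunks are used, which is the property that lets the chunks be processed on separate machines.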

2009

Books

Fotis Psomopoulos and Pericles Mitkas
"Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine, and Healthcare"
2, UK: IGI Global., Catanzaro, Italy, 2009 May

@book{2009PsomopoulosHRCGTLSBH,
author={Fotis Psomopoulos and Pericles Mitkas},
title={Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine, and Healthcare},
volume={2},
publisher={UK: IGI Global.},
address={Catanzaro, Italy},
year={2009},
month={05},
date={2009-05-00}
}

2009

Conference Papers

Konstantinos M. Karagiannis, Fotis E. Psomopoulos and Pericles A. Mitkas
"Multi Level Clustering of Phylogenetic Profiles"
4th Conference of the Hellenic Society For Computational Biology and Bioinformatics - HSCBB '09, Athens, Greece, 2009 Dec

The prediction of gene function from genome sequences is one of the main issues in Bioinformatics. Most computational approaches are based on the similarity between sequences to infer gene function. However, the availability of several fully sequenced genomes has enabled alternative approaches, such as phylogenetic profiles (Pellegrini et al. (1999)). Phylogenetic profiles (pp) are vectors which indicate the presence or absence of a gene in other genomes. The main concept of pp’s is that proteins participating in a common structural complex or metabolic pathway are likely to evolve in a correlated fashion. In this paper, a multi level clustering algorithm of pp’s is presented, which aims to detect inter- and intra-genome gene clusters.

@inproceedings{2009KaragiannisHSCBB,
author={Konstantinos M. Karagiannis and Fotis E. Psomopoulos and Pericles A. Mitkas},
title={Multi Level Clustering of Phylogenetic Profiles},
booktitle={4th Conference of the Hellenic Society For Computational Biology and Bioinformatics - HSCBB '09},
address={Athens, Greece},
year={2009},
month={12},
date={2009-12-18},
url={http://issel.ee.auth.gr/wp-content/uploads/2017/01/Multi-Level-Clustering-of-Phylogenetic-Profiles.pdf},
keywords={infer gene function;prediction of gene},
abstract={The prediction of gene function from genome sequences is one of the main issues in Bioinformatics. Most computational approaches are based on the similarity between sequences to infer gene function. However, the availability of several fully sequenced genomes has enabled alternative approaches, such as phylogenetic profiles (Pellegrini et al. (1999)). Phylogenetic profiles (pp) are vectors which indicate the presence or absence of a gene in other genomes. The main concept of pp’s is that proteins participating in a common structural complex or metabolic pathway are likely to evolve in a correlated fashion. In this paper, a multi level clustering algorithm of pp’s is presented, which aims to detect inter- and intra-genome gene clusters.}
}

2009

Incollection

Fotis E. Psomopoulos and Pericles A. Mitkas
"Data Mining in Proteomics using Grid Computing"
Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine and Healthcare, pp. 245-267, IGI Global, UK, 2009 May

The scope of this chapter is the presentation of Data Mining techniques for knowledge extraction in proteomics, taking into account both the particular features of most proteomics issues (such as data retrieval and system complexity), and the opportunities and constraints found in a Grid environment. The chapter discusses the way new and potentially useful knowledge can be extracted from proteomics data, utilizing Grid resources in a transparent way. Protein classification is introduced as a current research issue in proteomics, which also demonstrates most of the domain-specific traits. An overview of common and custom-made Data Mining algorithms is provided, with emphasis on the specific needs of protein classification problems. A unified methodology is presented for complex Data Mining processes on the Grid, highlighting the different application types and the benefits and drawbacks in each case. Finally, the methodology is validated through real-world case studies, deployed over the EGEE grid environment.

@incollection{2009PsomopoulosHRCGT,
author={Fotis E. Psomopoulos and Pericles A. Mitkas},
title={Data Mining in Proteomics using Grid Computing},
booktitle={Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine and Healthcare},
pages={245-267},
publisher={IGI Global},
address={UK},
year={2009},
month={05},
date={2009-05-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2017/01/Data-Mining-in-Proteomics-Using-Grid-Computing.pdf},
keywords={Data Mining techniques;knowledge extraction in proteomics},
abstract={The scope of this chapter is the presentation of Data Mining techniques for knowledge extraction in proteomics, taking into account both the particular features of most proteomics issues (such as data retrieval and system complexity), and the opportunities and constraints found in a Grid environment. The chapter discusses the way new and potentially useful knowledge can be extracted from proteomics data, utilizing Grid resources in a transparent way. Protein classification is introduced as a current research issue in proteomics, which also demonstrates most of the domain-specific traits. An overview of common and custom-made Data Mining algorithms is provided, with emphasis on the specific needs of protein classification problems. A unified methodology is presented for complex Data Mining processes on the Grid, highlighting the different application types and the benefits and drawbacks in each case. Finally, the methodology is validated through real-world case studies, deployed over the EGEE grid environment.}
}

Fotis E. Psomopoulos and Pericles A. Mitkas
"BADGE: Bioinformatics Algorithm Development for Grid Environments"
13th Panhellenic Conference on Informatics, pp. 93-107, Corfu, Greece, 2009 Sep

A Grid environment can be viewed as a virtual computing architecture that provides the ability to perform higher throughput computing by taking advantage of many computers geographically dispersed and connected by a network. Bioinformatics applications stand to gain in such a distributed environment in terms of availability, reliability and efficiency of computational resources. There is already considerable research in progress toward applying parallel computing techniques on bioinformatics methods, such as multiple sequence alignment, gene expression analysis and phylogenetic studies. In order to cope with the dimensionality issue, most machine learning methods focus on specific groups of proteins or reduce either the size of the original data set or the number of attributes involved. Grid computing could potentially provide an alternative solution to this problem, by combining multiple approaches in a seamless way. In this paper we introduce a unifying methodology coupling the strengths of the Grid with the specific needs and constraints of the major bioinformatics approaches. We also present a tool that implements this process and allows researchers to assess the computational needs for a specific task and optimize the allocation of available resources for its efficient completion.

@incollection{2009PsomopoulosPCI,
author={Fotis E. Psomopoulos and Pericles A. Mitkas},
title={BADGE: Bioinformatics Algorithm Development for Grid Environments},
booktitle={13th Panhellenic Conference on Informatics},
pages={93-107},
address={Corfu, Greece},
year={2009},
month={09},
date={2009-09-01},
url={http://issel.ee.auth.gr/wp-content/uploads/fpsompci20091.pdf},
abstract={A Grid environment can be viewed as a virtual computing architecture that provides the ability to perform higher throughput computing by taking advantage of many computers geographically dispersed and connected by a network. Bioinformatics applications stand to gain in such a distributed environment in terms of availability, reliability and efficiency of computational resources. There is already considerable research in progress toward applying parallel computing techniques on bioinformatics methods, such as multiple sequence alignment, gene expression analysis and phylogenetic studies. In order to cope with the dimensionality issue, most machine learning methods focus on specific groups of proteins or reduce either the size of the original data set or the number of attributes involved. Grid computing could potentially provide an alternative solution to this problem, by combining multiple approaches in a seamless way. In this paper we introduce a unifying methodology coupling the strengths of the Grid with the specific needs and constraints of the major bioinformatics approaches. We also present a tool that implements this process and allows researchers to assess the computational needs for a specific task and optimize the allocation of available resources for its efficient completion.}
}

2008

Conference Papers

Christos N. Gkekas, Fotis E. Psomopoulos and Pericles A. Mitkas
"Exploiting parallel data mining processing for protein annotation"
Student EUREKA 2008: 2nd Panhellenic Scientific Student Conference, pp. 242-252, Samos, Greece, 2008 Aug

Proteins are large organic compounds consisting of amino acids arranged in a linear chain and joined together by peptide bonds. One of the most important challenges in modern Bioinformatics is the accurate prediction of the functional behavior of proteins. In this paper a novel parallel methodology for automatic protein function annotation is presented. Data mining techniques are employed in order to construct models based on data generated from already annotated protein sequences. The first step of the methodology is to obtain the motifs present in these sequences, which are then provided as input to the data mining algorithms in order to create a model for every term. Experiments conducted using the EGEE Grid environment as a source of multiple CPUs clearly indicate that the methodology is highly efficient and accurate, as the utilization of many processors substantially reduces the execution time.

@inproceedings{2008CkekasEURECA,
author={Christos N. Gkekas and Fotis E. Psomopoulos and Pericles A. Mitkas},
title={Exploiting parallel data mining processing for protein annotation},
booktitle={Student EUREKA 2008: 2nd Panhellenic Scientific Student Conference},
pages={242-252},
address={Samos, Greece},
year={2008},
month={08},
date={2008-08-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2016/02/Exploiting-parallel-data-mining-processing-for-protein-annotation-.pdf},
keywords={Finite State Automata;Parallel Processing},
abstract={Proteins are large organic compounds consisting of amino acids arranged in a linear chain and joined together by peptide bonds. One of the most important challenges in modern Bioinformatics is the accurate prediction of the functional behavior of proteins. In this paper a novel parallel methodology for automatic protein function annotation is presented. Data mining techniques are employed in order to construct models based on data generated from already annotated protein sequences. The first step of the methodology is to obtain the motifs present in these sequences, which are then provided as input to the data mining algorithms in order to create a model for every term. Experiments conducted using the EGEE Grid environment as a source of multiple CPUs clearly indicate that the methodology is highly efficient and accurate, as the utilization of many processors substantially reduces the execution time.}
}

Christos N. Gkekas, Fotis E. Psomopoulos and Pericles A. Mitkas
"A Parallel Data Mining Application for Gene Ontology Term Prediction"
3rd EGEE User Forum, Clermont-Ferrand, France, 2008 Feb

One of the most important challenges in modern bioinformatics is the accurate prediction of the functional behaviour of proteins. The strong correlation that exists between the properties of a protein and its motif sequence makes such a prediction possible. In this paper a novel parallel methodology for protein function prediction will be presented. Data mining techniques are employed in order to construct a model for each Gene Ontology term, based on data generated from already annotated protein sequences. In order to predict the annotation of an unknown protein, its motif sequence is run through each GO term model, producing similarity scores for every term. Although it has been experimentally proven that this process is efficient, it unfortunately requires heavy processor resources. In order to address this issue, a parallel application has been implemented and tested using the EGEE Grid infrastructure.

@inproceedings{2008GkekasEGEEForum,
author={Christos N. Gkekas and Fotis E. Psomopoulos and Pericles A. Mitkas},
title={A Parallel Data Mining Application for Gene Ontology Term Prediction},
booktitle={3rd EGEE User Forum},
address={Clermont-Ferrand, France},
year={2008},
month={02},
date={2008-02-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2016/02/A_parallel_data_mining_application_for_Gene_Ontology_term_prediction_-_Contribution.pdf},
keywords={Gene Ontology;Parallel Algorithms;Protein Classification},
abstract={One of the most important challenges in modern bioinformatics is the accurate prediction of the functional behaviour of proteins. The strong correlation that exists between the properties of a protein and its motif sequence makes such a prediction possible. In this paper a novel parallel methodology for protein function prediction will be presented. Data mining techniques are employed in order to construct a model for each Gene Ontology term, based on data generated from already annotated protein sequences. In order to predict the annotation of an unknown protein, its motif sequence is run through each GO term model, producing similarity scores for every term. Although it has been experimentally proven that this process is efficient, it unfortunately requires heavy processor resources. In order to address this issue, a parallel application has been implemented and tested using the EGEE Grid infrastructure.}
}

Christos N. Gkekas, Fotis E. Psomopoulos and Pericles A. Mitkas
"A Parallel Data Mining Methodology for Protein Function Prediction Utilizing Finite State Automata"
2nd Electrical and Computer Engineering Student Conference, Athens, Greece, 2008 Apr

One of the most important challenges in modern bioinformatics is the accurate prediction of the functional behaviour of proteins. The strong correlation that exists between the properties of a protein and its motif sequence makes such a prediction possible. In this paper a novel parallel methodology for protein function prediction will be presented. Data mining techniques are employed in order to construct a model for each Gene Ontology term, based on data generated from already annotated protein sequences. In order to predict the annotation of an unknown protein, its motif sequence is run through each GO term model, producing similarity scores for every term. Although it has been experimentally proven that this process is efficient, it unfortunately requires heavy processor resources. In order to address this issue, a parallel application has been implemented and tested using the EGEE Grid infrastructure.

@inproceedings{2008GkekasSFHMMY,
author={Christos N. Gkekas and Fotis E. Psomopoulos and Pericles A. Mitkas},
title={A Parallel Data Mining Methodology for Protein Function Prediction Utilizing Finite State Automata},
booktitle={2nd Electrical and Computer Engineering Student Conference},
address={Athens, Greece},
year={2008},
month={04},
date={2008-04-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2016/02/A-Parallel-Data-Mining-Methodology-for-Protein-Function-Prediction-Utilizing-Finite-State-Automata.pdf},
keywords={Parallel Data Mining for Protein Function},
abstract={One of the most important challenges in modern bioinformatics is the accurate prediction of the functional behaviour of proteins. The strong correlation that exists between the properties of a protein and its motif sequence makes such a prediction possible. In this paper a novel parallel methodology for protein function prediction will be presented. Data mining techniques are employed in order to construct a model for each Gene Ontology term, based on data generated from already annotated protein sequences. In order to predict the annotation of an unknown protein, its motif sequence is run through each GO term model, producing similarity scores for every term. Although it has been experimentally proven that this process is efficient, it unfortunately requires heavy processor resources. In order to address this issue, a parallel application has been implemented and tested using the EGEE Grid infrastructure.}
}

Pericles A. Mitkas, Christos Maramis, Anastasios N. Delopoulos, Andreas Symeonidis, Sotiris Diplaris, Manolis Falelakis, Fotis E. Psomopoulos, Alexandros Batzios, Nikolaos Maglaveras, Irini Lekka, Vasilis Koutkias, Theodoros Agorastos, T. Mikos and A. Tatsis
"ASSIST: Employing Inference and Semantic Technologies to Facilitate Association Studies on Cervical Cancer"
6th European Symposium on Biomedical Engineering, Chania, Greece, 2008 Jun

Despite the proven close connection of cervical cancer with the human papillomavirus (HPV), intensive ongoing research investigates the role of specific genetic and environmental factors in determining HPV persistence and subsequent progression of the disease. To this end, genetic association studies constitute a significant scientific approach that may lead to a more comprehensive insight on the origin of complex diseases. Nevertheless, association studies are most of the time inconclusive, since the datasets employed are small, usually incomplete and of poor quality. The main goal of ASSIST is to aid research in the field of cervical cancer by providing larger high quality datasets, via a software system that virtually unifies multiple heterogeneous medical records, located in various sites. Furthermore, the system is being designed in a generic manner, with provision for future extensions to include other types of cancer or even different medical fields. Within the context of ASSIST, innovative techniques have been elaborated for the semantic modelling and fuzzy inferencing on medical knowledge aiming at meaningful data unification: (i) The ASSIST core ontology (being the first ontology ever modelling cervical cancer) permits semantically equivalent but differently coded data to be mapped to a common language. (ii) The ASSIST inference engine maps medical entities to syntactic values that are understood by legacy medical systems, supporting the processes of hypotheses testing and association studies, and at the same time calculating the severity index of each patient record. These modules constitute the ASSIST Core and are accompanied by two other important subsystems: (1) The Interfacing to Medical Archives subsystem maps the information contained in each legacy medical archive to corresponding entities as defined in the knowledge model of ASSIST. These patient data are generated by an advanced anonymisation tool also developed within the context of the project. (2) The User Interface enables transparent and advanced access to the data repositories incorporated in ASSIST by offering query expression as well as patient data and statistical results visualisation to the ASSIST end-users. We also have to point out that the system is easily extendable virtually to any medical domain, as the core ontology was designed with this in mind and all subsystems are ontology-aware, i.e., adaptable to any ontology changes/additions. Using ASSIST, a medical researcher can have seamless access to medical records of participating sites and, through a particularly handy computing environment, collect data records satisfying his criteria. Moreover, he can define cases and controls, select records adjusting their validity and use the most popular statistical tools for drawing conclusions. The logical unification of medical records of participating sites, including clinical and genetic data, to a common knowledge base is expected to increase the effectiveness of research in the field of cervical cancer as it permits the creation of on-demand study groups as well as the recycling of data used in previous studies.

@inproceedings{2008MitkasEsbmeAssist,
author={Pericles A. Mitkas and Christos Maramis and Anastasios N. Delopoulos and Andreas Symeonidis and Sotiris Diplaris and Manolis Falelakis and Fotis E. Psomopoulos and Alexandros Batzios and Nikolaos Maglaveras and Irini Lekka and Vasilis Koutkias and Theodoros Agorastos and T. Mikos and A. Tatsis},
title={ASSIST: Employing Inference and Semantic Technologies to Facilitate Association Studies on Cervical Cancer},
booktitle={6th European Symposium on Biomedical Engineering},
address={Chania, Greece},
year={2008},
month={06},
date={2008-06-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2016/02/ASSIST-EMPLOYING-INFERENCE-AND-SEMANTIC-TECHNOLOGIES-TO-FACILITATE-ASSOCIATION-STUDIES-ON-CERVICAL-CANCER-.pdf},
keywords={cervical cancer},
abstract={Despite the proven close connection of cervical cancer with the human papillomavirus (HPV), intensive ongoing research investigates the role of specific genetic and environmental factors in determining HPV persistence and subsequent progression of the disease. To this end, genetic association studies constitute a significant scientific approach that may lead to a more comprehensive insight on the origin of complex diseases. Nevertheless, association studies are most of the time inconclusive, since the datasets employed are small, usually incomplete and of poor quality. The main goal of ASSIST is to aid research in the field of cervical cancer by providing larger high quality datasets, via a software system that virtually unifies multiple heterogeneous medical records, located in various sites. Furthermore, the system is being designed in a generic manner, with provision for future extensions to include other types of cancer or even different medical fields. Within the context of ASSIST, innovative techniques have been elaborated for the semantic modelling and fuzzy inferencing on medical knowledge aiming at meaningful data unification: (i) The ASSIST core ontology (being the first ontology ever modelling cervical cancer) permits semantically equivalent but differently coded data to be mapped to a common language. (ii) The ASSIST inference engine maps medical entities to syntactic values that are understood by legacy medical systems, supporting the processes of hypotheses testing and association studies, and at the same time calculating the severity index of each patient record. These modules constitute the ASSIST Core and are accompanied by two other important subsystems: (1) The Interfacing to Medical Archives subsystem maps the information contained in each legacy medical archive to corresponding entities as defined in the knowledge model of ASSIST. These patient data are generated by an advanced anonymisation tool also developed within the context of the project. (2) The User Interface enables transparent and advanced access to the data repositories incorporated in ASSIST by offering query expression as well as patient data and statistical results visualisation to the ASSIST end-users. We also have to point out that the system is easily extendable virtually to any medical domain, as the core ontology was designed with this in mind and all subsystems are ontology-aware, i.e., adaptable to any ontology changes/additions. Using ASSIST, a medical researcher can have seamless access to medical records of participating sites and, through a particularly handy computing environment, collect data records satisfying his criteria. Moreover, he can define cases and controls, select records adjusting their validity and use the most popular statistical tools for drawing conclusions. The logical unification of medical records of participating sites, including clinical and genetic data, to a common knowledge base is expected to increase the effectiveness of research in the field of cervical cancer as it permits the creation of on-demand study groups as well as the recycling of data used in previous studies.}
}

Ioanna K. Mprouza, Fotis E. Psomopoulos and Pericles A. Mitkas
"AMoS: Agent-based Molecular Simulations"
Student EUREKA 2008: 2nd Panhellenic Scientific Student Conference, pp. 175-186, Samos, Greece, 2008 Aug

Molecular dynamics (MD) is a form of computer simulation wherein atoms and molecules are allowed to interact for a period of time under known laws of physics, giving a view of the motion of the atoms. Usually the number of particles involved in a simulation is so large that the properties of the system in question are virtually impossible to compute analytically. MD circumvents this problem by employing numerical approaches. Utilizing theories and concepts from mathematics, physics and chemistry and employing algorithms from computer science and information theory, MD is a clear example of a multidisciplinary method. In this paper a new framework for MD simulations is presented, which utilizes software agents as particle representations and an empirical potential function as the means of interaction. The framework is applied to protein structural data (PDB files), using an implicit solvent environment and a time step of 5 femtoseconds (5×10⁻¹⁵ s). The goal of the simulation is to provide another view on the study of emergent behaviours and trends in the movement of the agent-particles in the protein complex. This information can then be used to construct an abstract model of the rules that govern the motion of the particles.
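As a rough illustration of the agent-as-particle idea in this abstract, the following minimal sketch advances two particle agents by one 5 fs time step. Only the time step comes from the abstract. The `ParticleAgent` class, the one-dimensional setup, the harmonic `pair_force` (standing in for the paper's empirical potential, whose form is not given here), the velocity-Verlet integrator and all numeric constants are assumptions made for illustration.

```python
from dataclasses import dataclass

DT = 5e-15  # time step in seconds (5 fs, as stated in the abstract)


@dataclass
class ParticleAgent:
    """Hypothetical agent standing in for one particle (1-D for brevity)."""
    x: float     # position in metres
    v: float     # velocity in metres per second
    mass: float  # mass in kilograms


def pair_force(a, b, k=100.0, rest=1.5e-10):
    """Force on agent a due to agent b under an assumed harmonic potential."""
    d = b.x - a.x
    sign = 1.0 if d > 0 else -1.0
    return k * (abs(d) - rest) * sign


def total_forces(agents):
    """Each agent sums the pairwise forces exerted by all other agents."""
    return [sum(pair_force(a, b) for b in agents if b is not a) for a in agents]


def step(agents, dt=DT):
    """Advance all agents by one velocity-Verlet step (integrator is an assumption)."""
    f_old = total_forces(agents)
    for a, f in zip(agents, f_old):
        a.x += a.v * dt + 0.5 * (f / a.mass) * dt * dt
    f_new = total_forces(agents)
    for a, fo, fn in zip(agents, f_old, f_new):
        a.v += 0.5 * (fo + fn) / a.mass * dt
```

Calling `step` repeatedly on a list of agents and recording their positions would give the kind of trajectory from which movement trends could be studied; a realistic setup would work in three dimensions and read coordinates from PDB files.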

@inproceedings{2008MprouzaEURECA,
author={Ioanna K. Mprouza and Fotis E. Psomopoulos and Pericles A. Mitkas},
title={AMoS: Agent-based Molecular Simulations},
booktitle={Student EUREKA 2008: 2nd Panhellenic Scientific Student Conference},
pages={175-186},
address={Samos, Greece},
year={2008},
month={08},
date={2008-08-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2016/02/AMoS-Agent-based-Molecular-Simulations.pdf},
keywords={Force Field Equations;Molecular Dynamics;Protein Data Bank;Protein Prediction Structure;Simulation},
abstract={Molecular dynamics (MD) is a form of computer simulation wherein atoms and molecules are allowed to interact for a period of time under known laws of physics, giving a view of the motion of the atoms. Usually the number of particles involved in a simulation is so large that the properties of the system in question are virtually impossible to compute analytically. MD circumvents this problem by employing numerical approaches. Utilizing theories and concepts from mathematics, physics and chemistry and employing algorithms from computer science and information theory, MD is a clear example of a multidisciplinary method. In this paper a new framework for MD simulations is presented, which utilizes software agents as particle representations and an empirical potential function as the means of interaction. The framework is applied to protein structural data (PDB files), using an implicit solvent environment and a time step of 5 femtoseconds (5×10⁻¹⁵ s). The goal of the simulation is to provide another view on the study of emergent behaviours and trends in the movement of the agent-particles in the protein complex. This information can then be used to construct an abstract model of the rules that govern the motion of the particles.}
}

Fotis E. Psomopoulos and Pericles A. Mitkas
"Sizing Up: Bioinformatics in a Grid Context"
3rd Conference of the Hellenic Society For Computational Biology and Bioinformatics - HSCBB, pp. 558-561, IEEE Computer Society, Thessaloniki, Greece, 2008 Oct

A Grid environment can be viewed as a virtual computing architecture that provides the ability to perform higher-throughput computing by taking advantage of many computers that are geographically distributed and connected by a network. Bioinformatics applications stand to gain in such an environment, both in terms of available computational resources and in reliability and efficiency. There are several approaches in the literature which present the use of Grid resources in bioinformatics. Nevertheless, scientific progress is hindered by the fact that each researcher operates in relative isolation, regarding datasets and efforts, since there is no universally accepted methodology for performing bioinformatics tasks on the Grid. Given the complexity of both the data and the algorithms involved in the majority of cases, a case study on protein classification utilizing the Grid infrastructure may be the first step in presenting a unifying methodology for bioinformatics in a Grid context.

@inproceedings{2008PsomopoulosHSCBB,
author={Fotis E. Psomopoulos and Pericles A. Mitkas},
title={Sizing Up: Bioinformatics in a Grid Context},
booktitle={3rd Conference of the Hellenic Society For Computational Biology and Bioinformatics - HSCBB},
pages={558-561},
publisher={IEEE Computer Society},
address={Thessaloniki, Greece},
year={2008},
month={10},
date={2008-10-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2016/02/Sizing-Up-Bioinformatics-in-a-Grid-Context.pdf},
keywords={Bioinformatics in Grid Context},
abstract={A Grid environment can be viewed as a virtual computing architecture that provides the ability to perform higher-throughput computing by taking advantage of many computers that are geographically distributed and connected by a network. Bioinformatics applications stand to gain in such an environment, both in terms of available computational resources and in reliability and efficiency. There are several approaches in the literature which present the use of Grid resources in bioinformatics. Nevertheless, scientific progress is hindered by the fact that each researcher operates in relative isolation, regarding datasets and efforts, since there is no universally accepted methodology for performing bioinformatics tasks on the Grid. Given the complexity of both the data and the algorithms involved in the majority of cases, a case study on protein classification utilizing the Grid infrastructure may be the first step in presenting a unifying methodology for bioinformatics in a Grid context.}
}

Fotis E. Psomopoulos, Pericles A. Mitkas, Christos S. Krinas and Ioannis N. Demetropoulos
"G-MolKnot: A grid enabled systematic algorithm to produce open molecular knots"
1st HellasGrid User Forum, pp. 327-362, Springer US, Athens, Greece, 2008 Jan

Multi-agent systems (MAS) have grown quite popular in a wide spectrum of applications where argumentation, communication, scaling and adaptability are requested. And though the need for well-established engineering approaches for building and evaluating such intelligent systems has emerged, currently no widely accepted methodology exists, mainly due to lack of consensus on relevant definitions and scope of applicability. Even existing well-tested evaluation methodologies applied in traditional software engineering prove inadequate to address the unpredictable emerging factors of the behavior of intelligent components. The following chapter aims to present such a unified and integrated methodology for a specific category of MAS. It takes all constraints and issues into account and denotes the way knowledge extracted with the use of Data Mining (DM) techniques can be used for the formulation initially, and the improvement, in the long run, of agent reasoning and MAS performance. The coupling of DM and Agent Technology (AT) principles, proposed within the context of this chapter, is therefore expected to provide to the reader an efficient gateway for developing and evaluating highly reconfigurable software approaches that incorporate domain knowledge and provide sophisticated Decision Making capabilities. The main objectives of this chapter could be summarized into the following: a) introduce Agent Technology (AT) as a successful paradigm for building Data Mining (DM)-enriched applications, b) provide a methodology for (re)evaluating the performance of such DM-enriched Multi-Agent Systems (MAS), c) introduce Agent Academy II, an Agent-Oriented Software Engineering framework for building MAS that incorporate knowledge models extracted by the use of (classical and novel) DM techniques and d) denote the benefits of the proposed approach through a real-world demonstrator. This chapter provides a link between DM and AT and explains how these technologies can efficiently cooperate with each other. The exploitation of useful knowledge extracted by the use of DM may considerably improve agent infrastructures, while also increasing reusability and minimizing customization costs. The synergy between DM and AT is ultimately expected to provide MAS with higher levels of autonomy, adaptability and accuracy and, hence, intelligence.

@inproceedings{2008PsomopoulosHUF,
author={Fotis E. Psomopoulos and Pericles A. Mitkas and Christos S. Krinas and Ioannis N. Demetropoulos},
title={G-MolKnot: A grid enabled systematic algorithm to produce open molecular knots},
booktitle={1st HellasGrid User Forum},
pages={327-362},
publisher={Springer US},
address={Athens, Greece},
year={2008},
month={01},
date={2008-01-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2016/02/G-MolKnot-A-grid-enabled-systematic-algorithm-to-produce-open-molecular-knots-.pdf},
keywords={open molecular knots},
abstract={Multi-agent systems (MAS) have grown quite popular in a wide spectrum of applications where argumentation, communication, scaling and adaptability are requested. And though the need for well-established engineering approaches for building and evaluating such intelligent systems has emerged, currently no widely accepted methodology exists, mainly due to lack of consensus on relevant definitions and scope of applicability. Even existing well-tested evaluation methodologies applied in traditional software engineering prove inadequate to address the unpredictable emerging factors of the behavior of intelligent components. The following chapter aims to present such a unified and integrated methodology for a specific category of MAS. It takes all constraints and issues into account and denotes the way knowledge extracted with the use of Data Mining (DM) techniques can be used for the formulation initially, and the improvement, in the long run, of agent reasoning and MAS performance. The coupling of DM and Agent Technology (AT) principles, proposed within the context of this chapter, is therefore expected to provide to the reader an efficient gateway for developing and evaluating highly reconfigurable software approaches that incorporate domain knowledge and provide sophisticated Decision Making capabilities. The main objectives of this chapter could be summarized into the following: a) introduce Agent Technology (AT) as a successful paradigm for building Data Mining (DM)-enriched applications, b) provide a methodology for (re)evaluating the performance of such DM-enriched Multi-Agent Systems (MAS), c) introduce Agent Academy II, an Agent-Oriented Software Engineering framework for building MAS that incorporate knowledge models extracted by the use of (classical and novel) DM techniques and d) denote the benefits of the proposed approach through a real-world demonstrator. This chapter provides a link between DM and AT and explains how these technologies can efficiently cooperate with each other. The exploitation of useful knowledge extracted by the use of DM may considerably improve agent infrastructures, while also increasing reusability and minimizing customization costs. The synergy between DM and AT is ultimately expected to provide MAS with higher levels of autonomy, adaptability and accuracy and, hence, intelligence.}
}

Theodoros Agorastos, Pericles A. Mitkas, Manolis Falelakis, Fotis E. Psomopoulos, Anastasios N. Delopoulos, Andreas Symeonidis, Sotiris Diplaris, Christos Maramis, Alexandros Batzios, Irini Lekka, Vasilis Koutkias, Themistoklis Mikos, A. Tatsis and Nikolaos Maglaveras
"Large Scale Association Studies Using Unified Data for Cervical Cancer and beyond: The ASSIST Project"
World Cancer Congress, Geneva, Switzerland, 2008 Aug

Despite the proven close connection of cervical cancer with the human papillomavirus (HPV), intensive ongoing research investigates the role of specific genetic and environmental factors in determining HPV persistence and subsequent progression of the disease. To this end, genetic association studies constitute a significant scientific approach that may lead to a more comprehensive insight into the origin of complex diseases. Nevertheless, association studies are most of the time inconclusive, since the datasets employed are small, usually incomplete and of poor quality. The main goal of ASSIST is to aid research in the field of cervical cancer by providing larger, high-quality datasets via a software system that virtually unifies multiple heterogeneous medical records located at various sites. Furthermore, the system is being designed in a generic manner, with provision for future extensions to include other types of cancer or even different medical fields. Within the context of ASSIST, innovative techniques have been elaborated for semantic modelling and fuzzy inferencing on medical knowledge, aiming at meaningful data unification: (i) The ASSIST core ontology (the first ontology ever to model cervical cancer) permits semantically equivalent but differently coded data to be mapped to a common language. (ii) The ASSIST inference engine maps medical entities to syntactic values that are understood by legacy medical systems, supporting the processes of hypothesis testing and association studies, and at the same time calculating the severity index of each patient record. These modules constitute the ASSIST Core and are accompanied by two other important subsystems: (1) The Interfacing to Medical Archives subsystem maps the information contained in each legacy medical archive to corresponding entities as defined in the knowledge model of ASSIST. These patient data are generated by an advanced anonymisation tool also developed within the context of the project. (2) The User Interface enables transparent and advanced access to the data repositories incorporated in ASSIST by offering query expression as well as visualisation of patient data and statistical results to the ASSIST end-users. We also have to point out that the system is easily extendable to virtually any medical domain, as the core ontology was designed with this in mind and all subsystems are ontology-aware, i.e., adaptable to any ontology changes or additions. Using ASSIST, a medical researcher can have seamless access to the medical records of participating sites and, through a particularly handy computing environment, collect data records satisfying their criteria. Moreover, the researcher can define cases and controls, select records adjusting their validity and use the most popular statistical tools for drawing conclusions. The logical unification of the medical records of participating sites, including clinical and genetic data, into a common knowledge base is expected to increase the effectiveness of research in the field of cervical cancer, as it permits the creation of on-demand study groups as well as the recycling of data used in previous studies.

@inproceedings{WCCAssist,
author={Theodoros Agorastos and Pericles A. Mitkas and Manolis Falelakis and Fotis E. Psomopoulos and Anastasios N. Delopoulos and Andreas Symeonidis and Sotiris Diplaris and Christos Maramis and Alexandros Batzios and Irini Lekka and Vasilis Koutkias and Themistoklis Mikos and A. Tatsis and Nikolaos Maglaveras},
title={Large Scale Association Studies Using Unified Data for Cervical Cancer and beyond: The ASSIST Project},
booktitle={World Cancer Congress},
address={Geneva, Switzerland},
year={2008},
month={08},
date={2008-08-01},
url={http://issel.ee.auth.gr/wp-content/uploads/wcc2008.pdf},
keywords={Unified Data for Cervical Cancer},
abstract={Despite the proven close connection of cervical cancer with the human papillomavirus (HPV), intensive ongoing research investigates the role of specific genetic and environmental factors in determining HPV persistence and subsequent progression of the disease. To this end, genetic association studies constitute a significant scientific approach that may lead to a more comprehensive insight into the origin of complex diseases. Nevertheless, association studies are most of the time inconclusive, since the datasets employed are small, usually incomplete and of poor quality. The main goal of ASSIST is to aid research in the field of cervical cancer by providing larger, high-quality datasets via a software system that virtually unifies multiple heterogeneous medical records located at various sites. Furthermore, the system is being designed in a generic manner, with provision for future extensions to include other types of cancer or even different medical fields. Within the context of ASSIST, innovative techniques have been elaborated for semantic modelling and fuzzy inferencing on medical knowledge, aiming at meaningful data unification: (i) The ASSIST core ontology (the first ontology ever to model cervical cancer) permits semantically equivalent but differently coded data to be mapped to a common language. (ii) The ASSIST inference engine maps medical entities to syntactic values that are understood by legacy medical systems, supporting the processes of hypothesis testing and association studies, and at the same time calculating the severity index of each patient record. These modules constitute the ASSIST Core and are accompanied by two other important subsystems: (1) The Interfacing to Medical Archives subsystem maps the information contained in each legacy medical archive to corresponding entities as defined in the knowledge model of ASSIST. These patient data are generated by an advanced anonymisation tool also developed within the context of the project. (2) The User Interface enables transparent and advanced access to the data repositories incorporated in ASSIST by offering query expression as well as visualisation of patient data and statistical results to the ASSIST end-users. We also have to point out that the system is easily extendable to virtually any medical domain, as the core ontology was designed with this in mind and all subsystems are ontology-aware, i.e., adaptable to any ontology changes or additions. Using ASSIST, a medical researcher can have seamless access to the medical records of participating sites and, through a particularly handy computing environment, collect data records satisfying their criteria. Moreover, the researcher can define cases and controls, select records adjusting their validity and use the most popular statistical tools for drawing conclusions. The logical unification of the medical records of participating sites, including clinical and genetic data, into a common knowledge base is expected to increase the effectiveness of research in the field of cervical cancer, as it permits the creation of on-demand study groups as well as the recycling of data used in previous studies.}
}

2007

Conference Papers

Christos N. Gkekas, Fotis E. Psomopoulos and Pericles A. Mitkas
"Modeling Gene Ontology Terms using Finite State Automata"
Hellenic Bioinformatics and Medical Informatics Meeting, pp. 279-282, IEEE Computer Society, Biomedical Research Foundation, Academy of Athens, Greece, 2007 Oct

Semantic annotation and querying is currently applied on a number of versatile disciplines, providing the added value of such an approach and, consequently, the need for more elaborate – either case-specific or generic – tools. In this context, we have developed Eikonomia: an integrated semantically-aware tool for the description and retrieval of Byzantine Artwork Information. Following the needs of the ORMYLIA Art Diagnosis Center for adding semantics to their legacy data, an ontology describing Byzantine artwork based on CIDOC-CRM, along with the interfaces for synchronization to and from the existing RDBMS, have been implemented. This ontology has been linked to a reasoning tool, while a dynamic interface for the automated creation of semantic queries in SPARQL was developed. Finally, all the appropriate interfaces were instantiated, in order to allow easy ontology manipulation, query results projection and restrictions creation.

@inproceedings{2007GkekasBioacademy,
author={Christos N. Gkekas and Fotis E. Psomopoulos and Pericles A. Mitkas},
title={Modeling Gene Ontology Terms using Finite State Automata},
booktitle={Hellenic Bioinformatics and Medical Informatics Meeting},
pages={279--282},
publisher={IEEE Computer Society},
address={Biomedical Research Foundation, Academy of Athens, Greece},
year={2007},
month={10},
date={2007-10-01},
keywords={Modeling Gene Ontology},
abstract={Semantic annotation and querying is currently applied on a number of versatile disciplines, providing the added value of such an approach and, consequently, the need for more elaborate – either case-specific or generic – tools. In this context, we have developed Eikonomia: an integrated semantically-aware tool for the description and retrieval of Byzantine Artwork Information. Following the needs of the ORMYLIA Art Diagnosis Center for adding semantics to their legacy data, an ontology describing Byzantine artwork based on CIDOC-CRM, along with the interfaces for synchronization to and from the existing RDBMS, have been implemented. This ontology has been linked to a reasoning tool, while a dynamic interface for the automated creation of semantic queries in SPARQL was developed. Finally, all the appropriate interfaces were instantiated, in order to allow easy ontology manipulation, query results projection and restrictions creation.}
}

Ioanna K. Mprouza, Fotis E. Psomopoulos and Pericles A. Mitkas
"Simulating molecular dynamics through intelligent software agents"
Hellenic Bioinformatics and Medical Informatics Meeting, pp. 279-282, IEEE Computer Society, Biomedical Research Foundation, Academy of Athens, Greece, 2007 Oct

Semantic annotation and querying is currently applied on a number of versatile disciplines, providing the added value of such an approach and, consequently, the need for more elaborate – either case-specific or generic – tools. In this context, we have developed Eikonomia: an integrated semantically-aware tool for the description and retrieval of Byzantine Artwork Information. Following the needs of the ORMYLIA Art Diagnosis Center for adding semantics to their legacy data, an ontology describing Byzantine artwork based on CIDOC-CRM, along with the interfaces for synchronization to and from the existing RDBMS, have been implemented. This ontology has been linked to a reasoning tool, while a dynamic interface for the automated creation of semantic queries in SPARQL was developed. Finally, all the appropriate interfaces were instantiated, in order to allow easy ontology manipulation, query results projection and restrictions creation.

@inproceedings{2007MprouzaBioacademy,
author={Ioanna K. Mprouza and Fotis E. Psomopoulos and Pericles A. Mitkas},
title={Simulating molecular dynamics through intelligent software agents},
booktitle={Hellenic Bioinformatics and Medical Informatics Meeting},
pages={279--282},
publisher={IEEE Computer Society},
address={Biomedical Research Foundation, Academy of Athens, Greece},
year={2007},
month={10},
date={2007-10-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2016/02/Simulating-molecular-dynamics-through-intelligent-software-agents.pdf},
keywords={Modeling Gene Ontology},
abstract={Semantic annotation and querying is currently applied on a number of versatile disciplines, providing the added value of such an approach and, consequently, the need for more elaborate – either case-specific or generic – tools. In this context, we have developed Eikonomia: an integrated semantically-aware tool for the description and retrieval of Byzantine Artwork Information. Following the needs of the ORMYLIA Art Diagnosis Center for adding semantics to their legacy data, an ontology describing Byzantine artwork based on CIDOC-CRM, along with the interfaces for synchronization to and from the existing RDBMS, have been implemented. This ontology has been linked to a reasoning tool, while a dynamic interface for the automated creation of semantic queries in SPARQL was developed. Finally, all the appropriate interfaces were instantiated, in order to allow easy ontology manipulation, query results projection and restrictions creation.}
}

2006

Conference Papers

Pericles A. Mitkas, Anastasios N. Delopoulos, Andreas L. Symeonidis and Fotis E. Psomopoulos
"A Framework for Semantic Data Integration and Inferencing on Cervical Cancer"
Hellenic Bioinformatics and Medical Informatics Meeting, pp. 23-26, IEEE Computer Society, Biomedical Research Foundation, Academy of Athens, Greece, 2006 Oct

Advances in the area of biomedicine and bioengineering have allowed for more accurate and detailed data acquisition in the area of health care. Examinations that once were time- and cost-prohibitive are now available to the public, providing physicians and clinicians with more patient data for diagnosis and successful treatment. These data are also used by medical researchers in order to perform association studies among environmental agents, virus characteristics and genetic attributes, extracting new and interesting risk markers which can be used to enhance early diagnosis and prognosis. Nevertheless, scientific progress is hindered by the fact that each medical center operates in relative isolation, regarding datasets and medical effort, since there is no universally accepted archetype/ontology for medical data acquisition, data storage and labeling. This, exactly, is the major goal of ASSIST: to virtually unify multiple patient record repositories, physically located at different laboratories, clinics and/or hospitals. ASSIST focuses on cervical cancer and implements a semantically-aware integration layer that unifies data in a seamless manner. Data privacy and security are ensured by techniques for data anonymization, secure data access and storage. Both the clinician and the medical researcher will have access to a knowledge base on cervical cancer and will be able to perform more complex and elaborate association studies on larger groups.

@inproceedings{2006MitkasASSISTBioacademy,
author={Pericles A. Mitkas and Anastasios N. Delopoulos and Andreas L. Symeonidis and Fotis E. Psomopoulos},
title={A Framework for Semantic Data Integration and Inferencing on Cervical Cancer},
booktitle={Hellenic Bioinformatics and Medical Informatics Meeting},
pages={23-26},
publisher={IEEE Computer Society},
address={Biomedical Research Foundation, Academy of Athens, Greece},
year={2006},
month={10},
date={2006-10-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2017/01/A-Framework-for-Semantic-Data-Integration-and-Inferencing-on-Cervical-Cancer.pdf},
keywords={bioinformatics databases},
abstract={Advances in the area of biomedicine and bioengineering have allowed for more accurate and detailed data acquisition in the area of health care. Examinations that once were time- and cost-prohibitive are now available to the public, providing physicians and clinicians with more patient data for diagnosis and successful treatment. These data are also used by medical researchers in order to perform association studies among environmental agents, virus characteristics and genetic attributes, extracting new and interesting risk markers which can be used to enhance early diagnosis and prognosis. Nevertheless, scientific progress is hindered by the fact that each medical center operates in relative isolation, regarding datasets and medical effort, since there is no universally accepted archetype/ontology for medical data acquisition, data storage and labeling. This, exactly, is the major goal of ASSIST: to virtually unify multiple patient record repositories, physically located at different laboratories, clinics and/or hospitals. ASSIST focuses on cervical cancer and implements a semantically-aware integration layer that unifies data in a seamless manner. Data privacy and security are ensured by techniques for data anonymization, secure data access and storage. Both the clinician and the medical researcher will have access to a knowledge base on cervical cancer and will be able to perform more complex and elaborate association studies on larger groups.}
}

Helen E. Polychroniadou, Fotis E. Psomopoulos and Pericles A. Mitkas
"G-Class: A Divide and Conquer Application for Grid Protein Classification"
Proceedings of the 2nd ADMKD 2006: Workshop on Data Mining and Knowledge Discovery (in conjunction with ADBIS 2006: The 10th East-European Conference on Advances in Databases and Information Systems), pp. 121-132, IEEE Computer Society, Thessaloniki, Greece, 2006 Sep

Protein classification has always been one of the major challenges in modern functional proteomics. The presence of motifs in protein chains can make the prediction of the functional behavior of proteins possible. The correlation between protein properties and their motifs is not always obvious, since more than one motif may exist within a protein chain. Due to the complexity of this correlation, most data mining algorithms are either inefficient or time-consuming. In this paper, a data mining methodology that utilizes grid technologies is presented. First, data are split into multiple sets while preserving the original data distribution in each set. Then, multiple models are created by using the data sets as independent training sets. Finally, the models are combined to produce the final classification rules, containing all the previously extracted information. The methodology is tested using various protein and protein class subsets. Results indicate the improved time efficiency of our technique compared to other known data mining algorithms.
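The three steps named in this abstract (distribution-preserving split, independent model training, model combination) can be sketched as follows. This is a minimal single-machine illustration, not the G-Class implementation: the function names, the motif-count "model" and the majority-vote merge are all assumptions, and on a Grid each call to `train_rules` would presumably run as a separate job.

```python
from collections import Counter, defaultdict


def stratified_split(records, n_parts):
    """Split (motifs, label) records into n_parts, dealing each class out
    round-robin so every part keeps roughly the original class distribution."""
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec[1]].append(rec)
    parts = [[] for _ in range(n_parts)]
    for label in sorted(by_label):
        for i, rec in enumerate(by_label[label]):
            parts[i % n_parts].append(rec)
    return parts


def train_rules(part):
    """Hypothetical stand-in learner: count, per motif, how often each class occurs."""
    counts = defaultdict(Counter)
    for motifs, label in part:
        for motif in motifs:
            counts[motif][label] += 1
    return dict(counts)


def combine(models):
    """Merge the per-part counts and keep the majority class for each motif."""
    merged = defaultdict(Counter)
    for model in models:
        for motif, counter in model.items():
            merged[motif].update(counter)
    return {motif: c.most_common(1)[0][0] for motif, c in merged.items()}
```

A typical use would be `combine(train_rules(p) for p in stratified_split(records, n))`, where `records` is a list of `(motifs, class_label)` pairs.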

@inproceedings{2006PolychroniadouGClass,
author={Helen E. Polychroniadou and Fotis E. Psomopoulos and Pericles A. Mitkas},
title={G-Class: A Divide and Conquer Application for Grid Protein Classification},
booktitle={Proceedings of the 2nd ADMKD 2006: Workshop on Data Mining and Knowledge Discovery (in conjunction with ADBIS 2006: The 10th East-European Conference on Advances in Databases and Information Systems)},
pages={121-132},
publisher={IEEE Computer Society},
address={Thessaloniki, Greece},
year={2006},
month={09},
date={2006-09-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2016/02/G-Class-A-Divide-and-Conquer-Application-for-Grid-Protein-Classification-.pdf},
keywords={bioinformatics databases},
abstract={Protein classification has always been one of the major challenges in modern functional proteomics. The presence of motifs in protein chains can make the prediction of the functional behavior of proteins possible. The correlation between protein properties and their motifs is not always obvious, since more than one motif may exist within a protein chain. Due to the complexity of this correlation, most data mining algorithms are either inefficient or time-consuming. In this paper, a data mining methodology that utilizes grid technologies is presented. First, data are split into multiple sets while preserving the original data distribution in each set. Then, multiple models are created by using the data sets as independent training sets. Finally, the models are combined to produce the final classification rules, containing all the previously extracted information. The methodology is tested using various protein and protein class subsets. Results indicate the improved time efficiency of our technique compared to other known data mining algorithms.}
}

Fotis E. Psomopoulos and Pericles A. Mitkas
"PROTEAS: A Finite State Automata based data mining algorithm for rule extraction in protein classification"
Proceedings of the 5th Hellenic Data Management Symposium (HDMS), pp. 118-126, IEEE Computer Society, Thessaloniki, Greece, 2006 Sep

An important challenge in modern functional proteomics is the prediction of the functional behavior of proteins. Motifs in protein chains can make such a prediction possible. The correlation between protein properties and their motifs is not always obvious, since more than one motif may exist within a protein chain. Thus, the behavior of a protein is a function of many motifs, where some overpower others. In this paper, a data mining approach for a motif-based classification of proteins is presented. A new classification algorithm that induces rules and exploits finite state automata is introduced. First, data are modeled in terms of prefix tree acceptors, which are later merged into finite state automata. Finally, a new algorithm is proposed for the induction of protein classification rules from finite state automata. The data mining model is trained and tested using various protein and protein class subsets, as well as the whole dataset of known proteins and protein classes. Results indicate the efficiency of our technique compared to other known data mining algorithms.
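
The prefix tree acceptor step mentioned in the abstract above can be sketched as follows. This is a generic textbook construction, not the PROTEAS code: the state numbering, the `build_pta` and `accepts` names, and the motif symbols are assumptions for illustration.

```python
def build_pta(sequences):
    """Build a prefix tree acceptor from motif sequences.

    States are integers (0 is the root); sequences that share a prefix
    share the corresponding path. Returns (transitions, accepting).
    """
    transitions = {0: {}}
    accepting = set()
    for seq in sequences:
        state = 0
        for symbol in seq:
            nxt = transitions[state].get(symbol)
            if nxt is None:
                # Unseen continuation: allocate a fresh state.
                nxt = len(transitions)
                transitions[nxt] = {}
                transitions[state][symbol] = nxt
            state = nxt
        accepting.add(state)  # the end of a training sequence is accepting
    return transitions, accepting


def accepts(pta, seq):
    """Return True if seq is exactly one of the training sequences."""
    transitions, accepting = pta
    state = 0
    for symbol in seq:
        state = transitions[state].get(symbol)
        if state is None:
            return False
    return state in accepting


# Hypothetical motif sequences for the proteins of one class.
pta = build_pta([("A", "B"), ("A", "C"), ("D",)])
```

A prefix tree acceptor recognizes only the sequences it was built from; generalization comes from the later state-merging step that turns it into a finite state automaton.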

@inproceedings{2006PsomopoulosHDMS,
author={Fotis E. Psomopoulos and Pericles A. Mitkas},
title={PROTEAS: A Finite State Automata based data mining algorithm for rule extraction in protein classification},
booktitle={Proceedings of the 5th Hellenic Data Management Symposium (HDMS)},
pages={118-126},
publisher={IEEE Computer Society},
address={Thessaloniki, Greece},
year={2006},
month={09},
date={2006-09-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2016/02/PROTEAS-A-Finite-State-Automata-based-data-mining-algorithm-for-rule-extraction-in-protein-classification-.pdf},
keywords={mining methods and algorithms;classification rules},
abstract={An important challenge in modern functional proteomics is the prediction of the functional behavior of proteins. Motifs in protein chains can make such a prediction possible. The correlation between protein properties and their motifs is not always obvious, since more than one motif may exist within a protein chain. Thus, the behavior of a protein is a function of many motifs, where some overpower others. In this paper, a data mining approach for a motif-based classification of proteins is presented. A new classification algorithm that induces rules and exploits finite state automata is introduced. First, data are modeled in terms of prefix tree acceptors, which are later merged into finite state automata. Finally, a new algorithm is proposed for the induction of protein classification rules from finite state automata. The data mining model is trained and tested using various protein and protein class subsets, as well as the whole dataset of known proteins and protein classes. Results indicate the efficiency of our technique compared to other known data mining algorithms.}
}

2005

Conference Papers

Fotis E. Psomopoulos and Pericles A. Mitkas
"A protein classification engine based on stochastic finite state automata"
Lecture Series on Computer and Computational Sciences VSP/Brill (Proceedings of the Symposium 35: Computational Methods in Molecular Biology in conjunction with ICCMSE), pp. 1371-1374, Springer-Verlag, Loutraki, Greece, 2005 Oct

Accurate protein classification is one of the major challenges in modern bioinformatics. Motifs that exist in the protein chain can make such a classification possible. A plethora of algorithms to address this problem have been proposed by both the artificial intelligence and the pattern recognition communities. In this paper, a data mining methodology for classification rule induction is proposed. Initially, expert-based protein families are processed to create a new hybrid set of families. Then, a prefix tree acceptor is created from the motifs in the protein chains, and subsequently transformed into a stochastic finite state automaton using the ALERGIA algorithm. Finally, an algorithm is presented for the extraction of classification rules from the automaton.
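
ALERGIA decides which states of the prefix tree acceptor to merge by checking whether their observed frequencies are statistically compatible. A minimal sketch of that local check is given below, using the Hoeffding bound as it is usually stated for ALERGIA. The function name and the default `alpha` are assumptions, and the full algorithm also compares the successor states recursively, which is omitted here.

```python
import math


def hoeffding_compatible(f1, n1, f2, n2, alpha=0.05):
    """Return True if frequencies f1/n1 and f2/n2 are compatible.

    f1, f2: how often an event (a given transition, or termination)
            was observed in each of the two states.
    n1, n2: how many sequences reached each state.
    alpha:  confidence parameter; the value 0.05 is an assumption.
    """
    if n1 == 0 or n2 == 0:
        return True  # no evidence against merging
    bound = math.sqrt(0.5 * math.log(2.0 / alpha)) * (
        1.0 / math.sqrt(n1) + 1.0 / math.sqrt(n2)
    )
    return abs(f1 / n1 - f2 / n2) < bound


similar = hoeffding_compatible(50, 100, 48, 100)
different = hoeffding_compatible(900, 1000, 100, 1000)
```

A smaller `alpha` widens the bound, so more states are merged and the resulting automaton generalizes more aggressively.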

@inproceedings{2005PsomopoulosICCMSE,
author={Fotis E. Psomopoulos and Pericles A. Mitkas},
title={A protein classification engine based on stochastic finite state automata},
booktitle={Lecture Series on Computer and Computational Sciences VSP/Brill (Proceedings of the Symposium 35: Computational Methods in Molecular Biology in conjunction with ICCMSE)},
pages={1371-1374},
publisher={Springer-Verlag},
address={Loutraki, Greece},
year={2005},
month={10},
date={2005-10-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2016/02/A-protein-classification-engine-based-on-stochastic-finite-state-automata-.pdf},
keywords={motifs},
abstract={Accurate protein classification is one of the major challenges in modern bioinformatics. Motifs that exist in the protein chain can make such a classification possible. A plethora of algorithms to address this problem have been proposed by both the artificial intelligence and the pattern recognition communities. In this paper, a data mining methodology for classification rule induction is proposed. Initially, expert-based protein families are processed to create a new hybrid set of families. Then, a prefix tree acceptor is created from the motifs in the protein chains, and subsequently transformed into a stochastic finite state automaton using the ALERGIA algorithm. Finally, an algorithm is presented for the extraction of classification rules from the automaton.}
}

2004

Conference Papers

Fotis E. Psomopoulos, Sotiris Diplaris and Pericles A. Mitkas
"A finite state automata based technique for protein classification rules induction"
Proceedings of the Second European Workshop on Data Mining and Text Mining in Bioinformatics (in conjunction with ECML/PKDD), pp. 54-60, Pisa, Italy, 2004 Sep

An important challenge in modern functional proteomics is the prediction of the functional behavior of proteins. Motifs in protein chains can make such a prediction possible. The correlation between protein properties and their motifs is not always obvious, since more than one motif can exist within a protein chain. Thus, the behavior of a protein is a function of many motifs, where some overpower others. In this paper, a data-mining approach for motif-based classification of proteins is presented. A new classification rule induction algorithm that exploits finite state automata is introduced. First, data are modeled in terms of prefix tree acceptors, which are later merged into finite state automata. Finally, we propose a new algorithm for the induction of protein classification rules from finite state automata. The data-mining model is trained and tested using various protein and protein class subsets, as well as the whole dataset of known proteins and protein classes. Results indicate the efficiency of our technique compared to other known data-mining algorithms.

@inproceedings{2004PsomopoulosPSEWDMTMB,
author={Fotis E. Psomopoulos and Sotiris Diplaris and Pericles A. Mitkas},
title={A finite state automata based technique for protein classification rules induction},
booktitle={Proceedings of the Second European Workshop on Data Mining and Text Mining in Bioinformatics (in conjunction with ECML/PKDD)},
pages={54--60},
address={Pisa, Italy},
year={2004},
month={09},
date={2004-09-01},
url={http://issel.ee.auth.gr/wp-content/uploads/2017/01/A_finite_state_automata_based_technique_for_protei.pdf},
keywords={proteomics},
abstract={An important challenge in modern functional proteomics is the prediction of the functional behavior of proteins. Motifs in protein chains can make such a prediction possible. The correlation between protein properties and their motifs is not always obvious, since more than one motif can exist within a protein chain. Thus, the behavior of a protein is a function of many motifs, where some overpower others. In this paper, a data-mining approach for motif-based classification of proteins is presented. A new classification rule induction algorithm that exploits finite state automata is introduced. First, data are modeled in terms of prefix tree acceptors, which are later merged into finite state automata. Finally, we propose a new algorithm for the induction of protein classification rules from finite state automata. The data-mining model is trained and tested using various protein and protein class subsets, as well as the whole dataset of known proteins and protein classes. Results indicate the efficiency of our technique compared to other known data-mining algorithms.}
}