Maria Kotouza

PhD Student

Aristotle University of Thessaloniki
Department of Electrical and Computer Engineering
54124, Thessaloniki
Tel: +30 2310 99 6349
Fax: +30 2310 99 6398
Email: maria.kotouza (at) issel [dot] ee [dot] auth [dot] gr
LinkedIn

Education

09/2016 – present PhD Candidate, Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Greece. Dissertation topic: “Development of machine learning techniques for the analysis of large volumes of data (Big Data Analytics)”.
09/2011 – 06/2016 Diploma in Electrical and Computer Engineering, Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Greece. Diploma thesis: “Segmentation of Low Voltage Consumers using Clustering Techniques”.

Professional Experience

08/2017 – present Research Assistant, Institute of Applied Biosciences (INAB), Centre for Research and Technology Hellas (CERTH).
11/2016 – 07/2017 Research Assistant, Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Greece. European research project: RAPP (https://issel.ee.auth.gr/r-d/rapp/)

Teaching Experience

09/2016 – present Teaching Assistant in the course Data Structures, Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Greece
01/2017 – present Teaching Assistant in the course Object Oriented Programming, Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Greece

Research Interests

  • Big Data Analytics
  • Data Mining
  • Machine Learning
  • Software Engineering

Foreign Languages

– English: Excellent (Michigan Proficiency)

Publications

2019

Inproceedings Papers

Maria Kotouza, Fotis Psomopoulos and Periklis A. Mitkas
"A Dockerized String Analysis Workflow for Big Data"
New Trends in Databases and Information Systems, pp. 564-569, Springer International Publishing, Cham, 2019 Sep

Nowadays, a wide range of sciences are moving towards the Big Data era, producing large volumes of data that require processing for new knowledge extraction. Scientific workflows are often the key tools for solving problems characterized by computational complexity and data diversity, whereas cloud computing can effectively facilitate their efficient execution. In this paper, we present a generative big data analysis workflow that can provide analytics, clustering, prediction and visualization services to datasets coming from various scientific fields, by transforming input data into strings. The workflow consists of novel algorithms for data processing and relationship discovery, that are scalable and suitable for cloud infrastructures. Domain experts can interact with the workflow components, set their parameters, run personalized pipelines and have support for decision-making processes. As case studies in this paper, two datasets consisting of (i) Documents and (ii) Gene sequence data are used, showing promising results in terms of efficiency and performance.

@inproceedings{Kotouza19NTDIS,
author={Maria Kotouza and Fotis Psomopoulos and Periklis A. Mitkas},
title={A Dockerized String Analysis Workflow for Big Data},
booktitle={New Trends in Databases and Information Systems},
pages={564-569},
publisher={Springer International Publishing},
address={Cham},
year={2019},
month={09},
date={2019-09-01},
doi={10.1007/978-3-030-30278-8_55},
isbn={978-3-030-30278-8},
url={https://link.springer.com/chapter/10.1007/978-3-030-30278-8_55},
abstract={Nowadays, a wide range of sciences are moving towards the Big Data era, producing large volumes of data that require processing for new knowledge extraction. Scientific workflows are often the key tools for solving problems characterized by computational complexity and data diversity, whereas cloud computing can effectively facilitate their efficient execution. In this paper, we present a generative big data analysis workflow that can provide analytics, clustering, prediction and visualization services to datasets coming from various scientific fields, by transforming input data into strings. The workflow consists of novel algorithms for data processing and relationship discovery, that are scalable and suitable for cloud infrastructures. Domain experts can interact with the workflow components, set their parameters, run personalized pipelines and have support for decision-making processes. As case studies in this paper, two datasets consisting of (i) Documents and (ii) Gene sequence data are used, showing promising results in terms of efficiency and performance.}
}
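
The abstract states that the workflow transforms input data into strings but does not say how. One simple way to do this for numeric data, shown here purely as an assumed illustration and not as the workflow's actual encoding, is to map each value to a letter by equal-width binning. The function name `to_symbol_string` and the default four-letter alphabet are hypothetical.

```python
from typing import List, Sequence


def to_symbol_string(values: Sequence[float], alphabet: str = "abcd") -> str:
    """Encode numeric values as a string, one symbol per value.

    Hypothetical equal-width binning (not the paper's method): the range
    [min, max] is split into len(alphabet) bins and each value is replaced
    by the letter of the bin it falls into.
    """
    if not values:
        return ""
    lo, hi = min(values), max(values)
    n_bins = len(alphabet)
    width = (hi - lo) / n_bins
    symbols: List[str] = []
    for v in values:
        if width == 0:
            idx = 0  # constant input: every value falls into the first bin
        else:
            # clamp so the maximum lands in the last bin instead of overflowing
            idx = min(int((v - lo) / width), n_bins - 1)
        symbols.append(alphabet[idx])
    return "".join(symbols)


if __name__ == "__main__":
    print(to_symbol_string([0.0, 1.0, 2.0, 3.0, 4.0]))
```

Once data is in string form, the same string-based processing steps can in principle be reused across domains, which is the property the abstract highlights for documents and gene sequences.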

2018

Conference Papers

Konstantinos N. Vavliakis, Maria Th. Kotouza, Andreas L. Symeonidis and Pericles A. Mitkas
"Recommendation Systems in a Conversational Web"
Proceedings of the 14th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, pp. 68-77, SciTePress, 2018 Jan

In this paper we redefine the concept of Conversation Web in the context of hyper-personalization. We argue that hyper-personalization in the WWW is only possible within a conversational web where websites and users continuously “discuss” (interact in any way). We present a modular system architecture for the conversational WWW, given that adapting to various user profiles and multivariate websites in terms of size and user traffic is necessary, especially in e-commerce. Obviously there cannot be a unique fit-to-all algorithm, but numerous complementary personalization algorithms and techniques are needed. In this context, we propose PRCW, a novel hybrid approach combining offline and online recommendations using RFMG, an extension of RFM modeling. We evaluate our approach against the results of a deep neural network in two datasets coming from different online retailers. Our evaluation indicates that a) the proposed approach outperforms current state-of-art methods in small-medium datasets and can improve performance in large datasets when combined with other methods, b) results can greatly vary in different datasets, depending on size and characteristics, thus locating the proper method for each dataset can be a rather complex task, and c) offline algorithms should be combined with online methods in order to get optimal results since offline algorithms tend to offer better performance but online algorithms are necessary for exploiting new users and trends that turn up.

@conference{webist18,
author={Konstantinos N. Vavliakis and Maria Th. Kotouza and Andreas L. Symeonidis and Pericles A. Mitkas},
title={Recommendation Systems in a Conversational Web},
booktitle={Proceedings of the 14th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
pages={68-77},
publisher={SciTePress},
year={2018},
month={01},
date={2018-01-01},
url={https://issel.ee.auth.gr/wp-content/uploads/2019/02/WEBIST_2018_29.pdf},
doi={10.5220/0006935300680077},
isbn={978-989-758-324-7},
abstract={In this paper we redefine the concept of Conversation Web in the context of hyper-personalization. We argue that hyper-personalization in the WWW is only possible within a conversational web where websites and users continuously “discuss” (interact in any way). We present a modular system architecture for the conversational WWW, given that adapting to various user profiles and multivariate websites in terms of size and user traffic is necessary, especially in e-commerce. Obviously there cannot be a unique fit-to-all algorithm, but numerous complementary personalization algorithms and techniques are needed. In this context, we propose PRCW, a novel hybrid approach combining offline and online recommendations using RFMG, an extension of RFM modeling. We evaluate our approach against the results of a deep neural network in two datasets coming from different online retailers. Our evaluation indicates that a) the proposed approach outperforms current state-of-art methods in small-medium datasets and can improve performance in large datasets when combined with other methods, b) results can greatly vary in different datasets, depending on size and characteristics, thus locating the proper method for each dataset can be a rather complex task, and c) offline algorithms should be combined with online methods in order to get optimal results since offline algorithms tend to offer better performance but online algorithms are necessary for exploiting new users and trends that turn up.}
}
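
PRCW builds on RFMG, an extension of RFM (Recency, Frequency, Monetary) modeling. The abstract does not describe the extension, so the sketch below only computes plain RFM scores from a purchase log. The `Purchase` record, the rank-based scoring into `n_bins` levels and all function names are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass
class Purchase:
    """Hypothetical purchase record: who bought, when, and for how much."""
    customer: str
    day: date
    amount: float


def rfm_table(purchases: List[Purchase], today: date) -> Dict[str, Tuple[int, int, float]]:
    """Raw (recency in days, frequency, monetary total) per customer."""
    last: Dict[str, date] = {}
    freq: Dict[str, int] = {}
    money: Dict[str, float] = {}
    for p in purchases:
        last[p.customer] = max(last.get(p.customer, p.day), p.day)
        freq[p.customer] = freq.get(p.customer, 0) + 1
        money[p.customer] = money.get(p.customer, 0.0) + p.amount
    return {c: ((today - last[c]).days, freq[c], money[c]) for c in last}


def rank_score(values: Dict[str, float], n_bins: int, higher_is_better: bool) -> Dict[str, int]:
    """Map each customer to a score in 1..n_bins, where n_bins is best.

    Customers are ordered worst first; ties keep their input order
    because Python's sort is stable.
    """
    order = sorted(values, key=lambda c: values[c], reverse=not higher_is_better)
    n = len(order)
    return {c: 1 + i * n_bins // n for i, c in enumerate(order)}


def rfm_scores(purchases: List[Purchase], today: date, n_bins: int = 5) -> Dict[str, Tuple[int, int, int]]:
    """Plain RFM only (the RFMG extension is not reproduced here).

    Recent, frequent and high-spending customers receive the highest scores.
    """
    table = rfm_table(purchases, today)
    r = rank_score({c: v[0] for c, v in table.items()}, n_bins, higher_is_better=False)
    f = rank_score({c: v[1] for c, v in table.items()}, n_bins, higher_is_better=True)
    m = rank_score({c: v[2] for c, v in table.items()}, n_bins, higher_is_better=True)
    return {c: (r[c], f[c], m[c]) for c in table}


if __name__ == "__main__":
    log = [
        Purchase("A", date(2018, 1, 10), 100.0),
        Purchase("A", date(2018, 1, 20), 50.0),
        Purchase("B", date(2017, 12, 1), 20.0),
        Purchase("C", date(2018, 1, 30), 300.0),
    ]
    print(rfm_scores(log, date(2018, 1, 31), n_bins=3))
```

Rank-based scoring keeps the example short; a production scoring would more likely use quantile cut points with explicit tie handling.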

Inproceedings Papers

Sotirios-Filippos Tsarouchis, Maria Th. Kotouza, Fotis E. Psomopoulos and Pericles A. Mitkas
"A Multi-metric Algorithm for Hierarchical Clustering of Same-Length Protein Sequences"
IFIP International Conference on Artificial Intelligence Applications and Innovations, pp. 189-199, Springer, Cham, 2018 May

The identification of meaningful groups of proteins has always been a major area of interest for structural and functional genomics. Successful protein clustering can lead to significant insight, assisting in both tracing the evolutionary history of the respective molecules as well as in identifying potential functions and interactions of novel sequences. Here we propose a clustering algorithm for same-length sequences, which allows the construction of subset hierarchy and facilitates the identification of the underlying patterns for any given subset. The proposed method utilizes the metrics of sequence identity and amino-acid similarity simultaneously as direct measures. The algorithm was applied on a real-world dataset consisting of clonotypic immunoglobulin (IG) sequences from Chronic lymphocytic leukemia (CLL) patients, showing promising results.

@inproceedings{2018Tsarouchis,
author={Sotirios-Filippos Tsarouchis and Maria Th. Kotouza and Fotis E. Psomopoulos and Pericles A. Mitkas},
title={A Multi-metric Algorithm for Hierarchical Clustering of Same-Length Protein Sequences},
booktitle={IFIP International Conference on Artificial Intelligence Applications and Innovations},
pages={189-199},
publisher={Springer},
address={Cham},
year={2018},
month={05},
date={2018-05-22},
doi={10.1007/978-3-319-92016-0_18},
isbn={978-3-319-92016-0},
abstract={The identification of meaningful groups of proteins has always been a major area of interest for structural and functional genomics. Successful protein clustering can lead to significant insight, assisting in both tracing the evolutionary history of the respective molecules as well as in identifying potential functions and interactions of novel sequences. Here we propose a clustering algorithm for same-length sequences, which allows the construction of subset hierarchy and facilitates the identification of the underlying patterns for any given subset. The proposed method utilizes the metrics of sequence identity and amino-acid similarity simultaneously as direct measures. The algorithm was applied on a real-world dataset consisting of clonotypic immunoglobulin (IG) sequences from Chronic lymphocytic leukemia (CLL) patients, showing promising results.}
}
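
The abstract says the method uses sequence identity and amino-acid similarity simultaneously as direct measures. Assuming column-wise definitions over a group of same-length sequences (identity: the share of positions where every sequence has the same residue; similarity: the share of positions where every residue falls in the same physicochemical group), the two metrics can be sketched as follows. The grouping `AA_GROUPS` is illustrative and is not taken from the paper.

```python
from typing import Dict, List, Sequence

# Illustrative (assumed) physicochemical grouping of the 20 standard amino acids.
AA_GROUPS: List[str] = ["AVLIM", "FWY", "STNQ", "DE", "KRH", "GP", "C"]
GROUP_OF: Dict[str, int] = {aa: i for i, group in enumerate(AA_GROUPS) for aa in group}


def _columns(seqs: Sequence[str]) -> List[tuple]:
    """Return the aligned columns of non-empty, same-length sequences."""
    if not seqs or not seqs[0] or len({len(s) for s in seqs}) != 1:
        raise ValueError("expected a non-empty list of same-length, non-empty sequences")
    return list(zip(*seqs))


def identity(seqs: Sequence[str]) -> float:
    """Share of positions where all sequences carry the same residue."""
    cols = _columns(seqs)
    return sum(1 for col in cols if len(set(col)) == 1) / len(cols)


def similarity(seqs: Sequence[str]) -> float:
    """Share of positions where all residues belong to the same group.

    Residues outside AA_GROUPS (e.g. 'X') only match themselves.
    """
    cols = _columns(seqs)
    same_group = sum(1 for col in cols if len({GROUP_OF.get(aa, aa) for aa in col}) == 1)
    return same_group / len(cols)


if __name__ == "__main__":
    group = ["CARDY", "CARDF", "CAKEY"]
    print(identity(group), similarity(group))
```

Similarity is always at least as high as identity under these definitions, since identical residues necessarily share a group.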

Maria Th. Kotouza, Konstantinos N. Vavliakis, Fotis E. Psomopoulos and Pericles A. Mitkas
"A Hierarchical Multi-Metric Framework for Item Clustering"
5th International Conference on Big Data Computing Applications and Technologies, pp. 191-197, IEEE/ACM, Zurich, Switzerland, 2018 Dec

Item clustering is commonly used for dimensionality reduction, uncovering item similarities and connections, gaining insights of the market structure and recommendations. Hierarchical clustering methods produce a hierarchy structure along with the clusters that can be useful for managing item categories and sub-categories, dealing with indirect competition and new item categorization as well. Nevertheless, baseline hierarchical clustering algorithms have high computational cost and memory usage. In this paper we propose an innovative scalable hierarchical clustering framework, which overcomes these limitations. Our work consists of a binary tree construction algorithm that creates a hierarchy of the items using three metrics, a) Identity, b) Similarity and c) Entropy, as well as a branch breaking algorithm which composes the final clusters by applying thresholds to each branch of the tree. The proposed framework is evaluated on the popular MovieLens 20M dataset achieving significant reduction in both memory consumption and computational time over a baseline hierarchical clustering algorithm.

@inproceedings{KotouzaVPM18,
author={Maria Th. Kotouza and Konstantinos N. Vavliakis and Fotis E. Psomopoulos and Pericles A. Mitkas},
title={A Hierarchical Multi-Metric Framework for Item Clustering},
booktitle={5th International Conference on Big Data Computing Applications and Technologies},
pages={191-197},
publisher={IEEE/ACM},
address={Zurich, Switzerland},
year={2018},
month={12},
date={2018-12-17},
url={http://issel.ee.auth.gr/wp-content/uploads/2019/02/BDCAT_2018_paper_24_Proceedings.pdf},
doi={10.1109/BDCAT.2018.00031},
abstract={Item clustering is commonly used for dimensionality reduction, uncovering item similarities and connections, gaining insights of the market structure and recommendations. Hierarchical clustering methods produce a hierarchy structure along with the clusters that can be useful for managing item categories and sub-categories, dealing with indirect competition and new item categorization as well. Nevertheless, baseline hierarchical clustering algorithms have high computational cost and memory usage. In this paper we propose an innovative scalable hierarchical clustering framework, which overcomes these limitations. Our work consists of a binary tree construction algorithm that creates a hierarchy of the items using three metrics, a) Identity, b) Similarity and c) Entropy, as well as a branch breaking algorithm which composes the final clusters by applying thresholds to each branch of the tree. The proposed framework is evaluated on the popular MovieLens 20M dataset achieving significant reduction in both memory consumption and computational time over a baseline hierarchical clustering algorithm.}
}
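
The framework adds Entropy as a third metric and composes clusters by applying thresholds to tree branches. Assuming, hypothetically, that each item is encoded as a fixed-length string, a column-wise Shannon entropy (in bits) and a threshold test on its mean could look like the sketch below. `passes_threshold` and its `max_entropy` parameter are hypothetical and do not reproduce the paper's branch breaking algorithm.

```python
import math
from collections import Counter
from typing import Sequence


def column_entropy(column: Sequence[str]) -> float:
    """Shannon entropy (bits) of the symbol distribution in one column."""
    n = len(column)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(column).values())


def mean_entropy(items: Sequence[str]) -> float:
    """Average column entropy over fixed-length item strings (0 = all identical)."""
    if not items or not items[0] or len({len(s) for s in items}) != 1:
        raise ValueError("expected a non-empty list of same-length, non-empty strings")
    columns = list(zip(*items))
    return sum(column_entropy(col) for col in columns) / len(columns)


def passes_threshold(items: Sequence[str], max_entropy: float) -> bool:
    """Hypothetical branch test: keep a branch as one cluster if it is homogeneous enough."""
    return mean_entropy(items) <= max_entropy


if __name__ == "__main__":
    branch = ["AB", "AC"]
    print(mean_entropy(branch), passes_threshold(branch, max_entropy=0.5))
```

Unlike identity, entropy distinguishes a column split evenly between two symbols from one scattered across many, which is one plausible reason to combine the metrics.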

2017

Inproceedings Papers

Maria Th. Kotouza, Antonios C. Chrysopoulos and Pericles A. Mitkas
"Segmentation of Low Voltage Consumers for Designing Individualized Pricing Policies"
European Energy Market (EEM), 2017 14th International Conference, pp. 1-6, IEEE, Dresden, Germany, 2017 Jun

In recent years, the Smart Grid paradigm has opened a vast set of opportunities for all participating parties in the Energy Markets (i.e. producers, Distribution and Transmission System Operators, retailers, consumers), providing two-way data communication, increased security and grid stability. Furthermore, the liberation of distribution and energy services has led towards competitive Energy Market environments [4]. In order to maintain their existing customers' satisfaction level high, as well as reaching out to new ones, suppliers must provide better and more reliable energy services, that are specifically tailored to each customer or to a group of customers with similar needs. Thus, it is necessary to identify segments of customers that have common energy characteristics via a process called Consumer Load Profiling (CLP) [16].

@inproceedings{2017Kotouza,
author={Maria Th. Kotouza and Antonios C. Chrysopoulos and Pericles A. Mitkas},
title={Segmentation of Low Voltage Consumers for Designing Individualized Pricing Policies},
booktitle={European Energy Market (EEM), 2017 14th International Conference},
pages={1-6},
publisher={IEEE},
address={Dresden, Germany},
year={2017},
month={06},
date={2017-06-06},
doi={10.1109/EEM.2017.7981862},
issn={2165-4093},
isbn={978-1-5090-5499-2},
abstract={In recent years, the Smart Grid paradigm has opened a vast set of opportunities for all participating parties in the Energy Markets (i.e. producers, Distribution and Transmission System Operators, retailers, consumers), providing two-way data communication, increased security and grid stability. Furthermore, the liberation of distribution and energy services has led towards competitive Energy Market environments [4]. In order to maintain their existing customers' satisfaction level high, as well as reaching out to new ones, suppliers must provide better and more reliable energy services, that are specifically tailored to each customer or to a group of customers with similar needs. Thus, it is necessary to identify segments of customers that have common energy characteristics via a process called Consumer Load Profiling (CLP) [16].}
}
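
Consumer Load Profiling groups customers whose consumption has a similar shape. The abstract does not specify the clustering method, so the sketch below uses plain k-means as a stand-in, assuming peak-normalized daily load profiles and a naive initialization from the first `k` profiles. The helper names and the four-slot profiles in the usage example are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple

Profile = List[float]


def normalize(profile: Sequence[float]) -> Profile:
    """Peak-normalize a daily load profile so that shape, not size, drives clustering."""
    peak = max(profile)
    return [v / peak for v in profile] if peak > 0 else list(profile)


def nearest(point: Sequence[float], centroids: Sequence[Profile]) -> int:
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(profiles: Sequence[Profile], k: int, max_iter: int = 100) -> Tuple[List[int], List[Profile]]:
    """Plain Lloyd's k-means (a stand-in, not necessarily the paper's algorithm)."""
    if not 0 < k <= len(profiles):
        raise ValueError("k must be between 1 and the number of profiles")
    centroids = [list(p) for p in profiles[:k]]  # naive init: first k profiles
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each profile goes to its nearest centroid.
        new_labels = [nearest(p, centroids) for p in profiles]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Made-up 4-slot daily profiles: two morning-peak and two evening-peak households.
    raw = [[4, 1, 1, 2], [1, 1, 2, 5], [8, 2, 2, 3], [2, 1, 3, 10]]
    labels, _ = kmeans([normalize(p) for p in raw], k=2)
    print(labels)
```

Peak normalization lets a small and a large household with the same daily pattern land in the same segment, which matches the goal of tailoring pricing to groups of customers with similar needs.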