Welcome to my home pages!

My name is Georgios Petasis, and these are my personal web pages. Currently, I am a research associate at the Software and Knowledge Engineering Laboratory (since June 2006) of the Institute of Informatics and Telecommunications of N.C.S.R. "Demokritos" and an associate at the Systems Reliability and Industrial Safety Laboratory (since November 2003) of the Institute of Nuclear Technology and Radiation Protection of N.C.S.R. "Demokritos", Athens, Greece. Previously, I was a research associate at the Software and Knowledge Engineering Laboratory (November 1998 - August 2003) of the Institute of Informatics and Telecommunications of N.C.S.R. "Demokritos", Athens, Greece.

My research interests lie mainly in Natural Language Processing (NLP) and Machine Learning. My current work focuses on the following issues:

  • The construction of Named-Entity Recognition Systems based on machine learning techniques that are able to easily adapt to various domains or languages.

  • The development of Grammatical Inference (GI) algorithms.

  • The automated construction of lexical resources, such as morphological lexicons, part-of-speech taggers, gazetteer look-up lexicons, etc.

  • The design and development of generic frameworks for Language Engineering. Parts of this work can be found in the "Ellogon Language Engineering Platform".

In the links below you will find more information about me (CV, publications) and my work regarding NLP, GI, Ellogon and Tcl/Tk.

My Publications

    Year: 2022

    1. Elisjana Ymeralli, Giorgos Flouris, Vasilis Efthymiou, Katerinae Papantoniou, Theodore Patkos, Yannis Roussakis, Elias Tzortzakakis, Georgios Petasis and Nikiforos Pittaras.
      Representing online debates in the context of e-journalism.
      In The Sixteenth International Conference on Advances in Semantic Processing - SEMAPRO 2022. 2022, to appear.

      @inproceedings{debatelab-semapro2022,
      	title = "Representing online debates in the context of e-journalism",
      	author = "Elisjana Ymeralli and Giorgos Flouris and Vasilis Efthymiou and Katerinae Papantoniou and Theodore Patkos and Yannis Roussakis and Elias Tzortzakakis and Georgios Petasis and Nikiforos Pittaras",
      	booktitle = "The Sixteenth International Conference on Advances in Semantic Processing - SEMAPRO 2022",
      	day = "13--17",
      	month = "",
      	year = 2022,
      	address = "Valencia, Spain",
      	abstract = "",
      	language = "English",
      	pages = "to appear",
      	url = "https://www.iaria.org/conferences2022/SEMAPRO22.html"
      }
      
    2. Dimitris V Politikos, Nikolaos Sykiniotis, Georgios Petasis, Pavlos Dedousis, Alba Ordoñez, Rune Vabø, Aikaterini Anastasopoulou, Endre Moen, Chryssi Mytilineou, Arnt-Børre Salberg, Archontia Chatzispyrou and Ketil Malde.
      An online otolith-to-fish age reader using deep neural networks: Perspectives and Challenges.
      In Proceedings of the ICES Annual Science Conference. 2022, to appear.

      @inproceedings{politikos-ices-acs2022,
      	author = "Politikos, Dimitris V. and Sykiniotis, Nikolaos and Petasis, Georgios and Dedousis, Pavlos and Ordoñez, Alba and Vabø, Rune and Anastasopoulou, Aikaterini and Moen, Endre and Mytilineou, Chryssi and Salberg, Arnt-Børre and Chatzispyrou, Archontia and Malde, Ketil",
      	title = "An online otolith-to-fish age reader using deep neural networks: Perspectives and Challenges",
      	booktitle = "Proceedings of the ICES Annual Science Conference",
      	year = 2022,
      	month = "",
      	day = "19--22",
      	abstract = "",
      	pages = "to appear",
      	location = "Dublin, Ireland",
      	url = "https://www.ices.dk/events/asc/ASC2022/Pages/default.aspx"
      }
      
    3. Dimitris V Politikos, Argyro Adamopoulou, Georgios Petasis and Francois Galgani.
      Leveraging artificial intelligence for tackling marine litter pollution: A survey and a web database.
      In Proceedings of the 2022 Marine and Inland Waters Research Symposium. September 2022, to appear.

      @inproceedings{politikos-marine-inland-waters2022,
      	author = "Politikos, Dimitris V. and Argyro Adamopoulou and Georgios Petasis and Francois Galgani",
      	title = "Leveraging artificial intelligence for tackling marine litter pollution: A survey and a web database",
      	booktitle = "Proceedings of the 2022 Marine and Inland Waters Research Symposium",
      	year = 2022,
      	month = "September",
      	day = "16--20",
      	abstract = "",
      	pages = "to appear",
      	location = "Porto Heli, Argolida, Greece",
      	url = "https://symposia.gr/"
      }
      
    4. Alexandros Fotios Ntogramatzis, Anna Gradou, Georgios Petasis and Marko Kokol.
      The Ellogon Web Annotation Tool: Annotating Moral Values and Arguments.
      In Proceedings of the Thirteenth Language Resources and Evaluation Conference. 2022, 3442–3450.
      Abstract In this paper, we present the Ellogon Web Annotation Tool. It is a collaborative, web-based annotation tool built upon the Ellogon infrastructure offering an improved user experience and adaptability to various annotation scenarios by making good use of the latest design practices and web development frameworks. Being in development for many years, this paper describes its current architecture, along with the recent modifications that extend the existing functionalities and the new features that were added. The new version of the tool offers document analytics, annotation inspection and comparison features, a modern UI, and formatted text import (e.g. TEI XML documents, rendered with simple markup). We present two use cases that serve as two examples of different annotation scenarios to demonstrate the new functionalities. An appropriate (user-supplied, XML-based) annotation schema is used for each scenario. The first schema contains the relevant components for representing concepts, moral values, and ideas. The second includes all the necessary elements for annotating argumentative units in a document and their binary relations.

      @inproceedings{ntogramatzis-etal-2022-ellogon,
      	title = "The Ellogon Web Annotation Tool: Annotating Moral Values and Arguments",
      	author = "Ntogramatzis, Alexandros Fotios and Gradou, Anna and Petasis, Georgios and Kokol, Marko",
      	booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
      	month = "",
      	year = 2022,
      	address = "Marseille, France",
      	publisher = "European Language Resources Association",
      	url = "https://aclanthology.org/2022.lrec-1.368",
      	pages = "3442--3450",
      	abstract = "In this paper, we present the Ellogon Web Annotation Tool. It is a collaborative, web-based annotation tool built upon the Ellogon infrastructure offering an improved user experience and adaptability to various annotation scenarios by making good use of the latest design practices and web development frameworks. Being in development for many years, this paper describes its current architecture, along with the recent modifications that extend the existing functionalities and the new features that were added. The new version of the tool offers document analytics, annotation inspection and comparison features, a modern UI, and formatted text import (e.g. TEI XML documents, rendered with simple markup). We present two use cases that serve as two examples of different annotation scenarios to demonstrate the new functionalities. An appropriate (user-supplied, XML-based) annotation schema is used for each scenario. The first schema contains the relevant components for representing concepts, moral values, and ideas. The second includes all the necessary elements for annotating argumentative units in a document and their binary relations."
      }
      
    5. Giorgos Flouris, Vasilis Efthymiou, Katherine Papantoniou, Theodore Patkos, Georgios Petasis, Nikiforos Pittaras, Dimitris Plexousakis, Yannis Roussakis, Elias Tzortzakakis and Elisjana Ymeralli.
      DebateLab Tools for E-Journalism and Informed Citizenship.
      In 4th Summit on Gender Equality in Computing – GEC22. 2022.
      Abstract We describe DebateLab, a project which aims to conduct research towards developing the theoretical infrastructure for mining, representing and reasoning with arguments found online, while delivering a suite of tools and services supporting the uptake of the related technologies. DebateLab will pave the way for a new Web paradigm, where the different types of arguments and human deliberation will be amenable to algorithmic processing and machine-interpretable representation. Towards this, DebateLab analyzes articles with argumentative content to provide tools that will be useful for the professional journalist, but also for users who want to be better informed regarding public debates.

      @inproceedings{debatelab-gec2022,
      	title = "DebateLab Tools for E-Journalism and Informed Citizenship",
      	author = "Giorgos Flouris and Vasilis Efthymiou and Katherine Papantoniou and Theodore Patkos and Georgios Petasis and Nikiforos Pittaras and Dimitris Plexousakis and Yannis Roussakis and Elias Tzortzakakis and Elisjana Ymeralli",
      	booktitle = "4th Summit on Gender Equality in Computing -- GEC22",
      	month = "",
      	year = 2022,
      	address = "Thessaloniki, Greece",
      	abstract = "We describe DebateLab, a project which aims to conduct research towards developing the theoretical infrastructure for mining, representing and reasoning with arguments found online, while delivering a suite of tools and services supporting the uptake of the related technologies. DebateLab will pave the way for a new Web paradigm, where the different types of arguments and human deliberation will be amenable to algorithmic processing and machine-interpretable representation. Towards this, DebateLab analyzes articles with argumentative content to provide tools that will be useful for the professional journalist, but also for users who want to be better informed regarding public debates.",
      	language = "English",
      	url = "https://gec22.auth.gr/"
      }
      
    6. Dimitris V Politikos, Nikolaos Sykiniotis, Georgios Petasis, Pavlos Dedousis, Alba Ordoñez, Rune Vabø, Aikaterini Anastasopoulou, Endre Moen, Chryssi Mytilineou, Arnt-Børre Salberg, Archontia Chatzispyrou and Ketil Malde.
      DeepOtolith v1.0: An Open-Source AI Platform for Automating Fish Age Reading from Otolith or Scale Images.
      Fishes 7(3), 2022.
      Abstract Every year, marine scientists around the world read thousands of otolith or scale images to determine the age structure of commercial fish stocks. This knowledge is important for fisheries and conservation management. However, the age-reading procedure is time-consuming and costly to perform due to the specialized expertise and labor needed to identify annual growth zones in otoliths. Effective automated systems are needed to increase throughput and reduce cost. DeepOtolith is an open-source artificial intelligence (AI) platform that addresses this issue by providing a web system with a simple interface that automatically estimates fish age by combining otolith images with convolutional neural networks (CNNs), a class of deep neural networks that has been a dominant method in computer vision tasks. Users can upload otolith image data for selective fish species, and the platform returns age estimates. The estimates of multiple images can be exported to conduct conclusions or further age-related research. DeepOtolith currently contains classifiers/regressors for three fish species; however, more species will be included as related work on ageing will be tested and published soon. Herein, the architecture and functionality of the platform are presented. Current limitations and future directions are also discussed. Overall, DeepOtolith should be considered as the first step towards building a community of marine ecologists, machine learning experts, and stakeholders that will collaborate to support the conservation of fishery resources.

      @article{fishes7030121,
      	author = "Politikos, Dimitris V. and Sykiniotis, Nikolaos and Petasis, Georgios and Dedousis, Pavlos and Ordoñez, Alba and Vabø, Rune and Anastasopoulou, Aikaterini and Moen, Endre and Mytilineou, Chryssi and Salberg, Arnt-Børre and Chatzispyrou, Archontia and Malde, Ketil",
      	title = "DeepOtolith v1.0: An Open-Source AI Platform for Automating Fish Age Reading from Otolith or Scale Images",
      	journal = "Fishes",
      	volume = 7,
      	year = 2022,
      	number = 3,
      	article-number = 121,
      	url = "https://www.mdpi.com/2410-3888/7/3/121",
      	issn = "2410-3888",
      	abstract = "Every year, marine scientists around the world read thousands of otolith or scale images to determine the age structure of commercial fish stocks. This knowledge is important for fisheries and conservation management. However, the age-reading procedure is time-consuming and costly to perform due to the specialized expertise and labor needed to identify annual growth zones in otoliths. Effective automated systems are needed to increase throughput and reduce cost. DeepOtolith is an open-source artificial intelligence (AI) platform that addresses this issue by providing a web system with a simple interface that automatically estimates fish age by combining otolith images with convolutional neural networks (CNNs), a class of deep neural networks that has been a dominant method in computer vision tasks. Users can upload otolith image data for selective fish species, and the platform returns age estimates. The estimates of multiple images can be exported to conduct conclusions or further age-related research. DeepOtolith currently contains classifiers/regressors for three fish species; however, more species will be included as related work on ageing will be tested and published soon. Herein, the architecture and functionality of the platform are presented. Current limitations and future directions are also discussed. Overall, DeepOtolith should be considered as the first step towards building a community of marine ecologists, machine learning experts, and stakeholders that will collaborate to support the conservation of fishery resources.",
      	doi = "10.3390/fishes7030121"
      }
      

    Year: 2021

    1. Dimitris V Politikos, Georgios Petasis and George Katselis.
      Interpretable machine learning to forecast hypoxia in a lagoon.
      Ecological Informatics 66:101480, 2021.
      Abstract Dissolved oxygen is a key indicator in aquatic ecosystems, reflecting changes in water quality. Low levels of dissolved oxygen may lead to oxygen depletion (called hypoxia), putting at high risk the survival of aquatic organisms. Identifying the environmental conditions inducing hypoxia is therefore a topic of high ecological importance. In this study, we used four machine learning algorithms (Extreme Gradient Boosting (XGBoost), Extremely Randomized Trees (EXT), Random Forest, and Logistic Regression) to forecast hypoxia in a lagoon, considering different time lags (2,5,10 and 20-days). To do so, we used data on dissolved oxygen and a total of nine physicochemical and meteorological variables from Papas lagoon, Greece during 2015–2018. Key drivers and synergies that increase the risk of hypoxia were identified using the Shapley Additive exPlanations (SHAP) methodology. EXT was slightly superior to the other algorithms in forecasting hypoxia, achieving a success between 89% and 94% with pH, water temperature, chlorophyll and salinity as top explanatory variables. SHAP showed that the synergistic effect of low pH and chlorophyll, and elevated water temperature and salinity tended to favor conditions leading to hypoxia. SHAP also illustrated that diverse synergies of the explanatory variables can induce hypoxia, indicating the complex and nonlinear relationships between environmental factors and hypoxia. Overall, the present approach may be proved useful for the development of a reliable forecasting tool for alarming hypoxia and, ultimately, the effective monitoring of the lagoon.

      @article{POLITIKOS2021101480,
      	title = "Interpretable machine learning to forecast hypoxia in a lagoon",
      	journal = "Ecological Informatics",
      	volume = 66,
      	pages = 101480,
      	year = 2021,
      	issn = "1574-9541",
      	doi = "10.1016/j.ecoinf.2021.101480",
      	url = "https://www.sciencedirect.com/science/article/pii/S1574954121002715",
      	author = "Dimitris V. Politikos and Georgios Petasis and George Katselis",
      	keywords = "Dissolved oxygen, Machine learning, classification, Hypoxia, SHAP, Interpretability",
      	abstract = "Dissolved oxygen is a key indicator in aquatic ecosystems, reflecting changes in water quality. Low levels of dissolved oxygen may lead to oxygen depletion (called hypoxia), putting at high risk the survival of aquatic organisms. Identifying the environmental conditions inducing hypoxia is therefore a topic of high ecological importance. In this study, we used four machine learning algorithms (Extreme Gradient Boosting (XGBoost), Extremely Randomized Trees (EXT), Random Forest, and Logistic Regression) to forecast hypoxia in a lagoon, considering different time lags (2,5,10 and 20-days). To do so, we used data on dissolved oxygen and a total of nine physicochemical and meteorological variables from Papas lagoon, Greece during 2015–2018. Key drivers and synergies that increase the risk of hypoxia were identified using the Shapley Additive exPlanations (SHAP) methodology. EXT was slightly superior to the other algorithms in forecasting hypoxia, achieving a success between 89% and 94% with pH, water temperature, chlorophyll and salinity as top explanatory variables. SHAP showed that the synergistic effect of low pH and chlorophyll, and elevated water temperature and salinity tended to favor conditions leading to hypoxia. SHAP also illustrated that diverse synergies of the explanatory variables can induce hypoxia, indicating the complex and nonlinear relationships between environmental factors and hypoxia. Overall, the present approach may be proved useful for the development of a reliable forecasting tool for alarming hypoxia and, ultimately, the effective monitoring of the lagoon."
      }
      
    2. Dimitris V Politikos, Georgios Petasis, Archontia Chatzispyrou, Chryssi Mytilineou and Aikaterini Anastasopoulou.
      Automating fish age estimation combining otolith images and deep learning: The role of multitask learning.
      Fisheries Research 242:106033, 2021.
      Abstract Knowledge on the age of fish is vital for assessing the status of fish stocks and proposing management actions to ensure their sustainability. Prevalent methods of fish ageing are based on the readings of otolith images by experts, a process that is often time-consuming and costly. This suggests the need for automatic and cost-effective approaches. Herein, we investigate the feasibility of using deep learning to provide an automatic estimation of fish age from otolith images through a convolutional neural network designed for image analysis. On top of this network, we propose an enhanced - with multitask learning - network to better estimate fish age by introducing as an auxiliary training task the prediction of fish length from otolith images. The proposed approach is applied on a collection of 5027 otolith images of red mullet (Mullus barbatus), considering fish age estimation as a multi-class classification task with six age groups (Age-0, Age-1, Age-2, Age-3, Age-4, Age-5+). Results showed that the network without multitask learning predicted fish age correctly by 64.4 %, attaining high performance for younger age groups (Age-0 and Age-1, F1 score > 0.8) and moderate performance for older age groups (Age-2 to Age-5+, F1 score: 0.50−0.54). The network with multitask learning increased correctness in age prediction reaching 69.2 % and proved efficient to leverage its predictive performance for older age groups (Age-2 to Age-5+, F1 score: 0.57−0.64). Our findings suggest that deep learning has the potential to support the automation of fish age reading, though further research is required to build an operational tool useful in routine fish aging protocols for age reading experts.

      @article{POLITIKOS2021106033,
      	title = "Automating fish age estimation combining otolith images and deep learning: The role of multitask learning",
      	journal = "Fisheries Research",
      	volume = 242,
      	pages = 106033,
      	year = 2021,
      	issn = "0165-7836",
      	doi = "10.1016/j.fishres.2021.106033",
      	url = "https://www.sciencedirect.com/science/article/pii/S0165783621001612",
      	author = "Dimitris V. Politikos and Georgios Petasis and Archontia Chatzispyrou and Chryssi Mytilineou and Aikaterini Anastasopoulou",
      	keywords = "Otolith images, Deep learning, Fish age estimation, Multitask learning, Classification",
      	abstract = "Knowledge on the age of fish is vital for assessing the status of fish stocks and proposing management actions to ensure their sustainability. Prevalent methods of fish ageing are based on the readings of otolith images by experts, a process that is often time-consuming and costly. This suggests the need for automatic and cost-effective approaches. Herein, we investigate the feasibility of using deep learning to provide an automatic estimation of fish age from otolith images through a convolutional neural network designed for image analysis. On top of this network, we propose an enhanced - with multitask learning - network to better estimate fish age by introducing as an auxiliary training task the prediction of fish length from otolith images. The proposed approach is applied on a collection of 5027 otolith images of red mullet (Mullus barbatus), considering fish age estimation as a multi-class classification task with six age groups (Age-0, Age-1, Age-2, Age-3, Age-4, Age-5+). Results showed that the network without multitask learning predicted fish age correctly by 64.4 %, attaining high performance for younger age groups (Age-0 and Age-1, F1 score > 0.8) and moderate performance for older age groups (Age-2 to Age-5+, F1 score: 0.50−0.54). The network with multitask learning increased correctness in age prediction reaching 69.2 % and proved efficient to leverage its predictive performance for older age groups (Age-2 to Age-5+, F1 score: 0.57−0.64). Our findings suggest that deep learning has the potential to support the automation of fish age reading, though further research is required to build an operational tool useful in routine fish aging protocols for age reading experts."
      }
      
    3. Dimitris V Politikos, Georgios Petasis, Archontia Chatzispyrou, Chryssi Mytilineou and Aikaterini Anastasopoulou.
      Automatic reading of fish age using deep learning.
      In Proceedings of the 3rd NOAA Workshop on Leveraging AI in Environmental Sciences. September 2021.

      @inproceedings{politikos-noaa2021,
      	author = "Dimitris V. Politikos and Georgios Petasis and Archontia Chatzispyrou and Chryssi Mytilineou and Aikaterini Anastasopoulou",
      	title = "Automatic reading of fish age using deep learning",
      	booktitle = "Proceedings of the 3rd NOAA Workshop on Leveraging AI in Environmental Sciences",
      	year = 2021,
      	month = "September",
      	day = "7--17",
      	abstract = "",
      	location = "Boulder, CO, USA",
      	url = "https://2021noaaaiworkshop.sched.com/"
      }
      
    4. Dimitris V Politikos, Georgios Petasis and George Katselis.
      Predicting dissolved oxygen in a lagoon using interpretable machine learning.
      In Proceedings of the Online Python Machine Learning Conference and GeoPython. 2021.

      @inproceedings{politikos-geopyhton2021,
      	author = "Dimitris V. Politikos and Georgios Petasis and George Katselis",
      	title = "Predicting dissolved oxygen in a lagoon using interpretable machine learning",
      	booktitle = "Proceedings of the Online Python Machine Learning Conference and GeoPython",
      	year = 2021,
      	month = "",
      	day = "22--23",
      	abstract = "",
      	location = "Online, Worldwide",
      	url = "https://2021.geopython.net/"
      }
      

    Year: 2020

    1. Christos Platias and Georgios Petasis.
      A Comparison of Machine Learning Methods for Data Imputation.
      In SETN 2020: 11th Hellenic Conference on Artificial Intelligence. 2020, 150–159.
      Abstract Handling missing values in a dataset is a long-standing issue across many disciplines. Missing values can arise from different sources such as mishandling of samples, measurement errors, lack of responses, or deleted values. The main problem emerging from this situation is that many algorithms can’t run with incomplete datasets. Several methods exist for handling missing values, including “SoftImpute”, “k-nearest neighbor”, “mice”, “MatrixFactorization”, and “missForest”. However, performance comparisons for these methods are hard to find, as most research approaches usually face imputation as an intermediate problem of a regression or a classification task, and only focus on this task’s performance. In addition, comparisons with existing scientific work are difficult, due to the lack of evaluations on publicly-available, open-access datasets. In order to overcome the aforementioned obstacles, in this paper we are proposing four new open datasets, representing data from real use cases, collected from publicly-available existing datasets, so as anyone can have access to them and compare their experimental results. Then, we compared the performance of some of the state-of-art approaches and most frequently used methods for missing data imputation. In addition to that, we have proposed and evaluated two new approaches, one based on Denoising Autoencoders and one on bagging. All in all, 17 different methods were tested using four different real world, publicly available datasets.

      @inproceedings{10.1145/3411408.3411465,
      	author = "Platias, Christos and Petasis, Georgios",
      	title = "A Comparison of Machine Learning Methods for Data Imputation",
      	year = 2020,
      	isbn = 9781450388788,
      	publisher = "Association for Computing Machinery",
      	address = "New York, NY, USA",
      	url = "https://doi.org/10.1145/3411408.3411465",
      	doi = "10.1145/3411408.3411465",
      	abstract = "Handling missing values in a dataset is a long-standing issue across many disciplines. Missing values can arise from different sources such as mishandling of samples, measurement errors, lack of responses, or deleted values. The main problem emerging from this situation is that many algorithms can’t run with incomplete datasets. Several methods exist for handling missing values, including “SoftImpute”, “k-nearest neighbor”, “mice”, “MatrixFactorization”, and “missForest”. However, performance comparisons for these methods are hard to find, as most research approaches usually face imputation as an intermediate problem of a regression or a classification task, and only focus on this task’s performance. In addition, comparisons with existing scientific work are difficult, due to the lack of evaluations on publicly-available, open-access datasets. In order to overcome the aforementioned obstacles, in this paper we are proposing four new open datasets, representing data from real use cases, collected from publicly-available existing datasets, so as anyone can have access to them and compare their experimental results. Then, we compared the performance of some of the state-of-art approaches and most frequently used methods for missing data imputation. In addition to that, we have proposed and evaluated two new approaches, one based on Denoising Autoencoders and one on bagging. All in all, 17 different methods were tested using four different real world, publicly available datasets.",
      	booktitle = "SETN 2020: 11th Hellenic Conference on Artificial Intelligence",
      	pages = "150--159",
      	numpages = 10,
      	keywords = "autoencoders, neural networks, missing values, imputation methods",
      	location = "Athens, Greece",
      	series = "SETN 2020"
      }
      
    2. Ioannis Loumiotis and Georgios Petasis.
      A Corpus Augmentation Approach for Improving the Performance of Dialogue Systems in the Greek Language.
      In SETN 2020: 11th Hellenic Conference on Artificial Intelligence. 2020, 185–188.
      Abstract One of the main challenges in dialogue systems is their efficiency when trained with small corpora. Although this problem has been solved for dialogue systems in the English language, as there are many corpora available in the literature, there are inherent limitations in other languages. In this paper, the authors study the problem of training end-to-end dialogue systems with a small dataset by proposing a corpus augmentation approach using a morphological lexicon to find the synonyms of the words in the initial corpus and enhance it with more question and answer pairs. The proposed approach was tested under different architectures, including transformers and recurrent neural networks (RNN). The obtained results showed that training the dialogue systems with the augmented corpus can improve their performance achieving an average increase of about 32% for the RNN and 43% for the transformer with respect to the BLEU-2 score. Finally, hypothesis testing has been applied to investigate on the validity of the results.

      @inproceedings{10.1145/3411408.3411464,
      	author = "Loumiotis, Ioannis and Petasis, Georgios",
      	title = "A Corpus Augmentation Approach for Improving the Performance of Dialogue Systems in the Greek Language",
      	year = 2020,
      	isbn = 9781450388788,
      	publisher = "Association for Computing Machinery",
      	address = "New York, NY, USA",
      	url = "https://doi.org/10.1145/3411408.3411464",
      	doi = "10.1145/3411408.3411464",
      	abstract = "One of the main challenges in dialogue systems is their efficiency when trained with small corpora. Although this problem has been solved for dialogue systems in the English language, as there are many corpora available in the literature, there are inherent limitations in other languages. In this paper, the authors study the problem of training end-to-end dialogue systems with a small dataset by proposing a corpus augmentation approach using a morphological lexicon to find the synonyms of the words in the initial corpus and enhance it with more question and answer pairs. The proposed approach was tested under different architectures, including transformers and recurrent neural networks (RNN). The obtained results showed that training the dialogue systems with the augmented corpus can improve their performance achieving an average increase of about 32% for the RNN and 43% for the transformer with respect to the BLEU-2 score. Finally, hypothesis testing has been applied to investigate on the validity of the results.",
      	booktitle = "SETN 2020: 11th Hellenic Conference on Artificial Intelligence",
      	pages = "185--188",
      	numpages = 4,
      	keywords = "hypothesis testing, morphological lexicon, data augmentation, dialogue systems, recurrent neural networks, transformer",
      	location = "Athens, Greece",
      	series = "SETN 2020"
      }
      
    3. Dimitris V Politikos, Georgios Petasis, Archontia Chatzispyrou, Chryssi Mytilineou and Aikaterini Anastasopoulou.
      Automating fish age estimation from otolith images using deep learning.
      In SETN 2020: 11th Hellenic Conference on Artificial Intelligence. September 2020.
      URL, DOI BibTeX

      @inproceedings{politikos-setn2020,
      	author = "Dimitris V. Politikos and Georgios Petasis and Archontia Chatzispyrou and Chryssi Mytilineou and Aikaterini Anastasopoulou",
      	title = "Automating fish age estimation from otolith images using deep learning",
      	booktitle = "SETN 2020: 11th Hellenic Conference on Artificial Intelligence",
      	year = 2020,
      	month = "September",
      	day = "2--4",
      	url = "https://doi.org/10.1145/3411408.3411464",
      	doi = "10.1145/3411408.3411464",
      	abstract = "",
      	location = "Athens, Greece",
      	series = "SETN 2020"
      }
      
    4. Leonidas Tsekouras, Georgios Petasis, George Giannakopoulos and Aris Kosmopoulos.
      Social Web Observatory: A Platform and Method for Gathering Knowledge on Entities from Different Textual Sources.
      In Proceedings of the 12th Language Resources and Evaluation Conference. 2020, 2000–2008.
      Abstract Within this work we describe a framework for the collection and summarization of information from the Web in an entity-driven manner. The framework consists of a set of appropriate workflows and the Social Web Observatory platform, which implements those workflows, supporting them through a language analysis pipeline. The pipeline includes text collection/crawling, identification of different entities, clustering of texts into events related to entities, entity-centric sentiment analysis, but also text analytics and visualization functionalities. The latter allow the user to take advantage of the gathered information as actionable knowledge: to understand the dynamics of the public opinion for a given entity over time and across real-world events. We describe the platform and the analysis functionality and evaluate the performance of the system, by allowing human users to score how the system fares in its intended purpose of summarizing entity-centered information from different sources in the Web.
      URL BibTeX

      @inproceedings{tsekouras-etal-2020-social,
      	title = "Social Web Observatory: A Platform and Method for Gathering Knowledge on Entities from Different Textual Sources",
      	author = "Tsekouras, Leonidas and Petasis, Georgios and Giannakopoulos, George and Kosmopoulos, Aris",
      	booktitle = "Proceedings of the 12th Language Resources and Evaluation Conference",
      	month = "",
      	year = 2020,
      	address = "Marseille, France",
      	publisher = "European Language Resources Association",
      	url = "https://aclanthology.org/2020.lrec-1.246",
      	pages = "2000--2008",
      	abstract = "Within this work we describe a framework for the collection and summarization of information from the Web in an entity-driven manner. The framework consists of a set of appropriate workflows and the Social Web Observatory platform, which implements those workflows, supporting them through a language analysis pipeline. The pipeline includes text collection/crawling, identification of different entities, clustering of texts into events related to entities, entity-centric sentiment analysis, but also text analytics and visualization functionalities. The latter allow the user to take advantage of the gathered information as actionable knowledge: to understand the dynamics of the public opinion for a given entity over time and across real-world events. We describe the platform and the analysis functionality and evaluate the performance of the system, by allowing human users to score how the system fares in its intended purpose of summarizing entity-centered information from different sources in the Web.",
      	language = "English",
      	isbn = "979-10-95546-34-4"
      }
      
    5. Georgios Petasis and Leonidas Tsekouras.
      Ellogon Casual Annotation Infrastructure.
      In Proceedings of the 12th Language Resources and Evaluation Conference. May 2020, 3360–3365.
      Abstract This paper presents a new annotation paradigm, casual annotation, along with a proposed architecture and a reference implementation, the Ellogon Casual Annotation Tool, which implements this paradigm and architecture. The novel aspects of the proposed paradigm originate from the vision to tightly integrate annotation with the casual, everyday activities of users. Annotating in a less “controlled” environment, and removing the bottleneck of selecting content and importing it to annotation infrastructures, casual annotation provides the ability to vastly increase the content that can be annotated and ease the annotation process through automatic pre-training. The proposed paradigm, architecture and reference implementation has been evaluated for more than two years on an annotation task related to sentiment analysis. Evaluation results suggest that, at least for this annotation task, there is a huge improvement in productivity after casual annotation adoption, in comparison to the more traditional annotation paradigms followed in the early stages of the annotation task.
      URL BibTeX

      @inproceedings{petasis-tsekouras-2020-ellogon,
      	title = "Ellogon Casual Annotation Infrastructure",
      	author = "Petasis, Georgios and Tsekouras, Leonidas",
      	booktitle = "Proceedings of the 12th Language Resources and Evaluation Conference",
      	month = "May",
      	year = 2020,
      	address = "Marseille, France",
      	publisher = "European Language Resources Association",
      	url = "https://aclanthology.org/2020.lrec-1.412",
      	pages = "3360--3365",
      	abstract = "This paper presents a new annotation paradigm, casual annotation, along with a proposed architecture and a reference implementation, the Ellogon Casual Annotation Tool, which implements this paradigm and architecture. The novel aspects of the proposed paradigm originate from the vision to tightly integrate annotation with the casual, everyday activities of users. Annotating in a less {``}controlled{''} environment, and removing the bottleneck of selecting content and importing it to annotation infrastructures, casual annotation provides the ability to vastly increase the content that can be annotated and ease the annotation process through automatic pre-training. The proposed paradigm, architecture and reference implementation has been evaluated for more than two years on an annotation task related to sentiment analysis. Evaluation results suggest that, at least for this annotation task, there is a huge improvement in productivity after casual annotation adoption, in comparison to the more traditional annotation paradigms followed in the early stages of the annotation task.",
      	language = "English",
      	isbn = "979-10-95546-34-4"
      }
      

    Year: 2019

    1. Leonidas Papachristopoulos, Pantelis Ampatzoglou, Ioanna Seferli, Andriani Zafeiropoulou and Georgios Petasis.
      Introducing Sentiment Analysis for the Evaluation of Library’s Services Effectiveness.
      Qualitative and Quantitative Methods in Libraries 8(1):99–110, 2019.
      Abstract Increasingly, text mining approaches have come to academic and commercial foreground as an effective solution for managing textual resources. Users’ and consumers’ comments and reviews hyper proliferation due to web 2.0 emergence, generated the need for such techniques implementation as a way to get insights from an active world expressed textually and not limited to specific scales and options. Sentiment analysis constitutes a NLP method aiming at sentiment detection out of textual snippets. On the other hand, it is a common truth that academic libraries have been intensively based their evaluation attempts on quantitative methods. The current study proposes the use of Sentiment analysis on user comments about Hellenic Open University Distance Library and Information Center which were included to the institutional annual survey. The analysis highlighted latent information about specific aspects of the library that couldn’t be detected through the constraints that scaling poses.
      URL BibTeX

      @article{qqml,
      	author = "Leonidas Papachristopoulos and Pantelis Ampatzoglou and Ioanna Seferli and Andriani Zafeiropoulou and Georgios Petasis",
      	title = "Introducing Sentiment Analysis for the Evaluation of Library’s Services Effectiveness",
      	journal = "Qualitative and Quantitative Methods in Libraries",
      	volume = 8,
      	number = 1,
      	year = 2019,
      	keywords = "",
      	abstract = "Increasingly, text mining approaches have come to academic and commercial foreground as an effective solution for managing textual resources. Users’ and consumers’ comments and reviews hyper proliferation due to web 2.0 emergence, generated the need for such techniques implementation as a way to get insights from an active world expressed textually and not limited to specific scales and options. Sentiment analysis constitutes a NLP method aiming at sentiment detection out of textual snippets. On the other hand, it is a common truth that academic libraries have been intensively based their evaluation attempts on quantitative methods. The current study proposes the use of Sentiment analysis on user comments about Hellenic Open University Distance Library and Information Center which were included to the institutional annual survey. The analysis highlighted latent information about specific aspects of the library that couldn’t be detected through the constraints that scaling poses.",
      	issn = "2241-1925",
      	pages = "99--110",
      	url = "http://www.qqml.net/index.php/qqml/article/view/515"
      }
      
    2. Leonidas Tsekouras, Georgios Petasis and Aris Kosmopoulos.
      Social Web Observatory: An entity-driven, holistic information summarization platform across sources.
      In Proceedings of the Workshop MultiLing 2019: Summarization Across Languages, Genres and Sources. 2019, 44–52.
      Abstract The Social Web Observatory is an entity-driven, sentiment-aware, event summarization web platform, combining various methods and tools to overview trends across social media and news sources in Greek. SWO crawls, clusters and summarizes information following an entity-centric view of text streams, allowing to monitor the public sentiment towards a specific person, organization or other entity. In this paper, we overview the platform, outline the analysis pipeline and describe a user study aimed to quantify the usefulness of the system and especially the meaningfulness and coherence of discovered events.
      URL, DOI BibTeX

      @inproceedings{tsekouras-etal-2019-social,
      	title = "Social Web Observatory: An entity-driven, holistic information summarization platform across sources",
      	author = "Tsekouras, Leonidas and Petasis, Georgios and Kosmopoulos, Aris",
      	booktitle = "Proceedings of the Workshop MultiLing 2019: Summarization Across Languages, Genres and Sources",
      	month = "",
      	year = 2019,
      	address = "Varna, Bulgaria",
      	publisher = "INCOMA Ltd.",
      	url = "https://aclanthology.org/W19-8907",
      	doi = "10.26615/978-954-452-058-8_007",
      	pages = "44--52",
      	abstract = "The Social Web Observatory is an entity-driven, sentiment-aware, event summarization web platform, combining various methods and tools to overview trends across social media and news sources in Greek. SWO crawls, clusters and summarizes information following an entity-centric view of text streams, allowing to monitor the public sentiment towards a specific person, organization or other entity. In this paper, we overview the platform, outline the analysis pipeline and describe a user study aimed to quantify the usefulness of the system and especially the meaningfulness and coherence of discovered events."
      }
      
    3. Georgios Petasis.
      Segmentation of Argumentative Texts with Contextualised Word Representations.
      In Proceedings of the 6th Workshop on Argument Mining, 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28 - August 2, 2019. 2019, 1–10.
      Abstract The segmentation of argumentative units is an important subtask of argument mining, which is frequently addressed at a coarse granularity, usually assuming argumentative units to be no smaller than sentences. Approaches focusing at the clause-level granularity, typically address the task as sequence labeling at the token level, aiming to classify whether a token begins, is inside, or is outside of an argumentative unit. Most approaches exploit highly engineered, manually constructed features, and algorithms typically used in sequential tagging – such as Conditional Random Fields, while more recent approaches try to exploit manually constructed features in the context of deep neural networks. In this context, we examined to what extend recent advances in sequential labelling allow to reduce the need for highly sophisticated, manually constructed features, and whether limiting features to embeddings, pre-trained on large corpora is a promising approach. Evaluation results suggest the examined models and approaches can exhibit comparable performance, minimising the need for feature engineering.
      URL, DOI BibTeX

      @inproceedings{petasis-2019-segmentation,
      	title = "Segmentation of Argumentative Texts with Contextualised Word Representations",
      	author = "Petasis, Georgios",
      	booktitle = "Proceedings of the 6th Workshop on Argument Mining, 57th Conference of the Association for Computational Linguistics, {ACL} 2019, Florence, Italy, July 28 - August 2, 2019",
      	month = "",
      	year = 2019,
      	address = "Florence, Italy",
      	publisher = "Association for Computational Linguistics",
      	url = "https://aclanthology.org/W19-4501",
      	doi = "10.18653/v1/W19-4501",
      	pages = "1--10",
      	abstract = "The segmentation of argumentative units is an important subtask of argument mining, which is frequently addressed at a coarse granularity, usually assuming argumentative units to be no smaller than sentences. Approaches focusing at the clause-level granularity, typically address the task as sequence labeling at the token level, aiming to classify whether a token begins, is inside, or is outside of an argumentative unit. Most approaches exploit highly engineered, manually constructed features, and algorithms typically used in sequential tagging {--} such as Conditional Random Fields, while more recent approaches try to exploit manually constructed features in the context of deep neural networks. In this context, we examined to what extend recent advances in sequential labelling allow to reduce the need for highly sophisticated, manually constructed features, and whether limiting features to embeddings, pre-trained on large corpora is a promising approach. Evaluation results suggest the examined models and approaches can exhibit comparable performance, minimising the need for feature engineering."
      }
      

    Year: 2018

    1. Manolis Koubarakis, George A. Vouros, Georgios Chalkiadakis, Vassilis P. Plagianakos, Christos Tjortjis, Ergina Kavallieratou, Dimitris Vrakas, Nikolaos Mavridis, Georgios Petasis, Konstantinos Blekas and Anastasia Krithara.
      AI in Greece: The Case of Research on Linked Geospatial Data.
      AI Magazine 39(2):91–96, 2018.
      URL BibTeX

      @article{DBLP:journals/aim/KoubarakisVCPTK18,
      	author = "Manolis Koubarakis and George A. Vouros and Georgios Chalkiadakis and Vassilis P. Plagianakos and Christos Tjortjis and Ergina Kavallieratou and Dimitris Vrakas and Nikolaos Mavridis and Georgios Petasis and Konstantinos Blekas and Anastasia Krithara",
      	title = "{AI} in Greece: The Case of Research on Linked Geospatial Data",
      	journal = "{AI} Magazine",
      	volume = 39,
      	number = 2,
      	pages = "91--96",
      	year = 2018,
      	url = "https://www.aaai.org/ojs/index.php/aimagazine/article/view/2801",
      	timestamp = "Tue, 17 Jul 2018 01:00:00 +0200",
      	biburl = "https://dblp.org/rec/bib/journals/aim/KoubarakisVCPTK18",
      	bibsource = "dblp computer science bibliography, https://dblp.org"
      }
      
    2. Leonidas Papachristopoulos, Pantelis Ampatzoglou, Ioanna Seferli, Andriani Zafeiropoulou and Georgios Petasis.
      Introducing Sentiment Analysis for the Evaluation of Library's Services Effectiveness.
      In Proceedings of the 10th Qualitative and Quantitative Methods in Libraries International Conference (QQML2018). May 2018.
      BibTeX

      @inproceedings{Papachristopoulos-QQML:2018,
      	author = "Leonidas Papachristopoulos and Pantelis Ampatzoglou and Ioanna Seferli and Andriani Zafeiropoulou and Georgios Petasis",
      	title = "Introducing Sentiment Analysis for the Evaluation of Library's Services Effectiveness",
      	booktitle = "Proceedings of the 10th Qualitative and Quantitative Methods in Libraries International Conference (QQML2018)",
      	day = "22--25",
      	month = "May",
      	year = 2018,
      	address = "Chania, Greece"
      }
      

    Year: 2017

    1. Georgios Petasis, Anna Triantafillou and Eric Karstens.
      YourDataStories: Transparency and Corruption Fighting through Data Interlinking and Visual Exploration.
      In Proceedings of the Data Economy Workshop, 4th International Conference on Internet Science (INSCI 2017). November 2017.
      BibTeX

      @inproceedings{Petasis-DataEconomy:2017,
      	author = "Georgios Petasis and Anna Triantafillou and Eric Karstens",
      	title = "YourDataStories: Transparency and Corruption Fighting through Data Interlinking and Visual Exploration",
      	booktitle = "Proceedings of the Data Economy Workshop, 4th International Conference on Internet Science (INSCI 2017)",
      	day = 22,
      	month = "November",
      	year = 2017,
      	address = "Thessaloniki, Greece"
      }
      
    2. Alfio Ferrara, Stefano Montanelli and Georgios Petasis.
      Unsupervised Detection of Argumentative Units though Topic Modeling Techniques.
      In Proceedings of the 4th Workshop on Argument Mining, 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017). September 2017, 97–107.
      Abstract In this paper we present a new unsupervised approach, "Attraction to Topics" – A2T , for the detection of argumentative units, a sub-task of argument mining. Motivated by the importance of topic identification in manual annotation, we examine whether topic modeling can be used for performing unsupervised detection of argumentative sentences, and to what extend topic modeling can be used to classify sentences as claims and premises. Preliminary evaluation results suggest that topic information can be successfully used for the detection of argumentative sentences, at least for corpora used for evaluation. Our approach has been evaluated on two English corpora, the first of which contains 90 persuasive essays, while the second is a collection of 340 documents from user generated content.
      URL BibTeX

      @inproceedings{ferrara-montanelli-petasis:2017:ArgumentMining,
      	author = "Ferrara, Alfio and Montanelli, Stefano and Petasis, Georgios",
      	title = "Unsupervised Detection of Argumentative Units though Topic Modeling Techniques",
      	booktitle = "Proceedings of the 4th Workshop on Argument Mining, 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017)",
      	month = "September",
      	year = 2017,
      	address = "Copenhagen, Denmark",
      	publisher = "Association for Computational Linguistics",
      	pages = "97--107",
      	abstract = {In this paper we present a new unsupervised approach, "Attraction to Topics" -- A2T , for the detection of argumentative units, a sub-task of argument mining. Motivated by the importance of topic identification in manual annotation, we examine whether topic modeling can be used for performing unsupervised detection of argumentative sentences, and to what extend topic modeling can be used to classify sentences as claims and premises. Preliminary evaluation results suggest that topic information can be successfully used for the detection of argumentative sentences, at least for corpora used for evaluation. Our approach has been evaluated on two English corpora, the first of which contains 90 persuasive essays, while the second is a collection of 340 documents from user generated content.},
      	url = "http://www.aclweb.org/anthology/W17-5113"
      }
      

    Year: 2016

    1. Georgios Petasis and Vangelis Karkaletsis.
      Identifying Argument Components through TextRank.
      In Proceedings of the 3rd Workshop on Argument Mining (ArgMining2016), 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016). August 2016, 56–66.
      URL BibTeX

      @inproceedings{Petasis-EtAl:2016:ARG-MINING,
      	author = "Georgios Petasis and Vangelis Karkaletsis",
      	title = "Identifying Argument Components through TextRank",
      	booktitle = "Proceedings of the 3rd Workshop on Argument Mining (ArgMining2016), 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016)",
      	month = "August",
      	year = 2016,
      	address = "Berlin, Germany",
      	publisher = "Association for Computational Linguistics",
      	pages = "56--66",
      	url = "http://aclweb.org/anthology/W/W16/W16-2811.pdf"
      }
      
    2. Ioannis Manousos Katakis, Georgios Petasis and Vangelis Karkaletsis.
      CLARIN-EL Web-based Annotation Tool.
      In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk and Stelios Piperidis (eds.). Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, Portorož, Slovenia, May 23-28, 2016. 2016.
      URL BibTeX

      @inproceedings{DBLP:conf/lrec/KatakisPK16,
      	author = "Ioannis Manousos Katakis and Georgios Petasis and Vangelis Karkaletsis",
      	title = "{CLARIN-EL} Web-based Annotation Tool",
      	booktitle = "Proceedings of the Tenth International Conference on Language Resources and Evaluation {LREC} 2016, Portoro{\v{z}}, Slovenia, May 23-28, 2016",
      	editor = "Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Sara Goggi and Marko Grobelnik and Bente Maegaard and Joseph Mariani and H{\'{e}}l{\`{e}}ne Mazo and Asunci{\'{o}}n Moreno and Jan Odijk and Stelios Piperidis",
      	year = 2016,
      	publisher = "European Language Resources Association {(ELRA)}",
      	url = "http://www.lrec-conf.org/proceedings/lrec2016/summaries/990.html",
      	timestamp = "Tue, 30 Aug 2016 18:49:47 +0200",
      	biburl = "http://dblp.uni-trier.de/rec/bib/conf/lrec/KatakisPK16",
      	bibsource = "dblp computer science bibliography, http://dblp.org"
      }
      

    Year: 2015

    1. Theodosis Goudas, Christos Louizos, Georgios Petasis and Vangelis Karkaletsis.
      Argument Extraction from News, Blogs, and the Social Web.
      International Journal on Artificial Intelligence Tools 24(05):1540024, 2015.
      URL, DOI BibTeX

      @article{doi:10.1142/S0218213015400242,
      	author = "Goudas, Theodosis and Louizos, Christos and Petasis, Georgios and Karkaletsis, Vangelis",
      	title = "Argument Extraction from News, Blogs, and the Social Web",
      	journal = "International Journal on Artificial Intelligence Tools",
      	volume = 24,
      	number = 05,
      	pages = 1540024,
      	year = 2015,
      	doi = "10.1142/S0218213015400242",
      	url = "http://www.worldscientific.com/doi/abs/10.1142/S0218213015400242",
      	eprint = {http://www.worldscientific.com/doi/pdf/10.1142/S0218213015400242},
      	abstract = {Argument extraction is the task of identifying arguments, along with their components in text. Arguments can be usually decomposed into a claim and one or more premises justifying it. Among the novel aspects of this work is the thematic domain itself which relates to Social Media, in contrast to traditional research in the area, which concentrates mainly on law documents and scientific publications. The huge increase of social media communities, along with their user tendency to debate, makes the identification of arguments in these texts a necessity. Argument extraction from Social Media is more challenging because texts may not always contain arguments, as is the case of legal documents or scientific publications usually studied. In addition, being less formal in nature, texts in Social Media may not even have proper syntax or spelling. This paper presents a two-step approach for argument extraction from social media texts. During the first step, the proposed approach tries to classify the sentences into "sentences that contain arguments" and "sentences that don’t contain arguments". In the second step, it tries to identify the exact fragments that contain the premises from the sentences that contain arguments, by utilizing conditional random fields. The results exceed significantly the base line approach, and according to literature, are quite promising.}
      }
      
    2. Anastasia Krithara, George Giannakopoulos, George Paliouras, George Petasis and Vangelis Karkaletsis.
      Predicting Sentiment using Transfer Learning.
      In Workshop on Replicability and Reproducibility in Natural Language Processing: adaptive methods, resources and software at IJCAI 2015 (AdaptiveNLP 2015). 2015.
      Abstract A new transfer learning method is presented in this paper, addressing the task of sentiment analysis across domains. The proposed approach is a transfer variant of the Probabilistic Latent Semantic Analysis (PLSA) model that we name KLIEP-PLSA. The approach captures the difference of the distributions between the different domains. We perform experiments over well known datasets and show the promising results that we obtained with the new method.
      URL BibTeX

      @inproceedings{ref35,
      	author = "Anastasia Krithara and George Giannakopoulos and George Paliouras and George Petasis and Vangelis Karkaletsis",
      	title = "Predicting Sentiment using Transfer Learning",
      	year = 2015,
      	booktitle = "Workshop on Replicability and Reproducibility in Natural Language Processing: adaptive methods, resources and software at IJCAI 2015 (AdaptiveNLP 2015)",
      	abstract = "A new transfer learning method is presented in this paper, addressing the task of sentiment analysis across domains. The proposed approach is a transfer variant of the Probabilistic Latent Semantic Analysis (PLSA) model that we name KLIEP-PLSA. The approach captures the difference of the distributions between the different domains. We perform experiments over well known datasets and show the promising results that we obtained with the new method.",
      	keywords = {transfer learning, KLIEP-PLSA, PLSA, sentiment analysis},
      	note = {Workshop website: https://sites.google.com/site/adaptivenlp2015/},
      	url = "http://www.ellogon.org/petasis/bibliography/IJCAI2015/Krithara-TL_sentiment.pdf"
      }
      
    3. Christos Sardianos, Ioannis Manousos Katakis, Georgios Petasis and Vangelis Karkaletsis.
      Argument Extraction from News.
      In Proceedings of the 2nd Workshop on Argumentation Mining. June 2015, 56–66.
      URL BibTeX

      @inproceedings{sardianos-EtAl:2015:ARG-MINING,
      	author = "Sardianos, Christos and Katakis, Ioannis Manousos and Petasis, Georgios and Karkaletsis, Vangelis",
      	title = "Argument Extraction from News",
      	booktitle = "Proceedings of the 2nd Workshop on Argumentation Mining",
      	month = "June",
      	year = 2015,
      	address = "Denver, CO",
      	publisher = "Association for Computational Linguistics",
      	pages = "56--66",
      	url = "http://www.aclweb.org/anthology/W15-0508"
      }
      

    Year: 2014

    1. Georgios Petasis.
      Annotating Arguments: The NOMAD Collaborative Annotation Tool.
      In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asunción Moreno, Jan Odijk and Stelios Piperidis (eds.). Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014), Reykjavik, Iceland, May 26-31, 2014. 2014, 1930–1937.
      BibTeX

      @inproceedings{DBLP:conf/lrec/Petasis14,
      	author = "Georgios Petasis",
      	title = "Annotating Arguments: The NOMAD Collaborative Annotation Tool",
      	booktitle = "Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014), Reykjavik, Iceland, May 26-31, 2014",
      	pages = "1930--1937",
      	ee = "http://www.lrec-conf.org/proceedings/lrec2014/summaries/669.html",
      	editor = "Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asunci{\'o}n Moreno and Jan Odijk and Stelios Piperidis",
      	publisher = "European Language Resources Association (ELRA)",
      	year = 2014,
      	bibsource = "DBLP, http://dblp.uni-trier.de"
      }
      
    2. Georgios Petasis.
      The Ellogon Pattern Engine: Context-free Grammars over Annotations.
      In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asunción Moreno, Jan Odijk and Stelios Piperidis (eds.). Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014), Reykjavik, Iceland, May 26-31, 2014. 2014, 2460–2465.
      BibTeX

      @inproceedings{DBLP:conf/lrec/Petasis14a,
      	author = "Georgios Petasis",
      	title = "The Ellogon Pattern Engine: Context-free Grammars over Annotations",
      	pages = "2460--2465",
      	ee = "http://www.lrec-conf.org/proceedings/lrec2014/summaries/1060.html",
      	booktitle = "Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014), Reykjavik, Iceland, May 26-31, 2014",
      	editor = "Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asunci{\'o}n Moreno and Jan Odijk and Stelios Piperidis",
      	publisher = "European Language Resources Association (ELRA)",
      	year = 2014,
      	bibsource = "DBLP, http://dblp.uni-trier.de"
      }
      
    3. George Kiomourtzis, George Giannakopoulos, Georgios Petasis, Pythagoras Karampiperis and Vangelis Karkaletsis.
      NOMAD: Linguistic Resources and Tools Aimed at Policy Formulation and Validation.
      In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asunción Moreno, Jan Odijk and Stelios Piperidis (eds.). Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014), Reykjavik, Iceland, May 26-31, 2014. 2014, 3464–3470.
      BibTeX

      @inproceedings{DBLP:conf/lrec/KiomourtzisGPKK14,
      	author = "George Kiomourtzis and George Giannakopoulos and Georgios Petasis and Pythagoras Karampiperis and Vangelis Karkaletsis",
      	title = "NOMAD: Linguistic Resources and Tools Aimed at Policy Formulation and Validation",
      	pages = "3464--3470",
      	ee = "http://www.lrec-conf.org/proceedings/lrec2014/summaries/813.html",
      	booktitle = "Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014), Reykjavik, Iceland, May 26-31, 2014",
      	editor = "Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asunci{\'o}n Moreno and Jan Odijk and Stelios Piperidis",
      	publisher = "European Language Resources Association (ELRA)",
      	year = 2014,
      	bibsource = "DBLP, http://dblp.uni-trier.de"
      }
      
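    4. Theodosis Goudas, Christos Louizos, Georgios Petasis and Vangelis Karkaletsis.
      Argument Extraction from News, Blogs, and Social Media.
      In Aristidis Likas, Konstantinos Blekas and Dimitris Kalles (eds.). Artificial Intelligence: Methods and Applications: 8th Hellenic Conference on AI, SETN 2014, Ioannina, Greece, May 15-17, 2014. Proceedings. Springer International Publishing, 2014, 287–299.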
      URL, DOI BibTeX

      @incollection{Goudas2014,
      	author = "Goudas, Theodosis and Louizos, Christos and Petasis, Georgios and Karkaletsis, Vangelis",
      	editor = "Likas, Aristidis and Blekas, Konstantinos and Kalles, Dimitris",
      	title = "Argument Extraction from News, Blogs, and Social Media",
      	booktitle = "Artificial Intelligence: Methods and Applications: 8th Hellenic Conference on AI, SETN 2014, Ioannina, Greece, May 15-17, 2014. Proceedings",
      	year = 2014,
      	publisher = "Springer International Publishing",
      	address = "Cham",
      	pages = "287--299",
      	isbn = "978-3-319-07064-3",
      	doi = "10.1007/978-3-319-07064-3_23",
      	url = "http://dx.doi.org/10.1007/978-3-319-07064-3_23"
      }
      
    5. Georgios Petasis, Dimitris Spiliotopoulos, Nikos Tsirakis and Panayotis Tsantilas.
      Sentiment Analysis for Reputation Management: Mining the Greek Web.
      In Aristidis Likas, Konstantinos Blekas and Dimitris Kalles (eds.). Artificial Intelligence: Methods and Applications - 8th Hellenic Conference on AI, SETN 2014, Ioannina, Greece, May 15-17, 2014. Proceedings 8445. 2014, 327-340.
      BibTeX

      @inproceedings{SETN-2014-Petasis,
      	author = "Georgios Petasis and Dimitris Spiliotopoulos and Nikos Tsirakis and Panayotis Tsantilas",
      	title = "Sentiment Analysis for Reputation Management: Mining the Greek Web",
      	editor = "Aristidis Likas and Konstantinos Blekas and Dimitris Kalles",
      	booktitle = "Artificial Intelligence: Methods and Applications - 8th Hellenic Conference on AI, SETN 2014, Ioannina, Greece, May 15-17, 2014. Proceedings",
      	year = 2014,
      	pages = "327-340",
      	publisher = "Springer",
      	series = "Lecture Notes in Computer Science",
      	volume = 8445,
      	ee = "http://dx.doi.org/10.1007/978-3-319-07064-3_26",
      	isbn = "978-3-319-07063-6, 978-3-319-07064-3",
      	bibsource = "DBLP, http://dblp.uni-trier.de"
      }
      

    Year: 2013

    1. Georgios Petasis, Dimitrios Spiliotopoulos, Nikos Tsirakis and Panayiotis Tsantilas.
      Large-scale Sentiment Analysis for Reputation Management.
      In Stefan Gindl, Robert Remus and Michael Wiegand (eds.). Proceedings of the 2nd Workshop on Practice and Theory of Opinion Mining and Sentiment Analysis (PATHOS-2013). 2013.
      Abstract Harvesting the web and social web data is a meticulous and complex task. Applying the results to a successful business case such as brand monitoring requires high precision and recall for the opinion mining and entity recognition tasks. This work reports on the integrated platform of a state of the art Named-entity Recognition and Classification (NERC) system and opinion mining methods for a Software-as-a-Service (SaaS) approach on a fully automatic service for brand monitoring for the Greek language. The service has been successfully deployed to the biggest search engine in Greece powering the large-scale linguistic and sentiment analysis of about 80.000 resources per hour.
      BibTeX

      @inproceedings{pathos-2013-Petasis,
      	author = "Georgios Petasis and Dimitrios Spiliotopoulos and Nikos Tsirakis and Panayiotis Tsantilas",
      	abstract = "Harvesting the web and social web data is a meticulous and complex task. Applying the results to a successful business case such as brand monitoring requires high precision and recall for the opinion mining and entity recognition tasks. This work reports on the integrated platform of a state of the art Named-entity Recognition and Classification (NERC) system and opinion mining methods for a Software-as-a-Service (SaaS) approach on a fully automatic service for brand monitoring for the Greek language. The service has been successfully deployed to the biggest search engine in Greece powering the large-scale linguistic and sentiment analysis of about 80.000 resources per hour.",
      	title = "Large-scale Sentiment Analysis for Reputation Management",
      	booktitle = "Proceedings of the 2nd Workshop on Practice and Theory of Opinion Mining and Sentiment Analysis (PATHOS-2013)",
      	address = "Darmstadt, Germany",
      	month = "September 23",
      	year = 2013,
      	editor = "Stefan Gindl and Robert Remus and Michael Wiegand"
      }
      
    2. Georgios Petasis.
      Structuring the Blogosphere on News from Traditional Media.
      In Yan Tang Demey and Hervé Panetto (eds.). On the Move to Meaningful Internet Systems: OTM 2013 Workshops - Confederated International Workshops: OTM Academy, OTM Industry Case Studies Program, ACM, EI2N, ISDE, META4eS, ORM, SeDeS, SINCOM, SMS, and SOMOCO 2013 8186. 2013, 608-617.
      BibTeX

      @inproceedings{DBLP:conf/otm/Petasis13,
      	author = "Georgios Petasis",
      	title = "Structuring the Blogosphere on News from Traditional Media",
      	booktitle = "On the Move to Meaningful Internet Systems: OTM 2013 Workshops - Confederated International Workshops: OTM Academy, OTM Industry Case Studies Program, ACM, EI2N, ISDE, META4eS, ORM, SeDeS, SINCOM, SMS, and SOMOCO 2013",
      	address = "Graz, Austria",
      	month = "September 9--13",
      	year = 2013,
      	pages = "608-617",
      	ee = "http://dx.doi.org/10.1007/978-3-642-41033-8_77",
      	bibsource = "DBLP, http://dblp.uni-trier.de",
      	editor = "Yan Tang Demey and Herv{\'e} Panetto",
      	publisher = "Springer",
      	series = "Lecture Notes in Computer Science",
      	volume = 8186,
      	isbn = "978-3-642-41032-1"
      }
      
    3. Georgios Petasis, Ralf Möller and Vangelis Karkaletsis.
      BOEMIE: Reasoning-based Information Extraction.
      In Chitta Baral and Peter Schüller (eds.). Proceedings of the 1st Workshop on Natural Language Processing and Automated Reasoning co-located with 12th International Conference on Logic Programming and Nonmonotonic Reasoning (LPNMR 2013) 1044. 2013, 60-75.
      Abstract This paper presents a novel approach for exploiting an ontology in an ontology-based information extraction system, which substitutes part of the extraction process with reasoning, guided by a set of automatically acquired rules.
      BibTeX

      @inproceedings{DBLP:conf/lpnmr/PetasisMK13,
      	author = {Georgios Petasis and Ralf M{\"o}ller and Vangelis Karkaletsis},
      	abstract = "This paper presents a novel approach for exploiting an ontology in an ontology-based information extraction system, which substitutes part of the extraction process with reasoning, guided by a set of automatically acquired rules.",
      	title = "BOEMIE: Reasoning-based Information Extraction",
      	address = "A Corunna, Spain",
      	month = "September 15",
      	year = 2013,
      	pages = "60-75",
      	ee = "http://ceur-ws.org/Vol-1044/paper-06.pdf",
      	bibsource = {DBLP, http://dblp.uni-trier.de},
      	editor = {Chitta Baral and Peter Sch{\"u}ller},
      	booktitle = "Proceedings of the 1st Workshop on Natural Language Processing and Automated Reasoning co-located with 12th International Conference on Logic Programming and Nonmonotonic Reasoning (LPNMR 2013)",
      	publisher = "CEUR-WS.org",
      	series = "CEUR Workshop Proceedings",
      	volume = 1044
      }
      

    Year: 2012

    1. Georgios Petasis and Mara Tsoumari.
      A New Annotation Tool for Aligned Bilingual Corpora.
      In Petr Sojka, Aleš Horák, Ivan Kopeček and Karel Pala (eds.). Text, Speech and Dialogue. Lecture Notes in Computer Science series, volume 7499, Springer Berlin Heidelberg, 2012, pages 95-104.
      Abstract This paper presents a new annotation tool for aligned bilingual corpora, which allows the annotation of a wide range of information, ranging from information about words (such as part-of-speech tags or named-entities) to quite complex annotation schemas involving links between aligned segments, such as co-reference or translation equivalence between aligned segments in the two languages. The annotation tool is implemented as a component of the Ellogon language engineering platform, exploiting its extensive annotation engine, its cross-platform abilities and its linguistic processing components, if such a need arises. The new annotation tool is distributed with an open source license (LGPL), as part of the Ellogon language engineering platform.
      URL, DOI BibTeX

      @incollection{tsd-2012-Petasis-Tsoumari,
      	author = "Petasis, Georgios and Tsoumari, Mara",
      	abstract = "This paper presents a new annotation tool for aligned bilingual corpora, which allows the annotation of a wide range of information, ranging from information about words (such as part-of-speech tags or named-entities) to quite complex annotation schemas involving links between aligned segments, such as co-reference or translation equivalence between aligned segments in the two languages. The annotation tool is implemented as a component of the Ellogon language engineering platform, exploiting its extensive annotation engine, its cross-platform abilities and its linguistic processing components, if such a need arises. The new annotation tool is distributed with an open source license (LGPL), as part of the Ellogon language engineering platform.",
      	address = "Brno, Czech Republic",
      	booktitle = "Text, Speech and Dialogue",
      	booksubtitle = "15th International Conference, TSD 2012, Brno, Czech Republic, September 3--7, 2012. Proceedings",
      	keywords = "Annotation tools; collaborative annotation; adaptable annotation schemas",
      	month = "September 3--7",
      	title = "{A} {N}ew {A}nnotation {T}ool for {A}ligned {B}ilingual {C}orpora",
      	url = "http://www.ellogon.org/petasis/bibliography/TSD2012/tsd450.pdf",
      	year = 2012,
      	pages = "95-104",
      	isbn = "978-3-642-32789-6",
      	volume = 7499,
      	series = "Lecture Notes in Computer Science",
      	editor = "Sojka, Petr and Hor\'{a}k, Ale\v{s} and Kope\v{c}ek, Ivan and Pala, Karel",
      	doi = "10.1007/978-3-642-32790-2_11",
      	publisher = "Springer Berlin Heidelberg"
      }
      
    2. Georgios Petasis.
      The SYNC3 Collaborative Annotation Tool.
      In Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012. May 2012, 363–370.
      Abstract The huge amount of the available information in the Web creates the need for effective information extraction systems that are able to produce metadata that satisfy user's information needs. The development of such systems, in the majority of cases, depends on the availability of an appropriately annotated corpus in order to learn or evaluate extraction models. The production of such corpora can be significantly facilitated by annotation tools, which provide user-friendly facilities and enable annotators to annotate documents according to a predefined annotation schema. However, the construction of annotation tools that operate in a distributed environment is a challenging task: the majority of these tools are implemented as Web applications, having to cope with the capabilities offered by browsers. This paper describes the SYNC3 collaborative annotation tool, which implements an alternative architecture: it remains a desktop application, fully exploiting the advantages of desktop applications, but provides collaborative annotation through the use of a centralised server for storing both the documents and their metadata, and instant messaging protocols for communicating events among all annotators. The annotation tool is implemented as a component of the Ellogon language engineering platform, exploiting its extensive annotation engine, its cross-platform abilities and its linguistic processing components, if such a need arises. Finally, the SYNC3 annotation tool is distributed with an open source license, as part of the Ellogon platform.
      URL BibTeX

      @inproceedings{lrec-2012-Petasis,
      	author = "Georgios Petasis",
      	abstract = "The huge amount of the available information in the Web creates the need for effective information extraction systems that are able to produce metadata that satisfy user's information needs. The development of such systems, in the majority of cases, depends on the availability of an appropriately annotated corpus in order to learn or evaluate extraction models. The production of such corpora can be significantly facilitated by annotation tools, which provide user-friendly facilities and enable annotators to annotate documents according to a predefined annotation schema. However, the construction of annotation tools that operate in a distributed environment is a challenging task: the majority of these tools are implemented as Web applications, having to cope with the capabilities offered by browsers. This paper describes the SYNC3 collaborative annotation tool, which implements an alternative architecture: it remains a desktop application, fully exploiting the advantages of desktop applications, but provides collaborative annotation through the use of a centralised server for storing both the documents and their metadata, and instant messaging protocols for communicating events among all annotators. The annotation tool is implemented as a component of the Ellogon language engineering platform, exploiting its extensive annotation engine, its cross-platform abilities and its linguistic processing components, if such a need arises. Finally, the SYNC3 annotation tool is distributed with an open source license, as part of the Ellogon platform.",
      	address = "Istanbul, Turkey",
      	booktitle = "Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012",
      	keywords = "annotation tools, collaborative annotation, adaptable annotation schemas",
      	month = "May",
      	pages = "363--370",
      	publisher = "European Language Resources Association",
      	title = "{T}he {SYNC}3 {C}ollaborative {A}nnotation {T}ool",
      	url = "http://www.ellogon.org/petasis/bibliography/LREC2012/LREC2012-700.pdf",
      	year = 2012
      }
      
    3. Elias Iosif, Georgios Petasis and Vangelis Karkaletsis.
      Ontology-Based Information Extraction under a Bootstrapping Approach.
      In Maria Teresa Pazienza and Armando Stellato (eds.). Semi-Automatic Ontology Development: Processes and Resources. IGI Global, April 2012, pages 1–21.
      Abstract The authors present an ontology-based information extraction process, which operates in a bootstrapping framework. The novelty of this approach lies in the continuous semantics extraction from textual content in order to evolve the underlying ontology, while the evolved ontology enhances in turn the information extraction mechanism. This process was implemented in the context of the R&D project BOEMIE. The BOEMIE system was evaluated on the athletics domain.
      URL, DOI BibTeX

      @incollection{Iosif.IGIGLOBAL.2012,
      	author = "Elias Iosif and Georgios Petasis and Vangelis Karkaletsis",
      	abstract = "The authors present an ontology-based information extraction process, which operates in a bootstrapping framework. The novelty of this approach lies in the continuous semantics extraction from textual content in order to evolve the underlying ontology, while the evolved ontology enhances in turn the information extraction mechanism. This process was implemented in the context of the R{\&}D project BOEMIE. The BOEMIE system was evaluated on the athletics domain.",
      	editor = "Maria Teresa Pazienza and Armando Stellato",
      	booktitle = "Semi-Automatic Ontology Development: Processes and Resources",
      	address = "Hershey, PA, USA",
      	chapter = 1,
      	doi = "10.4018/978-1-4666-0188-8.ch001",
      	isbn = 9781466601888,
      	month = "April",
      	pages = "1--21",
      	publisher = "IGI Global",
      	title = "{O}ntology-{B}ased {I}nformation {E}xtraction under a {B}ootstrapping {A}pproach",
      	url = "http://www.igi-global.com/chapter/ontology-based-information-extraction-under/63896",
      	year = 2012
      }
      

    Year: 2011

    1. Nikos Sarris, Gerasimos Potamianos, Jean-Michel Renders, Claire Grover, Eric Karstens, Leonidas Kallipolitis, Vasilis Tountopoulos, Georgios Petasis, Anastasia Krithara, Matthias Gallé, Guillaume Jacquet, Beatrice Alex, Richard Tobin and Liliana Bounegru.
      A System for Synergistically Structuring News Content from Traditional Media and the Blogosphere.
      In Paul Cunningham and Miriam Cunningham (eds.). eChallenges e-2011 Conference Proceedings. 2011.
      Abstract News and social media are emerging as a dominant source of information for numerous applications. However, their vast unstructured content present challenges to efficient extraction of such information. In this paper, we present the SYNC3 system that aims to intelligently structure content from both traditional news media and the blogosphere. To achieve this goal, SYNC3 incorporates innovative algorithms that first model news media content statistically, based on fine clustering of articles into so-called "news events". Such models are then adapted and applied to the blogosphere domain, allowing its content to map to the traditional news domain. Furthermore, appropriate algorithms are employed to extract news event labels and relations between events, in order to efficiently present news content to the system end users.
      URL BibTeX

      @inproceedings{eChallenges-2011-Sarris,
      	author = "Nikos Sarris and Gerasimos Potamianos and Jean-Michel Renders and Claire Grover and Eric Karstens and Leonidas Kallipolitis and Vasilis Tountopoulos and Petasis, Georgios and Anastasia Krithara and Matthias Gall{\'e} and Guillaume Jacquet and Beatrice Alex and Richard Tobin and Liliana Bounegru",
      	abstract = {News and social media are emerging as a dominant source of information for numerous applications. However, their vast unstructured content present challenges to efficient extraction of such information. In this paper, we present the SYNC3 system that aims to intelligently structure content from both traditional news media and the blogosphere. To achieve this goal, SYNC3 incorporates innovative algorithms that first model news media content statistically, based on fine clustering of articles into so-called {"}news events{"}. Such models are then adapted and applied to the blogosphere domain, allowing its content to map to the traditional news domain. Furthermore, appropriate algorithms are employed to extract news event labels and relations between events, in order to efficiently present news content to the system end users.},
      	booktitle = "eChallenges e-2011 Conference Proceedings",
      	editor = "Paul Cunningham and Miriam Cunningham",
      	organization = "IIMC International Information Management Corporation",
      	address = "Florence, Italy",
      	month = "October 26--28",
      	title = "{A} {S}ystem for {S}ynergistically {S}tructuring {N}ews {C}ontent from {T}raditional {M}edia and the {B}logosphere",
      	year = 2011,
      	url = "http://www.ellogon.org/petasis/bibliography/eChallenges2011/echallenges_ref_81_doc_7322.pdf",
      	isbn = "978-1-905824-27-4"
      }
      
    2. Mara Tsoumari and Georgios Petasis.
      Coreference Annotator - A new annotation tool for aligned bilingual corpora.
      In Proceedings of the Second Workshop on Annotation and Exploitation of Parallel Corpora (AEPC 2), in 8th International Conference on Recent Advances in Natural Language Processing (RANLP 2011). 2011, 43–52.
      Abstract This paper presents the main features of an annotation tool, the Coreference Annotator, which manages bilingual corpora consisting of aligned texts that can be grouped in collections and subcollections according to their topics and discourse. The tool allows the manual annotation of certain linguistic items in the source text and their translation equivalent in the target text, by entering useful information about these items based on their context.
      URL BibTeX

      @inproceedings{AEPC2-RANLP-2011-Tsoumari,
      	author = "Tsoumari, Mara and Petasis, Georgios",
      	abstract = "This paper presents the main features of an annotation tool, the Coreference Annotator, which manages bilingual corpora consisting of aligned texts that can be grouped in collections and subcollections according to their topics and discourse. The tool allows the manual annotation of certain linguistic items in the source text and their translation equivalent in the target text, by entering useful information about these items based on their context.",
      	booktitle = "Proceedings of the Second Workshop on Annotation and Exploitation of Parallel Corpora (AEPC 2), in 8th International Conference on Recent Advances in Natural Language Processing (RANLP 2011)",
      	month = "September 15",
      	pages = "43--52",
      	title = "{C}oreference {A}nnotator - {A} new annotation tool for aligned bilingual corpora",
      	year = 2011,
      	url = "http://www.aclweb.org/anthology/W11-4307"
      }
      
    3. Georgios Petasis.
      Unsupervised Domain Adaptation based on Text Relatedness.
      In Proceedings of the International Conference Recent Advances in Natural Language Processing 2011. 2011, 733–739.
      Abstract In this paper an unsupervised approach to domain adaptation is presented, which exploits external knowledge sources in order to port a classification model into a new thematic domain. Our approach extracts a new feature set from documents of the target domain, and tries to align the new features to the original ones, by exploiting text relatedness from external knowledge sources, such as WordNet. The approach has been evaluated on the task of document classification, involving the classification of newsgroup postings into 20 news groups.
      URL BibTeX

      @inproceedings{petasis:2011:RANLP,
      	author = "Petasis, Georgios",
      	title = "Unsupervised Domain Adaptation based on Text Relatedness",
      	abstract = "In this paper an unsupervised approach to domain adaptation is presented, which exploits external knowledge sources in order to port a classification model into a new thematic domain. Our approach extracts a new feature set from documents of the target domain, and tries to align the new features to the original ones, by exploiting text relatedness from external knowledge sources, such as WordNet. The approach has been evaluated on the task of document classification, involving the classification of newsgroup postings into 20 news groups.",
      	booktitle = "Proceedings of the International Conference Recent Advances in Natural Language Processing 2011",
      	month = "September 12--14",
      	year = 2011,
      	address = "Hissar, Bulgaria",
      	publisher = "RANLP 2011 Organising Committee",
      	pages = "733--739",
      	url = "http://aclweb.org/anthology/R11-1107"
      }
      
    4. Georgios Petasis.
      Machine Learning in Natural Language Processing.
      Ph.D. Thesis, Department of Informatics and Telecommunications, University of Athens, 2011.
      Abstract This thesis examines the use of machine learning techniques in various tasks of natural language processing, mainly for the task of information extraction from texts. The objectives are the improvement of adaptability of information extraction systems to new thematic domains (or even languages), and the improvement of their performance using as few resources (either linguistic or human) as possible. This thesis has examined two main axes: a) the research and assessment of existing algorithms of machine learning mainly in the stages of linguistic pre-processing (such as part of speech tagging) and named-entity recognition, and b) the creation of a new machine learning algorithm and its assessment on synthetic data, as well as in real world data from the task of relation extraction between named entities. This new algorithm belongs to the category of inductive grammar learning, and can infer context free grammars from positive examples only.
      URL BibTeX

      @phdthesis{PhD-2011-Petasis,
      	author = "Petasis, Georgios",
      	abstract = "This thesis examines the use of machine learning techniques in various tasks of natural language processing, mainly for the task of information extraction from texts. The objectives are the improvement of adaptability of information extraction systems to new thematic domains (or even languages), and the improvement of their performance using as few resources (either linguistic or human) as possible. This thesis has examined two main axes: a) the research and assessment of existing algorithms of machine learning mainly in the stages of linguistic pre-processing (such as part of speech tagging) and named-entity recognition, and b) the creation of a new machine learning algorithm and its assessment on synthetic data, as well as in real world data from the task of relation extraction between named entities. This new algorithm belongs to the category of inductive grammar learning, and can infer context free grammars from positive examples only.",
      	keywords = "information extraction, machine learning, grammatical inference",
      	month = "July 1",
      	school = "Department of Informatics and Telecommunications, University of Athens",
      	title = "{M}achine {L}earning in {N}atural {L}anguage {P}rocessing",
      	type = "Ph.D. Thesis",
      	url = "http://www.ellogon.org/petasis/bibliography/Petasis/Ph.D.Thesis-GeorgiosPetasis.pdf",
      	year = 2011
      }
      

    Year: 2010

    1. Georgios Petasis.
      TkDND: a cross-platform drag'n'drop package.
      In Proceedings of the 17th Annual Tcl/Tk Conference (Tcl 2010). 2010.
      Abstract This paper is about TkDND, a Tcl/Tk extension that aims to add cross-application drag and drop support to Tk, for popular operating systems, such as Microsoft Windows, Apple OS X and GNU/Linux. Being in its second rewrite, TkDND 2.x has a stable implementation for Windows and OS X, while support for Linux and the XDND protocol is still under development.
      URL BibTeX

      @inproceedings{Tcl-2010-TkDND,
      	author = "Petasis, Georgios",
      	abstract = "This paper is about TkDND, a Tcl/Tk extension that aims to add cross-application drag and drop support to Tk, for popular operating systems, such as Microsoft Windows, Apple OS X and GNU/Linux. Being in its second rewrite, TkDND 2.x has a stable implementation for Windows and OS X, while support for Linux and the XDND protocol is still under development.",
      	address = "Hilton Suites Chicago/Oakbrook Terrace, 10 Drury Lane, Oakbrook Terrace, Illinois, United States 60181",
      	booktitle = "Proceedings of the 17th Annual Tcl/Tk Conference (Tcl 2010)",
      	month = "October 11--15",
      	title = "{T}k{DND}: a cross-platform drag'n'drop package",
      	url = "http://www.ellogon.org/petasis/bibliography/Tcl2010/TkDND.pdf",
      	year = 2010
      }
      
    2. Georgios Petasis.
      Ellogon and the challenge of threads.
      In Proceedings of the 17th Annual Tcl/Tk Conference (Tcl 2010). 2010.
      Abstract This paper is about the Ellogon language engineering platform, and the challenges faced in modernising it, in order to better exploit contemporary hardware. Ellogon is an open-source infrastructure, specialised in natural language processing. Following a data model that closely resembles TIPSTER, Ellogon can be used either as an autonomous application, offering a graphical user interface, or it can be embedded in a C/C++ application as a library. Ellogon has been implemented in C/C++ and Tcl/Tk: in fact Ellogon is a vanilla Tcl interpreter, with the Ellogon core loaded as a Tcl extension, and a set of Tcl/Tk scripts that implement the GUI. The core component of Ellogon, being a Tcl extension, heavily relies on Tcl objects to implement its data model, a decision made more than a decade ago, which poses difficulties into making Ellogon a multi-threaded application.
      URL BibTeX

      @inproceedings{Tcl-2010-Ellogon,
      	author = "Petasis, Georgios",
      	abstract = "This paper is about the Ellogon language engineering platform, and the challenges faced in modernising it, in order to better exploit contemporary hardware. Ellogon is an open-source infrastructure, specialised in natural language processing. Following a data model that closely resembles TIPSTER, Ellogon can be used either as an autonomous application, offering a graphical user interface, or it can be embedded in a C/C++ application as a library. Ellogon has been implemented in C/C++ and Tcl/Tk: in fact Ellogon is a vanilla Tcl interpreter, with the Ellogon core loaded as a Tcl extension, and a set of Tcl/Tk scripts that implement the GUI. The core component of Ellogon, being a Tcl extension, heavily relies on Tcl objects to implement its data model, a decision made more than a decade ago, which poses difficulties into making Ellogon a multi-threaded application.",
      	address = "Hilton Suites Chicago/Oakbrook Terrace, 10 Drury Lane, Oakbrook Terrace, Illinois, United States 60181",
      	booktitle = "Proceedings of the 17th Annual Tcl/Tk Conference (Tcl 2010)",
      	month = "October 11--15",
      	title = "{E}llogon and the challenge of threads",
      	url = "http://www.ellogon.org/petasis/bibliography/Tcl2010/EllogonAndThreads.pdf",
      	year = 2010
      }
      
    3. Georgios Petasis.
      TileQt and TileGtk: current status.
      In Proceedings of the 17th Annual Tcl/Tk Conference (Tcl 2010). 2010.
      Abstract This paper is about two Tile and Ttk themes, TileQt and TileGTK. Despite being two distinct and very different extensions, the motivation for their development was common: making Tk applications look as native as possible under the Linux operating system.
      URL BibTeX

      @inproceedings{Tcl-2010-TileQtTileGTK,
      	author = "Petasis, Georgios",
      	abstract = "This paper is about two Tile and Ttk themes, TileQt and TileGTK. Despite being two distinct and very different extensions, the motivation for their development was common: making Tk applications look as native as possible under the Linux operating system.",
      	address = "Hilton Suites Chicago/Oakbrook Terrace, 10 Drury Lane, Oakbrook Terrace, Illinois, United States 60181",
      	booktitle = "Proceedings of the 17th Annual Tcl/Tk Conference (Tcl 2010)",
      	month = "October 11--15",
      	title = "{T}ile{Q}t and {T}ile{G}tk: current status",
      	url = "http://www.ellogon.org/petasis/bibliography/Tcl2010/TileQtAndTileGTK.pdf",
      	year = 2010
      }
      
    4. Georgios Petasis.
      TkGecko: Another Attempt for an HTML Renderer for Tk.
      In Proceedings of the 17th Annual Tcl/Tk Conference (Tcl 2010). 2010.
      Abstract The support for displaying HTML and especially complex Web sites has always been problematic in Tk. Several efforts have been made in order to alleviate this problem, and this paper presents another (and still incomplete) one. This paper presents TkGecko, a Tcl/Tk extension written in C++, which allows Gecko (the HTML processing and rendering engine developed by the Mozilla Foundation) to be embedded as a widget in Tk. The current status of the TkGecko extension is alpha quality, while the code is publically available under the BSD license.
      URL BibTeX

      @inproceedings{Tcl-2010-TkGecko,
      	author = "Petasis, Georgios",
      	abstract = "The support for displaying HTML and especially complex Web sites has always been problematic in Tk. Several efforts have been made in order to alleviate this problem, and this paper presents another (and still incomplete) one. This paper presents TkGecko, a Tcl/Tk extension written in C++, which allows Gecko (the HTML processing and rendering engine developed by the Mozilla Foundation) to be embedded as a widget in Tk. The current status of the TkGecko extension is alpha quality, while the code is publically available under the BSD license.",
      	address = "Hilton Suites Chicago/Oakbrook Terrace, 10 Drury Lane, Oakbrook Terrace, Illinois, United States 60181",
      	booktitle = "Proceedings of the 17th Annual Tcl/Tk Conference (Tcl 2010)",
      	month = "October 11--15",
      	title = "{T}k{G}ecko: {A}nother {A}ttempt for an {HTML} {R}enderer for {T}k",
      	url = "http://www.ellogon.org/petasis/bibliography/Tcl2010/TkGecko.pdf",
      	year = 2010
      }
      
    5. Georgios Petasis.
      TkRibbon: Windows Ribbons for Tk.
      In Proceedings of the 17th Annual Tcl/Tk Conference (Tcl 2010). 2010.
      Abstract This paper is about TkRibbon, a Tcl/Tk extension that aims to introduce support for the Windows Ribbon Framework in the Tk toolkit. The Windows Ribbon is a graphical interface where a set of toolbars are placed on tabs in a notebook widget, aiming to substitute traditional menus and toolbars. This paper briefly describes Windows Ribbon framework, the TkRibbon Tk extension and presents some examples on how TkRibbon can be used by Tk applications.
      URL BibTeX

      @inproceedings{Tcl2010-TkRibbon,
      	author = "Petasis, Georgios",
      	abstract = "This paper is about TkRibbon, a Tcl/Tk extension that aims to introduce support for the Windows Ribbon Framework in the Tk toolkit. The Windows Ribbon is a graphical interface where a set of toolbars are placed on tabs in a notebook widget, aiming to substitute traditional menus and toolbars. This paper briefly describes Windows Ribbon framework, the TkRibbon Tk extension and presents some examples on how TkRibbon can be used by Tk applications.",
      	address = "Hilton Suites Chicago/Oakbrook Terrace, 10 Drury Lane, Oakbrook Terrace, Illinois, United States 60181",
      	booktitle = "Proceedings of the 17th Annual Tcl/Tk Conference (Tcl 2010)",
      	month = "October 11--15",
      	title = "{T}k{R}ibbon: {W}indows {R}ibbons for {T}k",
      	url = "http://www.ellogon.org/petasis/bibliography/Tcl2010/TkRibbon.pdf",
      	year = 2010
      }
      
    6. Georgios Petasis and Dimitrios Petasis.
      BlogBuster: A Tool for Extracting Corpora from the Blogosphere.
      In Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner and Daniel Tapias (eds.). Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010. 2010.
      Abstract This paper presents BlogBuster, a tool for extracting a corpus from the blogosphere. The topic of cleaning arbitrary web pages with the goal of extracting a corpus from web data, suitable for linguistic and language technology research and development, has attracted significant research interest recently. Several general purpose approaches for removing boilerplate have been presented in the literature; however the blogosphere poses additional requirements, such as a finer control over the extracted textual segments in order to accurately identify important elements, i.e. individual blog posts, titles, posting dates or comments. BlogBuster tries to provide such additional details along with boilerplate removal, following a rule-based approach. A small set of rules were manually constructed by observing a limited set of blogs from the Blogger and Wordpress hosting platforms. These rules operate on the DOM tree of an HTML page, as constructed by a popular browser, Mozilla Firefox. Evaluation results suggest that BlogBuster is very accurate when extracting corpora from blogs hosted in the Blogger and Wordpress, while exhibiting a reasonable precision when applied to blogs not hosted in these two popular blogging platforms.
      URL BibTeX

      @inproceedings{DBLP:conf/lrec/PetasisP10,
      	author = "Petasis, Georgios and Dimitrios Petasis",
      	abstract = "This paper presents BlogBuster, a tool for extracting a corpus from the blogosphere. The topic of cleaning arbitrary web pages with the goal of extracting a corpus from web data, suitable for linguistic and language technology research and development, has attracted significant research interest recently. Several general purpose approaches for removing boilerplate have been presented in the literature; however the blogosphere poses additional requirements, such as a finer control over the extracted textual segments in order to accurately identify important elements, i.e. individual blog posts, titles, posting dates or comments. BlogBuster tries to provide such additional details along with boilerplate removal, following a rule-based approach. A small set of rules were manually constructed by observing a limited set of blogs from the Blogger and Wordpress hosting platforms. These rules operate on the DOM tree of an HTML page, as constructed by a popular browser, Mozilla Firefox. Evaluation results suggest that BlogBuster is very accurate when extracting corpora from blogs hosted in the Blogger and Wordpress, while exhibiting a reasonable precision when applied to blogs not hosted in these two popular blogging platforms.",
      	address = "Valletta, Malta",
      	booktitle = "Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010",
      	editor = "Nicoletta Calzolari and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias",
      	isbn = "2-9517408-6-7",
      	month = "May 17--23",
      	publisher = "European Language Resources Association",
      	title = "{B}log{B}uster: {A} {T}ool for {E}xtracting {C}orpora from the {B}logosphere",
      	url = "http://www.ellogon.org/petasis/bibliography/LREC2010/LREC2010-BlogBuster-CameraReady.pdf",
      	year = 2010
      }
      

    Year: 2009

    1. Georgios Petasis, Vangelis Karkaletsis, Anastasia Krithara, Georgios Paliouras and Constantine D Spyropoulos.
      Semi-automated ontology learning: the BOEMIE approach.
      In Proceedings of the First ESWC Workshop on Inductive Reasoning and Machine Learning on the Semantic Web (IRMLeS 2009), 6th European Semantic Web Conference (ESWC 2009). 2009.
      Abstract In this paper we describe a semi-automated approach for ontology learning. Exploiting an ontology-based multimodal information extraction system, the ontology learning subsystem accumulates documents that are insufficiently analysed and through clustering proposes new concepts, relations and interpretation rules to be added to the ontology.
      URL BibTeX

      @inproceedings{citeulike:9267249,
      	author = "Petasis, Georgios and Vangelis Karkaletsis and Anastasia Krithara and Georgios Paliouras and Constantine D. Spyropoulos",
      	abstract = "In this paper we describe a semi-automated approach for ontology learning. Exploiting an ontology-based multimodal information extraction system, the ontology learning subsystem accumulates documents that are insufficiently analysed and through clustering proposes new concepts, relations and interpretation rules to be added to the ontology.",
      	address = "Hersonissos, Crete, Greece",
      	booktitle = "Proceedings of the First ESWC Workshop on Inductive Reasoning and Machine Learning on the Semantic Web (IRMLeS 2009), 6th European Semantic Web Conference (ESWC 2009)",
      	day = 1,
      	keywords = "evolution, ontologies",
      	month = "June 1",
      	title = "{S}emi-automated ontology learning: the {BOEMIE} approach",
      	url = "http://www.ellogon.org/petasis/bibliography/ESWC2009/IRMLeS2009-ESWC2009.pdf",
      	year = 2009
      }
      
    2. Silvana Castano, Irma Sofia Espinosa Peraldi, Alfio Ferrara, Vangelis Karkaletsis, Atila Kaya, Ralf Möller, Stefano Montanelli, Georgios Petasis and Michael Wessel.
      Multimedia Interpretation for Dynamic Ontology Evolution.
      Journal of Logic and Computation 19(5):859–897, 2009.
      Abstract The recent success of distributed and dynamic infrastructures for knowledge sharing has raised the need for semiautomatic/automatic ontology evolution strategies. Ontology evolution is generally defined as the timely adaptation of an ontology to changing requirements and the consistent propagation of changes to dependent artifacts. In this article, we present an ontology evolution approach in the context of multimedia interpretation. Ontology evolution in this context relies on the results obtained through reasoning for the interpretation of multimedia resources, through population of the ontology with new individuals or through enrichment of the ontology with new concepts and new semantic relations. The article analyses the results of interpretation, population and enrichment obtained in evaluation experiments in terms of measures such as precision and recall. The evaluation reveals encouraging results.
      URL, DOI BibTeX

      @article{Castano01102009,
      	author = {Silvana Castano and Irma Sofia Espinosa Peraldi and Alfio Ferrara and Karkaletsis, Vangelis and Atila Kaya and Ralf M{\"o}ller and Stefano Montanelli and Petasis, Georgios and Michael Wessel},
      	abstract = "The recent success of distributed and dynamic infrastructures for knowledge sharing has raised the need for semiautomatic/automatic ontology evolution strategies. Ontology evolution is generally defined as the timely adaptation of an ontology to changing requirements and the consistent propagation of changes to dependent artifacts. In this article, we present an ontology evolution approach in the context of multimedia interpretation. Ontology evolution in this context relies on the results obtained through reasoning for the interpretation of multimedia resources, through population of the ontology with new individuals or through enrichment of the ontology with new concepts and new semantic relations. The article analyses the results of interpretation, population and enrichment obtained in evaluation experiments in terms of measures such as precision and recall. The evaluation reveals encouraging results.",
      	doi = "10.1093/logcom/exn049",
      	journal = "Journal of Logic and Computation",
      	number = 5,
      	pages = "859--897",
      	title = "{M}ultimedia {I}nterpretation for {D}ynamic {O}ntology {E}volution",
      	url = "http://logcom.oxfordjournals.org/content/19/5/859.abstract",
      	volume = 19,
      	year = 2009,
      	eprint = "http://logcom.oxfordjournals.org/content/19/5/859.full.pdf+html"
      }
      

    Year: 2008

    1. Georgios Petasis, Pavlina Fragkou, Aris Theodorakos, Vangelis Karkaletsis and Constantine D Spyropoulos.
      Segmenting HTML pages using visual and semantic information.
      In Proceedings of the 4th Web as a Corpus Workshop (WAC-4), 6th Language Resources and Evaluation Conference (LREC 2008). 2008, 18–24.
      Proceedings: The 4th Web as Corpus: Can we do better than Google? http://www.lrec-conf.org/proceedings/lrec2008/workshops/W19_Proceedings.pdf.
      Abstract The information explosion of the Web aggravates the problem of effective information retrieval. Even though linguistic approaches found in the literature perform linguistic annotation by creating metadata in the form of tokens, lemmas or part of speech tags, however, this process is insufficient. This is due to the fact that these linguistic metadata do not exploit the actual content of the page, leading to the need of performing semantic annotation based on a predefined semantic model. This paper proposes a new learning approach for performing automatic semantic annotation. This is the result of a two step procedure: the first step partitions a web page into blocks based on its visual layout, while the second, performs subsequent partitioning based on the examination of appearance of specific types of entities denoting the semantic category as well as the application of a number of simple heuristics. Preliminary experiments performed on a manually annotated corpus regarding athletics proved to be very promising.
      URL, DOI BibTeX

      @inproceedings{citeulike:5663452,
      	author = "Petasis, Georgios and Pavlina Fragkou and Aris Theodorakos and Vangelis Karkaletsis and Constantine D. Spyropoulos",
      	abstract = "The information explosion of the Web aggravates the problem of effective information retrieval. Even though linguistic approaches found in the literature perform linguistic annotation by creating metadata in the form of tokens, lemmas or part of speech tags, however, this process is insufficient. This is due to the fact that these linguistic metadata do not exploit the actual content of the page, leading to the need of performing semantic annotation based on a predefined semantic model. This paper proposes a new learning approach for performing automatic semantic annotation. This is the result of a two step procedure: the first step partitions a web page into blocks based on its visual layout, while the second, performs subsequent partitioning based on the examination of appearance of specific types of entities denoting the semantic category as well as the application of a number of simple heuristics. Preliminary experiments performed on a manually annotated corpus regarding athletics proved to be very promising.",
      	address = "Marrakech, Morocco",
      	booktitle = "Proceedings of the 4th Web as a Corpus Workshop (WAC-4), 6th Language Resources and Evaluation Conference (LREC 2008)",
      	doi = "10.1109/SPCA.2006.297506",
      	journal = "4th Web as Corpus Workshop (WAC-4)",
      	month = "June 1",
      	note = "Proceedings: The 4th Web as Corpus: Can we do better than Google? http://www.lrec-conf.org/proceedings/lrec2008/workshops/W19_Proceedings.pdf",
      	pages = "18--24",
      	title = "{S}egmenting {HTML} pages using visual and semantic information",
      	url = "http://www.ellogon.org/petasis/bibliography/LREC2008/LREC-2008-SemanticSegmentation-Submitted.pdf",
      	year = 2008
      }
      
    2. Pavlina Fragkou, Georgios Petasis, Aris Theodorakos, Vangelis Karkaletsis and Constantine D Spyropoulos.
      BOEMIE Ontology-Based Text Annotation Tool.
      In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008). 2008.
      Abstract The huge amount of the available information in the Web creates the need of effective information extraction systems that are able to produce metadata that satisfy user’s information needs. The development of such systems, in the majority of cases, depends on the availability of an appropriately annotated corpus in order to learn extraction models. The production of such corpora can be significantly facilitated by annotation tools that are able to annotate, according to a defined ontology, not only named entities but most importantly relations between them. This paper describes the BOEMIE ontology-based annotation tool which is able to locate blocks of text that correspond to specific types of named entities, fill tables corresponding to ontology concepts with those named entities and link the filled tables based on relations defined in the domain ontology. Additionally, it can perform annotation of blocks of text that refer to the same topic. The tool has a user-friendly interface, supports automatic pre-annotation, annotation comparison as well as customization to other annotation schemata. The annotation tool has been used in a large scale annotation task involving 3000 web pages regarding athletics. It has also been used in another annotation task involving 503 web pages with medical information, in different languages.
      URL BibTeX

      @inproceedings{DBLP:conf/lrec/FragkouPTKS08,
      	author = "Pavlina Fragkou and Petasis, Georgios and Aris Theodorakos and Karkaletsis, Vangelis and Constantine D. Spyropoulos",
      	abstract = "The huge amount of the available information in the Web creates the need of effective information extraction systems that are able to produce metadata that satisfy user’s information needs. The development of such systems, in the majority of cases, depends on the availability of an appropriately annotated corpus in order to learn extraction models. The production of such corpora can be significantly facilitated by annotation tools that are able to annotate, according to a defined ontology, not only named entities but most importantly relations between them. This paper describes the BOEMIE ontology-based annotation tool which is able to locate blocks of text that correspond to specific types of named entities, fill tables corresponding to ontology concepts with those named entities and link the filled tables based on relations defined in the domain ontology. Additionally, it can perform annotation of blocks of text that refer to the same topic. The tool has a user-friendly interface, supports automatic pre-annotation, annotation comparison as well as customization to other annotation schemata. The annotation tool has been used in a large scale annotation task involving 3000 web pages regarding athletics. It has also been used in another annotation task involving 503 web pages with medical information, in different languages.",
      	address = "Marrakech, Morocco",
      	booktitle = "Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008)",
      	month = "May 26 -- June 1",
      	publisher = "European Language Resources Association",
      	title = "{BOEMIE} {O}ntology-{B}ased {T}ext {A}nnotation {T}ool",
      	url = "http://www.ellogon.org/petasis/bibliography/LREC2008/LREC-2008-324_paper.pdf",
      	year = 2008
      }
      
    3. Georgios Petasis, Vangelis Karkaletsis, Georgios Paliouras and Constantine D Spyropoulos.
      Learning context-free grammars to extract relations from text.
      In Malik Ghallab, Constantine D Spyropoulos, Nikos Fakotakis and Nikolaos M Avouris (eds.). Proceeding of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence 178. 2008, 303–307.
      Abstract In this paper we propose a novel relation extraction method, based on grammatical inference. Following a semi-supervised learning approach, the text that connects named entities in an annotated corpus is used to infer a context free grammar. The grammar learning algorithm is able to infer grammars from positive examples only, controlling overgeneralisation through minimum description length. Evaluation results show that the proposed approach performs comparable to the state of the art, while exhibiting a bias towards precision, which is a sign of conservative generalisation.
      URL BibTeX

      @inproceedings{Petasis:2008:LCG:1567281.1567350,
      	author = "Petasis, Georgios and Karkaletsis, Vangelis and Georgios Paliouras and Constantine D. Spyropoulos",
      	abstract = "In this paper we propose a novel relation extraction method, based on grammatical inference. Following a semi-supervised learning approach, the text that connects named entities in an annotated corpus is used to infer a context free grammar. The grammar learning algorithm is able to infer grammars from positive examples only, controlling overgeneralisation through minimum description length. Evaluation results show that the proposed approach performs comparable to the state of the art, while exhibiting a bias towards precision, which is a sign of conservative generalisation.",
      	address = "Amsterdam, The Netherlands",
      	booktitle = "Proceeding of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence",
      	editor = "Malik Ghallab and Constantine D. Spyropoulos and Nikos Fakotakis and Nikolaos M. Avouris",
      	isbn = "978-1-58603-891-5",
      	pages = "303--307",
      	publisher = "IOS Press",
      	series = "Frontiers in Artificial Intelligence and Applications",
      	title = "{L}earning context-free grammars to extract relations from text",
      	url = "http://www.ellogon.org/petasis/bibliography/ECAI2008/ECAI2008_0371.pdf",
      	volume = 178,
      	year = 2008
      }
      

    Year: 2007

    1. Silvana Castano, Sofia Espinosa, Alfio Ferrara, Vangelis Karkaletsis, Atila Kaya, Sylvia Melzer, Ralf Möller, Stefano Montanelli and Georgios Petasis.
      Ontology Dynamics with Multimedia Information: The BOEMIE Evolution Methodology.
      In Proceedings of the International ESWC Workshop on Ontology Dynamics (IWOD 2007). 2007.
      http://kmi.open.ac.uk/events/iwod/.
      Abstract In this paper, we present the ontology evolution methodology developed in the context of the BOEMIE project. Ontology evolution in BOEMIE relies on the results obtained through reasoning for the interpretation of multimedia resources in order to evolve (enhance) the ontology, through population of the ontology with new instances, or through enrichment of the ontology with new concepts and new semantic relations.
      URL BibTeX

      @inproceedings{IWOD07,
      	author = {Silvana Castano and Sofia Espinosa and Alfio Ferrara and Karkaletsis, Vangelis and Atila Kaya and Sylvia Melzer and Ralf M{\"o}ller and Stefano Montanelli and Petasis, Georgios},
      	abstract = "In this paper, we present the ontology evolution methodology developed in the context of the BOEMIE project. Ontology evolution in BOEMIE relies on the results obtained through reasoning for the interpretation of multimedia resources in order to evolve (enhance) the ontology, through population of the ontology with new instances, or through enrichment of the ontology with new concepts and new semantic relations.",
      	address = "Innsbruck, Austria",
      	booktitle = "Proceedings of the International ESWC Workshop on Ontology Dynamics (IWOD 2007)",
      	month = "June 7",
      	note = "http://kmi.open.ac.uk/events/iwod/",
      	title = "{O}ntology {D}ynamics with {M}ultimedia {I}nformation: {T}he {BOEMIE} {E}volution {M}ethodology",
      	url = "http://www.ellogon.org/petasis/bibliography/IWOD2007/IWOD2007-paper-07.pdf",
      	year = 2007
      }
      

    Year: 2005

    1. Dimitris Spiliotopoulos, Georgios Petasis and Georgios Kouroupetroglou.
      Prosodically Enriched Text Annotation for High Quality Speech Synthesis.
      In Proceedings of the 10th International Conference on Speech and Computer (SPECOM-2005). 2005, 313–316.
      Abstract Linguistically enriched text generated from natural language modules contributes significantly on the quality of speech synthesis. For all cases where such modules are not available, such enriched input needs to be produced from plain text in order to maintain quality. This work reports on a framework of several combined language resources and procedures (word/sentence identification, syntactic analysis, prosodic feature annotation) for text annotation/processing from plain text. Using that, the implementation of an automatic XML formatted output generation module produces the prosodically enriched markup.
      URL BibTeX

      @inproceedings{SPECOM-2005-Spiliotopoulos,
      	author = "Dimitris Spiliotopoulos and Petasis, Georgios and Georgios Kouroupetroglou",
      	abstract = "Linguistically enriched text generated from natural language modules contributes significantly on the quality of speech synthesis. For all cases where such modules are not available, such enriched input needs to be produced from plain text in order to maintain quality. This work reports on a framework of several combined language resources and procedures (word/sentence identification, syntactic analysis, prosodic feature annotation) for text annotation/processing from plain text. Using that, the implementation of an automatic XML formatted output generation module produces the prosodically enriched markup.",
      	address = "Patras, Greece",
      	booktitle = "Proceedings of the 10th International Conference on Speech and Computer (SPECOM-2005)",
      	month = "October 17--19",
      	pages = "313--316",
      	title = "{P}rosodically {E}nriched {T}ext {A}nnotation for {H}igh {Q}uality {S}peech {S}ynthesis",
      	url = "http://www.ellogon.org/petasis/bibliography/SPECOM2005/Spiliotopoulos-SPECOM-2005.pdf",
      	year = 2005
      }
      

    Year: 2004

    1. Georgios Petasis, Georgios Paliouras, Constantine D Spyropoulos and Constantine Halatsis.
      Eg-GRIDS: Context-Free Grammatical Inference from Positive Examples Using Genetic Search.
      In Georgios Paliouras and Yasubumi Sakakibara (eds.). Grammatical Inference: Algorithms and Applications, Proceedings of the 7th International Colloquium on Grammatical Inference (ICGI 2004) 3264. 2004, 223–234.
      Abstract In this paper we present eg-GRIDS, an algorithm for inducing context-free grammars that is able to learn from positive sample sentences. The presented algorithm, similar to its GRIDS predecessors, uses simplicity as a criterion for directing inference, and a set of operators for exploring the search space. In addition to the basic beam search strategy of GRIDS, eg-GRIDS incorporates an evolutionary grammar selection process, aiming to explore a larger part of the search space. Evaluation results are presented on artificially generated data, comparing the performance of beam search and genetic search. These results show that genetic search performs better than beam search while being significantly more efficient computationally.
      URL BibTeX

      @inproceedings{DBLP:conf/icgi/PetasisPSH04,
      	author = "Petasis, Georgios and Georgios Paliouras and Constantine D. Spyropoulos and Constantine Halatsis",
      	abstract = "In this paper we present eg-GRIDS, an algorithm for inducing context-free grammars that is able to learn from positive sample sentences. The presented algorithm, similar to its GRIDS predecessors, uses simplicity as a criterion for directing inference, and a set of operators for exploring the search space. In addition to the basic beam search strategy of GRIDS, eg-GRIDS incorporates an evolutionary grammar selection process, aiming to explore a larger part of the search space. Evaluation results are presented on artificially generated data, comparing the performance of beam search and genetic search. These results show that genetic search performs better than beam search while being significantly more efficient computationally.",
      	address = "Athens, Greece",
      	booktitle = "Grammatical Inference: Algorithms and Applications, Proceedings of the 7th International Colloquium on Grammatical Inference (ICGI 2004)",
      	editor = "Georgios Paliouras and Yasubumi Sakakibara",
      	isbn = "3-540-23410-1",
      	month = "October 11--13",
      	pages = "223--234",
      	publisher = "Springer Berlin / Heidelberg",
      	series = "Lecture Notes in Computer Science",
      	title = "{E}g-{GRIDS}: {C}ontext-{F}ree {G}rammatical {I}nference from {P}ositive {E}xamples {U}sing {G}enetic {S}earch",
      	url = "http://www.ellogon.org/petasis/bibliography/ICGI2004/e-GRIDS-ICGI-2004-Submission.pdf",
      	volume = 3264,
      	year = 2004
      }
      
    2. Georgios Petasis, Vangelis Karkaletsis, Claire Grover, Ben Hachey, Maria Teresa Pazienza, Michele Vindigni and José Coch.
      Adaptive, Multilingual Named Entity Recognition in Web Pages.
      In Ramon López de Mántaras and Lorenza Saitta (eds.). Proceedings of the 16th European Conference on Artificial Intelligence (ECAI'2004), including Prestigious Applications of Intelligent Systems (PAIS 2004). 2004, 1073–1074.
      Extended version: http://www.ellogon.org/petasis/bibliography/ECAI2004/ECAI2004_NERC.pdf.
      Abstract Most of the information on the Web today is in the form of HTML documents, which are designed for presentation purposes and not for machine understanding and reasoning. Existing web extraction systems require a lot of human involvement for maintenance due to changes to targeted web sites and for adaptation to new web sites or even to new domains. This paper presents the adaptive, multilingual named entity recognition and classification (NERC) technologies developed for processing web pages in the context of the R&D project CROSSMARC. The evaluation results demonstrate the viability of our approach.
      URL BibTeX

      @inproceedings{DBLP:conf/ecai/PetasisKGHPVC04,
      	author = "Petasis, Georgios and Karkaletsis, Vangelis and Claire Grover and Ben Hachey and Maria Teresa Pazienza and Michele Vindigni and Jos{\'e} Coch",
      	abstract = "Most of the information on the Web today is in the form of HTML documents, which are designed for presentation purposes and not for machine understanding and reasoning. Existing web extraction systems require a lot of human involvement for maintenance due to changes to targeted web sites and for adaptation to new web sites or even to new domains. This paper presents the adaptive, multilingual named entity recognition and classification (NERC) technologies developed for processing web pages in the context of the R{\&}D project CROSSMARC. The evaluation results demonstrate the viability of our approach.",
      	address = "Valencia, Spain",
      	booktitle = "Proceedings of the 16th European Conference on Artificial Intelligence (ECAI'2004), including Prestigious Applications of Intelligent Systems (PAIS 2004)",
      	crossref = "DBLP:conf/ecai/2004",
      	editor = "Ramon L{\'o}pez de M{\'a}ntaras and Lorenza Saitta",
      	isbn = "1-58603-452-9",
      	month = "August 22--27",
      	note = "Extended version: http://www.ellogon.org/petasis/bibliography/ECAI2004/ECAI2004_NERC.pdf",
      	pages = "1073--1074",
      	publisher = "IOS Press",
      	title = "{A}daptive, {M}ultilingual {N}amed {E}ntity {R}ecognition in {W}eb {P}ages",
      	url = "http://www.ellogon.org/petasis/bibliography/ECAI2004/Petasis-ECAI2004-Poster.pdf",
      	year = 2004
      }
      
    3. Stavros J Perantonis, Basilios Gatos, Vassilios Maragos, Vangelis Karkaletsis and Georgios Petasis.
      Text Area Identification in Web Images.
      In George A Vouros and Themis Panayiotopoulos (eds.). Methods and Applications of Artificial Intelligence, Proceedings of the 3rd Hellenic Conference on Artificial Intelligence (SETN 2004) 3025. May 2004, 82–92.
      Abstract With the explosive growth of the World Wide Web, millions of documents are published and accessed on-line. Statistics show that a significant part of Web text information is encoded in Web images. Since Web images have special characteristics that sometimes distinguish them from other types of images, commercial OCR products often fail to recognize Web images due to their special characteristics. This paper proposes a novel Web image processing algorithm that aims to locate text areas and prepare them for OCR procedure with better results. Our methodology for text area identification has been fully integrated with an OCR engine and with an Information Extraction system. We present quantitative results for the performance of the OCR engine as well as qualitative results concerning its effects to the Information Extraction system. Experimental results obtained from a large corpus of Web images, demonstrate the efficiency of our methodology.
      URL BibTeX

      @inproceedings{DBLP:conf/setn/PerantonisGMKP04,
      	author = "Stavros J. Perantonis and Basilios Gatos and Vassilios Maragos and Karkaletsis, Vangelis and Petasis, Georgios",
      	abstract = "With the explosive growth of the World Wide Web, millions of documents are published and accessed on-line. Statistics show that a significant part of Web text information is encoded in Web images. Since Web images have special characteristics that sometimes distinguish them from other types of images, commercial OCR products often fail to recognize Web images due to their special characteristics. This paper proposes a novel Web image processing algorithm that aims to locate text areas and prepare them for OCR procedure with better results. Our methodology for text area identification has been fully integrated with an OCR engine and with an Information Extraction system. We present quantitative results for the performance of the OCR engine as well as qualitative results concerning its effects to the Information Extraction system. Experimental results obtained from a large corpus of Web images, demonstrate the efficiency of our methodology.",
      	address = "Samos, Greece",
      	booktitle = "Methods and Applications of Artificial Intelligence, Proceedings of the 3rd Hellenic Conference on Artificial Intelligence (SETN 2004)",
      	editor = "George A. Vouros and Themis Panayiotopoulos",
      	isbn = "3-540-21937-4",
      	month = "May",
      	pages = "82--92",
      	publisher = "Springer Berlin / Heidelberg",
      	series = "Lecture Notes in Computer Science",
      	title = "{T}ext {A}rea {I}dentification in {W}eb {I}mages",
      	url = "http://www.ellogon.org/petasis/bibliography/SETN2004/SETN2004.pdf",
      	volume = 3025,
      	year = 2004
      }
      
    4. Georgios Petasis, Georgios Paliouras, Vangelis Karkaletsis, Constantine Halatsis and Constantine D Spyropoulos.
      E-GRIDS: Computationally Efficient Grammatical Inference from Positive Examples.
      GRAMMARS 7:69–110, 2004.
      Technical Report referenced in the paper: http://www.ellogon.org/petasis/bibliography/GRAMMARS/GRAMMARS2004-SpecialIssue-Petasis-TechnicalReport.pdf.
      Abstract In this paper we present a new computationally efficient algorithm for inducing context-free grammars that is able to learn from positive sample sentences. This new algorithm uses simplicity as a criterion for directing inference, and the search process of the new algorithm has been optimised by utilising the results of a theoretical analysis regarding the behaviour and complexity of the search operators. Evaluation results are presented on artificially generated data, while the scalability of the algorithm is tested on a large textual corpus. These results show that the new algorithm performs well and can infer grammars from large data sets in a reasonable amount of time.

      @article{GRAMMARS-vol.7-Petasis,
      	author = "Petasis, Georgios and Georgios Paliouras and Vangelis Karkaletsis and Constantine Halatsis and Constantine D. Spyropoulos",
      	abstract = "In this paper we present a new computationally efficient algorithm for inducing context-free grammars that is able to learn from positive sample sentences. This new algorithm uses simplicity as a criterion for directing inference, and the search process of the new algorithm has been optimised by utilising the results of a theoretical analysis regarding the behaviour and complexity of the search operators. Evaluation results are presented on artificially generated data, while the scalability of the algorithm is tested on a large textual corpus. These results show that the new algorithm performs well and can infer grammars from large data sets in a reasonable amount of time.",
      	journal = "GRAMMARS",
      	keywords = "grammatical inference, context-free grammars, minimum description length, positive examples",
      	note = "Technical Report referenced in the paper: http://www.ellogon.org/petasis/bibliography/GRAMMARS/GRAMMARS2004-SpecialIssue-Petasis-TechnicalReport.pdf",
      	pages = "69--110",
      	title = "{E}-{GRIDS}: {C}omputationally {E}fficient {G}rammatical {I}nference from {P}ositive {E}xamples",
      	url = "http://www.ellogon.org/petasis/bibliography/GRAMMARS/GRAMMARS2004.pdf",
      	volume = 7,
      	year = 2004
      }
      

    Year: 2003

    1. Georgios Petasis, Vangelis Karkaletsis, Georgios Paliouras and Constantine D Spyropoulos.
      Using the Ellogon Natural Language Engineering Infrastructure.
      In Proceedings of the Workshop on Balkan Language Resources and Tools, 1st Balkan Conference in Informatics (BCI 2003). 2003.
      http://labs-repos.iit.demokritos.gr/skel/bci03_workshop/.
      Abstract Ellogon is a multi-lingual, cross-operating system, general-purpose natural language engineering infrastructure. Ellogon has been used extensively in various NLP applications. It is currently provided for free for research use to research and academic organisations. In this paper, we outline its architecture and data model, present Ellogon features as used by different types of users and discuss its functionalities against other infrastructures for language engineering.

      @inproceedings{BCI2003-Petasis,
      	author = "Petasis, Georgios and Vangelis Karkaletsis and Georgios Paliouras and Constantine D. Spyropoulos",
      	abstract = "Ellogon is a multi-lingual, cross-operating system, general-purpose natural language engineering infrastructure. Ellogon has been used extensively in various NLP applications. It is currently provided for free for research use to research and academic organisations. In this paper, we outline its architecture and data model, present Ellogon features as used by different types of users and discuss its functionalities against other infrastructures for language engineering.",
      	address = "Thessaloniki, Greece",
      	booktitle = "Proceedings of the Workshop on Balkan Language Resources and Tools, 1st Balkan Conference in Informatics (BCI 2003)",
      	month = "November 21",
      	note = "http://labs-repos.iit.demokritos.gr/skel/bci03_workshop/",
      	title = "{U}sing the {E}llogon {N}atural {L}anguage {E}ngineering {I}nfrastructure",
      	url = "http://www.ellogon.org/petasis/bibliography/BCI2003/BCI2003-Petasis.pdf",
      	year = 2003
      }
      
    2. Georgios Petasis, Vangelis Karkaletsis and Constantine D Spyropoulos.
      Cross-lingual Information Extraction from Web pages: the use of a general-purpose Text Engineering Platform.
      In Proceedings of the 4th International Conference on Recent Advances in Natural Language Processing (RANLP 2003). 2003, 381–388.
      http://lml.bas.bg/ranlp2003/.
      Abstract In this paper we present how the use of a general-purpose text engineering platform has facilitated the development of a cross-lingual information extraction system and its adaptation to new domains and languages. Our approach for crosslingual information extraction from the Web covers all the way from the identification of Web sites of interest, to the location of the domain specific Web pages, to the extraction of specific information from the Web pages and its presentation to the end-user. This approach has been implemented in the context of the IST project CROSSMARC. The text engineering platform "Ellogon" offers functionalities that facilitated the development of core CROSSMARC components as well as their porting into new domains and languages.

      @inproceedings{RANLP2003-Petasis,
      	author = "Petasis, Georgios and Vangelis Karkaletsis and Constantine D. Spyropoulos",
      	abstract = {In this paper we present how the use of a general-purpose text engineering platform has facilitated the development of a cross-lingual information extraction system and its adaptation to new domains and languages. Our approach for crosslingual information extraction from the Web covers all the way from the identification of Web sites of interest, to the location of the domain specific Web pages, to the extraction of specific information from the Web pages and its presentation to the end-user. This approach has been implemented in the context of the IST project CROSSMARC. The text engineering platform {"}Ellogon{"} offers functionalities that facilitated the development of core CROSSMARC components as well as their porting into new domains and languages.},
      	address = "Borovets, Bulgaria",
      	booktitle = "Proceedings of the 4th International Conference on Recent Advances in Natural Language Processing (RANLP 2003)",
      	month = "September 10--12",
      	note = "http://lml.bas.bg/ranlp2003/",
      	pages = "381--388",
      	title = "{C}ross-lingual {I}nformation {E}xtraction from {W}eb pages: the use of a general-purpose {T}ext {E}ngineering {P}latform",
      	url = "http://www.ellogon.org/petasis/bibliography/RANLP2003/RANLP-CameraReady.pdf",
      	year = 2003
      }
      

    Year: 2002

    1. Dimitra Farmakiotou, Vangelis Karkaletsis, Ioannis Koutsias, Georgios Petasis and Constantine D Spyropoulos.
      PatEdit: An Information Extraction Pattern Editor for Fast System Customization.
      In Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC 2002). 2002, 1097–1102.
      Abstract This paper addresses the problem of Information Extraction (IE) system customization to new domains and extraction needs with the use of PatEdit, an IE Pattern Editor. PatEdit is a human-assisted knowledge engineering tool, that facilitates the production of IE patterns. First, we present the problem of IE system customisation and the use of human assisted knowledge engineering tools. Then, we describe PatEdit with respect to the IE pattern language used and discuss its characteristics that facilitate rapid pattern writing. Finally, the exploitation of PatEdit in two information extraction projects is presented along with our plans for future work.

      @inproceedings{FarmakiotouEtAl02,
      	author = "Dimitra Farmakiotou and Karkaletsis, Vangelis and Ioannis Koutsias and Petasis, Georgios and Constantine D. Spyropoulos",
      	abstract = "This paper addresses the problem of Information Extraction (IE) system customization to new domains and extraction needs with the use of PatEdit, an IE Pattern Editor. PatEdit is a human-assisted knowledge engineering tool, that facilitates the production of IE patterns. First, we present the problem of IE system customisation and the use of human assisted knowledge engineering tools. Then, we describe PatEdit with respect to the IE pattern language used and discuss its characteristics that facilitate rapid pattern writing. Finally, the exploitation of PatEdit in two information extraction projects is presented along with our plans for future work.",
      	address = "Las Palmas, Canary Islands, Spain",
      	booktitle = "Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC 2002)",
      	month = "May 29--31",
      	pages = "1097--1102",
      	publisher = "European Language Resources Association",
      	title = "{P}at{E}dit: {A}n {I}nformation {E}xtraction {P}attern {E}ditor for {F}ast {S}ystem {C}ustomization",
      	url = "http://www.ellogon.org/petasis/bibliography/LREC2002/LREC2002_Farmakiotou.pdf",
      	year = 2002
      }
      
    2. Claire Grover, Scott Mcdonald, Donnla Nic Gearailt, Vangelis Karkaletsis, Dimitra Farmakiotou, Georgios Samaritakis, Georgios Petasis, Maria Teresa Pazienza, Michele Vindigni and Frantz Vichot.
      Multilingual XML-Based Named Entity Recognition for E-Retail Domains.
      In Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC 2002). 2002.
      Abstract We describe the multilingual Named Entity Recognition and Classification (NERC) subpart of an e-retail product comparison system which is currently under development as part of the EU-funded project CROSSMARC. The system must be rapidly extensible, both to new languages and new domains. To achieve this aim we use XML as our common exchange format and the monolingual NERC components use a combination of rule-based and machine-learning techniques. It has been challenging to process web pages which contain heavily structured data where text is intermingled with HTML and other code. Our preliminary evaluation results demonstrate the viability of our approach.

      @inproceedings{LREC2002-Grover,
      	author = "Claire Grover and Scott Mcdonald and Donnla Nic Gearailt and Vangelis Karkaletsis and Dimitra Farmakiotou and Georgios Samaritakis and Petasis, Georgios and Maria Teresa Pazienza and Michele Vindigni and Frantz Vichot",
      	abstract = "We describe the multilingual Named Entity Recognition and Classification (NERC) subpart of an e-retail product comparison system which is currently under development as part of the EU-funded project CROSSMARC. The system must be rapidly extensible, both to new languages and new domains. To achieve this aim we use XML as our common exchange format and the monolingual NERC components use a combination of rule-based and machine-learning techniques. It has been challenging to process web pages which contain heavily structured data where text is intermingled with HTML and other code. Our preliminary evaluation results demonstrate the viability of our approach.",
      	address = "Las Palmas, Canary Islands, Spain",
      	booktitle = "Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC 2002)",
      	month = "May 29--31",
      	publisher = "European Language Resources Association",
      	title = "{M}ultilingual {XML}-{B}ased {N}amed {E}ntity {R}ecognition for {E}-{R}etail {D}omains",
      	url = "http://www.ellogon.org/petasis/bibliography/LREC2002/LREC2002_Grover.pdf",
      	year = 2002
      }
      
    3. Georgios Petasis, Vangelis Karkaletsis, Georgios Paliouras, Ion Androutsopoulos and Constantine D Spyropoulos.
      Ellogon: A New Text Engineering Platform.
      In Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC 2002). 2002, 72–78.
      Abstract This paper presents Ellogon, a multi-lingual, cross-platform, general-purpose text engineering environment. Ellogon was designed in order to aid both researchers in natural language processing, as well as companies that produce language engineering systems for the end-user. Ellogon provides a powerful TIPSTER-based infrastructure for managing, storing and exchanging textual data, embedding and managing text processing components as well as visualising textual data and their associated linguistic information. Among its key features are full Unicode support, an extensive multi-lingual graphical user interface, its modular architecture and the reduced hardware requirements.

      @inproceedings{Petasis02ellogon:a,
      	author = "Petasis, Georgios and Karkaletsis, Vangelis and Georgios Paliouras and Ion Androutsopoulos and Constantine D. Spyropoulos",
      	abstract = "This paper presents Ellogon, a multi-lingual, cross-platform, general-purpose text engineering environment. Ellogon was designed in order to aid both researchers in natural language processing, as well as companies that produce language engineering systems for the end-user. Ellogon provides a powerful TIPSTER-based infrastructure for managing, storing and exchanging textual data, embedding and managing text processing components as well as visualising textual data and their associated linguistic information. Among its key features are full Unicode support, an extensive multi-lingual graphical user interface, its modular architecture and the reduced hardware requirements.",
      	address = "Las Palmas, Canary Islands, Spain",
      	booktitle = "Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC 2002)",
      	month = "May 29--31",
      	pages = "72--78",
      	publisher = "European Language Resources Association",
      	title = "{E}llogon: {A} {N}ew {T}ext {E}ngineering {P}latform",
      	url = "http://www.ellogon.org/petasis/bibliography/LREC2002/LREC2002_Petasis.pdf",
      	year = 2002
      }
      
    4. Dimitra Farmakiotou, Vangelis Karkaletsis, Georgios Samaritakis, Georgios Petasis and Constantine D Spyropoulos.
      Named Entity Recognition from Greek Web Pages.
      In Ioannis P Vlahavas and Constantine D Spyropoulos (eds.). Proceedings of the 2nd Hellenic Conference on Artificial Intelligence (SETN-02), Companion Volume. 2002, 91–102.
      http://lpis.csd.auth.gr/setn02/.
      Abstract We describe the functionalities of the Hellenic Named Entity Recognition and Classification (HNERC) system developed in the context of the CROSSMARC project. CROSSMARC is developing technology for e-retail product comparison. The CROSSMARC system locates relevant retailers’ web pages and processes them in order to extract information about their products (e.g. technical features, prices). CROSSMARC’s technology is demonstrated and evaluated for two different product types and four languages (English, Greek, Italian, French). This paper presents the HNERC system that is responsible for the identification and classification of specific types of proper names (e.g. laptop manufacturers, models), numerical expressions (e.g. length, weight), and temporal expressions (e.g. time, date) in Hellenic vendor sites. The paper presents the HNERC processing stages using examples from the laptops domain.

      @inproceedings{DBLP:conf/setn/AndroutsopoulosSSDKS02,
      	author = "Dimitra Farmakiotou and Karkaletsis, Vangelis and Georgios Samaritakis and Petasis, Georgios and Constantine D. Spyropoulos",
      	abstract = "We describe the functionalities of the Hellenic Named Entity Recognition and Classification (HNERC) system developed in the context of the CROSSMARC project. CROSSMARC is developing technology for e-retail product comparison. The CROSSMARC system locates relevant retailers’ web pages and processes them in order to extract information about their products (e.g. technical features, prices). CROSSMARC’s technology is demonstrated and evaluated for two different product types and four languages (English, Greek, Italian, French). This paper presents the HNERC system that is responsible for the identification and classification of specific types of proper names (e.g. laptop manufacturers, models), numerical expressions (e.g. length, weight), and temporal expressions (e.g. time, date) in Hellenic vendor sites. The paper presents the HNERC processing stages using examples from the laptops domain.",
      	address = "Thessaloniki, Greece",
      	booktitle = "Proceedings of the 2nd Hellenic Conference on Artificial Intelligence (SETN-02), Companion Volume",
      	editor = "Ioannis P. Vlahavas and Constantine D. Spyropoulos",
      	month = "April 11--12",
      	note = "http://lpis.csd.auth.gr/setn02/",
      	pages = "91--102",
      	title = "{N}amed {E}ntity {R}ecognition from {G}reek {W}eb {P}ages",
      	url = "http://www.ellogon.org/petasis/bibliography/SETN2002/091.pdf",
      	year = 2002
      }
      
    5. Georgios Petasis, Sergios Petridis, Georgios Paliouras, Vangelis Karkaletsis, Stavros J Perantonis and Constantine D Spyropoulos.
      Symbolic and Neural Learning of Named-Entity Recognition and Classification Systems in Two Languages.
      In Hans-Jürgen Zimmermann, Georgios Tselentis, Maarten van Someren and Georgios Dounias (eds.). Advances in Computational Intelligence and Learning: Methods and Applications. International Series in Intelligent Technologies, volume 18, Springer Berlin / Heidelberg, January 2002, pages 193–210.
      http://www.springer.com/mathematics/book/978-0-7923-7645-3.
      Abstract This paper compares two alternative approaches to the problem of acquiring named-entity recognition and classification systems from training corpora, in two different languages. The process of named-entity recognition and classification is an important subtask in most language engineering applications, in particular information extraction, where different types of named entity are associated with specific roles in events. The manual construction of rules for the recognition of named entities is a tedious and time-consuming task. For this reason, effective methods to acquire such systems automatically from data are very desirable. In this paper we compare two popular learning methods on this task: a decision-tree induction method and a multi-layered feed-forward neural network. Particular emphasis is paid on the selection of the appropriate data representation for each method and the extraction of training examples from unstructured textual data. We compare the performance of the two methods on large corpora of English and Greek texts and present the results. In addition to the good performance of both methods, one very interesting result is the fact that a simple representation of the data, which ignores the order of the words within a named entity, leads to improved results over a more complex approach that preserves word order.

      @incollection{Petasis:2002:SNL:647292.722672,
      	author = "Petasis, Georgios and Sergios Petridis and Georgios Paliouras and Karkaletsis, Vangelis and Stavros J. Perantonis and Constantine D. Spyropoulos",
      	abstract = "This paper compares two alternative approaches to the problem of acquiring named-entity recognition and classification systems from training corpora, in two different languages. The process of named-entity recognition and classification is an important subtask in most language engineering applications, in particular information extraction, where different types of named entity are associated with specific roles in events. The manual construction of rules for the recognition of named entities is a tedious and time-consuming task. For this reason, effective methods to acquire such systems automatically from data are very desirable. In this paper we compare two popular learning methods on this task: a decision-tree induction method and a multi-layered feed-forward neural network. Particular emphasis is paid on the selection of the appropriate data representation for each method and the extraction of training examples from unstructured textual data. We compare the performance of the two methods on large corpora of English and Greek texts and present the results. In addition to the good performance of both methods, one very interesting result is the fact that a simple representation of the data, which ignores the order of the words within a named entity, leads to improved results over a more complex approach that preserves word order.",
      	booktitle = "Advances in Computational Intelligence and Learning: Methods and Applications",
      	editor = {Hans-J{\"u}rgen Zimmermann and Georgios Tselentis and Maarten van Someren and Georgios Dounias},
      	isbn = "978-0-7923-7645-3",
      	keywords = "named entity recognition, tree induction, neural networks",
      	month = "January",
      	note = "http://www.springer.com/mathematics/book/978-0-7923-7645-3",
      	pages = "193--210",
      	publisher = "Springer Berlin / Heidelberg",
      	series = "International Series in Intelligent Technologies",
      	title = "{S}ymbolic and {N}eural {L}earning of {N}amed-{E}ntity {R}ecognition and {C}lassification {S}ystems in {T}wo {L}anguages",
      	url = "http://www.ellogon.org/petasis/bibliography/COIL2000/COILBook2001.pdf",
      	volume = 18,
      	year = 2002
      }
      

    Year: 2001

    1. Georgios Petasis, Vangelis Karkaletsis, Dimitra Farmakiotou, Ion Androutsopoulos and Constantine D Spyropoulos.
      A Greek Morphological Lexicon and its Exploitation by a Greek Controlled Language Checker.
      In Proceedings of the 8th Panhellenic Conference on Informatics (PCI'01). 2001, 80–89.
      Abstract This paper presents a large-scale Greek morphological lexicon, developed by the Software & Knowledge Engineering Laboratory (SKEL) of NCSR "Demokritos". The paper describes the lexicon architecture and the procedure to develop and update it. The morphological lexicon was used to develop a lemmatiser and a morphological analyser that were included in a controlled language checker for Greek. The paper discusses the current coverage of the lexicon, as well as remaining issues and how we plan to address them. Our goal is to produce a wide-coverage morphological lexicon of Greek that can be easily exploited in several natural language processing applications.

      @inproceedings{Petasis:2001:GML:1756269.1756295,
      	author = "Petasis, Georgios and Karkaletsis, Vangelis and Dimitra Farmakiotou and Ion Androutsopoulos and Constantine D. Spyropoulos",
      	abstract = {This paper presents a large-scale Greek morphological lexicon, developed by the Software {\&} Knowledge Engineering Laboratory (SKEL) of NCSR {"}Demokritos{"}. The paper describes the lexicon architecture and the procedure to develop and update it. The morphological lexicon was used to develop a lemmatiser and a morphological analyser that were included in a controlled language checker for Greek. The paper discusses the current coverage of the lexicon, as well as remaining issues and how we plan to address them. Our goal is to produce a wide-coverage morphological lexicon of Greek that can be easily exploited in several natural language processing applications.},
      	booktitle = "Proceedings of the 8th Panhellenic Conference on Informatics (PCI'01)",
      	month = "November 8--10",
      	pages = "80--89",
      	series = "PCI'01",
      	title = "{A} {G}reek {M}orphological {L}exicon and its {E}xploitation by a {G}reek {C}ontrolled {L}anguage {C}hecker",
      	url = "http://www.ellogon.org/petasis/bibliography/PCI2001/EPY-Morph-CameraReady.pdf",
      	year = 2001
      }
      
    2. Georgios Petasis, Frantz Vichot, Francis Wolinski, Georgios Paliouras, Vangelis Karkaletsis and Constantine D Spyropoulos.
      Using Machine Learning to Maintain Rule-based Named-Entity Recognition and Classification Systems.
      In Proceedings of the 39th Annual Meeting on Association for Computational Linguistics. 2001, 426–433.
      Abstract This paper presents a method that assists in maintaining a rule-based named-entity recognition and classification system. The underlying idea is to use a separate system, constructed with the use of machine learning, to monitor the performance of the rule-based system. The training data for the second system is generated with the use of the rule-based system, thus avoiding the need for manual tagging. The disagreement of the two systems acts as a signal for updating the rule-based system. The generality of the approach is illustrated by applying it to large corpora in two different languages: Greek and French. The results are very encouraging, showing that this alternative use of machine learning can assist significantly in the maintenance of rule-based systems.

      @inproceedings{Petasis:2001:UML:1073012.1073067,
      	author = "Petasis, Georgios and Frantz Vichot and Francis Wolinski and Georgios Paliouras and Karkaletsis, Vangelis and Constantine D. Spyropoulos",
      	abstract = "This paper presents a method that assists in maintaining a rule-based named-entity recognition and classification system. The underlying idea is to use a separate system, constructed with the use of machine learning, to monitor the performance of the rule-based system. The training data for the second system is generated with the use of the rule-based system, thus avoiding the need for manual tagging. The disagreement of the two systems acts as a signal for updating the rule-based system. The generality of the approach is illustrated by applying it to large corpora in two different languages: Greek and French. The results are very encouraging, showing that this alternative use of machine learning can assist significantly in the maintenance of rule-based systems.",
      	address = "Toulouse, France",
      	booktitle = "Proceedings of the 39th Annual Meeting on Association for Computational Linguistics",
      	doi = "10.3115/1073012.1073067",
      	month = "July 9--11",
      	pages = "426--433",
      	publisher = "Association for Computational Linguistics",
      	series = "ACL '01",
      	title = "{U}sing {M}achine {L}earning to {M}aintain {R}ule-based {N}amed-{E}ntity {R}ecognition and {C}lassification {S}ystems",
      	url = "http://www.ellogon.org/petasis/bibliography/ACL2001/ACL-2001-CameraReady.pdf",
      	year = 2001
      }
      
    3. Vangelis Karkaletsis, Georgios Samaritakis, Georgios Petasis, Dimitra Farmakiotou, Ion Androutsopoulos and Constantine D Spyropoulos.
      A Controlled Language Checker Based on the Ellogon Text Engineering Platform.
      In Proceedings from Language Technologies 2001: The Second Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL 2001). 2001, 74–75.

      @inproceedings{NAACL2001-Karkaletsis,
      	author = "Vangelis Karkaletsis and Georgios Samaritakis and Petasis, Georgios and Dimitra Farmakiotou and Ion Androutsopoulos and Constantine D. Spyropoulos",
      	address = "Pittsburgh, PA, USA",
      	booktitle = "Proceedings from Language Technologies 2001: The Second Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL 2001)",
      	month = "June 2--7",
      	organization = "Carnegie Mellon University",
      	pages = "74--75",
      	title = "{A} {C}ontrolled {L}anguage {C}hecker {B}ased on the {E}llogon {T}ext {E}ngineering {P}latform",
      	year = 2001,
      	url = "http://www.ellogon.org/petasis/bibliography/NAACL2001/NAACL01-demo-ABSTRACT.pdf",
      	keywords = "controlled languages, Modern Greek"
      }
      

    Year: 2000

    1. Georgios Paliouras, Vangelis Karkaletsis, Georgios Petasis and Constantine D Spyropoulos.
      Learning Decision Trees for Named-Entity Recognition and Classification.
      In Proceedings of the 14th European Conference on Artificial Intelligence (ECAI 2000). 2000.
      Abstract We propose the use of decision tree induction as a solution to the problem of customising a named-entity recognition and classification (NERC) system to a specific domain. A NERC system assigns semantic tags to phrases that correspond to named entities, e.g. persons, locations and organisations. Typically, such a system makes use of two language resources: a recognition grammar and a lexicon of known names, classified by the corresponding named-entity types. NERC systems have been shown to achieve good results when the domain of application is very specific. However, the construction of the grammar and the lexicon for a new domain is a hard and time-consuming process. We propose the use of decision trees as NERC "grammars" and the construction of these trees using machine learning. In order to validate our approach, we tested C4.5 on the identification of person and organisation names involved in management succession events, using data from the sixth Message Understanding Conference. The results of the evaluation are very encouraging showing that the induced tree can outperform a grammar that was constructed manually.

      @inproceedings{ECAI2000-Petasis,
      	author = "Georgios Paliouras and Karkaletsis, Vangelis and Petasis, Georgios and Constantine D. Spyropoulos",
      	abstract = {We propose the use of decision tree induction as a solution to the problem of customising a named-entity recognition and classification (NERC) system to a specific domain. A NERC system assigns semantic tags to phrases that correspond to named entities, e.g. persons, locations and organisations. Typically, such a system makes use of two language resources: a recognition grammar and a lexicon of known names, classified by the corresponding named-entity types. NERC systems have been shown to achieve good results when the domain of application is very specific. However, the construction of the grammar and the lexicon for a new domain is a hard and time-consuming process. We propose the use of decision trees as NERC {"}grammars{"} and the construction of these trees using machine learning. In order to validate our approach, we tested C4.5 on the identification of person and organisation names involved in management succession events, using data from the sixth Message Understanding Conference. The results of the evaluation are very encouraging showing that the induced tree can outperform a grammar that was constructed manually.},
      	booktitle = "Proceedings of the 14th European Conference on Artificial Intelligence (ECAI 2000)",
      	month = "August 20--25",
      	series = "ECAI 2000",
      	title = "{L}earning {D}ecision {T}rees for {N}amed-{E}ntity {R}ecognition and {C}lassification",
      	url = "http://www.ellogon.org/petasis/bibliography/ECAI2000/ECAI-2000.pdf",
      	year = 2000
      }
      
    2. Georgios Petasis.
      Machine Learning and Named-Entity Recognition.
      In Proceedings of the 8th ELSNET European Summer School on Language and Speech Communication on the subject of Text and Speech Triggered Information Access (TeSTIA 2000). 2000.
      BibTeX

      @inproceedings{TESTIA2000-Petasis,
      	author = "Petasis, Georgios",
      	address = "Chios, Greece",
      	booktitle = "Proceedings of the 8th ELSNET European Summer School on Language and Speech Communication on the subject of Text and Speech Triggered Information Access (TeSTIA 2000)",
      	month = "July 15--30",
      	title = "{M}achine {L}earning and {N}amed-{E}ntity {R}ecognition",
      	year = 2000
      }
      
    3. Georgios Petasis, Alessandro Cucchiarelli, Paola Velardi, Georgios Paliouras, Vangelis Karkaletsis and Constantine D Spyropoulos.
      Automatic adaptation of proper noun dictionaries through cooperation of machine learning and probabilistic methods.
      In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 2000, 128–135.
      Abstract The recognition of Proper Nouns (PNs) is considered an important task in the area of Information Retrieval and Extraction. However the high performance of most existing PN classifiers heavily depends upon the availability of large dictionaries of domain-specific Proper Nouns, and a certain amount of manual work for rule writing or manual tagging. Though it is not a heavy requirement to rely on some existing PN dictionary (often these resources are available on the web), its coverage of a domain corpus may be rather low, in absence of manual updating. In this paper we propose a technique for the automatic updating of a PN Dictionary through the cooperation of an inductive and a probabilistic classifier. In our experiments we show that, whenever an existing PN Dictionary allows the identification of 50% of the proper nouns within a corpus, our technique allows, without additional manual effort, the successful recognition of about 90% of the remaining 50%.
      URL, DOI BibTeX

      @inproceedings{Petasis:2000:AAP:345508.345563,
      	author = "Petasis, Georgios and Alessandro Cucchiarelli and Paola Velardi and Georgios Paliouras and Karkaletsis, Vangelis and Constantine D. Spyropoulos",
      	abstract = "The recognition of Proper Nouns (PNs) is considered an important task in the area of Information Retrieval and Extraction. However the high performance of most existing PN classifiers heavily depends upon the availability of large dictionaries of domain-specific Proper Nouns, and a certain amount of manual work for rule writing or manual tagging. Though it is not a heavy requirement to rely on some existing PN dictionary (often these resources are available on the web), its coverage of a domain corpus may be rather low, in absence of manual updating. In this paper we propose a technique for the automatic updating of a PN Dictionary through the cooperation of an inductive and a probabilistic classifier. In our experiments we show that, whenever an existing PN Dictionary allows the identification of 50{\%} of the proper nouns within a corpus, our technique allows, without additional manual effort, the successful recognition of about 90{\%} of the remaining 50{\%}.",
      	address = "New York, NY, USA",
      	booktitle = "Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)",
      	doi = "10.1145/345508.345563",
      	isbn = "1-58113-226-3",
      	keywords = "information extraction, machine learning and IR, natural language processing for IR, text data mining",
      	month = "July 24--28",
      	pages = "128--135",
      	publisher = "ACM",
      	series = "SIGIR '00",
      	title = "{A}utomatic adaptation of proper noun dictionaries through cooperation of machine learning and probabilistic methods",
      	url = "http://www.ellogon.org/petasis/bibliography/SIGIR2000/SIGIR-CameraReady.pdf",
      	year = 2000
      }
      
    4. Georgios Petasis, Sergios Petridis, Georgios Paliouras, Vangelis Karkaletsis, Stavros J Perantonis and Constantine D Spyropoulos.
      Symbolic and Neural Learning for Named-Entity Recognition.
      In Proceedings of European Best Practice Workshops and Symposium on Computational Intelligence and Learning (COIL 2000). 2000, 58–66.
      Abstract Named-entity recognition involves the identification and classification of named entities in text. This is an important subtask in most language engineering applications, in particular information extraction, where different types of named entity are associated with specific roles in events. The manual construction of rules for the recognition of named entities is a tedious and time-consuming task. For this reason, we present in this paper two approaches to learning named-entity recognition rules from text. The first approach is a decision-tree induction method and the second a multi-layered feed-forward neural network. Particular emphasis is paid on the selection of the appropriate feature set for each method and the extraction of training examples from unstructured textual data. We compare the performance of the two methods on a large corpus of English text and present the results.
      URL BibTeX

      @inproceedings{Petasis00c.:symbolic,
      	author = "Petasis, Georgios and Sergios Petridis and Georgios Paliouras and Karkaletsis, Vangelis and Stavros J. Perantonis and Constantine D. Spyropoulos",
      	abstract = "Named-entity recognition involves the identification and classification of named entities in text. This is an important subtask in most language engineering applications, in particular information extraction, where different types of named entity are associated with specific roles in events. The manual construction of rules for the recognition of named entities is a tedious and time-consuming task. For this reason, we present in this paper two approaches to learning named-entity recognition rules from text. The first approach is a decision-tree induction method and the second a multi-layered feed-forward neural network. Particular emphasis is paid on the selection of the appropriate feature set for each method and the extraction of training examples from unstructured textual data. We compare the performance of the two methods on a large corpus of English text and present the results.",
      	address = "Chios, Greece",
      	booktitle = "Proceedings of European Best Practice Workshops and Symposium on Computational Intelligence and Learning (COIL 2000)",
      	keywords = "named-entity recognition, tree induction, neural networks",
      	month = "June 19--23",
      	pages = "58--66",
      	title = "{S}ymbolic and {N}eural {L}earning for {N}amed-{E}ntity {R}ecognition",
      	url = "http://www.ellogon.org/petasis/bibliography/COIL2000/COIL-2000.pdf",
      	year = 2000
      }
      
    5. Georgios Petasis, Georgios Paliouras, Vangelis Karkaletsis, Constantine D Spyropoulos and Ion Androutsopoulos.
      Using Machine Learning Techniques for Part-Of-Speech Tagging in the Greek Language.
      In Dimitrios I Fotiadis and Stavros D Nikolopoulos (eds.). ADVANCES IN INFORMATICS: Proceedings of the 7th Hellenic Conference on Informatics (HCI '99). World Scientific, May 2000, pages 273–281.
      http://www.worldscibooks.com/compsci/4320.html.
      Abstract This article investigates the use of Transformation-Based Error-Driven learning for resolving part-of-speech ambiguity in the Greek language. The aim is not only to study the performance, but also to examine its dependence on different thematic domains. Results are presented here for two different test cases: a corpus on "management succession events" and a general-theme corpus. The two experiments show that the performance of this method does not depend on the thematic domain of the corpus, and its accuracy for the Greek language is around 95%.
      URL BibTeX

      @incollection{HCI1999-Petasis,
      	author = "Petasis, Georgios and Georgios Paliouras and Karkaletsis, Vangelis and Constantine D. Spyropoulos and Ion Androutsopoulos",
      	abstract = {This article investigates the use of Transformation-Based Error-Driven learning for resolving part-of-speech ambiguity in the Greek language. The aim is not only to study the performance, but also to examine its dependence on different thematic domains. Results are presented here for two different test cases: a corpus on {"}management succession events{"} and a general-theme corpus. The two experiments show that the performance of this method does not depend on the thematic domain of the corpus, and its accuracy for the Greek language is around 95{\%}.},
      	booktitle = "ADVANCES IN INFORMATICS: Proceedings of the 7th Hellenic Conference on Informatics (HCI '99)",
      	editor = "Dimitrios I. Fotiadis and Stavros D. Nikolopoulos",
      	isbn = "978-981-02-4192-6",
      	month = "May",
      	note = "http://www.worldscibooks.com/compsci/4320.html",
      	pages = "273--281",
      	publisher = "World Scientific",
      	title = "{U}sing {M}achine {L}earning {T}echniques for {P}art-{O}f-{S}peech {T}agging in the {G}reek {L}anguage",
      	url = "http://www.ellogon.org/petasis/bibliography/HCI1999/EPY99.pdf",
      	year = 2000
      }
      

    Year: 1999

    1. Vangelis Karkaletsis, Georgios Paliouras, Georgios Petasis, Natasa Manousopoulou and Constantine D Spyropoulos.
      Named-Entity Recognition from Greek and English Texts.
      Journal of Intelligent and Robotic Systems 26(2):123–135, October 1999.
      Abstract Named-entity recognition (NER) involves the identification and classification of named entities in text. This is an important subtask in most language engineering applications, in particular information extraction, where different types of named entity are associated with specific roles in events. In this paper, we present a prototype NER system for Greek texts that we developed based on a NER system for English. Both systems are evaluated on corpora of the same domain and of similar size. The time-consuming process for the construction and update of domain-specific resources in both systems led us to examine a machine learning method for the automatic construction of such resources for a particular application in a specific language.
      URL, DOI BibTeX

      @article{Karkaletsis:1999:NRG:595358.595565,
      	author = "Karkaletsis, Vangelis and Georgios Paliouras and Petasis, Georgios and Natasa Manousopoulou and Constantine D. Spyropoulos",
      	abstract = "Named-entity recognition (NER) involves the identification and classification of named entities in text. This is an important subtask in most language engineering applications, in particular information extraction, where different types of named entity are associated with specific roles in events. In this paper, we present a prototype NER system for Greek texts that we developed based on a NER system for English. Both systems are evaluated on corpora of the same domain and of similar size. The time-consuming process for the construction and update of domain-specific resources in both systems led us to examine a machine learning method for the automatic construction of such resources for a particular application in a specific language.",
      	address = "Hingham, MA, USA",
      	doi = "10.1023/A:1008124406923",
      	issn = "0921-0296",
      	journal = "Journal of Intelligent and Robotic Systems",
      	keywords = "information extraction, machine learning, named-entity recognition",
      	month = "October",
      	number = 2,
      	pages = "123--135",
      	publisher = "Kluwer Academic Publishers",
      	title = "{N}amed-{E}ntity {R}ecognition from {G}reek and {E}nglish {T}exts",
      	url = "http://www.ellogon.org/petasis/bibliography/JIRS1999/JIRS-1999.pdf",
      	volume = 26,
      	year = 1999
      }
      
    2. Georgios Petasis.
      Exploiting Learning in Bilingual Named Entity Recognition.
      In Proceedings of the ECCAI Advanced Course on Artificial Intelligence (ACAI '99). 1999.
      URL BibTeX

      @inproceedings{ACAI1999-Petasis2,
      	author = "Petasis, Georgios",
      	address = "Chania, Greece",
      	booktitle = "Proceedings of the ECCAI Advanced Course on Artificial Intelligence (ACAI '99)",
      	month = "July 5--16",
      	title = "{E}xploiting {L}earning in {B}ilingual {N}amed {E}ntity {R}ecognition",
      	year = 1999,
      	url = "http://www.ellogon.org/petasis/bibliography/ACAI1999/ss1_07.pdf"
      }
      
    3. Georgios Petasis, Georgios Paliouras, Vangelis Karkaletsis and Constantine D Spyropoulos.
      Resolving Part-of-Speech Ambiguity in the Greek Language Using Learning Techniques.
      In Proceedings of the ECCAI Advanced Course on Artificial Intelligence (ACAI '99). 1999.
      Abstract This article investigates the use of Transformation-Based Error-Driven learning for resolving part-of-speech ambiguity in the Greek language. The aim is not only to study the performance, but also to examine its dependence on different thematic domains. Results are presented here for two different test cases: a corpus on "management succession events" and a general-theme corpus. The two experiments show that the performance of this method does not depend on the thematic domain of the corpus, and its accuracy for the Greek language is around 95%.
      URL BibTeX

      @inproceedings{Petasis99resolvingpart-of-speech,
      	author = "Petasis, Georgios and Georgios Paliouras and Karkaletsis, Vangelis and Constantine D. Spyropoulos",
      	abstract = {This article investigates the use of Transformation-Based Error-Driven learning for resolving part-of-speech ambiguity in the Greek language. The aim is not only to study the performance, but also to examine its dependence on different thematic domains. Results are presented here for two different test cases: a corpus on {"}management succession events{"} and a general-theme corpus. The two experiments show that the performance of this method does not depend on the thematic domain of the corpus, and its accuracy for the Greek language is around 95{\%}.},
      	address = "Chania, Greece",
      	booktitle = "Proceedings of the ECCAI Advanced Course on Artificial Intelligence (ACAI '99)",
      	month = "July 5--16",
      	title = "{R}esolving {P}art-of-{S}peech {A}mbiguity in the {G}reek {L}anguage {U}sing {L}earning {T}echniques",
      	url = "http://www.ellogon.org/petasis/bibliography/ACAI1999/9906019.pdf",
      	year = 1999
      }
      
    4. Vangelis Karkaletsis, Constantine D Spyropoulos and Georgios Petasis.
      Named Entity Recognition from Greek texts: the GIE Project.
      In Spyros G Tzafestas (ed.). Advances in Intelligent Systems: Concepts, Tools and Applications. Intelligent Systems, Control and Automation: Science and Engineering series, volume 21, Springer Berlin / Heidelberg, 1999, pages 131–142.
      Presented at the 3rd European Robotics Intelligent Systems & Control Conference (EURISCON '98), June 22–25 1998, Athens, Greece.
      URL BibTeX

      @incollection{EURISCON1998-Karkaletsis,
      	author = "Vangelis Karkaletsis and Constantine D. Spyropoulos and Petasis, Georgios",
      	booktitle = "Advances in Intelligent Systems: Concepts, Tools and Applications",
      	chapter = 12,
      	editor = "Spyros G. Tzafestas",
      	isbn = "978-1-4020-0393-6",
      	keywords = "named-entity recognition, information extraction, machine learning",
      	note = "Presented at the 3rd European Robotics Intelligent Systems {\&} Control Conference (EURISCON '98), June 22--25 1998, Athens, Greece.",
      	pages = "131--142",
      	publisher = "Springer Berlin / Heidelberg",
      	series = "Intelligent Systems, Control and Automation: Science and Engineering",
      	title = "{N}amed {E}ntity {R}ecognition from {G}reek texts: the {GIE} {P}roject",
      	url = "http://www.springer.com/computer/image+processing/book/978-1-4020-0393-6",
      	volume = 21,
      	year = 1999
      }
      

Download all publications in a single bibtex file.
