Computational Linguistics
Coordinated by: Institute of Formal and Applied Linguistics
Study branch coordinator: Doc. RNDr. Markéta Lopatková, Ph.D.
Specializations:
- Computational and formal linguistics
- Statistical methods and machine learning in computational linguistics
The study branch Computational Linguistics prepares students for research in natural language processing and for the development of applications dealing with both written and spoken language. Examples of such applications include information retrieval, machine translation, grammar checking, text summarization and information extraction, automatic speech recognition, voice control, spoken dialogue systems, and speech synthesis. The emphasis is on a deep understanding of formal foundations and their practical applicability. The branch can be studied in two specializations: (i) computational and formal linguistics, and (ii) statistical methods and machine learning in computational linguistics.
Graduates are familiar with the theoretical foundations of the formal description of natural languages, the mathematical and algorithmic foundations of automatic natural language processing, and state-of-the-art machine learning techniques. They can apply the knowledge acquired during their studies to the design and development of systems that automatically process natural language and large quantities of both structured and unstructured data, for tasks such as information retrieval, question answering, summarization and information extraction, machine translation, and speech processing. They also gain knowledge, skills, and experience in software development and teamwork that are applicable in all areas involving the development of applications that support human-computer interaction and/or machine learning.
6.1 Obligatory courses
Code | Subject | Credits | Winter | Summer
NTIN090 | Introduction to Complexity and Computability | 5 | 2/1 C+Ex | —
NTIN066 | Data Structures I | 5 | 2/1 C+Ex | —
NPFL063 | Introduction to General Linguistics | 5 | 2/1 C+Ex | —
NPFL067 | Statistical Methods in Natural Language Processing I | 6 | 2/2 C+Ex | —
NPFL092 | NLP Technology | 5 | 1/2 MC | —
NSZZ023 | Diploma Thesis I | 6 | 0/4 C | 0/4 C
NSZZ024 | Diploma Thesis II | 9 | 0/6 C | 0/6 C
NSZZ025 | Diploma Thesis III | 15 | 0/10 C | 0/10 C
6.2 Elective courses
The student must obtain at least 42 credits from the following courses:
Code | Subject | Credits | Winter | Summer
NPFL006 | Introduction to Formal Linguistics | 3 | 2/0 Ex | —
NPFL038 | Fundamentals of Speech Recognition and Generation | 6 | 2/2 C+Ex | —
NPFL068 | Statistical Methods in Natural Language Processing II | 6 | — | 2/2 C+Ex
NPFL070 | Language Data Resources | 5 | — | 1/2 MC
NPFL075 | Prague Dependency Treebank | 6 | — | 2/2 C+Ex
NPFL079 | Algorithms in Speech Recognition | 6 | — | 2/2 C+Ex
NPFL082 | Information Structure of Sentences and Discourse Structure | 3 | — | 0/2 C
NPFL083 | Linguistic Theory and Grammar Formalisms | 6 | — | 2/2 C+Ex
NPFL087 | Statistical Machine Translation | 6 | — | 2/2 C+Ex
NPFL093 | NLP Applications | 5 | — | 2/1 MC
NPFL094 | Morphological and Syntactic Analysis I | 3 | 2/0 MC | —
NPFL095 | Modern Methods in Computational Linguistics | 3 | 0/2 C | —
NPFL096 | Computational Morphology | 4 | — | 2/1 Ex
NPFL099 | Statistical Dialogue Systems | 5 | — | 2/1 C+Ex
NPFL103 | Information Retrieval | 6 | 2/2 C+Ex | —
NPFL104 | Machine Learning Methods | 5 | — | 1/2 C+Ex
NPRG027 | Credit for Project | 6 | 0/4 C | 0/4 C
NPRG023 | Software Project | 9 | 0/6 C | 0/6 C
NPFL114 | Deep Learning | 7 | — | 3/2 C+Ex
6.3 State Final Exam
In addition to the two examination areas that are obligatory for all study branches, the exam covers one area obligatory for this study branch, one obligatory area that depends on the specialization, and one elective examination area. As the elective area, the student may also select the obligatory area of the other specialization of the study branch Computational Linguistics, any area from the specialization Intelligent agents or the specialization Machine learning of the study branch Artificial Intelligence, or any area from the specialization Computer graphics of the study branch Computer Graphics and Game Development. In total, each student receives five questions from the five examination areas.
Examination areas
1. Fundamentals of natural language processing (obligatory for both specializations)
2. Linguistic theories and formalisms (obligatory for the specialization Computational and formal linguistics)
3. Statistical methods and machine learning in computational linguistics (obligatory for the specialization Statistical methods and machine learning in computational linguistics)
4. Multimodal technologies and data (elective)
5. Applications in natural language processing (elective)
Knowledge requirements
1. Fundamentals of natural language processing
Fundamentals of general linguistics. System of layers in language description. Dependency syntax, formal definition of dependency trees and their characteristics. The Chomsky hierarchy of languages, context free languages, phrase grammars, unification-based grammars and categorial grammars for a natural language. Design and evaluation of linguistic experiments, evaluation metrics. Basic stochastic methods. Language modeling, basic methods for training stochastic models. Basic algorithms.
Recommended courses
Code | Subject | Credits | Winter | Summer
NPFL067 | Statistical Methods in Natural Language Processing I | 6 | 2/2 C+Ex | —
NPFL063 | Introduction to General Linguistics | 5 | 2/1 C+Ex | —
2. Linguistic theories and formalisms
Functional Generative Description. Prague Dependency Treebank. Other basic grammar formalisms (Government and Binding, unification-based grammars, feature structures, HPSG, LFG, categorial grammars, (L)TAG). Phonetics, phonology. Computational Morphology. Syntax. Computational lexicography. Topic-focus articulation; information structure, discourse. Coreference. Linguistic typology. Formal grammars and their application in rule-based morphology and parsing.
Recommended courses
Code | Subject | Credits | Winter | Summer
NPFL063 | Introduction to General Linguistics | 5 | 2/1 C+Ex | —
NPFL083 | Linguistic Theory and Grammar Formalisms | 6 | — | 2/2 C+Ex
NPFL075 | Prague Dependency Treebank | 6 | — | 2/2 C+Ex
NPFL094 | Morphological and Syntactic Analysis I | 3 | 2/0 MC | —
NPFL006 | Introduction to Formal Linguistics | 3 | 2/0 Ex | —
3. Statistical methods and machine learning in computational linguistics
Generative and discriminative models. Supervised machine learning for classification and regression (linear models, other methods: Naive Bayes, decision trees, example-based learning). Support Vector Machines and Kernel functions. Logistic regression. Unsupervised machine learning methods. Bayesian Networks. Bias-variance tradeoff. Language models and noisy channel models. Smoothing, model combination. HMM, trellis, Viterbi, Baum-Welch. Algorithms for statistical tagging. Algorithms for phrase-based and dependency-based statistical parsing.
Recommended courses
Code | Subject | Credits | Winter | Summer
NPFL067 | Statistical Methods in Natural Language Processing I | 6 | 2/2 C+Ex | —
NPFL068 | Statistical Methods in Natural Language Processing II | 6 | — | 2/2 C+Ex
NPFL104 | Machine Learning Methods | 5 | — | 1/2 C+Ex
NPFL087 | Statistical Machine Translation | 6 | — | 2/2 C+Ex
4. Multimodal technologies and data
Fundamentals of speech production and perception. Methods of speech signal processing. HMM acoustic modeling of phonemes. The implementation of the Baum-Welch and Viterbi algorithms in speech recognition systems. Continuous speech recognition using large dictionaries. Adaptation techniques. Speech summarization. Topic and key-word spotting in speech corpora. Speaker recognition. Methods of speech synthesis. Text processing for speech synthesis. Prosody modeling. Basic components of a dialog system. Spoken language understanding. Dialog control – MDP and POMDP systems. Reinforcement learning. Dialogue state tracking in MDP and POMDP systems. User simulation. Speech generation. Dialog systems quality evaluation. Search and indexing in audio-visual archives.
Recommended courses
Code | Subject | Credits | Winter | Summer
NPFL038 | Fundamentals of Speech Recognition and Generation | 6 | 2/2 C+Ex | —
NPFL079 | Algorithms in Speech Recognition | 6 | — | 2/2 C+Ex
NPFL099 | Statistical Dialogue Systems | 5 | — | 2/1 C+Ex
5. Applications in natural language processing
Spell-checking and grammar-checking. Input methods. Machine translation. Machine-aided translation. Statistical methods in machine translation. Quality evaluation of machine translation. Information retrieval, models for information retrieval. Query expansion and relevance feedback. Document clustering. Web search. Duplicate detection and plagiarism detection. Information retrieval evaluation. Sentiment analysis, social network analysis. Search systems (Lucene, SOLR, Terrier). NLP toolkits (GATE, NLTK, NLPTools).
Recommended courses
Code | Subject | Credits | Winter | Summer
NPFL087 | Statistical Machine Translation | 6 | — | 2/2 C+Ex
NPFL103 | Information Retrieval | 6 | 2/2 C+Ex | —
NPFL093 | NLP Applications | 5 | — | 2/1 MC