Language Technologies and Computational Linguistics
Coordinated by: Institute of Formal and Applied Linguistics
Study branch coordinator: Doc. RNDr. Markéta Lopatková, Ph.D.
Specializations:
- Computational and formal linguistics
- Statistical and machine learning methods in Natural Language Processing
The graduate is familiar with the mathematical and algorithmic foundations of automatic natural language processing, with the theoretical foundations of the formal description of natural languages, and with state-of-the-art machine learning techniques. The student acquires skills in designing and developing systems that automatically process large quantities of language data, written and spoken, structured and unstructured alike, and that solve language-related tasks such as information retrieval, question answering, summarization and information extraction, machine translation, and speech processing.
The graduate is well prepared for doctoral studies in computational linguistics and language technologies, as well as for a professional career in the public or private sector. Given the general applicability of machine learning and data-driven methods, the graduate is well equipped to use these methods not only in natural language processing tasks but also in other domains where large quantities of structured and unstructured data are analyzed (finance, economics, biology, medicine, and others). The student acquires programming experience and the soft skills required for teamwork on applications that involve machine learning or human-computer interaction.
5.1 Obligatory Courses
Code | Subject | Credits | Winter | Summer | |
NTIN066 | Data Structures 1 | 6 | — | 2/2 C+Ex | |
NTIN090 | Introduction to Complexity and Computability | 4 | 2/1 C+Ex | — | |
NPFL063 | Introduction to General Linguistics | 4 | 2/1 C+Ex | — | |
NPFL067 | Statistical Methods in Natural Language Processing I | 5 | 2/2 C+Ex | — | |
NPFL114 | Deep Learning | 7 | — | 3/2 C+Ex | |
NSZZ023 | Diploma Thesis I | 6 | — | 0/4 C | |
NSZZ024 | Diploma Thesis II | 9 | 0/6 C | — | |
NSZZ025 | Diploma Thesis III | 15 | — | 0/10 C |
5.2 Elective Courses - Set 1
The student needs to obtain at least 40 credits in total for the elective courses. Of these 40 required credits, at most 6 credits can be obtained from project courses (set 2 below) and at most 10 credits from the additional set of elective courses (set 3 below).
Code | Subject | Credits | Winter | Summer | |
NPFL006 | Introduction to Formal Linguistics | 3 | 2/0 Ex | — | |
NPFL038 | Fundamentals of Speech Recognition and Generation | 5 | 2/2 C+Ex | — | |
NPFL068 | Statistical Methods in Natural Language Processing II | 5 | — | 2/2 C+Ex | |
NPFL070 | Language Data Resources | 4 | 1/2 MC | — | |
NPFL075 | Dependency Grammars and Treebanks | 5 | — | 2/2 C+Ex | |
NPFL079 | Algorithms in Speech Recognition | 5 | — | 2/2 C+Ex | |
NPFL082 | Information Structure of Sentences and Discourse Structure | 2 | — | 0/2 C | |
NPFL083 | Linguistic Theories and Grammar Formalisms | 5 | — | 2/2 C+Ex | |
NPFL087 | Statistical Machine Translation | 5 | — | 2/2 C+Ex | |
NPFL093 | NLP Applications | 4 | — | 2/1 MC | |
NPFL094 | Morphological and Syntactic Analysis | 3 | 2/0 MC | — | |
NPFL095 | Modern Methods in Computational Linguistics | 3 | 0/2 C | — | |
NPFL097 | Unsupervised Machine Learning in NLP | 3 | 1/1 C | — | |
NPFL099 | Statistical Dialogue Systems | 4 | 2/1 C+Ex | — | |
NPFL100 | Variability of Languages in Time and Space | 2 | 1/1 C | — | |
NPFL103 | Information Retrieval | 5 | 2/2 C+Ex | — | |
NPFL104 | Machine Learning Methods | 4 | — | 1/2 C+Ex | |
NPFL122 | Deep Reinforcement Learning | 5 | 2/2 C+Ex | — | |
NPFL128 | Language Technologies in Practice | 4 | — | 2/1 MC |
5.3 Elective Courses - Set 2 - Team Project Courses
The student can select at most one of the project courses as an elective course; at most 6 credits count as credits for elective courses. (Other potential credits for courses from this set count as credits for free courses.)
Code | Subject | Credits | Winter | Summer | |
NPRG069 | Software Project | 12 | 0/8 C | 0/8 C | |
NPRG070 | Research Project | 9 | 0/6 C | 0/6 C | |
NPRG071 | Company Project | 6 | 0/4 C | 0/4 C |
5.4 Elective Courses - Set 3
The student can select any course from the following set of additional courses; at most 10 credits count as credits for elective courses. (Other potential credits for courses from this set count as credits for free courses.)
Code | Subject | Credits | Winter | Summer | |
NAIL025 | Evolutionary Algorithms 1 | 5 | 2/2 C+Ex | — | |
NAIL069 | Artificial Intelligence 1 | 4 | 2/1 C+Ex | — | |
NAIL070 | Artificial Intelligence 2 | 3 | — | 2/0 Ex | |
NAIL104 | Probabilistic Graphical Models | 3 | 2/0 Ex | — | |
NPGR036 | Computer Vision | 5 | — | 2/2 C+Ex |
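The credit-counting rules for the three elective sets above can be illustrated with a short Python sketch. It is only an illustration of the arithmetic, not an official tool: the function names are hypothetical, and it assumes that `set2` holds the credits of the single project course the student selected as an elective.

```python
# Limits taken from the rules above; all names are illustrative only.
REQUIRED_ELECTIVE_CREDITS = 40
SET2_CAP = 6    # team project courses (set 2)
SET3_CAP = 10   # additional elective courses (set 3)


def counted_elective_credits(set1: int, set2: int, set3: int) -> int:
    """Credits that count toward the elective requirement after applying the caps."""
    return set1 + min(set2, SET2_CAP) + min(set3, SET3_CAP)


def free_course_overflow(set2: int, set3: int) -> int:
    """Credits above the caps, which count as credits for free courses."""
    return max(set2 - SET2_CAP, 0) + max(set3 - SET3_CAP, 0)


def meets_elective_requirement(set1: int, set2: int, set3: int) -> bool:
    """True if the capped total reaches the required number of elective credits."""
    return counted_elective_credits(set1, set2, set3) >= REQUIRED_ELECTIVE_CREDITS


# Example: 25 credits from set 1, Software Project (12 credits) from set 2,
# and Artificial Intelligence 1 + Computer Vision (4 + 5 credits) from set 3.
print(counted_elective_credits(25, 12, 9))    # 25 + 6 + 9
print(free_course_overflow(12, 9))            # 6 project credits become free credits
print(meets_elective_requirement(25, 12, 9))
```

In this example only 6 of the 12 Software Project credits count toward the elective requirement; the remaining 6 count as credits for free courses.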
5.5 State Final Exam
The state final exam for the program Language Technologies and Computational Linguistics consists of one obligatory examination area common to both specializations (examination area 1), one obligatory area determined by the selected specialization (examination area 2 or examination area 3), and one elective examination area (examination area 4 or 5). Instead of area 4 or 5, the student may select the obligatory area of the other specialization of this study program as the third examination area. In total, each student gets questions from three examination areas.
Examination areas
1. Fundamentals of natural language processing (obligatory for both specializations)
2. Linguistic theories and formalisms (obligatory for the specialization Computational and formal linguistics)
3. Statistical methods and machine learning in computational linguistics (obligatory for the specialization Statistical and machine learning methods in Natural Language Processing)
4. Speech, dialogue and multimodal systems (elective)
5. Applications in natural language processing (elective)
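The rule for combining the three examination areas can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the specialization keys `computational_formal` and `statistical_ml` and the function names are hypothetical labels introduced here, while the area numbers follow the list above.

```python
# Area numbers follow the list of examination areas; the keys are illustrative labels.
COMMON_AREA = 1
OBLIGATORY_AREA = {"computational_formal": 2, "statistical_ml": 3}
ELECTIVE_AREAS = {4, 5}


def allowed_third_areas(specialization: str) -> set[int]:
    """Areas that may serve as the third area: 4, 5, or the other specialization's area."""
    own = OBLIGATORY_AREA[specialization]
    other = {area for area in OBLIGATORY_AREA.values() if area != own}
    return ELECTIVE_AREAS | other


def exam_areas(specialization: str, third_area: int) -> list[int]:
    """Return the three examination areas, or raise if the third choice is not allowed."""
    if third_area not in allowed_third_areas(specialization):
        raise ValueError(f"area {third_area} cannot be chosen for {specialization}")
    return sorted([COMMON_AREA, OBLIGATORY_AREA[specialization], third_area])


print(exam_areas("statistical_ml", 5))        # areas 1, 3 and 5
print(exam_areas("computational_formal", 3))  # areas 1, 2 and 3
```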
Knowledge requirements
1. Fundamentals of natural language processing
Phonetics, phonology, morphology, syntax, semantics, pragmatics. Ambiguity, arbitrariness. Description and prescription. Diachronic and synchronic language description. Fundamentals of information theory. Markov models. Language modeling and smoothing. Word classes. Annotated corpora. Design and evaluation of linguistic experiments, evaluation metrics. Morphological disambiguation and syntactic analysis. Basic classification and regression algorithms.
Recommended courses
Code | Subject | Credits | Winter | Summer | |
NPFL063 | Introduction to General Linguistics | 4 | 2/1 C+Ex | — | |
NPFL067 | Statistical Methods in Natural Language Processing I | 5 | 2/2 C+Ex | — |
2. Linguistic theories and formalisms
Functional Generative Description. Prague Dependency Treebank. Universal Dependencies. Other grammar formalisms (overview and basic characteristics). Phonetics, phonology. Computational Morphology. Surface and deep syntactic structure; valency. Computational lexicography. Topic-focus articulation; information structure, discourse. Coreference. Linguistic typology. Formal grammars and their application in rule-based morphology. Parsing.
Recommended courses
Code | Subject | Credits | Winter | Summer | |
NPFL063 | Introduction to General Linguistics | 4 | 2/1 C+Ex | — | |
NPFL006 | Introduction to Formal Linguistics | 3 | 2/0 Ex | — | |
NPFL075 | Dependency Grammars and Treebanks | 5 | — | 2/2 C+Ex | |
NPFL083 | Linguistic Theories and Grammar Formalisms | 5 | — | 2/2 C+Ex | |
NPFL094 | Morphological and Syntactic Analysis | 3 | 2/0 MC | — |
3. Statistical methods and machine learning in computational linguistics
Generative and discriminative models. Supervised machine learning methods for classification and regression (linear models, other methods: naive Bayes, decision trees, instance-based learning, SVM and kernels, logistic regression). Unsupervised machine learning methods. Language models, noisy channel model. Model smoothing, model combination. HMM, trellis, Viterbi, Baum-Welch. Algorithms for statistical tagging. Algorithms for constituency and dependency statistical parsing. Neural networks in machine learning. Convolutional and recurrent networks. Word embeddings.
Recommended courses
Code | Subject | Credits | Winter | Summer | |
NPFL067 | Statistical Methods in Natural Language Processing I | 5 | 2/2 C+Ex | — | |
NPFL114 | Deep Learning | 7 | — | 3/2 C+Ex | |
NPFL068 | Statistical Methods in Natural Language Processing II | 5 | — | 2/2 C+Ex |
4. Speech, dialogue and multimodal systems
Fundamentals of speech production and perception. Methods of speech signal processing. HMM acoustic modeling of phonemes. The implementation of the Baum-Welch and Viterbi algorithms in speech recognition systems. Neural models for speech. Methods of speech synthesis. Speech applications. Basic components of a dialogue system. Natural language understanding in dialogue systems. Dialogue state tracking. Methods for dialogue management. User simulation. End-to-end neural dialogue systems. Open-domain dialogue system architectures. Natural language generation. Dialogue systems evaluation. Visual dialogue and multimodal systems.
Recommended courses
Code | Subject | Credits | Winter | Summer | |
NPFL038 | Fundamentals of Speech Recognition and Generation | 5 | 2/2 C+Ex | — | |
NPFL079 | Algorithms in Speech Recognition | 5 | — | 2/2 C+Ex | |
NPFL099 | Statistical Dialogue Systems | 4 | 2/1 C+Ex | — |
5. Applications in natural language processing
Spell-checking and grammar-checking. Machine translation. Machine-aided translation. Statistical methods in machine translation. Quality evaluation of machine translation. Speech translation. Information retrieval, models for information retrieval. Query expansion and relevance feedback. Document clustering. Duplicate detection and plagiarism detection. Information retrieval evaluation. Sentiment analysis. Toolkits (GATE, NLTK, NLPTools, Lucene, Terrier).
Recommended courses
Code | Subject | Credits | Winter | Summer | |
NPFL087 | Statistical Machine Translation | 5 | — | 2/2 C+Ex | |
NPFL093 | NLP Applications | 4 | — | 2/1 MC | |
NPFL103 | Information Retrieval | 5 | 2/2 C+Ex | — | |
NPFL128 | Language Technologies in Practice | 4 | — | 2/1 MC |