A Week at Lancaster University Summer School in Corpus Linguistics

A Week at Lancaster University Summer School in Corpus Linguistics
Burçin Şenyurt

From 15 to 19 June, I had the opportunity to attend the Lancaster Summer School in Corpus Linguistics for Language Teaching, Testing and Assessment at Lancaster University.

Set in the historic city of Lancaster, the program brought together participants from different countries and professional backgrounds with a shared interest in language, research, and education. Beyond its welcoming atmosphere, historic architecture, and beautiful campus, the university is internationally recognised for its excellence in linguistics, ranking 2nd in the UK (Complete University Guide 2027) and 2nd in the world (QS World University Rankings by Subject 2026), which made the experience even more meaningful for me.

Corpus linguistics is an approach to studying language by analysing large collections of authentic written and spoken texts, known as corpora, with the help of computer technology. Instead of relying on intuition or isolated examples, researchers can examine millions of words to identify how language is actually used in different contexts. This makes it possible to investigate questions such as which words commonly occur together, how frequently particular language features appear, and how language varies across different situations. These insights have important applications in language learning, teaching, testing, and assessment, allowing decisions to be based on evidence from authentic language use rather than assumptions.

The summer school combined lectures with practical sessions, introducing corpus linguistic methods and demonstrating how they can be applied to a range of research questions. Throughout the week, we explored topics including corpus statistics, concordancing, language testing, data visualisation, AI in language assessment, pragmatics, and corpus-based approaches to language teaching. We also worked in computer labs to explore features of various corpus tools and learned how to build a corpus.

One of the sessions that particularly stood out was delivered by Dr Dana Gablasova, who introduced Data-driven Learning (DDL) and demonstrated how corpora can support language teaching. Rather than presenting language as isolated rules, DDL encourages learners to discover patterns by analysing authentic language data. By examining multiple examples taken from corpora, learners can observe how words, grammatical structures, and expressions are naturally used in context. This approach is especially valuable when teaching formulaic language, collocations, and other language patterns that are often difficult to explain through traditional methods.

Another important aspect of the session was the emphasis on authentic language. Corpus data exposes learners to the kinds of language they are likely to encounter outside the classroom, helping reduce the gap between textbook examples and real-world communication. We also explored practical ways of designing corpus-based learning activities, both in paper-based and computer-based formats, enabling learners to investigate language independently through guided discovery.

Artificial intelligence was another important theme of the program. In his sessions on language testing, Professor Luke Harding mentioned how AI is beginning to influence both language learning and language assessment. One discussion focused on conversations generated by chatbots and how they differ from authentic spoken interaction. Although AI systems can produce fluent language, they often rely on expressions that sound less natural to native speakers and tend to use a style that differs from everyday conversation. This raises important questions for language learners who increasingly use AI as a learning tool and highlights the importance of maintaining authentic language models in teaching and assessment.

Professor Harding also demonstrated how corpus linguistics contributes to language testing research. By analysing large collections of spoken test responses, researchers can investigate patterns in learners' language use and better understand how speaking ability is demonstrated across different proficiency levels. We also discussed interactional competence and the challenges of assessing speaking through semi-direct tests, where candidates respond to a computer rather than interacting with another person. The session showed how corpus analysis provides empirical evidence that can support the design and evaluation of language assessments.

Across the week, every session combined theoretical knowledge with practical application. The lecturers demonstrated not only the breadth of corpus linguistics but also its relevance to contemporary issues in language education and assessment. Dana Gablasova, Vaclav Brezina, Luke Harding, Tineke Brunfaut, Raffaella Bottini and John Pill each contributed different perspectives, illustrating how corpus methods can inform research, improve teaching practices, and strengthen language assessment through evidence-based approaches.

The program offered an excellent balance between academic content and practical application. It was also incredible to exchange ideas with participants from different countries and disciplines, gaining insights into how corpus linguistics is being applied in diverse educational contexts.