Hugging Face Deep Dive: Sequence Classification with BERT

LLMs (Large Language Models) have revolutionized NLP (Natural Language Processing) and, as of 2025, are still transforming the field and its applications. These models excel at common NLP tasks such as summarization, question answering, and text generation. A common trend in state-of-the-art LLMs is that they build on the Transformer architecture [1], and decoder-only models have gained favor over encoder-only and encoder-decoder models [2]. In this article, I will discuss how to use the BERT (Bidirectional Encoder Representations from Transformers) model [3] for a sequence classification task with Hugging Face's transformers library. Remember that BERT is technically just a language model due to its relatively small size (~100 to ~350 million parameters, depending on the version) compared to large language models with billions of parameters, and it is an encoder-only model. Nevertheless, as I will argue next, understanding and knowing how to use this model is essential. ...

March 8, 2025 · 13 min · 2702 words · Jorge Roldan
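
As a taste of what the post covers, here is a minimal sketch of sequence classification with the transformers library. The checkpoint name (`textattack/bert-base-uncased-SST-2`) and the example sentence are my own illustrative assumptions, not taken from the post itself:

```python
# Minimal sketch: sequence classification with a fine-tuned BERT checkpoint.
# The checkpoint below is an assumed example of BERT fine-tuned on SST-2.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "textattack/bert-base-uncased-SST-2"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Tokenize a single sentence and run it through the model.
inputs = tokenizer("This library makes NLP easy!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring logit back to its label name.
predicted_id = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])
```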

N-Gram Language Models

This post is based on chapter 3 of Speech and Language Processing by Dan Jurafsky and James H. Martin. N-gram models are the simplest type of language model. The term N-gram has two meanings. One refers to a sequence of n words, so a 2-gram and a 3-gram are sequences of 2 and 3 words, respectively. The other refers to a probabilistic model that estimates the probability of a word given the previous n-1 words. ...

February 27, 2025 · 3 min · 450 words · Jorge Roldan
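
To make the second meaning concrete, here is the standard bigram case as presented in Jurafsky and Martin's chapter 3: the probability of a word is approximated by conditioning only on the preceding word, and that conditional probability is estimated from corpus counts, written C(·):

$$
P(w_n \mid w_{1:n-1}) \approx P(w_n \mid w_{n-1}),
\qquad
P(w_n \mid w_{n-1}) = \frac{C(w_{n-1}\,w_n)}{C(w_{n-1})}
$$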