NorwegianNewsTopics is a tabular dataset containing metadata and topic distributions of online news articles published in 2023 by 22 Norwegian news outlets. Each row represents one article and includes publication metadata and topic model outputs derived from a 28-topic Latent Dirichlet Allocation (LDA) model. The dataset was constructed to analyse topic diversity in Norwegian journalism and to enable comparisons across editorial types and distribution platforms.
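As an illustration of how per-article topic distributions can support outlet-level comparisons, the sketch below averages each outlet's article distributions and scores the result with normalized Shannon entropy (0 = all weight on one topic, 1 = weight spread evenly over all 28 topics). This is a hypothetical example, not the authors' method: entropy is only one possible diversity measure, and the column names `outlet` and `topic_1` … `topic_28` are assumptions that may not match the deposited file.

```python
import math
from collections import defaultdict

N_TOPICS = 28  # number of topics in the dataset's LDA model


def mean_topic_distribution(rows, n_topics=N_TOPICS):
    """Average per-article topic proportions for each outlet.

    Assumes (hypothetically) that each row is a dict with an 'outlet' key
    and keys 'topic_1' ... 'topic_28' holding the topic proportions.
    """
    sums = defaultdict(lambda: [0.0] * n_topics)
    counts = defaultdict(int)
    for row in rows:
        outlet = row["outlet"]
        for k in range(n_topics):
            sums[outlet][k] += float(row[f"topic_{k + 1}"])
        counts[outlet] += 1
    return {o: [s / counts[o] for s in v] for o, v in sums.items()}


def normalized_entropy(dist):
    """Shannon entropy divided by log(K), so the score lies in [0, 1]."""
    total = sum(dist)
    h = -sum((p / total) * math.log(p / total) for p in dist if p > 0)
    return h / math.log(len(dist))


def topic_diversity(rows, n_topics=N_TOPICS):
    """Map each outlet to the normalized entropy of its mean topic distribution."""
    means = mean_topic_distribution(rows, n_topics)
    return {o: normalized_entropy(d) for o, d in means.items()}


# Toy rows for illustration only (not values from the dataset)
uniform = {f"topic_{k + 1}": 1 / N_TOPICS for k in range(N_TOPICS)}
focused = {f"topic_{k + 1}": (1.0 if k == 0 else 0.0) for k in range(N_TOPICS)}
rows = [{"outlet": "A", **focused}, {"outlet": "B", **uniform}]
print(topic_diversity(rows))  # A ≈ 0.0, B ≈ 1.0
```

Averaging the distributions before taking the entropy measures how broadly an outlet covers topics overall; averaging per-article entropies instead would measure how mixed individual articles are, which is a different notion of diversity.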
RStudio, 2026.01.1+403
The original dataset used for the LDA topic modelling included full-text articles (leads and body texts). The full texts have been removed from the deposited dataset because of copyright regulations and the terms of the data-sharing agreements with the sources.
Data from Amedia, Schibsted and Polaris were accessed after signing data-sharing agreements with the three companies, and were retrieved either through API access (Amedia) or through transfer of file batches (Schibsted and Polaris). For TV2 Nyheter, Klassekampen, Morgenbladet, Vårt Land and Nationen, data were accessed by scraping their online news websites after permission was granted by the media companies. The scraping was set up in Python, using libraries including Selenium for browser automation and BeautifulSoup for HTML parsing.
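To illustrate the parsing step, the sketch below extracts an article's lead and body text from an HTML page. It deliberately swaps BeautifulSoup for the standard library's `html.parser` so it runs without third-party packages; in a Selenium-based pipeline, the rendered page's HTML would typically come from `driver.page_source`. The class names `article-lead` and `article-body` are hypothetical, since each outlet's markup differs, and this is not the scraper used to build the dataset.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag and so must not be pushed on the stack
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ArticleTextParser(HTMLParser):
    """Collect text inside elements whose class list contains a target class.

    The default target classes are hypothetical placeholders.
    """

    def __init__(self, targets=("article-lead", "article-body")):
        super().__init__()
        self.targets = set(targets)
        self.stack = []  # (tag, matched target class or None) per open element
        self.text = {t: [] for t in self.targets}

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        match = next((c for c in classes if c in self.targets), None)
        self.stack.append((tag, match))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Pop back to the matching open tag, tolerating unclosed children
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Attribute text to the innermost enclosing target element, if any
        for _, match in reversed(self.stack):
            if match is not None:
                if data.strip():
                    self.text[match].append(data.strip())
                break


def extract_article(html):
    """Return the lead and body text found in an article page."""
    parser = ArticleTextParser()
    parser.feed(html)
    parser.close()
    return {
        "lead": " ".join(parser.text["article-lead"]),
        "body": " ".join(parser.text["article-body"]),
    }


page = """
<html><body>
  <h1>Headline</h1>
  <p class="article-lead">Short lead.</p>
  <div class="article-body"><p>First paragraph.</p><br><p>Second paragraph.</p></div>
  <footer>Advertisement</footer>
</body></html>
"""
print(extract_article(page))
# {'lead': 'Short lead.', 'body': 'First paragraph. Second paragraph.'}
```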