NorwegianNewsTopics (2023): Topic Modeling of Norwegian News

NorwegianNewsTopics is a tabular dataset containing metadata and topic distributions for online news articles published in 2023 by 22 Norwegian news outlets. Each row represents one article and includes publication metadata and topic model outputs derived from a 28-topic Latent Dirichlet Allocation (LDA) model. The dataset was constructed to analyse topic diversity in Norwegian journalism and to enable comparisons across editorial types and distribution platforms.
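
As an illustration of how per-article topic distributions could feed a topic diversity comparison, the following is a minimal sketch that computes Shannon entropy for each article and averages it per group. It is not the authors' analysis method, and the column names used here (`outlet`, `topic_1` to `topic_28`) are assumptions that should be checked against the deposited CSV file.

```python
import math
from collections import defaultdict

N_TOPICS = 28  # the dataset description states a 28-topic LDA model

# Hypothetical column names; verify against the deposited CSV header.
TOPIC_KEYS = [f"topic_{i}" for i in range(1, N_TOPICS + 1)]


def topic_entropy(weights):
    """Shannon entropy (in bits) of one article's topic distribution."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("topic weights must sum to a positive value")
    # Normalise so the weights form a probability distribution,
    # and skip zero entries because 0 * log(0) is taken as 0.
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)


def mean_entropy_by_group(rows, group_key, topic_keys=TOPIC_KEYS):
    """Average per-article entropy for each value of group_key."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        weights = [float(row[key]) for key in topic_keys]
        sums[row[group_key]] += topic_entropy(weights)
        counts[row[group_key]] += 1
    return {group: sums[group] / counts[group] for group in sums}
```

The `rows` argument can be any iterable of dictionaries, for example the output of `csv.DictReader`. Higher entropy means an article's content is spread across more topics; an article assigned entirely to one topic has an entropy of zero.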

RStudio, 2026.01.1+403

The original dataset used to conduct the LDA topic modeling included full-text articles (leads and body texts). Full texts have been removed from the deposited dataset due to copyright regulations and the terms of the data sharing agreements with the sources.

Data from Amedia, Schibsted and Polaris was accessed after data sharing agreements were signed with the three companies. Data was retrieved through API access (Amedia) or transfer of file batches (Schibsted and Polaris). For TV2 Nyheter, Klassekampen, Morgenbladet, Vårt Land and Nationen, data was collected by scraping their online news websites after the media companies gave permission. The scraping was set up in Python, using libraries such as Selenium for browser automation and BeautifulSoup for HTML parsing.
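
The HTML parsing step of such a scraping setup can be sketched as follows. This is only an illustration under stated assumptions, not the project's scraper: it swaps in Python's standard-library `html.parser` in place of BeautifulSoup so that it runs without extra dependencies, it omits the Selenium browser automation entirely, and it assumes a hypothetical page layout in which the headline sits in an `<h1>` element and the body text in `<p>` elements. Real news sites will differ and need site-specific rules.

```python
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collects the <h1> headline and all <p> text from one article page."""

    def __init__(self):
        super().__init__()
        self._current = None  # tag currently being captured ("h1" or "p")
        self.headline = ""
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._current = tag
        elif tag == "p":
            self._current = tag
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        # Inline tags such as <a> inside a paragraph do not end the capture.
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "h1":
            self.headline += data
        elif self._current == "p":
            self.paragraphs[-1] += data


def parse_article(html):
    """Return the headline and body text of one (hypothetical) article page."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    body = "\n".join(p.strip() for p in parser.paragraphs if p.strip())
    return {"headline": parser.headline.strip(), "body": body}
```

In a full pipeline, the `html` string would come from the page source returned by the browser automation step, and the extracted lead and body text would be the input to the topic model.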

Identifier
DOI https://doi.org/10.18710/473JEF
Metadata Access https://dataverse.no/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.18710/473JEF
Provenance
Creator Steen Steensen
Publisher DataverseNO
Contributor Steen Steensen; OsloMet – Oslo Metropolitan University; Høyskolen Kristiania; Guneshwar Singh Manhas
Publication Year 2026
Funding Reference Medietilsynet
Rights CC0 1.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/publicdomain/zero/1.0
OpenAccess true
Contact Steen Steensen (OsloMet – Oslo Metropolitan University)
Representation
Resource Type Tabular metadata of Norwegian news articles, including topic distribution after LDA topic modeling; Dataset
Format text/plain; text/comma-separated-values; application/pdf
Size 2217; 164546890; 1512; 3859; 102375
Version 1.0
Discipline Agriculture, Forestry, Horticulture, Aquaculture; Agriculture, Forestry, Horticulture, Aquaculture and Veterinary Medicine; Life Sciences; Social Sciences; Social and Behavioural Sciences; Soil Sciences
Spatial Coverage Norway