- SUBTLEX-NL: frequencies based on Dutch subtitles
- SUBTLEX-US: frequencies based on American English subtitles
- SUBTLEX-CH: frequencies based on Chinese subtitles
- SUBTLEX-ESP: frequencies based on Spanish subtitles
- SUBTLEX-DE: frequencies based on German subtitles
- SUBTLEX-GR: frequencies based on Greek subtitles (Dimitropoulou et al., 2010)
- SUBTLEX-UK: frequencies based on British English subtitles
- SUBTLEX-PL: frequencies based on Polish subtitles
- SUBTLEX-PT: frequencies based on Portuguese subtitles
- SUBTLEX-PT-BR: frequencies based on Brazilian Portuguese subtitles
- Facebook-EN: word frequencies based on public Facebook posts between November 2014 and January 2015.
- Twitter-EN: word frequencies based on the Rovereto Twitter Corpus.
Word association norms
Concept and category norms
- Leuven concept data: norms for over 400 concrete nouns including typicality, similarity within particular domains, category naming data, exemplar generation data, frequency, AoA, etc.
- McRae Lab feature norms: feature norm Excel files from McRae, Cree, Seidenberg, & McNorgan (2005). Data for 541 English nouns.
- Centre for Speech, Language and the Brain (CSLB) Concept Property Norms: Semantic properties and associated production frequency data for 638 concrete concepts, with data for each concept collected from 30 participants.
- SNAUT: Interface and access to semantic vectors for Dutch and English based on word2vec
- SNAUT-Italien: Semantic spaces for Italian.
- Latent Semantic Analysis: Interface to obtain semantic similarity for words and documents
- GloVe vectors: Pretrained word vectors in English. GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.
- ESPAL: phonology, part-of-speech, subtitle frequencies, etc. in Castilian and Latin American Spanish
- Erin Buchanan’s word norms: Concept features, LSA and BEAGLE similarity estimates
- Humor: humor norms.
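
Pretrained word vectors such as the GloVe files listed above are commonly distributed as plain text, one word per line followed by its space-separated vector components. The sketch below shows one way such a file could be parsed and how cosine similarity between two words could be computed. The three-dimensional vectors and the names `parse_vectors` and `cosine` are made up for illustration; real vector files have many more dimensions and should be read from disk rather than from an in-memory list.

```python
import math


def parse_vectors(lines):
    """Parse 'word v1 v2 ... vn' lines into a dict mapping word -> list of floats."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors invented for this example; these are NOT real GloVe values.
toy_lines = [
    "cat 1.0 0.0 0.0",
    "dog 0.8 0.6 0.0",
    "car 0.0 0.0 1.0",
]

vectors = parse_vectors(toy_lines)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_car = cosine(vectors["cat"], vectors["car"])
print(sim_cat_dog, sim_cat_car)
```

With a real file, the same `parse_vectors` function could be given an open file handle instead of `toy_lines`, since it only iterates over lines.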