Towards Augmenting Lexical Resources for Slang and African American English

Alyssa Hwang, William R. Frey, Kathleen McKeown

Abstract

Researchers in natural language processing have developed large, robust resources for understanding formal Standard American English (SAE), but we lack similar resources for variations of English, such as slang and African American English (AAE). In this work, we use word embeddings and clustering algorithms to group semantically similar words in three datasets, two of which contain a high incidence of slang and AAE. Because high-quality clusters contain related words, the meaning of an unfamiliar word can also be inferred from the meanings of the words clustered with it. After clustering, we compute precision and recall scores using WordNet and ConceptNet as gold standards and show that these scores are poor indicators of cluster quality when the given resources do not fully represent slang and AAE. Amazon Mechanical Turk and expert evaluations show that clusters with low precision can still be considered high quality, and we propose the new Cluster Split Score as a metric for machine-generated clusters. These contributions emphasize the gap in natural language processing research for variations of English and motivate further work to close it.
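The abstract scores clusters by precision and recall against WordNet and ConceptNet but does not spell out the formulation. As a minimal sketch, assuming a common pairwise formulation (every pair of words placed in the same cluster is a predicted link, and the gold resource supplies a set of related word pairs), the computation could look like this. The function name `pairwise_precision_recall` and the toy clusters and gold pairs are hypothetical illustrations, not the paper's code or data, and this is not the Cluster Split Score.

```python
from itertools import combinations


def pairwise_precision_recall(clusters, gold_pairs):
    """Hypothetical pairwise precision/recall of word clusters against a gold standard.

    clusters: list of lists of words; each inner list is one cluster.
    gold_pairs: set of frozensets {w1, w2} that the gold resource treats as related
                (an assumed stand-in for relations drawn from WordNet or ConceptNet).
    """
    # Every unordered pair of distinct words in the same cluster is a predicted link.
    predicted = {
        frozenset(pair)
        for cluster in clusters
        for pair in combinations(sorted(set(cluster)), 2)
    }
    # Restrict gold links to words that appear in some cluster,
    # so recall is measured only over the clustered vocabulary.
    vocab = {word for cluster in clusters for word in cluster}
    gold = {pair for pair in gold_pairs if pair <= vocab}

    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


# Toy data (hypothetical): "lit" is used as slang for "excellent",
# but the toy gold set has no pair linking it to "good" or "great".
clusters = [["good", "great", "lit"], ["bad", "awful"]]
gold = {frozenset({"good", "great"}), frozenset({"bad", "awful"})}

precision, recall = pairwise_precision_recall(clusters, gold)
print(precision, recall)  # 0.5 1.0
```

Under these assumptions, the first cluster yields three predicted pairs but only one is in the toy gold set, so precision drops to 0.5 even though a human reader might judge the cluster coherent. This mirrors the abstract's point that precision against a resource that under-represents slang and AAE can understate cluster quality.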

Anthology ID: 2020.vardial-1.15
Volume: Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects
Month: December
Year: 2020
Address: Barcelona, Spain (Online)
Editors: Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Yves Scherrer
Venue: VarDial
Publisher: International Committee on Computational Linguistics (ICCL)
Pages: 160–172
URL: https://aclanthology.org/2020.vardial-1.15
Cite (ACL): Alyssa Hwang, William R. Frey, and Kathleen McKeown. 2020. Towards Augmenting Lexical Resources for Slang and African American English. In Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, pages 160–172, Barcelona, Spain (Online). International Committee on Computational Linguistics (ICCL).
Cite (Informal): Towards Augmenting Lexical Resources for Slang and African American English (Hwang et al., VarDial 2020)
PDF: https://aclanthology.org/2020.vardial-1.15.pdf
Data: ConceptNet

BibTeX:

@inproceedings{hwang-etal-2020-towards,
    title = "Towards Augmenting Lexical Resources for Slang and {A}frican {A}merican {E}nglish",
    author = "Hwang, Alyssa and Frey, William R. and McKeown, Kathleen",
    editor = {Zampieri, Marcos and Nakov, Preslav and Ljube{\v{s}}i{\'c}, Nikola and Tiedemann, J{\"o}rg and Scherrer, Yves},
    booktitle = "Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects",
    month = dec,
    year = "2020",
    address = "Barcelona, Spain (Online)",
    publisher = "International Committee on Computational Linguistics (ICCL)",
    url = "https://aclanthology.org/2020.vardial-1.15",
    pages = "160--172",
    abstract = "Researchers in natural language processing have developed large, robust resources for understanding formal Standard American English (SAE), but we lack similar resources for variations of English, such as slang and African American English (AAE). In this work, we use word embeddings and clustering algorithms to group semantically similar words in three datasets, two of which contain high incidence of slang and AAE. Since high-quality clusters would contain related words, we could also infer the meaning of an unfamiliar word based on the meanings of words clustered with it. After clustering, we compute precision and recall scores using WordNet and ConceptNet as gold standards and show that these scores are unimportant when the given resources do not fully represent slang and AAE. Amazon Mechanical Turk and expert evaluations show that clusters with low precision can still be considered high quality, and we propose the new Cluster Split Score as a metric for machine-generated clusters. These contributions emphasize the gap in natural language processing research for variations of English and motivate further work to close it.",
}
