Peer-Reviewed

Racial Filtering Classification Model Through Data Analysis of Racial Contents in Twitter

Received: 23 September 2021     Accepted: 20 October 2021     Published: 10 November 2021
Abstract

Stop Asian Hate, or Stop Asian American Pacific Islander (AAPI) Hate, refers to the national movement against racially motivated attacks on Asians. The movement arose in line with the Black Lives Matter (BLM) movement, aiming to end ongoing hate and targeted crimes against Asians and to educate people about such threats. Hate crimes targeting Asians have occurred steadily across the U.S., but they began increasing in number with the onset of COVID-19. For the Stop Asian Hate movement, the situation was exacerbated as people blamed certain Asian countries as the source of COVID-19. In 2021, Asian Americans reported the single biggest increase in serious incidents of online hate and harassment, with racist and xenophobic slurs blaming people of Asian descent for the spread of COVID-19. To assess the impacts and measures of these movements, research was conducted to examine the racial slurs used towards Asians on social media, specifically Twitter. Python was used to collect the data and to analyze the proportion of tweets containing racial slurs and anti-Asian hate. The data set was prepared through data labeling, which classified each tweet into one of two types: type 1, containing racial content or information against Asians, and type 0, containing non-racial content. Data were collected with Twint, a Python scraping tool for Twitter, gathering over 2,000 recent tweets for keywords relevant to the movement. The tweets were then preprocessed in Python by lowercasing, lemmatizing, and tokenizing. The processed data were visualized with graphs and word clouds showing some of the terms most commonly used to target Asians on social media. Lastly, a binary classification model was designed to filter tweets with racial content. We compared the accuracy of classification models built with three different algorithms: logistic regression, random forest, and support vector machine (SVM). Such a model could help safeguard users from exposure to the racist terms that pervade the internet.

Published in International Journal of Data Science and Analysis (Volume 7, Issue 6)
DOI 10.11648/j.ijdsa.20210706.11
Page(s) 132-138
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2021. Published by Science Publishing Group

Keywords

Data Science, Machine Learning, EDA, Stop Asian Hate, COVID-19

Cite This Article
  • APA Style

    Jung-hun Baeck, Teresa Hyoju Chang, Jaden Chunho Chyu, Bryan Chunwoo Chyu, Chaehyun Lim. (2021). Racial Filtering Classification Model Through Data Analysis of Racial Contents in Twitter. International Journal of Data Science and Analysis, 7(6), 132-138. https://doi.org/10.11648/j.ijdsa.20210706.11

    ACS Style

    Jung-hun Baeck; Teresa Hyoju Chang; Jaden Chunho Chyu; Bryan Chunwoo Chyu; Chaehyun Lim. Racial Filtering Classification Model Through Data Analysis of Racial Contents in Twitter. Int. J. Data Sci. Anal. 2021, 7(6), 132-138. doi: 10.11648/j.ijdsa.20210706.11

    AMA Style

    Jung-hun Baeck, Teresa Hyoju Chang, Jaden Chunho Chyu, Bryan Chunwoo Chyu, Chaehyun Lim. Racial Filtering Classification Model Through Data Analysis of Racial Contents in Twitter. Int J Data Sci Anal. 2021;7(6):132-138. doi: 10.11648/j.ijdsa.20210706.11

    @article{10.11648/j.ijdsa.20210706.11,
      author = {Jung-hun Baeck and Teresa Hyoju Chang and Jaden Chunho Chyu and Bryan Chunwoo Chyu and Chaehyun Lim},
      title = {Racial Filtering Classification Model Through Data Analysis of Racial Contents in Twitter},
      journal = {International Journal of Data Science and Analysis},
      volume = {7},
      number = {6},
      pages = {132-138},
      doi = {10.11648/j.ijdsa.20210706.11},
      url = {https://doi.org/10.11648/j.ijdsa.20210706.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdsa.20210706.11},
      year = {2021}
    }

    TY  - JOUR
    T1  - Racial Filtering Classification Model Through Data Analysis of Racial Contents in Twitter
    AU  - Jung-hun Baeck
    AU  - Teresa Hyoju Chang
    AU  - Jaden Chunho Chyu
    AU  - Bryan Chunwoo Chyu
    AU  - Chaehyun Lim
    Y1  - 2021/11/10
    PY  - 2021
    N1  - https://doi.org/10.11648/j.ijdsa.20210706.11
    DO  - 10.11648/j.ijdsa.20210706.11
    T2  - International Journal of Data Science and Analysis
    JF  - International Journal of Data Science and Analysis
    JO  - International Journal of Data Science and Analysis
    SP  - 132
    EP  - 138
    PB  - Science Publishing Group
    SN  - 2575-1891
    UR  - https://doi.org/10.11648/j.ijdsa.20210706.11
    VL  - 7
    IS  - 6
    ER  - 

Author Information
  • St. Mark’s School, Southborough, United States

  • Seoul International School, Seoul, South Korea

  • Phillips Academy Andover, Andover, United States

  • Phillips Academy Andover, Andover, United States

  • McLean High School, McLean, United States
