Artificial intelligence and the future of our sociolinguistic work

Journal of Sociolinguistics. Published: 2024-11-10. DOI: 10.1111/josl.12678
Helen Kelly-Holmes
Volume 28, Issue 5, pp. 3-10

Abstract

Artificial intelligence (AI) is likely to have a substantial impact on the field of sociolinguistics. AI has the potential to change the landscape of sociolinguistic research in a number of ways. For example, AI tools can assist sociolinguists in analyzing large volumes of social media data to study language variation, linguistic trends, and changes in language use over time. Sentiment analysis and topic modeling algorithms can reveal insights into societal attitudes and language dynamics, helping us to study language ideologies.

AI-powered speech recognition technologies can aid in the automatic identification and analysis of dialects and accents. AI can streamline the process of conducting large-scale surveys and collecting sociolinguistic data. Chatbots or automated interview tools can be employed to gather responses from diverse populations, facilitating more comprehensive studies of language variation.

AI can assist in the analysis of linguistic markers related to identity and representation in texts and speech. This includes studying how language is used to construct and express social identities, such as gender, ethnicity, or socioeconomic status. AI algorithms can help sociolinguists analyze social networks and communities based on linguistic interactions. This can provide insights into how language is used within specific social groups and how linguistic patterns contribute to the formation of social networks.

AI tools can support ethnographic research by automating certain aspects of data analysis. For example, natural language processing algorithms can assist in categorizing and extracting themes from qualitative data, making the analysis process more efficient. AI can contribute to the analysis of language policies and their impact on society. This includes assessing the effects of language planning initiatives on linguistic diversity, language maintenance, and language shift within communities.

Sociolinguists can use AI to conduct digital ethnography by examining online communities, forums, and virtual spaces. This allows researchers to explore how language is used in digital environments, contributing to a deeper understanding of online sociolinguistics. Collaboration between sociolinguists and computational linguists can lead to the development of AI tools specifically tailored for sociolinguistic research, combining linguistic expertise with computational methods.

Sociolinguists will need to be mindful of biases in AI models and algorithms. Ensuring fairness and addressing biases is crucial, especially when studying sociolinguistic phenomena that are sensitive to issues such as race, gender, or socioeconomic status. While AI offers exciting possibilities for advancing sociolinguistic research, ethical considerations and the importance of human interpretation and context cannot be understated. Sociolinguists will continue to play a critical role in guiding and interpreting AI-driven analyses to ensure that the research remains ethically sound and culturally sensitive.

These six introductory paragraphs, including the reflective and critical final one, were, as you may or may not have guessed by this stage, produced not by my own research on the topic but by ChatGPT, in response to my question “How will Artificial Intelligence change the way that sociolinguists carry out research?” The writing is substantially more correct than my own rambling, and I have deliberately not indented it or otherwise demarcated it from my “own” writing, which starts in this paragraph. Not surprisingly, AI predicts that AI will bring both advantages and challenges to how we do our work. The advantages lie largely in data gathering and analysis: the ability to collect and handle vastly greater quantities of data, and to correlate those data more widely. The accompanying downsides are the biases inherent in algorithmic models (e.g., “automatic” dialect and accent identification) and the ethical challenges of human–machine interaction and partnership in sociolinguistic research and analysis. Reassuringly, AI also seems to predict that the intervention, indeed the guidance, of the human sociolinguist will still be needed in this bright new future.

Less than a decade ago, but a lifetime in terms of AI and its development, the late Jan Blommaert described a move away from the known world for sociolinguistics, in the form of a “… seismic shift from a scholarly universe almost entirely dominated by theoretical and methodological preferences for offline, spoken discourse in fixed and clearly definable time-space, socio-cultural and interpersonal contexts and identities,” and argued that, as a result, “much of what we assumed to be natural, primordial and commonsense about language-in-society needs to be revised, rethought, developed” (2017, p. 7).

However, what we are witnessing is arguably more than a shift, even a seismic one, from offline to online: we are now entering, and to a certain extent are already part of, the human–machine era (Sayers et al., 2021), in which the futuristic and fantastical notions of science fiction are coming true due to advances in artificial intelligence, machine learning, wearable technology, and real-time data analysis (for a realistic and critical assessment of these fantasies for the social life of language, see Seargeant, 2023). Maly (2022), for example, has argued that it is essential that we change how we study digital discourse. This dialogue, however, looks beyond those sociolinguists who are primarily concerned with digital discourse and seeks to examine how all sociolinguists will be affected by these changes, given the era we are living through and the one still to come.

Taking as prompts the contribution of the first respondent, my ChatGPT interactant above, as well as some issues I have come across while carrying out my own recent work (Kelly-Holmes, 2019, 2022, 2023, 2024), this dialogue aims to open a discussion with peers about if and how AI will change how we work, although the “if” seems almost redundant. Some respondents have been working for many years with advancing digital technologies and are recognized experts on the topic. In addition to expert views, however, I am also interested in how ordinary sociolinguists, that is, those not specifically focused on digital or AI communication, and those working in closely related disciplines view this change, and how and to what extent it will affect their work, or is already affecting it, depending on their context.

The impetus for the dialogue comes from my own experiences over the last number of years of working with digital sociolinguistic data and living in an increasingly technologized reality. To start off the dialogue and invite responses, I would like to share three issues in particular that are increasingly hard to ignore in how I approach my work and think about the data I encounter: first, the increasing invisibilization of technology; second, the role of our own algorithmic identity as individuals and as researchers; and third, where and how we will find multilingual and sociolinguistic data in the future.

While the discipline of sociolinguistics has already undergone a transition from an exclusive concern with spoken language to incorporating written language, mediated language, and, more recently, digitally mediated language, these have to date been treated as “different” domains. Qualifiers such as “mediated” and “digital,” for example, highlight the use of some kind of technology that marks these types of language as different to “normal” language. For sociolinguists, given the origins of the discipline, “normal” language is still spoken, physical, person-to-person communication. Current and emerging technologies, however, threaten to upend these distinctions and that norm by merging the physical and the mediated or technologized in ways that force us to question our disciplinary tenets. One example is how the normal focus of sociolinguistics, face-to-face spoken communication, now seems to require a qualifier of its own, such as “co-present” language (Androutsopoulos & Lexander, 2021).

Technological mediation of language, while not an original concern of sociolinguistics, is of course not new. As we know, language needs some kind of physical substance in order to be communicated, whether that is a physiological “technology” internal to the body that makes language audible or visible, or an external technology such as writing or printing (Kristiansen, 2014). Sometimes the technology is obviously and directly connected to the sociolinguistic action, for example, a pen and paper on a desk, a printing press, a PC, a laptop, or a phone. At other times the connection is indirect or less obvious, such as boats bringing traders and colonizers and their languages, or planes bringing tourists and migrants and theirs. In these cases, the transportation of language is an indirect byproduct, not the purpose or function of the technology. Previously, what might be termed indirect technologies had to bring speakers physically in order to bring language. While this of course still happens, it is no longer necessary, since the development of more directly mediating technologies has enabled language to be separated from the speaker's body and mediated beyond their physical space. We are now facing, and increasingly experiencing, the human–machine era, which brings about the merging of speaker, language, technology, and mediation.

Consequently, mediation becomes increasingly invisible: attached, albeit in detachable form, to the physical body (the co-present mobile phone, wearable devices, etc.), and possibly ultimately implanted in future and emerging technologies. A simple example of the disappearance of mediation is the real-time translation available on a number of latest-generation smartphones. It could be argued that this drive among tech innovators for invisible mediation perhaps highlights a reverence for, and idealization of, pure, unmediated production of language by unencumbered human bodies, and a desire to replicate as closely as possible “immediate language” (Kristiansen, 2014), or the “normal” language that has traditionally been the concern of sociolinguists, as highlighted above by Blommaert. On the other hand, the less visible a technology is, the more questions arise about treating the resulting language as authentic and “immediate.”

So, in the context of invisible mediation and technologies, we face a number of fundamental challenges in defining what (“real,” “authentic”) language is. What is social language? And what/where is the unit of society that we are interested in? How do we now deal with technologically (co-)produced language, with a conversation or interaction that is both human and technological? What is a ChatGPT transcript in terms of sociolinguistics? Where do we locate the socio? Who is the speaker/author of the response to my question above? Is it my ChatGPT interactant, is it me, or is it both of us? The web is a social machine (Hendler & Berners-Lee, 2010). It learns by being fed by humans, thriving on “social” interaction with them, and this is how it becomes intelligent. In one sense, AI is a massive human language corpus: it is the product of masses of people interacting with each other and with technology over many years, in many different but still limited contexts. So, it is uniquely human, fed by human language. But that human corpus is in turn manipulated and reassembled by technology, by algorithms, and by large language models, which tend to be stripped of sociolinguistic content.

How, then, can we contextualize these human–machine interactions and the language that they produce? Where and how should we position these linguistic data in relation to our familiar (albeit, of course, increasingly contested and deconstructed) anchors of place, gender, age, education, class, ethnicity, race, ableness …? The language we encounter from our technological conversation partner is the product of multiple inputs from so many humans in so many places with so many potential essentialized and nonessentialized identity categorizations, while also excluding and erasing large parts of the world and its speakers due to ongoing inequalities in digital access and content. Another layer is added when these inputs are then filtered and tailored to our own digital footprint, our algorithmic identity, and this is the next issue I would like to raise.

It is probably impossible to find a sociolinguist who works and lives completely offline. The reality is that our personal and professional sociolinguistic selves have now merged. While it can be argued that we have always impacted our surroundings, and in particular the contexts that we study, simply by our presence and methods, this type of impact and its associated biases are fundamentally different to the kind of merging that we now have to acknowledge.

One reason for this is the phenomenon of personalization, which involves using technology and either real-time or archived data about individuals to “tailor electronic commerce interactions between a business and each individual customer” (Personalization Consortium, 2005, in Vesanen, 2007, p. 410). Sometimes this personalization results from explicit choices that we make (e.g., choosing locations and languages, saving preferences, and setting other primary categories), and sometimes it is the result of implicit choices (e.g., accepting “cookies,” our search histories, our online activities). The online context is an economic one, and everything that happens online is for the purpose of generating revenue. In many ways, this demonstrates and makes explicit how economic processes, at micro-, meso-, and macro-levels, have always impacted language and the sociolinguistic contexts we live in and study, in ways that go far beyond the observer's paradox (Labov, 1972). Now, however, it becomes harder to ignore this impact, even though, ironically, these processes are actually less visible than ever. This is because we often lack the physical cues of market activity that we recognize, because private and marketized spaces merge in our online interfaces, and because activities that seem untouched by the market now take place in a market-supported context, for example, private interactions on WhatsApp.

Since the advent of consumer marketing in the second half of the twentieth century, markets have been divided into segments along crude sociodemographic and lingua-cultural criteria, using many of the same demographics that are also used in sociolinguistics. With advancing technology, segmentation can happen not just for groups but also at the individual level, with the exchange, product, or service being “altered to fit that customer's stated needs as well as needs perceived by the business based on the available customer information” (Personalization Consortium, 2005, in Vesanen, 2007, p. 410). Much of this information is, of course, sociolinguistic information, including essentialized categories and language preferences as well as consumer preferences. “Mass individualization” describes this shift from targeting mass markets to a “market of one” (Gilmore & Pine, 2008). In a sense, this is a type of superdiverse (Androutsopoulos & Juffermans, 2014) or hyperlingual (Kelly-Holmes, 2019; Pauwels, 2010) market or context.

This market-driven personalization also means that we leave an algorithmic footprint that impacts all of our activities online, both personal and professional. It can therefore be difficult to break out of personalized filters (Pariser, 2011), which assign a particular lingua-cultural profile and expected behaviors to us. This has consequences for our professional digital selves, since our “algorithmic identity” (Blommaert, 2017) will inevitably shape what we see and experience when we try to study digital spaces. More than this, we need to account for algorithmic agency in our studies: “algorithms are (co-) producers of discourses and at the same time, they are also discourse” (Maly, 2022, p. 3). Such a realization needs “to inform how we approach our object of study” (Maly, 2022). When I examined a language ideological thread on Twitter/X, trying to account for both my own algorithmic footprint and algorithmic agency in my object of study, four concepts emerged that helped to explain how the thread in question came to my attention and gained prominence. These were: influence (in terms of my own networks, number of followers, and rate and range of activity, and those of the thread participants); expertise (in terms of how I am described and positioned online); validation (in terms of my ability to give credibility to different users and their comments); and alignment (in terms of how this piece of language ideological data aligns with my online profile and activity) (Kelly-Holmes, 2024). I needed to do this in order to fully understand the data I was dealing with and to account for the role of my algorithmic footprint in shaping those data.

A further dilemma, a profound one for humanity but also, more parochially, for sociolinguists as we try to decolonize our methods and theories, is the reality that this type of personalization will take place within existing digital and content divides and may serve to exacerbate them further. Put crudely, some speakers’ communication needs will be “worth” (in economic terms) meeting and personalizing, while those of the majority most likely will not.

Personalization has also seen a move from mass public to more privatized online spaces for individuals or small groups (see Androutsopoulos & Lexander, 2021). With this move, a related and final challenge that I have increasingly experienced in my work in the last few years is how and where to find the texts and interactions to study in this new technologized reality.

Securitization (Buzan et al., 1998) has become a key marketing feature for differentiating public from private spaces, and an inevitable outcome of this is that fewer texts and interactions will be available to researchers. “Securitized” spaces allow users to construct their own speech communities that are not publicly visible or accessible. In sociolinguistic work on multilingualism and digital communication to date, the “scope is often limited to public forums and social media platforms” (Androutsopoulos & Lexander, 2021, p. 724). This is no surprise, since it is much harder to access and study “private or secure groups” and, in particular, dyadic chat. One recent study showed that a minority language (in this example, Frisian) is used more frequently on private social media platforms (in this case, WhatsApp) than on public ones (Jongbloed-Faber, 2021). One possible outcome could be the erasure of “authentic” minority languages from public digital spaces, making it more difficult to gather data on real usage, since researchers would have to rely on public areas and/or negotiate access to these private spaces. The spread of personalization together with securitization means that finding and collaborating with increasingly private, secure, invisible, and individualized speech communities will become even more challenging.

In such a context, extant approaches to studying online multilingualism, for example, virtual linguistic landscape analysis (Ivkovic & Lotherington, 2009) and virtual linguistic ethnography (Auleear Owodally & Peeroo, 2021; Berezkina, 2018) as well as the building of multilingual social media corpora (Coats, 2019) become very challenging. With lingua-cultural (essentially sociolinguistic) personalization filtering out multilingual options and content based on our algorithmic footprint, together with securitization and a premium being placed on privacy, such spaces are no longer fully accessible and/or are not the places where we may find the kind of authentic practices we are looking for.

Hashtag analysis, for instance, has been used to document content in particular languages on public forums like Twitter/X (e.g., McMonagle et al., 2019). But this relies on users adding a hashtag to identify the language of a tweet, which in turn relies on a kind of hyperreflexivity and/or activist stance. Because such a stance is needed, we do not find casual, nonreflexive content in the language, nor mixed practices that may not meet the criteria for classification as that language.

It is useful to reflect that while the web in its earliest form was culturally, practically, and even ideologically monolingual, it became more and more multilingual as it grew and developed. With the enabling of user-generated text, parallel and partial multilingualism gave way to a hyperlingual moment in public social and digital media spaces (Kelly-Holmes, 2019). As someone who studies public encounters with multilingualism in technologically mediated spaces, I find that multilingualism is now more difficult to encounter due to personalization and securitization/privatization. The interactional space has narrowed and become more controlled.

Notwithstanding all of the imbalances in distribution and access, and the current and future challenges, it is also important to recognize that technology can give agency to people who, due to all sorts of structural, physical, and geographic inequalities, may struggle to be heard when they speak. So, technology can allow more speaking to happen, and in this way it has the potential to enlarge the range of speakers and to expand our understanding of who is or can be a speaker and where a language can be “spoken.”

The author declares no conflicts of interest.
