Speaking with AI: Aligning Language Models with Human Values through Pragmatics, Context, and Cooperative Dialogue

New research drawing upon pragmatics and philosophy proposes ways to align conversational agents with human values.

Language is a uniquely human trait and the principal vehicle for sharing information, including thoughts, intentions, and feelings. In recent years, breakthroughs in AI have yielded conversational agents capable of nuanced, naturalistic interaction with people. These agents run on large language models—computational systems trained on vast corpora of text to predict and generate language through sophisticated statistical techniques. While models such as InstructGPT, Gopher, and LaMDA have achieved notable performance across tasks like translation, question answering, and reading comprehension, they have also demonstrated a spectrum of potential risks and failure modes. Among these are the production of toxic or discriminatory language and the dissemination of false or misleading information. These shortcomings constrain the practical deployment of conversational agents in applied contexts and highlight how such systems often fall short of ideal communicative standards.

To date, much of the alignment work around conversational agents has concentrated on anticipating and mitigating harms. Our new paper, In conversation with AI: aligning language models with human values, shifts the focus. It asks what successful communication between a human and an artificial conversational agent might look like, and what values should govern these interactions across different conversational domains. It draws on insights from pragmatics to reframe the alignment problem as a question of how to support cooperative, context-sensitive dialogue rather than merely policing output.

Context and Background
Language models operate at the intersection of computation and communication. They are trained on enormous bodies of written text to learn patterns, predict next words, and generate coherent continuations that often resemble human discourse. The promise of these systems lies in their versatility: they can translate languages, answer questions, summarize information, assist with reasoning, and participate in interactive tasks that require an understanding of user intent and conversational context. Yet this versatility comes with corresponding risks. When models produce toxic, biased, or factually suspect material, the consequences can be harmful—socially, politically, or even personally. The tension between capability and safety manifests in many real-world applications, from customer support to educational tools to automated moderation.

Conventional alignment work has largely targeted the reduction of harm: limiting the probability and impact of unsafe content, preventing the spread of misinformation, and ensuring that agents do not misrepresent themselves or evidence. While these efforts are essential, they address only part of a broader aspiration: aligning language models with human values in ways that support meaningful, constructive dialogue across diverse contexts. The new work argues that alignment should be treated not only as a problem of constraint and risk management but as a problem of design for communication—how to shape an agent’s behavior so that it participates in conversation in ways that reflect domain-appropriate norms and values. The core idea is to view conversation as a cooperative enterprise in which both the human and the agent contribute to a shared understanding, guided by a framework that respects the purposes, norms, and values relevant to a given context.

Pragmatics in AI Alignment
The new approach draws its theoretical backbone from pragmatics, a field at the crossroads of linguistics and philosophy. Pragmatics emphasizes that the meaning of utterances is inseparable from context—the situation in which communication occurs, the goals of the speakers, and the shared norms that guide how conversations unfold. From this perspective, successful dialogue depends on more than syntactic correctness or surface-level coherence. It requires awareness of purpose, audience, and the implicit rules that govern interaction over time.

Within pragmatics, the cooperative principle offers a foundational lens. The principle, as articulated by Paul Grice and later refined, suggests that participants in a conversation should work together to achieve effective communication. This cooperation rests on four maxims: be appropriately informative (quantity), tell the truth (quality), be relevant (relation), and avoid obscurity and ambiguity (manner). These maxims guide expectations about what a conversational partner should contribute to a dialogue under typical conditions. However, the paper argues that a straightforward application of Gricean maxims is insufficient for evaluating or guiding AI behavior across all domains. The goals and values embedded in different conversational contexts differ, and as a result, the maxims require refinement to accommodate variation in purpose, audience, and normative commitments. The authors propose a more nuanced, domain-aware interpretation of these principles that can inform how language models are designed, trained, and evaluated when interacting with humans across diverse settings.

A crucial step in translating pragmatics into AI practice is to acknowledge that communication is not a one-size-fits-all activity. In practice, the norms that govern informative discourse in one domain may be less restrictive—or even inappropriate—in another. The approach therefore calls for a flexible, context-sensitive framework in which agents align their communicative behavior with the values appropriate to the domain of interaction. The result is a more dynamic conception of alignment that accounts for differences in goals, risks, and ideals across scientific, civic, and creative domains. By integrating pragmatics with a careful analysis of domain-specific aims, the framework aims to produce agents that can adapt their communicative style and standards of truthfulness to the task at hand, without compromising safety or inclusivity.

Discursive Ideals Across Domains
To illustrate how domain context shapes appropriate communicative norms, the paper presents a contrastive view of how a conversational agent should behave in different settings. Scientific investigation and communication, for example, prioritize understanding and predicting empirical phenomena. In this context, a language model designed to assist scientific inquiry would ideally make statements only when the underlying claims are supported by robust empirical evidence, with explicit qualification when necessary. When the model states a fact, for instance a measurement of a celestial object, it should indicate its confidence and the corresponding uncertainty, ensuring that users interpret the claim in light of the evidence.
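
To make this concrete, here is a minimal sketch of how a scientific assistant might attach evidence and uncertainty to a factual claim before presenting it. The field names, wording, and example values are assumptions introduced for illustration; the paper does not prescribe a particular data structure.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ScientificClaim:
    """A factual statement paired with the evidence that supports it."""
    statement: str                    # the claim itself
    value: float                      # the reported measurement
    uncertainty: float                # e.g. one standard deviation
    unit: str
    source: str                       # where the evidence comes from
    confidence: Optional[str] = None  # qualitative confidence label

    def render(self) -> str:
        """Phrase the claim so the uncertainty stays visible to the reader."""
        qualifier = f" ({self.confidence} confidence)" if self.confidence else ""
        return (f"{self.statement}: {self.value} ± {self.uncertainty} {self.unit}"
                f"{qualifier}, based on {self.source}.")

# Hypothetical example values, used only to illustrate the format.
claim = ScientificClaim(
    statement="Estimated distance to the object",
    value=8.6, uncertainty=0.4, unit="light-years",
    source="published parallax measurements",
    confidence="high",
)
print(claim.render())
```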

By contrast, a conversational agent serving as a moderator in public political discourse operates under different virtues. The goal in this arena is to manage differences, facilitate constructive cooperation within a community, and support democratic deliberation. In this domain, the agent should foreground values such as toleration, civility, and respect. The emphasis shifts away from rigid empirical verification toward the maintenance of a respectful dialogue that enables diverse voices to contribute meaningfully. This does not imply abandoning truthfulness; rather, it recognizes that the social function of discourse in public deliberation often requires balancing rigor with inclusivity, accessibility, and fairness, so as to prevent escalation and ensure participation from a broad range of stakeholders.

The authors argue that these contrasting values help explain why the generation of toxic or prejudicial language by language models is particularly problematic in some contexts. When an utterance fails to convey equal respect for all participants, it undermines the social fabric of the conversation and, in turn, the legitimacy of the discourse itself. The same principle applies to other domains: the scientific arena may place a premium on comprehensive data presentation and reproducibility, while public deliberation may require emphasis on viewpoint pluralism, mutual recognition, and procedural civility. Creative storytelling, too, operates under a distinct set of assumptions. In that domain, novelty and originality take center stage, and the exchange aims at imaginative expression and engagement. However, even here, safeguards remain essential to prevent the creation or dissemination of content that could cause harm or mislead audiences under the banner of “creative use.”

These domain-specific ideals illustrate why a single, universal standard for language-model alignment is insufficient. Instead, the framework calls for a repertoire of evaluative standards—truthfulness, relevance, and clarity—aligned with the purposes and values of the specific dialogic setting. In scientific work, for example, the standard would emphasize verifiable claims backed by data and transparent uncertainty communication. In civic discourse, it would foreground respect, inclusivity, and the facilitation of productive differences. In creative storytelling, it would support originality while maintaining responsibility—ensuring that creativity does not become a vehicle for deception or harm. The paper argues that the agent’s behavior must be sensitive to these domain-specific norms to avoid misalignment or unintended consequences.

Paths Ahead: Practical Implications for Aligned Conversational AI
The work outlines a series of practical implications for the development of aligned conversational AI agents. First, there is no universal, one-size-fits-all account of alignment. Agents will need to embody different traits and calibrate their behavior according to the contexts in which they are deployed. The appropriate mode of operation and the evaluative standards for truthfulness—and for other normative commitments—will vary based on the context and purpose of each conversational exchange. This implies that designers should embed contextual awareness directly into agents, enabling them to select and adapt their communicative strategies to fit domain-specific norms.
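
One way to read this implication is as a mapping from conversational domain to the norms an agent should foreground. The sketch below illustrates that idea in Python; the domain labels, norm lists, and keyword-based detection heuristic are assumptions made for this example, not a mechanism described in the paper.

```python
# Minimal sketch: pick domain-appropriate discursive norms for a conversation.
# The domains, norms, and keyword heuristics below are illustrative assumptions.

DOMAIN_NORMS = {
    "scientific": ["empirical support", "explicit uncertainty", "reproducibility"],
    "civic":      ["toleration", "civility", "respect for dissent"],
    "creative":   ["originality", "imaginative engagement", "harm avoidance"],
}

DOMAIN_KEYWORDS = {
    "scientific": {"measurement", "experiment", "hypothesis", "evidence"},
    "civic":      {"policy", "vote", "community", "debate"},
    "creative":   {"story", "poem", "character", "plot"},
}

def detect_domain(user_message: str) -> str:
    """Crude keyword-overlap heuristic; a real system would use a classifier."""
    words = set(user_message.lower().split())
    scores = {domain: len(words & keywords) for domain, keywords in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "civic"   # fall back to civic norms

def norms_for(user_message: str) -> list[str]:
    return DOMAIN_NORMS[detect_domain(user_message)]

print(norms_for("Can you help me outline a short story about a lighthouse?"))
# -> ['originality', 'imaginative engagement', 'harm avoidance']
```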

Second, the framework suggests a potential for agents to cultivate more robust and respectful conversations over time through context construction and elucidation. Even when a human participant is not fully conscious of the governing values in a given conversational practice, the agent can prefigure these values in dialogue. By doing so, the agent can help the human interlocutor become more aware of the underlying norms, thereby deepening and enriching the course of communication. This dynamic process of context construction can lead to more meaningful interactions, where users gain insight into the standards that shape discourse and learn how to engage more productively.

Third, the approach emphasizes the importance of explicit guidance on how to handle uncertainty and transparency in dialogue. In scientific domains, agents should be upfront about the evidentiary basis for claims, the degree of confidence attached to a claim, and the limits of applicability. In civic contexts, agents should provide balanced perspectives, acknowledge constraints, and avoid presenting a single viewpoint as the sole legitimate one. In creative contexts, agents should clarify when they are engaging in imaginative or fictional content, distinguishing it from factual assertions, while still respecting the audience’s capacity for interpretation and enjoyment.
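
As a rough illustration of how such guidance might be operationalized, the sketch below post-processes a draft reply by appending a domain-appropriate transparency note. The three-way domain split, the wording of the notes, and the evidence-strength threshold are all assumptions made for this example.

```python
# Sketch: attach a domain-appropriate transparency note to a draft reply.
# The notes and the 0.5 threshold are illustrative assumptions, not prescribed by the paper.

def add_transparency(reply: str, domain: str, evidence_strength: float) -> str:
    if domain == "scientific":
        if evidence_strength < 0.5:
            return reply + "\n\nNote: the evidence for this claim is limited; treat it as provisional."
        return reply + "\n\nNote: this claim is supported by the cited evidence, within stated uncertainty."
    if domain == "civic":
        return reply + "\n\nNote: this summarizes several viewpoints; none is presented as the only legitimate one."
    if domain == "creative":
        return reply + "\n\nNote: the passage above is fiction, not a factual account."
    return reply

print(add_transparency("The sample warmed by roughly 2 degrees.", "scientific", 0.3))
```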

Fourth, practical deployment will require robust evaluation frameworks that can capture domain-specific norms and user expectations. Traditional metrics for accuracy or fluency may be insufficient on their own. Instead, evaluation should incorporate measures of relevance, usefulness, and alignment with the values most salient to the task. For example, in scientific assistance, evaluation may involve reproducibility, consistency with the underlying data, and empirical corroboration; in civic moderation, evaluation may consider inclusivity, fairness, and the ability to sustain civil dialogue; in creative assistance, evaluation might emphasize originality, coherence, and audience engagement.
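
A rubric of this kind could be expressed as weighted, domain-specific criteria. The following sketch shows one possible encoding; the criterion names, weights, and 0-to-1 rating scale are illustrative assumptions rather than measures proposed in the paper.

```python
# Sketch: weighted, domain-specific evaluation of a conversational exchange.
# Criteria, weights, and the 0.0-1.0 scoring scale are illustrative assumptions.

RUBRICS = {
    "scientific": {"evidence_support": 0.5, "uncertainty_reporting": 0.3, "clarity": 0.2},
    "civic":      {"inclusivity": 0.4, "civility": 0.4, "balance": 0.2},
    "creative":   {"originality": 0.5, "coherence": 0.3, "safety": 0.2},
}

def score_exchange(domain: str, ratings: dict[str, float]) -> float:
    """Combine per-criterion ratings (0.0-1.0) into a weighted domain score."""
    rubric = RUBRICS[domain]
    return sum(weight * ratings.get(criterion, 0.0)
               for criterion, weight in rubric.items())

# Hypothetical ratings, e.g. from human raters or automated checks.
print(score_exchange("civic", {"inclusivity": 0.9, "civility": 0.8, "balance": 0.6}))
```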

Fifth, the framework calls for careful design of interaction patterns that reflect cooperative dialogue rather than adversarial testing. Agents should be structured to invite user input, provide clarifying questions when necessary, and adjust their assertions in light of new information. The cooperative paradigm helps reduce the probability that the agent’s outputs will disrupt discourse, mislead participants, or trigger unintended harms. It also creates space for ongoing learning and refinement as the system encounters real-world interactions across diverse domains.
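
A simple way to realize this pattern is for the agent to ask a clarifying question when a request is under-specified and to revise its stated position when the user supplies a correction. The toy loop below works under those assumptions; the ambiguity heuristic and the phrasing are placeholders, not the architecture the paper describes.

```python
# Toy sketch of a cooperative turn: clarify when intent is ambiguous,
# revise when the user supplies new information. Heuristics are illustrative assumptions.
from typing import Optional

def is_ambiguous(message: str) -> bool:
    """Stand-in heuristic: very short requests are treated as under-specified."""
    return len(message.split()) < 4

def respond(message: str, prior_assertion: Optional[str] = None) -> str:
    if is_ambiguous(message):
        return "Could you say a bit more about what you need, so I can answer appropriately?"
    if prior_assertion and "actually" in message.lower():
        return f"Thanks for the correction; revising my earlier statement ('{prior_assertion}')."
    return "Here is my best answer, which I'm happy to refine as we go."

print(respond("Summarize this"))   # triggers a clarifying question
print(respond("Actually the data was from 2021.", prior_assertion="The data is from 2020."))
```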

Finally, the work underscores the importance of governance and governance-aware design. Since values vary across contexts and communities, it is essential to engage with stakeholders from multiple domains when defining what counts as aligned behavior. This means involving domain experts, ethicists, policymakers, and representatives of affected communities in the design and evaluation processes. The goal is to establish transparent, auditable criteria for alignment, as well as mechanisms for redress and improvement when misalignment occurs. The approach seeks not only to constrain risk but also to cultivate dialogue that is constructive, inclusive, and capable of adapting to evolving norms and expectations.

Context Construction and Elucidation: A Mechanism for Deeper Dialogue
A distinctive feature of the proposed approach is the concept of context construction and elucidation. This mechanism envisions agents that actively illuminate the values, assumptions, and norms shaping a given conversational practice. Rather than simply adhering to a static rule set, the agent engages with the user to reveal the underlying frame governing the interaction. For instance, in a scientific Q&A session, the agent might explicitly articulate why certain claims require empirical support, how to interpret confidence intervals, and what constitutes sufficient evidence. In a political moderation scenario, the agent could clarify the normative commitments—such as respect for dissent, non-discrimination, and procedural fairness—that guide its responses, and invite the user to express preferences or constraints that matter to them.
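
One concrete way to prefigure these norms is for the agent to preface an interaction with a short statement of the values it will follow and an invitation for the user to add constraints. The sketch below generates such a preamble from a list of norms; the descriptions and phrasing are assumptions made for illustration.

```python
# Sketch: generate an elucidation preamble that surfaces the norms guiding the agent.
# The norm descriptions and phrasing are illustrative assumptions.

NORM_DESCRIPTIONS = {
    "empirical support": "I will only assert claims I can back with evidence.",
    "explicit uncertainty": "I will say how confident I am and why.",
    "respect for dissent": "I will present disagreement without dismissing any participant.",
    "civility": "I will keep the exchange respectful, even where views conflict.",
}

def elucidation_preamble(norms: list[str]) -> str:
    lines = ["Before we start, here are the norms guiding my contributions:"]
    lines += [f"- {NORM_DESCRIPTIONS.get(norm, norm)}" for norm in norms]
    lines.append("If there are constraints or preferences that matter to you, please tell me.")
    return "\n".join(lines)

print(elucidation_preamble(["empirical support", "explicit uncertainty"]))
```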

This process of elucidation supports deeper understanding on the part of human interlocutors. It helps users recognize the values that shape the conversation and fosters more meaningful participation. The approach also supports the agent’s own learning, as exposure to real-world interactions can reveal which norms hold in practice and where adjustments are needed. By prefiguring values through dialogue, agents can help humans navigate complex social terrains with greater awareness and intentionality.

In addition, context construction can mitigate some common failure modes of language models. When an agent clearly communicates the basis for its claims, it reduces the likelihood of producing unsupported statements or presenting uncertain information as certainty. When it foregrounds domain-specific norms, it becomes easier for users to calibrate their expectations and engage in more productive conversations. This mechanism also helps bridge gaps between technical and non-technical audiences, ensuring that specialized knowledge is conveyed with appropriate levels of rigor and accessibility.

Challenges and Considerations
While the pragmatic, domain-aware approach offers a promising path forward, it also presents several challenges. One major challenge is operationalizing domain-specific norms in a way that is scalable and maintainable across a wide range of applications. Capturing the nuances of truth claims in science, civility in public discourse, and originality in creativity requires careful specification, documentation, and governance. There is a risk that attempts to encode norms could become brittle or overly prescriptive, inhibiting flexibility and adaptability in evolving contexts.

Another challenge concerns evaluation. How can evaluators reliably judge whether an AI agent has achieved domain-appropriate alignment? The task requires metrics that capture subtle aspects of discourse, such as the balance between truthfulness and accessibility, or the degree to which civil norms are upheld without constraining legitimate debate. Developing robust, transparent evaluation protocols will be essential to ensure accountability and continual improvement.

A related difficulty is interpreting and managing uncertainty. In scientific settings, quantifying uncertainty is crucial; in civic dialogue, uncertainty may arise from divergent viewpoints and the normative complexity of public policy. Crafting AI behavior that can appropriately signal uncertainty without eroding user trust or undermining the perceived competence of the agent is a delicate balancing act.

Ethical considerations also arise. The goal of aligning agents with human values includes protecting vulnerable communities from harm, ensuring fairness, and avoiding the perpetuation of bias. The domain-sensitive approach must guard against the risk of overfitting to dominant cultural norms or marginalizing minority perspectives. This requires sustained engagement with diverse stakeholders and continual re-evaluation of normative commitments as societal norms evolve.

Implementation Pathways for Stakeholders
For developers, researchers, and organizations seeking to implement these ideas, a concrete roadmap emerges. The first step is to embed contextual awareness into agent architectures. This includes mechanisms for identifying the domain of a conversation, selecting appropriate normative standards, and modulating the agent’s responses accordingly. A second step involves building explicit channels for context construction and elucidation, enabling agents to reveal the values and assumptions guiding their contributions. A third step is to establish flexible evaluation frameworks that assess domain-specific alignment across:

  • truthfulness and evidence support in scientific contexts,
  • civility, inclusivity, and fairness in civic settings,
  • originality balanced with safety in creative domains.

These steps should be accompanied by governance models that involve domain experts and affected communities in ongoing oversight and improvement processes. Finally, it is essential to design user-facing controls that allow people to adjust the agent’s behavior to suit their preferences and needs, while preserving the integrity of the domain-specific norms that guide dialogue.
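
As an illustration of the last point, user-facing controls could let people tune an agent's style while treating domain norms as hard floors that preferences cannot override. The sketch below clamps user preferences against per-domain minimums; the parameter names, defaults, and floor values are assumptions made for this example.

```python
# Sketch: user-adjustable settings clamped so domain-specific norms stay intact.
# Parameter names and floor values are illustrative assumptions.

DOMAIN_FLOORS = {
    "scientific": {"evidence_detail": 0.7, "formality": 0.3},
    "civic":      {"civility": 0.8, "viewpoint_balance": 0.6},
    "creative":   {"safety_filtering": 0.5, "formality": 0.0},
}

def apply_preferences(domain: str, preferences: dict[str, float]) -> dict[str, float]:
    """Honor user preferences, but never drop below the domain's minimum values."""
    floors = DOMAIN_FLOORS[domain]
    return {name: max(preferences.get(name, floor), floor)
            for name, floor in floors.items()}

# A user who prefers informal scientific answers still gets full evidence detail.
print(apply_preferences("scientific", {"formality": 0.1, "evidence_detail": 0.4}))
# -> {'evidence_detail': 0.7, 'formality': 0.3}
```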

A Practical Vision for Everyday Interactions
The overarching aim of this body of work is to render conversational AI more useful, trustworthy, and ethically attuned to human values. In practical terms, the vision entails agents capable of:

  • Adapting their communicative strategies to the task, audience, and domain,
  • Providing transparent, evidence-based support when required, and
  • Encouraging user reflection on the norms and values that govern the conversation.

In scientific work, the agent’s contributions would emphasize verifiable claims, transparent uncertainty, and disciplined evidence presentation. In public discourse, the agent would help manage disagreement with civility, respect for diverse viewpoints, and pathways toward constructive cooperation. In creative contexts, the agent would support imaginative exploration while maintaining safeguards that prevent exploitation or harm under the banner of creativity. Across all domains, the agent would engage in context construction and elucidation to deepen understanding and foster meaningful dialogue.

Conclusion
Integrating insights from pragmatics and philosophy offers a rich, nuanced framework for aligning conversational AI with human values. By treating conversation as a cooperative enterprise and recognizing that different domains require different virtues and standards, this approach moves beyond blanket safety constraints toward a more sophisticated, context-aware form of alignment. It highlights that there is no universal standard for how an AI should speak or what it should prioritize; instead, the agent should embody the appropriate traits for the circumstances in which it operates. This perspective further suggests that alignment is not a one-time fix but an emergent property of ongoing dialogue, learning, and governance. Through context construction and elucidation, agents can help humans understand the values that govern dialogue, fostering deeper, more productive interactions. As conversational AI becomes more integrated into everyday life and critical processes, embracing domain-sensitive alignment grounded in pragmatics and philosophical reflection will be essential to ensure that these systems enrich human communication rather than undermine it.
