Overview
The Conversation Insights feature helps Community Owners and Facilitators understand conversations and foster positive, safe environments. It operates without human intervention: a trained AI model evaluates the potential for harmful or inappropriate content by analyzing specific words and their context within a sentence. In its initial version, the feature identifies objectionable content that may not align with community guidelines, providing useful moderation insights. While the tool aims to be accurate, it may occasionally misidentify content, so we recommend reviewing and confirming its findings. Together, these AI-driven insights offer a proactive approach to content moderation, helping to maintain safe, positive, and inclusive communities.
How Does It Work?
The feature examines text-based content in your community and categorizes it into the following:
Categories of Objectionable Content
| Category | Description | Examples |
| --- | --- | --- |
| Insult | Disrespectful or offensive language aimed at causing hurt or belittlement. | Name-calling, personal attacks, and derogatory remarks. |
| Obscene | Indecent or inappropriate language. | Vulgarity, sexual explicitness, and graphic violence. |
| Toxicity | Harmful or disruptive language. | Trolling, mocking, harassment, bullying, and hate speech. |
| Threat | Explicit declarations of violence or implicit warnings of harm that jeopardize the safety and well-being of others. | Threats of physical harm, intimidation, blackmail, and coercion. |
| Identity Attack | Discriminatory language targeting race, gender, sexual orientation, religion, or other aspects of identity. | Invalidation, marginalization, and hate speech targeting specific identities. |
| Severe Toxicity | Extreme harmful language that goes beyond general toxicity and has serious emotional or psychological impacts. | Explicit threats of violence, harassment campaigns, incitement of self-harm, and sustained or organized hate speech. |
Each category is assigned a score from 0 to 100, representing the probability that the content matches that category. Content scoring above 20 in any category displays a corresponding insights tag, allowing you to review it and take action as needed.
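As a rough illustration, the sketch below (in Python, with hypothetical names such as `insight_tags`; Yellowdig's actual implementation is not public) shows how per-category scores could map to insight tags using the 20-point threshold described above.

```python
# Hypothetical sketch of how per-category scores could surface insight tags.
# Category names come from the table above; the 0-100 scale and the
# 20-point threshold come from this article. Everything else is illustrative.

TAG_THRESHOLD = 20  # scores above this value display an insights tag

CATEGORIES = [
    "Insult",
    "Obscene",
    "Toxicity",
    "Threat",
    "Identity Attack",
    "Severe Toxicity",
]

def insight_tags(scores: dict[str, int]) -> list[str]:
    """Return the categories whose scores exceed the tag threshold."""
    return [c for c in CATEGORIES if scores.get(c, 0) > TAG_THRESHOLD]

# Example: scores the model might assign to a single comment
scores = {"Insult": 35, "Obscene": 5, "Toxicity": 22, "Threat": 2,
          "Identity Attack": 1, "Severe Toxicity": 0}
print(insight_tags(scores))  # ['Insult', 'Toxicity']
```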
What Can Owners and Facilitators Do?
Community Owners and Facilitators can easily moderate tagged content using the quick moderation buttons to either Keep it in the feed or Remove it.
When an Owner or Facilitator removes content, it is moved out of the feed and into the Flagged Posts page, which is accessible only to Owners and Facilitators.
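As a rough sketch of this flow (the state and method names below are illustrative assumptions, not Yellowdig's actual API), removal behaves like a visibility change rather than a permanent deletion:

```python
# Hypothetical sketch of the Keep/Remove flow described above.
from enum import Enum

class PostState(Enum):
    IN_FEED = "in_feed"  # visible to the whole community
    FLAGGED = "flagged"  # moved to the Flagged Posts page

class Post:
    def __init__(self, content: str):
        self.content = content
        self.state = PostState.IN_FEED

    def keep(self) -> None:
        """Owner/Facilitator confirms the content; it stays in the feed."""
        self.state = PostState.IN_FEED

    def remove(self) -> None:
        """Owner/Facilitator removes the content; it moves to the
        Flagged Posts page, visible only to Owners and Facilitators."""
        self.state = PostState.FLAGGED
```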
Looking ahead, we’re exploring expanding this tool to offer more insights into positive community engagement.
Frequently Asked Questions
Is objectionable content automatically removed?
Content is marked for review, but the decision to remove it from the feed remains with the Owners and Facilitators. This allows them to explain the removal to community members and offer guidance on more appropriate behavior within the community.
Does it analyze images or videos?
Currently, the feature only analyzes text-based content.
Does this feature send data to third parties?
No, all data remains within Yellowdig’s infrastructure. The AI model is trained and operates in-house. All inferences are run within our secure AWS environment on GPU instances.
What are the data privacy implications?
All data is handled securely within Yellowdig's infrastructure and is discarded immediately after processing, ensuring a high level of privacy protection. The neural network does not retain any user data, such as Personally Identifiable Information (PII), after the analysis is complete.