Overview
The Conversation Insights feature helps Community Owners and Facilitators understand conversations in their communities and maintain positive, safe environments. In its first iteration, this AI-driven feature identifies content that may not align with community guidelines and surfaces it for moderation. A trained AI model assesses the likelihood that content is harmful or inappropriate based on specific words and their context within a sentence, supporting a proactive approach to keeping communities safe, positive, and inclusive.
How Does It Work?
The feature examines text-based content in your community and categorizes it into the following:
Categories of Undesirable Content
| Category | Description | Examples |
| --- | --- | --- |
| Insult | Disrespectful or offensive language aimed at causing hurt or belittlement. | Name-calling, personal attacks, and derogatory remarks. |
| Obscene | Indecent or inappropriate language. | Vulgarity, sexual explicitness, and graphic violence. |
| Toxicity | Harmful or disruptive language. | Trolling, mocking, harassment, bullying, and hate speech. |
| Threat | Explicit declarations of violence or implicit warnings of harm that jeopardize the safety and well-being of others. | Threats of physical harm, intimidation, blackmail, and coercion. |
| Identity Attack | Discriminatory language targeting race, gender, sexual orientation, religion, or other aspects of identity. | Invalidation, marginalization, and hate speech targeting specific identities. |
| Severe Toxicity | Extreme harmful language that goes beyond general toxicity and has serious emotional or psychological impacts. | Explicit threats of violence, harassment campaigns, incitement of self-harm, and sustained or organized hate speech. |
Each category is assigned a score from 0 to 100, representing the probability that content matches the respective category. Content scoring higher than 20 in any category will display a corresponding insights tag, allowing you to review and take action as needed.
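To make the scoring concrete, here is a minimal sketch of that threshold logic in Python. The category names come from the table above; the function name, threshold constant, and example scores are illustrative assumptions, not Yellowdig's actual implementation.

```python
# Minimal sketch of the tagging rule described above: each category gets a
# 0-100 probability score, and any score above 20 surfaces an insights tag.
# Names and example scores are hypothetical, not Yellowdig's actual code.

TAG_THRESHOLD = 20  # scores above this value display an insights tag

def insights_tags(category_scores: dict[str, int]) -> list[str]:
    """Return the categories whose scores exceed the review threshold."""
    return [category for category, score in category_scores.items()
            if score > TAG_THRESHOLD]

# Hypothetical scores the model might return for a single post
scores = {
    "Insult": 35,
    "Obscene": 4,
    "Toxicity": 27,
    "Threat": 1,
    "Identity Attack": 2,
    "Severe Toxicity": 0,
}

print(insights_tags(scores))  # ['Insult', 'Toxicity']
```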
What Can Owners and Facilitators Do?
Community Owners and Facilitators can moderate tagged content quickly using the Keep and Remove buttons in the feed.
When an Owner or Facilitator removes content, it is moved out of the feed and into the Flagged Posts page, which is accessible only to Owners and Facilitators.
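As a rough illustration of this Keep/Remove flow, the sketch below models a Remove action moving a post from the feed to the Flagged Posts page. The Post class and list structures are hypothetical stand-ins, not Yellowdig's actual data model.

```python
# Rough sketch of the Keep/Remove flow described above. The Post class and
# the feed/flagged lists are hypothetical stand-ins, not Yellowdig's schema.

from dataclasses import dataclass

@dataclass
class Post:
    post_id: str
    text: str

feed: list[Post] = [Post("p1", "Example post tagged by Conversation Insights")]
flagged_posts: list[Post] = []  # visible only to Owners and Facilitators

def moderate(post: Post, action: str) -> None:
    """'remove' moves the post to Flagged Posts; 'keep' leaves it in the feed."""
    if action == "remove":
        feed.remove(post)
        flagged_posts.append(post)

moderate(feed[0], "remove")
print([p.post_id for p in flagged_posts])  # ['p1']
```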
Looking ahead, we’re exploring expanding this tool to offer more insights into positive community engagement. We welcome your feedback and ideas! Click here to share your feedback with our Product team.
Frequently Asked Questions
Is undesirable content automatically removed?
Content is marked for review, but the decision to remove it from the feed remains with the Owners and Facilitators. This allows them to explain the removal to community members and offer guidance on more appropriate behavior within the community.
Does it analyze images or videos?
Currently, the feature only analyzes text-based content.
Does this feature send data to third parties?
No, all data remains within Yellowdig’s infrastructure. The AI model is trained and operates in-house. All inferences are run within our secure AWS environment on GPU instances.
What are the data privacy implications?
All data is handled securely within Yellowdig's infrastructure and is discarded immediately after processing, providing a high level of privacy protection. The neural network does not retain any user data, such as Personally Identifiable Information (PII), after the analysis is complete.
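As a simple illustration of this discard-after-processing pattern, consider the sketch below: only category scores leave the analysis step, and the input text is never persisted. Both function names are hypothetical placeholders, not Yellowdig's actual pipeline.

```python
# Illustrative sketch of discard-after-processing: the analysis returns only
# category scores, and the input text is never stored. Both functions are
# hypothetical placeholders, not Yellowdig's actual pipeline.

def run_model_inference(text: str) -> dict[str, int]:
    """Placeholder for the in-house model running in Yellowdig's AWS environment."""
    return {"Toxicity": 12}  # dummy score for illustration

def score_text(text: str) -> dict[str, int]:
    """Return category scores only; the text is discarded after this call."""
    scores = run_model_inference(text)
    return scores  # no copy of the text or any PII is retained

print(score_text("example post"))  # {'Toxicity': 12}
```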