Making Your Media Accessible: Text Alternatives and Captions

When you share audio or video content in Yellowdig, you're asking your community to engage with something they can hear and see. But not everyone can access media the same way. Students using screen readers, those who are deaf or hard of hearing, learners in loud or quiet environments, and anyone who processes written content more easily all depend on accessible alternatives to participate fully.

This guide covers two things: how to write a text alternative for your media, and how to edit auto-generated captions so they're accurate. Understanding the difference between the two, and when each is required, will help you do both well.

Text alternatives

A text alternative is a written document that fully substitutes for your media. Someone who reads your text alternative instead of watching or listening to your media should come away with the same understanding as someone who accessed the media directly.

That's the bar worth keeping in mind as you write: could someone skip the media entirely, read this document, and know everything they need to know?

What to include

What belongs in a text alternative depends on the type of media you're sharing.

Audio-only content (a podcast clip, a recorded interview, a voice memo) needs a transcript that captures all spoken content accurately, along with any non-speech audio that carries meaning — music that establishes mood, laughter that signals a reaction, an alert sound that indicates something happening in the recording. Note these briefly in brackets: [upbeat background music], [audience laughs], [notification sound].

Don't paraphrase the spoken content. Transcribe it. The exact language often matters, especially in academic or instructional recordings where word choice is the point.

Video-only content (a silent screen recording, an animation, a time-lapse) needs a written account of what happens visually, detailed enough that a reader can follow the content without watching. Describe what's shown, what actions occur, and what changes on screen in enough detail that the sequence of events is clear.

Synchronized media (video with audio — the most common type) requires both: accurate transcription of the spoken content and description of the visual content, woven together in a way that reflects how they relate to each other. This is the most complete form of text alternative, and it's what most of the media shared in Yellowdig will need.
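
If it helps to see the three cases side by side, here is a minimal Python sketch that maps each media type to what its text alternative should contain. The names `MediaType`, `REQUIRED_PARTS`, and `required_parts` are made up for illustration and are not part of Yellowdig.

```python
from enum import Enum


class MediaType(Enum):
    AUDIO_ONLY = "audio-only"
    VIDEO_ONLY = "video-only"
    SYNCHRONIZED = "synchronized"


# Hypothetical lookup table summarizing the guidance above.
REQUIRED_PARTS = {
    MediaType.AUDIO_ONLY: [
        "verbatim transcript of all spoken content",
        "bracketed notes for non-speech audio that carries meaning",
    ],
    MediaType.VIDEO_ONLY: [
        "written account of what happens visually, in sequence",
    ],
    MediaType.SYNCHRONIZED: [
        "verbatim transcript of all spoken content",
        "description of the visual content, woven in where it relates",
    ],
}


def required_parts(media_type: MediaType) -> list[str]:
    """Return the parts a text alternative needs for this media type."""
    return list(REQUIRED_PARTS[media_type])


for part in required_parts(MediaType.SYNCHRONIZED):
    print(part)
```

Treat this as a checklist, not a rule engine: you still decide which sounds and visuals carry meaning.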

How to write a text alternative for synchronized media

The goal is a coherent document, not a disconnected list. Narration and visual content should be integrated the way they work in the video itself — the reader should be able to follow the logic and sequence of the content from beginning to end.

A structure that works well for most instructional media:

A brief overview of what the media covers (two to three sentences)
The content itself, moving through the media in sequence, with narration transcribed and visual content described where it adds information the narration doesn't make explicit
Any key data, references, or on-screen text that appeared

Describe visual content that carries meaning. If the narration says "as you can see here, the solution changes color" and the change itself is the evidence being pointed to, that change needs to be described — what color it was, what color it became, how quickly. If a chart appears on screen, don't write "a chart is shown." Describe what the chart shows, including the key data it communicates. On-screen text should be transcribed.

Integrate narration and visual content as they relate. Sometimes the narration explains the visuals directly and you can simply transcribe the narration. Sometimes the visuals add context or nuance the narration doesn't state explicitly — in those cases, describe the visual content and make clear how it connects to what's being said.

Here's an example of narration and visual content integrated in a text alternative:

Narration: "As you can see here, the solution changes color almost immediately once the catalyst is introduced."

Visual: The researcher adds three drops from a dropper to a beaker of clear liquid. Within two seconds, the liquid turns from clear to a deep amber. The researcher holds the beaker up to the camera.

Narration: "That color change tells us the reaction has started. We're going to let this run for ten minutes and see where it ends up."

A reader who encounters this text instead of the video has the same understanding a viewer would: not just what was said, but what was shown and how the two connect.
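
If you draft text alternatives from notes, the structure described above (overview first, then narration and visuals in sequence, then on-screen text) can be sketched in a few lines of Python. This is only an illustration: `Segment` and `render_text_alternative` are hypothetical names, the overview sentence is written for this example, and nothing here reflects a Yellowdig feature.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    kind: str  # either "narration" or "visual"
    text: str


def render_text_alternative(overview, segments, on_screen_text=()):
    """Join an overview, ordered segments, and on-screen text into one document."""
    lines = [overview, ""]
    for segment in segments:
        if segment.kind == "narration":
            # Narration is quoted because it is transcribed word for word.
            lines.append(f'Narration: "{segment.text}"')
        elif segment.kind == "visual":
            lines.append(f"Visual: {segment.text}")
        else:
            raise ValueError(f"unknown segment kind: {segment.kind!r}")
        lines.append("")
    if on_screen_text:
        lines.append("On-screen text:")
        lines.extend(on_screen_text)
    return "\n".join(lines).strip() + "\n"


segments = [
    Segment("narration", "As you can see here, the solution changes color "
                         "almost immediately once the catalyst is introduced."),
    Segment("visual", "The researcher adds three drops from a dropper to a "
                      "beaker of clear liquid. Within two seconds, the liquid "
                      "turns from clear to a deep amber."),
    Segment("narration", "That color change tells us the reaction has started."),
]

document = render_text_alternative(
    "A researcher adds a catalyst to a solution and explains the color change.",
    segments,
)
print(document)
```

Keeping the segments in one ordered list is the point of the sketch: the reader meets each visual description at the moment the narration refers to it.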

What makes a text alternative good

Be specific. "A graph showing enrollment trends" is weaker than "A bar chart showing undergraduate enrollment declining from 4,200 students in 2019 to 3,600 in 2023." The second version gives the reader the actual information the graph conveys.

Be neutral. You're representing the content, not interpreting it. Describe what's there.

Match the level of detail to what the content actually communicates. A recording where all the meaningful content is in the spoken words needs accurate transcription more than extensive visual description. A demonstration where what's happening on screen is the point needs detailed visual description. Use your judgment.

Check it against the media. Once you've written your text alternative, read it back against the original: is there anything someone would learn from watching or listening that they wouldn't learn from reading? If so, add it.

Captions

Captions are text representations of a recording's audio, synchronized to the media and displayed on screen as it plays. Unlike a text alternative, which is read as a standalone document, captions are time-aligned: a viewer reads them in real time while watching.

Yellowdig automatically generates captions. That's a useful starting point, but auto-generated captions are frequently imperfect. Editing them is one of the most impactful things you can do for the members of your community who depend on them.

How to edit captions in Yellowdig

After uploading your media, you'll find an option to review and edit the auto-generated captions before your post goes live. You can correct errors, adjust timing, and add detail to make them accurate.

What good captions include

Accuracy above everything else. A caption that misrepresents what was said is worse than one that's slightly awkward. Prioritize getting the words right, especially for technical terms, proper names, and anything central to understanding the content.

Speaker identification. When more than one person is speaking, captions should indicate who's talking. Use names or descriptive labels: [Professor Chen], [Student], [Interviewer].

Meaningful non-speech audio. Captions exist for people who can't hear the audio, so they should include sounds that carry meaning: music that establishes mood, laughter that signals a reaction, an alert sound that indicates something happening in the recording. Note these briefly in brackets: [upbeat background music], [audience laughs], [notification sound].

Clear punctuation. Captions without punctuation are significantly harder to follow, especially for complex or technical content. Add periods, commas, and question marks where they belong.
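
To make those elements concrete, here is a minimal Python sketch that writes a few caption cues in WebVTT, a common plain-text caption format. It is an assumption that WebVTT is relevant to your workflow: this article doesn't say which format Yellowdig uses internally, and you edit captions in Yellowdig's own editor. The `Cue` class, the helper names, and the sample dialogue are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cue:
    start: float  # seconds from the start of the media
    end: float
    text: str
    speaker: Optional[str] = None


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues) -> str:
    """Render cues as WebVTT text, labeling speakers in brackets."""
    blocks = ["WEBVTT"]
    for cue in cues:
        label = f"[{cue.speaker}] " if cue.speaker else ""
        timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
        blocks.append(f"{timing}\n{label}{cue.text}")
    return "\n\n".join(blocks) + "\n"


cues = [
    Cue(0.0, 2.5, "[upbeat background music]"),
    Cue(2.5, 6.0, "Today we're talking about epistemology.", "Professor Chen"),
    Cue(6.0, 8.0, "Could you define that first?", "Student"),
]
print(to_webvtt(cues))
```

Each cue carries its own start and end time, which is what keeps captions aligned with the audio. Speaker labels and bracketed sounds are ordinary caption text that you add while editing.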

What to watch for when editing auto-generated captions

Auto-generated captions tend to struggle with:

Technical or disciplinary vocabulary. Domain-specific terms almost always need correction — "epistemology" becomes "a pistol ology," that kind of thing.

Proper names. Names of researchers, institutions, and places are frequent error sources.

Accents and speaking styles. The more a speaker's voice differs from the speech the captioning model was trained on, the more errors you'll see.

Fast speech. When speakers talk quickly, auto-generation often drops words or merges phrases incorrectly.

Numbers and data. "4.2 percent" can become "for two percent" or "forty-two percent," which matters when the number itself is the point.

A good approach: play the media at normal speed while reading the captions alongside. Mark errors as you go, then correct them. For longer recordings, working through it in sections makes the task more manageable.
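
If you have the caption text outside Yellowdig, a small script can point you at the lines most likely to need attention before you listen through. The Python sketch below is a rough aid under stated assumptions, not a replacement for checking against the audio: `flag_caption` and the list of mishearings are hypothetical, and you would build your own list from the terms in your course.

```python
import re

# Spelled-out numbers are worth a second look because they are easy to mishear.
NUMBER_WORDS = re.compile(
    r"\b(zero|one|two|three|four|five|six|seven|eight|nine|ten|twenty|thirty|"
    r"forty|fifty|sixty|seventy|eighty|ninety|hundred|thousand|percent)\b",
    re.IGNORECASE,
)


def flag_caption(line: str, known_mishearings: dict[str, str]) -> list[str]:
    """Return the reasons a caption line deserves a manual check."""
    reasons = []
    lowered = line.lower()
    for wrong, right in known_mishearings.items():
        if wrong.lower() in lowered:
            reasons.append(f'possible mishearing of "{right}"')
    if re.search(r"\d", line) or NUMBER_WORDS.search(line):
        reasons.append("contains a number: check against the audio")
    stripped = line.strip()
    if stripped and stripped[-1] not in '.?!"]':
        reasons.append("no closing punctuation")
    return reasons


mishearings = {"a pistol ology": "epistemology"}  # hypothetical course glossary
for caption in [
    "this is a question of a pistol ology",
    "Growth was for two percent.",
    "[audience laughs]",
]:
    print(caption, "->", flag_caption(caption, mishearings))
```

A line that comes back with no flags is not guaranteed to be correct. The script only narrows down where to look first.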

A note on caption quality

Captions that are 80% accurate might feel close enough, but for a student who relies on them, the 20% that's wrong represents real gaps in their ability to follow the content. Hold the same standard for caption accuracy that you'd hold for anything else you share with your community.
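
To put a number on that, here is a minimal Python sketch that counts word-level errors between what was actually said and what the captions show, using a standard edit-distance calculation. It assumes you have a correct transcript to compare against. The function names are hypothetical, and this is one simple way to measure accuracy, not a measure Yellowdig reports.

```python
def word_errors(reference: str, caption: str) -> int:
    """Count the fewest word substitutions, insertions, and deletions needed
    to turn the caption into the reference (word-level edit distance)."""
    ref = reference.lower().split()
    hyp = caption.lower().split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from caption
                            curr[j - 1] + 1,      # extra word in caption
                            prev[j - 1] + cost))  # match or wrong word
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, caption: str) -> float:
    """Share of reference words captured correctly, floored at zero."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference transcript is empty")
    return max(0.0, 1 - word_errors(reference, caption) / ref_len)


said = "growth was 4.2 percent this year"
shown = "growth was for two percent this year"
print(word_errors(said, shown))             # 2
print(round(word_accuracy(said, shown), 2))  # 0.67
```

Scaled up, 80% accuracy on a hypothetical 1,500-word recording means roughly 300 words that are wrong, missing, or invented.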

A few final thoughts

Text alternatives and captions serve different needs, and both matter. Captions help someone follow along with media in real time. A text alternative ensures that someone who can't access the media at all — because of a disability, a technical limitation, or any other reason — can still fully engage with what you've shared.

When you take the time to do these well, you're making a genuine choice to include everyone in your community's conversations. That's worth the effort.

If you have questions about accessibility in Yellowdig or want feedback on your text alternatives or captions, reach out through the Help section.
