Automating the search for under-documented GitHub projects
I kept running into the same problem while browsing GitHub for repositories that I could confidently contribute to. Plenty of projects looked interesting, but their README files were missing, thin, or unclear. Finding good candidates for documentation work meant opening repositories one by one and making a quick judgment. That worked well enough for a handful of projects, but it didn’t scale: most of what I opened turned out to be non-candidates I had to sort through first.
So I built a small CLI tool to make that process more consistent and to better manage my own time (and sanity).
The Problem
There’s no simple way to surface weak README files across many repositories. The process tends to rely on manual searching and quick impressions, which makes it difficult to explain why one project feels worth working on while another doesn’t. Over time, that lack of consistency becomes a problem. It’s harder to compare repositories, and even harder to prioritize where your effort would actually make a difference. It’s an inefficient way to sort through work, and it costs both time and energy, both of which are worth conserving.
What I Needed
I wasn’t trying to solve documentation quality in a general sense. I just wanted a way to move a little faster without losing clarity. That meant being able to surface likely problem repositories quickly, understand what was missing without opening each one, and apply the same lens across everything I looked at. Something simple, repeatable, and easy to manage: a time-saver that actually saves time. I needed the tool not only to work, but to work well.
Turning Judgment into Signals
The shift came from taking something instinctive and giving it a bit of structure.
Instead of relying on a quick read and a gut feeling, I started asking what weak READMEs tend to have in common. In most cases, it wasn’t anything subtle. Very low word count usually meant a lack of depth. Missing sections like installation or usage made the project harder to approach. A lack of headings often meant the content wasn’t organized in a way that helped the reader.
None of these signals are perfect on their own. Some projects are intentionally minimal, and some document things elsewhere, so there are still cases where the tool works as designed, but the result isn’t a strong candidate. But taken together, these signals form a pattern that’s useful enough to work with. Once that pattern exists, it becomes easy to apply consistently.
How I Built It
The tool follows a straightforward flow. It begins by searching GitHub repositories using a query, then retrieves README files through the GitHub API. Once the content is decoded, it runs a set of simple checks against it.
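The fetch-and-decode step can be sketched roughly like this. This is a minimal illustration, not the actual readme-radar code: it assumes a Python implementation against the GitHub REST API, whose readme endpoint returns the file as base64-encoded text.

```python
import base64
import json
import urllib.request

API = "https://api.github.com"

def fetch_readme(owner: str, repo: str) -> dict:
    # GET /repos/{owner}/{repo}/readme returns metadata plus the
    # README content, base64-encoded, as a JSON payload.
    url = f"{API}/repos/{owner}/{repo}/readme"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def decode_readme(payload: dict) -> str:
    # Decode the base64 "content" field back into readable text.
    return base64.b64decode(payload["content"]).decode("utf-8")
```

From there, the decoded text is what the checks run against.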
If you want to try it yourself: readme-radar on GitHub
Those checks are intentionally basic. Word count provides a rough sense of depth. Section detection looks for the presence of common elements like installation or usage. A quick pass over the structure looks for headings and general organization. From there, the tool assigns issue flags and a rough score.
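A minimal version of those checks might look like the sketch below. The thresholds, section keywords, and flag names here are assumptions for illustration, not the tool's actual values.

```python
import re

REQUIRED_SECTIONS = ("install", "usage")  # assumed section keywords
MIN_WORDS = 150                           # assumed depth threshold

def check_readme(text: str) -> dict:
    # Word count gives a rough sense of depth.
    words = len(text.split())
    # Markdown-style headings suggest the content is organized.
    headings = re.findall(r"^#+\s", text, flags=re.MULTILINE)
    lowered = text.lower()

    flags = []
    if words < MIN_WORDS:
        flags.append("too-short")
    for section in REQUIRED_SECTIONS:
        if section not in lowered:
            flags.append(f"missing-{section}")
    if not headings:
        flags.append("no-headings")

    # Rough score: more flags means a weaker README.
    return {"words": words, "flags": flags, "score": len(flags)}
```

Each check is deliberately crude on its own; the value comes from applying the same set to every repository.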
The results are sorted so that weaker candidates rise to the top. The goal isn’t to produce a perfect ranking, but to make the output easy to scan. Instead of opening repository after repository, you can quickly see which ones are likely to need attention.
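The sorting step is simple if each result carries a score like the one above (field names here are assumptions, not the tool's actual schema): higher score means more flagged issues, so weaker candidates lead the list.

```python
# Hypothetical scan results: one dict per repository.
results = [
    {"repo": "a/solid", "score": 0},
    {"repo": "b/thin", "score": 3},
    {"repo": "c/okay", "score": 1},
]

# Sort descending by score so the weakest READMEs surface first.
ranked = sorted(results, key=lambda r: r["score"], reverse=True)
```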
Over time, I added a few small improvements. Filtering made it easier to remove strong candidates from the results. Summary counts provided a quick sense of what issues were most common in a given scan. An optional JSON output made it possible to reuse the results elsewhere. None of these additions are complex, but they reinforce the same idea: apply a consistent set of checks, then make the results usable.
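Those three additions are also small in code terms. The sketch below shows one plausible shape for them, again assuming a flags-per-repository result structure rather than readme-radar's actual internals.

```python
import json
from collections import Counter

# Assumed result shape: one dict per repository with its issue flags.
results = [
    {"repo": "a/solid", "flags": []},
    {"repo": "b/thin", "flags": ["too-short", "no-headings"]},
    {"repo": "c/sparse", "flags": ["too-short"]},
]

# Filtering: drop repositories with no flagged issues.
candidates = [r for r in results if r["flags"]]

# Summary counts: which issues were most common in this scan.
summary = Counter(flag for r in candidates for flag in r["flags"])

# Optional JSON output so the results can be reused elsewhere.
report = json.dumps(
    {"candidates": candidates, "summary": dict(summary)}, indent=2
)
```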
What Changed
Before building this, the process was mostly reactive. I would browse, click into a repository, scan the README, and decide in the moment whether it needed work.
Afterward, the process became more intentional. Instead of looking for anything, I could look for specific types of problems. That small shift changed the workflow from browsing to targeting. It also made the decision-making process easier to explain, because it was based on defined signals rather than a vague sense of quality.
Outcome
The tool reduced the time spent searching for documentation opportunities and made the process more consistent. It started as a small utility to speed up a repetitive task, but it ended up changing how I evaluate documentation in general. More importantly, it clarified what I was actually evaluating when I looked at a README.
Even without the tool, that clarity sticks. A vague problem becomes easier to work with once it’s defined, even loosely. From there, it becomes something you can repeat and refine over time.
The tool itself is small. The thinking behind it is what scales.