Building Your First Hermes Agent Skill: A Complete Walkthrough

I stared at my terminal for 20 minutes trying to figure out why my Hermes Agent kept forgetting everything between sessions. Same context. Same prompts. Same frustration. Then I discovered skills, the extensibility layer that turns a bare agent into something that actually remembers how you work.

Here's exactly how to build your first Hermes skill from scratch, including the mistakes I made that you don't have to repeat. By the end, you'll have a working skill that teaches your agent your conventions, your tools, and your workflow.

I've written 6 skills so far. Three failed spectacularly before I figured out what actually works. This walkthrough compresses those failures into a path you can follow in about 30 minutes, start to finish.

What Are Skills, Actually?

Think of skills as your agent's procedural memory. They're not prompts; they're structured documents that tell your agent how to handle specific tasks the way you want them handled. The difference matters.

A prompt says "write tests." A skill says "use pytest with xdist, put fixtures in conftest.py, run with coverage thresholds, and here's what to do when the database migrations fail." See the gap?

Skills are reusable procedures, not one-shot instructions.

Hermes skills live as SKILL.md files with YAML frontmatter. When the agent encounters a task that matches the skill's domain, it loads the full instructions and follows them. Every time. Consistently. Without you having to re-explain your preferences.

I first felt this power when I stopped typing "remember to use tabs not spaces" in every conversation. I put it in a skill. The agent just... followed it. Forever. That quiet moment of not having to repeat myself is when skills clicked for me.

The Anatomy of a SKILL.md

Every skill has two parts: the frontmatter (metadata) and the body (instructions). Get either wrong and your skill either won't load or won't work. I learned this the hard way when my first skill, a code review checklist, kept getting ignored because I misspelled trigger_conditions in the frontmatter.

```yaml
---
name: my-awesome-skill
category: productivity
description: "Short description of what this skill does"
version: 1.0.0
author: Your Name
platforms: [linux, macos]
metadata:
  hermes:
    tags: [your, tags]
    prerequisites:
      skills: []
---
```

The frontmatter is YAML wrapped in --- delimiters. The body is Markdown. That's it. That's the whole structure. Simple, but strict.
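If you want to sanity-check that structure before Hermes ever sees the file, here's a minimal Python sketch. It is not Hermes's loader, just a stand-alone check that assumes the "YAML between two `---` lines, then Markdown" rule described above; it splits the file and leaves real YAML parsing to a library such as PyYAML. The name `split_skill` is hypothetical.

```python
def split_skill(text: str) -> tuple[str, str]:
    """Split SKILL.md text into (frontmatter YAML, Markdown body).

    Hypothetical helper, not part of Hermes: it only checks the '---' delimiters.
    """
    lines = text.splitlines()
    # The very first line must be the opening '---' delimiter.
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' line")
    # The next '---' closes the frontmatter; everything after it is the body.
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    raise ValueError("frontmatter has no closing '---' line")


sample = "---\nname: my-awesome-skill\nversion: 1.0.0\n---\n## Agent Workflow\n1. Do the thing\n"
frontmatter, body = split_skill(sample)
print(body.splitlines()[0])  # prints "## Agent Workflow"
```

A missing or unclosed delimiter raises an error instead of silently producing an empty skill, which is the "simple, but strict" part.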

Pro tip: Always include version and platforms. I skipped them on my second skill and spent 40 minutes debugging why it wouldn't load on my colleague's Mac. It was a platform filter. Don't be me.
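I don't know exactly how Hermes implements its platform filter, but the basic check is easy to picture. This sketch assumes the frontmatter names `linux`, `macos`, and `windows` correspond to Python's `sys.platform` values `"linux"`, `"darwin"`, and `"win32"`; both helper names are made up for illustration.

```python
import sys

# Assumed mapping from sys.platform prefixes to the names used in `platforms:`.
PLATFORM_NAMES = {"linux": "linux", "darwin": "macos", "win32": "windows"}


def current_platform(raw: str = sys.platform) -> str:
    """Translate a sys.platform value into a frontmatter platform name."""
    for prefix, name in PLATFORM_NAMES.items():
        if raw.startswith(prefix):
            return name
    return raw  # unknown platform: pass it through unchanged


def skill_loads_here(declared: list[str], raw: str = sys.platform) -> bool:
    """True if this machine's platform appears in the skill's `platforms:` list."""
    return current_platform(raw) in declared


# A skill declared only for linux is filtered out on a Mac.
print(skill_loads_here(["linux"], raw="darwin"))  # False
```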

Here's something nobody tells you: the category field affects discovery. Hermes uses it to surface relevant skills when it detects task types. If you pick "creative" for a code review skill, it might never appear when you need it. Match the category to the actual work.

Step 1: Pick Something You Actually Do Repeatedly

This is where most people go wrong. They build skills for hypothetical scenarios instead of real workflows. I once built a "microservice deployment orchestrator" skill that I never used because my actual deployment was just three kubectl commands I kept copy-pasting.

Instead, look for tasks you do at least twice a week where you find yourself giving the same instructions repeatedly. For me, it was:

Setting up new project repositories with my preferred structure
Running my pre-commit checklist (lint, format, test, build)
Generating API documentation from OpenAPI specs
Onboarding explanations of how our monorepo is organized

Those repeated instructions are gold. They're skill candidates.

The test: If you've explained something to the agent more than twice, it's a skill candidate. If you only do it once a month, skip it. Save your effort for the tasks where the compounding payoff is real.

I keep a notes file where I jot down things I find myself repeating. After a week, I review it. The items that appear 3+ times become skills. Sounds low-tech. Works perfectly.
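If your notes file holds one repeated instruction per line, the weekly review takes only a few lines of Python. This is a sketch of that low-tech habit, not a Hermes feature; the function name and the normalization rule (lowercase, collapsed whitespace) are my assumptions.

```python
from collections import Counter


def skill_candidates(notes: list[str], threshold: int = 3) -> list[tuple[str, int]]:
    """Return notes that appear at least `threshold` times, most frequent first."""
    # Lowercase and collapse whitespace so near-identical lines count together.
    normalized = (" ".join(line.lower().split()) for line in notes if line.strip())
    counts = Counter(normalized)
    return [(item, n) for item, n in counts.most_common() if n >= threshold]


week = [
    "run lint before commit",
    "Run lint before commit",
    "explain monorepo layout",
    "run lint before commit ",
]
print(skill_candidates(week))  # [('run lint before commit', 3)]
```

In practice you'd feed it `Path("notes.txt").read_text().splitlines()`, with whatever filename you actually use.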

Step 2: Write the Frontmatter

Create a directory for your skill. I keep mine in ~/.hermes/skills/, but you can also use project-local paths. The file must be named SKILL.md, not skill.md, not README.md. Exact name. Case-sensitive.
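The case-sensitivity rule bites hardest on macOS and Windows, whose default filesystems are case-insensitive: asking whether `SKILL.md` exists returns True even when the file on disk is `skill.md`. Here's a small Python sketch (my own check, not Hermes's discovery code) that compares the real on-disk name instead. It assumes your skills all live somewhere under one root folder such as ~/.hermes/skills/.

```python
from pathlib import Path


def find_skills(root: Path) -> list[Path]:
    """List files under `root` whose on-disk name is exactly 'SKILL.md'."""
    # rglob("*") yields names as stored on disk, so this string comparison
    # stays case-sensitive even on case-insensitive filesystems.
    return sorted(p for p in root.rglob("*") if p.is_file() and p.name == "SKILL.md")


def misnamed(root: Path) -> list[Path]:
    """Files that look like a skill file but have the wrong case."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.name.lower() == "skill.md" and p.name != "SKILL.md"
    )


skills_root = Path.home() / ".hermes" / "skills"
if skills_root.is_dir():
    for path in misnamed(skills_root):
        print(f"rename to SKILL.md: {path}")
```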

```yaml
---
name: project-setup
category: software-development
description: "Initialize new projects with my standard directory structure, tooling, and CI config"
version: 1.0.0
author: Your Name
license: MIT
platforms: [linux, macos, windows]
metadata:
  hermes:
    tags: [project-setup, scaffolding, ci]
    prerequisites:
      skills: []
---
```

Notice the category field. Hermes uses this to organize skills and determine relevance. Pick from the existing categories rather than inventing new ones; it makes discovery easier later.
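Once a YAML parser (for example PyYAML's `yaml.safe_load`) has turned the frontmatter into a dict, a pre-flight check is short. The required fields below are the ones this article recommends, not a schema Hermes publishes, and `KNOWN_CATEGORIES` is a placeholder you'd fill with the categories your existing skills already use.

```python
# Fields this article treats as mandatory (an editorial choice, not Hermes's schema).
REQUIRED_FIELDS = ("name", "description", "version", "platforms")

# Placeholder: replace with the categories your existing skills actually use.
KNOWN_CATEGORIES = {"software-development", "productivity"}


def check_frontmatter(meta: dict) -> list[str]:
    """Return human-readable problems; an empty list means the metadata looks sane."""
    problems = [f"missing field: {key}" for key in REQUIRED_FIELDS if key not in meta]
    category = meta.get("category")
    if category is not None and category not in KNOWN_CATEGORIES:
        problems.append(f"unfamiliar category: {category!r}")
    return problems


meta = {
    "name": "project-setup",
    "category": "software-development",
    "description": "Initialize new projects with my standard directory structure",
    "version": "1.0.0",
}
print(check_frontmatter(meta))  # ['missing field: platforms']
```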

One thing I wish I'd known earlier: the prerequisites.skills field isn't just metadata. Hermes actually loads prerequisite skills first when executing yours. My debugging skill depends on my logging skill, so I declare that dependency and Hermes handles the ordering automatically. That's powerful.
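Hermes does that ordering for you, and this isn't its code, so treat the following as an illustration of the idea rather than the implementation. Python's standard-library `graphlib` (3.9+) produces a load order in which every prerequisite comes before the skills that declare it, and it refuses circular dependencies.

```python
from graphlib import CycleError, TopologicalSorter


def load_order(prerequisites: dict[str, list[str]]) -> list[str]:
    """Order skills so each one comes after everything in its prerequisites list."""
    # TopologicalSorter expects node -> predecessors, which maps directly onto
    # skill -> prerequisite skills.
    return list(TopologicalSorter(prerequisites).static_order())


print(load_order({"debugging": ["logging"], "logging": []}))  # ['logging', 'debugging']

try:
    load_order({"a": ["b"], "b": ["a"]})
except CycleError as err:
    # err.args[1] holds the nodes that form the cycle.
    print("circular prerequisites:", err.args[1])
```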

Step 3: Write the Body (Instructions That Actually Work)

Here's where I went wrong on my first attempt. I wrote instructions like I was writing documentation for a human reader. Vague. High-level. Full of "should" and "consider" and "optionally." That doesn't work for agents.

Agents need imperative instructions. Exact commands. Specific file paths. Error handling. Think runbook, not blog post. Think "operating instructions for a very literal junior engineer who takes everything at face value," because that's essentially who you're writing for.

```markdown
## Agent Workflow

1. Run `mkdir -p src/{components,utils,hooks,tests}`
2. Copy the template files from `~/.hermes/templates/project-setup/`
3. Run `npm init -y` then `npm install` the standard dependencies
4. Create `.github/workflows/ci.yml` from template
5. Run the test suite to verify everything works
6. Report what was created and any issues encountered

## Pitfalls

- If package.json already exists, DO NOT overwrite it
- If tests fail after setup, report the failure but keep the structure
- If git is already initialized, skip `git init`
- If the directory already has a src/ folder, check what's in it before creating new files
```

See how specific that is? No ambiguity. No "consider checking if...", just "if X, do Y." I started writing all my skill bodies this way and my success rate went from maybe 60% to about 95%.

The rule: Every instruction should be something you could hand to a junior dev who's never seen your project. If they'd have to ask a clarifying question, rewrite the instruction until they don't have to.
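To show how literally those Pitfalls rules translate, here's a dry-run sketch in Python that turns each "if X, do Y" line into a decision and returns the commands it would run, without running any of them. `plan_setup` is a hypothetical name, and the checks (a non-empty src/, an existing package.json, a .git folder) are my reading of the rules.

```python
from pathlib import Path


def plan_setup(project: Path) -> list[str]:
    """Apply the Pitfalls rules as explicit decisions; returns actions, runs nothing."""
    actions = []
    src = project / "src"
    # "If the directory already has a src/ folder, check what's in it first."
    if src.is_dir() and any(src.iterdir()):
        actions.append("inspect existing src/ before adding files")
    else:
        actions.append("mkdir -p src/{components,utils,hooks,tests}")
    # "If package.json already exists, DO NOT overwrite it."
    if (project / "package.json").exists():
        actions.append("keep existing package.json")
    else:
        actions.append("npm init -y")
    # "If git is already initialized, skip git init."
    if (project / ".git").exists():
        actions.append("skip git init")
    else:
        actions.append("git init")
    return actions


print(plan_setup(Path.cwd()))
```

Every branch has an explicit outcome, which is the same property you want in the prose version of the skill.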

Step 4: Add Triggers and Conditions

Skills don't load automatically for every message. You need to tell Hermes when to activate them. This is the trigger_conditions field, and it's the single most important thing for making skills feel magical.

```yaml
metadata:
  hermes:
    tags: [project-setup]
    trigger_conditions:
      - "user mentions 'new project' or 'scaffold' or 'setup'"
      - "user asks to initialize a repository"
      - keywords: [scaffold, init, setup, new-project]
```

Without triggers, your skill sits there unused. With good triggers, it feels like the agent reads your mind. My project-setup skill fires whenever I say "spin up a new project" or "scaffold something for X", because those are the phrases I actually use.

I track which phrases I use over a week, then add the most common ones as triggers. Sounds tedious. Takes 5 minutes. Saves infinite frustration. I missed this step on my second skill and wrote perfect instructions that never activated because I used different vocabulary than I'd declared.

Common trigger pitfall: being too technical in your trigger keywords. I had init_repository as a trigger. I kept saying "set up a new project." Different words. The agent never matched. Use the exact words you type when you're in flow, not the formal task name.
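I don't know how Hermes matches triggers internally, so this deliberately naive word-overlap matcher is only a toy. It does show the vocabulary gap between a formal task name and the words you actually type. Both function names are made up.

```python
import re


def words(text: str) -> set[str]:
    """Lowercase and split on anything that isn't a letter or digit."""
    # "new-project", "new project" and "new_project" all become {"new", "project"}.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def triggers_match(message: str, keywords: list[str]) -> bool:
    """True if every word of at least one keyword appears in the message."""
    said = words(message)
    return any(words(keyword) <= said for keyword in keywords)


print(triggers_match("set up a new project", ["init_repository"]))          # False
print(triggers_match("spin up a new project", ["scaffold", "new-project"]))  # True
```

Note that a keyword of "setup" wouldn't match "set up" here either. That's the kind of phrasing drift worth checking against a week of your own messages.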

Step 5: Test It (The Part Everyone Skips)

I published my third skill without testing it. It failed on a fresh machine because I'd hardcoded a path that only existed on my dev box. Embarrassing. Took me 3 minutes to fix but felt like 3 hours of pride evaporating. Don't publish without testing. Ever.
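A cheap guard against that mistake is to scan the skill body for paths that point into one specific person's home directory before publishing. This regex-based sketch is a pre-publish check of my own, not a Hermes feature, and it only catches the common macOS, Linux, and Windows home-directory layouts.

```python
import re

# /Users/<name> (macOS), /home/<name> (Linux), C:\Users\<name> (Windows).
HOME_PATH = re.compile(r"/Users/[^/\s]+|/home/[^/\s]+|[A-Za-z]:\\Users\\[^\\\s]+")


def hardcoded_paths(body: str) -> list[str]:
    """Return user-specific absolute paths found in a skill body."""
    return HOME_PATH.findall(body)


body = "Copy templates from /Users/alice/templates, or better, ~/.hermes/templates/"
print(hardcoded_paths(body))  # ['/Users/alice']
```

Portable forms like `~/.hermes/...` pass untouched, which is what you want the body to use instead.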

Here's my full testing checklist, refined over 6 months of skill writing:

Fresh context test: Open a new session and trigger the skill. Does it work without prior conversation context? This catches "as we discussed earlier" assumptions.
Edge case test: What happens when the prerequisites aren't installed? When files already exist? When the network is down? When the user says something slightly unexpected?
Cross-platform test: If you specified multiple platforms, test on each one. Path separators, shell commands, and environment variables differ.
Interference test: Does it conflict with other skills you have loaded? I once had two skills that both tried to format code differently. The result was... messy.
Repeat test: Run it 3 times in a row. Does it produce consistent results? Non-determinism in skill instructions is a silent killer.
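The repeat test is the easiest one to automate. Assuming you can wrap "trigger the skill in a fresh session" in a function (`run_once` is a placeholder for however you drive your agent), this Python sketch compares a fingerprint of what each run produced: the sorted list of files in the output directory.

```python
from pathlib import Path
from typing import Callable


def snapshot(root: Path) -> list[str]:
    """Sorted relative paths under root: a cheap fingerprint of what a run created."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


def is_consistent(run_once: Callable[[], list[str]], runs: int = 3) -> bool:
    """Call run_once `runs` times; True only if every result is identical."""
    results = [run_once() for _ in range(runs)]
    return all(result == results[0] for result in results)

# Hypothetical usage: run_once would clear an output directory, trigger the
# skill in a fresh session, then return snapshot(output_dir).
```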

My favorite testing trick: Trigger the skill, then deliberately give it a slightly wrong input. A good skill handles errors gracefully and tells the user what went wrong. A bad one crashes silently or produces garbage. Both are fixable if you catch them before publishing.
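On the tooling side, the same principle means every command a skill runs should report failure instead of swallowing it. Here's a generic Python sketch using `subprocess` (`run_step` is a made-up name), assuming a step is a single external command.

```python
import subprocess


def run_step(cmd: list[str]) -> tuple[bool, str]:
    """Run one command and always return (succeeded, explanation)."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        # The program itself is missing: say so instead of failing silently.
        return False, f"{cmd[0]} is not installed or not on PATH"
    if result.returncode != 0:
        detail = result.stderr.strip() or result.stdout.strip() or "no output"
        return False, f"{' '.join(cmd)} exited with code {result.returncode}: {detail}"
    return True, result.stdout.strip()


ok, message = run_step(["git", "status", "--short"])
print("OK" if ok else "FAILED", message)
```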

The 3 Skills That Failed (And What They Taught Me)

Skill #1: "Universal Code Reviewer." Too broad. It tried to review everything (security, performance, style, architecture) and did nothing well. The instructions were contradictory. Lesson: one skill, one domain. My "React Testing Reviewer" skill that replaced it works 10x better because it has a clear, narrow focus. Specificity isn't a limitation; it's a superpower.

Skill #2: "Database Migration Assistant." Hardcoded for PostgreSQL but didn't declare that dependency. Failed silently on MySQL. The agent just... did nothing. No error. Nothing. Lesson: always specify prerequisites and assumptions in the frontmatter. And in the body. Be explicit about what your skill needs to work.
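Declaring the assumption is half of it; the other half is checking it before doing anything. A Python sketch, assuming the skill's requirements can be expressed as command-line tools on PATH (`psql` and `pg_dump` are PostgreSQL's client tools; `missing_tools` is a hypothetical helper). It only proves the tools exist, not which database the project actually uses, so the body should still say so explicitly.

```python
import shutil


def missing_tools(required: list[str]) -> list[str]:
    """Return the required command-line tools that aren't on PATH."""
    return [tool for tool in required if shutil.which(tool) is None]


# A PostgreSQL-only migration skill should refuse loudly on a machine without it.
missing = missing_tools(["psql", "pg_dump"])
if missing:
    print(f"This skill assumes PostgreSQL; missing tools: {', '.join(missing)}")
```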

Skill #3: "Documentation Generator." Generated beautiful docs that were completely wrong because it didn't actually read the source code; it inferred behavior from comments. The docs looked authoritative but described behavior that didn't exist. Lesson: skills can't cheat. They need to do the actual work of reading, checking, validating. An assumption dressed up as a fact is worse than no information.

Each failure took me maybe 10 minutes to diagnose and fix. That's 10 minutes I could have saved by being more specific upfront, more honest about assumptions, and more thorough in testing. These days I budget 10% of my skill-writing time for testing. It's the most valuable 10% I spend.

Advanced: Chaining Skills Together

Once you have 3-4 skills working reliably, you can chain them into workflows. My typical workflow for a new feature looks like this:

project-setup scaffolds the directory structure and creates files
test-driven-development enforces the RED-GREEN-REFACTOR cycle
requesting-code-review runs the security scan and quality gates

The magic is that each skill handles its domain expertly. I don't have to remember to run the security scan; the code review skill does it. I don't have to remember the test structure; the TDD skill enforces it. I don't have to check linting; that's built into my pre-commit skill.

But start simple. Get one skill working perfectly before you chain anything. Chaining compounds both successes and failures. A chain of 3 mediocre skills produces a terrible experience. A single excellent skill produces a great one.
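Here's a toy model of why, in Python. Hermes runs the real chain; `run_chain` and the three step functions are hypothetical stand-ins for the project-setup, test-driven-development, and requesting-code-review skills. Each step receives the previous step's output, and the chain stops at the first failure instead of feeding a broken result into the next skill.

```python
from typing import Callable

Step = Callable[[dict], dict]


def run_chain(steps: list[tuple[str, Step]], state: dict) -> dict:
    """Run steps in order, passing state along; stop at the first failure."""
    for name, step in steps:
        try:
            state = step(state)
        except Exception as exc:
            # Record where and why it broke; later steps never see a bad state.
            return {**state, "failed_at": name, "error": str(exc)}
    return state


def setup_step(state: dict) -> dict:
    return {**state, "files": ["src/", "package.json"]}


def tdd_step(state: dict) -> dict:
    raise RuntimeError("no test runner configured")


def review_step(state: dict) -> dict:
    return {**state, "reviewed": True}


result = run_chain(
    [("project-setup", setup_step),
     ("test-driven-development", tdd_step),
     ("requesting-code-review", review_step)],
    {},
)
print(result["failed_at"], "-", result["error"])
# test-driven-development - no test runner configured
```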

The prerequisite system handles ordering. When Hermes loads my code-review skill, it first loads my logging skill (declared as a prerequisite), then the code-review instructions. I don't have to think about it. The agent just handles the dependency graph.

Common Mistakes I Still See (Including My Own)

Writing walls of text. If your skill body is more than 200 lines, it's probably doing too much. Split it. My longest skill is 140 lines. Most are 60-80.

Forgetting error paths. You describe the happy path beautifully. What happens when the command fails? When the file doesn't exist? When the user doesn't have permission? The error path is where 80% of the value lives.

Using "you should" instead of "do." "You should run the tests" is a suggestion. "Run the tests" is an instruction. Agents respond to imperatives. Save the shoulds for humans.
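Both the length rule and the "do, not should" rule are mechanical enough to lint. This Python sketch flags skill bodies over the 200-line mark and the hedge words called out earlier ("should", "consider", "optionally"); the word list and the threshold come from this article, not from Hermes.

```python
import re

MAX_BODY_LINES = 200  # this article's rule of thumb
HEDGES = re.compile(r"\b(?:you should|should|consider|optionally)\b", re.IGNORECASE)


def lint_body(body: str) -> list[str]:
    """Warn about overlong skill bodies and suggestion-style wording."""
    lines = body.splitlines()
    warnings = []
    if len(lines) > MAX_BODY_LINES:
        warnings.append(f"body is {len(lines)} lines (over {MAX_BODY_LINES}); split it")
    for number, line in enumerate(lines, start=1):
        for match in HEDGES.finditer(line):
            warnings.append(f"line {number}: {match.group(0)!r} -> rewrite as an imperative")
    return warnings


print(lint_body("You should run the tests\nRun the linter"))
# ["line 1: 'You should' -> rewrite as an imperative"]
```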

Not versioning. When you update your skill, bump the version. Otherwise you can't tell if users have the old buggy version or the new fixed one. I use semver. Major for breaking changes, minor for new features, patch for fixes.
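The bump rules are simple enough to script. A minimal Python sketch for plain MAJOR.MINOR.PATCH strings; it deliberately ignores the pre-release and build suffixes (like `1.0.0-beta`) that full semver allows.

```python
def bump(version: str, part: str) -> str:
    """Bump a plain MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":  # breaking change: reset minor and patch
        return f"{major + 1}.0.0"
    if part == "minor":  # new feature: reset patch
        return f"{major}.{minor + 1}.0"
    if part == "patch":  # bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump("1.4.2", "minor"))  # 1.5.0
```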

Publishing and Sharing

If you want to share your skill with the community, publish it as a skill file with proper metadata. The key fields for discoverability:

name: lowercase, hyphens, memorable. Not "my-skill-v2-final-FINAL."
description: what it does, not what it is. "Scaffolds Next.js projects with my testing setup" beats "A skill for project scaffolding."
tags: think about what people would search for, not what categories it belongs to.
category: helps with organization and discovery.
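The naming rule can be checked mechanically, even if "memorable" can't. This regex is my own reading of "lowercase, hyphens" (lowercase letters and digits joined by single hyphens), not a rule Hermes documents.

```python
import re

# Lowercase alphanumeric words joined by single hyphens, e.g. "project-setup".
NAME_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_good_name(name: str) -> bool:
    """True if the whole name matches the lowercase-hyphen pattern."""
    return NAME_RE.fullmatch(name) is not None


print(is_good_name("project-setup"))            # True
print(is_good_name("my-skill-v2-final-FINAL"))  # False
```

Note that "my-skill-v2-final" would pass the regex; it fails the "memorable" test, which still needs a human.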

I've published 4 skills to the community. The one with the most downloads? The boring project-setup one. Not the clever AI-powered whatever. Not the impressive architecture skill. The boring one that solves a real problem everyone has, every time they start a new project.

Build for yourself first. If you're the only person who uses it, that's still a win. Community adoption is a bonus, not the goal. My most-used skills are ones nobody else would find interesting; they encode my specific workflow. That's fine. They save me time. That's the point.

Your Turn

Open your terminal right now. Think of one thing you asked your agent to do today that you've asked it to do before. That's your first skill. Write the frontmatter. Write 5 specific instructions. Test it in a new session.

It won't be perfect. Mine never are. My first skill still embarrasses me when I look at it. But by the third iteration, you'll have something that saves you genuine time every single day. And that compounding adds up fast: 5 minutes saved per task across 20 tasks a day is 100 minutes, every day.

What's the first skill you're going to build? I genuinely want to know. Drop it in the comments and I'll help you refine the approach. If you're stuck on triggers or structure, I've been there.