How to Write a SKILL.md That Actually Works: The Complete Guide
Most AI skills are vague, unstructured, and underperform. This guide breaks down the anatomy of an effective SKILL.md - from YAML frontmatter to trigger phrases - with real examples and a scoring rubric.
Most AI skills are written once, never reviewed, and silently underperform. They're vague (“help me write better code”), unstructured (one long paragraph), and missing the components that make AI agents actually follow instructions. The result: your AI assistant ignores your skill half the time, hallucinates the other half, and you blame the model instead of the prompt.
This guide changes that. We'll break down every component of an effective SKILL.md file - from YAML frontmatter to trigger phrases to negative examples - with real templates you can copy and a scoring rubric to measure quality. Whether you're writing your first skill or improving your 50th, this is the reference.
What Is a SKILL.md File?
A SKILL.md file is a structured Markdown document that teaches an AI coding tool how to perform a specific task. It lives in a tool's configuration directory - ~/.claude/skills/ for Claude Code, ~/.cursor/rules/ for Cursor, ~/.codex/skills/ for Codex - and gets loaded when the tool starts a session.
The SKILL.md pattern was introduced by Anthropic in October 2025 as part of the Claude Code agent skills specification. OpenAI adopted the same format for Codex CLI in early 2026, and OpenClaw uses it as its core plugin format. This means a well-written SKILL.md is portable across all three platforms - write once, works everywhere.
Think of a SKILL.md as a job description for your AI. A bad job description leads to bad work. A precise, structured one leads to consistent, high-quality output.
The Anatomy of an Effective Skill
Every great SKILL.md has 6 components. Miss one and the skill degrades. Here's what they are and why each matters:
1. YAML Frontmatter (Metadata)
The frontmatter is how tools index, discover, and categorize your skill. Without it, the skill is just a floating document with no identity. With it, the tool knows what the skill is called, what it does, and when to suggest it.
```yaml
---
name: code-review
description: Reviews code for bugs, style issues, and security vulnerabilities
version: 1.2.0
tags: [review, quality, security]
triggers:
  - "review this code"
  - "check my code"
  - "find bugs"
---
```
Why version matters: Without a version number, you can't tell whether you're running the version that fixed the false-positive problem or the old one that flagged everything. Increment the patch number (1.2.0 → 1.2.1) for small fixes and the minor number (1.2.0 → 1.3.0) for new capabilities.
Why tags matter: Tags enable filtering and organization. When you have 30+ skills, finding the right one by name alone is painful. Tags like review, testing, documentation let you browse by function.
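To make the frontmatter concrete, here's a minimal sketch of how a tool might extract it from a SKILL.md file. It's stdlib-only and deliberately simple: it handles flat `key: value` pairs and inline lists like `tags: [a, b]`, not full YAML - a real tool would use a proper YAML parser.

```python
import re

def parse_frontmatter(skill_text: str) -> dict:
    """Extract flat key: value pairs from a SKILL.md frontmatter block.

    A stdlib-only sketch, not full YAML: nested structures like the
    multi-line triggers list are skipped.
    """
    match = re.match(r"^---\n(.*?)\n---", skill_text, re.DOTALL)
    if not match:
        return {}  # no frontmatter block at the top of the file
    meta = {}
    for line in match.group(1).splitlines():
        # Skip list items ("- ...") and lines without a key.
        if ":" not in line or line.lstrip().startswith("-"):
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        # Turn inline lists like "[review, quality]" into Python lists.
        if value.startswith("[") and value.endswith("]"):
            value = [v.strip() for v in value[1:-1].split(",")]
        meta[key.strip()] = value
    return meta

skill = """---
name: code-review
version: 1.2.0
tags: [review, quality, security]
---
# Code Review
"""
meta = parse_frontmatter(skill)
print(meta["name"], meta["version"], meta["tags"])
# code-review 1.2.0 ['review', 'quality', 'security']
```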
2. Clear, Numbered Instructions
This is where most skills fail. The difference between a 2/5 and a 4/5 skill is almost always the instructions section. Here's the principle: tell the AI what to DO step by step, not what to KNOW.
❌ Bad: vague knowledge dump
“You are an expert code reviewer. Review code carefully, looking for bugs and issues. Provide helpful feedback.”
Problem: every AI already tries to do this. This skill adds zero value.
✅ Good: specific action steps
“When reviewing code, follow these 6 steps in order: 1) Read the entire file before commenting. 2) List all potential null pointer issues. 3) Check for SQL injection...”
The AI follows a checklist. Output is consistent and thorough.
A good rule of thumb: if you can replace your skill text with “do a good job” and the output wouldn't meaningfully change, your instructions aren't specific enough.
3. Input/Output Examples
Examples are the single most impactful thing you can add to a skill. They transform vague instructions into concrete patterns the AI can replicate. Without examples, the AI interprets your instructions creatively (read: unpredictably). With examples, it follows the demonstrated pattern.
````markdown
## Examples

### Input:
```python
def get_user(id):
    user = db.query("SELECT * FROM users WHERE id = " + id)
    return user.name
```

### Expected review output:
**Line 2: SQL Injection** - User input `id` is concatenated directly
into the SQL query. Use parameterized queries instead:
```python
user = db.query("SELECT * FROM users WHERE id = %s", (id,))
```

**Line 3: Null handling** - `user` may be None if the ID doesn't
exist. Add a null check before accessing `.name`.
```` 
Notice the example doesn't just show the output - it shows the format (bold issue title, line reference, code fix). The AI will replicate this format for all future reviews.
How many examples? 2-3 is the sweet spot. One example shows a pattern but might be coincidence. Two examples confirm the pattern. Three examples with edge cases make it rock-solid. More than 5 wastes context window.
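The fixes from that example can be demonstrated end to end with Python's built-in sqlite3 module. One assumption to note: sqlite3 uses `?` placeholders, while the `%s` style shown above matches drivers like psycopg2 - placeholder syntax depends on the database driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada')")

def get_user_name(user_id):
    # Parameterized query: the driver treats user_id as data, never as
    # SQL, so an input like "1 OR 1=1" can't alter the query.
    row = conn.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    # Null handling: fetchone() returns None when the id doesn't exist.
    return row[0] if row else None

print(get_user_name(1))    # Ada
print(get_user_name(999))  # None
```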
4. Trigger Phrases
Triggers tell the AI when to activate this skill. Without triggers, the skill sits in the context but never fires unless the user explicitly invokes it. With triggers, the AI automatically recognizes that “review this PR” means “use the code-review skill.”
```markdown
## Triggers
Activate this skill when the user says any of:
- "review this code"
- "review this PR"
- "check for bugs"
- "what's wrong with this code"
- "security review"
- "find issues"
```
Good triggers are natural language variations of the same intent. Think about how you'd ask a colleague - those are your triggers. Include at least 4-6 variations.
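As a rough sketch of what happens under the hood, here's a hypothetical trigger matcher using lowercase substring matching. Real tools typically let the model itself judge intent, so trigger phrases act as hints rather than exact-match rules - this is just to show why phrase variations matter.

```python
def matches_trigger(message: str, triggers: list[str]) -> bool:
    """Return True if any trigger phrase appears in the user message.

    Deliberately simple: case-insensitive substring matching. More
    variations in the trigger list mean more phrasings get caught.
    """
    normalized = message.lower()
    return any(t.lower() in normalized for t in triggers)

triggers = ["review this code", "check for bugs", "security review"]
print(matches_trigger("Can you review this code before I merge?", triggers))  # True
print(matches_trigger("Write a unit test for this", triggers))                # False
```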
5. Negative Examples (What NOT to Do)
This is the most underused component. Negative examples prevent the AI from doing things that seem reasonable but are actually wrong for your use case. They're guard rails.
```markdown
## Don't
- Don't suggest complete rewrites unless explicitly asked
- Don't comment on formatting (that's what linters are for)
- Don't be vague - every comment needs a specific line reference
- Don't flag style preferences as bugs (e.g., tabs vs spaces)
- Don't review generated code (test files, migrations, etc.)
```
Without “Don't” rules, the AI will happily suggest rewriting your entire file when you asked for a review, or flag every indentation inconsistency as a bug. Negative examples save you from “helpful” AI behavior that's actually annoying.
6. Reference Files (Advanced)
For complex skills, the instructions alone aren't enough. A code review skill works better if the AI also has your team's coding standards. A documentation skill improves with your doc template as context. Reference files solve this.
Place reference files alongside your SKILL.md in a references/ folder:
```
code-review/
├── SKILL.md                  # The skill instructions
├── references/
│   ├── coding-standards.md   # Your team's style guide
│   ├── security-checklist.md # OWASP-based checks
│   └── review-template.md    # Output format template
└── scripts/
    └── pre-review-hook.sh    # Optional automation
```
The AI reads these files as additional context. Your code review skill now knows YOUR team's standards, not generic best practices. This is the difference between a generic AI reviewer and one that sounds like your senior engineer.
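Here's a sketch of how a tool might assemble that context: concatenate SKILL.md with every Markdown file under references/. The labeling format and loading order are assumptions for illustration - actual tools have their own loading and truncation rules.

```python
import tempfile
from pathlib import Path

def load_skill_context(skill_dir: Path) -> str:
    """Concatenate SKILL.md with every Markdown file under references/."""
    parts = [(skill_dir / "SKILL.md").read_text()]
    refs = skill_dir / "references"
    if refs.is_dir():
        for ref in sorted(refs.glob("*.md")):
            # Label each reference so the model knows where it came from.
            parts.append(f"\n--- reference: {ref.name} ---\n{ref.read_text()}")
    return "\n".join(parts)

# Demo with a throwaway skill directory:
with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "code-review"
    (skill / "references").mkdir(parents=True)
    (skill / "SKILL.md").write_text("# Code Review\n")
    (skill / "references" / "coding-standards.md").write_text("Max 30-line functions.\n")
    context = load_skill_context(skill)
    print(context)
```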
The Complete Template
Copy this template and fill in the blanks. It includes all 6 components:
````markdown
---
name: your-skill-name
description: One sentence explaining what this skill does
version: 1.0.0
tags: [tag1, tag2, tag3]
---

# Skill Name

Brief explanation of when and why to use this skill.

## Steps

1. **First step** - what the AI should do first
2. **Second step** - what comes next
3. **Third step** - and so on
4. **Final step** - how to conclude

## Examples

### Example 1: [scenario name]

**Input:**
```
[example input]
```

**Expected output:**
[example of correct AI behavior]

### Example 2: [edge case]

**Input:**
```
[tricky input]
```

**Expected output:**
[how the AI should handle this edge case]

## Triggers

Activate this skill when the user says:
- "[trigger phrase 1]"
- "[trigger phrase 2]"
- "[trigger phrase 3]"

## Don't

- Don't [common mistake 1]
- Don't [common mistake 2]
- Don't [common mistake 3]
````
The 5 Quality Dimensions
Use this rubric to evaluate any skill. Each dimension is scored 0 (missing) or 1 (present), for a total of 0-5. Most skills start at 1-2. A good skill scores 4-5.
Clarity
Can someone who's never seen this skill understand what it does in 10 seconds?
The name and description are self-explanatory. Instructions use simple language. No jargon without explanation.
Specificity
Are instructions concrete enough that two different AI models would behave the same way?
Steps are numbered and unambiguous. 'Check for bugs' is vague; 'List all potential null pointer dereferences' is specific.
Structure
Is information organized with headings, lists, and clear sections?
Uses ## headings for sections, numbered lists for steps, code blocks for examples. Scannable in 30 seconds.
Completeness
Are edge cases, examples, and negative examples included?
Has at least 2 input/output examples, a Don't section, and handles at least one non-obvious edge case.
Actionability
Does every instruction tell the AI what to DO, not just what to KNOW?
'Review code carefully' = knowledge (useless). 'For each function, check: null handling, error handling, return type correctness' = action (useful).
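The rubric can be partially automated. Below is a hedged sketch that scores only the mechanical proxies for each dimension (presence of sections, numbered steps, bolded action verbs) - judging whether the content is actually clear and actionable still needs a human or AI reviewer, and the specific regexes are illustrative assumptions.

```python
import re

def score_skill(text: str) -> int:
    """Score a SKILL.md 0-5 using crude structural proxies for the
    five rubric dimensions. Checks that components exist, not that
    they're good."""
    checks = [
        # Clarity: frontmatter declares both a name and a description.
        bool(re.search(r"^name:", text, re.M) and re.search(r"^description:", text, re.M)),
        # Specificity: at least three numbered steps.
        len(re.findall(r"^\d+\.", text, re.M)) >= 3,
        # Structure: multiple ## section headings.
        len(re.findall(r"^## ", text, re.M)) >= 3,
        # Completeness: Examples and Don't sections both present.
        "## Examples" in text and "## Don't" in text,
        # Actionability (crude proxy): steps lead with a bolded action.
        bool(re.search(r"^\d+\. \*\*", text, re.M)),
    ]
    return sum(checks)

vague = "---\nname: code-review\n---\nReview code carefully."

structured = """---
name: code-review
description: Reviews code with line-specific feedback
---
## Steps
1. **Read the file**
2. **Check for bugs**
3. **Format each issue**
## Examples
...
## Triggers
...
## Don't
...
"""

print(score_skill(vague))       # 0
print(score_skill(structured))  # 5
```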
Common Mistakes (and How to Fix Them)
After reviewing hundreds of skills, these are the 7 most common failures:
❌ Too vague
"Write good code" - the AI already tries to do this. Your skill adds zero value.
Fix: Replace with specific checklist: 'Check for: 1) null handling 2) error propagation 3) type safety 4) naming consistency'
❌ Too long
5,000+ word skills get truncated by context windows. The AI literally can't read the whole thing.
Fix: Keep skills under 1,500 words. Split large skills into focused sub-skills (code-review-security, code-review-style, etc.)
❌ No examples
The AI guesses instead of following a pattern. Output is inconsistent between sessions.
Fix: Add 2-3 input/output pairs showing exactly the format and depth you expect.
❌ No triggers
The skill sits in context but never activates. User has to manually invoke it by name.
Fix: Add 4-6 natural language trigger phrases.
❌ No negatives
The AI does 'helpful' things you don't want - rewrites code, flags formatting, comments on everything.
Fix: Add a 'Don't' section with 3-5 specific things to avoid.
❌ No versioning
One bad edit and your best skill is gone. No way to compare what changed.
Fix: Use version numbers in frontmatter. Better: use a skill manager like Praxl that versions automatically.
❌ Copy-paste syndrome
Same skill copy-pasted to 3 tool directories. They've already drifted apart.
Fix: Single source of truth. Edit once, deploy everywhere - manually or with a sync tool.
Real-World Example: Before and After
Let's take a real skill and improve it. Here's a typical “code review” skill someone might write on their first attempt:
Before (score: 1/5)
```markdown
---
name: code-review
---
You are an expert code reviewer. Review the code carefully and
provide helpful, constructive feedback. Focus on bugs, style,
and best practices.
```
Problems: no description, no version, no tags, no examples, no triggers, no negatives. Instructions are vague. Score: Clarity 1, Specificity 0, Structure 0, Completeness 0, Actionability 0.
After (score: 5/5)
```markdown
---
name: code-review
description: >-
  Reviews code for bugs, style, and security issues
  with line-specific feedback and suggested fixes
version: 1.2.0
tags: [review, quality, security]
---

# Code Review

Perform a structured code review focusing on correctness,
security, and maintainability.

## Steps

1. **Read the entire file** before making any comments
2. **Check for bugs**: null handling, off-by-one errors,
   race conditions, unhandled exceptions
3. **Check for security**: SQL injection, XSS, auth bypasses,
   hardcoded secrets, unsafe deserialization
4. **Check for maintainability**: naming clarity, function
   length (>30 lines = flag), duplication, dead code
5. **Format each issue as**:
   - Bold title with severity (Bug/Security/Style)
   - Line number reference
   - Brief explanation
   - Code suggestion for the fix

## Examples

### Example review comment:
**Bug (Line 42): Null pointer** - `user.getName()` will throw
if `findUser()` returns null on missing ID.

Fix: `String name = user != null ? user.getName() : "Unknown";`

## Triggers
- "review this code"
- "check for bugs"
- "security review"
- "what's wrong here"

## Don't
- Don't suggest complete rewrites unless asked
- Don't comment on formatting (use linters)
- Don't review auto-generated code
```
All 6 components present. Clear, specific, structured, complete, actionable. Score: 5/5.
Where to Put Your Skills
Each AI tool has its own directory:
| Tool | Skill Directory |
|---|---|
| Claude Code | ~/.claude/skills/ |
| Cursor | ~/.cursor/rules/ |
| Codex CLI | ~/.codex/skills/ |
| GitHub Copilot | ~/.github/copilot/ |
| Windsurf | ~/.windsurf/skills/ |
| OpenCode | ~/.opencode/skills/ |
| Gemini CLI | ~/.gemini/skills/ |
If you use more than one tool, you need the same skill in multiple directories. This is where “skill fragmentation” starts - and why tools like Praxl exist to manage the sync for you.
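A minimal sketch of the single-source-of-truth fix: keep one canonical copy and copy it into each tool's directory. The directory names come from the table above, and this naive version assumes every tool accepts the same SKILL.md layout - a real sync tool would also translate per-tool format differences.

```python
import shutil
from pathlib import Path

# Tool directories that accept the skill-folder layout (see table above).
TOOL_DIRS = [
    "~/.claude/skills",
    "~/.codex/skills",
    "~/.windsurf/skills",
]

def deploy_skill(canonical: str, skill_name: str, tool_dirs=TOOL_DIRS):
    """Copy one canonical SKILL.md into every tool directory."""
    src = Path(canonical)
    for d in tool_dirs:
        dest = Path(d).expanduser() / skill_name
        dest.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest / "SKILL.md")
```

Run it after every edit (or from a git hook) and all tools stay in sync with the canonical copy.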
Next Steps
1. Write one skill using the template above. Start with your most common task - code review, test writing, or documentation.
2. Score it against the 5 dimensions. Be honest - most first drafts score 2/5.
3. Add examples - this is the single highest-impact improvement you can make.
4. Test it - use the skill 5 times and note where the AI deviates from what you expected. Fix the gaps.
5. Version it - bump the version number with every change so you can compare versions and roll back. If you're managing more than a handful of skills, use Praxl's AI review to get an automated quality score and improvement suggestions.
The best AI skills aren't written - they're iterated. Write v1, test, improve, version. The skill that scores 5/5 was a 2/5 three versions ago.
Manage your skills with Praxl
Edit once, deploy to every AI tool. Version history, AI review, team sharing.
Try Praxl free