The 5 Dimensions of AI Skill Quality: How to Score and Improve Your Prompts
Clarity, specificity, structure, completeness, actionability - a framework for evaluating any AI skill. Includes a self-assessment checklist and before/after examples.
Most AI skills score 1-2 out of 5. They're vague instructions wrapped in Markdown that the AI half-follows. The difference between a 2/5 skill and a 5/5 skill is measurable: a high-quality skill produces consistent output across sessions, requires fewer follow-up corrections, and saves 15-30 minutes per use. A low-quality skill is marginally better than having no skill at all.
This guide introduces a practical scoring framework with 5 dimensions, a self-assessment checklist, and a complete before/after transformation showing how to take a skill from score 1 to score 5, one dimension at a time.
The 5 Quality Dimensions
Each dimension is binary: 0 (missing/inadequate) or 1 (present/adequate). Total score ranges from 0 to 5. This isn't a subjective rubric - each dimension has concrete criteria you can check in 30 seconds.
Dimension 1: Clarity
Clarity measures whether someone who has never seen your skill can understand what it does within 10 seconds of reading it. This means the name is self-explanatory, the description is one sentence of plain English, and the instructions don't use unexplained jargon or assume context that isn't provided.
The test is simple: show the skill to a colleague who didn't write it. Can they tell you what it does without asking questions? If they need to read the whole document to understand the purpose, clarity is 0. If the name and first sentence tell the story, clarity is 1.
Clarity failures compound with team size. A solo developer can tolerate a cryptically-named skill because they wrote it and remember the context. On a team of 5, that same skill becomes a mystery that nobody uses because nobody knows what it does.
Score 0: Unclear
```markdown
---
name: cr-v2
description: does code stuff
---
Check things and report.
```
Name is an abbreviation. Description is meaningless. Instructions say nothing.
Score 1: Clear
```markdown
---
name: code-review
description: Reviews code for bugs, security issues, and maintainability problems
---
Perform a structured code review...
```
Name is descriptive. Description tells you what it does and what it checks for.
Common pitfall: Using abbreviations in skill names. “cr” could mean code review, change request, or carriage return. Always spell it out.
Dimension 2: Specificity
Specificity measures whether the instructions are concrete enough that two different AI models would produce the same output. Vague instructions like “review code carefully” are interpreted differently by every model in every session. Specific instructions like “check for null handling on every function parameter” produce consistent results.
The litmus test: can you replace your skill's instructions with “do a good job” and get meaningfully different output? If not, your skill isn't specific enough. Every instruction should constrain the AI's behavior in a way that “do a good job” doesn't.
Specificity is the dimension with the highest impact on output quality. A skill that scores 1 on specificity and 0 on everything else still outperforms a skill that scores 1 on all other dimensions but 0 on specificity. Concrete steps beat good formatting every time.
Score 0: Vague
```markdown
Review the code thoroughly and provide helpful feedback on quality and style.
```
Every AI already tries to do this. Zero constraining value.
Score 1: Specific
```markdown
1. Check every function for: null params, missing return types, unhandled exceptions
2. Flag functions over 30 lines
3. Format: [Severity] (Line N): Title
```
Concrete checklist. Measurable thresholds. Output format defined.
Common pitfall: Using adjectives instead of criteria. “Clean code” is subjective. “Functions under 30 lines with no more than 3 parameters” is measurable.
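This pitfall check can even be automated with a quick scan for subjective adjectives. A minimal sketch in TypeScript (the word list and function name are illustrative, not from any real tool):

```typescript
// Scan an instruction for subjective adjectives that should be replaced
// with measurable criteria. This word list is illustrative, not exhaustive.
const VAGUE = ["thorough", "careful", "comprehensive", "clean", "good", "proper", "robust"];

function findVagueWords(instruction: string): string[] {
  const words = instruction.toLowerCase().match(/[a-z]+/g) ?? [];
  // startsWith also catches adverb forms like "thoroughly" and "carefully".
  return VAGUE.filter((adj) => words.some((w) => w.startsWith(adj)));
}
```

An instruction like “Flag functions over 30 lines” comes back clean, while “Review the code thoroughly and be careful” gets flagged.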
Dimension 3: Structure
Structure measures whether the skill is organized with headings, lists, and clear sections that the AI can parse efficiently. A wall of text is harder for the AI to follow than a numbered list with bold headings. Structure isn't about aesthetics - it directly affects how well the AI extracts and follows instructions.
LLMs process structured text better than prose. When instructions are in a numbered list, the model can track “I've completed step 3, now I need step 4.” When instructions are in a paragraph, the model has to parse which sentence is an instruction and which is context. Numbered steps reduce instruction-skipping by roughly 40% in practice.
The ideal structure for a skill is: heading (“Steps”), numbered list of actions, heading (“Examples”), input/output pairs, heading (“Don't”), bullet list of prohibitions. This is the pattern that tools like Claude Code and Codex CLI are optimized for.
Score 0: Unstructured
```markdown
You should review the code and check for bugs. Also look for security issues. Make sure to provide examples. Don't forget to check error handling too. Format your output nicely.
```
Score 1: Structured
```markdown
## Steps
1. **Bugs**: null handling, off-by-one
2. **Security**: injection, auth bypass
3. **Errors**: unhandled exceptions

## Output format
[Severity] (Line N): Title - fix
```
Common pitfall: Over-structuring. A skill with 10 heading levels and 50 bullet points is as hard to follow as a wall of text. Keep it to 2-3 sections with 4-8 items each.
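The over-structuring bound can be spot-checked mechanically. A rough sketch, assuming skills use `##` headings with numbered or `-` list items (the function name and thresholds beyond the pitfall's own numbers are illustrative):

```typescript
// Flag over-structured skills: more than 3 sections, or any section
// with more than 8 list items, per the guideline above.
function isOverStructured(skill: string): boolean {
  // Split the file on level-2 headings; the chunk before the first heading is ignored.
  const sections = skill.split(/^## /m).slice(1);
  if (sections.length > 3) return true;
  return sections.some((body) => {
    const items = (body.match(/^\s*(?:\d+\.|-)\s/gm) ?? []).length;
    return items > 8;
  });
}
```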
Dimension 4: Completeness
Completeness measures whether the skill includes all components needed for reliable AI behavior: examples, edge cases, triggers, and negative examples (Don'ts). A skill with perfect instructions but no examples is like a recipe with ingredient quantities but no photos of the final dish - the cook has to guess what “done” looks like.
The minimum bar for completeness is: at least 2 input/output examples, a Don't section with 3+ items, and trigger phrases. The examples show the AI the expected format and depth. The Don't section prevents the most common unwanted behaviors. The triggers enable automatic activation.
Completeness is the dimension most developers skip because it feels like extra work. Writing 2 examples takes 10 minutes. But those 10 minutes prevent hours of correcting AI output that almost-but-not-quite matches what you wanted. Examples are the single highest-ROI component of any skill.
Score 0: Incomplete
```markdown
## Steps
1. Review the code
2. Report issues
```
(no examples, no triggers, no don'ts)
Score 1: Complete
```markdown
## Steps (4 numbered items)
## Examples (2 input/output pairs)
## Triggers (5 phrases)
## Don't (4 prohibitions)
```
Common pitfall: Examples that are too simple. Showing a trivial case doesn't teach the AI how to handle complexity. Include at least one example with an edge case or non-obvious scenario.
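The minimum bar above is easy to verify in code. A sketch, assuming the skill follows the `## Examples` / `## Triggers` / `## Don't` layout used throughout this guide (the function and its counting rules are illustrative):

```typescript
// Check the completeness bar: 2+ examples, 3+ Don't items, 1+ trigger phrases.
function isComplete(skill: string): boolean {
  // Extract one section's body: everything between "## <name>" and the next "## ".
  const section = (name: string): string => {
    const re = new RegExp(`## ${name}\\n([\\s\\S]*?)(?=\\n## |$)`);
    return re.exec(skill)?.[1] ?? "";
  };
  const examples = (section("Examples").match(/^### /gm) ?? []).length;
  const donts = (section("Don't").match(/^- /gm) ?? []).length;
  const triggers = (section("Triggers").match(/^- /gm) ?? []).length;
  return examples >= 2 && donts >= 3 && triggers >= 1;
}
```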
Dimension 5: Actionability
Actionability measures whether every instruction tells the AI what to do, not what to know. “You are an expert code reviewer” is knowledge (the AI ignores it - it's already trying to be helpful). “For each function, check: null handling, error propagation, return type correctness” is action (the AI does something specific).
The easiest test: read each sentence in your skill and ask “can the AI act on this right now?” If the answer is “it depends” or “it's background info,” the sentence isn't actionable. Cut it or convert it to an action step. Every sentence should either be an instruction the AI follows or an example it imitates.
Actionability is what separates skills that “kinda work” from skills that “just work.” A skill with high actionability produces the same output regardless of the model's mood, the conversation context, or whether it's 2 AM and the API is slow. The instructions are so concrete that interpretation is minimized.
Score 0: Knowledge-based
```markdown
You are an expert code reviewer with deep knowledge of security best practices and software architecture principles.
```
Score 1: Action-based
```markdown
1. Read the entire diff before commenting
2. For each function, check: null params, missing error handling, SQL injection
3. Format: **[Bug] (Line N):** description
```
Common pitfall: Starting with “You are...” or “Act as...” preambles. These waste tokens and don't change behavior. Skip the identity statement and go straight to instructions.
Self-Assessment: 15 Questions
Answer these yes/no questions for any skill. Count the “yes” answers in each group of 3 to get your dimension score (0 or 1 - score 1 if at least 2 of 3 are “yes”).
Clarity
- Can someone understand what this skill does from the name alone?
- Is the description one clear sentence with no jargon?
- Could a junior developer use this skill without asking you what it means?

Score 1 if at least 2 answers are “yes”.

Specificity
- Does every instruction include measurable criteria (numbers, thresholds, formats)?
- Would replacing your instructions with “do a good job” change the AI's output?
- Are there zero sentences that use vague adjectives (thorough, careful, comprehensive)?

Score 1 if at least 2 answers are “yes”.

Structure
- Are instructions in a numbered list (not a paragraph)?
- Does the skill have distinct sections with headings (Steps, Examples, Don't)?
- Can you find any specific instruction within 5 seconds of opening the file?

Score 1 if at least 2 answers are “yes”.

Completeness
- Are there at least 2 input/output examples?
- Is there a Don't section with 3+ items?
- Are there trigger phrases for automatic activation?

Score 1 if at least 2 answers are “yes”.

Actionability
- Does every sentence tell the AI what to DO (not what to KNOW or BE)?
- Are there zero “You are...” or “Act as...” preambles?
- Could the AI execute each instruction without asking for clarification?

Score 1 if at least 2 answers are “yes”.
Scoring math: Add up dimension scores. Total = Clarity + Specificity + Structure + Completeness + Actionability. Range: 0-5. Most skills start at 1-2. Target: 4-5.
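The scoring math above can be written out directly. A minimal sketch of the 2-of-3 rule and the 0-5 total (the function names are illustrative):

```typescript
type Answers = [boolean, boolean, boolean];

// One dimension = 3 yes/no answers; score 1 if at least 2 are "yes".
function dimensionScore(answers: Answers): 0 | 1 {
  return answers.filter(Boolean).length >= 2 ? 1 : 0;
}

// Total across the 5 dimensions, range 0-5. Target: 4-5.
function totalScore(dimensions: Answers[]): number {
  return dimensions.reduce((sum, d) => sum + dimensionScore(d), 0);
}
```

A skill with two solid dimensions and three weak ones scores 2/5, matching the observation that most skills start at 1-2.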
Full Transformation: Score 1 to Score 5
Let's take a real skill and improve it step by step through all 5 dimensions. Each step adds one dimension, and you can see exactly how the quality and output consistency improve.
Starting Point: Score 1/5
```markdown
---
name: test-gen
---
You are an expert test writer. Write comprehensive tests for the given code. Make sure to cover edge cases and use good testing practices. The tests should be thorough.
```
Score: Clarity 1 (name is okay), Specificity 0, Structure 0, Completeness 0, Actionability 0 = 1/5. The name gives a hint, but everything else is vague platitudes.
Step 1: Add Specificity (Score 2/5)
```markdown
---
name: test-gen
---
You are an expert test writer. When writing tests:
1. Write a happy path test for the most common input
2. Write edge case tests: empty input, null, boundary values, negative numbers, max length strings
3. Write error case tests: invalid types, network failures, permission denied
4. Use descriptive names: "should return empty array when user has no orders"
5. One assertion per test (or closely related assertions)
6. Use the project's test framework (Jest/Vitest)
```
Score: Clarity 1, Specificity 1, Structure 0, Completeness 0, Actionability 0 = 2/5. Now we have concrete steps, but they're mixed into the preamble and there are no examples.
Step 2: Add Structure (Score 3/5)
```markdown
---
name: test-writer
description: Generates tests with edge cases and meaningful assertions
version: 1.0.0
tags: [testing, quality]
---

# Test Writer

## Steps
1. **Happy path** - test the most common successful usage
2. **Edge cases** - empty input, null, boundary values, negative numbers
3. **Error cases** - invalid types, network failure, permission denied
4. **Naming** - "should [expected result] when [condition]"
5. **Assertions** - one per test, testing behavior not implementation
6. **Framework** - use Jest/Vitest with the project's conventions
```
Score: Clarity 1, Specificity 1, Structure 1, Completeness 0, Actionability 0 = 3/5. Now it's organized with headings and a clean numbered list. But still no examples or negative rules.
Step 3: Add Completeness (Score 4/5)
````markdown
---
name: test-writer
description: Generates tests with edge cases and meaningful assertions
version: 1.0.0
tags: [testing, quality]
---

# Test Writer

## Steps
1. **Happy path** - test the most common successful usage
2. **Edge cases** - empty, null, boundary, negative, max length
3. **Error cases** - invalid types, network failure, permission denied
4. **Naming** - "should [expected result] when [condition]"
5. **Assertions** - one per test, test behavior not implementation

## Examples

### Given: calculateDiscount(price, tier)
```typescript
it("should apply 20% for gold tier", () => {
  expect(calculateDiscount(100, "gold")).toBe(80);
});
it("should return 0 for price of 0", () => {
  expect(calculateDiscount(0, "gold")).toBe(0);
});
```

## Triggers
- "write tests for"
- "add test coverage"
- "test this function"

## Don't
- Don't write tests that only check toBeDefined()
- Don't mock everything - only external dependencies
- Don't test private methods or internal state
- Don't skip the edge cases for "simplicity"
````
Score: Clarity 1, Specificity 1, Structure 1, Completeness 1, Actionability 0 = 4/5. We have examples, triggers, and Don'ts. The only remaining gap is actionability: the “You are...” preamble is gone, but several steps still describe categories to cover rather than concrete actions to take.
Step 4: Add Actionability (Score 5/5)
````markdown
---
name: test-writer
description: Generates tests with edge cases and meaningful assertions
version: 1.1.0
tags: [testing, quality, jest, vitest]
---

# Test Writer

Generate thorough tests for the given code.

## Steps

1. **Analyze the function** - list its inputs, outputs, and side effects
2. **Write happy path test** - most common successful input/output
3. **Write 3+ edge case tests** - empty input, null, 0, boundary values, negative numbers, max-length strings
4. **Write 2+ error case tests** - invalid type, missing required param, network timeout, permission denied
5. **Name each test** - "should [verb] [expected] when [condition]"
6. **Assert specific values** - use toBe(80), not toBeTruthy()
7. **Add setup/teardown** if tests share state or need mocks

## Examples

### Given: calculateDiscount(price: number, tier: string): number
```typescript
describe("calculateDiscount", () => {
  it("should apply 20% discount for gold tier", () => {
    expect(calculateDiscount(100, "gold")).toBe(80);
  });
  it("should return 0 when price is 0", () => {
    expect(calculateDiscount(0, "gold")).toBe(0);
  });
  it("should handle negative prices by returning 0", () => {
    expect(calculateDiscount(-50, "gold")).toBe(0);
  });
  it("should throw on invalid tier", () => {
    expect(() => calculateDiscount(100, "platinum")).toThrow();
  });
});
```

## Triggers
- "write tests for"
- "add test coverage"
- "test this function"
- "create unit tests"

## Don't
- Don't write tests that only check toBeDefined() or toBeTruthy()
- Don't mock everything - only mock external dependencies (DB, API)
- Don't test implementation details (private methods, internal state)
- Don't skip edge cases to save time
````
Score: Clarity 1, Specificity 1, Structure 1, Completeness 1, Actionability 1 = 5/5. Every instruction is an action. “Write 3+ edge case tests” is something the AI can do immediately. “Analyze the function” with specific sub-tasks (list inputs, outputs, side effects) tells the AI exactly what analyzing means.
Impact of Each Score Level
To illustrate the practical difference, here's what you can expect at each score level when using a code review skill:
Score 1
Generic feedback. “This code could be improved.” No line references. Mix of useful and useless comments. You spend 5 minutes filtering signal from noise.
Score 2
Some specific findings but inconsistent format. Catches obvious bugs but misses subtle ones. Output varies between sessions.
Score 3
Organized findings with categories. Consistent format. Catches most bugs. But sometimes flags things you don't care about (style, formatting).
Score 4
Line-specific findings with severity levels. Consistent every time. Skips formatting issues. But occasionally misses edge cases you'd catch manually.
Score 5
Precise, line-referenced findings in your exact format. Catches null handling, security issues, and maintainability problems in priority order. Consistent across sessions and models. You trust the output.
Using the Score
Score every skill you write against this rubric before deploying it. A 3/5 is your minimum for production use - below that, the skill is too unreliable to depend on. Focus improvement effort on the missing dimensions, starting with Specificity (highest impact) and Completeness (second highest).
For teams, make the scoring part of your skill review process. When someone proposes a new skill, score it as a team. Reject anything below 3/5. It takes 30 minutes to improve a 2/5 to a 4/5 - and that 30 minutes saves hours of inconsistent AI output across the team.
The scoring framework isn't subjective. Two people scoring the same skill will arrive at the same number because each dimension has binary, verifiable criteria. This makes it useful for team standards: “All skills in our library must score at least 4/5” is an enforceable rule, not a vague aspiration.