How to Write AI Phone Screening Questions That Work
Four question types for AI phone screens: logistics, experience, behavioral, motivation. 8-12 questions, 10-15 minutes. Includes rubric framework and role examples.

TL;DR: Effective AI phone screens use four question types: logistics (eligibility verification), experience (quantifiable background), behavioral (situational problem-solving), and motivation (genuine interest). Target 8-12 questions in 10-15 minutes; completion rates drop below 60% past 15 minutes (Aptitude Research, 2025). Every question needs a three-tier scoring rubric (strong/acceptable/weak). Schmidt & Hunter (1998) found structured interviews predict job performance at r = 0.51. Question design is the single largest factor in screening accuracy.
Why Question Design Matters More with AI
Human recruiters improvise: they rephrase, probe vague answers, and read between the lines. AI screening cannot lean on that improvisation, so it depends heavily on question quality to guide evaluation. A well-designed question set:
- Collects decision-relevant information mapped directly to hiring criteria
- Creates a consistent experience so every candidate is assessed on the same dimensions
- Generates structured data recruiters can compare across the entire applicant pool
The Four Question Types
1. Logistics Questions
Verify basic eligibility. Typically yes/no or short-answer, auto-scored with high confidence. Place these first: there is no point assessing skills if the candidate cannot meet basic requirements.
Examples:
- "Are you authorized to work in [country] without sponsorship?"
- "What is the earliest date you could start?"
- "This role requires weekend shifts on rotation. Can you accommodate that?"
2. Experience Questions
Quantify background against requirements. Produce numerical or categorical data AI systems score reliably. Be specific: "How many years in B2B SaaS sales?" produces scoreable data; "Tell me about your experience" produces rambling answers.
Examples:
- "How many years managing a team of five or more people?"
- "Which programming languages have you used professionally: Python, Java, or Go?"
- "What is the largest budget you have been responsible for?"
3. Behavioral Questions
Assess how candidates handle real situations. Reveal problem-solving, work style, and interpersonal skills. Pair each with a rubric defining strong, acceptable, and weak responses.
Examples:
- "Describe a time you met a tight deadline with limited resources. What did you do?"
- "Tell me about a time you disagreed with a manager's decision. How did you handle it?"
4. Motivation Questions
Gauge genuine interest and alignment. Filter out mass-applicants without real intent. Place near the end after the candidate has engaged with role specifics.
Examples:
- "What specifically about this role interests you?"
- "What is the most important factor when choosing your next employer?"
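The four types and their placement can be captured in a small data structure. The sketch below is illustrative only: `QuestionType`, `Question`, `order_questions`, and `check_length` are hypothetical names, and the experience-before-behavioral ordering is an assumption taken from the numbering above (the text itself only fixes logistics first and motivation near the end).

```python
from dataclasses import dataclass
from enum import IntEnum


class QuestionType(IntEnum):
    # Logistics first, motivation near the end; the middle order is assumed.
    LOGISTICS = 1
    EXPERIENCE = 2
    BEHAVIORAL = 3
    MOTIVATION = 4


@dataclass(frozen=True)
class Question:
    text: str
    qtype: QuestionType


def order_questions(questions):
    """Return questions sorted by type; ties keep their original order."""
    return sorted(questions, key=lambda q: q.qtype)


def check_length(questions, minimum=8, maximum=12):
    """True if the question count falls inside the 8-12 target."""
    return minimum <= len(questions) <= maximum


draft = [
    Question("What specifically about this role interests you?", QuestionType.MOTIVATION),
    Question("How many years managing a team of five or more people?", QuestionType.EXPERIENCE),
    Question("What is the earliest date you could start?", QuestionType.LOGISTICS),
    Question("Describe a time you met a tight deadline with limited resources. What did you do?",
             QuestionType.BEHAVIORAL),
]

ordered = order_questions(draft)
for q in ordered:
    print(f"{q.qtype.name:<10} {q.text}")
print("Within 8-12 questions:", check_length(ordered))
```

With only four draft questions, the length check reports `False`, a reminder that the set still needs filling out.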
Question Mix by Role Type
| Role Type | Logistics | Experience | Behavioral | Motivation | Optimal Length | Source |
|---|---|---|---|---|---|---|
| High-volume (retail, warehouse, CS) | 40% | 30% | 20% | 10% | 6-8 min | Aptitude Research 2025 |
| Professional (marketing, finance, ops) | 20% | 30% | 30% | 20% | 10-12 min | Aptitude Research 2025 |
| Technical (engineering, data science) | 15% | 35% | 25% | 25% | 10-15 min | None cited |
| Healthcare/regulated | 40% | 30% | 20% | 10% | 10-12 min | None cited |
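To turn these percentages into an actual question count, something like the following could work. This is a sketch, not a prescribed method: the `QUESTION_MIX` keys and the `allocate_questions` helper are hypothetical, and largest-remainder rounding is one assumed way to make the counts add up to the total.

```python
# Percentages copied from the table above; key names are illustrative.
QUESTION_MIX = {
    "high_volume": {"logistics": 40, "experience": 30, "behavioral": 20, "motivation": 10},
    "professional": {"logistics": 20, "experience": 30, "behavioral": 30, "motivation": 20},
    "technical": {"logistics": 15, "experience": 35, "behavioral": 25, "motivation": 25},
    "healthcare_regulated": {"logistics": 40, "experience": 30, "behavioral": 20, "motivation": 10},
}


def allocate_questions(role_type, total_questions):
    """Split total_questions across types using largest-remainder rounding."""
    mix = QUESTION_MIX[role_type]
    exact = {t: pct * total_questions / 100 for t, pct in mix.items()}
    counts = {t: int(v) for t, v in exact.items()}
    leftover = total_questions - sum(counts.values())
    # Hand the remaining questions to the types with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda t: exact[t] - counts[t], reverse=True)
    for t in by_remainder[:leftover]:
        counts[t] += 1
    return counts


print(allocate_questions("professional", 10))
# {'logistics': 2, 'experience': 3, 'behavioral': 3, 'motivation': 2}
```

For a 10-question professional screen, the split works out evenly; for totals that do not divide cleanly, the rounding step decides which type gets the extra question.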
Designing Scoring Rubrics
Every question needs a rubric. Without one, AI defaults to surface-level analysis. With a clear rubric, accuracy approaches human inter-rater reliability.
Three-Tier Model
| Rating | Points | Definition |
|---|---|---|
| Strong | 3 | Ideal answer: specific keywords, metrics, or behaviors that directly map to job success |
| Acceptable | 2 | Meets the minimum bar; define the threshold clearly |
| Weak | 1 | Poor fit; list the common red flags for this question |
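As a rough illustration of how the three tiers translate into a number, the sketch below maps each rating to its points and reports a total. The point values come from the table; summing them across questions is an assumption (the article does not specify an aggregation method), and `ScoredAnswer` and `total_score` are hypothetical names.

```python
from dataclasses import dataclass

# Point values from the three-tier table.
TIER_POINTS = {"strong": 3, "acceptable": 2, "weak": 1}


@dataclass
class ScoredAnswer:
    question: str
    tier: str  # "strong", "acceptable", or "weak"


def total_score(answers):
    """Sum tier points and return them with the maximum possible score."""
    points = sum(TIER_POINTS[a.tier] for a in answers)
    maximum = TIER_POINTS["strong"] * len(answers)
    return points, maximum


answers = [
    ScoredAnswer("Are you authorized to work in [country] without sponsorship?", "strong"),
    ScoredAnswer("How many years in project management?", "acceptable"),
    ScoredAnswer("What specifically about this role interests you?", "weak"),
]
points, maximum = total_score(answers)
print(f"{points}/{maximum}")  # 6/9
```

A plain sum treats every question equally; teams that consider some questions pass/fail gates would need to handle those separately.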
Example rubric: "How many years in project management?"
| Rating | Criteria |
|---|---|
| Strong | 5+ years with formal PM methodology (Agile, PMP) |
| Acceptable | 2-4 years, or informal PM experience in related role |
| Weak | Under 2 years, or no direct PM experience |
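The example rubric can be expressed as a small scoring function. This is a minimal sketch under stated assumptions: `score_pm_experience` and its parameters are hypothetical, and the rubric does not say how to rate 5+ years without a formal methodology, so the code assumes that case is "acceptable".

```python
def score_pm_experience(years, formal_methodology=False, informal_pm_experience=False):
    """Rate a project-management answer as strong, acceptable, or weak."""
    # Strong: 5+ years combined with a formal methodology (Agile, PMP).
    if years >= 5 and formal_methodology:
        return "strong"
    # Acceptable: at least 2 years, or informal PM experience in a related role.
    # Assumption: 5+ years without formal methodology also lands here.
    if years >= 2 or informal_pm_experience:
        return "acceptable"
    # Weak: under 2 years and no PM experience of either kind.
    return "weak"


print(score_pm_experience(6, formal_methodology=True))      # strong
print(score_pm_experience(3))                               # acceptable
print(score_pm_experience(1))                               # weak
print(score_pm_experience(0, informal_pm_experience=True))  # acceptable
```

Writing the rubric this way tends to expose gaps, such as the unrated "long experience, no methodology" case, before candidates hit them.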
Rubric Tips
- Anchor to job requirements, not ideal candidate fantasies. If 3 years is required, don't score 5 higher than 4.
- Include examples of strong responses to help AI calibrate.
- Avoid scoring on communication style unless communication is a core job requirement.
- Test rubrics internally: have team members answer the questions and verify that scores match expectations.
Common Mistakes
Too many questions. Keep to 8-12 questions targeting 10-15 minutes. Every additional minute reduces completion by 2-4 percentage points (Aptitude Research, 2025).
Jargon and ambiguous language. "Cross-functional stakeholder alignment" means different things to different people. Use plain language: "Have you worked with teams from other departments?"
Double-barreled questions. "Are you comfortable with remote work and do you have experience managing distributed teams?" is two questions pretending to be one. AI cannot score a single response against two criteria.
Neglecting candidate experience. Questions represent your employer brand. Aggressive, confusing, or irrelevant questions damage perception. Review from the candidate's perspective.
Forgetting to update. Roles evolve. Review questions quarterly and whenever job requirements change. Track which questions predict downstream interview performance and replace low-signal ones.
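Two of the mistakes above, too many questions and double-barreled wording, lend themselves to a quick automated check before launch. The sketch below is a crude heuristic rather than a reliable detector: `lint_question_set` is a hypothetical helper, and the regular expression only catches the "and do you / and have you" pattern, so a human review is still needed.

```python
import re

# Matches a second ask joined by "and", e.g. "... and do you have ...".
DOUBLE_BARREL = re.compile(
    r"\band\s+(do|did|are|have|can|will|is|would)\s+you\b", re.IGNORECASE
)


def lint_question_set(questions, max_questions=12):
    """Return a list of warning strings for a draft question set."""
    warnings = []
    if len(questions) > max_questions:
        warnings.append(
            f"{len(questions)} questions exceeds the target of {max_questions}"
        )
    for q in questions:
        if DOUBLE_BARREL.search(q):
            warnings.append(f"possibly double-barreled: {q}")
    return warnings


draft = [
    "Are you comfortable with remote work and do you have experience managing distributed teams?",
    "Have you worked with teams from other departments?",
]
for warning in lint_question_set(draft):
    print(warning)
```

Here only the first question is flagged; splitting it into two separate questions clears the warning.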
Frequently Asked Questions
How many questions should an AI phone screen include?
8-12 questions, completable in 10-15 minutes. For professional and technical roles, under 8 minutes may not collect enough data (high-volume screens can run 6-8 minutes); over 15 minutes sees significant drop-off (Aptitude Research, 2025). Prioritize questions that map directly to pass/fail hiring criteria.
Should candidates see questions beforehand?
Sharing question topics (not exact wording) improves response quality without compromising assessment validity. Candidates give more thoughtful answers when they can prepare. This is especially helpful for behavioral questions.
How do I write questions that avoid bias?
Focus every question on job-relevant criteria per EEOC Uniform Guidelines. Avoid questions about personal circumstances, cultural references, or educational pedigree unless directly required. Have a diverse group review questions before launch. Monitor scoring outcomes across demographic groups post-implementation.
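One widely used check for the monitoring step is the four-fifths rule from the EEOC Uniform Guidelines: a group whose selection rate falls below 80% of the highest group's rate is generally treated as a signal of possible adverse impact. The sketch below assumes pass counts per group are available; `adverse_impact_ratios`, the group labels, and the numbers are all illustrative, and a flagged ratio is a prompt for review rather than a legal conclusion.

```python
def adverse_impact_ratios(outcomes):
    """Map each group to its pass rate divided by the highest group's pass rate.

    outcomes: {group: (passed, screened)}
    """
    rates = {
        group: passed / screened
        for group, (passed, screened) in outcomes.items()
        if screened > 0
    }
    top = max(rates.values(), default=0)
    if top == 0:
        return {group: None for group in rates}  # nobody passed; ratio undefined
    return {group: rate / top for group, rate in rates.items()}


# Illustrative numbers only, not real data.
outcomes = {"group_a": (48, 80), "group_b": (27, 60)}
for group, ratio in adverse_impact_ratios(outcomes).items():
    flag = "review" if ratio is not None and ratio < 0.8 else "ok"
    print(f"{group}: ratio={ratio:.2f} ({flag})")
```

In this made-up sample, `group_b` passes at 45% against 60% for `group_a`, a ratio of 0.75, so it would be flagged for review.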
Can AI accurately score open-ended behavioral questions?
AI scoring of behavioral questions works best with detailed rubrics. Define strong responses in terms of specific actions, outcomes, or competencies. Without a rubric, accuracy drops; with a good one, it can approach human inter-rater reliability. This mirrors the broader finding that structured interviews predict job performance better than unstructured ones (Schmidt & Hunter, 1998).
How often should I update screening questions?
Quarterly at minimum, or whenever role requirements change. Correlate screen scores with downstream interview performance. Questions that don't differentiate between candidates who advance and those who don't should be replaced.
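A simple way to spot low-signal questions is to compare the average rubric score of candidates who advanced with that of candidates who did not. The sketch below assumes each record holds per-question scores (1-3) and an `advanced` flag; the record layout, the `signal_gap` helper, and the 0.5 threshold are all hypothetical choices, and a real analysis would want far more than four candidates.

```python
from statistics import mean


def signal_gap(records, question_id):
    """Mean score of advanced candidates minus mean score of the rest."""
    advanced = [r["scores"][question_id] for r in records if r["advanced"]]
    rejected = [r["scores"][question_id] for r in records if not r["advanced"]]
    if not advanced or not rejected:
        return None  # not enough data to compare the two groups
    return mean(advanced) - mean(rejected)


# Illustrative records only.
records = [
    {"scores": {"q1": 3, "q2": 2}, "advanced": True},
    {"scores": {"q1": 3, "q2": 2}, "advanced": True},
    {"scores": {"q1": 1, "q2": 2}, "advanced": False},
    {"scores": {"q1": 2, "q2": 2}, "advanced": False},
]

THRESHOLD = 0.5  # assumed cut-off for "low signal"
for qid in ("q1", "q2"):
    gap = signal_gap(records, qid)
    status = "keep" if gap is not None and gap >= THRESHOLD else "candidate for replacement"
    print(f"{qid}: gap={gap} -> {status}")
```

In this sample, `q1` separates the two groups by 1.5 points while `q2` shows no gap at all, which makes `q2` the one to revisit.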
Written by
Outhire Team