Phone Screening Benchmarks: AI vs Human Performance

Q: Are AI phone screens less accurate than human screens?

Comparably accurate for structured qualification evaluation. AI achieves r = 0.30-0.45 vs. r = 0.25-0.40 for human screens ([Schmidt & Hunter, 1998](https://psycnet.apa.org/doi/10.1037/0033-2909.124.2.262)). Humans retain an edge in assessing nuanced qualities like cultural fit.

Q: Do candidates prefer human or AI phone screens?

Satisfaction surveys show human screens at 72-80% vs. 65-78% for AI ([Gartner 2024](https://www.gartner.com/en/human-resources/topics/artificial-intelligence-in-hr)). However, candidates rate AI higher on convenience and scheduling flexibility.

Q: What completion rate should I expect from AI phone screening?

Expect 70-85%, vs. 55-70% for human-scheduled screens (Aptitude Research, 2025). The primary driver is on-demand availability, which eliminates scheduling friction. SMS invitations outperform email by 15-20 percentage points.

Q: How do AI and human screens compare on bias?

AI screens with proper design and regular auditing show reduced demographic disparities, primarily through consistent, job-relevant evaluation criteria. Neither method is bias-free. Regular adverse impact analysis is essential regardless of method ([EEOC guidelines](https://www.eeoc.gov/laws/guidance/uniform-guidelines-employee-selection-procedures-1978)).

Q: What is the cost difference?

AI: $2-$8/screen vs. human: $15-$38/screen ([SHRM 2024](https://www.shrm.org/topics-tools/research/human-capital-benchmarking)). At 2,000 annual screens, this represents $20,000-$50,000 in direct savings (roughly $10-$25 saved per screen), before counting time-to-fill and quality improvements.

TL;DR: AI phone screening outperforms human screening on cost ($2-$8 vs. $15-$38 per screen, SHRM 2024), completion rate (70-85% vs. 55-70%), throughput (unlimited concurrent screens vs. 6-10 per recruiter per day), and consistency. Human screening retains advantages in nuanced assessment, relationship building, and adaptive questioning. Predictive validity is comparable: r = 0.30-0.45 for AI vs. r = 0.25-0.40 for human (Schmidt & Hunter, 1998). The optimal approach is hybrid: AI for initial qualification, human for shortlisted candidates.

Benchmark Summary

Metric	Human Screening	AI Screening	Winner	Source
Completion rate	55-70%	70-85%	AI (+15 pts)	Aptitude Research 2025; Outhire platform data
Time to complete	2-5 business days	Under 24 hours (median 3.2 hrs)	AI	Aptitude Research 2025; Outhire platform data
Cost per screen	$15-$38	$2-$8	AI (75-90% lower)	SHRM 2024
Evaluation consistency	Moderate (30-40% variability)	High (<5% variability)	AI	Schmidt & Hunter 1998
Candidate satisfaction	72-80%	65-78%	Human (slight edge)	Gartner 2024
Predictive validity	r = 0.25-0.40	r = 0.30-0.45	Comparable	Schmidt & Hunter 1998
Throughput capacity	6-10/recruiter/day	Unlimited concurrent	AI	LinkedIn Global Talent Trends 2024
Bias and fairness	Variable (unconscious bias documented)	Depends on design; auditable	Conditional	EEOC Uniform Guidelines

Benchmark Details

1. Completion Rates

Human: 55-70%. The primary drag is scheduling friction: 30-45% of scheduled screens require at least one reschedule (Aptitude Research, 2025).

AI: 70-85%. On-demand availability eliminates scheduling as a barrier. Candidates initiate at a convenient time and are more likely to complete.

2. Time to Complete

Human: 2-5 business days including scheduling back-and-forth, timezone coordination, and recruiter availability.

AI: Median 3.2 hours from invitation to completed screen (based on Outhire platform data, 2025-2026). 60% complete within 6 hours.

3. Cost Per Screen

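Human: $15-$38 per screen (SHRM 2024). The main driver is recruiter time: a 20-30 min call plus 10-15 min of scheduling, notes, and ATS updates, at $40-$55/hour fully loaded (BLS median + benefits). At those figures the time cost alone works out to roughly $20-$41, so the upper part of the cited range is the more realistic planning number.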

AI: $2-$8 per screen. At 1,000+ annual screens, per-screen cost trends toward $2-$4.
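The cost arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and every number is taken from the ranges quoted in this section rather than from any measured dataset.

```python
# Illustrative only: function names are hypothetical, inputs are the
# ranges quoted in this article.

def cost_per_screen(minutes: float, hourly_rate: float) -> float:
    """Recruiter cost of one screen from time spent and a fully loaded hourly rate."""
    return minutes / 60 * hourly_rate


def percent_lower(ai_cost: float, human_cost: float) -> float:
    """How much cheaper the AI screen is, as a percentage of the human cost."""
    return (1 - ai_cost / human_cost) * 100


def annual_savings(screens: int, human_cost: float, ai_cost: float) -> float:
    """Direct savings per year from switching every screen to AI."""
    return screens * (human_cost - ai_cost)


# Time-based estimate: 20-30 min call plus 10-15 min admin at $40-$55/hour.
low_estimate = cost_per_screen(20 + 10, 40)
high_estimate = cost_per_screen(30 + 15, 55)
print(f"Time-based human cost: ${low_estimate:.2f} to ${high_estimate:.2f}")

# Percent reduction using matched ends of the cited ranges.
print(f"Low ends:  {percent_lower(2, 15):.1f}% lower")
print(f"High ends: {percent_lower(8, 38):.1f}% lower")

# Savings at 2,000 screens per year, worst case and best case.
print(f"Worst case: ${annual_savings(2000, 15, 8):,.0f}")
print(f"Best case:  ${annual_savings(2000, 38, 2):,.0f}")
```

Across the full span of the quoted ranges, savings at 2,000 screens run from $14,000 (cheapest human screen against the most expensive AI screen) to $72,000, so the $20,000-$50,000 figure quoted in the FAQ sits inside that band.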

4. Evaluation Consistency

Human: Moderate. Interviewer ratings for the same candidate vary 30-40% depending on who conducts the screen (Schmidt & Hunter, 1998). Training improves consistency but doesn't eliminate variability.

AI: High. Identical questions, rubrics, and criteria applied every time. Variability is limited to inherent ambiguity in candidate responses.

5. Candidate Satisfaction

Human: 72-80% satisfaction. Candidates value human interaction and the ability to ask nuanced questions. Dissatisfaction stems from scheduling difficulties and rushed interviews.

AI: 65-78% satisfaction (Gartner 2024). Candidates appreciate convenience and flexibility. Satisfaction correlates strongly with conversation quality: systems with natural, adaptive dialogue score significantly higher.

Key insight: poorly executed human screens (rushed, distracted recruiters) score lower than well-implemented AI screens.

6. Predictive Validity

Human: r = 0.25-0.40 correlation with job performance. Experienced recruiters with structured training perform at the higher end. Unstructured screens correlate poorly.

AI: r = 0.30-0.45 correlation. AI's advantage comes from consistently applying validated criteria (Schmidt & Hunter, 1998, report r = 0.51 for structured interviews vs. r = 0.38 for unstructured). AI may miss nuanced signals that experienced screeners detect.
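The r values in this section are Pearson correlations between screen scores and a later job-performance measure. For teams that want to check validity on their own hires, a minimal sketch follows. The sample data is made up purely for illustration, and the variable names are hypothetical.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return covariance / sqrt(var_x * var_y)


# Made-up example: screen scores at hire and performance ratings a year later.
screen_scores = [62, 75, 81, 58, 90, 70, 66, 85]
performance_ratings = [3.1, 3.4, 3.0, 2.9, 4.2, 3.8, 3.3, 3.5]

r = pearson_r(screen_scores, performance_ratings)
print(f"r = {r:.2f}")
```

A sample of eight hires is far too small to draw conclusions from; the point is only to show what the statistic measures. In practice a validity check of this kind needs a much larger sample, and only hired candidates have performance data, which tends to understate the true correlation.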

7. Throughput Capacity

Human: Maximum 8-12 screens/day; practical capacity is 6-10 when accounting for other responsibilities. Scaling requires headcount.

AI: Unlimited concurrent screens. 10 or 10,000 simultaneously without degradation.
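To see what the human capacity limit means for a hiring surge, the recruiter time needed can be estimated directly. This is a rough sketch with a hypothetical function name; it uses the 6-10 screens per recruiter per day from the summary table and assumes recruiters do nothing but screen.

```python
from math import ceil


def recruiter_days(screens: int, screens_per_day: int) -> int:
    """Whole recruiter-days needed to clear a batch of phone screens."""
    if screens_per_day <= 0:
        raise ValueError("screens_per_day must be positive")
    return ceil(screens / screens_per_day)


# A batch of 500 applicants at the low and high ends of daily capacity.
batch = 500
print(f"At 6/day:  {recruiter_days(batch, 6)} recruiter-days")
print(f"At 10/day: {recruiter_days(batch, 10)} recruiter-days")
```

With a single recruiter, a 500-candidate batch takes 50 to 84 working days at these rates, which is why human-only screening at volume requires either more headcount or a longer time-to-fill.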

8. Bias and Fairness

Human: Variable. Unconscious bias from name, accent, voice characteristics, and conversational style is well-documented (EEOC Uniform Guidelines). Training reduces it but doesn't eliminate it.

AI: Depends on design. Well-calibrated AI evaluating only job-relevant criteria shows reduced demographic disparities. But AI trained on biased data can perpetuate or amplify bias. The advantage is systematic auditability: bias can be measured, monitored, and corrected at the system level rather than the interviewer level.
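One common starting point for the adverse impact analysis mentioned above is the four-fifths rule from the EEOC Uniform Guidelines: a group whose selection rate is below 80% of the highest group's rate is generally treated as showing evidence of adverse impact. The sketch below applies that check to pass counts from either screening method. The function names, group labels, and counts are all hypothetical.

```python
def selection_rates(outcomes: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Map each group to passed / screened. outcomes: group -> (passed, screened)."""
    rates = {}
    for group, (passed, screened) in outcomes.items():
        if screened <= 0:
            raise ValueError(f"group {group!r} has no screened candidates")
        rates[group] = passed / screened
    return rates


def adverse_impact_ratios(outcomes: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Each group's selection rate divided by the highest group's rate."""
    rates = selection_rates(outcomes)
    top_rate = max(rates.values())
    if top_rate == 0:
        raise ValueError("no group passed any candidates; ratios are undefined")
    return {group: rate / top_rate for group, rate in rates.items()}


def flagged_groups(outcomes: dict[str, tuple[int, int]], threshold: float = 0.8) -> list[str]:
    """Groups whose impact ratio falls below the four-fifths threshold."""
    ratios = adverse_impact_ratios(outcomes)
    return [group for group, ratio in ratios.items() if ratio < threshold]


# Made-up counts: (candidates passed, candidates screened) per group.
sample = {"group_a": (60, 100), "group_b": (45, 100), "group_c": (55, 100)}

for group, ratio in adverse_impact_ratios(sample).items():
    print(f"{group}: impact ratio {ratio:.2f}")
print("Below four-fifths:", flagged_groups(sample))
```

The four-fifths rule is a screening heuristic, not a legal determination. With small group sizes the ratio is noisy, so a flagged result is usually a prompt for closer statistical review rather than a conclusion.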

Where AI Outperforms Humans

Speed and availability: 24/7 screening, with a median of 3.2 hours from invitation to completed screen
Cost efficiency: 75-90% lower cost per screen
Consistency: Identical evaluation every time
Throughput: Unlimited capacity without quality degradation

Where Humans Outperform AI

Nuanced assessment: Reading motivation, fit, and potential between the lines
Relationship building: Creating candidate engagement and employer brand impressions
Adaptive questioning: Exploring unexpected but relevant conversational tangents

The Optimal Approach: Hybrid Screening

The data supports a hybrid model:

Stage 1: AI screen for all candidates. Capture structured data on qualifications, experience, logistics, and behavioral responses. This stage handles 80% of the evaluation needed to identify top candidates.

Stage 2: Human screen for shortlisted candidates (top 20-30%). Recruiters evaluate cultural fit, motivation depth, and build relationships with candidates who've already demonstrated baseline qualifications.
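The two-stage handoff can be sketched as a simple filter-and-rank step. This is an illustration under stated assumptions, not a description of any particular platform: the `Candidate` fields, the `shortlist` function, and the 25% default cut are all hypothetical, with the cut chosen from the middle of the 20-30% range above.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Candidate:
    name: str
    ai_score: float           # Stage 1 score from the AI screen (hypothetical scale)
    meets_requirements: bool  # baseline qualifications and logistics confirmed


def shortlist(candidates: list[Candidate], top_fraction: float = 0.25) -> list[Candidate]:
    """Pick the candidates who move on to a Stage 2 human screen.

    Keeps at most top_fraction of everyone screened, ranked by AI score,
    and never forwards a candidate who failed baseline requirements.
    """
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    qualified = [c for c in candidates if c.meets_requirements]
    ranked = sorted(qualified, key=lambda c: c.ai_score, reverse=True)
    keep = ceil(len(candidates) * top_fraction)
    return ranked[:keep]


pool = [
    Candidate("A", 88, True),
    Candidate("B", 92, False),
    Candidate("C", 75, True),
    Candidate("D", 95, True),
    Candidate("E", 60, True),
    Candidate("F", 81, True),
    Candidate("G", 70, False),
    Candidate("H", 66, True),
]

for candidate in shortlist(pool):
    print(candidate.name, candidate.ai_score)
```

Sizing the cut against everyone screened, rather than only the qualified candidates, keeps recruiter workload predictable. Ranking purely by score is the simplest policy; a fixed score threshold or a recruiter review of borderline cases would be equally reasonable choices.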

Frequently Asked Questions

Are AI phone screens less accurate than human screens?

Comparably accurate for structured qualification evaluation. AI achieves r = 0.30-0.45 vs. r = 0.25-0.40 for human screens (Schmidt & Hunter, 1998). Humans retain an edge in assessing nuanced qualities like cultural fit.

Do candidates prefer human or AI phone screens?

Satisfaction surveys show human screens at 72-80% vs. 65-78% for AI (Gartner 2024). However, candidates rate AI higher on convenience and scheduling flexibility.

What completion rate should I expect from AI phone screening?