Although artificial intelligence (AI) has made remarkable progress, a recent study has uncovered an unexpected limitation: AI struggles to read analog clocks—a task easily handled by most eight-year-old children. While AI has advanced in image and text generation, it still faces difficulties with fundamental tasks that rely on visual perception and numerical reasoning.
AI’s Persistent Weakness in Time Interpretation
The study, conducted by researchers at the University of Edinburgh, tested seven AI models on their ability to read time from analog clocks and answer calendar-based questions. The findings revealed that AI models consistently failed at reading clocks, getting the correct answer less than 25% of the time. The models performed particularly poorly when faced with Roman numerals or stylized clock hands.
For instance, when shown a clock displaying 4:00, OpenAI’s GPT-o1 mistakenly guessed 12:15, while Claude-3.5-Sonnet responded with 11:35. These errors suggest that despite AI’s advancements in object detection and scene analysis, fundamental gaps remain in its ability to interpret visual and numerical information in real-world scenarios.
The Complexity of Clock and Calendar Interpretation
Reading an analog clock involves multiple cognitive steps, including recognizing clock-hand positions, differentiating between hour and minute hands, and applying numerical reasoning to determine the exact time. Similarly, calendar-based tasks require an understanding of date layouts and the ability to compute day offsets. These tasks, while seemingly simple for humans, demand a sophisticated combination of visual perception, numerical computation, and structured logical inference—areas where AI is still developing.
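The numerical-reasoning step in particular can be made concrete. Once the hand angles have been recognized, converting them to a time of day is simple arithmetic, as in this minimal Python sketch (the function and its angle convention are illustrative, not taken from the study):

```python
def read_clock(hour_angle_deg: float, minute_angle_deg: float) -> str:
    """Convert hand angles (degrees clockwise from 12) to an H:MM time."""
    # The minute hand sweeps 360 degrees per hour, i.e. 6 degrees per minute.
    minute = round(minute_angle_deg / 6) % 60
    # The hour hand sweeps 30 degrees per hour; integer division discards
    # the fractional advance caused by the elapsed minutes.
    hour = int(hour_angle_deg // 30) % 12 or 12
    return f"{hour}:{minute:02d}"

print(read_clock(120.0, 0.0))  # hour hand at 4, minute hand at 12 → "4:00"
```

The arithmetic itself is trivial; the study's point is that the hard part for the models is the perceptual step that precedes it, especially with Roman numerals or stylized hands.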
In the study, AI models performed slightly better on calendar-related tasks but still struggled with accuracy. When asked to determine which weekday Christmas falls on or the weekday of the 100th day of the year, the models produced incorrect answers approximately 20% of the time.
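For comparison, the offset computation behind such questions is a one-liner with a date library. A hedged Python sketch (the year 2025 and the function name are chosen here for illustration; the study's exact prompts may differ):

```python
from datetime import date, timedelta

def weekday_of_day_number(year: int, day_of_year: int) -> str:
    """Return the weekday name of the Nth day of `year` (1-indexed)."""
    d = date(year, 1, 1) + timedelta(days=day_of_year - 1)
    return d.strftime("%A")

print(weekday_of_day_number(2025, 100))  # → "Thursday"
```

A conventional program gets this right every time; the models' roughly 20% error rate shows they are not reliably performing the same offset arithmetic.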
Closed-Source vs. Open-Source AI Performance
The study found notable differences in performance between closed-source and open-source AI models. Closed-source models, such as GPT-o1 and Claude-3.5, were more accurate when answering questions about widely recognized holidays, likely due to memorized training data. However, when faced with less common or arithmetic-heavy queries, such as identifying the 153rd day of the year, performance dropped significantly.
Open-source models, including MiniCPM, Qwen2-VL-7B, and Llama3.2-Vision, performed even worse on these queries, often producing results that were nearly random. This suggests that AI models struggle with applying logical reasoning to offset-based calculations, further limiting their reliability in real-world applications.
Implications for AI in Everyday Use
The inability of AI models to accurately interpret clocks and calendars raises concerns about their effectiveness in time-sensitive applications such as scheduling and event planning. Many AI research efforts focus on complex reasoning tasks, yet basic everyday functions remain a challenge. Without improvements in AI’s ability to integrate visual recognition with numerical reasoning, errors in scheduling and time management could become more frequent in AI-powered systems.
The study highlights the importance of improving AI’s core perception and reasoning skills. With AI becoming more prevalent in daily activities, closing these gaps will be essential to ensure accuracy and reliability in real-world applications.