2025-11-08
AI Models Fail Miserably at This One Easy Task: Telling Time
This article is part of our exclusive IEEE Journal Watch series in partnership with IEEE Xplore.

The rapidly advancing abilities of AI have left many people worrying. But don't fret quite yet: If you can read an analog clock correctly, you are still outperforming AI in that regard.

AI models that are capable of analyzing different types of media in the form of text, images, and video—called multimodal large language models (MLLMs)—are gaining traction in various applications, such as sports analytics and autonomous driving. But sometimes, these models can fail at what seems like the simplest of tasks, including accurately reading the time from an analog clock. This raises the question of which aspects of image analysis, exactly, these models are struggling with. For example, when it comes to reading traditional clocks, do the models struggle to discern between the short and long hands? Or do they struggle to pinpoint the exact angle and direction of the hands relative to the numbers? The answers to these seemingly trivial questions can provide critical insights into the major limitations of these models.

Javier Conde, an assistant professor at the Universidad Politécnica de Madrid, and colleagues at Politécnico di Milano and Universidad de Valladolid sought to investigate these limitations in a recent study. The results, published 16 October in IEEE Internet Computing, suggest that if an MLLM struggles with one facet of image analysis, this can cause a cascading effect that impacts other aspects of its image analysis.

How Well Can AI Tell Time?

First, the research team constructed a large dataset of publicly available images of analog clocks, which collectively displayed more than 43,000 indicated times, and tested the ability of four different MLLMs to read the times in a subset of images. All four models initially failed to tell time accurately.
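The geometry that trips up the models is easy to state in code. As a rough illustration (not from the study itself; the function names and angle conventions below are hypothetical), here is how the hand angles of an analog clock map to and from a time:

```python
# Illustrative sketch of the geometry an MLLM must recover to read an
# analog clock. Angles are measured in degrees clockwise from the 12
# o'clock position. These helpers are hypothetical, not from the study.

def time_to_angles(hour: int, minute: int) -> tuple[float, float]:
    """Hand angles for a given time (the 'rendering' direction)."""
    minute_angle = minute * 6.0                      # minute hand: 6 deg per minute
    hour_angle = (hour % 12) * 30.0 + minute * 0.5   # hour hand drifts as minutes pass
    return hour_angle, minute_angle

def angles_to_time(hour_angle: float, minute_angle: float) -> tuple[int, int]:
    """Infer (hour, minute) from hand angles (the 'reading' direction)."""
    minute = round(minute_angle / 6.0) % 60          # invert 6 deg per minute
    hour = int(hour_angle // 30.0) % 12              # each hour spans 30 degrees
    return (12 if hour == 0 else hour, minute)
```

Reading a clock thus requires distinguishing the two hands, estimating each hand's angle, and combining the two estimates; an error in any one step corrupts the final answer, which is consistent with the cascading failures the researchers observed.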
The researchers were able to boost the models' performance by training them with an additional 5,000 images from the dataset and testing the models again, using additional images they hadn't seen before. However, the models' performance dropped again when tested against a completely new collection of clock images. The results touch on a key limitation of many AI models: They are good at recognizing data they are familiar with, but often fail to recognize new scenarios they have not yet encountered in their training data. In other words, they often lack generalization.

Conde and his colleagues wanted to dig deeper into what makes it so difficult for MLLMs to tell time. If the problem is related to the model's sensitivity to the spatial directions of a clock's hands, then further fine-tuning could address this limitation—simply expose the model to more data and it will become better at the task at hand.

In a series of experiments, they created new datasets of analog clocks, either distorting the clocks' shapes or altering the appearance of the clock hands, for example by adding arrows to their ends. "While such variations pose little difficulty for humans, models often fail at this task," Conde explains, citing Salvador Dalí's famous painting of distorted clocks, The Persistence of Memory. While humans can decipher the time on the warped, melting clocks, MLLMs struggle to tell the time on similarly warped clocks.

The results show that MLLMs struggle to pinpoint the spatial orientation of the clock hands, but struggle even more when the clock hands have a unique appearance (for example, arrows on their tips) that the model hasn't been extensively exposed to. However, these issues were not independent of one another: Through additional experiments, the researchers found that if the MLLMs made an error in recognizing the clock hands, this in turn resulted in greater spatial errors.
"It appears that reading the time is not as simple a task as it may seem, since the model must identify the clock hands, determine their orientations, and combine these observations to infer the correct time," Conde explains, noting that the models struggle to process these changes simultaneously.

In their study, the researchers underscored that, in more complex real-world scenarios such as medical image analysis or autonomous driving perception, these subtle yet critical failures could lead to more severe consequences.

"These results demonstrate that we cannot take model performance for granted," Conde says, emphasizing that extensive training and testing with varied inputs are necessary to ensure that models remain robust against the diverse scenarios they are likely to encounter in real-world applications.

Many people anticipate that AI will continue to improve, and this in turn raises the question: Will AI models eventually be able to accurately read traditional analog clocks? Only time will tell.