Species of Mind: Developmental Architecture for Human and LLM Intelligence


Abstract

We compared four large language models (ChatGPT, Grok, Gemini, DeepSeek) with humans on tests of cognitive development addressing relational integration, linguistic awareness, general and domain-specific reasoning, and cognitive self-awareness. We aimed to specify how LLMs compare with humans along several cognitive development hierarchies. Given their theoretical importance for intelligence, LLMs were also asked to indicate how Descartes's Cogito applies to them and to self-rate on aspects of Artificial General Intelligence (AGI). There was a sharp divide between verbal and logico-mathematical tasks, on the one hand, and visuo-spatial tasks, on the other. All LLMs attained perfect linguistic and metalinguistic performance. ChatGPT and Gemini matched or exceeded university-level human performance in mathematics and causal reasoning, Grok performed slightly lower, and DeepSeek was weakest overall. All LLMs underperformed on visuo-spatial tasks and on reasoning tasks presented visually, as they are to children. Performance recovered when these tasks were presented in a fashion that allowed the LLMs to take an analytical approach to visual patterns, reflecting their distinctive architecture. Self-concept ratings broadly mirrored performance profiles: ChatGPT and Grok rated themselves high in reasoning and low in imagination, Gemini inflated imagination by reframing it as linguistic creativity, and DeepSeek consistently underrated itself. Each LLM restated Descartes's Cogito differently as a description of itself, and all denied possessing AGI to any substantial degree. Hence, LLMs display human-like "subjective" task scaling, implying algorithmic or functional metacognition that captures the architectural gap between symbolic reasoning and imaginative cognition, yet they are modest in claiming top human intelligence. Overall, LLMs display "savant-like intelligence" rather than top expert intelligence. Implications for an integrated natural-artificial intelligence theory are discussed, and a developmental engineering model is sketched that would allow removing the limitations of each LLM.
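
To make the presentation-format manipulation concrete, here is a minimal, hypothetical sketch, not the authors' actual test harness: it poses the same matrix-style relational item in the two conditions contrasted in the study, as an image versus as an analytical text encoding. The `query_llm` helper is a placeholder for any chat-completion API, not a real function from the preprint.

```python
def query_llm(prompt: str, image_path: str | None = None) -> str:
    """Placeholder: send `prompt` (optionally with an attached image) to an LLM."""
    raise NotImplementedError("wire this to a real chat-completion API")

# Condition 1: visual presentation, as the task is administered to children.
# The LLMs tended to underperform in this condition.
visual_prompt = "Which option completes the 3x3 matrix shown in the attached image?"
# answer_visual = query_llm(visual_prompt, image_path="matrix_item.png")

# Condition 2: analytical presentation, encoding the same pattern as text so
# the model can reason over explicit relations instead of pixels. Performance
# recovered in this condition.
analytic_prompt = (
    "A 3x3 matrix follows a rule. "
    "Row 1: one circle, two circles, three circles. "
    "Row 2: one square, two squares, three squares. "
    "Row 3: one triangle, two triangles, ?. "
    "State what completes the matrix and the rule."
)
# answer_analytic = query_llm(analytic_prompt)
```

The contrast between these two conditions is what separates the models' perceptual weakness from their intact relational reasoning.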

Article activity feed

  1. This Zenodo record is a permanently preserved version of a Structured PREreview. You can view the complete PREreview at https://prereview.org/reviews/17604779.

    Does the introduction explain the objective of the research presented in the preprint? Yes. The introduction, along with the abstract, clearly explains the objective of the research presented in the preprint: the study compared four large language models (ChatGPT, Grok, Gemini, and DeepSeek) with humans using cognitive development tests to assess how these LLMs align with several cognitive development hierarchies. The primary aim was to examine the LLMs' cognitive profile and performance against the architecture and development of the human mind across different age periods, from early childhood to early adulthood. The specific cognitive processes addressed by the tests included relational integration, metalinguistic awareness, and problem solving across various forms of reasoning (such as deductive, inductive, analogical, categorical, mathematical, spatial, and social reasoning), as well as self-representation in all these domains. Furthermore, the LLMs were prompted to indicate how Descartes's Cogito applies to them and to self-rate on aspects of Artificial General Intelligence (AGI), emphasizing their theoretical importance for intelligence.
    Are the methods well-suited for this research? Highly appropriate. The methods are highly appropriate for this research because the batteries were systematically designed within the framework of the Theory of Developmental Priorities (DPT) and its core mechanism, SARA-C, enabling a structured comparison of LLM cognition with human developmental hierarchies. The chosen tests, including the Comprehensive Test of Cognitive Development (CTCD), the Relational Integration Test, and tests of metalinguistic awareness, specifically targeted critical cognitive functions such as relational integration, domain-specific reasoning, and abstraction, covering developmental levels from rule-based thought up to epistemic awareness. A key element of the methodology was its adaptability, as demonstrated by the necessary shift in presentation format for visual-spatial tasks from PDFs to screenshots and verbal descriptions; this crucial adjustment accommodated the architectural limitations of the LLMs, such as their underperformance in visual tasks and instances of "aphantasia," thereby allowing them to employ an analytical approach to complex patterns. Lastly, the inclusion of the cognitive self-concept inventory and philosophical questions about Descartes's Cogito and AGI characteristics provided a unique avenue for probing LLMs' algorithmic metacognition, yielding self-representation profiles that closely mirrored their objective performance and thereby providing insight into the boundary between synthetic and conscious cognition.
    Are the conclusions supported by the data? Highly supported. The conclusions are strongly supported by the data, which extensively compare the cognitive profiles and self-representations of four large language models (LLMs) against human developmental hierarchies across various tests. The finding that LLMs display mastery in symbolic inference but deficits in embodied cognition is empirically validated by their perfect performance on linguistic awareness tasks and their high attainment, with ChatGPT and Gemini matching or exceeding university-student levels in mathematical and causal reasoning. Additionally, this conclusion is reinforced by the evidence that all LLMs underperformed significantly in visual-spatial tasks and relational integration tests when items were presented visually, a difficulty that resolved dramatically when the tasks were reformatted in a symbolic or verbal medium, signifying their unique, non-perceptual architecture. Moreover, the conclusion about algorithmic metacognition is substantiated by the observed structural convergence, with LLMs' self-concept ratings closely mirroring their objective performance profiles across domains, as shown by their accurate low self-ratings in visual-spatial ability corresponding to their actual weaknesses. Lastly, the rejection of Cartesian selfhood in favor of a computational "Cogito" is directly evidenced by the LLMs' philosophical statements, in which they restated the maxim to emphasize processing or system function rather than existential being (e.g., "I process, therefore I function"), and consistently assigned extremely low overall AGI possession percentages (0% to 20%), despite their objective g-based scores placing two models within the superior human IQ range.
    Are the data presentations, including visualizations, well-suited to represent the data? Highly appropriate and clear. The data presentations, encompassing numerous tables and figures, are well suited to represent the complex comparative and structural data generated by the research. Tables are systematically employed to quantify the core results, such as displaying the mean performance of the four large language models (LLMs) compared to multiple human age groups across various Specialized Capacity Systems (SCSs) within the Comprehensive Test of Cognitive Development (CTCD). Crucially, the data presentation captures necessary methodological details, illustrating the sharp contrast in LLM performance on the Relational Integration Test across different input conditions (raw versus screenshot), which is central to understanding their unique architecture. Furthermore, figures are essential both for articulating the underlying theoretical framework, such as the SARA-C mechanism and the mind mirror model, and for visualizing complex structural findings; for instance, Figure 4 effectively maps the LLMs' subjective self-concept ratings directly against their objective CTCD performance (scaled 1–7), providing empirical support for the conclusion regarding algorithmic metacognition. The use of figures to illustrate Structural Equation Modeling results (Figure 4A, Figure 4B) supports the complex conclusion regarding the structural convergence of performance and self-representation factors between humans and LLMs, while other tables and figures categorize domain differences in performance, SARA-C levels, and self-ratings on AGI characteristics.
    How clearly do the authors discuss, explain, and interpret their findings and potential next steps for the research? Very clearly. The authors are highly clear in discussing, explaining, and interpreting their findings, anchoring them within the comprehensive theoretical framework of the Theory of Developmental Priorities (DPT) and its core mechanism, SARA-C (Search, Align, Relate, Abstract, Cognize). The discussion systematically interprets the core findings, explaining the LLMs' perfect performance on symbolic tasks as confirmation of their mastery of symbolic inference corresponding to upper developmental levels, while interpreting deficits in visual-spatial tasks as signifying their unique architecture's reliance on language-based relational encoding rather than perceptual simulation. They interpret the self-representational accuracy (the strong alignment between LLM self-ratings and objective performance) as evidence of algorithmic metacognition, which they explain computationally as the LLMs' capacity for entropy monitoring, serving as the algorithmic equivalent of human cognizance; a minimal illustration of such entropy monitoring is sketched after this question list.
    Is the preprint likely to advance academic knowledge? Highly likely. The preprint is highly likely to advance academic knowledge by offering a unified theoretical framework, the Theory of Developmental Priorities (DPT) and its core SARA-C mechanism, to bridge biological and artificial intelligence, thereby integrating human and LLM cognitive processes within a single developmental hierarchy. The study introduces novel methodologies by being the first published research, to the authors' knowledge, to prompt large language models (LLMs) to self-rate their possession of Artificial General Intelligence (AGI) attributes on a predefined checklist and to philosophically restate Descartes's Cogito, ergo sum to fit their own nature as problem-solvers. This unique approach facilitates the articulation and interpretation of emergent phenomena in LLMs, such as "algorithmic metacognition," which captures the structural convergence between LLM performance and self-representation. Moreover, the authors clearly discuss the implications for developing future AI, sketching a developmental engineering model and a "Developmental Roadmap" for AGI that specifies concrete research and development targets, such as integrating perceptual grounding and implementing explicit cognizance loops, thus utilizing LLMs as "developmental laboratories" for testing cognitive-developmental theories.
    Would it benefit from language editing? No. The preprint is structured clearly, explaining complex theoretical frameworks like DPT and SARA-C with academic precision, which ensures that the arguments and findings remain thoroughly comprehensible. While minor stylistic or phrasing choices, such as "In the sake of this aim" or dense technical descriptions, do appear, they do not amount to grammatical errors or unclear expressions that fundamentally hinder understanding of the research design, results, or overall conclusions, consistent with the judgment that there may be minor language issues but they do not impact clarity or understanding.
    Would you recommend this preprint to others? Yes, it's of high quality
    Is it ready for attention from an editor, publisher or broader audience? Yes, after minor changes. The preprint would benefit from minor language editing and stylistic refinement to enhance its scholarly polish, primarily addressing slight awkwardness in phrasing and ensuring consistent clarity across highly technical descriptions. For instance, revising constructions like "In the sake of this aim" would improve grammatical flow, and a focused effort to smooth transitions or simplify dense explanations of mechanisms, such as the Structural Equation Modeling results or the SARA-C process, could marginally increase accessibility for a broader academic audience, although the current presentation does not fundamentally hinder comprehension. It's a great paper.
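
    To make the reviewer's point about entropy monitoring concrete, here is a minimal sketch, not drawn from the preprint itself: it computes the Shannon entropy of a next-token probability distribution and flags high-entropy (uncertain) generation steps, the kind of signal that an explicit cognizance loop, as discussed in the preprint's roadmap, could act on. The function names and the threshold are illustrative assumptions.

    ```python
    import math

    def shannon_entropy(probs: list[float]) -> float:
        """Shannon entropy (in bits) of a next-token probability distribution."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    def flag_uncertain(probs: list[float], threshold_bits: float = 2.0) -> bool:
        # A flat (high-entropy) distribution means the model has no clear
        # preference among tokens; a cognizance loop could pause, re-check,
        # or ask for clarification at such steps.
        return shannon_entropy(probs) > threshold_bits

    # Confident step: mass concentrated on one token (entropy ~0.6 bits).
    print(flag_uncertain([0.90, 0.05, 0.03, 0.02]))        # False
    # Uncertain step: near-uniform mass over five tokens (entropy ~2.3 bits).
    print(flag_uncertain([0.20, 0.20, 0.20, 0.20, 0.20]))  # True
    ```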

    Competing interests

    The author declares that they have no competing interests.

    Use of Artificial Intelligence (AI)

    The author declares that they did not use generative AI to come up with new ideas for their review.

  2. This Zenodo record is a permanently preserved version of a Structured PREreview. You can view the complete PREreview at https://prereview.org/reviews/17532294.

    Does the introduction explain the objective of the research presented in the preprint? Yes. The introduction explains the objective clearly, stating the parameters by which the LLMs' responses are compared with human intelligence in order to test whether artificial intelligence performs cognitive functions similarly to human intelligence across different age groups of children. It gives an introductory overview of how the human mind works and of the ways in which cognitive abilities are compared with the LLMs, identifying the defining parameters of cognitive development.
    Are the methods well-suited for this research? Highly appropriate. The methods are well executed: each test targets one aspect of cognitive development, explains that aspect, and shows how both the human mind and the LLMs respond to it. The use of tables and figures makes the findings of each test easy to interpret. The battery covers all the domains needed to answer the research objective, including a self-representation test of cognitive abilities, which reveals in which aspect of cognitive development each LLM is lacking and how each LLM responds to questions of self-awareness, thereby supporting valid conclusions.
    Are the conclusions supported by the data? Somewhat supported The conclusions do explain the results of the tests and the findings for each LLM, but how each LLM could be improved is not explained in much depth. The conclusions also do not reflect on where future research should focus given the present findings.
    Are the data presentations, including visualizations, well-suited to represent the data? Somewhat appropriate and clear. The data presentations are well suited; the tables and figures used throughout made the findings easy to follow for this reviewer, but the presentation could still be improved.
    How clearly do the authors discuss, explain, and interpret their findings and potential next steps for the research? Somewhat clearly. The authors explain the findings of this research clearly, stating how the LLMs perform better than human minds in mathematical tasks and causal reasoning but fall short in visuo-spatial tasks, and how the LLMs and the human mind perform cognitive functions similarly while relying on different origins and pathways of understanding. However, the potential next steps for future research are not explained much in the conclusions.
    Is the preprint likely to advance academic knowledge? Highly likely. Yes, the research is likely to advance academic knowledge by explaining how the human mind and the LLMs perform cognitive functions and which outperforms the other in various tasks of causal reasoning, visuo-spatial learning, mathematical reasoning, and so on, as I illustrated before. It can thus help improve the functioning of future LLMs, and knowledge of how LLMs function is crucial for anticipating any dangerous implications they may have for humans.
    Would it benefit from language editing? No. There may be minor issues, but the research is well written and easy to understand.
    Would you recommend this preprint to others? Yes, but it needs to be improved. As I stated before, the research is of high value for advancing academic knowledge, but the Conclusions should reflect a little better on future research directions arising from these findings.
    Is it ready for attention from an editor, publisher or broader audience? Yes, after minor changes. As I stated before, minor changes are needed in the Conclusions to mention future research possibilities arising from these findings.

    Competing interests

    The author declares that they have no competing interests.

    Use of Artificial Intelligence (AI)

    The author declares that they did not use generative AI to come up with new ideas for their review.