Summary: an early review, based on the AI Research Podcast, of xAI’s Grok 4, a multi-agent, multi-modal large language model, including some of its limitations.


Grok 4: an early assessment from the AI Research Podcast


On 29 July, I came across the following assessment of Elon Musk’s new large language model (LLM), Grok 4. As indicated by its title, ‘xAI built an AI team, not a chatbot inside Grok 4 heavy’s insane power’, the podcast is highly laudatory (I initially described it as ‘a quasi panegyric’), although the two reviewers of the AI Research Podcast address some of Grok 4’s weak points towards the end of their assessment.

As I have not yet used Grok 4, I cannot offer a first-hand opinion of the system, but other YouTube reviews also suggest that Grok 4 constitutes a significant improvement over the earlier version of Elon Musk’s LLM (Grok 3).

Below is a summary of the podcast, generated by Grok 3 from a transcript of the clip that was itself produced by another AI system:


Grok 4 Heavy, developed by xAI, is a ground-breaking artificial intelligence model distinguished by its massive 1.7-trillion-parameter architecture and a hybrid modular design. Unlike monolithic models, it incorporates multiple specialised neural subsystems, each optimised for specific tasks such as coding, mathematical reasoning or linguistic comprehension. Its multi-agent architecture enables several AI agents to collaborate in parallel, sharing and verifying insights to tackle complex problems, which makes it exceptionally effective for tasks requiring deep reasoning.

Trained on a vast and diverse dataset, including the real-time Twitter (X) stream and high-quality texts (books, academic papers, code repositories), Grok 4 Heavy employs intensive reinforcement learning to prioritise logical and accurate reasoning. With a context window of up to 256,000 tokens (roughly 200,000 words), it excels at analysing lengthy texts, such as books or legal documents, while integrating real-time data through internet searches or external APIs.

In benchmark tests, Grok 4 Heavy performs impressively: it achieved a perfect score of 100% on the American Invitational Mathematics Examination (AIME), far surpassing Grok 3’s 52%, and scored nearly 50% on Humanity’s Last Exam, almost double the score of competitors such as Gemini (27%). It also excels in coding (75% on SWE-bench) and graduate-level science (87% on GPQA, which covers physics, chemistry and biology), demonstrating exceptional capability in technical and scientific domains. Its multi-modal capabilities, though limited, support image analysis and generation, with planned enhancements for audio and video understanding.

Grok 4 Heavy is integrated into X and Telegram (the latter through a £230 million partnership) and is accessible via grok.com and mobile apps for iOS and Android. An API compatible with OpenAI’s SDKs facilitates adoption by developers. It is transforming fields such as coding (game development, code optimisation), education (personalised tutoring), research (literature synthesis) and productivity (document summarisation, real-time data management).

However, limitations persist: responses can be overly verbose, the multi-agent mode is slow (sometimes taking minutes) and its visual capabilities lag behind those of specialised models. Safety and ethical alignment, particularly given its training on raw Twitter data, remain contentious, although xAI has strengthened its filters following early incidents. The high cost (£230/month for Heavy mode) and the lack of transparency (no public technical documentation) are also barriers.

xAI plans rapid improvements, particularly in multi-modal capabilities (video, audio) and efficiency (reducing latency), and is considering open-source variants. In the long term, the goal is to advance towards artificial general intelligence (AGI), with potential integrations into Tesla and SpaceX systems. Grok 4 Heavy marks a significant leap towards an AI intellectual partner, but its rapid development raises critical questions about responsible use and societal impact.
Source: the AI Research Podcast, 25 July 2025, https://youtu.be/lZuAiqtEano

Other links on Grok 4:

The launch presentation of Grok 4, with Elon Musk: https://x.com/i/broadcasts/1lDGLzplWnyxm

https://x.ai/news/grok-4

https://www.youtube.com/results?search_query="Grok+4"&sp=EgYIAxABGAI%253D


Lausanne, the above was published on the third day of the eighth month of the year two thousand and twenty-five.