[AINews] DeepSeek #1 on US App Store, Nvidia stock tanks -17%


Updated on January 28 2025


AI Twitter Recap


A recap of AI model releases and enhancements, summarized by Claude 3.5 Sonnet (best of 4 runs):

  • DeepSeek-R1 and V3 Efficiency: @teortaxesTex discussed how V3 trains a 236B model 42% faster than the previous 67B model. @nptacek noted that DeepSeek-R1, a 671B-parameter MoE model, requires substantial GPU resources. @carpeetti praised DeepSeek-R1 for its chain-of-thought capabilities.

  • Qwen2.5 Models: @mervenoyann released Qwen2.5-VL, a multimodal model for images and videos with varying parameters. @omarasar0 detailed the vision capabilities of Qwen2.5 supporting long video understanding.

  • LangChain and LangGraph Integration: @LangChainAI shared tutorials on building AI chatbots using LangGraph and showcased applications like the DeFi Agent.

Under Compute and Hardware:

  • NVIDIA Impact: @teortaxesTex expressed concerns about training on a 32K Ascend 910C cluster, potentially affecting NVIDIA stocks. Users discussed DeepSeek-R1's inference speed optimizations leveraging NVIDIA H800 GPUs.

  • Compute Demand: @finbarrtimbers argued that compute demand will increase due to inference scaling, despite DeepSeek's efficiency gains; @cwolferesearch weighed in on the discussion as well.

AI Reddit Recap: OpenAI Employee's Reaction to DeepSeek, DeepSeek R1's Coding Efficiency vs OpenAI O3, Debates on DeepSeek vs ChatGPT: A Censorship Perspective

Theme 1: OpenAI Employee's Reaction to DeepSeek

  • An OpenAI employee raised data privacy concerns about DeepSeek; commenters pushed back, highlighting that the model can be run locally and is open source. Discussions revolved around censorship, model transparency, hardware accessibility, and alternative hosting services like TogetherAI.

Theme 2: DeepSeek R1's Coding Efficiency vs OpenAI O3

  • DeepSeek R1 was positioned as being 25x cheaper than OpenAI's o1 model and showcased superior coding performance compared to the unreleased o3. Concerns were raised about the credibility of performance claims and the marketing strategy of DeepSeek.

Theme 3: Debates on DeepSeek vs ChatGPT: A Censorship Perspective

  • A post discussing an octopus-inspired manipulator sparked discussions beyond the intended censorship focus. Users debated technological origins and ethical implications of AI in political censorship, showcasing the diverse nature of discussions in the AI subreddit.

Technology Development and Public Reaction

The manipulator highlighted in this section was developed and tested at the University of Science and Technology of China, which clears up earlier confusion about its origin. Its design uses 3D-printed pieces actuated by two threads, with much of its capability coming from software; commenters expressed interest in that software being open-sourced to improve accessibility. Public reaction mixed humor with speculation, ranging from jokes about robot tentacles to potential applications in war and entertainment scenarios, showcasing the mixed feelings and creative speculation that advanced robotics provokes.

Innovative Models and Advanced Capabilities

DeepSeek R1: The DeepSeek platform faced frequent 503 errors and slow response times on its R1 model, attributed to high traffic and potential malicious activity. The platform limited new registrations and grappled with reliability concerns. On the other hand, BYOK (Bring Your Own Key) gained traction with discussions emphasizing mitigating rate limits and controlling expenses, albeit with a 5% fee on usage. Participants acknowledged the potential benefits of plugging in personal keys to avoid bottlenecks, while expressing concerns about cost management complexities.
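Given the 503 errors described above, client-side retries with exponential backoff are a common mitigation regardless of provider. A minimal, provider-agnostic sketch (the `TransientError` wrapper is hypothetical; a real client would raise it on HTTP 503 responses):

```python
import random
import time


class TransientError(Exception):
    """Raised by the caller's HTTP wrapper on retryable failures (e.g. HTTP 503)."""


def call_with_retries(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry fn() on TransientError with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except TransientError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error
            # delay doubles each attempt (1s, 2s, 4s, ...) plus a small jitter
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.25))
```

Injecting `sleep` keeps the helper testable; in production the default `time.sleep` applies the real delay between attempts.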

Exciting New Models: Various innovative models were introduced, such as the Janus Pro model by DeepSeek, boasting advanced reasoning performance without high-end GPU requirements. Alibaba's Qwen2.5-VL-72B-Instruct showcased capabilities in long video comprehension and advanced visual recognition. Discussions also covered GRPO (Group Relative Policy Optimization), a variant of PPO, and the usage of PydanticAI for structured output in generative apps. Notably, a member explored building a workout logging app that converts natural language to a DSL, highlighting the challenges of voice-to-DSL pipelines.
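The natural-language-to-DSL idea behind the workout-logging app can be illustrated with a deliberately tiny, regex-based sketch. A real pipeline would use an LLM with structured output (e.g. PydanticAI, as discussed above); every name below is hypothetical:

```python
import re
from dataclasses import dataclass


@dataclass
class SetEntry:
    """One structured workout-log record (the target of the toy DSL)."""
    exercise: str
    sets: int
    reps: int


# Matches phrases like "3 sets of 10 pushups" -- a toy grammar for illustration
_PATTERN = re.compile(r"(\d+)\s*sets?\s*of\s*(\d+)\s*([a-z ]+)")


def parse_workout(text: str) -> SetEntry:
    """Map a natural-language phrase to a structured log entry."""
    m = _PATTERN.search(text.lower())
    if m is None:
        raise ValueError(f"unrecognized workout phrase: {text!r}")
    return SetEntry(exercise=m.group(3).strip(),
                    sets=int(m.group(1)),
                    reps=int(m.group(2)))
```

The hard part flagged in the discussion, voice-to-DSL, sits upstream of this step: speech recognition errors make the phrases far messier than a fixed regex can handle, which is where model-backed structured output earns its keep.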

Industry Insights: The section also delves into discussions around open-source initiatives such as the MCP (Model Context Protocol), Latent Space podcast updates, and advancements in deep learning tools. Furthermore, emerging technologies like HeyGen, Mosaic GPU DSL, and NotebookLM were explored for creating lifelike avatar videos, improving language models, and generating podcast summaries.

Cutting-Edge Developments: Members of various Discord channels shared their experiences, challenges, and achievements in the AI and GPU domains. They discussed topics like stabilizing AI models, optimizing GPU performance, and the evolution of AI hardware. Additionally, these communities showcased collaborative efforts, troubleshooting strategies, and innovative applications of AI technologies.

Multi-Step Document Research Agents and AI Integration

Community members praised the accessible structure of Presenter and its potential to evolve into advanced presentation-building agents. MarcusSchiesser released an open-source template for multi-step document research agents, inspired by Google's approach; the template handles complex research workflows and integrates analysis and referencing efficiently. Scaleport AI partnered with a travel insurer to automate claim estimation using LlamaIndex, resulting in significant time savings and showcasing AI-driven risk analysis for insurance processes. LlamaIndex now integrates with the DeepSeek-R1 API, supporting deepseek-chat and deepseek-reasoner, with access configured via API key.

Unsloth AI (Daniel Han) Discussions

  • Nailed the NLP Course with High Praise: A member excelled in an NLP course and AI text detection software, diving into LLMs. They found an information retrieval course invaluable for RAG systems.
  • Exploring SmoLlm Fine-Tuning: Members discuss SmoLlm fine-tuning and its value. One member successfully fine-tuned with unsloth and ran it with ollama.
  • Ollama's Default Temperature Revealed: Members share ollama's default temperature (0.8), aiding others to save time.
  • DeepSeek R1's Rise in AI Models: A video shares DeepSeek R1's advancements, challenging OpenAI O1. Members show interest in its capabilities.
  • Humorous Take on CUDA Memory: Light moments as one member humorously relates their body converting water and chips to GPU memory errors.
  • Unsloth AI (Daniel Han)'s Showcase: Discussions include model training, loss concerns, and peer collaboration in the AI community.
  • Dataset Format Discussion: Members highlight the significance of proper dataset formatting and its impact on model training outcomes.
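The Ollama default mentioned above (temperature 0.8) can be overridden per request through the `options` field of Ollama's REST API. A minimal sketch of building a `POST /api/generate` body (field names follow Ollama's documented API; the helper name is ours):

```python
def ollama_generate_payload(model: str, prompt: str,
                            temperature: float = 0.8) -> dict:
    """Build a request body for Ollama's POST /api/generate.

    Ollama's default temperature is 0.8; passing a different value in
    'options' overrides it for this request only.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one complete response instead of a stream
        "options": {"temperature": temperature},
    }
```

Knowing the default matters when comparing outputs against a fine-tuned model served elsewhere: if the other runtime defaults to a different temperature, the comparison is confounded unless you pin it explicitly.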

Cascade Functionality and User Feedback

Cascade now offers the ability to search the web automatically or through URL input, allowing users to ask live queries or paste links for context. Users can use commands like @web and @docs to enhance usability across popular documentation sites. However, some users have reported performance issues with Windsurf, particularly Cascade failing to execute commands properly. Changes in credit limits and DeepSeek integration have also drawn mixed feedback. While some users find Cascade helpful, others have seen it ignore rules and modify code improperly. Additionally, extension compatibility issues with Windsurf have been noted; affected users are advised to try installing older versions of extensions with less strict IDE version requirements.

User Discussions on LLM Models and AI Applications

The section discusses various user conversations related to LLM models and AI applications. It includes inquiries about R1 Distillation Models, Llama 3 performance, Image Captioning with DeepSeek, building AI assistants, and tips for optimizing learning velocity. Additionally, there are discussions on human-like responses in LLMs, crisis communication strategies, LLM Live2D Assistant, Qwen2.5-VL model, LangChain's ChatPromptTemplate, and user experiences with Aider and DeepSeek API issues. The section provides insights into the ongoing developments and challenges in the AI landscape.

Aider Functionality Discussion

DeepSeek API Issues

Users reported issues with the DeepSeek API being down or slow, impacting responses in Aider even when the status page showed it as operational. Several members highlighted the importance of checking API performance and key configurations.

Architect Mode in Aider

A member noted that in architect mode, responses from the editor model are not visible, only the response from the architect model is shown. Discussion continued on whether this could be a bug or a compatibility issue with the browser feature.

Difficulties in Switching Models in Aider

Users expressed frustration with temporarily switching models in Aider, noting that changing the main model also changed the editor model, creating workflow disruptions. Participants shared workaround strategies, including using specific commands to switch models for single prompts.

Token Usage Monitoring in Aider

Questions arose about how to track token usage for both the architect and editor models separately while using Aider. Clarification was sought on whether commands like /tokens --model sonnet would work as intended.

Integrating New Crates in Rust with Aider

A new user inquired about incorporating new Rust crates into Aider for better model contextual understanding and usage. The capability to add external libraries to the model's context was explored, highlighting Aider's integration with different programming languages.

Challenges and Discussions on DeepSeek Services

The section discusses the high demand on DeepSeek services, leading to challenges in maintaining stable performance. Members emphasized reliability issues when accessing R1 through OpenRouter, which may make DeepSeek's own API the preferred provider for R1. The section also mentions various links related to DeepSeek services and upcoming events, highlighting continuing advancements in AI research and development.

Interconnects (Nathan Lambert) - Conversations on AI Trends and Developments

The Interconnects section covers various discussions related to AI trends and developments in the industry. Topics include the support for the Self-Play Paradigm in AI advancement, the importance of scaling synthetic data pipelines, the inclusion of Self-Determination Theory in AI discussions, questions about Deepseek's data sources, and critiques of media reporting practices. The section also delves into controversies surrounding Gary Marcus, the launch of Nous Psyche, discussions on nation states' relevance in AI, and the implications of AI breakthrough narratives. Additionally, there are conversations about DeepSeek's impact on the AI market, the launch of Qwen2.5-VL model, and investor sentiments post-launch. Public interest in AI and comparisons between AI and stock market dynamics are also highlighted. Discussions in various channels touch on aspects like Reinforcement Learning, Tulu series analyses, and market reactions to DeepSeek's releases.

Interconnects (Nathan Lambert)

Lectures and Projects

  • Discussion on Channel Inappropriateness: Concern raised about the appropriateness of a post in the channel, hinting at self-promotion.
  • Upcoming Job Board Concept: Anticipation for the launch of a job board-like platform to address community employment needs.

Posts

  • Community Buzz Around DeepSeek: Interest in understanding implications of DeepSeek within a finance firm.
  • Improving Chatbot Format: Focus on enhancing chatbot format for better community engagement.
  • Positive Community Feedback: Appreciation for a post fostering a supportive community atmosphere.
  • Welcome Message to New Members: Inclusion of new members through a welcoming approach to build connections.

Policy

  • China's AI Industry Boost: Announcement of a substantial investment in China's AI industry over the next five years.
  • US Struggles with Industrial Policy: Concerns over the US government's ability to implement effective industrial policies under the Republican party.
  • AI Race and Military Industry: Mobilization of resources for AI under the narrative of great power competition, particularly for military applications.
  • Historical Context of Industrial Policy: Insights into the CHIPS Act and challenges in GPU export controls.
  • Challenges Posed by the Jones Act: Discussion on regulatory barriers impacting US defense manufacturing competitiveness.

Bolt General Discussions

Error Handling in Bolt

Users reported frequent errors and issues with Bolt, including rate limits and network errors, leading to frustration after consuming large amounts of tokens. Many have resorted to asking for help with migration issues or seeking professional assistance to resolve their problems.

Billing and Token Consumption

Users raised concerns about the high consumption of tokens when using Bolt, with some claiming to have spent millions on prompts with little progress. Discussions included refund possibilities after facing issues and highlighted the disparity in costs versus achieved outcomes.

Implementing User Roles

A user successfully created an app with multiple login roles including super admin and admin, overcoming complexities with Supabase policies. The implementation process was challenging but ultimately achieved a fully functional system.

Deployment with Netlify

Users inquired about connecting Bolt projects to custom domains via Netlify, clarifying that redeployment is necessary for updates to take effect. Emphasis was placed on changes made in Bolt not automatically reflecting on Netlify.

Connecting GitHub to Bolt

A user sought help on importing existing GitHub repositories into Bolt but encountered access issues with private repositories. Currently, users are unable to access private repos within Bolt, which is a limitation being addressed.

MCP, NotebookLM, and GPU Discussion

The discussions in this section cover various topics related to Model Context Protocol (MCP), NotebookLM, and GPU-related issues. In the MCP section, users discuss open source projects, user opinions on API management, and the launch of new tools like the MCP Variance Log Tool and KoboldCPP-MCP Server. In the NotebookLM section, users share workflows for HeyGen avatars, podcast integration experiences, and innovative mixing techniques. The GPU-related discussions focus on issues like PTX ASM segfaults, CUDA kernel loading errors, DeepSeek events, CUDA versions and compatibility, as well as NCCL timeout debugging in multi-node training setups.

Non-transformer vision models and optimization preferences

  • Most non-transformer vision models reportedly do not train well with Adam's defaults, with members preferring alternative learning-rate schedules such as triangular or WSD (warmup-stable-decay).
  • A wider discussion on the efficacy of various optimization strategies in vision tasks ensued.
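For reference, a WSD (warmup-stable-decay) schedule like the one mentioned above can be sketched as a plain function: linear warmup, a flat plateau at the peak rate, then linear decay to zero. The shape is standard; the default fractions below are illustrative, not taken from the discussion:

```python
def wsd_lr(step: int, total_steps: int, peak_lr: float = 3e-4,
           warmup_frac: float = 0.1, decay_frac: float = 0.1) -> float:
    """Warmup-Stable-Decay learning rate at a given training step."""
    warmup = max(1, int(total_steps * warmup_frac))
    decay_start = int(total_steps * (1 - decay_frac))
    if step < warmup:
        # linear warmup from peak_lr/warmup up to peak_lr
        return peak_lr * (step + 1) / warmup
    if step < decay_start:
        # stable plateau
        return peak_lr
    # linear decay from peak_lr down to zero at total_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - decay_start))
```

A triangular schedule differs only in the middle segment: instead of a plateau, the rate ramps straight back down after the warmup peak.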


Cohere Legal Regulations, UI Feedback, Collaboration, and New Members

The section discusses legal regulations affecting AI trade with Japan, with Cohere models not impacted and regulations slated for May 2025. Feedback on Cohere's UI suggests improvements like clearer button placements. Users seek partnerships for collaborative projects, emphasizing diversity over redundancy. The community welcomes new members, fostering a friendly and engaging atmosphere.

Discussion on Deepseek Algorithm Implementation

Members are inquiring about the implementation of the DeepSeek algorithm, with one member asking if anyone is reproducing it. Another member suggests it may refer to GRPO (Group Relative Policy Optimization), which was recently added to TRL.

Social Networks & Footer

The section includes links to the podcast's Twitter account, website, and newsletter. It also mentions finding AI news elsewhere through social networks. The footer acknowledges the platform hosting the newsletter.


FAQ

Q: What is DeepSeek-R1 and its efficiency compared to previous models?

A: DeepSeek-R1 is a 671B-parameter MoE reasoning model noted for its chain-of-thought capabilities; the related DeepSeek V3 work trains a 236B model 42% faster than the previous 67B model.

Q: What are some features of the Qwen2.5 models released?

A: Qwen2.5 models include multimodal capabilities for images and videos with varying parameters, supporting vision capabilities and long video understanding.

Q: How are LangChain and LangGraph integrated for AI chatbots?

A: LangChain and LangGraph integration allows for building AI chatbots and showcasing applications like the DeFi Agent.

Q: What were the concerns raised regarding NVIDIA's impact on training?

A: Concerns were raised about training on a 32K Ascend 910C cluster impacting NVIDIA stocks, with discussions on DeepSeek-R1's inference speed optimizations using NVIDIA H800 GPUs.

Q: What were some themes discussed by OpenAI employees about DeepSeek?

A: Themes included data privacy concerns, coding efficiency comparisons with OpenAI's models, and debates on censorship perspectives between DeepSeek and ChatGPT.

Q: What were some issues faced by the DeepSeek R1 platform?

A: Issues included frequent 503 errors, slow response times, high traffic, and potential malicious activity, leading to reliability concerns and limiting new registrations.

Q: What were some exciting new models introduced alongside DeepSeek R1?

A: Exciting new models included Janus Pro by DeepSeek, Alibaba's Qwen2.5-VL-72B-Instruct, GRPO's advancements over PPO, and the usage of PydanticAI for structured output in generative apps.

Q: What were some discussions related to Bolt and its usage?

A: Discussions include error handling, billing concerns, user roles implementation, deployment with Netlify, and connecting GitHub repositories to Bolt.

Q: What were some insights shared regarding AI trends and developments in the Industry Insights section?

A: Insights included discussions on open-source initiatives like MCP, advancements in deep learning tools like HeyGen and NotebookLM, and the collaborative efforts in the AI and GPU domains.

Q: What challenges were faced in the deployment of Bolt projects?

A: Challenges included rate limits, network errors, token consumption concerns, implementing user roles, and connecting to custom domains via Netlify.
