[AINews] not much happened today


Updated on January 30, 2025


AI Twitter and Reddit Recap

The Twitter recap covers the advancements and performance of DeepSeek models, performance benchmarks, training costs and infrastructure, deployment platforms, community contributions, AI safety, risks, and ethics, market reactions and investment in AI, and humorous exchanges within the AI community. The Reddit recap for /r/LocalLlama covers confusion over the DeepSeek R1 models and their distillations.

DeepSeek Models and Misconceptions

This section covers the naming confusion around the DeepSeek R1 distillations based on Qwen 2.5 and Llama 3.3, which are often mistaken for the full R1 model. It highlights misinformation on platforms like YouTube and TikTok, where creators falsely claim to be running DeepSeek R1 locally. The technical distinction between distillation and fine-tuning is emphasized: the 'distilled models' are in fact fine-tuned versions of Qwen 2.5 and Llama 3.3 rather than the R1 architecture. The author expresses frustration at having to repeat these explanations because the misconception is so widespread. Separately, OpenAI accuses China's DeepSeek of using its models, sparking concerns about intellectual property theft. The discussion also covers DeepSeek's CEO advocating for a sustainable competitive moat built on a strong team and organizational culture, comparisons between DeepSeek's use of PTX and CUDA, potential bans on DeepSeek in the US, and criticism of OpenAI's approach to handling foreign tech competition.
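To make the distillation-versus-fine-tuning distinction concrete, here is a minimal PyTorch sketch (illustrative only; the function names and tensor shapes are my own, and this is not DeepSeek's training code). Distillation trains a student to match a teacher's softened output distribution, while supervised fine-tuning trains directly on hard target tokens:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Knowledge distillation: match the teacher's softened output distribution."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between teacher and student, scaled by T^2 (Hinton et al., 2015)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature**2

def fine_tuning_loss(student_logits, target_token_ids):
    """Supervised fine-tuning: plain cross-entropy against hard target tokens."""
    return F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        target_token_ids.view(-1),
    )
```

The R1 'distilled' checkpoints discussed above fall on the fine-tuning side of this divide: they are Qwen 2.5 and Llama 3.3 weights further trained on R1-generated text, not students trained against R1's output distributions.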

Custom GPTs and Future AI Developments

This section discusses the community's discovery that zero-width space characters can be used to manipulate GPTs, as well as issues with Custom GPTs handling links reliably. There are rumors about the upcoming models Grok 3 and o3-mini, with o3-mini promising faster speeds than its predecessor. The section also touches on DeepSeek's integration with OpenWebUI and its advancements in reasoning capabilities. The community explores the potential of Qwen2.5-VL for OCR tasks and discusses Asynchronous Federated Learning approaches. Finally, there are highlights about DeepSeek's daily query limit increase in the Perplexity AI Discord, as well as a debate on Java SDK versions and Alibaba's potential new AI model in the Aider (Paul Gauthier) Discord.
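The Asynchronous Federated Learning mention is only in passing, but the core idea is easy to sketch: the server applies each client's update as soon as it arrives, down-weighting stale contributions instead of waiting for a synchronous round. The snippet below is a generic FedAsync-style illustration, not any specific approach discussed in the thread:

```python
import numpy as np

def async_fed_update(global_weights, client_weights, staleness, base_lr=0.5):
    """Asynchronous federated averaging: mix in a client's weights as soon as
    they arrive, with older (staler) updates counting for less."""
    alpha = base_lr / (1.0 + staleness)  # staleness = rounds elapsed since the client pulled weights
    return [(1 - alpha) * g + alpha * c
            for g, c in zip(global_weights, client_weights)]

# Example: a client trained on weights that are 3 rounds old
global_w = [np.zeros((4, 4)), np.zeros(4)]
client_w = [np.ones((4, 4)), np.ones(4)]
global_w = async_fed_update(global_w, client_w, staleness=3)
```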

Potential Fixes

This section highlights developments and discussions across the Discord channels: proposed fixes for Softmax variations, concerns in deep RL, and the training of massive models like DeepSeek and Qwen2 VL; the upcoming o3-mini's speed improvements and the cost of training models like Claude 3.5; neuroscience projects, new text generation techniques like 'min-p' sampling, and debates on model generalization. The channels also cover GPU improvements, backend solutions, and integrating AI models into various projects, along with model stability, installation problems, and collaborative efforts to improve language models. Finally, it touches on human-AI interfaces, error handling and prospective features in new AI models, and the community's exploration of different AI applications and tools.
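Of the techniques named above, min-p sampling is simple enough to show in a few lines: keep only the tokens whose probability is at least some fraction of the most likely token's probability, then sample from what remains. A minimal PyTorch sketch (hyperparameter values and vocabulary size are arbitrary):

```python
import torch
import torch.nn.functional as F

def min_p_filter(logits: torch.Tensor, min_p: float = 0.1) -> torch.Tensor:
    """Mask out tokens whose probability falls below min_p * (probability of the top token)."""
    probs = F.softmax(logits, dim=-1)
    threshold = min_p * probs.max(dim=-1, keepdim=True).values
    return logits.masked_fill(probs < threshold, float("-inf"))

# Example: sample the next token id from the filtered distribution
logits = torch.randn(1, 32000)  # stand-in for model output over a 32k vocabulary
next_id = torch.multinomial(F.softmax(min_p_filter(logits), dim=-1), num_samples=1)
```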

AGI Breakthrough and Auto-download Links Controversy

A member in the Unsloth AI Discord shared insights on an AGI breakthrough related to money and evolution, linking to a paper titled Cybergod and summarizing the findings concisely. In another thread, a member labeled auto-download links 'evil', prompting a humorous reaction from another member. The channel also discussed memory requirements for DeepSeek R1, training model issues, support for Qwen2.5-VL, and manipulating model parameters for efficiency.

Discussions on AI Models and Functionality

This section covers various discussions related to different AI models and their functionalities. Users are comparing DeepSeek with OpenAI models, expressing concerns over censorship in AI, proposing the use of multiple AI models for comprehensive answers, highlighting limitations in creative writing with GPT models, and discussing the implementation of real-time functionality with AI assistants. Additionally, conversations revolve around invisible characters to avoid link formatting, issues with memory and context in GPT models, contradictions in GPT responses, challenges with user memory recognition, and interest in solving specific riddles with AI models. The section also delves into the performance and requirements of different AI models, exploring hardware requirements, model loading issues, and handling CSV data with language models.
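For context on the 'invisible characters' trick mentioned above: a zero-width space (U+200B) is invisible when rendered but breaks pattern-based link detection, so inserting one into a URL stops chat clients from auto-formatting it. A minimal illustration (the helper name is mine):

```python
ZERO_WIDTH_SPACE = "\u200b"

def suppress_autolink(url: str) -> str:
    """Insert a zero-width space after the scheme so the string is no longer
    recognized as a URL by simple link-detection patterns."""
    return url.replace("://", ":/" + ZERO_WIDTH_SPACE + "/")

print(suppress_autolink("https://example.com"))  # renders the same, but is not auto-linked
```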

OpenRouter (Alex Atallah) General

Several users in the OpenRouter general channel reported performance limitations with the DeepSeek v3 model and with translations for languages like Polish, citing incorrect results and a lack of context. Concerns were raised about OpenRouter's pricing structure, with users finding it expensive and suggesting lower fees on API requests. Users expressed interest in adding image generation capabilities to the platform. Discussions also covered device and communication issues with models, inconsistencies in DeepSeek R1 model output, and recommendations for effective translation models such as Grok and Claude.

Interconnects: Nathan Lambert Policy

DeepSeek IP Concerns

  • AI Czar David Sacks suggested that DeepSeek appears to have used a technique called distillation to extract knowledge from OpenAI's models.
  • DeepSeek reportedly trained on a significant amount of ChatGPT tokens, prompting investigations from Microsoft and OpenAI.

ChatGPT Influence on Llama's Reasoning

  • Discussions emerged about whether Llama is influenced by ChatGPT, with suggestions that its output may resemble ChatGPT itself due to distillation.
  • Emphasis on ChatGPT's stylistic influence if substantial distillation occurred.

Discussions on Inference Costs

  • Doubts raised about DeepSeek's potential use of the Ascend 910b due to hardware limitations like memory capacity and processing power.
  • Concerns whether alternatives like the H800 would be more suitable for their needs.

White House Export Restrictions Considered

  • Reports surfaced that the White House might expand export restrictions to include Nvidia's H20.
  • Reflects broader concerns about technological dominance and security.

Skepticism Towards DeepSeek's Capabilities

  • Prevalent skepticism regarding DeepSeek's abilities, with users expressing distrust and questioning motives behind allegations.
  • Acknowledgment of potential technical prowess despite a lack of strong evidence.

Yannick Kilcher General Discussions

Softmax Variations, Deep Reinforcement Learning Challenges, RTX 5090 Release Discussions, Performance Metrics in AI, Community Engagement Issues

  • A member discussed a new approach to Softmax that might improve model performance, suggesting it could lead to new state-of-the-art results. The conversation included insights on how conventional Softmax can lead to noisy accuracy and suboptimal learning in certain scenarios.
  • Discussion highlighted that traditional Softmax may not be suitable for deep RL, as it can hinder effective learning and contribute to mode collapse. Members advocated for more flexible methods in reinforcement learning to enhance learning efficiency and model performance (a reference implementation of the conventional formulation appears after this list).
  • Chat participants noted that people were already lining up for the release of the RTX 5090, indicating high demand and excitement. This sparked conversations about consumer interest and market trends related to new GPU releases.
  • A member ran tests comparing the accuracy and loss of their Softmax variations, finding that while accuracy improved, stability suffered. Visual representations shared indicated that regular Softmax had an easier time finding simpler decision boundaries compared to the new methods.
  • Concerns were raised about the impact of certain members on the community atmosphere, with some feeling discouraged about returning due to the interactions. A call for better management of community discussions was suggested to keep the space welcoming for serious professionals.
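The specific Softmax variant under discussion is not described in detail, so no attempt is made to reproduce it here. For reference, the conventional formulation being critiqued, with the temperature knob that is the usual lever for controlling how peaked a policy distribution is in RL, looks like this (a generic sketch, not the member's method):

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Numerically stable softmax. Higher temperature flattens the distribution
    (more exploration); lower temperature sharpens it (more exploitation)."""
    z = (logits - logits.max(axis=-1, keepdims=True)) / temperature
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=-1, keepdims=True)

print(softmax(np.array([2.0, 1.0, 0.1])))        # peaked
print(softmax(np.array([2.0, 1.0, 0.1]), 5.0))   # flatter
```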

RTX Blackwell Architecture and GPU Discussions

  • The RTX Blackwell architecture revealed a 27% increase in FP16/32 throughput compared to the 4090, with no performance boost from 5th gen Tensor Cores over the 4th gen for consumer cards (a simple matmul micro-benchmark for making such comparisons is sketched after this list).
  • Concerns were raised about the marketing of the 5th gen tensor cores for the RTX 5090, as it was noted they are similar to 4th gen cores.
  • Discussions revolved around confusion over whether the RTX 5090 supports microtensor scaling and what its SM version is.
  • Members discussed running PyTorch on GB200s, availability of pre-built PyTorch containers, PR merging roles, and help with torch._scaled_mm API.
  • The announcement of RTX 5090's microtensor scaling was also a topic of discussion within the GPU MODE channels.
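Throughput figures like the 27% FP16/32 uplift above come from hardware measurement; a crude way to compare cards yourself is a large half-precision matmul loop timed with CUDA events. The sketch below is a rough micro-benchmark (matrix size and iteration count are arbitrary) and is not the methodology behind the numbers quoted in the discussion:

```python
import torch

def fp16_matmul_tflops(n: int = 8192, iters: int = 50) -> float:
    """Rough FP16 tensor-core throughput estimate from a square matmul loop."""
    a = torch.randn(n, n, device="cuda", dtype=torch.float16)
    b = torch.randn(n, n, device="cuda", dtype=torch.float16)
    for _ in range(5):                       # warm-up iterations
        torch.matmul(a, b)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        torch.matmul(a, b)
    end.record()
    torch.cuda.synchronize()
    seconds = start.elapsed_time(end) / 1000.0   # elapsed_time returns milliseconds
    return (2 * n**3 * iters) / seconds / 1e12   # TFLOP/s (2*n^3 FLOPs per matmul)

if __name__ == "__main__" and torch.cuda.is_available():
    print(f"~{fp16_matmul_tflops():.1f} TFLOP/s")
```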

Cohere Safety Modes Overview

  • Safety Modes Overview Explained: Safety Modes provide users control over model behavior, effectively enhancing safety for interactions with the newest models while being inactive for older versions.
    • The three modes are CONTEXTUAL, STRICT, and NONE; each mode adjusts the output restrictions accordingly.
  • Contextual Safety Mode Emphasized: The CONTEXTUAL mode aims to balance safety and creativity, providing users with useful suggestions and completing prompts in a meaningful way.

Contextual, Strict Safety, and Turning Off Safety Mode

Contextual mode allows wide-ranging interactions suitable for entertainment, creative, and educational applications. Strict Safety Mode enforces guardrails to avoid sensitive topics, ideal for general and enterprise use. Turning Off Safety Mode deactivates all safeguards, permitting unrestricted content output. Cohere's Safety Modes documentation offers detailed explanations and essential resources for implementing different safety modes effectively.
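In practice, the mode is selected per request. The sketch below assumes the Cohere Python SDK's chat endpoint accepts a safety_mode parameter taking the values listed above; treat the exact parameter name, model name, and response shape as assumptions and check Cohere's Safety Modes documentation before relying on it:

```python
import cohere

co = cohere.ClientV2(api_key="YOUR_API_KEY")  # assumed v2 client; see Cohere's docs

response = co.chat(
    model="command-r-plus",  # assumed model; Safety Modes apply to newer models only
    messages=[{"role": "user", "content": "Write a villain's monologue for a thriller."}],
    safety_mode="CONTEXTUAL",  # or "STRICT" / "NONE", per the modes described above
)
print(response.message.content[0].text)
```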

Interpreter

This section discusses the capabilities of Goose, an open-source agent that developers can extend and customize freely, run locally for efficiency and control, and rely on to handle tasks autonomously; it has received high praise from engineers. It also covers a proposal for an interactive learning tool for Tinygrad basics and the importance of Tinygrad's code architecture for structured learning and better engagement.


FAQ

Q: What is DeepSeek and what are some of the discussions surrounding its advancements?

A: DeepSeek is a model that has sparked discussions on topics such as performance benchmarks, training costs, deployment platforms, community contributions, AI safety, ethics, and market reactions. There have also been discussions on challenges related to naming confusion, misinformation on platforms, technical distinctions between distillation and fine-tuning, and comparisons between DeepSeek and other models.

Q: What are the concerns raised about DeepSeek potentially using OpenAI's models?

A: There have been concerns, including accusations from OpenAI, that DeepSeek may have used distillation to extract knowledge from OpenAI's models. Additionally, there are reports of DeepSeek training on a significant amount of ChatGPT tokens, prompting investigations from Microsoft and OpenAI.

Q: What are some of the discussions about inference costs concerning DeepSeek?

A: Discussions have touched on doubts about DeepSeek's potential use of certain hardware like the Ascend 910b due to limitations in memory capacity and processing power. There are concerns about whether alternatives like the H800 would be more suitable for DeepSeek's needs.

Q: What White House export restrictions are being considered in relation to AI models like DeepSeek?

A: There are reports suggesting that the White House might expand export restrictions to include Nvidia's H20, reflecting broader concerns about technological dominance and security.

Q: What skepticism has been expressed towards DeepSeek's capabilities?

A: There is prevalent skepticism regarding DeepSeek's abilities, with users expressing distrust and questioning the motives behind various allegations. At the same time, there is acknowledgment of DeepSeek's potential technical prowess, even though strong evidence is lacking.

Q: What are some of the discussions related to Softmax variations, Deep Reinforcement Learning challenges, and the RTX 5090 release in the AI community?

A: Discussions in the AI community have covered topics such as new approaches to Softmax that could improve model performance, challenges with traditional Softmax in deep reinforcement learning, consumer interest in the RTX 5090 release, and concerns about specific features like microtensor scaling. Members have been advocating for more flexible methods in reinforcement learning to enhance efficiency and performance.
