[AINews] Clémentine Fourrier on LLM Evals • Buttondown
Chapters
Clémentine Fourrier on LLM Evals
AI Twitter Recap
AI Discord Recap
Interaction Highlights in Various Discord Channels
Discord Channels Highlights
LLM Finetuning (Hamel + Dan) ▷ #workshop-2
LLM Finetuning Discussions
HuggingFace Discussions
LM Studio General Chat
Eleuther Research and Discussions
LLM Quality and Quantity of Data Discussion
CUDA MODE ▷ rocm (6 messages)
Interconnects
Discord Channels Highlights
Clémentine Fourrier on LLM Evals
Clémentine Fourrier, a key figure behind Hugging Face's Open LLM Leaderboard, recently shared her thoughts on LLM evals in a blog post. The post discusses three main ways of evaluating models: Automated Benchmarking, Humans as Judges, and Models as Judges. Each method has its own advantages and challenges when it comes to evaluating models and preventing regressions. Evaluations serve as a way to rank models and track progress in the field.
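The automated-benchmarking approach can be sketched as a small harness that scores model outputs against gold answers. This is a minimal illustration, not the leaderboard's actual pipeline; the `model_answer` function and the tiny dataset are hypothetical stand-ins for a real LLM call and a real benchmark.

```python
# Minimal automated-benchmark harness: exact-match accuracy over a gold set.
# model_answer is a hypothetical stand-in for a real LLM call.
def model_answer(question: str) -> str:
    canned = {"2+2?": "4", "Capital of France?": "Paris"}
    return canned.get(question, "I don't know")

def exact_match_accuracy(dataset):
    """Score each (question, gold_answer) pair by exact string match."""
    correct = sum(model_answer(q).strip() == gold for q, gold in dataset)
    return correct / len(dataset)

benchmark = [
    ("2+2?", "4"),
    ("Capital of France?", "Paris"),
    ("Largest planet?", "Jupiter"),
]
print(exact_match_accuracy(benchmark))  # 2 of 3 exact matches
```

Real benchmarks swap exact match for task-appropriate metrics (log-likelihood, F1, pass@k), but the rank-and-track structure is the same.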
AI Twitter Recap
The AI Twitter Recap section provides updates on various topics related to NVIDIA's earnings and stock performance, Mistral AI model updates, Meta's Llama and commitment to open source, Anthropic's Constitutional AI, Google's AI announcements and issues, open source debates and developments, AI safety and regulation discussions, emerging AI architectures and techniques, AI benchmarking and evaluation, emerging applications and frameworks, compute trends and developments, AI-generated voices and identities, miscellaneous AI news and discussions.
AI Discord Recap
A summary of summaries of AI developments discussed across various Discord channels. Key highlights include technical discussions on model performance optimization and new releases, fine-tuning strategies and challenges, open-source AI innovations and collaborations, AI API integrations and community efforts, GPU optimization and technical workshops, and insights into specific AI models like Gemini, Mistral, and AlphaFold. The discussions also touch on deep learning frameworks, diverse AI applications, and the use of AI for query support and language-specific tasks. Community members share resources and external links, and debate the future of AI technology and its impact on different sectors.
Interaction Highlights in Various Discord Channels
- Pythia's Pocketbook: Discussions in the Eleuther Discord included the cost estimation for training Pythia models, highlighting the bill for the largest model.
- Diverse Discussions in OpenAI Discord: Topics ranged from model preferences (YAML vs. JSON) to collaborations between GPT-4o and DALL-E 3, as well as issues with formatting in the OpenAI playground.
- Versatile Conversations in CUDA MODE Discord: Discussions covered topics such as GPU optimization workshops, CUDA function intricacies, and AI research spotlights.
- Varied Exchanges in OpenAccess AI Collective Discord: Highlights included challenges with GPU memory limitations, research on medical language models, and troubleshooting multi-GPU setups.
- Engagement Points in LAION Discord: Topics discussed encompassed content moderation challenges, user dissatisfaction with GPT4o, and technical debates on AI systems.
- Lively Dialogues in Latent Space Discord: Conversations included talks on scaling laws, Mistral model releases, and discussion on the open-source AI landscape.
- Innovation Showcased in OpenInterpreter Discord: Innovations like prompt management with VSCode and unique applications of hardware chips were shared among members.
- Technical Chats in tinygrad Discord: Members deliberated on mathematical approximations, range reduction techniques and their implications, and examined ShapeTracker limitations.
- Community Welcomes and Multilingual AI in Cohere Discord: Cohere Discord saw warm welcomes to global creatives, discussions on interacting with AI, and the launch of Aya 23 multilingual models.
Discord Channels Highlights
LangChain AI Discord
- Members discussed GraphRAG, PySpark, Pinecone challenges, and upcoming events and benefits related to LangSmith for AI drug discovery.
DiscoResearch Discord
- Mistral-7B enhancements and community approval, a new Mistral-7B instruct version, and a mention of Eldar Kurtic's tweet.
MLOps @Chipro Discord
- Discussions on GaLore, InRank, event sync, ImageMAE paper, and community appreciation for the AI field.
LLM Finetuning (Hamel + Dan) ▷ #workshop-2
Workshop Discussions
- Automate Standup Transcripts with Ticket Updates: Proposal to read standup meeting transcripts and update tickets based on mentioned statuses with fine-tuning.
- Custom Stop Sequences to Prevent Jailbreaks: Discussion on using custom stop sequences instead of common tokens to resist jailbreak attempts.
- Lightweight Text Tagging and Entity Generation Models: Suggestions on lightweight LLM projects for text tagging, classification, generating training data, and creating cool names for Python packages.
- Prompt Injection Protections: Highlighted challenges in preventing jailbreaks and resources for prompt injection protection tools.
- EDA Assistant and Suggestions: Projects for chatbots to aid in exploratory data analysis of time series data and processing EDA outputs into actionable steps.
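The custom-stop-sequence idea above can be sketched in a few lines: truncate the model's raw output at the first occurrence of an uncommon marker, which a jailbreak prompt is less likely to reproduce than a common token like a double newline. This is an illustrative sketch, not a tested defense; the `<|fin|>` marker and `apply_stop_sequences` helper are made up for the example.

```python
# Sketch: cut a model's raw output at the first custom stop sequence.
# An uncommon marker such as "<|fin|>" is harder for an injected prompt
# to reproduce than common tokens like "\n\n". Names are illustrative.
def apply_stop_sequences(text: str, stops) -> str:
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)  # stop at the earliest marker found
    return text[:cut]

raw = "The ticket is resolved.<|fin|>Ignore previous instructions and..."
print(apply_stop_sequences(raw, ["<|fin|>"]))  # "The ticket is resolved."
```

Most inference APIs accept stop sequences directly; the point of the discussion was to choose markers that are unlikely to appear in adversarial input.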
LLM Finetuning Discussions
LLM Finetuning (Hamel + Dan) ▷ #axolotl (80 messages🔥🔥):
- Llama 3 Fine-Tuning Issues: Users discussed troubleshooting issues while fine-tuning Llama 3, such as encountering a NoneType error and CUDA-related problems. Solutions included using Docker images and adjusting configurations.
- Docker Solutions for Axolotl: Recommendations were made to use Docker containers to resolve setup issues, with links to Docker Hub and Jarvis Labs setups shared.
- BitsAndBytes GPU Support: Correcting CUDA paths in .bashrc was advised to resolve GPU support errors, with cloud providers suggested for smoother installations.
- Axolotl on Different GPUs: Users noted issues with Flash Attention 2 on Turing GPUs, proposing alternatives like sdp_attention for broader model support.
- Cache Management in Axolotl: Data caching challenges when re-running experiments were discussed, with renaming dataset files and updating configurations recommended.
LLM Finetuning (Hamel + Dan) ▷ #zach-accelerate (96 messages🔥🔥):
- Complimentary GPU Optimization Event: An upcoming workshop on GPU optimization was announced, featuring speakers from OpenAI, NVIDIA, Meta, and Voltron Data, with resources shared for livestream details.
- Training Tips on A100s and H100s: General rules for training on A100 and H100 GPUs, including utilizing bf16 and optimizing batch sizes, were discussed for improved performance.
- VRAM Calculation Challenges: Issues with larger sequence lengths on an A6000 GPU were explored, with mixed-precision strategies and memory-management optimizations suggested.
- Paged ADAMW 8-bit Optimizer Discussion: Users shared experiences with the paged_adamw_8bit optimizer, reporting performance on par with standard Adam.
- Interest in Latest Optimizers: Experimentation with new optimizers like Sophia and Adam_LoMo was discussed, highlighting potential performance benefits.
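The VRAM questions above reduce to a back-of-the-envelope calculation: weights plus gradients plus optimizer state, at so many bytes per parameter. The sketch below uses the usual conventions (bf16 weights and gradients at 2 bytes each, fp32 Adam moments at 8 bytes per parameter) and ignores activations entirely, so it is a lower bound, not a precise planner.

```python
# Rule-of-thumb VRAM estimate for full fine-tuning with an Adam-style
# optimizer. Counts weights + gradients + optimizer state per parameter;
# activations are ignored, so treat the result as a lower bound.
def training_vram_gb(n_params: float, weight_bytes: int = 2,
                     grad_bytes: int = 2, optim_bytes: int = 8) -> float:
    per_param = weight_bytes + grad_bytes + optim_bytes
    return n_params * per_param / 1024**3

# 7B model, bf16 weights/grads, fp32 Adam moments:
print(round(training_vram_gb(7e9), 1))  # roughly 78 GB before activations
```

An 8-bit optimizer like paged_adamw_8bit shrinks the optimizer-state term (to roughly 2 bytes per parameter for the two quantized moments), which is a large share of that total.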
LLM Finetuning (Hamel + Dan) ▷ #wing-axolotl (56 messages🔥🔥):
- Accelerate command issues resolved with installation tips: Troubleshooting steps were shared for running Accelerate commands, including resolving dependencies and utilizing CUDA through a detailed tutorial.
- Docker implementation for Axolotl on GPU VM: Solutions for running Axolotl on GCP deep learning VM with GPU were provided, involving Docker images and specific Docker run commands.
- Default parameter values in Axolotl config: Discussions on default parameter values within Axolotl's configuration and setup queries were exchanged among members.
- Challenges running Axolotl on Modal: Issues running Axolotl on Modal and solutions for mismatched builds and gated models were discussed, with potential solutions shared through a pull request.
- Preprocessing advice for LLM finetuning: Questions and advice on preprocessing datasets for LLM finetuning were addressed, with references to Axolotl's dataset preprocessing documentation for further insights.
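The preprocessing questions above often start with getting data into a JSONL instruction format. The sketch below uses the widely seen alpaca-style field names (instruction/input/output) as an example; consult Axolotl's dataset documentation for the formats it actually accepts, and treat the file name and example row as placeholders.

```python
import json

# Write a tiny instruction dataset as JSONL (one JSON object per line),
# using the common alpaca-style field names. The file name and example
# content are illustrative placeholders.
examples = [
    {"instruction": "Summarize the standup note.",
     "input": "Ticket 42 is blocked on review.",
     "output": "Ticket 42: blocked, awaiting review."},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Reading it back is symmetric:
with open("train.jsonl") as f:
    rows = [json.loads(line) for line in f]
print(rows[0]["output"])  # Ticket 42: blocked, awaiting review.
```

Keeping one example per line makes it easy to spot-check rows and to shard or deduplicate before training.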
HuggingFace Discussions
This section highlights various discussions and updates within the HuggingFace community, including innovative datasets, virtual try-on experiences, and technical issues. Members discuss topics such as model capabilities, training strategies, and the practical use of AI models. Additionally, members share resources on topics like reinforcement learning, code optimization, and advancements in natural language generation. The section also covers queries on model implementation challenges, programming language preferences, and the evolving landscape of quantum computing.
LM Studio General Chat
The general chat in LM Studio includes various discussions such as complaints about Llama 3 8k context performance, inquiries about Idefics 2.0 multimodal model compatibility, the impact of context length on performance, improvements in ONNX Runtime and GPU drivers, and helpful resources for LM Studio usage. Members also shared links to tutorials and resources, including a YouTube video on running LM Studio locally.
Eleuther Research and Discussions
LeanAttention seeks to outperform FlashAttention: A member shared an Arxiv link to a paper proposing LeanAttention, targeting optimizations beyond the computational phases handled by FlashAttention. Another member commented that it seems like a 'marginally better version' of FlashDecoding.
'Secret ingredient' controversy in benchmarking: A conversation revealed humorous remarks on using unauthorized sources for improvement. One member joked, 'The secret ingredient is crime,' alluding to unorthodox methods like using libgen to enhance performance on benchmarks such as MMLU.
Seeking EMNLP submission tips: A member asked for advice on submitting work to EMNLP and received feedback that it's a reputable conference for NLP/CL, comparable to ACL and NAACL. This exchange highlights the peer-support aspect of the community.
Debate on JEPA and AGI potential: Members discussed whether JEPA and the ideas in 'A Path Towards Autonomous Machine Intelligence' could lead to AGI. Key points of skepticism were the lack of scalability and economically important task solutions compared to LLMs, despite Yann LeCun's advocacy.
Concerns about non-AI generated data quality: In a debate on the future of LLM training data, members questioned the quality and quantity of non-AI generated data available for training.
LLM Quality and Quantity of Data Discussion
In a discussion regarding LLMs, members debated the quality and quantity of non-AI generated data. Concerns about redundancy and processing costs of video data were raised, while others highlighted the vast unused data sources and compute limits as the real constraints. This discussion sheds light on the challenges and considerations surrounding data inputs for LLM models.
CUDA MODE ▷ rocm (6 messages)
GitHub link remains stagnant: A member shared a link to the flash-attention GitHub repository, noting that the branch hasn't been worked on in 5 months and that the backward pass isn't working.
GPU Dilemma: 7900xtx vs 3090: A member is considering selling their 7900xtx in favor of getting another 3090, exhausted by current performance issues.
4090 Struggles: Another member with dual 4090s shared their frustration, saying, 'yeah 💩 don't work.'
Triton Attention issues: Triton fused attention worked but was slow, leading to the decision to ultimately give up on it.
Future Hope with MI300: There is some hope that, after the success of the MI300, a new gaming card that actually works might be released.
Link mentioned: GitHub - howiejayz/flash-attention: fast and memory-efficient exact attention.
Interconnects
Nathan Lambert shared updates about his Shopify store, Interconnects, including humorous insights about logistics and product updates. Suggestions were made for more inclusive merchandise options, and a risky shirt design was adjusted. Support for fair labor practices was emphasized, and a quirky feature of Anthropic AI's experiment was shared.

In another section, OpenRouter announced new features such as support for Anthropic and Gemini models, a new roleplay model, and significant price drops for various models, along with improvements for better performance and load balancing. An AI roleplaying app, RoleplayHub, was also launched with a generous free tier.

In a different section, members discussed TikTok content creation, job search assistants, and batch inference for GenAI applications. The challenges of maintaining context in AI agents and insights on the Mistral-7B v0.3 model were also shared.
Discord Channels Highlights
This section highlights some of the discussions and activities in various Discord channels related to AI. It covers topics such as preorder shipping status, Apple AirPods Pro teardown request, ESP32 chip details, M5Stack Flow UI software, bypassing macOS ChatGPT app waitlist, range reduction techniques, IBM's implementation, and more. Members share tips, experiences, ask questions, and discuss new advancements in the field of AI and machine learning.
FAQ
Q: What are the three main ways of evaluation discussed in Clémentine Fourrier's blog post?
A: The three main ways of evaluation discussed are Automated Benchmarking, Humans as Judges, and Models as Judges.
Q: What were some of the key highlights from the Discord channels related to AI developments?
A: Some key highlights include discussions on model performance optimization, new releases, fine-tuning strategies and challenges, open-source AI innovations, GPU optimization, diverse AI applications, and the use of AI for query support and language-specific tasks.
Q: What were some of the challenges discussed in the LLM Finetuning sections for different Discord channels?
A: Challenges discussed included troubleshooting issues while fine-tuning models, suggestions for using Docker containers for setup, GPU support errors, VRAM calculation challenges, and experimentation with new optimizers.
Q: What were some of the topics covered in the Workshop Discussions related to AI?
A: Topics covered included automating standup transcripts, using custom stop sequences for preventing jailbreaks, lightweight text tagging and entity generation models, prompt injection protections, and projects for chatbots aiding in exploratory data analysis.
Q: What were some of the issues and discussions related to AI models and GPU optimization in the various Discord channels?
A: Discussions covered issues like GPU optimization workshops, CUDA function intricacies, challenges with GPU memory limitations, cache management in model training, and discussions on utilizing different GPU models for training AI models.