Hot data-science-related posts every hour. Chat: https://telegram.me/r_channels Contacts: @lgyanf
I suck at programming and I feel so bad
I failed an introductory programming exam (Python) at university and honestly, it made me feel really stupid and inadequate.
I come from a BA in pure linguistics in Germany, and I had taken a programming course on Codecademy last year (still during my BA), but after that I hadn’t touched Python at all.
Plus, the course at my MSc was terrible: after covering functions, it focused almost entirely on regex, which I had never worked with before.
On top of that, I had a lot of other exams to prepare for, so I barely studied and did very little practice. I do enjoy programming—I’ve gone over the “theory” multiple times—but I struggle to remember concepts and apply critical thinking when trying to solve problems. I lack hands-on experience. If you asked me to write even the simplest program, I wouldn’t know where to start.
I mean, at the exam I couldn’t even recall how to reverse a string or how to merge two dictionaries…
I also had problems saving a file in Visual Studio Code on a different laptop.
I felt so dumb and not suited for this path.
Meanwhile, most of my classmates were just great at programming and did fine on the exam.
It feels like I’m just memorizing code rather than truly understanding how to use it.
This whole experience has been pretty discouraging because I know how important programming skills are in this field—especially when there are people with computer science degrees who have been coding since high school.
So now I don’t know where to start. As I said, I’ve read the theory multiple times (how to merge dictionaries, what functions are and how they work, etc.), but if you give me a concrete problem to solve, even a very simple one, I don’t know where to begin.
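For reference, the two exam tasks mentioned above are one-liners in Python; a minimal sketch:

```python
# Reversing a string: slicing with a step of -1.
word = "linguistics"
print(word[::-1])          # -> "scitsiugnil"

# Merging two dictionaries: unpack both into a new dict (later keys win on duplicates).
a = {"en": "English", "de": "German"}
b = {"tr": "Turkish", "de": "German (updated)"}
merged = {**a, **b}
# On Python 3.9+ this can also be written as: merged = a | b
print(merged)              # -> {'en': 'English', 'de': 'German (updated)', 'tr': 'Turkish'}
```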
That said, I’m currently taking an NLP and ML course at university, which requires basic programming knowledge. So I was thinking of following a hands-on NLP course that also covers regex. That way, I could improve my programming skills while reinforcing what I’m studying now.
Or would it be better to start from the basics of Python again, maybe going through tutorials once more and focusing on practice?
/r/LanguageTechnology
https://redd.it/1isgphw
NAACL 2025 Decision
The wait is almost over, and I can't contain my excitement for the NAACL 2025 final notifications!
Wishing the best of luck to everyone who submitted their work! Let’s hope for some great news!!!!!
/r/LanguageTechnology
https://redd.it/1i6sbwy
Deepseek’s AI model is ‘the best work’ out of China but the hype is ‘exaggerated,’ Google Deepmind CEO says
https://www.cnbc.com/2025/02/09/deepseeks-ai-model-the-best-work-out-of-china-google-deepmind-ceo.html
/r/deeplearning
https://redd.it/1ilw5bu
Huawei's Ascend 910C chip matches Nvidia's H100. There will be 1.4 million of them by December. Don't think banned countries and open source can't reach AGI first.
Recently the world was reminded that Sam Altman once said "it’s totally hopeless to compete with us on training foundation models." He was obviously trying to scare off the competition. With DeepSeek R1, his ploy was exposed as just hot air.
You've probably also heard billionaire-owned news companies say that China is at least a few years behind the United States in AI chip development. They say that because of this, China and open source can't reach AGI first. Well, don't believe that self-serving ploy either.
Huawei's 910C reportedly matches Nvidia's H100 in performance. Having been tested by Baidu and ByteDance, the chip will be produced in a run of 1.4 million units in 2025. 910C chips sell for about $28,000 each, based on reports of an order of 70,000 valued at $2 billion. That's about what Nvidia charges for its H100s.
Why is this such awesome news for AI and for the world? Because the many companies in China and dozens of other countries that the US bans from buying Nvidia's top chips are no longer at a disadvantage. They, and open-source developers, will soon have powerful enough GPUs to build top-ranking foundation AI models distilled from R1 at a very low cost that they can afford. And keep in mind that R1 already comes in at number 3 on the Chatbot Arena leaderboard:
https://lmarena.ai/?leaderboard
If an open-source developer gets to AGI first, this will of course be much better for the world than if one of the AI giants beats them there. So don't believe anyone who tells you that China, or some other banned country, or open source, can't get to AGI first. DeepSeek R1 has now made that both very possible and very affordable.
/r/deeplearning
https://redd.it/1ihecl0
D Why does the DeepSeek student model (7B parameters) perform slightly better than the teacher model (671B parameters)?
This is the biggest part of the paper that I am not understanding - knowledge distillation to match the original teacher model's distribution makes sense, but how is it beating the original teacher model?
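For context, classic knowledge distillation trains the student to match the teacher's output distribution; a minimal PyTorch sketch of that textbook logit-matching objective (not necessarily the exact recipe in the DeepSeek paper, which fine-tunes students on teacher-generated samples):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions (Hinton et al., 2015)."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # "batchmean" gives the standard scaling; t**2 keeps gradient magnitudes comparable.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t ** 2)

# Toy usage with random logits over a vocabulary of 32 tokens.
student_logits = torch.randn(4, 32)
teacher_logits = torch.randn(4, 32)
print(distillation_loss(student_logits, teacher_logits))
```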
/r/MachineLearning
https://redd.it/1ie46nq
D Monthly Who's Hiring and Who wants to be Hired?
For Job Postings please use this template
>Hiring: [Location], Salary: [], [Remote | Relocation], [Full Time | Contract | Part Time] and [Brief overview, what you're looking for]
For Those looking for jobs please use this template
>Want to be Hired: [Location], Salary Expectation: [], [Remote | Relocation], [Full Time | Contract | Part Time] Resume: [Link to resume] and [Brief overview, what you're looking for]
Please remember that this community is geared towards those with experience.
/r/MachineLearning
https://redd.it/1hq5o1z
The Great ChatGPT o1 pro Downgrade Nobody’s Talking About
Let’s talk about what’s happening with OpenAI’s $200/month o1 pro tier, because this is getting ridiculous.
Remember when you first got access? The performance was incredible. Complex analysis, long documents, detailed code review - it handled everything brilliantly. Worth every penny of that $200/month premium.
Fast forward to now:
* Can’t handle long documents anymore
* Loses context after a few exchanges
* Code review capability is a shadow of what it was
* Complex tasks fail constantly
And here’s the kicker: OpenAI never published specifications, disabled their own token counting tool for o1 pro, and provided no way to verify anything. Convenient, right?
Think about what’s happening here:
1. Launch an amazing service
2. Get businesses hooked and dependent
3. Quietly degrade performance
4. Keep charging premium prices
5. Make it impossible to prove anything changed
We’re paying TEN TIMES the regular ChatGPT Plus price ($200 vs $20), and they can apparently just degrade the service whenever they want, without notice, without acknowledgment, without any way to verify what we’re actually getting.
This isn’t just about lost productivity or wasted money. This is about a premium service being quietly downgraded while maintaining premium pricing. It’s about a company that expects us to pay $200/month for a black box that keeps getting smaller.
What used to take 1 hour now takes 4. What used to work smoothly now requires constant babysitting. Projects are delayed, costs are skyrocketing, and we’re still paying the same premium price for what feels like regular ChatGPT with a fancy badge.
The most alarming part? OpenAI clearly knows about these changes. They’re not accidental. They’re just counting on the fact that without official specifications or metrics, nobody can prove anything.
This needs to stop.
If you’re experiencing the same issues, make some noise. Share this post. Let them know we notice what’s happening. We shouldn’t have to waste our time documenting their downgrades while paying premium prices for degraded service.
OpenAI: if you need to reduce capabilities, fine. But be transparent about it and adjust pricing accordingly. This silent downgrade while maintaining premium pricing isn’t just wrong - it’s potentially fraudulent.
/r/LanguageTechnology
https://redd.it/1i56cmx
Guide to Making the Best Self Driving Dataset
https://medium.com/voxel51/how-to-make-the-best-self-driving-dataset-c2170cb47bff
/r/computervision
https://redd.it/1i1cki1
Looking for a CV group
Hi All,
I am looking for folks who are in computer vision/ ML space who might be interested in forming a small group to do weekly paper readings. One of my favorite things in grad school was being able to keep up to date with SOTA in CV/ML using research group meetings where folks would do a short form presentation, followed by discussion. My work is closely related to 3D computer vision and CV deep learning but I am not up to date with the latest and the greatest.
Alternatively, if there are groups or discords already out there, I would be happy to join them.
/r/deeplearning
https://redd.it/1htukqk
Looking for Good Cameras Under $350 for Autonomous Vehicles (Compatible with Jetson Nano)
Hi everyone,
I'm working on a project to build an autonomous vehicle that can detect lanes and navigate without a driver. For our last competition, we used a 720p Logitech webcam, and it performed decently overall. However, when the sun was directly overhead, we had a lot of issues with overexposure, and the camera input became almost unusable.
Since we are aiming for better performance in varying lighting conditions, we’re now looking for recommendations on cameras that would perform well for autonomous driving tasks like lane detection and obstacle recognition. Ideally, we're looking for something under $350 that can handle challenging environments (bright sunlight, low-light situations) without the overexposure problem we encountered.
It’s also important that the camera be compatible with the Jetson Nano, as that’s the platform we are using for our project.
If anyone here has worked on a similar project or has experience with cameras for autonomous vehicles, I’d love to hear your advice! What cameras have worked well for you? Are there specific features (like high dynamic range, wide field of view, etc.) that you’d recommend focusing on? Any tips for improving camera performance in harsh lighting conditions?
Thanks in advance for your help!
/r/computervision
https://redd.it/1hqeggo
If you were to start from scratch, how would you delve into CL/NLP/LT?
Hello!
I graduated with a degree in Linguistics (lots of theoretical stuff) a few months ago and I would like to pursue a master's degree focusing on CL/NLP/LT in the upcoming year.
I was able to take a course on "computational methods" used in linguistics before graduating, which essentially introduced me to NLP practices/tools such as regex, transformers and LLMs. Although the course was very useful, it was designed to serve as an introduction and not teach us very advanced stuff. And since there is still quite a lot of time until the admissions to master's programs start, I am hoping to brush up on what might be most useful for someone wanting to pursue a master's degree in CL/NLP/LT or learn completely new things.
So, my question is this: considering what you do (whether working in industry or pursuing higher education), how would you delve into CL/NLP/LT if you were to wake up as a complete beginner in today's world? (Feel free to consider me a "newbie" when giving advice; some other beginners looking for help might find it more useful that way.) What would your "road map" be when starting out?
Do you think it would be better to focus on computer science courses (I was thinking of Harvard's CS50) to build a solid background in CS first, learn how to code using Python or learn about statistics, algorithms, maths etc.?
I am hoping to dedicate around 15-20 hours every week to whatever I will be doing and just to clarify, I am not looking for a way to get a job in the industry without further education; so, I am not looking for ways to be an "expert". I am just wondering what you think would prepare me the best for a master's program in CL/NLP/LT.
I know there probably is no "best" way of doing it but I would appreciate any advice or insight. Thanks in advance!
/r/LanguageTechnology
https://redd.it/1hk338l
D Best survey papers of 2024?
As an AI researcher who is starting out, I usually start by seeing survey papers related to a field, then creating a roadmap to further deep dive into my research topic. I am eager to see the sub's viewpoint of the best survey papers they came across in 2024.
/r/MachineLearning
https://redd.it/1hgwjqu
D The winner of the NeurIPS 2024 Best Paper Award sabotaged the other teams
Allegedly, the winner of the NeurIPS 2024 Best Paper Award (a guy from ByteDance, the creators of TikTok) sabotaged the other teams to derail their research and redirect their resources to his own. Plus, he was at meetings debugging his colleagues' code, so he was always one step ahead. There's a call to withdraw his paper.
https://var-integrity-report.github.io/
I have not checked the facts myself, so if you can verify what is asserted, it would be nice to confirm whether this is true.
/r/MachineLearning
https://redd.it/1hctf36
D Monthly Who's Hiring and Who wants to be Hired?
For Job Postings please use this template
>Hiring: [Location], Salary: [], [Remote | Relocation], [Full Time | Contract | Part Time] and [Brief overview, what you're looking for]
For Those looking for jobs please use this template
>Want to be Hired: [Location], Salary Expectation: [], [Remote | Relocation], [Full Time | Contract | Part Time] Resume: [Link to resume] and [Brief overview, what you're looking for]
Please remember that this community is geared towards those with experience.
/r/MachineLearning
https://redd.it/1h3u444
PyTorch implementation of Levenberg-Marquardt training algorithm
Hi everyone,
In case anyone is interested, here’s a PyTorch implementation of the Levenberg-Marquardt (LM) algorithm that I’ve developed.
GitHub Repo: torch-levenberg-marquardt
A PyTorch implementation of the Levenberg-Marquardt (LM) optimization algorithm, supporting mini-batch training for both regression and classification problems. It leverages GPU acceleration and offers an extensible framework, supporting diverse loss functions and customizable damping strategies.
A TensorFlow implementation is also available: tf-levenberg-marquardt
# Installation
pip install torch-levenberg-marquardt
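For readers unfamiliar with LM: at its core it takes damped Gauss-Newton steps, solving (JᵀJ + λI)δ = −Jᵀr and adapting λ. A minimal NumPy sketch of that update on a toy curve-fitting problem (illustrating the algorithm itself, not this package's API):

```python
import numpy as np

# Toy problem: fit y = a * exp(b * x) by least squares with Levenberg-Marquardt.
x = np.linspace(0, 1, 50)
y = 2.0 * np.exp(1.5 * x) + 0.01 * np.random.randn(50)

def residuals(p):
    a, b = p
    return a * np.exp(b * x) - y

def jacobian(p):
    a, b = p
    e = np.exp(b * x)
    return np.stack([e, a * x * e], axis=1)   # d r / d a, d r / d b

p = np.array([1.0, 1.0])   # initial guess
lam = 1e-3                 # damping factor
for _ in range(100):
    r, J = residuals(p), jacobian(p)
    # Damped Gauss-Newton step: (J^T J + lam * I) delta = -J^T r
    delta = np.linalg.solve(J.T @ J + lam * np.eye(2), -J.T @ r)
    if np.sum(residuals(p + delta) ** 2) < np.sum(r ** 2):
        p, lam = p + delta, lam * 0.5          # accept step, relax damping
    else:
        lam *= 2.0                             # reject step, increase damping
print(p)  # should approach [2.0, 1.5]
```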
/r/deeplearning
https://redd.it/1h4m51a
RT-DETRv2: Is it possible to use it on Smartphones for realtime Object Detection + Tracking?
Any help or hint appreciated.
For a research project I want to create an app (Android preferred) for realtime object detection and tracking. It is about detecting people, categorized into adults and children. I need to train with my own dataset.
I know this is possible with Yolo/ultralytics.
However I have to use Open Source with Apache or MIT license only.
I am thinking about using the promising RT-DETR model (small version); however, I am struggling to convert the model into the right format (such as TFLite) to be able to use it on a smartphone.
Is this even possible? Couldn't find any project in this context.
Plan B would be using MediaPipe and one of its pretrained efficient models, fine-tuned with my custom data.
Open for a completely different approach.
So what do you recommend I do?
Any roadmaps to follow are appreciated.
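One possible route, assuming the trained RT-DETR is available as a regular PyTorch nn.Module, is PyTorch → ONNX → TensorFlow → TFLite. The ONNX export step is sketched below; the stand-in module and the 640×640 input size are assumptions, and the ONNX → TFLite leg would go through a converter such as onnx2tf:

```python
import torch
import torch.nn as nn

class StandIn(nn.Module):
    """Placeholder for the trained RT-DETR model; swap in the real detector here."""
    def forward(self, x):
        return x.mean(dim=(2, 3))

model = StandIn().eval()
dummy_input = torch.randn(1, 3, 640, 640)   # assumed input resolution

torch.onnx.export(
    model,
    dummy_input,
    "rtdetr_small.onnx",
    opset_version=17,
    input_names=["images"],
    output_names=["outputs"],
    dynamic_axes={"images": {0: "batch"}},
)
# From here, a tool such as onnx2tf can produce a TensorFlow SavedModel,
# which tf.lite.TFLiteConverter can then convert to a .tflite file for Android.
```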
/r/computervision
https://redd.it/1iqunlw
D What happened to SSMs and linear attentions?
Can someone who is up to date with this area of research summarize the current state of SSMs and softmax-attention alternatives? Are they used in customer-facing models yet, or are they still in research? Does their promise only appear in benchmarks on paper? Or have hardware accelerators optimized attention so thoroughly that SSMs and linear-attention alternatives provide only marginal gains that don't justify their added complexity?
/r/MachineLearning
https://redd.it/1in9y30
D Monthly Who's Hiring and Who wants to be Hired?
For Job Postings please use this template
>Hiring: [Location], Salary: [], [Remote | Relocation], [Full Time | Contract | Part Time] and [Brief overview, what you're looking for]
For Those looking for jobs please use this template
>Want to be Hired: [Location], Salary Expectation: [], [Remote | Relocation], [Full Time | Contract | Part Time] Resume: [Link to resume] and [Brief overview, what you're looking for]
Please remember that this community is geared towards those with experience.
/r/MachineLearning
https://redd.it/1ie5qoh
[D] Which software tools do researchers use to make neural net architectures like this?
/r/MachineLearning
https://redd.it/1ig6k3l
The scale vs. intelligence trade-off in retrieval augmented generation Discussion
Retrieval Augmented Generation (RAG) has been huge in the past year or two as a way to supplement LLMs with knowledge of a particular set of documents or the world in general. I've personally worked with most flavors of RAG quite extensively, and there are some fundamental limitations with the two core approaches (long-context and embedding-based retrieval) on which almost all flavors of RAG are built. I am planning on writing a longer and more comprehensive piece on this, but I wanted to put some of my thoughts here first to get some feedback and see if there are any perspectives I might be missing.
Long-context models (e.g. Gemini), designed to process extensive amounts of text within a single context window, face a critical bottleneck in the form of training data scarcity. As context lengths increase, the availability of high-quality training data diminishes rapidly. This is important because of the neural scaling laws, which have been remarkably robust for LLMs so far. There is a great video explaining them here. One important implication is that if you run out of human-generated training data, the reasoning capabilities of your model are bottle-necked no matter how many other resources or tricks you throw at the problem. This paper provides some nice empirical support for this idea. Across all of the "long-context" models the reasoning capabilities decrease dramatically as the context length increases.
A graph I generated based on one of the main tables in the paper showing how reasoning capabilities degrade as context length increases.
Embeddings based RAG has much better scalability but suffers from some pretty serious issues with high-level reasoning tasks. Here is a small list from this paper:
https://preview.redd.it/huig4ipulufe1.png?width=967&format=png&auto=webp&s=62743d60ba1c9162c9e1bf5ff6d05af20d577868
The authors also have a nice statement as to the core reason why towards the beginning of the paper:
>
This structural limitation is particularly problematic when dealing with documents that require deep understanding and contextual interpretation such as a complex book. Often there will not only be an important internal structure to each document, but also an important meta-structure across documents (think of scientific papers that cite specific portions of other scientific papers). There are tricks like using knowledge graphs that try to get around some of these issues, but they can only do so much when the fundamental method shreds any structure the documents might have had before any of the secondary steps even begin.
The scalability limitations of long-context, and the reasoning limitations of embedding, lead to an important trade-off for anyone building a RAG system. Long-context models excel in creativity and complex reasoning but are limited to small document sets due to training data constraints. Conversely, embeddings-based approaches can handle vast corpuses but function more like enhanced search engines with minimal reasoning abilities. For many tasks, this trade-off is fine as the task already fits well on one side or the other of the trade-off. Many other tasks however, are simply not easily achievable with SoTA RAG methods due to the fact that they require both large amounts of documents and advanced reasoning over these documents.
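To make the "enhanced search engine" point concrete, here is a minimal sketch of the embeddings-based side of the trade-off: each chunk is embedded and scored independently against the query, so retrieval scales well but no reasoning across chunks happens at this stage (the sentence-transformers model name is just an example; any text encoder would do):

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # any encoder works; model name is an example

chunks = [
    "The mitochondria is the powerhouse of the cell.",
    "Section 3 defines the scaling laws used in the experiments.",
    "Figure 2 shows reasoning accuracy dropping as context length grows.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
chunk_vecs = model.encode(chunks, normalize_embeddings=True)

def retrieve(query, k=2):
    """Return the k chunks most similar to the query; each chunk is scored independently."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q                  # cosine similarity (vectors are normalized)
    top = np.argsort(-scores)[:k]
    return [(chunks[i], float(scores[i])) for i in top]

print(retrieve("Why does accuracy fall with longer context?"))
# Because chunks are scored in isolation, multi-hop or structure-dependent
# reasoning is exactly what this style of RAG struggles with.
```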
/r/MachineLearning
https://redd.it/1ick63j
Feb 4 - Best of NeurIPS Virtual Event
[Register for the virtual event.](https://voxel51.com/computer-vision-events/best-of-neurips-feb-4-2025/)
I have added a second date to the Best of NeurIPS virtual series that highlights some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you.
Talks will include:
* [No "Zero-Shot" Without Exponential Data](https://arxiv.org/abs/2404.04125) \- Vishaal Udandarao at University of Tuebingen
* [Understanding Bias in Large-Scale Visual Datasets](https://arxiv.org/abs/2412.01876) \- Boya Zeng at University of Pennsylvania
* [Map It Anywhere: Empowering BEV Map Prediction using Large-scale Public Datasets ](https://arxiv.org/abs/2407.08726)\- Cherie Ho, Omar Alama, and Jiaye Zou at Carnegie Mellon University
/r/computervision
https://redd.it/1i8f9y1
System Design resources for building great CV products
Hi all,
It seems like there are many resources for system design for regular developer based roles. However, I'm wondering if there are any good books/resources that can help one get better in designing systems around computer vision. I'm specifically interested in building scalable CV systems that involve DL inference. Please give your inputs.
Also, what is typically asked in a system design interview for CV-based roles? Thank you.
/r/computervision
https://redd.it/1i35ysu
D Self-Promotion Thread
Please post your personal projects, startups, product placements, collaboration needs, blogs etc.
Please mention the payment and pricing requirements for products and services.
Please do not post link shorteners, link aggregator websites, or auto-subscribe links.
--
Any abuse of trust will lead to bans.
Encourage others who create new posts for questions to post here instead!
Thread will stay alive until next one so keep posting after the date in the title.
--
Meta: This is an experiment. If the community doesn't like this, we will cancel it. This is to encourage those in the community to promote their work without spamming the main threads.
/r/MachineLearning
https://redd.it/1htw7hw
AutoEncoder Embedding vectors
I have a question about an autoencoder (AE) embedding vector (latent vector). Let's suppose the training set is "FashionMNIST".
The AE training objective, as specified by the loss function, is to minimize the difference between the pixels of the input image and the output image. There are no instructions about "mapping similar items to similar embedding vectors" or to a "similar latent-space region".
But after training, it turns out that similar items are mapped to similar embedding vectors. How can this happen?
Is there any fundamental principle that can explain this phenomenon?
- e.g., because the gradients backpropagate in a way that ~~~
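To make the setup concrete, here is a minimal PyTorch sketch of the training objective described above; note that the loss only compares pixels and never mentions the latent vector z:

```python
import torch
import torch.nn as nn

class AE(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, 28 * 28), nn.Sigmoid())

    def forward(self, x):
        z = self.encoder(x)                       # embedding / latent vector
        return self.decoder(z).view(-1, 1, 28, 28), z

model = AE()
x = torch.rand(16, 1, 28, 28)                     # stand-in for a FashionMNIST batch
recon, z = model(x)
loss = nn.functional.mse_loss(recon, x)           # pixel reconstruction only; z never appears in the loss
loss.backward()
```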
/r/deeplearning
https://redd.it/1hr2jkm
Why are flat local minima better than sharp local minima?
My goal is to understand how Deep Learning works. My initial assumptions were:
1. "as long as the loss value reach 0, all good, the model parameters is tuned to the training data".
2. "if the training set loss value and test set loss value has a wide gap, then we have overfitting issue".
3. "if we have overfitting issue, throw in a regularization method such as label smoothing".
I don't know the reason behind overfitting.
Now, I read a paper called "Sharpness-Aware Minimization (SAM)". It shattered my assumptions. Now I assume that we should set the learning rate as small as possible and prevent exploding gradients at all costs.
PS: I don't know why exploding gradients are a bad thing if what matters is the lowest loss value. Will the final model parameters differ between a model trained with a technique that prevents exploding gradients and one trained without that technique?
I binged a bit and found this image.
PS: I don't know what generalization loss is. How is the generalization loss calculated? Does it use the same loss function but on the test set instead of the training set?
In the image, there are two minima: one sharp, the other flat. At the sharp minimum there is a large gap relative to the generalization loss; at the flat minimum the gap is small.
Sharp and Flat Minimum
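A tiny numerical illustration of the picture described above, where a small shift between the training loss and the "generalization" loss stands in for the train/test mismatch (the curvature values are arbitrary):

```python
# Toy 1-D losses: both minima sit at w = 0 on the training loss, but the
# "generalization" loss is shifted slightly (standing in for train/test mismatch).
def train_loss(w, curvature):
    return curvature * w ** 2

def gen_loss(w, curvature, shift=0.3):
    return curvature * (w - shift) ** 2

for name, curvature in [("flat minimum ", 1.0), ("sharp minimum", 50.0)]:
    w_star = 0.0  # minimizer of the training loss in both cases
    gap = gen_loss(w_star, curvature) - train_loss(w_star, curvature)
    print(f"{name}: train loss = {train_loss(w_star, curvature):.2f}, generalization gap = {gap:.2f}")
# flat minimum : gap = 0.09  (low curvature, small penalty for the shift)
# sharp minimum: gap = 4.50  (same shift, much larger generalization gap)
```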
/r/deeplearning
https://redd.it/1hltl6r
Best Computer Vision Books for Beginners to Advanced
https://codingvidya.com/best-computer-vision-books-for-beginners/
/r/computervision
https://redd.it/1hi8425
What is an interesting/niche NLP task or benchmark dataset that you have seen or worked with?
With LLMs front and center, we're all familiar with tasks like NER, Summarization, and Question Answering.
Yet given the sheer volume of papers that are submitted to conferences like AACL, I'm sure there's a lot of new/niche tasks out there that don't get much attention. Through my personal project, I've been coming across things like metaphor detection and the cloze test (the latter is likely fairly well-known among the Compling folks).
It has left me wondering - what else is out there? Is there anything that you've encountered that doesn't get much attention?
/r/LanguageTechnology
https://redd.it/1he98sh
Advice: Math for Deep Learning Books
Hello Everyone,
I want to learn more about the mathematical foundations behind deep learning architectures.
I should clarify that I have no mathematical background from university (I studied medicine), but I have already created deep learning architectures (AE, CNN, GAN) and know all the concepts.
I realise that I need the mathematical grounding and creativity to design new deep architectures for future medical papers.
Have you read a book on this subject that you would recommend? I have already seen these three books, but I don't know which is the best:
- Math for Deep Learning
- Math and Architectures for Deep Learning
- Essential Math for AI
Thank you very much for your advice
/r/deeplearning
https://redd.it/1h7x5lm
Can NLP exist outside of AI
I live in a Turkish-speaking country, and Turkish has a lot of suffixes with a lot of edge cases. As a school project, I made an algorithm that can separate the suffixes from the base word. It can also add suffixes to a word. The algorithm relies solely on Turkish grammar rules and does not use AI. Does this count as NLP? If it does, it would be a significant advantage for the project.
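What is described above is rule-based morphological analysis, which is a classic NLP task that long predates machine learning. A minimal sketch of the suffix-stripping idea with a tiny, hypothetical suffix list (real Turkish would also need vowel harmony and consonant alternation rules):

```python
# Minimal sketch of rule-based suffix stripping; the suffix list is a tiny,
# hypothetical sample and ignores vowel harmony and consonant alternation.
SUFFIXES = ["ler", "lar", "den", "dan", "de", "da"]  # longer forms before shorter ones

def strip_suffixes(word):
    """Greedily peel known suffixes off the end of the word, keeping a stem of >= 2 letters."""
    found = []
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= 2:
                found.insert(0, suffix)
                word = word[: -len(suffix)]
                changed = True
                break
    return word, found

print(strip_suffixes("evlerde"))  # -> ('ev', ['ler', 'de'])  i.e. "in the houses"
```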
/r/LanguageTechnology
https://redd.it/1h4diir
From humanities to NLP
How impossible is it for a humanities student (specifically English) to get a job in the world of computational linguistics?
To give you some background: I graduated with a degree in English Studies in 2021 and since then I have not known how to fit my studies into a real job without having to be an English teacher. A year ago I found an approved UDIMA course (Universidad a Distancia de Madrid) on Natural Language Processing at a school aimed at humanistic profiles (philology, translation, editing, proofreading, etc.) to introduce them to the world of NLP. I understand that the course serves as a basis and that from there I would have to continue studying on my own. This course also gives the option of doing an internship in a company, so I could at least get some experience in the sector. The problem is that I am still trying to understand what Natural Language Processing is and why we need it, and from what I have seen there is a lot of statistics and mathematics, which I have never been good at. It is quite a leap, going from analyzing old texts to programming. I am 27 years old and I feel like I am running out of time. I do not know if this field is too saturated or if (especially in Spain) profiles like mine are needed: people with a humanities background who are training to acquire technical skills.
I ask for help from people who have followed a similar path to mine or directly from people who are working in this field and can share with me their opinion and perspective on all this.
Thank you very much in advance.
/r/LanguageTechnology
https://redd.it/1h12gyo