Have a live conversation about a basketball game with GPT-4V, Whisper, and TTS
/r/computervision
https://redd.it/17ywiwp
Start with Large Language Models (LLMs) in 2023
This is a complete guide to starting and improving your LLM skills in 2023 without an advanced background in the field, and to staying up to date with the latest news and state-of-the-art techniques!
The complete article: https://www.louisbouchard.ai/from-zero-to-hero-with-llms/
All the links on GitHub: https://github.com/louisfb01/start-llms
Artificial intelligence is a fantastic field, and so are language models like GPT-4, Claude, and others, but it moves extremely fast. Don't miss out on the most important and exciting news by joining the great communities, people, newsletters, and more that you can find in this guide!
This guide is intended for anyone with a little background in programming and machine learning; basic Python knowledge is enough to get you started. There is no specific order to follow, but a classic path would be from top to bottom. If you don't like reading books, skip them; if you don't want to follow an online course, you can skip that as well. There is no single way to become an "LLM expert", and with motivation you can absolutely achieve it.
/r/deeplearning
https://redd.it/17qo9lt
How well do NLP master's programs prepare you for industry jobs?
Hey everyone!
I am currently a second-year student enrolled in a linguistics program, but I am taking some NLP courses (Python programming, machine learning, neural networks, machine translation, databases, maybe statistics). I have a great interest in phonetics and speech technology and am therefore looking into Edinburgh's Speech and Language Processing master's as well as other one-year, more general NLP master's programs. However, I am unsure whether a one-year master's, especially a highly specialised one such as Edinburgh's, will sufficiently prepare me for an industry job right after graduation. Should I be looking more into two-year programs such as the ones in Germany?
I would greatly appreciate any input!
/r/LanguageTechnology
https://redd.it/17nm5i9
Part 1: Building Vision Transformer from Scratch: A PyTorch Deep Dive
I've just published the first installment of my Vision Transformer series. You can find a fully functional Colab example linked in the article.
Part 1: Building Vision Transformer from Scratch: A PyTorch Deep Dive Plus a Teaser on LORA for Part 2
https://medium.com/@pashashaik/part-1-building-vision-transformers-from-scratch-a-pytorch-deep-dive-plus-a-teaser-on-lora-for-beef0f3aef5c
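For readers who want a feel for what "from scratch" looks like before opening the article, here is a minimal sketch (my own illustration, not the article's code) of the ViT patch-embedding step in PyTorch: split the image into fixed-size patches, linearly project each patch, then prepend a learnable [CLS] token and add positional embeddings before the transformer encoder.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution is the standard trick for "split into patches + linear projection".
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, embed_dim))

    def forward(self, x):                      # x: (B, 3, 224, 224)
        x = self.proj(x)                       # (B, embed_dim, 14, 14)
        x = x.flatten(2).transpose(1, 2)       # (B, num_patches, embed_dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)         # prepend the [CLS] token
        return x + self.pos_embed              # add positional embeddings

tokens = PatchEmbedding()(torch.randn(2, 3, 224, 224))  # -> (2, 197, 768)
```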
/r/deeplearning
https://redd.it/17m6cix
Give a few prompts to A.I. and it can now generate videos such as this...
https://vimeo.com/877454859
/r/LanguageTechnology
https://redd.it/17g7lfu
Is it useful to take a statistics class for computer vision?
Apologies for inundating the subreddit with questions about courses lol. I'm wondering whether it is genuinely useful, or only marginally so (since stats helps with ML, which helps with computer vision).
Also, is it more useful to know in a research/academia context, or in an industrial setting?
/r/computervision
https://redd.it/17cp5d5
Is it better to take a class on 3D understanding, or a course on the physics of visual appearance?
The former course talks about: Explicit, Implicit, and Neural 3D Representations, Differentiable Rendering, Single-view 3D Prediction: Objects, Scenes, and Humans, Neural Rendering, Multi-view 3D Inference: Radiance Fields, Multi-plane Images, Implicit Surfaces, etc., Generative 3D Models, Shape Abstraction, Mesh and Point cloud processing.
The second course covers more physics and optics: principles of photometry, light fields, reflection, refraction, polarization, caustics, lighting and shadows, BRDFs, vision in bad weather, and applications in aerial, underwater, medical, and microscopic imaging.
At this point, I think I'm interested in biomedical applications of computer vision, so I am leaning towards the second course. However, I don't know enough about the job market out there for computer vision; it would seem to me that self-driving cars and the like would prefer the former course. Furthermore, I wonder whether the first course or the second would do more to expand my understanding. I know that I should pick based on what I want to do, not necessarily what's more popular, but I don't quite have that figured out either. I just feel that I don't find generative AI my type of thing (despite it being a big deal lol).
I also have a background in classical signal processing, electromagnetism, etc. being an EE so I thought maybe the 2nd course would complement my background more.
Any advice is greatly appreciated!
/r/computervision
https://redd.it/17bsmnh
Context aware chunking with LLM
I'm working on an embedding and recall project.
My database is built mainly from a small number of selected textbooks. With my current chunking strategy, however, recall does not perform very well, since a lot of information is lost during the chunking process. I've tried everything... Even with a huge overlap percentage and the use of text separators, a lot of information is still missing. I also tried many methods of generating the text I use as the query: the original question, a question rephrased by an LLM, or a generic answer generated by an LLM. I even tried keywords and "key phrases", but as far as I can see the problem is in the chunking process, not in the query generation.
I then tried using the OpenAI API to chunk the files: the results are amazing... Okay, I had to do a lot of prompt refinement, but the result is worth it. I mainly used gpt-3.5-turbo-16k
(obviously GPT-4 is better, but it's damn expensive with long context; text-davinci-003 and its edit variant also outperform gpt-3.5, but they only have 4k context and are more expensive than 3.5-turbo).
I also used the LLM to add extra info and keywords to the metadata.
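For illustration, here is a minimal sketch of what such LLM-assisted, context-aware chunking can look like. This is my own hedged example, not the poster's exact pipeline: it assumes the legacy `openai` Python SDK (pre-1.0) with an API key in the environment, and it asks the chat model to cut only at topic boundaries and to propose keywords for the chunk metadata.

```python
import json
import openai

def llm_chunk(text: str, model: str = "gpt-3.5-turbo-16k") -> list[dict]:
    prompt = (
        "Split the following text into self-contained chunks of 200-500 words. "
        "Cut only at topic boundaries so no chunk depends on missing context. "
        'Return JSON: {"chunks": [{"text": ..., "keywords": [...]}]}.\n\n' + text
    )
    resp = openai.ChatCompletion.create(
        model=model,
        temperature=0,  # keep the splitting deterministic
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(resp.choices[0].message.content)["chunks"]

# Each chunk's keywords can then be attached to the vector-store entry as metadata.
```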
Anyway, as a student, that is not economically sustainable for me.
I've seen that LLaMA models are quite capable of this task when used with really low temperature and top-p, but 7B (and I think even 13B) is not enough to get acceptable reliability in the output.
Anyway, I can't run more than a 7B q4 on my hardware.
I've done some research and found that Replicate could be a good resource, but it doesn't have any model with more than 4k of context length, and the price to push a custom model is too much for me.
Does anyone have advice for me? Is there a project doing something similar? Also, is there a fine-tuned LLaMA that works as an "edit" model rather than a "complete" or chat model?
Thanks in advance for any kind of answers.
/r/LanguageTechnology
https://redd.it/171r2c1
Why is everyone asking about C++?
When I look at the job posts I see a lot of C++ requirements for AI/DL/ML related jobs.
I assume this is for creating optimized models. However, when I check online I can't find any specific benefit of using C/C++ over Python.
When do they plan to use C/C++, and for what? I checked some benchmark comparisons and they're very similar either way. Furthermore, can't we just use Cython instead of C/C++?
Do you have any ideas about this?
/r/deeplearning
https://redd.it/16zw8ct
“Decoder-only” Transformer models still have an encoder…right? Otherwise how do they “understand” a prompt?
The original transformer model consisted of both encoder and decoder stages. Since that time, people have created encoder-only models, like BERT, which have no decoder at all and so function well as base models for downstream NLP tasks that require rich representations.
Now we also have lots of “decoder-only“ models, such as GPT-*. These models perform well at creative text generation (though I don’t quite understand how or why).
But in many (all?) use cases of text generation, you start with a prompt. Like the user could ask a question, or describe what it wants the model to do, and the model generates a corresponding response.
If the model’s architecture is truly decoder-only, by what mechanism does it consume the prompt text? It seems like that should be the role of the encoder, to embed the prompt into a representation the model can work with and thereby prime the model to generate the right response?
So yeah, do “decoder-only” models actually have encoders? If so, how are these encoders different from say BERT’s encoder, and why are they called “decoder-only”? If not, then how do the models get access to the prompt?
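To illustrate the mechanism in question with code: in a decoder-only model there is no separate encoder. The prompt tokens simply occupy the first positions of the sequence, and causal self-attention lets every later token attend back to them; the mask only forbids attending to future positions. Below is a minimal, hedged sketch (not tied to any particular GPT implementation) using PyTorch's MultiheadAttention.

```python
import torch

prompt_len, gen_len, d = 4, 3, 8          # 4 prompt tokens, 3 generated tokens, toy embed dim
T = prompt_len + gen_len
x = torch.randn(1, T, d)                  # token embeddings for prompt + generated tokens

# Lower-triangular causal mask: position t may attend to positions 0..t (prompt included).
causal_mask = torch.tril(torch.ones(T, T, dtype=torch.bool))

attn = torch.nn.MultiheadAttention(embed_dim=d, num_heads=2, batch_first=True)
out, weights = attn(x, x, x, attn_mask=~causal_mask)   # True in attn_mask means "blocked"

# The last generated token attends to all prompt tokens (non-zero weights in its first columns),
# which is how the prompt conditions generation without any encoder.
print(weights[0, -1, :prompt_len])
```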
/r/LanguageTechnology
https://redd.it/16nl811
Is running an open-source LLM in the cloud on a GPU generally cheaper than running a closed-source LLM?
Assuming the same cloud service, is running an open-source LLM in the cloud on a GPU generally cheaper than using a closed-source LLM? (i.e., do we pay a premium for a closed-source LLM compared to just running something ourselves on cloud GPUs?)
One example I'm thinking of is running Llama 2 13B GPTQ on Microsoft Azure vs. GPT-3.5 Turbo.
I understand there are a lot of parameters to consider (such as which GPU to choose in Microsoft Azure, etc.), but I'm really looking for the cheapest way to run Llama 2 13B GPTQ or a performance-equivalent closed-source LLM.
/r/LanguageTechnology
https://redd.it/16p6ceo
Why use ONNX with Triton Inference Server? Why use ONNX in general?
Since Triton supports TensorFlow and PyTorch (via TorchScript), I was wondering why you would want to convert your model to ONNX. Is it simply to use TensorRT?
Also just wanted to know why use ONNX in general? What are the main advantages?
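For context, here is a minimal, hedged sketch of exporting a PyTorch model to ONNX with the standard torch.onnx.export API (the file path and axis names here are arbitrary). One common motivation is that the single .onnx artifact can then be served by ONNX Runtime, TensorRT, or Triton's onnxruntime backend without shipping the original training framework.

```python
import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    input_names=["input"],
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}, "logits": {0: "batch"}},  # allow variable batch size
    opset_version=17,
)
```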
/r/computervision
https://redd.it/16ogz45
iPhone 15 Stereo Imaging
In yesterday's keynote event, Apple released the iPhone 15 Pro Max. Apparently you can now take 3D images (only available on the iPhone 15 Pro). It uses two of its camera lenses to take two images from slightly different angles and performs stereo imaging to obtain depth.
So I'm sitting here thinking: every iPhone can do that, right? I'm looking at my iPhone 11 Pro Max and thinking about writing an iOS program that uses two lenses to take a "3D image."
Sounds like a doable project right? I did stereo imaging and depth estimation projects for one of my classes so I think I can take on the challenge.
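For anyone curious what the core of such a project looks like, here is a minimal, hedged sketch of classic block-matching stereo depth with OpenCV, assuming two already-rectified captures (the file names left.png and right.png are hypothetical). A real phone rig would need calibration and rectification first.

```python
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# numDisparities must be a multiple of 16; blockSize is the matching window (odd number).
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right)   # larger disparity = closer object

# Depth is proportional to (focal_length * baseline) / disparity for a calibrated rig.
vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
cv2.imwrite("disparity.png", vis)
```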
/r/computervision
https://redd.it/16ihrtk
D The ML Papers That Rocked Our World (2020-2023)
Hey everyone! 👋
I’ve been on a bit of a deep-dive lately, trying to catch up on all the awesome stuff that’s been happening in the ML space. It got me wondering, from 2020 to 2023, what have been the absolute must-read papers that shook the foundations and got everyone talking?
Whether it’s something that reinvented the wheel in your specific niche or just made waves industry-wide, I wanna hear about it!
I’m curious to see how different the responses will be, and hey, this might even become a go-to list for anyone looking to get the lowdown on the hottest trends and discoveries of the past few years.
Can’t wait to hear your thoughts!
# tl;dr
I decided to aggregate your best suggestions into categories for anyone interested in reading them without searching through the whole comment section in the future.
## Theoretical:
[Neural Networks are Decision Trees](https://arxiv.org/abs/2210.05189)
Cross-Validation Bias due to Unsupervised Preprocessing
[The Forward-Forward Algorithm: Some Preliminary Investigations](https://arxiv.org/abs/2212.13345)
LoRA: Low-Rank Adaptation of Large Language Models (included here as it has applications beyond LLMs)
[Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets](https://arxiv.org/abs/2201.02177)
## Image:
ViT related:
[An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT)](https://arxiv.org/abs/2010.11929)
Emerging Properties in Self-Supervised Vision Transformers
[Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877v2)
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
[A ConvNet for the 2020s (a CNN that implements several key components that contribute to the performance of Vision Transformers)](https://arxiv.org/abs/2201.03545)
(CLIP) Learning Transferable Visual Models From Natural Language Supervision
Diffusion related:
High-Resolution Image Synthesis with Latent Diffusion Models
[Denoising Diffusion Probabilistic Models (DDPM)](https://arxiv.org/abs/2006.11239)
Classifier-Free Diffusion Guidance
[Taming Transformers for High-Resolution Image Synthesis (VQGAN)](https://arxiv.org/abs/2012.09841)
Segment Anything (SAM)
[DINOv2: Learning Robust Visual Features without Supervision](https://arxiv.org/abs/2304.07193)
Bayesian Flow Networks
## NLP:
[Language Models are Few-Shot Learners (GPT-3)](https://arxiv.org/abs/2005.14165)
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
[Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
Training Compute-Optimal Large Language Models (Chinchilla)
[The Flan Collection: Designing Data and Methods for Effective Instruction Tuning](https://arxiv.org/abs/2301.13688)
LLaMA: Open and Efficient Foundation Language Models
[Toolformer: Language Models Can Teach Themselves to Use Tools](https://arxiv.org/abs/2302.04761)
## 3D Rendering:
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
[Highly accurate protein structure prediction with AlphaFold](https://www.nature.com/articles/s41586-021-03819-2)
## Misc:
Human-level play in the game of Diplomacy by combining language models with strategic reasoning
For a well-made and maintained list of ML resources (not only the newest like here) you can check out
How do these nickname tools work?
Hey everyone! I recently came across this interesting nickname generator (it is not the only one). It gave me a surprisingly accurate "japanese viking" name, which piqued my curiosity. From a linguistic perspective, how might such a tool understand and combine linguistic elements to produce coherent and culturally relevant nicknames? Does it consider phonetics, morphology, or other linguistic rules? Would love to get your insights!
/r/LanguageTechnology
https://redd.it/16d781w
Serverless development experience for embedded computer vision
I recently published version 1.0 of the Pipeless framework. It provides the development experience of serverless web frameworks for creating computer vision applications that run directly on devices.
What that means is that you provide some Python functions and they are executed whenever there is a new video frame. The framework manages everything for you, including parallelization, stream management, and running your functions when they need to run. You can run it on your devices and provide input streams via a CLI or REST API. It supports multi-stream processing and dynamic stream configuration, ships with some inference runtimes so you just need to provide a model, and has a bunch of other cool features.
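To make the idea concrete, here is a rough sketch of the per-frame hook pattern. This is NOT the actual Pipeless API, just an illustration in plain OpenCV of what the framework automates: you write a function, and it gets called on every new frame.

```python
import cv2

def process_frame(frame):
    """User-provided logic: run on every new frame (e.g. inference, drawing, filtering)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.Canny(gray, 100, 200)  # stand-in for a model's output

cap = cv2.VideoCapture(0)             # the framework would manage streams/parallelism for you
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = process_frame(frame)     # the framework invokes your hook here
    cv2.imshow("output", result)
    if cv2.waitKey(1) == ord("q"):
        break
cap.release()
```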
Performance is quite decent: I have reached real-time 15 FPS on a 4-core CPU with a YOLO model, and with a GPU it more than doubles that.
If anyone is interested, I'd really appreciate feedback!
You can find the repo here: https://github.com/pipeless-ai/pipeless
/r/computervision
https://redd.it/17vemga
D What AI topics are you curious about but rarely see in the spotlight?
I'm a data engineer who somehow ended up as a software developer. Many of my friends are now working with the OpenAI API to add generative capabilities to their products, but they lack A LOT of context when it comes to how LLMs actually work.
This is why I started writing popular-science style articles that unpack AI concepts for software developers working on real-world applications. It started kind of slow; honestly, I wrote a bit too "brainy" for them, but now I've found a voice that resonates with this audience much better, and I want to ramp up my writing cadence.
I would love to hear your thoughts about what concepts I should write about next?
What gets you excited that you also find hard to explain to someone with a different background?
/r/MachineLearning
https://redd.it/17riznw
R Idempotent Generative Network
Paper: https://arxiv.org/abs/2311.01462
Blog: https://assafshocher.github.io/IGN/
Abstract:
>We propose a new approach for generative modeling based on training a neural network to be idempotent. An idempotent operator is one that can be applied sequentially without changing the result beyond the initial application, namely f(f(z))=f(z). The proposed model f is trained to map a source distribution (e.g, Gaussian noise) to a target distribution (e.g. realistic images) using the following objectives: (1) Instances from the target distribution should map to themselves, namely f(x)=x. We define the target manifold as the set of all instances that f maps to themselves. (2) Instances that form the source distribution should map onto the defined target manifold. This is achieved by optimizing the idempotence term, f(f(z))=f(z) which encourages the range of f(z) to be on the target manifold. Under ideal assumptions such a process provably converges to the target distribution. This strategy results in a model capable of generating an output in one step, maintaining a consistent latent space, while also allowing sequential applications for refinement. Additionally, we find that by processing inputs from both target and source distributions, the model adeptly projects corrupted or modified data back to the target manifold. This work is a first step towards a ``global projector'' that enables projecting any input into a target data distribution.
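To make the two stated objectives concrete, here is a toy PyTorch sketch (my own illustration, not the authors' code) of training a model so that data samples are fixed points, f(x) = x, and noise samples are pushed onto that fixed-point set via f(f(z)) = f(z). The full paper adds further terms (e.g. a tightness loss) and careful gradient handling that are omitted here.

```python
import torch
import torch.nn as nn

f = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64))  # stand-in generator
opt = torch.optim.Adam(f.parameters(), lr=1e-4)

for step in range(1000):
    x = torch.randn(32, 64)          # stand-in for real data samples
    z = torch.randn(32, 64)          # source distribution (Gaussian noise)

    loss_rec = (f(x) - x).pow(2).mean()              # (1) real instances map to themselves
    fz = f(z)
    loss_idem = (f(fz) - fz.detach()).pow(2).mean()  # (2) f(z) should land on the fixed-point set

    loss = loss_rec + loss_idem
    opt.zero_grad()
    loss.backward()
    opt.step()
```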
/r/MachineLearning
https://redd.it/17otzfw
What's your responsibility as a computer vision developer?
I feel like I'm not doing computer vision. I started a project at my current organization where I am building a defect detection system from scratch. I mainly spend my time collecting the dataset, labeling it, and trying various models. My team uses highly accurate off-the-shelf models from AWS Rekognition and Roboflow, where training is a one-click process. I feel like anyone can collect, label, and test models.
/r/computervision
https://redd.it/17kssfp
imops - ultra fast classical CV algorithms for Python
Hi everyone!
tl/dr: imops is a collection of carefully optimized CV algorithms for numpy arrays of any dimension!
I work in medical imaging, mostly with 3D CT/MRI. This is a pretty computationally heavy field with a focus on near-real-time processing.
To my surprise, many of the CV algorithms from scipy and skimage are painfully slow. We reimplemented some of them in Cython and added support for arrays of any dimension.
You can find the project here, and the benchmarks section contains a comparison with scipy/skimage counterparts.
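As a point of reference, here is a small timing sketch (my own, not from the repo) of the kind of scipy baseline the benchmarks section compares against; to reproduce the comparison you would swap in the corresponding imops function as documented in the repo.

```python
import time
import numpy as np
from scipy import ndimage

volume = np.random.rand(128, 128, 128).astype(np.float32)  # a 3D CT-like array

start = time.perf_counter()
upsampled = ndimage.zoom(volume, zoom=2, order=1)           # trilinear upsampling
print(f"scipy.ndimage.zoom: {time.perf_counter() - start:.2f}s, shape={upsampled.shape}")
```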
If you're interested in contributing, or would like to see another function implemented, don't hesitate to open a PR or create an issue!
/r/computervision
https://redd.it/17h0i9t
D Is Computer Vision dead? - “Quo Vadis, Computer Vision?”
At ICCV 2023, several top-notch researchers shared their insights (in a workshop called "Quo Vadis, Computer Vision?") on the current state of Computer Vision, especially in light of the meteoric rise of LLMs. Has CV stalled? Is CV dead?
For example, MIT professor Bill Freeman made some interesting points on foundation models: "FM aren't fundamental, therefore not stable". Jitendra Malik argues that "video can describe the world better than text."
/r/MachineLearning
https://redd.it/17eak3w
What should I self-study specifically to become hirable in the NLP/machine learning field
Hi, I have majors in cognitive science, linguistics, and philosophy, and a minor in computer science. I know Python, Java, and SQL at the LeetCode/interview level. I have studied algebra, linear algebra, calculus, and differential equations. I would love to apply my linguistics knowledge in a tech job (I have not had a tech job before). However, I did not get the chance to study NLP or machine learning in college, so I feel like I don't know how to bridge the gap between these two disciplines. What should I do or study to know what I'm doing and be able to get an entry-level NLP position?
/r/LanguageTechnology
https://redd.it/16mrr90
Seeking learning resources that go deep into NLP foundations and which target advanced-intermediate technical learners
# TL;DR:
Semi-experienced MLE seeking to deepen my knowledge of the modeling side of NLP. Can you recommend any courses or other resources to pursue for this?
Ideally I'd like resources which target advanced-intermediate practitioners and which spend only minimal time on theoretical linguistics concepts (e.g., "What is syntax?"; "What is a morpheme?"; "What is distributional semantics?" - Less because that stuff is unimportant, and more because I already know it inside and out.)
------------
# My brief background
Theoretical linguistics grad here. Several years ago and many years out of school, I set off to STEM-up and become a machine learning engineer (MLE) in NLP. I spent a few years hardcore self-studying math, programming, ML/DL, and NLP. In the end I succeeded, learning just enough to get hired as an NLP MLE using only web-based resources and a few PDFs.
I've been in the role for 3-4 years now, and in the interim I have learned a TON of practical skills about SWE and DevOps that I hadn't learned while self-studying the theory. However, my knowledge of ML/DL/NLP theory hasn't actually grown much. This is mostly because where I work, the modeling is left to PhD'ed researchers, not as much the engineers, and not at all the junior engineers like me. So I'd like to get back into learning that side of things.
Because I prioritized breadth over depth during that initial self-study period, I'm passingly familiar with a fair number of NLP tasks and techniques. But I am a master of none. So on this second pass, now that I know the basics and can "speak the lingo", I'd like to go deep, especially on the foundations of NLP.
# What I'm hoping to get from this post
With this background, can anyone recommend (ideally) courses, books, blogs, or other resources for learning things such as the following?
- foundational NLP tasks (e.g., POS-tagging, dependency parsing, NER, sentiment analysis, summarization) and associated popular models/approaches
- probability theory for NLP
- traditional/non-deep NLP
- sequence classification
- clustering/unsupervised methods for NLP
- multitask learning in NLP
# Why? "Who cares?"
At this point, someone may ask:
> You've already "made it" as an MLE, and modeling clearly is not required. So who cares?
I would retort that as one advances from junior to mid to senior and beyond, "hey I can write great code" should gradually take a backseat to "hey I can make informed decisions about what data and models to pursue given the goals and constraints of a specific business case."
I really want to be able to reason about and make pragmatic arguments like "Well, business case A can be re-conceptualized as a kind of question-answering task (random example), therefore it would be reasonable to start with model B and optimize a cost function C with optimizer D. For that, we'd want to collect at least E examples of data type F, and run some model experiments on G cloud resource, ..." Etc. etc etc. Right now I could follow along if a senior engineer came up with such an argument, then probably execute the work mostly on my own. But I would struggle to come up with and then stand behind the argument myself.
I recently had my first technical interviews (for senior level), and while I did well in the coding rounds, I really felt my deficiencies when answering "What would you do in scenario X, and why?" type questions and talking about the nitty gritty of model architectures. My responses were often just like "Um, I'd throw a transformer model at it", in part because that's 90% of what we do where I work, but also because I lack experience which makes me tend towards "when all you have is a hammer everything looks like a nail" style thinking. Hence, this post.
Anyway, enough ramble. I really look forward to any suggestions I receive. Thanks in advance.
/r/LanguageTechnology
https://redd.it/174k2wv
Stable validation curves on NLP project with BERT
/r/deeplearning
https://redd.it/16w3h3f
Meta Unfolds a 'Universe of AI' Across Instagram, Facebook, and WhatsApp
Meta has unveiled colossal AI updates across its platforms that will fundamentally alter user experiences on Instagram, Facebook, and WhatsApp, opening up a "universe of AI" solutions.
For the latest advancements in AI, look here first.
Spearheading the AI Universe - Meta AI Chatbot
The “advanced conversational assistant” is set to enhance Messenger, WhatsApp, and Instagram services and will be incorporated into upcoming Ray-Ban Meta smart glasses and Quest 3.
Real-time information capabilities have been bolstered through a partnership with Microsoft Bing, and image generation is powered by a new model, Emu.
A Galaxy of AI Personalities
Meta rolled out 28 AIs in beta, featuring celebrity personas such as Snoop Dogg, Tom Brady, Kendall Jenner, and Naomi Osaka, raising the level of interactivity.
AI Studio - Empowering Businesses
The AI Studio platform enables businesses to build AI chatbots for the messaging services on Facebook, Instagram, and Messenger.
Also, Meta will provide a sandbox tool in the upcoming year for users to experiment with creating their own AI.
Generative AI Stickers - A New Co-creating Experience
AI editing tools will allow users to edit images and co-create content with friends.
The tool uses Llama 2 and the new image generation model, Emu, to convert text prompts into stickers in seconds.
Ray-Ban Smart Glasses with Meta AI
The Ray-Ban smart glasses come equipped with Meta AI, allowing users to get information, spark creativity, and control the glasses using just their voice.
(source)
P.S. If you like this kind of analysis, I write a free newsletter with the latest and most impactful news in AI. Professionals from Google, Meta, and OpenAI read it daily.
/r/deeplearning
https://redd.it/16uow1h
How do Large Language Models compare to NLP toolkits for NLP tasks?
I need to do some NLP on text in a number of different languages (English, Spanish, Russian, etc.). I've experimented with spaCy, Stanza, and NLTK, as well as some LLMs like ChatGPT, Bard, LLaMA 2, and GPT-4, for tasks like lemmatization and POS tagging.
In my experiments, GPT-4 with adequate prompting outperformed everything else in every language. I wasn't able to spot any errors.
The other LLMs were more or less on par with the NLP toolkits: LLMs were a bit more robust to imperfections in the input strings (typos, weird punctuation, etc.), but were also more likely to make very simple mistakes.
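For reference, here is a minimal sketch of the toolkit side of that comparison, using spaCy for lemmatization and POS tagging (it assumes the en_core_web_sm model is installed via `python -m spacy download en_core_web_sm`). With an LLM you would instead prompt for the same lemma/POS pairs as text, which is where both the extra robustness and the occasional simple mistakes show up.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The striped bats were hanging on their feet.")

for token in doc:
    print(f"{token.text:10s} lemma={token.lemma_:10s} pos={token.pos_}")
```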
Have you guys tried to use LLMs for NLP?
Can you confirm my experimental results, or did you get a different outcome?
Is anyone trying to take advantage of the power of LLMs for these tasks? For instance, is anyone trying to extract NLP features from the insides of models like LLaMa 2?
/r/LanguageTechnology
https://redd.it/16gtrk4
R Unveiling theory of mind in large language models: A parallel to single neurons in the human brain - Harvard University 2023
Paper: https://arxiv.org/abs/2309.01660
Abstract:
>With their recent development, large language models (LLMs) have been found to exhibit a certain level of Theory of Mind (ToM), a complex cognitive capacity that is related to our conscious mind and that allows us to infer another's beliefs and perspective. While human ToM capabilities are believed to derive from the neural activity of a broadly interconnected brain network, including that of dorsal medial prefrontal cortex (dmPFC) neurons, the precise processes underlying LLM's capacity for ToM or their similarities with that of humans remains largely unknown. In this study, we drew inspiration from the dmPFC neurons subserving human ToM and employed a similar methodology to examine whether LLMs exhibit comparable characteristics. Surprisingly, our analysis revealed a striking resemblance between the two, as hidden embeddings (artificial neurons) within LLMs started to exhibit significant responsiveness to either true- or false-belief trials, suggesting their ability to represent another's perspective. These artificial embedding responses were closely correlated with the LLMs' performance during the ToM tasks, a property that was dependent on the size of the models. Further, the other's beliefs could be accurately decoded using the entire embeddings, indicating the presence of the embeddings' ToM capability at the population level. Together, our findings revealed an emergent property of LLMs' embeddings that modified their activities in response to ToM features, offering initial evidence of a parallel between the artificial model and neurons in the human brain.
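As a rough illustration of the population-level decoding idea (my own toy sketch, not the paper's code), one can fit a simple linear probe on per-trial hidden embeddings to predict true- versus false-belief trials. Here the embeddings are random stand-ins just to show the procedure, so accuracy sits at chance rather than above it as reported in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

n_trials, hidden_dim = 200, 768
embeddings = np.random.randn(n_trials, hidden_dim)   # stand-in for LLM hidden states per trial
labels = np.random.randint(0, 2, size=n_trials)      # 0 = true-belief trial, 1 = false-belief

X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, test_size=0.3, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("decoding accuracy:", probe.score(X_test, y_test))
```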
/r/MachineLearning
https://redd.it/16h1tup
I've created a neural network library in C++ and trained image super-resolution with it; the results are surprisingly good.
Hey.
To cut the story short, I've created a library in C++ from scratch using only the Eigen library (and I still write most algorithms by hand because of Eigen's terrible performance). Anyway, I've been experimenting with image super-resolution for the past two weeks, and I finally found the right formula for creating a reasonably performing image upscaler.
I'm using a really small network with only 5 convolutional layers with really small kernel sizes (5 and 3) and a pixel-shuffle layer at the end. The network is trained to correct the error of bicubic interpolation rather than upscaling the image directly, and that's probably the reason it performs so 'well', but you can be the judge of that...
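For readers more comfortable with Python, here is a rough PyTorch sketch of the architecture as described (not the author's C++ code): five small-kernel conv layers, a PixelShuffle upsampler, and a residual connection so the network only learns the correction on top of bicubic interpolation. The layer widths are my guesses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BicubicResidualSR(nn.Module):
    def __init__(self, scale=2, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 5, padding=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 3 * scale**2, 3, padding=1),  # 5th conv feeds PixelShuffle
        )
        self.shuffle = nn.PixelShuffle(scale)
        self.scale = scale

    def forward(self, lr):
        bicubic = F.interpolate(lr, scale_factor=self.scale, mode="bicubic", align_corners=False)
        residual = self.shuffle(self.body(lr))   # predicted correction to the bicubic upscale
        return bicubic + residual

sr = BicubicResidualSR()(torch.randn(1, 3, 64, 64))   # -> (1, 3, 128, 128)
```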
Here is an example of upscaled image by the network:
2x Image upscaling
And of course my upscaled pup:
https://preview.redd.it/8frhiak2dwlb1.png?width=1918&format=png&auto=webp&s=05d647b176764dc34350fa9fa9db5b0d71bc38ab
The network mostly just reconstructs the edges in the image and doesn't really 'hallucinate' any new detail, so the results are quite pleasing. (It still outperforms FSR1 by a lot in my testing.) And it should be able to run in real time on a GPU if it were ported...
And here is a link to the tool: https://github.com/Panjaksli/BNN/tree/v1.0a
You can try it out, and tell me what you think. Thanks.
/r/deeplearning
https://redd.it/168b0p7