Interviews
With Ilya Sutskever (creator of GPT-4 and of many other important milestones in AI and NLP research):
https://youtu.be/SjhIlw3Iffs
With Sam Altman (OpenAI CEO):
https://youtu.be/L_Guz73e6fw
Explainability for NLP
With the rise of LLMs from ClosedAI, research in explainability for NLP is more important than ever. A lot of work still has to be done in the field. However, you can already experiment and try to explain your LLMs fine-tuned on a specific task. For now, the majority of methods are explored for text classification tasks and are adapted from tabular data.
How can it be done?
1. Baseline approach: leave-one-out explanations. For instance, suppose one of the last layers of your model is a regression layer. You can find the tokens with the largest weights, exclude them from the text, and check whether the model's answer has changed. If the tokens were indeed important, the answer should change dramatically, because the model can no longer rely on these words to make a correct decision.
2. Local Surrogate (LIME). A modification of the previous idea. Now you remove words from the sentence (LIME samples many perturbed versions of the text with random subsets of words removed) and check the results each time. A simple interpretable model, such as a weighted linear one, is then fitted to these results, and the "importance" of each word is estimated from how the model's answer differs.
3. SHAP (SHapley Additive exPlanations). It is based on game theory, with the main idea of telling us how to fairly distribute the “payout” (= the prediction) among the features. So, it is one more modification of the previous approaches, where the resulting scores satisfy three properties: local accuracy, missingness, and consistency.
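To make the leave-one-out idea from point 1 concrete, here is a minimal sketch. The scorer `toy_positive_score` and its word lists are hypothetical stand-ins for a fine-tuned classifier; in practice you would plug in your own model's prediction function.

```python
from typing import Callable, List, Tuple

# Hypothetical toy "model": a hand-written sentiment scorer standing in for
# a fine-tuned classifier. The word weights are made up for illustration.
POSITIVE = {"great": 0.4, "good": 0.2, "love": 0.3}
NEGATIVE = {"boring": 0.3, "bad": 0.2, "awful": 0.4}


def toy_positive_score(tokens: List[str]) -> float:
    """Return a pseudo-probability of the 'positive' class, clipped to [0, 1]."""
    score = 0.5
    for token in tokens:
        score += POSITIVE.get(token, 0.0) - NEGATIVE.get(token, 0.0)
    return min(1.0, max(0.0, score))


def leave_one_out(
    tokens: List[str], predict: Callable[[List[str]], float]
) -> List[Tuple[str, float]]:
    """Importance of a token = drop in the score when that token is removed."""
    base = predict(tokens)
    importances = []
    for i, token in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]  # the text without token i
        importances.append((token, base - predict(reduced)))
    return importances


if __name__ == "__main__":
    text = "the movie was great but boring".split()
    for token, importance in leave_one_out(text, toy_positive_score):
        print(f"{token:>8s} {importance:+.2f}")
```

In this toy setup, "great" gets a positive importance (removing it lowers the positive score), "boring" gets a negative one, and the neutral words get zero.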
You can read more about how explainability can be used for general ML in the book "Interpretable Machine Learning". TUM, where I am right now, has already published an overview of explainability methods for NLP; you can check this paper.
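The "fair payout" idea behind SHAP can also be tried by hand on a tiny example. Below is a sketch that computes exact Shapley values by enumerating all feature subsets; this is only feasible for a handful of features, and real SHAP implementations rely on approximations. The value function `toy_value` is a hypothetical model output in which "not" flips the effect of "good".

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def shapley_values(
    features: List[str], value: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values: weighted average of each feature's marginal contribution."""
    n = len(features)
    phi = {}
    for feature in features:
        others = [g for g in features if g != feature]
        total = 0.0
        for k in range(n):  # size of the coalition the feature joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = frozenset(subset)
                total += weight * (value(coalition | {feature}) - value(coalition))
        phi[feature] = total
    return phi


def toy_value(present: FrozenSet[str]) -> float:
    """Hypothetical model output when only the words in `present` are visible."""
    score = 0.5
    if "good" in present:
        score += 0.3
    if "good" in present and "not" in present:
        score -= 0.5  # the negation flips the effect of "good"
    return score


if __name__ == "__main__":
    words = ["not", "good", "movie"]
    phi = shapley_values(words, toy_value)
    print(phi)
    # Local accuracy: the values sum to f(all words) - f(no words).
    print(sum(phi.values()), toy_value(frozenset(words)) - toy_value(frozenset()))
```

Two of the three properties are easy to see here: the values sum to the difference between the full prediction and the empty baseline (local accuracy), and "movie", which never changes the output, gets exactly zero (missingness).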
If we have explained a model, what is next? How can we fix a model's misbehavior with such explanations? The explainability story will continue in further posts😉
Bing runs on GPT-4
https://blogs.bing.com/search/march_2023/Confirmed-the-new-Bing-runs-on-OpenAI%E2%80%99s-GPT-4
GPT-4 is here
🥁
https://openai.com/research/gpt-4
To get access to the API, you can sign up for the waiting list.
Reinforcement Learning Summer School
A 10-day summer school in Barcelona🏝 dedicated to a deep dive into Reinforcement Learning.
* Where: The summer school will be in the Campus Poblenou at Universitat Pompeu Fabra in Barcelona (Spain).
* When: June 26th to July 5th, 2023
* Suggested target audience: MSc and PhD students who are not yet experts in reinforcement learning but have prior knowledge of machine learning (some notions of reinforcement learning are also necessary). Postdocs, researchers, and professionals working in related fields and willing to learn about reinforcement learning can also apply.
* Application ends on March 27, 2023.
The school has fees! For students, it is 200 Euros.
The official website [link]
The program [link]
Application form [link]
TowardsNLP Online Meetup
Let us start!
The link to join:
https://tum-conf.zoom.us/j/69545123140?pwd=TWkreHhrTDlvaGhkUzlnaHpTRUhTQT09
We are collecting donations to help refugees in Germany during the call. Send to PayPal: dardem96@gmail.com
Sorry if your time zone is not covered🙏 If you have any questions, leave them in the comments. I will try to cover them and publish the recording later😉
TowardsNLP Online Meetup
The date is fixed: February 26th. Let us have an online meeting where we can chat all about NLP🤗
The only question I still want to settle is when exactly we should start the call. Please select an option.
The donations collected during the call will go to DaMigra, which helps refugees in Germany.
Multimodal Deep Learning Book
A comprehensive overview of state-of-the-art models for NLP and CV, as well as how they can be connected. Of course, text2img generation is included!
Book
#edu
P.S. Big thanks to everyone for replying and voting in the previous post🤗 Stay tuned for further announcements of the meeting details 😉
Friday evening ACL submissions
We live in a new reality...
I am positively and negatively surprised at the same time🤔
Make AI research responsible again!
The link to the blog:
https://2023.aclweb.org/blog/ACL-2023-policy/
Text2Img: Recent Trends Learning Materials
If, like me, you have read all the hype about text2img generation but did not have time to dive into the models' details, I have prepared a list of sources for learning about the recent advances in the topic.
Estimated time to go through all of it: 2-3 evenings.
* [link] An amazing visualization of ALL the models and papers that made text2img as cool as it is now: image detection -> fake image generation -> style transfer -> text2img. It really helps to track the history and understand why things are developing the way they are.
* [link] A series of lectures from Fast.ai about Stable Diffusion. I really recommend watching at least the first lecture (time code included); it is enough to understand the main idea. Additionally, there is a GitHub repo with tutorials.
* [link] After the general idea, it is worth watching the CLIP paper analysis.
* [link] Of course, thanks to Jay Alammar, we also have The Illustrated Stable Diffusion.
* [link] The biggest dive into the code: The Annotated Diffusion Model (colab).
Enjoy💪
P.S. There is a chance that I will create 1-2 lectures based on this material for the TUM NLP course. Let me know if you are interested in the recording.
#edu #text2img
Multiverse🌌: Multilingual Evidence for Fake News Detection
Our extended work on multilingual features for fake news detection.
Nowadays, each language bubble is like a separate universe with its own biased view of an event. Our approach can help to compare news across media in different languages and give the reader more critical information about the event.
The work also contains a broad analysis of the fake news field in general. It might be useful as an introduction to the fake news detection topic.
[link] Full paper
The State of Multilingual AI
There are around 7,000 languages spoken around the world. Around 400 languages have more than 1M speakers and around 1,200 languages have more than 100k ... Reviewing papers published at ACL 2008, she found that 63% of all papers focused on English. For a recent study, we similarly reviewed papers from ACL 2021 and found that almost 70% of papers only evaluate on English. 10 years on, little thus seems to have changed.
by Sebastian Ruder:
* Status Quo;
* Recent Progress;
* Challenges and Opportunities;
https://ruder.io/state-of-multilingual-ai/
NLP for Social Good - Daryna Dementieva | Munich NLP
Recently I was a guest at the Munich NLP seminar series. You are very welcome to watch if you want to know more about:
* the general idea of what is going on in AI and NLP for Social Good;
* fake news detection and how multilingual evidence can help to improve it;
* what is going on in the field of text detoxification.
The recorded video is available on YouTube📣
Scaling Instruction-Finetuned Language Models
TL;DR Additional fine-tuning of T5 or PaLM models on 1.8k (!) tasks makes them better on evaluation tasks, lets them cover more languages, and helps them scale better to new unseen tasks.
The Google Brain team experimented with new methods for fine-tuning Large Language Models. The main recipes for better LLMs:
* the more tasks you have for fine-tuning, the better;
* smarter prompts also help. By smarter, we mean here the use of instructions and chain-of-thought (see screenshots). Translated into human language: the more clues you give the model in the request, the more precise the answer you will receive. The chain-of-thought concept is quite interesting; its original paper is here.
The optimal number of fine-tuning tasks is still an open research question (in their experiments, the authors jumped from 282 tasks directly to 1,836 tasks, which leaves quite a gap to explore).
But, in the end, if we want to solve a new task and write smarter prompts for it, in the same style the model was fine-tuned on, zero-shot performance will improve significantly.
The original paper has all the details and a lot of tables and examples of performance on different tasks.
🤗model cards: all variations of t5, flan-t5-base for illustration.
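To illustrate what "smarter prompts" can look like, here is a small sketch that builds the same question as a plain prompt, a zero-shot chain-of-thought prompt, and a few-shot chain-of-thought prompt. The `Q:`/`A:` layout, the `Exemplar` class, and the `build_prompt` helper are assumptions made for illustration; they are not the exact templates used in the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Exemplar:
    """A solved example shown to the model before the real question."""
    question: str
    answer: str
    rationale: Optional[str] = None  # step-by-step reasoning, if available


def build_prompt(
    question: str,
    exemplars: Sequence[Exemplar] = (),
    chain_of_thought: bool = False,
) -> str:
    """Assemble a prompt; the Q:/A: layout is a hypothetical template."""
    parts = []
    for ex in exemplars:
        parts.append(f"Q: {ex.question}")
        if chain_of_thought and ex.rationale:
            # Show the reasoning first, then the final answer.
            parts.append(f"A: {ex.rationale} So the answer is {ex.answer}.")
        else:
            parts.append(f"A: {ex.answer}")
    parts.append(f"Q: {question}")
    if chain_of_thought and not exemplars:
        # A commonly used zero-shot trigger phrase for step-by-step reasoning.
        parts.append("A: Let's think step by step.")
    else:
        parts.append("A:")
    return "\n".join(parts)


if __name__ == "__main__":
    demo = Exemplar("What is 1 + 1?", "2", "1 plus 1 equals 2.")
    print(build_prompt("What is 2 + 3?"))
    print("---")
    print(build_prompt("What is 2 + 3?", chain_of_thought=True))
    print("---")
    print(build_prompt("What is 2 + 3?", [demo], chain_of_thought=True))
```

The resulting string can then be passed to any instruction-tuned model, for example one of the flan-t5 checkpoints mentioned above.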
ELLIS Summer Schools
The European Laboratory for Learning and Intelligent Systems (ELLIS) holds a series of summer schools this summer:
https://ellis.eu/events
I am personally interested in the Summer School on Large-Scale AI for Research and Industry: how to train models and work with data using distributed computations. In the era of LLMs, it seems important to obtain such skills😉
Find your school of interest🤘
To all women in this channel: I wish you equal opportunities to pursue your careers in computer science and become super cool professionals💪
Introducing ChatGPT and Whisper APIs
>write a blog-post about ChatGPT API release
>We are excited to announce the release of the ChatGPT API, which enables developers to integrate the power of OpenAI's cutting-edge language model into their applications. With this release, developers can access the full capabilities of ChatGPT, a large language model trained on vast amounts of text data, through a simple API.
ChatGPT is a state-of-the-art language model that can understand natural language input and generate human-like responses. It can be used in a variety of applications, such as chatbots, virtual assistants, and customer service automation. By integrating the ChatGPT API into their applications, developers can provide their users with a more natural and intuitive experience.
📌Release notes [link]
TowardsNLP Online Meetup
Sooo... We are meeting on February 26th, 6pm (GMT+1)📍
The link will be posted right before the event.
Also, to have some structure for the discussion, you can leave the questions and topics you most want to hear about here in the comments👇
News from *GPT-World
Or how the world has changed within a week!
* Scientists from TUM and LMU (including me🙃) have published a position paper on how ChatGPT can (or cannot) be used in the teaching process. Of course, the possibilities are immense, but there are still a lot of issues to be aware of and to solve. Check out the full paper [link].
* DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature [link]. Now we can detect generated papers! (almost) The idea is based on checking probabilities: if slightly rewording a text consistently makes it less probable under the LM, then the original text was likely generated by that LM. However, the results are debatable. Our students checked the system with ChatGPT-generated texts, and DetectGPT failed to catch them.
* Bard powered by LaMDA: a conversational AI service by Google. [link]
* GPT-3.5 is now integrated into Bing search 🤝. [link]
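The probability check behind DetectGPT can be sketched in a few lines. The score is the log-probability of the text minus the average log-probability of its slightly perturbed versions; a clearly positive value suggests the text sits at a local probability peak of the LM. Everything below is a toy under loud assumptions: `TOY_LM` is a made-up unigram model standing in for a real LM, and the perturbations are exhaustive single-word substitutions, whereas the paper generates perturbations with a mask-filling model.

```python
import math
from typing import Dict, List

# Hypothetical unigram "language model"; the probabilities are invented.
TOY_LM: Dict[str, float] = {
    "the": 0.30, "a": 0.25, "cat": 0.20,
    "dog": 0.15, "axolotl": 0.06, "quokka": 0.04,
}


def log_prob(tokens: List[str]) -> float:
    """Log-probability of a token sequence under the toy unigram model."""
    return sum(math.log(TOY_LM[t]) for t in tokens)


def perturbations(tokens: List[str]) -> List[List[str]]:
    """All texts that differ from the input in exactly one word (a simplification)."""
    variants = []
    for i, token in enumerate(tokens):
        for word in TOY_LM:
            if word != token:
                variants.append(tokens[:i] + [word] + tokens[i + 1:])
    return variants


def perturbation_discrepancy(tokens: List[str]) -> float:
    """log p(text) minus the mean log p over its perturbed versions."""
    perturbed = [log_prob(p) for p in perturbations(tokens)]
    return log_prob(tokens) - sum(perturbed) / len(perturbed)


if __name__ == "__main__":
    likely = "the cat a dog".split()   # words the toy LM itself prefers
    unlikely = "axolotl quokka".split()  # words the toy LM rarely produces
    print(perturbation_discrepancy(likely))
    print(perturbation_discrepancy(unlikely))
```

With this toy model, the text made of high-probability words gets a positive score and the text made of rare words gets a negative one. With a real LM and real texts the separation is much less clean, which matches the experience of our students described above.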
TowardsNLP Online Meeting
Greetings to all subscribers of the TowardsNLP channel👋 I am very glad to have an audience of ~1.2k and hope that the material posted here is useful for you.
I want to try a new format for interacting with you. In February, the channel will turn 3 years old🎉 I started it as a platform for university seminars, and now I continue to share my research and research interests here. During these years, I have gained a huge amount of experience in Data Science and NLP, both industrial and academic.
I want to schedule a Zoom meeting on one evening in February where I can answer your questions and we can have a discussion:
* about education in DS: comparison of traditional and innovative universities;
* about pursuing a PhD in DS/NLP;
* about paper publishing;
* about how to choose a career track;
* about modern NLP;
* about DS education in EU (🇩🇪) and CIS countries;
* I can talk about my research and PhD thesis;
* or any other topic you suggest😉
This meeting will also have a financial purpose: during the seminar I will start a fundraiser (of course, donating is not mandatory for participating in the call🤗), and the collected donations will go to support Ukrainian refugees.
Please, let me know if you are interested in such a meeting!
Let's build GPT: from scratch, in code, spelled out.
A new video from Andrej Karpathy's YouTube series "Neural Networks: Zero to Hero". The previous videos can introduce you to the basics of Neural Networks and language modeling.
[link] Now it is about how to build your own GPT!
Interesting not only for beginners but also for pros, to recap and pick up new details🧐
Happy New 2023 Year!
I will not hide it: for a citizen of Ukraine, it was a tough year 🇺🇦
Still, thanks to it for teaching us to be stronger, to love more, and to truly know who our people are ❤️
Let us support peace, love, and science in 2023! 🕊️
GALACTICA
The number of papers being published every month, week, and even day is now overwhelming. In May 2022, an average of 516 papers per day were submitted to arXiv. How nice would it be to have a tool that helps researchers find papers for review more precisely, summarize them, and organize research better? Now it is possible💪
Researchers from Meta AI introduced a new language model, Galactica. What makes this model so good at working with equations, chemical sequences, references, code, plain text, and other symbolic chains?
* Dataset: The Galactica Corpus. It contains 48m papers and 106b tokens in total from papers, reference material, encyclopedias, and other scientific sources.
* Tokenization: a special type of tokenization and separation tokens for each type of sequence: citations, mathematics, chemical sequences, and others.
* Working Memory Token: the chain-of-thought concept was introduced recently. In this work, the authors go further: a working memory token <work> that wraps the step-by-step reasoning part of the prompt.
* Prompt Pre-Training (similar to FLAN) based on different tasks: QA, summarization, NER extraction, reasoning, dialogue, and others.
* Architecture: a Transformer architecture in a decoder-only setup.
Now, using the demo, you can search by a reference, a short description of the main idea of a paper, or even a formula, and ask for a summary.
Thanks to the Twitter community, the demo is now shut down🫣 However, as always, the presented science is still interesting in itself.
In the meantime, we will wait to test the model at its full power again.
[link] The main page
[link] The paper about Galactica LLM
Stanford Seminar — ML Explainability
If you want an introduction to the explainability topic, there is a cool seminar from Stanford! It goes from the basics to the new horizons of research in this field.
Videos on YouTube: link
Slides: link
Bloom Library: Multimodal Datasets in 300+ Languages for a Variety of Downstream Tasks
Remember the BLOOM 🌸 model? Now there are BLOOM datasets: multimodal multilingual datasets covering 363 languages across 32 language families💪!
Four datasets are released:
* bloom-lm for language modeling in 351 languages;
* bloom-captioning for image-to-text or text-to-image tasks in 351 languages;
* bloom-vist for visual storytelling in 351 languages;
* bloom-speech for speech-to-text and text-to-speech tasks in 56 languages.
The original paper with all the details about the collection process and the datasets is here.
MTEB: Massive Text Embedding Benchmark
Truly massive work: a comparison of 33 models on 56 datasets in 112 languages💪
Now, if you are interested in some task, you can go to this leaderboard and find the best models for this task in a specific language. Or, if you have a new model, you can perform a clearer and fairer comparison.
Paper: https://arxiv.org/abs/2210.07316 (useful to read more details about the tasks, abbreviations, details of the datasets and the models)
Github: https://github.com/embeddings-benchmark/mteb
Leaderboard at 🤗: https://huggingface.co/spaces/mteb/leaderboard