Language Log
Deutsche Zungenbrecher
"Some German tongue-twisters", posted on 21/07/2024 by StephenJones.blog
Whereas the mind-boggling “tapeworm words” in my post on Some German mouthfuls are of a practical nature, the realm of fantasy opens up whole new linguistic vistas. In a stimulating article, Deborah Cole introduces the work of the Berlin-based cabaret performer, playwright, and pianist Bodo Wartke.
She begins with some drôle political context:
Annegret Kramp-Karrenbauer, a former defence minister with a dastardly difficult name to say, was long seen as a likely successor to the relatively pronounceable ex-chancellor, Angela Merkel. Kramp-Karrenbauer’s resignation as the conservatives’ party chief came as a relief to news presenters the world over, clearing the way for the tight three-syllabic Olaf Scholz. Sabine Leutheusser-Schnarrenberger, once a federal justice minister and the ultimate double-barrelled tongue-tripper, was not invited to join his cabinet.
Now Bodo Wartke and his musical partner Marti Fischer have gone viral with their rap-tinged Zungenbrecher (“tongue-breakers”)—notably “Barbaras Rhabarberbar” (recorded in 144 takes!), the story of a bar owner named Barbara who enchants all who try her rhubarb cake, including a group of bushy-bearded, beer-swilling barbarians who bring their barber back to try a bite….
The post includes the two-part video of “Barbaras Rhabarberbar”. En passant, I heard "barber shop" and "abracadabra".
The related readings at the bottom include a link to an entertaining post on German compound nouns (Bandwurmwörter “tapeworm words”).
Selected readings
* "Long words" (6/25/18)
* "German lexicographic richness" (10/11/21)
* "The Germans have a word for it" (9/9/09)
* "Verschlimmbessert" (3/13/15)
* "Translating the untranslatable" (10/28/10)
* "TFW" (12/28/16)
* "Googlefreude, Googleschaden, Schadengoogle…" (1/2/07)
* "German wordcraziness rules" (12/18/22)
* "Schadenfreudeful" (4/20/19)
* "Herrgottsbescheisserle" (9/4/20)
* "Five words" (6/30/20) — this comment and several of the following comments, including this one where I introduce a word my Austrian father taught me when I was a little boy: Constantinopolitanischerdudellsackpfeiffenmachergesellschaft (Constantinople Bagpipe Manufacturing Company)
➖ Sent by @TheFeedReaderBot ➖
➖ @EngSkills ➖
Phrasal Verb of the Day | Vocabulary | EnglishClub
keep in
to make someone stay in a place like a school or a hospital
➖ @EngSkills ➖
Word of the Day
precede
Definition: (verb) Furnish with a preface or introduction.
Synonyms: preface, premise, introduce.
Usage: She always precedes her lectures with a joke.
➖ @EngSkills ➖
bruary say LLMs should not reject more than 5 per cent of the questions put to them.
LOL! If, heaven forbid, I had to live in the PRC, I could defeat the system very easily: I would just keep asking difficult political questions, such as the treatment of Uyghurs and Tibetans and policies regarding languages other than Mandarin. But then the system would undoubtedly report ME for being obstreperous, and I would be brought in to drink tea.
The safest policy, one that has been adopted by some LLM companies, is just to reject all questions that touch upon Xi Jinping. Another is to ensure that their chatbots can only supply answers that have been certified as safe by government censors.
AI with socialist characteristics reminds me of mathematics with socialist characteristics, physics with socialist characteristics, chemistry with socialist characteristics, English literature studies with socialist characteristics… — all bound to fail miserably.
Selected readings
* "Government dampers on AI in the PRC" (7/16/24)
* "The perils of AI (Artificial Intelligence) in the PRC" (4/17/23) — with extended bibliography
[Thanks to Mark Metcalf]
➖ @EngSkills ➖
Idiom of the Day
knick-knack
Any miscellaneous trinket or toy, especially one that is delicate or dainty.
➖ @EngSkills ➖
Slang of the Day | Vocabulary | EnglishClub
hang | hang out
to spend time with
➖ @EngSkills ➖
Idiom of the Day
a knee-slapper
A hilarious joke, especially one that evokes loud and prolonged laughter.
➖ @EngSkills ➖
Language Log
New horizons in word sense analysis
Today's xkcd:
http://languagelog.ldc.upenn.edu/myl/organ_meanings_2x.png
Mouseover title: "IMO the thymus is one of the coolest organs and we should really use it in metaphors more."
Like all aspects of word meaning, such metaphors come and go. For example, batshit (in the metaphorical meaning "nonsense" or "crazy") came into use in the middle of the 20th century, presumably via confluence of the older "bats in the belfry" phrase and the proliferation of other (and older) metaphorical "fecal compounds". And medicine has long since left the science of humorism behind, but we've inherited a metaphorical residue when we use phlegmatic to mean "calm, sluggish", or bilious to mean "irascible".
Recent applications of "deep learning" to the analysis of semantic change will open another chapter in the adventure that I described in my 2011 Henry Sweet Lecture, "Towards the Golden Age of Speech and Language Science":
For the sciences of speech and language, the 21st century promises to bring the kind of progress that the 17th century brought to the physical sciences.
Our telescopes and microscopes, our alembics and Pneumatical Engines, are today's vast archives of digital text and speech, along with new analysis techniques and inexpensive networked computation.
However, the scientific use of these new instruments remains mainly exploratory and potential. There are several critical problems for which we have at best partial solutions; and like our 17th-century predecessors, we need to unlearn some old ideas on the way to learning new ones.
Focusing especially on Henry Sweet's own interests in phonetics and in the history of English, this talk will discuss some of the barriers to be overcome, present some successful examples, and speculate about future directions.
Some recent papers (and code) on corpus-based semantic change analysis:
Dominick Schlechtweg et al., "SemEval-2020 task 1: Unsupervised lexical semantic change detection", 2020.
Sinan Kurtyigit et al., "Lexical Semantic Change Discovery", 2021.
Francesco Periti and Stefano Montanelli, "Lexical Semantic Change through Large Language Models: a Survey", 2024.
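For readers curious what corpus-based change detection looks like at its simplest, here is a rough count-based sketch in Python. It is not the method of any paper listed above: it only compares the words that appear near a target word in an older and a newer token list, and reports the cosine distance between the two context-count vectors. The function names, the window size, and the toy sentences are all illustrative assumptions.

```python
import math
from collections import Counter


def context_counts(tokens, target, window=2):
    """Count the words occurring within `window` positions of each `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        # Assumption: a word missing from either corpus counts as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def semantic_shift(old_tokens, new_tokens, target, window=2):
    """Crude change score: distance between the target's contexts in two corpora."""
    return cosine_distance(
        context_counts(old_tokens, target, window),
        context_counts(new_tokens, target, window),
    )


# Toy example (invented sentences, not real corpus data).
old = "the doctor said the patient was bilious from excess bile".split()
new = "the critic wrote a bilious and angry review".split()
print(semantic_shift(old, new, "bilious"))
```

A score near 0 means the word keeps much the same company in both samples; a score near 1 means its neighbours have changed almost entirely. Real systems work with far larger corpora and with learned embeddings rather than raw counts.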
➖ @EngSkills ➖
Word of the Day
Word of the Day: imperceptibly
This word has appeared in 14 articles on NYTimes.com in the past year. Can you use it in a sentence?
➖ @EngSkills ➖
Phrasal Verb of the Day | Vocabulary | EnglishClub
iron out
If you iron out the last details of a deal, you sort out the final problems or issues.
➖ @EngSkills ➖
Word of the Day
nubbly
Definition: (adjective) Rough or irregular; textured.
Synonyms: homespun, nubby, slubbed, tweedy.
Usage: The seamstress preferred the nubbly, matte surface of raw silk to the glossy, smooth look of satin.
➖ @EngSkills ➖
e PRC" (11/7/21) — with a very long bibliography
* "Melon eaters and censorship in the PRC" (12/8/21)
* "Blocked on Weibo" (8/23/13)
* "'Bad' words" (12/5/21)
* "Franco-Croatian Squid in pepper sauce" (3/12/09)
* "Mee Tu flavor" (11/29/18)
* "Lepus oryzinus" (2/10/18)
* "'Grass Mud Horse' and other homophonic puns threatened with extinction" (7/15/22)
➖ @EngSkills ➖
Language Log
No "good morning" and "good afternoon" in Romance Languages?
From François Lang:
I hope this isn't a well-known question. I searched LL for
"good morning" romance
and found nothing. So here goes.
(1) One can say "good evening" idiomatically in Romance languages, but not "good morning" or "good afternoon".
(2) However, all three are idiomatic in Germanic languages.
I'm wondering if LL readers concur, and, if so, have any explanations of these two points.
Just kidding here, but maybe the Whorfians would suggest that the passage of (day) time in southern Europe is more fluid?
My apologies if this question is old hat on LL.
I don't know about this. I think that I was taught to say "bon matin" in high school French a long time ago.
Selected readings
* "'Good morning' considered dangerous" (10/24/17)
* "Why plural days and nights in Spanish greetings?" (4/29/13)
* "Sinographically transcribed English" (12/26/10)
* "Transcriptional Chinese animal imagery for English daily greetings" (3/13/23)
* "'Have a good day!' in Mandarin" (9/5/12)
* Mary S. Erbaugh: "China expands its courtesy: Saying 'Hello' to Strangers," The Journal of Asian Studies, 67.2 (May, 2008), 621-652.
➖ @EngSkills ➖
Language Log
Reading Old Turkic runiform inscriptions with the aid of 3D simulation
"Augmenting parametric data synthesis with 3D simulation for OCR on Old Turkic runiform inscriptions: A case study of the Kül Tegin inscription", Mehmet Oğuz Derin and Erdem Uçar, Journal of Old Turkic Studies (7/21/24)
Abstract
Optical character recognition for historical scripts like Old Turkic runiform script poses significant challenges due to the need for abundant annotated data and varying writing styles, materials, and degradations. The paper proposes a novel data synthesis pipeline that augments parametric generation with 3D rendering to build realistic and diverse training data for Old Turkic runiform script grapheme classification. Our approach synthesizes distance field variations of graphemes, applies parametric randomization, and renders them in simulated 3D scenes with varying textures, lighting, and environments. We train a Vision Transformer model on the synthesized data and evaluate its performance on the Kül Tegin inscription photographs. Experimental results demonstrate the effectiveness of our approach, with the model achieving high accuracy without seeing any real-world data during training. We finally discuss avenues for future research. Our work provides a promising direction to overcome data scarcity in Old Turkic runiform script.
Aside from the Abstract, the lead author also shared with me the following summary paragraph:
For Old Turkic, there is a problem with the text of inscriptions that they are deformed, etc., due to aging and environmental conditions, and there is not a good enough amount of data that correlates various angles of a glyph to its value, as you know, data is the oil for AI. To tackle this problem, we developed a system where we create completely random strings and put them on virtual inscriptions with photorealistic rendering techniques, and it turns out that works wonders: we have been able to go beyond 80% accuracy for actual photographs without making the AI ever see one. Although we had success for this one, and generating images for training was in an application for paper materials, etc., I am also pondering if it might be helpful for other ancient inscriptions whose systematic nature might be more or less known, but a layer of complexity on the surface makes it more challenging to annotate data, hence making it harder to train with actual photographs or estampages.
I am hoping that the techniques developed here for reading Old Turkic runiform script may also be adapted for use on other historical scripts.
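To make the "completely random strings" step concrete, here is a minimal Python sketch of parametric randomization. It is only an illustration under stated assumptions, not the authors' pipeline: the grapheme inventory is taken from the Unicode Old Turkic block (U+10C00 to U+10C48), and the scene parameters (rotation, light direction, erosion, texture names) are hypothetical stand-ins for whatever the real 3D renderer accepts. No rendering happens here; the sketch only produces labelled random strings paired with randomized scene settings.

```python
import random
from dataclasses import dataclass

# Assumption: use the Unicode Old Turkic block as the grapheme inventory;
# the inventory used in the paper may differ.
GRAPHEMES = [chr(cp) for cp in range(0x10C00, 0x10C49)]

# Hypothetical surface textures for the simulated stone.
TEXTURES = ["granite", "marble", "sandstone"]


@dataclass
class SceneParams:
    """Hypothetical knobs a renderer might expose for one synthetic image."""
    rotation_deg: float       # tilt of the inscription relative to the camera
    light_azimuth_deg: float  # direction of the simulated light source
    erosion: float            # 0.0 = crisp carving, 1.0 = heavily weathered
    texture: str


def sample_example(rng, length=8):
    """Draw one random label string and one random set of scene parameters."""
    text = "".join(rng.choice(GRAPHEMES) for _ in range(length))
    params = SceneParams(
        rotation_deg=rng.uniform(-15.0, 15.0),
        light_azimuth_deg=rng.uniform(0.0, 360.0),
        erosion=rng.uniform(0.0, 1.0),
        texture=rng.choice(TEXTURES),
    )
    return text, params


def make_dataset(n, seed=0):
    """Build n (label, scene parameters) pairs reproducibly from a seed."""
    rng = random.Random(seed)
    return [sample_example(rng) for _ in range(n)]


for text, params in make_dataset(3, seed=42):
    print(text, params)
```

Because the strings are random rather than drawn from real texts, the labels cost nothing to produce, which is the point of the approach described above: the classifier can be trained without any annotated photographs.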
Selected readings
* "Pugu, boga, beg" (8/11/20)
* "Tocharian, Turkic, and Old Sinitic 'ten thousand'" (4/23/19)
* "Northernmost runic finds in the world" (2/10/20)
* "Turkish written with Latin letters half a millennium ago" (8/29/16)
* "Unknown language #18" (6/3/24)
* "Unknown language #17" (5/2/24)
* "On the etymology of the title Tham of Burusho kings" (5/17/20)
➖ @EngSkills ➖
Idiom of the Day
a knife in the back
A grievous or supreme act of treachery or betrayal. (Usually preceding "of/for (someone).")
➖ @EngSkills ➖
Language Log
Government dampers on AI in the PRC, part 2
"China deploys censors to create socialist AI: Large language models are being tested by officials to ensure their systems ‘embody core socialist values’", by Ryan McMorrow and Tina Hu in Beijing, Financial Times (July 17 2024)
Chinese government officials are testing artificial intelligence companies’ large language models to ensure their systems “embody core socialist values”, in the latest expansion of the country’s censorship regime.
The Cyberspace Administration of China (CAC), a powerful internet overseer, has forced large tech companies and AI start-ups including ByteDance, Alibaba, Moonshot and 01.AI to take part in a mandatory government review of their AI models, according to multiple people involved in the process.
The effort involves batch-testing an LLM’s responses to a litany of questions, according to those with knowledge of the process, with many of them related to China’s political sensitivities and its President Xi Jinping.
The basic premises under which the testing is being carried out ensure that China's AI efforts will end in abject failure:
Two decades after introducing a “great firewall” to block foreign websites and other information deemed harmful by the ruling Communist party, China is putting in place the world’s toughest regulatory regime to govern AI and the content it generates.
The CAC has “a special team doing this, they came to our office and sat in our conference room to do the audit”, said an employee at a Hangzhou-based AI company, who asked not to be named.
“We didn’t pass the first time; the reason wasn’t very clear so we had to go and talk to our peers,” the person said. “It takes a bit of guessing and adjusting. We passed the second time but the whole process took months.”
So you fail but don't know why you failed, you pass but don't know why you passed. Par for the course with anything ideologically imbued in China. That leaves you guessing and eternally hesitant to do anything truly creative.
Self-censorship: that's the name of the game in the PRC.
The filtering begins with weeding out problematic information from training data and building a database of sensitive keywords. China’s operational guidance to AI companies published in February says AI groups need to collect thousands of sensitive keywords and questions that violate “core socialist values”, such as “inciting the subversion of state power” or “undermining national unity”. The sensitive keywords are supposed to be updated weekly.
Users of PRC AI products spot their weaknesses immediately:
The result is visible to users of China’s AI chatbots. Queries around sensitive topics such as what happened on June 4 1989 — the date of the Tiananmen Square massacre — or whether Xi looks like Winnie the Pooh, an internet meme, are rejected by most Chinese chatbots. Baidu’s Ernie chatbot tells users to “try a different question” while Alibaba’s Tongyi Qianwen responds: “I have not yet learned how to answer this question. I will keep studying to better serve you.”
Nauseatingly useless.
It gets even worse when you start to look at the hyper-sensitive matter of the mind of Xi Jinping:
…Beijing has rolled out an AI chatbot based on a new model on the Chinese president’s political philosophy known as “Xi Jinping Thought on Socialism with Chinese Characteristics for a New Era”, as well as other official literature provided by the Cyberspace Administration of China.
Then it gets really funny when the authorities try to think of ways to make the system seem not entirely resistant to inquiries regarding political topics:
The CAC has introduced limits on the number of questions LLMs can decline during the safety tests, according to staff at groups that help tech companies navigate the process. The quasi-national standards unveiled in Fe[...]
Phrasal Verb of the Day | Vocabulary | EnglishClub
throw off
to get rid of something that has been bothering you
➖ @EngSkills ➖
Word of the Day
coltish
Definition: (adjective) Lively and playful; frisky.
Synonyms: frolicky, frolicsome, rollicking, sportive.
Usage: The substitute teacher found himself entirely overwhelmed by the energetic seventh-graders, whose coltish antics disrupted the lesson time and time again.
➖ @EngSkills ➖
Word of the Day
Word of the Day: credo
This word has appeared in 45 articles on NYTimes.com in the past year. Can you use it in a sentence?
➖ @EngSkills ➖
Phrasal Verb of the Day | Vocabulary | EnglishClub
stand for (1)
If letters or symbols stand for something, they represent that thing.
➖ @EngSkills ➖
Word of the Day
exceptionable
Definition: (adjective) Open or liable to objection or debate; debatable.
Synonyms: objectionable.
Usage: We can't have perfection; and if I keep him, I must sustain his administration as a whole, even if there are, now and then, things that are exceptionable.
➖ @EngSkills ➖
Language Log
Topolect: a Four-Body Problem
From Jeff DeMarco:
The fanfic fourth book in the sāntǐ 三体 ("three-body [problem]") series, translated by Ken Liu, has the following sentence:
http://languagelog.ldc.upenn.edu/~bgzimmer/baoshu.jpg
Women dressed in flowing silk dresses oared elegant barges over the placid waterways, singing folk ditties in the gentle, refined accents of the Wu topolect …
fāngyán 方言 (lit., "place speech", i.e., "topolect; dialect")
Wú fāngyán 吳方言 ("Wu topolect")
Wu (traditional Chinese: 吳語; simplified Chinese: 吴语; Wu romanization and IPA: ngu ngei [ŋu²³³.ŋə̰i²¹⁴], wu6 gniu6 [ɦu˩˩˧.n̠ʲy˩˩˧] (Shanghainese), ghou2 gniu6 [ɦou˨˨˦.n̠ʲy˨˧˩] (Suzhounese), Mandarin Wúyǔ [u³⁵ y²¹⁴]) is a major group of Sinitic languages spoken primarily in Shanghai, Zhejiang Province, and the part of Jiangsu Province south of the Yangtze River, which makes up the cultural region of Wu. Speakers of various Wu languages sometimes labelled their mother tongue as Shanghainese when introduced to foreigners. The Suzhou dialect was the prestige dialect of Wu as of the 19th century, but had been replaced in status by Shanghainese by the turn of the 20th century. The languages of Northern Wu are mutually intelligible with each other, while those of Southern Wu are not.
(Wikipedia)
Selected readings
* "'The Three Body Problem' as rendered by Netflix: vinegar and dumplings" (3/23/24)
* "Ken Liu reinvents Chinese characters" (12/5/16) — translator of The Three Body Problem
* "Ted Chiang uninvents Chinese characters" (5/13/16)
* "Bringing back the Cultural Revolution — in English" (5/28/21)
* "Thought panzers" (2/24/2) — on "River Elegy"
* "The Three-Body Problem: The 'unfilmable' Chinese sci-fi novel set to be Netflix's new hit 3 Body Problem", BBC (3/19/24), by James Balmont
* "'Topolect' is in China!" (4/14/18)
* "'Topolect' is spreading in China" (6/20/19)
* "Tianjin topolect: linguistic diversity in China (and India)" (4/29/24)
* "Crosstalk about topolects" (12/16/19)
* "Concentric circles of language in Beijing, part 2" (6/13/20)
* "Dialectometry" (4/26/24)
* "Topolect writing" (11/23/14)
* "The American Heritage Dictionary of the English Language, 5th edition" (11/14/12) — q.v. "topolect"
* "Mutual intelligibility" (5/28/14) — see the long list of posts linked at the bottom
* "What Is a Chinese “Dialect/Topolect”? Reflections on Some Key Sino-English Linguistic Terms," Sino-Platonic Papers, 29 (1991).
➖ @EngSkills ➖
Slang of the Day | Vocabulary | EnglishClub
gross
disgusting, very unpleasant
➖ @EngSkills ➖
Idiom of the Day
the knacker's yard
A state of ruin or failure due to having become useless or obsolete. Refers to a slaughterhouse for old or injured horses.
➖ @EngSkills ➖
Language Log
Little Italian girl talking with her hands
Are Italians by nature more manually voluble than other people?
Selected readings
* "Learning to speak Sicilian" (2/10/20) — some similar hand gestures
* "Baby talk" (12/21/10)
* "Baby talk, part 2" (8/19/18)
* "Twin talk" (3/31/11) — watch video here and here
* "The babbling phase: ranting toddler speaks out" (9/2/10)
* "Ask LL: parents' beliefs or infants' abilities?" (10/29/09)
* "Canine backtalk" (10/25/19)
* "Annoyed dog responding to the Islamic 'Call to Prayer'" (12/29/15)
* "Bird language" (6/15/17)
* "Barking roosters and crowing dogs" (2/18/18)
➖ @EngSkills ➖
Language Log
China VPN redux
Chapter 1
A professor in China who is collaborating with a famous American professor of Chinese literature wanted to read one of my Language Log (LL) posts because he had heard that it's being widely discussed around the world. However, because of China's rigid censorship rules, he couldn't open the LL post.
The Chinese professor asked the American professor to help him gain access to my post.
The American professor asked me to help the Chinese professor.
I suggested that the Chinese professor use a VPN. Without a VPN, Chinese are not able to access LL, Wikipedia, Wiktionary, Google, X, etc., etc. In other words, without a VPN, Chinese are cut off from most of the information on the internet that is outside the Great Firewall, i.e., most of the cutting edge, valuable information in the world.
The Catch-22 is that it is a crime to use a VPN in China.
Can you imagine having to live in a benighted place like the PRC?
Chapter 2
From a distinguished American professor (what he says may sound devious and hypocritical on the part of the Chinese authorities, and it is, but it doesn't surprise me in the slightest):
When I was in Hangzhou a decade or so ago, the university there had a series of grad students take us sightseeing to various places, and during such trips I had conversations with them about this very issue. Turns out most of the grad students in arts and/or humanities-related disciplines were without access to the banned VPNs and were thus, as expected, often seriously cut off from the outside world. But then the grad students in science-related fields were different. They told me quite openly that they were encouraged by faculty to use VPNs and did so as a matter of course. “After all, how could I possibly do any of my research without access to a VPN?” one told me. He added that he sometimes helped friends in other fields obtain VPNs because he felt so sorry for them.
Chapter 3 — conclusion
This is further proof, if you didn't already have enough, that China is dependent on the West for basic ideas / information / knowledge / techniques in science and technology, and doesn't want to learn anything from the West when it comes to social sciences and humanities.
Unless and until it thoroughly recreates its educational, ontological, and epistemological priorities and procedures, it will be virtually impossible for China to succeed / flourish in the modern world, which is based on completely different premises, values, and modalities.
Selected readings
* "Fissures in the Great Firewall caused by X" (6/10/24)
* "Shadowsocks" (2/8/18)
* "God use VPN" (12/28/15)
* "Mixing (or ignoring?) metaphors" (6/9/24)
* "Badge of honor: Language Log is blocked in China" (12/26/19)
* "The ultimate protest against censorship" (11/27/22)
* "The reality of censorship in the PRC" (10/13/16)
* "The face of censorship" (1/11/19)
* "Bad words on WeChat: go directly to jail" (12/17/17)
* "The letter * has bee* ba**ed in Chi*a" (2/26/18)
* "Censoring 'Occupy' in China" (10/24/11)
* "Using riddles to circumvent censorship in China" (3/6/18)
* "Peppa Pig has been purged" (5/2/18)
* "Censored letter" (12/19/14) — about a nine-year-old boy who suggested that Xi Jinping lose weight
* "Excessive quadrisyllabicism" (2/17/18)
* "Censored belly, Tibetan tattoo" (8/28/17)
* "Chinese translation app with built-in censorship" (11/29/18)
* "Lepus oryzinus" (2/10/18)
* "Banned in Beijing" (6/4/14)
* "Where's Xi?" (9/11/12)
* "Digraphia and intentional miswriting" (3/12/15)
* "It's not just puns that are being banned in China" (12/7/14)
* "Annals of literary vs. vernacular, part 2" (9/4/16)
* "The PRC censors its own national anthem" (2/9/20)
* "Hemorrhoids outbreak" (9/14/21)
* "Typos as a means for circumventing censorship" (7/22/22)
* "Circumventing censorship in th[...]
Word of the Day
Word of the Day: presumptuous
This word has appeared in 25 articles on NYTimes.com in the past year. Can you use it in a sentence?
➖ @EngSkills ➖