j_links | Unsorted

Telegram channel j_links - Just links

5628

This is just a link aggregator for everything I find interesting, especially DL and topological condensed matter physics. @EvgeniyZh


Just links

Show HN: Factorio Learning Environment – Agents Build Factories (🔥 Score: 159+ in 2 hours)

Link: https://readhacker.news/s/6qKug
Comments: https://readhacker.news/c/6qKug

I'm Jack, and I'm excited to share a project that has channeled my Factorio addiction recently: the Factorio Learning Environment (FLE).
FLE is an open-source framework for developing and evaluating LLM agents in Factorio. It provides a controlled environment where AI models can attempt complex automation, resource management, and optimisation tasks in a grounded world with meaningful constraints.
A critical advantage of Factorio as a benchmark is its unbounded nature. Unlike many evals that are quickly saturated by newer models, Factorio's geometric complexity scaling means it won't be "solved" in the next 6 months (or possibly even years). This allows us to meaningfully compare models by the order-of-magnitude of resources they can produce - creating a benchmark with longevity.
The project began 18 months ago after years of playing Factorio, recognising its potential as an AI research testbed. A few months ago, our team (myself, Akbir, and Mart) came together to create a benchmark that tests agent capabilities in spatial reasoning and long-term planning.
Two technical innovations drove this project forward: First, we discovered that piping Lua into the Factorio console over TCP enables running (almost) arbitrary code without directly modding the game. Second, we developed a first-class Python API that wraps these Lua programs to provide a clean, type-hinted interface for AI agents to interact with Factorio through familiar programming paradigms.
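A minimal sketch of what "piping Lua into the console over TCP" could look like on the wire. The post does not name the protocol; this assumes Factorio's remote console (RCON), which follows the Source RCON packet layout, and the `/c` console command for running Lua. The helper names (`encode_packet`, `decode_packet`, `lua_command`) are hypothetical and are not FLE's API.

```python
import struct

# Packet types from the Source RCON protocol (assumed to apply here).
SERVERDATA_AUTH = 3
SERVERDATA_EXECCOMMAND = 2


def encode_packet(request_id: int, packet_type: int, body: str) -> bytes:
    """Build one RCON packet: size, id, type (little-endian int32s), then the body."""
    # The body is null-terminated and followed by an empty null-terminated string.
    payload = body.encode("utf-8") + b"\x00\x00"
    size = 8 + len(payload)  # id + type + payload; the size field excludes itself
    return struct.pack("<iii", size, request_id, packet_type) + payload


def decode_packet(data: bytes) -> tuple[int, int, str]:
    """Parse one complete RCON packet back into (id, type, body)."""
    size, request_id, packet_type = struct.unpack_from("<iii", data)
    body = data[12:4 + size - 2].decode("utf-8")  # drop the two trailing nulls
    return request_id, packet_type, body


def lua_command(lua_source: str) -> str:
    """Wrap Lua source in the console command that executes it (assumption: /c)."""
    return "/c " + lua_source


packet = encode_packet(7, SERVERDATA_EXECCOMMAND, lua_command("rcon.print(1 + 1)"))
```

In a real session the bytes would be written to a TCP socket after an authentication packet; that part is omitted here because it needs a running server.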
Agents interact with FLE through a REPL pattern:
1. They observe the world (seeing the output of their last action)
2. Generate Python code to perform their next action
3. Receive detailed feedback (including exceptions and stdout)
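The three steps above can be sketched as a small loop. This is an illustration of the general REPL pattern only, not FLE's implementation: `scripted_policy` is a hypothetical stand-in for the LLM, and the "world" is a plain Python namespace rather than a Factorio game.

```python
import contextlib
import io


def run_step(code: str, namespace: dict) -> str:
    """Execute one agent-written program; return its stdout plus any exception text."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        try:
            exec(code, namespace)
        except Exception as exc:
            # Errors are reported back to the agent instead of ending the episode.
            buffer.write(f"{type(exc).__name__}: {exc}\n")
    return buffer.getvalue()


def scripted_policy(observation: str, step: int) -> str:
    """Hypothetical stand-in for an LLM: returns the next program as a string."""
    programs = [
        "ore = 5\nprint('mined', ore)",
        "print(ore / 0)",           # raises ZeroDivisionError
        "print('ore still', ore)",  # variables persist across steps
    ]
    return programs[step]


def run_episode(num_steps: int = 3) -> list[str]:
    namespace: dict = {}
    observation = ""  # step 1: the agent sees the output of its last action
    history = []
    for step in range(num_steps):
        code = scripted_policy(observation, step)  # step 2: generate code
        observation = run_step(code, namespace)    # step 3: receive feedback
        history.append(observation)
    return history
```

Keeping one namespace for the whole episode is what makes this a REPL: a variable defined in one step is still available in the next, even after an intermediate step fails.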
We provide two main evaluation settings:
- Lab-play: 24 structured tasks with fixed resources
- Open-play: An unbounded task of building the largest possible factory on a procedurally generated map
We found that while LLMs show promising short-horizon skills, they struggle with spatial reasoning in constrained environments. They can discover basic automation strategies (like electric-powered drilling) but fail to achieve more complex automation (like electronic circuit manufacturing). Claude 3.5 Sonnet is currently the best model (by a significant margin).
The code is available at https://github.com/JackHopkins/factorio-learning-environment.
You'll need:
- Factorio (version 1.1.110)
- Docker
- Python 3.10+
The README contains detailed installation instructions and examples of how to run evaluations with different LLM agents.
We would love to hear your thoughts and see what others can do with this framework!


Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation https://arxiv.org/abs/2503.03492


Finite-temperature quantum topological order in three dimensions https://arxiv.org/abs/2503.02928


Warning Signs of a Possible Collapse of Contemporary Mathematics https://web.math.princeton.edu/~nelson/papers/warn.pdf


Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning https://arxiv.org/abs/2502.13834


https://fixupx.com/deepseek_ai/status/1895279409185390655


Comment on "InAs-Al hybrid devices passing the topological gap protocol", Microsoft Quantum, Phys. Rev. B 107, 245423 (2023) https://arxiv.org/abs/2502.19560


Welcome to Ladybird, a truly independent web browser (🔥 Score: 150+ in 1 hour)

Link: https://readhacker.news/s/6pZHn
Comments: https://readhacker.news/c/6pZHn


Hardware-efficient quantum error correction via concatenated bosonic qubits https://www.nature.com/articles/s41586-025-08642-7


Universal Quantum Computation with the S3 Quantum Double: A Pedagogical Exposition https://arxiv.org/abs/2502.14974


Muon is Scalable for LLM Training https://github.com/MoonshotAI/Moonlight


BaxBench: Can LLMs Generate Secure and Correct Backends? https://baxbench.com/


mathconstruct: Challenging LLM Reasoning with Constructive Proofs https://arxiv.org/abs/2502.10197


Interferometric single-shot parity measurement in InAs–Al hybrid devices https://www.nature.com/articles/s41586-024-08445-2


Roadmap to fault tolerant quantum computation using topological qubit arrays https://arxiv.org/abs/2502.12252


https://knzhou.github.io/handouts/Prelim.pdf
via @avvablog


GamingAgent - Personal Computer Gaming Agent https://github.com/lmgame-org/GamingAgent


Enforced Gaplessness from States with Exponentially Decaying Correlations https://arxiv.org/abs/2503.01977


Anomalies of Coset Non-Invertible Symmetries https://arxiv.org/abs/2503.00105


Zagier, D. (1990). How Often Should You Beat Your Kids? Mathematics Magazine, 63(2), 89–92. https://doi.org/10.1080/0025570X.1990.11977493

<...>
We, however, maintain that only the most degenerate parent would play against a two-year-old for money, and that our concern must therefore be, not by how much you can expect to win, but with what probability you will win at all. Our principal result is that this probability tends asymptotically to 85.4% (more precisely: to 1/2 + 1/sqrt(8)) as n tends to infinity. This shows with what unerring instinct Levasseur's mother selected the game — the high 85% loss rate will instill in the young progeny a due respect for the immense superiority of their parents, while the 15% win rate will maintain their interest and prevent them from succumbing to feelings of hopelessness and frustration.
<...>
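A quick check of the constant quoted in the excerpt (plain arithmetic, not taken from the paper's derivation):

```latex
\[
\frac{1}{2} + \frac{1}{\sqrt{8}}
  = \frac{1}{2} + \frac{\sqrt{2}}{4}
  \approx 0.5 + 0.3536
  = 0.8536 ,
\]
```

which rounds to the 85.4% stated above and leaves roughly 14.6% for the other player, the "15% win rate" in the quote.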


't Hooft anomalies in metals https://arxiv.org/abs/2502.19471


The three-dimensional Kakeya conjecture, after Wang and Zahl
https://terrytao.wordpress.com/2025/02/25/the-three-dimensional-kakeya-conjecture-after-wang-and-zahl
via @cme_channel


LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language https://arxiv.org/abs/2405.12856


Detecting emergent 1-form symmetries with quantum error correction https://arxiv.org/abs/2502.17572


LIMO: Less is More for Reasoning https://arxiv.org/abs/2502.03387


STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving https://arxiv.org/abs/2502.00212


Topology from Nothing https://arxiv.org/abs/2502.12121


https://fixupx.com/IanCutress/status/1892246045385515266


Scaling Test-Time Compute Without Verification or RL is Suboptimal https://arxiv.org/abs/2502.12118


Extracting the topological spins from bulk multipartite entanglement https://arxiv.org/abs/2502.12259
