arXiv, the open-access preprint repository, is tracked as an AI ecosystem entity by the GROUNDING AI Knowledge Radar. This page collects dated mentions, source links, related concepts, and builder-relevant context for arXiv.

Recent Updates

  • 2026-06-08: Skip a Layer or Loop It? Learning Program-of-Layers in LLMs (cs.LG updates on arXiv.org) · arxiv.org · arXivLabs alphaXiv CatalyzeX DagsHub Gotit.pub Hugging Face ScienceCast
  • 2026-06-08: Beyond Rubrics: Exploration-Guided Evaluation Skills for Reward Modeling (cs.CL updates on arXiv.org) · arxiv.org · LLM Evals Context Engineering Qwen3-8B DeepSeek V4 Flash
  • 2026-06-08: CrowdMath: A Dataset of Crowdsourced Mathematical Research Discussions (cs.AI updates on arXiv.org) · arxiv.org · LLM Evals MIT Art of Problem Solving AoPS
  • 2026-06-08: Does Topic Sentiment Cause Perceived Ideology? Comparing Human and LLM Annotations in Political News Articles (cs.CL updates on arXiv.org) · arxiv.org · LLM Evals AllSides gpt-4o-mini Llama-3.3-70b
  • 2026-06-08: NTILC: Neural Tool Invocation via Learned Compression (cs.AI updates on arXiv.org) · arxiv.org · Tool Use Agents Context Engineering
  • 2026-06-08: The Masked Advantage: Uncovering Local-Language Access to Cultural Knowledge in LLMs (cs.CL updates on arXiv.org) · arxiv.org · LLM Evals
  • 2026-06-08: SCALE: Scalable DRL Scheduler for Agentic Workflows (cs.LG updates on arXiv.org) · arxiv.org · Agents
  • 2026-06-08: UrduMMLU: A Massive Multitask Benchmark for Urdu Language Understanding (cs.CL updates on arXiv.org) · arxiv.org · LLM Evals Gemini 3.5 Flash
  • 2026-06-08: mmPISA-bench: Evaluating Multilingual LLM Reasoning Across 43 Languages (cs.CL updates on arXiv.org) · arxiv.org · OECD
  • 2026-06-08: MADRAG: Multi-Agent Debate with Retrieval-Augmented Generation for Training-Free Analytic Essay Scoring (cs.CL updates on arXiv.org) · arxiv.org · Agents RAG LLM Evals
  • 2026-06-08: The Piggyback Hypothesis of Generalization: Explaining and Mitigating Emergent Misalignment (cs.CL updates on arXiv.org) · arxiv.org · Llama-3.1-8B
  • 2026-06-08: Explicit Evidence Grounding via Structured Inline Citation Generation (cs.CL updates on arXiv.org) · arxiv.org · RAG RAG Evaluation
  • 2026-06-08: Evidence-Grounded Ensemble Diagnosis of 802.11 Packet Captures (cs.LG updates on arXiv.org) · arxiv.org · LLM Evals
  • 2026-06-08: OpenSkill: Open-World Self-Evolution for LLM Agents (cs.CL updates on arXiv.org) · arxiv.org · Agents
  • 2026-06-08: P-Cast Precision in FP8 Attention: Sink-Induced Collapse and the Optimality of S=2^8 (cs.AI updates on arXiv.org) · arxiv.org · IEEE alphaXiv CatalyzeX DagsHub Hugging Face ScienceCast
  • 2026-06-09: PaperMentor: A Human-Centered Multi-Agent Writing Tutor for Overleaf (cs.CL updates on arXiv.org) · arxiv.org · Agents Overleaf GPT-5.2
  • 2026-06-09: REFLECT: Intervention-Supported Error Attribution for Silent Failures in LLM Agent Traces (cs.AI updates on arXiv.org) · arxiv.org · Agents
  • 2026-06-09: The Amplifying Mirror: Locating and Steering the Partisan Direction inside a Large Language Model (cs.CL updates on arXiv.org) · arxiv.org · Llama 3.1 8B Instruct
  • 2026-06-09: A Unifying View of Attention Sinks: Two Algorithms, Two Solutions (cs.LG updates on arXiv.org) · arxiv.org
  • 2026-06-09: ConSteer-RL: Confidence-Aware Reinforcement Learning for LLM Reasoning (cs.LG updates on arXiv.org) · arxiv.org · Hugging Face
  • 2026-06-09: WhiFlash: Accelerating Speculative Decoding with Token-Level Cross-Paradigm Routing (cs.LG updates on arXiv.org) · arxiv.org · Hugging Face EAGLE-3 DFlash
  • 2026-06-09: Testing the Black Box: Structural Barriers to Independent Evaluation of Consumer-Facing Health LLMs (cs.AI updates on arXiv.org) · arxiv.org · LLM Evals Zeamanuel Tesfaye
  • 2026-06-09: Support Vector Rubrics: Closing the Gap Between Self-Generated and Human Rubrics (cs.CL updates on arXiv.org) · arxiv.org · LLM Evals
  • 2026-06-09: Benchmarking Open-Ended Multi-Agent Coordination in Language Agents (cs.AI updates on arXiv.org) · arxiv.org · Agents LLM Evals Gemini-3.1-Pro-High GPT-5.4-High
  • 2026-06-09: More Yap Less Meaning: Uncovering Self-Improvement Behavior in SLMs (cs.CL updates on arXiv.org) · arxiv.org · Hugging Face alphaXiv CatalyzeX DagsHub Gotit.pub ScienceCast
  • 2026-06-09: PACE: Anytime-Valid Acceptance Tests for Self-Evolving Agents (cs.AI updates on arXiv.org) · arxiv.org · Agents LLM Evals Qwen2.5
  • 2026-06-09: Representational Similarity and Model Behavior in Multi-Agent Interaction (cs.CL updates on arXiv.org) · arxiv.org · Agents
  • 2026-06-09: Priors Persist Through Suppression: A Stroop Paradigm for Lexical Override (cs.CL updates on arXiv.org) · arxiv.org
  • 2026-06-09: Decision-Aware Memory Cards: Counterfactual-Inspired Context Selection and Compression for Tool-Using LLM Agents (cs.AI updates on arXiv.org) · arxiv.org · Agent Memory RAG Context Engineering Agents Reranking Opus Qwen Codex GPT 5.5 Qwen-QLoRA Qwen3.6-Plus Gemini-3.1-Pro-High Qwen3.5-122B-A10B
  • 2026-06-09: From ‘May’ to ‘Is’: Certainty Distortion in Language Model Rewriting (cs.CL updates on arXiv.org) · arxiv.org · claude-haiku-4-5
  • 2026-06-09: When Should an AI Scientist Stop? Verifiable Experiment Steering and Refusal for Autonomous Discovery (cs.LG updates on arXiv.org) · arxiv.org · Agents A-Lab Neel Tushar Shah
  • 2026-06-09: TinyJudge: Improving LLM Instruction Following via Lightweight Specialist Ensembles (cs.CL updates on arXiv.org) · arxiv.org · LLM Evals
  • 2026-06-09: Automatic Extraction of Structured Information from Brain MRI Reports Using LLaMA 3.1 (cs.AI updates on arXiv.org) · arxiv.org · Hugging Face Llama 3.1
  • 2026-06-09: Why Limit the Residual Stream to Layers and Not Tokens? Persistent Memory for Continuous Latent Reasoning (cs.AI updates on arXiv.org) · arxiv.org · Agent Memory GPT-2
  • 2026-06-09: Segment-level Tree Search for Long Meeting Summarization (cs.CL updates on arXiv.org) · arxiv.org · Chunking alphaXiv CatalyzeX DagsHub Gotit.pub Hugging Face ScienceCast
  • 2026-06-09: Semantic Cache Distillation: Efficient State Transfer via Reuse and Selective Patching (cs.LG updates on arXiv.org) · arxiv.org · Hugging Face DagsHub Gotit.pub ScienceCast
  • 2026-06-09: Cutting LLM Evaluation Costs with SySRs: A Bandit Algorithm that Provably Exploits Model Similarity (cs.LG updates on arXiv.org) · arxiv.org · LLM Evals Hugging Face Florian E. Dorner
  • 2026-06-09: SurgiQ: A Large-Scale Multi-Domain Benchmark for Evaluating Surgical Understanding in Large Language Models (cs.CL updates on arXiv.org) · arxiv.org · LLM Evals Hugging Face Qwen2.5
  • 2026-06-09: AsyncLane: Decoupling Refinement from Advancement in Diffusion Language Model Decoding (cs.CL updates on arXiv.org) · arxiv.org · LLaDA Dream
  • 2026-06-09: ReadingMachine: A Computational Methodology for Structured Corpus Reading and Large-Scale Synthesis (cs.CL updates on arXiv.org) · arxiv.org · arXivLabs
  • 2026-06-09: ROSUM-MCTS: Monte Carlo Tree Search-Inspired HDL Code Summarization (cs.CL updates on arXiv.org) · arxiv.org · Prashanth Vijayaraghavan
  • 2026-06-09: Bidirectional Small-Granularity Search between Code and Text (cs.CL updates on arXiv.org) · arxiv.org · OpenAI Enrique Noriega-Atala GPT-4
  • 2026-06-09: MemoPilot: Training Memory Updates for LLM Agents with Reinforcement Learning (cs.CL updates on arXiv.org) · arxiv.org · Agent Memory Agents DeepSeek-V3.2
  • 2026-06-09: Debiasing Fine-Tuning via Post-Hoc Spectral Compression of Updates (cs.LG updates on arXiv.org) · arxiv.org
  • 2026-06-09: Contract2Tool: Learning Preconditions and Effects for Reliable Tool-Augmented LLM Agents (cs.AI updates on arXiv.org) · arxiv.org · Agents Tool Use
  • 2026-06-09: ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research (cs.LG updates on arXiv.org) · arxiv.org · Agents Code Agents LLM Evals Anthropic Claude Opus 4.7
  • 2026-06-09: Repetition Mismatch in Pre-training Data Mixture Optimization (cs.LG updates on arXiv.org) · arxiv.org
  • 2026-06-09: Beyond Agent Architecture: Execution Assumptions and Reproducibility in LLM-Based Trading Systems (cs.AI updates on arXiv.org) · arxiv.org · LLM Evals Agents alphaXiv CatalyzeX DagsHub Gotit.pub Hugging Face ScienceCast
  • 2026-06-09: A Framework for Evaluating and Benchmarking Concept Drift Detection Methods (cs.LG updates on arXiv.org) · arxiv.org
  • 2026-06-09: Building Comparative Motivation Profiles with Instrumental Interventions (cs.CL updates on arXiv.org) · arxiv.org · LLM Evals Hugging Face Llama-3.1-70B Llama-3.1-405B Qwen-2.5-72B

FAQ

What is arXiv?

arXiv is an open-access preprint repository, tracked here as an AI ecosystem entity by the GROUNDING AI Knowledge Radar.

What does this page track?

Dated radar mentions, source links, related concepts, and builder-relevant context for arXiv, collected automatically by GROUNDING.

When was arXiv last mentioned?

arXiv was most recently mentioned in a radar update dated 2026-06-09.