arXiv

arXiv is an AI ecosystem entity tracked by the GROUNDING AI Knowledge Radar. This page collects dated mentions, source links, related concepts, and builder-relevant context for arXiv.

Recent Updates

2026-06-08: Skip a Layer or Loop It? Learning Program-of-Layers in LLMs (cs.LG updates on arXiv.org) · arxiv.org — arXivLabs, alphaXiv, CatalyzeX, DagsHub, Gotit.pub, Hugging Face, ScienceCast
2026-06-08: Beyond Rubrics: Exploration-Guided Evaluation Skills for Reward Modeling (cs.CL updates on arXiv.org) · arxiv.org — LLM Evals, Context Engineering, Qwen3-8B, DeepSeek V4 Flash
2026-06-08: CrowdMath: A Dataset of Crowdsourced Mathematical Research Discussions (cs.AI updates on arXiv.org) · arxiv.org — LLM Evals, MIT, Art of Problem Solving, AoPS
2026-06-08: Does Topic Sentiment Cause Perceived Ideology? Comparing Human and LLM Annotations in Political News Articles (cs.CL updates on arXiv.org) · arxiv.org — LLM Evals, AllSides, gpt-4o-mini, Llama-3.3-70b
2026-06-08: NTILC: Neural Tool Invocation via Learned Compression (cs.AI updates on arXiv.org) · arxiv.org — Tool Use, Agents, Context Engineering
2026-06-08: The Masked Advantage: Uncovering Local-Language Access to Cultural Knowledge in LLMs (cs.CL updates on arXiv.org) · arxiv.org — LLM Evals
2026-06-08: SCALE: Scalable DRL Scheduler for Agentic Workflows (cs.LG updates on arXiv.org) · arxiv.org — Agents
2026-06-08: UrduMMLU: A Massive Multitask Benchmark for Urdu Language Understanding (cs.CL updates on arXiv.org) · arxiv.org — LLM Evals, Gemini 3.5 Flash
2026-06-08: mmPISA-bench: Evaluating Multilingual LLM Reasoning Across 43 Languages (cs.CL updates on arXiv.org) · arxiv.org — OECD
2026-06-08: MADRAG: Multi-Agent Debate with Retrieval-Augmented Generation for Training-Free Analytic Essay Scoring (cs.CL updates on arXiv.org) · arxiv.org — Agents, RAG, LLM Evals
2026-06-08: The Piggyback Hypothesis of Generalization: Explaining and Mitigating Emergent Misalignment (cs.CL updates on arXiv.org) · arxiv.org — Llama-3.1-8B
2026-06-08: Explicit Evidence Grounding via Structured Inline Citation Generation (cs.CL updates on arXiv.org) · arxiv.org — RAG, RAG Evaluation
2026-06-08: Evidence-Grounded Ensemble Diagnosis of 802.11 Packet Captures (cs.LG updates on arXiv.org) · arxiv.org — LLM Evals
2026-06-08: OpenSkill: Open-World Self-Evolution for LLM Agents (cs.CL updates on arXiv.org) · arxiv.org — Agents
2026-06-08: P-Cast Precision in FP8 Attention: Sink-Induced Collapse and the Optimality of S=2^8 (cs.AI updates on arXiv.org) · arxiv.org — IEEE, alphaXiv, CatalyzeX, DagsHub, Hugging Face, ScienceCast
2026-06-09: PaperMentor: A Human-Centered Multi-Agent Writing Tutor for Overleaf (cs.CL updates on arXiv.org) · arxiv.org — Agents, Overleaf, GPT-5.2
2026-06-09: REFLECT: Intervention-Supported Error Attribution for Silent Failures in LLM Agent Traces (cs.AI updates on arXiv.org) · arxiv.org — Agents
2026-06-09: The Amplifying Mirror: Locating and Steering the Partisan Direction inside a Large Language Model (cs.CL updates on arXiv.org) · arxiv.org — Llama 3.1 8B Instruct
2026-06-09: A Unifying View of Attention Sinks: Two Algorithms, Two Solutions (cs.LG updates on arXiv.org) · arxiv.org
2026-06-09: ConSteer-RL: Confidence-Aware Reinforcement Learning for LLM Reasoning (cs.LG updates on arXiv.org) · arxiv.org — Hugging Face
2026-06-09: WhiFlash: Accelerating Speculative Decoding with Token-Level Cross-Paradigm Routing (cs.LG updates on arXiv.org) · arxiv.org — Hugging Face, EAGLE-3, DFlash
2026-06-09: Testing the Black Box: Structural Barriers to Independent Evaluation of Consumer-Facing Health LLMs (cs.AI updates on arXiv.org) · arxiv.org — LLM Evals, Zeamanuel Tesfaye
2026-06-09: Support Vector Rubrics: Closing the Gap Between Self-Generated and Human Rubrics (cs.CL updates on arXiv.org) · arxiv.org — LLM Evals
2026-06-09: Benchmarking Open-Ended Multi-Agent Coordination in Language Agents (cs.AI updates on arXiv.org) · arxiv.org — Agents, LLM Evals, Gemini-3.1-Pro-High, GPT-5.4-High
2026-06-09: More Yap Less Meaning: Uncovering Self-Improvement Behavior in SLMs (cs.CL updates on arXiv.org) · arxiv.org — Hugging Face, alphaXiv, CatalyzeX, DagsHub, Gotit.pub, ScienceCast
2026-06-09: PACE: Anytime-Valid Acceptance Tests for Self-Evolving Agents (cs.AI updates on arXiv.org) · arxiv.org — Agents, LLM Evals, Qwen2.5
2026-06-09: Representational Similarity and Model Behavior in Multi-Agent Interaction (cs.CL updates on arXiv.org) · arxiv.org — Agents
2026-06-09: Priors Persist Through Suppression: A Stroop Paradigm for Lexical Override (cs.CL updates on arXiv.org) · arxiv.org
2026-06-09: Decision-Aware Memory Cards: Counterfactual-Inspired Context Selection and Compression for Tool-Using LLM Agents (cs.AI updates on arXiv.org) · arxiv.org — Agent Memory, RAG, Context Engineering, Agents, Reranking, Opus, Qwen, Codex, GPT 5.5, Qwen-QLoRA, Qwen3.6-Plus, Gemini-3.1-Pro-High, Qwen3.5-122B-A10B
2026-06-09: From ‘May’ to ‘Is’: Certainty Distortion in Language Model Rewriting (cs.CL updates on arXiv.org) · arxiv.org — claude-haiku-4-5
2026-06-09: When Should an AI Scientist Stop? Verifiable Experiment Steering and Refusal for Autonomous Discovery (cs.LG updates on arXiv.org) · arxiv.org — Agents, A-Lab, Neel Tushar Shah
2026-06-09: TinyJudge: Improving LLM Instruction Following via Lightweight Specialist Ensembles (cs.CL updates on arXiv.org) · arxiv.org — LLM Evals
2026-06-09: Automatic Extraction of Structured Information from Brain MRI Reports Using LLaMA 3.1 (cs.AI updates on arXiv.org) · arxiv.org — Hugging Face, Llama 3.1
2026-06-09: Why Limit the Residual Stream to Layers and Not Tokens? Persistent Memory for Continuous Latent Reasoning (cs.AI updates on arXiv.org) · arxiv.org — Agent Memory, GPT-2
2026-06-09: Segment-level Tree Search for Long Meeting Summarization (cs.CL updates on arXiv.org) · arxiv.org — Chunking, alphaXiv, CatalyzeX, DagsHub, Gotit.pub, Hugging Face, ScienceCast
2026-06-09: Semantic Cache Distillation: Efficient State Transfer via Reuse and Selective Patching (cs.LG updates on arXiv.org) · arxiv.org — Hugging Face, DagsHub, Gotit.pub, ScienceCast
2026-06-09: Cutting LLM Evaluation Costs with SySRs: A Bandit Algorithm that Provably Exploits Model Similarity (cs.LG updates on arXiv.org) · arxiv.org — LLM Evals, Hugging Face, Florian E. Dorner
2026-06-09: SurgiQ: A Large-Scale Multi-Domain Benchmark for Evaluating Surgical Understanding in Large Language Models (cs.CL updates on arXiv.org) · arxiv.org — LLM Evals, Hugging Face, Qwen2.5
2026-06-09: AsyncLane: Decoupling Refinement from Advancement in Diffusion Language Model Decoding (cs.CL updates on arXiv.org) · arxiv.org — LLaDA, Dream
2026-06-09: ReadingMachine: A Computational Methodology for Structured Corpus Reading and Large-Scale Synthesis (cs.CL updates on arXiv.org) · arxiv.org — arXivLabs
2026-06-09: ROSUM-MCTS: Monte Carlo Tree Search-Inspired HDL Code Summarization (cs.CL updates on arXiv.org) · arxiv.org — Prashanth Vijayaraghavan
2026-06-09: Bidirectional Small-Granularity Search between Code and Text (cs.CL updates on arXiv.org) · arxiv.org — OpenAI, Enrique Noriega-Atala, GPT-4
2026-06-09: MemoPilot: Training Memory Updates for LLM Agents with Reinforcement Learning (cs.CL updates on arXiv.org) · arxiv.org — Agent Memory, Agents, DeepSeek-V3.2
2026-06-09: Debiasing Fine-Tuning via Post-Hoc Spectral Compression of Updates (cs.LG updates on arXiv.org) · arxiv.org
2026-06-09: Contract2Tool: Learning Preconditions and Effects for Reliable Tool-Augmented LLM Agents (cs.AI updates on arXiv.org) · arxiv.org — Agents, Tool Use
2026-06-09: ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research (cs.LG updates on arXiv.org) · arxiv.org — Agents, Code Agents, LLM Evals, Anthropic, Claude Opus 4.7
2026-06-09: Repetition Mismatch in Pre-training Data Mixture Optimization (cs.LG updates on arXiv.org) · arxiv.org
2026-06-09: Beyond Agent Architecture: Execution Assumptions and Reproducibility in LLM-Based Trading Systems (cs.AI updates on arXiv.org) · arxiv.org — LLM Evals, Agents, alphaXiv, CatalyzeX, DagsHub, Gotit.pub, Hugging Face, ScienceCast
2026-06-09: A Framework for Evaluating and Benchmarking Concept Drift Detection Methods (cs.LG updates on arXiv.org) · arxiv.org
2026-06-09: Building Comparative Motivation Profiles with Instrumental Interventions (cs.CL updates on arXiv.org) · arxiv.org — LLM Evals, Hugging Face, Llama-3.1-70B, Llama-3.1-405B, Qwen-2.5-72B

FAQ

What is arXiv?

arXiv (arxiv.org) is an open-access preprint repository. Its daily cs.LG, cs.CL, and cs.AI listings are the source of the paper mentions on this page, which is why the GROUNDING AI Knowledge Radar tracks it as an AI ecosystem entity.

What does this page track?

Dated radar mentions, source links, related concepts, and builder-relevant context for arXiv, collected automatically by GROUNDING.

When was arXiv last mentioned?

arXiv was most recently mentioned in a radar update dated 2026-06-09.
