Today’s AI & Tech Briefing (June 21, 2026)

Today’s selection of 8 noteworthy AI/ML papers from arXiv, covering bias propagation in multi-agent LLM systems, knowledge conflict resolution, psychometrically aware essay scoring, negation comprehension in remote sensing multimodal LLMs, LLM-based malware classification, compiler optimization, ranking-driven multimodal retrieval, and off-policy evaluation with missing rewards.

1. Contagion Networks: Evaluator Bias Propagation in Multi-Agent LLM Systems

Authors: Zewen Liu | Categories: cs.LG, cs.AI, cs.MA | Link: arxiv.org/abs/2606.20493v1

Introduces a formal framework for measuring how evaluator biases spread across interacting LLM agents. In controlled experiments using DeepSeek-chat, cross-agent bias contagion coefficients fell between 0.157 and 0.352, and increasing the evaluator committee size from k=1 to k=3 reduced effective contagion by 72.4%.

Takeaway: A timely framework that identifies bias propagation as a critical failure mode in multi-agent LLM systems and offers a simple, actionable mitigation: larger evaluation committees.
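
The summary does not say how the contagion coefficient is computed. One plausible operationalization, assumed here and possibly different from the paper's definition, is the least-squares slope of a downstream agent's score shift on the bias injected into an upstream evaluator. The sketch also includes a naive dilution model for committees, in which only one of k evaluators is biased and their scores are averaged. Both function names are hypothetical.

```python
from statistics import mean


def contagion_coefficient(source_bias: list[float], target_shift: list[float]) -> float:
    """Least-squares slope of target_shift on source_bias.

    Hypothetical operationalization: how far a downstream agent's score moves
    per unit of bias injected into an upstream evaluator.
    """
    if len(source_bias) != len(target_shift) or len(source_bias) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = mean(source_bias), mean(target_shift)
    cov = sum((x - mx) * (y - my) for x, y in zip(source_bias, target_shift))
    var = sum((x - mx) ** 2 for x in source_bias)
    if var == 0:
        raise ValueError("source_bias must vary")
    return cov / var


def diluted_bias(bias: float, k: int) -> float:
    """Bias left in a k-member committee's mean score when only one member is biased."""
    if k < 1:
        raise ValueError("committee size k must be >= 1")
    return bias / k


if __name__ == "__main__":
    injected = [0.0, 1.0, 2.0, 3.0]
    observed = [0.1 + 0.25 * x for x in injected]  # synthetic data, slope 0.25 by construction
    print(f"{contagion_coefficient(injected, observed):.3f}")  # 0.250
    print(f"bias removed at k=3: {1 - diluted_bias(1.0, 3):.1%}")  # 66.7%
```

Under this naive averaging model, moving from k=1 to k=3 removes two-thirds of the passed-on bias. The 72.4% reduction reported above is the paper's experimental measurement and is not derived from this formula.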

2. Navigating Unreliable Parametric and Contextual Knowledge: Explicit Knowledge Conflict Resolution for LLM Inference

Authors: Huang Peng, Jiuyang Tang, Weixin Zeng, Hao Xu, Xiang Zhao | Categories: cs.AI | Link: arxiv.org/abs/2606.20245v1

Proposes MACR, a multi-agent reasoning framework that, rather than privileging either internal (parametric) or external (contextual) knowledge, explicitly detects and resolves conflicts between them. It uses semantic entropy to assess confidence and three specialized agents to induce rules, analyze conflicts, and resolve inconsistencies across all available contexts.

Takeaway: An important step toward reliable LLM reasoning in the real world, where both the model and its inputs can be wrong.
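
Semantic entropy measures uncertainty over meanings rather than token strings: sample several answers, group those that express the same meaning, and take the entropy of the cluster frequencies. The sketch below is a minimal count-based version. The equivalence test is an assumption: published semantic-entropy work checks bidirectional entailment with an NLI model, whereas the hypothetical `normalized_match` stand-in here just compares lowercased strings. How MACR thresholds or otherwise uses the resulting score is not described in the summary.

```python
import math
from typing import Callable


def cluster_by_meaning(answers: list[str], equivalent: Callable[[str, str], bool]) -> list[list[str]]:
    """Greedily assign each answer to the first cluster whose representative it matches."""
    clusters: list[list[str]] = []
    for answer in answers:
        for cluster in clusters:
            if equivalent(answer, cluster[0]):
                cluster.append(answer)
                break
        else:  # no matching cluster found: start a new one
            clusters.append([answer])
    return clusters


def semantic_entropy(answers: list[str], equivalent: Callable[[str, str], bool]) -> float:
    """Entropy (in nats) of the empirical distribution over meaning clusters."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    n = len(answers)
    clusters = cluster_by_meaning(answers, equivalent)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)


def normalized_match(a: str, b: str) -> bool:
    """Hypothetical stand-in for an entailment check: case- and trailing-period-insensitive equality."""
    return a.strip().lower().rstrip(".") == b.strip().lower().rstrip(".")


if __name__ == "__main__":
    samples = ["Paris", "paris.", "Lyon", "lyon"]
    print(semantic_entropy(samples, normalized_match))  # ln 2, about 0.693: two equally likely meanings
```

Low entropy means the sampled answers agree in meaning (high confidence); high entropy signals that the model's parametric knowledge on the question is unstable.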

3. PsyScore: A Psychometrically-Aware Framework for Trait-Adaptive Essay Scoring and ZPD-Scaffolded Feedback

Authors: Wei Xia, Jin Wu, Haoran Shi, Xiangyu Wang, Chanjin Zheng | Categories: cs.CL | Link: arxiv.org/abs/2606.20287v1

Integrates psychometric modeling (Graded Partial Credit Model) into a neural scorer for interpretable ability estimation, then conditions multi-agent feedback strategies on the estimated ability to provide feedback scaffolded to the learner's Zone of Proximal Development (ZPD). Achieves competitive scoring performance while generating pedagogically aligned feedback.

Takeaway: Bridges the gap between reliable automated assessment and actionable, learner-adaptive feedback, a practical advance for AI in education.
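
To show how a partial-credit IRT model turns a latent ability into score probabilities, here is a sketch assuming the standard generalized partial credit form, P(X = k | θ) ∝ exp(Σ_{v≤k} a(θ − b_v)). The parameter names `discrimination` (a) and `thresholds` (b_v) are illustrative; the summary does not give PsyScore's exact parameterization or how its neural scorer estimates these quantities.

```python
import math


def gpcm_probs(theta: float, thresholds: list[float], discrimination: float = 1.0) -> list[float]:
    """Probability of each score category 0..m for ability theta (generalized partial credit form)."""
    # Category k's logit is the running sum of a * (theta - b_v) for v = 1..k; category 0 has logit 0.
    logits = [0.0]
    for b in thresholds:
        logits.append(logits[-1] + discrimination * (theta - b))
    top = max(logits)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta: float, thresholds: list[float], discrimination: float = 1.0) -> float:
    """Mean score category under the model: a smooth ability-to-score curve."""
    probs = gpcm_probs(theta, thresholds, discrimination)
    return sum(k * p for k, p in enumerate(probs))


if __name__ == "__main__":
    steps = [-1.0, 1.0]  # hypothetical thresholds for a trait scored 0-2
    for theta in (-2.0, 0.0, 2.0):
        probs = [round(p, 3) for p in gpcm_probs(theta, steps)]
        print(theta, probs, round(expected_score(theta, steps), 3))
```

With symmetric thresholds, an ability of 0 gives an expected score of exactly the middle category (1.0), and the expected score rises monotonically with θ. That monotone link is what makes the ability estimate interpretable and usable for choosing how much scaffolding to give.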

4. Evaluating and Enhancing Negation Comprehension in Remote Sensing MLLMs

Authors: Haochen Han, Jue Wang, Alex Jinpeng Wang, Fangming Liu | Categories: cs.CV, cs.AI | Link: arxiv.org/abs/2606.20177v1

Introduces RS-Neg, the first benchmark for evaluating negation understanding in remote sensing MLLMs, and shows that even advanced models struggle with “what is not there.” Proposes NeFo, a test-time learning method that adapts on roughly 5% of the test samples without labels, significantly improves negation comprehension, and generalizes to unseen tasks.

Takeaway: Addresses an important blind spot in safety-critical remote sensing applications (e.g., finding non-flooded evacuation routes) with a lightweight, effective fix.

5. Multi-View Decompilation for LLM-Based Malware Classification

Authors: Bercan Turkmen, Vyas Raina | Categories: cs.CR, cs.AI | Link: arxiv.org/abs/2606.20436v1

Shows that providing LLMs with decompiled pseudo-C from both Ghidra and RetDec for the same binary improves malicious-class F1 scores over single-decompiler baselines, primarily by increasing recall. Agreement analysis reveals the two decompilers make partially different errors, offering complementary evidence.

Takeaway: A simple, training-free prompting strategy that meaningfully boosts practical LLM-based malware triage.
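
A minimal sketch of the two ideas in this summary: placing both decompiler outputs for one binary in a single prompt, and measuring how much the two single-view classifiers' errors overlap. The prompt wording, the per-view truncation limit, the binary malicious/benign label, and the function names are all hypothetical. The sketch does not run Ghidra or RetDec; it assumes their pseudo-C has already been exported as text.

```python
def build_multiview_prompt(ghidra_c: str, retdec_c: str, max_chars: int = 12000) -> str:
    """Combine two decompiler views of the same binary into one classification prompt."""
    # Truncate each view separately so one long decompilation cannot crowd out the other.
    views = [("Ghidra", ghidra_c[:max_chars]), ("RetDec", retdec_c[:max_chars])]
    sections = [f"### Pseudo-C from {name}\n{code}" for name, code in views]
    return (
        "Two decompilers produced the pseudo-C below for the same binary. "
        "Either may contain decompilation errors; use both as evidence.\n\n"
        + "\n\n".join(sections)
        + "\n\nAnswer with exactly one word: malicious or benign."
    )


def error_overlap(labels: list[int], preds_a: list[int], preds_b: list[int]) -> dict[str, int]:
    """Count samples misclassified by both views, only view A, or only view B."""
    wrong_a = {i for i, (y, p) in enumerate(zip(labels, preds_a)) if p != y}
    wrong_b = {i for i, (y, p) in enumerate(zip(labels, preds_b)) if p != y}
    return {
        "both_wrong": len(wrong_a & wrong_b),
        "only_a_wrong": len(wrong_a - wrong_b),
        "only_b_wrong": len(wrong_b - wrong_a),
    }
```

Large `only_a_wrong` and `only_b_wrong` counts relative to `both_wrong` are what "partially different errors" means in practice: each view can supply evidence the other misses, which is consistent with the recall gains reported above.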

6. AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Authors: Zepeng Li, Jie Ren, Zhanyong Tang, Jie Zheng, Zheng Wang | Categories: cs.SE, cs.AI | Link: arxiv.org/abs/2606.20373v1

Opens the compiler to LLM agents, letting them query internal optimization states and intermediate representations and refine configurations iteratively based on runtime feedback. Achieves geometric-mean speedups over LLVM -O3 of 1.043x on x86-64 and 1.117x on ARM64, without any fine-tuning.

Takeaway: A promising direction for applying LLMs to systems optimization by treating the compiler as an observable system rather than a black box.
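
Speedup figures such as 1.043x and 1.117x are geometric means of per-benchmark ratios. The sketch below assumes each speedup is defined as baseline (-O3) runtime divided by tuned runtime; the function name and the runtimes in the example are hypothetical, not AutoPass measurements.

```python
import math


def geomean_speedup(baseline_times: list[float], tuned_times: list[float]) -> float:
    """Geometric mean of per-benchmark speedups, each = baseline runtime / tuned runtime."""
    if len(baseline_times) != len(tuned_times) or not baseline_times:
        raise ValueError("need equal-length, non-empty runtime lists")
    ratios = [b / t for b, t in zip(baseline_times, tuned_times)]
    # exp(mean(log r)) equals the n-th root of the product but avoids overflow on long suites.
    return math.exp(math.fsum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    # Hypothetical runtimes in seconds: one 2x speedup and one 2x slowdown.
    print(geomean_speedup([2.0, 1.0], [1.0, 2.0]))  # approximately 1.0: they cancel out
```

The geometric mean is the usual choice for ratios because it treats a 2x speedup and a 2x slowdown symmetrically, whereas the arithmetic mean of the same two ratios would report a misleading 1.25x.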

7. ELVA: Exploring Ranking-Driven Universal Multimodal Retrieval

Authors: Yuhan Liu, Pei Fu, Hang Li, Yukun Qi, Chao Jiang et al. | Categories: cs.IR, cs.AI | Link: arxiv.org/abs/2606.20280v1

Identifies “grain blindness” in contrastive learning for multimodal retrieval: models ignore fine-grained query constraints. Proposes ELVA, a rule-based reinforcement learning framework that extends verifiable rewards to retrieval and treats negatives differently depending on their similarity to the positives. Achieves state-of-the-art results and a 13.1% improvement on the newly introduced MRBench.

Takeaway: Addresses a fundamental limitation of contrastive learning for retrieval tasks with a principled ranking-based approach.
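
The summary describes a rule-based, verifiable reward that treats negatives differently depending on their similarity to the positive. The sketch below is a hypothetical reward in that spirit, not ELVA's actual formulation: it starts at 1.0 and subtracts a penalty for every negative scored at or above the positive, using a smaller weight for "hard" negatives (similarity to the positive above a threshold) on the assumption that they are harder to separate. The threshold, weights, and function name are illustrative, and the opposite weighting would be an equally reasonable design to test.

```python
def ranking_reward(
    scores: list[float],
    pos_idx: int,
    sim_to_pos: list[float],
    hard_threshold: float = 0.8,
    hard_weight: float = 0.5,
    easy_weight: float = 1.0,
) -> float:
    """Hypothetical similarity-aware ranking reward; 1.0 when the positive outranks every negative."""
    if len(scores) != len(sim_to_pos) or len(scores) < 2:
        raise ValueError("need one similarity per candidate and at least one negative")
    pos_score = scores[pos_idx]
    penalty = 0.0
    for i, (score, sim) in enumerate(zip(scores, sim_to_pos)):
        if i == pos_idx:
            continue  # the positive itself is never penalized
        if score >= pos_score:  # a negative ranked at or above the positive: a checkable error
            penalty += hard_weight if sim >= hard_threshold else easy_weight
    # Normalize by the number of negatives so the reward stays in [0, 1] for weights <= 1.
    return 1.0 - penalty / (len(scores) - 1)


if __name__ == "__main__":
    # Candidate 0 is the positive; candidate 1 is a near-duplicate (hard), candidate 2 is unrelated (easy).
    print(ranking_reward([0.4, 0.5, 0.6], pos_idx=0, sim_to_pos=[1.0, 0.9, 0.2]))  # 0.25
```

Because the reward depends only on the ranking and precomputed similarities, it can be checked by rule without a learned reward model, which is the property that "verifiable rewards" refers to.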

8. Off-Policy Evaluation for Missingness-Aware Policies in MDPs with Rewards Missing Not at Random

Authors: Ziheng Wei, Annie Qu, Rui Miao | Categories: stat.ML, cs.LG | Link: arxiv.org/abs/2606.20206v1

Formalizes off-policy evaluation in Markov decision processes when rewards are missing not at random (MNAR), a common issue in healthcare and marketing data. Uses future states as shadow variables, together with a bridge function, to recover the conditional mean reward without explicitly modeling the MNAR mechanism, and reports strong results on simulated data and MIMIC-III sepsis data.

Takeaway: A rigorous theoretical and practical contribution for a realistic but understudied problem in offline reinforcement learning.
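
A small worked calculation shows why rewards missing not at random are a problem in the first place. Here a binary reward is recorded with a probability that depends on the reward itself (the MNAR condition), so averaging only the observed rewards is biased. All numbers are hypothetical, and the sketch only illustrates the bias; it does not implement the paper's shadow-variable or bridge-function estimator.

```python
def complete_case_mean(values: list[float], probs: list[float], obs_prob: dict[float, float]) -> float:
    """Expected mean of the *observed* rewards when reward v is recorded with probability obs_prob[v]."""
    num = sum(p * obs_prob[v] * v for v, p in zip(values, probs))
    den = sum(p * obs_prob[v] for v, p in zip(values, probs))
    return num / den


if __name__ == "__main__":
    values = [0.0, 1.0]
    probs = [0.5, 0.5]               # hypothetical: the true mean reward is 0.5
    obs_prob = {0.0: 0.3, 1.0: 0.9}  # hypothetical: high rewards are recorded far more often
    true_mean = sum(v * p for v, p in zip(values, probs))
    print(true_mean, complete_case_mean(values, probs, obs_prob))  # 0.5 vs. about 0.75
```

The complete-case average overstates the true mean by 50%, and nothing in the observed rewards alone reveals this. If both reward values were recorded with the same probability, the function would return the true mean of 0.5. In the standard shadow-variable setup, a variable such as a future state is associated with the reward but, given the reward, carries no extra information about whether it was recorded; that is the kind of structure the paper exploits to correct the bias.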

This content was generated with AI assistance. Paper information sourced from arXiv.
