Tag: cs.CV | Cog AI Archive

Cog AI Archive

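Latest articles

Published articles: 64 · Tag: cs.CV

RedSage: A General-Purpose LLM Specialized for Cybersecurity

TL;DR: RedSage is an open-source, cybersecurity-specialized LLM developed to support a wide range of security tasks while avoiding privacy risks. It was pretrained on 11.8 billion tokens of domain-specific data and then tuned on 266,000 instruction samples generated by an agent-based pipeline, and it outperforms existing models.

AI Research 2026-01-29 · Views 40 · Clicks 16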

AI Research 2026-01-27 · Long read

MATA: A Trainable Hierarchical Automaton System for Multi-Agent Visual Reasoning

TL;DR: MATA is a multi-agent visual reasoning system organized as a hierarchical finite-state automaton whose top-level transitions are chosen by a trainable hyper agent.

arXiv abstract: Recent vision-language models have strong perceptual ability but their implicit reasoning is hard to explain and easily generates hallucinations on complex queries. Compositional methods improve interpretability, but most rely on a single agent or hand-crafted pipeline and cannot decide when to collaborate across complementary agents or compete among overlapping ones. We introduce MATA (Multi-Agent hierarchical Trainable Automaton), a multi-agent system presented as a hierarchical finite-state automaton for visual reasoning whose top-level transitions are chosen by a trainable hyper agent. Each agent corresponds to a state in the hyper automaton, and runs a small rule-based sub-automaton for reliable micro-control. All agents read and write a shared memory, yielding transparent execution history. To supervise the hyper agent's transition policy, we build transition-trajectory trees and transform to memory-to-next-state pairs, forming the MATA-SFT-90K dataset for supervised finetuning (SFT). The finetuned LLM as the transition policy understands the query and the capacity of agents, and it can efficiently choose the optimal agent to solve the task. Across multiple visual reasoning benchmarks, MATA achieves the state-of-the-art results compared with monolithic and compositional baselines. The code and dataset are available at https://github.com/ControlNet/MATA.

Paper information
- URL: http://arxiv.org/abs/2601.19204v1
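To make the control flow in the abstract concrete, here is a minimal sketch of a top-level automaton in which each agent is a state, all agents share one memory, and a policy maps the memory to the next state. Everything named here (`SharedMemory`, `perception_agent`, `answer_agent`, `transition_policy`, the `"FINAL"` state) is hypothetical and not taken from the MATA repository. The policy is a hand-written stand-in for the finetuned LLM, and each agent's rule-based sub-automaton is collapsed into a single function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SharedMemory:
    """Blackboard that every agent reads and writes; doubles as the execution log."""
    query: str
    facts: Dict[str, str] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)


# One agent corresponds to one state of the top-level automaton.
Agent = Callable[[SharedMemory], None]


def perception_agent(mem: SharedMemory) -> None:
    # Placeholder result; a real agent would call a vision model here.
    mem.facts["objects"] = "2 cats"
    mem.history.append("perception")


def answer_agent(mem: SharedMemory) -> None:
    mem.facts["answer"] = mem.facts.get("objects", "unknown")
    mem.history.append("answer")


AGENTS: Dict[str, Agent] = {
    "perception": perception_agent,
    "answer": answer_agent,
}


def transition_policy(mem: SharedMemory) -> str:
    """Assumed stand-in for the trained hyper agent: memory -> next state."""
    if "answer" in mem.facts:
        return "FINAL"
    if "objects" not in mem.facts:
        return "perception"
    return "answer"


def run(query: str, max_steps: int = 8) -> SharedMemory:
    """Step the top-level automaton until the policy selects the final state."""
    mem = SharedMemory(query=query)
    for _ in range(max_steps):
        state = transition_policy(mem)
        if state == "FINAL":
            break
        AGENTS[state](mem)
    return mem
```

Calling `run("How many cats are there?")` returns a memory whose `history` lists the visited states in order, which is the kind of transparent execution trace the abstract describes.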

Reading time: 0 min · 7,107 characters

Read →

AI Research 2026-01-27

SNR-Edit: Structure-Aware Noise Rectification for Inversion-Free Flow-Based Editing

TL;DR: Inversion-free image editing using flow-based generative models challenges the prevailing inversion-based pipelines.

Reading time: 0 min · 306 characters

Read →

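AI Research 2026-01-27

CLIP-Guided Unsupervised Semantic Exposure Correction

TL;DR: This work addresses two problems in correcting improperly exposed images: color shifts caused by ignoring the semantic information of the subject, and the lack of ground-truth training data. It introduces a network that integrates semantic information extracted from a pretrained FastSAM, together with a mechanism that uses CLIP to generate pseudo-ground-truth images automatically, which enables unsupervised training. The method outperforms existing approaches in correction quality.

Reading time: 0 min · 1,256 characters

Read →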

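AI Research 2026-01-27

LEMON: How Well Do MLLMs Perform Temporal Multimodal Understanding on Instructional Videos?

TL;DR: Recent MLLMs have advanced on vision, audio, and language tasks, but their performance on long, knowledge-intensive educational content was unknown. This work proposes LEMON, a benchmark focused on STEM lecture videos. It contains 2,277 video segments and 4,181 QA pairs, and shows that even state-of-the-art models such as GPT-4o struggle with temporal reasoning and prediction.

Reading time: 0 min · 1,378 characters

Read →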

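AI Research 2026-01-27

EPAS: Efficient Training with Progressive Activation Sharing

TL;DR: This work proposes EPAS, a new training method that exploits the redundancy of activations in the deeper layers of Transformer models. It reduces compute cost by gradually expanding the region of decoder layers that share activations during training.

Reading time: 0 min · 1,450 characters

Read →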

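AI Research 2026-01-26

FreeOrbit4D: Training-Free Arbitrary Camera Redirection for Monocular Videos via Geometry-Complete 4D Reconstruction

TL;DR: FreeOrbit4D targets the task of re-rendering a scene from a monocular video along an arbitrary camera trajectory, where large viewpoint changes cause geometric ambiguity and temporal inconsistency. It separates foreground from background and uses a diffusion model to reconstruct a geometry-complete 4D proxy, producing high-quality view-changed videos without any training.

Reading time: 0 min · 1,277 characters

Read →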

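AI Research 2026-01-26

RealStats: A Real-Image-Only Statistical Framework for Fake Image Detection

TL;DR: As generative models advance, detecting AI-generated images has become an important problem. Existing methods have weaknesses in interpretability and in robustness to distribution shift. This work proposes RealStats, a training-free framework that statistically evaluates how consistent an image is with the distribution of real images. It integrates multiple statistics as p-values and provides an interpretable probability score relative to a population of real images.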
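The idea of turning several statistics into p-values against a real-image reference set and merging them into one score can be sketched as follows. This is an illustration under stated assumptions, not the RealStats procedure: the summary above does not say which statistics are used or how the p-values are combined. Here the statistics are plain numbers, each p-value is a two-sided empirical p-value, and the merge is a Bonferroni-corrected minimum, chosen only because it stays valid when the statistics are dependent. All function names are hypothetical.

```python
from typing import List, Sequence


def empirical_p_value(reference: Sequence[float], value: float) -> float:
    """Two-sided empirical p-value of `value` against statistics of real images."""
    n = len(reference)
    # The +1 terms count the tested value itself, so the p-value is never 0.
    lower = (sum(r <= value for r in reference) + 1) / (n + 1)
    upper = (sum(r >= value for r in reference) + 1) / (n + 1)
    return min(1.0, 2.0 * min(lower, upper))


def combined_p_value(p_values: Sequence[float]) -> float:
    """Bonferroni-corrected minimum p-value (an assumed choice of combiner)."""
    return min(1.0, len(p_values) * min(p_values))


def real_image_score(
    reference_stats: List[Sequence[float]],
    image_stats: Sequence[float],
) -> float:
    """Score one image; reference_stats[j] holds statistic j over real images.

    A small score means the image is unusual relative to the real-image
    reference set on at least one statistic.
    """
    p_values = [
        empirical_p_value(ref, x) for ref, x in zip(reference_stats, image_stats)
    ]
    return combined_p_value(p_values)
```

Because the reference set contains only real images, nothing is trained on generated images. The score reads as "how surprising would this image be if it were real", which is what makes it interpretable.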

Reading time: 0 min · 1,298 characters

Read →

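AI Research 2026-01-26

Advances and Innovations in the Multi-Agent Robotic System (MARS) Challenge

TL;DR: As embodied AI advances, multi-agent systems that can handle complex tasks are becoming increasingly important. This paper proposes and gives an overview of the MARS Challenge, held at NeurIPS 2025.

Reading time: 0 min · 1,553 characters

Read →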

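AI Research 2026-01-26

AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning

TL;DR: AdaReasoner is a family of multimodal large language models (MLLMs) that learn tool use as a general reasoning skill rather than as a tool-specific behavior. By combining a large-scale data pipeline, reinforcement learning (Tool-GRPO), and an adaptive learning mechanism, it generalizes to unseen tools and autonomously selects or suppresses tools depending on the task.

Reading time: 0 min · 1,353 characters

Read →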

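AI Research 2026-01-26

On Procrustes Contamination in Machine Learning Applications of Geometric Morphometrics

TL;DR: When geometric morphometrics (GMM) is applied to machine learning, it is common to align all specimens with generalized Procrustes analysis (GPA) before splitting the data, but this can cause data contamination. This work quantifies the effect of that contamination and shows how to avoid it by proposing a new method that aligns the test data to the training data.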
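One way to keep test specimens out of the alignment step is to compute the Procrustes consensus from the training set only and then superimpose each test specimen onto that fixed consensus. The NumPy sketch below illustrates this general idea. It is an assumption about what "aligning the test data to the training data" could look like, not the paper's exact procedure, and the function names are hypothetical. Landmark configurations are arrays of shape `(n_landmarks, n_dims)`.

```python
import numpy as np


def normalize(shape: np.ndarray) -> np.ndarray:
    """Remove translation and scale: center at the origin, unit centroid size."""
    centered = shape - shape.mean(axis=0)
    return centered / np.linalg.norm(centered)


def rotate_onto(shape: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Orthogonal Procrustes via SVD: best orthogonal map of `shape` onto `target`.

    Note: this simple version also allows reflections.
    """
    u, _, vt = np.linalg.svd(shape.T @ target)
    return shape @ (u @ vt)


def fit_consensus(train_shapes, n_iter: int = 10):
    """Generalized Procrustes analysis using the training specimens only."""
    normalized = [normalize(s) for s in train_shapes]
    consensus = normalized[0]
    for _ in range(n_iter):
        aligned = [rotate_onto(s, consensus) for s in normalized]
        consensus = normalize(np.mean(aligned, axis=0))
    # Final pass so the returned training shapes match the returned consensus.
    aligned = np.stack([rotate_onto(s, consensus) for s in normalized])
    return consensus, aligned


def align_to_consensus(shape: np.ndarray, consensus: np.ndarray) -> np.ndarray:
    """Superimpose a held-out specimen onto the fixed training consensus."""
    return rotate_onto(normalize(shape), consensus)
```

With this split, the consensus never depends on test specimens, so a downstream classifier trained on the aligned training shapes sees test shapes that were aligned without any information flowing from test to train.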

Reading time: 0 min · 1,324 characters

Read →

The latest trends in generative AI, in an easy-to-read archive.
