Selected Top-tier Publications
[Note] Selected peer-reviewed papers are listed below. For the full and most up-to-date publication list, see Google Scholar: Qi She
★ Selected Highlights

MammothModa2: A Unified AR-Diffusion Framework for Multimodal Understanding and Generation

Beyond text-visual attention: Exploiting visual cues for effective token pruning in VLMs

MammothModa: Multi-modal large language model

Generative Adversarial Networks in Computer Vision: A Survey and Taxonomy

On Learning Contrastive Representations for Learning with Noisy Labels

Learning from Temporal Gradient for Semi-supervised Action Recognition

MINE: Towards Continuous Depth MPI with NeRF for Novel View Synthesis

Involution: Inverting the Inherence of Convolution for Visual Recognition

ACTION-Net: Multipath Excitation for Action Recognition

OpenLORIS-Object: A Robotic Vision Dataset and Benchmark for Lifelong Deep Learning

Are We Ready for Service Robots? The OpenLORIS-Scene Datasets for Lifelong SLAM
Journal 7
- Background-aware Classification Activation Map for Weakly Supervised Object Localization. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023.
- Generative Adversarial Networks in Time Series: A Systematic Literature Review. ACM Computing Surveys (CSUR), 2023.
- Power Law in Deep Neural Networks: Sparse Network Generation and Continual Learning With Preferential Attachment. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022.
- Generative Adversarial Networks in Computer Vision: A Survey and Taxonomy. ACM Computing Surveys (CSUR), 2021.
- An Efficient and Flexible Spike Train Model via Empirical Bayes. IEEE Transactions on Signal Processing (TSP), 2021.
Conference 23
- Video-KTR: Reinforcing Video Reasoning via Key Token Attribution. International Conference on Learning Representations (ICLR), 2026.
- ThinkGen: Generalized Thinking for Visual Generation. Conference on Computer Vision and Pattern Recognition (CVPR), 2026.
- CodeDance: A Dynamic Tool-integrated MLLM for Executable Visual Reasoning. Conference on Computer Vision and Pattern Recognition (CVPR), 2026.
- UniEdit-I: Training-free Image Editing for Unified VLM via Iterative Understanding, Editing and Verifying. Conference on Computer Vision and Pattern Recognition (CVPR), 2026.
- TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinforcement Learning. International Conference on Learning Representations (ICLR), 2026.
- BranchGRPO: Stable and efficient GRPO with structured branching in diffusion models. International Conference on Learning Representations (ICLR), 2026.
- Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs. Neural Information Processing Systems (NeurIPS), 2025.
- Beyond text-visual attention: Exploiting visual cues for effective token pruning in VLMs. International Conference on Computer Vision (ICCV), 2025.
- PDO-s3DCNNs: Partial Differential Operator Based Steerable 3D CNNs. International Conference on Machine Learning (ICML), 2022.
- Weakly Supervised Object Localization as Domain Adaption. Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
- On Learning Contrastive Representations for Learning with Noisy Labels. Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
- Learning from Temporal Gradient for Semi-supervised Action Recognition. Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
- Unifying Nonlocal Blocks for Neural Networks. International Conference on Computer Vision (ICCV), 2021.
- MT-ORL: Multi-Task Occlusion Relationship Learning. International Conference on Computer Vision (ICCV), 2021.
- MINE: Towards Continuous Depth MPI with NeRF for Novel View Synthesis. International Conference on Computer Vision (ICCV), 2021.
- Involution: Inverting the Inherence of Convolution for Visual Recognition. Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
- ACTION-Net: Multipath Excitation for Action Recognition. Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
- Learning the Superpixel in a Non-iterative and Lifelong Manner. Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
- OpenLORIS-Object: A Robotic Vision Dataset and Benchmark for Lifelong Deep Learning. The International Conference on Robotics and Automation (ICRA), 2020.
- Are We Ready for Service Robots? The OpenLORIS-Scene Datasets for Lifelong SLAM. The International Conference on Robotics and Automation (ICRA), 2020.
- Neural Dynamics Discovery via Gaussian Process Recurrent Neural Networks. Uncertainty in Artificial Intelligence (UAI), 2019.
- Reduced-Rank Linear Dynamical Systems. AAAI Conference on Artificial Intelligence (AAAI), 2018.
- Stochastic Dynamical Systems Based Latent Structure Discovery in High-dimensional Time Series. The International Conference on Acoustics, Speech, & Signal Processing (ICASSP), 2018.
Workshop 3
- Avalanche: an End-to-End Library for Continual Learning. CVPR 2021, Workshop on Continual Learning in Computer Vision.
- CatNet: Class Incremental 3D ConvNets for Lifelong Egocentric Gesture Recognition. CVPR 2020, Workshop on Continual Learning in Computer Vision.
- A Neuro-AI Interface for Evaluating Generative Adversarial Networks. ICLR 2020, Workshop on Bridging AI and Cognitive Science.
Preprint 9
- MammothModa2: A Unified AR-Diffusion Framework for Multimodal Understanding and Generation. arXiv preprint arXiv:2511.18262, 2025.
- ChainV: Atomic Visual Hints Make Multimodal Reasoning Shorter and Better. arXiv preprint arXiv:2511.17106, 2025.
- On the Faithfulness of Visual Thinking: Measurement and Enhancement. arXiv preprint arXiv:2510.23482, 2025.
- Loss-Oriented Ranking for Automated Visual Prompting in LVLMs. arXiv preprint arXiv:2506.16112, 2025.
- FastInit: Fast Noise Initialization for Temporally Consistent Video Generation. arXiv preprint arXiv:2506.16119, 2025.
- TimeSearch: Hierarchical video search with spotlight and reflection for human-like long video understanding. arXiv preprint arXiv:2504.01407, 2025.
- ASGDiffusion: Parallel High-Resolution Generation with Asynchronous Structure Guidance. arXiv preprint arXiv:2412.06163, 2024.
- MC-LLaVA: Multi-concept personalized vision-language model. arXiv preprint arXiv:2411.11706, 2024.
Patent 2
- Object identification based on adaptive learning. US Patent 12,511,887, 2025.
- Trajectory prediction using directed graph and destination features. US Patent 12,198,460, 2025.