“‘Video Analysis’ Tag”, 2019-12-29:
Bibliography for tag ai/video/analysis, most recent first: 2 related tags, 93 annotations, & 9 links (parent).
- Links
- “CT Foundation: Taking Medical Imaging Embeddings 3D”, 2024
- “Long-Term Tracking of Social Structure in Groups of Rats”, et al 2024
- “Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-Modal LLMs in Video Analysis”, et al 2024
- “InternVid: A Large-Scale Video-Text Dataset for Multimodal Understanding and Generation”, et al 2023
- “Test-Time Training on Video Streams”, et al 2023
- “Magenta Green Screen: Spectrally Multiplexed Alpha Matting With Deep Colorization”, et al 2023
- “PaLI-X: On Scaling up a Multilingual Vision and Language Model”, et al 2023
- “ImageBind: One Embedding Space To Bind Them All”, et al 2023
- “Scaling Vision Transformers to 22 Billion Parameters”, et al 2023
- “VideoCoCa: Video-Text Modeling With Zero-Shot Transfer from Contrastive Captioners”, et al 2022
- “VindLU: A Recipe for Effective Video-And-Language Pretraining”, et al 2022
- “Videogenic: Video Highlights via Photogenic Moments”, et al 2022
- “AnimeRun: 2D Animation Visual Correspondence from Open Source 3D Movies”, et al 2022
- “Vision-Language Pre-Training: Basics, Recent Advances, and Future Trends”, et al 2022
- “TVLT: Textless Vision-Language Transformer”, et al 2022
- “EVL: Frozen CLIP Models Are Efficient Video Learners”, et al 2022
- “X-CLIP: Expanding Language-Image Pretrained Models for General Video Recognition”, et al 2022
- “X-CLIP: End-To-End Multi-Grained Contrastive Learning for Video-Text Retrieval”, et al 2022
- “Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos”, et al 2022
- “OmniMAE: Single Model Masked Pretraining on Images and Videos”, et al 2022
- “LAVENDER: Unifying Video-Language Understanding As Masked Language Modeling”, et al 2022
- “MLP-3D: A MLP-Like 3D Architecture With Grouped Time Mixing”, et al 2022
- “Uni-Perceiver-MoE: Learning Sparse Generalist Models With Conditional MoEs”, et al 2022
- “Revisiting the ‘Video’ in Video-Language Understanding”, et al 2022
- “VidIL: Language Models With Image Descriptors Are Strong Few-Shot Video-Language Learners”, et al 2022
- “Masked Autoencoders As Spatiotemporal Learners”, et al 2022
- “Imitating, Fast and Slow: Robust Learning from Demonstrations via Decision-Time Planning”, et al 2022
- “ViS4mer: Long Movie Clip Classification With State-Space Video Models”, 2022
- “Socratic Models: Composing Zero-Shot Multimodal Reasoning With Language”, et al 2022
- “Reinforcement Learning With Action-Free Pre-Training from Videos”, et al 2022
- “CLIP Meets GamePhysics: Towards Bug Identification in Gameplay Videos Using Zero-Shot Transfer Learning”, et al 2022
- “Robot Peels Banana With Goal-Conditioned Dual-Action Deep Imitation Learning”, et al 2022
- “Hierarchical Perceiver”, et al 2022
- “MuZero With Self-Competition for Rate Control in VP9 Video Compression”, et al 2022
- “BLIP: Bootstrapping Language-Image Pre-Training for Unified Vision-Language Understanding and Generation”, et al 2022
- “MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition”, et al 2022
- “CAST: Character Labeling in Animation Using Self-Supervision by Tracking”, et al 2022
- “AV-HuBERT: Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction”, et al 2022
- “Noether Networks: Meta-Learning Useful Conserved Quantities”, et al 2021
- “MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions”, et al 2021
- “MorphMLP: A Self-Attention Free, MLP-Like Backbone for Image and Video”, et al 2021
- “Florence: A New Foundation Model for Computer Vision”, et al 2021
- “Scaling ASR Improves Zero and Few Shot Learning”, et al 2021
- “ADOP: Approximate Differentiable One-Pixel Point Rendering”, et al 2021
- “VideoCLIP: Contrastive Pre-Training for Zero-Shot Video-Text Understanding”, et al 2021
- “Perceiver IO: A General Architecture for Structured Inputs & Outputs”, et al 2021
- “CLIP-It! Language-Guided Video Summarization”, et al 2021
- “CLIP2Video: Mastering Video-Text Retrieval via Image CLIP”, et al 2021
- “Revisiting ResNets: Improved Training and Scaling Strategies”, et al 2021
- “Learning from Videos to Understand the World”, et al 2021
- “Perceiver: General Perception With Iterative Attention”, et al 2021
- “Video Transformer Network”, et al 2021
- “Automatic Curation of Large-Scale Datasets for Audio-Visual Representation Learning”, et al 2021
- “MSR-VTT: A Large Video Description Dataset for Bridging Video and Language”, et al 2021
- “CLIP: Learning Transferable Visual Models From Natural Language Supervision”, et al 2021
- “Transformers in Vision: A Survey”, et al 2021
- “Object-Based Attention for Spatio-Temporal Reasoning: Outperforming Neuro-Symbolic Models With Flexible Distributed Architectures”, et al 2020
- “Accuracy and Performance Comparison of Video Action Recognition Approaches”, et al 2020
- “Self-Supervised Learning through the Eyes of a Child”, et al 2020
- “Gesticulator: A Framework for Semantically-Aware Speech-Driven Gesture Generation”, et al 2020
- “SAYCam: A Large, Longitudinal Audiovisual Dataset Recorded from the Infant’s Perspective”, et al 2020
- “Axial Attention in Multidimensional Transformers”, et al 2019
- “CATER: A Diagnostic Dataset for Compositional Actions and TEmporal Reasoning”, 2019
- “CLEVRER: CoLlision Events for Video REpresentation and Reasoning”, et al 2019
- “Training Kinetics in 15 Minutes: Large-Scale Distributed Training on Videos”, et al 2019
- “A Short Note on the Kinetics-700 Human Action Dataset”, et al 2019
- “Billion-Scale Semi-Supervised Learning for Image Classification”, et al 2019
- “VideoBERT: A Joint Model for Video and Language Representation Learning”, et al 2019
- “Real-Time Continuous Transcription With Live Transcribe”, 2019
- “CCNet: Criss-Cross Attention for Semantic Segmentation”, et al 2018
- “Evolving Space-Time Neural Architectures for Videos”, et al 2018
- “Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow”, et al 2018
- “A Short Note about Kinetics-600”, et al 2018
- “Large-Scale Visual Speech Recognition”, et al 2018
- “Playing Hard Exploration Games by Watching YouTube”, et al 2018
- “BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning”, et al 2018
- “The Sound of Pixels”, et al 2018
- “One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning”, et al 2018
- “Learning Compact Recurrent Neural Networks With Block-Term Tensor Decomposition”, et al 2017
- “Reinforced Video Captioning With Entailment Rewards”, 2017
- “Tracking As Online Decision-Making: Learning a Policy from Streaming Videos With Reinforcement Learning”, 2017
- “Learning to Learn from Noisy Web Videos”, et al 2017
- “Quo Vadis, Action Recognition? A New Model I3D and the Kinetics Dataset”, 2017
- “The Kinetics Human Action Video Dataset”, et al 2017
- “Dense-Captioning Events in Videos”, et al 2017
- “Time-Contrastive Networks: Self-Supervised Learning from Video”, et al 2017
- “LipNet: End-To-End Sentence-Level Lipreading”, et al 2016
- “Deep Visual Foresight for Planning Robot Motion”, 2016
- “Temporal Convolutional Networks: A Unified Approach to Action Segmentation”, et al 2016
- “Clockwork Convnets for Video Semantic Segmentation”, et al 2016
- “Artistic Style Transfer for Videos”, et al 2016
- “YFCC100M: The New Data in Multimedia Research”, et al 2015
- “UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild”, et al 2012