Recent Advances in Computer Vision Archives

Instructor

Danna Gurari (first name pronounced “dah-nah”, similar to “Donna”; last name rhymes with Ferrari)

If easier, feel free to just call me Dr. G.

Course Manager

Josh Myers-Dean: josh.myers-dean@colorado.edu

Josh will host weekly office hours on Tuesdays from 2-3pm via Zoom. To attend, please sign up on the spreadsheet shared in Canvas.

Location

AERO 114 (3775 Discovery Drive)

Syllabus

You can find more details about the course in the syllabus.

Course Platforms

All course materials and communications will be shared on this website and in Canvas.

Acknowledgements

We are fortunate to receive a Google Cloud Education grant and an NSF Explore Access grant to support our use of state-of-the-art AI methods.

Date	Topics (lecture slides hyperlinked)	Assigned Readings (due prior to class)	Assignments (due prior to class and posted on Canvas)
Mon, Aug 26	Course Introduction
Wed, Aug 28	Rise of Neural Networks	How to Read a CS Research Paper, How to Read an Engineering Research Paper
Mon, Sep 2	No Class (Labor Day)
Wed, Sep 4	Object Recognition: Dataset Challenges and Fundamentals of CNNs	ImageNet: A large-scale hierarchical image database	Reading Assignment
Mon, Sep 9	Object Recognition: CNNs	ImageNet classification with deep convolutional neural networks (AlexNet)	Reading Assignment
Wed, Sep 11	Object Recognition: Transformers	An Image is Worth 16X16 Words: Transformers for Image Recognition at Scale (ViT)	Reading Assignment
Mon, Sep 16	Scene and Attribute Classification	Learning Deep Features for Scene Recognition using Places Database	Reading Assignment
Wed, Sep 18	Semantic Segmentation	Fully Convolutional Networks for Semantic Segmentation	Reading Assignment
Mon, Sep 23	Object Detection	End-to-End Object Detection with Transformers (DETR)	Reading Assignment
Wed, Sep 25	Instance Segmentation		Project Proposal
Mon, Sep 30	Object Tracking	Tracking Without Bells and Whistles	Reading Assignment
Wed, Oct 2	Vision-Language Tasks: Image Captioning and Visual Question Answering	VQA: Visual Question Answering	Reading Assignment
Mon, Oct 7	Foundation Models and Prompts	Foundational Models Defining a New Era in Vision: A Survey and Outlook	Reading Assignment
Wed, Oct 9	Unpaired Image Translation: Style Transfer		Project Outline
Mon, Oct 14	Scene Text Recognition	Revisiting Scene Text Recognition: A Data Perspective	Reading Assignment
Wed, Oct 16	Object Part Segmentation	Required: Towards Open-World Segmentation of Parts; Optional: PartImageNet: A Large, High-Quality Dataset of Parts	Reading Assignment
Mon, Oct 21	Gaze Estimation	Required: EFE: End-to-end Frame-to-Gaze Estimation; Optional: MPIIGaze: Real-World Dataset and Deep Appearance-Based Gaze Estimation	Reading Assignment
Wed, Oct 23	Referring Expression Comprehension	Required: ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension; Optional: Modeling Context in Referring Expressions	Reading Assignment
Mon, Oct 28	Keypoint Detection	Deep Learning-Based Human Pose Estimation: A Survey (Sections 1-2.3)	Reading Assignment
Wed, Oct 30	Pose Estimation	Required: Human Pose as Compositional Tokens; Optional: 2D Human Pose Estimation: New Benchmark and State of the Art Analysis	Reading Assignment
Mon, Nov 4	Visual Dialog	Visual Dialog	Reading Assignment
Wed, Nov 6	Action Recognition	Required: A Closer Look at Spatiotemporal Convolutions for Action Recognition; Optional: UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild	Reading Assignment
Mon, Nov 11	Person/Object Re-identification	Required: Spatial-Temporal Person Re-identification; Optional: Scalable Person Re-identification: A Benchmark	Reading Assignment
Wed, Nov 13	Video Summarization	Video Understanding with Large Language Models: A Survey	Reading Assignment
Mon, Nov 18	3D Object Tracking	Required: Center-based 3D Object Detection and Tracking; Optional: nuScenes: A multimodal dataset for autonomous driving	Reading Assignment
Wed, Nov 20	Mapping	Required: CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow Estimation; Optional: A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation	Reading Assignment
Mon, Nov 25	No Class (Fall Break)
Wed, Nov 27	No Class (Fall Break)
Mon, Dec 2	Efficient Computer Vision
Wed, Dec 4	Responsible Computer Vision
Mon, Dec 9	Responsible Computer Vision and Course Summary		Project Presentation
Wed, Dec 11	Final Project Presentations		Peer Evaluations
Wed, Dec 18	No Class (Final Exam Week)		Final project report due

* Blue entries indicate dates when we will have student-led lectures

Overview

The goal of the student-led lectures is for you to develop your skills in analyzing and presenting modern research in computer vision. Each student team will lead one lecture on one computer vision topic. (Note: only graduate students will lead lectures.)

This effort will constitute 30% of your total class grade. Your grade for this effort will be calculated as follows:

Assignment	% of Student-Led Lecture Grade
Candidate paper proposal	10%
Presentation review	40%
Lecture	50%

Candidate Paper Proposal

For the paper proposal, you and your teammate(s) should:

Identify 4-6 candidate papers on your topic that were recently published at a premier computer vision conference (e.g., CVPR, ICCV, ECCV). Two of these papers will be assigned as readings: one about a specific dataset challenge (optional reading) and the other about a computer vision model (required reading). Valuable resources for finding candidate papers include Google Scholar (including searching for survey papers on the topic) and Papers with Code.
Select a 30-minute time slot that works for all group members to meet with the instructor to discuss which papers to cover during the presentation. Available time slots will be posted via Canvas. This meeting must be at least two weeks prior to your lecture.
Email the candidate papers to the instructor at least 48 hours before the meeting.

Presentation Review

The presentation review is a chance to receive feedback on the slide deck for your lecture and resolve any open questions. You will be expected to:

Select a 30-minute time slot that works for all group members to meet with the instructor to review the lecture slides. Available time slots will be posted via Canvas. This meeting must be at least one week prior to your lecture.
Email a completed draft of the lecture slides to the instructor at least 24 hours before the meeting.

General guidelines for developing your presentation:

How to get started: create a presentation outline
- Each lecture should take about 50 minutes and consist of two parts. The first portion should: (i) define the problem, (ii) motivate the practical importance of solving this problem with a computer vision solution (i.e., applications that can/do benefit society), (iii) describe 1-2 datasets used to track progress on this problem, and (iv) describe the metric(s) used to evaluate the performance of computer vision models. The second portion should introduce at least one computer vision model, covering: (i) its claimed novelty, (ii) mechanisms used to validate the claims, and (iii) open technical questions/problems.
- A common recommendation is to spend ~2 minutes on each slide, so your talk should have ~25 slides.
- Determine how much time you want to allocate to each portion of your talk, iteratively refining how many slides go to each section until you fit into the allocated time. This should help you work smarter rather than harder, by avoiding creating material you ultimately won’t have enough time to cover.
What content to put on each slide
- Each slide should be your guide for what to say. Accordingly, avoid full sentences and lots of content. Otherwise, your audience will have to decide whether to listen to you and ignore the slides OR ignore you and read the slides (your audience can’t do both).
- Each slide should have only 1 idea/punchline/take-home-point.
- You can incorporate materials from outside sources (for example, content from the paper’s authors or slides), but proper credit MUST be given.
How to create a professional presentation (and avoid cognitive overload for your audience)
- Slide layout/design
  - Be consistent with your choices across all slides (e.g., font sizes and styles, background and foreground colors, margins, capitalization).
  - Avoid using more than 3 font sizes in your presentations.
  - All fonts must be readable from a distance (e.g., a minimum of 18- or 20-point size, depending on your font style).
- Figures and tables
  - Provide a punchline clarifying the take-home point you want the audience to learn from each figure/table. If you have multiple punchlines, show them one at a time.
  - Make sure all figure/table content is visible. For example, this can mean re-creating the x-axis and y-axis values in PowerPoint.
How to present
- Let your body move naturally. Presenting with a wireless presentation remote control can help.
- Speak to your audience, rather than the screen.
Other useful resources:
- Even a Geek Can Speak: My go-to when I was a PhD student
- Presentation remote control I use
- Suggestions from Dan Larremore
- Suggestions from Matt Might
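
To make the slide budgeting described under "How to get started" concrete, here is a minimal Python sketch. It assumes the ~50-minute lecture and ~2 minutes per slide mentioned above; the function name `slide_budget`, the section names, and the minutes assigned to each section are hypothetical and only illustrate one possible split.

```python
def slide_budget(
    minutes_by_section: dict[str, float],
    minutes_per_slide: float = 2.0,
    total_minutes: float = 50.0,
) -> dict[str, int]:
    """Convert a per-section time plan into a per-section slide count."""
    allocated = sum(minutes_by_section.values())
    if allocated > total_minutes:
        raise ValueError(
            f"plan uses {allocated} minutes but only {total_minutes} are available"
        )
    # One slide per `minutes_per_slide` minutes, rounded to a whole slide.
    return {
        section: round(minutes / minutes_per_slide)
        for section, minutes in minutes_by_section.items()
    }


# Hypothetical time plan for illustration; adjust the split to your own talk.
plan = {
    "Problem and motivation": 10,
    "Datasets and metrics": 10,
    "Model: novelty and validation": 24,
    "Open questions": 6,
}
budget = slide_budget(plan)
print(budget, "total slides:", sum(budget.values()))
```

If the plan exceeds the available time, the function raises an error, which is the cue to trim a section before building any slides for it.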

Lecture

The class meeting will start with your lecture and then will conclude with a facilitated class discussion about the lecture topic. You and your teammate(s) should decide how to divide who presents what material during the lecture. The discussion will be organized by the instructor around the questions and discussion points submitted by all students.

Overview

The goal of this project is for you to develop your skills in conducting and communicating original work. This is an opportunity for you to enhance your expertise on a topic you feel passionate about, as it will be a self-designed project. The only requirement is that your project involves computer vision.

Your final project will constitute 30% of your total class grade. Your project grade will be calculated as follows:

Assignment	% of Final Project Grade
Project proposal	10%
Project outline	20%
Project presentation	20%
Peer evaluation	10%
Final project report	40%
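
As a worked example of how these weights combine, here is a minimal Python sketch. It assumes a plain weighted average of component scores on a 0-100 scale, with no rounding or other adjustments; the actual grade computation may differ. The function names and the example scores are hypothetical.

```python
# Component weights copied from the table above (percent of the final project grade).
PROJECT_WEIGHTS = {
    "Project proposal": 10,
    "Project outline": 20,
    "Project presentation": 20,
    "Peer evaluation": 10,
    "Final project report": 40,
}
PROJECT_SHARE_OF_CLASS_GRADE = 30  # percent of the total class grade, as stated above


def project_grade(scores: dict[str, float]) -> float:
    """Weighted average of the component scores, each on a 0-100 scale."""
    return sum(PROJECT_WEIGHTS[name] * scores[name] for name in PROJECT_WEIGHTS) / 100


def class_grade_points(scores: dict[str, float]) -> float:
    """Points the project contributes to the total class grade (out of 100)."""
    return project_grade(scores) * PROJECT_SHARE_OF_CLASS_GRADE / 100


# Hypothetical scores, for illustration only.
example_scores = {
    "Project proposal": 90,
    "Project outline": 80,
    "Project presentation": 85,
    "Peer evaluation": 100,
    "Final project report": 88,
}
print(round(project_grade(example_scores), 2))       # project grade out of 100
print(round(class_grade_points(example_scores), 2))  # points toward the class grade
```

Under these assumptions, the final project report alone accounts for 40% of 30%, or 12 points, of the total class grade.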

Project Proposal

The project proposal should:

Establish the research problem and new idea you will tackle for your final project
Identify relevant related work

When choosing your topic, general guidelines are to:

Choose a problem you have an idea for how to solve
Choose a problem someone else cares about
Choose a problem that is not yet solved (review current literature!)
Choose a problem that you can objectively evaluate by tying it to a task

Relevant topics include (but are not limited to):

Implementation and evaluation of an existing or new computer vision model alongside experiments that demonstrate the model’s strengths and weaknesses
A comprehensive and critical survey of the literature on a particular computer vision problem

You will need to submit a PDF that is 1-2 pages long (excluding references). Please use the LaTeX template from a mainstream conference or journal, such as ECCV. A great LaTeX paper editor is Overleaf. The paper should include each of the following:

Title for your project
[Section 1] Introduction
- Paragraph 1: Explain the motivation for your work; e.g., Why should anyone care? What are the desired benefits?
- Paragraph 2: Explain why existing solutions are inadequate for the motivated problem; e.g., Is there a gap in the literature? Is there a weakness in existing approaches?
- Paragraph 3: Explain what you are proposing, what is new about your idea, and why you believe this solution will be better than previous solutions; e.g., Are you asking a new question, offering a greater understanding of a research problem, establishing a new methodology to solve a problem, building a new software tool, or offering greater understanding about existing methods/tools?
[Section 2] Related Work
- Identify 2-4 related topics. Then, for each topic, cite 2-4 related papers (must include the bibliography). Finally, for each cluster of related works, give a 1-2 sentence explanation describing the key difference(s) of your proposed idea to the cluster of prior works. One way to format each topic is as follows:
  
  Topic:
  Reference 1
  Reference 2
  Reference 3
  Reference 4
  
  Our work is different from these works because…
Bibliography: this must be formatted correctly.

Please note that your project proposal is not a binding contract. You will continue to update and improve it as you learn more from your readings and/or feedback.

Project Outline

The project outline should map out the entire project. You will be expected to:

Submit a detailed project outline that is 3-4 pages long (excluding references).

The paper should include each of the following:

Title
[Section 1] Introduction – improve upon the material from your proposal

[Section 2] Related Work – update the material from your proposal so this section includes a paragraph per topic instead of a bullet list per topic; the bullet list served only as a foundation to help you concisely identify how your work will improve upon what is available/known today
[Section 3] Methods – describe the implementation of your proposed idea (e.g., features, algorithm(s), training overview) so that:
- A reader could reproduce your set-up
- A reader understands why you made your design decisions
[Section 4] Experimental Design – describe 1-2 experiments or analyses you plan to conduct in order to demonstrate/validate the target contribution(s) of your work. Your description should be detailed enough so that a reader could reproduce it. Your description should include the following for each experiment:
- Main purpose: 1-3 sentence high level explanation
- Evaluation Metric(s): which ones will you use and why?
[Section 5] Bibliography

Please note that your project outline is not a binding contract. You will continue to update and improve it as you learn more from your readings and/or feedback.

Project Presentation

The project presentation submission will be a:

PDF document of a poster

It should convey the following:

Motivate the problem your work is designed to solve.
(Very) briefly explain what other solutions are available and why they are not suitable.
Demo your idea, approach, and key design decisions.
Highlight key findings from your experiments and offer insights into what your work has taught us. Focus on finding 1-3 punchlines that explain why your work is exciting/valuable.

Your poster should provide a concise framework for you to communicate about your project to a lay audience. In other words, your mom, dad, friend, or a potential employer should be able to see it and understand what you did and why what you did is valuable. If you would like to work from a template, the following are suggested by popular conferences for computer vision material: CVPR, NeurIPS, and ECCV. Alternatively, you can follow general suggestions for how to create a Better Scientific Poster.

On the last day of class, all students will present their work as well as learn about other students’ work. You will have up to 2 minutes to pitch the material in your poster to the class, with the poster projected from the computer for everyone to see.

Peer Evaluation

You will evaluate the poster presentation from every person in the course at the link shared by the instructor. The evaluations will be done during the day of the last class meeting. The evaluations that you do for other students’ projects will not affect your own grade, except that you will be penalized if you do not complete an evaluation (following the requirements) for every person (excluding your own).

Final Project Report

For the final project submission, you should submit a report that is at least 7 pages long (excluding references). It should include each of the following:

Title
Abstract – one paragraph summary of your paper describing the motivation, problem, conducted experiments, and experimental findings
[Section 1] Introduction (improve upon the material from your project outline)
[Section 2] Related Work (improve upon the material from your project outline; if you have not already, you should remove the bulleted structure you used in the initial proposal and instead have a paragraph form)
[Section 3] Methods (improve upon the material from your project outline)
[Section 4] Experiments
- Improve upon the material from your project outline
- Report the experimental results, what general trends are observed, and insights/speculations into why your results may be turning out the way they are.
- Also include at least one paragraph explaining what questions are not fully answered by your experiments and natural next steps for this direction of research.
[Section 5] Conclusions
- Summarize in one paragraph what is the main take-away point from your work.
- Add a final paragraph discussing any potential ethical implications of your project (e.g., fairness, accountability, transparency, privacy, social impact).
[Section 6] Bibliography

Common Structure of Research Papers

The research community has adopted similar structures for many (but not all) research papers. My hope is you will learn how to recognize these common patterns to help you more efficiently analyze existing papers as well as possibly get your own research papers accepted by the research community.

General

Where can I find a paper’s motivation? – It should be provided in the Introduction. It may also be repeated with the same or different language in the Abstract and Conclusions.
Where can I find the paper’s key contributions? – They should appear multiple times. First, they should be summarized in the Introduction and the Abstract. Also, sometimes the last sentence(s) of each Related Work paragraph/subsection state a contribution, by explaining how the proposed work fills gaps in prior work.
Where can I find how the proposed work extends prior work? – Typically, the best place to find the answer is in the Related Work Section. Often, it can be found in the last sentence(s) of each paragraph/subsection, where the authors clarify how the proposed work relates to a cluster of discussed prior work, such as by (1) filling a gap, (2) applying prior work to a new context, or (3) extending prior work’s abilities. The answer is also sometimes stated/summarized in the Introduction. Of note, the answer to this question can overlap with the “key contributions” of the paper.
How can I identify the limitations of a paper? – Often authors will cite limitations of their work in the methods, experiments, and/or conclusions sections. Additionally, it will be helpful for you to analyze what may be deficiencies based on what you learned (e.g., datasets, architecture constraints such as number of categories supported) as well as to read critiques from others about the paper’s limitations.
Others’ advice for how to read research papers: How to Read a CS Research Paper, How to Read an Engineering Research Paper
Leveraging papers beyond the target paper of interest: I recommend also reading subsequent papers that discuss the paper you are trying to learn about. Often, these papers concisely convey (1) what is novel/important about prior work as well as (2) what are its limitations/weaknesses. A great way to find such papers is to review papers citing your paper of interest.

Dataset Challenge Papers

The key contribution of dataset challenges is to provide a community-shared infrastructure for evaluating AI models in a fast, reproducible, and objective manner at scale. This frees up researchers’ time to focus solely on improving AI models, and particularly on the problems embedded in the datasets. When reading a publication about a dataset challenge, I recommend you focus on being able to understand/critique each of the following three (typically essential) components:

Introduction of a static collection of trusted, human-annotated examples against which models’ predictions are compared.

Evaluation metrics for assessing the similarity between models’ predictions and the human-annotated examples.

New challenges for the AI community to tackle, established by characterizing how the dataset differs from existing ones and showing that state-of-the-art AI models perform poorly on the dataset.

Some dataset publications are also released alongside public evaluation servers with leaderboards to enable teams to learn their models’ performance on the test examples, without accessing the test annotations (which reduces the chances of models overfitting).
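
To illustrate what "assessing the similarity between models’ predictions and the human-annotated examples" can look like in practice, here is a minimal Python sketch of two widely used metrics: top-1 accuracy for classification and intersection over union (IoU) for bounding boxes. These are illustrative choices rather than metrics prescribed by any particular dataset challenge, and the example labels and boxes are made up.

```python
def top1_accuracy(predictions: list[str], annotations: list[str]) -> float:
    """Fraction of examples whose predicted label matches the human annotation."""
    if len(predictions) != len(annotations) or not annotations:
        raise ValueError("need equally sized, non-empty lists")
    correct = sum(pred == label for pred, label in zip(predictions, annotations))
    return correct / len(annotations)


def box_iou(
    box_a: tuple[float, float, float, float],
    box_b: tuple[float, float, float, float],
) -> float:
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


# Made-up predictions and annotations, for illustration only.
accuracy = top1_accuracy(["cat", "dog", "car"], ["cat", "dog", "bus"])
iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
print(f"top-1 accuracy: {accuracy:.3f}, IoU: {iou:.3f}")
```

When reading a dataset challenge paper, it helps to ask what behavior a metric like these rewards and what it ignores.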

Algorithm Papers

The key contribution of algorithm papers is novelty in how algorithms are designed and/or trained alongside experiments demonstrating the proposed method yields improved performance compared to existing approaches.

Application Papers

The key contribution of an application paper is a solution improving upon the status quo for a particular domain (e.g., cancer diagnosis) alongside experiments demonstrating the solution is preferable over existing approaches. Such papers span many types, including new systems, new evaluation methodologies, and new tasks. Typically, such papers do not appear in CVPR, ECCV, and ICCV, which are focused primarily on novelty in datasets and algorithms. Rather, a popular publication venue for these papers is the annual IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), in the “Applications” track.

Survey Papers

The key contributions of survey papers are (1) a snapshot of the state-of-the-art for a particular topic and (2) a discussion of open challenges for future research to explore. The topic covered in the paper could include datasets, algorithms, evaluation metrics, or their combination. Typically, such papers are published in journals, such as IJCV and PAMI.