Projects:Augment marker tracking with visual tracking

Latest revision as of 09:08, 21 May 2019

Overview

Although the tracking output of the Vicon system is generally very reliable, the trajectories still contain mistakes and missing data. The idea is therefore to use visual tracking as an additional source of information, for example to cope with markers that are missing due to occlusion. Since the Vicon system makes it easy to generate a large amount of ground truth data for object detection, this is also a way to train visual object detectors for future, more "in-the-wild" scenarios.


Contact

  • Mate Nagy, mnagy@orn.mpg.de
  • Hemal Naik, hnaik@orn.mpg.de
  • Bastian Goldluecke, bastian.goldluecke@uni-konstanz.de


Aims

The project has two main parts. The first is to establish a pipeline for generating training data for visual object detection using the Vicon system. The second is to use the trained detectors to augment the tracking, e.g. by filling in gaps and helping to establish identity.

As in other projects, both a high-quality offline solution and an online/quasi-real-time solution working from the data stream and video stream would be desirable. The video stream is not directly available for processing, but the software generates a real-time view, which could be grabbed.


Estimated level of difficulty

Generating training data for visual object detection should be a fairly straightforward standard problem, and is a good way to get to know the project and its data structures within a Bachelor/Master project.

An elaborate problem is integrating the visual detections into the overall tracking pipeline, since this requires finding a suitable new algorithmic framework. It is closely related to the project "Improve tracking of individual markers and marker patterns". A real-time solution is harder still.
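
As a naive baseline to experiment with, and explicitly not the algorithmic framework this problem calls for, projected track positions can be associated with visual detections by greedy nearest-neighbour matching. The sketch below is an assumption for illustration only: the function name `associate`, the dictionary of projected positions, and the pixel gate `max_distance` are all hypothetical and not part of the project code.

```python
import math


def associate(projected, detections, max_distance):
    """Greedily match projected track positions to detection centres.

    projected:    dict mapping a track id to its projected (u, v) pixel position
    detections:   list of (u, v) detection centres in the same image
    max_distance: gate in pixels; pairs further apart are never matched

    Returns a dict mapping track id -> index into `detections`.
    Tracks without a detection inside the gate are left out.
    """
    # Collect all pairs that pass the distance gate.
    candidates = []
    for track_id, position in projected.items():
        for index, centre in enumerate(detections):
            distance = math.dist(position, centre)
            if distance <= max_distance:
                candidates.append((distance, track_id, index))

    # Accept the closest pairs first, each track and detection at most once.
    candidates.sort()
    matches = {}
    used_detections = set()
    for _, track_id, index in candidates:
        if track_id in matches or index in used_detections:
            continue
        matches[track_id] = index
        used_detections.add(index)
    return matches


if __name__ == "__main__":
    tracks = {"a": (100.0, 100.0), "b": (200.0, 200.0)}
    found = [(198.0, 203.0), (101.0, 99.0), (500.0, 500.0)]
    print(associate(tracks, found, max_distance=10.0))
```

A track that stays unmatched over several frames, or two tracks whose matched detections swap, would be a hint that a gap or an identity problem needs attention. Greedy matching is not optimal when several candidates compete, so an optimal assignment method may be preferable in crowded scenes.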


Provided data

The project uses data from the Vicon system to establish (partially labeled) 3D tracks, as well as input from RGB video cameras. Code for reading the data and calibration, and for mapping 3D points to 2D images, is available (TODO: put on the CCU server once the git server is up).
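
Until that code is published, the following minimal sketch illustrates what mapping a 3D point to a 2D image involves. It assumes an ideal pinhole camera without lens distortion. The function name `project_point` and its parameters are hypothetical and do not reflect the existing project code or the Vicon calibration format. Plain Python is used to keep the sketch free of dependencies.

```python
def project_point(point_world, rotation, translation, focal, principal):
    """Project a 3D world point to pixel coordinates (ideal pinhole model).

    point_world: (X, Y, Z) in world coordinates
    rotation:    3x3 world-to-camera rotation, as a list of three rows
    translation: world-to-camera translation (tx, ty, tz)
    focal:       focal lengths (fx, fy) in pixels
    principal:   principal point (cx, cy) in pixels

    Returns (u, v), or None if the point lies behind the camera.
    Lens distortion is ignored in this sketch.
    """
    # Transform the point from world to camera coordinates: R * X + t.
    x, y, z = (
        sum(rotation[row][col] * point_world[col] for col in range(3))
        + translation[row]
        for row in range(3)
    )
    if z <= 0:
        return None
    # Perspective division followed by the intrinsic mapping to pixels.
    u = focal[0] * x / z + principal[0]
    v = focal[1] * y / z + principal[1]
    return (u, v)


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    print(project_point((1, 2, 4), identity, (0, 0, 0), (1000, 1000), (640, 360)))
```

Real camera calibrations usually include distortion coefficients, so the existing project code should be preferred once it is available.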


Suggested/tested approaches

  • to generate training data and build an initial incarnation of the detector:
    • find valid segments of 3D trajectories
    • use existing code to project 3D tracks into 2D images
    • find suitable bounding box and use image crop as a training image
    • build database of these and retrain object detector, see Tutorials.
  • talk to the people working on the project "Improve tracking of individual markers and marker patterns" for ideas on how to integrate visual and marker detections.
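
The training data steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's pipeline: missing samples are assumed to be represented as `None`, an image is assumed to be a list of pixel rows, and the helper names `valid_segments`, `bounding_box` and `crop` are hypothetical. Projection of the 3D tracks into the image is assumed to have happened already.

```python
import math


def valid_segments(track, min_length=1):
    """Return (start, end) index pairs, end exclusive, of gap-free runs.

    track: list of samples, where a missing sample is None (an assumption).
    Runs shorter than min_length are dropped.
    """
    segments = []
    start = None
    for i, sample in enumerate(track):
        if sample is not None and start is None:
            start = i
        elif sample is None and start is not None:
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    if start is not None and len(track) - start >= min_length:
        segments.append((start, len(track)))
    return segments


def bounding_box(points_2d, margin, width, height):
    """Box around projected marker positions, padded and clipped to the image.

    Returns (left, top, right, bottom) in pixels, right and bottom exclusive.
    """
    xs = [p[0] for p in points_2d]
    ys = [p[1] for p in points_2d]
    left = max(0, math.floor(min(xs) - margin))
    top = max(0, math.floor(min(ys) - margin))
    right = min(width, math.ceil(max(xs) + margin) + 1)
    bottom = min(height, math.ceil(max(ys) + margin) + 1)
    return (left, top, right, bottom)


def crop(image, box):
    """Cut the box out of an image stored as a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


if __name__ == "__main__":
    track = [(0, 0, 0), None, (1, 1, 1), (2, 2, 2), None]
    print(valid_segments(track, min_length=2))
    image = [[r * 20 + c for c in range(20)] for r in range(10)]
    box = bounding_box([(10.2, 5.0), (12.7, 8.1)], margin=2, width=20, height=10)
    print(box, len(crop(image, box)))
```

The margin matters because the markers sit on the animal rather than outlining it, so a box drawn tightly around the projected markers would likely cut off parts of the bird. A suitable value would have to be determined from the data.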