Problem identification

Jang Yungi, Jiwoo Park, Muhammad Umair

What is the problem your team is trying to solve?

Unlike text documents, the contents of videos are not easily searchable.

How do we know this problem exists? Why is this problem important?

Online learning platforms such as MOOCs contain several videos on a topic. These videos are often long and cover a variety of topics, ideas, and concepts. If there is a topic we want to find among those videos, it is very difficult to locate because these concepts are not searchable and cannot be accessed directly by typing a keyword. Even when you find the right video, finding the topics of interest within it involves skimming the timeline and thumbnails. This is a very time-consuming and frustrating experience for a learner trying to find a particular piece of information. By providing ways to search videos through indexes based on time-series tags, we want to offer a much easier way to find specific information in a massive collection of videos. Movida Labs [1] is a platform that indexes a video and extracts people, places, organizations, keywords, etc. Pavel et al. presented Video Digests [2], which makes it easy to browse and skim the contents of informational videos by segmenting them into chapters. Crowdy [3] by Weir et al. involves learners in generating subgoal labels for how-to videos, and Irani and Anandan [4] present another approach to video indexing in the existing literature.

Why use crowdsourcing for the problem?

There exist thousands of videos online, and an automated approach would be not only costly but also error-prone. Handling these videos with a group of experts would be extremely expensive, time-consuming, and not scalable. Also, machine algorithms are not yet advanced enough to handle this type of task, which requires comprehensive and detailed speech/image recognition. There are thousands of online video watchers who possess fine-grained speech/image recognition abilities and provide scalability. We intend to use these watchers to generate our content.

What specific challenges exist?

  1. HMW let crowd workers type in appropriate tags for the video?
  2. HMW provide interesting experiences for the volunteer workers to participate in and motivate them?
  3. HMW aggregate collected data in natural language from the crowd?
  4. HMW control the noise and prevent possible attacks from malicious users?
  5. HMW promote the goodwill of making videos searchable?
  6. HMW ensure crowd workers gain something meaningful while working on this project?
  7. HMW define criteria for users to view the results?
  8. HMW use the crowd to segment based on different topics discussed in a video?
  9. HMW gather “initial” users?
  10. HMW make the validation process run automatically in the system?

Solutions

Solution #1: VidIndex

What is the one-sentence summary of the idea?

Making lecture videos searchable using crowdsourcing.

Describe a scenario from the requester's point of view.

  1. Requesters upload the video using our platform.
  2. The video is then made available to crowd workers.
  3. The final result is an indexed video with topics as labels. These labels can then be searched using the search option.

Describe a scenario from the worker's point of view.

  1. Workers are the people who are watching the video.
  2. During the learning experience, we will ask these crowd workers to describe the topic of what they are currently watching; they will type their description into a text box.
  3. Such results will be collected from every learner watching these videos (see the aggregation sketch after this list).
  4. Finally, these learners are able to search the videos.
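
As a rough illustration of how the collected descriptions could become searchable, the following is a minimal sketch that merges learners' time-stamped, free-text descriptions into a time-series index. The record format, the 60-second bucketing, and the majority-vote labeling are assumptions made for illustration only, not part of the proposal.

```python
from collections import Counter, defaultdict

BUCKET_SEC = 60  # assumed granularity of the time-series index

# Hypothetical records: (learner_id, video_id, timestamp_sec, free-text description).
submissions = [
    ("u1", "lec01", 120, "gradient descent"),
    ("u2", "lec01", 125, "Gradient Descent"),
    ("u3", "lec01", 610, "overfitting"),
]

def build_index(subs):
    """Group descriptions into fixed-length time buckets; the majority phrasing becomes the label."""
    buckets = defaultdict(Counter)
    for _learner, video_id, t, text in subs:
        buckets[(video_id, t // BUCKET_SEC)][text.strip().lower()] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in buckets.items()}

def search(index, keyword):
    """Return (video_id, start_second) pairs whose label contains the keyword."""
    return [(vid, bucket * BUCKET_SEC)
            for (vid, bucket), label in index.items()
            if keyword.lower() in label]

print(search(build_index(submissions), "gradient"))  # [('lec01', 120)]
```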

Analyze the idea using the seven dimensions above.

Solution #2: Video Bookmarking

What is the one-sentence summary of the idea?

Make a plugin that saves bookmarks for videos, then use them as a source for generating video indexes.

Describe a scenario from the requester's point of view.

There is no explicit requester. This system is powered by the intrinsic behavior of watchers, which means no work is required from requesters.

Describe a scenario from the worker's point of view.

  1. Workers create bookmarks while they watch the video. The system notifies them that this marking behavior benefits other watchers as well as themselves.
  2. They mark not only a specific period, i.e., an interval of the video, but can also add tags and descriptions (see the sketch after this list).
  3. The system uses the collected data to make the video searchable.
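
A minimal sketch of how the plugin's bookmarks might be represented and merged into a searchable index; the Bookmark fields and the keying by (video_id, start_sec) are illustrative assumptions, not a finalized schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Bookmark:
    """What the plugin might store for one bookmark; field names are illustrative."""
    video_id: str
    start_sec: int
    end_sec: int
    tags: List[str] = field(default_factory=list)
    description: str = ""

def merge_into_index(bookmarks):
    """Pool all tags attached to the same video position so they become searchable."""
    index = {}
    for b in bookmarks:
        index.setdefault((b.video_id, b.start_sec), set()).update(t.lower() for t in b.tags)
    return index

marks = [
    Bookmark("lec02", 300, 360, ["Bayes rule"], "nice derivation"),
    Bookmark("lec02", 300, 350, ["bayes rule", "probability"]),
]
print(merge_into_index(marks))  # {('lec02', 300): {'bayes rule', 'probability'}}
```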

Analyze the idea using the seven dimensions above.

Solution #3: Video Telepathy!

What is the one-sentence summary of the idea?

An ESP-game-like approach in which crowd workers provide time-series tags for videos by watching overlapping video segments, with the feel of playing a word-matching telepathy game.

Describe a scenario from the requester's point of view.

  1. A requester uploads a video; the selection can be done either automatically or manually. At this point the video has no metadata or time-series index.
  2. Then the video is segmented into overlapping clips and used as material for questions in the game (see the segmentation sketch after this list).
  3. After crowd workers finish playing the game, the video is full of user-generated metadata. With this time-series index, the video is easily searchable.
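
A minimal sketch of how the overlapping 15-second clips could be generated; the 5-second overlap and the function name are assumptions for illustration, since the proposal fixes only the 15-second clip length, not the amount of overlap.

```python
def overlapping_segments(duration_sec, clip_len=15, overlap=5):
    """Return (start, end) boundaries of overlapping clips covering the whole video."""
    step = clip_len - overlap
    segments, start = [], 0
    while start < duration_sec:
        segments.append((start, min(start + clip_len, duration_sec)))
        start += step
    return segments

print(overlapping_segments(40))
# [(0, 15), (10, 25), (20, 35), (30, 40)]
```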

Describe a scenario from the worker's point of view.

  1. As a crowd worker, a game player starts to play the game.
  2. The player watches a 15-second clip, which is a part of the entire video to be tagged.
  3. After watching the clip, the player is asked to type in a description of it.
  4. Then the player's contribution is validated using input-agreement and output-agreement mechanisms (see the sketch after this list).
  5. When the player's answer is judged correct, the player earns points. Each player's score is posted on a leaderboard to encourage competition.
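
A minimal sketch of an output-agreement check and scoring rule, assuming two players' free-text answers for the same clip are compared word by word; the normalization, the overlap threshold, and the point values are illustrative assumptions rather than the game's final design.

```python
def normalize(answer):
    """Lowercase and split a free-text answer into a set of words (a crude stand-in for tag matching)."""
    return set(answer.lower().split())

def output_agreement(answer_a, answer_b, min_overlap=1):
    """Two players saw the same clip; their answers count if they share enough words."""
    shared = normalize(answer_a) & normalize(answer_b)
    return len(shared) >= min_overlap, shared

accepted, shared = output_agreement("explains gradient descent", "gradient descent example")
points = 10 * len(shared) if accepted else 0  # illustrative scoring rule
print(accepted, sorted(shared), points)       # True ['descent', 'gradient'] 20
```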

Analyze the idea using the seven dimensions above.


  1. Movida Labs. http://movidalabs.com/

  2. Pavel, A., et al. "Video Digests: A Browsable, Skimmable Format for Informational Lecture Videos." UIST 2014.

  3. Weir, S., Kim, J., Gajos, K. Z., and Miller, R. C. "Learnersourcing Subgoal Labels for How-to Videos." CSCW 2015, ACM.

  4. Irani, M., and Anandan, P. "Video Indexing Based on Mosaic Representations." Proceedings of the IEEE, 86(5), 905–921, 1998.