Problem identification
JangYungi, JiwooPark, Muhammad Umair
What is the problem your team is trying to solve?
Unlike the contents of text documents, the contents of videos are not easily searchable.
How do we know this problem exists? Why is this problem important?
Online learning platforms such as MOOCs host many videos on a given topic. These videos are often long and cover a variety of topics, ideas, and concepts. Finding a particular topic among them is very difficult because these concepts are not searchable and cannot be reached directly by typing a keyword. Even once you find the right video, locating the topic of interest within it means skimming the timeline and thumbnails. This is a time-consuming and frustrating experience for a learner looking for a particular piece of information. By making videos searchable through indexes built from time-series tags, we want to offer a much easier way to find exactly the right information in a massive number of videos.
Movida Labs is a platform that indexes a video and extracts people, places, organizations, keywords, locations, and so on. Pavel et al. presented Video Digests, which makes it easy to browse and skim the contents of informational videos by segmenting them into chapters. Crowdy by Weir et al. has learners generate subgoals for how-to videos. Finally, the existing literature contains another approach to indexing videos.
Why use crowdsourcing for the problem?
There are thousands of videos online, and a fully automated approach would be both costly and error-prone. A group of experts handling these videos would be extremely expensive, time-consuming, and not scalable. Machine algorithms are also not yet good enough for this type of task, which requires comprehensive and detailed speech and image recognition. Online videos, however, have thousands of viewers, who recognize speech and images in fine detail and whose numbers scale. We intend to use these viewers to generate our content.
What specific challenges exist?
- HMW let crowd workers type in appropriate tags for the video?
- HMW provide an interesting experience that motivates volunteer workers to participate?
- HMW aggregate natural-language data collected from the crowd?
- HMW control the noise and prevent possible attacks from malicious users?
- HMW promote the goodwill of making videos searchable?
- HMW ensure crowd workers get something meaningful from working on this project?
- HMW define criteria for users to view the results?
- HMW use the crowd to segment a video by the different topics discussed in it?
- HMW gather “initial” users?
- HMW have the system perform the validation process automatically?
Solutions
Solution #1: VidIndex
What is the one-sentence summary of the idea?
Making lecture videos searchable using crowdsourcing.
Describe a scenario from the requester's point of view.
- Requesters upload the video using our platform.
- The video is then made available to crowd workers.
- The final result is an indexed video with topics as labels. These labels can then be found using the search option.
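One way to picture the searchable result described above is as a list of time-stamped topic labels that a keyword query is matched against. The sketch below is only an illustration under assumptions: the Label record, the search_labels function, the video IDs, and the case-insensitive substring matching rule are all hypothetical and not part of the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """Hypothetical record for one crowd-supplied topic label."""
    video_id: str
    start_sec: float  # position in the video where the topic begins
    topic: str        # free-text topic typed by a learner


def search_labels(labels, query):
    """Return (video_id, start_sec) pairs whose topic contains the query.

    Matching is a case-insensitive substring test (an assumption);
    results are sorted by video ID and then by position in the video.
    """
    needle = query.strip().lower()
    if not needle:
        return []
    hits = [(lab.video_id, lab.start_sec)
            for lab in labels
            if needle in lab.topic.lower()]
    return sorted(hits)


# Made-up example data, for illustration only.
labels = [
    Label("lecture-01", 0.0, "Course overview"),
    Label("lecture-01", 95.0, "Gradient descent"),
    Label("lecture-02", 30.0, "Stochastic gradient descent"),
]
print(search_labels(labels, "gradient"))
```

A learner searching for "gradient" would be pointed to the matching positions in both lectures rather than having to skim each timeline.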
Describe a scenario from the worker's point of view.
- Workers are the people who are watching the video.
- While they are learning, we ask them to describe the topic of what they are watching at that moment. They type their answer in a text box.
- These responses are collected from every learner watching the videos.
- Finally, these learners are able to search the videos.
Analyze the idea using the seven dimensions above.
- Motivation - why would a crowd worker do this?
  - Crowd workers are interested in the topic they are watching. They know the purpose of the data we collect from them, and they are likely to contribute to a system they would like to use in the future.
- Aggregation - how are results from multiple workers combined?
  - Results from multiple workers are combined by choosing the best one. A second group of crowd workers is asked to select the best result.
- Crowd pool
  - The crowd pool consists of intrinsically motivated learners who want to learn from the videos.
- Quality control - how to ensure valid results?
  - Results are validated by asking other learners to vote on the results submitted by earlier learners.
  - A reputation system.
- Human skill - what kind of human skill is required to complete a task?
  - No specific skills are required of crowd workers beyond knowing how to operate a computer.
- Process order - in what order is the work processed between computer, worker, and requester?
  - Instructors upload videos to the system.
  - Learners watch those videos and generate the list of topics discussed in each video. These topics are then verified by other sets of learners.
  - Finally, once a video is completely indexed, future learners can search it.
- Goal visibility - how much of the overall goal of the system is visible to an individual crowd worker?
  - The goal is visible here. Every learner knows why they are being asked to submit the topic currently being discussed, which is a key motivating factor for learners.
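The aggregation step for VidIndex, where a second group of learners selects the best result, could be sketched as a simple vote count. This is a minimal sketch under assumptions: the function names, the label normalization, the min_votes threshold, and the tie-breaking rule (the earliest proposed label wins) are all hypothetical choices, not something the proposal specifies.

```python
from collections import Counter


def normalize(label):
    """Lowercase a label and collapse whitespace so near-duplicates merge."""
    return " ".join(label.lower().split())


def pick_best_label(candidates, votes, min_votes=2):
    """Pick the proposed label that received the most votes.

    candidates: labels proposed by the first group of learners.
    votes: labels selected by the second group, one entry per vote.
    Returns the winning normalized label, or None if no candidate
    reaches min_votes (an assumed threshold). Ties go to the label
    that was proposed first.
    """
    proposed = []
    for candidate in candidates:
        norm = normalize(candidate)
        if norm and norm not in proposed:
            proposed.append(norm)

    counts = Counter(normalize(v) for v in votes)

    best, best_votes = None, 0
    for label in proposed:  # votes for labels nobody proposed are ignored
        if counts[label] > best_votes:
            best, best_votes = label, counts[label]
    return best if best_votes >= min_votes else None


# Made-up example data, for illustration only.
candidates = ["Gradient descent", "gradient  descent", "Learning rate"]
votes = ["gradient descent", "Learning rate", "Gradient Descent"]
print(pick_best_label(candidates, votes))
```

Returning None when no label gathers enough votes leaves room for the system to ask more learners before a label is added to the index.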
Solution #2: Video Bookmarking
What is the one-sentence summary of the idea?
Build a plugin that saves bookmarks for videos, then use those bookmarks as a source for generating video indexes.
Describe a scenario from the requester's point of view.
There is no explicit requester. This system is powered by the intrinsic behavior of viewers, which means that requesters do not need to do any work.
Describe a scenario from the worker's point of view.
- Workers add bookmarks while they watch the video. The system notifies them that their bookmarks benefit other viewers as well as themselves.
- They can mark not only a specific interval of the video but also add tags and descriptions for it.
- The system uses the collected data to make the video searchable.
Analyze the idea using the seven dimensions above.
- Motivation - why would a crowd worker do this?
  - Crowd workers already do this (adding bookmarks, tags...) so that they can rewatch parts of the video.
- Aggregation - how are results from multiple workers combined?
  - The system collects data from users to make videos searchable.
  - The crowd can view the tags and vote to separate good tags from bad ones.
- Crowd pool
  - Every user who wants to watch videos is in our crowd pool.
- Quality control - how to ensure valid results?
  - Workers can review previous workers' bookmarks and report them to the system.
- Human skill - what kind of human skill is required to complete a task?
  - Only basic bookmarking skills are needed to complete a task.
- Process order - in what order is the work processed between computer, worker, and requester?
  - There is no specific requester in this solution; any video on the web can be a target.
  - The system only runs in the background and helps when workers want to tag videos.
  - Workers simply do their usual bookmarking.
  - Videos are available on a website.
  - Workers watch the videos.
  - The system processes the collected data.
- Goal visibility - how much of the overall goal of the system is visible to an individual crowd worker?
  - All workers know what they are contributing to, because we tell them that their work helps make videos searchable.
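To illustrate how bookmarks from many viewers might be turned into index entries, the sketch below groups bookmark timestamps that fall close together and keeps only the groups supported by several bookmarks. The function name cluster_bookmarks and the max_gap and min_support parameters are hypothetical; the proposal does not say how bookmarks are combined, so this is just one plausible approach.

```python
def cluster_bookmarks(times, max_gap=10.0, min_support=2):
    """Group bookmark timestamps (in seconds) into candidate index points.

    A timestamp joins the current cluster when it lies within max_gap
    seconds of the previous timestamp; otherwise it starts a new cluster.
    Only clusters with at least min_support bookmarks are returned,
    each as a (first_time, last_time, bookmark_count) tuple.
    Both thresholds are assumed values chosen for illustration.
    """
    clusters = []
    for t in sorted(times):
        if clusters and t - clusters[-1][-1] <= max_gap:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return [(c[0], c[-1], len(c)) for c in clusters if len(c) >= min_support]


# Made-up bookmark times from several viewers of the same video.
bookmark_times = [12, 15, 14, 200, 340, 345]
print(cluster_bookmarks(bookmark_times))
```

The lone bookmark at 200 seconds is dropped here because only one viewer marked it; lowering min_support to 1 would keep it.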
Solution #3: Video Telepathy!
What is the one-sentence summary of the idea?
An ESP-game-like approach in which crowd workers provide time-series tags for videos by watching overlapping video segments, with the feel of playing a word-matching telepathy game.
Describe a scenario from the requester's point of view.
- A requester uploads a video. The selection of videos can be done either automatically or manually. At this point the video has no metadata or time-series index.
- The video is then segmented, and the segments are used as material for the questions in the game.
- After crowd workers finish playing the game, the video is full of user-generated metadata. With these time-series indexes, the video is easily searchable.
Describe a scenario from the worker's point of view.
- As a crowd worker, a game player starts to play the game.
- The player sees a 15-second clip, which is one part of the entire video to be tagged.
- After watching the clip, the player is asked to type in a description of it.
- The player's contribution is then validated using input agreement and output agreement.
- When the player's answer is correct, the player gets points. Each player's score is posted on a leaderboard to encourage competition.
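The output-agreement part of this scenario could look roughly like the sketch below: two players describe the same clip independently, and only the words they both typed are kept as tags and rewarded with points. The tokenizer, the small stopword list, the function names, and the points_per_match value are all assumptions made for illustration, and the input-agreement step is not shown.

```python
# A tiny stopword list chosen for this example only.
STOPWORDS = {"a", "an", "the", "of", "and", "is", "in", "on", "to"}


def tokenize(description):
    """Split a description into a set of lowercase words without stopwords."""
    words = (w.strip(".,!?;:\"'()").lower() for w in description.split())
    return {w for w in words if w and w not in STOPWORDS}


def output_agreement(desc_a, desc_b, points_per_match=10):
    """Return (matched_tags, points) for two independent descriptions.

    Only words typed by both players count as tags; each match earns
    points_per_match points (an assumed scoring rule).
    """
    matched = tokenize(desc_a) & tokenize(desc_b)
    return sorted(matched), points_per_match * len(matched)


# Made-up descriptions of the same 15-second clip.
tags, points = output_agreement(
    "The professor explains gradient descent.",
    "Gradient descent on a whiteboard",
)
print(tags, points)
```

Because points depend on matching another player, a player gains nothing by typing unrelated words, which is the incentive the game relies on.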
Analyze the idea using the seven dimensions above.
- Motivation - why would a crowd worker do this?
  - For fun: it’s a game.
  - A philanthropic desire to improve the searchability of videos on the internet!
- Aggregation - how are results from multiple workers combined?
  - Selective aggregation is used. A contribution must pass two steps. First, input agreement is used to check the validity of user contributions. Then, using output agreement, only matching words are kept as indexes.
- Crowd pool
  - Gamers who are willing to play fun games to enrich the experience of searching on the internet.
- Quality control - how to ensure valid results?
  - As workers are game players, they want to get high scores. To score well, they have to sincerely type in tags that the players they are matched with are also likely to type.
  - Selective aggregation with input/output agreement ensures that results are valid.
- Human skill - what kind of human skill is required to complete a task?
  - It does not require any specific skills to play this game.
- Process order - in what order is the work processed between computer, worker, and requester?
  - The requester posts the list of videos to be tagged.
  - The system splits each video into overlapping 15-second segments.
  - Using the segmented video, workers play telepathy games to tag the video.
  - The results of the gameplay are aggregated into time-series indexes.
- Goal visibility - how much of the overall goal of the system is visible to an individual crowd worker?
  - The purpose of playing the game can be shared.
  - However, while players are playing the game, the goal and the current progress are not explicitly visible.
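The segmentation step in the process order above can be sketched as follows. The 15-second segment length comes from the proposal, but the 5-second overlap, the function name overlapping_segments, and the handling of the final shorter segment are assumptions made for this example.

```python
def overlapping_segments(duration, length=15.0, overlap=5.0):
    """Split a video of `duration` seconds into overlapping segments.

    Each segment is `length` seconds long and shares `overlap` seconds
    with the next one (the overlap value is an assumption). The last
    segment is cut short at the end of the video.
    Returns a list of (start_sec, end_sec) tuples.
    """
    if not 0 <= overlap < length:
        raise ValueError("overlap must be >= 0 and smaller than length")
    duration = float(duration)
    if duration <= 0:
        return []

    step = length - overlap  # distance between consecutive segment starts
    segments = []
    start = 0.0
    while start < duration:
        end = min(start + length, duration)
        segments.append((start, end))
        if end >= duration:
            break  # the whole video is covered
        start += step
    return segments


# A made-up 40-second video, for illustration only.
print(overlapping_segments(40))
```

With an overlap, a topic that straddles a segment boundary still appears in full in at least one neighboring segment, which is presumably why the proposal calls for overlapping clips.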