Initially, most of the data comes from feeds.
It then goes through some scripts (which vary by site) that pull further data by reading the HTML of the video page.
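Here is a rough Python sketch of the page-reading step. It assumes the video page exposes its title, thumbnail and keywords through `<meta>` tags (Open Graph `og:title`/`og:image` plus a `keywords` meta). Those tag names, the function names and the sample page are all hypothetical. In practice each site needs its own script because the markup differs.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <meta> name/property -> content pairs from a video page.

    Hypothetical: assumes the site puts its metadata in <meta> tags.
    """

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        key = attrs.get("property") or attrs.get("name")
        content = attrs.get("content")
        if key and content is not None:
            self.meta[key] = content


def extract_video_metadata(html: str) -> dict:
    """Pull title, thumbnail URL and keyword list out of the page's meta tags."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    meta = parser.meta
    keywords = [k.strip() for k in meta.get("keywords", "").split(",") if k.strip()]
    return {
        "title": meta.get("og:title"),
        "thumbnail": meta.get("og:image"),
        "keywords": keywords,
    }


# Made-up sample page for illustration only.
sample = """<html><head>
<meta property="og:title" content="Skateboard kickflip tutorial">
<meta property="og:image" content="http://example.com/thumb.jpg">
<meta name="keywords" content="skateboarding, Kickflip , tricks">
</head><body></body></html>"""

print(extract_video_metadata(sample))
```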
Then a few more scripts run to process it and sort it against a custom thesaurus table. It's a bit of a process, but we can get the indexing done twice a day without too much of a performance hit. It's not easy, a real challenge! It's not just scripts, either: you need a layer of human interpretation and analysis of how you categorize things. But that's the gist of it!
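The thesaurus step might look something like the minimal sketch below. It assumes the thesaurus is a simple table of canonical categories, each listing the raw terms that should be filed under it. The table contents, function name and matching rule (case-insensitive exact match) are hypothetical, since the actual table and rules aren't described here. Tags that don't match are returned separately, which is where the human review layer comes in.

```python
# Hypothetical thesaurus table: canonical category -> raw terms/synonyms.
THESAURUS = {
    "skateboarding": ["skateboarding", "skateboard", "skating", "sk8"],
    "tutorials": ["tutorial", "how to", "howto", "lesson", "tricks"],
    "music": ["music", "music video", "song"],
}

# Invert once so each raw term maps straight to its category.
TERM_TO_CATEGORY = {
    term.lower(): category
    for category, terms in THESAURUS.items()
    for term in terms
}


def categorize(raw_tags):
    """Map raw tags to thesaurus categories.

    Returns (categories, unmatched): the sorted categories that matched,
    plus the tags the thesaurus didn't recognise, left for a person to review.
    """
    categories = set()
    unmatched = []
    for tag in raw_tags:
        category = TERM_TO_CATEGORY.get(tag.strip().lower())
        if category:
            categories.add(category)
        else:
            unmatched.append(tag)
    return sorted(categories), unmatched


print(categorize(["Skateboarding", "Kickflip", "tricks"]))
# → (['skateboarding', 'tutorials'], ['Kickflip'])
```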
We're willing to work with other content producers who offer short clips and link directly through to them. As long as we can get some tagging data and thumbnails, and the clip is worthwhile for the user, then by all means we'll help!