[SOLVED] Multi threaded IMDB fetching?

Happy2k:

--- Quote from: nostra on August 27, 2010, 08:48:10 pm ---
--- Quote ---But I really like PVD - it was hard to get working properly in the beginning, but the possibilities are endless.
--- End quote ---

Unfortunately, it seems that many users have difficulties when they start using PVD. If you have suggestions on how to make the process easier for beginners, then feel free to post in the Feature Suggestions board.

--- End quote ---

Tutorials would probably help a lot. A video tutorial on how to add movies and series, and how to fix common errors.
If I get any other ideas, I will post them in the suggestions board ;)

patch:

--- Quote from: Happy2k on August 27, 2010, 09:06:19 pm ---Tutorials would probably help a lot. A video tutorial on how to add movies and series, and how to fix common errors.

--- End quote ---

Better still, update the wiki while you are still learning and have your best appreciation of a beginner's experience.

rick.ca:
Now there's an idea! Videos for every possible help topic could be produced, and a PVD database created to catalogue them—along with any textual help that might be available (e.g., a wiki or forum topic). Users could add the videos to their collection and import the catalogue. Help topics could be found using Search or Advanced search, and the videos played directly from PVD. 8)

Do we have any volunteers for this project? ;)

patch:

--- Quote from: Happy2k on August 27, 2010, 10:55:30 am ---
--- Quote from: rick.ca on August 27, 2010, 02:25:24 am ---
--- Quote ---Multi threaded IMDB fetching
--- End quote ---

That's an interesting idea for other reasons (e.g., update speed), but I would be very surprised if IMDb did not ban IPs making multiple simultaneous requests. I'm actually surprised we're able to get away with what we are doing.

--- End quote ---

Before using PVD, I was trying out different programs. Most of them had a very fast IMDb fetch. I don't think IMDb would care - I've never heard of IMDb IP banning someone.
I wouldn't mind being a guinea pig for testing out multi threaded IMDb fetching.

--- End quote ---

I strongly suspect most time is spent waiting for the remote web sites to respond and downloading data.
PVD is likely to be slower than some other programs as it downloads more data.
Trying to increase the download speed by having multiple requests from the same user / program going to these databases is likely to cause more maintenance problems, as the HTTP request stream generated will put further stress on these remote databases, not generate income for them, and differ even more from the browsing user they were designed for and marketed at.

Things which are at least theoretically possible:
1) Run different plugins / web site queries in parallel. By this I mean that if you are getting information for a movie from imdb, allmovie, and amazon, each movie would still visit the three sites in series, but the three sites could be kept busy in parallel by overlapping the accesses for different movies. Implementing this would probably mean calling the different plugins from different tasks with a queue between each, so it would involve considerable change to the plugin calling code but fewer changes to the individual plugins. Maximum speed-up of x3 for 3 plugins run in parallel.

2) Only information which is actually going to be stored in PVD could be downloaded. By this I mean PVD could look at the field flags set for the plugin and the data already in PVD, and determine which pages actually need to be downloaded for each website and PVD movie update. The speed benefit would be minimal when initially populating an empty PVD movie record from the first web site, so it would be no faster than 1) for a mass movie import, but maybe a lot faster for later incremental updates to an established PVD database. From a coding perspective it would be a major change to the plugin architecture, with each plugin being passed information on which fields need to be filled, then each plugin being coded to selectively download web pages as required for each PVD movie record update. So it implies a rewrite of each plugin for which selective access is practical, and considerable changes to the plugin calling code.
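
To make option 1) a bit more concrete, here is a rough Python sketch of "one task per plugin with a queue between each". It is not PVD's plugin interface: make_plugin and run_pipeline are made-up names, and the web lookup is simulated with a short sleep. Each site still sees only one request at a time; the overlap comes from different movies being at different stages.

```python
import queue
import threading
import time

# Hypothetical stand-in for a PVD plugin: simulates one slow web lookup.
def make_plugin(site, delay=0.01):
    def fetch(movie):
        time.sleep(delay)  # pretend to wait on the remote site
        movie[site] = f"{site} data for {movie['title']}"
        return movie
    return fetch

def run_pipeline(titles, plugins):
    """Pass each movie through every plugin in order, one worker thread per plugin."""
    queues = [queue.Queue() for _ in range(len(plugins) + 1)]
    STOP = object()  # sentinel that tells each stage to shut down

    def worker(fetch, inbox, outbox):
        while True:
            item = inbox.get()
            if item is STOP:
                outbox.put(STOP)  # pass the shutdown signal down the line
                return
            outbox.put(fetch(item))

    threads = [
        threading.Thread(target=worker, args=(p, queues[i], queues[i + 1]))
        for i, p in enumerate(plugins)
    ]
    for t in threads:
        t.start()

    for title in titles:
        queues[0].put({"title": title})
    queues[0].put(STOP)

    # Collect finished records from the last queue until the sentinel arrives.
    results = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results

plugins = [make_plugin(s) for s in ("imdb", "allmovie", "amazon")]
movies = run_pipeline(["Alien", "Brazil", "Casablanca"], plugins)
```

Because every stage is a single thread reading from a FIFO queue, movies come out in the order they went in. With equal delays per site the total time approaches one third of the serial time for long lists, which is the x3 ceiling mentioned above.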

Of course I have no idea how hard any of this is for nostra but the theoretical discussion is still interesting. None of it sounds easy to me so I'm not confident any of it will be implemented.

rick.ca:
The idea of running in parallel is fine—assuming each field is filled using only one source. But most multi-source configurations are going to include some degree of "get this from the first source, but if there's nothing there use the second source." That could be handled, but it would make things a lot more complicated—and probably lose much of the speed improvement that might otherwise be achieved.
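
One possible way to handle the "first source, else second source" rule is to fetch from all sources first and apply the priority afterwards. The Python sketch below assumes exactly that; merge_with_fallback and the shape of both dictionaries are invented for illustration and say nothing about how PVD stores its configuration. The cost is visible in the example: the second source is downloaded even for fields where the first source would have been enough.

```python
def merge_with_fallback(field_sources, fetched):
    """Build one record from several sources.

    field_sources: field name -> list of source names, highest priority first.
    fetched: source name -> dict of field values returned by that source.
    """
    record = {}
    for field, sources in field_sources.items():
        for source in sources:
            value = fetched.get(source, {}).get(field)
            if value:  # first non-empty value wins
                record[field] = value
                break
    return record

# Hypothetical configuration: plot falls back to allmovie, rating is IMDb only.
field_sources = {"plot": ["imdb", "allmovie"], "rating": ["imdb"]}
fetched = {
    "imdb": {"plot": "", "rating": "8.1"},
    "allmovie": {"plot": "A heist goes wrong.", "rating": "4 stars"},
}
record = merge_with_fallback(field_sources, fetched)
```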

Even more problematic is the idea of getting only data for fields that need it. (I'm not sure this is what you meant, but it's implied by "maybe a lot faster for later incremental updates to an established PVD database.") Data changes. The only way to determine whether data needs to be updated is to compare it to what's currently available. So it's faster just to download all the data.

It would be helpful if fields set to "ignore" were omitted. This would make no difference to the "average" plugin configuration, where only a small number of fields are ignored. For special-purpose updates of one or a few fields (e.g., updating the IMDb Top 250 rank and votes), however, it would probably be considerably faster. This applies only to plugins. For scripts, a different version of a script can be created to download only the data required, if that would be faster in a particular circumstance.
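
As a sketch of what omitting "ignore" fields might look like, the Python below works out which pages a plugin would still have to download. Everything here is assumed for illustration: the page names, the page-to-field mapping, and the "overwrite" mode are placeholders, not the real layout of any site or PVD's actual field settings.

```python
# Hypothetical map of which page on a site supplies which fields.
PAGE_FIELDS = {
    "main": {"title", "year", "rating", "votes", "top250"},
    "fullcredits": {"cast", "crew"},
    "plotsummary": {"plot"},
    "awards": {"awards"},
}

def pages_needed(field_modes, page_fields=PAGE_FIELDS):
    """Return the pages that supply at least one field not set to 'ignore'."""
    wanted = {f for f, mode in field_modes.items() if mode != "ignore"}
    return sorted(page for page, fields in page_fields.items() if fields & wanted)

# Special-purpose update: only the Top 250 rank and votes are wanted.
top250_update = {
    "title": "ignore", "year": "ignore", "rating": "ignore",
    "votes": "overwrite", "top250": "overwrite",
    "cast": "ignore", "crew": "ignore", "plot": "ignore", "awards": "ignore",
}
print(pages_needed(top250_update))  # ['main']
```

With a typical configuration, where most fields are wanted, the function returns every page and nothing is saved, which matches the point above about the "average" setup.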
