Author Topic: Silent Mode  (Read 19306 times)

mgpw4me@yahoo.com

  • Guest
Silent Mode
« on: January 11, 2010, 11:17:36 pm »
I've been playing with the scripting language.  Overall, it's an excellent development environment, but I think I'm wearing a hole in the 'recompile' button from all the use it's getting.  := is a killer for a 'c' guy.

Anyway, I have a number of sites that have celebrity images, but no bio or other info.  This makes it difficult to determine, except manually, that you have the correct person.  In fact, some sites do a relevance search and pull up the wrong person entirely.  I can do a 'sanity check' (by name) for this, but ultimately, if there are two people with the same name it won't help.

I'd like to ensure that people don't run a number of the scripts I'll be posting in silent mode.  I can post warnings / disclaimers all I want, but somebody will still do this.  Of course, you and Rick are bigger targets, so you'll end up fronting the abuse.

Is there a way to disable, or query for, silent mode in a script?

When sites do have biographical information, it would be nice if the PVD birthplace, birthday, AKAs and filmography of the person were available for sanity checking.  I could query IMDB for this, I guess, but there's still the issue for people with the same name on that site....let alone the maintenance issues when IMDB changes their site format.

If / when such things are available, I'll deal with it then.

The first batch of scripts will be posted soon.  You may want to start considering how to deal with a large "import from" menu.  At the moment, I have plans for over 50 scripts.

Offline rick.ca

  • Global Moderator
  • *****
  • Posts: 3241
  • "I'm willing to shoot you!"
    • View Profile
Re: Silent Mode
« Reply #1 on: January 12, 2010, 12:20:50 am »
Quote
You may want to start considering how to deal with a large "import from" menu.  At the moment, I have plans for over 50 scripts.

If all or most of these are for getting photos, would it not be better to combine them into one script that can be configured as to which sites are used? I don't know if an interactive configuration is possible. But even if it could only be done by editing parameters in the script itself, it would be better than having 50 separate scripts to choose from. Users could still have as many configurations as they want—by making copies of the script configured in different ways.

A combined script could also offer a more automated way to look for photos from a variety of sites. For example, get images only within a specified range of resolution, stop after X such images are obtained, etc. This could be used to get a limited number of "good" photos, with the hope that at least one would be the correct person and otherwise acceptable.
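The "resolution range, stop after X images" idea can be sketched roughly as below. This is only an illustration in Python, not PVD script code; the `Image` record, the `pick_images` helper and the example URLs are all made up for the sketch.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Image:
    # Hypothetical record for one candidate image found on a site.
    url: str
    width: int
    height: int


def pick_images(
    candidates: Iterable[Image],
    min_size: Tuple[int, int],
    max_size: Tuple[int, int],
    limit: int,
) -> List[Image]:
    """Keep images whose resolution is inside the range; stop after `limit`."""
    picked: List[Image] = []
    if limit <= 0:
        return picked
    for img in candidates:
        width_ok = min_size[0] <= img.width <= max_size[0]
        height_ok = min_size[1] <= img.height <= max_size[1]
        if width_ok and height_ok:
            picked.append(img)
            if len(picked) >= limit:
                break  # "stop after X such images obtained"
    return picked


found = [
    Image("http://example.invalid/a.jpg", 200, 150),
    Image("http://example.invalid/b.jpg", 800, 600),
    Image("http://example.invalid/c.jpg", 1024, 768),
    Image("http://example.invalid/d.jpg", 900, 700),
]
best = pick_images(found, min_size=(640, 480), max_size=(1600, 1200), limit=2)
print([img.url for img in best])
```

Because the loop breaks as soon as the limit is reached, images later in the list are never examined, which is the time saving being asked for.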

mgpw4me@yahoo.com

  • Guest
Re: Silent Mode
« Reply #2 on: January 12, 2010, 01:17:01 am »
Yup, you're right.

<brainfart value='begin'>

I have thought of this, but haven't quite figured out the logic.  Maybe writing this will clear my head.

All the sites have been categorized by sex (m/f), rating (g/pg/r), size of images, number of images, niche (vintage, region...asia / italy / etc, current celebs, genre, etc).

To really make the script work, I'd need to have preferences to reflect the interests of the person, as well as some information from other sources, e.g. from imdb (actor/actress).  So I'd need access to the URL in the database via the script, and preferably (I don't want to rewrite the 'get a person' code from the IMDB plugin) their place of birth and birthday.  I can then assign priorities to the various sites based on that information and search them in order.  There would have to be an override on this to reflect 'no nudity' and min / max image sizes.

Personal preferences could indeed be hardcoded into the script.  For specific images of oriental or vintage film cast members, I'd still want to hit specific sites first, which would be a runtime setting.

Assuming that is resolved, or just ignored, I start processing. 

I go to a site and find some images.  The site has several pages of images, so iterate through pages, adding hits to the list of links that need to be followed.  If I 'early out' the routine, any site that has images later in the list won't ever be processed.  If I don't, I have to iterate through every site using bandwidth / time.

So I assume, let's go through them all and sort it out later.

If I only have one link, and it's from a site that needs to be validated manually, I can't reasonably allow an update without manual intervention. 

So that site can't be included in the script.

I now have a list of 20 links that lead to image pages.  If I use the first link as default, I should have done an 'early out'.  If I don't, I have to assume the person is sitting there making the selection.

I assume the person is not there. 

Do I choose a link by the number of images on the page, or do I have the order set such that the most reliable site comes first?

I assume reliability, so after a hit on a reliable site, I can 'early out'...none of the others would be selected anyway.
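The "order by reliability, then early out" decision could look something like this. It is a Python sketch under assumptions: the site names, the reliability numbers and the `fake_site` helper are invented, and a real search would fetch and parse pages instead of looking up a dictionary.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A "searcher" takes a person name and returns gallery links (possibly none).
Searcher = Callable[[str], List[str]]


def first_reliable_hit(
    name: str, sites: List[Tuple[str, int, Searcher]]
) -> Optional[Tuple[str, List[str]]]:
    """Try sites from most to least reliable; return the first site with links."""
    ordered = sorted(sites, key=lambda site: site[1], reverse=True)
    for site_name, _reliability, search in ordered:
        links = search(name)
        if links:
            return site_name, links  # early out: lower-ranked sites are skipped
    return None


calls: List[str] = []  # records which sites were actually queried


def fake_site(label: str, results: Dict[str, List[str]]) -> Searcher:
    # Stand-in for a real site parser, for demonstration only.
    def search(name: str) -> List[str]:
        calls.append(label)
        return results.get(name, [])

    return search


sites = [
    ("fan-gallery", 2, fake_site("fan-gallery", {"Jane Doe": ["fan/1", "fan/2"]})),
    ("studio-archive", 9, fake_site("studio-archive", {})),
    ("press-photos", 5, fake_site("press-photos", {"Jane Doe": ["press/1"]})),
]
hit = first_reliable_hit("Jane Doe", sites)
print(hit, calls)
```

Here the least reliable site is never contacted, even though it has the most images, which is the trade-off described above.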

This all sounds doable, and would suit my uses perfectly.  For everyone else, there's a lot of guesswork involved which makes the whole thing inefficient.  If I brute-force it, the result could be brutal...hit 50+ sites and find nothing.

Re-reading your post, it occurs to me that I could list one image from each site (or a text link if not available on the listing page), with a link to the image page(s).  Click on the image, get a list of images from that site.  I'm not sure how easy it will be to select the routine that selects what parsing procedure to call for each link.  I suppose I could put a tag in the listing title.

Now it's starting to make better sense to me.

I have one script done, minus sanity checks so I'll use that as a base and see what problems come up.

</brainfart>

Offline rick.ca

  • Global Moderator
Re: Silent Mode
« Reply #3 on: January 12, 2010, 03:25:06 am »
It's a difficult thing to balance. Some users will expect such a script to help them in their quest for the perfect photo for their favourite people. Accordingly, they will want a lot of options, and expect it to be interactive. Others will expect it to be fully automatic, and be able to get reasonable photos for any/all persons selected. Considering the numbers involved, I would lean towards the latter. There are 25,000 people in my database, and there's no easy way for me to identify the 10-20% I really want photos for. So I'm probably going to need to do a mass update.

The ideal solution would be something that can be configured both ways. I can see myself configuring such a script in three different versions. One for a mass update to get one photo as quickly as possible. Another to get 3 - 6 photos of good quality from a variety of sites. I would then review the results and select one to display (or delete all but one). Finally, I would have one configured to get a large selection from a large number of sites. This would be for "difficult cases" and would be run only on one or a few people at a time.

I think the biggest problem you have is finding a way to ensure the person is correctly identified. IMDb photos may be crap, but at least they're of the right person (I'm sure there are errors, but I've never detected one). Nothing could be more disastrous than letting such a script run in silent mode if it were adding photos when not 99.9% sure the person is the same. At the same time, not allowing it to run in silent mode renders it useless for mass updates. That may have to be so, but then manual intervention may not help either. In most cases, I want the photo to help me identify the person—I'm not the one to ask whether a photo is correct or not.

Hmmm. So unless you can incorporate some face-recognition technology, I suppose it has to be interactive. But this raises another question... nostra suggested using another plugin or script to get photos where IMDb has none. I wonder how his code looks up existing people in different databases. Maybe it doesn't, and that's why I had this problem getting the Kinopoisk script to download a photo. :-\

mgpw4me@yahoo.com

  • Guest
Re: Silent Mode
« Reply #4 on: January 12, 2010, 04:06:20 am »
I can certainly add a 'be safe be sure' or 'I'll monitor it' option to the script.  If I could look at the program stack to determine how many iterations of the scripts were set to run, that wouldn't be necessary.  Of course stack access would be a major programming blooper if it were implemented.  Silent mode is the one thing that prevents full automation.  There's no reason there couldn't be a script for each.  It would be better to be allowed to include one script within another, so the parsing code could remain unchanged, and just the control structures would be different.  Maintenance would only have to be applied in one place when site changes occurred.

The problem more than anything is the limited information about the person available to the script.  I can pass the name to the url, and parse that, so at least I know who I'm looking for, but without an imdb url to parse or birth info, or filmography, it's difficult to determine who is who.  Most sites give a clue about the person (AKAs, imdb url, date of birth) that I could check against, if it was available.  To some extent, it's reasonable to say that a search on IMDB that provides only a single 100% result would be enough to ensure the right person was selected...assuming imdb was always 100% correct and inclusive (not!!!).  I still prefer filmography as that is more commonly available.  Just knowing that a person was primarily involved in TV narrows the list of sites from 54 to about 3 (with sanity checking).

To some extent, it comes down to knowing who you are downloading at any one time, and knowing what extra info might be on the page to help you determine what is up.  Since the script runs system modal, there's no way to look at the info in PVD unless you move the window, and happen to have the right page open.  I've often been able to look at multiple images and determine from the age of the image what person I was looking for.

Did you know that facial recognition only requires 80 data points, and that current technology only requires 40 for a positive match?  Now if only I had access to the bitmap <grin>.

If I had access to the url via script, I'd actually write a script to pull all the images from IMDB and put up a list to select from.  I'd write a dll and just deal with the whole thing, but I'm using the free c++ builder and it doesn't support adding components.  My old c++ builder supports components, but it's version 1.0 so current component structures aren't supported (ie. can't be added).

It sucks to get old (software).

mgpw4me@yahoo.com

  • Guest
Re: Silent Mode
« Reply #5 on: January 14, 2010, 05:17:36 am »
I nearly have the IMDB person verification code complete on the script for image searches.  The configuration settings I have planned are as follows:

// CONFIGURATION:  Configuration settings supported by this script:
//   IMAGE_LARGE    'False'  default value.  Images under 1024 x 768 are preferred.
//                  'True'   Images 1024 x 768 or larger are preferred.
//   IMDB_IMAGES    'False'  default value.  Images from IMDB won't be included.
//                  'True'   Search will include IMDB images.
//   IMDB_VALIDATE  'True'   default value.  Person must be listed in IMDB for images to be downloaded.
//                  'False'  IMDB data will be used, if available, but isn't required.
//   NUDITY         'False'  default value.  Sites with nudity will not be listed.
//                  'True'   Nudity sites are allowed, but listed last.
//   SEARCH_AKA     'False'  default value.  AKAs for people will not be searched.
//                  'True'   Searches will be repeated for AKAs on each site, if nothing is found for the 'primary' name.
//   MAX_LINKS      '50'     default value.  Limit results to 50 image gallery links.

I'd appreciate feedback from any interested party on both the configuration options, as well as their default values.  Suggestions for other options I've missed would be even better.
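For anyone weighing the defaults, here is one way the settings and their string values could be modelled. It is a Python sketch rather than the actual script; the setting names and defaults come from the list above, while `apply_overrides` and its validation rules are assumptions made for the illustration.

```python
from typing import Dict, Union

Setting = Union[bool, int]

# Defaults copied from the configuration comment block above.
DEFAULTS: Dict[str, Setting] = {
    "IMAGE_LARGE": False,
    "IMDB_IMAGES": False,
    "IMDB_VALIDATE": True,
    "NUDITY": False,
    "SEARCH_AKA": False,
    "MAX_LINKS": 50,
}


def apply_overrides(overrides: Dict[str, str]) -> Dict[str, Setting]:
    """Merge string overrides ('True', 'False', '25') into a copy of DEFAULTS."""
    settings = dict(DEFAULTS)
    for key, raw in overrides.items():
        if key not in DEFAULTS:
            raise KeyError(f"unknown setting: {key}")
        if isinstance(DEFAULTS[key], bool):
            if raw not in ("True", "False"):
                raise ValueError(f"{key} must be 'True' or 'False', got {raw!r}")
            settings[key] = raw == "True"
        else:
            value = int(raw)
            if value < 1:
                raise ValueError(f"{key} must be at least 1")
            settings[key] = value
    return settings


settings = apply_overrides({"NUDITY": "True", "MAX_LINKS": "20"})
print(settings)
```

Rejecting unknown names and malformed values up front means a typo in a setting fails loudly instead of silently falling back to the default.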

Offline nostra

  • Administrator
  • *****
  • Posts: 2852
    • View Profile
    • Personal Video Database
Re: Silent Mode
« Reply #6 on: January 16, 2010, 02:04:06 am »
Oh man, for a programmer you talk pretty much :)

1. You can't check whether Silent mode is on from the script, but, in fact, I do not see any problem here, as PVD only automatically retrieves entries having exactly one 100% match, so if you pass multiple identical entries the whole movie/person will be skipped, preventing wrong data from being retrieved.
2. You can retrieve the contents of most fields by calling GetFieldValue(FieldID : Integer);
Here are those ids:
Code:
  //Movie values
  mfMID          = 0;
  mfNum          = 1;
  mfTitle        = 2;
  mfOrigtitle    = 3;
  mfAka          = 4;
  mfYear         = 5;
  mfMPAA         = 6;
  mfRelease      = 7;
  mfURL          = 8;
  mfIMDBRating   = 9;
  mfRating       = 10;
  mfOtherRating  = 11;
  mfOtherName    = 12;
  mfLocation     = 13;
  mfTagline      = 14;
  mfDescription  = 15;
  mfComment      = 16;
  mfDateAdded    = 17;
  mfModDate      = 18;
  mfReleaseDate  = 19;
  mfBudget       = 20;
  mfMoney        = 21;
  mfAspectRatio  = 22;
  mfOrigLang     = 23;
  mfQuality      = 24;
  mfLength       = 25;
  mfResolution   = 26;
  mfFrameRate    = 27;
  mfVideoCodec   = 28;
  mfVideoBitrate = 29;
  mfSize         = 30;
  mfPath         = 31;
  mfMediaType    = 32;
  mfMediaCount   = 33;
  mfFeatures     = 34;
  mfBarcode      = 35;
  mfViewed       = 36;
  mfViewDate     = 37;
  mfWish         = 38;
  mfBookmark     = 39;
  mfLoaned       = 40;
  mfSeries       = 41;
  mfEPID         = 42;
  mfVisible      = 43;
  mfParentSeason = 44;
  mfEpisode      = 45;
  mfSeason       = 46;
  mfGenres       = 47;
  mfCountries    = 48;
  mfCategory     = 49;
  mfLabels       = 50;
  mfSubs         = 51;
  mfStudios      = 52;
  mfTags         = 53;
  mfActors       = 54;
  mfDirectors    = 55;
  mfWriters      = 56;
  mfComposers    = 57;
  mfProducers    = 58;
  mfBorrower     = 59;
  mfLoanDate     = 60;
  mfLoanPeriod   = 61;
  mfUserMail     = 62;
  mfLinks        = 63;
  mfAwards       = 64;
  mfAudioStreams = 65;
  mfPoster       = 66;
  mfScreenshots  = 67;
  mfFrontCover   = 68;
  mfCDCover      = 69;
  mfCredits      = 70;
  mfEpisodes     = 71;
  mfLanguages    = 72;
  mfTranslations = 73;
  mfAudioBitrate = 74;
  mfAudioCodec   = 75;
  mfChannels     = 76;
  mfSampling     = 77;
  mfAudioCount   = 78;
  mfExtension    = 79;
  mfImageList    = 80;
  mfAwardWon     = 81;
  mfAwardEvent   = 82;
  mfAwardYear    = 83;
  mfAwardCat     = 84;

  //Person values
  pfPID          = 0;
  pfName         = 1;
  pfTransName    = 2;
  pfAltNames     = 3;
  pfBirthday     = 4;
  pfDeath        = 5;
  pfBirthplace   = 6;
  pfURL          = 7;
  pfRating       = 8;
  pfDateAdded    = 9;
  pfModDate      = 10;
  pfBio          = 11;
  pfComment      = 12;
  pfBookmark     = 13;
  pfVisible      = 14;
  pfGenres       = 15;
  pfAge          = 16;
  pfFilmography  = 17;
  pfCareer       = 18;
  pfAwards       = 19;
  pfPhoto        = 20;
  pfImageList    = 21;

3. I do not really like the idea of multiple sources in one script. I think it is better to have multiple scripts which you can combine using batch files...
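The rule in point 1 (accept automatically only when there is exactly one 100% match) can be illustrated like this. It is a Python sketch, not PVD's code; how PVD actually decides that a result is a "100% match" is not stated in this thread, so case-insensitive name equality is assumed here, and `auto_select` is a made-up name.

```python
from typing import List, Optional


def auto_select(wanted: str, results: List[str]) -> Optional[str]:
    """Return the one exact match, or None if there are zero or several."""
    # Assumption: "100% match" means the names are equal ignoring case.
    exact = [r for r in results if r.casefold() == wanted.casefold()]
    return exact[0] if len(exact) == 1 else None


print(auto_select("John Smith", ["John Smith", "Joan Smith"]))
print(auto_select("John Smith", ["John Smith", "John Smith"]))
```

With two identical names the function returns nothing, so the record would be skipped instead of being updated with a guess.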

Quote
If I had access to the url via script, I'd actually write a script to pull all the images from IMDB and put up a list to select from.  I'd write a dll and just deal with the whole thing, but I'm using the free c++ builder and it doesn't support adding components.  My old c++ builder supports components, but it's version 1.0 so current component structures aren't supported (ie. can't be added).

Why don't you want to use Visual Studio Express??? It's free and pretty full featured for this task. There are some C++ headers already made by another user some time ago: http://www.videodb.info/forum_en/index.php?topic=1211.0 (could be not quite compatible with the current configuration, but still a good point to start)
Gentlemen, you can’t fight in here! This is the War Room!

mgpw4me@yahoo.com

  • Guest
Re: Silent Mode
« Reply #7 on: January 16, 2010, 04:09:59 am »
Quote
Oh man, for a programmer you talk pretty much :)

Too much government work...meetings are easy money.  Wait 'til you see my code...comments are everywhere, and have a debug mode built-in.  Should be a good source for future reference and easy (as it gets) to maintain.

Thanks for the very useful comments.

1) With the ability to verify people info against the html, silent mode isn't an issue.

2) Kewl.  I have IMDB parsed out, but will be able to reduce my code size significantly with this.

3) That was my original intent.  If you prefer this, I also do.  Multiple sources are very messy to deal with.  This also eliminates the need to do more than document the type of content on each site.

I'll backtrack and start churning out scripts.  I'll post them in batches of 6.

I'll have to look at Visual Studio Express sometime.  That's the Microsoft product?  If so, I have it on my hard drive somewhere.  I won't do any of the image gathering via dll though.  Well commented scripts will be much easier to maintain and I can't say I'll always be around to fix any issues that come up.

I'm on it right now.

Offline rick.ca

  • Global Moderator
Re: Silent Mode
« Reply #8 on: January 16, 2010, 04:52:29 am »
Quote
3) That was my original intent.  If you prefer this, I also do.  Multiple sources are very messy to deal with.  This also eliminates the need to do more than document the type of content on each site.

I understand why a separate script for each source is preferable, but I don't think many users are going to realize the full potential of your efforts if you leave it that way. Can you not do a "front-end" script that determines the user's choices (i.e., sources, quality, number of photos, etc.) and then calls the necessary scripts? Also, a person record can have multiple photos, but if I try to add one with a plugin (run on a selection of records), it will replace the existing photo. Maybe this is also something you can deal with in a front-end script, combining the user's instructions for add/replace, desired number of photos, etc.

What I'm imagining is something that makes it very easy to experiment with settings to discover how to get one (or several) good photos in a reasonable time. If I knew the perfect settings, creating a batch file would be easy. But discovering those settings with experimental batch files would be rather cumbersome.

mgpw4me@yahoo.com

  • Guest
Re: Silent Mode
« Reply #9 on: January 16, 2010, 05:13:50 am »
Spot on as usual.

The main thing right now is that the functionality doesn't exist at all.  It will definitely be unfriendly from a user perspective, but given the front-end that is required, it would be hugely difficult for anyone without significant experience to deal with any issues that come up.

I'd definitely like to pre-process the information from PVD against the list of available sites, hit the best ones first and make it easy for the user.  This is a big task.  I can turn out a script in a very short time...it's just a copy / paste and change a few lines.

I haven't programmed in Pascal since Turbo Pascal days, and went to Turbo C immediately when it became available.  Part of this is my lack of knowledge of Pascal and the learning curve to do a proper implementation.  People that don't know what they're doing end up building ugly code that can't be maintained.

If I could simply call one routine from another, that would be great.  I don't see any mechanism for this, but maybe Nostra can help with that.  Given that scripts are byte-code, it would require a separate area in memory for each, which I have no idea the implications of.

At any rate, I do intend to follow the project to a point where I can (as a user) deal with it in a reasonable manner.  I don't see batch files as anything other than a stop-gap until I can call the scripts from a single source.  That may be a dll.  At this point, I admit I'm clueless about this.

Status Quo.  We'll find a better solution and a better implementation.  I'm all ears....

Offline nostra

  • Administrator
Re: Silent Mode
« Reply #10 on: January 16, 2010, 03:17:22 pm »
The only possibility to combine multiple sources in a nice way, having each source in a separate file/unit, is to write a plugin like the Script Engine and make it do the job. It is a bit of work to do and frankly I am not sure I understand why it is needed at all. If common settings are needed you can easily make all your scripts read settings from the same configuration file.
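The shared configuration file idea might look like the following. It is sketched in Python with the standard `configparser` module, not with PVD's script engine; the file name `image_scripts.ini`, the `[images]` section and the `load_shared_settings` helper are all assumptions for the example.

```python
import configparser
import os
import tempfile
from typing import Dict


def load_shared_settings(path: str, section: str = "images") -> Dict[str, str]:
    """Read one section of an INI file; a missing file or section gives {}."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names as written (NUDITY, not nudity)
    parser.read(path, encoding="utf-8")  # silently skips files that do not exist
    if not parser.has_section(section):
        return {}
    return dict(parser.items(section))


with tempfile.TemporaryDirectory() as folder:
    shared = os.path.join(folder, "image_scripts.ini")
    with open(shared, "w", encoding="utf-8") as handle:
        handle.write("[images]\nNUDITY = False\nMAX_LINKS = 20\n")
    # Two separate per-site scripts would each make the same call.
    site_a = load_shared_settings(shared)
    site_b = load_shared_settings(shared)
    missing = load_shared_settings(os.path.join(folder, "absent.ini"))

print(site_a, missing)
```

Each per-site script stays separate, but a preference changed once in the file is picked up by all of them on the next run.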

mgpw4me@yahoo.com

  • Guest
Re: Silent Mode
« Reply #11 on: January 16, 2010, 08:58:51 pm »
I think the issue with batch files is that there is no process control.

I have 54,000 people in my database.  54,000 people X 54 sites (2,916,000 lookups) is a lot of processing...most of it unnecessary.  I don't want to select a specific set of sites for each person.  If I'm looking for an image of an Asian person, there are only 4 primary sites for me to look at...2 of which are Asians-only.
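To put numbers on that, and to show how niche tags could cut the site list per person, here is a small Python sketch. The site names and tags in `SITES` are invented for the illustration; only the 54,000 and 54 figures come from the post.

```python
from typing import Dict, FrozenSet, List

# Hypothetical site catalogue: site name -> niche tags it covers.
SITES: Dict[str, FrozenSet[str]] = {
    "asia-stars": frozenset({"asia"}),
    "vintage-film": frozenset({"vintage"}),
    "tv-faces": frozenset({"tv", "current"}),
    "red-carpet": frozenset({"current"}),
    "world-cinema": frozenset({"asia", "italy", "current"}),
}


def sites_for(person_tags: FrozenSet[str]) -> List[str]:
    """Sites sharing at least one tag with the person, most specialised first."""
    matching = [name for name, tags in SITES.items() if tags & person_tags]
    # Fewer tags on a site means it is more specialised, so try it earlier.
    return sorted(matching, key=lambda name: (len(SITES[name]), name))


all_lookups = 54_000 * 54  # every person against every site
shortlist = sites_for(frozenset({"asia"}))
print(all_lookups, shortlist)
```

The brute-force count is 2,916,000 site requests, while a person tagged only with one niche is checked against just the sites that cover it.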

In my best case scenario, here's what I'd like to have happen (from a user perspective):
1) user sets preferences (no nudity, largest images possible, etc).  I don't care where info is stored...other users might.
2) user selects every person in their people list and runs the script against the list.
3) script selects the sites, gathers links to images and displays the gallery links with a validation ranking
    - validation ranking would be based on finding the person's birth place, birthday, filmography on the page in close proximity to the person name or specific words ('born', 'birth...day/date/date of/place', 'role'), and the site description...ie. age of person on a vintage site is important.
4) In silent mode, the validation ranking would be used to determine the best image source.  Processing would exit after the user specified number of images were gathered, or all sites were processed.
5) Batch mode would be unnecessary...it would be the execution of the silent mode code.
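The validation ranking in step 3 could be approximated along these lines. This is a Python sketch with assumptions: the known facts would come from the PVD person fields (birthday, birthplace, filmography), "close proximity" is taken to mean within a fixed number of characters of the name, and the sample page text and `validation_rank` name are invented.

```python
from typing import List


def validation_rank(
    page_text: str, name: str, facts: List[str], window: int = 200
) -> float:
    """Fraction of known facts found within `window` characters of the name."""
    text = page_text.casefold()
    needle = name.casefold()
    wanted = [fact.casefold() for fact in facts if fact]
    if not wanted or not needle:
        return 0.0
    best = 0
    start = text.find(needle)
    while start != -1:
        # Look only at the text surrounding this occurrence of the name.
        nearby = text[max(0, start - window): start + len(needle) + window]
        hits = sum(1 for fact in wanted if fact in nearby)
        best = max(best, hits)
        start = text.find(needle, start + 1)
    return best / len(wanted)


page = (
    "Jane Doe, born 4 July 1950 in Toronto, "
    "is best known for her role in Night Train."
)
facts = ["4 July 1950", "Toronto", "Night Train", "Silver River"]
rank = validation_rank(page, "Jane Doe", facts)
print(rank)
```

In silent mode a threshold on this score (for example, only accept galleries above some chosen value) would decide whether a source is trusted; the right threshold would need experimenting.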

That's a perfect world to me, and that's where I'd like to get.  I also don't want to build a monster script that can't be modified or maintained by intermediate level computer users.  This is also the reason I'd like to stay in the scripting environment versus dll.  The problem with scripting is that the control code would be a few thousand lines long and each site would need a parsing routine or two...we could end up with 10,000+ lines of code and 150 functions / procedures.  That would be a maintenance nightmare.

The first baby step is to create the functionality...getting images.  Somehow, the final product will have to be made more user friendly...maybe a higher level dll calling the scripting engine?

Offline rick.ca

  • Global Moderator
Re: Silent Mode
« Reply #12 on: January 16, 2010, 11:02:01 pm »
Quote
I also don't want to build a monster script that can't be modified or maintained by intermediate level computer users.

It seems to me that having it all in one script should make maintenance easier for everyone. Expecting users to maintain 54 script files and user-created batch files for using them is asking for trouble. If you've created a well structured and documented script where each source/site subroutine includes only the code that pertains to processing that site, it would be much easier for everyone to modify and maintain. I imagine it also makes it easier for you to add error-checking code that gracefully handles the situation where the code stops working due to a change in the design of the target site. If the script still produces the desired result without such a site, this may substantially reduce the frequency of maintenance.
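The "gracefully handle a site whose design changed" idea can be sketched as one routine per site wrapped in a common error handler. This is Python for illustration only; `run_all`, `good_site` and `broken_site` are made-up names, and a real parser would raise when an expected marker is missing from the downloaded page.

```python
import logging
from typing import Callable, Dict, List, Tuple

logger = logging.getLogger("image_search")

# A parser takes a person name and returns image links found on one site.
Parser = Callable[[str], List[str]]


def run_all(name: str, parsers: Dict[str, Parser]) -> Tuple[List[str], List[str]]:
    """Run every site parser; collect links and the names of sites that failed."""
    links: List[str] = []
    failed: List[str] = []
    for site, parse in parsers.items():
        try:
            links.extend(parse(name))
        except Exception as exc:  # a changed page layout should not stop the run
            logger.warning("skipping %s: %s", site, exc)
            failed.append(site)
    return links, failed


def good_site(name: str) -> List[str]:
    return [f"good/{name}/1"]


def broken_site(name: str) -> List[str]:
    # Simulates a site whose HTML no longer matches what the parser expects.
    raise ValueError("expected marker not found in page")


links, failed = run_all("jane-doe", {"broken": broken_site, "good": good_site})
print(links, failed)
```

Returning the list of failed sites alongside the results gives the user something to review after a silent run, so a broken parser does not go unnoticed.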

BTW, have you considered somehow incorporating TinEye Reverse Image Search into the script? This is a great tool for finding an "original" image from a thumbnail, crop or otherwise poor quality version. Maybe the script could do a TinEye search of images that don't meet a specified resolution threshold. If there are results, you would report the TinEye link instead of the original. If there are no results, it's a safe bet a better version of the image is not going to be found elsewhere. If it can't be included in the process, maybe you can add "search for this image on TinEye" links to search results.

mgpw4me@yahoo.com

  • Guest
Re: Silent Mode
« Reply #13 on: January 17, 2010, 01:06:19 am »
My code for parsing a site is about 300 lines.  Of that, about 150 lines are preamble set up by Nostra.  It is much friendlier to debug 300 lines of specific code than to deal with the complex mechanism that would be required to process multiple sites.  Another concern is that with a single script, the results are immediately visible to the user.  With 50+ sites, in silent or batch mode, no messaging would be visible to indicate there were any problems.

It's something for me to think about.  I don't know (at the moment) how else we can get to a really usable interface.  I need several scripts done before I can look at connecting anything.  

One thing that looks promising is to sort the people list (birthday, place of birth) before invoking a specialized batch file for a group of people.  Grouping by Genre would be another way to limit the search...the adult genre would obviously relate to nudity sites and horror probably would be b-movie sites.  This sort of processing could be done in batch so separate scripts would work fine.  Being able to somehow determine the size of a filmography (somehow, not in a script) would be a reasonable indicator of the status of a person, so 20+ jobs (tv episodes and / or movies) and a current birthday would probably indicate a site that specializes in events / red carpet.  Now I'm 'in for it'...this is exactly the sort of thing I complain about most...it's non-intuitive & undocumented.  At least we'll have you, Rick, to keep us straight <grin>.

TinEye looks interesting.  It's limited at 100 images per day, but for special cases it might be good.  I can't see this sort of advanced processing occurring until late in the development, but I don't see why it couldn't be done.

Offline rick.ca

  • Global Moderator
Re: Silent Mode
« Reply #14 on: January 17, 2010, 05:56:12 am »
Quote
It's limited at 100 images per day...

I hadn't noticed that. There's also this in the TOS...

Quote
Automated searching on TinEye via search scripts will not be tolerated, and will result in blocking of your IP address and/or other termination of your TinEye account.

...which may give pause, although I'm sure the link idea is fine.

Quote
At least we'll have you, Rick, to keep us straight

As long as it finds the best available portrait for all 25,000 people in my database while I sleep, you have nothing to worry about.