I don't see the scripting environment as a reasonable way to do what I want. The script engine works great for what it was designed to do, but even with improvements, it would still be slow and cumbersome for image collection. What I've found is that invoking a script is faster for the search itself (it saves typing), but once you've found images, it's better to switch to a browser, download the images to a directory, then load them into PVD from that directory. This process lets me control the selection, manipulation, and order of images. It's also much faster, and handling multiple images isn't an issue.
DLLs are too much trouble. In the time it takes to build a couple of DLLs, I can write a whole environment in PHP.
- PHP provides HTTP access via fopen(), so I can grab exactly the files I want, with very little overhead. For example, many sites have the person's name in the file path, so I can go directly to a file and start parsing. If the file doesn't exist, fopen() fails (it returns false).
- The HTML and string functions needed to duplicate the PVD scripting environment either already exist or would be easy to write.
- I can manipulate the images via GD or ImageMagick. For example, I can get the file size that's being downloaded, do a colour histogram on it, and determine from the data blocks what the image size is. Image size + histogram would be a reasonable way to determine whether I already have an image for a particular person, and whether a larger image is available. Converting to grayscale before creating the histogram usually makes the process very reliable with 'altered' images. A histogram with a very limited colour range is to some extent a measure of image quality...not enough contrast = too dark or too bright. The ability to control images also allows me to set an exact image size for inclusion into PVD....2000 x 3000 just slows the database too much when displaying a person.
- I can write subroutine modules and include them as necessary to reduce the complexity of processing multiple websites
- I can build a nice checkbox selection / navigation routine in html
- I can connect directly to the database via the Interbase API
- There is also a language collation class, so I can convert Élodie Bouchez to Elodie Bouchez, which makes searching more reliable on most sites. Given that there is a 'translated name' field in the database, I could easily (?) populate this and use it when viewing people in my database.
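Here's a rough sketch of two of the points above: folding accents out of a name, and using a grayscale histogram as an image fingerprint. It's in Python purely for illustration, not the PHP I'd actually use. The function names, the 16-bin histogram and the distance measure are all assumptions made for the example, and the pixel data is assumed to arrive as (r, g, b) tuples from whatever image library is doing the decoding.

```python
import unicodedata


def fold_accents(name):
    """Return the name with diacritics removed, e.g. for use as a search key."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks and keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def gray_histogram(pixels, bins=16):
    """Normalised grayscale histogram of an iterable of (r, g, b) tuples (0-255)."""
    counts = [0] * bins
    total = 0
    for r, g, b in pixels:
        # Weighted grayscale conversion (BT.601 weights); result stays in 0..255.
        gray = (299 * r + 587 * g + 114 * b) // 1000
        counts[gray * bins // 256] += 1
        total += 1
    if total == 0:
        raise ValueError("no pixels")
    # Dividing by the pixel count makes images of different sizes comparable.
    return [c / total for c in counts]


def histogram_distance(h1, h2):
    """Half the L1 distance: 0.0 for identical histograms, 1.0 for disjoint ones."""
    return sum(abs(a - b) for a, b in zip(h1, h2)) / 2


def contrast_spread(hist):
    """Fraction of bins that are occupied; a low value suggests poor contrast."""
    return sum(1 for v in hist if v > 0) / len(hist)


if __name__ == "__main__":
    print(fold_accents("Élodie Bouchez"))
    red = gray_histogram([(255, 0, 0)] * 4)
    gray = gray_histogram([(76, 76, 76)] * 4)
    # A colour-shifted copy with the same brightness gives the same histogram.
    print(histogram_distance(red, gray))
```

A small distance between two histograms, combined with the image dimensions, would flag "probably the same picture, maybe at a different size". The threshold for "small" would have to be tuned on real images.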
It's a rich environment that would be hugely difficult to duplicate in the existing PVD scripting environment. I think PHP would make a great scripting engine, but given that it has direct access to the database and to the user's file system, I can see where it would cause significant problems for database integrity. To me, that direct access is a good thing...I have images on my hard drive that I'd like to put into PVD in an automated fashion. I guess it would be possible to write a high-level DLL to invoke PHP from PVD and block unwanted accesses.
At any rate, I'm not done with my idea of collecting images. I'm simply changing the tools to something that makes an unreasonable task more reasonable.
If I can make the work I do reasonable for others to use, I'll post my code.
*** addendum ***
I've just tested command-line mode against a website and HTTP works without a server installed. Database access test is next.