Topic: PVD Selenium MOD v4 IMDb Movie, People and FilmAffinity Scripts (Read 18336 times)


afrocuban

  • Moderator
  • Posts: 628

IF YOU DON'T READ THIS POST CAREFULLY AND FOLLOW EVERYTHING WRITTEN HERE, AND JUST DOWNLOAD THE FILES INSTEAD, I BET IT WILL NOT WORK FOR YOU AND YOU WILL COME BACK HERE ASKING QUESTIONS ALREADY ANSWERED IN THIS POST.


Almost 4 months after I first heard the word "Selenium", knowing nothing about programming, I am finally releasing what is practically a new PVD MOD, considering the number of files and programs included. It consists of the scripts and programs described here.

You now need only one script for IMDb movies, one for IMDb people and one for FilmAffinity movies, for everything: search and download. Selenium scripts in the background do all the "external" work, so your PVD stays clean: 2 scripts and a configurator for movies (plus a batch file for these 2, if you want it), and one script and a configurator for people. Check the screenshots below.

I strongly suggest renaming your current "Scripts" folder to, for instance, "Scripts-Original", then putting this Scripts.7z in your PVD folder and extracting it there. It will create a "Scripts" folder with all the scripts and files PVD needs to work (as a bonus, I'm contributing the source code of the Scripts Configurator program, as well as an updated and polished UDL file for PVD scripts in Notepad++, which just needs to be imported into Notepad++). If you want, after testing you can merge the two folders, Selenium and non-Selenium scripts and files, into the "Scripts" folder.

Before that, as stated here, ensure that:


Quote
A. You installed Python.
B. You installed selenium via cmd, with
Quote
pip install selenium

C. You installed requests via cmd, with
Quote
pip install requests

D. You have your Chrome binary's folder on the PATH (to test this, open cmd, type "chrome" and check that Chrome opens).
E. You have your Python folder on the PATH (to test this, open cmd, type "python --version" and check that you get the proper feedback), for instance:
Quote
C:\Users\user>python --version
Python 3.12.6
F. pythonw.exe is not missing, or its containing folder is on the PATH (to test this, open cmd, type "pythonw" and check that you get the proper feedback), for instance:
Quote
C:\Users\user>pythonw

C:\Users\user> (empty output)
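The checks above can also be run in one go. The following is a minimal Python sketch, not part of the MOD: it only reports whether the selenium and requests packages can be imported and whether chrome, python and pythonw are found on the PATH, the same way cmd would find them. Requiring Python 3.12 or newer is an assumption based on the version shown above.

```python
import importlib.util
import shutil
import sys


def check_prerequisites(modules=("selenium", "requests"),
                        executables=("chrome", "python", "pythonw")):
    """Return {check name: True/False} for each prerequisite."""
    results = {}
    # Packages installed with "pip install ...": importable or not?
    for name in modules:
        results[f"module:{name}"] = importlib.util.find_spec(name) is not None
    # Programs that must be reachable through PATH (same as typing them in cmd).
    for exe in executables:
        results[f"path:{exe}"] = shutil.which(exe) is not None
    # Assumption: Python 3.12 or newer, as used in this thread.
    results["python>=3.12"] = sys.version_info >= (3, 12)
    return results


if __name__ == "__main__":
    for check, ok in check_prerequisites().items():
        print("OK     " if ok else "MISSING", check)
```

Saved as, for example, check_pvd.py (the name is arbitrary) and run with "python check_pvd.py", every MISSING line points to a step above that still needs doing.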


These scripts:


Quote
1. Use the Chrome browser instead of Firefox
2. Use chromedriver.exe instead of geckodriver
3. Start chromedriver.exe silently
4. Silently invoke the browser in headless mode (no browser pop-up windows)
5. Scrape .htm pages of the given URLs
6. No path needs to be set manually inside the script: it is relative to the path of the Selenium script itself!
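As an illustration of points 4 and 6, here is a minimal sketch of a headless Chrome session whose chromedriver.exe path is resolved relative to the script's own folder, and which saves the rendered page as an .htm file. It uses the standard Selenium 4 Service and Options classes; the function names and the chosen Chrome switches are assumptions, not the MOD's actual code, and hiding the chromedriver console window (point 3) is not covered.

```python
from pathlib import Path


def script_dir():
    """Folder that contains this script (working directory as a fallback)."""
    try:
        return Path(__file__).resolve().parent
    except NameError:  # e.g. code pasted into an interactive session
        return Path.cwd()


def chromedriver_path(folder=None, exe_name="chromedriver.exe"):
    """chromedriver.exe is expected next to the Selenium script itself."""
    return Path(folder if folder is not None else script_dir()) / exe_name


def chrome_arguments():
    """Switches for a headless Chrome with reduced logging of its own."""
    return ["--headless=new", "--disable-gpu", "--log-level=3"]


def save_page(url, out_file, folder=None):
    """Open url in headless Chrome and write the rendered HTML to out_file."""
    # Imported here so the helpers above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    for arg in chrome_arguments():
        options.add_argument(arg)
    service = Service(executable_path=str(chromedriver_path(folder)))
    driver = webdriver.Chrome(service=service, options=options)
    try:
        driver.get(url)
        Path(out_file).write_text(driver.page_source, encoding="utf-8")
    finally:
        driver.quit()  # always close the browser and chromedriver
```

For example, save_page("https://www.imdb.com/title/tt0111161/", "main.htm") would write that title's rendered main page to main.htm, provided the chromedriver version matches the installed Chrome.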


You just use PVD as always; just be sure to extract the files as instructed above.

For the relative path to work, ensure that:

Quote
6B. You put the appropriate chromedriver.exe into the "Scripts" folder, too. There is no installation for chromedriver; just extract it from the .zip file into your "Scripts" folder described above. IMPORTANT!!!! You need to download the chromedriver.exe of the same version as your Chrome browser. At the time of this post, the stable version is v134. You can find the Chrome browser download and the appropriate chromedriver here. For example, for v134, the Stable links are:
Chrome browser:

1. chrome   win32   https://storage.googleapis.com/chrome-for-testing-public/134.0.6998.88/win32/chrome-win32.zip
2. chrome   win64   https://storage.googleapis.com/chrome-for-testing-public/134.0.6998.88/win64/chrome-win64.zip
Chromedriver:

1. chromedriver   win32   https://storage.googleapis.com/chrome-for-testing-public/134.0.6998.88/win32/chromedriver-win32.zip
2. chromedriver   win64   https://storage.googleapis.com/chrome-for-testing-public/134.0.6998.88/win64/chromedriver-win64.zip
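The links above all follow the same pattern, so the download URL for another version can be built the same way, and a version mismatch can be caught before PVD silently fails. A minimal sketch, assuming the Chrome for Testing URL layout stays as shown above; it compares only the major version (134 in 134.0.6998.88), which is what chromedriver checks in practice, although downloading the exact same build, as the links above do, is the safest choice.

```python
import re

# URL layout copied from the Chrome for Testing links listed above.
BASE = "https://storage.googleapis.com/chrome-for-testing-public"


def download_url(version, platform="win64", product="chromedriver"):
    """Build a Chrome for Testing download URL for chromedriver or chrome."""
    return f"{BASE}/{version}/{platform}/{product}-{platform}.zip"


def major_version(text):
    """Extract the major number from text such as 'ChromeDriver 134.0.6998.88 (...)'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1))


def versions_match(chrome_version, driver_version):
    """True when Chrome and chromedriver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)
```

Running "chromedriver --version" in cmd prints a line such as "ChromeDriver 134.0.6998.88 (...)", which major_version can parse; Chrome's own version is shown on the chrome://version page.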

From this point on, everything is automated, headless and more silent than ever before.

The amount of data imported is huge! I have included dozens of new custom fields. IF YOU ARE AS DATA HUNGRY AS I AM, THE MOST DATA CAN BE COLLECTED IF YOU CHECK AND UNCHECK OPTIONS AS IN THE SCRIPTS CONFIGURATOR SCREENSHOTS BELOW. The updated table with all the fields in these 3 scripts can be found here. To see them all, you have to use the Classic skin, or add the custom fields to your PVD and create your own custom skin from there, or use one of my skins from here, once I complete them and adjust them for the final Selenium v4 scripts.


Examine the table. That is the only way you will learn which fields come from which movie/person page and whether you want them or not. The fewer you want, the faster PVD will be.

Please feel free to test the scripts and give me feedback if something doesn't work. By "it doesn't work" I mean this: everything works, and the only things that can go wrong are that the sites have changed their HTML layout and again not all fields are available, or that Chrome updated itself to a higher version and you didn't download and extract the corresponding chromedriver version. The best indicator of this is that no .log file is created in the "\Scripts\Tmp\" folder. I will fix whatever you report within a month at most from the first report, to give us all time to collect and report as many issues as possible.
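Checking for that .log file can be scripted as well. A minimal sketch; the C:\PVD path below is a hypothetical install location, so point it at your own "Scripts\Tmp" folder.

```python
from pathlib import Path


def find_logs(tmp_dir):
    """Return the .log files in the Scripts\\Tmp folder, newest first."""
    folder = Path(tmp_dir)
    if not folder.is_dir():
        return []
    logs = [p for p in folder.glob("*.log") if p.is_file()]
    return sorted(logs, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    logs = find_logs(r"C:\PVD\Scripts\Tmp")  # hypothetical location
    if logs:
        print("Latest log:", logs[0])
    else:
        print("No .log file found: check that chromedriver matches your Chrome version.")
```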

What I have learned


On this long journey, I learned how hard coding work is. I also had to learn quite deeply about Pascal/Delphi, about Python, and, most frustratingly, about AHK! I had to revise all the scripts from scratch several times, either because of the concepts I was developing along the way, or because IMDb and FilmAffinity were changing their layouts. For example, just yesterday a new Chrome version was released and my chromedriver didn't work anymore, so I had to download its new version too. Also, just 2 days ago I learned that FilmAffinity introduced AKA titles for some movies, so I had to update the FA script again. And so on and on for 4 months. Thus, I learned to appreciate this work. Most importantly, I now appreciate even more EasyVVV's, and especially Ivek's, work for more than a decade (!!!) to keep PVD alive for us!

So, humbly, I dedicate this hard work to EasyVVV, but most of all, and before anyone else, I dedicate it to IVEK and to the memory of his late mother! I HOPE IVEK CAN IMAGINE HOW GRATEFUL TO HIM I PERSONALLY AM, AFTER I REALIZED HOW HARD THIS ALL WAS! GOD BLESS YOU IVEK!
« Last Edit: March 18, 2025, 12:39:52 am by afrocuban »

afrocuban
New Selenium Scripts Configurator for PVD Selenium MOD v4
« Reply #1 on: March 13, 2025, 11:58:59 pm »
As I wrote, I rewrote the Scripts Configurator practically from scratch in order to bring in all the options for automating the scripts' behaviour. It is now resizable and has scrollbars, so we can add as many options from the scripts as we wish.

Also, most of the data can be imported if you check and uncheck options as in the screenshots below. Feel free to test, though. Don't be afraid! You can't mess up your database, because backup files are created anyway!


READ THE MESSAGES SHOWN UPON CLICKING THE "SAVE" BUTTON IN THE CONFIGURATOR. THOSE MESSAGES ARE VERY INFORMATIVE, HELPFUL AND ESSENTIAL TO UNDERSTANDING WHAT IS GOING ON.
« Last Edit: March 14, 2025, 12:09:46 am by afrocuban »

afrocuban
Re: PVD Selenium MOD v4
« Reply #2 on: March 14, 2025, 12:03:02 am »
Due to the post limit, this message shows the Scripts Configurator window shrunk as a proof of concept, with horizontal and vertical scrollbars. I had no idea how huge a challenge it would be to add scrollbars to it. Such a feature is pretty hard to get in AHK, and I lost at least 2 weeks getting it right, so maybe in the future I will build this with Python. I have already tested that.

afrocuban
IMDb ALL-IN-ONE SCRIPT
« Reply #3 on: March 23, 2025, 01:55:13 am »
IMPORTANT!!!

A few hours ago, IMDb completely changed the HTML layout of the /fullcredits page, so the scripts no longer work with that page. Soon, the /reference page will be changed too. I know because I got popups offering me a peek at a new "Reference" page. So, until that happens, I will not update the scripts, because both pages will share the same code again and it will be easier to change them. For now I made a quick fix so that everything works if you check the options in the Configurator as I suggested earlier. In addition, you have to check the "Download the Cast or Credit (text only) provider page to retrieve the full information. Or else, only the info from the main movie page will be downloaded." option, and download the fullcredits page too!!! This should work until the /reference page changes, or until any other page changes in the meantime.

And it happened just as I finished the "all-in-one" script and was successfully doing the final tests. Here's the pack.


Quote
So, with one IMDb script you get movies, series and the episode list, and then you apply the same script to the episodes.


Also, a new search window is introduced, with different types of search and a 10-second countdown that defaults to "general" search.

It took only 600 additional lines compared to the Movie script, including a lot of commented-out lines, plus one simple Python script, to get all of this.

Extract and overwrite existing scripts with this pack.


I will soon start reviving the AllMovie and RottenTomatoes scripts. I will not revive any other script.
« Last Edit: March 23, 2025, 02:47:15 am by afrocuban »

Miguelh1020

  • User
  • Posts: 47
Re: PVD Selenium MOD v4 IMDb Movie, People and FilmAffinity Scripts
« Reply #4 on: September 12, 2025, 08:42:42 pm »
Hello! I'm trying to make it work with this new system, but when I open scripts.7z, Windows flags SeleniumPVDbScriptsConfig-v4.exe as a threat and won't let me unzip it.

Ivek23

  • Global Moderator
  • Posts: 2872
Re: PVD Selenium MOD v4 IMDb Movie, People and FilmAffinity Scripts
« Reply #5 on: September 13, 2025, 06:48:04 am »
Quote
Hello! I'm trying to make it work with this new system, but when I open scripts.7z, Windows flags SeleniumPVDbScriptsConfig-v4.exe as a threat and won't let me unzip it.

You probably mean your antivirus program. It's a known issue; in my experience, NOD32 Antivirus did the same. The solution: when the antivirus program quarantined the SeleniumPVDbScriptsConfig-v4.exe file, I restored it from quarantine and excluded it from further scanning. Whenever necessary, I repeated this procedure, which prevented the antivirus program from quarantining it again.

If it is still not possible to unpack the file, let me know and I'll help you do it another way.

And one more notice:

You need the SeleniumPVDbScriptsConfig-v4.exe file to be able to configure the script for transferring information.

It is very likely that the IMDb scripts will not work.
Ivek23
Win 10 64bit (32bit)   PVD v0.9.9.21, PVD v1.0.2.7, PVD v1.0.2.7 + MOD


afrocuban
Re: PVD Selenium MOD v4 IMDb Movie, People and FilmAffinity Scripts
« Reply #6 on: October 19, 2025, 12:51:51 pm »
Quote
Soon, the /reference page will be changed too. I know because I got popups offering me a peek at a new "Reference" page. So, until that happens, I will not update the scripts, because both pages will share the same code again and it will be easier to change them.


Hello to all. As I already said in March, and as we all know now, this has happened. I was busy in the meantime and had no time to deal with it. There are changes across many IMDb pages, but the reference page is the culprit. I will need a couple of months to fix it, since, as you all know, I am not a programmer and I have to remind myself about everything, especially the special cases. For now, you can pull a significant portion of the data without the reference page (unfortunately, not the whole cast). To do that, if you are using my scripts, open the Scripts Configurator and uncheck the option to download the Reference page; PVD will restart. Then, in "Set Overwrite options...", check all the data you want to download (Studio, etc.) and restart PVD. Then run the script, and this is what you can get at the moment.

After March, I made some minor adjustments to the scripts and started working on the Reference page, so I'm uploading them so you can get the same amount of data as I do. Back up your existing scripts, then overwrite them with these, all in the "Scripts" folder. Reminder: these scripts are just a starting point for fixing them, but they should get you more data than the March scripts can get you now, so please do not ask for support for these scripts. I know they don't fully work.


What should be promising for you is that I'm not planning to stop using PVD, so I will definitely fix the scripts at some point. Please be patient: if you don't see me here for a long time, it just means I haven't fixed the scripts yet and I'm still working on them. Meanwhile, just add your movies to PVD and update them with full data later. That is exactly what I've been doing recently.

Best regards
« Last Edit: October 19, 2025, 03:03:52 pm by afrocuban »

Ivek23
Re: PVD Selenium MOD v4 IMDb Movie, People and FilmAffinity Scripts
« Reply #7 on: October 20, 2025, 04:19:15 pm »
IMDB_Movie_[EN][Selenium]-v4.psf

Between lines 138 and 139 is this:
Code: [Select]
  GET_FULL_CONNECTIONS  = False ; //To call Function ParsePage_IMDBMovieCONNECTIONS. Not used for parsing, since it is enough to choose both SIMPLE and COMPLEX connections.   --> to "True".
Between lines 7992 and 7993 is this:
Code: [Select]
If GET_FULL_CONNECTIONS Then
Begin


However, when you run SeleniumPVDbScriptsConfig v4 in PVD, uncheck all the Connections settings and then restart PVD, the following happens.

IMDB_Movie_[EN][Selenium]-v4.psf does not work.

The following parts of the code are missing.

Between lines 138 and 139 is missing:
Code: [Select]
  GET_FULL_CONNECTIONS  = False ; //To call Function ParsePage_IMDBMovieCONNECTIONS. Not used for parsing, since it is enough to choose both SIMPLE and COMPLEX connections.   --> to "True".
However, between lines 7992 and 7993, this
Code: [Select]
If GET_FULL_CONNECTIONS Then
Begin

changes to this:
Code: [Select]
If GET_FULL_CONNECTIONS  = False ;
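The symptom above looks like a line replacement that matches the constant's name anywhere instead of only its declaration. How the configurator (an AHK program) actually performs the replacement isn't shown in this thread, so the following Python is only a hypothetical sketch of a safer toggle: the pattern is anchored to the start of a line and requires "=" plus True/False, so "If GET_FULL_CONNECTIONS Then" can never match, and nothing is returned unless exactly one declaration is found.

```python
import re


def set_bool_constant(source, name, value):
    """Set 'NAME = True/False ;' in a .psf script, touching nothing else.

    Only a line that starts with NAME, followed by '=' and a boolean literal,
    is changed; usages such as 'If NAME Then' never match.
    """
    literal = "True" if value else "False"
    pattern = re.compile(
        rf"^(?P<head>[ \t]*{re.escape(name)}[ \t]*=[ \t]*)(?:True|False)\b",
        re.MULTILINE,
    )
    new_source, count = pattern.subn(lambda m: m.group("head") + literal, source)
    if count != 1:
        # Refuse to save a script whose structure is not what we expect.
        raise ValueError(f"expected one declaration of {name}, found {count}")
    return new_source


sample = (
    "  GET_FULL_CONNECTIONS  = False ; //To call Function ParsePage_IMDBMovieCONNECTIONS.\n"
    "If GET_FULL_CONNECTIONS Then\n"
    "Begin\n"
)
print(set_bool_constant(sample, "GET_FULL_CONNECTIONS", True))
```

Here the declaration flips to True while the "If GET_FULL_CONNECTIONS Then" and "Begin" lines come out unchanged.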


Add the following code to the script
Code: [Select]
//(*
function CustomStringReplace(const Source: string; const OldPattern: array of string; const NewPattern: array of string): string;
var
  i: Integer;
  ResultString: string;
begin
  ResultString := Source;
  for i := Low(OldPattern) to High(OldPattern) do
  begin
    ResultString := StringReplace(ResultString, OldPattern[i], NewPattern[i], True, False, True);
  end;
  Result := ResultString;
end; 
//*)

Then you can add the following line to the Metascore code:
Code: [Select]
ItemValue := CustomStringReplace(ItemValue, ['0</', '1</', '2</', '3</', '4</', '5</', '6</', '7</', '8</', '9</'], [',0', ',1', ',2', ',3', ',4', ',5', ',6', ',7', ',8', ',9']);
//ItemValue := StringReplace(ItemValue, '0</', '.0', True, False, True);
//ItemValue := StringReplace(ItemValue, '1</', '.1', True, False, True);
//ItemValue := StringReplace(ItemValue, '2</', '.2', True, False, True);
//ItemValue := StringReplace(ItemValue, '3</', '.3', True, False, True);
//ItemValue := StringReplace(ItemValue, '4</', '.4', True, False, True);
//ItemValue := StringReplace(ItemValue, '5</', '.5', True, False, True);
//ItemValue := StringReplace(ItemValue, '6</', '.6', True, False, True);
//ItemValue := StringReplace(ItemValue, '7</', '.7', True, False, True);
//ItemValue := StringReplace(ItemValue, '8</', '.8', True, False, True);
//ItemValue := StringReplace(ItemValue, '9</', '.9', True, False, True);
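For anyone who prototypes such transformations in Python first, here is a rough equivalent of CustomStringReplace: each old/new pair is applied in order and every occurrence is replaced. It is a sketch, not PVD code: it does not reproduce the extra Boolean flags of PVD's StringReplace (a plain, case-sensitive replace-all is used), and unlike the Pascal version it checks that both pattern lists have the same length. The "7</span>" input is made up for illustration.

```python
def custom_string_replace(source, old_patterns, new_patterns):
    """Apply each old -> new pair in order, replacing every occurrence."""
    if len(old_patterns) != len(new_patterns):
        raise ValueError("old_patterns and new_patterns differ in length")
    result = source
    for old, new in zip(old_patterns, new_patterns):
        result = result.replace(old, new)
    return result


# The Metascore call above: '0</' -> ',0', '1</' -> ',1', ... '9</' -> ',9'.
METASCORE_OLD = [f"{d}</" for d in "0123456789"]
METASCORE_NEW = [f",{d}" for d in "0123456789"]

print(custom_string_replace("7</span>", METASCORE_OLD, METASCORE_NEW))  # ,7span>
```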

The same can be done in the date code:
Code: [Select]
    //Date ~Updated~ (choose simple or verbose version)
        Date := DateToStr(CurrentDateTime);
        //AddFieldValueXML('viewdate', Date); //Only date, don't admit time-. Set Seen value at the same time.
        //AddFieldValueXML('moddate', Date + ' ' + TimeToStr(CurrentDateTime)); //Block the dB saving
        ExplodeString(Date, DateParts, '-');
        Date:=DateParts[2]+'.'+ DateParts[1]+'.'+DateParts[0];
        Date := CustomStringReplace(Date, ['01.', '02.', '03.', '04.', '05.', '06.', '07.', '08.', '09.'], ['1.', '2.', '3.', '4.', '5.', '6.', '7.', '8.', '9.']);
        //Date := StringReplace(Date, '01.', '1.', True, True, False);
        //Date := StringReplace(Date, '02.', '2.', True, True, False);
        //Date := StringReplace(Date, '03.', '3.', True, True, False);
        //Date := StringReplace(Date, '04.', '4.', True, True, False);
        //Date := StringReplace(Date, '05.', '5.', True, True, False);
        //Date := StringReplace(Date, '06.', '6.', True, True, False);
        //Date := StringReplace(Date, '07.', '7.', True, True, False);
        //Date := StringReplace(Date, '08.', '8.', True, True, False);
        //Date := StringReplace(Date, '09.', '9.', True, True, False);
        //AddCustomFieldValueByName('Updated', Date); // (Left for FA Script)
        //AddCustomFieldValueByName('Updated0', Date + ' at ' + TimeToStr(CurrentDateTime)); // Saved for RottenTomatoes
        AddCustomFieldValueByName('IUpdated', Date + ' at ' + TimeToStr(CurrentDateTime) + ' • ' + SCRIPT_FILE_NAME + ' ' + SCRIPT_VERSION);  // Annoying
        LogMessage('Function ParsePage -    Provider data info retrieved Ok in ' + DateToStr(CurrentDateTime) + ' ' + TimeToStr(CurrentDateTime) + '| (~Updated~)');
        Mode := smFinished;
        LogMessage('Function ParsePage smNormal END====================== |');
        Exit;
    End;
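The zero-stripping above is needed because the date is first rebuilt as DD.MM.YYYY from its parts (assuming DateToStr returns YYYY-MM-DD). Outside PVD, the same D.M.YYYY text can be produced directly from the day, month and year numbers, which avoids string patching altogether. A small Python sketch; the HH:MM:SS time format and the sample script name and version are assumptions, since TimeToStr output and SCRIPT_VERSION depend on the PVD setup.

```python
from datetime import datetime


def pvd_date(moment):
    """D.M.YYYY without leading zeros, e.g. 5.3.2025 (no string patching needed)."""
    return f"{moment.day}.{moment.month}.{moment.year}"


def iupdated_value(moment, script_file_name, script_version):
    """Text in the shape of the 'IUpdated' custom field built above."""
    return f"{pvd_date(moment)} at {moment:%H:%M:%S} • {script_file_name} {script_version}"


if __name__ == "__main__":
    print(iupdated_value(datetime.now(), "IMDB_Movie_[EN][Selenium]-v4.psf", "v4"))
```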
« Last Edit: October 20, 2025, 04:21:02 pm by Ivek23 »


Ivek23
Re: PVD Selenium MOD v4 IMDb Movie, People and FilmAffinity Scripts
« Reply #8 on: October 21, 2025, 07:36:21 am »
You need the latest chromedriver.exe, otherwise it won't work.

And one more tip:

For the Episode List page, it would be necessary and even better if a script was made specifically for it.


afrocuban
Re: PVD Selenium MOD v4 IMDb Movie, People and FilmAffinity Scripts
« Reply #9 on: October 24, 2025, 01:33:43 am »

Quote
For the Episode List page, it would be necessary and even better if a script was made specifically for it.


Episode list works totally fine for me, even now. It would probably be impossible for me to create a new script. We can look at it like this: the Episode list already has its own script, the Selenium script "Selenium_Chrome_IMDB_Episode_List_page_v4.py", which produces a txt file with all the episodes, and the .psf script just scrapes that txt file, so I am not sure what we could achieve with an additional .psf script whose only purpose would be to scrape the txt file.

Anyway, I have corrected the FilmAffinity script and made some improvements to the IMDb Movie scripts (for example, scraping of the dynamically loaded storyline section is fixed). I have improved the FilmAffinity script's speed enormously! For this script to work properly and fast, you need to:
1. Install Python 3.12+.
2. Install psutil in cmd, with
Quote
pip install psutil
to hopefully prevent Selenium hangups when HTML elements aren't found on the page.
3. Download the scripts I'm uploading in this post and overwrite your existing ones.
4. In the Scripts Configurator, deselect the Reference page; PVD will restart. Then select "Studio", "Description" and any others you want, and do not restart PVD. Try to import data; you should now get the director, cast, tagline and some other original and related custom fields scraped from the Main page. If you don't get the cast, tagline and director, restart PVD manually and try importing again. I am not sure about this second restart, so try both ways. One will work for sure.
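The post doesn't show how the scripts use psutil. One common use, sketched here under that assumption, is cleaning up after a hang: kill the chromedriver process together with the headless Chrome processes it started, instead of leaving them running in the background. In Selenium 4 the chromedriver subprocess is normally reachable as driver.service.process, so a call could look like kill_process_tree(driver.service.process.pid); the function itself is hypothetical.

```python
def kill_process_tree(pid, timeout=3.0):
    """Kill a process and all of its children; return the PIDs that were targeted.

    Intended for a stuck chromedriver.exe: its children are the Chrome
    processes it launched.
    """
    import psutil  # installed with: pip install psutil

    try:
        parent = psutil.Process(pid)
        procs = parent.children(recursive=True) + [parent]
    except psutil.NoSuchProcess:
        return []  # already gone, nothing to do
    for proc in procs:
        try:
            proc.kill()
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            pass  # exited meanwhile, or not ours to kill
    psutil.wait_procs(procs, timeout=timeout)  # wait until they are really gone
    return [proc.pid for proc in procs]
```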

I will not provide support for these scripts until I finish, because I know they still don't fully work. I just want to share with you the same amount of data I'm getting at the moment, whenever a significant improvement is done. From the pictures you can get a sense of what I'm getting now with the IMDb script.

Once I finish IMDb script we will test and correct it together.
« Last Edit: October 24, 2025, 01:45:37 am by afrocuban »

Ivek23
Re: PVD Selenium MOD v4 IMDb Movie, People and FilmAffinity Scripts
« Reply #10 on: October 24, 2025, 06:03:06 am »

Quote
Quote
For the Episode List page, it would be necessary and even better if a script was made specifically for it.

Episode list works totally fine for me, even now. It would probably be impossible for me to create a new script. We can look at it like this: the Episode list already has its own script, the Selenium script "Selenium_Chrome_IMDB_Episode_List_page_v4.py", which produces a txt file with all the episodes, and the .psf script just scrapes that txt file, so I am not sure what we could achieve with an additional .psf script whose only purpose would be to scrape the txt file.

We'll see how it works now. Before these fixes, there was a problem with the Episode List code for the movie title and PVD froze, but if that code in the script was disabled and the episode field was unchecked, it worked without any problems.


afrocuban
Re: PVD Selenium MOD v4 IMDb Movie, People and FilmAffinity Scripts
« Reply #11 on: October 24, 2025, 10:47:52 pm »
I have optimized the Selenium_Chrome_Base_page_v4.py script, so downloading the IMDb main page should now be dramatically faster: around 18 seconds on my computer.

Offline Ivek23

  • Global Moderator
  • *****
  • Posts: 2872
    • View Profile
Re: PVD Selenium MOD v4 IMDb Movie, People and FilmAffinity Scripts
« Reply #12 on: October 25, 2025, 09:20:01 am »
Quote
I have optimized the Selenium_Chrome_Base_page_v4.py script, so downloading the IMDb main page should now be dramatically faster: around 18 seconds on my computer.

The Selenium_Chrome_Base_page_v4.py script works fine. However, it doesn't work at all with Firefox and Geckodriver options.

I changed the Geckodriver path in Selenium_Chrome_Base_page_v4.py for the Firefox script; it is in the .7z file.
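A single base script can serve both browsers if the driver executable and the Selenium classes are picked from the browser name. A minimal sketch, assuming chromedriver.exe and geckodriver.exe both sit in the Scripts folder; the names DRIVERS, driver_path and make_driver are hypothetical, while the Service/Options classes and the "-headless" Firefox switch are standard Selenium 4 and Firefox usage.

```python
from pathlib import Path

# Hypothetical mapping from browser name to its driver in the Scripts folder.
DRIVERS = {
    "chrome": "chromedriver.exe",
    "firefox": "geckodriver.exe",
}


def driver_path(browser, scripts_dir):
    """Driver executable for the chosen browser, inside the Scripts folder."""
    try:
        exe = DRIVERS[browser.lower()]
    except KeyError:
        raise ValueError(f"unsupported browser: {browser!r}") from None
    return Path(scripts_dir) / exe


def make_driver(browser, scripts_dir, headless=True):
    """Start Chrome or Firefox with the matching driver from the Scripts folder."""
    from selenium import webdriver  # imported lazily: only needed at run time

    path = str(driver_path(browser, scripts_dir))
    if browser.lower() == "firefox":
        from selenium.webdriver.firefox.options import Options
        from selenium.webdriver.firefox.service import Service
        options = Options()
        if headless:
            options.add_argument("-headless")
        return webdriver.Firefox(service=Service(executable_path=path), options=options)

    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    options = Options()
    if headless:
        options.add_argument("--headless=new")
    return webdriver.Chrome(service=Service(executable_path=path), options=options)
```

Keeping the browser-to-driver mapping in one table means a wrong or missing driver path fails early with a clear error instead of deep inside Selenium.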
« Last Edit: October 26, 2025, 07:32:45 am by Ivek23 »


afrocuban
Re: PVD Selenium MOD v4 IMDb Movie, People and FilmAffinity Scripts
« Reply #13 on: October 26, 2025, 10:55:50 pm »
Great. I have fixed Movie Connections and the full AKA list. I also fixed Selenium_Chrome_Movie_Additional_pages_v4.py, so it clicks on "See more" and similar objects again. I will not upload the next iterations because I want to fix the whole Base page function first. It looks to me like IMDb is making it harder and harder to scrape data: I'm now often seeing errors where IMDb kills the connection to the host when additional pages are loaded with Selenium, so I have to investigate Selenium and Python options to mimic human browsing as closely as possible.
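One widely used way to cope with dropped connections, and to avoid requesting pages at a fixed, machine-like pace, is to retry with exponential backoff plus a random delay. This is a generic sketch, not the author's approach: with_retries is a hypothetical helper, and with Selenium you would pass retry_on=(WebDriverException,) from selenium.common.exceptions instead of the default built-in exceptions.

```python
import random
import time


def with_retries(action, attempts=4, base_delay=2.0, max_jitter=1.5,
                 retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call action() and return its result, retrying on the given errors.

    After failure n the wait is base_delay * 2**(n-1) seconds plus a random
    0..max_jitter seconds, so retries slow down and never follow a fixed rhythm.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, max_jitter))
```

A call could look like with_retries(lambda: driver.get(url), retry_on=(WebDriverException,)); the sleep parameter exists only so the waiting can be replaced when testing.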

If you still want the current versions of my scripts, let me know and I will upload them with instructions on how to get the data I'm getting at the moment.