Topic: PVD Selenium MOD v4 IMDb Movie, People and FilmAffinity Scripts (Read 19592 times)


afrocuban
« Reply #20 on: November 06, 2025, 03:00:47 pm »
I forgot to upload the .ahk source of the Script Configurator .exe.


SeleniumPVDbScriptsConfig-v4.exe is compiled with Ahk2Exe for AutoHotkey v1.1.37.02, using the "U32 (default) bin" option, without compression.
« Last Edit: November 06, 2025, 06:13:24 pm by afrocuban »

afrocuban
« Reply #21 on: November 07, 2025, 11:00:17 pm »
I have finished the CompanyCredits function, and now the Reference page is not needed at all. Consequently, I have updated, redesigned and recompiled the Script Configurator.

Now I need to update the FullCredits function, i.e. get the full cast & crew, directors for series, producers and composers, and then everything will be done. I will not remove the Reference page code. I will only comment it out, so it can possibly be used again if IMDb changes its pages in the future.

Most probably I will not post again until everything is finished.

afrocuban
« Reply #22 on: November 10, 2025, 12:13:09 am »
I have completed the FullCredits function, so all the data can now be imported into PVD again. Before I publish it, I want to check 2 things:
1. What to do with the Reference page code, and whether I should exclude its options from the Script Configurator, since it is now completely unnecessary.
2. The People script: if I can fix it quickly, I will publish all the scripts and files again in one final package for this IMDb HTML layout change.

After that, please check the scripts and let me know what doesn't work.

Ivek23
« Reply #23 on: November 10, 2025, 07:46:03 am »
First of all, the ParsePage_IMDBMovieREFERENCE function should be moved toward the end of the script, just before the ParsePage function.

Secondly, the entire Reference page code should be kept and, as mentioned above, placed before the ParsePage function, or better still moved to the very end of the script, where the History of changes already is, so that it can be re-enabled in the future if necessary or kept completely disabled. All of its options should also be removed from both the script code and the Script Configurator because, as mentioned, they are no longer needed.

As for the People script, I don't know what works and what doesn't, because I don't use it.
Ivek23
Win 10 64bit (32bit)   PVD v0.9.9.21, PVD v1.0.2.7, PVD v1.0.2.7 + MOD


afrocuban
« Reply #24 on: November 11, 2025, 01:52:13 pm »
Yes, I have just commented out the Reference code in the script, and I already intended to leave it there, and everywhere else it is mentioned, to preserve the logic for the future. What I actually meant was commenting it out in the Script Configurator too, so that inexperienced users don't click those options expecting to get data from the /reference page. I will move the function before the ParsePage function, though; that is a nice tip.

Meanwhile, I have introduced another function:

Quote
function WaitForPageFile(const FileName, PageLabel: string;
  InitialWait: Integer; StabilizeMs: Integer): string; //BlockOpen
var
  i, currentWait: Integer;
  tryResult: string;
begin
  Result := '';
  i := 0;

  // provide defaults manually
  if InitialWait = 0 then
    InitialWait := 2000;
  if StabilizeMs = 0 then
    StabilizeMs := 1000;

  currentWait := InitialWait;

  while not FileExists(FileName) do
  begin
    LogMessage('Function DownloadPage - Waiting ' + IntToStr(currentWait div 1000) +
               's for the presence of: ' + FileName + '|');
    Wait(currentWait);

    // set the wait for the next attempt (per-attempt schedule, not strictly increasing)
    case i of
      0: currentWait := 5000;
      1: currentWait := 15000;
      2: currentWait := 10000;
      3: currentWait := 10000;
      4: currentWait := 15000;
      5: currentWait := 15000;
    end;

    Inc(i);
    if i = INTERNET_TEST_ITERATIONS then
    begin
      case MessageBox('IMDb Movie Function DownloadPage - Too many faulty attempts to internet connection for ' + PageLabel +
                      '. Cancel, Retry, or Continue (Ignore)? NOTE: IF YOU PRESS IGNORE YOU WILL NOT GET DATA FROM THAT PAGE, SO CONSIDER TO RETRY OR TO CANCEL AND START DOWNLOAD AGAIN! IMDb really makes it harder and harder to get the data.',
                      SCRIPT_FILE_NAME, 2) of
        3: // Abort (treated as Cancel)
        begin
          LogMessage('Function DownloadPage for ' + PageLabel +
                     ' ended with NO INTERNET connection ===============|');
          Result := '';
          Exit;
        end;
        4: // Retry
        begin
          i := 0;
          currentWait := InitialWait;
        end;
        5: // Ignore
        begin
          LogMessage('Function DownloadPage - Creating dummy ' + PageLabel +
                     ' HTML file due to Ignore selection|');
          with TStringList.Create do
          try
            Add('<html><body>Dummy ' + PageLabel + ' due to user Ignore.</body></html>');
            SaveToFile(FileName);
          finally
            Free;
          end;
          Break;
        end;
      end;
    end;
  end;

  // stabilization wait
  LogMessage('Function DownloadPage - ' + PageLabel +
             ' file detected, waiting extra ' + IntToStr(StabilizeMs) + 'ms to stabilize...');
  Wait(StabilizeMs);

  // manual error handling instead of try/except
  if not FileExists(FileName) then
  begin
    LogMessage(ProcException('FileError', 'NOT_FOUND: ' + FileName));
    Exit;
  end;

  tryResult := FileToString(FileName); // if your environment throws, replace with safe read
  if tryResult = '' then
    LogMessage(ProcException('FileError', 'FAILED to read ' + FileName))
  else
  begin
    Result := ConvertEncoding(tryResult, 65001);
    LogMessage(ProcException('FileRead', 'SUCCESS: ' + FileName));
  end;
end; //BlockClose
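For readers more at home in Python (the language of the Selenium side of the MOD), here is a minimal sketch of the same wait-then-read pattern. It is not the script's code: the function name, the `ask_user` callback (standing in for the Abort/Retry/Ignore MessageBox) and `max_iterations` (standing in for INTERNET_TEST_ITERATIONS, whose real value is not shown here, so 6 is an assumption) are illustrative. The `sleep` parameter exists only so the loop can be exercised without actually waiting.

```python
import os
import time
from typing import Callable, Optional

# Wait (ms) applied after each failed check, copied from the "case i of 0..5" table above;
# once the table runs out, the last wait is simply reused.
WAIT_SCHEDULE_MS = [5000, 15000, 10000, 10000, 15000, 15000]


def wait_for_page_file(
    file_name: str,
    page_label: str,
    initial_wait_ms: int = 2000,
    stabilize_ms: int = 1000,
    max_iterations: int = 6,  # assumption: stands in for INTERNET_TEST_ITERATIONS
    ask_user: Optional[Callable[[str], str]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until file_name exists, then return its UTF-8 text ('' on cancel or read failure).

    ask_user(page_label) should answer "retry", "ignore" or "cancel"; without it we cancel.
    """
    attempt = 0
    current_wait = initial_wait_ms
    while not os.path.exists(file_name):
        sleep(current_wait / 1000)
        if attempt < len(WAIT_SCHEDULE_MS):
            current_wait = WAIT_SCHEDULE_MS[attempt]
        attempt += 1
        if attempt == max_iterations:
            choice = ask_user(page_label) if ask_user else "cancel"
            if choice == "retry":
                attempt, current_wait = 0, initial_wait_ms
            elif choice == "ignore":
                # Write a placeholder page so the parser still gets valid HTML.
                with open(file_name, "w", encoding="utf-8") as f:
                    f.write(f"<html><body>Dummy {page_label} due to user Ignore.</body></html>")
                break
            else:  # "cancel" or any unexpected answer
                return ""
    sleep(stabilize_ms / 1000)  # give the downloader time to finish writing the file
    try:
        with open(file_name, encoding="utf-8") as f:
            return f.read()
    except (OSError, UnicodeDecodeError):
        return ""
```

As in the Pascal version, Cancel returns an empty string immediately, while Ignore writes a dummy page and falls through to the normal stabilize-and-read path.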

So now, instead of this snippet for every page:

Quote

     
// Initialize currentWait for the FullCredits file
currentWait := 5000;  // Start with 5 seconds
// Wait for the FullCredits file to finish downloading
If Not(((USE_SAVED_PVDCONFIG) And ((ConfigOptions[8] = '0') And (ConfigOptions[10] = '0'))) Or
   (GET_FULL_CREDIT_FROM_REFERENCE And ((Pos('Series', MediaType) = 0) And (Pos('Series', GetFieldValueXML('category')) = 0)))) Then Begin // Also Known As (FullCredits)
   i := 0;
   currentWait := 2000;  // Initialize wait time
   while not FileExists(FilePath + FileTitleFullCredits) do begin
      LogMessage('Function DownloadPage - Waiting ' + IntToStr(currentWait div 1000) + 's (because of the people like Alfonso Cuaron with 268 wins and 208 nominations at the moment of writing this script) for the presence of: ' + FilePath + FileTitleFullCredits);
      wait(currentWait);
      // Set the wait time for the next iteration
      case i of
         0: currentWait := 5000;   // 5 seconds
         1: currentWait := 15000;  // 15 seconds
         2: currentWait := 10000;  // 10 seconds
         3: currentWait := 10000;  // 10 seconds
         4: currentWait := 15000;  // 15 seconds
         5: currentWait := 15000;  // 15 seconds
      end;
      i := i + 1;
      if i = INTERNET_TEST_ITERATIONS then
      begin
         case MessageBox('IMDb Movie Function DownloadPage - Too many faulty attempts to internet connection for FullCredits. Cancel, Retry, or Continue (Ignore)? NOTE: IF YOU PRESS IGNORE YOU WILL NOT GET DATA FROM THAT PAGE, SO CONSIDER TO RETRY OR TO CANCEL AND START DOWNLOAD AGAIN! IMDb really makes it harder and harder to get the data.', SCRIPT_FILE_NAME, 2) of
            3: // Abort -> treat as Cancel
            begin
               LogMessage('Function DownloadPage for FullCredits END with NO INTERNET connection =============== |');
               Result := '';
               Exit;
            end;
            4: // Retry
            begin
               i := 0;
               currentWait := 2000;  // Reset wait time
            end;
            5: // Ignore -> create dummy file
            begin
               LogMessage('Creating dummy FullCredits HTML file due to Ignore selection...');
               with TStringList.Create do
               try
                  Add('<html><body>Dummy FullCredits due to user Ignore.</body></html>');
                  SaveToFile(FilePath + FileTitleFullCredits);
               finally
                  Free;
               end;
               Result := FilePath + FileTitleFullCredits;
            end;
         end;
      end;
   end;

   // Add a short stabilization wait after the file is recognized
   LogMessage('Function DownloadPage - CHANGETHISWITHPROPERFILENAME file detected, waiting extra 2s to stabilize...');
   Wait(2000); // wait 2 seconds (adjust as needed)

   WebText := FileToString(FilePath + FileTitleFullCredits);
   WebText := ConvertEncoding(WebText, 65001); // UTF-8
   FullCreditsPageDownloaded := True;
   LogMessage('Function DownloadPage - FullCredits file found: ' + FilePath + FileTitleFullCredits);
   LogMessage('Value of FullCreditsPageDownloaded: ' + BoolToStr(FullCreditsPageDownloaded));
end;


we will have only this:


Quote



if not (USE_SAVED_PVDCONFIG and (ConfigOptions[15] = '0')) then
begin
  WebText := WaitForPageFile(FilePath + FileTitleParentalGuide, 'ParentalGuide', 5000, 2000);
  ParentalGuidePageDownloaded := WebText <> '';
  LogMessage('Function DownloadPage - Value of ParentalGuidePageDownloaded: ' +
             BoolToStr(ParentalGuidePageDownloaded));
end;


That way we will speed up the script and have hundreds, if not thousands, of lines fewer in it.
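The line savings come from writing the per-page wait loop once. A hedged Python sketch of what the calling side can shrink to when the pages are kept in a table: only the ParentalGuide entry and its option index 15 come from the snippet above; the `ExamplePage` row, the function name and the `wait_for_page` callback are hypothetical placeholders.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# (label, file name, Script Configurator option index).
# Only ParentalGuide/15 comes from the post; the second row is made up.
PAGES: List[Tuple[str, str, int]] = [
    ("ParentalGuide", "ParentalGuide.html", 15),
    ("ExamplePage", "ExamplePage.html", 3),  # hypothetical
]


def download_enabled_pages(
    pages: Sequence[Tuple[str, str, int]],
    config_options: Sequence[str],
    use_saved_config: bool,
    wait_for_page: Callable[[str, str], str],
) -> Dict[str, bool]:
    """Call the shared wait helper once per enabled page; return label -> downloaded flag."""
    downloaded: Dict[str, bool] = {}
    for label, file_name, option_index in pages:
        # Same test as: if not (USE_SAVED_PVDCONFIG and (ConfigOptions[n] = '0')) then ...
        if use_saved_config and config_options[option_index] == "0":
            continue
        downloaded[label] = wait_for_page(file_name, label) != ""
    return downloaded
```

Adding a page then means adding one row rather than another copy of the wait loop.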


I will do that and test it for all the repetitive tasks.

Also, I'm exploring ways to keep IMDb from refusing connections from Selenium/Python. At the moment I'm testing creating fake UserData folders and rotating user agents, and it is going well so far.
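The post does not show how the fake UserData folders and rotating user agents are created. A minimal sketch of one way to do it, assuming Chrome's `--user-data-dir` and `--user-agent` command-line switches; the folder prefix, the function name and the user-agent strings are placeholders, not values from the MOD.

```python
import os
import random
import uuid
from typing import List, Optional, Sequence

# Placeholder desktop Chrome user-agent strings; a real list should be kept current.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/140.0.0.0 Safari/537.36",
]


def make_fake_profile_args(
    base_dir: str,
    user_agents: Sequence[str] = USER_AGENTS,
    prefix: str = "pvd_userdata_",  # hypothetical naming scheme
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Create a fresh, empty profile folder and return Chrome arguments that use it."""
    rng = rng or random.Random()
    profile_dir = os.path.join(base_dir, prefix + uuid.uuid4().hex)
    os.makedirs(profile_dir)
    return [
        f"--user-data-dir={profile_dir}",
        f"--user-agent={rng.choice(list(user_agents))}",
    ]
```

Each returned string can be passed to Selenium's `ChromeOptions.add_argument()` before the driver starts. A fresh profile per session carries over no cookies or history, which is also why such folders have to be cleaned up afterwards.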

afrocuban
« Reply #25 on: November 11, 2025, 01:59:41 pm »
As of now, per title, I'm getting these timings:
Main Page downloading: ~18-22sec
Other 9 pages downloading: ~48-55 sec
Image downloading: ~6-8 sec
PVD script processing: ~5-10 sec

So in total, for now, getting the largest possible amount of data per title (especially for titles with many awards, connections, etc.) takes ~77-95 sec, which is quite acceptable for that amount of data.
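The total is just the sum of the per-stage ranges: 18+48+6+5 = 77 s and 22+55+8+10 = 95 s. A small Python sketch of that arithmetic, plus a rough batch estimate that assumes titles are processed one after another and that every title falls inside these ranges (both are assumptions, not measurements):

```python
from typing import Dict, Tuple

# Per-title stage timings in seconds (min, max), taken from the list above.
STAGES: Dict[str, Tuple[int, int]] = {
    "Main page download": (18, 22),
    "Other 9 pages download": (48, 55),
    "Image download": (6, 8),
    "PVD script processing": (5, 10),
}


def total_range(stages: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Sum the per-stage minimums and maximums."""
    return sum(lo for lo, _ in stages.values()), sum(hi for _, hi in stages.values())


def batch_estimate_minutes(
    n_titles: int, stages: Dict[str, Tuple[int, int]] = STAGES
) -> Tuple[float, float]:
    """Rough time in minutes for n titles processed sequentially."""
    lo, hi = total_range(stages)
    return n_titles * lo / 60, n_titles * hi / 60
```

For example, `total_range(STAGES)` returns `(77, 95)`, and `batch_estimate_minutes(60)` returns `(77.0, 95.0)`, i.e. roughly 77 to 95 minutes for 60 titles.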

afrocuban
« Reply #26 on: November 11, 2025, 02:04:52 pm »
Oh, and please remind me to upload purge_tmp_files.vbs if I forget, because I changed it so that it now also deletes the fake UserData folders.
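purge_tmp_files.vbs itself is not shown in the thread. As a rough illustration of the extra cleanup step, here is a Python sketch (not VBScript) that deletes sub-folders whose names start with a given prefix; the `pvd_userdata_` prefix and the function name are assumptions, since the real naming of the fake UserData folders is not given.

```python
import os
import shutil


def purge_fake_user_data(base_dir: str, prefix: str = "pvd_userdata_") -> int:
    """Delete sub-folders of base_dir whose names start with prefix; return how many were removed."""
    removed = 0
    for name in os.listdir(base_dir):
        path = os.path.join(base_dir, name)
        if name.startswith(prefix) and os.path.isdir(path):
            try:
                shutil.rmtree(path)
                removed += 1
            except OSError:
                pass  # e.g. a Chrome process still holds a lock on the folder
    return removed
```

Files that merely share the prefix are left alone, and a folder that is still locked is skipped rather than aborting the whole purge.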

Ivek23
« Reply #27 on: November 11, 2025, 10:06:06 pm »
Ok.


afrocuban
PVD Selenium MOD v4 IMDb Movie, People and FilmAffinity Scripts v4.2
« Reply #28 on: November 13, 2025, 07:40:43 pm »

Here are all the scripts and files, fully updated, fixed and polished, less than a month after I started fixing all 16 of them. I was so happy that I got back into it easily and quickly. I have tested all scripts and files against many edge-case titles and persons, and for me everything worked smoothly and satisfyingly.

They are now faster and more stable, and I no longer face internet interruptions, because I heavily redesigned the most problematic Python Selenium scripts.

If the Selenium_Chrome_Movie_Additional_pages_v4.py script in particular is demanding on your CPU when downloading movies and you experience lags of any kind, open the file in Notepad++ and, on line 375:

Quote
with ThreadPoolExecutor(max_workers=4) as executor:


reduce 4 to 3, 2 or 1 and test it. The lower the number, the longer the files take to download, so find your balance. If you have a good CPU and a lot of RAM, you can even increase it above 4.
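`max_workers` caps how many download tasks run at the same time; any extra tasks wait in the pool's queue until a worker frees up. A minimal sketch of that pattern, assuming a placeholder `fetch` function in place of the script's actual page-download code, which is not shown here (the `download_pages` name is also illustrative):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def download_pages(
    urls: Iterable[str], fetch: Callable[[str], str], max_workers: int = 4
) -> Dict[str, str]:
    """Run fetch(url) for every URL, with at most max_workers calls in flight at once."""
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in input order, so zip() pairs each URL with its page.
        return dict(zip(url_list, executor.map(fetch, url_list)))
```

With `max_workers=1` the pages are fetched strictly one after another. If each task drives its own browser, every additional worker means another browser competing for CPU and RAM, which is the trade-off described above.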

I'd be happy to fine-tune and fix things further, so please send me the details of each case so I can reproduce it and then fix it. If you have any further suggestions, I'd be happy to hear them too, before I forget it all again, but please explain the why and how with specific examples, because I am not a programmer; I just use common sense and AI, and that is the only way I can understand the problem.

My plan is for the versions to stay at v4.2 for a long time, unless something significant in their design changes.


Enjoy!
;) :)
« Last Edit: November 14, 2025, 02:52:36 pm by afrocuban »

afrocuban
« Reply #29 on: November 14, 2025, 01:46:15 am »
My next goal is to add a new switch to the Script Configurator, UPDATE_DYNAMIC_VALUES_ONLY, by adding a few dozen lines to the movie Selenium script so that it calls only the main page and updates only dynamic values such as Rating, Top 250, Bottom 100 and Number of votes, plus the Awards summary when the movie is no more than 2 years old, to catch fresh wins for recent releases.
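The planned switch is essentially a filter on which fields to refresh. A hedged Python sketch of that decision, assuming the 2-year rule is checked at year granularity (a day-exact check would compare full dates instead); the function name and field labels are illustrative, not taken from the scripts:

```python
from datetime import date
from typing import List, Optional

# Dynamic values named in the post; everything else would be left untouched.
DYNAMIC_FIELDS = ["Rating", "Top 250", "Bottom 100", "Number of votes"]


def fields_to_refresh(
    release_year: int, today: Optional[date] = None, awards_window_years: int = 2
) -> List[str]:
    """Fields an UPDATE_DYNAMIC_VALUES_ONLY run would refresh for one title."""
    today = today or date.today()
    fields = list(DYNAMIC_FIELDS)
    # Year-level check: a 2023 release still counts as "recent" anywhere in 2025.
    if today.year - release_year <= awards_window_years:
        fields.append("Awards summary")
    return fields
```

Older titles would then skip the awards page entirely, which keeps such an update run to the main page only.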