Yes, I only commented out the reference code in the script, and I intend to leave it there, and everywhere else it is mentioned, to preserve the logic for the future. I was actually talking about commenting it out in the Script Configurator too, so that inexperienced users don't click those options expecting to get data from the /reference page. I will move the function before the ParsePage function, though; that is a nice tip.
Now, meanwhile, I introduced another function:
function WaitForPageFile(const FileName, PageLabel: string;
  InitialWait: Integer; StabilizeMs: Integer): string; //BlockOpen
var
  i, currentWait: Integer;
  tryResult: string;
begin
  Result := '';
  i := 0;
  // provide defaults manually
  if InitialWait = 0 then
    InitialWait := 2000;
  if StabilizeMs = 0 then
    StabilizeMs := 1000;
  currentWait := InitialWait;
  while not FileExists(FileName) do
  begin
    LogMessage('Function DownloadPage - Waiting ' + IntToStr(currentWait div 1000) +
      's for the presence of: ' + FileName + '|');
    Wait(currentWait);
    // escalate wait times
    case i of
      0: currentWait := 5000;
      1: currentWait := 15000;
      2: currentWait := 10000;
      3: currentWait := 10000;
      4: currentWait := 15000;
      5: currentWait := 15000;
    end;
    Inc(i);
    if i = INTERNET_TEST_ITERATIONS then
    begin
      case MessageBox('IMDb Movie Function DownloadPage - Too many faulty attempts to internet connection for ' + PageLabel +
        '. Cancel, Retry, or Continue (Ignore)? NOTE: IF YOU PRESS IGNORE YOU WILL NOT GET DATA FROM THAT PAGE, SO CONSIDER TO RETRY OR TO CANCEL AND START DOWNLOAD AGAIN! IMDb really makes it harder and harder to get the data.',
        SCRIPT_FILE_NAME, 2) of
        3: // Cancel
          begin
            LogMessage('Function DownloadPage for ' + PageLabel +
              ' ended with NO INTERNET connection ===============|');
            Result := '';
            Exit;
          end;
        4: // Retry
          begin
            i := 0;
            currentWait := InitialWait;
          end;
        5: // Ignore
          begin
            LogMessage('Function DownloadPage - Creating dummy ' + PageLabel +
              ' HTML file due to Ignore selection|');
            with TStringList.Create do
              try
                Add('<html><body>Dummy ' + PageLabel + ' due to user Ignore.</body></html>');
                SaveToFile(FileName);
              finally
                Free;
              end;
            Break;
          end;
      end;
    end;
  end;
  // stabilization wait
  LogMessage('Function DownloadPage - ' + PageLabel +
    ' file detected, waiting extra ' + IntToStr(StabilizeMs) + 'ms to stabilize...');
  Wait(StabilizeMs);
  // manual error handling instead of try/except
  if not FileExists(FileName) then
  begin
    LogMessage(ProcException('FileError', 'NOT_FOUND: ' + FileName));
    Exit;
  end;
  tryResult := FileToString(FileName); // if your environment throws, replace with safe read
  if tryResult = '' then
    LogMessage(ProcException('FileError', 'FAILED to read ' + FileName))
  else
  begin
    Result := ConvertEncoding(tryResult, 65001);
    LogMessage(ProcException('FileRead', 'SUCCESS: ' + FileName));
  end;
end; //BlockClose
So now, instead of this snippet repeated for every page:
// Initialize currentWait for the FullCredits file
currentWait := 5000; // Start with 5 seconds
//Wait for the FullCredits file to finish downloading
If Not(((USE_SAVED_PVDCONFIG) And ((ConfigOptions[8] = '0') And (ConfigOptions[10] = '0'))) Or
  (GET_FULL_CREDIT_FROM_REFERENCE And ((Pos('Series', MediaType) = 0) And (Pos('Series', GetFieldValueXML('category')) = 0)))) Then Begin // Also Known As (FullCredits)
  i := 0;
  currentWait := 2000; // Initialize wait time
  while not FileExists(FilePath + FileTitleFullCredits) do begin
    LogMessage('Function DownloadPage - Waiting ' + IntToStr(currentWait div 1000) + 's (because of the people like Alfonso Cuaron with 268 wins and 208 nominations at the moment of writing this script) for the presence of: ' + FilePath + FileTitleFullCredits);
    wait(currentWait);
    // Increment the wait time for the next iteration
    case i of
      0: currentWait := 5000; // 5 seconds
      1: currentWait := 15000; // 15 seconds
      2: currentWait := 10000; // 10 seconds
      3: currentWait := 10000; // 10 seconds
      4: currentWait := 15000; // 15 seconds
      5: currentWait := 15000; // 15 seconds
    end;
    i := i + 1;
    if i = INTERNET_TEST_ITERATIONS then
    begin
      case MessageBox('IMDb Movie Function DownloadPage - Too many faulty attempts to internet connection for FullCredits. Cancel, Retry, or Continue (Ignore)? NOTE: IF YOU PRESS IGNORE YOU WILL NOT GET DATA FROM THAT PAGE, SO CONSIDER TO RETRY OR TO CANCEL AND START DOWNLOAD AGAIN! IMDb really makes it harder and harder to get the data.', SCRIPT_FILE_NAME, 2) of
        3: // Abort -> treat as Cancel
          begin
            LogMessage('Function DownloadPage for FullCredits END with NO INTERNET connection =============== |');
            Result := '';
            Exit;
          end;
        4: // Retry
          begin
            i := 0;
            currentWait := 2000; // Reset wait time
          end;
        5: // Ignore->create dummy file
          begin
            LogMessage('Creating dummy FullCredits HTML file due to Ignore selection...');
            with TStringList.Create do
              try
                Add('<html><body>Dummy FullCredits due to user Ignore.</body></html>');
                SaveToFile(FilePath + FileTitleFullCredits);
              finally
                Free;
              end;
            Result := FilePath + FileTitleFullCredits;
          end;
      end;
    end;
  end;
  // Add a short stabilization wait after the file is recognized
  LogMessage('Function DownloadPage - CHANGETHISWITHPROPERFILENAME file detected, waiting extra 1s to stabilize...');
  Wait(2000); // wait 2 second (adjust as needed)
  WebText := FileToString(FilePath + FileTitleFullCredits);
  WebText := ConvertEncoding(WebText, 65001); // UTF-8
  FullCreditsPageDownloaded := True;
  LogMessage('Function DownloadPage - FullCredits file found: ' + FilePath + FileTitleFullCredits);
  LogMessage('Value of FullCreditsPageDownloaded: ' + BoolToStr(FullCreditsPageDownloaded));
end;
we will have only this:
if not (USE_SAVED_PVDCONFIG and (ConfigOptions[15] = '0')) then
begin
  WebText := WaitForPageFile(FilePath + FileTitleParentalGuide, 'ParentalGuide', 5000, 2000);
  ParentalGuidePageDownloaded := WebText <> '';
  LogMessage('Function DownloadPage - Value of ParentalGuidePageDownloaded: ' +
    BoolToStr(ParentalGuidePageDownloaded));
end;
That way we will speed up the script and cut hundreds, if not thousands, of lines from it.
I will do that for all the repetitive tasks and test it.
Also, I'm exploring ways to keep IMDb from refusing connections made with selenium/python. At the moment I'm testing fake userData folders and rotating user agents, and it is going well so far.
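For anyone curious what the userData/user-agent idea can look like, here is a minimal sketch, assuming Chrome is driven through Selenium. It is not the actual downloader: the user-agent strings, the folder naming, and the helper names build_chrome_args and make_driver are all placeholders made up for illustration. Only the two Chrome flags (--user-data-dir and --user-agent) and the Selenium ChromeOptions calls are real.

```python
import tempfile
from pathlib import Path

# Placeholder user-agent strings (assumption: replace with real, current ones).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/2.0",
]


def build_chrome_args(profile_root, user_agents, session_no):
    """Return the Chrome flags for one download session.

    Each session gets its own user-data folder under profile_root and the
    next user agent in the list, wrapping around when the list runs out.
    """
    profile_dir = Path(profile_root) / f"profile_{session_no}"
    profile_dir.mkdir(parents=True, exist_ok=True)
    agent = user_agents[session_no % len(user_agents)]
    return [f"--user-data-dir={profile_dir}", f"--user-agent={agent}"]


def make_driver(chrome_args):
    """Start Chrome through Selenium with the given flags.

    Needs the selenium package and a Chrome install; imported lazily so the
    flag builder above can be tried without either.
    """
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_args:
        options.add_argument(arg)
    return webdriver.Chrome(options=options)


if __name__ == "__main__":
    # Show the flags three consecutive sessions would get.
    root = tempfile.mkdtemp(prefix="fake_userdata_")
    for n in range(3):
        print(build_chrome_args(root, USER_AGENTS, n))
```

Keeping the flag building separate from the browser start makes it easy to check which profile folder and user agent a given session will use before any page is requested.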