Imdb People script issues
Ivek23:
--- Quote from: afrocuban on December 23, 2024, 10:01:42 pm ---Your code also loops:
--- Quote ---Line 1372: (12/23/2024 9:53:57 PM) Parsed Award: Dorian Award
Line 1379: (12/23/2024 9:53:57 PM) Award: Dorian Award
Line 1385: (12/23/2024 9:53:57 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 2646: (12/23/2024 9:53:59 PM) Parsed Award: Dorian Award
Line 2653: (12/23/2024 9:53:59 PM) Award: Dorian Award
Line 2659: (12/23/2024 9:53:59 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 3920: (12/23/2024 9:54:01 PM) Parsed Award: Dorian Award
Line 3927: (12/23/2024 9:54:01 PM) Award: Dorian Award
Line 3933: (12/23/2024 9:54:01 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 5194: (12/23/2024 9:54:02 PM) Parsed Award: Dorian Award
Line 5201: (12/23/2024 9:54:02 PM) Award: Dorian Award
Line 5207: (12/23/2024 9:54:02 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 6468: (12/23/2024 9:54:04 PM) Parsed Award: Dorian Award
Line 6475: (12/23/2024 9:54:04 PM) Award: Dorian Award
Line 6481: (12/23/2024 9:54:04 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 7742: (12/23/2024 9:54:06 PM) Parsed Award: Dorian Award
Line 7749: (12/23/2024 9:54:06 PM) Award: Dorian Award
Line 7755: (12/23/2024 9:54:06 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 9016: (12/23/2024 9:54:08 PM) Parsed Award: Dorian Award
Line 9023: (12/23/2024 9:54:08 PM) Award: Dorian Award
Line 9029: (12/23/2024 9:54:08 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 10290: (12/23/2024 9:54:09 PM) Parsed Award: Dorian Award
Line 10297: (12/23/2024 9:54:09 PM) Award: Dorian Award
Line 10303: (12/23/2024 9:54:09 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 11564: (12/23/2024 9:54:11 PM) Parsed Award: Dorian Award
Line 11571: (12/23/2024 9:54:11 PM) Award: Dorian Award
Line 11577: (12/23/2024 9:54:11 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 12838: (12/23/2024 9:54:13 PM) Parsed Award: Dorian Award
Line 12845: (12/23/2024 9:54:13 PM) Award: Dorian Award
Line 12851: (12/23/2024 9:54:13 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 14112: (12/23/2024 9:54:15 PM) Parsed Award: Dorian Award
Line 14119: (12/23/2024 9:54:15 PM) Award: Dorian Award
Line 14125: (12/23/2024 9:54:15 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 15386: (12/23/2024 9:54:17 PM) Parsed Award: Dorian Award
Line 15393: (12/23/2024 9:54:17 PM) Award: Dorian Award
Line 15399: (12/23/2024 9:54:17 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 16660: (12/23/2024 9:54:18 PM) Parsed Award: Dorian Award
Line 16667: (12/23/2024 9:54:18 PM) Award: Dorian Award
Line 16673: (12/23/2024 9:54:18 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 17934: (12/23/2024 9:54:20 PM) Parsed Award: Dorian Award
Line 17941: (12/23/2024 9:54:20 PM) Award: Dorian Award
Line 17947: (12/23/2024 9:54:20 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 19208: (12/23/2024 9:54:21 PM) Parsed Award: Dorian Award
Line 19215: (12/23/2024 9:54:21 PM) Award: Dorian Award
Line 19221: (12/23/2024 9:54:21 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 20482: (12/23/2024 9:54:23 PM) Parsed Award: Dorian Award
Line 20489: (12/23/2024 9:54:23 PM) Award: Dorian Award
Line 20495: (12/23/2024 9:54:23 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 21756: (12/23/2024 9:54:25 PM) Parsed Award: Dorian Award
Line 21763: (12/23/2024 9:54:25 PM) Award: Dorian Award
Line 21769: (12/23/2024 9:54:25 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 23030: (12/23/2024 9:54:26 PM) Parsed Award: Dorian Award
Line 23037: (12/23/2024 9:54:27 PM) Award: Dorian Award
Line 23043: (12/23/2024 9:54:27 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 24304: (12/23/2024 9:54:28 PM) Parsed Award: Dorian Award
Line 24311: (12/23/2024 9:54:28 PM) Award: Dorian Award
Line 24317: (12/23/2024 9:54:28 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 25578: (12/23/2024 9:54:30 PM) Parsed Award: Dorian Award
Line 25585: (12/23/2024 9:54:30 PM) Award: Dorian Award
Line 25591: (12/23/2024 9:54:30 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 26852: (12/23/2024 9:54:32 PM) Parsed Award: Dorian Award
Line 26859: (12/23/2024 9:54:32 PM) Award: Dorian Award
Line 26865: (12/23/2024 9:54:32 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 28126: (12/23/2024 9:54:33 PM) Parsed Award: Dorian Award
Line 28133: (12/23/2024 9:54:33 PM) Award: Dorian Award
Line 28139: (12/23/2024 9:54:33 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 29400: (12/23/2024 9:54:35 PM) Parsed Award: Dorian Award
Line 29407: (12/23/2024 9:54:35 PM) Award: Dorian Award
Line 29413: (12/23/2024 9:54:35 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 30674: (12/23/2024 9:54:37 PM) Parsed Award: Dorian Award
Line 30681: (12/23/2024 9:54:37 PM) Award: Dorian Award
Line 30687: (12/23/2024 9:54:37 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 31948: (12/23/2024 9:54:39 PM) Parsed Award: Dorian Award
Line 31955: (12/23/2024 9:54:39 PM) Award: Dorian Award
Line 31961: (12/23/2024 9:54:39 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 33222: (12/23/2024 9:54:41 PM) Parsed Award: Dorian Award
Line 33229: (12/23/2024 9:54:41 PM) Award: Dorian Award
Line 33235: (12/23/2024 9:54:41 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 34496: (12/23/2024 9:54:43 PM) Parsed Award: Dorian Award
Line 34503: (12/23/2024 9:54:43 PM) Award: Dorian Award
Line 34509: (12/23/2024 9:54:43 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 35770: (12/23/2024 9:54:44 PM) Parsed Award: Dorian Award
Line 35777: (12/23/2024 9:54:44 PM) Award: Dorian Award
Line 35783: (12/23/2024 9:54:44 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 37044: (12/23/2024 9:54:46 PM) Parsed Award: Dorian Award
Line 37051: (12/23/2024 9:54:46 PM) Award: Dorian Award
Line 37057: (12/23/2024 9:54:46 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 38318: (12/23/2024 9:54:48 PM) Parsed Award: Dorian Award
Line 38325: (12/23/2024 9:54:48 PM) Award: Dorian Award
Line 38331: (12/23/2024 9:54:48 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 39592: (12/23/2024 9:54:50 PM) Parsed Award: Dorian Award
Line 39599: (12/23/2024 9:54:50 PM) Award: Dorian Award
Line 39605: (12/23/2024 9:54:50 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 40866: (12/23/2024 9:54:52 PM) Parsed Award: Dorian Award
Line 40873: (12/23/2024 9:54:52 PM) Award: Dorian Award
Line 40879: (12/23/2024 9:54:52 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
Line 42140: (12/23/2024 9:54:54 PM) Parsed Award: Dorian Award
Line 42147: (12/23/2024 9:54:54 PM) Award: Dorian Award
--- End quote ---
So that's one more thing to resolve.
--- End quote ---
Yes, I know about this issue in the log file.
Ivek23:
Yes, I know about this problem. I have looked into the log files and found a partial solution, which should help.
The code, which is not yet complete:
--- Quote ---Function ParsePage_IMDBPeopleAWARDS(HTML: String): Cardinal;
Var
curPos, endPos: Integer;
ItemList, Event, Award, Category, Recipient, Year: String;
AValue: String; // Declaring AValue as a String
Won: Boolean;
FailSafe: Integer; // To prevent infinite loops
curPos1,curPos2,curPos3,curPos4,endPos1,endPos2:Integer;
Begin
LogMessage('Function ParsePage_IMDBPeopleAWARDS BEGIN=====================||');
try
Result := prFinished;
// Log the initial HTML snippet being parsed
LogMessage('Initial HTML snippet: ' + Copy(HTML, 1, 500));
// Find the position of the Awards title
curPos := Pos('<h1 class="ipc-title__text">Awards</h1>', HTML);
If curPos > 0 Then Begin
// Find the position of the Awards section
curPos := PosFrom('<section class="ipc-page-section ipc-page-section--base">', HTML, curPos);
End;
If curPos > 0 Then Begin
// Find the end position of the Awards section
endPos := PosFrom('<h3 class="ipc-title__text"><span id="contribute">Contribute to this page</span>', HTML, curPos);
If endPos = 0 Then endPos := Length(HTML);
If (curPos > 0) AND (endPos > curPos) Then Begin
// Extract the Awards block
ItemList := Copy(HTML, curPos, endPos - curPos);
//LogMessage(ItemList);
//While curPos > 0 Do Begin
// Extract and log the award name
// Extract and log the event name
//curPos := PosFrom('?ref_=nmawd" class="ipc-title-link-wrapper" tabindex="0"><h3 class="ipc-title__text">', ItemList, curPos);
//curPos := PosFrom('?ref_=nmawd" class="ipc-title-link-wrapper" tabindex="0"><h3 class="ipc-title__text">', ItemList, 1);
curPos := PosFrom('?ref_=nmawd" class="ipc-title-link-wrapper" tabindex="0"><h3 class="ipc-title__text"><span id="ev', ItemList, 0);
FailSafe := 0; // Initialize fail-safe counter
//While curPos > 0 Do Begin
While (curPos > 0) And (FailSafe < 10) Do Begin
// Extract and log the award name
If curPos > 0 Then Begin
curPos := PosFrom('>', ItemList, curPos) + 29;
endPos := PosFrom('</span>', ItemList, curPos);
Event := Copy(ItemList, curPos, endPos - curPos);
LogMessage('** Parsed Event: ' + Event);
Event := RemoveTagsEx0(Event);
Event := Trim(Event);
LogMessage('* Parsed Event: ' + Event);
//Event := RemoveTagsEx1(Trim(Event));
// Remove the <span> tag
Event := Copy(Event, Pos('>', Event) + 1 , Length(Event));
LogMessage('Parsed Event: ' + Event);
//(*
// Parse each award item manually
curPos := PosFrom('<li class="ipc-metadata-list-summary-item sc-15fc9ae6-1 gQbMPJ" data-testid="list-item"', ItemList, 1);
//curPos := PosFrom('<span class="ipc-metadata-list-summary-item__tst">', ItemList, curPos + 1);
If curPos > 0 Then Begin
//*)
//(*
curPos1 := PosFrom('<span class="ipc-metadata-list-summary-item__tst">', ItemList, curPos1 + 16);
If curPos1 > 0 Then Begin
curPos1 := PosFrom('>', ItemList, curPos1) + 1;
endPos1 := PosFrom('</span>', ItemList, curPos1);
Award := Copy(ItemList, curPos1, endPos1 - curPos1);
LogMessage('Parsed Award: ' + Award);
//*)
(*
// Log the parameters before calling AddAward
//LogMessage('Before calling AddAward with parameters:');
//LogMessage('Event: ' + Event);
//LogMessage('Award: ' + Award);
//LogMessage('Category: ' + Category);
//LogMessage('Recipient: ' + Recipient);
//LogMessage('Year: ' + Year);
//LogMessage('Won: ' + CustomBoolToStr(Won));
*)
// Populate the custom field with AValue
//AddCustomFieldValueByName('IMDb People Awards', AValue);
// LogMessage('IMDb People Awards added ' + AValue)
//(*
//curPos1 := PosFrom('<span class="ipc-metadata-list-summary-item__tst">', ItemList, EndPos1 + 10);
curPos1 := PosFrom('<span class="ipc-metadata-list-summary-item__tst">', ItemList, curPos1 + 0);
//curPos1 := PosFrom('<span/class="ipc-metadata-list-summary-item__tst">', ItemList, curPos1);
//curPos1 := PosFrom('<span class="ipc-metadata-list-summary-item__tst">', ItemList, EndPos1);
End Else LogMessage('Error: Award not found.');
//*)
//(*
// Move to the next item
curPos := PosFrom('<li class="ipc-metadata-list-summary-item sc-15fc9ae6-1 gQbMPJ" data-testid="list-item"', ItemList, curPos + 0);
End Else LogMessage('Error: Awards not found.');
//*)
//curPos := PosFrom('?ref_=nmawd" class="ipc-title-link-wrapper" tabindex="0"><h3 class="ipc-title__text"><span id="ev', ItemList, curPos + 1);
//curPos := PosFrom('?ref_=nmawd" class="ipc-title-link-wrapper" tabindex="0"><h3 class="ipc-title__text">', ItemList, 1);
//curPos := PosFrom('?ref_=nmawd" class="ipc-title-link-wrapper" tabindex="0"><h3 class="ipc-title__text">', ItemList, curPos);
curPos := PosFrom('?ref_=nmawd" class="ipc-title-link-wrapper" tabindex="0"><h3 class="ipc-title__text">', ItemList, EndPos);
Inc(FailSafe); // Count this pass so the fail-safe limit can actually stop the loop
End Else LogMessage('Error: Event title div not found.');
End;
End Else LogMessage('Error: Invalid endPos or curPos for Awards section');
End Else LogMessage('Error: Awards section not found');
except
Begin
LogMessage('Exception encountered');
Result := prError;
End;
end;
LogMessage('Function ParsePage_IMDBPeopleAWARDS END=====================||');
// Result is already set above: prFinished at the start of the try block, or prError in the except block.
End;
--- End quote ---
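The FailSafe counter and the `curPos + 0` style restarts in the code above both concern the same thing: making sure every pass of the loop moves forward. Below is a minimal sketch of that pattern in Python (chosen because the thread already uses Python for the Selenium script). The names `find_all` and `max_hits` are hypothetical and not part of PVD. It is only an illustration of the idea, not a diagnosis of the repeated log entries.

```python
def find_all(haystack: str, marker: str, max_hits: int = 200) -> list[int]:
    """Return the start offsets of every non-overlapping occurrence of marker."""
    if not marker:
        return []  # an empty marker would match everywhere and never advance
    hits: list[int] = []
    pos = haystack.find(marker)
    # max_hits plays the role of the FailSafe counter: a hard upper bound.
    while pos != -1 and len(hits) < max_hits:
        hits.append(pos)
        # Resume the search AFTER the current match. Searching again from
        # pos itself would return the same position on every pass.
        pos = haystack.find(marker, pos + len(marker))
    return hits


if __name__ == "__main__":
    print(find_all("<li>a</li><li>b</li>", "<li>"))
```

In the Pascal script the equivalent step would be to pass a position beyond the previous match (for example the end position of the text just extracted) as the third argument of PosFrom.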
Log details:
--- Quote ---(24.12.2024 14:28:23) Function DownloadPage END======================|
(24.12.2024 14:28:23) Function ParsePage_IMDBPeopleAWARDS BEGIN=====================||
(24.12.2024 14:28:23) Initial HTML snippet: <!DOCTYPE html><html lang="en-US" xmlns:og="http://opengraphprotocol.org/schema/" xmlns:fb="http://www.facebook.com/2008/fbml"><head><meta charSet="utf-8"/><meta name="viewport" content="width=device-width"/><script>if(typeof uet === 'function'){ uet('bb', 'LoadTitle', {wb: 1}); }</script><script>window.addEventListener('load', (event) => {
if (typeof window.csa !== 'undefined' && typeof window.csa === 'function') {
var csaLatencyPlugin = window.csa('Content', {
(24.12.2024 14:28:23) ** Parsed Event: <span id="ev0000386">Kids' Choice Awards, USA
(24.12.2024 14:28:23) * Parsed Event: Kids' Choice Awards, USA
(24.12.2024 14:28:23) Parsed Event: Kids' Choice Awards, USA
(24.12.2024 14:28:23) Parsed Award: Blimp Award
(24.12.2024 14:28:23) ** Parsed Event: <span id="ev0000616">Soap Opera Digest Awards
(24.12.2024 14:28:23) * Parsed Event: Soap Opera Digest Awards
(24.12.2024 14:28:23) Parsed Event: Soap Opera Digest Awards
(24.12.2024 14:28:23) Parsed Award: Soap Opera Digest Award
(24.12.2024 14:28:23) ** Parsed Event: <span id="ev0000716">Young Artist Awards
(24.12.2024 14:28:23) * Parsed Event: Young Artist Awards
(24.12.2024 14:28:23) Parsed Event: Young Artist Awards
(24.12.2024 14:28:23) Parsed Award: Young Artist Award
(24.12.2024 14:28:23) ** Parsed Event: <span id="ev0000718">YoungStar Awards
(24.12.2024 14:28:23) * Parsed Event: YoungStar Awards
(24.12.2024 14:28:23) Parsed Event: YoungStar Awards
(24.12.2024 14:28:23) Parsed Award: Young Artist Award
(24.12.2024 14:28:23) Function ParsePage_IMDBPeopleAWARDS END=====================||
(24.12.2024 14:28:23) Provider data info retreived Ok in 2024-12-24 14:28:23|
(24.12.2024 14:28:23) Function ParsePage smNormal END======================|
(24.12.2024 14:28:23) Person -> LoadStatic -> 0ms
(24.12.2024 14:28:23) Person -> LoadMultivalues -> 0ms
(24.12.2024 14:28:23) Person -> LoadFilms -> 0ms
(24.12.2024 14:28:23) Person -> LoadAwards -> 0ms
(24.12.2024 14:28:23) Person -> LoadImages -> 0ms
--- End quote ---
The marker <span id="ev0000718">YoungStar Awards is helpful for identifying which event the awards belong to.
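As a rough illustration of how such a marker could be used, here is a small Python sketch that pulls the event id and the event name out of a fragment like the ones in the log above. The function name `parse_event_marker` and the regular expression are assumptions made for this example. They are not taken from the PVD script, and the real page markup may differ.

```python
import html
import re
from typing import Optional, Tuple

# Assumed shape of the marker, based on the log lines: <span id="ev0000718">Name
EVENT_RE = re.compile(r'<span id="(ev\d+)">([^<]*)')


def parse_event_marker(fragment: str) -> Optional[Tuple[str, str]]:
    """Return (event_id, event_name) for the first event marker, or None."""
    match = EVENT_RE.search(fragment)
    if match is None:
        return None
    event_id = match.group(1)
    # Decode HTML entities such as &#x27; and trim surrounding whitespace.
    event_name = html.unescape(match.group(2)).strip()
    return event_id, event_name


if __name__ == "__main__":
    print(parse_event_marker('<span id="ev0000718">YoungStar Awards'))
```

The id part (for example ev0000718) could serve as a stable key when grouping awards by event, since two events may have similar names.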
Ivek23:
--- Quote from: Ivek23 on December 23, 2024, 01:29:13 pm ---Here is the IMDB_People_[EN][HTTPS]_Awards script. Using a Python Selenium script, it now correctly transfers the Awards data to the awards field for the person 'Chico' Hernandez from the URL below.
https://www.imdb.com/name/nm0379491/awards/
I corrected and added some parts to your code, and it works.
The Python Selenium script instructions and code will probably be published by the new year in the Integrating Selenium to PVD topic.
http://www.videodb.info/forum_en/index.php/topic,4357.0.html
--- End quote ---
The Python Selenium script is at the link below.
http://www.videodb.info/forum_en/index.php/topic,4362.msg22691.html#msg22691
The IMDB_[EN][HTTPS]_TEST_Aka script is at the link below.
http://www.videodb.info/forum_en/index.php/topic,4363.0.html
Ivek23:
Unfortunately, I no longer plan to work on updates or fixes to the IMDb Awards section of the movies or people code, such as Function ParsePage_IMDBMovieAWARDS. The layout and markup of the Awards page source code are too complicated and too poorly suited to parsing for the Awards data to be recorded properly.
afrocuban:
I completely understand. It is so complicated that even AI can't do anything about it so far.
The best I could do was to get two functions.
The first parses all events, but none of the awards:
--- Quote ---Function ParsePage_IMDBPeopleAWARDS(HTML: String): Cardinal;
Var
curPos, endPos: Integer;
Event, Award, Category, Recipient, Year: String;
Won: Boolean;
FailSafe: Integer; // To prevent infinite loops
Begin
LogMessage('Function ParsePage_IMDBPeopleAWARDS BEGIN=====================||');
Result := prFinished;
// Log the first 500 characters of the initial HTML snippet
LogMessage('Initial HTML snippet (first 500 chars): ' + Copy(HTML, 1, 500));
// Log the last 500 characters of the initial HTML snippet
LogMessage('Initial HTML snippet (last 500 chars): ' + Copy(HTML, Length(HTML) - 499, 500));
// Initialize the search for the first event section
curPos := Pos('<section class="ipc-page-section ipc-page-section--base">', HTML);
LogMessage('curPos after finding first event section: ' + IntToStr(curPos));
If curPos > 0 Then Begin
FailSafe := 0; // Initialize fail-safe counter
While (curPos > 0) And (FailSafe < 200) Do Begin
// Ensure we don't exceed the HTML length
If curPos >= Length(HTML) Then Break;
// Extract the Event Name
curPos := PosFrom('<span id="ev', HTML, curPos);
If curPos > 0 Then Begin
curPos := PosFrom('>', HTML, curPos) + 1;
endPos := PosFrom('</span>', HTML, curPos);
Event := Trim(Copy(HTML, curPos, endPos - curPos));
LogMessage('Parsed Event: ' + Event);
// Process each award within the event
curPos := endPos;
While (curPos > 0) And (curPos < Length(HTML)) And (PosFrom('</section><section class="ipc-page-section ipc-page-section--base">', HTML, curPos) = 0) And (PosFrom('</section><div class="nas-slot">', HTML, curPos) = 0) Do Begin
curPos := PosFrom('<div data-testid="sub-section-', HTML, curPos);
If curPos > 0 Then Begin
curPos := PosFrom('>', HTML, curPos) + 1;
endPos := PosFrom('<>', HTML, curPos);
Award := Copy(HTML, curPos, endPos - curPos);
// LogMessage('Extracted Award Content: ' + Award);
// Parse award details from the Award block
// Extract Award Name
curPos := PosFrom('<span class="ipc-metadata-list-summary-item__tst">', HTML, curPos);
If curPos > 0 Then Begin
curPos := PosFrom('>', HTML, curPos) + 1;
endPos := PosFrom('</span>', HTML, curPos);
Award := Copy(HTML, curPos, endPos - curPos);
End;
// Extract Category
curPos := PosFrom('<span class="ipc-metadata-list-summary-item__li awardCategoryName"', Award, 1);
If curPos > 0 Then Begin
curPos := PosFrom('>', Award, curPos) + 1;
endPos := PosFrom('</span>', Award, curPos);
Category := Copy(Award, curPos, endPos - curPos);
End;
// Extract Recipient
curPos := PosFrom('<a class="ipc-metadata-list-summary-item__li ipc-metadata-list-summary-item__li--link"', Award, 1);
If curPos > 0 Then Begin
curPos := PosFrom('>', Award, curPos) + 1;
endPos := PosFrom('</a>', Award, curPos);
Recipient := Copy(Award, curPos, endPos - curPos);
End;
// Extract Year
curPos := PosFrom('<a class="ipc-metadata-list-summary-item__t"', Award, 1);
If curPos > 0 Then Begin
curPos := PosFrom('>', Award, curPos) + 1;
endPos := PosFrom(' ', Award, curPos); // Find the space after the year
Year := Copy(Award, curPos, endPos - curPos);
Year := Trim(Year);
End;
// Determine if the award was won
Won := PosFrom('Winner', Award, 1) > 0;
// Add award to the database
AddAward(Event, Award, Category, Recipient, Year, Won);
If Won Then
LogMessage('AddAward executed successfully: Event=' + Event + ', Award=' + Award + ', Category=' + Category + ', Recipient=' + Recipient + ', Year=' + Year + ', Won=True')
Else
LogMessage('AddAward executed successfully: Event=' + Event + ', Award=' + Award + ', Category=' + Category + ', Recipient=' + Recipient + ', Year=' + Year + ', Won=False');
End;
End;
End;
// Move to the next event or end of awards block
If PosFrom('</section><section class="ipc-page-section ipc-page-section--base">', HTML, curPos) > 0 Then
curPos := PosFrom('</section><section class="ipc-page-section ipc-page-section--base">', HTML, curPos) + Length('</section><section class="ipc-page-section ipc-page-section--base">')
Else If PosFrom('<div class="nas-slot">', HTML, curPos) > 0 Then Begin
LogMessage('End of awards block detected.');
Break;
End Else Begin
LogMessage('Error: Unable to identify next event or end of awards block.');
Break;
End;
Inc(FailSafe);
End;
End Else LogMessage('Error: First event section not found');
LogMessage('Function ParsePage_IMDBPeopleAWARDS END=====================||');
Result := prFinished;
End;
//BlockClose
--- End quote ---
The second one parses all the awards but only the first event, and assigns all the awards to that event:
--- Quote ---Function ParsePage_IMDBPeopleAWARDS(HTML: String): Cardinal;
Var
curPos, endPos, awardPos, categoryPos, recipientPos: Integer;
Event, Award, Category, Recipient, Year: String;
Won: Boolean;
Begin
LogMessage('Function ParsePage_IMDBPeopleAWARDS BEGIN=====================||');
Result := prFinished;
// Locate the start of the specific event section
curPos := Pos('<section class="ipc-page-section ipc-page-section--base">', HTML);
LogMessage('curPos after finding event section: ' + IntToStr(curPos));
If curPos > 0 Then Begin
// Extract event name
curPos := PosFrom('<span id="ev', HTML, curPos);
If curPos = 0 Then Begin
LogMessage('Event name not found');
Exit;
End;
curPos := PosFrom('>', HTML, curPos) + 1;
endPos := PosFrom('</span>', HTML, curPos);
Event := Trim(Copy(HTML, curPos, endPos - curPos));
LogMessage('Parsed Event: ' + Event);
curPos := endPos;
// Process awards within this event
While curPos > 0 Do Begin
// Find next award div
curPos := PosFrom('<li class="ipc-metadata-list-summary-item sc-15fc9ae6-1 gQbMPJ" data-testid="list-item">', HTML, curPos);
If curPos = 0 Then Begin
LogMessage('No more awards found in this event');
Break;
End;
LogMessage('curPos after finding award div: ' + IntToStr(curPos));
awardPos := curPos; // Save the starting position of the award
curPos := PosFrom('>', HTML, curPos) + 1;
endPos := PosFrom('<></section>', HTML, curPos); // Adjusted to the correct closing tag
If endPos = 0 Then Begin
LogMessage('No closing tag for award div found');
Break; // No more awards
End;
Award := Copy(HTML, awardPos, endPos - awardPos);
curPos := endPos + Length('<></section>');
// LogMessage('Award Content Extracted Successfully: ' + Award);
// Extract year
awardPos := PosFrom('<a class="ipc-metadata-list-summary-item__t"', Award, 1);
If awardPos = 0 Then Begin
LogMessage('Year not found');
Continue;
End;
awardPos := PosFrom('>', Award, awardPos) + 1;
endPos := PosFrom(' ', Award, awardPos); // Find the space after the year
Year := Copy(Award, awardPos, endPos - awardPos);
Year := Trim(Year);
LogMessage('Parsed Year: ' + Year);
// Determine if the award was won
Won := PosFrom('Winner', Award, 1) > 0;
If Won Then
LogMessage('Parsed Won: True')
Else
LogMessage('Parsed Won: False');
// Extract Category
categoryPos := PosFrom('<span class="ipc-metadata-list-summary-item__li awardCategoryName"', Award, awardPos);
LogMessage('EVE GA CAT: ' + IntToStr(categoryPos));
If categoryPos > 0 Then Begin
categoryPos := PosFrom('>', Award, categoryPos) + 1;
endPos := PosFrom('</span>', Award, categoryPos);
Category := Copy(Award, categoryPos, endPos - categoryPos);
End;
LogMessage('Parsed Category Name: ' + Category);
// Extract recipient
recipientPos := PosFrom('<a class="ipc-metadata-list-summary-item__li ipc-metadata-list-summary-item__li--link"', Award, categoryPos);
If recipientPos = 0 Then Begin
LogMessage('Recipient tag not found');
Continue;
End;
recipientPos := PosFrom('>', Award, recipientPos) + 1;
endPos := PosFrom('</a>', Award, recipientPos);
Recipient := Copy(Award, recipientPos, endPos - recipientPos);
LogMessage('Parsed Recipient: ' + Recipient);
// Extract award name
awardPos := PosFrom('<span class="ipc-metadata-list-summary-item__tst">', Award, awardPos);
If awardPos = 0 Then Begin
LogMessage('Award Name not found');
Continue;
End;
awardPos := PosFrom('>', Award, awardPos) + 1;
LogMessage('Parsed awardPos ' + IntToStr(awardPos));
endPos := PosFrom('</span>', Award, awardPos);
LogMessage('Parsed endPos ' + IntToStr(endPos));
Award := Copy(Award, awardPos, endPos - awardPos);
LogMessage('Parsed Award Name: ' + Award);
// Add award to the database
AddAward(Event, Award, Category, Recipient, Year, Won);
If Won Then
LogMessage('AddAward executed successfully: Event=' + Event + ', Award=' + Award + ', Category=' + Category + ', Recipient=' + Recipient + ', Year=' + Year + ', Won=True')
Else
LogMessage('AddAward executed successfully: Event=' + Event + ', Award=' + Award + ', Category=' + Category + ', Recipient=' + Recipient + ', Year=' + Year + ', Won=False');
// Advance curPos to ensure moving to the next award
curPos := PosFrom('<li class="ipc-metadata-list-summary-item sc-15fc9ae6-1 gQbMPJ" data-testid="list-item">', HTML, curPos);
End;
End Else LogMessage('Error: Event section not found');
LogMessage('Function ParsePage_IMDBPeopleAWARDS END=====================||');
Result := prFinished;
End;
//BlockClose
--- End quote ---
For the life of me, I cannot combine them, no matter what I try. Not even close... :o :'(
How is that even possible?
The page I'm trying to parse is attached, as well as the script, which contains the fixed genres and bio.
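One possible way to combine the two approaches is to first cut the page into one block per event, and then search for awards only inside each block, so that every award is paired with the event whose block it sits in. The Python sketch below illustrates that structure. It reuses the two markers that appear in the functions above ('<span id="ev' and the "__tst" span), but the function name `parse_events_and_awards` and the sample HTML in the usage note are simplified assumptions, not the real IMDb page or the PVD API.

```python
from typing import List, Tuple

EVENT_MARKER = '<span id="ev'
AWARD_MARKER = '<span class="ipc-metadata-list-summary-item__tst">'


def parse_events_and_awards(page: str) -> List[Tuple[str, str]]:
    """Return (event, award) pairs, pairing each award with its own event."""
    # Step 1: collect the start offset of every event marker.
    starts: List[int] = []
    pos = page.find(EVENT_MARKER)
    while pos != -1:
        starts.append(pos)
        pos = page.find(EVENT_MARKER, pos + len(EVENT_MARKER))

    pairs: List[Tuple[str, str]] = []
    for i, start in enumerate(starts):
        # Step 2: an event block runs up to the next event marker (or page end).
        end = starts[i + 1] if i + 1 < len(starts) else len(page)
        block = page[start:end]

        # Step 3: the event name is the text between '>' and '</span>'.
        name_start = block.find(">") + 1
        name_end = block.find("</span>", name_start)
        if name_end == -1:
            continue  # malformed block, skip it
        event = block[name_start:name_end].strip()

        # Step 4: scan for awards inside this block only.
        award_pos = block.find(AWARD_MARKER)
        while award_pos != -1:
            text_start = award_pos + len(AWARD_MARKER)
            text_end = block.find("</span>", text_start)
            if text_end == -1:
                break
            pairs.append((event, block[text_start:text_end].strip()))
            # Always continue after the text just read, so the loop advances.
            award_pos = block.find(AWARD_MARKER, text_end)
    return pairs
```

With a simplified, made-up fragment such as '<span id="ev0000386">Kids\' Choice Awards, USA</span>' followed by one "__tst" span containing Blimp Award, the function yields the single pair ("Kids' Choice Awards, USA", "Blimp Award"). In the Pascal script the same idea would mean copying each event block into its own string with Copy and running the award loop with PosFrom on that string, using separate position variables for the outer and inner loops.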