Imdb People script issues

Ivek23:

--- Quote from: afrocuban on December 23, 2024, 10:01:42 pm ---Your code also loops:



--- Quote ---1372: (12/23/2024 9:53:57 PM) Parsed Award:  Dorian Award
   Line   1379: (12/23/2024 9:53:57 PM) Award:  Dorian Award
   Line   1385: (12/23/2024 9:53:57 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line   2646: (12/23/2024 9:53:59 PM) Parsed Award:  Dorian Award
   Line   2653: (12/23/2024 9:53:59 PM) Award:  Dorian Award
   Line   2659: (12/23/2024 9:53:59 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line   3920: (12/23/2024 9:54:01 PM) Parsed Award:  Dorian Award
   Line   3927: (12/23/2024 9:54:01 PM) Award:  Dorian Award
   Line   3933: (12/23/2024 9:54:01 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line   5194: (12/23/2024 9:54:02 PM) Parsed Award:  Dorian Award
   Line   5201: (12/23/2024 9:54:02 PM) Award:  Dorian Award
   Line   5207: (12/23/2024 9:54:02 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line   6468: (12/23/2024 9:54:04 PM) Parsed Award:  Dorian Award
   Line   6475: (12/23/2024 9:54:04 PM) Award:  Dorian Award
   Line   6481: (12/23/2024 9:54:04 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line   7742: (12/23/2024 9:54:06 PM) Parsed Award:  Dorian Award
   Line   7749: (12/23/2024 9:54:06 PM) Award:  Dorian Award
   Line   7755: (12/23/2024 9:54:06 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line   9016: (12/23/2024 9:54:08 PM) Parsed Award:  Dorian Award
   Line   9023: (12/23/2024 9:54:08 PM) Award:  Dorian Award
   Line   9029: (12/23/2024 9:54:08 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  10290: (12/23/2024 9:54:09 PM) Parsed Award:  Dorian Award
   Line  10297: (12/23/2024 9:54:09 PM) Award:  Dorian Award
   Line  10303: (12/23/2024 9:54:09 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  11564: (12/23/2024 9:54:11 PM) Parsed Award:  Dorian Award
   Line  11571: (12/23/2024 9:54:11 PM) Award:  Dorian Award
   Line  11577: (12/23/2024 9:54:11 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  12838: (12/23/2024 9:54:13 PM) Parsed Award:  Dorian Award
   Line  12845: (12/23/2024 9:54:13 PM) Award:  Dorian Award
   Line  12851: (12/23/2024 9:54:13 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  14112: (12/23/2024 9:54:15 PM) Parsed Award:  Dorian Award
   Line  14119: (12/23/2024 9:54:15 PM) Award:  Dorian Award
   Line  14125: (12/23/2024 9:54:15 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  15386: (12/23/2024 9:54:17 PM) Parsed Award:  Dorian Award
   Line  15393: (12/23/2024 9:54:17 PM) Award:  Dorian Award
   Line  15399: (12/23/2024 9:54:17 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  16660: (12/23/2024 9:54:18 PM) Parsed Award:  Dorian Award
   Line  16667: (12/23/2024 9:54:18 PM) Award:  Dorian Award
   Line  16673: (12/23/2024 9:54:18 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  17934: (12/23/2024 9:54:20 PM) Parsed Award:  Dorian Award
   Line  17941: (12/23/2024 9:54:20 PM) Award:  Dorian Award
   Line  17947: (12/23/2024 9:54:20 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  19208: (12/23/2024 9:54:21 PM) Parsed Award:  Dorian Award
   Line  19215: (12/23/2024 9:54:21 PM) Award:  Dorian Award
   Line  19221: (12/23/2024 9:54:21 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  20482: (12/23/2024 9:54:23 PM) Parsed Award:  Dorian Award
   Line  20489: (12/23/2024 9:54:23 PM) Award:  Dorian Award
   Line  20495: (12/23/2024 9:54:23 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  21756: (12/23/2024 9:54:25 PM) Parsed Award:  Dorian Award
   Line  21763: (12/23/2024 9:54:25 PM) Award:  Dorian Award
   Line  21769: (12/23/2024 9:54:25 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  23030: (12/23/2024 9:54:26 PM) Parsed Award:  Dorian Award
   Line  23037: (12/23/2024 9:54:27 PM) Award:  Dorian Award
   Line  23043: (12/23/2024 9:54:27 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  24304: (12/23/2024 9:54:28 PM) Parsed Award:  Dorian Award
   Line  24311: (12/23/2024 9:54:28 PM) Award:  Dorian Award
   Line  24317: (12/23/2024 9:54:28 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  25578: (12/23/2024 9:54:30 PM) Parsed Award:  Dorian Award
   Line  25585: (12/23/2024 9:54:30 PM) Award:  Dorian Award
   Line  25591: (12/23/2024 9:54:30 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  26852: (12/23/2024 9:54:32 PM) Parsed Award:  Dorian Award
   Line  26859: (12/23/2024 9:54:32 PM) Award:  Dorian Award
   Line  26865: (12/23/2024 9:54:32 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  28126: (12/23/2024 9:54:33 PM) Parsed Award:  Dorian Award
   Line  28133: (12/23/2024 9:54:33 PM) Award:  Dorian Award
   Line  28139: (12/23/2024 9:54:33 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  29400: (12/23/2024 9:54:35 PM) Parsed Award:  Dorian Award
   Line  29407: (12/23/2024 9:54:35 PM) Award:  Dorian Award
   Line  29413: (12/23/2024 9:54:35 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  30674: (12/23/2024 9:54:37 PM) Parsed Award:  Dorian Award
   Line  30681: (12/23/2024 9:54:37 PM) Award:  Dorian Award
   Line  30687: (12/23/2024 9:54:37 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  31948: (12/23/2024 9:54:39 PM) Parsed Award:  Dorian Award
   Line  31955: (12/23/2024 9:54:39 PM) Award:  Dorian Award
   Line  31961: (12/23/2024 9:54:39 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  33222: (12/23/2024 9:54:41 PM) Parsed Award:  Dorian Award
   Line  33229: (12/23/2024 9:54:41 PM) Award:  Dorian Award
   Line  33235: (12/23/2024 9:54:41 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  34496: (12/23/2024 9:54:43 PM) Parsed Award:  Dorian Award
   Line  34503: (12/23/2024 9:54:43 PM) Award:  Dorian Award
   Line  34509: (12/23/2024 9:54:43 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  35770: (12/23/2024 9:54:44 PM) Parsed Award:  Dorian Award
   Line  35777: (12/23/2024 9:54:44 PM) Award:  Dorian Award
   Line  35783: (12/23/2024 9:54:44 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  37044: (12/23/2024 9:54:46 PM) Parsed Award:  Dorian Award
   Line  37051: (12/23/2024 9:54:46 PM) Award:  Dorian Award
   Line  37057: (12/23/2024 9:54:46 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  38318: (12/23/2024 9:54:48 PM) Parsed Award:  Dorian Award
   Line  38325: (12/23/2024 9:54:48 PM) Award:  Dorian Award
   Line  38331: (12/23/2024 9:54:48 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  39592: (12/23/2024 9:54:50 PM) Parsed Award:  Dorian Award
   Line  39599: (12/23/2024 9:54:50 PM) Award:  Dorian Award
   Line  39605: (12/23/2024 9:54:50 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  40866: (12/23/2024 9:54:52 PM) Parsed Award:  Dorian Award
   Line  40873: (12/23/2024 9:54:52 PM) Award:  Dorian Award
   Line  40879: (12/23/2024 9:54:52 PM) Added Award to Database: Event=Ariel Awards, Mexico, Award= Dorian Award, Category=Screenplay of the Year, Recipient=Roma, Year=2014, Won: True
   Line  42140: (12/23/2024 9:54:54 PM) Parsed Award:  Dorian Award
   Line  42147: (12/23/2024 9:54:54 PM) Award:  Dorian Award
--- End quote ---


so that's one more thing to resolve
--- End quote ---

Yes, I know about this issue in the log file.
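
The repeated "Added Award to Database" entries in the quoted log would be consistent with a search position that is reset, or that never moves past the match it has just processed. A minimal Python sketch of a scan loop that always moves forward is shown here. The name scan_all and the sample string are made up for illustration, and this is not the PVD script itself:

```python
def scan_all(text, marker):
    """Return the start index of every occurrence of marker in text."""
    positions = []
    pos = text.find(marker)  # -1 when there is no match
    while pos != -1:
        positions.append(pos)
        # Resume the search strictly after the current match; searching
        # again from `pos` itself would return the same match forever.
        pos = text.find(marker, pos + len(marker))
    return positions


# Made-up sample input, not taken from the IMDb page.
sample = "<li>a</li><li>b</li><li>c</li>"
print(scan_all(sample, "<li>"))
```

In the Pascal script the equivalent step would be to pass a start position beyond the previous match to PosFrom on every iteration.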

Ivek23:
Yes, I know about this problem. I have looked into the log files and found a partial solution that should help.

The code below is not complete yet:

--- Quote ---Function ParsePage_IMDBPeopleAWARDS(HTML: String): Cardinal;
Var
  curPos, endPos: Integer;
  ItemList, Event, Award, Category, Recipient, Year: String;
  AValue: String; // Declaring AValue as a String
  Won: Boolean;
  FailSafe: Integer;  // To prevent infinite loops
  curPos1,curPos2,curPos3,curPos4,endPos1,endPos2:Integer;

Begin
  LogMessage('Function ParsePage_IMDBPeopleAWARDS BEGIN=====================||');


  try
    Result := prFinished;


    // Log the initial HTML snippet being parsed
    LogMessage('Initial HTML snippet: ' + Copy(HTML, 1, 500));


    // Find the position of the Awards title
    curPos := Pos('<h1 class="ipc-title__text">Awards</h1>', HTML);
    If curPos > 0 Then Begin
      // Find the position of the Awards section
      curPos := PosFrom('<section class="ipc-page-section ipc-page-section--base">', HTML, curPos);
    End;


    If curPos > 0 Then Begin
      // Find the end position of the Awards section
      endPos := PosFrom('<h3 class="ipc-title__text"><span id="contribute">Contribute to this page</span>', HTML, curPos);
      If endPos = 0 Then endPos := Length(HTML);


      If (curPos > 0) AND (endPos > curPos) Then Begin
        // Extract the Awards block
        ItemList := Copy(HTML, curPos, endPos - curPos);
      //LogMessage(ItemList);

       //While curPos > 0 Do Begin
          // Extract and log the award name
       
        // Extract and log the event name
      //curPos := PosFrom('?ref_=nmawd" class="ipc-title-link-wrapper" tabindex="0"><h3 class="ipc-title__text">', ItemList, curPos);
      //curPos := PosFrom('?ref_=nmawd" class="ipc-title-link-wrapper" tabindex="0"><h3 class="ipc-title__text">', ItemList, 1);
      curPos := PosFrom('?ref_=nmawd" class="ipc-title-link-wrapper" tabindex="0"><h3 class="ipc-title__text"><span id="ev', ItemList, 0);
      FailSafe := 0;  // Initialize fail-safe counter
      //While curPos > 0 Do Begin
      While (curPos > 0) And (FailSafe < 10) Do Begin
        // Extract and log the award name
         If curPos > 0 Then Begin
          curPos := PosFrom('>', ItemList, curPos) + 29;
          endPos := PosFrom('</span>', ItemList, curPos);
          Event := Copy(ItemList, curPos, endPos - curPos);
        LogMessage('** Parsed Event: ' + Event);
        Event := RemoveTagsEx0(Event);
          Event := Trim(Event);
        LogMessage('* Parsed Event: ' + Event);
        //Event := RemoveTagsEx1(Trim(Event));


          // Remove the <span> tag
          Event := Copy(Event, Pos('>', Event)  + 1 , Length(Event));
          LogMessage('Parsed Event: ' + Event);
       
      //(*
          // Parse each award item manually
        curPos := PosFrom('<li class="ipc-metadata-list-summary-item sc-15fc9ae6-1 gQbMPJ" data-testid="list-item"', ItemList, 1);
        //curPos := PosFrom('<span class="ipc-metadata-list-summary-item__tst">', ItemList, curPos + 1);
          If curPos > 0 Then Begin
      //*) 
      //(*
          curPos1 := PosFrom('<span class="ipc-metadata-list-summary-item__tst">', ItemList, curPos1  + 16);
          If curPos1 > 0 Then Begin
            curPos1 := PosFrom('>', ItemList, curPos1) + 1;
            endPos1 := PosFrom('</span>', ItemList, curPos1);
            Award := Copy(ItemList, curPos1, endPos1 - curPos1);
            LogMessage('Parsed Award: ' + Award);
      //*) 
 
 
 
      (*
                // Log the parameters before calling AddAward
                //LogMessage('Before calling AddAward with parameters:');
                //LogMessage('Event: ' + Event);
                //LogMessage('Award: ' + Award);
                //LogMessage('Category: ' + Category);
                //LogMessage('Recipient: ' + Recipient);
                //LogMessage('Year: ' + Year);
                //LogMessage('Won: ' + CustomBoolToStr(Won));
      *) 
 
                // Populate the custom field with AValue
                //AddCustomFieldValueByName('IMDb People Awards', AValue);
                //    LogMessage('IMDb People Awards added ' + AValue)
 
      //(*         
         //curPos1 := PosFrom('<span class="ipc-metadata-list-summary-item__tst">', ItemList, EndPos1 + 10);
         curPos1 := PosFrom('<span class="ipc-metadata-list-summary-item__tst">', ItemList, curPos1 + 0);
         //curPos1 := PosFrom('<span/class="ipc-metadata-list-summary-item__tst">', ItemList, curPos1);
         //curPos1 := PosFrom('<span class="ipc-metadata-list-summary-item__tst">', ItemList, EndPos1);
          End Else LogMessage('Error: Award not found.');   
      //*) 
      //(*
        // Move to the next item
          curPos := PosFrom('<li class="ipc-metadata-list-summary-item sc-15fc9ae6-1 gQbMPJ" data-testid="list-item"', ItemList, curPos + 0);
          End Else LogMessage('Error: Awards not found.');   
      //*) 
       
       //curPos := PosFrom('?ref_=nmawd" class="ipc-title-link-wrapper" tabindex="0"><h3 class="ipc-title__text"><span id="ev', ItemList, curPos + 1);
       //curPos := PosFrom('?ref_=nmawd" class="ipc-title-link-wrapper" tabindex="0"><h3 class="ipc-title__text">', ItemList, 1);
       //curPos := PosFrom('?ref_=nmawd" class="ipc-title-link-wrapper" tabindex="0"><h3 class="ipc-title__text">', ItemList, curPos);
       curPos := PosFrom('?ref_=nmawd" class="ipc-title-link-wrapper" tabindex="0"><h3 class="ipc-title__text">', ItemList, EndPos);
        End Else LogMessage('Error: Event title div not found.');
      
      End;   

      End Else LogMessage('Error: Invalid endPos or curPos for Awards section');
    End Else LogMessage('Error: Awards section not found');
   
  except
    Begin
      LogMessage('Exception encountered');
      Result := prError;
    End;
  end;


  LogMessage('Function ParsePage_IMDBPeopleAWARDS END=====================||');
  // Result is already set above: prFinished inside the try block, prError in the except block.
End;
--- End quote ---

Log details:

--- Quote ---(24.12.2024 14:28:23)       Function DownloadPage END======================|
(24.12.2024 14:28:23) Function ParsePage_IMDBPeopleAWARDS BEGIN=====================||
(24.12.2024 14:28:23) Initial HTML snippet: <!DOCTYPE html><html lang="en-US" xmlns:og="http://opengraphprotocol.org/schema/" xmlns:fb="http://www.facebook.com/2008/fbml"><head><meta charSet="utf-8"/><meta name="viewport" content="width=device-width"/><script>if(typeof uet === 'function'){ uet('bb', 'LoadTitle', {wb: 1}); }</script><script>window.addEventListener('load', (event) => {
        if (typeof window.csa !== 'undefined' && typeof window.csa === 'function') {
            var csaLatencyPlugin = window.csa('Content', {
             
(24.12.2024 14:28:23) ** Parsed Event: <span id="ev0000386">Kids' Choice Awards, USA
(24.12.2024 14:28:23) * Parsed Event: Kids' Choice Awards, USA
(24.12.2024 14:28:23) Parsed Event: Kids' Choice Awards, USA
(24.12.2024 14:28:23) Parsed Award:  Blimp Award
(24.12.2024 14:28:23) ** Parsed Event: <span id="ev0000616">Soap Opera Digest Awards
(24.12.2024 14:28:23) * Parsed Event: Soap Opera Digest Awards
(24.12.2024 14:28:23) Parsed Event: Soap Opera Digest Awards
(24.12.2024 14:28:23) Parsed Award:  Soap Opera Digest Award
(24.12.2024 14:28:23) ** Parsed Event: <span id="ev0000716">Young Artist Awards
(24.12.2024 14:28:23) * Parsed Event: Young Artist Awards
(24.12.2024 14:28:23) Parsed Event: Young Artist Awards
(24.12.2024 14:28:23) Parsed Award:  Young Artist Award
(24.12.2024 14:28:23) ** Parsed Event: <span id="ev0000718">YoungStar Awards
(24.12.2024 14:28:23) * Parsed Event: YoungStar Awards
(24.12.2024 14:28:23) Parsed Event: YoungStar Awards
(24.12.2024 14:28:23) Parsed Award:  Young Artist Award
(24.12.2024 14:28:23) Function ParsePage_IMDBPeopleAWARDS END=====================||
(24.12.2024 14:28:23)     Provider data info retreived Ok in 2024-12-24 14:28:23|
(24.12.2024 14:28:23) Function ParsePage smNormal END======================|
(24.12.2024 14:28:23) Person -> LoadStatic -> 0ms
(24.12.2024 14:28:23) Person -> LoadMultivalues -> 0ms
(24.12.2024 14:28:23) Person -> LoadFilms -> 0ms
(24.12.2024 14:28:23) Person -> LoadAwards -> 0ms
(24.12.2024 14:28:23) Person -> LoadImages -> 0ms

--- End quote ---

The event marker in the log, for example

<span id="ev0000718">YoungStar Awards

is helpful for working out which event the awards belong to.
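
As a rough illustration of how the ev... span could be used, here is a minimal Python sketch that lists every event marker together with its offset in the page. The HTML fragment is a simplified stand-in modeled on the "** Parsed Event" log lines, and find_events is a hypothetical name, so treat it as an assumption-laden sketch rather than the actual script:

```python
import re

# Simplified stand-in for the page source; the real page wraps these
# spans in much more markup.
html = (
    '<h3 class="ipc-title__text"><span id="ev0000716">Young Artist Awards</span></h3>'
    '<h3 class="ipc-title__text"><span id="ev0000718">YoungStar Awards</span></h3>'
)

# Captures the event id (ev + digits) and the visible event name.
EVENT_RE = re.compile(r'<span id="(ev\d+)">([^<]*)</span>')


def find_events(page):
    """Return (start_offset, event_id, event_name) for each event span."""
    return [(m.start(), m.group(1), m.group(2).strip())
            for m in EVENT_RE.finditer(page)]


for offset, event_id, name in find_events(html):
    print(offset, event_id, name)
```

The offsets matter because anything found between one event's offset and the next one's can be attributed to the first of the two events.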

Ivek23:

--- Quote from: Ivek23 on December 23, 2024, 01:29:13 pm ---Here is the IMDB_People_[EN][HTTPS]_Awards script, which now correctly transfers Awards data to the awards field for the 'Chico' Hernandez person from the url added below using a Python Selenium script

https://www.imdb.com/name/nm0379491/awards/

I have corrected or added some parts of the code to your code and it works.

Python Selenium script instructions and code will be published probably by the new year in the Integrating Selenium to PVD topic.

http://www.videodb.info/forum_en/index.php/topic,4357.0.html
--- End quote ---

The Python Selenium script is at the link below.

http://www.videodb.info/forum_en/index.php/topic,4362.msg22691.html#msg22691

The IMDB_[EN][HTTPS]_TEST_Aka script is at the link below.

http://www.videodb.info/forum_en/index.php/topic,4363.0.html

Ivek23:
Unfortunately, I don't plan to work on the IMDb Awards section anymore, so there will be no more updates or fixes from me to the movies or people code in Function ParsePage_IMDBMovieAWARDS. The layout and markup of the Awards page source are too complicated and too poorly suited to parsing for the code to record the Awards data properly.

afrocuban:
I completely understand. It is so complicated that even AI hasn't been able to do anything about it so far.
The best I could do was to end up with 2 functions.


The first parses all events, but none of the awards:

--- Quote ---Function ParsePage_IMDBPeopleAWARDS(HTML: String): Cardinal;
Var
  curPos, endPos: Integer;
  Event, Award, Category, Recipient, Year: String;
  Won: Boolean;
  FailSafe: Integer;  // To prevent infinite loops
Begin
  LogMessage('Function ParsePage_IMDBPeopleAWARDS BEGIN=====================||');
  Result := prFinished;

  // Log the first 500 characters of the initial HTML snippet
  LogMessage('Initial HTML snippet (first 500 chars): ' + Copy(HTML, 1, 500));
  // Log the last 500 characters of the initial HTML snippet
  LogMessage('Initial HTML snippet (last 500 chars): ' + Copy(HTML, Length(HTML) - 499, 500));

  // Initialize the search for the first event section
  curPos := Pos('<section class="ipc-page-section ipc-page-section--base">', HTML);
  LogMessage('curPos after finding first event section: ' + IntToStr(curPos));

  If curPos > 0 Then Begin
    FailSafe := 0;  // Initialize fail-safe counter
    While (curPos > 0) And (FailSafe < 200) Do Begin
      // Ensure we don't exceed the HTML length
      If curPos >= Length(HTML) Then Break;

      // Extract the Event Name
      curPos := PosFrom('<span id="ev', HTML, curPos);
      If curPos > 0 Then Begin
        curPos := PosFrom('>', HTML, curPos) + 1;
        endPos := PosFrom('</span>', HTML, curPos);
        Event := Trim(Copy(HTML, curPos, endPos - curPos));
        LogMessage('Parsed Event: ' + Event);

        // Process each award within the event
        curPos := endPos;
        While (curPos > 0) And (curPos < Length(HTML)) And (PosFrom('</section><section class="ipc-page-section ipc-page-section--base">', HTML, curPos) = 0) And (PosFrom('</section><div class="nas-slot">', HTML, curPos) = 0) Do Begin
          curPos := PosFrom('<div data-testid="sub-section-', HTML, curPos);
          If curPos > 0 Then Begin
            curPos := PosFrom('>', HTML, curPos) + 1;
            endPos := PosFrom('<>', HTML, curPos);
            Award := Copy(HTML, curPos, endPos - curPos);
            // LogMessage('Extracted Award Content: ' + Award);
         
            // Parse award details from the Award block
            // Extract Award Name
            curPos := PosFrom('<span class="ipc-metadata-list-summary-item__tst">', HTML, curPos);
            If curPos > 0 Then Begin
              curPos := PosFrom('>', HTML, curPos) + 1;
              endPos := PosFrom('</span>', HTML, curPos);
              Award := Copy(HTML, curPos, endPos - curPos);
           
            End;
            // Extract Category
            curPos := PosFrom('<span class="ipc-metadata-list-summary-item__li awardCategoryName"', Award, 1);
            If curPos > 0 Then Begin
              curPos := PosFrom('>', Award, curPos) + 1;
              endPos := PosFrom('</span>', Award, curPos);
              Category := Copy(Award, curPos, endPos - curPos);
            End;

            // Extract Recipient
            curPos := PosFrom('<a class="ipc-metadata-list-summary-item__li ipc-metadata-list-summary-item__li--link"', Award, 1);
            If curPos > 0 Then Begin
              curPos := PosFrom('>', Award, curPos) + 1;
              endPos := PosFrom('</a>', Award, curPos);
              Recipient := Copy(Award, curPos, endPos - curPos);
            End;

            // Extract Year
            curPos := PosFrom('<a class="ipc-metadata-list-summary-item__t"', Award, 1);
            If curPos > 0 Then Begin
              curPos := PosFrom('>', Award, curPos) + 1;
              endPos := PosFrom(' ', Award, curPos);  // Find the space after the year
              Year := Copy(Award, curPos, endPos - curPos);
              Year := Trim(Year);
            End;

            // Determine if the award was won
            Won := PosFrom('Winner', Award, 1) > 0;

            // Add award to the database
            AddAward(Event, Award, Category, Recipient, Year, Won);
            If Won Then
              LogMessage('AddAward executed successfully: Event=' + Event + ', Award=' + Award + ', Category=' + Category + ', Recipient=' + Recipient + ', Year=' + Year + ', Won=True')
            Else
              LogMessage('AddAward executed successfully: Event=' + Event + ', Award=' + Award + ', Category=' + Category + ', Recipient=' + Recipient + ', Year=' + Year + ', Won=False');
          End;
        End;
      End;

      // Move to the next event or end of awards block
      If PosFrom('</section><section class="ipc-page-section ipc-page-section--base">', HTML, curPos) > 0 Then
        curPos := PosFrom('</section><section class="ipc-page-section ipc-page-section--base">', HTML, curPos) + Length('</section><section class="ipc-page-section ipc-page-section--base">')
      Else If PosFrom('<div class="nas-slot">', HTML, curPos) > 0 Then Begin
        LogMessage('End of awards block detected.');
        Break;
      End Else Begin
        LogMessage('Error: Unable to identify next event or end of awards block.');
        Break;
      End;
      Inc(FailSafe);
    End;
  End Else LogMessage('Error: First event section not found');

  LogMessage('Function ParsePage_IMDBPeopleAWARDS END=====================||');
  Result := prFinished;
End;
//BlockClose
--- End quote ---


The second one parses all the awards but only the first event, and assigns every award to that event:



--- Quote ---Function ParsePage_IMDBPeopleAWARDS(HTML: String): Cardinal;
Var
  curPos, endPos, awardPos, categoryPos, recipientPos: Integer;
  Event, Award, Category, Recipient, Year: String;
  Won: Boolean;
Begin
  LogMessage('Function ParsePage_IMDBPeopleAWARDS BEGIN=====================||');
  Result := prFinished;


  // Locate the start of the specific event section
  curPos := Pos('<section class="ipc-page-section ipc-page-section--base">', HTML);
  LogMessage('curPos after finding event section: ' + IntToStr(curPos));


  If curPos > 0 Then Begin
    // Extract event name
    curPos := PosFrom('<span id="ev', HTML, curPos);
    If curPos = 0 Then Begin
      LogMessage('Event name not found');
      Exit;
    End;
    curPos := PosFrom('>', HTML, curPos) + 1;
    endPos := PosFrom('</span>', HTML, curPos);
    Event := Trim(Copy(HTML, curPos, endPos - curPos));
    LogMessage('Parsed Event: ' + Event);
   
    curPos := endPos;


    // Process awards within this event
    While curPos > 0 Do Begin
      // Find next award div
      curPos := PosFrom('<li class="ipc-metadata-list-summary-item sc-15fc9ae6-1 gQbMPJ" data-testid="list-item">', HTML, curPos);
      If curPos = 0 Then Begin
        LogMessage('No more awards found in this event');
        Break;
      End;
      LogMessage('curPos after finding award div: ' + IntToStr(curPos));


      awardPos := curPos;  // Save the starting position of the award
      curPos := PosFrom('>', HTML, curPos) + 1;
      endPos := PosFrom('<></section>', HTML, curPos);  // Adjusted to the correct closing tag
      If endPos = 0 Then Begin
        LogMessage('No closing tag for award div found');
        Break;  // No more awards
      End;


      Award := Copy(HTML, awardPos, endPos - awardPos);
      curPos := endPos + Length('<></section>');
      // LogMessage('Award Content Extracted Successfully: ' + Award);


      // Extract year
      awardPos := PosFrom('<a class="ipc-metadata-list-summary-item__t"', Award, 1);
      If awardPos = 0 Then Begin
        LogMessage('Year not found');
        Continue;
      End;
      awardPos := PosFrom('>', Award, awardPos) + 1;
      endPos := PosFrom(' ', Award, awardPos);  // Find the space after the year
      Year := Copy(Award, awardPos, endPos - awardPos);
      Year := Trim(Year);
      LogMessage('Parsed Year: ' + Year);


      // Determine if the award was won
      Won := PosFrom('Winner', Award, 1) > 0;
      If Won Then
        LogMessage('Parsed Won: True')
      Else
        LogMessage('Parsed Won: False');


      // Extract Category
      categoryPos := PosFrom('<span class="ipc-metadata-list-summary-item__li awardCategoryName"', Award, awardPos);
      LogMessage('EVE GA CAT: ' + IntToStr(categoryPos));
      If categoryPos > 0 Then Begin
        categoryPos := PosFrom('>', Award, categoryPos) + 1;
        endPos := PosFrom('</span>', Award, categoryPos);
        Category := Copy(Award, categoryPos, endPos - categoryPos);
      End;
      LogMessage('Parsed Category Name: ' + Category);


      // Extract recipient
      recipientPos := PosFrom('<a class="ipc-metadata-list-summary-item__li ipc-metadata-list-summary-item__li--link"', Award, categoryPos);
      If recipientPos = 0 Then Begin
        LogMessage('Recipient tag not found');
        Continue;
      End;
      recipientPos := PosFrom('>', Award, recipientPos) + 1;
      endPos := PosFrom('</a>', Award, recipientPos);
      Recipient := Copy(Award, recipientPos, endPos - recipientPos);
      LogMessage('Parsed Recipient: ' + Recipient);


      // Extract award name
      awardPos := PosFrom('<span class="ipc-metadata-list-summary-item__tst">', Award, awardPos);
      If awardPos = 0 Then Begin
        LogMessage('Award Name not found');
        Continue;
      End;
      awardPos := PosFrom('>', Award, awardPos) + 1;
      LogMessage('Parsed awardPos ' + IntToStr(awardPos));
      endPos := PosFrom('</span>', Award, awardPos);
      LogMessage('Parsed endPos ' + IntToStr(endPos));
      Award := Copy(Award, awardPos, endPos - awardPos);
      LogMessage('Parsed Award Name: ' + Award);


      // Add award to the database
      AddAward(Event, Award, Category, Recipient, Year, Won);
      If Won Then
        LogMessage('AddAward executed successfully: Event=' + Event + ', Award=' + Award + ', Category=' + Category + ', Recipient=' + Recipient + ', Year=' + Year + ', Won=True')
      Else
        LogMessage('AddAward executed successfully: Event=' + Event + ', Award=' + Award + ', Category=' + Category + ', Recipient=' + Recipient + ', Year=' + Year + ', Won=False');


      // Advance curPos to ensure moving to the next award
      curPos := PosFrom('<li class="ipc-metadata-list-summary-item sc-15fc9ae6-1 gQbMPJ" data-testid="list-item">', HTML, curPos);
    End;
  End Else LogMessage('Error: Event section not found');


  LogMessage('Function ParsePage_IMDBPeopleAWARDS END=====================||');
  Result := prFinished;
End;
//BlockClose
--- End quote ---


For the life of me, I cannot combine them, no matter what I try. Not even close... :o :'(


How's that even possible?


The page I'm trying to parse is attached, as well as the script, which contains the fixed genres and bio.
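
One way the two functions might be combined is to collect the positions of all '<span id="ev' markers first, and then restrict each award search to the slice of the page between one marker and the next, so that every award is attached to the event whose section contains it. Below is a minimal Python sketch of that idea. The HTML fragment is a simplified stand-in that reuses class names from the scripts above, the award names in it are made up, and parse_awards is a hypothetical name. It only illustrates the control flow and is not a drop-in PVD script:

```python
import re

EVENT_RE = re.compile(r'<span id="ev\d+">([^<]*)</span>')
AWARD_RE = re.compile(
    r'<span class="ipc-metadata-list-summary-item__tst">([^<]*)</span>')


def parse_awards(page):
    """Return (event, award) pairs, one per award name found."""
    events = list(EVENT_RE.finditer(page))
    results = []
    for i, ev in enumerate(events):
        # An event's section ends where the next event starts,
        # or at the end of the page for the last event.
        end = events[i + 1].start() if i + 1 < len(events) else len(page)
        section = page[ev.end():end]
        for aw in AWARD_RE.finditer(section):
            results.append((ev.group(1).strip(), aw.group(1).strip()))
    return results


# Simplified, made-up sample: one award under the first event,
# two awards under the second.
sample = (
    '<span id="ev0000716">Young Artist Awards</span>'
    '<span class="ipc-metadata-list-summary-item__tst"> Young Artist Award</span>'
    '<span id="ev0000718">YoungStar Awards</span>'
    '<span class="ipc-metadata-list-summary-item__tst"> YoungStar Award</span>'
    '<span class="ipc-metadata-list-summary-item__tst"> YoungStar Award</span>'
)

for event, award in parse_awards(sample):
    print(event, '|', award)
```

In the Pascal script, a similar effect could probably be had by looking up the position of the next '<span id="ev' with PosFrom before the inner loop starts, and leaving the inner loop as soon as curPos is 0 or passes that position.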
