New IMDb People v3 (Selenium) script comments
Ivek23:
Here are some pieces of code to add to the script, or to use as replacements for existing code in it.
First, the basic settings.
--- Quote ---//Script Options-------------------------------------------------------------------------------------------------------
//Retrieve Data Config
USE_SAVED_PVDCONFIG = True ; //Use the script's Overwrite Options saved in pvdconf.ini to avoid downloading unused pages. Remember that PVD only saves them on exit.
//MAX_IMAGE_HEIGHT = 12000; //Height limit of the stored photos.
MAX_IMAGE_HEIGHT = 1200; //Height limit of the stored photos.
//MAX_IMAGE_HEIGHT = 500; //Height limit of the stored photos.
//Process Data Config
PHOTO_URL_IN_TRANSNAME = False ; //Use the PVD field ~transname~ for storing the URL of the person photo, to send to KODI in a Template.
//BIRTH_NAME_IN_TRANSNAME = True ; //Use the PVD field ~transname~ for storing the person Birth Name from Biography Pages. // Does not work
BIRTH_NAME_IN_TRANSNAME = False ; //Use the PVD field ~transname~ for storing the person Birth Name from Biography Pages. // Does not work
GET_FULL_BIO = True ; //Download the Biography provider page to retrieve the info. Otherwise only the info of the main people page.
//GET_FULL_BIO = False ; //Download the Biography provider page to retrieve the info. Otherwise only the info of the main people page.
//BIO_INFO_IN_BIO = True ; //Use the PVD field ~bio~ for storing the person Biography Info Url link for Biography Pages.
BIO_INFO_IN_BIO = False ; //Use the PVD field ~bio~ for not storing the person Biography Info Url link for Biography Pages.
//BIO_URL_IN_BIO = True ; //Use the PVD field ~bio~ for storing the person Url's for Biography Info (Mini Bio) for Biography Pages.
BIO_URL_IN_BIO = False ; //Use the PVD field ~bio~ for not storing the person Url's for Biography Info (Mini Bio) for Biography Pages.
//IMDB_MINI_IN_BIO = True ; //Use the PVD field ~bio~ for storing the person IMDb Mini Biography letters for Biography Info (Mini Bio) for Biography Pages.
IMDB_MINI_IN_BIO = False ; //Use the PVD field ~bio~ for not storing the person IMDb Mini Biography letters for Biography Info (Mini Bio) for Biography Pages.
GET_FULL_CREDIT = True ; //Download the Credits (text only) provider page to retrieve the info. Otherwise only the info of the main people page.
//GET_FULL_CREDIT = False ; //Download the Credits (text only) provider page to retrieve the info. Otherwise only the info of the main people page.
GET_FULL_GENRES = True ; //Download the Genres provider page to retrieve the info. Otherwise only the info of the main people page.
//GET_FULL_GENRES = False ; //Download the Genres provider page to retrieve the info. Otherwise only the info of the main people page.
GET_FULL_AWARDS = True ; //Download the Awards provider page to retrieve the info. Otherwise it does nothing because there is no info in the main movie page.
//GET_FULL_AWARDS = False ; //Download the Awards provider page to retrieve the info. Otherwise it does nothing because there is no info in the main movie page.
EVENTS_LIMIT = 1000; //Maximum number of events (USA Academy Awards, Golden Globes, etc.) from which to retrieve awards.
//Process Behaviour Config
BYPASS_SILENT = True ; //Ensure critical ShowMessage alerts are shown, bypassing the Silent PVdB preference
CHECK_WEBSITE = False ; //Add to the SearchResult list the true HTTPS links 'Just to check the website' with the browser
//CHECK_WEBSITE = True ; //Add to the SearchResult list the true HTTPS links 'Just to check the website' with the browser
POSTER_IN_SEARCH = False ; //Download and show people posters in the SearchResult list
//POSTER_IN_SEARCH = True ; //Download and show people posters in the SearchResult list
//SEARCH_ENGINE = True ; //If the provider returns no search results, try the Bing search engine
SEARCH_ENGINE = False ; //If the provider returns no search results, try the Bing search engine
PHOTO_DWN_RONDABOUT = True ; //Activate the "HTTPS image download function" and the "ImageListSearch exit" as a roundabout (bypasses a bug) for downloading Photos.
// Because there is no choice (there is only one photo) it normally downloads without asking, but if PVdB begins to ask, then
// enabling the PVdB preference Plugin/Silent would be more comfortable for large databases.*)
INTERNET_TEST_ITERATIONS = 6; //Attempts before alerting the user that no internet connection was detected. Increase if the provider is slow.
//Script data------------------------------------------------------------------------------------------------------------
--- End quote ---
With the settings above, the bio information is downloaded without URL links and without the "- IMDb Mini Biography By:" text.
If these settings are changed to
--- Quote --- BIO_URL_IN_BIO = True ; //Use the PVD field ~bio~ for storing the person Url's for Biography Info (Mini Bio) for Biography Pages.
//BIO_URL_IN_BIO = False ; //Use the PVD field ~bio~ for not storing the person Url's for Biography Info (Mini Bio) for Biography Pages.
IMDB_MINI_IN_BIO = True ; //Use the PVD field ~bio~ for storing the person IMDb Mini Biography letters for Biography Info (Mini Bio) for Biography Pages.
//IMDB_MINI_IN_BIO = False ; //Use the PVD field ~bio~ for not storing the person IMDb Mini Biography letters for Biography Info (Mini Bio) for Biography Pages.
--- End quote ---
then the bio is downloaded with the URL links and with the "- IMDb Mini Biography By:" text.
--- Quote ---Function RemoveTagsEx00(AText:String):String; //BlockOpen
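//Ivek23 function to make the script faster: removes every opening <link url="..."> tag
Var
B,E:Integer;
Begin
Result:=AText;
B:=Pos('<link url="',Result);
E:=0;
If B>0 Then E:=PosFrom('">',Result,B); //Search for the tag end only after the tag start
While (B>0) AND (B<E) Do Begin
Delete(Result,B,E-B+2);
B:=Pos('<link url="',Result);
E:=0;
If B>0 Then E:=PosFrom('">',Result,B); //Anchor to the new tag start, not to the first '">' in the text
End;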
End; //BlockClose
--- End quote ---
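As a cross-check of what RemoveTagsEx00 is meant to do, here is a minimal Python sketch (not PVD script code). It removes every opening `<link url="...">` tag and leaves the link text and any closing `</link>` in place, which the script strips separately. The name `strip_link_open_tags` and the sample string are hypothetical and only serve the illustration.

```python
def strip_link_open_tags(text: str) -> str:
    """Remove every opening <link url="..."> tag; keep the text between tags.

    Hypothetical helper, written only to illustrate the logic of RemoveTagsEx00.
    """
    open_marker = '<link url="'
    close_marker = '">'
    start = text.find(open_marker)
    while start != -1:
        # Look for the closing '">' only after the start of this tag.
        end = text.find(close_marker, start + len(open_marker))
        if end == -1:
            break  # unterminated tag: leave the rest untouched
        text = text[:start] + text[end + len(close_marker):]
        start = text.find(open_marker)
    return text


# Made-up sample text, not taken from a real IMDb page.
sample = 'Born to <link url="http://www.imdb.com/name/nm0000001/">Fred</link> in 1923.'
print(strip_link_open_tags(sample))
```

Searching for the closing `">` from the tag start onward is the design point: a stray `">` earlier in the text then cannot end the loop too early.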
Ivek23:
Function ParsePage_IMDBPersonBASE changes
--- Quote ---//(*
Function ParsePage_IMDBPersonBASE(HTML:String):Cardinal; //BlockOpen
//Returns:
// Result:=prFinished; Script has finished gathering data
// Result:=prListImage; As RONDABOUT (bypass a bug) for download Photos
// Result:=prError; if there is any big problem, with exit
//Retrieve: IMDB has a json container, easy to scrap.
// ~url~, ~name~,~altnames~(NO),~birthday~,~birthplace~,~death~,
// If Not(GET_FULL_BIO) ~bio~
// ~Photo~,
// If PHOTO_URL_IN_TRANSNAME. ~transname~ The PVdB ~transname~ Translated Name not stored in TheMovieDB. Used for PhotoURL
// ~genre~(NO) Female or Male (Even if PVB Scripting Manual say 'comma separeted list' because is in the same list that movie ~genre~)
// ~comment~ Not used
// ~age~ Not used. Calculated in PVdB. ~age~
// ~dateadded~ Not used. Calculated in PVdB.
// "homepage": Not used by PVdB
Var
curPos,endPos,debug_pos1,index:Integer;
PhotoURL,ItemValue,ItemList,ImageFile:String;
PersonID,ItemValue0,ItemValue1,ItemValue2,ItemValue3:String;
jobTitle,AltNames,AltNames1,DeathAge:String;
ItemList0,ItemList1,ItemList2,ItemList4:String;
Title,Role,Year,MovieURL:String;
AwardsValue,AwardList:String;
Begin
--- End quote ---
Under the Get ~Photo~ code, the Alternate Names code block is removed and the updated code shown below is added.
--- Quote --- //(*
//Get ~Photo~ . Remember that the PVdB ~transname~ Translated Name is not stored in TheMovieDB. Can be used for PhotoURL
ItemValue:=TextBetWeenFirst(ItemList,'"image":"','",'); // WEB_SPECIFIC.
If (Length(ItemValue)>0) and (Pos('nopicture',ItemValue)=0)Then Begin //"https://m.media-amazon.com/images/G/01/imdb/images/nopicture/...' NOT exists working httpS
PhotoURL:=TextBetWeenFirst(ItemValue,BASE_URL_IMAGE_PRE_TRUE,'.'); //Get poster code. Strings which opens/closes the data. WEB_SPECIFIC
If ((Length(PhotoURL)>0) and Not(USE_SAVED_PVDCONFIG and (Copy(PVDConfigOptions,opPhoto,1)='0'))) then begin //The Poster will be saved in PVD
PhotoURL:=BASE_URL_IMAGE_PRE_TRUE + PhotoURL; //Base poster URL without '.jpg'. WEB_SPECIFIC
ImageFile:=GetAppPath+'Scripts\'+BASE_DOWNLOAD_FILE_IMAGE_NAME+'-Photo.jpg';
// Avoid HTTPS redirection: Download https image to file
If (1=DownloadImage(PhotoURL + '._V1_UY' + IntToStr(MAX_IMAGE_HEIGHT) + '_.jpg',ImageFile)) then begin //Download with the user's selected max size. WEB_SPECIFIC
//LogMessage('Image successfully downloaded to: ' + ImageFile);
//Downloaded in the user's selected max size. WEB_SPECIFIC
//Log the actual value being added
//LogMessage('Adding image with URL: ' + ImageFile + ' and type: itPoster');
//Call the AddImageURL procedure
AddImageURL(itPoster,ImageFile); //Add the photo to the database. For an unknown reason it does not work: the photo is not retrieved the way a movie poster is
//Log a confirmation message after adding the image
LogMessage('AddImageURL in user-s size has been called with ImageType: ' + IntToStr(itPoster) + ' and ImageFile: ' + ImageFile);
AddSearchResult(GetFieldValueXML('name'), '', '', ImageFile, ImageFile); //It's not possible to avoid GetFieldValueXML because the name can't be the same.
if PHOTO_URL_IN_TRANSNAME then AddFieldValueXML('transname',PhotoURL + '._V1_UY' + IntToStr(MAX_IMAGE_HEIGHT) + '_.jpg'); //For storing the URL of the person photo, to send to KODI in a Template
//LogMessage(' Get result PhotoURL:'+PhotoURL + '._V1_UY' + IntToStr(MAX_IMAGE_HEIGHT) + '_.jpg'+'||');
LogMessage('Script end. After, PVdB will retreive from ListImage and info of person in order get the photo');
Result:=prListImage;
end else if (1=DownloadImage(ItemValue +'.jpg',ImageFile)) then begin //Download in the web base size. WEB_SPECIFIC
AddImageURL(itPoster,ImageFile); //Add the photo to the database. For an unknown reason it does not work: the photo is not retrieved the way a movie poster is
LogMessage('AddImageURL web based size has been called with ImageType: ' + IntToStr(itPoster) + ' and ImageFile: ' + ImageFile);
AddSearchResult(GetFieldValueXML('name'), '', '', ImageFile, ImageFile); //It's not possible to avoid GetFieldValueXML because the name can't be the same.
if PHOTO_URL_IN_TRANSNAME then AddFieldValueXML('transname',PhotoURL+'.jpg'); //For storing the URL of the person photo, to send to KODI in a Template
//LogMessage(' Get result PhotoURL:'+PhotoURL+'.jpg'+'||');
LogMessage('Script end. After, PVdB will retreive from ListImage and info of person in order get the photo');
Result:=prListImage;
end;
End;
End Else Begin
PhotoURL:='';
End;
//*)
//(*
ItemList:='';
//~jobTitle~
//Begin of scrap the json container.
ItemList1:=TextBetWeenFirst(HTML,'<script type="application/ld+json">','</script>');
//LogMessage(' Parse results ('+IntToStr(curPos)+','+IntToStr(endPos)+') complex ItemList: '+'<script type="application/ld+json"'+ItemList+'}</script>'+'||');
ItemList1:=StringReplace(ItemList1,'}',',',True,True, False); //Replace the last } and then all the TheMovieDB jason fields finish with ',"' even the last. WEB_SPECIFIC.
//Get ~jobTitle~ jobTitle
ItemValue:=TextBetWeenFirst(ItemList1,'","jobTitle":["','"],'); //WEB_SPECIFIC.
ItemValue:=StringReplace(ItemValue,'","',', ',True,False,True);
if (2<Length(ItemValue)) then begin
If ItemValue <> '' then ItemList:=ItemList+'JobTitle: '+ItemValue+' ';
if ItemValue <> '' then AddFieldValueXML('careertype',ItemValue);
LogMessage(' Get result jobTitle:'+ItemValue+'||');
end;
//*)
//(*
//~Career~
If Pos('<h3 class="ipc-title__text"><span id="credits">Credits</span>',HTML)>0 Then Begin
curPos:=Pos('<h3 class="ipc-title__text"><span id="credits">Credits</span>',HTML);
If curPos>0 Then Begin
EndPos:=curPos;
//ItemValue:=TextBetween(HTML,'<div id="jumpto">','<div id="filmography">',True,curPos);
ItemValue1:=HTMLValues2(HTML,'<h3 class="ipc-title__text"><span id="credits">Credits</span>','<span class="ipc-chip__text">IMDbPro</span>','" class="ipc-chip ipc-chip--on-base-accent2" tabindex="0" aria-disabled="false"><span class="ipc-chip__text">','<span class="ipc-chip__count">',', ',EndPos);
If ItemValue1 <> '' then ItemList:=ItemList+#13+'Filmography - Career: '+ItemValue1+' ';
End;
End;
//*)
//(*
//Get ~Main Page URL~
--- End quote ---
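The jobTitle step above reads the page's `application/ld+json` container with plain string markers. For comparison, here is a minimal Python sketch (not PVD script code) that parses the same container as JSON. The function name `job_titles` and the sample HTML are hypothetical; the field layout is an assumption inferred from the markers used in the script.

```python
import json
import re


def job_titles(html: str) -> list[str]:
    """Return the jobTitle entries from the first ld+json block, or [] if absent.

    Hypothetical helper; the page layout is assumed from the script's markers.
    """
    match = re.search(
        r'<script type="application/ld\+json">(.*?)</script>', html, re.S
    )
    if match is None:
        return []
    data = json.loads(match.group(1))
    if not isinstance(data, dict):
        return []
    titles = data.get("jobTitle", [])
    if isinstance(titles, str):  # tolerate a single string instead of a list
        titles = [titles]
    return list(titles)


# Made-up sample page fragment for illustration only.
sample_html = (
    '<html><head><script type="application/ld+json">'
    '{"@type":"Person","name":"Aaron Spelling",'
    '"jobTitle":["Producer","Writer","Actor"]}'
    '</script></head></html>'
)
print("JobTitle: " + ", ".join(job_titles(sample_html)))
```

Parsing the container as JSON avoids depending on the exact order of the fields, which the marker `'","jobTitle":["'` does.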
Here is another part of the code change.
--- Quote --- //(*
//~Died~
curPos:=Pos('<li role="presentation" class="ipc-metadata-list__item" data-testid="nm_pd_dl"><span class="ipc-metadata-list-item__label" aria-disabled="false">Died</span>',HTML);
If curPos>0 Then Begin
//*)
//(*
EndPos:=curPos;
ItemValue1:=HTMLValues(HTML,'<span class="ipc-metadata-list-item__label" aria-disabled="false">Died</span>','</li></ul></div></li>','<li role="presentation" class="ipc-inline-list__item test-class-react','</li></ul></div></li>',' ',EndPos);
ItemValue1:=StringReplace(ItemValue1,'<li role="presentation" class="ipc-inline-list__item">',' in ',True,False,True);
ItemValue1:=StringReplace(ItemValue1,'<span class="ipc-metadata-list-item__list-content-item--subText">',' ',True,False,True);
//LogMessage(' ** Parse Results Died10:'+ItemValue1+'||');
//ItemValue1:=RemoveTags(ItemValue1, False);
For index:=1 To 20 Do ItemValue1:=RemoveTagsEx0(ItemValue1); //Same 20 passes as before, written as a loop
ItemValue1:=StringReplace(ItemValue1,'">','',True,False,True);
ItemValue1:=StringReplace(ItemValue1,' (undisclosed)','',True,False,True);
If ItemValue1 <> '' then ItemList:=ItemList+#13+'Died: '+ItemValue1 Else ItemList:=ItemList+#13;
LogMessage(' Parse Results Died10:'+ItemValue1+'||');
End;
//*)
//(*
//~AwardsSummary~
//If Pos('<section cel_widget_id="StaticFeature_Awards" class="ipc-page-section ipc-page-section--base celwidget" data-csa-c-id="1va8oc-j9jhr6-xyfws0-meswem" data-cel-widget="StaticFeature_Awards">',HTML)>0 Then Begin
curPos:=PosFrom('<div data-testid="awards" class="sc-710dd9d1-0 iiIXRd base"><div class="sc-710dd9d1-1 cmBtRN">',HTML,EndPos);
//If curPos>0 Then Begin
EndPos:=PosFrom('</ul></div></section>',HTML,curPos);
//AwardList:=Copy(HTML,curPos,endPos-curPos);
AwardList:=Trim(Copy(HTML,curPos,endPos-curPos));
//LogMessage(' * Parse Results AwardsSummary1x:'+AwardList+'||');
if (2<Length(AwardList)) then begin
AwardsValue:=TextBetWeenFirst(AwardList,'/awards/?ref_=nm_awd"','</span></li></ul></div>');
LogMessage(' * Parse Results AwardsSummary1a:'+AwardsValue+'||');
AwardsValue:=StringReplace(AwardsValue,'</a><div class="ipc-metadata-list-item__content-container"><ul class="ipc-inline-list ipc-inline-list--show-dividers ipc-inline-list--inline ipc-metadata-list-item__list-content base" role="presentation"><li role="presentation" class="ipc-inline-list__item"><span class="ipc-metadata-list-item__list-content-item" aria-disabled="false">',' • ',False,True,True);
AwardsValue:=StringReplace(AwardsValue,'>','',True,False,True);
LogMessage(' * Parse Results AwardsSummary:'+AwardsValue+'||');
If AwardsValue <> '' then AwardsValue:=#13+'--------------------------------------------------------------------------'+#13+'<link url="http://www.imdb.com/name/'+PersonID+'/awards/">Awards link</link> •• '+AwardsValue+' •• ';
End;
//End;
//*)
//(*
//~Alternate Names~
curPos := Pos('<script>if(typeof uet === ''function''){ uet(''be'', ''StaticFeature_PersonalDetails'', {wb: 1}); }</script>', HTML);
LogMessage('curPos after finding Alternative Names curPos: ' + IntToStr(curPos));
If curPos > 0 Then Begin
EndPos := curPos;
LogMessage('EndPos set to curPos: ' + IntToStr(EndPos));
// Extract values between the specified tags
AltNames1 := HTMLValues(HTML, '<script>if(typeof uet === ''function''){ uet(''be'', ''StaticFeature_PersonalDetails'', {wb: 1}); }</script>', '"feature_contribution_header":"Contribute to this page"', '{"node":{"displayableProperty":{"value":{"plainText":"', '","__typename":"Markdown"},"__typename":"DisplayableNameAkaProperty"},"__typename":"NameAka"},"', ', ', EndPos);
LogMessage(' * Parsed Result Alternative Name: ' + AltNames1);
AltNames1:=StringReplace(AltNames1,'\u0026',#38,True,False,True);
If AltNames1 <> '' then AddFieldValueXML('AltNames', AltNames1); //Store the alternate names, not ItemValue1 (which holds the Died text)
If AltNames1 <> '' then ItemList:=ItemList+#13+'Alternate Names: '+AltNames1+' ';
If AltNames1 <> '' then LogMessage(' Parsed Results All Expanded Alternative Names: ' + AltNames1 + '||');
End;
//*)
//(*
//~Height~
curPos:=Pos('<h3 class="ipc-title__text"><span id="personalDetails">Personal details</span>',HTML);
If curPos>0 Then Begin
EndPos:=curPos;
ItemValue0:=HTMLValues2(HTML,'<li role="presentation" class="ipc-metadata-list__item" data-testid="nm_pd_he"><span class="ipc-metadata-list-item__label" aria-disabled="false">Height</span>','</li></ul></div></li>','<span class="ipc-metadata-list-item__list-content-item" aria-disabled="false">','</li></ul></div></li>','<br>',EndPos);
If ItemValue0 <> '' then ItemList:=ItemList+#13+'Height: '+ItemValue0+' ';
LogMessage(' Parse Results Height:'+ItemValue0+'||');
End;
//*)
//(*
//~Nicknames~
--- End quote ---
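The Alternate Names step above pulls `plainText` values out of JSON embedded in the page and then replaces `\u0026` by hand. As a minimal Python sketch (not PVD script code), the same values can be extracted and every JSON escape decoded in one go. The name `alternate_names` and the sample fragment are hypothetical; the surrounding key names are copied from the markers in the script.

```python
import json
import re

# Matches one alternate-name value; the trailing keys are the same markers
# the script uses to delimit each value.
AKA_PATTERN = re.compile(
    r'"plainText":"((?:[^"\\]|\\.)*)","__typename":"Markdown"\},'
    r'"__typename":"DisplayableNameAkaProperty"'
)


def alternate_names(page: str) -> list[str]:
    """Return all alternate names, with JSON escapes such as \\u0026 decoded.

    Hypothetical helper, for illustration only.
    """
    return [json.loads('"' + raw + '"') for raw in AKA_PATTERN.findall(page)]


# Made-up fragment shaped like the markers in the script.
sample = (
    '{"node":{"displayableProperty":{"value":{"plainText":"Aaron \\u0026 Candy",'
    '"__typename":"Markdown"},"__typename":"DisplayableNameAkaProperty"},'
    '"__typename":"NameAka"},'
    '{"node":{"displayableProperty":{"value":{"plainText":"Shelly Colbert",'
    '"__typename":"Markdown"},"__typename":"DisplayableNameAkaProperty"},'
    '"__typename":"NameAka"}'
)
print("Alternate Names: " + ", ".join(alternate_names(sample)))
```

Decoding each value with `json.loads` also covers escapes other than `\u0026`, such as escaped quotes or accented characters.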
Ivek23:
Here is another part of the code change.
--- Quote --- //(*
//~Nicknames~
curPos:=Pos('<li role="presentation" class="ipc-metadata-list__item ipc-metadata-list__item--stacked" data-testid="name-dyk-nickname"><span class="ipc-metadata-list-item__label ipc-metadata-list-item__label--btn" aria-label="See more" aria-disabled="false">Nickname</span>',HTML);
If curPos>0 Then Begin
EndPos:=curPos;
ItemValue0:=HTMLValues2(HTML,'<span class="ipc-metadata-list-item__label ipc-metadata-list-item__label--btn" aria-label="See more" aria-disabled="false">Nickname</span>','</li></ul></div','<span class="ipc-metadata-list-item__list-content-item" aria-disabled="false">','</span>','<br>',EndPos);
//LogMessage(' * Parse Results Nickname1:'+ItemValue0+'||');
//ItemValue0:=StringReplace(ItemValue0,' See more »','',True,False,True);
If ItemValue0 <> '' then ItemList:=ItemList+#13+'Nickname: '+ItemValue0+' ';
LogMessage(' Parse Results Nickname:'+ItemValue0+'||');
End;
//*)
//(*
//~Nicknames~
curPos:=Pos('<li role="presentation" class="ipc-metadata-list__item ipc-metadata-list__item--stacked" data-testid="name-dyk-nickname"><span class="ipc-metadata-list-item__label ipc-metadata-list-item__label--btn" aria-label="See more" aria-disabled="false">Nicknames</span>',HTML);
If curPos>0 Then Begin
EndPos:=curPos;
ItemValue1:=HTMLValues2(HTML,'<span class="ipc-metadata-list-item__label ipc-metadata-list-item__label--btn" aria-label="See more" aria-disabled="false">Nicknames</span>','</li></ul></div','<span class="ipc-metadata-list-item__list-content-item" aria-disabled="false">','</span>',', ',EndPos);
//LogMessage(' * Parse Results Nickname1:'+ItemValue0+'||');
//ItemValue0:=StringReplace(ItemValue0,' See more »','',True,False,True);
If ItemValue1 <> '' then ItemList:=ItemList+#13+'Nickname: '+ItemValue1+' ';
LogMessage(' Parse Results Nickname:'+ItemValue1+'||');
End;
//*)
ItemList:=ItemList+AwardsValue;
ItemList:=ItemList+#13+'--------------------------------------------------------------------------';
//(*
//Get ~Biography URL~
//http://www.imdb.com/name/nm0002031/bio?ref_=nm_ql_pdtls_1
EndPos:=Pos('">Biography</a></li>',HTML);
If endPos>0 Then Begin
ItemValue0:='<link url="http://www.imdb.com/name/'+PersonID+'/bio">Biography</link>';
If ItemValue0 <> '' then ItemList:=ItemList+#13+ItemValue0;
//LogMessage(' Parse Results Biography URL:'+ItemValue0+'||');
End;
//*)
//(*
//Get ~Awards URL~
//http://www.imdb.com/name/nm0002031/awards?ref_=nm_ql_op_1
//EndPos:=Pos('">Awards</a></li>',HTML);
curPos:=Pos('">Awards</a></li>',HTML);
//If endPos>0 Then Begin
If curPos>0 Then Begin
ItemValue0:='<link url="http://www.imdb.com/name/'+PersonID+'/awards">Awards</link>';
If ItemValue0 <> '' then ItemList:=ItemList+#32#32+ItemValue0;
LogMessage(' Parse Results Awards URL:'+ItemValue0+'||');
End;
//*)
//(*
//Get ~External Sites URL~
//http://www.imdb.com/name/nm0002031/externalsites?ref_=nm_ql_rel_3
EndPos:=Pos('">External sites</a>',HTML);
//If endPos>0 Then Begin
ItemValue0:='<link url="http://www.imdb.com/name/'+PersonID+'/externalsites">External Sites</link>';
//If ItemValue0 <> '' then
ItemList:=ItemList+#32#32+ItemValue0;
//LogMessage(' Parse Results External Sites URL:'+ItemValue0+'||');
//End;
//*)
//(*
//Get ~Genre index URL~ //http://www.imdb.com/filmosearch/?sort=moviemeter&explore=genres&role=nm0005455&ref_=nm_ql_flmg_4
//https://www.imdb.com/search/title/?explore=genres&role=nm0005455
EndPos:=Pos('">by Genre</a>',HTML);
//If endPos>0 Then Begin
ItemValue0:='<link url="http://www.imdb.com/filmosearch/?sort=moviemeter&explore=genres&role='+PersonID+'">Genres</link>';
//If ItemValue0 <> '' then
ItemList:=ItemList+#32#32+ItemValue0;
//If ItemValue0 <> '' then ItemList:=ItemList+#32#32+ItemValue0;
//LogMessage(' Parse Results Genre URL:'+ItemValue0+'||');
//End;
//*)
//(*
//Get ~Photo Gallery URL~
//http://www.imdb.com/name/nm0002031/mediaindex?ref_=nm_ql_pv_1
EndPos:=Pos('<h3 class="ipc-title__text">Photos<',HTML);
If endPos>0 Then Begin
ItemValue0:='<link url="http://www.imdb.com/name/'+PersonID+'/mediaindex/">Photo Gallery</link>';
If ItemValue0 <> '' then ItemList:=ItemList+#32#32+ItemValue0;
// LogMessage(' Parse Results Photo Gallery URL:'+ItemValue0+'||');
End;
//*)
//(*
//Get ~Filmography URL~
//http://m.imdb.com/name/nm0002031/filmotype
//https://m.imdb.com/name/nm0002031/?showAllCredits=true
//curPos:=Pos('<h3 class="ipc-title__text"><span>Credits</span></h3>',HTML);
curPos:=Pos('<h3 class="ipc-title__text"><span id="credits">Credits</span>',HTML);
If curPos>0 Then Begin
//ItemValue0:='<link url="http://www.imdb.com/name/'+PersonID+'/fullcredits">Filmography</link>';
ItemValue0:='<link url="http://www.imdb.com/name/'+PersonID+'/?showAllCredits=true">Filmography</link>';
If ItemValue0 <> '' then ItemList:=ItemList+#32#32+ItemValue0;
//LogMessage(' Parse Results Filmography URL:'+ItemValue0+'||');
End;
//*)
//(*
ItemList:=ItemList+#13+'--------------------------------------------------------------------------'+#13+SCRIPT_NAME+' on '+DateToStr(CurrentDateTime)+' at '+TimeToStr(CurrentDateTime);
If (Length(ItemList)>0) Then Begin
AddFieldValueXML('comment',ItemList);
//LogMessage(' Get result Filmography - Career:'+ItemList+'||');
End;
//*)
//Get ~dateadded~ Not used. Calculated in PVdB.
//Get ~orating~ Not documented in PVB Scripting Manual and in the script don't work even working in the skin.
//Get ~transname~ TranslateName. The PVdB ~transname~ Translated Name not stored in IMDB. Used for PhotoURL
//(*
LogMessage('Function ParsePage_IMDBPersonBASE END=====================||');
LogMessage('ParsePage_IMDBPersonBASE: Ending processing.');
End; //BlockClose
//*)
--- End quote ---
The comment field then looks like this.
--- Quote ---JobTitle: Producer, Writer, Actor
Filmography - Career: Additional Crew, Soundtrack, Director, Self, Thanks, Archive Footage
<link url="http://www.imdb.com/name/nm0005455">Main Page</link>
PID ID: 79324
People ID: nm0005455
Name: Aaron Spelling († 1923-2006)
Born: April 22, 1923 in Dallas, Texas, USA
Died: June 23, 2006 in Los Angeles, California, USA (complications following a stroke)
Alternate Names: Aaron & Candy
Shelly Colbert
Height: 1.65 m
--------------------------------------------------------------------------
<link url="http://www.imdb.com/name/nm0005455/awards/">Awards link</link> •• Won 2 Primetime Emmys • 14 wins & 11 nominations total ••
--------------------------------------------------------------------------
<link url="http://www.imdb.com/name/nm0005455/bio">Biography</link> <link url="http://www.imdb.com/name/nm0005455/awards">Awards</link> <link url="http://www.imdb.com/name/nm0005455/externalsites">External Sites</link> <link url="http://www.imdb.com/filmosearch/?sort=moviemeter&explore=genres&role=nm0005455">Genres</link> <link url="http://www.imdb.com/name/nm0005455/mediaindex/">Photo Gallery</link> <link url="http://www.imdb.com/name/nm0005455/?showAllCredits=true">Filmography</link>
--------------------------------------------------------------------------
IMDB_People_[EN][Selenium]-v3.2 on 2025-01-09 at 13:29:34
--- End quote ---
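The long line of links in the sample above is assembled from the person ID alone. A minimal Python sketch of that assembly (not PVD script code) is shown here; the name `quick_links` is hypothetical, and unlike the script this sketch always emits all six links instead of checking the page for each section first.

```python
def quick_links(person_id: str) -> str:
    """Build the row of <link> tags for one IMDb person ID.

    Hypothetical helper; the URLs and labels are copied from the script.
    """
    base = "http://www.imdb.com/name/" + person_id
    links = [
        (base + "/bio", "Biography"),
        (base + "/awards", "Awards"),
        (base + "/externalsites", "External Sites"),
        ("http://www.imdb.com/filmosearch/?sort=moviemeter&explore=genres&role="
         + person_id, "Genres"),
        (base + "/mediaindex/", "Photo Gallery"),
        (base + "/?showAllCredits=true", "Filmography"),
    ]
    # The script separates the links with #32#32, i.e. two spaces.
    separator = " " * 2
    return separator.join(
        '<link url="{}">{}</link>'.format(url, label) for url, label in links
    )


print(quick_links("nm0005455"))
```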
Ivek23:
Function ParsePage_IMDBPeopleBIO changes
Here is the corresponding part of the code change.
--- Quote ---//(*
Function ParsePage_IMDBPeopleBIO(HTML:String):Cardinal; //BlockOpen
//Returns:
// Result:=prFinished; Script has finished gathering data
// Result:=prError; if there is any big problem, with exit;
//Retrieve: ~bio~ Biography from "Mini Bio" IMDB section
Var
curPos,endPos,debug_pos1:Integer;
ItemValue:String;
PersonID,ItemValue0,ItemValue10,ItemValue1,ItemValue11:String;
ItemList,ItemList00,ItemList0,ItemList1,ItemList11,ItemList12:String;
FinalValue: String;
ItemList2,ItemList10,ItemList20,ItemValue3:String;
Begin
LogMessage('ParsePage_IMDBPeopleBIO: Starting processing.');
LogMessage('HTML length: ' + IntToStr(Length(HTML)));
LogMessage('Function ParsePage_IMDBPeopleBIO BEGIN=====================||');
Result:=prFinished; //It will change to prError if any big problem with exit;
LogMessage('Result set to prFinished'); //Log the initial result setting
//(*
//Get "Biography" info
curPos:=Pos('<h1 class="ipc-title__text">Biography</h1>',HTML); //Strings start which opens the block content data. WEB_SPECIFIC
if (curPos=0) then Exit;
//*)
ItemList2:='';
ItemList11:='';
//(*
//Get PersonID
//LogMessage('Attempting to find PersonID');
PersonID := TextBetWeenFirst(HTML, '<link rel="canonical" href="https://', '/">'); //WEB_SPECIFIC
if (Length(PersonID) > 2) then begin
ItemList2 := '--------------------------------------------------------------------------'+#13+'<link url="http://' + PersonID + '/#overview">Biography Info</link>';
//ItemList2 := '--------------------------------------------------------------------------'+#13+'<link url="http://www.imdb.com/name/' + PersonID + '/bio/#overview">Biography Info</link>';
LogMessage('Get result PersonID: ' + PersonID + '||');
end else begin
LogMessage('Error: PersonID not found');
Result := prError; //Set the result to error if PersonID is not found
end;
//*)
//(*
//Get "Biography" info
LogMessage('Attempting to find Biography section');
curPos := Pos('<div data-testid="sub-section-mini_bio"', HTML); //Updated to reflect new layout
if (curPos = 0) then Begin
LogMessage('Error: Biography section not found');
Result := prError; //Set the result to error if the section is not found
Exit;
End;
endPos := Pos('</ul>', Copy(HTML, curPos, Length(HTML) - curPos + 1)) + curPos - 1;
if endPos = curPos - 1 then Begin
LogMessage('Error: End of Biography section not found');
Result := prError; //Set the result to error if the section is not found
Exit;
End;
ItemList0 := Copy(HTML, curPos, endPos - curPos + Length('</ul>')); //Include </ul> in the end position
LogMessage('Biography section found');
//Extract "Mini bio" Biography text
LogMessage('Extracting Mini Bio text:');
curPos := Pos('<div class="ipc-html-content-inner-div" role="presentation">', ItemList0); //Updated to reflect new layout
LogMessage('curPos for Mini Bio set to: ' + IntToStr(curPos));
if curPos > 0 then Begin
endPos := Pos('</ul>', Copy(ItemList0, curPos, Length(ItemList0) - curPos + 1)) + curPos - 1; //Update to match exact structure
LogMessage('endPos for Mini Bio set to: ' + IntToStr(endPos));
if endPos > curPos Then Begin
ItemValue := Trim(Copy(ItemList0, curPos, endPos - curPos + Length('</ul>')));
//Normalize whitespace but keep empty lines
ItemValue := StringReplace(ItemValue, #13#10, #10, True, True, False); //Normalize line endings
ItemValue := StringReplace(ItemValue, #13, #10, True, True, False);
ItemValue := StringReplace(ItemValue, #10#10, #13#10#13#10, True, True, False); //Preserve empty lines
ItemValue := StringReplace(ItemValue, #10, ' ', True, True, False);
ItemValue := StringReplace(ItemValue, #13#10#13#10, #10#10, True, True, False); //Revert empty line placeholders
While Pos(#32#32, ItemValue) > 0 Do //Collapse runs of spaces; #32#32 is two spaces
ItemValue := StringReplace(ItemValue, #32#32, #32, True, True, False);
//Transform links
ItemValue := StringReplace(ItemValue, '<a class="ipc-md-link ipc-md-link--entity" href="', '<link url="http://www.imdb.com', True, True, False);
ItemValue := StringReplace(ItemValue, '/?ref_=nmbio_mbio">', '/">', True, True, False);
ItemValue := StringReplace(ItemValue, '</a>', '</link>', True, True, False);
//Remove unwanted tags
ItemValue := StringReplace(ItemValue, '<div class="ipc-html-content-inner-div" role="presentation">', '', True, True, False);
ItemValue := StringReplace(ItemValue, '<div class="ipc-html-content ipc-html-content--base ipc-metadata-list-item-html-item" role="presentation">', '', True, True, False);
ItemValue := StringReplace(ItemValue, '</div>', '', True, True, False);
ItemValue := StringReplace(ItemValue, '</ul>', '', True, True, False);
If Not(BIO_URL_IN_BIO) then ItemValue:=RemoveTagsEx00(ItemValue);
If Not(BIO_URL_IN_BIO) then ItemValue:=StringReplace(ItemValue,'</link>','',True,True,False);
If ItemValue <> '' then ItemList := ItemValue;
//LogMessage(' Get result bio (from Mini bio)002:'+ItemList+'||');
If ItemList <> '' then ItemList11:=ItemList11+ItemList;
End Else LogMessage('Error: End position not found for Mini Bio');
End Else LogMessage('Error: Start position not found for Mini Bio');
//(*
//Extract the final "IMDb Mini Biography By: ..." value and clean tags
If Pos('- IMDb Mini Biography By:', ItemList0) > 0 Then Begin
curPos := Pos('- IMDb Mini Biography By:', ItemList0);
endPos := Pos('</div>', Copy(ItemList0, curPos, Length(ItemList0) - curPos + 1)) + curPos - 1;
FinalValue := Copy(ItemList0, curPos, endPos - curPos + Length('</div>'));
//Clean surrounding tags without using RemoveTags
FinalValue := StringReplace(FinalValue, '<div class="ipc-html-content-inner-div" role="presentation">', '', True, True, False);
FinalValue := StringReplace(FinalValue, '<div class="ipc-html-content ipc-html-content--base ipc-metadata-list-item-html-item" role="presentation">', '', True, True, False);
FinalValue := StringReplace(FinalValue, '</div>', '', True, True, False);
FinalValue := StringReplace(FinalValue, '</ul>', '', True, True, False);
//Append the final value to ItemList only if it's not already present
If Pos(FinalValue, ItemList) = 0 Then Begin
If Length(ItemList) > 0 Then
ItemList := ItemList + ' ' + FinalValue
Else
ItemList := FinalValue;
End;
LogMessage(' * Get result bio (from Mini bio)002:'+ItemList+'||');
If Not(IMDB_MINI_IN_BIO) then Begin //Cut the "- IMDb Mini Biography By:" text only when the option is off
curPos:=Pos('- IMDb Mini',ItemList);
if curPos >0 then ItemList := Copy(ItemList,1,curPos-1);
End;
LogMessage(' Get result bio (from Mini bio) a:'+ItemList+'||');
LogMessage(' Get result bio (from Mini bio):'+ItemList+'||');
If ItemList <> '' then ItemList11:=ItemList; //Replace instead of append: ItemList11 already holds this bio text from the Mini Bio step
End;
//*)
//AddFieldValueXML('bio', ItemList);
//LogMessage('Added ItemList to XML: ' + ItemList);
If (ItemList11 <> '') AND (ItemList2 <> '') Then
//ItemList12:=ItemList11;
ItemList12:=ItemList11+#13+ItemList2;
//Get "Birth name" Biography text
ItemList00:='';
//ItemList10:=TextBetWeenFirst(HTML,'" data-testid="title"><hgroup><h1 class="ipc-title__text"','<h3 class="ipc-title__text"><span>Contribute to this page</span></h3>');
curPos := PosFrom('<h3 class="ipc-title__text"><span id="overview">Overview', HTML,curPos);
EndPos:=PosFrom('</div></section>',HTML,curPos);
ItemList00:=Copy(HTML,curPos,endPos-curPos);
//LogMessage(' ** Parse Biography '+#13+ItemList00+' **');
//(*
If (Length(ItemList00)>0) Then Begin
ItemValue10:=TextBetWeenFirst(ItemList00,'<li role="presentation" class="ipc-metadata-list__item" id="name" data-testid="list-item"><span class="ipc-metadata-list-item__label" aria-disabled="false">Birth name</span>','</div></div></div></li>');
//if BIRTH_NAME_IN_TRANSNAME then
//if ItemValue10 <> '' then
//AddFieldValueXML('transname',ItemValue10);
If ItemValue10 <> '' then Begin
LogMessage(' Get result from Birth Name02:'+ItemValue10+'||');
ItemValue10:='BirthName: '+ItemValue10;
ItemList12:=ItemList12+#13+'--------------------------------------------------------------------------'+#13+ItemValue10;
End;
End;
//*)
If BIO_INFO_IN_BIO then AddFieldValueXML('bio',ItemList12);
If Not(BIO_INFO_IN_BIO) Then AddFieldValueXML('bio',ItemList11);
Result := prFinished;
LogMessage('Function ParsePage_IMDBPeopleBIO END=====================||');
LogMessage('ParsePage_IMDBPeopleBIO: Ending processing.');
End; //BlockClose
//*)
--- End quote ---
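The append-and-trim step in the middle of the quoted function may be easier to follow in isolation. Below is a minimal Python sketch of that logic only, not the script itself. The helper names `append_unique` and `strip_imdb_mini` are hypothetical, and the marker text is assumed to be '- IMDb Mini' as in the Pascal code above.

```python
IMDB_MINI_MARKER = "- IMDb Mini"  # marker text taken from the quoted Pascal code


def append_unique(item_list: str, final_value: str) -> str:
    """Append final_value to item_list unless item_list already contains it.

    Note: an empty final_value counts as "already present" here.
    """
    if final_value in item_list:
        return item_list
    if item_list:
        return item_list + " " + final_value
    return final_value


def strip_imdb_mini(item_list: str, imdb_mini_in_bio: bool) -> str:
    """Cut the text at the marker, but only when the option is switched off."""
    if imdb_mini_in_bio:
        return item_list
    cut = item_list.find(IMDB_MINI_MARKER)
    # Only truncate when the marker was actually found in this string.
    return item_list[:cut] if cut >= 0 else item_list
```

The point of keeping the search and the cut together in one branch is that the cut position always comes from the string being cut, never from an earlier search.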
Ivek23:
If these changes are applied to the function ParsePage_IMDBPeopleBIO, the bio field will look as described below, depending on the settings.
If these settings are in use
--- Quote --- GET_FULL_BIO = True ; //Download the Biography provider page to retrieve the info. Otherwise only the info of the principal people page is used.
//GET_FULL_BIO = False ; //Download the Biography provider page to retrieve the info. Otherwise only the info of the principal people page is used.
//BIO_INFO_IN_BIO = True ; //Use the PVD field ~bio~ for storing the person Biography Info Url link for Biography Pages.
BIO_INFO_IN_BIO = False ; //Use the PVD field ~bio~ for not storing the person Biography Info Url link for Biography Pages.
//BIO_URL_IN_BIO = True ; //Use the PVD field ~bio~ for storing the person Url's for Biography Info (Mini Bio) for Biography Pages.
BIO_URL_IN_BIO = False ; //Use the PVD field ~bio~ for not storing the person Url's for Biography Info (Mini Bio) for Biography Pages.
//IMDB_MINI_IN_BIO = True ; //Use the PVD field ~bio~ for storing the person IMDb Mini Biography letters for Biography Info (Mini Bio) for Biography Pages.
IMDB_MINI_IN_BIO = False ; //Use the PVD field ~bio~ for not storing the person IMDb Mini Biography letters for Biography Info (Mini Bio) for Biography Pages.
--- End quote ---
then the bio field looks like this.
--- Quote ---Andrea Barber was born on July 3, 1976 in Los Angeles, California, USA. She is an actress and writer, known for Polna hiša (1987), Fuller House (2016) and Days of Our Lives (1965). She was previously married to Jeremy Rytky.
--- End quote ---
If these settings are in use
--- Quote --- GET_FULL_BIO = True ; //Download the Biography provider page to retrieve the info. Otherwise only the info of the principal people page is used.
//GET_FULL_BIO = False ; //Download the Biography provider page to retrieve the info. Otherwise only the info of the principal people page is used.
BIO_INFO_IN_BIO = True ; //Use the PVD field ~bio~ for storing the person Biography Info Url link for Biography Pages.
//BIO_INFO_IN_BIO = False ; //Use the PVD field ~bio~ for not storing the person Biography Info Url link for Biography Pages.
//BIO_URL_IN_BIO = True ; //Use the PVD field ~bio~ for storing the person Url's for Biography Info (Mini Bio) for Biography Pages.
BIO_URL_IN_BIO = False ; //Use the PVD field ~bio~ for not storing the person Url's for Biography Info (Mini Bio) for Biography Pages.
//IMDB_MINI_IN_BIO = True ; //Use the PVD field ~bio~ for storing the person IMDb Mini Biography letters for Biography Info (Mini Bio) for Biography Pages.
IMDB_MINI_IN_BIO = False ; //Use the PVD field ~bio~ for not storing the person IMDb Mini Biography letters for Biography Info (Mini Bio) for Biography Pages.
--- End quote ---
then the bio field looks like this.
--- Quote ---Andrea Barber was born on July 3, 1976 in Los Angeles, California, USA. She is an actress and writer, known for Polna hiša (1987), Fuller House (2016) and Days of Our Lives (1965). She was previously married to Jeremy Rytky.
--------------------------------------------------------------------------
<link url="http://www.imdb.com/name/nm0053347/bio/#overview">Biography Info</link>
--------------------------------------------------------------------------
BirthName: Andrea Laura Barber
--- End quote ---
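The difference between the two results above comes down to the last two AddFieldValueXML lines of the function. Here is a rough Python sketch of that choice, written only to illustrate the effect. `build_bio_field` and its parameters are hypothetical names, the separator length is approximate, and "\n" stands in for the #13 used by the script. It also assumes the link part is only added when both the mini bio and the link are present, as the Pascal condition suggests.

```python
SEPARATOR = "-" * 74  # dash row like the one in the examples (length approximate)


def build_bio_field(mini_bio: str, bio_link: str, birth_name: str,
                    bio_info_in_bio: bool) -> str:
    """Return the text that would end up in the bio field."""
    if not bio_info_in_bio:
        # Option off: only the mini bio text is stored.
        return mini_bio
    parts = [mini_bio]
    if mini_bio and bio_link:
        parts += [SEPARATOR, bio_link]
    if birth_name:
        parts += [SEPARATOR, "BirthName: " + birth_name]
    return "\n".join(parts)


if __name__ == "__main__":
    bio = "Andrea Barber was born on July 3, 1976 in Los Angeles, California, USA."
    link = ('<link url="http://www.imdb.com/name/nm0053347/bio/#overview">'
            "Biography Info</link>")
    print(build_bio_field(bio, link, "Andrea Laura Barber", bio_info_in_bio=True))
```

With `bio_info_in_bio=False` the same call returns just the mini bio sentence, which matches the first example.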
If these settings are in use
--- Quote --- GET_FULL_BIO = True ; //Download the Biography provider page to retrieve the info. Otherwise only the info of the principal people page is used.
//GET_FULL_BIO = False ; //Download the Biography provider page to retrieve the info. Otherwise only the info of the principal people page is used.
BIO_INFO_IN_BIO = True ; //Use the PVD field ~bio~ for storing the person Biography Info Url link for Biography Pages.
//BIO_INFO_IN_BIO = False ; //Use the PVD field ~bio~ for not storing the person Biography Info Url link for Biography Pages.
BIO_URL_IN_BIO = True ; //Use the PVD field ~bio~ for storing the person Url's for Biography Info (Mini Bio) for Biography Pages.
//BIO_URL_IN_BIO = False ; //Use the PVD field ~bio~ for not storing the person Url's for Biography Info (Mini Bio) for Biography Pages.
//IMDB_MINI_IN_BIO = True ; //Use the PVD field ~bio~ for storing the person IMDb Mini Biography letters for Biography Info (Mini Bio) for Biography Pages.
IMDB_MINI_IN_BIO = False ; //Use the PVD field ~bio~ for not storing the person IMDb Mini Biography letters for Biography Info (Mini Bio) for Biography Pages.
--- End quote ---
then the bio field looks like this.
--- Quote ---Andrea Barber was born on July 3, 1976 in Los Angeles, California, USA. She is an actress and writer, known for <link url="http://www.imdb.com/title/tt0092359/">Polna hiša (1987)</link>, <link url="http://www.imdb.com/title/tt3986586/">Fuller House (2016)</link> and <link url="http://www.imdb.com/title/tt0058796/">Days of Our Lives (1965)</link>. She was previously married to Jeremy Rytky.
--------------------------------------------------------------------------
<link url="http://www.imdb.com/name/nm0053347/bio/#overview">Biography Info</link>
--------------------------------------------------------------------------
BirthName: Andrea Laura Barber
--- End quote ---
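The only difference in this last result is that the title links inside the mini bio text are kept. The code quoted in this post does not show where the links are removed, so the following Python sketch only illustrates the visible effect of BIO_URL_IN_BIO. The function name `render_mini_bio` and the regular expression are assumptions, not part of the script.

```python
import re

# Matches a PVD-style link tag and captures the visible text inside it.
LINK_TAG = re.compile(r'<link url="[^"]*">(.*?)</link>', re.DOTALL)


def render_mini_bio(text: str, bio_url_in_bio: bool) -> str:
    """Keep the link tags when the option is on, otherwise keep only their text."""
    if bio_url_in_bio:
        return text
    return LINK_TAG.sub(r"\1", text)


if __name__ == "__main__":
    sample = ('known for <link url="http://www.imdb.com/title/tt0092359/">'
              "Polna hiša (1987)</link>.")
    print(render_mini_bio(sample, bio_url_in_bio=False))
    print(render_mini_bio(sample, bio_url_in_bio=True))
```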