Author Topic: Project Allocine script  (Read 11084 times)


Offline pra15

  • Power User
  • ****
  • Posts: 164
    • View Profile
Project Allocine script
« on: February 08, 2015, 06:58:17 pm »
Hello,
I would like to write a script for allocine.fr.
I plan to write two scripts, because the information is spread over different pages.
The first script gets the link to the movie page; the second script then uses that link to fetch the information about actors and images.
I have started on the first script (attached).

On allocine.fr, a movie search returns several results, but I can't get more than one result in my prList.
In the page source I can see all the movies, but they seem to be hidden from the script, except the first one.
I know nothing about HTML programming. Can someone explain this?

Quote
</span> <!-- /fs11 -->
</div></div></td></tr>

The problem probably starts after this (shown in red in the page source).

I stopped coding there, because the casting page works on the same principle, and if we can't resolve this...

N.B.: I tried different loops. I erased them from the script because they didn't work.
I also tried Pos('Second movie'), which returns 0.



This script has now been released and is available via the program's auto-update system. Run Help > Check for updates and choose Allocine.fr from the list.
« Last Edit: March 02, 2015, 07:18:45 pm by Ivek23 »

Offline Ivek23

  • Global Moderator
  • *****
  • Posts: 1899
    • View Profile
Re: Project Allocine script
« Reply #1 on: February 08, 2015, 09:32:03 pm »
Try this code
Code: [Select]
procedure ParseSearchResults(HTML: string);
Var
  CurPos, EndPos, EndSearch, nbResult, PosEssai1, PosEssai2 : Integer;
  Year, MovieURL, ThumbURL, Title : String;

Begin
  NbResult := 0;
  // Look for the "Films" heading; give up if it is missing.
  CurPos := Pos('Films</h2>',HTML);
  If CurPos < 1 then
    Exit;

  // Thumbnail address:
  CurPos := PosFrom('src=',HTML,CurPos)+5;
  EndPos := PosFrom('.jpg',HTML,CurPos)+4;
  ThumbURL := Trim(Copy(HTML,CurPos,EndPos-CurPos));
  // Movie page address:
  //CurPos := PosFrom('<a href=',HTML,EndPos);
  CurPos := PosFrom('<a href=',HTML,CurPos);
  //while curPos > 0 do begin
  // Stop once CurPos passes the "Rechercher" search button.
  while (curPos > 0) AND (curPos < PosFrom('<button class="buttonform" type="submit">Rechercher</button>', HTML, endPos)) do begin
    EndPos := PosFrom('.html',HTML,CurPos);
    MovieURL := BASE_URL + Trim(Copy(HTML,CurPos+9,EndPos-CurPos-4));
    // Movie title:
    CurPos := PosFrom('>',HTML,EndPos)+1;
    EndPos := PosFrom('<br />',HTML,CurPos);
    Title := Copy(HTML,CurPos,EndPos-CurPos);
    Title := StringReplace(Title,'<b>','',True,True,True);
    Title := StringReplace(Title,'</b>','',True,True,True);
    Title := StringReplace(Title,'</a>',' (fr) / ',True,True,True);
    LogMessage('Titre Recherche: ' + Title);
    // Year:
    CurPos := PosFrom('<span class="fs11">',HTML,EndPos);
    if Curpos > 0 then begin
      //Curpos := Curpos + 19;
      //EndPos := PosFrom('<br />',HTML,CurPos);
      //Year := Copy(HTML,CurPos,EndPos-CurPos);
      Year := TextBetween(HTML, '<span class="fs11">', '<br />', True, CurPos);
      LogMessage('Annee: ' + Year);
    end else begin
      Year := '';
      curPos := endPos;
    end;
    // Add the movie to the list of results:
    AddSearchResult(Title,'', Year, MovieURL, ThumbURL);

    // Thumbnail address of the next result:
    CurPos := PosFrom('src=',HTML,CurPos)+5;
    EndPos := PosFrom('.jpg',HTML,CurPos)+4;
    ThumbURL := Trim(Copy(HTML,CurPos,EndPos-CurPos));

    // Movie page address of the next result:
    CurPos := PosFrom('<a href=',HTML,CurPos);
  end;

End;
It should work; at least it worked for me.
Ivek23
Win 7 32bit, 64bit   PVD v0.9.9.21


Offline pra15

« Reply #2 on: February 08, 2015, 10:37:44 pm »
No, I have just tried it, but I get only one movie in my results window (prListImage), while the site shows several movies.
I tested with the word "anneaux".

Offline Ivek23

« Reply #3 on: February 09, 2015, 07:46:20 am »

With the word "anneaux" it works fine for me. See the attached image.


Offline pra15

« Reply #4 on: February 09, 2015, 09:32:54 am »
Thanks, yes, it works.
After several attempts I had forgotten to erase the original title manually, sorry.

Offline Ivek23

« Reply #5 on: February 09, 2015, 09:57:42 am »

OK, no problem. Glad to hear the problem has been solved.


Offline pra15

« Reply #6 on: February 09, 2015, 02:53:34 pm »
I've looked at your modifications.
I don't understand why, at the end of the while loop, just putting CurPos := PosFrom('<a href=',HTML,CurPos); doesn't work. We have to go back to the thumbnail address first.

Sorry, I would like to understand so that I don't repeat the same error next time.

Also, do you know of any links with help on string functions like StringReplace, TextBetween and RemoveTags, especially for their boolean options? I always pass True, but for some of them, like wholeword and dolinebreaks, I'm not sure.

I'm also not sure which language the scripts use. Pascal? Delphi? Some Pascal functions don't work in a script.

I think it's important for a beginner to know where to find help for writing a script, especially about the language used, rather than just copying existing scripts and trying to adapt them.

Offline Ivek23

« Reply #7 on: February 09, 2015, 03:12:30 pm »
Quote
Also, do you know of any links with help on string functions like StringReplace, TextBetween and RemoveTags, especially for their boolean options? I always pass True, but for some of them, like wholeword and dolinebreaks, I'm not sure.

Unfortunately, I only know of this link: Personal Video Database Scripting Manual. I hope it helps.


Offline pra15

« Reply #8 on: February 09, 2015, 05:37:32 pm »
Thanks for the link, but I've already read it. It's more the language I need help with.

I added these custom fields:
Nationalite (multiple choice), Note Presse (short text), Note Spectateurs (short text).

I also put Note Presse in the standard Note field in PVD.

But there are problems with French characters: é, è, à, etc.
E.g. é is replaced by Ã©.

I tried to use the StringReplace function, but nothing changes!
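
A possible explanation, assuming the page is served as UTF-8 and read with a single-byte code page such as Windows-1252: in UTF-8, é is stored as the two bytes C3 A9, and when each byte is shown as its own character the pair appears as Ã©. The small standalone Free Pascal program below (not a PVD script; the program name is made up for illustration) prints those bytes:

```pascal
program MojibakeBytes;
{$mode objfpc}{$H+}
uses
  SysUtils;
var
  S: string;
  I: Integer;
begin
  // The UTF-8 encoding of 'é', written as explicit bytes so the
  // example does not depend on the encoding of this source file.
  S := #$C3#$A9;
  // Print each byte in hexadecimal: C3 A9
  for I := 1 to Length(S) do
    Write(IntToHex(Ord(S[I]), 2), ' ');
  WriteLn;
end.
```

If this is the cause, replacing strings afterwards only treats the symptom; the text has to be decoded as UTF-8 in the first place.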
« Last Edit: February 09, 2015, 05:42:55 pm by pra15 »

Offline Ivek23

« Reply #9 on: February 09, 2015, 07:26:47 pm »
Quote
But there are problems with French characters: é, è, à, etc.
E.g. é is replaced by Ã©.

I tried to use the StringReplace function, but nothing changes!

Try this link: UTF-8 encoding table and Unicode characters. I hope it helps.

This dmm.co.jp script might also help; it shows how to solve a similar problem.


Offline pra15

« Reply #10 on: February 09, 2015, 07:47:54 pm »
OK, thanks.

I changed CODE_PAGE to 65001 (Unicode UTF-8); it's good for the moment.
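
For reference, a minimal sketch of what that change might look like in the script header. 65001 is the Windows code page identifier for UTF-8. The GetCodePage function is an assumption about how the program reads the constant, so check an existing script before relying on it:

```pascal
const
  // 65001 = UTF-8 in the Windows code page numbering.
  CODE_PAGE = 65001;

// Assumed hook: the program asks the script which code page
// to use when decoding downloaded pages.
function GetCodePage : Cardinal;
begin
  Result := CODE_PAGE;
end;
```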

Offline pra15

« Reply #11 on: February 10, 2015, 12:38:17 am »
I fixed some errors and exceptions in ParseSearchResults.

Offline Ivek23

« Reply #12 on: February 11, 2015, 08:13:04 am »

Thanks for the new version, but in almost 99% of my searches the ParseSearchResults procedure with this code
Code: [Select]
procedure ParseSearchResults(HTML: string);
Var
  CurPos, EndPos, EndSearch, nbResult, PosEssai1, PosEssai2 : Integer;
  Year, MovieURL, ThumbURL, Title : String;

Begin
  NbResult := 0;
  // Look for the "Films" heading; give up if it is missing.
  CurPos := Pos('Films</h2>',HTML);
  If CurPos < 1 then
    Exit;
  CurPos := PosFrom('<div class="vmargin10t">',HTML,CurPos);
  // Thumbnail address:
  CurPos := PosFrom('src=',HTML,CurPos)+5;
  EndPos := PosFrom('alt=',HTML,CurPos)-2;
  ThumbURL := Trim(Copy(HTML,CurPos,EndPos-CurPos));
  // Movie page address:
  CurPos := PosFrom('<a href=',HTML,CurPos)+9;
  // Stop once CurPos passes the '>1</em></li><li' marker.
  while (curPos > 0) AND (curPos < PosFrom('>1</em></li><li', HTML, endPos)) do begin
  //while (curPos > 0) AND (curPos < PosFrom('<button class="buttonform" type="submit">Rechercher</button>', HTML, endPos)) do begin
    EndPos := PosFrom('.html',HTML,CurPos)+5;
    MovieURL := BASE_URL + Trim(Copy(HTML,CurPos,EndPos-CurPos));
    Logmessage('MovieURL : ' + MovieURL);
    // Movie title:
    CurPos := PosFrom('>',HTML,EndPos)+1;
    EndPos := PosFrom('<br />',HTML,CurPos);
    Title := Copy(HTML,CurPos,EndPos-CurPos);
    Title := StringReplace(Title,'<b>','',True,True,True);
    Title := StringReplace(Title,'</b>','',True,True,True);
    Title := StringReplace(Title,'</a>',' / ',True,True,True);
    LogMessage('Titre Recherche: ' + Title);
    // Year:
    CurPos := PosFrom('<span class="fs11">',HTML,EndPos);
    if Curpos > 0 then begin
      Year := TextBetween(HTML, '<span class="fs11">', '<br />', True, CurPos);
      LogMessage('Annee: ' + Year);
    end else begin
      Year := '';
      curPos := endPos;
    end;
    // Add the movie to the list of results:
    AddSearchResult(Title,'', Year, MovieURL, ThumbURL);

    // Thumbnail address of the next result:
    CurPos := PosFrom('src=',HTML,CurPos)+5;
    EndPos := PosFrom('alt=',HTML,CurPos)-2;
    ThumbURL := Trim(Copy(HTML,CurPos,EndPos-CurPos));
    LogMessage('ThumbURL :' + ThumbURL);

    // Movie page address of the next result:
    CurPos := PosFrom('<a href=',HTML,CurPos)+9;
  end;

End;
cannot find any results, while with this code
Code: [Select]
procedure ParseSearchResults(HTML: string);
Var
  CurPos, EndPos, EndSearch, nbResult, PosEssai1, PosEssai2 : Integer;
  Year, MovieURL, ThumbURL, Title : String;

Begin
  NbResult := 0;
  // Look for the "Films" heading; give up if it is missing.
  CurPos := Pos('Films</h2>',HTML);
  If CurPos < 1 then
    Exit;
  CurPos := PosFrom('<div class="vmargin10t">',HTML,CurPos);
  // Thumbnail address (#39 is the single quote character):
  CurPos := PosFrom('src='+#39,HTML,CurPos);
  EndPos := PosFrom('alt='+#39,HTML,CurPos);
  ThumbURL := Trim(Copy(HTML,CurPos+5,EndPos-CurPos-7));
  // Movie page address:
  CurPos := PosFrom('<a href=',HTML,CurPos)+9;
  //while (curPos > 0) AND (curPos < PosFrom('>1</em></li><li', HTML, endPos)) do begin
  // Stop once CurPos passes the "Rechercher" search button.
  while (curPos > 0) AND (curPos < PosFrom('<button class="buttonform" type="submit">Rechercher</button>', HTML, endPos)) do begin
    EndPos := PosFrom('.html',HTML,CurPos)+5;
    MovieURL := BASE_URL + Trim(Copy(HTML,CurPos,EndPos-CurPos));
    Logmessage('MovieURL : ' + MovieURL);
    // Movie title:
    CurPos := PosFrom('>',HTML,EndPos)+1;
    EndPos := PosFrom('<br />',HTML,CurPos);
    Title := Copy(HTML,CurPos,EndPos-CurPos);
    Title := StringReplace(Title,'<b>','',True,True,True);
    Title := StringReplace(Title,'</b>','',True,True,True);
    Title := StringReplace(Title,'</a>',' / ',True,True,True);
    LogMessage('Titre Recherche: ' + Title);
    // Year:
    CurPos := PosFrom('<span class="fs11">',HTML,EndPos);
    if Curpos > 0 then begin
      Year := TextBetween(HTML, '<span class="fs11">', '<br />', True, CurPos);
      LogMessage('Annee: ' + Year);
    end else begin
      Year := '';
      curPos := endPos;
    end;
    // Add the movie to the list of results:
    AddSearchResult(Title,'', Year, MovieURL, ThumbURL);

    // Thumbnail address of the next result:
    CurPos := PosFrom('src='+#39,HTML,CurPos);
    EndPos := PosFrom('alt='+#39,HTML,CurPos);
    ThumbURL := Trim(Copy(HTML,CurPos+5,EndPos-CurPos-7));
    LogMessage('ThumbURL :' + ThumbURL);

    // Movie page address of the next result:
    CurPos := PosFrom('<a href=',HTML,CurPos)+9;
  end;

End;
it finds search results for movie titles in 99% of cases.

I think this latter ParseSearchResults procedure should be kept, because it really works; at least it works for me on the roughly 20 movie titles I've tested.

Also, the code for Nationalité Allociné, Note Presse and Note Spectateurs transfers a lot of wrong data. Here are some movie titles where the data is transferred incorrectly:

Note Presse

American Pie Presents Band Camp
American Pie Presents: Beta House
American Pie Presents: The Book of Love
American Pie Presents: The Naked Mile

Apaches Rifles
http://www.allocine.fr/film/fichefilm_gen_cfilm=44222.html


Note Presse and Note Spectateurs

Canciones de amor en Lolita's club
http://www.allocine.fr/film/fichefilm_gen_cfilm=134873.html

Casualties of Love: The Long Island Lolita Story
http://www.allocine.fr/film/fichefilm_gen_cfilm=211875.html

Children of Invention
http://www.allocine.fr/film/fichefilm_gen_cfilm=142444.html

Lota
http://www.allocine.fr/film/fichefilm_gen_cfilm=230014.html


Nationalité Allociné

The Shooting of Dan McGoo
http://www.allocine.fr/film/fichefilm_gen_cfilm=170340.html

The Shooting of Thomas Hurndall
http://www.allocine.fr/film/fichefilm_gen_cfilm=178138.html

Who - The Kids Are Alright
http://www.allocine.fr/film/fichefilm_gen_cfilm=169939.html


Offline pra15

« Reply #13 on: February 11, 2015, 09:03:33 am »
Thanks, I'll look at that, but for the moment I'm not sure about the structure of my script. It will be just one script, not two. I'm looking at the allmovie script for ideas. It's not simple with several pages, especially for images (several images on several pages).

Once the structure is right, I'll be able to concentrate on extracting the data.

Offline pra15

« Reply #14 on: February 12, 2015, 11:07:41 pm »
Sorry, I'm totally stuck.
I've tried to write another script (a single one) based on allmovie.

My problem is with the photos.
There is a main page with the list of all images; then, for each image, we must go to another page to get the link to the image.
In my script I go to the first page and store the links to the second pages in an array.

But during execution the script never gets to the second page: the mode changes from ModePrephoto (the main page with the list of images) straight to the finished mode.

I tried putting a condition in getdownloadURL or in NextMode, but it doesn't seem to work.

Offline Ivek23

« Reply #15 on: February 13, 2015, 07:40:48 am »

Unfortunately, I'm powerless here; I can't help with this.

My opinion on this matter: the movie poster should be transferred, the rest does not need to be.


Offline pra15

« Reply #16 on: February 13, 2015, 05:31:12 pm »
I found a solution for the images.
I put the links to the image pages in extralinks, collect the image links in an array, and at the end I just call addimageurl for every entry in the array.
There are still a lot of errors, in the count and in the dimension of the array, but it's possible.

I use this page as an example: http://www.allocine.fr/film/fichefilm_gen_cfilm=29007.html
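
A minimal sketch of the collect-then-add idea, written as a standalone Free Pascal program rather than a PVD script. MAX_PHOTOS, RememberLink, AddImageURLStub and the example.com URLs are all made up for illustration; in the real script the last step would be the addimageurl call. Keeping a separate counter next to the array is one way to avoid the count and dimension errors:

```pascal
program CollectImageLinks;
{$mode objfpc}{$H+}

const
  MAX_PHOTOS = 3; // hypothetical limit, playing the role of nMaxPhotos

var
  ImageLinks: array of string; // collected image URLs
  LinkCount: Integer = 0;      // number of valid entries in ImageLinks

// Store one URL; ignore it once the limit has been reached.
procedure RememberLink(const URL: string);
begin
  if LinkCount >= MAX_PHOTOS then
    Exit;
  SetLength(ImageLinks, LinkCount + 1);
  ImageLinks[LinkCount] := URL; // dynamic arrays are zero-based
  Inc(LinkCount);
end;

// Hypothetical stand-in for the call that hands a URL to the program.
procedure AddImageURLStub(const URL: string);
begin
  WriteLn('image: ', URL);
end;

var
  I: Integer;
begin
  // Pretend that each image page was parsed and yielded one link.
  RememberLink('http://example.com/photo1.jpg');
  RememberLink('http://example.com/photo2.jpg');
  RememberLink('http://example.com/photo3.jpg');
  RememberLink('http://example.com/photo4.jpg'); // over the limit, dropped

  // At the end, hand over everything that was collected.
  for I := 0 to LinkCount - 1 do
    AddImageURLStub(ImageLinks[I]);
end.
```

Looping from 0 to LinkCount - 1 instead of over a fixed size means the loop never reads an entry that was not filled.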

Offline pra15

« Reply #17 on: February 14, 2015, 01:07:07 am »
It's better like this, with fewer errors.

Just one thing: when you choose the number of photos to download (nMaxPhotos), add 1 to the number you want (if you want 3, write 4).
The affiche (movie poster) is downloaded as the poster, and the photos go into the screenshots.

It still needs to be tested for other errors.

Thanks to Ivek23 for ParseSearchResults; it seems to work well.

Offline pra15

« Reply #18 on: February 15, 2015, 02:55:09 am »
Added data from the first page: Title, Dates, Duree, Genre, Nationalite, Notes.
Fixed the errors on the pages reported by Ivek23 (see the message above).

Next I'll add the data from the casting page.

Offline pra15

« Reply #19 on: February 16, 2015, 11:24:19 am »
Added info from the casting page:
actors, directors, writers, producers, composers, studio.

Plus everything else (technical crew, etc.) in a custom field 'General Info' (memo).