Author Topic: File Scanner and Regular Expressions  (Read 44974 times)

Offline rick.ca

  • Global Moderator
  • *****
  • Posts: 3241
  • "I'm willing to shoot you!"
    • View Profile
File Scanner and Regular Expressions
« on: January 22, 2009, 03:42:08 am »
Changelog:
-Feature: RegExp support and additional functionality for File Scanner and Organize files by episodes function

Is there a particular external RegEx reference we should use as a guide in creating our own expressions? I know very little about RegEx (other than it makes my brain hurt), but I assume there are different "flavours." Are there any special terms that are unique to PVD?

If the scanner is unable to match a file to an existing record, the title can be edited, but the only available action is to "Add new movie." It would be nice if it re-checked the database for a match if the user edits the title (and perhaps a "refresh" requested) so the "Update file path" action would be available.

When adding new episode files to existing series, the existing title and year are removed. Title is changed to "#S.E" and Year to "-1." This happens whether or not "Automatically import... Movie information" is selected (i.e., the problem can't be worked-around by selecting that).

What is meant by "Organize files by episodes function"?
« Last Edit: July 21, 2011, 10:27:06 pm by rick.ca »

Offline nostra

  • Administrator
  • *****
  • Posts: 2852
    • View Profile
    • Personal Video Database
Re: 0.9.9.4 File Scanner
« Reply #1 on: January 23, 2009, 01:24:59 am »
Quote
Is there a particular external RegEx reference we should use as a guide in creating our own expressions? I know very little about RegEx (other than it makes my brain hurt), but I assume there are different "flavours." Are there any special terms that are unique to PVD?

This is the best reference on the net (I think):
http://www.regular-expressions.info/tutorial.html

Quote
If the scanner is unable to match a file to an existing record, the title can be edited, but the only available action is to "Add new movie." It would be nice if it re-checked the database for a match if the user edits the title (and perhaps a "refresh" requested) so the "Update file path" action would be available.

I'll think about it

Quote
When adding new episode files to existing series, the existing title and year are removed. Title is changed to "#S.E" and Year to "-1." This happens whether or not "Automatically import... Movie information" is selected (i.e., the problem can't be worked-around by selecting that).

Try to make the plugin overwrite titles.

Quote
What is meant by "Organize files by episodes function"?

If you have all files of a series assigned to a main series record, you can right click the record and select "Organize files by episodes" to assign each file to its corresponding episode.
Gentlemen, you can’t fight in here! This is the War Room!

Offline rick.ca

Re: 0.9.9.4 File Scanner
« Reply #2 on: January 23, 2009, 04:51:19 am »
Quote
Try to make the plugin overwrite titles.

The plugins I selected with "Automatically import... Movie information" were configured to overwrite titles and year. When I run them directly (after the scanner update has removed the title and date), they work correctly. It appears that when they are used with the scanner, they are run first (although the information displayed is not updated), and then the title and year are replaced with "#S.E" and "-1" at the same time the file path is added.

In any case, if nothing is selected under "Automatically import... Movie information," nothing should happen except for the addition of the file path.

Quote
If you have all files of a series assigned to a main series record, you can right click the record and select "Organize files by episodes" to assign each file to its corresponding episode.

Sorry, you have explained this before. You've finally filled-up my brain's ability to remember features. You'll have to get rid of some. So why do we need this feature, if the scanner is going to organize files correctly in the first place? ;D

Offline rick.ca

Re: 0.9.9.4 File Scanner
« Reply #3 on: January 23, 2009, 05:31:22 am »
This screenshot illustrates something I wasn't really expecting from the scanner. The subject file is a test for a movie that doesn't yet exist in the database. What's happened is it's been matched to S14 E06 of ER—the title of which is "Test." I right-clicked on Action and changed it to Add new movie, but it added the file to the existing episode anyway.

Perhaps the logic of the matching routine can be made to avoid matching movies to series (and series to movies, if that is even possible). Also, it should be possible to change the action using the context menu (and it should do what you tell it to do!).

[attachment deleted by admin]

Offline nostra

Re: 0.9.9.4 File Scanner
« Reply #4 on: January 24, 2009, 01:28:23 am »
Quote
The plugins I selected with "Automatically import... Movie information" were configured to overwrite titles and year. When I run them directly (after the scanner update has removed the title and date), they work correctly. It appears that when they are used with the scanner, they are run first (although the information displayed is not updated), and then the title and year are replaced with "#S.E" and "-1" at the same time the file path is added.

In any case, if nothing is selected under "Automatically import... Movie information," nothing should happen except for the addition of the file path.

OK, I'll check and fix this

Quote
I right-clicked on Action and changed it to Add new movie, but it added the file to the existing episode anyway.

Oh, I have forgotten to implement this functionality for episodes, sorry.

Quote
Perhaps the logic of the matching routine can be made to avoid matching movies to series (and series to movies, if that is even possible).

It sounds like a good idea.

Offline patch

  • Power User
  • ****
  • Posts: 250
    • View Profile
Re: 0.9.9.4 File Scanner
« Reply #5 on: February 21, 2009, 07:55:35 am »
Some of my TV series are named without a letter designating series & episode, e.g. for Avatar http://www.imdb.com/title/tt0417299/ Series 2 episode 1 is named
Avatar.The.Last.Airbender.-.201_The_Avatar_State_[Moonsong].avi

The scanner assigns it to series 1 episode 201

Has anyone found a way to support this form of series numbering, or should I rename my files?

Offline rick.ca

Re: 0.9.9.4 File Scanner
« Reply #6 on: February 21, 2009, 11:10:08 am »
(?i)^.*\\(?P<title>.*)(s|\b)(?P<season>[0-9]{1})(?P<episode>[0-9]{2}) works.

This regex stuff is going to make my head explode. I found this by trial and error, so I still don't have a clue what most of this means. I started with one of the default expressions, in which I guessed "[0-9]{1,3}" meant match "1 to 3 digits." So I changed that to "match 1 digit for season and then 2 digits for episode."

Now I'm going to lose sleep dreaming about all the ways this won't work. Series with more than 9 seasons, seasons with more than 99 episodes, another 3-digit number later in the filename, the "201" not delimited the way it is...
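
For anyone who wants to poke at this outside PVD, here is a minimal sketch in Python, whose re module accepts the same (?P<name>...) syntax. It assumes PVD's regex engine treats these constructs the same way. The E:\TV\ folder and the second file name are made up for illustration; the expression needs at least one backslash in the path to match at all.

```python
import re

# The expression from this post, unchanged.
PATTERN = re.compile(
    r"(?i)^.*\\(?P<title>.*)(s|\b)(?P<season>[0-9]{1})(?P<episode>[0-9]{2})"
)

def scan(path):
    """Return the named groups captured from path, or None if nothing matches."""
    m = PATTERN.search(path)
    return m.groupdict() if m else None

# Hypothetical folder; the file name is the one quoted earlier in the thread.
avatar = scan(r"E:\TV\Avatar.The.Last.Airbender.-.201_The_Avatar_State_[Moonsong].avi")

# Hypothetical file name with a second three-digit run later in the name.
other = scan(r"E:\TV\Some.Show.201.Pilot.720p.avi")

print(avatar)
print(other)
```

The second result shows one of the failure cases listed above: because <title> is greedy, the last three-digit run in the name wins, so "720p" is read as season 7, episode 20.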

Who's going to be our regex guru? :-\

Offline patch

Re: 0.9.9.4 File Scanner
« Reply #7 on: February 21, 2009, 03:09:20 pm »
Quote
(?i)^.*\\(?P<title>.*)(s|\b)(?P<season>[0-9]{1})(?P<episode>[0-9]{2}) works.

Thanks rick.ca this worked well for me.

Quote
Now I'm going to lose sleep dreaming about all the ways this won't work. Series with more than 9 seasons, seasons with more than 99 episodes, another 3-digit number later in the filename, the "201" not delimited the way it is...

Yep, like how should we handle "Beverly Hills, 90210" http://www.imdb.com/title/tt0098749/ Currently my file name is
Beverly.Hills.90210.S01E01.Class.Of.Beverly.Hills.avi

It looks like an early rule that finds S<season>[0-9]{1,2}E<episode>[0-9]{2} and takes everything before it as the series title would work. "S" & "E" need to be upper or lower case.
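
A rough sketch of that rule, written out as a full expression and tried in Python (which accepts the same (?P<name>...) syntax). The exact spelling of the expression and the E:\TV\ folder are assumptions for illustration, not something taken from PVD's defaults.

```python
import re

# One possible spelling of the rule: lazy title, then S + 1-2 digits, E + 2 digits.
# (?i) makes "S" and "E" match in either case.
EARLY_RULE = re.compile(
    r"(?i)^.*\\(?P<title>.*?)s(?P<season>[0-9]{1,2})e(?P<episode>[0-9]{2})"
)

# Hypothetical folder; the file name is the one given above.
m = EARLY_RULE.search(r"E:\TV\Beverly.Hills.90210.S01E01.Class.Of.Beverly.Hills.avi")
result = m.groupdict() if m else None
print(result)
```

Because the title is lazy and the rule insists on the S..E.. shape, the "90210" stays in the title instead of being read as a season and episode.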

Looks like I'm going to need to learn to read regex but it looks like hieroglyphics to me.

Would also like to use more of the path information if possible as I generally file my TV series as
Series title/s1/ etc

But perhaps I'm just going to have to make sure my file names correspond to some standard.
« Last Edit: February 21, 2009, 03:14:27 pm by patch »

Offline nostra

Re: 0.9.9.4 File Scanner
« Reply #8 on: February 21, 2009, 04:02:02 pm »
Have you taken a look at the link I provided above: http://www.regular-expressions.info/tutorial.html
There is everything you need to know about regex, well explained...

Offline rick.ca

Re: 0.9.9.4 File Scanner
« Reply #9 on: February 21, 2009, 09:51:36 pm »
Quote
There is everything you need to know about regex

Yes, perhaps it will serve well as our regex guru. :)  It made my brain hurt the first time I looked, but now I see I can use it to figure things out fairly quickly. For example...

Quote
"S" & "E" need to be upper or lower case.

(?i) turns on case insensitivity mode.

Quote
Would also like to use more of the path information

That can be done by matching the "\" (i.e., that which denotes a directory). But the backslash is the escape character used to designate special characters—so it's matched with "\\". Note that is used multiple times in the default expressions for handling movies. The ^.*\\ in my expression means "match any number of any characters from the start of the string to '\'". So my guess is the required expression might be something like this:

(?i)^.*\\(?P<title>.*)\\(s|\b)(?P<season>[0-9]{1,2})\\(?P<episode>[0-9]{1,2}).?-.?(?P<eptitle>\w*\b)
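
To see what that guess would capture, here is a small Python sketch. It assumes PVD's engine behaves like Python's re for these constructs, and the path is a made-up example of the "Series title\s1\" layout.

```python
import re

# The guessed expression from above, unchanged.
PATH_GUESS = re.compile(
    r"(?i)^.*\\(?P<title>.*)\\(s|\b)(?P<season>[0-9]{1,2})\\"
    r"(?P<episode>[0-9]{1,2}).?-.?(?P<eptitle>\w*\b)"
)

# Hypothetical path: series folder, season folder, then "NN - Episode title".
m = PATH_GUESS.search(r"G:\TV\Some Show\s1\03 - Pilot.avi")
captured = m.groupdict() if m else None
print(captured)
```

Note that \w* stops at the first non-word character, so an episode title with spaces in it would only be captured up to the first space.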

Quote
?P<title>, ?P<season>

Recognizing the PVD field names, I understood what these were doing—but not how. Now I understand that which is inside "()" is a "group" which is "captured." ?P<name> names a group, which would otherwise just be numbered from left to right. This obviously determines how data is passed back to the program.
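
A tiny Python illustration of the same idea (Python is only a stand-in here; how PVD reads the groups internally is not something this thread documents):

```python
import re

m = re.search(r"(?P<season>[0-9]{1,2})x(?P<episode>[0-9]{1,2})", "2x03")

by_name = m.groupdict()   # captures looked up by group name
by_number = m.groups()    # the same captures, numbered left to right

print(by_name)
print(by_number)
```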

Nostra, are any fields other than title, season, episode and eptitle supported? For example, can I make an expression to capture things like year, director, rating from filenames?

[Edit] Oops, I see year in one of them. Maybe more to the point: Can any fields be used, or just the ones you've provided for?
« Last Edit: February 22, 2009, 12:45:48 am by rick.ca »

Offline nostra

Re: 0.9.9.4 File Scanner
« Reply #10 on: February 22, 2009, 02:05:17 am »
Quote
Nostra, are any fields other than title, season, episode and eptitle supported? For example, can I make an expression to capture things like year, director, rating from filenames?

You can only get title, original title, episode title, season number, episode number, and year.
(There is a bug in the current version that produces wrong results when getting original title without title)

P.S. Your observations above are perfectly right ;)

Offline patch

Re: 0.9.9.4 File Scanner
« Reply #11 on: February 28, 2009, 05:02:57 pm »
Quote
Have you taken a look at the link I provided above: http://www.regular-expressions.info/tutorial.html

Thanks for the reference. I also looked at http://en.wikipedia.org/wiki/Regex for a more concise but cryptic explanation.

Decided my two main series search patterns should be

(?i)^.*\\(?P<title>[^\\]*?)\bs(?P<season>[0-9]{1,2})[ .x]{0,2}e(?P<episode>[0-9]{1,2})\b[ .-]*(?P<epititle>[^\\]*)$

(?i)^.*\\(?P<title>[^\\]*?)[ .-]*\b(?P<season>[0-9]{1,2})x(?P<episode>[0-9]{1,2})\b[ .-]*(?P<epititle>[^\\]*)$

They basically look for a file name containing text of the form "s11e11" or "11x11" bounded by word boundaries, where the s11 & e11 can be separated by " ", "." or "x".

I usually store my TV series in the format
Title\s1\episode file
So I precede the above by directory structure searches i.e. Look for “Title\s1\” and use this data in preference. File names are searched from more to less stringent formats i.e. Look for “Title\s1\” with file name formats
    text s0e00 episode title
    text 00x00 episode title
    Text 0 optional not digit 00 episode title (digit but not word boundary required)
    text 00 episode title

My regex for which are

(?i)^.*\\(?P<title>[^\\]*)\\s(?P<season>[0-9]{1,2})\\[^\\]*?\bs([0-9]{1,2})[ .x]{0,2}e(?P<episode>[0-9]{1,2})\b[ .-]*(?P<epititle>[^\\]*)$

(?i)^.*\\(?P<title>[^\\]*)\\s(?P<season>[0-9]{1,2})\\[^\\]*?\b[0-9]{1,2}x(?P<episode>[0-9]{1,2})\b[ .-]*(?P<epititle>[^\\]*)$

(?i)^.*\\(?P<title>[^\\]*)\\s(?P<season>[0-9]{1,2})\\[^\\]*?(?<![0-9])[0-9]{1,2}[^\\0-9]*(?P<episode>[0-9]{2})(?![0-9])[ .-]*(?P<epititle>[^\\]*)$

(?i)^.*\\(?P<title>[^\\]*)\\s(?P<season>[0-9]{1,2})\\[^\\]*?(?<![s0-9])(?P<episode>[0-9]{2})(?![0-9])[ .-]*(?P<epititle>[^\\]*)$
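
A sketch of how the four path-based expressions behave when tried in order, most stringent first. This is Python standing in for PVD; that PVD stops at the first expression that matches is an assumption, and the paths are invented examples of the Title\s1\ layout.

```python
import re

# The four path-based expressions above, unchanged and in the same order.
PATH_PATTERNS = [re.compile(p) for p in (
    r"(?i)^.*\\(?P<title>[^\\]*)\\s(?P<season>[0-9]{1,2})\\[^\\]*?\bs([0-9]{1,2})[ .x]{0,2}e(?P<episode>[0-9]{1,2})\b[ .-]*(?P<epititle>[^\\]*)$",
    r"(?i)^.*\\(?P<title>[^\\]*)\\s(?P<season>[0-9]{1,2})\\[^\\]*?\b[0-9]{1,2}x(?P<episode>[0-9]{1,2})\b[ .-]*(?P<epititle>[^\\]*)$",
    r"(?i)^.*\\(?P<title>[^\\]*)\\s(?P<season>[0-9]{1,2})\\[^\\]*?(?<![0-9])[0-9]{1,2}[^\\0-9]*(?P<episode>[0-9]{2})(?![0-9])[ .-]*(?P<epititle>[^\\]*)$",
    r"(?i)^.*\\(?P<title>[^\\]*)\\s(?P<season>[0-9]{1,2})\\[^\\]*?(?<![s0-9])(?P<episode>[0-9]{2})(?![0-9])[ .-]*(?P<epititle>[^\\]*)$",
)]

def first_match(path):
    """Return (pattern index, named groups) for the first expression that matches."""
    for index, pattern in enumerate(PATH_PATTERNS):
        m = pattern.search(path)
        if m:
            return index, m.groupdict()
    return None

# Hypothetical paths in the Title\s1\file layout.
sxe_style = first_match(r"G:\TV\Some Show\s2\Some.Show.S02E02.Pilot.avi")
bare_number = first_match(r"G:\TV\Some Show\s1\07 - Pilot.avi")

print(sxe_style)
print(bare_number)
```

In both cases the title and season come from the folder names, and only the episode number and episode title come from the file name.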

Hope this helps some others despite the fact it is not yet working 100% as expected

Edit
regex updated
« Last Edit: March 01, 2009, 02:10:26 pm by patch »

Offline patch

Re: 0.9.9.4 File Scanner
« Reply #12 on: March 01, 2009, 02:10:54 pm »
Residual problems

For file / path "G:\Video - TV Series\Tripping the Rift\s2\Tripping.The.Rift.S02E02.You.Wanna.Put.That.Where.DVDRip.avi"
With either the path-based or the file-name regex, PVD returns the series name "ping the Rift" or "ping The Rift" respectively, but I suspect that is a problem with the replace function

epititle doesn't seem to do anything


By the way
The best way I have found to get these regex working is to make a text file with examples of all the common and hard names you are trying to recognise.
Open this file with Edit Pad Pro
Copy each regex into the search window (it colours them for you to show the meaning of each character in the regex, helping find syntax errors)
Search your test file with the expression to make sure it does what you expect.
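
The same test-file idea can be scripted if you don't have Edit Pad Pro. Below is a minimal Python sketch, assuming Python's re is close enough to PVD's engine for these expressions. The list of samples stands in for the text file, the E:\TV\ folder and the "2x03" name are made up, and the two expressions are my main file-name patterns from the earlier post.

```python
import re

PATTERNS = [re.compile(p) for p in (
    r"(?i)^.*\\(?P<title>[^\\]*?)\bs(?P<season>[0-9]{1,2})[ .x]{0,2}e(?P<episode>[0-9]{1,2})\b[ .-]*(?P<epititle>[^\\]*)$",
    r"(?i)^.*\\(?P<title>[^\\]*?)[ .-]*\b(?P<season>[0-9]{1,2})x(?P<episode>[0-9]{1,2})\b[ .-]*(?P<epititle>[^\\]*)$",
)]

# Stand-in for the text file of common and hard names.
SAMPLES = [
    r"E:\TV\Beverly.Hills.90210.S01E01.Class.Of.Beverly.Hills.avi",
    r"E:\TV\Some.Show.2x03.Pilot.avi",
    r"E:\TV\Avatar.The.Last.Airbender.-.201_The_Avatar_State_[Moonsong].avi",
]

def report(samples):
    """For each sample, return (sample, index of first matching pattern or None, groups)."""
    rows = []
    for sample in samples:
        hit = (None, None)
        for index, pattern in enumerate(PATTERNS):
            m = pattern.search(sample)
            if m:
                hit = (index, m.groupdict())
                break
        rows.append((sample, hit[0], hit[1]))
    return rows

rows = report(SAMPLES)
for sample, index, groups in rows:
    print(index, groups, sample)
```

Any sample that comes back with None is a name the current expressions don't cover, such as the "201" style above.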
« Last Edit: March 01, 2009, 02:32:57 pm by patch »

Offline rick.ca

Re: 0.9.9.4 File Scanner
« Reply #13 on: March 02, 2009, 12:25:01 am »
Quote
I suspect that is a problem with the replace function

Wouldn't that be the (?i).?((?<!\b).){0,5}Rip replacement at work?
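
A quick check of that theory in Python, assuming the replace function simply deletes whatever the expression matches (the replacement text isn't shown in this thread) and that Python's re handles the expression the way PVD does:

```python
import re

# The replace expression quoted above, unchanged.
RIP = re.compile(r"(?i).?((?<!\b).){0,5}Rip")

def strip_rip(text):
    """Delete every match of the expression, as an empty replacement would."""
    return RIP.sub("", text)

print(strip_rip("Tripping the Rift"))
print(strip_rip("Tripping The Rift"))
print(strip_rip("Where.DVDRip"))
```

With (?i) in effect, the "rip" inside "Tripping" qualifies, and the optional leading .? takes the "T" with it, which leaves exactly the "ping the Rift" seen in the scanner.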

Yes, it appears epititle doesn't work.

I found a bug while playing with the replacement function...

Attempting to change the order of any of the replace expressions using the up/down arrows results in the error:

Unexpected exception:
The source tree must contain the source node. (C:\Program Files\Virtual Treeview\Source\VirtualTrees.pas, line 26510)


The program becomes completely unresponsive—it has to be killed.

I was able to remove an expression, but adding one results in the error (after saving and reopening Preferences):

Unexpected exception:
List index out of bounds (8 )


On restarting the program, it's apparent the configuration file is corrupted, as settings are not recognized. I recovered by replacing the FindExps= line with one from backup.

Offline nostra

Re: 0.9.9.4 File Scanner
« Reply #14 on: March 02, 2009, 01:40:12 am »
I could not find any problems with the replace function and <epititle> works here as well. Could you explain how to reproduce the problem with eptitle?

Offline rick.ca

Re: 0.9.9.4 File Scanner
« Reply #15 on: March 02, 2009, 03:25:07 am »
I forgot what I reported at the beginning of this topic...

Quote
When adding new episode files to existing series, the existing title and year are removed. Title is changed to "#S.E" and Year to "-1."

It also doesn't work if the series does not already exist. I don't know of any other circumstance where I might see Episode title being passed from the filename. What do you mean by "<epititle> works here"?

I tried a few more things with the replace expressions issue:

I created a new database, using the same configuration file. The same errors occurred.

I removed the configuration file, forcing the program to create a new default configuration on startup. The same errors occurred with both the new database created in the previous test, and my regular database.

The first error doesn't always happen at the same time. Once I was able to change an expression's position in the list and close and reopen the Preferences dialog without the error happening. Then I tried to make another change, and the error happened immediately. The second error seems more consistent.

Offline nostra

Re: 0.9.9.4 File Scanner
« Reply #16 on: March 03, 2009, 01:04:26 pm »
Quote
It also doesn't work if the series does not already exist. I don't know of any other circumstance where I might see Episode title being passed from the filename. What do you mean by "<epititle> works here"?

Ehm, sorry, it seems like I have fixed the issue as you reported it the first time, but forgot about it ::) That's why it works here.

Quote
I created a new database, using the same configuration file. The same errors occurred.

I removed the configuration file, forcing the program to create a new default configuration on startup. The same errors occurred with both the new database created in the previous test, and my regular database.

The first error doesn't always happen at the same time. Once I was able to change an expression's position in the list and close and reopen the Preferences dialog without the error happening. Then I tried to make another change, and the error happened immediately. The second error seems more consistent.

You can stop investigating this issue as I have already found where the problem was and fixed it.

Offline rick.ca

Re: 0.9.9.x File Scanner and Regular Expressions
« Reply #17 on: May 31, 2009, 08:29:49 pm »
Moved here from Beta forum—to provide a place to discuss how to modify the regex in the file scanner configuration so the scanner will better recognize your files.

Offline rick.ca

Re: 0.9.9.x File Scanner and Regular Expressions
« Reply #18 on: June 02, 2009, 01:54:40 am »
I've mentioned several times the effective use of the file scanner involves finding one's preferred balance between

  • the ability of the regex provided to recognize a variety of different filename structures, and
  • the consistency of the file naming structure (or "convention").

In other words, we have a choice of

  • including a comprehensive set of regex in the configuration to recognize a wide range of different filename structures, or
  • renaming all files to comply with a file naming convention which is recognized correctly by a simple set of regex.

There is no escaping the necessity of learning at least a little bit about regex. If one chooses the latter approach (to avoid having to adapt the regex to their circumstances), they will soon realize the best way to rename the files is to use a regex file renamer.

So here are some references I've found helpful...

Regular Expressions:


File Renaming:


Offline darichman

  • User
  • ***
  • Posts: 59
    • View Profile
TV Series & File Scanner
« Reply #19 on: June 06, 2009, 05:53:51 am »
Hello hello.

I've taken the plunge and have started looking at importing TV shows in PVD. I have all my TV shows in the file format...

E:\Television\Dark Angel\Season 2\Dark Angel - 2-03. Proof of Purchase

The default filescanner is picking up the show title as "Dark Angel - 1" and "Dark Angel - 2" etc and not picking up the Episode Name part of the filename. Before I go and do a whole lot of manual work, is there any way to set the filename scanner to pick up the season and the episode name if all my filenames are in this format?

I've been fiddling with the RegExps options for the file scanner to try and get PVD to pick files up as

<title> - <season>-<episode>. <eptitle>

So far I haven't had any luck. If anyone has any suggestions, I'd be really grateful :)
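
For reference, this is the kind of expression I have in mind, tried here in Python rather than PVD. It is only a sketch: the expression is a first attempt, it has not been tested in the scanner, and the group names (in particular eptitle) are assumed from earlier posts in this thread.

```python
import re

# Candidate for "<title> - <season>-<episode>. <eptitle>" (untested in PVD).
CANDIDATE = re.compile(
    r"(?i)^.*\\(?P<title>[^\\]*?) - (?P<season>[0-9]{1,2})-"
    r"(?P<episode>[0-9]{1,2})\. (?P<eptitle>[^\\]*)$"
)

m = CANDIDATE.search(r"E:\Television\Dark Angel\Season 2\Dark Angel - 2-03. Proof of Purchase")
parts = m.groupdict() if m else None
print(parts)
```

A real file would also carry an extension, which this candidate would leave on the end of the episode title.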

 
