File scanner regular expressions
AimHere:
Hi,
I need some help with the configuration for the File Scanner, specifically its regular expressions.
I like how, when importing video files, PVD strips things like "CD1", "CD2", and so on from the filenames when generating a movie title. So, for example, "This Movie CD1 (2011, SomeStudio)" becomes "This Movie" in the title field.
Thing is, I have more and more movies that came from multiple-DVD sets, where each DVD is broken into CD-sized files as well. They have filenames like "That Movie D1CD1", "That Movie D1CD2", "That Movie D2CD1", etc., or "Another Movie D1A", "Another Movie D1B", "Another Movie D2A", etc.
When I point PVD's file scanner to movies like these, I wind up with titles like "That Movie D1CD" or "Another Movie D" in the Scan Results window. Note how it retains part of the disc/CD identification tag. I have to edit the title and strip them out by hand.
Also, with the "DxCDy" naming convention, PVD doesn't group all the parts together under the same title; instead, I get a separate item for each DVD ("That Movie D1CD" AND "That Movie D2CD"). So I have to select both items and right-click to choose "Same Movie".
Now, I don't really know much about regular expressions, so I'm not sure how to go about fixing this in the preferences for the File Scanner. I'd like to retain the "D1CD1" or "D1A" tagging for the filenames, but keep it from carrying over to the titles in the File Scanner.
Any ideas?
Aimhere
rick.ca:
Its ability to recognize patterns like "That Movie D1CD2" is exactly why regex is used. The expressions in your configuration are evaluated in order until one matches the filename in question. You need to add one in the correct position that matches this particular pattern. For example...
(?i)^.*\\(?P<title>.*) D[0-9]CD[0-9].*
...would match D:\Video\Movies\That Movie D1CD2 plus whatever.mkv. A more elaborate expression could match variations in the " DxCDy " pattern, like ".DxCDyy." or "- Dx CDy" (spaces are significant!).
Better practice would be to rename all movie files to include the year. That helps resolve ambiguity in the title (when searching a data source) and allows a simple expression to match both Title and Year without fail...
(?i)^.*\\(?P<title>.*) \((?P<year>(19|2\d)\d{2})\).*
...matches anything like D:\Video\Movies\That Movie (2011) plus whatever.mkv.
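As a rough check, the two expressions above can be run through Python's re module, which accepts the same (?P<name>...) syntax. This is only a sketch: Python's engine is not necessarily the one PVD uses, and the scan() helper is a hypothetical stand-in for "evaluate each expression until one matches", not PVD's actual logic.

```python
import re

# rick.ca's two expressions, copied verbatim from the post.
DISC_TAG = re.compile(r"(?i)^.*\\(?P<title>.*) D[0-9]CD[0-9].*")
TITLE_YEAR = re.compile(r"(?i)^.*\\(?P<title>.*) \((?P<year>(19|2\d)\d{2})\).*")

def scan(path):
    """Hypothetical helper: return the named groups of the first expression that matches."""
    for pattern in (DISC_TAG, TITLE_YEAR):
        m = pattern.match(path)
        if m:
            return m.groupdict()
    return None

disc = scan(r"D:\Video\Movies\That Movie D1CD2 plus whatever.mkv")
year = scan(r"D:\Video\Movies\That Movie (2011) plus whatever.mkv")
print(disc)  # {'title': 'That Movie'}
print(year)  # {'title': 'That Movie', 'year': '2011'}
```

Note that the second expression expects ")" immediately after the year, so a name like "That Movie (2011, SomeStudio)" would need a variation of it.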
But if you don't want to bother renaming these files (or any other files matching some other patterns) to include the year, there's nothing wrong with having multiple expressions to match various patterns.
Use the "Utility to test regular expressions for extracting movie data from file names" (link on the Download page) to work out the expression needed in any circumstance.
AimHere:
Hi rick,
Sorry, I should have been clearer. The movie filenames already do include the year, e.g. "This Movie D1A (2011, Some Studio).avi", "That Movie D2CD1 (2010, AnotherStudio).avi", etc. I just want to strip out the "D1A" or "D2CD1" parts completely to create the titles in PVD's file scanner, rather than leaving the fragments that I'm getting (and have to edit out) as shown in my original post. And I want to do it in a way that is compatible with the existing (default) regular expressions already in the File Scanner configuration (I'm willing to modify the existing regexes if that's what it takes).
AimHere:
Okay, I managed to figure out the strings I needed to add to the "find and replace" section in the File Scanner preferences to strip out the strings I wanted to remove:
(?i)\bD\d{1,2}CD\d{1,2}\b
(?i)\bD\d{1,2}[a-z]\b
(?i)\bD\d{1,2}\b
I just added these after the existing "(?i)\bCD\d{1,2}\b" line, and tested on a folder full of files named as we've been discussing. This seems to remove all of the "D1CD1" and "D1A" style strings from the filenames. Maybe there's a more elegant way to do the same thing with fewer regexes, but I'll take what works.
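For anyone wanting to try these outside PVD, here is a minimal sketch in Python that applies the existing CD line plus the three additions in order. It assumes each "find and replace" entry simply replaces its match with nothing; the whitespace cleanup and the COMBINED pattern are my own hypothetical additions, offered as one possible way to fold the three lines into one.

```python
import re

# The existing default CD line followed by the three additions, in order.
THREE_LINES = [
    r"(?i)\bCD\d{1,2}\b",
    r"(?i)\bD\d{1,2}CD\d{1,2}\b",
    r"(?i)\bD\d{1,2}[a-z]\b",
    r"(?i)\bD\d{1,2}\b",
]

# Hypothetical single replacement for the three additions:
# "D" + 1-2 digits, optionally followed by "CDnn" or a single letter.
COMBINED = [
    r"(?i)\bCD\d{1,2}\b",
    r"(?i)\bD\d{1,2}(?:CD\d{1,2}|[a-z])?\b",
]

def strip_tags(name, patterns):
    """Remove every match of each pattern, then collapse leftover spaces."""
    for pattern in patterns:
        name = re.sub(pattern, "", name)
    # Whitespace cleanup is an assumption, not something PVD is known to do.
    return " ".join(name.split())

for sample in ["That Movie D1CD1", "Another Movie D2B", "This Movie CD2", "That Movie D2"]:
    print(sample, "->", strip_tags(sample, THREE_LINES), "|", strip_tags(sample, COMBINED))
```

On these sample names both lists give the same result, but the combined form has not been tried in PVD itself.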
I've noticed another problem, though: the first line in the default set of "find and replace" expressions, "(?i).?((?<!\b).){0,5}Rip", is clearly intended to remove strings like "DVDRip" and "BDRip". But, I'm finding it ALSO removes any unrelated string that happens to end with the substring "rip". In other words, it removes words like "Trip", "Strip", "Grip", and so on. (And words like "Gripped" get mangled to "ped".) After some wrangling with it, I managed to come up with a RegEx that removes things like "DVDRip" while leaving words like "ripper", "tripping", etc. alone:
(?i)\b(CD|DVD|BD|Blu-Ray|BluRay|HD-DVD|HDDVD|VHS|Vinyl|Cassette|Tape|.{0})Rip(\s|\b)
Yeah, it's not pretty, but I figure it covers all the "rip" bases I'll ever encounter. :D (I had to add the fiddly bit at the end to strip out spaces that were creeping into the modified string.)
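Here's a quick side-by-side of the two "rip" expressions in Python, as a sketch only. Python's re engine may not behave exactly like whatever PVD uses, and the sample names are made up for illustration.

```python
import re

# Default first line of the "find and replace" list, as quoted above.
DEFAULT_RIP = r"(?i).?((?<!\b).){0,5}Rip"
# The narrower replacement from this post.
NARROW_RIP = r"(?i)\b(CD|DVD|BD|Blu-Ray|BluRay|HD-DVD|HDDVD|VHS|Vinyl|Cassette|Tape|.{0})Rip(\s|\b)"

def remove(pattern, name):
    """Delete every match of the pattern from the name."""
    return re.sub(pattern, "", name)

# Made-up sample names for illustration.
for sample in ["Movie DVDRip XviD", "Road Trip", "Gripped", "The Ripper", "Tripping"]:
    print(repr(sample), repr(remove(DEFAULT_RIP, sample)), repr(remove(NARROW_RIP, sample)))
```

With the default expression "Gripped" comes out as "ped", while the narrower one leaves it untouched and still removes "DVDRip" along with the space after it.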
Aimhere
rick.ca:
You might be getting carried away with trying to remove things. The purpose is to recognize Title and, if possible, Year. If you can do that, you're done, regardless of what other crap is in the filename. Hopefully, the Title is always at the beginning. If you can recognize the disc numbering stuff, and it always comes after the Title, then you've got the Title. If the Year is always four digits between a "(" and a "," or a ")", then you've got that too. There's nothing to be removed.
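To make that concrete, here is one possible expression along those lines, sketched in Python. It is hypothetical, not one of PVD's defaults and not tested in PVD: it assumes the Title comes first, is optionally followed by a disc tag such as "D1A", "D2CD1" or "CD1", and that the Year sits right after "(" and is followed by "," or ")".

```python
import re

# Hypothetical expression: title, optional disc tag, then "(year," or "(year)".
TITLE_YEAR_DISC = re.compile(
    r"(?i)^.*\\(?P<title>.*?)"                           # title, as short as possible
    r"(?: (?:D\d{1,2}(?:CD\d{1,2}|[a-z])?|CD\d{1,2}))?"  # optional disc tag, not captured
    r" \((?P<year>(?:19|2\d)\d{2})[,)].*"                # year followed by "," or ")"
)

def parse(path):
    """Return (title, year) if the path matches, else None."""
    m = TITLE_YEAR_DISC.match(path)
    return (m.group("title"), m.group("year")) if m else None

print(parse(r"D:\Video\Movies\This Movie D1A (2011, Some Studio).avi"))
# ('This Movie', '2011')
print(parse(r"D:\Video\Movies\That Movie D2CD1 (2010, AnotherStudio).avi"))
# ('That Movie', '2010')
```

Because the disc tag is matched but not captured, nothing has to be stripped from the Title afterwards. A title that genuinely ends in something like "D2" would be cut short, so this only suits a collection where that doesn't occur.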
If, once you have a handle on regex basics, you still find it too complicated, you should probably consider renaming the files so they can be recognized. Once you get existing files cleaned up, you can adopt a workflow that renames files in consistent, recognizable patterns as soon as they hit your HDD. Think about it. It has to be done at some point. The earlier you rename files to something sensible, the less trouble they'll be.
I'm not advocating work for the sake of "neatness." This includes doing things like configuring torrent and ripping software so they do the renaming automatically. For example, my TV episodes are found, downloaded and renamed automatically. Properly renamed, they're automatically recognized and processed by PVD, and then the metadata is fed automatically to my media manager. Aside from "supervising" the process, my job is to sit on the couch and enjoy my media using a remote. 8)