tvsubtitles query

Fred Pauchet 2011-10-04 09:04:37 +02:00
parent e125bbc02b
commit dbbcb31154
2 changed files with 64 additions and 0 deletions

32
README Normal file

@ -0,0 +1,32 @@
PigeonHole
==========
The main purpose of this application is to sort specific types of files into a well-organized directory tree.
I have used it to move TV show files out of a cluttered folder into the matching show folder. The destination is chosen from the filename, which is cleaned first to make sorting easier.
How it works
------------
The project is split into several files:
* pigeonhole/pigeonhole.py : the one that should be run :)
* setup.py : not used yet, sorry.
* pigeonhole/config.py : where you should put your configuration.
### config.py ###
The configuration file declares three variables:
1. useless_files_extensions: used to clean up a folder when it (and its subdirectories) contains only files with these extensions. Do not put `*` in this filter; the behavior is not known yet.
2. shows_extensions: the extensions of the files that need to be organized. The `process` method of the `PigeonHole` class ignores every other file type. Detection relies on the file extension, not on [magic numbers](http://en.wikipedia.org/wiki/List_of_file_signatures).
3. shows_dict: used for files that have a 'special name'
(e.g. using 'tbbt' while the real name found in the destination folder is much longer)
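As an illustration, a `config.py` matching the description above might look like the sketch below. All values are made up for the example, and the container types (tuples and a dict) are assumptions, since the real file is not shown here. `resolve_show_name` is a hypothetical helper, not part of PigeonHole.

```python
# Hypothetical config.py contents; every value here is illustrative only.

# Extensions considered disposable when cleaning up folders.
useless_files_extensions = ('.nfo', '.txt', '.db')

# Extensions of the files to organize.
shows_extensions = ('.avi', '.mkv', '.mp4')

# Short names found in filenames, mapped to the destination folder name.
shows_dict = {'tbbt': 'The Big Bang Theory'}


def resolve_show_name(name):
    """Hypothetical helper: expand a 'special name' if one is configured."""
    return shows_dict.get(name.lower(), name)
```

Names without an entry in `shows_dict` are returned unchanged, so only the abbreviations need to be listed.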
Unit testing
------------
All tests are located inside the `pigeonhole/tests` directory. To run them, use the following command, taken from the Python documentation:

    python -m unittest discover

Temporary files and folders are created (and cleaned up afterwards) to verify that the file operations behave as expected.
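The temporary-folder approach can be sketched as follows. This is not one of the project's actual tests: the class name, the test name and the sample filename are hypothetical, and the sketch only assumes the standard `tempfile`, `shutil` and `unittest` modules.

```python
import os
import shutil
import tempfile
import unittest


class TemporaryFolderTest(unittest.TestCase):
    def setUp(self):
        # Fresh scratch directory for each test.
        self.root = tempfile.mkdtemp()

    def tearDown(self):
        # Remove the directory and everything created inside it.
        shutil.rmtree(self.root)

    def test_file_is_created_in_scratch_folder(self):
        # Hypothetical filename, only used to have something on disk.
        path = os.path.join(self.root, 'tbbt.s01e01.avi')
        with open(path, 'w') as handle:
            handle.write('dummy')
        self.assertTrue(os.path.isfile(path))
```

Because `tearDown` runs after every test, even a failing one, no scratch folder is left behind.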

32
pigeonhole/subQuery.py Normal file

@ -0,0 +1,32 @@
"""
Query sites that expose no web service by fetching pages over HTTP and
extracting the results with a regex.
"""
import re
import urllib
import urllib2

from BeautifulSoup import BeautifulSoup


def query(showname):
    print "Trying " + showname
    # urllib.quote percent-encodes spaces (as %20) and other unsafe characters.
    url = 'http://www.tvsubtitles.net/search.php?q=' + urllib.quote(showname)
    socket = urllib2.urlopen(url)
    try:
        soup = BeautifulSoup(socket.read())
    finally:
        # Always release the connection, even if parsing fails.
        socket.close()

    # Keep only the links whose href ends with /tvshow-<id>.html
    results = soup.findAll(href=re.compile(r"/tvshow-([A-Za-z0-9]*)\.html$"))

    if len(results) == 1:
        # Exactly one match: report it.
        print "ouh yeah baby " + showname + " " + str(results[0])
    elif len(results) == 0:
        print "No results found for " + showname
    else:
        print "Here are the possible results for " + showname
        for res in results:
            print "\t" + str(res)


if __name__ == "__main__":
    query('the big bang theory')
    query('being erica')
    query('white collar')
    query('scrubs')
    query('castle')
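
The link filter used in `query` can be tried without any network access. The sketch below reuses the same pattern on a list of made-up hrefs; `extract_show_ids` and `sample_hrefs` are hypothetical names introduced for the example, and the sample paths are not taken from the real site.

```python
import re

# Same pattern as in query(): hrefs ending with /tvshow-<id>.html
TVSHOW_LINK = re.compile(r"/tvshow-([A-Za-z0-9]*)\.html$")


def extract_show_ids(hrefs):
    """Return the ids captured from hrefs that look like show pages."""
    ids = []
    for href in hrefs:
        match = TVSHOW_LINK.search(href)
        if match:
            ids.append(match.group(1))
    return ids


# Made-up hrefs: only the first one ends with /tvshow-<id>.html
sample_hrefs = [
    "/tvshow-123.html",
    "/subtitle-123.html",
    "/tvshow-45.html?x=1",
    "/index.html",
]
```

Calling `extract_show_ids(sample_hrefs)` keeps only the first entry, because the trailing `$` rejects hrefs that carry a query string after `.html`.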