PDF Indexer 

Version
2.4 (last update on Jul 9, 2008)
Rating
Compatibility
Votes
22
Favoured
27
License
GPL
Non-Commercial
Type
Views
46211
Date Added
15 May 2006
Allows PDFs to be searched via the Joomla/Mambo search module. This component indexes the PDFs on your site, and the corresponding plugin (mosbot) allows that index to be searched. Directories that contain PDFs can be set to either public or registered. It works only on servers running Linux or Windows.
It will not work in safe mode.
by keewhip on September 16, 2009
Very nice component; my only wish is a native 1.5 version.
I ran into a problem reading PDFs in Latin-1 encoding: database population stopped at special characters like ë and à. Because Joomla 1.5 is UTF-8, you need to extract the text from the PDF in UTF-8 as well. You can do this by changing the pdftotext command line. In admin.file_index.php, near line 350, change
pdftotext \"$original_name\" - 2>&1
to
pdftotext -enc UTF-8 \"$original_name\" - 2>&1
Watch for both the local and the component command.
I don't know whether this works for the Windows command.
Great component!
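The command-line change described in this review can be sketched outside the component. The following is a minimal illustration in Python, not the component's PHP code: the function names `pdftotext_command` and `extract_text` are hypothetical, and it assumes a `pdftotext` binary on the PATH that accepts the `-enc` option, as in the command quoted above.

```python
import subprocess


def pdftotext_command(pdf_path, encoding="UTF-8"):
    """Build the pdftotext argument list; the trailing '-' sends text to stdout."""
    return ["pdftotext", "-enc", encoding, pdf_path, "-"]


def extract_text(pdf_path, encoding="UTF-8"):
    """Run pdftotext and decode its output with the same encoding it was asked for."""
    result = subprocess.run(
        pdftotext_command(pdf_path, encoding),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # mirrors the 2>&1 in the quoted command
        check=False,
    )
    # Replace undecodable bytes instead of stopping at the first special character.
    return result.stdout.decode(encoding, errors="replace")
```

Passing the arguments as a list avoids the shell quoting that the escaped quotes around `$original_name` handle in the original command.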
by geoffjones on May 19, 2009
Excellent with 1.0.15, but problems with 1.5. In particular, debug mode indicates you need to have a database table called jos_mambots (I copied mine from a site running 1.0.15), and there are admin errors displayed when running fastcgi. No resolution yet.
by cesark on May 14, 2009
It's a very useful extension, but it needs a solution for accented characters, which are essential for files in Latin languages.
by dlebreux on March 13, 2009
This extension does what it says it does. But what it doesn't mention is that it doesn't integrate with DOCman. Which is fine. They shouldn't have to mention that.
However, if you use DOCman, be aware that these extensions - PDF Indexer & Doc Indexer - do not care about what security settings you have enabled in DOCman.
So if you have a Private / Restricted / Hidden file in DOCman it won't matter. Using the standard Joomla Search with this addon will find that file and allow ANY USER to open it.
Like I said, it's a good extension and does what it should. But just be aware that it creates a security hole if you're using DOCman.
by rkmani on September 3, 2008
I have gone through the review section and I am prompted to write this. This component works well with DOCman. To do this one has to install the Plugin (bot_docman_search_1.4.0rc2.zip) from http://joomlacode.org/gf/download/frsrelease/7085/24001/bot_docman_search_1.4.0rc2.zip.
This will enable the standard Joomla! Search to index DOCman documents. Next, follow the instructions on how to index PDFs through PDF Indexer from this page. Now everything will fall in place ...
**In fact I have disabled the standard 'DOCman Search' from my site, since it does not 'highlight' the search items.
by IAAF on August 2, 2008
This was, without a doubt, the easiest extension to install and configure that I have used. It literally took me less than 5 minutes. I have version 1.0.15 and was looking for something to index our PDF files and incorporate their search results into the Joomla search. We maintain PDF files of all our school newspapers (dating back to the mid-1930s) as well as our school yearbooks. How wonderful now to be able to search on a last name and retrieve all info for that person from website content, newspapers & yearbooks! This is wonderful. Thank you so much. What a nice job.
by rootwiley on July 23, 2008
It seems the bad ratings here are either lone cases or problems with older versions. I ran this on J1.5.3 without a single hitch. Easy install, good controls on the back end, no problems at all.
For an organization with half our content tied up in PDF publications, this was a must-have.
Thank you, thank you, thank you.
by gabo.egui on July 22, 2008
Thanks to the author for this great extension! It works perfectly.
It would be nice if this extension worked together with the DOCman extension, to provide a full solution for the Joomla community.
by carverdave on February 1, 2008
Extremely easy to install and configure. Worked great the first time. I'm new to Joomla and thought I'd messed up by making the newsletters PDFs, but because of the way they were given to me by the not-for-profit that owns the site, all the formatting would have been lost during conversion to HTML. Adobe made them look great and PDF Indexer saved me by making it easy to search them. Thanks Scott, great work!
It does exactly what it says, and the price for it is very reasonable. Support has a very fast turn-around!
by yamada on November 3, 2007
OK, time to review this one.
I bought it a few months ago and was pleased with it, as I use a lot of PDF files.
It indexes well, but I did discover a big security problem.
When people use the search and find content from your PDFs, it shows exactly where the files are.
The address isn't hidden in any way.
I contacted the author, asking him to take care of this.
He said, "That shouldn't be too difficult to do."
But still nothing happened.
The next excuse was that there were some changes in his private life, and I said OK, I will wait.
A long time after that, same excuse.
Nothing is happening.
So it's very disappointing that some authors don't take complaints seriously, not even with commercial components... in this case anyway.
Owner's reply
First off, the change in my private life is that I had my first child. Any first-time parent knows the first two months are extremely hard, and running on 4 hours of sleep isn't ideal for writing code.
Secondly, this has been addressed in the latest version of PDF indexer.
Finally, this customer received a full refund because he misunderstood what the component was meant for.
It's hard to find the words to praise this component & search bot and do its excellent features proper justice. The only thing I had to do with my PDFs to get them indexed correctly was to remove a few security settings and re-save them. After that it indexed the PDFs very fast: about 30 MB of PDFs took about 3 seconds (on a dedicated server). For only 15 USD this is a bargain indeed. Not to give the developer any ideas, but I wouldn't think twice about paying 10 times that =) It is functionality like this that makes Joomla stand out in front of the crowded jungle of other open-source CMSs out there.
by twhaley on May 30, 2007
I had a couple odd errors when I installed, but I emailed the developer and he provided excellent technical support. Once I got around the issues the extension worked as promised.
Overall, it's a good extension that I would recommend without hesitation. I have almost 2000 pdfs on my site and it had no problem with indexing them all.
by vistamedia on April 21, 2007
Excellent addon for Joomla!. Thanks a lot for your work.
by reverendspam on December 2, 2006
I purchased this program and it is not ready for prime time.
The install went as simply as most Joomla components do. The initial indexing of 100 PDF files proved to be a resource hog and set off warning bells and whistles at my ISP. PDF Indexer used over 20% of cpu resources.
My account was immediately shut down by a Linux script and I had to call my ISP to have my account reinstated. They told me that it was most likely caused by poorly written code and that they rarely have problems with Joomla (which they advertise).
After the initial indexing, the program skips the previously indexed files and indexes only newly added files. I would suggest that you index maybe 10 files at a time.
I informed the author and I think he may be looking into this. It would probably be easy to add some sort of timer when indexing that allows 5 files to be indexed every minute or so. If you have your own server this will not be an issue, but those of us on shared systems will get hammered by our ISPs. Like I said, this should be a simple fix, adding a couple of lines of code.
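The timer idea suggested here can be sketched as follows. This is a hedged illustration in Python, not the component's PHP code: `index_in_batches` and its `index_one` callback are hypothetical names, and the defaults (5 files, 60 seconds) simply follow the figures in the suggestion above.

```python
import time


def index_in_batches(paths, index_one, batch_size=5, pause_seconds=60, sleep=time.sleep):
    """Index files in small batches, pausing between batches to limit CPU load.

    index_one is a hypothetical callback that indexes a single file.
    Returns the number of files handed to index_one.
    """
    indexed = 0
    for start in range(0, len(paths), batch_size):
        if start > 0:
            # Pause before every batch except the first.
            sleep(pause_seconds)
        for path in paths[start:start + batch_size]:
            index_one(path)
            indexed += 1
    return indexed
```

The `sleep` parameter is injectable so the pacing can be checked without actually waiting. On shared hosting, a script execution time limit could still interrupt a long run, so a real implementation would likely need to resume across requests.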
The second issue I have is that most of my PDF files were created with Adobe Acrobat 7. This caused PDF Indexer to not work properly and to give errors on every file created with Acrobat 7 (version 1.6 PDF files, which use AES encryption).
The reason is that PDF Indexer uses the program xpdf for its PDF-to-text conversion and then dumps the converted text into your database.
It is a widely known issue that xpdf is not capable of reading version 1.6 pdf files at this point in time. Therefore there were errors with every version 1.6 PDF file. It works fine with version 1.5 and below.
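To find out which files on a site declare version 1.6 before indexing them, the PDF header can be inspected. The sketch below is an illustration in Python with hypothetical function names. It relies only on the fact that a PDF file begins with a header of the form `%PDF-1.x`. A document can also override this value internally, so the header is only a first indication.

```python
import re


def pdf_header_version(data):
    """Return the version from a PDF header such as b'%PDF-1.6', or None if absent."""
    match = re.search(rb"%PDF-(\d+\.\d+)", data[:1024])
    return match.group(1).decode("ascii") if match else None


def read_pdf_version(path):
    """Read the first bytes of a file and return its declared PDF version, if any."""
    with open(path, "rb") as handle:
        return pdf_header_version(handle.read(1024))
```

Running `read_pdf_version` over a directory of PDFs would show which files are declared as 1.6 and are therefore candidates for the errors or warnings described in this review.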
If I had known all this up front, I would not have purchased the program in its current state.
Once you get past these problems on the backend, the frontend appears to work well, in that the PDFs that did index showed up properly in a search.
I feel that the author is not very responsive to these issues, considering he is charging for a product that only half works.
This can be a great product, but it relies on an outdated engine (xpdf). The author should let folks know this up front before they purchase.
Owner's reply
"PDF Indexer used over 20% of cpu resources."
This is the first report of this I have seen.
"It is a widely known issue that xpdf is not capable of reading version 1.6 pdf files"
It does read and index the files. There is just a warning at the beginning of the index.
I responded to this user within 1 hour of his complaint and then never heard from him again.
by Jose_manises on May 29, 2006
I like this component very much. It is really worth the money: easy to install, easy to use, and it really does what it says, indexing and searching PDFs.
If you need to index pdf's consider trying this component.
If you want to know in which webpages I have used it, for reference, send me an email to:
jose@joseargudo.es





