The Joomla! Extensions Directory ™


PDF Indexer (Component, Plugin)

Now Shipping PDF Indexer v3.3!

Completely rewritten for Joomla 1.5.

Allows PDFs to be indexed and searched via the Joomla/Mambo search module. This component indexes the PDFs on your site, and the corresponding plugin (mosbot) makes that index searchable. Each directory that contains PDFs can be set to public, registered, or do not index. Works on servers running Linux, Windows, OS X, and FreeBSD.

Will not work in safe mode, nor when popen is disabled.

Does not work with GoDaddy hosting.


2011-08-19
Reviews: 4
As a digital archivist of historical periodicals and books from the late 19th and early 20th centuries, I find this the best plug-in out there. Stay with the best, forget the rest. The developer offered one-on-one support and definitely earned my donation to him, twice!!!

The installation was flawless and, other than a minor hiccup (perhaps caused by me), it has worked seamlessly from the first time it was installed. This plug-in is worthy of future development and financial support.

I have recommended this plug-in to other digital archivists multiple times and will continue to do so in the future. It is head and shoulders above other similar offerings.
2010-11-19
Reviews: 1
The latest version works great! I contacted the developer about a bug and he responded and fixed the issue the same day! The app will index multiple files in the same directory now. Good deal!
2010-05-18
Reviews: 1
The extension is quite OK, but the DOCman integration could be better. The index does not link to DOCman records at all; instead, it links directly to the PDFs in the /dmdocuments directory, bypassing the entire DOCman permission system.

MrDog, you said you have a small change that you made so that the index will point to DOCman records, could you post that somewhere?

The developer seems to have abandoned PDF Indexer after he had his child. Since PDF Indexer is free software released under GPL, maybe it would be possible to make an updated version that supports at least UTF-8 indexing out of the box?
2010-01-10
Reviews: 2
Just installed this system and it seems to work really well.
An earlier comment about DOCman integration is true: it does not respect DOCman restricted access. But I wrote a very small change that seems to overcome this, so if anyone is interested, contact me.
Owner's reply

Hi MyDog,

Do you mind shooting me an email with the change you wrote? I would like to take a look and include it in my latest release.

Thanks,

Nate Maxfield
nate[at]natemaxfield.com

2009-09-16
Reviews: 1
Very nice component; my only wish is a native 1.5 version.
I encountered a problem reading PDFs in Latin-1 encoding: database population stopped at special characters like ë and à. Because Joomla 1.5 uses UTF-8, you need to extract the text from the PDF in UTF-8 as well. You can do this by changing the pdftotext command line. In admin.file_index.php, near line 350, change
pdftotext \"$original_name\" - 2>&1
to
pdftotext -enc UTF-8 \"$original_name\" - 2>&1
Watch for both the local and the component command.
I don't know if this works for the Windows command.
Great component!
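
The encoding fix described in this review can be illustrated outside the component. The following is a minimal Python sketch, not code from PDF Indexer: the function names are hypothetical, and it assumes the pdftotext binary is on the PATH. It builds the same command as the modified line (the `-enc UTF-8` option, with `-` sending the text to standard output) and passes it as an argument list, so file names containing spaces need no shell quoting.

```python
import subprocess


def build_pdftotext_command(pdf_path, encoding="UTF-8"):
    """Return the argument list that extracts text from pdf_path to stdout."""
    # "-enc" selects the output text encoding; the trailing "-" means stdout.
    return ["pdftotext", "-enc", encoding, pdf_path, "-"]


def extract_text(pdf_path, encoding="UTF-8"):
    """Run pdftotext and return the decoded text (needs pdftotext installed)."""
    result = subprocess.run(
        build_pdftotext_command(pdf_path, encoding),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # mirrors the 2>&1 in the command above
        check=True,
    )
    # Undecodable bytes are replaced rather than aborting the extraction.
    return result.stdout.decode(encoding, errors="replace")
```

Decoding with `errors="replace"` is a design choice for this sketch: a single bad byte yields a placeholder character instead of stopping the run, which is the failure the reviewer saw when database population halted at accented characters.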
2009-05-19
Reviews: 6
Excellent with 1.0.15, but problems with 1.5. In particular, debug mode indicates you need to have a database table called jos_mambots (I copied mine from a site running 1.0.15), and there are admin errors displayed when running fastcgi. No resolution yet.
2009-05-14
Reviews: 1
It's a very useful extension, but it needs a solution for accented characters, which is essential for files in Latin languages.
2009-03-13
Reviews: 3
This extension does what it says it does. But what it doesn't mention is that it doesn't integrate with DOCman. Which is fine. They shouldn't have to mention that.
However, if you use DOCman, be aware that these extensions (PDF Indexer and Doc Indexer) do not care about what security settings you have enabled in DOCman.
So if you have a Private / Restricted / Hidden file in DOCman it won't matter. Using the standard Joomla Search with this addon will find that file and allow ANY USER to open it.

Like I said, it's a good extension and does what it should. But just be aware that it creates a security hole if you're using DOCman.
2008-09-03
Reviews: 9
I have gone through the review section and I am prompted to write this. This component works well with DOCman. To do this one has to install the Plugin (bot_docman_search_1.4.0rc2.zip) from http://joomlacode.org/gf/download/frsrelease/7085/24001/bot_docman_search_1.4.0rc2.zip.

This will enable the standard Joomla! Search to index DOCman documents. Next, follow the instructions on how to index PDFs through PDF Indexer from this page. Now everything will fall in place ...

**In fact I have disabled the standard 'DOCman Search' from my site, since it does not 'highlight' the search items.
2008-08-02
Reviews: 1
This was, without a doubt, the easiest extension to install and configure that I have used. It literally took me less than 5 minutes. I have version 1.0.15 and was looking for something to index our PDF files and incorporate search results from them into the Joomla search. We maintain PDF files of all our school newspapers (dating back to the mid-1930s) as well as our school yearbooks. How wonderful now to be able to search on a last name and retrieve all info for that person from website content, newspapers, and yearbooks! This is wonderful. Thank you so much. What a nice job.
2008-07-23
Reviews: 7
It seems the bad ratings here are either lone cases or problems with older versions. I ran this on J1.5.3 without a single hitch. Easy install, good controls on the back end, no problems at all.

For an organization with half our content tied up in PDF publications, this was a must-have.

Thank you, thank you, thank you.
2008-07-22
Reviews: 2
Thanks to the author for this great extension! It works perfectly.

It would be nice if this extension worked together with the DOCman extension, to provide a full solution to the Joomla community.
2008-02-01
Reviews: 1
Extremely easy to install and configure. Worked great the first time. I'm new to Joomla and thought I'd messed up by publishing the newsletters as PDFs, but because of the way the not-for-profit that owns the site gave them to me, all the formatting would have been lost during conversion to HTML. Adobe made them look great, and PDF Indexer saved me by making it easy to search them. Thanks Scott, great work!
2008-01-09
Reviews: 2
It does exactly what it says, and the price for it is very reasonable. Support has a very fast turn-around!
0 of 1 people found this review helpful
2007-11-03
Reviews: 44
OK, time to review this one.
I bought it a few months ago and was pleased with it, as I use a lot of PDF files.
It indexes well, but I did discover a big security problem.
When people use the search and find content from your PDF, it shows exactly where the files are.
The address isn't hidden in any way.

I contacted the author asking him to take care of this.
He said, "that shouldn't be too difficult to do."

But still nothing happened.
The next excuse was that there were some changes in his private life, and I said OK, I will wait.

A long time after that, same excuse.
Nothing is happening.

So it's very disappointing that some authors don't take complaints seriously, not even with commercial components, in this case anyway.
Owner's reply

First off, the change in my private life is I had my first child. Any first-time parent knows the first two months are extremely hard, and running on 4 hours of sleep isn't ideal for writing code.

Secondly, this has been addressed in the latest version of PDF indexer.

Finally, this customer received a full refund because he misunderstood what the component was meant for.

2007-09-14
Reviews: 7
It's hard to find the words to do this component and search bot's excellent features proper justice. The only thing I had to do with my PDFs to get them indexed correctly was to remove a few security settings and re-save them. After that it indexed the PDFs very fast: about 30 MB of PDFs took about 3 seconds (on a dedicated server). For only 15 USD this is a bargain indeed. Not to give the developer any ideas, but I wouldn't think twice about paying 10 times that =) It is functionality like this that makes Joomla stand out in front of the crowded jungle of other open-source CMSs out there.
2007-05-30
Reviews: 1
I had a couple odd errors when I installed, but I emailed the developer and he provided excellent technical support. Once I got around the issues the extension worked as promised.

Overall, it's a good extension that I would recommend without hesitation. I have almost 2000 pdfs on my site and it had no problem with indexing them all.
2007-04-21
Reviews: 9
Excellent addon for Joomla!. Thanks a lot for your work.
3 of 3 people found this review helpful
2006-12-02
Reviews: 9
I purchased this program and it is not ready for prime time.

The install went as simply as it does for most Joomla components. The initial indexing of 100 PDF files proved to be a resource hog, as it sent warning bells and whistles to my ISP. PDF Indexer used over 20% of CPU resources.

My account was immediately shut down by a Linux script, and I had to call my ISP to have my account reinstated. They told me that it was most likely caused by poorly written code and that they rarely have problems with Joomla (which they advertise).

After the initial indexing the program will skip the previously indexed files and just index new files added. I would suggest that you index maybe 10 files at a time.

I informed the author, and I think he may be looking into this. It would probably be easy to add some sort of timer that allows 5 files to be indexed every minute or so. If you have your own server this will not be an issue, but those of us on shared systems will get hammered by our ISPs. Like I said, this should be a simple fix that adds a couple of lines of code.
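
The timer suggested in this review could look roughly like the sketch below. It is a minimal Python illustration, not code from PDF Indexer: `index_throttled`, `batches`, and the `index_one` callback are hypothetical names, and the defaults of 5 files and 60 seconds simply echo the reviewer's numbers.

```python
import time


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def index_throttled(paths, index_one, batch_size=5, pause_seconds=60.0,
                    sleep=time.sleep):
    """Index paths in small batches, pausing between batches.

    `index_one` is a hypothetical callback that indexes a single file.
    `sleep` is injectable so the pause can be replaced in tests.
    Returns the number of files indexed.
    """
    indexed = 0
    for batch_number, batch in enumerate(batches(paths, batch_size)):
        if batch_number > 0:
            # Pause before every batch except the first to cap CPU bursts.
            sleep(pause_seconds)
        for path in batch:
            index_one(path)
            indexed += 1
    return indexed
```

With 100 files and these defaults the run would be spread over roughly 19 minutes, trading indexing time for a lower sustained load on a shared host.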

The second issue is that most of my PDF files were created with Adobe Acrobat 7. PDF Indexer did not work properly with them and gave errors on every file created with Acrobat 7 (PDF version 1.6 files, which use AES encryption).

The reason is that PDF Indexer uses the program xpdf for its PDF-to-text conversion and then dumps the converted text into your database.

It is a widely known issue that xpdf is not capable of reading version 1.6 pdf files at this point in time. Therefore there were errors with every pdf ver 1.6 file. It works fine with ver 1.5 and below.
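
One way to see which files are affected before indexing is to read the version from each PDF's header. The sketch below is a hedged Python illustration with hypothetical function names. It assumes the `%PDF-x.y` header near the start of the file reflects the real version (a PDF can override it internally), and it takes the 1.5 cutoff from this review, which the owner's reply disputes.

```python
import re

# The file header looks like "%PDF-1.6" and sits in the first bytes of the file.
_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_header_version(data):
    """Return (major, minor) parsed from the header bytes, or None if absent."""
    match = _HEADER.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def needs_attention(path, max_supported=(1, 5)):
    """Report whether a file lacks a header or is newer than max_supported.

    The default cutoff of 1.5 is the reviewer's claim, not a verified limit.
    """
    with open(path, "rb") as handle:
        version = pdf_header_version(handle.read(1024))
    return version is None or version > max_supported
```

Files flagged this way could be re-saved in an older PDF version or simply checked by hand after indexing.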

Had I known all this up front, I would not have purchased the program in its current state.

Once you get past these problems on the backend, the frontend appears to work well in that the pdf's that did index showed up properly in a search.

I feel that the author is not very responsive to these issues, considering he charges for a product that only half works.

This can be a great product, but it relies on an outdated engine (xpdf). The author should let folks know this up front before they purchase.
Owner's reply

"PDF Indexer used over 20% of cpu resources."
This is the first report of this I have seen.

"It is a widely known issue that xpdf is not capable of reading version 1.6 pdf files"
It does read and index the files. There is just a warning at the beginning of the index.

I responded to this user within 1 hour of his complaint and then never heard from him again.

1 of 1 people found this review helpful
2006-05-29
Reviews: 1
I like this component very much. It is really worth the money: easy to install, easy to use, and it really does what it says, indexing and searching inside PDFs.

If you need to index pdf's consider trying this component.

If you want to know in which webpages I have used it, for reference, send me an email to:

jose@joseargudo.es