JCrawler

Features:
- sitemap generation on the fly, no additional plugins needed
- submit the sitemap to 4 search engines (Google, Bing, Ask.com, Moreover)
- Automatic priority calculation based on internal PageRank (experimental)
- shows bad links and pages that cannot be crawled
- with cURL: up to 250 parallel connections to spider URLs (configurable)
- exclude list
- modification of robots.txt with the location of the sitemap
- crawling with cURL or fopen
- saves config in an XML file
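The listing does not say how the experimental priority calculation works. As a rough illustration only, a priority based on an internal PageRank could be computed from the site's own link graph as sketched below. The function names, the damping factor, and the scaling to the 0.1 to 1.0 range are assumptions made for this sketch, not JCrawler's actual code.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over an internal link graph.

    `links` maps each page to the list of pages it links to
    (hypothetical input format chosen for this sketch).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


def to_priorities(rank):
    """Scale ranks so the best page gets 1.0 and no page drops below 0.1."""
    top = max(rank.values())
    return {page: round(max(0.1, value / top), 1) for page, value in rank.items()}


site = {"/": ["/a", "/b"], "/a": ["/"], "/b": ["/"]}
priorities = to_priorities(pagerank(site))
```

Pages that many internal pages link to (typically the home page) end up with the highest sitemap priority.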
There is a new forum and a new dedicated page for JCrawler!
JCrawler is being redeveloped to add even more features.
Note:
- fopen or cURL is required to crawl the site
I used the standard settings and after a few minutes I got an "Internal Server Error" page. When I checked the CPU usage on my hosting server, it quickly reached the maximum allowed limit, held there for a while, dropped to zero, and then went up again. From what I have seen so far, it looks like a crash caused by a lack of CPU power.
So I tried again with "Max parallel connections" lowered from 50 to 30 (I know it's about RAM, not CPU, but I thought that with fewer connections the CPU would not be so overwhelmed). The result was the same: CPU usage too high, everything crashes for a while, and the same CPU usage pattern.
I lowered the settings again. This time: priority 0.3, Max parallel connections 5. Result: an error page with: "Fatal error: Cannot pass parameter 2 by reference in /home/xxxxx/public_html/administrator/components/com_jcrawler/admin.jcrawler.php on line 189". CPU usage again hit the maximum and crashed. Memory usage is OK, far from reaching its limits.
I checked via FTP and did find sitemap.xml, but given the above, I don't trust that it was generated correctly or completely.
I have a paid hosting account that is supposed to be one of the best in its area, yet it can't handle JCrawler. So either the hosting is overrated or JCrawler simply works poorly.
Thanks, but I am uninstalling JCrawler; it doesn't work for me :(.
Thx a lot
Hey,
Thanks for your review.
Yahoo's search results are powered by Bing (formerly MSN) from Microsoft, and AOL's search results come from Google.
JCrawler already submits to both of those search engines. Do you have other ideas?
Greetings, Patrick
Also, this is the only one which actually crawls your site, producing a complete sitemap. Extremely useful if you want to clean your sh404SEF database and rebuild it with the proper URLs.
Simply a "Must Have" extension.
Now, hours later I'm still hoping that my provider can fix this for me as there is nothing I can do myself from here.
NO component should be able to have such drastic consequences for a website, so that's clearly 'very poor'.
Hi,
Thanks for your review.
If you choose too many parallel connections, the web server can become overloaded or go down.
But JCrawler does not delete, change, or add any files (except sitemap.xml), MySQL data, or e-mails. It just reads every page of your website and looks for links. Afterwards it writes the links into a file. That's it.
I'm currently developing JCrawler 2.0; maybe you can get in touch on our forum to resolve your issues.
Thanks, Patrick
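To make the description above concrete (read a page, look for links, write the links into a file), here is a minimal sketch in Python using only the standard library. It is not JCrawler's code: the names `LinkCollector`, `internal_links`, and `build_sitemap` are invented for this illustration, and it works on HTML that has already been fetched rather than opening any connections itself.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from xml.sax.saxutils import escape


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(html, page_url):
    """Return absolute, de-duplicated links that stay on the page's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found = []
    for href in collector.hrefs:
        # Resolve relative links and drop any #fragment part.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            if absolute not in found:
                found.append(absolute)
    return found


def build_sitemap(urls):
    """Render the URLs as a sitemaps.org <urlset> document."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in urls:
        lines.append("  <url><loc>%s</loc></url>" % escape(url))
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"
```

A real crawler would repeat `internal_links` for every newly found URL and write the result of `build_sitemap` to sitemap.xml; nothing else on the server needs to be touched.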
I would suggest the developer make provision for adding more search engines.
All in all, good work. Thumbs up.
Thank You.
I have used several sitemap extensions, but this one is different.
Special thanks to the developers.
Regards
Amirhosein Soltani
The only thing I seem to be having a problem with is my robots.txt file. I have pages blocked by the robots.txt file, but JCrawler is still including them in the sitemap. No real problem at the moment, as they are easy to find and I can delete them manually.
The file is readable, and while the following line works in Google, it doesn't work in the component:
Disallow: /index.php?*page=shop.ask*
Thanks
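For reference, the wildcard behaviour the reviewer expects can be sketched as follows. In Google-style matching, `*` in a Disallow rule stands for any run of characters and a trailing `$` anchors the rule to the end of the URL. The helper names below are hypothetical, and the sketch deliberately ignores Allow rules, user-agent groups, and rule precedence; it only shows the pattern matching itself, not how JCrawler reads robots.txt.

```python
import re


def rule_to_regex(rule):
    """Translate one Disallow pattern into a compiled regular expression."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape everything literally, then let each '*' match any characters.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + pattern + ("$" if anchored else ""))


def is_blocked(path_and_query, disallow_rules):
    """True if any non-empty Disallow rule matches the URL path plus query."""
    return any(
        rule_to_regex(rule).match(path_and_query)
        for rule in disallow_rules
        if rule
    )


rules = ["/index.php?*page=shop.ask*"]
blocked = is_blocked(
    "/index.php?option=com_virtuemart&page=shop.ask&product_id=3", rules
)
```

A crawler that only compares rules as plain prefixes would never match this line, because the literal `*` does not appear in the real URL; that may be why the pages still end up in the sitemap.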
I have installed JCrawler on my Joomla site, but I see that the sitemap.xml file hasn't been updated for a long time. Doesn't JCrawler run automatically and update the sitemap.xml file? What could cause my sitemap.xml not to get updated?
Any help is greatly appreciated.
Thanks,
Samir Patel
...the sitemap.xml does not get updated automatically.
But I'm working on a plugin which recreates the sitemap after new content is added.
Patrick
There's just one little wish: please provide the option to start the crawl from the frontend.
Thank you to the developer.

