Brief manual for Cliver Crawler Host

Deploying Cliver Crawler Host for Windows

Crawler Host for Windows requires the appropriate version of the .NET Framework and Microsoft SQL Server Express. Make sure SQL Server is running.
- Unpack the CrawlerHost folder and place it wherever you like.
- Launch _CrawlerManager.exe. On the first run, it asks for the folder where Crawler Host will keep its logs.
- Right-click the Manager's system tray icon and open the Settings window. If you want Crawler Host to send email notifications, set the SMTP server credentials and the default email address there.
- Open the Crawlers window and configure the crawlers there.

The system is designed to work without human intervention. Never make manual changes to the database, as this may lead to data loss.

The _CrawlerManager process must keep running for as long as you want Crawler Host to operate. To keep Crawler Host operating permanently, you may want to set up Windows autologon and create a task in Task Scheduler that launches _CrawlerManager.exe on Windows startup.

If you need to launch or stop a crawler, use the Manager's commands; doing it outside the Manager is not recommended.
Crawlers window: Configuration fields

id - Unique ID of the crawler. Usually it is the target site.
state - Current mode of the crawler. DISABLED means the crawler is switched off.
site - A comment field containing the target site URL.
command - Command to the crawler:
  • STOP - kill the crawler if it is running and do not launch it while this command stays set;
  • RESTART - stop the crawler if it is running and start it again;
  • FORCE - launch the crawler immediately, ignoring its _next_start_time.
admin_emails - Comma- or newline-separated email addresses where notifications are sent.
run_time_span - Period in seconds between crawler starts.
crawl_product_timeout - Period in seconds within which the running crawler must update data. When it is exceeded, the crawler is considered hung.
restart_delay_if_broken - Period in seconds after which the crawler is restarted if it broke.
comment - Any notes.

These fields can be modified directly in the window. A change is accepted when the edited row loses focus.
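As an illustration only, the sketch below shows one way the configuration fields above could be interpreted. It is not Crawler Host's code: the Crawler class, the is_due and parse_admin_emails functions, the "ENABLED" state value and the broken flag are hypothetical, treating RESTART like FORCE is an assumption, and the scheduling rules Crawler Host actually applies may differ.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Crawler:
    """Hypothetical subset of a Crawlers window row (field names from the manual)."""
    state: str                       # only DISABLED is documented; other values are placeholders
    command: Optional[str]           # None, "STOP", "RESTART" or "FORCE"
    run_time_span: int               # seconds between crawler starts
    restart_delay_if_broken: int     # seconds before a broken crawler is restarted
    session_start_time: datetime     # mirrors _session_start_time
    last_end_time: Optional[datetime] = None  # mirrors _last_end_time
    broken: bool = False             # assumption: marks a session that ended abnormally


def parse_admin_emails(value: str) -> List[str]:
    """Split admin_emails on commas and/or line breaks, dropping empty entries."""
    return [addr.strip() for addr in re.split(r"[,\r\n]+", value) if addr.strip()]


def is_due(c: Crawler, now: datetime) -> bool:
    """Hypothetical check of whether the crawler should be launched at `now`."""
    if c.state == "DISABLED" or c.command == "STOP":
        return False  # switched off, or STOP is still set
    if c.command in ("FORCE", "RESTART"):
        # FORCE launches immediately; RESTART is treated the same here (assumption).
        return True
    if c.broken and c.last_end_time is not None:
        # A broken crawler waits restart_delay_if_broken seconds before restarting.
        return now >= c.last_end_time + timedelta(seconds=c.restart_delay_if_broken)
    # Otherwise the next start is run_time_span seconds after the session start.
    return now >= c.session_start_time + timedelta(seconds=c.run_time_span)
```

For example, with run_time_span set to 3600, a crawler whose session started at midnight is not due at 00:30 but is due at 01:00, while a DISABLED crawler is never due, whatever its command.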
Crawlers window: System fields

_last_session_state - Current state of the crawler.
_next_start_time - Time when the crawler will start.
_session_start_time - Time when the last session was started.
_last_start_time - Time when the crawler was last started. It can be later than _session_start_time if the crawler broke and was restarted.
_last_end_time - Time when the crawler last stopped.
_last_process_id - ID of the last crawler process.
_last_log - Path to the last session log.
_archive - Old system info.
_products_table - Name of the table where the crawler stores data.
_last_product_time - The last time the crawler touched the products table.

These fields reflect the state of the crawlers and cannot be modified.
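The hang rule combines a configuration field (crawl_product_timeout) with a system field (_last_product_time). The sketch below is a hypothetical illustration of that rule, not Crawler Host's implementation; the is_hung function is made up, and measuring from the session start when no product was written in the current session is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_hung(last_product_time: Optional[datetime], session_start_time: datetime,
            crawl_product_timeout: int, now: datetime) -> bool:
    """Hypothetical: a RUNNING crawler counts as hung when it has not touched
    its products table for more than crawl_product_timeout seconds."""
    # Assumption: product writes from an earlier session do not count,
    # so the reference point is never earlier than the session start.
    if last_product_time is None:
        reference = session_start_time
    else:
        reference = max(last_product_time, session_start_time)
    return now - reference > timedelta(seconds=crawl_product_timeout)
```

Note that the timeout must be exceeded, not merely reached, before the crawler is treated as hung.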

Stop - Suspend the Manager service.
Check Now - Process recently made changes immediately.
Settings window

The Settings window allows you to set:
  • the maximum number of crawler threads that Crawler Host can run at the same time;
  • email parameters, which are needed only if you want to receive email notifications.
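The thread limit caps how many crawlers run at once. A minimal sketch of applying such a cap follows; the crawlers_to_launch function is hypothetical, and launching due crawlers in the order given is an assumption, since the manual does not say how Crawler Host chooses among them.

```python
from typing import List, Sequence


def crawlers_to_launch(due: Sequence[str], running: int, max_threads: int) -> List[str]:
    """Hypothetical: start due crawlers only while fewer than max_threads run."""
    free_slots = max(0, max_threads - running)  # never negative if over the limit
    return list(due[:free_slots])  # assumption: first come, first served
```

Crawlers left out wait until a running crawler finishes and frees a slot.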

2006-2019 CliverSoft