Introduction
RegexTreeer (pronounced [redʒeks tri:ə]) is a GUI tool that solves complex parsing tasks by means of superposed regular expressions and standardizes the development of data parsers, which makes this work much easier. It implements an innovative technique of building and managing trees of regexes. It is suitable for parsing any text, and web pages in particular, since a web browser is an integral part of the application.
Background
When the text you want to parse has no strictly defined format, it is often impossible to write a single regex that fetches exactly the required data. Even when such a regex can be written, it may take a grotesque form that is hard to write and even harder to apply. Hence, to solve a complex parsing task, you have to write several regexes that are applied in turn, one after another, either to the text or to the results of the previous regex.
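The idea of applying one regex to the captures of another can be sketched as follows. This is only a minimal illustration in Python, not RegexTreeer code: the sample text, the patterns `RECORD` and `PRICE`, and the function `parse` are all assumptions invented for the example.

```python
import re

# Hypothetical sample input; the format is invented for illustration only.
TEXT = """
Item: Widget | price 10 USD, 9 EUR
Item: Gadget | price 25 USD
"""

# First regex: isolates each record and captures its loosely formatted price part.
RECORD = re.compile(r"Item:\s*(?P<name>\w+)\s*\|\s*price\s+(?P<prices>.*)")

# Second regex: applied only to the "prices" capture of the first regex.
PRICE = re.compile(r"(?P<amount>\d+)\s+(?P<currency>[A-Z]{3})")


def parse(text):
    """Apply RECORD to the text, then PRICE to each 'prices' capture."""
    result = []
    for record in RECORD.finditer(text):
        prices = [
            (int(m.group("amount")), m.group("currency"))
            for m in PRICE.finditer(record.group("prices"))
        ]
        result.append((record.group("name"), prices))
    return result


if __name__ == "__main__":
    print(parse(TEXT))
```

Each of the two patterns stays simple because the second one only ever sees the text that the first one has already narrowed down.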
Debugging such a regex construction is wearisome work. Although there are many regex debugging tools, they are not very handy for superposed regexes because they allow debugging only one regex at a time. That means you have to intercept the captures of the previous regex in order to debug the regex that is applied after it. Since debugging should be performed on many matches to gain confidence in the regex, this becomes a real headache. The same problem arises again when the parser has to be updated after new peculiarities have been found in the input text, which happens often enough.
Another problem is that the parsing code becomes unreadable and intricate because it contains many regexes and superposed operations on the intermediate parsed data. The code around the regexes has to conform to the logic that they dictate, so in most cases you cannot change the regexes without changing the code, and vice versa. As a result, you end up with obscure code that is very hard to maintain.
RegexTreeer was developed to eliminate these problems.
The basic features of RegexTreeer
- a graphical interface for building and debugging collections of superposed regexes, called regex trees;
- storing a regex tree in an XML file of a predefined format (a regex tree file) so that it can be used by many applications;
- exposing the Cliver.Parser library, which can be used within .NET code to perform parsing with regex trees and get the parsed data as a tree-like structure;
- coordinating HTML source code with its view in a web browser, with a view to parsing HTML pages with regular expressions.
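
To make the notion of a regex tree and its tree-like output more concrete, here is a rough Python sketch. It does not reproduce the Cliver.Parser API or the regex tree file format; the class `RegexNode`, the function `apply_tree`, and the sample HTML are hypothetical names and data chosen only for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class RegexNode:
    """One node of a hypothetical regex tree: a named pattern plus child nodes."""
    name: str
    pattern: str
    children: list = field(default_factory=list)


def apply_tree(node, text):
    """Run node.pattern on text, then run each child on every capture of its parent."""
    results = []
    for match in re.finditer(node.pattern, text, re.DOTALL):
        # Use the first capturing group if the pattern has one, else the whole match.
        captured = match.group(1) if match.re.groups else match.group(0)
        results.append({
            "name": node.name,
            "value": captured,
            "children": [
                item
                for child in node.children
                for item in apply_tree(child, captured)
            ],
        })
    return results


# Assumed example: a "row" regex with a "cell" regex superposed on its captures.
tree = RegexNode("row", r"<tr>(.*?)</tr>", [RegexNode("cell", r"<td>(.*?)</td>")])
html = "<table><tr><td>a</td><td>b</td></tr><tr><td>c</td></tr></table>"
parsed = apply_tree(tree, html)
```

In this sketch the result mirrors the shape of the regex tree: every match of a parent regex becomes a node whose children are the matches of the child regexes found inside it.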
Copyright © 2006-2008, Sergey Stoyan, CliverSoft