HTML

Jose Lerebours fpgroups at gmail.com
Tue Nov 12 14:01:15 PST 2019


In an age of WSDL, SOAP, JSON, APIs and tons of other ways to move data 
between two points, parsing HTML documents seems like a self-inflicted 
wound, or like being trapped in an environment where real-life 
solutions are not possible.

Is it possible that there is really no other way to obtain the data?  
Are you just fetching web pages and parsing through the responses to 
get info off of them (google search, amazon.com, ebay.com, etc.)?

Almost everyone out there has an API of some sort. Are you limited to 
parsing through HTML documents to get the info you seek?

In addition, these questions come to mind:

a. Is the HTML expected to be exactly the same every time?
b. Do the objects you need to target in the HTML document have specific 
properties (settings) you can use to find them?
c. How well versed are you in producing HTML documents?  I mean, are you 
familiar with the tags and their respective properties?
d. Is this HTML document to be "uploaded", "emailed" ... !?! Meaning, 
where does the document come from and by which means are you getting it?

If you are dealing with a "fixed" format type HTML document, I am sure 
you can get a lot more specific help if you posted a sample (partial or 
full) for others to review.

I second Mark's remarks: I would stay as far away as possible from 
writing a parser that requires constant tuning and maintenance. HTML 
documents these days are "dynamic", and from one rendering to the next 
the wording and even the element/object IDs can be different (I have 
seen pages where the object IDs appear to be hashed and are not the same 
from one request to the next).
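
If you do end up on the "outside program" route Richard describes below, 
here is a minimal sketch of one way to make it less fragile: walk the 
table structure and ignore IDs entirely. It is written in Python using 
only the standard library (my choice of language, not filePro code), the 
sample HTML and the names TableRows and html_table_to_csv are made up 
for illustration, and it assumes the data you want sits in plain 
<tr>/<td> rows. Real vendor pages will likely need more handling.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        # attrs (id, class, ...) are ignored on purpose: they may change
        # from one request to the next.
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace and line breaks inside a cell.
            text = " ".join("".join(self._cell).split())
            self._row.append(text)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical sample; the id values stand in for hashed IDs.
SAMPLE = """
<table id="t_8f3a9c">
  <tr><th>Item</th><th>Qty</th></tr>
  <tr id="r_51bd"><td>Widget,
      large</td><td> 4 </td></tr>
</table>
"""

if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE), end="")
```

The resulting CSV text could then be written to a file and imported 
like any other delimited file. Because the IDs are never looked at, a 
page that only changes its IDs between requests produces the same 
output; a page that changes its layout will still break this.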

In other words, parsing could be very tricky and more of a headache 
than you may want to deal with.  Wait till you need to start parsing PDF 
documents ;-)

As far as rendering HTML, this is easy!

Now, fP-Tech just released a web-based product so that you can render 
your filePro based data to a browser using, I am guessing, native code. 
Why pay $900 or whatever OneGate costs instead of buying this new 
product directly from fP-Tech?

NOTE: Not advocating against nor in favor of any one product, simply 
saying that if you can push your data using native code w/ no learning 
curve, why go to a 3rd party for a solution ... This is entirely your 
decision to make.

Good luck!


On 11/12/19 3:56 PM, Richard Kreiss via Filepro-list wrote:
> Tony,
>
> I have written a parser to handle HTML files returned by a client's vendor. filePro does not have a native parser for HTML. Therefore you might need to use an outside program to parse the HTML into a TXT or CSV file to import into filePro.
>
> At this point there is no easy way to do this. fP-Tech did work on a program to output HTML but not a parser to import HTML. The issue I have run into is that my client’s vendor required me to send them “clean” HTML but they never returned a clean HTML file for me to work with. My parser program has had to account for the variations in format.
>
> Richard Kreiss
> GCC CONSULTING
> Sent from my iPhone
> _______________________________________________
> Filepro-list mailing list
> Filepro-list at lists.celestial.com
> Subscribe/Unsubscribe/Subscription Changes
> http://mailman.celestial.com/mailman/listinfo/filepro-list

