HTML
Jose Lerebours
fpgroups at gmail.com
Tue Nov 12 14:01:15 PST 2019
In an age of WSDK, SOAP, JSON, APIs and tons of other ways to move data
up and down between two points, parsing HTML documents seems like a
self-inflicted wound, or a sign of being trapped in an environment where
real-life solutions are not possible.
Is it possible that there is really no other way to obtain the data?
Are you just fetching web pages and parsing the response to get info
from them (google search, amazon.com, ebay.com, etc.)?
Almost everyone out there has an API of some sort. Are you limited to
parsing HTML documents to get the info you seek?
In addition, these questions come to mind:
a. Are the HTML documents expected to be exactly the same every time?
b. Do the properties of the target objects in the HTML document have
specific settings you can use?
c. How well versed are you in producing HTML documents? I mean, are you
familiar with the tags and their respective properties?
d. Is this HTML document "uploaded", "emailed" ...?! Meaning, where do
the documents come from and by what means are you getting them?
If you are dealing with a "fixed" format HTML document, I am sure you
can get much more specific help if you post a sample (partial or full)
for others to review.
I second Mark's remarks: I would stay as far away as possible from
writing a parser that requires constant tuning and maintenance. HTML
documents these days are "dynamic", and from one rendering to the next
the wording, and even element/object IDs, are different (I have seen
pages where the object IDs appear to be hashed and are not the same from
one inquiry to the next).
In other words, parsing could be very tricky and more of a headache than
you may want to deal with. Wait till you need to start parsing PDF
documents ;-)
As far as rendering HTML goes, that part is easy!
Now, fP-Tech just released a web-based product so that you can render
your filePro-based data to a browser using, I am guessing, native code.
Why pay $900 or whatever OneGate costs instead of buying this new
product directly from fP-Tech?
NOTE: I am not advocating for or against any one product, simply saying
that if you can push your data using native code with no learning curve,
why go to a third party for a solution? This is entirely your decision
to make.
Good luck!
On 11/12/19 3:56 PM, Richard Kreiss via Filepro-list wrote:
> Tony,
>
> I have written a parser to handle HTML files returned by a client's vendor. filePro does not have a native parser for HTML, so you might need to use an outside program to parse the HTML into a TXT or CSV file to import into filePro.
>
> At this point there is no easy way to do this. fP-Tech did work on a program to output HTML, but not a parser to import HTML. The issue I have run into is that my client’s vendor required me to send them “clean” HTML, but they never returned clean HTML files for me to work with. My parser program has had to account for the variations in format.
>
> Richard Kreiss
> GCC CONSULTING
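For the outside-program route described in the quoted message (parse the HTML into a TXT or CSV file, then import that file into filePro), a minimal sketch might look like the following. It assumes Python 3 with only the standard-library `html.parser` and `csv` modules; `TableToRows` and `html_table_to_csv` are hypothetical names, and it assumes the wanted data sits in plain `<table>` rows. It tolerates stray whitespace and omitted `</td>`/`</tr>` tags, but it does not handle nested tables, rowspan/colspan, or data laid out outside tables. As noted above, any parser tied to a vendor's markup may still need adjusting when that markup changes.

```python
import csv
import io
from html.parser import HTMLParser


class TableToRows(HTMLParser):
    """Collect the text of <td>/<th> cells, grouped into rows by <tr>.

    Hypothetical helper: assumes the data lives in simple, non-nested tables.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into text
        self.rows = []      # finished rows (lists of cell strings)
        self._row = None    # cells of the row currently being built
        self._cell = None   # text fragments of the cell currently being built

    def _end_cell(self):
        if self._cell is not None:
            if self._row is None:
                self._row = []
            # Collapse whitespace/newlines so sloppy markup yields clean values.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row:  # skip rows that had no cells
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()   # an unclosed previous <tr> ends here
            self._row = []
        elif tag in ("td", "th"):
            self._end_cell()  # an unclosed previous <td> ends here
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:  # ignore text outside table cells
            self._cell.append(data)

    def close(self):
        super().close()
        self._end_row()  # flush a row left open at the end of the input


def html_table_to_csv(html_text):
    """Return the table rows found in html_text as CSV text."""
    parser = TableToRows()
    parser.feed(html_text)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()


if __name__ == "__main__":
    sample = ("<table><tr><th>Item</th><th>Qty</th></tr>"
              "<tr><td> Widget,  blue </td><td>3</td></tr></table>")
    print(html_table_to_csv(sample), end="")
```

To save the result, write the returned string to a file opened with `newline=""` so the `csv` module's `\r\n` line endings are not doubled on Windows; how that file is then imported depends on your own filePro setup.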