Guest guest Posted April 14, 2008

For anyone still following this, I now have a more robust version 16:

http://compkarori.no-ip.biz:8090/Autofile

It has an install button that downloads and installs all the necessary files for you (except that you still have to set up the path for Ghostscript manually).

Testing so far: 25 of my PDFs and 8 of someone else's, with 100% recognition of 14 different types of documents.

--
Graham Chiu
http://www.synapsedirect.com
Synapse - the use from anywhere EMR.
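On the manual Ghostscript path step, a quick way to see whether a Ghostscript console executable is already reachable is to search the PATH for it. The following is a minimal Python sketch and not part of Autofile. The executable names are the usual ones (`gswin64c` and `gswin32c` on Windows, `gs` elsewhere); whether Autofile looks for the same names is an assumption.

```python
import shutil

# Usual Ghostscript console executable names: Windows 64-bit, Windows 32-bit, Unix.
GS_CANDIDATES = ("gswin64c", "gswin32c", "gs")


def find_ghostscript(candidates=GS_CANDIDATES):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    found = find_ghostscript()
    print(found or "Ghostscript not found on PATH; add its bin directory manually.")
```

If this prints a path, that is the copy any PATH-based lookup would pick up, which may be an older install rather than a freshly downloaded one.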
Guest guest Posted April 14, 2008

Holy god, Graham, I cannot keep up with you. I just got the WinZip utility working, and now I guess I don't need it... OK, I guess I've been promoted to version 16.

Hey, some of us gotta work and take care of the chickens and do the billing! :-)

Lynn

> From: compkarori@...
> Date: Tue, 15 Apr 2008 00:53:14 +1200
> Subject: Re: Autofile now available
Guest guest Posted April 14, 2008

I'm planning on trying it out sometime later today at work, where I have a Windows computer with Ghostscript and ImageMagick installed. I'd be most interested in using it on faxes. I don't get too many PDFs from other people, and I don't have software that creates PDFs from scans. I do use PDF for creating my printed insurance claims, because PDF is very good for cross-platform printing where you need everything to come out in exactly the right place on a pre-printed paper form.

Have you thought of using PDF "forms"? (I'm sure you have!) There is software for extracting the data from the fields in PDF forms that would not require any OCR. Of course, getting the other party to send you the data in a PDF form would not be easy, especially when you have to deal with so many people sending you so many different types of documents.
Guest guest Posted April 15, 2008

I tried it tonight on Windows. The installation is slick. I already had Ghostscript and ImageMagick, as I also use them in tkFP, but I let it install yours anyway. It seemed to find them in the path. I'm not sure whether it's finding your newer versions or the old versions I already had installed.

I had some faxes in C:\var\spool\fax\incoming. I used that because it's the same incoming fax directory name I use on the Linux version of tkFP, so C:\var\spool\fax\incoming is analogous to /var/spool/fax/incoming on Linux. Anyway, I tried making a new rule by using your rule as a template and changing the title and the text I was looking for to "Tulare District Hospital Lab". I let it run and, of course, it did not seem to recognize anything. It did move all the unrecognized files to C:\var\spool\fax\incoming\unrecognized.

Questions I have so far are:

Does it depend on a file name ending such as .g3, .tiff or .pdf to determine the file type? None of the faxes I have in there end in .tiff, .g3 or .pdf; they end in .01, .02, etc. for page numbers.

The rule looks like it specifies on what lines, and how many characters long, the thing you are looking for is located, plus a pattern to match. Is that right? How do I know what the position of something is going to be if it has not been OCR'd yet? Do I guess, or does it make any difference?

Can you match text with wildcards?

If I gave you a sample document, could you make a rule for me to use on it?

Should I see an image in the box that says "No Image"? I never see an image in there.

Does it move all the files to "unrecognized" before running the OCR? I think it does, because that happens very fast and then it appears to be working hard on one file at a time. My machine is quite slow, so I am not sure I have actually OCR'd all the fax images in the unrecognized folder. I'm letting it run to see what happens.
Do you still need my IP address to run this? Or is it all running locally on my own machine?
Guest guest Posted April 15, 2008

> I tried it tonight on Windows. The installation is slick.
> I already had Ghostscript and ImageMagick, as I use them also in tkFP,
> but I let it install yours anyway. It seemed to find them in the path.
> Not sure if it's finding your newer versions or the old versions I
> already had installed.

I would think ImageMagick would add a new path entry, but in my experience Ghostscript doesn't affect the path, so it is likely finding the old version.

> I had some faxes in C:\var\spool\fax\incoming
> I used that because it's the same incoming fax directory name
> I use on the Linux version of tkFP. So that way
> C:\var\spool\fax\incoming is analogous to /var/spool/fax/incoming
> on the Linux version. Anyway, I tried making a new rule by using
> your rule as a template and changing the title and the text I
> was looking for to "Tulare District Hospital Lab". I let it run and of
> course it did not seem to recognize anything. It did move all the
> unrecognized files to C:\var\spool\fax\incoming\unrecognized
>
> Questions I have so far are:
>
> Does it depend on file name ending such as .g3 or .tiff or .pdf
> to determine file type? All the faxes I have in there do not end
> in .tiff or .g3 or .pdf; they end in .01, .02, etc. for page numbers.

Ahh... yes, it does. If they use an extension like .01, etc., then it will ignore them. You will have to rename them to *.tiff for the moment.

> The rule looks like it is specifying on what lines and how many
> characters long something you are looking for is located, and
> a pattern to match. Is that right?

No. You load up a PDF or TIFF and create a zone in which you are going to OCR. So it has nothing to do with lines or how many characters long something is, but with positions on the page.

> How do I know what the position of something is going to be if it
> has not been OCR'd yet? Do I guess or does it make any difference?

You just need to start some training by creating the rules.
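For the renaming step, a small script can do the work for a whole incoming-fax directory. This is a rough Python sketch, not something shipped with Autofile: the function names are made up for illustration, and it assumes the page files (`.01`, `.02`, ...) already contain TIFF data, since renaming changes only the name and not the file contents.

```python
import re
from pathlib import Path

# A purely numeric extension such as ".01" or ".02" (fax page numbers).
PAGE_SUFFIX = re.compile(r"^\.\d+$")


def tiff_name_for(path):
    """Return the *.tiff path for a page-numbered fax file, or None if it does not qualify."""
    p = Path(path)
    if not PAGE_SUFFIX.match(p.suffix):
        return None
    # Keep the page number in the name so pages do not collide:
    # fax123.01 -> fax123_01.tiff
    return p.with_name(f"{p.stem}_{p.suffix[1:]}.tiff")


def rename_fax_pages(directory, dry_run=True):
    """Plan (and, if dry_run is False, perform) the renames; return (old, new) pairs."""
    plan = []
    for p in sorted(Path(directory).iterdir()):
        target = tiff_name_for(p) if p.is_file() else None
        if target is None or target.exists():
            continue  # not a page file, or the target name is already taken
        plan.append((p, target))
        if not dry_run:
            p.rename(target)
    return plan
```

Calling `rename_fax_pages(r"C:\var\spool\fax\incoming")` first with the default `dry_run=True` lists what would be renamed without touching anything; pass `dry_run=False` once the plan looks right.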
> Can you match text with wildcards?

It does a proximity match on the words it finds, i.e. how close one word is to another.

> If I gave you a sample document, could you make me a rule to use on
> it for me?

Sure.

> Should I see an image in the box that says "No Image"? - I never see
> an image in there.

No, you won't, because it never found a match for the rule to recognise the file, so it never went further.

> Does it move all the files to "unrecognized" before running the OCR -

No.

> Do you still need my IP address to run this? Or is it all running
> locally on my own machine?

No, it's running on my server

--
Graham Chiu
http://www.synapsedirect.com
Synapse - the use from anywhere EMR.
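To make the idea of a proximity match concrete, here is one possible reading of it in Python. It is a sketch of the general technique, not Autofile's actual matching code: the function names and the `max_gap` parameter are invented for illustration. The rule's words must appear in order in the OCR text, with at most `max_gap` other words between neighbours.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def _match_from(words, pos, remaining, max_gap):
    """True if `remaining` can be found in order after index `pos`,
    each word at most `max_gap` words beyond the previous match."""
    if not remaining:
        return True
    window_end = min(len(words), pos + 2 + max_gap)
    for nxt in range(pos + 1, window_end):
        if words[nxt] == remaining[0] and _match_from(words, nxt, remaining[1:], max_gap):
            return True
    return False


def proximity_match(ocr_text, phrase, max_gap=2):
    """True if the words of `phrase` occur in order and close together in `ocr_text`."""
    words = tokenize(ocr_text)
    wanted = tokenize(phrase)
    if not wanted:
        return False
    return any(
        word == wanted[0] and _match_from(words, i, wanted[1:], max_gap)
        for i, word in enumerate(words)
    )
```

With this reading, a header like "TULARE DISTRICT HOSPITAL - LAB REPORT" matches the phrase "Tulare District Hospital Lab", while the same four words scattered across a paragraph do not. It tolerates extra words but not OCR misreadings inside a word, which would need fuzzy comparison on top.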
Guest guest Posted April 18, 2008

Graham,

I tried the latest Autofile version 17 with the rules file you sent me, based on the fax lab report image I sent you. The system works: it recognized the file when I put it in with a bunch of others. I think you are on to a very useful tool. I have to learn how to make my own rules now so I can set it up for some other types of faxes. I think it would work well if the fax file you are receiving is generated by a computer from data the computer has stored, as opposed to something that was scanned by hand (at least with the type of scanner or fax machine I have).
Guest guest Posted April 18, 2008

Graham,

I was wondering whether, with lab reports, one might even be able to use your setup to extract the actual results data from the OCR'd image for electronic filing in the EMR in a structured format such as SQL or XML, for later use in generating flow charts, quality measurement, etc. The data could be extracted, and then the user could verify that it matches the data in the image file before allowing it to be filed or signed off.
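As a rough illustration of that extract-then-verify idea, the Python sketch below turns OCR'd text into structured records that start out unverified. Everything about the input is an assumption: the line layout (`<test name> <value> <unit> <low>-<high>`) and the sample values are made up for the example, and real lab reports would each need their own pattern.

```python
import json
import re

# Hypothetical line layout: "<test name> <numeric value> <unit> <low>-<high>"
RESULT_LINE = re.compile(
    r"^(?P<test>[A-Za-z][A-Za-z ]*?)\s+"
    r"(?P<value>\d+(?:\.\d+)?)\s+"
    r"(?P<unit>\S+)\s+"
    r"(?P<low>\d+(?:\.\d+)?)\s*-\s*(?P<high>\d+(?:\.\d+)?)$"
)


def parse_results(ocr_text):
    """Return one dict per line that looks like a result; other lines are skipped."""
    results = []
    for line in ocr_text.splitlines():
        m = RESULT_LINE.match(line.strip())
        if not m:
            continue
        value, low, high = (float(m[k]) for k in ("value", "low", "high"))
        results.append({
            "test": m["test"].strip(),
            "value": value,
            "unit": m["unit"],
            "low": low,
            "high": high,
            "in_range": low <= value <= high,
            # Stays False until a person has checked the record against the image.
            "verified": False,
        })
    return results


# Made-up sample text standing in for OCR output.
SAMPLE = """\
Glucose 95 mg/dL 70-99
Potassium 5.6 mmol/L 3.5-5.1
Page 1 of 1
"""

if __name__ == "__main__":
    print(json.dumps(parse_results(SAMPLE), indent=2))
```

The `verified` flag is the important design point: nothing parsed from OCR should be filed or signed off until someone has compared it with the original image, since a single misread digit changes a result.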
Guest guest Posted April 18, 2008

> Graham,
>
> I was thinking with lab reports, whether one might even be able to
> extract the actual results data from the OCR'd image using your set up

Here's a video showing exactly that ...

http://synapse-movies.s3.amazonaws.com/synapse-scanning.wmv

--
Graham Chiu
http://www.synapsedirect.com
Synapse - the use from anywhere EMR.
Guest guest Posted April 19, 2008

Graham,

Very nice. Is that "tesseract" you are using for the OCR engine in that demo? I think I can do that in Tcl/Tk also. GOCR comes with a front-end GUI that was actually written in Tcl/Tk the last time I looked. Since I can't get an HL7 feed from my hospital lab, this might be the next best thing. People should look at your video demos and see that you can do some really good stuff with low-cost or no-cost software.
Guest guest Posted April 19, 2008

The problem is how to de-identify the data in the video and still get it to demonstrate!

> It's a commercial OCR web service I use at this stage, as I didn't
> find tesseract good enough, but it is being developed actively so it
> might at some stage replace the commercial service.

--
Graham Chiu
http://www.synapsedirect.com
Synapse - the use from anywhere EMR.