SUMMARY: Converting a PDF file to text in Solaris 10

From: Benjamin DeMora <Benjamin.DeMora_at_vivista.sungard.com>
Date: Thu Jul 06 2006 - 07:22:05 EDT
OK - converting PDF files to ASCII text files within Solaris...

This can easily be done using pdftotext, which ships as part of the xpdf
statically linked precompiled binary available from
http://www.foolabs.com/xpdf/

One thing to note - this conversion program can leave long runs of
extra space characters in the resulting file. Each run can be collapsed
to a single space by compiling and running a quick C program, which takes
the text file as its argument and writes the result to standard output:

-------------begin space.c-------------------

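#include <stdio.h>
#include <stdlib.h>

/* Collapse each run of consecutive spaces in the named file to a single
 * space and write the result to standard output. */
int main(int argc, char *argv[]) {
  FILE *fp;
  int c;
  int spaceOn = 0;   /* 1 while inside a run of spaces */

  if (argc < 2) {
    fprintf(stderr, "usage: %s file\n", argv[0]);
    exit(1);
  }
  fp = fopen(argv[1], "r");
  if (!fp) {
    perror(argv[1]);
    exit(1);
  }
  while ((c = getc(fp)) != EOF) {
    if (c != ' ') {
      putchar(c);
      spaceOn = 0;
    }
    else if (spaceOn == 0) {
      /* first space of a run: keep it, skip the rest */
      putchar(c);
      spaceOn = 1;
    }
  }
  fclose(fp);
  return 0;
}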
----------end space.c------------------
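
For anyone who would rather run the cleanup as a filter, here is a rough
sketch of the same idea written as a function on two streams. It is a
variant, not the program above: the name squeeze_stream is made up for
illustration, and unlike space.c it also drops spaces that sit directly
before a newline or at the end of the input.

```c
#include <stdio.h>

/* Hypothetical helper (not part of xpdf or of space.c above).
 * Copy `in` to `out`, writing one space for each run of spaces and
 * dropping runs that directly precede a newline or end of input. */
void squeeze_stream(FILE *in, FILE *out)
{
    int c;
    int pending = 0;   /* 1 if a space was read but not yet written */

    while ((c = getc(in)) != EOF) {
        if (c == ' ') {
            pending = 1;          /* remember the run, write nothing yet */
            continue;
        }
        if (pending && c != '\n')
            putc(' ', out);       /* run was followed by text: keep one */
        pending = 0;
        putc(c, out);
    }
    /* a run still pending at EOF is trailing space and is dropped */
}
```

Called from a main() as squeeze_stream(stdin, stdout), this should work
at the end of a pipe. pdftotext writes to standard output when the output
file name is given as "-", so something like
"pdftotext file.pdf - | ./squeeze" would avoid the intermediate file
(the binary name squeeze is only an example).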

-----------

Benjamin J de Mora
UNIX Systems Engineer
Systems Management
SunGard Vivista

Received on Thu Jul 6 07:22:36 2006
