Add a newline before each tag to make the file easier to fiddle with:
mh@schrute /Volumes/data/Users/mh/so/xml --> cat splitxml.c
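#include <stdio.h>

/* Copy stdin to stdout, starting a new line before every '<' except the first. */
int main(void)
{
    int c;
    int first = 1;

    while ((c = getchar()) != EOF) {
        if (c == '<') {
            if (first == 1)
                first = 0;      /* no leading blank line before the first tag */
            else
                putchar('\n');
        }
        putchar(c);
    }
    return 0;
}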
Python's xml.sax reports errors on numeric character references such as these. To list the files that contain them:
grep -l '&#x[0-9][0-9];' *.xml
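To see where the parser actually objects, rather than guessing from the grep pattern, something like the following might help. It is only a sketch: first_sax_error is a made-up helper name, and it assumes each file fits in memory.

```python
import sys
import xml.sax
from xml.sax.handler import ContentHandler


def first_sax_error(data: bytes):
    """Return (line, column, message) for the first parse error, or None.

    first_sax_error is a hypothetical helper, not part of xml.sax.
    """
    try:
        # A bare ContentHandler ignores all events; we only care about errors.
        xml.sax.parseString(data, ContentHandler())
    except xml.sax.SAXParseException as err:
        return err.getLineNumber(), err.getColumnNumber(), err.getMessage()
    return None


if __name__ == "__main__":
    # Usage (assumed): python check.py *.xml
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            problem = first_sax_error(fh.read())
        if problem is not None:
            print(path, *problem)
```

After running the files through splitxml, the reported line number points at a single tag, which makes the offending reference easy to find.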
To fix, delete them in place:
perl -pi -e 's/&#x[0-9][0-9];//g' *.xml
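Note that the pattern above matches exactly two decimal digits after &#x, so it misses references containing hex letters and also deletes legal ones such as &#x20;. A narrower alternative, sketched here in Python rather than Perl, removes only references to code points outside the XML 1.0 Char range. The function names are invented for illustration, and this is not what the one-liner does.

```python
import re

# Matches hex (&#x1F;) and decimal (&#31;) numeric character references.
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def _is_xml_char(cp: int) -> bool:
    """True if cp is allowed by the XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def strip_invalid_refs(text: str) -> str:
    """Delete character references to code points XML 1.0 forbids.

    strip_invalid_refs is a hypothetical helper; valid references are kept.
    """
    def repl(match):
        hex_digits, dec_digits = match.group(1), match.group(2)
        cp = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
        return match.group(0) if _is_xml_char(cp) else ""

    return _CHAR_REF.sub(repl, text)
```

For example, strip_invalid_refs("a&#x1F;b&#x41;c") drops the control-character reference and leaves &#x41; alone.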