Dear All,
I'm new to Calibre, but those of you who are not surely know the problem of broken lines when converting PDF to ePUB: <BR> tags appear seemingly at random and split the text into thousands of fragments, which looks odd.
This article (https://dearauthor.com/ebooks/calibr...nversion-tips/) suggests using Heuristic Processing during conversion to get rid of the <BR>s, but it didn't work for me: I tried values from 0.4 to 0.6 with absolutely no result.
The same article proposes using the Search & Replace function, and that worked in my case! I used the following pattern: \. +<br>(*SKIP)(*FAIL)|\<br>|\d +<br>
I assumed that <BR>s after a dot (".") were the author's intended start of a new passage, so I didn't touch them (\. +<br>(*SKIP)), while standalone <BR>s (\<br>) and <BR>s which follow any word (\d +<br>) were replaced with nothing (= deleted), as they were almost always breaking a sentence into useless fragments.
Everything would have been perfectly fine, except for one thing: the above-mentioned pattern also deletes "useful" <BR>s after headlines, which are usually marked with a <b> tag (<b>THIS IS HEADLINE </b><br>), and after paragraphs (chapters???), which are marked with an <a id> tag (<a id="p8"></a> <br>).
So, what I need is to add an exception to my pattern so that <BR>s are not deleted when they follow </a> or </b> tags. I played around with quite a number of different variants, but still can't find my Grail. Possibly the (*SKIP)(*FAIL) construct does not support multiple skip conditions: I already skip one case and want to add two more, so three in total.
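To make the goal concrete, here is a minimal Python sketch of the behaviour I'm after. It is only an illustration under some assumptions: it uses Python's standard `re` module, which has no (*SKIP)(*FAIL), so the "skip" decision is made in a replacement callback instead. The names `BREAK`, `drop_stray_breaks` and `sample` are made up for this example, and the pattern is not something that can be pasted into Calibre's Search & Replace as-is.

```python
import re

# Optional group 1 captures the context that makes a <br> worth keeping:
# a closing </a> or </b> tag, or a full stop, each with optional whitespace.
BREAK = re.compile(r"(</[ab]>\s*|\.\s*)?<br\s*/?>", re.IGNORECASE)


def drop_stray_breaks(html: str) -> str:
    """Delete <br> tags unless they follow '.', </a> or </b>."""
    def decide(match: re.Match) -> str:
        # Keep the whole match when a protecting context was captured,
        # otherwise replace the bare <br> with nothing.
        return match.group(0) if match.group(1) else ""
    return BREAK.sub(decide, html)


# Hypothetical input modelled on the fragments quoted above.
sample = (
    'First line <br>continues here. <br>'
    '<b>THIS IS HEADLINE </b><br>'
    '<a id="p8"></a> <br>'
    'Second para 12 <br>goes on.'
)
print(drop_stray_breaks(sample))
```

In this sample only the two <br>s in the middle of a sentence are removed; the ones after the dot, after </b> and after </a> stay where they are.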
Any thoughts?