HTML – How to Join Texts and Optimize Formatting with Algorithms

algorithmformattinghtml

I have text objects (text1, text2, etc) with formatting information (eg. [bold, italic]). Now I want to concat these texts and format them in HTML.

For a simple case it would transform the following

text1[bold,italic]
text2[]
text3[bold]

into this HTML:

<b><i>text1</i></b>
text2
<b>text3</b>

But I want to join formattings where possible, not format each text individually. For example the following

text1[bold,italic]
text2[bold]
text3[]

Should result in this HTML:

<b>
  <i>text1</i>
  text2
</b>
text3

Additionally I want to wrap the "longest" formatting to the "outside" of the DOM. For example in the following case, italics is the longer formatting chain and thus wraps around the bold element:

text1[bold,italic]
text2[italic]
text3[]

Result:

<i>
  <b>text1</b>
  text2
</i>
text3

What would be a good algorithmic approach to solve this problem? Answers in pseudocode or any programming language would be helpful.

Best Answer

For minimizing the total number of tags you have to write, the simple greedy algorithm is optimal. At each position in the text:

  1. If any formatting is turning off, write out end tags matching the preceding start tags until all the required formatting is turned off; then
  2. Write start tags for any formatting that needs to turn on, in descending order of the length of the following text that it covers.