PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly
Posted: Wed Dec 08, 2010 12:38 pm

Joined: Thu Oct 16, 2008 1:54 pm
Posts: 14
I'm using MigraDoc to insert user-specified text into a PDF within a specified box. If I don't insert soft hyphens, long words (e.g. URLs and email addresses) are not wrapped onto new lines and escape the bounds of the box. If I do insert soft hyphens, MigraDoc breaks some lines in the middle of a word instead of at a space.


For example, without soft hyphens I sometimes get:
Code:
+-----------------------------+
|thisisalongemailaddress@somedomain.com
+-----------------------------+

which I want to be rendered as:
Code:
+-----------------------------+
|thisisalongemailaddress@some |
|domain.com                   |
+-----------------------------+



With soft hyphens, I get:
Code:
+-----------------------------+
|This is a line of text conta-|
|ining spaces which should wr-|
|ap at a space.               |
+-----------------------------+

which I want to be rendered as:
Code:
+-----------------------------+
|This is a line of text       |
|containing spaces which      |
|should wrap at a space.      |
+-----------------------------+



Is there no way to make MigraDoc wrap long unbroken words without messing up the wrapping of normal lines? :?


Posted: Wed Dec 08, 2010 3:21 pm
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3100
Location: Cologne, Germany
Soft hyphens should be placed between syllables (so use "con-tains" and "wrap", not "c-o-n-t-a-i-n-s" and "w-r-a-p").

MigraDoc breaks at whitespace and hyphens. If you find a non-letter followed by a letter or digit, insert a soft hyphen there (e-mail addresses will then break: "test@-test.-com").
For e-mail addresses a "breakable non-space" would be better than a soft hyphen, but that is not (yet) supported by MigraDoc. If an e-mail address breaks there will be a visible hyphen that is not part of the original e-mail address.
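
A minimal sketch of that rule, assuming the text is preprocessed before it is handed to MigraDoc. AddSoftHyphens is a hypothetical helper name, not a MigraDoc API; it inserts U+00AD (the soft hyphen character) wherever a character that is not a letter, digit or whitespace is directly followed by a letter or digit:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of MigraDoc): inserts a soft hyphen (U+00AD)
// at every position where a non-letter, non-digit, non-whitespace character
// is directly followed by a letter or digit.
static string AddSoftHyphens(string text) =>
    Regex.Replace(text, @"(?<=[^\p{L}\p{Nd}\s\u00AD])(?=[\p{L}\p{Nd}])", "\u00AD");

// Show the soft hyphens as visible hyphens for the demo only.
Console.WriteLine(AddSoftHyphens("test@test.com").Replace("\u00AD", "-"));
// prints test@-test.-com
```

Whitespace is excluded on purpose, because MigraDoc already breaks there and an extra soft hyphen would add nothing.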

If a long word doesn't fit, MigraDoc doesn't break it (unlike Word which will break it). Both ways are imperfect.

If there's a preview, tell the user to insert spaces or soft hyphens to get optimal line breaks.

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Wed Dec 08, 2010 5:15 pm
Supporter

Joined: Thu May 27, 2010 7:40 pm
Posts: 59
Location: New Hampshire, USA
Hello, (I edited to shorten it up a bit)

I am looking for this solution as well. The problem I have with soft-hyphens is that we'd have to have some routine to "soft-hyphenate" everything since we don't know what will fit and what won't until the layout is in progress. In an interactive editor the user can decide where to break, but many apps don't have that luxury.

I was thinking that I'd like a way for the paragraph rendering/wrapping code to forcibly wrap at a character boundary, i.e. when no soft hyphen is present, break the word at the first character that won't fit rather than letting it overlap its boundary.

In tables, this results in text overlapping the text in the next cell, which can be impossible to read, or text that renders off the page, which is also impossible to read without using Acrobat and a boatload of tricks to get it back on the page... :)

As mentioned, MS Word forces content to stay within tables and columns (sometimes wrapping at the character boundary) when they don't fit.

So, maybe the challenge is to locate (via measurement) the last character in a given string of formatted text that fits into a supplied rectangle (width), then break the line/string at that point (breaking at soft hyphens if present would be a bonus). MigraDoc handles the tricky part of measuring/wrapping the various types of non-text content, so we'd only need to resort to forced character wrapping when a word itself is too wide for the whole cell, text frame, page width (whatever the parent bounding area is).

Is anyone aware of a method somewhere in PDFsharp or Migradoc to measure a formatted string and return the index of the character that doesn't fit the supplied X dimension?
Perhaps using the Graphics.MeasureString() overload that returns the number of charactersFitted could be of help?

-Jeff


Posted: Mon Dec 13, 2010 3:30 pm

Joined: Thu Oct 16, 2008 1:54 pm
Posts: 14
Thomas Hoevel wrote:
tell the user to insert spaces or soft hyphens to get optimal line breaks


Not really an option - our users are idiots! :roll:


jeffhare wrote:
Is anyone aware of a method somewhere in PDFsharp or Migradoc to measure a formatted string and return the index of the character that doesn't fit the supplied X dimension?


I managed to find a partial workaround using orestone's code. [1]

I use regular expressions to match each string of non-breaking characters ([^\s-]+), use the TextMeasurement class to measure the string and, if it's too long for the space, use NHunspell [2] to insert soft hyphens.

It's not perfect, but it does what I need for now.
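
Roughly, the idea looks like the sketch below. The measure and hyphenate delegates are stand-ins I made up for this post: in the real code the first would call the TextMeasurement class and the second would call NHunspell. The toy versions only exist so the snippet runs on its own.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: hyphenates only those runs of non-breaking characters
// that are wider than maxWidth. 'measure' and 'hyphenate' are assumptions
// standing in for TextMeasurement and NHunspell.
static string HyphenateLongRuns(string text, double maxWidth,
    Func<string, double> measure, Func<string, string> hyphenate) =>
    Regex.Replace(text, @"[^\s-]+",
        m => measure(m.Value) > maxWidth ? hyphenate(m.Value) : m.Value);

// Toy stand-ins: every character is 1 unit wide, and a soft hyphen
// is inserted after every 4th character (except at the very end).
Func<string, double> toyMeasure = s => s.Length;
Func<string, string> toyHyphenate = s => Regex.Replace(s, @".{4}(?=.)", "$0\u00AD");

string result = HyphenateLongRuns("short verylongword", 6, toyMeasure, toyHyphenate);
Console.WriteLine(result.Replace("\u00AD", "-"));
// prints short very-long-word
```

Words that already fit are left alone, so normal lines still wrap at spaces.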

[1] http://forum.pdfsharp.net/viewtopic.php?f=2&t=747#p1990
[2] http://nhunspell.sourceforge.net/


Posted: Tue Dec 14, 2010 3:20 pm
Supporter

Joined: Thu May 27, 2010 7:40 pm
Posts: 59
Location: New Hampshire, USA
Thanks Richard,

I have tested the solution provided in [1] and it seems promising. There are some degenerate cases that it will probably still fail with, and maybe there's another solution.

The main corner case I can think of offhand is when a 'word' in the document is rendered with mixed formatting, like partially underlined, changes style or font in the middle.

ie: ThisVeryLongwordisusingmixedstyle

The above word would actually be about 5 different FormattedText elements in the paragraph. Each word tested/measured would likely fit, yet when they're all rendered together, they probably wouldn't wrap. I could be wrong, so I'll have to test this to know for sure. Perhaps the trick is to use this technique down in migradoc.

Thanks again!
-Jeff


Posted: Sat Mar 05, 2011 2:18 am
Supporter

Joined: Thu May 27, 2010 7:40 pm
Posts: 59
Location: New Hampshire, USA
So, I have developed a rather simple solution to this that folds pretty well into the MigraDoc paragraph formatter.

The solution is a very simple overload of the XGraphics.MeasureString() method that takes two extra parameters: one for the max width, and an out parameter for the number of characters that actually fit. It returns the width of either the whole string or just the part that fits.

A little magic use of the "Tag" field and we can easily convey how much of the word can be formatted in the available space. The remainder of the word gets processed again and may wrap more times the same way.

-Jeff


Posted: Sun Mar 13, 2011 9:11 am

Joined: Sun Mar 13, 2011 8:31 am
Posts: 1
I have a similar problem. Could you please post your solution / overload method?

Thanks!

Ronny


Posted: Wed Mar 16, 2011 11:11 pm
Supporter

Joined: Thu May 27, 2010 7:40 pm
Posts: 59
Location: New Hampshire, USA
My solution to the MigraDoc forced word wrapping breaks down this way. (I should add that I know this can be optimized, but I haven't done that yet.)

Please let me know where I could clarify this explanation. I haven't proofread it very carefully yet.

* A new XGraphics.MeasureString() overload that takes a max-width parameter and an out parameter that is set to the number of characters that actually fit (fewer than the word length if the word was too long).
* A new enum value for the case where forced wrapping is necessary: FormatResult.WordWrap
* Tweaked ParagraphRenderer.FormatWord(string word) and ParagraphRenderer.MeasureString(string word) methods that use the new XGraphics.MeasureString() overload.
* A handler for the WordWrap case in ParagraphRenderer.Format()

Here's a rough explanation that may make more sense after looking at the code:

When Format() iterates over all the words (currentLeaf.Current) in the current paragraph, it calls FormatElement() which eventually calls FormatWord(), then FormatAsWord() on each word. We need to get in the middle here and break off the piece of the word that fits before it gets to FormatAsWord() when it's too long.

So, FormatWord() measures the string, ultimately using the new XGraphics method, and if it fits, FormatAsWord(width) does the formatting normally, otherwise, it returns the new state FormatResult.WordWrap instead.

When FormatResult.WordWrap is returned to the Format() method, the new case handler below picks out just the part of the word that fits, changes the currentLeaf to contain only the fitting part, and inserts the rest of the word as a new leaf right after it. Format() then retries FormatWord() on the shortened leaf (which obviously fits now). The remaining part of the word is processed the same way when the formatter reaches it, so each pass inserts a new leaf into the paragraph elements for the next part that fits. This repetition allows, say, a Chinese document with no spaces or punctuation to continuously wrap into the available area.

The new XGraphics routine measures and returns the width of the part of the word that fits the available area. If it doesn't all fit, also return how many characters do fit. What is KEY here is that the currentLeaf.Current.Tag property is used to carry the number of fitting characters from the XGraphics measurer back to the Formatter in order for it to split off the fitting part of the word.

Here's the code: Only 2 files get modified...

In PDFsharp\code\PdfSharp\PdfSharp.Drawing\XGraphics.cs
Code:
/// <summary>
/// This is a special version of MeasureString that returns the width of the
/// portion of the string that fits.
/// </summary>
public XSize MeasureString(string text, XFont font, double desWidth, out int numFittingCharacters)
{
#if GDI
    // Realize the GDI font once instead of on every measurement.
    var gdiFont = font.RealizeGdiFont();

    // Measure the supplied string. If it fits, the loop below is skipped
    // and numFittingCharacters stays at text.Length.
    SizeF size = gfx.MeasureString(text, gdiFont);
    numFittingCharacters = text.Length;

    // Start at the end of the string and keep shortening until it fits.
    // numFittingCharacters always matches the substring measured last.
    // At least one character is kept so the caller still makes progress
    // when a single character is wider than desWidth.
    while (numFittingCharacters > 1 && size.Width > desWidth)
    {
        numFittingCharacters--;
        size = gfx.MeasureString(text.Substring(0, numFittingCharacters), gdiFont);
    }
    return XSize.FromSizeF(size);
#else
    // This overload relies on System.Drawing and only works in GDI builds.
    throw new System.NotSupportedException();
#endif
}
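
Since I mentioned this can be optimized: the shortening loop measures once per dropped character. A possible alternative, sketched here outside of PDFsharp, is a binary search over the prefix length. CountFittingChars and the measure delegate are hypothetical names for illustration only, and the sketch assumes that a longer prefix is never narrower than a shorter one.

```csharp
using System;

// Hypothetical helper (not part of PDFsharp): returns the length of the
// longest prefix of 'text' whose measured width is <= maxWidth.
// Assumes the measured width never shrinks as the prefix gets longer.
static int CountFittingChars(string text, double maxWidth, Func<string, double> measure)
{
    int lo = 0, hi = text.Length;          // invariant: a prefix of length lo fits
    while (lo < hi)
    {
        int mid = lo + (hi - lo + 1) / 2;  // round up so the loop always advances
        if (measure(text.Substring(0, mid)) <= maxWidth)
            lo = mid;                      // mid characters fit, try more
        else
            hi = mid - 1;                  // too wide, try fewer
    }
    return lo;
}

// Toy measurement: every character is 5 units wide.
Func<string, double> toyMeasure = s => s.Length * 5.0;
Console.WriteLine(CountFittingChars("thisisalongemailaddress@somedomain.com", 60, toyMeasure));
// prints 12
```

Unlike the overload above, this returns 0 when not even one character fits, so a caller would have to force at least one character itself to avoid looping forever.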



The rest of the changes are in MigraDoc\code\MigraDoc.Rendering\MigraDoc.Rendering\ParagraphRenderer.cs

Add WordWrap to the existing FormatResult enum:
Code:
/// <summary>
/// Results that can occur when processing a paragraph element
/// during formatting.
/// </summary>
enum FormatResult
{
    /// <summary>
    /// Ignore the current element during formatting.
    /// </summary>
    Ignore,

    /// <summary>
    /// Continue with the next element within the same line.
    /// </summary>
    Continue,

    /// <summary>
    /// Start a new line from the current object on.
    /// </summary>
    NewLine,

    /// <summary>
    /// Break formatting and continue in a new area (e.g. a new page).
    /// </summary>
    NewArea,

    /// <summary>
    /// Only part of the word fit. Need to split the word at the specified
    /// location and insert the remaining word after this one.
    /// </summary>
    WordWrap
}



Find this method: internal override void Format(Area area, FormatInfo previousFormatInfo)
Find the switch(result) statement in this method and add a new handler for this FormatResult.WordWrap case.

Code:
case FormatResult.WordWrap:
    {
        // MeasureString() stores the number of fitting characters in the Tag.
        if (currentLeaf.Current.Tag is int)
        {
            int fittingCharacters = (int)currentLeaf.Current.Tag;
            currentLeaf.Current.Tag = null;

            // Check for null and empty content before touching Content.
            Text txtObject = currentLeaf.Current as Text;
            if (txtObject == null || string.IsNullOrEmpty(txtObject.Content))
            {
                Debug.WriteLine("empty string!");
            }
            else
            {
                Debug.WriteLine("CurrentWord Before: " + txtObject.Content);
                string fits = txtObject.Content.Substring(0, fittingCharacters);
                string remaining = txtObject.Content.Substring(fittingCharacters);
                FormattedText parent = DocumentRelations.GetParentOfType(currentLeaf.Current, typeof(FormattedText)) as FormattedText;
                if (parent != null)
                {
                    // Keep the fitting part in this leaf and insert the rest right after it.
                    int currentWordIndex = parent.Elements.IndexOf(currentLeaf.Current);
                    parent.Elements.InsertObject(currentWordIndex + 1, new Text(remaining));
                    txtObject.Content = fits;
                }
                Debug.WriteLine("CurrentWord After: " + txtObject.Content);
            }
        }
    }
    break;
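
To see the repeated splitting in isolation, here is a small standalone sketch that has nothing to do with the MigraDoc classes. SplitToFit and the measure delegate are hypothetical names; the toy measurement treats every character as 1 unit wide, and at least one character is kept per piece so the loop always terminates.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of MigraDoc): cuts 'word' into pieces that
// each fit into maxWidth, the way the WordWrap case splits a leaf repeatedly.
static List<string> SplitToFit(string word, double maxWidth, Func<string, double> measure)
{
    var pieces = new List<string>();
    string remaining = word;
    while (remaining.Length > 0)
    {
        // Shorten from the end until the prefix fits; always keep one character.
        int count = remaining.Length;
        while (count > 1 && measure(remaining.Substring(0, count)) > maxWidth)
            count--;
        pieces.Add(remaining.Substring(0, count));
        remaining = remaining.Substring(count);
    }
    return pieces;
}

// Toy measurement: every character is 1 unit wide.
Func<string, double> toyMeasure = s => s.Length;
var lines = SplitToFit("thisisalongemailaddress@somedomain.com", 28, toyMeasure);
Console.WriteLine(string.Join("|", lines));
// prints thisisalongemailaddress@some|domain.com
```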



In the same file as above, replace the FormatResult FormatWord(string word) method with this new version:
Code:
/// <summary>
/// Helper function for formatting word-like elements like text and fields.
/// </summary>
FormatResult FormatWord(string word)
{
    XUnit width = MeasureString(word);

    if (currentLeaf.Current.Tag != null)
        return FormatResult.WordWrap;

    if (width > 0)
        return FormatAsWord(width);
    else
        return FormatResult.Ignore;
}


Again, in the same file as above, replace the XUnit MeasureString(string word) method with this:

Code:
XUnit MeasureString(string word)
{
    if (string.IsNullOrEmpty(word))
        return 0;

    int len = word.Length;
    int numFittingCharacters = 0;
    XUnit width=0;
    XFont xFont = CurrentFont;

    // Determine how many characters of this word will fit in the supplied area.
    try
    {
        if (formattingArea != null)
        {
            width = Gfx.MeasureString(word, xFont, formattingArea.Width.Point, out numFittingCharacters).Width;
            if (numFittingCharacters < len && numFittingCharacters > 0)
            {
                // Only set this when just a portion of the word fits
                currentLeaf.Current.Tag = numFittingCharacters;
            }
        }
        else
            width = Gfx.MeasureString(word, xFont, StringFormat).Width;
    }
    catch (Exception ex)
    {
        Debug.WriteLine("Null formatting area?" + ex.Message);
    }

    Font font = CurrentDomFont;

    if (font.Subscript || font.Superscript)
        width *= FontHandler.GetSubSuperScaling(xFont);

    return width;
}


That should pretty much do it.


Posted: Sun Nov 03, 2013 10:27 pm

Joined: Wed Dec 26, 2012 10:32 pm
Posts: 6
I ran into the same issue, Jeff. Can you please post a working solution or just the DLLs? Your code gives me some errors about references to other methods.

PDFun

