Wiki Markup Parsing - C#

Problem

You need to parse wiki-style markup. Wiki syntax often uses the star character (*) for bold text, but the same character marks both the start and the end of a bold section, so the parser must track whether it is opening or closing a tag. The parser should also support the underscore (_) for italics, and allow the two to nest.

Version | Contents
Input | This article has _big *orange* bars_ in it, which should be enough for my readers.
HTML | This article has <i>big <b>orange</b> bars</i> in it, which should be enough for my readers.
Visual result | This article has big orange bars in it, which should be enough for my readers.

Solution: parsing in C#

Here is a compact implementation of the parser. It tracks nesting depth with a C# Stack: each opening tag is pushed, and the matching closing tag pops it. The approach handles nested tags correctly and is easy to extend, so it can serve as the basis for other simple parsers.

using System;
using System.Text;
using System.Collections.Generic;
 
/// <summary>
/// Custom static methods to be used site-wide.
/// </summary>
public static class SiteUtil
{
    /// <summary>
    /// Specifies the kind of blocks we are inside in the wiki markup.
    /// Note that we can be inside some markup within other markup.
    /// </summary>
    enum CurrentWikiTag
    {
        BoldTag,
        ItalicsTag,
        LinkTag
    }
 
    /// <summary>
    /// Convert some Wiki markup and syntax to HTML. This converts * and _
    /// to their equivalent HTML.
    /// </summary>
    /// <param name="textIn">Contains the Wiki markup you want to convert.</param>
    /// <returns>The HTML markup we build up from the Wiki syntax.</returns>
    public static string WikiMarkupToHtml(string textIn)
    {
        // Use a stack to keep track of our level.
        Stack<CurrentWikiTag> tagStack = new Stack<CurrentWikiTag>();
 
        // Store results in the StringBuilder.
        StringBuilder builder = new StringBuilder();
 
        // Examine each character in our string.
        foreach (char singleLetter in textIn)
        {
            if (singleLetter == '*')
            {
                // See if stack contains a star on the top.
                if (tagStack.Count > 0 && tagStack.Peek() == CurrentWikiTag.BoldTag)
                {
                    // Our top is a star. We are on the second star.
                    // So remove it and close the bold block.
                    tagStack.Pop();
                    builder.Append("</b>");
                }
                else
                {
                    // We are opening a new bold.
                    tagStack.Push(CurrentWikiTag.BoldTag);
                    builder.Append("<b>");
                }
            }
            else if (singleLetter == '_')
            {
                // Same for italics tags. We use the underscore and a different enum.
                if (tagStack.Count > 0 && tagStack.Peek() == CurrentWikiTag.ItalicsTag)
                {
                    tagStack.Pop();
                    builder.Append("</i>");
                }
                else
                {
                    tagStack.Push(CurrentWikiTag.ItalicsTag);
                    builder.Append("<i>");
                }
            }
            else
            {
                // Simply append any non-markup characters.
                builder.Append(singleLetter);
            }
        }
        return builder.ToString();
    }
}

# It uses an enumeration of tags.

The enumeration stores the kinds of tags the parser can be inside. BoldTag, for example, records that we are currently in a bold section of the markup. See also: C# - Enum Tips and Examples - dotnetperls.com

# It uses Stack.

Stack is a generic collection. Here it stores the tags that have been opened but not yet closed. When we encounter a star that opens a bold section, we push a BoldTag onto the top of the Stack.
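To make the idea of "tags we are in" concrete, here is a minimal sketch that only tracks nesting and reports the deepest level reached. The class name StackDepthDemo and the method MaxDepth are hypothetical names chosen for illustration, and the sketch stores the marker characters themselves rather than enum values.

```csharp
using System;
using System.Collections.Generic;

public static class StackDepthDemo
{
    // Hypothetical helper: returns the deepest nesting level reached in the input.
    public static int MaxDepth(string text)
    {
        // Holds the marker characters that are open but not yet closed.
        var open = new Stack<char>();
        int max = 0;

        foreach (char c in text)
        {
            if (c != '*' && c != '_')
            {
                continue; // Ordinary text does not change the depth.
            }

            if (open.Count > 0 && open.Peek() == c)
            {
                open.Pop(); // Same marker on top: this one closes it.
            }
            else
            {
                open.Push(c); // Otherwise we enter a new level.
                max = Math.Max(max, open.Count);
            }
        }
        return max;
    }

    public static void Main()
    {
        // Italics opens, bold opens inside it, then both close in reverse order.
        Console.WriteLine(MaxDepth("This has _big *orange* bars_ in it.")); // 2
    }
}
```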

# It runs a loop.

The bulk of the method is the loop over each character. When it hits a * or _, it opens or closes a tag; every other character is appended unchanged.

# It uses the Count property and the Peek, Pop, and Push methods.

Count returns the number of items in the Stack. You must check it before calling Peek or Pop, because both throw an exception on an empty Stack. Peek returns the top element without removing it, Pop removes and returns the top element, and Push adds a new top element. See also: C# - Count Array Elements - dotnetperls.com

Information: more details

The StringBuilder is converted to a string and then returned. The hardest part to follow is the Stack and its methods. If you are like me, you have plenty of stacks around your residence. Computer stacks work just like real-life stacks, only with different terminology. See also: C# - StringBuilder Secrets - dotnetperls.com

Stack member | Example | Usage
Count | int c = stack1.Count; | Returns the number of elements in the Stack. Check this before calling Peek or Pop.
Peek | stack1.Peek(); | Returns the top item without removing it. You may need this to determine what level you are in the parser.
Pop | stack1.Pop(); | Removes and returns the top item. You will want this when you have "finished" with a level in the stack (encountered a close tag).
Push | stack1.Push("Something"); | Adds an item to the top of the stack. You will use this when your parser encounters a new level it must enter (a start tag).
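The following short sketch exercises each of these members on a Stack of strings. The class name StackBasics and the helper IsOnTop are hypothetical; IsOnTop simply mirrors the "check Count, then Peek" pattern the parser relies on.

```csharp
using System;
using System.Collections.Generic;

public static class StackBasics
{
    // Hypothetical helper: true when the stack is non-empty and 'tag' is on top.
    // Checking Count first avoids the exception Peek throws on an empty stack.
    public static bool IsOnTop(Stack<string> stack, string tag)
    {
        return stack.Count > 0 && stack.Peek() == tag;
    }

    public static void Main()
    {
        var stack1 = new Stack<string>();
        stack1.Push("bold");
        stack1.Push("italics");

        Console.WriteLine(stack1.Count);               // 2
        Console.WriteLine(stack1.Peek());              // italics
        Console.WriteLine(IsOnTop(stack1, "bold"));    // False
        Console.WriteLine(stack1.Pop());               // italics
        Console.WriteLine(IsOnTop(stack1, "bold"));    // True
        stack1.Pop();

        // The stack is now empty, so an unguarded Pop throws.
        try
        {
            stack1.Pop();
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("Pop on an empty stack throws");
        }
    }
}
```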

Question: where should I use it

I use it in the code that parses strings from my XML files, converting simple wiki markup to HTML. It is easy to store wiki markup in XML, but HTML is hard because it uses the same angle-bracket syntax as XML and would need escaping. Wiki syntax provides an excellent compromise between the power of HTML and the simplicity of plain text.
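The escaping problem can be seen with a small sketch using System.Xml.Linq. This is only an illustration, not the storage code used on the site: the class name XmlStorageDemo, the method Store, and the element name "body" are all hypothetical.

```csharp
using System;
using System.Xml.Linq;

public static class XmlStorageDemo
{
    // Serializes one element (hypothetically named "body") holding the given text.
    public static string Store(string text)
    {
        return new XElement("body", text).ToString();
    }

    public static void Main()
    {
        // Wiki markup contains no XML special characters, so it is stored as typed.
        Console.WriteLine(Store("has *orange* bars")); // <body>has *orange* bars</body>

        // HTML text has its angle brackets replaced with &lt; and &gt; entities.
        Console.WriteLine(Store("has <b>orange</b> bars"));
    }
}
```

The escaped HTML still round-trips correctly when read back, but the stored file is much harder to read and edit by hand than the wiki version.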

Note: more enhancements

Many enhancements are possible, such as hyperlink support or detection of additional tags. You could also refactor this code to use a lookup table or Dictionary. It would be shorter and perhaps faster, but harder to understand.
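As a rough sketch of the Dictionary idea, the version below maps each markup character to an HTML element name and keeps the open markers on a Stack of chars instead of using an enum. The names TableDrivenWiki, Tags, and ToHtml are hypothetical, and this is one possible refactoring rather than a tested replacement for the method above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TableDrivenWiki
{
    // Lookup table: markup character -> HTML element name.
    // Supporting another simple paired tag only needs another entry here.
    static readonly Dictionary<char, string> Tags = new Dictionary<char, string>
    {
        { '*', "b" },
        { '_', "i" }
    };

    public static string ToHtml(string textIn)
    {
        var open = new Stack<char>();      // Markers opened but not yet closed.
        var builder = new StringBuilder();

        foreach (char c in textIn)
        {
            string tag;
            if (!Tags.TryGetValue(c, out tag))
            {
                builder.Append(c);         // Not markup: copy it through.
            }
            else if (open.Count > 0 && open.Peek() == c)
            {
                open.Pop();                // Same marker on top: close the tag.
                builder.Append("</").Append(tag).Append('>');
            }
            else
            {
                open.Push(c);              // Otherwise open a new tag.
                builder.Append('<').Append(tag).Append('>');
            }
        }
        return builder.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ToHtml("This article has _big *orange* bars_ in it."));
        // This article has <i>big <b>orange</b> bars</i> in it.
    }
}
```

The two nearly identical if-blocks collapse into one, at the cost of an extra level of indirection when reading the code.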

Conclusion: parsing wiki syntax

Stacks in C# offer us a powerful and simple way to implement depth tracking in a parser. They are ideal for wiki syntax parsing and converting to HTML. Wiki syntax is wonderful to use with XML, but some C# code to parse it is necessary. Use Stack and StringBuilder to make wiki syntax and HTML play nice together.
