A bot has 2 types of memory at its disposal: short term and long term memory. With some tricks, a mid term memory function can also be simulated.

Short term memory

Short term memory, also called ‘volatile’ memory because its content is lost over time, can be accessed through the use of variables in the patterns.  Input patterns support 2 types of variables: regular variables, which can collect any type of content, and thesaurus variables, which can only collect words that are equal to or are children of the thesaurus item referenced by the variable (in other words, they are filters). Asset variables, a third type of variable, are theoretically also possible, but not yet implemented. These are also filters, like thesaurus variables, except that they filter on concrete (asset) data instead of abstract (thesaurus) data.

The basic usage of this short term memory is simple: to provide a mechanism for collecting values of variable parts in the input patterns so that these values can later be used in the output and do-patterns for providing a response and feeding the long term memory.

Regular variables

As already mentioned, regular variables can’t filter on the values that they collect, but they are optionally able to limit the number of words that they collect, either as a specific number or a range. Here are some short input-pattern examples:

I’m called $var[.]

I’m called $var:1[.]

I’m called $var:1-3[.]

I’m called $var:4:CollectSpaces[.]

Copy $from:collectSpaces to $to:collectspaces

The $var constitutes the variable (‘$’ is the variable operator, followed by the name). Every word (except spaces, if ‘CollectSpaces’ is not specified) is collected by this variable until the pattern matcher finds a word in the input that follows the variable in the pattern or until the range is fully used. This variable can then be used in the output and do-patterns of the same rule (and also from other rules, if you are certain that the input pattern is part of the result set; this can be checked, more on that later).

Thesaurus variables

Thesaurus variables are a special type of input variable: they provide a mechanism for filtering possible input to a sub-branch of the thesaurus. If the input can’t be found in that branch, the pattern won’t be activated. So, this is a filtering mechanism. The actual value that was found can be accessed like any regular variable, through its name. There is no mechanism for providing length or range values though, since they have no meaning here: it’s either an exact match with a thesaurus node or it isn’t. Here are some examples:

I’m called ^var:noun [.]

I’m called ^var:noun.name [.]

I’m called ^var:noun.(first name) [.]

I’m ^var:number years old[.]

A thesaurus variable always starts with a ‘^’ followed by its name. The ‘:’ indicates the start of the thesaurus path and should always be followed by a POS (part of speech). These are the supported POS values:

noun, verb, adjective (or adj), adverb (or adv), article (or art), pronoun (or pron), conjunction (or conj), interjection (or inter), preposition (or prep), number, integer (or int), double.

You can stop there, which would indicate that you want any word of the specified part of speech. You can also continue the path with a ‘.’ followed by a text value (put into brackets if it’s multiple words).  This allows you to further refine the thesaurus path. Note that you don’t need to start at the root of the thesaurus that you are using, just as long as you are comfortable that it will point to a unique word within the tree (otherwise you can have multiple matches,… which might also be desirable). Note that the last 3 POS values (number, int and double) can’t have any further path specifiers, they have to stop at the POS value.
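The filtering idea behind thesaurus variables (a word matches if it equals the referenced node or sits anywhere below it) can be sketched in a few lines of Python. The tiny tree and the function name `matches_branch` are made up for illustration; this is not how the designer stores or searches its thesaurus internally.

```python
# Hypothetical noun branch of a thesaurus: parent -> list of children.
NOUN_TREE = {
    "name": ["first name", "last name"],
    "first name": ["Tom", "Flint"],
    "last name": ["Warner"],
}


def matches_branch(word, node, tree):
    """Return True if `word` equals `node` or is a descendant of it."""
    if word == node:
        return True
    return any(matches_branch(word, child, tree) for child in tree.get(node, []))


# ^var:noun.name would accept "Tom" but reject "Paris" in this toy tree.
print(matches_branch("Tom", "name", NOUN_TREE))
print(matches_branch("Paris", "name", NOUN_TREE))
```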

Collecting multiple values

It’s possible to use the same variable name multiple times in the same input pattern. This allows you to collect a list of values for the same variable. Thesaurus and regular variables can be intermixed. Here are some examples:

{$name ,} and ^name:noun.name are here  //catches something like: Tom, Flint and Warner are here

Note that there is a difference when a regular variable collects multiple words at a single location compared to when it collects single words at multiple locations in the pattern. When a single location collects multiple words, this group of words is combined into a compound word (as in ‘baby gear’), but when words are collected at multiple locations, a list is created. This list can later on (in the long term memory) be labeled as AND, OR or LIST (unspecified).

Using short term memory

Up until now, we’ve only been talking about how to collect the values for the short term memory.  Of course, there’s no point in doing that unless you can actually do something with these values. That’s done in the output and do-patterns. As already mentioned, you access the content through the variable names. Here are some output-pattern examples:

Ok, I see, your name is $var\.

So you are $var, nice to meet you!

So, you can $verb:Infinitive, can you?

I see, $name:interleaf("\, ", " and ")

In its most basic form, you specify the ‘$’ operator followed by the name of the variable that you want to render. Note that you should always use the ‘$’ operator while rendering, even if the value was collected using a thesaurus variable ( ^ ). This is because the ^ operator is used to access the long-term, abstract memory (the thesaurus data itself).

Rendering the value as it was collected is useful, but often we want to do a little more. Sometimes we need to change or transform the values, like conjugating a verb, getting the plural of a noun or finding the attribute for the value (see later). This is done through functions that you define in the path. A function starts with a ‘:’ followed by the name of the function (a list of all the available functions will come shortly) and optionally a list of arguments for the function, specified between brackets and separated by a ‘,’. Note: if you use the ‘,’ sign as an argument value, it must always be escaped with a ‘\’. Also, if you need to preserve spaces, the argument should be placed between quotes (as in the last example).
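To give a feel for what a function like ‘:interleaf’ does with a list of collected values, here is a small Python sketch. The two arguments mirror the separator and the last separator from the output-pattern example; the implementation itself is an assumption about the behaviour, not the designer’s actual code.

```python
def interleaf(values, separator, last_separator):
    """Join values with `separator`, using `last_separator` before the final item."""
    values = list(values)
    if not values:
        return ""
    if len(values) == 1:
        return values[0]
    return separator.join(values[:-1]) + last_separator + values[-1]


# A list collected at multiple locations, as in "Tom, Flint and Warner are here".
print("I see, " + interleaf(["Tom", "Flint", "Warner"], ", ", " and "))
```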

Long term memory

The second major type of memory that’s available to the bot is used to store and retrieve values so that they can cross the boundary of the single-shot input/response system, in other words: long term memory. Currently, there are 2 types: a thesaurus structure for storing abstract information and assets which maintain concrete knowledge.  Typically, you use this data to compare against short-term variables, render previously stored data or store newly acquired knowledge.

The thesaurus

As already mentioned, thesaurus variables are used in the input patterns so that the valid content for a variable can be filtered.  When the ‘^’ operator is used in output, conditional or do-patterns however, it behaves a little bit differently: it becomes a value generator instead of a collector.  Consider the following output patterns:

We are in ^noun.month[$time:month-1]

I like ^noun.food.(Italian food):random

I ^verb.be:conjugate(#bot) trying something complicated   //render: I am trying something complicated

As you can see, a thesaurus output path contains a mix of statics and functions which eventually result in 0, 1 or more values. Because they render values and don’t collect them, no name is required. You can use the [] operator to select a child at a specific index position, like in the first example, which is used to generate the name of the month instead of a number. Note that the index is 0 based. In case a static path item contains multiple words (like ‘Italian food’), use () brackets to group them. Also, if there are no values found for the path, any spaces that follow it in the output are stripped.
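The 0-based index explains the ‘-1’ in the month example: months are usually counted from 1, while the first child sits at index 0. A minimal Python sketch, where the `MONTHS` list stands in for the children of ^noun.month (assumed to be stored in calendar order):

```python
# Stand-in for the children of ^noun.month, assumed to be in calendar order.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def month_name(month_number):
    """Map a 1-based month number to the child at the matching 0-based index."""
    if not 1 <= month_number <= len(MONTHS):
        raise ValueError("month_number must be between 1 and 12")
    return MONTHS[month_number - 1]


print("We are in " + month_name(8))
```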

You can also store new data in the thesaurus. This is done in the calculation or do-sections. There are basically 2 operations that you can do at 2 different levels: you can add or remove values either as thesaurus children or as conjugations/references. To explain the difference between children and conjugations or references, take the following examples and how they are stored:

A house is a building                                ^noun.building += house
The plural of bird is birds                          ^noun.bird->plural = birds
The opposite of good is bad                          ^adj.good->opposite = bad
Seagulls are a type of the singular of birds         ^noun.birds->singular += seagull
The superlative of the opposite of good is worst     ^adj.good->opposite->superlative = worst

In the first example, we are declaring a child relationship: house is a building. If you have done any coding before, the syntax might be vaguely familiar: the left part of the statement contains the thesaurus path, then comes the ‘+=’ operator to indicate that we want to create an ‘is child’ relationship, and on the right side comes the value that needs to be stored. This could be a variable reference, an asset, another thesaurus path,…

The second and third examples look identical and, for all intents and purposes, they are. The only difference is on the inside: in the second example ‘plural’ is a known conjugation form, while ‘opposite’ in the third is not. The statement used for storing this information is a little bit different from the first example. First off, the thesaurus path ends with a ‘->’ followed by the name of the relationship that you would like to edit. Next, we use the ‘=’ assign operator instead of ‘+=’ to indicate that we want to change the relationship value.

The last 2 examples demonstrate what happens when you use the -> operator together with the += assignment or when you use multiple -> operators. When combining += with ->, you first calculate the full result of the left side.  So in our example, we first take the singular value of ‘birds’, which is ‘bird’, then we add a child (‘seagull’) to this result. A similar thing happens when you use multiple -> operators: the path up to the last -> is calculated first, and the assignment is applied to that result.
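The difference between children and links, and the way the left side is resolved before the assignment, can be modelled with a toy Python class. The `Thesaurus` class and its method names are hypothetical and only mimic the statements above; the real storage format is not described here.

```python
from collections import defaultdict


class Thesaurus:
    """Toy model: child relationships plus named links (conjugations/references)."""

    def __init__(self):
        self.children = defaultdict(set)  # (pos, word) -> set of child words
        self.links = {}                   # (pos, word, relationship) -> word

    def add_child(self, pos, word, child):
        # Models: ^pos.word += child
        self.children[(pos, word)].add(child)

    def set_link(self, pos, word, relationship, value):
        # Models: ^pos.word->relationship = value
        self.links[(pos, word, relationship)] = value

    def follow(self, pos, word, *relationships):
        # Resolve a chain of -> operators from left to right.
        for relationship in relationships:
            word = self.links[(pos, word, relationship)]
        return word


thesaurus = Thesaurus()
thesaurus.set_link("noun", "birds", "singular", "bird")
# ^noun.birds->singular += seagull: resolve the left side first, then add the child.
thesaurus.add_child("noun", thesaurus.follow("noun", "birds", "singular"), "seagull")

thesaurus.set_link("adj", "good", "opposite", "bad")
# ^adj.good->opposite->superlative = worst: resolve 'opposite', then set 'superlative'.
thesaurus.set_link("adj", thesaurus.follow("adj", "good", "opposite"), "superlative", "worst")
```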

Except for the POS value at the start of the path, every other item in a thesaurus path can be a static, a variable reference, an asset path or another thesaurus path. This allows for tremendous flexibility in the way that you store data. We could generalize some of the previous statements like this:

^noun.building += $value

^noun.($singular)->plural = $value

^adj.good->($relationship) = $value

Removing values from the thesaurus is done using the ‘-=’ operator or by assigning to the ‘null’ value. Like with storing, all parts can be static or variable. This is probably best explained with some examples:

A house is not a building          ^noun.building -= house
Bird has no plural                 ^noun.bird->plural = null
A $value is not a $node [.]        ^noun.($node) -= $value

Assets

As already mentioned, assets could theoretically also be used in the input, but that’s not yet supported. If someone has a need for this, let me know, it’s not that tremendously difficult to add, it just creates a little more overhead.

Anyway, like thesaurus paths, asset paths can be used in output, do and conditional patterns. They are declared in much the same way as thesaurus paths by using the ‘.’ (dot) or ‘:’ (function) operators, except that they start with a # and ‘->’ (links) are not supported.  For the thesaurus path, the ‘.’ (dot) operator selects a child node; for assets, it selects an attribute value. Here are a few output examples:

My name is #bot.name

your children are called #user.child.name:interleaf("\, ", " and ")

a book is made of $(#(^noun.book).component.name):interleaf("\, ", " and ")

Bot and User are hardcoded assets and refer to me and you respectively, from the bot’s point of view. In the third example, the first value in the asset path, is actually a thesaurus path. This results in concrete information about abstract data (a book is made of paper, ink, glue,…).

Also in the last example, the entire asset path is the first value in a normal variable path. This is because an asset path always calculates its result based on 1 value: if the previous path item resulted in multiple values (like ‘component’), the next part of the path is calculated as if there was only 1 result (internally, a split is done), and only at the end of the path are all results joined. This doesn’t work for ‘:interleaf’, which expects a list of values to combine. A variable path can do this, hence this construct.

To store asset data, the = (assign), += (assign add), != (assign not) and !+= (assign add not) operators are used. Removing data is done with the -= (assign remove) operator.   Take the following examples (input statement to the left, how to store/remove it to the right):

My eyes are blue                 #user.eye.color = blue
I have a dog.                    #user += dog
My dog’s name is not doggy       #user.dog.name != doggy
I don’t have a tiger             #user !+= tiger
I have big blue eyes             #user.eye.color:extra.size = big
My eyes are also brown           #user.eye.color &= brown
My eyes are brown or blue        #user.eye.color = brown
                                 #user.eye.color |= blue
My eyes are brown, blue          #user.eye.color = brown
                                 #user.eye.color ;= blue
Remove my dog                    #user -= dog
Remove my eye color              #user.eye -= color

When you use the ‘=’ (assign) operator, you declare an ‘is’ relationship: ‘color’ becomes the attribute, ‘blue’ the value. Since ‘blue’ is not an asset, but just a word, we have a terminator: blue can’t have any more children.  But, there is a way to cross this border, by using the ‘:extra’ function, as in the 5th example.
If, instead, you want to declare that something is not a certain value, you can use the not-assign operator (!=). This allows you to still store the information that something is not the case. Be careful though, there is a thin line between being and not being: if you don’t check on this in the conditions, you might say that something is, while it isn’t (sounds familiar?).

The ‘+=’ or assign-add operator is used to create ‘has’ relationships, like in the second example. The major difference with the first one is that ‘dog’ becomes the attribute and the value becomes a new asset that will represent the dog. There is also the not version:  !+= which is used to indicate that the asset doesn’t have something.

If you want to create a list of values, you can use either the ‘;=’, ‘|=’ or ‘&=’ operators. The first one creates (or adds to) a generic list, the second is for OR lists and the last for AND lists. The generic list operator can add to any type of list without modifying its type. The |= and &= operators will change a generic list to OR and AND respectively. When you try to add an item to an OR list with an AND operator, you create a new list object that contains the original OR list and the newly added item. The same goes for an OR operator with an AND list.
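The rules for the three list operators can be summarised in a short Python sketch. Lists are modelled as a `(kind, items)` tuple; that representation, the function name `add_to_list`, the handling of a single existing value and the type given to the wrapping list are all assumptions made for illustration.

```python
def add_to_list(current, item, kind):
    """Model ';=' (kind='LIST'), '|=' (kind='OR') and '&=' (kind='AND')."""
    if not isinstance(current, tuple):
        # Assumption: a single value becomes a new list of the requested kind.
        return (kind, [current, item])
    current_kind, items = current
    if kind == "LIST" or current_kind == kind:
        # The generic operator never changes the type; same type simply appends.
        return (current_kind, items + [item])
    if current_kind == "LIST":
        # '|=' and '&=' turn a generic list into an OR or AND list.
        return (kind, items + [item])
    # AND onto an OR list (or the reverse): wrap the original list in a new one.
    return (kind, [current, item])


colors = add_to_list("brown", "blue", "OR")   # #user.eye.color = brown, then |= blue
print(add_to_list(colors, "green", "LIST"))   # generic add keeps the OR type
print(add_to_list(colors, "green", "AND"))    # AND on an OR list wraps the original
```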

Finally, you can also actually remove an attribute value. This is done with the ‘-=’ operator. The right side should be the name of the attribute that you want to remove. The value that is removed gets cleaned up automatically, so if the value was another asset which isn’t referenced anymore after the remove, the entire asset will be destroyed. (Removing items from a list has to be done with the ‘:Remove’ function.)

As already mentioned, every asset value can always use the ‘:extra’ function to get to a sub-asset. There are a few other functions worth mentioning which allow you to expand the dataset that the asset can store. These are used to declare things like when, where, why, how, amount,… Functions are:

:why      provides access to the ‘reason’ path
          #user.dog:why = "likes dogs"
:when     provides access to the ‘time’ path
          #user.dog:when = "10 years ago"
:where    provides access to the ‘location’ path
          #user:where.preposition = in
          #user:where.object = chair
          or #user:where = "in the chair"
:how      provides access to the ‘method’ path
          #user.dog:how = received from some friends who had a bit of an accident
:amount   Allows you to specify that the same value should be counted multiple times. When the value is an asset, it indicates how many identical assets should be counted.
          #User.hand:count = 2
:who      provides access to the ‘persons’ path
          #user.see:who = man  //user sees a man
:what     provides access to the ‘objects’ path
          #user.eat:what = food  //user eats food
:then     provides access to the causality path
          #user.eat:then.who = #user
          #user.eat:then.attribute = state
          #user.eat:then.value != hungry
          or #user.eat:then = "I’m not hungry"

Mid term memory

Some functions, like the ‘:attribute’ function (which is able to extract, for instance, ‘color’ from ‘blue’, or ‘name’ from ‘Jan’), make use of context, if it is declared. This context is usually a list of asset paths that point to some memory region of the bot. The idea is that, together with a response, you also generate the meaning of what was said and store this information in the asset that you declared as context. If you refresh this context on each run, you have effectively simulated mid term memory.

The basic setup for using mid term memory consists of:

  • a global context declaration so that the system knows where to go look for contextual info.
  • some global do-after-each-statement patterns. These are responsible for erasing the previously collected data and possibly creating an echo.
  • some global do-on-startup patterns which will remove any data from the previous run.
  • do-patterns on each rule to actually collect the knowledge about what is being said.

Context:

#bot.memory.subject

#bot.memory.attribute

#bot.memory.object

#bot.PrevMem.subject

#bot.PrevMem.attribute

#bot.PrevMem.object

 

Do after output:

#bot.prevmem = #bot.memory

#bot -= memory

 

Do on startup:

#bot -= memory

#bot -= prevmem

 

On pattern (example pattern = I like $value [.])

#bot.memory.subject = #user

#bot.memory.attribute = like

#bot.memory.object = $value
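The life cycle of this setup (clear on startup, fill on each matched pattern, shift to the previous slot after each output) can be simulated in Python. The `Bot` class and its dictionary are only a stand-in for the #bot asset, and what happens to ‘prevmem’ when nothing was collected is an assumption.

```python
class Bot:
    """Toy stand-in for the #bot asset with a 'memory' and a 'prevmem' attribute."""

    def __init__(self):
        self.assets = {}

    def on_startup(self):
        # #bot -= memory / #bot -= prevmem
        self.assets.pop("memory", None)
        self.assets.pop("prevmem", None)

    def on_pattern(self, subject, attribute, obj):
        # #bot.memory.subject / .attribute / .object = ...
        self.assets["memory"] = {"subject": subject, "attribute": attribute, "object": obj}

    def after_output(self):
        # #bot.prevmem = #bot.memory, then #bot -= memory.
        # Assumption: prevmem becomes empty (None) when nothing was collected this run.
        self.assets["prevmem"] = self.assets.pop("memory", None)


bot = Bot()
bot.on_startup()
bot.on_pattern("#user", "like", "pizza")   # input: "I like pizza"
bot.after_output()
print(bot.assets)
```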

 

Mid term memory also becomes very useful once you start working with recursive sub rules/topics. This technique allows you to rebuild the extracted data.

Patterns

All the different types of patterns (input, output, conditional) could also be considered as a form of long term memory. Internally, they are stored in exactly the same manner as all the other data. As such, they can also be manipulated in a similar manner as the other long term data. Although, at the time of writing, there is still limited support for this.  More on that to come.

 

It’s possible to create your own characters for the chatbot designer app. The process consists of 2 parts: first you need to create a set of images; once that’s done, you need to make a CCS file that combines all the images.

Images

The animations and visemes are stored as a series of images which are displayed in rapid succession, much like how old school film works. Before you can display these images though, they need to be created. Now, in the olden days, they used to have big chunky cameras for that. Things have changed a bit since. We can use tools like DAZ3d, Poser, Blender,…

Content

What you put in the images is entirely up to you. There aren’t many limits with respect to content in the sense that there aren’t any ‘expected’ parts. That said, since we are dealing with chatbots, I’d try to put in something resembling a mouth somewhere, just to get some lip-syncing working. Most 3D tools these days come with some way to manipulate mouth positions. The designer uses 21 different images for lip-sync (+ 1 for silence, which is actually the background image). Most 3D tools only provide morphs for 16 mouth positions. The remaining 5 visemes can be created by reusing other images or by creating your own, if the 3D app allows it. Here’s a good visual overview of all the visemes.

Animations like blinking eyes or moving hair also need to be exported from your 3D package as a series of images. Luckily most 3D software provides a feature to do just that. Also, some animations can use the same image multiple times; it’s always best to reuse the same image in those cases, to save memory. For instance, an eye blink can be done with 2 or 3 images: eye fully open (provided by the background), half closed and fully closed.

File formats

Currently, the following file formats are supported:

BMP    Bitmap
GIF    Graphics Interchange Format
JPEG   Joint Photographic Experts Group
PNG    Portable Network Graphics
TIFF   Tagged Image File Format

It should also be possible to use XAML files (to create Flash-type characters), but this is not yet tested and will most likely not yet work. Give me a shout if you would like to do this.

Processing

Once you have created all the images, it’s best to process them a little further. At this stage, every image always uses up the full viewable area of the character. That’s ok, if all you want to do is lip-sync and possibly a little animation during idle times. But in this setup, your character can’t blink while speaking. To accomplish this, we need to cut out only the valid parts of each image and make the rest transparent. This way, the system can overlay multiple images and create the illusion of different parts moving at the same time.
Another added advantage of this process is that it usually (depending on the file type) shrinks the file. PNG files, when edited with the correct tools, will use less space if there is a bigger transparent area.

There are multiple ways you can crop the images. You can use the eraser tool found in most bitmap editing tools like paint.net.  Though, I’ve noticed that at least some versions of Photoshop don’t shrink PNG files when using the eraser, so that’s maybe not the best tool for the job. Personally, I used a little home-made app to do the job. You can download it from here.

It’s very simple to use. As you can see on the screenshot, there are 2 big buttons to the left which contain the source images. Press on each button to load the images. The top image should contain the root, in the bottom, you put the images that need to be cropped. To the right, on top, you see the result for the currently selected target image (seen in the bottom button). With the slider, you can scroll through the loaded images.
On the next line, there is a checkbox and a slider. These control the calculation process: do you want the parts of the image that are different or the same, and with the slider you select the tolerance level used to calculate the difference. When you put the slider fully to the left, there is 0 tolerance, meaning that there can be no difference between the pixel in the source image and that of the image that needs cropping. Put the slider fully to the right and the difference has to be very big.
Finally, with the button labeled ‘save’, you can save the processed images to a directory that you select (the original files are kept).

Depending on the way that the images are rendered in the 3D package, the file format and the quality of the rendering engine, there can be a bigger or smaller difference between parts of the images that should be the same. That’s why you normally have to play a little with the tolerance level. The lower, the better. Sometimes it’s better to keep a lower tolerance level and manually remove the remaining dots with an eraser in a bitmap editing package.
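The cropping idea (compare each pixel with the root image and keep only what differs by more than the tolerance) is easy to sketch in Python. Images are plain lists of rows of RGBA tuples here to keep the example self-contained; the distance measure (largest per-channel difference) and the function name are assumptions, not a description of how the home-made app works.

```python
TRANSPARENT = (0, 0, 0, 0)


def crop_to_difference(root, target, tolerance=0, keep_different=True):
    """Return a copy of `target` with the unwanted pixels made fully transparent."""
    result = []
    for root_row, target_row in zip(root, target):
        row = []
        for root_px, target_px in zip(root_row, target_row):
            # Largest difference over the R, G, B and A channels.
            distance = max(abs(a - b) for a, b in zip(root_px, target_px))
            is_different = distance > tolerance
            # Keep the pixel only if it falls in the group that was asked for.
            row.append(target_px if is_different == keep_different else TRANSPARENT)
        result.append(row)
    return result


root = [[(10, 10, 10, 255), (10, 10, 10, 255), (10, 10, 10, 255)]]
frame = [[(10, 10, 10, 255), (12, 10, 10, 255), (200, 10, 10, 255)]]

print(crop_to_difference(root, frame, tolerance=0))
print(crop_to_difference(root, frame, tolerance=5))
```

With a tolerance of 0 the slightly noisy middle pixel is kept; raising the tolerance to 5 removes it, which is the kind of trade-off described above.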

CCS file

When you’re ready with your images, it’s time to put them all together, a bit like a collage. Unfortunately, there currently isn’t yet a ready-made tool for this, so you are gonna have to do a little xml writing. Fortunately, there’s a lot of copy paste involved. The outer xml tag, the start of the file, is a <Character> tag, like so:

<?xml version="1.0" encoding="utf-8"?>
<Character xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
</Character>

Now, let’s go over each section in order of the file content:

Character Info

The first section, and the easiest, is the character information. This is what it looks like:

  <CharacterInfo>
    <Name>Mika</Name>
    <Author>Ady Di Pierro</Author>
    <Copyright>Copyright 2011 Ady Di Pierro</Copyright>
    <License />
    <AuthorWebsite>http://www.laticisimagery.com.au/</AuthorWebsite>
    <CreationDate>2011-08-07T14:43:26.602+02:00</CreationDate>
    <LastUpdateDate>2011-08-07T14:43:26.602+02:00</LastUpdateDate>
    <Rating>
      <Rating>Unknown</Rating>
      <Sexual>false</Sexual>
      <Violence>false</Violence>
      <Other>false</Other>
      <Description />
    </Rating>
    <OnlineOptions>
      <OnlineCharacterBaseUrl />
      <PreferredWidth>150</PreferredWidth>
      <PreferredHeight>150</PreferredHeight>
    </OnlineOptions>
  </CharacterInfo>

Just copy the above and paste it into your xml file, just under <Character>. This part only contains reference information about the character: what its name is, who designed it, what the license is, possibly a website,… All of the info in this section is optional: you can leave the tags empty, but it’s best to include them. This information, by the way, is displayed in the popup window on the ‘chatbot window’ (the button in the lower-left corner on the images window).

Online options are currently still skipped but will probably be used in future, online versions.

Background

<Background>
   <ImageResource>images\Kima00.png</ImageResource>
   <ImageResource>images\KimaEars.png</ImageResource>

</Background>

The background section defines the images that should be used as background (obviously). 2 things worth mentioning: a background is defined as an ImageResource. This is used often throughout the file. Whenever you want to reference an image file, you use this element. The text part of the xml element defines the relative path to the image (that is relative to the CCS file).
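If you want to check which files a CCS file refers to, the ImageResource elements can be read with any XML parser and resolved against the location of the CCS file. Here is a small Python sketch using the standard library; the path `C:\bots\mika\mika.ccs` is a made-up example, and the designer itself may resolve paths differently.

```python
import xml.etree.ElementTree as ET
from pathlib import PureWindowsPath

CCS_SNIPPET = r"""<Character>
  <Background>
    <ImageResource>images\Kima00.png</ImageResource>
    <ImageResource>images\KimaEars.png</ImageResource>
  </Background>
</Character>"""


def background_images(ccs_xml, ccs_file_path):
    """Return the background image paths, resolved relative to the CCS file."""
    root = ET.fromstring(ccs_xml)
    base = PureWindowsPath(ccs_file_path).parent
    return [
        str(base / element.text.strip())
        for element in root.findall("./Background/ImageResource")
    ]


# Hypothetical location of the CCS file.
for path in background_images(CCS_SNIPPET, r"C:\bots\mika\mika.ccs"):
    print(path)
```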

Also, and that’s perhaps the weirdest part, you can declare multiple backgrounds. Every image in this list will be displayed as a background, unless an animation turned one of the images off. And that’s the main use of having multiple backgrounds: so that animations can turn part of the background off while playing. A good example is Mika’s ears.  They are drawn in a separate background image, so that the ‘flip-ears’ animation can hide the background ears while it’s playing. We do this because some images in the animation sequence are smaller than the ears at rest, which would otherwise give ugly results with half an ear overlapped and the rest still visible. I’m certain there are plenty of other cool tricks to be done with this feature. Hiding a background image is explained in the ‘Animations’ section.

Animations

The ‘Animations’ section is one of the bigger parts of the file. This is where you declare all the animation sequences available to the character for emotional expressions and idle times. The background animations (which run all the time, not just at idle times) are declared somewhere else. Anyway, here’s what an animation definition looks like:

  <Animations> 
    <Animation>
      <AnimationFrames>
        <AnimationFrame>
          <Duration>5</Duration>
          <ImageResource>images\other\Kima Ears (01).png</ImageResource>
          <VisemeGroupName/>
        </AnimationFrame>
        <AnimationFrame>
          <Duration>5</Duration>
          <ImageResource>images\other\Kima Ears (01).png</ImageResource>
          <VisemeGroupName />
        </AnimationFrame>
      </AnimationFrames>
      <EnableFrameSpeaking>true</EnableFrameSpeaking>
      <HoldLastFrameForSpeak>false</HoldLastFrameForSpeak>
      <FirstFrameUnderlay>false</FirstFrameUnderlay>
      <BackgroundSuppress>KimaEars.png</BackgroundSuppress>
      <Name>earsmove1</Name>
      <ZIndex>0</ZIndex>
    </Animation>
  </Animations>

The section starts with an <Animations> element, which can contain 0, 1 or more <Animation> elements. Each animation defines a series of ‘frames’, where a frame represents a single image in the animation. A frame contains a Duration section, an ImageResource and a VisemeGroupName. The last one you can forget about: it’s there for backward compatibility with the original CCS file format and should always be empty (for now). We’ve already been over the ImageResource element, and the duration simply declares how long the image should be displayed. This is expressed in milliseconds.

Underneath the frames, you need to declare some more info about the animation. Besides the name of the animation, which is used to start the animation in a rule’s output patterns, you have:

EnableFrameSpeaking     When true, speech is allowed during the animation. When false, speech will wait until the animation is done.
HoldLastFrameForSpeak   Is currently not used and can be true or false.
FirstFrameUnderlay      When true, the complete background is hidden and the first frame of the animation is used as background.
BackgroundSuppress      This element can be declared multiple times. Each element contains the name of a background image that needs to be hidden while the animation is running.
ZIndex                  Optionally determines the ZIndex at which the animation is displayed. This is useful to move parts in front or behind other parts. The background always has a ZIndex of 0.
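How BackgroundSuppress and FirstFrameUnderlay affect the visible backgrounds can be pictured with a small Python sketch. Matching a suppressed name against the file name of a background image is an assumption based on the sample above (‘KimaEars.png’ versus ‘images\KimaEars.png’); the dictionaries are just a hypothetical in-memory form of the xml.

```python
from pathlib import PureWindowsPath


def visible_backgrounds(backgrounds, running_animations):
    """Return the background images that are still shown while animations run."""
    if any(anim.get("first_frame_underlay", False) for anim in running_animations):
        # FirstFrameUnderlay hides the complete background.
        return []
    suppressed = {
        name.lower()
        for anim in running_animations
        for name in anim.get("suppress", [])
    }
    return [
        bg for bg in backgrounds
        if PureWindowsPath(bg).name.lower() not in suppressed
    ]


backgrounds = [r"images\Kima00.png", r"images\KimaEars.png"]
ears = {"name": "earsmove1", "suppress": ["KimaEars.png"]}

print(visible_backgrounds(backgrounds, []))      # idle: both backgrounds shown
print(visible_backgrounds(backgrounds, [ears]))  # ears animation hides the ear image
```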

It might seem a tremendous task at first to declare every frame in an animation like this. But it turns out that most animations can be built using between 3 and 7 images, sometimes reusing images to build sequences of about 10-20 frames. So all in all, this is still manageable.

VisemeGroups

The next section declares the images used for lip-syncing. The basic structure looks like the following xml snippet (note that it doesn’t contain an entry for all 22 viseme images, just the first 2).

  <VisemeGroups>
    <VisemeGroup>
      <Name>Default</Name>
      <ZIndex>1</ZIndex>
      <VisemeImages>
        <VisemeImage>
          <VisemeIndex>0</VisemeIndex>
          <ImageResource>images\Kima00.png</ImageResource>
        </VisemeImage>
        <VisemeImage>
          <VisemeIndex>1</VisemeIndex>
          <ImageResource>images\Visemes\Kima03 EH.png</ImageResource>
        </VisemeImage>
      </VisemeImages>
    </VisemeGroup>
  </VisemeGroups>

The file format already allows for multiple viseme groups to be declared, although at the time of writing, only the first one is used (and supported). Multiple viseme groups could be useful for moving heads: a viseme group for each head position.  As such, the ‘name’ element for each group would be used to reference a group. But, as already mentioned, this is something for the future. 

The ZIndex element defines the Z-order that should be applied to the viseme images. This allows you to manipulate the order of the images. For instance, you could use this to move a part of the background image on top of the viseme images.

Next comes the ‘VisemeImages’ group, with a ‘VisemeImage’ for each mouth position + optionally 1 extra image for the silence position.  Each VisemeImage contains an ImageResource (as described above) and an Index (VisemeIndex), which determines the letter that the image represents (so the order in which the VisemeImages are declared, is irrelevant).

The silence image is primarily there for backward compatibility with Verbot characters (which don’t have a separate background section). If you have a ‘Background’ section, the viseme image at index 0 will be skipped; otherwise it’s used as the background. That’s why the ‘Background’ section has to be declared before the VisemeGroups.
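The lookup rule for viseme images, including the special treatment of index 0, can be written down in a few lines of Python. The dictionary keyed by VisemeIndex and the function name are hypothetical; they only illustrate why the declaration order of the VisemeImages doesn’t matter.

```python
def viseme_image(viseme_images, viseme_index, has_background):
    """Return the image for a viseme index, or None if nothing needs to be drawn."""
    if viseme_index == 0 and has_background:
        # Silence: the Background section already supplies the resting image.
        return None
    return viseme_images.get(viseme_index)


# VisemeIndex -> ImageResource, taken from the snippet above.
images = {
    1: r"images\Visemes\Kima03 EH.png",
    0: r"images\Kima00.png",
}

print(viseme_image(images, 1, has_background=True))
print(viseme_image(images, 0, has_background=True))
print(viseme_image(images, 0, has_background=False))
```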

IdleLevels

Idle time is the time when a bot doesn’t have anything to say or any emotion to show. In other words, nothing’s happening. In order to create the illusion of being alive during this period, you can use idle levels to start animations (that were declared in the ‘Animations’ section) when the bot is idle. Here’s the definition:

  <IdleLevels>
    <IdleLevel>
      <MinStartDelay>5</MinStartDelay>
      <MaxStartDelay>15</MaxStartDelay>
      <MinDuration>5</MinDuration>
      <MaxDuration>15</MaxDuration>
      <MinInterval>2</MinInterval>
      <MaxInterval>8</MaxInterval>
      <AnimationNames>
        <AnimationName>earsmove1</AnimationName>
        <AnimationName>eyes squint</AnimationName> 
      </AnimationNames>
    </IdleLevel>
  </IdleLevels>

Like most other sections, you can again declare multiple IdleLevels. In this example, we only have 1 though. The idea behind multiple IdleLevels is this: when the idle time starts, the first idle level is activated, but when the duration of the level has ended, the next one is activated until the last one is reached, which remains running until some activity happens. This way, you can have a bot act differently after x amount of idle time, progressively.

Each idle level contains 4 bits of information: 3 time ranges and a list of animation names that the idle level can use. Each range has a Min and a Max component, indicating the lower and upper bound of the range. The different times are used for:

Delay: A value is selected at random from this range to delay the start of the animation (which is otherwise immediately after the last output).
Duration: A value is selected from this range to determine the duration of the idle level. This is only used if the level isn’t the last in the list.
Interval: Each time an animation finishes, a value is selected from this range to determine how long the system should wait before it starts another animation from the list (this is selected at random).
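
To make the three ranges a bit more concrete, here is a small Python sketch. It is not the character engine's code: the class and function names are made up for illustration, the time unit is left open (the file format description doesn't state it), and picking the next animation at random is only one reading of the Interval description.

```python
import random
from dataclasses import dataclass


@dataclass
class IdleLevel:
    # Field names mirror the XML elements; the class itself is hypothetical.
    min_start_delay: float
    max_start_delay: float
    min_duration: float
    max_duration: float
    min_interval: float
    max_interval: float
    animation_names: list


def plan_idle_level(level, rng, is_last):
    """Draw the start delay and (if needed) the duration of one idle level."""
    start_delay = rng.uniform(level.min_start_delay, level.max_start_delay)
    # The duration only matters when another level follows this one.
    duration = None if is_last else rng.uniform(level.min_duration, level.max_duration)
    return start_delay, duration


def next_animation(level, rng):
    """After an animation finishes: draw a waiting time and the next animation."""
    interval = rng.uniform(level.min_interval, level.max_interval)
    name = rng.choice(level.animation_names)  # assumption: random pick from the list
    return interval, name


# The values from the XML example above.
level = IdleLevel(5, 15, 5, 15, 2, 8, ["earsmove1", "eyes squint"])
rng = random.Random()
start_delay, duration = plan_idle_level(level, rng, is_last=True)
interval, name = next_animation(level, rng)
print(start_delay, duration, interval, name)
```

Since the example only has one level, it is also the last one, so no duration is drawn for it.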

BackgroundAnimations

Idle levels are very useful to create a sense of liveliness, but they can be expanded upon. Sometimes, it’s also useful to have animations run all the time in the background, even while speaking. A good example of this could be breathing or eye blinking. For this purpose, there is the final section, called ‘BackgroundAnimations’, which contains a series of special animation definitions, all of which will run all the time, in a loop. Here’s the definition:

  <BackgroundAnimations>
    <Animation>
      <AnimationFrames>
        <AnimationFrame>
          <Duration>10</Duration>
          <ImageResource>images\other\Kima Nose Flare (01).png</ImageResource>
          <VisemeGroupName/>
        </AnimationFrame>
        <AnimationFrame>
          <Duration>10</Duration>
          <ImageResource>images\other\Kima Nose Flare (02).png</ImageResource>
          <VisemeGroupName />
        </AnimationFrame>
      </AnimationFrames>
      <Name>nosebreath</Name>
      <LoopStyle>VarTimer</LoopStyle>
      <MinStartDelay>1</MinStartDelay>
      <MaxStartDelay>3</MaxStartDelay>
      <ZIndex>2</ZIndex>
    </Animation>   
  </BackgroundAnimations>

Unlike idle levels, the animation is declared inline this time.  That’s because there is a small difference in declaration between this type and idle/emotion animations. Another reason is to make certain that background animations can’t be used as emotions (they run all the time anyway, no need to start them separately).

The way that the individual frames are declared is the same as with regular animations: a list of ‘AnimationFrame’ objects that define the image, the duration and an optional viseme group. The difference is in the extra options: there is an extra ‘LoopStyle’ element and a ‘StartDelay’ range. ZIndex is also the same as in animations: it determines the Z-order at which the animation is displayed. The higher the number, the closer to the top it will be.

The ‘StartDelay’ range is used to build in a small idle time between each animation loop. Each time the animation needs to be started, a value is selected at random from within the range and used as the delay. Not every loop style makes use of this delay: some do, others skip it.

Finally, the LoopStyle element can have the following values:

None: The least useful: no looping at all.
Jojo: At each start, the end frame is selected at random from the entire list of frames. When this end is reached, the animation is played in reverse until the first frame is reached and the loop starts again. Each start is delayed by x amount of time, where x is a random number picked from the StartDelay range.
FrontToBack: The images are played continuously, front to back, without pauses.
VarTimer: The images are played front to back in a loop, but each time with a delay that is selected from the StartDelay range.
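
The four loop styles can be sketched in a few lines of Python. This is only an illustration of the descriptions above, not the engine's implementation: the function and table names are invented, and whether ‘Jojo’ shows the end frame once or twice at the turning point is an assumption (once, here).

```python
import random

# (repeats, uses the StartDelay range) per LoopStyle, as described in the list above.
LOOP_STYLES = {
    "None": (False, False),
    "Jojo": (True, True),
    "FrontToBack": (True, False),
    "VarTimer": (True, True),
}


def frame_order(loop_style, frame_count, rng):
    """Return the frame indices shown during one pass of the animation."""
    if loop_style not in LOOP_STYLES:
        raise ValueError(f"unknown LoopStyle: {loop_style}")
    frames = list(range(frame_count))
    if loop_style == "Jojo":
        # Pick a random end frame, play up to it, then back down to frame 0.
        end = rng.randrange(frame_count)
        forward = frames[: end + 1]
        return forward + forward[-2::-1]
    # All other styles simply play front to back.
    return frames


rng = random.Random()
print(frame_order("VarTimer", 2, rng))
print(frame_order("Jojo", 5, rng))
```

The differences between ‘None’, ‘FrontToBack’ and ‘VarTimer’ are therefore not in the frame order, but in whether the pass is repeated and whether a delay is drawn before each repeat.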

And that’s it. Once you’ve got all these sections laid down, you’ve got yourself a character. All that remains is to copy the files to the {my documents}\NND\Characters dir and restart the chatbot designer. Enjoy.

 

The fingers are still smoking and the joints are glowing red-hot, but I got there. Time for a first beta release! There are still a few things here and there, but hey, it’s a beta, right?

So, with no further ado, here’s the basic version and also the pro (the latter will remain active until the end of the year). I will most likely also release a demo of the designer version – which allows full debugging and extending/replacing of the network – in a couple of days, after I’ve cleaned up some more demo projects.
I might also release separate 32-bit and 64-bit versions in order to support voices that were compiled for a specific platform other than the one you are running on. The currently released versions will run as 32 or 64 bit, depending on what system you have.

Please let me know if you experience any ‘hanging’ situations: no reply arrives and, in the lower-right corner, the second number (the one with the tooltip ‘The total nr of still active processors’) never goes back to 0. Different processors can give different results and I don’t have the resources to test on different hardware setups, so I’m expecting some ‘issues’ in this area. Your help is much appreciated.

A word about the included demos

  • Name & age: contains some patterns that demonstrate how to access the ‘name’ and ‘age’ settings which can be supplied in the ‘chatbot properties’ window.
  • SysMan: demonstrates how to access external .net functions. It provides access to most of the File, Path and Directory functions.
  • Thesaurus operations: shows how to manipulate (add, remove,…) thesaurus data  using do-patterns.
  • Asset operations: shows how you can manipulate the memory.
  • Complete the sequence: from the previous demo, shows how to perform the ‘complete the list’ trick.
  • All: this is the start of a common, reusable library of patterns, most of which don’t even have output, but only manipulate the memory. I’m hoping that this can become the basis of a new approach to pattern matching.

Also, none of the demos includes thesaurus data. This can be imported from this thesaurus file. I’ve done it this way because the data is still ‘under heavy construction’. The thesaurus currently contains a little more than 2000 words (not much), but can easily be extended using different import methods.

Thesaurus variables, sub-topics and InvertedWho

Next week, I’ll probably be spending some time putting together the documentation. In the meantime, there are a few tricks which were used in the common library that I’d like to mention.

Firstly, the whole thing is full of statements like:
^subj:noun.name
^subj:adj.possesive
These are ‘thesaurus’ variables. The first word (in this case ‘subj’) is the name of the variable (so you can access the values in the output or do patterns). The other words describe a path into the thesaurus. Any child of this path will give a match.
Thesaurus variables are very powerful, but also more taxing on the system than statics (though usually less than regular variables). When you have lots of patterns that use thesaurus variables, like the common lib, it’s best not to let the system auto-resolve synonyms. More on that later.
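
As a rough illustration of what ‘any child of this path will give a match’ means, here is a toy Python sketch. The thesaurus entries, the dictionary layout and the function names are all invented for the example; the real thesaurus lives in the network and is not structured like this.

```python
# A toy thesaurus: each path maps to its direct children (invented data).
THESAURUS = {
    "noun": ["name", "animal"],
    "noun.name": ["jan", "rita"],
    "noun.animal": ["dog"],
    "adj": ["possesive"],
    "adj.possesive": ["my", "your"],
}


def parse_thesaurus_variable(token):
    """Split '^subj:noun.name' into the variable name and the thesaurus path."""
    name, path = token.lstrip("^").split(":", 1)
    return name, path


def matches(path, word):
    """True if word is the item at the end of path or one of its descendants."""
    if path.split(".")[-1] == word:
        return True
    return any(matches(f"{path}.{child}", word) for child in THESAURUS.get(path, []))


name, path = parse_thesaurus_variable("^subj:noun.name")
print(name, path, matches(path, "jan"), matches(path, "dog"))
```

The recursive walk also hints at why these variables cost more than statics: a static is a single comparison, while a thesaurus variable may have to search a whole subtree.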

A second feature: sub-topics. This packs a serious punch, whichever way you look at it. In short, it’s possible to reference a single rule or an entire group of rules (a topic) from within another input pattern, like this:
~subject (am|'m|is|'s|are|be) ~object
This single pattern can capture anything from ‘I’m Jan’ or ‘my name is Jan’, through ‘My aunt’s name is Rita’, to ‘The big tree is a little bigger’, and anything in between. Even more interesting, this technique allows you to do something that I call ‘topic-inheritance’, which basically means you can extend or overwrite the behavior of patterns. I plan to use this technique to build a ‘Watson’-like chatbot on top of this common lib.

Finally, ‘InvertedWho’ simply refers to how the memory is used in the common lib. The basic topics like ‘subject’, ‘object’, ‘location’, ‘time’ and ‘numbers’ don’t generate any output, but store the data in 2 memory streams. The first forms the ‘inverted statement’, so ‘I’ becomes ‘you’ and ‘mine’ becomes ‘yours’ (guess what that’s used for). The second stream tries to store the actual meaning (a collection of references to other memory addresses or words, grouped in an organized and structured way). These things are done in the ‘do patterns’, which are hidden by default in the editor but can be made visible on each pattern (or use Shift + Ctrl + D to expand/collapse them all at the same time).
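
The ‘inverted statement’ stream can be pictured with a tiny Python sketch. The mapping table and function name below are hypothetical and deliberately incomplete; in the common lib the inversion is done by do patterns, not by a lookup table like this.

```python
# Hypothetical inversion table, limited to first/second person words that are unambiguous.
INVERT = {
    "i": "you",
    "me": "you",
    "my": "your",
    "mine": "yours",
    "your": "my",
    "yours": "mine",
}


def invert_statement(words):
    """Swap the speaker's point of view word by word; unknown words pass through."""
    return [INVERT.get(word.lower(), word) for word in words]


print(invert_statement("my name is Jan".split()))
```

A stored inverted statement like this can be handed back to the user more or less as-is, which is what the first memory stream is for.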

There are many, many more details and cool features to talk about. Stay tuned.

 

A final pre-release video on how you can call .net functions from within your chatbot. The idea behind this feature is to allow you to extend your chatbot with custom features. This will only be available in the pro version though.

Note: the video is best viewed in max resolution and full screen to see all the details.

 

Check out the new character, called ‘Mika’:

Pretty cool, eh? I think so as well. The character is another of Laticis Imagery’s creations. Ady provided all the images and I assembled them into a single character. The video demonstrates all the available expressions, which can be activated in the output using ‘mark’ SSML tags.

Perhaps some more information on the project: the first release, the basic version, should be ready in a short while, once I have created some content (which should also be a perfect opportunity to work out some of the final details). The basic edition will be a free (as in beer) version. After that, the pro version will be prepared for release. It will contain some more functionality, like user interface automation and/or home automation (not certain yet which to do first).

Some of the features that will be available in the first release:

  • Select if the bot starts the conversation or waits for some input on startup. Opening statements can be declared in the bot’s properties page.
  • You can declare custom memory operations that need to be performed each time the bot starts.
  • ‘Do patterns’ are also executed each time output was generated.
  • Input repetition is recognized (stored in memory as a counter) and can be handled with custom, conditional output patterns.
  • When no patterns matched, the system will use one of the custom fallback outputs.
  • Input patterns are grouped together into a single rule. These patterns share the same set of possible output patterns.
  • Multiple output patterns can be declared for a single rule. You can select if a random item needs to be selected from the list or if each item needs to be used in sequence (useful for story telling bots).
  • Each rule can have its own do patterns, which are used to manipulate the memory.
  • Rules are grouped together in topics (the 2 files that are imported in the video each represent a topic), which are responsible for providing context. This allows you to declare the same pattern in multiple topics (useful for short statements like ‘why, when, yes, no,…’).
  • Additional context can be added through do patterns and can be queried in conditions.
  • It’s possible to declare conditional questions at the level of a topic, meaning that multiple output patterns can share the same questions. The first one whose condition matches will be used for outputs that don’t declare their own question.
  • A single output pattern can link to other output patterns, indicating that it should be used if the rule it belongs to is the answer to a question declared in one of the linked outputs. This is useful to properly handle responses or when the user doesn’t respond as expected.
  • Time and date are supported in the output and conditionals through a variable. When used in combination with the thesaurus, some pretty powerful things can be done.
  • Test-cases for running automated tests on your bot.
  • Synonyms are automatically resolved in the input. This is a very powerful feature that’s able to recognize and replace compound words in the input. For instance,  if an input pattern contains ‘what is’ and the system knows the synonyms for ‘what is’ are ‘whats, what’s, wats, wat is, wat’s’, then you only need to declare 1 input pattern to recognize all of the possible synonyms.
  • Synonyms can be managed from the thesaurus editor.
  • The following operators can be used in the input patterns:
    • () group input together
    • [] option: words between the brackets are optional, not required to be present in the input
    • {} loop: words between the brackets can be found 0, 1 or more times (useful for lists)
    • | choice: the input needs to contain either the left part or the right part of the choice. This can be combined with an option, group or choice, like: [I | you | he | she | we | they]
    • $name: variable declaration: collects words that can be used in the output or conditions.
    • ^path: thesaurus variable declaration: the input needs to contain a word (or compound) that is a child of the specified thesaurus path (very powerful). The actual collected word can be used in the output/conditions like a regular variable.
    • && the and operator allows you to declare groups of words that need to be present in the input, but which can have ‘holes’ in between them, e.g.: (hello) && (what’s your name)
  • Conditions and outputs can also use:
    • #path: declares a data-path into the memory.
    • ~name: to reference topics.
  • There is a built in topic-editor or you can edit them directly in xml format.
  • The built-in topic editor has a spell checker.
  • Patterns with errors have a red line, making them easy to find. The error text can be seen as a tooltip or in the log.
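
To give an idea of what the automatic synonym resolution in the list above amounts to, here is a minimal Python sketch. It is not how the network does it: the function name and the dictionary layout are assumptions, and the sketch simply lowercases the input and replaces the longest matching synonym (single word or compound) by its canonical form before any pattern matching would happen.

```python
def resolve_synonyms(text, synonyms):
    """Replace every known synonym in the input by its canonical form."""
    words = text.lower().split()
    # Index the synonyms by word sequence, so compounds such as 'wat is' also match.
    table = {
        tuple(alt.split()): canonical.split()
        for canonical, alts in synonyms.items()
        for alt in alts
    }
    longest = max((len(key) for key in table), default=1)
    out, i = [], 0
    while i < len(words):
        # Try the longest possible compound first, then shorter ones.
        for size in range(min(longest, len(words) - i), 0, -1):
            key = tuple(words[i:i + size])
            if key in table:
                out.extend(table[key])
                i += size
                break
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)


# The synonyms from the 'what is' example in the feature list.
SYNONYMS = {"what is": ["whats", "what's", "wats", "wat is", "wat's"]}
print(resolve_synonyms("Wat is your name", SYNONYMS))
```

With the input normalized like this, a single input pattern containing ‘what is’ is enough to cover all the variants.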
 

For those who have been wondering what the bleep I have been up to for the past few months: well, check out the video:

Looks pretty cool, eh? So, what happened? Well, in short, I took the neural network designer, removed from view everything complicated that remotely had anything to do with ‘neurons’ but kept all the memory functionality, and added a character engine and a simple but powerful pattern matcher (implemented in neural code, pretty cool I think,… and simple stuff).

The character in this demo (actually called Tara, and still in development) was designed and named by Ady Di Pierro from Laticis Imagery. The drawings were made with DAZ3 and assembled in something very similar to verbot’s CCS file. In fact, all verbot characters should work in this character engine as well, because I only added features to the file format and didn’t change any existing ones. The images were manually assembled for this character because there is no char-editor yet. That is scheduled.

Also check out what Roger Davie (aka Freddy, Admin of the AI dreams forum) did with the forum’s bot:

The images are also rendered with DAZ. He definitely has the visemes already worked out better than in the first demo.

So, how did all of this come to be, you might wonder. Well, I think you can thank Wendell Cowart from the chatterbox challenge for this. He originally contacted me with a request for a new ‘pattern based’ chatbot project. I took this as a nice challenge to demonstrate just how flexible resonating neural networks are. Soon, Patti Roberts of Bildgesmythe also joined in. Together, they basically told me how they would like to have things, which features they were looking for and such. Thus, this little project was able to come into existence at record speed.
Patti and Wendell also helped out a lot with the initial ‘mid development’ testing, which I’m sure you understand is a pretty frustrating job, as things are usually not yet behaving the way they are expected to. So many thanks for ironing out those basic ‘issues’.

Anyway, for those who would like to play with it themselves, a first public beta release will most likely be coming shortly. Just keep in mind that any first release will be a ‘technology preview’, so to speak.

© 2012 Neural Network Design blog