3. Moose’s internal web scraping language

I’m just getting started on documenting the mini programming language built into the Moose for extracting useful data from webpages and RSS feeds. Much more will follow. If you stumble onto this page in its current unfinished form, never mind the gaps.

Command | Command Name | Description | Details page
^ or ; or * or ~ | Delimiters | The very first character determines the "Delimiter" that will separate all the commands from each other. It can be any character, but I suggest ^. | Delimiters
a | Keep After | Find something, Keep whatever is After it. Discards preceding stuff. Whatever remains is called the Chop Result. | KeepAfter
b | Keep Before | Find something, Keep whatever is Before it. Discards subsequent stuff. Whatever remains is called the Chop Result. | KeepBefore
g | Keep After Including | Find something, Keep it and whatever is After it. Discards preceding stuff. Whatever remains is called the Chop Result. | KeepAfterIncluding
h | Keep Before Including | Find something, Keep it and whatever is Before it. Discards subsequent stuff. Whatever remains is called the Chop Result. | KeepBeforeIncluding
SaveAppend | Append to Saved | Appends the current Chop Result to the Saved-Area, after whatever was already there. | SaveAppend
SavePrepend | Prepend to Saved | Inserts (prepends) the Chop Result at the beginning of the Saved-Area, ahead of whatever was already there. | SavePrepend
RestoreDiscarded | Restore Discarded | Restores the subsequent stuff that was discarded. |
RestoreOriginal | Restore Original | Restores the original text, starting again prior to any chopping. |
RestoreUndo | Restore Undo | Restores as if Undoing the last 'Keep After' chop. |
RestoreSaved | Restore Saved | Moves Saved-Area content back into the Chop Result, then Clears the Saved-Area. |
Continue | Continue | Continue does Append to Saved-Area followed by Restore Discarded. |
t | Text | Inserts some text, Appended to the Saved-Area. |
' ' (spacebar) | Text spaced | Inserts a space, then some text, Appended to the Saved-Area. |
u | Text in front | Inserts some text, ahead of everything else in the Saved-Area. |
+ | Plus | Converts the current Chop Result into a numeric value, then Adds whatever follows the plus sign. |
- | Minus | Converts the current Chop Result into a numeric value, then Subtracts whatever follows the minus sign. |
* | Multiply | Converts the current Chop Result into a numeric value, then Multiplies it by whatever follows. |
/ | Divide | Converts the current Chop Result into a numeric value, then Divides it by whatever follows. |
^ | Power | Converts the current Chop Result into a numeric value, then uses whatever follows as the Exponent. |
% | Modulo | Converts the current Chop Result into a numeric value, and applies Modulo to it. |
Abs | Absolute value | Negative numbers are turned into Positive numbers. |
d | Decimal points | Rounds the Chop Result to show 0 or more decimal places. e.g. D2 means 2 decimal places. |
Trim | Trim | Removes extra whitespace from the Chop Result. |
Int | Int | Discards the fractional part of a number in the Chop Result, without rounding. |
Replace | Replace | Replaces the first matching Thing in the Chop Result with a Replacement. |
ReplaceAll | ReplaceAll | Replaces All the matching Things in the Chop Result with the Replacement. |
GetXML | GetXML | Gets a value from within XML tags, substituting a default value if Empty or Fail. |
If | If | Tests the Chop Result. If the test is 'True', does an inline action, or whatever is in between IF..ENDIF or IF..ELSE. |
Elseif | Elseif | |
Else | Else | |
Endif | Endif | |
If tests | | See the table below for what IFs can test for. |
If inline actions | | See the table below for inline things like If:Empty:Exit |
If And Or | | How to use 'And' and 'Or' logic with IFs. |
Clear | Clear | Clears the current Chop Result. |
ClearAll | ClearAll | Clears the current Chop Result and the Saved-Area. |
ClearSaved | ClearSaved | Clears the Saved-Area. |
Left | Left | Keeps the Left-most characters of the Chop Result and discards the rest. e.g. Left:3 keeps 3 characters. |
Right | Right | Keeps the Right-most characters of the Chop Result and discards the rest. e.g. Right:4 keeps 4 characters. |
Mid | Mid | Keeps characters from the Middle of the Chop Result, discarding the rest. e.g. Mid:5:10 starts at the 5th character and keeps 10 characters. |
StripHTML | StripHTML | Removes as many HTML tags as it can from the current Chop Result, to leave only plain text. |
StripTags | StripTags | Removes XML tags but keeps what is inside the tags. |
StripHeader | StripHeader | Removes the HTTP header. |
RegEx | RegEx | Applies a 'Regular Expression' (RegEx) to the current Chop Result. |
Tags | Tags | Puts <xml> Tags:___ before and after the Chop Result, then Continue. |
Debug | Debug | Starts Debug mode to step through commands one at a time. |
GetCDATA | GetCDATA | Finds the next nearest CDATA and extracts its content into the Chop Result, then Continue. |
rsslistf | RSS list forward | Separates web page items into an RSS-style list, going forward. |
rsslistb | RSS list backward | Separates web page items into an RSS-style list, going backwards. |
PrepRss | Prep as RSS | Prepares a webpage as if it were a list of things, like an RSS feed. |
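
To make the chop commands listed above a bit more concrete, here is a rough Python sketch of how Keep After, Keep Before, the Saved-Area and Continue might fit together. It is only an illustration, not the Moose's actual code: the class name `ChopState`, its method names, and the choice to leave the Chop Result unchanged when the search text is not found are all my assumptions.

```python
class ChopState:
    """Hypothetical model of the Chop Result and Saved-Area (not the Moose's real code)."""

    def __init__(self, text):
        self.original = text   # untouched source text
        self.chop = text       # current Chop Result
        self.discarded = ""    # subsequent text dropped by the last Keep Before
        self.saved = ""        # the Saved-Area
        self.found = False     # whether the last find succeeded

    def keep_after(self, needle, including=False):
        """Like 'a' (or 'g' when including=True): drop everything before the match."""
        i = self.chop.find(needle)
        self.found = i >= 0
        if self.found:  # assumption: a failed find leaves the Chop Result alone
            start = i if including else i + len(needle)
            self.chop = self.chop[start:]

    def keep_before(self, needle, including=False):
        """Like 'b' (or 'h' when including=True): drop everything after the match."""
        i = self.chop.find(needle)
        self.found = i >= 0
        if self.found:
            end = i + len(needle) if including else i
            self.discarded = self.chop[end:]  # remembered for Restore Discarded
            self.chop = self.chop[:end]

    def save_append(self):
        """Append the Chop Result to the Saved-Area."""
        self.saved += self.chop

    def restore_discarded(self):
        """Bring back the subsequent text that Keep Before threw away."""
        self.chop = self.discarded
        self.discarded = ""

    def continue_(self):
        """Continue = Append to Saved-Area, then Restore Discarded."""
        self.save_append()
        self.restore_discarded()


state = ChopState("<title>Moose</title><link>http://example.com</link>")
state.keep_after("<title>")    # Chop Result now starts at "Moose"
state.keep_before("</title>")  # Chop Result is "Moose"
state.continue_()              # save "Moose", carry on with the rest
state.keep_after("<link>")
state.keep_before("</link>")   # Chop Result is "http://example.com"
state.save_append()
print(state.saved)             # Moosehttp://example.com
```

The point of the sketch is the rhythm: chop down to one value, Continue, then chop again from where the last chop left off.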

Below are the various IF tests

Test, Example | Description | Link for details
If:s= (e.g. If:s=fish) | True if the string in the Chop Result is equal to the text that follows. |
If:s<> (e.g. If:s<>fish) | True if the string in the Chop Result is not equal to the text that follows. |
If:sl= (e.g. If:sl=5) | True if the length of the string in the Chop Result is equal to the number that follows. |
If:sl<> (e.g. If:sl<>5) | True if the length of the string in the Chop Result is not equal to the number that follows. |
If:sl< (e.g. If:sl<5) | True if the length of the string in the Chop Result is less than the number that follows. |
If:sl> (e.g. If:sl>0) | True if the length of the string in the Chop Result is greater than the number that follows. |
If:sl<= (e.g. If:sl<=5) | True if the length of the string in the Chop Result is less than or equal to the number that follows. |
If:sl>= (e.g. If:sl>=5) | True if the length of the string in the Chop Result is greater than or equal to the number that follows. |
If:n= (e.g. If:n=12) | True if the number in the Chop Result is equal to the number that follows. |
If:n<> (e.g. If:n<>5) | True if the number in the Chop Result is not equal to the number that follows. |
If:n> (e.g. If:n>5) | True if the number in the Chop Result is greater than the number that follows. |
If:n< (e.g. If:n<5) | True if the number in the Chop Result is less than the number that follows. |
If:n>= (e.g. If:n>=5) | True if the number in the Chop Result is greater than or equal to the number that follows. |
If:n<= (e.g. If:n<=5) | True if the number in the Chop Result is less than or equal to the number that follows. |
If:Empty | True if the Chop Result is empty (length of zero). |
If:Found | True if a previous Find command was used and it succeeded. |
If:NotFound | True if a previous Find command didn't match. |
If:f (e.g. If:frss) | Tries to Find a word or string within the Chop Result, allowing 'If' to test for Found or NotFound. |
If:x (e.g. If:xrss) | If the letters after x are 'not found', the IF test succeeds with NotFound true. |
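
As a rough way to read the IF tests above, here is a small Python sketch that evaluates a test string against a Chop Result. It is a guess at the semantics, not the Moose's real parser: the function names `evaluate_if` and `_compare`, exact-case matching of the keywords, and raising an error on a non-numeric Chop Result in `If:n` tests are all assumptions. It also accepts `<` and `>` for plain strings, which the table does not list.

```python
import operator

# Two-character operators come first so "<=" is not mistaken for "<".
_OPS = [("<>", operator.ne), ("<=", operator.le), (">=", operator.ge),
        ("=", operator.eq), ("<", operator.lt), (">", operator.gt)]


def _compare(left, rest):
    """Apply the operator at the start of `rest` to `left` and the value after it."""
    for symbol, func in _OPS:
        if rest.startswith(symbol):
            right = rest[len(symbol):]
            if not isinstance(left, str):
                right = float(right)  # numeric tests compare numbers, not text
            return func(left, right)
    raise ValueError(f"no comparison operator in {rest!r}")


def evaluate_if(test, chop, found=False):
    """Return True or False for a test such as 'If:sl>0' (hypothetical helper)."""
    if not test.startswith("If:"):
        raise ValueError(f"not an IF test: {test!r}")
    body = test[len("If:"):]
    # Named tests are checked first so 'Found' is not read as 'f' + 'ound'.
    if body == "Empty":
        return len(chop) == 0
    if body == "Found":
        return found
    if body == "NotFound":
        return not found
    if body.startswith("sl"):   # string length; must be checked before plain 's'
        return _compare(len(chop), body[2:])
    if body.startswith("s"):    # string comparison
        return _compare(chop, body[1:])
    if body.startswith("n"):    # numeric comparison
        return _compare(float(chop), body[1:])
    if body.startswith("f"):    # succeeds when the text is found
        return body[1:] in chop
    if body.startswith("x"):    # succeeds when the text is not found
        return body[1:] not in chop
    raise ValueError(f"unknown IF test: {test!r}")


print(evaluate_if("If:s=fish", "fish"))        # True
print(evaluate_if("If:sl>0", ""))              # False
print(evaluate_if("If:n>=5", "4.5"))           # False
print(evaluate_if("If:frss", "an rss feed"))   # True
print(evaluate_if("If:xrss", "a plain page"))  # True
```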
