YeenDeer softness blog (programming and electronics)

Ellie the Yeen is a soft YeenDeer that mweeoops and does programming

View on GitHub About Bots Projects Tags

Escapes

Escaping is when you want to place some code in some other code and you modify it first in order to. Here are some examples of the concept and how it might look in different situations.

Escaping code that is generated somehow is a extremely important practice. SQL injections are a huge issue when it comes to code that is not properly escaped therefore you should use prepared statements whenever you can in SQL. In this article however we are going to look at places where the code is escaped.

There are various examples of this but the definition of it would be that you have any form of data structure that is usually a string and want to fit it inside another data structure. Things such as numbers tend to not be needed to be escaped as most languages use the same format for numbers but it is a good practice to do it anyway. You could call this a form of code generation as the goal is to produce a textual representation of the data inside the target language as a literal.

Now the goal is to produce something that does not break the syntax of the target language. While prepending and appending double quote " might work many times to escape something it does not work on something actually containing double quote and while prepending \ to each double quote might work in those cases it might not be the best representation of the string.

Due to the nature of this there might be multiple representation of the same string such as some strings contain characters that might or might not be escaped such as newlines that might either be an actual newline written in the code or the escaped version \n.

HTML and XML

HTML is escaped by first replacing & with with &amp; or &#38; then < with &lt; or &#60; and > with &gt; or &#62;. The characters &, < and > have special meaning in HTML for example a HTML tag looks like <tag></tag>. XML can have some additional restrictions of what is allowed and there might be other additional restrictions on both XML and HTML when it comes to unprintable characters.

Attribute

Attributes in HTML have additional restrictions of what characters need to be escaped. It also has some characters that technically no longer needs to be escaped like < and > but " needs to be escaped into &quot; or &#34; and ' into &apos; or &#39;. There are esceptions and special cases to this but it is better to escape than not escape to prevent accidents.

Optional

It is possible to escape characters that you do not need to escape for readability reasons such as replacing \n an actual newline with &#10;. An example of this is that you can write

<meta name="og:description" content="something
with
lines" />

and it will parse just fine but you should really write

<meta name="og:description" content="something&#10;with&#10;lines" />

if you want some better looking indentation. As you see &#10; is a newline since it is of the value 10 in base 10 which is a newline in encodings such as ASCII and UTF-8.

See html.escape to escape HTML in Python.

URL

URLs use something called URL encoding to escape characters that have a special meaning inside URLs or in application/x-www-form-urlencoded HTTP requests. URL encoding works that all characters that have special meaning or is not printable are replaced with a percent sign % and two hex characters which means that it uses an alphabet not only 0-9 but also A-Z. This means that newline encoded is %0A rather than &#10; as it is in HTML as the number base is 16 and not 10.

The characters that should be escaped in an URL is both the non printable characters which are under 32 in ASCII and UTF-8 and above 126 in ASCII and the following printable chars.

@&*][,"|!(' #;<^=`+%$)\}:{>?/
# Which escapes into
%40%26%2A%5D%5B%2C%22%7C%21%28%27%20%23%3B%3C%5E%3D%60%2B%25%24%29%5C%7D%3A%7B%3E%3F%2F
# Which is 3 times bigger

There are more things to consider for example a special rule that says that a space URL in a browser might either be written as + as plus %20 or an actual space ` ` which will be converted into %20 unlike the +. Certain APIs might have strange rules of when to use plus and when to use + or %20 such as the Telldus API in the past.

There are some other oddities such as / is not escaped by Python’s quote but it is using quote_plus and I have no idea why but found out when checking the Python source code for this article.

As always it is possible to escape characters that do not need to be escaped like you can escape the entirety of the alphabet in uppercase which looks like this

%41%42%43%44%45%46%47%48%49%4a%4b%4c%4d%4e%4f%50%51%52%53%54%55%56%57%58%59%5a

which could maybe be good if you want to hide some code so you accidentally do not remember it or something like that or maybe you are trying to sneak past some code past some anti cheat or antivirus program.

You can use the following function to completely URL encode any string in Python.

def escall(i): return ''.join('%%%02x' % ord(a) for a in i)

See urllib.quote to escape URLs in Python.

JSON

JSON which is the object notation for JavaScript used for data transport is quite simple that it is based on C and has similar escape codes and it is very well documented. It is valid code in JavaScript and sort of valid Python. There is a funny trick you can do in Python where you enter the line

null,false,true=None,False,True

and after it is in scope you can usually just eval to get the JSON out which is something you should really not do with untrusted data.

JSON uses unicode escape codes for escaping so if we make an example function to escape it would look like this

def escall2(i): return ''.join('\\u%04x' % ord(a) for a in i)

and would escape every single character to a string in JSON.

Anyway this some basic examples how escaping in programming languages work and I hope you had fun reading.

*Mweeoops*

By ellietheyeen 25 October 2024 Permalink Tags: javascript json python


Instance: