Our neural network can process any kind of plain text, as long as it is provided as a .txt file in your project's input folder. You can also provide several text files instead of just one.
Think of anything that you want to mix up, reproduce, or produce more of, e.g. Trump tweets, any dolphin-related content, Pablo Neruda poetry, ASCII comics, images converted to text, rap lyrics, artwork descriptions, the Declaration of Human Rights, other laws and acts, Seinfeld episode plots, music notation, all your favorite books at the same time, Bible psalms, Confucius’ teachings, H&D Slack history, or something that you have written yourself.
You will need at least a few pages of content to get the model running at all. Even then, the output will very quickly become similar to the input ("overfitting"). The more input there is, the more variation there will be in the output, but the longer the training will take. 50 KB to 5 MB is probably a good amount of text to start with.
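To check whether your input falls roughly into that size range, a small script can add up the sizes of the text files in the input folder. This is only a sketch: the function names `total_input_size` and `size_verdict` are made up for illustration, the folder path is whatever your project uses, and reading "KB" as 1000 bytes is an assumption.

```python
from pathlib import Path

# Rough guide taken from the text above, not a hard limit.
# Assumption: 1 KB = 1000 bytes and 1 MB = 1000 * 1000 bytes.
MIN_BYTES = 50 * 1000
MAX_BYTES = 5 * 1000 * 1000


def total_input_size(input_dir):
    """Return the combined size in bytes of all .txt files directly inside input_dir."""
    return sum(path.stat().st_size for path in Path(input_dir).glob("*.txt"))


def size_verdict(num_bytes):
    """Classify a byte count against the suggested range."""
    if num_bytes < MIN_BYTES:
        return "small"  # likely to overfit quickly
    if num_bytes > MAX_BYTES:
        return "large"  # training will take longer
    return "ok"
```

For example, `size_verdict(total_input_size("path/to/your/input"))` tells you whether it makes sense to collect more text before training.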
Even in plain text, there are many ways to format your content, e.g. Markdown, XML, JSON or YAML. You can also use empty lines, tabs, spaces or any other markup that you can think of, e.g.:
Tweet 2018-01-13 13:14 Sat
AMERICA FIRST!
Hearts: 152962
Retweets: 37694
Published with: Twitter for iPhone

# Leave me a place underground
## From: Las Piedras del Cielo
## By: Pablo Neruda
Leave me a place underground, a labyrinth,
where I can go, when I wish to turn,
without eyes, without touch,
in the void, to dumb stone,
or the finger of shadow.

I said: I am learning machine learning
U said: Speaking of machine learning, did you see the dolphin swimming in the river this morning?
The model only indexes a fixed number of distinct characters (98 by default), all within the ASCII range. It won’t be able to recognize or produce characters outside of this range.
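Before training, it can help to find out which characters in your input fall outside the ASCII range. The sketch below counts them. It assumes your files are UTF-8 encoded, and the function names are hypothetical. It checks only for code points above 127, because the exact set of 98 characters the model indexes is not listed here.

```python
from collections import Counter
from pathlib import Path


def non_ascii_characters(text):
    """Count every character whose code point lies outside the ASCII range (0 to 127)."""
    return Counter(ch for ch in text if ord(ch) > 127)


def distinct_characters(text):
    """Return how many different characters occur in the text."""
    return len(set(text))


def report(input_dir):
    """Print a short per-file summary for all .txt files directly inside input_dir."""
    for path in sorted(Path(input_dir).glob("*.txt")):
        # Assumption: input files are encoded as UTF-8.
        text = path.read_text(encoding="utf-8")
        outside = non_ascii_characters(text)
        print(path.name, "distinct characters:", distinct_characters(text))
        for ch, count in outside.most_common():
            print("  outside ASCII:", repr(ch), "x", count)
```

Typical offenders are curly quotes, accented letters and emoji. Replacing them with ASCII equivalents (or removing them) before training keeps that content from being lost.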
my-new-project/input and continue.