Atlas zur deutschen Alltagssprache

Lauch or Porree (leek)? Schornstein or Kamin (chimney)? Reicht’s or langt’s jetzt (is that enough now)? Where is the idiom of the flowerpot to be won (Blumentopf) common? And what do people in other parts of our language area actually call the cloth handkerchief?

These and other questions are answered by the surveys in the Atlas zur deutschen Alltagssprache (AdA, roughly "Atlas of Everyday German") by the German studies researchers at the University of Augsburg. Highly interesting.

Love is in the air

Well, it’s that time of year again, when you and your beloved think of each other while getting whipped. Happy Lupercalia, everyone! Grab your dog and goats and get ready for some bloody sacrif……. pardon me? Oh wait, you weren’t expecting an ancient (pre-)Roman festival about wolves and fertility?

Okay, so you’re more into that other holiday that’s more commonly celebrated these days. Fine with me. So today we are celebrating the 1739th anniversary of the beheading of Valentinus of Interamna, patron saint of beekeepers, by giving our lovers flowers, candy and things that are not quite G-rated. Yay for us. (That’s the most common theory of V-Day’s origin. And somewhat fitting… I mean, if Christians can revere the “tool” that was used to execute their prophet, lovers can celebrate their love on the day one of their advocates was inhumed.)

Saudi Arabia preferred to ban the pagan V-Day: no signs of love allowed, not even red flowers, let alone roses. The police are even searching flower shops for “suspicious” wares. Brits, on the other hand, prefer to bet on whether Prince William will propose.

Isn’t it a funny world?

How many URLs are there?

I’ve been wondering… how many URLs are out there on the internet? Not just domain names, but real URLs, including files and parameters, across different protocols as well. I think that’s quite a few.

The thought occurred to me while working on the concept of lonks. For the community edition, I want to save URLs in a separate table and refer to them only through IDs, so that they are not directly tied to the bookmark entries. That also follows the idea of a somewhat normalized database and makes anonymizing referers easier.
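A minimal sketch of that layout, using Python’s built-in sqlite3 module. The table names (`urls`, `bookmarks`), the `add_bookmark` helper and the sample IDs are hypothetical; this only illustrates the "store each URL once, reference it by ID" idea, not lonks’ actual schema.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE urls (
    id  TEXT PRIMARY KEY,     -- opaque random ID, format still to be decided
    url TEXT NOT NULL UNIQUE  -- every distinct URL is stored exactly once
);
CREATE TABLE bookmarks (
    id     INTEGER PRIMARY KEY,
    owner  TEXT NOT NULL,
    url_id TEXT NOT NULL REFERENCES urls(id),  -- bookmarks point to URLs by ID
    title  TEXT
);
"""

def add_bookmark(conn, owner, url, title, make_id):
    """Reuse the URL's existing ID if present, else store it under make_id()."""
    row = conn.execute("SELECT id FROM urls WHERE url = ?", (url,)).fetchone()
    if row is not None:
        url_id = row[0]
    else:
        url_id = make_id()
        conn.execute("INSERT INTO urls (id, url) VALUES (?, ?)", (url_id, url))
    conn.execute(
        "INSERT INTO bookmarks (owner, url_id, title) VALUES (?, ?, ?)",
        (owner, url_id, title),
    )
    return url_id

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
ids = iter(["3f9c", "a71e"])  # stand-in for a real random ID generator
a = add_bookmark(conn, "alice", "http://example.com/?q=1", "Example", lambda: next(ids))
b = add_bookmark(conn, "bob", "http://example.com/?q=1", "Same page", lambda: next(ids))
print(a, b)  # 3f9c 3f9c  (the second bookmark reuses the stored URL)
```

Because bookmarks only carry `url_id`, a referer link can expose just that ID rather than anything about the bookmark entry itself.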

But the (random) IDs have to be the right size from the start, so they last for eternity (or at least close to it). Otherwise (say, if the IDs had to grow longer later on), one could tell which URLs were added after a certain point in time. On the other hand, they should be short enough not to waste storage space or make the referer URLs too long.

Just using numbers looks lame. But I can’t use all characters either, or sooner or later an ID will spell a real word. Maybe even a swear word. You don’t want http://lonks/nr1idiot to point to your site, do you? Hex is a bit restricted as well, but it’s the best of the common systems.
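One way to keep random IDs from spelling most words is to leave the vowels out of the alphabet entirely (the "No vowels" option in the list further down). The sketch below assumes that means digits plus lowercase consonants, with y counted as a consonant; `random_id` is a hypothetical helper.

```python
import secrets
import string

# Assumption: "no vowels" = digits + lowercase consonants (y kept as a consonant).
VOWELS = set("aeiou")
NO_VOWEL_ALPHABET = string.digits + "".join(
    c for c in string.ascii_lowercase if c not in VOWELS
)

def random_id(length, alphabet=NO_VOWEL_ALPHABET):
    """Return `length` characters drawn uniformly at random from `alphabet`."""
    return "".join(secrets.choice(alphabet) for _ in range(length))

print(len(NO_VOWEL_ALPHABET))  # 31
print(random_id(8))            # random, e.g. something like "k7tz0qbw"
```

Digits can still stand in for letters ("nr1…"), so this makes word-like IDs rarer rather than impossible.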

In addition, I thought of a system that splits the alphabet into chunks, which makes it virtually impossible to form a word. I still have to figure out whether that system is any good and how many IDs I can squeeze out of it with a reasonable number of digits. If that doesn’t work out, I guess I’ll stick to 4-16 digit hex (64 bit).
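The hex fallback can be sketched as a random 64-bit number written in hex and zero-padded to at least 4 digits. Since 2^64 − 1 is `ffffffffffffffff`, the result never exceeds 16 digits. The function names are hypothetical.

```python
import secrets

def format_hex_id(n, min_digits=4):
    """Lowercase hex for n, zero-padded to at least min_digits digits."""
    return format(n, f"0{min_digits}x")

def hex_id(min_digits=4):
    """Random 64-bit ID: uniform in [0, 2**64), so at most 16 hex digits."""
    return format_hex_id(secrets.randbits(64), min_digits)

print(format_hex_id(0x2a))         # 002a
print(format_hex_id(2**64 - 1))    # ffffffffffffffff
print(hex_id())                    # random, 4 to 16 hex digits
```

With uniform 64-bit values, about 15 in 16 IDs use all 16 digits. Shorter ones only appear when the leading hex digits happen to be zero.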

Okay, let’s do the math with 4-16 digits (always including numbers), just for fun.

  • hex
  • 3 no-vowel chunks
  • No vowels
  • All characters
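The counting itself is simple: with an alphabet of size c and IDs of 4 to 16 characters, there are c^4 + c^5 + … + c^16 possible IDs. The alphabet sizes below are assumptions (case-insensitive, digits included). The "3 no-vowel chunks" scheme isn’t spelled out, so it is left out.

```python
# Assumed alphabet sizes (case-insensitive). The "3 no-vowel chunks" scheme
# isn't specified in the post, so it is not modelled here.
ALPHABETS = {
    "hex": 16,                  # 0-9, a-f
    "no vowels": 10 + 21,       # digits + consonants (y counted as a consonant)
    "all characters": 10 + 26,  # digits + letters
}

def id_count(alphabet_size, min_len=4, max_len=16):
    """Number of distinct IDs whose length is between min_len and max_len."""
    return sum(alphabet_size ** k for k in range(min_len, max_len + 1))

for name, size in ALPHABETS.items():
    print(f"{name:>14}: {id_count(size):,}")
```

For hex this comes to 19,676,527,011,956,850,688 (about 1.97 × 10^19). That is only 16/15 of the 16-digit IDs alone, so allowing shorter IDs adds little.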

Maybe a case-sensitive character system will help reduce the number of digits and/or increase the possible number of IDs. But maybe hex is enough… considering there won’t be any need to save every URL on the internet anyway.
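To see how much case sensitivity shortens the IDs, you can ask how many characters each alphabet needs to cover the same 2^64 IDs as 16 hex digits. This sketch assumes "case-sensitive" means digits plus upper- and lowercase letters (62 symbols). It uses integer arithmetic to avoid floating-point rounding.

```python
def min_length(alphabet_size, id_space=2**64):
    """Smallest L with alphabet_size**L >= id_space, computed exactly."""
    length, capacity = 0, 1
    while capacity < id_space:
        capacity *= alphabet_size
        length += 1
    return length

for name, size in [("hex", 16),
                   ("case-insensitive alphanumeric", 36),
                   ("case-sensitive alphanumeric", 62)]:
    print(f"{name}: {min_length(size)} characters")
```

This gives 16, 13 and 11 characters respectively. A case-sensitive alphabet saves about five characters per ID over hex for the same number of IDs.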

Am I thinking too much? Or am I just megalomaniac? Still the question remains… how many URLs are there?

Update: I just did the number crunching on a case-sensitive version of the 3 no-vowel chunks: 36,349,704,372,835,319,666,931. A nice intermediate number. It looks mysterious and leet as well, so I might go for that. There are so many possibilities that there will rarely be any need to generate another random ID because the first one is already in use. Query and search speed will be an interesting factor in the end, but I guess I’ll solve that problem when it arises.
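The "just generate another one if it’s taken" approach fits in a small retry loop. If n of N possible IDs are already used, each fresh draw collides with probability n/N, so with N around 3.6 × 10^22 retries should almost never happen. The chunk alphabet isn’t described, so `alphabet` and `length` are parameters here, and `unique_id` is a hypothetical helper. In a real database the `used` set would be a lookup against the ID column (ideally backed by a unique constraint).

```python
import secrets

def unique_id(used, alphabet, length, max_tries=10):
    """Draw random IDs until one is not in `used`, then record and return it."""
    for _ in range(max_tries):
        candidate = "".join(secrets.choice(alphabet) for _ in range(length))
        if candidate not in used:
            used.add(candidate)
            return candidate
    # Repeated collisions mean the ID space is nearly full for this length.
    raise RuntimeError("ID space looks crowded; consider longer IDs")

used = set()
new_ids = [unique_id(used, "0123456789abcdef", 16) for _ in range(1000)]
```

The `max_tries` cap is a design choice. Rather than looping forever, it fails loudly, which signals that the IDs are too short for the number of URLs stored.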