Tuesday, February 25, 2014

Network simulations and "realistic" results. The WSN case.

This post is not a rant; it's more of a philosophical one. I hope you don't mind.

Network simulators are widely used tools in academia (from teaching to research) and industry (from research to pre-deployment evaluation of new technologies). However, I have found that simulators are often used without any thought about how close to reality the results will be, leading to disaster (sometimes) and to useless research (more often).

There's a nice presentation by Sally Floyd (http://www.icir.org/floyd/talks/WNS2-Oct06.pdf) discussing what a simulator's goal should be. I couldn't agree more: the simulator's user should first figure out why he/she is using the tool, and then pick the appropriate one.

Now, this is often not done. More often than not, a simulator is chosen for one of the following compelling reasons:

  1. Everybody in the lab is using it (or, my company always uses it).
  2. I already know it, why switch.
  3. There's a model in the simulator that seems to suit my need.
  4. Etc.
From this list, the most relevant point is missing:
  • Will the results be useful for what I'm looking for?
Let's take an example: Wireless Sensor Networks (WSN) and the Internet of Things (IoT).

In order to simulate a WSN, you have to have a model, and the model must reflect an actual scenario.

L1/2, the PHY and MAC layers. 

Usually WSNs use IEEE 802.15.4, but it's not the only choice. For example, Body Area Networks can use 802.15.6 or Bluetooth LE.
Even the same protocol (e.g., 802.15.4) has different variants, and in each one something changes: packet framing, access methods, etc. 802.15.4 at 2.4 GHz and in the sub-GHz bands are slightly different, and 802.15.4e is completely different.

L3, it's IP...

Nope, it's not "simply" IP. The IETF is advocating the use of 6LoWPAN, while ZigBee uses a different approach. Other (proprietary) systems use completely different things.

Routing

On this topic, everybody has re-invented the wheel: RPL, RIME, LEACH, HERD, [Controlled] Flooding, whatever. They're all different.

L4 and above protocols

You'd say that things should go better here... no.
UDP, TCP, HTTP, CoRE/CoAP, OMA variants, the ZigBee ones, etc. Name a random one, and chances are that somebody is using it.

And now the most astonishing one...

Channel models

Simulating any wireless channel is notoriously challenging. The model, no matter how good it is, will never exactly capture everything in a real system: interference, scattering objects, etc.
To add to the complexity, consider this: two devices using exactly the same protocol can have completely different performance. I have personally seen two devices from two different vendors perform drastically differently: one could reach 15 m, the other 50 m.
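Just to put numbers on it: with a log-distance path-loss model, a radio whose receiver is a few dB less sensitive loses range surprisingly fast. Here's a back-of-the-envelope sketch (plain C++, no simulator needed; the 10 dB gap and the path-loss exponent are assumptions, not measured values):

#include <cmath>
#include <cstdio>

// Log-distance path loss: PL(d) = PL(d0) + 10 * n * log10 (d / d0).
// A receiver that is deltaDb less sensitive than another loses a
// factor of 10^(deltaDb / (10 * n)) in range.
int main ()
{
  const double n = 2.0;          // path-loss exponent (free-space-like, assumed)
  const double deltaDb = 10.0;   // hypothetical sensitivity gap between radios
  const double goodRange = 50.0; // metres, the "good" radio

  double badRange = goodRange / std::pow (10.0, deltaDb / (10.0 * n));
  std::printf ("degraded range: %.1f m\n", badRange); // prints ~15.8 m
  return 0;
}

A mere 10 dB, well within vendor-to-vendor variation, is enough to turn a 50 m radio into a 16 m one.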

This seems a minor detail, but when your scenario is "let's place X devices in the area", you'd like to know how many are needed to ensure network connectivity. Moreover, the node density should be driven by the actual phenomenon to be observed, not by how crappy the radio is.

Summarizing: panic. Mis-model any one of the above and your simulations will still give you results. False ones, like a $3 coin.

However, I said I wasn't going to post a rant. So, here's the good news: what ns-3 can do for you.
  1. Channel model: it can be easily changed. It's even possible to use experimental data (but you'll need the BER vs. distance curve). See the sketch after this list.
  2. PHY/MAC: the lr-wpan model is currently under review. It implements the equivalent of Contiki's NULLMAC, but it's possible (not easy, but possible) to extend the model to mimic other MAC protocols. Mind that this point is extremely important for realistic simulations.
  3. IP: IPv6 and 6LoWPAN are supported out of the box (again, see the sketch below).
  4. Routing: bad news here. I'm working on an RPL implementation, but it's taking AGES. The protocol is very complex. And believe me when I say: actual OSes do not have complete implementations. They have just the bare minimum (and sometimes even that is buggy).
  5. L4 and above: UDP is there, the upper layers are not. However, DCE could come to the rescue here. On the other hand, simulating the application layers exactly is usually out of scope.
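To make points 1 and 3 concrete, here's a minimal sketch of how the pieces fit together. It assumes the lr-wpan and sixlowpan modules from a recent ns-3 (the helper APIs may differ slightly between releases, so treat it as a starting point, not gospel):

#include "ns3/core-module.h"
#include "ns3/network-module.h"
#include "ns3/internet-module.h"
#include "ns3/mobility-module.h"
#include "ns3/spectrum-module.h"
#include "ns3/propagation-module.h"
#include "ns3/lr-wpan-module.h"
#include "ns3/sixlowpan-module.h"

using namespace ns3;

int main (int argc, char *argv[])
{
  NodeContainer nodes;
  nodes.Create (2);

  // The channel model needs node positions.
  MobilityHelper mobility;
  mobility.SetMobilityModel ("ns3::ConstantPositionMobilityModel");
  mobility.Install (nodes);

  // Point 1: the channel is pluggable. Build it explicitly and attach any
  // PropagationLossModel, including one derived from measured BER vs.
  // distance curves.
  Ptr<SingleModelSpectrumChannel> channel = CreateObject<SingleModelSpectrumChannel> ();
  channel->AddPropagationLossModel (CreateObject<LogDistancePropagationLossModel> ());
  channel->SetPropagationDelayModel (CreateObject<ConstantSpeedPropagationDelayModel> ());

  // 802.15.4 devices, all in PAN 0, attached to our custom channel.
  LrWpanHelper lrWpanHelper;
  NetDeviceContainer devices = lrWpanHelper.Install (nodes);
  lrWpanHelper.AssociateToPan (devices, 0);
  for (uint32_t i = 0; i < devices.GetN (); ++i)
    {
      DynamicCast<LrWpanNetDevice> (devices.Get (i))->SetChannel (channel);
    }

  // Point 3: IPv6 + 6LoWPAN on top of the 802.15.4 devices.
  InternetStackHelper internet;
  internet.Install (nodes);

  SixLowPanHelper sixlowpan;
  NetDeviceContainer sixDevices = sixlowpan.Install (devices);

  Ipv6AddressHelper ipv6;
  ipv6.SetBase (Ipv6Address ("2001:db8::"), Ipv6Prefix (64));
  ipv6.Assign (sixDevices);

  Simulator::Stop (Seconds (10));
  Simulator::Run ();
  Simulator::Destroy ();
  return 0;
}

From here on it's plain ns-3: install a UDP application on top and you have an end-to-end 6LoWPAN simulation.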
I'd say that ns-3 is not yet ready for IoT/WSN simulations out of the box, but we're close. Very close.

Have fun simulating, and don't forget to check the models before using them!

About the "help me finding my thesis" request

Today I want to talk about a strange phenomenon I've seen recently: students asking online for thesis ideas.

Nothing strange, you'd say: exchanging ideas is a very good thing. What puzzles me is the content of the requests.

Working at a university, I'm used to assigning and supervising Master's and Ph.D. theses. When a student comes to me to ask for a thesis topic, we discuss at length the research topics we work on in our lab, along with the student's interests and background. Then we settle on a suitable topic and I give the student some source material to study.

During the thesis work, we also have periodic meetings to check the student's progress and discuss how the (inevitable) issues can be solved.

I expect the student to play an active role in his/her work: suggesting things I didn't think of, finding new ways to solve issues, and so on. To do that, I (of course) need to know the topic he/she is working on.

And here comes the thing that is puzzling me.

I keep finding people asking in user groups (or by private e-mail) for a research topic. It's as if their tutors never discussed possible topics with them, and just outlined an extremely broad area.
E.g., "find a topic related to MANETs or VANETs" (Mobile Ad-Hoc Networks and Vehicular Ad-Hoc Networks, respectively).

Now, these areas are so broad that a tutor cannot possibly follow all the research topics that could come up. Some examples for VANETs:

  • Intra-vehicular channel model.
  • Fast channel scanning/allocation.
  • Efficient resource management.
  • QoS support.
  • Routing (Location-based, semantic, cluster based, etc.).
  • Cluster forming.
  • IP[v6] and Mobile IPv6.
  • Traffic patterns.
  • Resource discovery algorithms.
  • Mobile cloud networking.
  • Mobile cloud applications.
  • Security (e.g., how to join a secure cluster).
  • Privacy in VANETs.
And those are just the ones that come to mind. I deliberately left out the more specific topics related to the actual technologies, i.e., 802.11p, 802.15.x, LTE, D2D, etc., plus all the choices about the protocols to be used in the different parts of the network.

What happens if the student comes up with an idea and his/her tutor is not actively involved in that research topic? Because, let's be serious, nobody can know everything. The tutor will not be able to judge the idea's difficulty and originality, nor to spot the hidden issues in it (and the student might not have a clue either).

Result: a thesis that's too hard, or too simple. And that's if they're lucky.
If they're unlucky, the thesis will be built on extremely poor background, and the results will be plainly useless. In that case the student will learn nothing: not the scientific research method, not how real work is done. Nothing at all; the thesis will be just paperwork.

Of course there may be lucky cases, where the student comes up with an interesting idea and the tutor is actually able to follow him/her. But I'm not confident this is the normal case.

Mind, I'm not blaming the students for asking this kind of question online (i.e., "help me find my thesis topic"). I'm questioning their tutors.
Probably it's a matter of my own limited knowledge: I simply don't know how this process could work. Maybe I'm missing something. I hope I'm missing something.

If anybody can enlighten me, I'll be very happy.

About the "I'm not good at programming" type

Small rant today, all about the classic guy saying "I'm not good at programming" (it may be a girl as well).

By now you should know that I do research. Mostly my research is about networking protocols; that's why I develop for ns-3.

Now, the number of people claiming (or thinking) they are good researchers in this field while being terrible at programming (in any language) is impressive. Nothing wrong with that, you'd say... maybe, or maybe not.

The main problem is: if you're unable to think about the implementation, you'll make mistakes in the design. Simple as that.

An example comes from a paper I read a few days ago. Some smart guys devised a fantastic method to implement an extremely efficient routing protocol. Fantastic results. Impressive, for real. BUT... the proposed scheme could not be implemented. They forgot that data must be known before it can be used, and to know data, you have to transmit it, with the usual delays, loss probability, and so on.

Example: "Let's assume that node A wants to transit something to node B. Node A knows the number of packets being received by node B". Good, Node A can NOT know that number. Not without asking Node B, which, in turn, make it pointless the assumption (because the two nodes are already communicating).

What's this? Poor understanding of how things work. Mind that the authors of that paper also ran simulations "proving" that their method was superior to those in the literature.

How did they do those simulations? Magic? No. Many simulators allow you to do "tricks". If you don't develop the code in the right way, your nodes can gain infinite omniscience, producing a lot of (wrong) data. Good job.
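To give an idea of how easy the trick is, here's a sketch of the anti-pattern in ns-3 terms (the function is illustrative, not taken from any real protocol model):

#include "ns3/core-module.h"
#include "ns3/network-module.h"
#include "ns3/applications-module.h"

using namespace ns3;

// The omniscience anti-pattern: node A reads node B's receive counter
// directly through a C++ pointer. No packets, no delay, no losses.
// Free in the simulator, impossible on real hardware.
uint64_t CheatReadRxCount (Ptr<Node> nodeB)
{
  Ptr<PacketSink> sink = DynamicCast<PacketSink> (nodeB->GetApplication (0));
  return sink->GetTotalRx ();
}

A protocol built on this kind of access will "know" things that, in a real deployment, would cost round trips, can be lost, or arrive stale. That's exactly why its simulated performance looks so good.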

Learn to program, and don't take shortcuts. Protocols are painful sometimes, but developing them forces you to think about how real systems work.

Monday, February 24, 2014

Open source and code sharing (2)

Second rant on this topic (I hope the last one... for now).

Let's make some simple assumptions:

  1. You're developing something. It's experimental code about research stuff.
  2. Some colleague politely asks "May I use your code? It would really benefit my research".
You are, of course, going to release the code as Open Source soon™, so what's the point of keeping it private?

Well, there are a number of reasons. Just to name a few:

  • The code is not ready. It contains bugs, or poorly modelled parts, and you don't want other people to mistakenly take wrong results for good ones.
  • You spent a LOT of time on that code, and you'd like to use it first.
Those are two compelling points. The first one basically originates from the incredible number of dumb people in the world. They take a simulator like ns-3 and assume that everything is bug-free. Then they launch a simulation, find some interesting data, and publish it, all without ever suspecting that the data might be the result of a bug rather than something real.
Don't underestimate dumb people: they can find a plausible explanation for almost anything, even if it's unreal. I could even name (and shame) some scientific papers. And they get cited, too (sadly, not to point out that the authors are dumb).

The second point is all about your work. You wrote that code for a reason: to do some research and publish it, didn't you? However, publishing a paper isn't that easy. Sometimes papers get stuck, sometimes they're rejected because the reviewers are funny guys (like the one who asked why we didn't cite Shannon's paper... sure, let me also cite P. Artusi, I used it yesterday). Anyway, publishing is neither easy nor fast, and papers are one of our major outcomes (more rants on this in the future).
As a consequence, researchers are more keen to "open" their software once some papers have been published. Before that... not so much.

That said, what would you do? Would you give access to your code or not?
Yes, you would. Because the people asking are from a well-known research centre, and because it could open new collaboration possibilities.

... and then they disappear. You told them that the code may be buggy. You told them that you really hope for feedback: bug reports, code improvements, and so on. You told them that you'd like to collaborate and share. Gone, like tears in rain.

And now? What to do?

Simple. You wait for the next big code change. You collect the bugs and you do NOT fix them, not in the repository you gave them access to. Then you revoke their permissions and keep working, hoping that they'll get stuck on a terrible bug and lose face when they try to use your buggy code. Because the bugs are there, and you know it.

And the lesson is: when you use an Open Source project, remember to contribute to it. Anything, even a small thing, is a sign that you respect those who worked on it before you. Do not take anything for granted: Open Source isn't "someone was dumb enough to give me their time". Those developers might be dumb (in your eyes), but they can be evil as well, and they do remember. Everything.

Sunday, February 23, 2014

Open source and code sharing

Open source software means sharing the code with a community. Seems logical... or not?

I won't start the usual, utterly long post about the Better World Built on Open Source Code. Too many have done that already.

I want to discuss a bad habit I'm seeing more and more: assuming that, since you're contributing to Open Source software, anyone can ask you for the code you're working on with little or no "thanks".

And this leads to the question: why should anyone contribute to Open Source software?
The answer is not simple, and I'm neither a psychologist nor a sociologist, so take the following with a grain of salt.
In my opinion, the answers may be:
  • Self-esteem: being the author of publicly available software (maybe used by many) is a great reward.
  • Hobby/fun: the author is doing it as a hobby, and he/she couldn't care less about getting money out of it.
  • Antagonism/idealism: the author is trying to **** the system, and wants to go against the "lobbies".
  • It's research stuff: the author is a researcher, and the code has been used to prove a theory or a model. Since the research is (usually) paid by other means, the code may be released to the public.
Whatever the reason, there's one point in common: the author wants to be recognized as the author. Sharing doesn't mean the code is given away for nothing; implicitly, anybody using it is giving you credit for your work.

Now, what is the worst thing that can happen with Open Source software?
The answer is simple: the author not receiving proper credit. That's why copyright is used. The various Open Source licenses are all about this: you may use the code, but you have to give proper credit.

But there's another point: making the code better. Improving it. And that will be the topic of the next rant.

Welcome...

This is the first post and, like all "first" posts, it's just there to state why I opened this blog.

Well, to make a long story short, I found out that:
  1. I want to have a place to write what comes to my mind. Rants, mostly, but also useful stuff.
  2. Some of the stuff I'll write (I'd say about 1%) may be useful to somebody.
  3. Why not?
  4. I also wanted to test Google AdSense. Curiosity.
And now about the topics. I'm an Assistant Professor at a university, and I am an ns-3 contributor and maintainer. As a consequence, I stumble on a lot of bugs and user requests. Some of them are interesting, some are funny, some are... disappointing. I'll write about all of them.
However, I also love dogs and cooking, so I may write about those as well. And, of course, I'll write about random stuff, because... why not?

So, welcome to my blog. The updates will be strictly random. Because I'm not a reliable guy.

Have fun!