Saturday, April 26, 2014

ns-3 Testing - If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.

Testing...

A lot of books have been written about software testing, and there are university courses on it. Hell, it's a whole research field; I can't pretend to say anything close to meaningful about it in one post. So why should I write something about it ?

The answer is simple: to explain (once more) why testing is important, and how to approach the testing phase in ns-3.

First and foremost: there's an ns-3 manual section about testing, and it even explains the generic goals of software testing (Correctness, Validation and Verification, Robustness, Performance, and Maintainability). Amazing, isn't it ?  (The most amazing part is that nobody seems to read the manual, or even know what's inside it.)

Anyway... to test or not to test, that's the question. Not really: tests are necessary, so there's no question. The real question is what to test.

Before writing a test, one has to consider that ns-3 is a network simulator. If you're writing tests, you probably wrote a protocol, and a protocol involves two (or more) network entities exchanging data (packets) with delays and random errors / losses.
For a software engineer this is often shocking. Testing a protocol is much like any other kind of software testing, with the twist that "API calls" can be delayed, mangled, lost, etc. You transmit 10 and the receiver understands 100.

And this is the first lesson: it's not all about positive tests (i.e., conformance testing); negative tests matter just as much: you have to test what happens when the "normal" operation fails.
You have to test this: A sends [x] to B, B replies [f(x)] to A.
But also: A sends [x] to B, B understands [x'] and replies [f(x')] to A, which understands [g(x')]. A conversation between the deaf, and nothing should go crazy (besides the developer, of course).

Ok, this is good, but it's too generic. So... some rules. In order to make an effective test for a protocol you should do the following:
  1. Static test (format): check that the packets are well formed.
  2. Positive test (conformance): check that the behaviour you get is the expected one.
  3. Negative test: try shuffling the packet order, losing a packet and so on.
  4. Vulnerability test (the paranoid's one): try messing with the packet's data. Fields representing lengths are a good start.
Point 3 is often not performed in simulators, but it may be worth considering if the simulator has to be used in emulation mode. In this case... more to follow.
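
For rule 3, ns-3 already gives you knobs to make the "normal" operation fail on purpose. A minimal sketch (assuming a point-to-point link whose devices are stored in a NetDeviceContainer called devices; the names and the rate are illustrative):

  // Force packet losses on the receiving device, so the protocol's
  // recovery path is actually exercised by the test scenario.
  Ptr<RateErrorModel> em = CreateObject<RateErrorModel> ();
  em->SetAttribute ("ErrorUnit", StringValue ("ERROR_UNIT_PACKET"));
  em->SetAttribute ("ErrorRate", DoubleValue (0.1));
  devices.Get (1)->SetAttribute ("ReceiveErrorModel", PointerValue (em));

Run the scenario with and without the error model, and check that the protocol recovers in both cases.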

Point 1 is important, but you need something able to understand the packet's format. A popular choice is Wireshark. However, Wireshark too has its limits. If the protocol is new, chances are that Wireshark cannot "dissect" its packets. Moreover, Wireshark itself may be buggy, or may not understand all the protocol's options. E.g., right now Wireshark understands only the ETX metric in RPL; it can't understand the HC metric. In such cases, the only option is inspecting the packet byte by byte. Terribly boring - but necessary.
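
When you end up in that situation, an ns-3 TestCase can do the byte-by-byte check for you. A sketch, assuming a hypothetical MyHeader class with a 16-bit sequence field:

  #include "ns3/test.h"
  #include "ns3/buffer.h"

  using namespace ns3;

  class MyHeaderFormatTest : public TestCase
  {
  public:
    MyHeaderFormatTest () : TestCase ("MyHeader wire format") {}
  private:
    virtual void DoRun (void)
    {
      MyHeader src;                        // hypothetical header class
      src.SetSequence (0x2a);

      Buffer buf;
      buf.AddAtStart (src.GetSerializedSize ());
      src.Serialize (buf.Begin ());

      // Rule 1: check the bytes on the "wire" against the specification.
      Buffer::Iterator it = buf.Begin ();
      NS_TEST_EXPECT_MSG_EQ (it.ReadNtohU16 (), 0x2a, "Sequence field badly encoded");

      // Round trip: what we read back must be what we wrote.
      MyHeader dst;
      dst.Deserialize (buf.Begin ());
      NS_TEST_EXPECT_MSG_EQ (dst.GetSequence (), 0x2a, "Sequence field mangled");
    }
  };

Register the case in a TestSuite and run it with test.py, as explained in the manual.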

There's a last kind of test, the duck test: If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.
In other words, compare against a real system.

This is the most advanced possible test. And it's one of the most valuable and useless tests at the same time: the comparison against a reference implementation.
Before even thinking of doing it, consider the following:

  1. You need to have an implementation to compare against.
  2. You need to setup a test environment with both your stack and the "other" stack.
And this is where doubts can arise.
  • If you have a working implementation, why the hell did you develop the protocol in the simulator? (A false problem: maybe the simulator is the reference implementation, maybe you want to change stuff that you can't change in the real system, maybe you need the power of the simulator to perform massive tests.)
  • Something is not working as expected, who's the bad guy? (welcome to cross-implementation debugging).
  • How can I force the "other" system to send me "wrong" packets?
Well, I'm not going to explain all the possible cases; let's just say that testing against a real system is like doing normal tests, just much harder.

BUT, there's the duck.
As in: you're calling your protocol a duck because it behaves like a duck, but... who told you that the thing you're comparing against is a duck?
If you're comparing against a "bad" example, you'll tune your system to that bad example, and both will be consistent with each other... in doing the wrong thing. "Walk this way" [Marty Feldman in Young Frankenstein].

The bottom line: tests are important. Do them, and make sure they cover the most important parts of the protocol. Don't avoid them just because they're hard to do: do them. Bugs are more often than not spotted by tests, because the "example" will show you how the system performs when everything is right.

Tuesday, April 22, 2014

Silence is golden

My granny used to say: if you don't have anything nice to say, don't talk.

This is not the case. I have plenty of stuff to say, but I've been a bit busy. In strictly random order:

  • A house to buy
  • GSoC and SOCIS to apply for, students' applications to review, scores to give and so on
  • Funded projects to apply for (they'll be funded... if they get accepted!)
  • Papers to write (yes, I have to write papers as well)
  • Sleep, drink and eat. Sometimes breathe as well.
These are busy days. But it could be worse.

Anyway, the next posts will be about useful but terribly boring things.
  1. Debugging - How I Learned to Stop Worrying and Love the Bug
  2. Valgrind - O Bug, Where Art Thou?
  3. Testing - If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.
  4. Documentation - <put here your quote>
Have fun all !

Thursday, March 27, 2014

An effective path to learn C++ (or any other programming language)

~~ I have all the answers, but I don't remember the questions. ~~

Yes, this post is about learning C++, but the suggestions apply to any other programming language.
I could have named it "The True System to Learn C++ in an Easy and Painless Way in Less than 30 Minutes", but I'm a bit shy.

First and foremost, a little bit of history.
I didn't learn programming in a class, I'm self-taught.
Well, that's not totally true. I took a programming 101 course at University. I learnt FORTRAN (it was The language for scientific applications back then, and no, I'm not that old).

Anyway, at one point I decided I needed a more "modern" language, and I decided to learn C.
I hate "X for Dummies" kinds of books from the bottom of my heart, so I took The Source: the K&R (Brian W. Kernighan and Dennis Ritchie, The C Programming Language).
I loved and hated that book. It's a beautiful manual: it tells you exactly what you need to know about C, but it tells you almost nothing about how to use it. Well, it does... but just a little. It's as if somebody explained to you exactly how an engine works and how to build all the parts, but not how to actually put one together.
It's the sort of book that assumes that, if you know exactly how the parts work, then putting it all together is easy. It isn't.

Then I moved to C++, and since I learn from experience, I grabbed The Book: Bjarne Stroustrup, The C++ Programming Language. Ok, that wasn't a smart move.
I loved that book, but it's a manual as well. I had to read it THREE times to start understanding something. It makes clear from the beginning that C++ is powerful but not simple. It explains everything (and by everything, I mean it). It's a reference manual. But it's definitely not a simple way to learn C++.
I still suggest buying it, and keeping it on your desk as the Reference Manual. Not as a learning book, tho.

I learnt C++ the hard way, and I was able to write fairly complex programs. By fairly complex I mean a whole network simulation framework and a good part of the DVB-RCS protocol stack.

Still, it wasn't the end. A friend suggested taking a look at a new thing coming out: Design Patterns. So I grabbed The Book (it's a habit, a bad one): E. Gamma, R. Helm, R. Johnson, and J. Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software.
Great book. A punch in the groin, tho. It was like "Oh, great, I did everything wrong so far". But no good advice on how to do it right.

So, two books and some hundreds of pages later, I had the theory, but not the practice.

Then, something changed. I started using the things I had studied. And slowly the pieces went where they were meant to be.

How? Well, that's the point. The sooner you start this phase, the sooner you'll master the programming language (it took me years, just because nobody told me the trick).

So, here are the tricks. Follow them and you'll master C++ in no time (and any programming language).

Trick #1: grab a GOOD book.

This one has been suggested by Mathieu Lacage, and it's a very good one: A. Shalloway and J. Trott, Design Patterns Explained: A New Perspective on Object-oriented Design.

The real trick is: don't try to learn Object Oriented programming first and then move to Design Patterns once you have mastered OO. Learn them at the same time. The reason? They're two sides of the same coin.

Trick #2: start programming for real.

Don't do the book's exercises. Well, do them, but don't do just those.
Find a good excuse to program (e.g., join an Open Source project like ns-3) and use your skills.

Sure enough, you'll feel like a fish out of water at first, but you'll learn quickly. You don't learn how to drive a car by reading a book, you have to drive.
At first you'll make mistakes, and your code will be ugly. However, in due time, you'll learn the tricks and you'll become better and better.
Remember, it's practice that makes perfect, not studying. Practice without study is bad, and study without practice is equally bad.

The real suggestion here? Find something that matches your interests. You have to find it interesting, not boring. And if you choose something only because it matches your job, it may be boring as hell.

If you follow the two simple tricks I told you, you'll become a programming master in no time: less than 1 year.

But... but... you said 30 minutes...
I lied.

Sunday, March 16, 2014

How to develop a new protocol

Today a not-so-small post about protocol development.

One of the recurring themes in the ns-3 users group is:
"How do I develop [my own, a new, a modification] [routing, scheduling, MAC, etc.] protocol."

There are many possible answers, depending on the case, but the root cause is basically always the same: when you reach the point of asking this, you have made a mistake in your development process.

I'll limit the scope of this post to a subset of the above possibilities. However, I'll try to be as generic as possible, so that it also fits the other cases.
Let's assume the question is:
"How do I develop a new routing protocol."
and let's assume you want to use ns-3. Because this blog is about ns-3, mainly.

In order to write a new routing protocol (any protocol, indeed), the following steps are required:
  1. Requirement analysis
  2. Protocol Design
  3. Development
  4. Test
Usually you'll want to iterate through these steps (at least through 2-4), much like in agile programming.
Let's see what one should expect from each phase.

Requirement analysis

The requirement analysis should fix your goals, the node capabilities and so on. Don't ever say "it's obvious". Requirements are never obvious. Check RFC 5826 for an example of a routing requirements document.

Protocol Design

This is the phase where you write down your protocol. Remember that you have to write (at least) THREE things:
  1. Format - how the packet is made, the length of each field, the encoding, etc.
  2. Syntax - what's the meaning of the data carried by the packet (again, it's not "obvious")
  3. Semantic - what's the behaviour of a node upon receiving a packet; what, when and why a packet has to be sent.
A lot of people forget one of these elements, basically making it impossible to implement the idea.

One very important thing: remember the data. Data are not "known", they must be measured. A node doesn't know the channel quality, it measures the channel quality. If you plan to use something in your protocol, make sure you are able to measure it !
I've seen far too many "scientific" papers with this mistake. A success, for the authors. A failure, for the reviewers. A shame, for the scientific community.

Summarising:
  • Collect data and extract measures from data.
  • Send the measures to other nodes when something happens.
  • React upon receiving a message.

Development

If you reach this stage, and your design was done in the right way, it's simply a matter of coding.

The format: define a new set of headers (or data packets) carrying your information. Remember that the measures have to be encoded in a device-independent way. As an example, suppose you want to send "3.14". You will not put a double in the packet, you'll send a given number of bits representing "3.14". There are a number of ways to do this, just choose the one that suits you.

The syntax: how you write and read the packets, i.e., the meaning of each field. In ns-3 terms, it's the Serialize and Deserialize functions.
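
To make this concrete, here is a minimal sketch of a custom header. LinkQualityHeader and its field are made up for the example; the "3.14" above becomes a 16-bit fixed-point value with two implied decimals:

  #include "ns3/header.h"
  #include <ostream>

  class LinkQualityHeader : public ns3::Header
  {
  public:
    static ns3::TypeId GetTypeId (void)
    {
      static ns3::TypeId tid = ns3::TypeId ("LinkQualityHeader")
        .SetParent<ns3::Header> ()
        .AddConstructor<LinkQualityHeader> ();
      return tid;
    }
    virtual ns3::TypeId GetInstanceTypeId (void) const { return GetTypeId (); }

    // The measure is stored in a device-independent way: 3.14 becomes 314.
    void SetQuality (double q) { m_quality = static_cast<uint16_t> (q * 100); }
    double GetQuality (void) const { return m_quality / 100.0; }

    // Format: a single 16-bit field, sent in network byte order.
    virtual uint32_t GetSerializedSize (void) const { return 2; }
    virtual void Serialize (ns3::Buffer::Iterator start) const { start.WriteHtonU16 (m_quality); }
    virtual uint32_t Deserialize (ns3::Buffer::Iterator start)
    {
      m_quality = start.ReadNtohU16 ();
      return GetSerializedSize ();
    }
    virtual void Print (std::ostream &os) const { os << "quality=" << GetQuality (); }

  private:
    uint16_t m_quality;
  };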

The semantic: open some sockets to receive the messages from other nodes, open some sockets to send packets to other nodes (they could well be the same ones), and write the logic of each function.
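
A sketch of this plumbing, assuming a hypothetical MyProtocol class with an m_socket member (port 521 is just an example; it happens to be RIPng's port):

  void
  MyProtocol::Start (ns3::Ptr<ns3::Node> node)
  {
    using namespace ns3;
    // One socket is enough for both directions here.
    m_socket = Socket::CreateSocket (node, TypeId::LookupByName ("ns3::UdpSocketFactory"));
    m_socket->Bind (Inet6SocketAddress (Ipv6Address::GetAny (), 521));
    m_socket->SetRecvCallback (MakeCallback (&MyProtocol::Receive, this));
  }

  void
  MyProtocol::Receive (ns3::Ptr<ns3::Socket> socket)
  {
    ns3::Ptr<ns3::Packet> packet = socket->Recv ();
    // Deserialize the header, update the state, schedule replies: the protocol logic goes here.
  }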

Testing

Once everything is done, remember to test your shiny new protocol. I'd strongly suggest writing specific tests to check all the functionalities, especially the error cases. You know, shit happens, so better to check that your protocol gracefully recovers from it.

ns-3 specific tricks

This is quite specific to ns-3, but any simulation (or real) system will have something similar.
Let's not forget that we were talking about a routing protocol. Thus, it's logical that the protocol semantic will build a routing table. Not a big issue. Usually a routing table is little more than: 1) destination (e.g., 2001:db8:f00d:cafe::/60), 2) next hop (e.g., fe80::21b:63ff:fef0:6acd), 3) interface, 4) prefix to use (if necessary).
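
In code, such a table can be as simple as a vector of entries like this (a sketch, with made-up names):

  #include "ns3/ipv6-address.h"
  #include <vector>

  struct RouteEntry
  {
    ns3::Ipv6Address network;        // 1) destination, e.g. 2001:db8:f00d:cafe::
    ns3::Ipv6Prefix  networkPrefix;  //    ...with its /60 prefix length
    ns3::Ipv6Address nextHop;        // 2) e.g. fe80::21b:63ff:fef0:6acd
    uint32_t         interface;      // 3) outgoing interface index
    ns3::Ipv6Address prefixToUse;    // 4) source prefix to use (if necessary)
  };
  std::vector<RouteEntry> routingTable;  // looked up with longest prefix match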

The routing table will be used in two places: RouteOutput and RouteInput.
  • RouteOutput: the obvious one. It's called when a packet is being sent.
  • RouteInput: the not-so-obvious one. It's called when a packet is received, to decide if it has to be forwarded or if it's for the node itself.
Indeed, RouteInput is the real "routing" one: it decides if the packet has to be forwarded, what outgoing interface has to be used in the forwarding, etc.
RouteOutput "simply" finds the best route when a packet is sent from the local node.

When a packet is created and sent, it will pass through RouteOutput multiple times (yes, more than once, thanks to UDP, IP, etc.), then through RouteInput (once for each router it passes through and once at the destination node).
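
For reference, this is roughly what a new IPv6 routing protocol has to override (a sketch from memory; check src/internet/model/ipv6-routing-protocol.h for the authoritative signatures):

  class MyRoutingProtocol : public ns3::Ipv6RoutingProtocol
  {
  public:
    // Called when the local node originates a packet: look up the table, return a route.
    virtual ns3::Ptr<ns3::Ipv6Route> RouteOutput (ns3::Ptr<ns3::Packet> p,
                                                  const ns3::Ipv6Header &header,
                                                  ns3::Ptr<ns3::NetDevice> oif,
                                                  ns3::Socket::SocketErrno &sockerr);

    // Called for every received packet: deliver locally (lcb), forward (ucb/mcb)
    // or report an error (ecb).
    virtual bool RouteInput (ns3::Ptr<const ns3::Packet> p,
                             const ns3::Ipv6Header &header,
                             ns3::Ptr<const ns3::NetDevice> idev,
                             UnicastForwardCallback ucb, MulticastForwardCallback mcb,
                             LocalDeliverCallback lcb, ErrorCallback ecb);

    // Plus the notifications: NotifyInterfaceUp/Down, NotifyAddAddress/NotifyRemoveAddress, etc.
  };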


Suggestions to understand all this? Find a "simple" routing protocol and study its specification and implementation.
Example: RIPng for IPv6 (RFC 2080). It's short, simple and well written. The ns-3 implementation is expected to be available starting from ns-3.20.

Saturday, March 15, 2014

Multiple IPv6 global addresses on an interface

Today no rants. A how-to instead.

Ns-3 is a great simulation framework. It's powerful, complex, and the documentation is good. However, we can't document everything. As a consequence, sometimes a how-to is useful.

Of course ns-3 has a how-to wiki page, but sometimes it's better to write interesting stuff also elsewhere.

Anyway, no more chitchatting. The how-to.

Suppose you have 2 nodes, each one with its NetDevice. You install IP and assign addresses. Business as usual.
Now you decide that you want to assign a specific address to a node. Not just a specific one: an extra one.

Let's say that you want this:

Node 0:
1- fe80::200:ff:fe00:1/64
2- 2001:db8::200:ff:fe00:1/64
3- 2001:1:f00d:cafe::42/64

Node 1:
1- fe80::200:ff:fe00:2/64
2- 2001:db8::200:ff:fe00:2/64
3- 2001:1:f00d:cafe::666/64

Note the SLAAC configured addresses (1 and 2) and the manually added addresses (3).

How to do that ?

Easy... sort of. It's easy once you know how.
  // d is the NetDeviceContainer
  Ipv6AddressHelper ipv6;
  NS_LOG_INFO ("Assign IPv6 Addresses.");
  Ipv6InterfaceContainer i = ipv6.Assign (d);

  // arbitrary address assignment
  {
    Ptr<NetDevice> device = d.Get (0);
    Ptr<Node> node = device->GetNode ();
    Ptr<Ipv6> ipv6proto = node->GetObject<Ipv6> ();
    // only the Ipv6 object knows the interface index of a given NetDevice
    int32_t ifIndex = ipv6proto->GetInterfaceForDevice (device);
    Ipv6InterfaceAddress ipv6Addr = 
       Ipv6InterfaceAddress (
          Ipv6Address ("2001:1:f00d:cafe::42"), Ipv6Prefix (64));
    ipv6proto->AddAddress (ifIndex, ipv6Addr);
  }


That's for Node 0. Node 1 is similar.

Easy? Yes.
Obvious? Nope.

What's the problem, and why isn't it easier?
The answer is: you need the interface index. Unfortunately, only the Ipv6 class knows the interface index of a given NetDevice, so you first have to find out what that index is; then it's a matter of one instruction.

If you just assume that the interface is at index 1 (and here it actually is), you're going to have a bad time sooner or later.
That's because interface 0 is the loopback, and you can't be totally sure that a given NetDevice will have a specific index at the IP level.
You can, of course, put a static number there (e.g., if the Node has just one NetDevice, it's kinda safe to assume that the index is 1). However, this assumption will not hold anymore if you add more NetDevices.

Anyway, adding multiple IPv6 global addresses to a NetDevice is possible, and it's not too hard once you know how to do it.

BTW... let's try a little test. Why do the two nodes need two global addresses? What can address 2 not do?

Thursday, March 13, 2014

Mercurial on OS X, tips

Extremely small post on something useful.

The ns-3 source control management system of choice is Mercurial.
Installing Mercurial on OS X is not an issue, just download the package and install it. Simple as that.
Once installed, you can use a terminal to give all the commands you want, or even use a GUI (see Mercurial Mac native GUIs).

What I missed the most, however, was the ability to auto-complete commands in the terminal.
On Linux you can type "hg p" and hit Tab. The result is:
tommaso:ns-3-dev pecos$ hg p
parents  patch    paths    phase    pull     push     

or even... "hg qpush --move" + Tab:
tommaso:ns-3-dev pecos$ hg qpush --move 
Ndisc.diff               TagFragmentation.diff    issue6821106_36001.diff

Handy... but it doesn't work out of the box on OS X. However, there is a way to enable it. And it's easy too.
  1. Download the mercurial source tarball (http://mercurial.selenic.com/downloads) and choose "Mercurial xxxx source release" where xxxx is the latest version;
  2. Unpack the file, and open the folder. Open the folder named "contrib";
  3. Copy the file named bash_completion somewhere (e.g., $HOME/bin);
  4. Modify the bash profile to source it;
  5. Reopen your terminals and enjoy.
Let's do a practical example. Let's assume the latest Mercurial version is 2.9.1, and let's assume my bash profile is named ~/.profile
The commands to give are:
tommaso:~ pecos$ cd ~
tommaso:~ pecos$ cp Downloads/mercurial-2.9.1/contrib/bash_completion ~/bin
tommaso:~ pecos$ nano ~/.profile
and add the following line to .profile:
source $HOME/bin/bash_completion

Note that on some systems ".profile" could be named ".bash_profile". It's the same stuff.

If you're a terminal guy (like me) you'll really enjoy it.

Tuesday, March 4, 2014

How to ask for help in an user forum

Today no rants, I'll write a how-to.

The occasion for this comes from the ns-3 user forum. It's a kinda peculiar forum, as the "users" are not really just users.

Ns-3 is a network simulator, and it is GUI-less. There are some graphical front ends, but the primary way to work with it is by writing a C++ or Python program.
As a consequence, its users are (mostly) able to develop programs, sometimes even very complex ones.

No matter how good the users are, they often run into issues, and the forum is where they look for help.
Some posts in the forum are "easy" to answer, some are... less so. And this is why I'm writing this how-to.

  1. The post topic: state your issue briefly.
  2. The post: describe your issue in detail.
  3. The attachments: use them, if needed.
  4. The code in the post: don't do it.
  5. Reply to old posts: avoid it.
  6. Patience is a virtue: don't expect a reply in hours.

Let's see what I mean with some examples.

You have an issue. The first thing to do is to search the forum for old posts. Maybe somebody had the same issue and it was solved. Let's suppose you found something similar, but no definitive answer was given (or you can't fix your issue by following the thread you found).
This is a case where you may reply in the thread, but keep in mind that ns-3 is an evolving system (around 4 releases/year), and the code base may be very different from, let's say, 4-5 releases ago.
A better option is to start a new thread and link the relevant threads in your post.

You don't find anything. You start a new thread. Do not post a message with a "help needed" topic. Be specific. The more specific, the better.

The post body: state exactly what your problem is, along with relevant info like: ns-3 version, operating system, compiler version, whether you modified the code, etc.
If you have issues with one of your programs, do not post snippets; attach the code to the message. Don't copy it into the body, attachments are there for a reason.

And then... patience. There are a few people checking the forums almost daily, but they're not paid for that. So, wait. If you don't get an answer, don't give up. Try writing to the code maintainers, and be polite, they're not paid either.

The last thing is a suggestion (a strong one). Be polite. If you're planning to ask somebody to give you his/her code, remember to ask kindly. "Please" is just 6 keystrokes. Moreover, especially if the posts are old (let's say, more than 6 months), chances are that the original poster isn't following the forum anymore. Try writing to them directly, it might work.

Have fun coding !

Sunday, March 2, 2014

Coding jump-start, is it good or bad ?

Jump-start: to start or restart rapidly or forcefully <advertising can jump–start a political campaign>, from merriam-webster.com.
Applied to coding, it's the practice of starting to code with only a basic knowledge of the framework you're using.

At first it seems a very bad idea, doesn't it ?
In reality it's not a totally bad thing, at least if you remember some basic good practices. The problem is that, more often than not, these "good practices" are totally forgotten.

The good of jump-start coding

The benefit of jump-start coding is that you're forced to face the code early in the learning process. This is usually a very good thing, as coding isn't (only) about memorising a number of rules. It's primarily about becoming fluent in what you're doing.

In the very same way, if you want to learn a foreign language, the best thing to do is to learn some basics and then go abroad and be forced to speak in the foreign language. It will be hard at first, you'll make a lot of mistakes, but you'll learn. Learn or perish (sort of).

However, would you ever go abroad only knowing how to ask for a bottle of water, a meal and a bed? Not really. You'll bring a dictionary, a pocket phrase book and your mobile phone (Internet translators).

In the same way, you can jump-start in the code, but first you have to learn where to find information.

The bad of jump-start coding

Starting to code early also has a number of drawbacks. You'll be slower, as you'll have to search for a number of things, probably not well documented, or documented in some obscure place.
However, the nastier thing is that you'll probably end up with extremely sub-optimal solutions, or even "solutions" that only seem to work (they work in your case, but they don't in another).

Now, this is normal. You do one thing once: it takes ages and you make mistakes. You do it twice, and it's a bit better. Do it 10 times or more and it's easy (and it works). It's called experience.

Jump-start coding or not ?

So, the bottom line is: yes, start coding as soon as you can, but follow the rules:
  1. Learn by example. Look at the examples, understand them and learn. Copying an example isn't learning, it's copying.
  2. The manual and the documentation are your primary sources, learn how to use them. Open-source projects often have a bug repository (e.g., ns-3's Bugzilla) and automatically generated API documentation (e.g., ns-3's Doxygen).
  3. Don't rush. Even though you're (finally) writing some code, you're still learning.
  4. Rush. Try to do something complex. If you stick to the easy things, everything will always seem easy (because it is easy). Only by doing something complex will you learn for real. However...
  5. Simplify your task: onion coding. Split your complex task into sub-tasks and complete them one by one, or add complexity little by little.
Note: rule 5 means that you'll probably have to do a lot of refactoring. That's ok, you're still learning, and it's part of the learning process.

And now... go coding.

PS: don't forget the documentation and the tests... more on this soon™

Tuesday, February 25, 2014

Network simulations and "realistic" results. The WSN case.

This post is not a rant, it's more a philosophical thing. I hope you don't mind.

Network simulators are widely used tools in academia (from teaching to research) and industry (from research to new-technology pre-deployment evaluation). However, I've found that simulators are often used without thinking about how close to reality the results will be, leading to disaster (sometimes) and to useless research (more often).

There's a nice presentation from +Sally Floyd (http://www.icir.org/floyd/talks/WNS2-Oct06.pdf) discussing what a simulator's goal should be. I couldn't agree more: the simulator user should first find out why he/she is using the tool, and then use the appropriate one.

Now, this is often not done. More often than not, a simulator is chosen for one of the following "compelling" reasons:

  1. Everybody in the lab is using it (or, my company always uses it).
  2. I already know it, why switch.
  3. There's a model in the simulator that seems to suit my need.
  4. Etc.
From this list, the most relevant point is missing:
  • Will the results be useful to what I'm looking for ?
Let's take an example: Wireless Sensor Networks (WSN) and Internet of Things (IoT).

In order to simulate a WSN, you have to have a model, and the model must reflect an actual scenario.

L1/2, the PHY and MAC layers. 

Usually WSNs use IEEE 802.15.4, but it's not the only choice. For example, Body Area Networks can use 802.15.6, or Bluetooth LE.
Even the same protocol (e.g., 802.15.4) has different variants, and in each one something changes: packet framing, access methods, etc. 802.15.4 at 2.4 GHz and at 800 MHz are slightly different, and 802.15.4e is completely different.

L3, it's IP...

Nope, it's not "simply" IP. The IETF is advocating the use of 6LoWPAN, ZigBee uses a different approach, and other (proprietary) systems use completely different things.

Routing

On this topic, everybody re-invented the wheel. RPL, RIME, LEACH, HERD, [Controlled] Flooding, whatever: they're all different.

L4 and above protocols

You'd say that things should go better here... no.
UDP, TCP, HTTP, CoRE/CoAP, OMA variants, ZigBee ones, etc. Name a random one, chances are that somebody is using it.

And now the most astonishing one...

Channel models

It is well known that simulating any wireless channel is challenging. The model, no matter how good it is, will never exactly capture everything in a real system: interference, scattering objects, etc.
To add complexity to this, let's just say this: two devices using exactly the same protocol can have completely different performance. I personally found that two devices from two different vendors had drastically different ranges: one could reach 15 m, the other 50 m.

This seems a minor detail, but when your scenario is "let's place X devices in the area", you'd like to know how many are needed to ensure network connectivity. Moreover, the node density should be driven by the actual phenomena to be observed, not by how crappy the radio devices are.

Summarizing: panic. Mis-model one of the above and your simulations will still give you results. False ones, like a $3 coin.

However, I said I wasn't going to post a rant. So, here's the good news: what ns-3 can do for you.
  1. Channel model: it can be easily changed. It's even possible to use experimental data (but you'll need BER vs. distance data). See the small sketch after this list.
  2. PHY/MAC: the lr-wpan model is currently being reviewed. It implements Contiki's NULLMAC, but it's possible (not easy, but possible) to extend the model to mimic other MAC protocols. Mind that this is an extremely important point for realistic simulations.
  3. IP: IPv6 and 6LoWPAN are supported out-of-the-box.
  4. Routing: bad news here. I'm working on an RPL implementation, but it's taking AGES. The protocol is very complex. And believe me when I say: actual OSes do not have complete implementations. They just have the bare minimum (and sometimes even that is buggy).
  5. L4 and above: UDP is there, the upper layers are not. However, DCE could come to the rescue here. On the other hand, simulating the application layers exactly is usually out of scope.
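
About point 1: swapping the propagation loss model is a couple of lines with the helpers. A Wi-Fi flavoured sketch (the lr-wpan model uses a SpectrumChannel, but the idea is the same); the exponent value is just an example:

  using namespace ns3;

  YansWifiChannelHelper channel;
  channel.SetPropagationDelay ("ns3::ConstantSpeedPropagationDelayModel");
  channel.AddPropagationLoss ("ns3::LogDistancePropagationLossModel",
                              "Exponent", DoubleValue (3.0));
  // A custom PropagationLossModel subclass can replay measured BER-vs-distance data instead.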
I'd say that ns-3 is not yet ready for IoT WSN simulations out of the box, but we're close. Very close.

Have fun simulating, and don't forget to check the models before using them !

About the "help me finding my thesis" request

Today I want to talk about a strange phenomenon I've seen recently: students asking online about ideas for their thesis.

Nothing strange, you'd say, exchanging ideas is a very good thing. What puzzles me is the content of the requests.

Working at a university, I'm used to assigning and supervising master's and Ph.D. theses. When a student comes to me asking for a thesis topic, we discuss at length the research topics we carry on in our lab, along with the student's interests and background. Then we settle on a suitable topic and I give the student some source material to study.

During the thesis work, we also have periodic meetings to check the student's progress and how the (obvious) issues can be solved.

I expect the student to play an active role in his/her work, by suggesting things I didn't think of, finding new ways to solve issues, and so on. In order to do that, I (of course) need to know the topic he/she is working on.

And here comes the thing that is puzzling me.

I keep finding people asking in user groups (or by private e-mail) for a research topic. It's as if their tutors didn't discuss possible topics with them, and just outlined an extremely broad area.
E.g., "find a topic related to MANETs, or VANETs" (Mobile Ad-Hoc Networks and Vehicular Ad-Hoc Networks, respectively).

Now, these areas are so broad that a tutor cannot possibly follow all the research topics that could come up. Some examples for VANETs:

  • Intra-vehicular channel model.
  • Fast channel scanning/allocation.
  • Efficient resource management.
  • QoS support.
  • Routing (Location-based, semantic, cluster based, etc.).
  • Cluster forming.
  • IP[v6] and Mobile IPv6.
  • Traffic patterns.
  • Resource discovery algorithms.
  • Mobile cloud networking.
  • Mobile cloud applications.
  • Security (e.g., how to join a secure cluster).
  • Privacy in VANETs.
And those are just a few that come to mind. I deliberately left out the more specific topics related to the actual technology used, i.e., 802.11p, 802.15.x, LTE, D2D, etc., plus all the choices about the protocols to be used in the different parts of the network.

What happens if the student comes up with an idea and his/her tutor is not actively involved in that research topic? Because, let's be serious, nobody can know everything. The tutor will not be able to judge the idea's difficulty and originality, and will not be able to spot the hidden issues in the proposed idea (the student might not have a clue either).

Result: a thesis that's too hard, or too simple. If they're lucky.
If they are unlucky, the thesis will have an extremely weak background, and the results will be plainly useless. In this case, the student will learn nothing. Not the scientific research method, not how real work is done. Nothing at all; the thesis will be just paperwork.

Of course there may be lucky cases, where the student comes up with an interesting idea and the tutor is actually able to follow him/her. But I'm not confident this is the normal case.

Mind, I'm not blaming the students for asking this kind of question online (i.e., "help me find my thesis topic"). I'm questioning their tutors.
Probably it's a matter of knowledge. I simply don't know how this process could work. Maybe I'm missing something. I hope I'm missing something.

If anybody can enlighten me, I'll be very happy.

About the "I'm not good at programming" type

Small rant today, all about the classic guy saying "I'm not good at programming" (it may be a girl as well).

By now you should know: I do research stuff. Mostly my research is about networking protocols, and that's why I develop for ns-3.

Now, the number of people claiming (or thinking) to be good researchers in this field while being terrible at programming (in any language) is impressive. Nothing wrong with that, you'd say... maybe, or maybe not.

The main problem is: if you're unable to think about the implementation, you'll make mistakes in the design. Simple as that.

The example comes from a paper I read a few days ago. Some smart guys devised a fantastic method to implement an extremely efficient routing protocol. Fantastic results. Impressive, for real. BUT... the proposed thing could not be implemented. They forgot that data have to be known before being used. And to know data, you have to transmit them, with the usual delays, loss probability and so on.

Example: "Let's assume that node A wants to transmit something to node B. Node A knows the number of packets being received by node B". Good. Node A can NOT know that number. Not without asking node B, which, in turn, makes the assumption pointless (because the two nodes are already communicating).

What's this? Poor understanding of how things work. Mind that the authors of that paper also ran some simulations "proving" that their method was superior to those in the literature.

How did they do those simulations? Magic? No. Many simulators allow you to do "tricks": if you don't develop the code in the right way, you can actually give your nodes infinite omniscience, and get a lot of (wrong) data. Good job.

Learn to program, and don't take shortcuts. Protocols are painful sometimes, but developing them forces you to think about how real systems work.

Monday, February 24, 2014

Open source and code sharing (2)

Second rant on this topic (I hope the last one... for now).

Let's make some simple assumptions:

  1. You're developing something. It's experimental code about research stuff.
  2. Some colleague politely asks "May I use your code? It would really benefit my research".
You are, of course, going to release the code as Open Source soon™, so what's the point of keeping it private?

Well, there are a number of reasons. Just to name a few:

  • The code is not ready. It contains bugs, or poorly modelled parts, and you don't want other people to mistakenly take the results for good ones when they're wrong.
  • You spent a LOT of time on that code, and you'd like to use it first.
Those are two compelling points. The first one basically originates from the incredible number of dumb people in the world. They can take a simulator like ns-3 and assume that everything is bug-free. Then they launch a simulation, find some interesting data and publish it. All this without ever suspecting that the data may be the result of a bug rather than something real.
Don't underestimate dumb people, they can find a valid-sounding reason for just about anything, even if it's unreal. I could even name (and shame) some scientific papers. And they are cited too (sadly, not to point out that the authors are dumb).

The second point is all about your work. You wrote that code for a reason: to do some research and publish it. Didn't you? However, publishing a paper isn't that easy. Sometimes papers get stuck, sometimes they're rejected because the reviewers are funny guys (like the one who asked why we didn't cite Shannon's paper... sure, let me also cite P. Artusi, I used it yesterday). Anyway, publishing is neither easy nor fast, and papers are one of our major outcomes (more rants on this in the future).
As a consequence, researchers are more keen to "open" their software once some papers have been published. Before that... not so much.

That said, what would you do ? Would you give access to your code or not ?
Yes, you would. Because the people asking are from a well-known research centre, and because it could open new collaboration possibilities.

... and then they disappear. You told them that the code may be buggy. You told them that you really hoped for feedback: bug reports, code improvements and so on. You told them that you'd like to collaborate and share. Gone, like tears in rain.

And now ? What to do ?

Simple. You wait for the next big code change. You collect the bugs and you do NOT fix them. Not on the repository you gave them access to. Then you revoke the permissions and you keep working, hoping that they'll get stuck on a terrible bug and lose face when they try to use your buggy code. Because the bugs are there, and you know it.

And the lesson is: when you use an Open Source project, remember to contribute to it. Anything, even a small thing, is a sign that you respect those who worked on it before you. Do not take anything for granted: Open Source isn't "someone was dumb enough to give me their time". Those developers might be dumb (in your eyes), but they can be evil as well, and they do remember. Everything.

Sunday, February 23, 2014

Open source and code sharing

Open source software means sharing the code with a community. Seems logical... or not ?

I won't start the usual utterly long post about the Better World Built on Open Source Code. Too many have done that.

I want to discuss a bad habit I'm seeing more and more: the assumption that, since you're contributing to Open Source software, anyone can ask you for the code you're working on with little or no "thanks".

And this leads to the question: why should someone contribute to Open Source software ?
The answer is not simple, and I'm neither a psychologist nor a sociologist, so take the following with a grain of salt.
In my opinion the answers may be:
  • Self-esteem: being the author of a publicly-available software (maybe used by many) is a great reward.
  • Hobby/fun: the author is doing it by hobby, and he/she can't care less to get some money out of it.
  • Antagonism/idealism: the author is trying to **** the system, and wants to go against the "lobbies".
  • It's research stuff: the author is a researcher, and the code has been used to prove a theory or a model. Since the research is (usually) paid by other means, the code may be released to the public.
Whatever the reason is, there's one point in common. The author wants to be recognized as the author. Sharing doesn't mean that the code is given away for nothing: implicitly, anybody using it is giving you credit for your work.

Now, what is the worst thing that can happen with Open Source software ?
The answer is simple: that the author doesn't receive proper credit. That's why copyright is used. The various Open Source licenses are all about this: you may use the code, but you have to give proper credit.

But there's another point: making the code better. Improving it. And that will be the topic of the next rant.

Welcome...

This is the first post, and like all "first" posts, it's just there to state why I opened this blog.

Well, to make a long story short, I found out that:
  1. I want to have a place to write what comes to my mind. Rants, mostly, but also useful stuff.
  2. Sometimes some (I'd say about 1%) of the stuff I'll write may be useful to somebody.
  3. Why not ?
  4. I also wanted to test Google AdSense. Curiosity.
And now about the topics. I'm an Assistant Professor at a university, and I am an ns-3 contributor and maintainer. As a consequence, I stumble upon a lot of bugs and user requests. Some of them are interesting, some are funny, some are... disappointing. I'll write about all of them.
However, I also love dogs and cooking, so I may write about those as well. And, of course, I'll write about random stuff, because... why not ?

So, welcome to my blog. The updates will be strictly random. Because I'm not a reliable guy.

Have fun !