Maldives Vacation Report 2016

Just spent 9 days in Maldives doing nothing! So here’s a writeup and a bunch of photos.

“Magical”, “paradise”, “heaven” and other things, they say. So we decided to check it out for ourselves. Another factor was that after driving-heavy vacations in some previous years (e.g. USA or Iceland), the kids wanted to just do nothing for a bit. Sit in the water & snorkel, basically.

So off we went, all four of us. Notes in random order:

Picking where to go

This one’s tough. We don’t like resorts and hotels, so at first thought about going airbnb-style. Turns out, there isn’t much of that going on in Maldives; and some of the places that have airbnb listings up are actually just ye olde hotels or guesthouses anyway.

Lazy vacation being lazy, then it was “pick an island & resort” time. This primarily depends on your budget; from guesthouses at the low end to… well I guess there’s no limit how expensive it can go. “Very” would be an accurate description.

There are other factors, like whether you want to dive or snorkel (then look for diving spots & reef information), how many entertainment options you want, whether you’re bringing kids etc. etc.

What kinda sucks about many of the blogs on Maldives is that they are written as if they were an honest impression from some traveller, only for you to find in the small print that hey, the resort or some travel agency covered their expenses. Apparently this “travel blogger” is an actual profession that people make money on; a subtle form of advertisement. Good for them, but makes you wonder how biased their blog posts are.

We wanted a smallish resort, that’s kinda “midrange” in terms of prices, has good reviews and has a decent house reef. Upon some searching, picked Gangehi (tripadvisor link) kinda randomly. Here are basic impressions:

  • Good: nice, small, clean, good reef, nice beach area for small kids.
  • Neutral: food was a mixed bag (some very good, some meh).

Going there

It feels like “hey it’s somewhat beyond Turkey, not that far away”, but it’s more like “Sri Lanka distance away”. For us that was a drive to Vilnius, 2.5hr flight to Istanbul, and 8hr flight to Male, and from there a ~30 minute seaplane flight to the resort.

Seeing the atolls during the plane landing is very impressive, especially if you haven’t seen such a thing before (none of us had). They do not look like a real thing :)

Maldives is a whole bunch of separate tiny islands, so the only choice of travel is either by boat or seaplane. None of us flew that before either, so that was interesting too! Here it is, and here’s the resort’s “airport”. After this, even the airports we have here in Lithuania are ginormous by comparison:

Maldives with kids?!

This is supposedly a honeymoon destination, or something. Instead, we went about 14 years too late for that, and with two kids. Punks not dead! It’s fine; at least in our resort there weren’t that many honeymoon couples, actually. There was no special entertainment for kids, but hey, water that you can spend a full day in, snorkeling and sand. And an iPad for when that gets boring (whole-family-fun this time was Smallworld 2).

There are some resorts that have special “kid activities” (not sure if we would have cared for that though), and I’m told there are others that explicitly do not allow kids. But overall, if your kids like water you should be good to go.

Maldives in July?!

July is very much a non-season in Maldives – it’s the rain season, and the temperature is colder by a whopping 2 degrees! This leaves it at 31C in the day, and 28C in the night. The horror! :)

The upside of going there now: fewer people around, and apparently prices somewhat lower. We lucked out in that almost all the rain was during the nights; over the whole week we got maybe 15 minutes of rain in the daytime.

Snorkeling

Now, none of us are divers. We have never snorkeled before either. Most of us (except Marta) can barely swim as well – e.g. my swimming skills are “I’m able to not drown” :) So this time we decided “ok let’s try snorkeling”, and diving (including learning how to do that) will be left for another time.

It’s pretty cool.

Apparently the current El Niño has contributed quite a lot to coral bleaching in Maldives; the same reef was quite a lot more colorful half a year ago.

We’re the ones that don’t have any sort of GoPro, and considered getting one before the trip for taking pictures. However, we don’t care about video at all, so went for Olympus TG-4 instead. RAW support and all that. The underwater and water splashing photos are from that camera.

The other photos on this post are either Canon EOS 70D with Canon 24-70mm f/2.8 L II lens or an iPhone 6.

What else is there to do?

Not much, actually :) Splash in the water and walk around:

Watch an occasional bird, stingray (here, cowtail stingray, pastinachus sephen) or shark (no photos of sharks, but there’s a ton of little sharks all around; they are not dangerous at all):

Build castles made of sand, that melt into the sea eventually. Watch hermit crabs drag their shells around.

Take pictures in the traditional “photo to post on facebook” style. Apparently this makes you look awesome, or something.

That thing with houses on top of water is a cute hack by the resorts. The amount of land available is extremely limited, so hey, let’s build houses on water and tell people they are awesome! :)

Take nighttime photos. This is almost equator, so the sun sets very early; by 7:30PM it’s already dark.

Just walk around, if you can call it that. This island was maybe 150 meters in diameter, so “the walks” are minutes in length.

Go on a boat trip to (try to) see dolphins. We actually saw them:

Another boat trip to see a local non-resort island (Mathiveri):

Oh, and the sunsets. Gotta take pictures of the sunsets:

And that’s about it. The rest of the time: reading, sleeping, doing nothing :)

Conclusion?

What would I do differently next time?

Spending 8-9 days in a single place is stretching it. The number of photos above makes it sound like there’s a lot of things to do, but you can do all that in two days. If you’re really prepared for a “do nothing” style vacation, then it’s fine. I’m not used to that (though a friend told me: “Aras, you just need to practice more! No one gets it from the 1st time!”). So I’d probably either do the whole thing shorter, or split it up in two different islands, plus perhaps a two-day guesthouse stay at a local (non-resort) island for a change. Apparently that even has a term, “island hopping”.

Would there be next time?

Not sure yet. It was very nice to see, once. Maybe some other time too – but very likely for next vacation or two we’ll go back to “travel & drive around & see things” style. But if we want another lazy vacation, then it’s a really good place… if your budget allows. This trip was slightly more expensive than our US roadtrip, for example.

So that’s it, until next time!


Solving DX9 Half-Pixel Offset

Summary: the Direct3D 9 “half pixel offset” problem that manages to annoy everyone can be solved in a single isolated place, robustly, and in a way where you don’t have to think about it ever again. Just add two instructions to all your vertex shaders, automatically.

…here I am wondering if the target audience for D3D9 related blog post in 2016 is more than 7 people in the world. Eh, whatever!

Background

Direct3D before version 10 had this peculiarity called “half pixel offset”, where viewport coordinates are shifted by half a pixel compared to everyone else (OpenGL, D3D10+, Metal etc.). This causes various problems, particularly with image post-processing or UI rendering, but elsewhere too.

The official documentation ("Directly Mapping Texels to Pixels"), while being technically correct, is not exactly summarized into three easy bullet points.

The typical advice varies: “shift your quad vertex positions by half a pixel” or “shift texture coordinates by half a texel”, etc. Most of it talks almost exclusively about screenspace rendering for image processing or UI.

The problem with all that, is that this requires you to remember to do things in various little places. Your postprocessing code needs to be aware. Your UI needs to be aware. Your baking code needs to be aware. Some of your shaders need to be aware. When 20 places in your code need to remember to deal with this, you know you have a problem.

3D has half-pixel problem too!

While most of the material on D3D9 half pixel offset talks about screen-space operations, the problem exists in 3D too! 3D objects are rendered slightly shifted compared to what happens on OpenGL, D3D10+ or Metal.

Here’s a crop of a scene, rendered in D3D9 vs D3D11:

And a crop of a crop, scaled up even more, D3D9 vs D3D11:

Root Cause and Solution

The root cause is that viewport is shifted by half a pixel compared to where we want it to be. Unfortunately we can’t fix it by changing all coordinates passed into SetViewport, shifting them by half a pixel (D3DVIEWPORT9 coordinate members are integers).

However, we have vertex shaders. And the vertex shaders output clip space position. We can adjust the clip space position, to shift everything by half a viewport pixel. Essentially we need to do this:

// clipPos is float4 that contains position output from vertex shader
// (POSITION/SV_Position semantic):
clipPos.xy += renderTargetInvSize.xy * clipPos.w;

That’s it. Nothing more to do. Do this in all your vertex shaders, set up a shader constant that contains the inverse viewport size (renderTargetInvSize above), and you are done.

I must stress that this is done across the board. Not only postprocessing or UI shaders. Everything. This fixes the 3D rasterizing mismatch, fixes postprocessing, fixes UI, etc.
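To convince yourself the adjustment really amounts to half a pixel, here is a minimal CPU-side sketch of the same math in C++. The type and function names are made up for illustration, the viewport transform is simplified (origin at 0,0, Y pointing down), and the signs of the constant (-1/width, 1/height) are taken from the runtime setup described later in this post.

```cpp
#include <cassert>
#include <cmath>

struct Float2 { float x, y; };
struct Float4 { float x, y, z, w; };

// The adjustment from the post: clipPos.xy += renderTargetInvSize.xy * clipPos.w.
// Multiplying by w means the shift survives the later perspective divide unchanged.
Float4 ApplyHalfPixelFixup(Float4 clipPos, Float2 renderTargetInvSize)
{
    clipPos.x += renderTargetInvSize.x * clipPos.w;
    clipPos.y += renderTargetInvSize.y * clipPos.w;
    return clipPos;
}

// Simplified viewport transform: clip space -> NDC -> screen pixels.
Float2 ClipToScreen(Float4 clipPos, float width, float height)
{
    assert(clipPos.w != 0.0f);
    const float ndcX = clipPos.x / clipPos.w;
    const float ndcY = clipPos.y / clipPos.w;
    return Float2{ (ndcX + 1.0f) * 0.5f * width, (1.0f - ndcY) * 0.5f * height };
}
```

NDC spans 2 units across `width` pixels, so 1/width in NDC is exactly half a pixel; feeding any position through `ClipToScreen` before and after the fixup should show a 0.5 pixel difference on both axes.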

Wait, why no one does this then?

Ha. Turns out, they do!

Maybe it’s common knowledge, and only I managed to be confused? Sorry about that then! Should have realized this years ago…

Solving This Automatically

The “add this line of HLSL code to all your shaders” is nice if you are writing or generating all the shader source yourself. But what if you don’t? (e.g. Unity falls into this camp; zillions of shaders already written out there)

Turns out, it’s not that hard to do this at D3D9 bytecode level. No HLSL shader code modifications needed. Right after you compile the HLSL code into D3D9 bytecode (via D3DCompile or fxc), just slightly modify it.

D3D9 bytecode is documented in MSDN, “Direct3D Shader Codes”.

I wondered whether I should do something flexible/universal (parse “instructions” from bytecode, work on them, encode back into bytecode), or just write the minimal amount of code needed for this patching. Decided on the latter; with any luck D3D9 is nearing its end-of-life. It’s very unlikely that I will ever need more D3D9 bytecode manipulation. If in 5 years from now we’ll still need this code, I will be very sad!

The basic idea is:

  1. Find which register is “output position” (clearly defined in shader model 2.0; can be arbitrary register in shader model 3.0), let’s call this oPos.
  2. Find unused temporary register, let’s call this tmpPos.
  3. Replace all usages of oPos with tmpPos.
  4. Add mad oPos.xy, tmpPos.w, constFixup, tmpPos and mov oPos.zw, tmpPos at the end.

Here’s what it does to simple vertex shader:

vs_3_0           // unused temp register: r1
dcl_position v0
dcl_texcoord v1
dcl_texcoord o0.xy
dcl_texcoord1 o1.xyz
dcl_color o2
dcl_position o3  // o3 is position
pow r0.x, v1.x, v1.y
mul r0.xy, r0.x, v1
add o0.xy, r0.y, r0.x
add o1.xyz, c4, -v0
mul o2, c4, v0
dp4 o3.x, v0, c0 // -> dp4 r1.x, v0, c0
dp4 o3.y, v0, c1 // -> dp4 r1.y, v0, c1
dp4 o3.z, v0, c2 // -> dp4 r1.z, v0, c2
dp4 o3.w, v0, c3 // -> dp4 r1.w, v0, c3
                 // new: mad o3.xy, r1.w, c255, r1
                 // new: mov o3.zw, r1

Here’s the code in a gist.
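For a feel of the four steps without reading bytecode tokens, here is a toy C++ sketch that works on a made-up textual instruction list. It is not the code from the gist and does not parse real D3D9 bytecode: the `Instr` struct and all function names are hypothetical, the position register (step 1) is passed in rather than found from declarations, and it assumes output registers are never read back.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction: not real D3D9 bytecode, just enough structure for the idea.
struct Instr
{
    std::string op;                 // e.g. "dp4"
    std::string dstReg;             // e.g. "o3" or "r0"
    std::string dstMask;            // e.g. "xy"; empty means all components
    std::vector<std::string> srcs;  // e.g. {"v0", "c0"} or {"-v0"}
};

// Returns N for an operand naming temp register rN (such as "-r0.xy"), else -1.
int TempIndex(std::string operand)
{
    if (!operand.empty() && operand[0] == '-') operand.erase(0, 1);
    const std::string name = operand.substr(0, operand.find('.'));
    if (name.size() < 2 || name[0] != 'r') return -1;
    int index = 0;
    for (std::size_t i = 1; i < name.size(); ++i)
    {
        if (name[i] < '0' || name[i] > '9') return -1;
        index = index * 10 + (name[i] - '0');
    }
    return index;
}

// Step 2: first temp register never mentioned anywhere; empty string if none.
std::string FindUnusedTemp(const std::vector<Instr>& code, int tempCount = 32)
{
    std::vector<bool> used(tempCount, false);
    auto mark = [&used, tempCount](const std::string& operand) {
        const int index = TempIndex(operand);
        if (index >= 0 && index < tempCount) used[index] = true;
    };
    for (const Instr& ins : code)
    {
        mark(ins.dstReg);
        for (const std::string& src : ins.srcs) mark(src);
    }
    for (int i = 0; i < tempCount; ++i)
        if (!used[i]) return "r" + std::to_string(i);
    return std::string();
}

// Steps 3 and 4. oPos (step 1) is assumed to be known already.
bool PatchPositionOutput(std::vector<Instr>& code, const std::string& oPos,
                         const std::string& constFixup)
{
    const std::string tmpPos = FindUnusedTemp(code);
    if (tmpPos.empty()) return false;
    // Only destinations are rewritten; this toy assumes outputs are never read back.
    for (Instr& ins : code)
        if (ins.dstReg == oPos) ins.dstReg = tmpPos;
    code.push_back(Instr{"mad", oPos, "xy", {tmpPos + ".w", constFixup, tmpPos}});
    code.push_back(Instr{"mov", oPos, "zw", {tmpPos}});
    return true;
}
```

Calling `PatchPositionOutput(code, "o3", "c255")` on the instructions of the shader above should redirect the four dp4 writes to r1 and append the mad/mov pair.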

At runtime, each time viewport is changed, set vertex shader constant (I picked c255) to contain (-1.0f/width, 1.0f/height, 0, 0).
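A rough sketch of that runtime part, in C++. `FakeDevice` is a stand-in I made up so the snippet compiles without the D3D9 headers; in real code the call would go to `IDirect3DDevice9::SetVertexShaderConstantF`, which takes the same three parameters (start register, float data, number of float4 vectors). `UpdateHalfPixelFixup` and `kHalfPixelFixupRegister` are hypothetical names.

```cpp
#include <cassert>

// Stand-in for the real device; it only stores the vertex shader constants.
struct FakeDevice
{
    float vsConstants[256][4] = {};

    void SetVertexShaderConstantF(unsigned startRegister, const float* data,
                                  unsigned vector4Count)
    {
        assert(startRegister + vector4Count <= 256);
        for (unsigned i = 0; i < vector4Count * 4; ++i)
            vsConstants[startRegister + i / 4][i % 4] = data[i];
    }
};

const unsigned kHalfPixelFixupRegister = 255; // c255, as picked in the post

// Call this whenever the viewport changes.
void UpdateHalfPixelFixup(FakeDevice& device, int viewportWidth, int viewportHeight)
{
    assert(viewportWidth > 0 && viewportHeight > 0);
    const float fixup[4] = { -1.0f / viewportWidth, 1.0f / viewportHeight, 0.0f, 0.0f };
    device.SetVertexShaderConstantF(kHalfPixelFixupRegister, fixup, 1);
}
```

Hooking this into whatever wrapper already calls SetViewport keeps the constant and the viewport from ever going out of sync.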

That’s it!

Any downsides?

Not much :) The whole fixup needs shaders that:

  • Have an unused temporary register. Majority of our shaders are shader model 3.0, and I haven’t seen vertex shaders that use all 32 temporary registers. If that is a problem, “find unused register” analysis could be made smarter, by looking for an unused register just in the place between earliest and latest position writes. I haven’t done that.
  • Have an unused constant register at some (easier if fixed) index. Base spec for both shader model 2.0 and 3.0 is that vertex shaders have 256 constant registers, so I just picked the last one (c255) to contain fixup data.
  • Have instruction slot space to put two more instructions. Again, shader model 3.0 has 512 instruction slot limit and it’s very unlikely a shader is using more than 510.

Upsides!

Major ones:

  • No one ever needs to think about D3D9 half-pixel offset, ever, again.
  • 3D rasterization positions match exactly between D3D9 and everything else (D3D11, GL, Metal etc.).

Fixed up D3D9 vs D3D11. Matches now:

I ran all the graphics tests we have, inspected all the resulting differences, and compared the results with D3D11. Turns out, this revealed a few minor places where we got the half-pixel offset wrong in our shaders/code before. So additional advantages (all Unity specific):

  • Some cases of GrabPass were sampling in the middle of pixels, i.e. slightly blurred results. Matches DX11 now.
  • Some shadow acne artifacts slightly reduced; matches DX11 now.
  • Some cases of image postprocessing effects having a one pixel gap on objects that should have been touching edge of screen exactly, have been fixed. Matches DX11 now.

All this will probably go into Unity 5.5. Still haven’t decided whether it’s too invasive/risky change to put into 5.4 at this stage.


Backporting Fixes and Shuffling Branches

“Everyday I’m Shufflin’” – LMFAO

For past few months at work I’m operating this “graphics bugfixes” service. It’s a very simple, free (*) service that I’m doing to reduce overhead of doing “I have this tiny little fix there” changes. Aras-as-a-service, if you will. It works like explained in the image on the right.

(*) Where’s the catch? It’s free, so I’ll get a million people to use it, and then it’s a raging success! My plan is perfect.

Backporting vs Forwardporting Fixes

We pretty much always work on three releases at once: the “next” one (“trunk”, terminology back from Subversion source control days), the “current” one and the “previous” one. Right now these are:

  • trunk: will become Unity 5.5 sometime later this year (see roadmap).
  • 5.4: at the moment in fairly late beta, stabilization/polish.
  • 5.3.x: initially released end of 2015, currently “long term support” release that will get fixes for many months.

Often fixes need to go into all three releases, sometimes with small adjustments. About the only workflow that has worked reliably here is: “make the fix in the latest release, backport to earlier as needed”. In this case, make the fix on trunk-based code, backport to 5.4 if needed, and to 5.3 if needed.

The alternative could be making the fix on 5.3, and forward porting to 5.4 & trunk. This we do sometimes, particularly for “zomg everything’s on fire, gotta fix asap!” type of fixes. The risk with this, is that it’s easy to “lose” a fix in the future releases. Even with best intentions, some fixes will be forgotten to be forward-ported, and then it’s embarrassing for everyone.

Shufflin’ 3 branches in your head can get confusing, doubly so if you’re also trying to get something else done. Here’s what I do.

1: Write Down Everything

When making a rollup pull request of fixes to trunk, write down everything in the PR description.

  • List of all fixes, with human-speak sentence of each (i.e. what would go into release notes). Sometimes a summary of commit message already has that, sometimes it does not. In the latter case, look at the fix and describe in simple words what it does (preferably from user’s POV).
  • Separate the fixes that don’t need backporting, from the ones that need to go into 5.4 too, and the ones that need to go into both 5.4 and 5.3.
  • Write down who made the fix, which bug case numbers it solves, and which code commits contain the fix (sometimes more than one!). The commits are useful later when doing actual backports; easier than trying to fish them out of the whole branch.

Here’s a fairly small bugfix pull request iteration with all the above:

Nice and clean! However, some bugfix batches do end up quite messy; here’s the one that was quite involved. Too many fixes, too large fixes, too many code changes, hard to review etc.:

2: Do Actual Backporting

We’re using Mercurial, so this is mostly grafting commits between branches. This is where having commit hashes written down right next to fixes is useful.

Several situations can result when grafting things back:

  • All fine! No conflicts no nothing. Good, move on.
  • Turns out, someone already backported this (communication, it’s hard!). Good, move on.
  • Easy conflicts. Look at the code, fix if trivial enough and you understand it.
  • Complex conflicts. Talk with author of original fix; it’s their job to either backport manually now, or advise you how to fix the conflict.

As for actual repository organization, I have three clones of the codebase on my machine: trunk, 5.4, 5.3. Was using a single repository and switching branches before, but it takes too long to do all the preparation and building, particularly when switching between branches that are too far away from each other. Our repository size is fairly big, so hg share extension is useful - make trunk the “master” repository, and the other ones just working copies that share the same innards of a .hg folder. SSDs appreciate 20GB savings!

3: Review After Backporting

After all the grafting, create pull requests for 5.4 and 5.3, scan through the changes to make sure everything got through fine. Paste relevant bits into PR description for other reviewers:

And then you’re left with 3 or so pull requests, each against a corresponding release. In our case, this means potentially adding more reviewers for sanity checking, running builds and tests on the build farm, and finally merging them into actual release once everything is done. Dance time!

This is all.


10 Years at Unity

Turns out, I started working on this “Unity” thing exactly 10 years ago. I wrote the backstory in “2 years later” and “4 years later” posts, so not worth repeating it here.

A lot of things have happened over these 10 years, some of which are quite an experience.

Seeing the company go through various stages, from just 4 of us back then to, I dunno, 750 amazing people by now? is super interesting. You get to experience the joys & pains of growth, the challenges and opportunities allowed by that and so on.

Seeing all the amazing games made with Unity is extremely motivating. Being a part of this super-popular engine that everyone loves to hate is somewhat less motivating, but hey let’s not focus on that today :)

Having my tiny contributions in all releases from Unity 1.2.2 to (at the time of writing) 5.3.1 and 5.4 beta feels good too!

What now? or “hey, what happened to Optimizing Unity Renderer posts?”

Last year I did several “Optimizing Unity Renderer” posts (part 1, part 2, part 3) and then, when things were about to get interesting, I stopped. Wat happend?

Well, I stopped working on them optimizations; the multi-threaded rendering and other optimizations are done by people who are way better at it than I am (@maverikou, @n3rvus, @joeldevahl and some others who aren’t on twitterverse).

So what am I doing then?

Since mid-2015 I’ve moved into kinda-maybe-a-lead-but-not-quite position. Perhaps that’s better characterized as “all seeing evil eye” or maybe “that guy who asks why A is done but B is not”. I was the “graphics lead” a number of years ago, until I decided that I should just be coding instead. Well, now I’m back to “you don’t just code” state.

In practice, I’ve been doing several things for the past 6 months:

  • Reviewing a lot of code, or the “all seeing eye” bit. I’ve already been doing quite a lot of that, but with large amount of new graphics hires in 2015 the amount of graphics-related code changes has gone up massively.
  • Part of a small team that does overall “graphics vision”, prioritization, planning, roadmapping and stuff. The “why A is done when B should be done instead” bit. This also means doing job interviews, looking into which areas are understaffed, onboarding new hires etc.
  • Bugfixing, bugfixing and more bugfixing. It’s not a secret that stability of Unity could be improved. Or that “Unity does not do what I think it should do” (which very often is not technically “bugs”, but it feels like that from the user’s POV) happens a lot.
  • Improving workflow for other graphics developers internally. For example trying to reduce the overhead of our graphics tests and so on.
  • Work on some stuff or write some documentation when time is left from the above. Not much of actual coding done so far, largest items I remember are some work on frame debugger improvements (in 5.3), texture arrays / CopyTexture (in 5.4 beta) and a bunch of smaller items.

For the foreseeable future I think I’ll continue doing the above.

By now we do have quite a lot of people to work on graphics improvements; my own personal goal is that by mid-2016 there will be way fewer internet complaints along the lines of “unity is shit”. So, fewer regressions, fewer bugs, more stability, more in-depth documentation etc.

Wish me luck!


Careful With That STL map insert, Eugene

So we had this pattern in some of our code. Some sort of “device/API specific objects” need to be created out of simple “descriptor/key” structures. Think D3D11 rasterizer state or Metal pipeline state, or something similar to them.

Most of that code looked something like this (names changed and simplified):

// m_States is std::map<StateDesc, DeviceState>

const DeviceState* GfxDevice::CreateState(const StateDesc& key)
{
	// insert default state (will do nothing if key already there)
	std::pair<CachedStates::iterator, bool> res = m_States.insert(std::make_pair(key, DeviceState()));
	if (res.second)
	{
		// state was not there yet, so actually create it
		DeviceState& state = res.first->second;
		// fill/create state out of key
	}
	// return already existing or just created state
	return &res.first->second;
}

Now, past the initial initialization/loading, absolute majority of CreateState calls will just return already created states.

StateDesc and DeviceState are simple structs with just plain old data in them; they can be created on the stack and copied around fairly well.

What’s the performance of the code above?

It is O(logN) complexity based on how many states are created in total, that’s a given (std::map is a tree, usually implemented as a red-black tree; lookups are logarithmic complexity). Let’s say that’s not a problem, we can live with logN complexity there.

Yes, STL maps are not quite friendly for the CPU cache, since all the nodes of a tree are separately allocated objects, which could be all over the place in memory. Typical C++ answer is “use a special allocator”. Let’s say we have that too; all these maps use a nice “STL map” allocator that’s designed for fixed allocation size of a node and they are all mostly friendly packed in memory. Yes the nodes have pointers which take up space etc., but let’s say that’s ok in our case too.

In the common case of “state is already created, we just return it from the map”, besides the find cost, are there any other concerns?

Turns out… this code allocates memory. Always (*). And in the major case of state already being in the map, frees the just-allocated memory too, right there.

“bbbut… why?! how?”

(*) not necessarily always, but at least in some popular STL implementations it does.

Turns out, quite some STL implementations have map.insert written in this way:

node = allocateAndInitializeNode(key, value);
insertNodeIfNoKeyInMap(node);
if (didNotInsert)
	destroyAndFreeNode(node);

So in terms of memory allocations, calling map.insert with a key that already exists is more costly (incurs allocation+free). Why?! I have no idea.
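One way to check what a given standard library does is to plug a counting allocator into the map and watch the counter around a duplicate insert. Below is a minimal sketch; `CountingAllocator`, `g_allocCount` and both helper functions are names I made up for this example, and the number you get back depends entirely on the STL implementation, so none is claimed here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <utility>

// Global counter bumped on every allocation (not thread-safe; demo only).
static int g_allocCount = 0;

template <typename T>
struct CountingAllocator
{
    typedef T value_type;
    CountingAllocator() {}
    template <typename U> CountingAllocator(const CountingAllocator<U>&) {}
    T* allocate(std::size_t n) { ++g_allocCount; return std::allocator<T>().allocate(n); }
    void deallocate(T* p, std::size_t n) { std::allocator<T>().deallocate(p, n); }
};
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

typedef std::map<int, int, std::less<int>,
                 CountingAllocator<std::pair<const int, int> > > CountedMap;

// How many allocations does inserting an already-present key cost?
int AllocationsForDuplicateInsert()
{
    CountedMap m;
    m.insert(std::make_pair(1, 10));
    const int before = g_allocCount;
    m.insert(std::make_pair(1, 20)); // key 1 already exists; the map is left unchanged
    const int count = g_allocCount - before;
    assert(m.size() == 1 && m.find(1)->second == 10);
    return count;
}

// Baseline for comparison: a plain lookup of an existing key.
int AllocationsForFind()
{
    CountedMap m;
    m.insert(std::make_pair(1, 10));
    const int before = g_allocCount;
    const bool found = m.find(1) != m.end();
    assert(found);
    (void)found;
    return g_allocCount - before;
}
```

If `AllocationsForDuplicateInsert()` comes back non-zero while `AllocationsForFind()` is zero, your STL is in the “always allocates” camp.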

I’ve tested with several things I had around.

STLs that always allocate:

Visual C++ 2015 Update 1:

_Nodeptr _Newnode = this->_Buynode(_STD forward<_Valty>(_Val));
return (_Insert_nohint(false, this->_Myval(_Newnode), _Newnode));

(_Buynode allocates, _Insert_nohint at end frees if not inserted).

Same behaviour in Visual C++ 2010.

Xcode 7.0.1 default libc++:

__node_holder __h = __construct_node(_VSTD::forward<_Vp>(__v));
pair<iterator, bool> __r = __node_insert_unique(__h.get());
if (__r.second)
    __h.release();
return __r;

STLs that only allocate when need to insert:

These implementations first do a key lookup and return if found, and only if not found yet then allocate the tree node and insert it.

Xcode 7.0.1 with (legacy?) libstdc++.

EA’s EASTL. See red_black_tree.h.

@msinilo’s RDESTL. See rb_tree.h.

Conclusion?

STL is hard. Hidden differences between platforms like that can bite you. Or as @maverikou said, “LOL. this calls for a new emoji”.

In this particular case, a helper function that manually does a search, and only inserts if needed, would help things. Using a lower_bound + insert with iterator “trick” to avoid a second O(logN) search on insert might be useful. See this answer on Stack Overflow.

Curiously enough, on that (and other similar) SO threads other answers are along the lines of “for simple key/value types, just calling insert will be as efficient”. Ha. Haha.
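For reference, the lower_bound + hinted insert helper mentioned above could look roughly like this. It is a sketch, not our actual code: `StateDesc` and `DeviceState` are reduced to single-int placeholders, and `GetOrCreateState` is a made-up free function standing in for the member function shown at the top.

```cpp
#include <cassert>
#include <map>

// Placeholder types; the real ones are plain structs with more fields.
struct StateDesc { int id; };
struct DeviceState { int handle; };
bool operator<(const StateDesc& a, const StateDesc& b) { return a.id < b.id; }

typedef std::map<StateDesc, DeviceState> CachedStates;

const DeviceState* GetOrCreateState(CachedStates& states, const StateDesc& key)
{
    CachedStates::iterator it = states.lower_bound(key);
    // lower_bound returns the first element whose key is not less than `key`,
    // so it is a match exactly when `key` is not less than that element's key.
    if (it != states.end() && !states.key_comp()(key, it->first))
        return &it->second; // common case: already created, no node allocation

    DeviceState state;
    state.handle = key.id; // placeholder for "fill/create state out of key"
    // `it` is the position right after where the new node belongs,
    // so it works as the insertion hint and avoids a second tree search.
    it = states.insert(it, CachedStates::value_type(key, state));
    return &it->second;
}
```

The lookup path never constructs a node, so the “state already exists” case stays allocation-free regardless of which STL you happen to be on.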