"SkeletonNet: Shape Pixel to Skeleton Pixel" paper review
arxiv link
my comments
- very similar to U-Net, with some modifications to the network architecture:
- During the downsampling convolutions, padding is applied so that the convolutions themselves do not shave a few pixels off each feature map. This allows each downsampling result to be concatenated directly with the upsampling result at the same level, with no cropping needed.
- A new concept called ‘side layers’ is introduced. Each level’s output tensor has a different shape, so each one is processed into a tensor with the same width and height as the final output. This processed tensor is that level’s side layer. The side layers are merged and then combined with the ‘normal’ down-upsampled output tensor to produce the final output. This gives the deep-level output tensors a more direct connection to the final output, which can be interpreted as making the low-resolution tensors’ information more influential.
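
To make the two modifications above concrete, here is a small shape-bookkeeping sketch in plain Python. It is not the paper's implementation. It assumes a 256x256 input, four encoder levels, two 3x3 convolutions per level, and 2x2 pooling between levels, none of which are taken from the paper. The function names (`conv_out_size`, `trace_encoder`, `skip_matches`, `side_layer_scales`) are made up for this note. It only tracks spatial sizes, using the standard convolution output-size formula.

```python
def conv_out_size(size, kernel=3, stride=1, padding=0):
    """Spatial output size of a convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_encoder(size, levels, padding):
    """Feature-map size at each encoder level.

    Assumption for this sketch: every level is two 3x3 convolutions
    followed by 2x2 pooling.
    """
    sizes = []
    for _ in range(levels):
        size = conv_out_size(size, padding=padding)
        size = conv_out_size(size, padding=padding)
        sizes.append(size)
        size //= 2  # 2x2 pooling halves the spatial size
    return sizes


def skip_matches(sizes):
    """Simplified check: does a 2x upsample of each deeper level match the
    level above it, so the two can be concatenated without cropping?
    Decoder convolutions are ignored here."""
    return [2 * deep == shallow for shallow, deep in zip(sizes, sizes[1:])]


def side_layer_scales(sizes, output_size):
    """Upsampling factor each level needs to reach the final output size."""
    return [output_size // s for s in sizes]


INPUT_SIZE = 256  # assumed input resolution
padded = trace_encoder(INPUT_SIZE, levels=4, padding=1)
unpadded = trace_encoder(INPUT_SIZE, levels=4, padding=0)

print(padded)                                 # [256, 128, 64, 32]
print(unpadded)                               # [252, 122, 57, 24]
print(skip_matches(padded))                   # [True, True, True]
print(skip_matches(unpadded))                 # [False, False, False]
print(side_layer_scales(padded, INPUT_SIZE))  # [1, 2, 4, 8]
```

With padding, every level's size is a clean power-of-two fraction of the input, so the same-level concatenation lines up and each side layer only needs a fixed 1x, 2x, 4x, or 8x upsampling to reach the output resolution. How the side layers are then merged and combined with the main output (for example concatenation along channels followed by a convolution) is not specified in these notes, so the sketch stops at the shapes.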