@@ -20,7 +22,7 @@ The [Go Cascadia package](https://github.com/andybalholm/cascadia) implements CS
 ## Usage

-####$ {{exec "cascadia" | color "sh"}}
+### $ {{exec "cascadia" | color "sh"}}

 Its output has two modes, _none-block selection mode_ and _block selection mode_, depending on whether the `--piece` parameter is given on the command line or not.
@@ -42,9 +44,7 @@ In summary,
 - The block selection mode will output HTML as text in a `tsv`/`csv` table form by default
   * if the `--piece` selection is prefixed with `RAW:`, then that specific block selection will output in HTML instead. See the following for details.

-## Examples
-
-### None-block selection mode
+### Examples

 All three `-i -o -c` options are required. By default it reads from `stdin` and outputs to `stdout`:
@@ -104,185 +104,13 @@ $ cat /tmp/out.html
 </body>
 ```

-For more on using the `--style` option, check out ["adding styles"](https://github.com/suntong/cascadia/wiki/Adding-styles).
-
-#### Multi-selection
-
-Of course, any number of selections are allowed (supported out of the box by the CSS selector "`,`" syntax):
-Or, to make the multi-selection explicit on the command line, emphasizing that the selections come from different parts via different selectors, you can provide multiple `--css` options. E.g.,
-
-cascadia -o -i http://www.iciba.com/conformity -c 'div.js-base-info > div > div > div.in-base-top.clearfix' -c 'div.js-base-info > div > div > ul' -c 'div.js-base-info > div > div > li' -c 'div.info-article.article-tab'
-
-It builds the result from all four `-c` CSS selectors.
-
-It has the same effect as using the "`,`" syntax, but
-
-- The CSS selectors are provided explicitly with multiple `--css` parameters.
-- The "`,`" syntax returns results in the order the selections occur in the source, while
-- Multiple `--css` parameters return results in the order the `--css` parameters are given.
-
-### Block selection mode
-
-First, the none-block selection mode outputs the selection as HTML _source_, so if you want HTML _text_ instead, you can use the block selection mode.
-
-```sh
-$ echo '<div class="container"><p align="justify"><b>Name: </b>John Doe</p></div>' | tee /tmp/cascadia.xml | cascadia -i -o -c 'div > p'
-1. Onedrive is slow on Linux but fast with a "Windows" user-agent (2016) microsoft.com
-2. Starting today, users of Firefox can also enjoy Netflix on Linux netflix.com
-3. Research Debt distill.pub
-...
-27. USPS Informed Delivery - Digital Images of Front of Mailpieces usps.com
-28. Performance bugs - the dark matter of programming bugs forwardscattering.org
-29. Most items of clothing have complicated international journeys bbc.co.uk
-30. High-performance employees need quieter work spaces qz.com
-```
-
-It's a poor man's scraper tool if text is the only thing needed. For scraping beyond text, go one step further and use [andrew-d/goscrape](https://github.com/andrew-d/goscrape) (or my [goscrape](https://github.com/suntong/goscrape) instead, which has some enhancements to it).
-
-Again, if text is the only thing needed, then `cascadia` might already be enough. Here is how to scrape Hacker News _across several pages_:
-1. Starting today, users of Firefox can also enjoy Netflix on Linux netflix.com
-2. Onedrive is slow on Linux but fast with a "Windows" user-agent (2016) microsoft.com
-3. Research Debt distill.pub
-...
-27. Yes I Still Want to Be Doing This at 56 (2012) thecodist.com
-28. Performance bugs - the dark matter of programming bugs forwardscattering.org
-29. USPS Informed Delivery - Digital Images of Front of Mailpieces usps.com
-30. High-performance employees need quieter work spaces qz.com
-31. Most items of clothing have complicated international journeys bbc.co.uk
-32. Telstra's Gigabit Class LTE Network cellularinsights.com
-...
-58. The New Laptop Ban Adds to Travelers' Lack of Privacy and Security eff.org
-59. QEMU: user-to-root privesc inside VM via bad translation caching chromium.org
-60. Startups that debuted at Y Combinator W17 Demo Day 2 techcrunch.com
-61. The Cracking Monolith: Forces That Call for Microservices semaphoreci.com
-62. Amsterdam Airport Launches API Platform schiphol.nl
-...
-88. Founder Stories: Leah Culver of Breaker (YC W17) ycombinator.com
-89. Find out what you, or someone on your team, did on the last working day github.com
-90. PSD2 - a directive that will change banking in Europe evry.com
-```
-
-By default it uses tab `\t` as the field delimiter, so the output is in `.tsv` format. To change to `.csv`, add `-d ,` to the command line.
-
-
-#### Twitter Search
-
-Block selection mode is a poor man's web scraping tool, and it is very simple to use. Here is another _practical_ example -- Twitter searching. We all know that you have to [pay for the Twitter Search API and it _only serves Tweets from the past week_](https://dev.twitter.com/rest/public/search). With `cascadia`, you can search the tweets for free, and get the latest content as well.
-
-Here is how I watch for Toronto/GTA's Gas Price Alert, _without getting all the other tweets_ from him:
-Gas Price Alert #Toronto #GTA #Hamilton #Ottawa #LdnOnt #Barrie #Kitchener #Niagara #Windsor N/C Tues and to a 2ct/l HIKE gor Wednesday
-
-Jul 6
-Gas Price Alert #Toronto #GTA #LdnOnt #Hamilton #Ottawa #Barrie #KW to see a 1 ct/l drop @ for Friday July 7
-
-May 30
-Gas Price Alert #Toronto #GTA #Ottawa #LdnOnt #Hamilton #KW #Barrie #Windsor prices won't change Wednesday but will DROP 1 ct/l Thursday
-
-May 15
-Gas Price Alert #Toronto #GTA #Barrie #Hamilton #LdnOnt #Ottawa #KW #Windsor NO CHANGE @ except gas bar shenanigans for Tues & Wednesday
-
-Mar 7
-Gas Price Alert #Toronto #GTHA #LdnOnt #Ottawa #Barrie #KW #Windsor to see a 1 cent a litre HIKE Wed March 8 (to 107.9 in the #GTA)
-
-```
-
-
-### Reconstruct the separated pages
-
-Many web sites annoyingly split one file into several small pieces so that they can show it to you across different web pages, with different ads. However, I'd like to view them in one page with no ads. Or, at least that is what I'd been hoping for all the time, but I didn't have an easy way of doing it until now, with `cascadia`.
-
-
-With `cascadia`, no more programming is necessary. All we need to do now is pass some command line parameters, and the magic will happen. There are many sites that break things into several small pieces; the following two are ones I did just the other day.
-
-The first one is spread over 23 pages! Twenty-three! I would just give up if I didn't have `cascadia`, but with it, it is so simple:
-The [first page is here](http://www.chinadmd.com/file/prrxtuivvxsxxwwaexuuwovp_1.html), and [all 23 pages are collected here](https://docs.google.com/document/d/1HkJ2oxvRSvoaNXl0n3t-uGhT5Dd08cvDbP9tB9Dmy8Q/preview). I collected them as plain text because the HTML was just wrapping the plain text, so there is no need for HTML; plain text is good enough.
-
-Collecting as HTML is no trouble either. Here is another example:
-The [fifth page is here](http://www.shangxueedu.com/shuxue/ksdg/20170113_162_5.html), and [all pages are collected here](https://docs.google.com/document/d/1StFwP7kChHiGsL-hm3tnY29bsBRQWCU7xdhu2shsGcg/preview). Please check them out.
-
-## More On CSS Selector
-
-I'm not an expert on CSS selectors at all, but the following resources are the ones I found most helpful.
-
-- [CSS Selectors Cheat Sheet](http://butlerccwebdev.net/support/css-selectors-cheatsheet.html) I think it's very good, because it's usage-oriented and very practical, i.e., it arranges the selectors according to their purposes. If that's too dry for you, check out
-- [The 30 CSS Selectors You Must Memorize](http://code.tutsplus.com/tutorials/the-30-css-selectors-you-must-memorize--net-16048) It only lists the selectors that are important, but it gives concrete examples and explanations
-- [CSS Selector Reference](http://www.w3schools.com/cssref/css_selectors.asp) from w3schools. This is the one I most often refer to, because the list is comprehensive, and there is also an online [CSS Selector Tester](http://www.w3schools.com/cssref/trysel.asp) that really helped me learn and understand
-
-## Download/Install
-
-
-### Download binaries
-
-- The latest binary executables are available right under the GitHub release page
-as the result of the Continuous-Integration process.
-- I.e., they are built during every git tagged push, automatically by [GitHub Actions](https://github.com/features/actions), right from the source code, truly WYSIWYG.
-- The `.deb`, `.rpm` and `.apk` packages are readily available, as well as the executables for other Linux and Windows.
-- Pick & choose the binary executable that suits your OS and its architecture. E.g., for Linux, it would most probably be the `{{.Name}}_ver_linux_amd64.tar.gz` file.
-- After downloading it, unpack it and put the executable somewhere in the PATH.
-
-
-### Install Source
-
-To install the source code instead:
-
-```
-go get github.com/suntong/{{.Name}}
-```
-
-## Author(s) & Contributor(s)
-
-Tong SUN
-
+- For more on using the `--style` option, check out ["adding styles"](https://github.com/suntong/cascadia/wiki/Adding-styles).
+- For more examples, check out the [wiki](https://github.com/suntong/cascadia/wiki/), which includes but is not limited to,
 _Powered by_ [**WireFrame**](https://github.com/go-easygen/wireframe), the _one-stop wire-framing solution_ for Go cli based projects, from start to deploy.
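
The README text removed in this diff states that block selection mode emits tab-delimited fields by default, and that `-d ,` switches to CSV. As a rough illustration of what can be done with such output downstream, here is a minimal sketch using only standard tools. The sample rows and their three-field layout (rank, title, site) are assumptions made for the example, not real `cascadia` output, and the helper names `sample_rows`, `titles_only` and `tsv_to_csv` are hypothetical.

```sh
#!/bin/sh
# Hypothetical rows standing in for block-selection output.
# Assumed layout (not confirmed by the source): rank<TAB>title<TAB>site.
sample_rows() {
    printf '1.\tResearch Debt\tdistill.pub\n'
    printf '2.\tAmsterdam Airport Launches API Platform\tschiphol.nl\n'
}

# Keep only the second field; cut splits on tabs by default.
titles_only() {
    cut -f2
}

# Re-delimit already-captured TSV as CSV: quote every field and
# double any embedded double quotes.
tsv_to_csv() {
    awk -F'\t' '{
        for (i = 1; i <= NF; i++) {
            gsub(/"/, "\"\"", $i)
            printf "%s\"%s\"", (i > 1 ? "," : ""), $i
        }
        print ""
    }'
}

sample_rows | titles_only
sample_rows | tsv_to_csv
```

When CSV is wanted from the start, passing `-d ,` to `cascadia` as described in the README text is the simpler route; the `awk` helper is only meant for output that has already been saved as TSV.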